The COVID-19 epidemic in Malaysia started as a small wave of 22 cases in January 2020, arising from imported cases. It was followed by a bigger wave, mainly from local transmission, that resulted in 651 cases. The following wave unexpectedly saw a three-digit number of daily cases after a mass gathering, which urged the government to choose a more stringent measure. A limited lock-down approach called the Movement Control Order (MCO) was immediately imposed on the whole country to suppress the epidemic trajectory. The lock-down causes major socio-economic disruption, so the ability to forecast the infection dynamic is urgently required to assist the government in making timely decisions. Limited testing capacity and limited epidemiological data complicate the understanding of the future infection dynamic of the COVID-19 epidemic. Three different epidemic forecasting models were used to generate forecasts of COVID-19 cases in Malaysia using daily reported cumulative case data up to 1st April 2020 from the Malaysia Ministry of Health. The forecasts were generated using a Curve Fitting Model with Probability Density Function and Skewness Effect, the SIR Model, and a System Dynamic Model. Method one, based on curve fitting with a probability density function, estimated that the peak will be on 19th April 2020 with an estimated 5,637 infected persons. Method two, based on the SIR Model, estimated that the peak will be between 20th and 31st May 2020 if the Movement Control Order (MCO) is in place, with an estimated 630,000 to 800,000 infected persons. Method three, based on the System Dynamic Model, estimated that the peak will be on 17th May 2020 with an estimated 22,421 infected persons. Forecasts from each model suggested the epidemic may peak between the middle of April and the end of May.
Keywords: COVID-19, Infection dynamic, Prediction Modeling, SIR, System Learning, Lock-down
A novel coronavirus infectious disease (COVID-19), which is caused by SARS-CoV-2, has been announced by the World Health Organization as a fatal global pandemic. The COVID-19 epidemic started explosively in Wuhan and spread throughout China. Mediated by a massive aviation industry, it turned into a pandemic in just two months. As of April 4, 2020, the number of cases had climbed above 1 million, with a death toll of over 50,000 worldwide. The global impact and public health threat of COVID-19 are the most serious seen in a respiratory virus since the 1918 influenza pandemic. Both COVID-19 and the 1918 influenza pandemic are associated with respiratory spread, a significant percentage of infected people with asymptomatic cases transmitting infection to others, and a high fatality rate [5-6]. Globally, the epidemic curve in each country varies from an exponential, uncontrolled outbreak (Italy) to a slowly rising, adequately controlled one (Singapore). Malaysia lies somewhere in the middle. As of April 5, 2020, there were 3,662 cases and 61 deaths in Malaysia. The Malaysian government is taking prompt public health actions to prevent an exponential rise in cases by continuously screening and testing high-risk individuals, isolating patients, and tracing and quarantining contacts to prevent secondary spread. These actions seemed adequate until a large cluster of cases occurred following a large Tabligh gathering involving more than 10,000 members in late February. The event changed the direction of the epidemic curve in Malaysia. The dire urgency of controlling the outbreak to prevent the collapse of the healthcare system forced the government to impose more stringent action. The Malaysian Prime Minister announced a limited lock-down called the Movement Control Order (MCO) on 16th March 2020. The first MCO (MCO1) ran from 18th March until 31st March 2020 [7]. It was then continued for another 2 weeks (MCO2) until 14th April 2020.
During the MCO, all universities, schools, religious places and non-essential sectors are closed. Interstate travel is not allowed unless for valid reasons. Only the head of the family is allowed to buy groceries, within a 10 km radius. The police and the army work together in coordinating and monitoring people's movements.
This stringent action is not without a cost to society. It causes major social and economic disruption [11-12]. The uncertainties related to the outbreak also create anxiety. Although the decision to implement the MCO was acceptable to many in view of outbreak casualties, the question is how many weeks are needed. A study showed that if the suppressive measure is lifted too early, a massive outbreak may recur. Conversely, if the measure stays in place for too long, the social, economic, and psychological effects will be massive. Simple counts of the number of confirmed cases can be misleading indicators of the epidemic's trajectory if these counts are limited by problems in access to care or bottlenecks in laboratory testing, or if only patients with symptoms are tested. This is where prediction modelling may assist the authorities in making decisions. Model-based predictions can help policy makers make the right decisions in a timely way, even with the uncertainties about COVID-19. Therefore, in this work, we project COVID-19 outbreak cases in Malaysia using three mathematical models: the Curve Fitting Model with Probability Density Function and Skewness Effect, the SIR model, and the System Dynamic model. We used a combination of actual daily data and analysis of patterns and trends from previous cases in other countries to project upcoming cases for Malaysia. The projection serves to support decisions on the lockdown period and on the activities needed to mitigate the spread of coronavirus cases. Accurate prediction is crucial to support the right decisions on the upcoming lockdown period and activities. For instance, an extension of the MCO can be decided based on increasing or decreasing trends in COVID-19 cases.
This paper is organized in five sections. Section 2 reviews some prediction modelling concepts and related work. Section 3 describes the three prediction methods (Curve Fitting Model, SIR Model and System Dynamics) used in this study and the experimental assumptions made. Section 4 presents the results obtained by each of these prediction models, and Section 5 concludes the paper.
2. Related Work
Many models have been used to predict the outbreak pattern of the COVID-19 epidemic. Several models used the normal distribution as a model of the COVID-19 epidemic and to forecast peak hospital load. A curve-fitting tool to fit a nonlinear mixed effects model was developed based on available data. For instance, the cumulative rate is assumed to follow a parameterized Gaussian error function. Parameters such as the death rate, the time since the death rate exceeded a certain number (used as a location-specific inflection point), and a location-specific growth parameter have been used. Other sigmoidal functional forms were also considered. Data were also fit to the log of the death rate in the available data, using an optimization framework.
The logistic distribution is a continuous probability distribution which resembles the normal distribution in shape but has heavier tails. A generalized logistic growth model was used, together with the Richards growth model and a sub-epidemic wave model, to generate COVID-19 10-day forecasts for Guangdong and Zhejiang. The generalized logistic growth model and the Richards model extend the simple logistic growth model with an additional scaling parameter. The sub-epidemic model accommodates complex epidemic trajectories by assembling the contribution of inferred overlapping sub-epidemics. Each model was fit to the "incidence" curve, and the best-fit solution for each model was estimated using nonlinear least squares fitting, so that the model parameters minimize the sum of squared errors between the model and the data. A parametric bootstrap approach was used to generate uncertainty bounds around the best-fit solution, assuming a Poisson error structure.
The SIR model is one of the simplest models for predicting properties of how a disease spreads, for example the total number of infected or the duration of an epidemic.
The population is first divided into compartments, with the assumption that every individual in the same compartment has the same characteristics. The model consists of three compartments: S for the number of susceptible, I for the number of infectious, and R for the number of recovered or deceased (or immune) individuals. This model is reasonably predictive for infectious diseases which are transmitted from human to human. During an epidemic, the number of susceptible individuals falls rapidly as more of them are infected and thus enter the infectious and recovered compartments. Each member of the population typically progresses from susceptible to infectious to recovered. JP Morgan used a model based on SIR to estimate the COVID-19 epidemic curve in Malaysia. The estimation assumes that the potential size of the group that initially interacts with the infected group (i.e., the group that needs to be tested for the virus) is around 0.2% of the total population, based on the total size of the test group in China's Hubei and South Korea, which is about 0.1% and 0.7% of the total population, respectively. The secondary infection rate (R0) was adopted based on infection parameters used in China (2) and South Korea (1.9), where a country would face a doubling of infections every 5-7 days in the early acceleration stage. Due to several containment measures and the lower population density of Malaysia (96 people per sq. km of land area vs. Japan/Korea: 347/212 according to the World Bank), they set 1.7 as the initial setting of R0.
The Imperial College COVID-19 Response Team modified an individual-based simulation model developed for pandemic influenza planning to explore scenarios for COVID-19 in Great Britain. For their model, they assumed an incubation period of 5.1 days, with infectiousness occurring from 12 hours prior to the onset of symptoms for those who are symptomatic and from 4.6 days after infection for those who are asymptomatic, with an infectiousness profile over time that results in a 6.5-day mean generation time. They used R0=2.4 based on fits to the early growth rate of the epidemic in Wuhan, and assumed symptomatic individuals to be 50% more infectious than asymptomatic individuals. On recovery from infection, individuals are assumed to be immune to re-infection in the short term. In addition, Pueyo uses the epidemic calculator provided by Goh to predict the effect of control measures on the spread of infection in the United States and how it will affect their healthcare services. Chen et al. proposed a time-delay dynamic system based on five compartments: external suspected people, infected people, confirmed people, isolated people, and cured people. They also added external sources such as spread rate, latent period, delay period, exposed people, and cured rate to describe the trend of the COVID-19 outbreak. However, the curve fitting models mentioned above rely on data from only a small portion of the epidemic's behavior to predict the peak.
If variations occur in the data, such as dramatic shifts in test coverage, the forecast might not be accurate. To solve this problem, other methods such as machine learning can be used to analyze information from a multitude of sources and track over a hundred infectious diseases (Forbes, 2020). For instance, in December 2019, Blue Dot predicted the COVID-19 outbreak using machine learning and sent out a warning to its customers to avoid Wuhan, ahead of both the US Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO). Blue Dot also predicted where other Asian city outbreaks could be by analyzing traveller itineraries and flight paths. However, the insufficient amount of data available when an epidemic has just started is a big challenge in machine learning. In the past, three popular methods have been proposed: 1) augmenting the existing little data, 2) using a panel selection to pick the best forecasting model from several models, and 3) fine-tuning the parameters of an individual forecasting model for the highest possible accuracy. Fong et al. proposed a methodology that combines these three steps. They constructed a polynomial neural network with corrective feedback to forecast the COVID-19 outbreak with low prediction error, which is useful for predicting disease outbreaks when the samples are small.
In terms of data, the earliest data on COVID-19 are from China, with case fatalities as high as 1% among the infected. The global mortality ratio averaged around 4.4%, whilst Malaysia's reported mortality ratio is 0.77%. There are several data repositories for COVID-19. Examples are the data available from Worldometer and the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), which is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab. For Malaysia, data have been published by the STAR and Malaysiakini. JP Morgan forecasted a mid-April infection peak of 6,300 for Malaysia. The lower peak could be due to Malaysia's 482 tests per million population, which is 4 to 81 times higher than its neighboring ASEAN countries (6 to 109 tests per million population) and higher than several EU countries. Malaysia has made efforts to control the infection curve with school closures, bans on social gatherings, and both inbound and outbound travel restrictions, and has enforced the Movement Control Order effective from 18th March 2020. It is important that the government takes action to control the peak, to ensure that the health facilities can cope with the need for hospitalization. The proportions of infected cases that required hospitalization and ICU admission were 20.7%-31.4% and 4.9%-11.5%, respectively. Malaysia's current critical care beds are estimated at 1,060. The impact on the economy can also be estimated, as has been done by Pearce.
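The three-compartment SIR dynamics reviewed in this section can be illustrated with a short simulation. The following is a minimal sketch rather than the implementation used by any of the cited studies: it integrates the standard SIR equations with a simple Euler step, takes R0 = 1.7 from the JP Morgan setting quoted above, and assumes a 7-day infectious period, a population of roughly 32 million, and 100 initial infections. These last three values and all function names are illustrative assumptions.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the standard SIR equations with a fixed-step Euler scheme.

    beta  : transmission rate per day (assumed constant)
    gamma : recovery/removal rate per day (1 / infectious period)
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = int(round(1.0 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


def peak_infectious(history):
    """Return (day, infectious count) at the maximum of the I compartment."""
    day = max(range(len(history)), key=lambda d: history[d][1])
    return day, history[day][1]


if __name__ == "__main__":
    # Illustrative assumptions, not fitted values.
    POPULATION = 32_000_000       # approximate population of Malaysia (assumed)
    INFECTIOUS_PERIOD = 7.0       # days (hypothetical)
    R0 = 1.7                      # initial setting quoted from JP Morgan above
    gamma = 1.0 / INFECTIOUS_PERIOD
    beta = R0 * gamma             # R0 = beta / gamma in the basic SIR model

    history = simulate_sir(POPULATION, 100, beta, gamma, days=365)
    day, infected = peak_infectious(history)
    print(f"Peak on day {day} with about {infected:,.0f} infectious individuals")
```

In this formulation the infectious compartment peaks when the susceptible fraction falls to 1/R0, so lowering R0 (for example through movement restrictions) both delays and flattens the peak.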
3. Datasets, Methods and Results
In order for the statistical and machine learning algorithms to learn and predict the trend and growth of the disease, several online news and related websites (such as Malaysia's official health ministry websites) were crawled and fed into a database. COVID-19 data on the number of susceptible, infectious, recovered, and deceased patients for countries worldwide are available from Worldometer and the 2019 Novel Coronavirus Visual Dashboard. For Malaysia, daily data have been published by the STAR, Malaysiakini and also by the Ministry of Health Malaysia [8-10]. For the Malaysian COVID-19 dataset, data on medical capacity (e.g., number of beds in each state) and events that could affect the spread of the disease (e.g., the Tabligh Assembly at Sri Petaling Mosque) were also collected to see if these data could help in making a strategy to flatten the curve of infected cases. World data on the dates of restriction and quarantine declared by each affected country were also collected in order to gauge and infer what the infection pattern of COVID-19 cases in Malaysia would look like if similar control measures were implemented.
3.2. Method 1: Curve Fitting with Probability Density Function and Skewness Effect Modelling
In this model, we used a statistical method based on the probability density function of the normal distribution, incorporating a skewness effect, to estimate the pattern and peak of COVID-19 spread in Malaysia. We divided the model into two phases, as described in the following paragraphs. Phase 1 is modelled using curve fitting to project the number of cases by day up to 15th April 2020. Phase 2 uses a probability density function to project the recovery period needed to flatten the curve.
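To make the idea of a normal-shaped epidemic curve with a skewness effect concrete, the following is a minimal sketch using the skew-normal density as the daily-case curve. The skew-normal form, the parameter values, and the function names are assumptions chosen for illustration; they are not the fitted model or the parameters of this study.

```python
import math


def skew_normal_pdf(t, loc, scale, shape):
    """Skew-normal density: 2/scale * phi(z) * Phi(shape * z), z = (t - loc) / scale.

    shape = 0 gives the ordinary normal density; shape > 0 gives a longer
    right tail (a slower decline after the peak than the rise before it).
    """
    z = (t - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi


def daily_cases(t, total_cases, loc, scale, shape):
    """Expected new cases on day t if total_cases are spread along the curve."""
    return total_cases * skew_normal_pdf(t, loc, scale, shape)


def peak_day(total_cases, loc, scale, shape, horizon=200):
    """Day (integer, 0 <= day < horizon) with the highest expected new cases."""
    return max(range(horizon),
               key=lambda t: daily_cases(t, total_cases, loc, scale, shape))


if __name__ == "__main__":
    # Hypothetical parameters for illustration only (not fitted to data).
    total, loc, scale = 10_000, 60.0, 20.0
    for shape in (0.0, 3.0):
        day = peak_day(total, loc, scale, shape)
        print(f"shape={shape}: peak on day {day}")
```

In practice the location, scale, shape, and total-case parameters would be estimated by least-squares fitting to the reported daily cases; the sketch only shows how the skewness parameter shifts the peak and stretches the tail.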
3.2.1. Phase 1: Projection method to estimate number of cases by 15th April 2020
Suspected coronavirus cases in Malaysia began on 22nd January. On 25th January, four new cases were confirmed. The first MCO was implemented from 18th March until 31st March. Due to the high growth rate of cases within a week, the MCO was extended from 1st April to 15th April.
The first step is to study the current trends in order to predict the rate of growth and the number of cases on 15th April. A plot of the actual daily COVID-19 cases from 22nd January until 1st April is shown in Figure 1. The plot shows the rate of growth of total cases and recovered cases. To understand the overall scenario of the trends, the active cases (i.e., total cases minus recovered cases), new cases, recovered cases and death cases are plotted together. Total cases increase at a rate of around 125 cases per day.
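The quantities plotted together here (active cases, new cases, and the average daily increase) can be derived from the cumulative series as in the sketch below. The function names are illustrative, and the short series in the usage example is made-up toy data, not the Ministry of Health figures.

```python
def active_cases(total, recovered):
    """Active cases as defined in the text: total cases minus recovered cases."""
    if len(total) != len(recovered):
        raise ValueError("total and recovered must cover the same days")
    return [t - r for t, r in zip(total, recovered)]


def daily_new_cases(cumulative):
    """Day-over-day differences of a cumulative case series."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


def mean_daily_increase(cumulative, window):
    """Average number of new cases per day over the last `window` days."""
    recent = cumulative[-(window + 1):]
    if len(recent) < 2:
        raise ValueError("need at least two data points")
    return (recent[-1] - recent[0]) / (len(recent) - 1)


if __name__ == "__main__":
    # Toy cumulative counts for four consecutive days (hypothetical values).
    total = [100, 225, 350, 475]
    recovered = [10, 20, 35, 50]
    print("active:", active_cases(total, recovered))
    print("new per day:", daily_new_cases(total))
    print("mean daily increase:", mean_daily_increase(total, window=3))
```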