Exploring malaria prediction models in Togo: a time series forecasting by health district and target group

Introduction

Malaria is a vector-borne infectious disease that is widespread in the WHO African Region, with an estimated 228 million malaria cases in 2020 (95% of cases worldwide).1 In Togo, despite large-scale interventions implemented for at least the last 15 years by the National Malaria Control Programme (NMCP) to prevent malaria in target groups, malaria remains a major public health concern and socioeconomic burden. Indeed, in 2020, malaria accounted for 51% of care provided by community health workers, 36% of outpatient consultations, 27% of hospitalisations and 10% of deaths.2

Seasonal and geographical variations of malaria have been identified in Togo, with an increase in confirmed malaria cases during or shortly after a rainy season and greater seasonal variations of malaria in the northern health districts, particularly in children <5 years old.3 Several studies in sub-Saharan Africa showed the association between meteorological and environmental factors and malaria cases.4–10 A systematic review published by Reiner et al showed that temperature and rainfall were often found to be significant predictors of malaria seasonality, along with vegetation indices such as the normalised difference vegetation index (NDVI).11 Although these factors were the most frequently investigated, other factors such as relative humidity and wind speed were used in studies but more rarely.11 Nevertheless, the association between these meteorological and environmental factors and malaria cases or incidence, and the time lag, varied by location.5 7 11

The development of malaria early warning systems (MEWSs), promoted by the WHO since 2001, can help malaria control programmes to anticipate the response to seasonal epidemics and reduce the morbidity and mortality of malaria.12–14 The lag of several weeks or months between changes in meteorological conditions and variations in malaria cases leaves time to act on a forecast, which makes malaria forecasting of practical interest. Furthermore, with the availability of and access to routine malaria data and quality satellite-derived climate data over several years, the conditions were assumed to be met for developing malaria prediction models.12 13 In sub-Saharan Africa, prediction models for malaria cases were developed in Burkina Faso,6 Kenya,15 Uganda,16 Mozambique,17 Burundi18 and South Africa.19 Except for Burundi, the analyses were conducted in a specific district or in health facilities representative of the diversity of malaria transmission, so these models are difficult to generalise to a whole country.

This study aimed to explore the possibility of using routinely collected malaria data and meteorological and environmental predictors to forecast the monthly number of malaria cases in Togo.

Methods

Setting

Located in West Africa, Togo is bordered by Ghana, Burkina Faso, Benin and the Bight of Benin and has a total area of 56 785 km².20 Togo’s population was estimated at 8.28 million in 2020.21 From 2013 to 2017, Togo was divided into 6 health regions (from north to south: Savanes, Kara, Centrale, Plateaux, Maritime and Lome-Commune) and 40 health districts (figure 1).

Figure 1

Health map of Togo during the study period, 2013–2017.

Malaria data

Monthly data on confirmed malaria cases were routinely collected by the NMCP of Togo and aggregated by health district and target group, that is, children <5 years old, children ≥5 years old and adults (excluding pregnant women), and pregnant women. A detailed description of the routine collection of these data has been presented elsewhere.22 Data from January 2013 to December 2017 were analysed in this study. A confirmed malaria case was defined as a person with fever or history of fever (temperature ≥38.0°C) over the past 2 days and who had a positive malaria diagnostic test (microscopic diagnosis or rapid diagnostic test).

Meteorological and vegetation data

Meteorological and vegetation data were obtained from the National Aeronautics and Space Administration through the Prediction Of Worldwide Energy Resources (POWER) project and the MODerate resolution Imaging Spectroradiometer (MODIS) satellite. Monthly data of total precipitation, wet days, relative humidity, temperatures (mean, minimum and maximum), wind speeds (mean, minimum and maximum) and vegetation from November 2012 to November 2017 covering the surface of Togo were extracted at a spatial resolution of 0.5×0.5° for meteorological data and 1×1 km for vegetation data (table 1).

Table 1

Description of meteorological and environmental variables

Statistical analysis

The time series of confirmed malaria cases showed an artificial peak in September 2016, as described elsewhere.3 The values of this peak were replaced by imputation using spline interpolation. For two health districts (Kpendjal and Oti), no monthly counts of confirmed malaria cases were reported in 2016; the total number of cases reported for that year was not used for analysis. Zonal statistics were used to obtain monthly meteorological and vegetation data by health district from November 2012 to November 2017. This method provides the average of the values of the cells covering a given district, weighted by the proportion of the district's area that each cell represents.23
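The area-weighted average behind the zonal statistics can be illustrated with a minimal sketch. The study itself used the raster package in R; the Python function below, its name `zonal_mean` and the numbers are hypothetical and only show the arithmetic, assuming the share of the district covered by each cell is already known.

```python
def zonal_mean(cell_values, cell_fractions):
    """Area-weighted mean of raster cell values over one district.

    cell_values: value of each raster cell that overlaps the district
    cell_fractions: share of the district's area covered by each cell
                    (hypothetical inputs; they need not sum exactly to 1)
    """
    if len(cell_values) != len(cell_fractions):
        raise ValueError("values and fractions must have the same length")
    total_weight = sum(cell_fractions)
    if total_weight <= 0:
        raise ValueError("district is not covered by any cell")
    weighted_sum = sum(v * w for v, w in zip(cell_values, cell_fractions))
    return weighted_sum / total_weight


# Illustrative numbers only: three 0.5 x 0.5 degree cells overlap a district
# and cover 50%, 30% and 20% of its area.
precip_mm = [120.0, 150.0, 100.0]
area_share = [0.5, 0.3, 0.2]
district_precip = zonal_mean(precip_mm, area_share)
```

Dividing by the sum of the weights keeps the result a proper average even when the fractions are rounded and do not add up to exactly 1.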

Descriptive statistics were used to explore malaria, meteorological and vegetation data by health district and target group. Generalised additive models (GAMs) and generalised additive mixed models (GAMMs) with a negative binomial error distribution were developed to forecast the monthly number of malaria cases using meteorological and vegetation data with a time lag of 1 or 2 months. A cross-correlation analysis was performed to determine the best predictor lags of the meteorological and environmental variables. Spearman’s rank correlation coefficients between malaria data and each predictor variable with a time lag of 1, 2 and 3 months were calculated. The two time lags with the highest correlation coefficient were selected. A GAM is a non-parametric regression technique that uses smooth functions of predictor variables to account for non-linear relationships between the covariates and the variable of interest.24

Choosing the ‘best’ combination of lagged predictor variables can be very complex. Therefore, two methods for selecting lagged meteorological and environmental variables were compared: a first method based on a statistical approach and a second method based on biological reasoning. Both methods were applied to obtain a model per target group and health district and a mixed model per target group and health region with the health district as a random effect.

The ‘statistical approach’ method used in this study was described by Fisher et al.25 This method allows the construction of a complete model set based on a range of potential predictors using GA(M)Ms. Models containing predictors whose correlation exceeds the defined cut-off value are automatically deleted. All models in this set are compared using a model selection criterion. In this study, a list of 20 potential predictors was identified, that is, the ten meteorological and environmental variables with a time lag of 1 and 2 months. The maximum number of predictors included in a model and the correlation cut-off value were set at 5 and 0.28 (default values), respectively. The model with the smallest corrected Akaike information criterion (AICc) was defined as the most accurate model for this first method.

The ‘biological reasoning’ method is an a priori selection of the meteorological variables with a proven effect on the development of Anopheles mosquitoes and malaria parasites. Total precipitation with a time lag of 2 months and mean temperature with a time lag of 1 month were included as predictor variables using smoothers. This choice was based on the theoretical vector-parasite-host cycle, which requires a minimum of 38 days from rainfall and deposition of mosquito eggs to the onset of malaria symptoms in the human host under optimal conditions.8 Indeed, rainfall is a factor in the development of larval sites, and temperature affects the longevity of mosquitoes and the development of the parasite.26

For both methods, the dimension of the basis used to represent the smooth term was set at five in order to reduce overfitting, a cubic regression spline was used for the smooth term and the year was included as a centred variable to take into account the trend over time. The models are written as follows:

$$Y_t \sim \mathrm{NB}(\mu_t, \theta)$$

with $Y_t$ the number of confirmed malaria cases in month $t$, $\mu_t$ the mean number of cases and $\theta$ the overdispersion parameter.

For models per district,

$$\log(\mu_t) = \beta_0 + \beta_1\,\mathrm{year}_t + \sum_i f_i(x_{i,\,t-l})$$

with $\beta_0$ the intercept of the model, $\beta_1$ the regression coefficient of the year and $f_i$ the cubic spline function for each meteorological and environmental variable $x_i$ included in the models with a time lag $l$ of 1 or 2 months (that is, evaluated at $t-1$ or $t-2$).

For mixed models per region,

$$\log(\mu_{tj}) = \beta_0 + \beta_1\,\mathrm{year}_t + \sum_i f_i(x_{i,\,j,\,t-l}) + b_j$$

with $\mu_{tj}$ the mean number of cases in month $t$ and district $j$ and $b_j$ the intercept random effect of district $j$.
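The lag-selection step described in the statistical analysis (Spearman’s rank correlation at lags of 1, 2 and 3 months, keeping the two lags with the highest coefficient) can be sketched as follows. This is an illustration in plain Python, not the code used in the study; the function names and the toy series are hypothetical, and the ranking uses the signed coefficient as the text states (ranking by absolute value would be an alternative assumption).

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lagged_spearman(cases, predictor, lag):
    """Correlate cases in month t with the predictor in month t - lag."""
    return spearman(cases[lag:], predictor[:len(predictor) - lag])


def best_lags(cases, predictor, lags=(1, 2, 3), keep=2):
    """Return the `keep` lags with the highest correlation, best first."""
    scored = {lag: lagged_spearman(cases, predictor, lag) for lag in lags}
    return sorted(scored, key=lambda lag: scored[lag], reverse=True)[:keep]


# Toy monthly series (made-up numbers): cases follow rainfall two months later.
rain = [2, 10, 50, 120, 200, 150, 60, 12, 1, 0, 5, 40]
cases = [130, 105] + [100 + r for r in rain[:-2]]
selected = best_lags(cases, rain)
```

In this toy example the 2-month lag ranks first because the case series was built as a shifted copy of the rainfall series; real district series would give much weaker and noisier coefficients.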

The time series for the period 2013–2016 were used for model training, while the 2017 time series were used for model testing. The predicted values and their 95% prediction interval were superimposed on the time series of confirmed malaria cases for graphical analysis. Thus, the predictive skills of the four models were compared for each health district and target group. The AICc and explained deviance were used as measures of accuracy for the training period, and the root mean squared error (RMSE) and mean absolute error (MAE) were used as measures of accuracy for the testing period. The most accurate model for the training period was defined as the model with the highest explained deviance, while the most accurate model for the testing period was defined as the model with the lowest MAE. The prediction errors per month and over the year 2017 (in number of malaria cases and as a percentage error) were calculated from the most accurate models for the testing period.
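The testing-period measures can be written out as a short sketch. The function names and the numbers are hypothetical, and the percentage error is assumed here to be the difference between predicted and observed cases divided by the observed cases, which the text does not spell out.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error over the testing months."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error over the testing months."""
    absolute = [abs(p - o) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def period_error(observed, predicted):
    """Return (predicted - observed cases, percentage error) over the period.

    Assumption: percentage error = 100 * (predicted - observed) / observed.
    """
    diff = sum(predicted) - sum(observed)
    return diff, 100.0 * diff / sum(observed)


# Made-up monthly counts for one district and one target group.
observed = [100, 200, 300, 400]
predicted = [110, 190, 330, 380]

test_mae = mae(observed, predicted)
test_rmse = rmse(observed, predicted)
diff_cases, pct_error = period_error(observed, predicted)
```

Because the RMSE squares each monthly error, it penalises a single badly missed peak more heavily than the MAE does, which is one reason the two measures can rank models differently.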

Software and packages

All statistical analyses were performed using R software V.4.1.2.27 The imputation by spline interpolation was performed using the imputeTS package.28 Meteorological and vegetation data were extracted using the nasapower29 and MODIStsp30 packages, respectively. The zonal statistics were performed using the raster package.31 The ‘statistical approach’ method was performed using the FSSgam package.25 The GA(M)Ms were generated using the mgcv package.24
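To make the full-subsets idea behind the ‘statistical approach’ method more concrete, the sketch below enumerates candidate predictor sets and drops any set containing a pair of predictors correlated above the cut-off. It is not the FSSgam implementation: the function `candidate_sets`, the predictor names and the correlation values are hypothetical, and the handling of values exactly at the cut-off is an assumption. In the study, each retained set would then be fitted as a GA(M)M and the sets ranked by AICc.

```python
from itertools import combinations


def candidate_sets(predictors, corr, max_size=5, cutoff=0.28):
    """Enumerate predictor subsets whose pairwise |correlation| <= cutoff.

    predictors: list of predictor names
    corr: dict mapping frozenset({a, b}) to the correlation between a and b
    """
    kept = []
    for size in range(1, max_size + 1):
        for subset in combinations(predictors, size):
            pairs = combinations(subset, 2)
            # A single-predictor subset has no pairs and is always kept.
            if all(abs(corr[frozenset(p)]) <= cutoff for p in pairs):
                kept.append(subset)
    return kept


# Hypothetical lagged predictors and made-up pairwise correlations.
predictors = ["rain_lag2", "tmean_lag1", "tmin_lag1", "ndvi_lag1"]
corr = {
    frozenset({"rain_lag2", "tmean_lag1"}): 0.10,
    frozenset({"rain_lag2", "tmin_lag1"}): 0.15,
    frozenset({"rain_lag2", "ndvi_lag1"}): 0.60,
    frozenset({"tmean_lag1", "tmin_lag1"}): 0.90,
    frozenset({"tmean_lag1", "ndvi_lag1"}): 0.20,
    frozenset({"tmin_lag1", "ndvi_lag1"}): 0.25,
}
kept = candidate_sets(predictors, corr)
```

With 20 potential predictors and at most 5 per model, the unfiltered set would contain more than 21 000 combinations per time series, so the correlation filter also serves to keep the model set manageable.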

Patient and public involvement

Neither patients nor the public were involved in the design, conduct, reporting or dissemination plans of this research.

Results

Confirmed malaria cases

From 2013 to 2017, 5 522 650 confirmed malaria cases were reported in Togo (online supplemental table S1). Children <5 years old, children ≥5 years old and adults, and pregnant women represented 36.6%, 58.5% and 4.9% of the total number of confirmed malaria cases, respectively. The Plateaux region had the highest number of confirmed malaria cases (1 512 129, 27.4%), and the Lome-Commune region had the lowest number of confirmed malaria cases (330 410, 6.0%).


Meteorological and vegetation data

Descriptive statistics of monthly meteorological and vegetation data from November 2012 to November 2017 in Togo are presented in online supplemental table S2. One rainy season was observed in the northern health districts from April to October, and two rainy seasons were observed in the southern health districts, approximately from March to June and from September to October. Total precipitation was higher in the north than in the south. The median precipitation during the wettest month ranged from 129 mm in June in the Ave district to 283 mm in August in the Oti district (online supplemental figure S1). From the Cinkasse to the Kpele districts, the number of rainy days peaked in August with a median number of rainy days ranging from 28 to 31. From the Haho district to District 4, the number of rainy days peaked in May, June or October with a median number of rainy days ranging from 26 to 28 (online supplemental figure S2). In the northern health districts, relative humidity was lower during the Harmattan period (from December to February) and increased during the rainy season. For example, the Cinkasse district had a median relative humidity of 23% in February and 84% in August. In the southern health districts, the median relative humidity was above 60% in all months of the year and increased during the rainy season (online supplemental figure S3). The widest temperature variation was observed in the northern health districts during the Harmattan period, ranging from a median minimum temperature of 16–19°C to a median maximum temperature of 34–36°C in January, and the temperature increased in February and March just before the rainy season (median maximum temperature of 35–39°C). In the southern health districts, the temperature varied from a median minimum temperature of 19–23°C to a median maximum temperature of 31–35°C throughout the year. Temperature decreased in all health districts of Togo during the rainy season (online supplemental figure S4).
Wind speed was higher in the northern health districts during the Harmattan period, with a median reaching 5.2 m/s in February in the Cinkasse district. The centre of Togo was the least windy area, while the coastal areas had a median mean wind speed above 2.1 m/s in all months of the year (online supplemental figure S5). Northern Togo had very little vegetation during the dry season with a median NDVI of 0.2 in March in the Cinkasse, Tone and Kpendjal districts. NDVI increased in all health districts of Togo during the rainy season, reaching a median of 0.78 in October in the Wawa district. Vegetation was densest in central Togo. A low vegetation index was observed in the health district of the Lome-Commune region throughout the year (<0.41) (online supplemental figure S6).

Comparison of the predictive skills of the models

In the three target groups, the selection of meteorological and environmental variables using the ‘statistical approach’ method did not follow any particular pattern (online supplemental figure S7). In children <5 years old, the most selected predictor in the models per district was NDVI, included in 35.0% of the models. In children ≥5 years old and adults, the most selected predictor in the models per district was minimum temperature, included in 42.5% of the models. In pregnant women, the most selected predictor in the models per district was mean temperature, included in 30.0% of the models. Minimum wind speed and maximum temperature were the least selected predictors in the models for all three groups (≤5.0% and ≤7.5%, respectively) (online supplemental figure S7). The ‘statistical approach’ method provided the most accurate models for the training period, except for the Est mono and Danyi districts in children ≥5 years old and adults and for the health districts of the Plateaux region in pregnant women (figure 2). The explained deviance of the most accurate models for the training period ranged from 41.4 to 81.9% in children <5 years old, from 45.8 to 78.3% in children ≥5 years old and adults and from 50.3 to 81.3% in pregnant women (online supplemental figure S3). The ‘statistical approach’ method provided the most accurate models for the testing period in 21 health districts for children <5 years old, in 21 health districts for children ≥5 years old and adults and in 20 health districts for pregnant women (figure 2). The graphical analysis of the observed and predicted malaria time series revealed that the prediction in 2017 was not adequate despite a satisfactory model construction, in particular with the models per district for the ‘statistical approach’ method (figure 3 for children <5 years old, online supplemental figure S8 for children ≥5 years old and adults and online supplemental figure S9 for pregnant women). 
The difference between the numbers of predicted and observed cases during the 2017 testing period varied between health districts and target groups (figure 4, online supplemental figure S10). In children <5 years old, the difference between the numbers of predicted and observed cases ranged from −2709 to 2421 cases and the percentage error ranged from −22.2 to 43.4%. In children ≥5 years old and adults, this difference ranged from −2717 to 8859 cases and the percentage error ranged from −15.4 to 74.7%. In pregnant women, this difference ranged from −666 to 767 cases and the percentage error ranged from −33.5 to 112.1%.

Figure 2

Most accurate models for the training (2013–2016) and testing (2017) periods for each health district and target group. Health districts were classified according to their latitude. BR, biological reasoning; SA, statistical approach.

Figure 3

Time series of observed (black lines) and predicted (red lines for the training period and blue lines for the testing period) malaria cases and their 95% prediction interval in children <5 years old by health district using the models per district (A) and mixed models per region (B) for the ‘statistical approach’ method and the models per district (C) and mixed models per region (D) for the ‘biological reasoning’ method. For each panel, at the top left is the northernmost district and at the bottom right is the southernmost district. Time series have different Y-axis scales.

Figure 4

Difference between predicted and observed malaria cases during the 2017 testing period for each health district in children <5 years old (A), in children ≥5 years old and adults (B) and in pregnant women (C). The number of cases is shown on the left and the percentage error is shown on the right. Health districts were classified according to their latitude. BR, biological reasoning; SA, statistical approach.

Discussion

This study aimed to develop malaria prediction models by health district and target group in order to use them as an additional tool in malaria control activities in Togo. This is the first study in Togo that explored meteorological and environmental predictors for forecasting malaria cases all over the country. Despite the development of models with four different approaches and their application on 120 time series, the number of malaria cases was inaccurately forecasted.

The graphical analysis showed that the values predicted during the training period (2013–2016) by the models per district for the ‘statistical approach’ method followed the pattern of the time series of confirmed malaria cases in most health districts and target groups (figure 3, online supplemental figure S8 and S9). This confirms a strong association between meteorological and environmental data and malaria cases in Togo, as shown in other studies in sub-Saharan Africa.4–10 NDVI, minimum temperature and mean temperature were the most selected predictors in the models per district. Reiner et al reported that minimum temperature, rainfall and vegetation indices were often considered as significant predictors of malaria seasonality in the scientific literature.11 In addition, the selected meteorological and environmental variables and their time lag differed between health districts and target groups (online supplemental figure S7). Other studies showed spatial and temporal variations.5 7 11 For example, in Uganda, time lags observed between environmental factors (rainfall, temperature and vegetation) and the number of malaria cases varied by transmission setting (low, moderate or high malaria transmission).7 In Mali, Ateba et al showed that the associations between environmental factors and malaria incidence differed between the two ecological zones studied.5

In this study, two methods of variable selection were compared to answer the following question: can an a priori variable selection method (the ‘biological reasoning’ method) predict as well as a more complex variable selection method (the ‘statistical approach’ method)? In the light of the results, the answer to this question is not obvious. On the one hand, the ‘statistical approach’ method seemed to provide more accurate models for the training period but perhaps with an overfitting that led to inaccurate forecasts for the testing period in some health districts.32 On the other hand, the ‘biological reasoning’ method did not appear to provide better prediction results for all health districts and target groups.

Different statistical methods can be used to forecast malaria incidence, such as generalised linear models (GLMs), autoregressive integrated moving average (ARIMA) models or Holt-Winters models.33 The GAM is an extension of the conventional GLM and is used for modelling non-linear relationships between the covariates and the variable of interest.24 In this study, GAMs were chosen for their flexibility of use compared with ARIMA models, given the large number of models to be generated. In addition, the prediction method had to be easily reproducible and implementable in decision support software. Two studies conducted in Burkina Faso and Kenya also used GAMs for malaria forecasting.6 15 The use of GAMs made it possible to apply another method of predictor variable selection based on the full-subset information theoretic approach using the FSSgam package in R. This method, described by Fisher et al, is presented as an alternative to data reduction techniques and stepwise regression and avoids the problems of multicollinearity.25

Climate-related predictors are frequently used in malaria prediction models.6 15–19 33 Temperature, precipitation and humidity are often cited in these studies, but other predictors such as vegetation index, wind speed, evapotranspiration, atmospheric pressure, cloud cover, visibility or altitude are also used. The availability of and access to quality climate data in Africa are a real challenge. The weather stations are not evenly distributed across the countries and data are of poor quality or missing.34 Real-time access to climate data must also be guaranteed for prediction. Open access to historical and current satellite-derived climate data, such as that made available by the POWER project and the MODIS satellite, is a major advantage for the development of MEWSs in Africa. This study showed that it was possible to use meteorological and environmental data in malaria research in Togo and should encourage other researchers working on malaria or other vector-borne diseases to use them.

It was decided to develop malaria prediction models according to the three target groups, given the different levels of immunity, risks of complication, specific prevention and treatment responses in children, pregnant women or adults.35 36 Mfisimana et al also adopted this strategy in their study in Burundi by developing malaria prediction models in four subgroups (children between 0 and 11 months, children between 12 and 59 months, pregnant women, and pregnant women and children under 5 years) and in the overall general population.18

This analysis has some methodological limitations. First, the performance of the models was tested over a single year. However, data after 2017 were not available. Second, the population size was not included as an offset term in the malaria prediction models because the distribution of the population by target group was not known. This may have an impact on the quality of predictions.

What could be done to improve malaria forecasting? First, the use of finer spatial and temporal scales, such as the weekly number of malaria cases at facility level, could improve malaria prediction. Cottrell et al even showed spatial variations in Anopheles density at the scale of villages and houses, explained by the house’s immediate surroundings such as proximity to a watercourse, soil type and vegetation index (NDVI).37 However, routine data collection in Togo is not organised in this way. It might be interesting to use the malaria sentinel surveillance data, collected weekly in 16 health facilities in Togo,38 to develop climate-driven malaria models in a future study. Second, meteorological and environmental factors do not fully explain the seasonality of malaria. For example, socioeconomic factors also play an important role in malaria incidence. In a systematic review and meta-analysis published in 2013, Tusting et al showed that indicators of economic status such as asset ownership, household wealth, socioeconomic index and parents’ occupations were associated with the risk of malaria in children aged 0–15 years.39 Some authors suggest exploring the predictive capacity of other geographic or environmental data (eg, land cover and altitude) and non-environmental data such as intervention data (eg, indoor residual spraying and insecticide-treated nets) or clinical data (eg, diagnosis and treatment) to improve the accuracy of prediction models.8 16 33 The challenge remains the availability of and access to this information.

This study shows that malaria forecasting is a complex exercise. Seasonal variations of malaria are not only related to meteorological and environmental conditions but are a more complex process. Care should be taken when using routine information system data to anticipate malaria cases. Many challenges remain to integrate malaria prediction models into malaria control strategies to support decision-making in Togo. We encourage the scientific community and national stakeholders to engage in this research.

This post was originally published on https://bmjopen.bmj.com