
Using autoregressive integrated moving average models for time series analysis of observational data

Linked research:

Retail demand for emergency contraception in the United States following New Year holiday

  • Brandon Wagner , associate professor 1 ,
  • Kelly Cleland , executive director 2
  • 1 Department of Sociology, Anthropology, and Social Work, Texas Tech University, Texas, USA
  • 2 American Society for Emergency Contraception
  • Correspondence to: B Wagner brandon.wagner{at}ttu.edu (or @BrandonGWagner on Twitter/X)

This article discusses the use of autoregressive integrated moving average (ARIMA) models for time series analysis. Rather than forecasting future values, we focus here on examining change across time in outcomes of interest and how this change is related to relevant variables.

Time series data

Much of the data that we collect about the world around us—stock prices, unemployment rates, party identification—are measured repeatedly over time. By failing to account for the linked and time dependent nature of these data, common analytic techniques may misrepresent their internal structure. If we wish to describe patterns over time or forecast values beyond the observation period, we need to account for how current values may depend on previous values, trends may exist in the data, or data may vary seasonally. To visualize this, consider an electrocardiogram. The readings expected at a given moment depend not only on the preceding values but also on the position within the entire cycle. For example, following the P wave, we would expect to see the QRS complex. The assumption that each reading is unaffected by preceding values would be valid only in the most distressing circumstances (that is, during fibrillation or after death).

Description of ARIMA model

To incorporate this complex nature of time series data into models, Box and Jenkins introduced the autoregressive integrated moving average (ARIMA) model. 1 As the name implies, this model contains three different components: an autoregressive (AR) component, a differencing for stationarity (I) component, and a moving average (MA) component. The first component allows the outcome at a given moment to depend on previous values of the outcome. As this model requires a time series with properties that do not vary across time (that is, a stationary time series), the second model component (integrated) allows researchers to subtract previous observations to obtain a stationary time series, if needed. The third component (moving average) models the error term as a combination of both contemporaneous and previous error terms.

Box and Jenkins proposed an iterative process of modeling time series data that contains three steps. The first stage (“identification”) involves transforming the data if needed, obtaining a stationary time series through differencing, and examining the data, autocorrelations, and partial autocorrelations to determine potential model specifications (that is, order of the autoregressive, integrated, and moving average components). The second step (“estimation”) estimates the time series model with the sets of potential model parameters and then selects the best model. For example, in the linked paper (doi: 10.1136/bmj-2023-077437 ), 2 we used the bayesian information criterion and Akaike information criterion to select the best fitting model from among candidate models. The model that best fitted the data, an ARIMA(1,1,1) model, had order one for each term (autoregressive, integrated, and moving average). This means that we model the change in sales between week t and week t-1, a first difference. The model also includes the previous week’s value as a predictor of this change (autoregressive order 1) and an error term that is composed of the contemporary week’s and previous week’s errors (moving average order 1). Alternative specifications for the ARIMA model would correspond to the number of differences necessary to construct a stationary time series model, the number of previous values to include as predictors, or the combination of previous errors included in the error for a given observation. The third step (“diagnostic checking”) examines the model for potential deficiencies and, if any are found, restarts the process. Although not without critiques, 3 this modeling approach remains popular today. Field specific texts can provide a helpful introduction to the topic for most readers. For example, we found a text by Becketti helpful in the preparation of the linked paper. 4
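To make the estimation and selection steps concrete, here is a minimal sketch in R using the forecast package (also used in the tutorial reproduced below). The weekly series is simulated and the candidate orders are illustrative; this is not the code or data from the linked paper.

```r
library(forecast)

# Simulated stand-in for a weekly sales series (illustrative only)
set.seed(1)
sales <- ts(500 + cumsum(rnorm(260, mean = 2)), frequency = 52)

# Candidate (p, d, q) orders suggested by inspection of the data, ACF, and PACF
candidates <- list(c(1, 1, 0), c(0, 1, 1), c(1, 1, 1), c(2, 1, 1))
fits <- lapply(candidates, function(ord) Arima(sales, order = ord))

# Lower AIC/BIC values indicate a better fitting candidate
data.frame(order = sapply(candidates, paste, collapse = ","),
           AIC   = sapply(fits, AIC),
           BIC   = sapply(fits, BIC))

# Keep the best fitting model and run diagnostic checks on its residuals
best <- fits[[which.min(sapply(fits, BIC))]]
checkresiduals(best)
```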

The model and process described above allow researchers to explore change in an outcome over time. But what if you think some other variable is affecting your outcome of interest? In many modern statistical packages, the ARIMA model described above can be extended with a set of exogenous X variables that also vary across time. The resulting model is often referred to as a regression with ARIMA errors, as the estimated regression includes an error term that is an ARIMA process.
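A regression with ARIMA errors can be estimated in the same framework by passing the exogenous variables through the xreg argument. The sketch below uses a simulated series and a hypothetical New Year indicator, not the data from the linked paper.

```r
library(forecast)

# Simulated weekly series with a bump in the first week of each year (illustrative only)
set.seed(2)
sales <- ts(500 + cumsum(rnorm(260)) + 30 * (rep(1:52, 5) == 1), frequency = 52)

# Hypothetical exogenous indicator: 1 for the week containing the New Year, 0 otherwise
new_year <- as.numeric(cycle(sales) == 1)

# Regression with ARIMA(1,1,1) errors: the xreg coefficient estimates the change in
# sales associated with the holiday week, net of the background trend
fit <- Arima(sales, order = c(1, 1, 1), xreg = new_year)
summary(fit)

# auto.arima() can instead choose the ARIMA error structure while keeping the regressor
fit_auto <- auto.arima(sales, xreg = new_year)
```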

When and why to use ARIMA model

ARIMA models have previously been used to explore time dependent processes in population health. For example, recent work has used ARIMA models to explore disease diagnosis or outcomes and demand for medical services. 5 6 7 8 ARIMA models or, more generally, regressions with ARIMA errors are commonly used for time series data for a few key reasons. Firstly, the model allows us to incorporate relations between observations. For example, the spread of an infectious disease through a population likely depends on previous counts of infection in the population. Consequently, hundreds, if not thousands, of papers applied ARIMA models to counts of infection or death from the covid-19 pandemic, tracing the spread of the disease over time in settings around the world. Enabling models to incorporate such dependencies, whether lags or seasonality, allows researchers to better fit these data. The second key benefit of estimating regressions with ARIMA errors is that it allows us to explore changes relative to the underlying background trends in the data. In our case, sales of levonorgestrel emergency contraception have been increasing over time in the United States. 9 A basic model exploring weekly sales as a function of the dichotomous holiday indicators might not correctly differentiate the sales increase following the New Year from such a background increase.

Limitations

ARIMA modeling remains popular today, although researchers must recognize some limitations. Firstly, these models may require relatively long time series, with a common rule of thumb being at least 50, or preferably 100, observations to estimate seasonal components. Although this is not a challenge for frequently measured values or long running time series, it may limit the applicability of ARIMA models in some cases. Secondly, the described model estimation process fits a model form, specifically the order of autoregressive and moving average terms, to the observed data. Although useful in describing the observed time trend, fitting the ARIMA model this way may limit its utility in describing trends in other contexts. Finally, as with all models, ARIMA models should be examined as one possible model. In some cases, alternative models may better fit observed data, 10 so examination of the data and model specifications is essential before selecting a modeling approach.

Regressions with ARIMA errors can be useful tools to understand time series data. By incorporating linkages between observations, and exploring change across time, these models can both describe trends and explore how these trends vary with predictors of interest.

Acknowledgments

We thank Circana Inc for allowing us to use its data for this project. All estimates and analyses in this paper based on Circana’s data are by the authors and not by Circana Inc.

Funding and competing interests available in the linked paper on bmj.com.

Provenance and peer review: Commissioned; not externally peer reviewed.


  • Technical advance
  • Open access
  • Published: 22 March 2021

Interrupted time series analysis using autoregressive integrated moving average (ARIMA) models: a guide for evaluating large-scale health interventions

  • Andrea L. Schaffer   ORCID: orcid.org/0000-0002-3701-4997 1 ,
  • Timothy A. Dobbins   ORCID: orcid.org/0000-0003-1841-9056 2 &
  • Sallie-Anne Pearson   ORCID: orcid.org/0000-0001-7137-6855 1 , 3  

BMC Medical Research Methodology, volume 21, Article number: 58 (2021)


Interrupted time series analysis is increasingly used to evaluate the impact of large-scale health interventions. While segmented regression is a common approach, it is not always adequate, especially in the presence of seasonality and autocorrelation. An Autoregressive Integrated Moving Average (ARIMA) model is an alternative method that can accommodate these issues.

We describe the underlying theory behind ARIMA models and how they can be used to evaluate population-level interventions, such as the introduction of health policies. We discuss how to select the shape of the impact, the model selection process, transfer functions, checking model fit, and interpretation of findings. We also provide R and SAS code to replicate our results.

We illustrate ARIMA modelling using the example of a policy intervention to reduce inappropriate prescribing. In January 2014, the Australian government eliminated prescription refills for the 25 mg tablet strength of quetiapine, an antipsychotic, to deter its prescribing for non-approved indications. We examine the impact of this policy intervention on dispensing of quetiapine using dispensing claims data.

Conclusions

ARIMA modelling is a useful tool to evaluate the impact of large-scale interventions when other approaches are not suitable, as it can account for underlying trends, autocorrelation and seasonality and allows for flexible modelling of different types of impacts.


Before and after study designs are often used to quantify the impact of population-level health interventions on processes of care and population-level health outcomes. They rely on the “natural experiment” resulting from implementing interventions, dividing time into “pre-intervention” and “post-intervention” periods. However, observational studies relying on a small number of measurements pre- and post-intervention are prone to bias as they do not account for pre-existing underlying short- and long-term trends [ 1 ]. In contrast, interrupted time series (ITS) analysis (also called “intervention analysis”) is more robust as it controls for these issues by longitudinally tracking the outcome before and after an intervention. ITS is considered one of the best designs for establishing causality when randomised controlled trials (RCTs) are not feasible or ethical [ 2 , 3 ]. In fact, when combined with a control series, ITS designs often generate similar results to RCTs [ 4 ].

Several published papers have addressed the topic of using ITS approaches to evaluate health interventions [ 5 , 6 , 7 , 8 , 9 ]. However, these have focussed primarily on segmented regression, the simplest form of ITS analysis. Segmented regression models use time as a predictor variable; a simple segmented regression model can be expressed as:

\( Y_t = \beta_0 + \beta_1 \, time_t + \beta_2 \, intervention_t + \beta_3 \, time\ since\ intervention_t + \epsilon_t \)

Where \( Y_t \) is the outcome at a given time point ( t ), the time variable represents time since the start of the study period, the intervention variable indicates whether time t is before (0) or after (1) the implementation of the intervention, and the time since intervention variable represents time elapsed since intervention implementation, taking a value of 0 prior to the intervention. A key assumption of linear regression is that the errors (residuals) are independent and not correlated. However, this assumption is often violated with time series.
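For readers who want to see the model in code, a minimal sketch of this segmented regression fitted by ordinary least squares is shown below; the data and intervention timing are simulated purely for illustration.

```r
# Minimal segmented regression sketch with simulated data (illustrative variable names)
set.seed(3)
n <- 48                                    # e.g. 48 monthly observations
time <- 1:n                                # time since start of study period
intervention <- as.numeric(time >= 25)     # 0 before, 1 after the intervention
time_since <- pmax(0, time - 24)           # 0 before; 1, 2, 3, ... after
y <- 100 + 0.5 * time - 10 * intervention - 2 * time_since + rnorm(n, sd = 3)

fit <- lm(y ~ time + intervention + time_since)
summary(fit)
acf(resid(fit))   # check the residuals for autocorrelation, which lm() ignores
```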

The segmented regression approach is most appropriate when a time series has a linear or otherwise easily modelled trend and independently distributed residuals. In practice, patterns in data can be unclear or difficult to identify, with considerable variation. Thus, some time series may not be amenable to segmented regression due to the difficulty in modelling the autocorrelation structure. One alternative to segmented regression is Autoregressive Integrated Moving Average (ARIMA) models. ARIMA models differ from segmented regression in that the outcome \( Y_t \) is regressed only on the outcome measured at previous time points (not on time itself). However, there is little guidance in the literature about how to fit these models in the context of ITS analysis. Given the quantity and complexity of health data now being collected and made available for research, ARIMA has become an increasingly useful tool for researchers interested in evaluating large-scale interventions.

In this paper we will describe the underlying theory behind ARIMA models and how they can be used to evaluate population-level interventions, such as the introduction of health policies, illustrated using an example of the introduction of a health policy to deter inappropriate prescribing of quetiapine, an antipsychotic, in Australia.

Time series properties

A time series is a sequence of data points at equally spaced points in time and ordered chronologically. Time series typically exhibit three features: non-stationarity, autocorrelation, and seasonality.

Non-stationarity

A requirement of ARIMA modelling is that the time series is stationary. A stationary series has three properties: a constant mean; constant variance; and constant covariance that depends only on the time interval between values. A stationary series (also called a “white noise process”) is easier to analyse as it can be modelled with fewer parameters. While it may fluctuate, it will always revert to a constant mean and is thus easier to predict. There are two main sources of non-stationarity: the first is changing variance over time (heteroscedasticity) which can often be addressed by applying a log transformation; and the second is an increasing or decreasing trend which can often be eliminated by taking the first difference (i.e. \( Y_t - Y_{t-1} \)). Occasionally a second differencing may be required to achieve stationarity, but third-order differencing and above is rare [ 10 ]. To be exact, the above definition is for a weakly stationary series. A time series is considered strictly stationary if the probability distribution of a sequence of observations is unchanged by shifts in time. Strictly stationary series are rare, and it is often enough to assume weak stationarity.

Autocorrelation

Time series observations are often correlated with observations at previous time points and are thus not independently distributed. This correlation is referred to as autocorrelation or serial correlation. As previously mentioned, time series exhibiting autocorrelation do not satisfy standard regression analysis assumptions. As autocorrelated data are typically not stationary, differencing the data is often enough to remove autocorrelation and therefore any necessary data transformations should be performed before testing for autocorrelation.

Autocorrelation functions (ACFs) can be used to check for stationarity and autocorrelation. An ACF plots the correlation between each observation and previous values at various lags, where a lag is the number of time points between an observation and its previous values. The companion to the ACF is the partial ACF (PACF), which is the correlation between an observation and past values that is not explained by correlations at lower order lags. For instance, the PACF value at lag 4 is the correlation between an observation (\( Y_t \)) and the previous observation at lag 4 (\( Y_{t-4} \)), after adjusting for the correlation between \( Y_t \) and \( Y_{t-3} \), \( Y_{t-2} \), and \( Y_{t-1} \). For a stationary series, the autocorrelation in the ACF plot should decay quickly; with a non-stationary series, the ACF will decay slowly.
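In R, ACF and PACF plots can be produced with Acf() and Pacf() from the forecast package (or acf()/pacf() in base R); the series below is simulated purely to illustrate the calls.

```r
library(forecast)

set.seed(4)
y <- arima.sim(model = list(ar = 0.7), n = 120)   # simulated AR(1) series

Acf(y)    # autocorrelation at each lag
Pacf(y)   # partial autocorrelation: correlation not explained by lower-order lags
```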

Seasonality

Seasonality refers to variation of a fixed or known frequency, occurring at regular time intervals, such as time of year or day of the week. Seasonality in time series of health data is common and can be due to natural causes, such as weather patterns, or business/administrative processes such as weekend or holiday effects. For instance, antibiotic prescriptions and influenza hospitalisations are more common in the winter months [ 11 , 12 ]. Further, in some jurisdictions medicine dispensings are highest at the end of a calendar or financial year due to the financial incentives to stockpile medicines [ 13 , 14 ]. The extent of seasonality will depend on the unit of time of the series; for instance, seasonality is rare in time series measured at yearly intervals.

With seasonal monthly data, there will likely be significant autocorrelation at lag 12 in the ACF plot. In ARIMA modelling, seasonality is usually dealt with by taking the seasonal difference. That is, with monthly data, you take the difference between each observation and the previous value at lag 12 (\( Y_t - Y_{t-12} \)). For quarterly data, you would use lag 4. Note that when taking the seasonal difference for monthly data, the first 12 observations are lost, since the seasonal difference cannot be calculated for those observations. This is important to keep in mind – if you have seasonal data, in general you will need more time points in your series to adequately control for seasonal effects.
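A brief sketch of seasonal and first differencing in R, on a simulated monthly series, showing the observations lost at each step:

```r
# Simulated monthly series with a trend and a yearly seasonal pattern (illustrative only)
set.seed(5)
y <- ts((1:48) / 10 + 5 * rep(sin(2 * pi * (1:12) / 12), 4) + rnorm(48),
        frequency = 12, start = c(2011, 1))

y_seasonal <- diff(y, lag = 12)    # Y_t - Y_{t-12}: the first 12 observations are lost
y_both     <- diff(y_seasonal)     # additional first difference, if a trend remains
c(length(y), length(y_seasonal), length(y_both))   # 48, 36, 35
```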

Components of ARIMA models

ARIMA models have a single dependent variable (\( Y_t \)) that is a function of past values of Y and the error term (\( \epsilon_t \)). As ARIMA models assume that errors are normally distributed, they can accommodate any continuous outcome (such as rates or means), as well as large counts that are not bounded by zero. While ARIMA cannot be used with small counts that follow a Poisson distribution, in recent years approaches to modelling serially correlated count data have been developed using generalised linear models [ 15 , 16 ]. Before getting into full ARIMA models, we introduce the basic components.

Autoregressive (AR) model : \( Y_t \) is predicted by one or multiple lagged values of Y. This is represented by the equation below, where c is a constant, \( \phi \) is the magnitude of the autocorrelation, p is the number of lags, and \( \epsilon_t \) is the error.

\( Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t \)

Moving average (MA) model : \( Y_t \) is predicted by one or multiple lagged values of the error (\( \epsilon_t \)). This is not to be confused with moving average smoothing. In the equation below, \( \theta \) is the value of the autocorrelation of the errors, and q is the number of lags.

\( Y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \)

Seasonal model : \( Y_t \) is predicted by lagged values of \( Y_t \) at a regular interval s (the season). In the equation below, \( \Phi \) is the value of the autocorrelation, and s is the seasonality (e.g. 52 for weekly, 12 for monthly, 4 for quarterly). Seasonal models will also often require differencing, as well as autoregressive and/or moving average terms.

\( Y_t = c + \Phi_1 Y_{t-s} + \Phi_2 Y_{t-2s} + \dots + \epsilon_t \)

Differencing (Integration): In an ARIMA model, the time series being modelled must be stationary to obtain meaningful predictions. Stationarity is induced by differencing, which refers to calculating the difference between adjacent observations.

An ARIMA model is a combination of an AR model, MA model, and differencing (Integration). If \( \phi = 0 \), \( \theta = 0 \), and \( \Phi = 0 \), then the time series is a white noise process expressed as \( Y_t = c + \epsilon_t \), where c is a constant.

The basic notation for describing a non-seasonal ARIMA model is ( p , d , q ), where p , d , and q are positive integers:

p = the order of the AR part of the model;

d = the degree of non-seasonal differencing; and

q = the order of the MA part of the model.

For example, a white noise (stationary) model is ARIMA(0,0,0). An AR model with p lags is ARIMA(p,0,0), and an MA model with q lags is ARIMA(0,0,q). If there is seasonality, the ARIMA model is expressed as (p,d,q) × (P,D,Q)S. Here, D is the degree of seasonal differencing, and P and Q are the AR and MA terms for the seasonal component.
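In the forecast package for R, this notation maps directly onto the order and seasonal arguments of Arima(). For example, an ARIMA(1,1,1)(0,1,1)12 model for a monthly series could be specified as below (simulated data, illustrative orders).

```r
library(forecast)

# Simulated monthly series with trend and an end-of-year peak (illustrative only)
set.seed(6)
y <- ts(100 + cumsum(rnorm(60)) + rep(c(rep(0, 10), 5, 10), 5), frequency = 12)

# ARIMA(p,d,q)(P,D,Q)S with (1,1,1)(0,1,1) and S = 12
fit <- Arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
summary(fit)
```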

Evaluating interventions using ARIMA

The aim of ITS analysis when used to evaluate interventions is to estimate the impact of the intervention’s implementation on a given outcome, or in other words the “intervention effect”. While there is a wide variety of impacts that may be observed, here we will focus on three main types: step change, pulse, and ramp. If we use \( T_0 \) to represent the starting time of the intervention, these are summarised as:

Step change (also called a level shift) : A sudden, sustained change where the time series is shifted either up or down by a given value immediately following the intervention. The step change variable takes the value of 0 prior to the start of the intervention, and 1 afterwards.

Pulse : A sudden, temporary change that is observed for one or more time points immediately after the intervention and then returns to baseline level. The pulse variable takes the value of 1 on the date of the intervention, and 0 otherwise.

Ramp : A change in slope that occurs immediately after the intervention. The ramp variable takes the value of 0 prior to the start of the intervention and increases by 1 after the date of the intervention.
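For a monthly series, these three impact variables can be constructed as simple vectors; the length of the series and the intervention month below are illustrative.

```r
n  <- 48    # total number of monthly observations (illustrative)
t0 <- 37    # index of the first post-intervention month (illustrative)
t  <- 1:n

step  <- as.numeric(t >= t0)      # 0 before the intervention, 1 from the intervention onwards
pulse <- as.numeric(t == t0)      # 1 only at the intervention date, 0 otherwise
ramp  <- pmax(0, t - (t0 - 1))    # 0 before; 1, 2, 3, ... from the intervention onwards
```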

Ideally, the potential shape of the intervention impact should be hypothesised a priori. The shape depends on several factors, including the nature of the intervention, such as whether it is temporary or ongoing, and the specific outcome being assessed. For instance, in our 2015 study [ 17 ] we evaluated the impact of negative media around use of statin medicines and found that this temporary event resulted in both a temporary increase in statin discontinuation (a “pulse”) but a sustained decrease in statin dispensing (a “step change”). Ongoing or permanent interventions, such as increased restrictions on prescribing of a medicine [ 18 ] or introduction of plain packaging on tobacco products [ 19 ] are more likely to have long-term effects, although these may be immediate or gradual (a “ramp”). For some interventions, the change is best represented by a combination of impact variables; for instance, it is common for there to be both a step change and change in slope (ramp). If there are multiple potential models, the Akaike information criterion (AIC) and/or Bayesian information criterion (BIC) can be used to select the most appropriate combination of impact variables.

It is also important to consider whether changes may occur prior to the implementation of the intervention; for example, when it was announced that there would be increased restrictions placed on prescribing of alprazolam in Australia, prescribing of this medicine started declining in anticipation of this change [ 18 ]. Lastly, in some cases, the impact may be suspected to be delayed by one or more time units. We recommend prespecifying a reasonable period of time in which it would be expected for the impact to be observed based on content knowledge or previous research to avoid spurious associations. The most appropriate delay within this range of options can be determined at the modelling stage [ 20 ].

In ITS analysis, ARIMA forecasts \( Y_t \) in the absence of the intervention (the “counterfactual”) and determines how the observed series diverges from this forecast. Unlike segmented regression, including time or seasonal dummy variables in the ARIMA model is not necessary, as ARIMA can eliminate trends and seasonality through differencing. If the trend is eliminated via differencing then the pre- and post-intervention trends cannot be estimated from the model. However, if estimation of the pre- and/or post-intervention slope is desired, this can be accommodated by including time as a covariate and incorporating AR and MA terms to address autocorrelation (e.g. ARMA models) [ 21 , 22 ].

Fitting an ARIMA model

The next step is determining the parameters of the ARIMA model. A common approach is called the Box-Jenkins method, involving model identification and selection, parameter estimation, and model checking [ 23 ]. There now exist automated algorithms in statistical packages (such as R) that simplify the process by identifying the best fitting ARIMA model based on minimising the information criteria (AIC, BIC). However, we also describe the manual process below, illustrated in Fig.  1 .

Plot data to understand patterns : Before proceeding to model fitting, plot the time series to understand the patterns, specifically pre-existing trends, seasonal effects, and extreme or outlier values. If outliers are present, how to deal with them will depend on their cause and influence on the model, and the recommendations are the same for ARIMA as for other regression models. For instance, if the researchers are aware that these extreme values are due to external factors, such as other interventions or known misclassification, these should be explicitly modelled in the data.

Transform data to stabilise variance (if necessary). If the variance is changing over time, a log-transformation should be applied.

Model selection : While automated algorithms in several statistical packages can identify candidate p and q parameters, they can sometimes be estimated based on the ACF/PACF plots.

Determine differencing order to induce stationarity : If there is a trend, a first order difference is required and d  = 1. If there is seasonality, a seasonal difference is required and D  = 1. The ACF plot or unit-root tests (e.g. Dickey-Fuller test) can also be used to help identify whether the time series is stationary and whether differencing will be required. Most automated algorithms allow you to prespecify the d and D terms in the model.
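For this step, the forecast package for R provides ndiffs() and nsdiffs(), which apply unit-root and seasonal-strength tests to suggest d and D; the suggestions should still be checked against plots of the series (simulated data below, for illustration only).

```r
library(forecast)

# Simulated monthly series with trend and end-of-year seasonality (illustrative only)
set.seed(8)
y <- ts(50 + (1:48) + rep(c(rep(0, 10), 5, 10), 4) + rnorm(48, sd = 2), frequency = 12)

ndiffs(y)    # suggested order of non-seasonal differencing (d)
nsdiffs(y)   # suggested order of seasonal differencing (D)
```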

Plot the ACF/PACF of the stationary data to determine potential AR/MA orders : After the time series has been made stationary by transformation and/or differencing, next determine which AR ( p / P ) or MA ( q / Q ) orders are needed to correct for remaining autocorrelation. If the stationary series has positive autocorrelation at lag 1, then AR terms typically are needed. If the autocorrelation is negative at lag 1, then the model may need MA terms. Usually models will require only AR terms or MA terms, rarely both. However, it is not always straightforward. Table  1 includes guidance on selecting the most appropriate AR and MA terms.

Estimate model and use information criteria to find the best model : Estimate your model, using the p , d , q , P , D , and Q terms identified previously, and use information criteria (AIC, BIC) to help identify the best model. If an automated algorithm is used to select the terms, it should be viewed as a tool only, as it does not guarantee a well-fitting model.

Check if residuals of chosen model are white noise. This can be done by looking at residual plots and by formally testing for the presence of autocorrelation by using the Ljung-Box test for white noise. If autocorrelation is still present in the residuals or your model is otherwise a poor fit, then choose different AR and/or MA orders. If the data have not previously been transformed, a transformation may help with non-normally distributed residuals. In general, determining the AR and MA terms is an iterative process, involving trial and error. Importantly, there may not be one “right” model. The aim is to select the most parsimonious model (i.e. smallest p / P and q / Q ) that has a good fit and adequately controls for autocorrelation and seasonality. Once the final ARIMA model is selected, the intervention impact can be estimated.
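These residual checks can be run with checkresiduals(), which produces the residual plots and a Ljung-Box test, or with Box.test() directly; the model below is fitted to a simulated series purely to illustrate the calls.

```r
library(forecast)

# Fit an ARIMA model to a simulated series, then check whether its residuals are white noise
set.seed(9)
y   <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 120)
fit <- Arima(y, order = c(1, 1, 0))

checkresiduals(fit)   # residual plot, residual ACF, and Ljung-Box test

# Equivalent manual Ljung-Box test at 24 lags; fitdf counts the estimated AR/MA terms
Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)
```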

Fig. 1. Flow chart for ARIMA model selection. Adapted from Hyndman and Athanasopoulos [ 10 ].

Transfer functions

Another advantage of ARIMA models is the ability to move beyond the basic intervention impact shapes and model more complex impacts via “transfer functions”. Transfer functions describe the relationship between the intervention and the outcome series \( Y_t \). They modify the relationship between the above inputs (step change, pulse, ramp) and the time series to model more complex relationships, such as gradual level shifts, or a pulse that decays gradually over time, and can also incorporate lagged effects. The general form of a transfer function is \( \frac{\omega (B)}{\delta (B)} \) , or:

\( \frac{\omega_0 B^h}{1 - \delta_1 B - \dots - \delta_r B^r} X_t \)

where B is the backshift operator (i.e. \( B^p Y_t = Y_{t-p} \)). In the transfer function, \( \omega_0 \) represents the initial value for the impact of the intervention at the time of the intervention ( T ), \( \delta \) is the decay rate, and \( X_t \) is the intervention variable (step change, pulse, or ramp). The values of h and r must be specified by the researcher; h describes when the effect happens, while r represents the decay pattern. Model fit statistics (such as AIC and BIC) can help determine the most appropriate form for the transfer function as well as the timing of the event (i.e. if the impact was delayed and, if so, by how much). Table  2 describes the most common scenarios, using the intervention indicator variables described above, and where h  = 0, and r  = 0 or r  = 1. The use of transfer functions is a complex topic, and several texts cover them in more detail [ 23 , 24 , 25 ].

Incorporation of a control series

Including a control series in ITS analysis improves causal inference, as ITS alone cannot exclude the possibility that an observed change was due to a co-intervention or other event rather than the intervention of interest. A control series is one that is not impacted by the intervention; selection of an appropriate control is described elsewhere [ 3 ]. As with ITS using segmented regression, including a control series involves running an ARIMA model for the series of interest, and separately for the control series [ 17 ]. If a change is observed in the intervention series but not the control series, this provides evidence that the impact was specific to the intervention.

Sample size requirements

There is no definitive guidance on how many time points are required to apply ARIMA modelling. The oft-quoted value of a minimum of 50 time points is based on a statement by Box and Jenkins, [ 23 ] but this has no empirical basis and has not been tested formally. In reality, a one-size-fits-all approach is simplistic. The more variable and noisier the data, the more observations will be needed to distinguish the underlying patterns from the noise. In uncomplicated cases, ARIMA can perform satisfactorily with short time series, as long as there are enough time points to estimate all parameters [ 26 ]. In the presence of seasonality, there should be enough time points to identify the seasonal effects and to account for seasonal differencing.

Data and context

Here we demonstrate the use of ARIMA modelling to quantify the impact of a health policy intervention, using Australian medicine dispensing claims. The policy restricted the conditions under which quetiapine, an antipsychotic medicine, could be subsidised (data, R code, and SAS code are included in Additional Files   1 , 2 and 3 respectively).

Prior to January 1, 2014, new prescriptions for the lowest quetiapine tablet strength (25 mg) could include up to 5 refills, meaning patients could have their prescription refilled up to 5 times before returning to their doctor for a new prescription. However, due to growing concerns about inappropriate prescribing, after January 1, 2014 new prescriptions for this tablet strength could not include refills [ 27 ]. Our primary outcome was the number of monthly dispensings of 25 mg quetiapine, of which we had 48 months of observations (January 2011 to December 2014).

In Australia, medicine dispensing claims have significant yearly seasonality [ 13 ]. Medicines are subsidised for citizens and eligible residents through the Pharmaceutical Benefits Scheme (PBS), with people paying an out-of-pocket co-payment towards the cost of their medicines, while the remainder is subsidised. If a person’s (or family’s) total out-of-pocket costs reach the “Safety Net threshold” for the calendar year, they are eligible for a reduced co-payment for the remainder of that year. Thus, there is an incentive for people reaching their Safety Net to refill their medicines more frequently towards the end of the year. Hence, we see an increase in prescriptions at the end of the year, followed by a decrease in January.

For the change in dispensing of 25 mg quetiapine, due to the nature of the intervention we postulated there would be an immediate drop in dispensings post-intervention (step change), as well as a change in slope (ramp). Thus, we included variables representing both types of impacts in our model. For both impacts, h  = 0 and r  = 0.

Steps 1 and 2: plot data and transform if necessary

The data are plotted in Fig.  2 a, where we observe that due to the Safety Net effect discussed above, dispensings are higher in December, and lower in January [ 13 ]. As the variance appears stable over time, no data transformation is needed.

Fig. 2. Monthly dispensings of the 25 mg strength quetiapine (A) and the series after first order and seasonal differencing (B).

Step 3: select model

To help induce stationarity, we determined that a first difference ( d ) was needed due to the visible increasing trend prior to the subsidy change, and that a seasonal difference ( D ) was needed due to the seasonality of the series. Figure  2 b shows the series after these differences have been applied, with the trend eliminated. Because the seasonal difference cannot be calculated for the first 12 observations (at least 13 observations are required to calculate the difference between \( Y_t \) and \( Y_{t-12} \)), the first year of data is not represented in the figure. The ACF and PACF plots are in Fig.   3 . In this figure, bars that fall above or below the dashed line represent statistically significant ( p  < 0.05) autocorrelation. In the ACF plot of the raw data (Fig. 3 a), we see significant autocorrelation that gradually dies off at lag 6. However, according to the PACF plot (Fig. 3 b) the autocorrelation at higher lags is completely explained by autocorrelation at lower lags. We can also see in Fig. 3 c that most of the autocorrelation has been removed just by differencing, when compared with Fig. 3 a.

Fig. 3. Autocorrelation and partial autocorrelation function (ACF and PACF) plots, prior to differencing (A and B) and after differencing (C and D).

In this case the ACF and PACF plots of the stationary (i.e. differenced) series are not particularly helpful in identifying the p and q parameters, as they do not fit any of the options in Table 1 . Therefore, we used an automated algorithm, specifically auto.arima() in the forecast package for R, to identify the ARIMA model terms [ 28 ]. This algorithm iteratively searches over a series of potential ARIMA models for the one with the lowest AIC or BIC, with several constraints applied to avoid convergence problems. These include setting the maximum value of p and q to 5 and P and Q to 2, although these settings can be modified by the researcher if necessary. For our model, we pre-specified a value of d  = 1 (to induce stationarity) and D  = 1 (due to the presence of seasonality) but allowed the algorithm to select the most appropriate values for p , q , P , and Q .
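The structure of such a call can be sketched as follows. The dispensing counts here are simulated stand-ins (the real series is provided in Additional File 1 and the authors' R code in Additional File 2); the sketch simply mirrors the approach described: pre-specified d and D, step and ramp regressors, and auto.arima() selecting the remaining orders.

```r
library(forecast)

# Simulated stand-in for 48 monthly dispensing counts, Jan 2011 to Dec 2014
# (the real series is in Additional File 1; values here are illustrative only)
set.seed(10)
disp <- ts(30000 + 150 * (1:48) + rep(c(rep(0, 10), 1500, 3000), 4) + rnorm(48, sd = 500),
           frequency = 12, start = c(2011, 1))

t  <- 1:48
t0 <- 37                         # January 2014, the first post-intervention month
step <- as.numeric(t >= t0)      # immediate, sustained level shift
ramp <- pmax(0, t - (t0 - 1))    # change in slope after the intervention

# Pre-specify d and D, supply the impact variables via xreg, and let the
# algorithm search over p, q, P, and Q
fit <- auto.arima(disp, d = 1, D = 1, xreg = cbind(step, ramp),
                  stepwise = FALSE, approximation = FALSE)
summary(fit)          # the step and ramp coefficients estimate the two intervention effects
checkresiduals(fit)   # Ljung-Box test and residual plots for the chosen model
```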

The model with the lowest information criteria selected by the algorithm was (2,1,0) × (0,1,1)12. In other words, the autoregressive order of the model ( p ) was 2, the moving average order of the model ( q ) was 0, the autoregressive order of the seasonal part of the model ( P ) was 0, and the moving average order of the seasonal part of the model ( Q ) was 1. The model incorporates a first-order difference ( d  = 1) and a first-order seasonal difference ( D  = 1) to eliminate trend and induce stationarity. Thus, we will consider this as our potential final model.

Step 4: check residuals

The residual plots are in Fig.  4 . There is no obvious pattern or significant autocorrelation in the residuals, and they are normally distributed. The p -value for the Ljung-Box test for white noise is 0.50 at 24 lags. As the null hypothesis for the Ljung-Box test is that there is no significant autocorrelation, we do not reject the null and our chosen model has a good fit.

Fig. 4. Residual check for final model, ARIMA(2,1,0)(0,1,1)12.

Final model

The estimated step change was − 3285 dispensings (95% CI − 4465 to − 2104) while the estimated change in slope was − 1397 dispensings per month (95% CI − 1606 to − 1188). Figure  5 shows the values predicted by our ARIMA model in the absence of the intervention (counterfactual) compared with the observed values. This means that the change in subsidy for 25 mg quetiapine in January 2014 was associated with an immediate, sustained decrease of 3285 dispensings, with a further decrease of 1397 dispensings every month. In other words, there were 4682 (3285 + 1397) fewer dispensings in January 2014 than predicted had the subsidy changes not been implemented. In February 2014, there were 6079 fewer dispensings (3285 + 2 × 1397). Importantly, our findings should only be considered valid for the duration of the study period (i.e. until December 2014).
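A counterfactual comparison of this kind can be sketched by fitting the chosen model to the pre-intervention period only and forecasting forward; the values below are simulated, so the numbers will not reproduce those reported above.

```r
library(forecast)

# Simulated stand-in for the monthly dispensing series (illustrative values only)
set.seed(11)
disp <- ts(30000 + 150 * (1:48) + rep(c(rep(0, 10), 1500, 3000), 4) + rnorm(48, sd = 500),
           frequency = 12, start = c(2011, 1))

pre <- window(disp, end = c(2013, 12))   # observations before the policy change
fit <- Arima(pre, order = c(2, 1, 0),
             seasonal = list(order = c(0, 1, 1), period = 12))

cf   <- forecast(fit, h = 12)            # counterfactual: predicted dispensings for 2014
post <- window(disp, start = c(2014, 1))
post - cf$mean                           # observed minus predicted, month by month
```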

Fig. 5. Observed values and predicted values in the absence of the intervention, based on the ARIMA model.

Many health policies are implemented with a limited evidence base supporting their rationale, and even if well-intended can lead to unintended consequences [ 29 , 30 ]. Thus, evaluation of health interventions is crucial to identify both intended and unintended impacts, to ultimately provide feedback to policy-makers and regulators, improve health care delivery, and inform future public health policy. However, many studies evaluating large-scale interventions use methods that are inadequate or poorly reported [ 31 , 32 ]. As with all analyses, researchers interested in evaluating interventions should use fit-for-purpose tools for a particular research question, as relying on overly simplistic approaches can lead to misleading or biased results [ 1 ].

We have highlighted the importance of controlling for trends, seasonality, and autocorrelation. To a limited extent, segmented regression can also address these issues, typically by inclusion of time and season in the model as covariates, and often this will be enough to eliminate simple autocorrelation. In such cases, segmented regression may be preferred due to its ease of interpretability and implementation. However, there are circumstances in which segmented regression is inadequate. For instance, if the trend in the data is non-linear and/or has an irregular pattern, or if the seasonality is complex, such as weekly or daily, this can be difficult to capture in a segmented regression model. Lastly, if there is residual autocorrelation after running a segmented regression model then alternative approaches will need to be considered, of which ARIMA is one.

At times, selecting the most appropriate ARIMA model can be challenging, time-consuming, and subjective, as traditional approaches that rely on ACF/PACF plots to identify model orders are often not informative, as seen in our example. However, there have been attempts over the years to automate and simplify the model selection process. We have applied one such algorithm, auto.arima() in the forecast package for R, which we chose due to its convenience and ease of use [ 28 ]. Such innovations have made ARIMA modelling more accessible but, as with all automated statistical approaches, still require a knowledgeable user to correctly apply and interpret the results.

It is important for researchers and analysts to have knowledge of a range of statistical tools that can be used as appropriate depending on the nature of the research question and data. ARIMA is one such tool; we have shown how ARIMA modelling can be used to evaluate health interventions when simpler approaches are not appropriate. While we have covered the foundations of ITS analysis using ARIMA models and the most common types of intervention impacts, there are other topics we have not touched on, such as use of cross correlation functions to identify delayed effects, the incorporation of covariates, and more complex transfer functions. These more complex topics have been covered in detail in other texts [ 23 , 24 , 25 ].

Despite the increasing use of ITS analysis, reporting of methods is highly variable and often inadequate [ 32 , 33 ]. In a 2015 review, one third of studies did not report testing for autocorrelation and two thirds did not report adjusting for seasonality [ 33 ]. To maximise reproducibility, we encourage all researchers to publish code to ensure analyses are appropriately conducted and assist others learning these methods, and to follow reporting guidelines where available. While there are currently no EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network reporting guidelines specific to time series analyses, Jandoc et al. [ 33 ] have published methodological and reporting recommendations for studies using ITS analysis which provide a good basis.

ITS analysis, especially when combined with a control series, is a powerful study design for assessing population-level health intervention impacts, and its use is increasing. Segmented regression, the most common method for ITS analysis, is not always adequate. Thus, for researchers interested in ITS analysis, ARIMA modelling is a useful tool, as it can account for underlying trends, autocorrelation and seasonality and allows for flexible modelling of different types of impacts.

Availability of data and materials

The dataset supporting the conclusions of this article is included within its additional files.

Abbreviations

ACF: Autocorrelation function

AR: Autoregressive

ARIMA: Autoregressive integrated moving average

EQUATOR: Enhancing the QUAlity and Transparency Of health Research

ITS: Interrupted time series

MA: Moving average

PACF: Partial autocorrelation function

RCT: Randomised controlled trial

Soumerai SB, Starr D, Majumdar SR. How do you know which health care effectiveness research you can trust? A Guide to Study Design for the Perplexed. Prev Chronic Dis. 2015;12:E101.


Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Belmont, CA: Wadsworth/Cengage Learning; 2002.

Lopez Bernal J, Cummins S, Gasparrini A. The use of controls in interrupted time series studies of public health interventions. Int J Epidemiol. 2018;47:2082–93.

Fretheim A, Zhang F, Ross-Degnan D, Oxman AD, Cheyne H, Foy R, et al. A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation. J Clin Epidemiol. 2015;68:324–33.

Bernal JL, Soumerai S, Gasparrini A. A methodological framework for model selection in interrupted time series studies. J Clin Epidemiol. 2018;103:82–91.

Lopez Bernal J, Cummins S, Gasparrini A. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int J Epidemiol. 2016;46:348–55.


Wagner AK, Soumerai SB, Zhang F, Ross-Degnan D. Segmented regression analysis of interrupted time series studies in medication use research. J Clin Pharm Ther. 2002;27:299–309.


Lagarde M. How to do (or not to do) … assessing the impact of a policy change with routine longitudinal data. Health Policy Plan. 2012;27:76–83.

Beard E, Marsden J, Brown J, Tombor I, Stapleton J, Michie S, et al. Understanding and using time series analyses in addiction research. Addiction. 2019;114:1866–84.

Hyndman R, Athanasopoulos G. Forecasting: principles and practice. 2nd edition. 2018. https://otexts.com/fpp2/.

Sun L, Klein EY, Laxminarayan R. Seasonality and temporal correlation between community antibiotic use and resistance in the United States. Clin Infect Dis. 2012;55:687–94.

Schaffer A, Muscatello D, Cretikos M, Gilmour R, Tobin S, Ward J. The impact of influenza a(H1N1)pdm09 compared with seasonal influenza on intensive care admissions in New South Wales, Australia, 2007 to 2010: a time series analysis. BMC Public Health. 2012;12:869.

Mellish L, Karanges EA, Litchfield MJ, Schaffer AL, Blanch B, Daniels BJ, et al. The Australian pharmaceutical benefits scheme data collection: a practical guide for researchers. BMC Res Notes. 2015;8:634.

Bødkergaard K, Selmer RM, Hallas J, Kjerpeseth LJ, Pottegård A, Skovlund E, et al. Using the waiting time distribution with random index dates to estimate prescription durations in the presence of seasonal stockpiling. Pharmacoepidemiol Drug Saf. 2020;29:1072–8.

Liboschik T, Fokianos K, Fried R. tscount: an R package for analysis of count time series following generalized linear models. J Stat Softw. 2017;82:1–51.

Dunsmuir WTM, Scott DJ. The glarma package for observation-driven time series regression of counts. J Stat Softw. 2015;67:1–36.

Schaffer AL, Buckley NA, Dobbins TA, Banks E, Pearson S-A. The crux of the matter: did the ABC’s catalyst program change statin use in Australia? Med J Aust. 2015;202:591–4.

Schaffer AL, Buckley NA, Cairns R, Pearson S-A. Interrupted time series analysis of the effect of rescheduling alprazolam in Australia: taking control of prescription drug use. JAMA Intern Med. 2016;176:1223–5.

Young JM, Stacey I, Dobbins TA, Dunlop S, Dessaix AL, Currow DC. Association between tobacco plain packaging and Quitline calls: a population-based, interrupted time-series analysis. Med J Aust. 2014;200:29–32.

Gilmour S, Degenhardt L, Hall W, Day C. Using intervention time series analyses to assess the effects of imperfectly identifiable natural events: a general method and example. BMC Med Res Methodol. 2006;6:16.

Lane TJ, Gray S, Hassani-Mahmooei B, Collie A. Effectiveness of employer financial incentives in reducing time to report worker injury: an interrupted time series study of two Australian workers’ compensation jurisdictions. BMC Public Health. 2018;18:100.

Sun P, Chang J, Zhang J, Khaler K. Evolutionary cost analysis of valsartan initiation among patients with hypertension: a time series approach. J Med Econ. 2011;15:8–18.

Box GEP, Jenkins GM, Reinsel GC. Time series analysis: forecasting and control. Hoboken, NJ: John Wiley & Sons, Inc.; 2008. https://doi.org/10.1002/9781118619193 .


Helfenstein U. The use of transfer function models, intervention analysis and related time series methods in epidemiology. Int J Epidemiol. 1991;20:808–15.

Pankratz A. Forecasting with dynamic regression models. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 1991. https://doi.org/10.1002/9781118150528 .

Hyndman R, Kostenko AV. Minimum sample size requirements for seasonal forecasting models. Foresight Int J Appl Forecast. 2007:12–5.

Brett J, Schaffer A, Dobbins T, Buckley NA, Pearson SA. The impact of permissive and restrictive pharmaceutical policies on quetiapine dispensing: evaluating a policy pendulum using interrupted time series analysis. Pharmacoepidemiol Drug Saf. 2018;27:439–46.

Hyndman RJ, Khandakar Y. Automatic time series forecasting: the forecast package for R. J Stat Softw. 2008;27:1–22.

Lu CY, Simon G, Soumerai SB. Counter-point: staying honest when policy changes backfire. Med Care. 2018;56:384.

Shaw J, Murphy AL, Turner JP, Gardner DM, Silvius JL, Bouck Z, et al. Policies for Deprescribing: an international scan of intended and unintended outcomes of limiting sedative-hypnotic use in community-dwelling older adults. Healthc Policy Polit Sante. 2019;14:39–51.

Briesacher BA, Soumerai SB, Zhang F, Toh S, Andrade SE, Wagner JL, et al. A critical review of methods to evaluate the impact of FDA regulatory actions. Pharmacoepidemiol Drug Saf. 2013;22:986–94.

Hudson J, Fielding S, Ramsay CR. Methodology and reporting characteristics of studies using interrupted time series design in healthcare. BMC Med Res Methodol. 2019;19:137.

Jandoc R, Burden AM, Mamdani M, Lévesque LE, Cadarette SM. Interrupted time series analysis in drug utilization research is increasing: systematic review and recommendations. J Clin Epidemiol. 2015;68:950–6.


Acknowledgements

This research is supported by the National Health and Medical Research Council (NHMRC) Centre of Research Excellence in Medicines Intelligence (ID: 1196900). AS is supported by a National Health and Medical Research Council Early Career Fellowship Scholarship (ID: 1158763). The funders were not involved in the design of the study, collection, analysis, and interpretation of data, or writing.

Author information

Authors and Affiliations

Centre for Big Data Research in Health, UNSW Sydney, Level 2, AGSM Building, Sydney, Australia

Andrea L. Schaffer & Sallie-Anne Pearson

School of Public Health and Community Medicine, UNSW Sydney, Sydney, Australia

Timothy A. Dobbins

Menzies Centre for Health Policy, University of Sydney, Sydney, Australia

Sallie-Anne Pearson


Contributions

AS, TD and SP contributed to the conception of the work, interpretation of the data, and revision of the work. AS conducted the analysis and drafted the manuscript. AS, TD and SP all read and approved the submitted version.

Corresponding author

Correspondence to Andrea L. Schaffer .

Ethics declarations

Ethics approval and consent to participate.

As this study relied on publicly available, aggregate data no ethics approval was required.

Consent for publication

Not applicable.

Competing interests

The Centre for Big Data Research in Health, UNSW Sydney has received funding from AbbVie Australia to conduct research unrelated to the present study. AbbVie did not have any knowledge of, or involvement in, the present study. SAP is a member of the Drug Utilisation Sub Committee of the Pharmaceutical Benefits Advisory Committee. The views expressed in this paper do not represent those of the Committee.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Data for reproducing results in this manuscript.

Additional file 2.

R code for reproducing results in this manuscript.

Additional file 3.

SAS code for reproducing results in this manuscript.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Schaffer, A.L., Dobbins, T.A. & Pearson, SA. Interrupted time series analysis using autoregressive integrated moving average (ARIMA) models: a guide for evaluating large-scale health interventions. BMC Med Res Methodol 21 , 58 (2021). https://doi.org/10.1186/s12874-021-01235-8


Received : 13 July 2020

Accepted : 19 February 2021

Published : 22 March 2021

DOI : https://doi.org/10.1186/s12874-021-01235-8


Keywords

  • Interrupted time series analysis
  • Autoregressive integrated moving average models
  • Policy evaluation
  • Intervention analysis


PLOS Digit Health. 2023 Feb;2(2).

Application of ARIMA and hybrid ARIMA models in predicting and forecasting tuberculosis incidences among children in Homa Bay and Turkana Counties, Kenya

Stephen Siamba

University of Eldoret, School of Science, Department of Mathematics and Computer Science, Eldoret, Kenya

Argwings Otieno

Julius Koech

Associated data

The authors confirm that the data supporting the findings of this study are available within the article and as part of the supporting information (named S1 File), in a comma-separated values format.

Tuberculosis (TB) infection among children (below 15 years) is a growing concern, particularly in resource-limited settings. However, the TB burden among children is relatively unknown in Kenya, where two-thirds of estimated TB cases are undiagnosed annually. Very few studies globally have used Autoregressive Integrated Moving Average (ARIMA) and hybrid ARIMA models to model infectious diseases. We applied ARIMA and hybrid ARIMA models to predict and forecast TB incidences among children in Homa Bay and Turkana Counties in Kenya. The models were used to predict and forecast monthly TB cases reported in the Treatment Information from Basic Unit (TIBU) system by health facilities in Homa Bay and Turkana Counties between 2012 and 2021. The best parsimonious ARIMA model that minimizes errors was selected based on a rolling window cross-validation procedure. The hybrid ARIMA-ANN model produced better predictive and forecast accuracy than the seasonal ARIMA (0,0,1)(1,0,1)12 model, and, using the Diebold-Mariano (DM) test, the difference in predictive accuracy between the two models was significant (p<0.001). The forecasts showed a TB incidence of 175 cases per 100,000 children (161 to 188 per 100,000) in Homa Bay and Turkana Counties in 2022. The hybrid (ARIMA-ANN) model produces better predictive and forecast accuracy than the single ARIMA model. The findings provide evidence that the incidence of TB among children below 15 years in Homa Bay and Turkana Counties is significantly under-reported and is potentially higher than the national average.
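One common way to construct such a hybrid (modelling the ARIMA residuals with a neural network, in the spirit of the widely used ARIMA-ANN approach) and to compare forecast accuracy with the Diebold-Mariano test is sketched below in R with the forecast package; this is a generic illustration on simulated data, not the authors' implementation, and nnetar() stands in for the ANN component.

```r
library(forecast)

# Simulated stand-in for a monthly case series (illustrative only, not the TIBU data)
set.seed(12)
cases <- ts(80 + 10 * sin(2 * pi * (1:120) / 12) + as.numeric(arima.sim(list(ar = 0.6), n = 120)),
            frequency = 12, start = c(2012, 1))

train <- window(cases, end = c(2020, 12))    # 2012-2020 used for fitting
test  <- window(cases, start = c(2021, 1))   # 2021 held out for comparison

# Single seasonal ARIMA model selected automatically
fit_arima <- auto.arima(train)
fc_arima  <- forecast(fit_arima, h = 12)

# Hybrid: model the ARIMA residuals with a neural network autoregression (ANN stand-in)
fit_ann   <- nnetar(residuals(fit_arima))
fc_hybrid <- fc_arima$mean + forecast(fit_ann, h = 12)$mean

# Diebold-Mariano test comparing the two sets of out-of-sample forecast errors
# (h = 1 keeps the variance estimate simple for this short comparison window)
e_arima  <- test - fc_arima$mean
e_hybrid <- test - fc_hybrid
dm.test(e_arima, e_hybrid, h = 1, alternative = "two.sided")
```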

Author summary

Tuberculosis remains a disease of major public health concern, especially in resource-limited settings, and is still characterized by high morbidity and mortality for a single infectious disease, particularly among children in developing countries. The actual burden of tuberculosis among children is relatively unknown, and about two-thirds of cases are either unreported or undiagnosed in Kenya. The use of novel mathematical models is critical and can be leveraged to guide policymakers in the prevention and control of infectious diseases such as tuberculosis. We use autoregressive moving average models and hybrid forms of these models to model and forecast tuberculosis infections among children. We found that hybrid autoregressive moving average models provide more accurate predictions and forecasts of tuberculosis infections among children. We also confirmed that the actual burden of tuberculosis among children is still under-estimated. Our study highlights the persistent under-estimation of tuberculosis among children and points to the importance of novel modelling methods for understanding the actual burden of tuberculosis among children.

Tuberculosis is a highly infectious disease ranked among the top ten most lethal causes of mortality. Approximately 33% of the global population, particularly in developing countries, has been infected with TB [ 1 ]. In 2016, over 10 million new TB cases were reported globally, with children below 15 years of age accounting for about 7% of those cases [ 2 ]. Furthermore, in 2016, developing countries accounted for over 85% of new TB cases globally, with Asian and African countries contributing 61% and 25% respectively, while approximately 7 countries accounted for close to 65% of all new TB cases [ 2 ]. In 2018, about 1 million TB cases and over 230,000 TB-related deaths occurred among children below 15 years, with about 55% of these TB cases either undiagnosed or unreported [ 3 ]. In 2019, 30 high TB burden countries accounted for 87% of all new TB cases, while only 8 countries accounted for approximately 67% of the total new TB cases [ 4 ]. Despite these statistics, pediatric TB is usually overlooked [ 5 ] amid diagnosis and treatment challenges.

The TB burden in Sub-Saharan Africa (SSA) is far greater and is exacerbated by poverty, political strife, and weak health systems, which have curtailed the implementation of TB control interventions. Consequently, TB has become an enormous burden on health systems that are already overstretched [ 6 ].

Tuberculosis is a disease of major concern in Kenya and is among the top five causes of mortality. Kenya is listed among the top 30 high TB burden countries [ 7 ]. Kenya is also among 14 countries globally that suffer from the triple burden of TB, TB-Human Immunodeficiency Virus (HIV)/Acquired Immunodeficiency Syndrome (AIDS) co-infection, and Multi-Drug Resistant TB [ 8 ]. The TB incidence in Kenya in 2015 was 233 per 100,000 population (95% Confidence Interval (CI): 188–266), with a mortality of 20 per 100,000, and TB case notifications increased from 11,000 to 116,723 between 1990 and 2007 [ 9 ], occasioned by the HIV epidemic and improved case detection due to improved diagnostic capacity.

The use of mathematical models in the modeling of epidemic interactions and occurrences within populations has been detailed extensively. While existing interventions to control TB have been partially successful within the context of resource constraints, mathematical modeling can increase understanding and result in better policies toward the implementation of effective strategies that would yield greater health and economic benefits [ 10 ]. In addition, mathematical models such as machine learning (ML) methods are essential and can be leveraged [ 11 ] to guide policymakers in resource allocation toward the prevention and control of diseases.

In Africa, the application of novel forecasting approaches, such as ARIMA models, in modelling disease incidence is well documented. These models, in different forms, have been used to forecast short-term and long-term patterns of diseases such as cancer and malaria [ 12 , 13 , 14 ]. In these studies, as much as ARIMA models offered a way of predicting cases, they did not guarantee perfect forecasts, especially over a longer forecast horizon [ 12 ], and are best applied to data that are stable or exhibit a consistent pattern over time with minimal outliers [ 13 ]. As such, these models are not suitable when there is no clear strategy for dealing with outliers, and they suffer when data are scarce, which can result in either under-fitting or over-fitting [ 14 ].

More recently, ARIMA and seasonal ARIMA models have been applied to predict and forecast COVID-19 cases in Sub-Saharan Africa. Noting that time series models have been used extensively as convenient methods to predict the prevalence or spread of infectious diseases, Takele [ 15 ] applied an ARIMA model to project COVID-19 prevalence in the East African countries of Ethiopia, Djibouti, Sudan and Somalia. They noted that future prediction of COVID-19 cases, especially in the context of the four countries considered in the study, might be affected by the nature of the spread of COVID-19 [ 15 ]. In addition, the study did not take into account the effect of seasonality, such as days of the week on which COVID-19 infections were either highest or lowest, and this might have affected the accuracy of their findings.

Furthermore, Umunna and Olanrewaju [ 16 ] modelled HIV prevalence in Minna, Niger State, Nigeria, using ARIMA and SARIMA models on monthly HIV data from 2007 to 2018. A SARIMA model was shown to be the best model for forecasting monthly HIV prevalence. Of interest in their findings was that the average fitted value from January 2007 was half the actual value reported, which would indicate under-fitting and might have been better addressed by a more robust approach to model evaluation. In addition, outliers, which might have accounted for extraneous variation, may have been present in the data, based on 95% prediction intervals that included negative values. Furthermore, the optimal SARIMA model might have been affected by non-linearities within the data that were not effectively accounted for by the linear model.

In the context of TB, Aryee et al . [ 17 ] conducted a study to obtain a time series model to estimate the incidence of TB cases at the chest clinic of the Korle-Bu Teaching Hospital (KBTH). They applied the Box-Jenkins ARIMA approach to monthly TB cases reported at the KBTH from 2008 to 2017. Although they found no evidence of an increasing or decreasing trend in TB incidence, they noted that the best fitting model does not always produce the best results with respect to the mean absolute error (MAE) and mean square error (MSE). As such, the study could have utilized a more robust model and methodology to further improve accuracy.

In addition, Ade et al . [ 18 ] conducted a study to determine changes in TB epidemiology in Benin over the 15 years between 2000 and 2014, assess seasonal variations, and forecast the number of TB cases over a period of five years using the Box-Jenkins ARIMA approach. They found seasonal variations in TB case finding and notification, with the highest numbers recorded within the first quarter of the year. They also found that annual notified cases increased, with the highest number reported in 2011, while their 5-year forecast showed a decreasing trend. Forecasting TB cases over a period of 5 years may, however, produce inaccurate forecasts because the MSE tends to increase as the forecast horizon lengthens. Furthermore, improved accuracy could have been achieved by implementing validation procedures.

Several studies have utilized ARIMA, Seasonal ARIMA (SARIMA), neural network, and hybrid ARIMA models to model TB incidences [ 19 , 20 ], and in these studies the hybrid models were demonstrated to offer better predictive and forecast accuracy. Azeez et al . [ 20 ] compared the predictive capabilities of the SARIMA and hybrid SARIMA neural network auto-regression (SARIMA-NNAR) models in modeling TB incidences in South Africa, and the SARIMA-NNAR model was found to have better goodness-of-fit. As one of their limitations, Azeez et al . [ 20 ] noted that the data used covered 2010 to 2015 and were verified against only one year of TB prevalence data, so the findings should be interpreted with caution. They proposed that the analysis be revisited with additional time series data using a strong mathematical model. Data availability was thus a gap in that study, and more robust approaches to improving model accuracy would have worked better, especially in the context of a small dataset.

Li et al . [ 21 ] compared the predictive power of the ARIMA and ARIMA-generalized regression neural network (GRNN) hybrid models in forecasting TB incidences in China and concluded that the hybrid model was superior to the single ARIMA model. In this comparative study, although the hybrid ARIMA-GRNN model produced more accurate predictions and forecasts, the single ARIMA and GRNN models might have suffered from their inability to effectively account for the non-linearities and linearities in the data, respectively, in addition to there not being enough data to allow the GRNN model to learn well.

ARIMA models, different forms of neural network models, and hybrid models have also been applied in modeling other infectious diseases [ 22 , 23 , 24 ], and in all these studies hybrid models were found to offer better predictive and forecasting accuracy than single models, mostly because of their ability to model both linear and non-linear patterns within data.

While hybrid ARIMA models have been applied in forecasting both the short-term and long-term incidences of infectious diseases in other countries, there has been little to no application of these cutting-edge methods in African countries, with most applications limited to ARIMA models alone. In Kenya, while ARIMA models have been applied in forecasting disease incidence [ 25 ], very little has been done in the application of hybrid ARIMA models to predicting disease incidence, except in non-public health settings such as agriculture and economics.

The popularity of ARIMA models stems from their flexibility to represent a variety of time series with simplicity, but they have a profound limitation stemming from their linearity assumptions, which in many cases are impractical [ 26 ], since real-world applications mainly involve data exhibiting non-linear patterns. Consequently, to overcome this disadvantage, non-linear stochastic models such as ANN models have been proposed [ 27 ]. Despite this, a single ANN model is not able to incorporate both linear and non-linear patterns, and this has led to the adoption of hybrid models to address this challenge [ 28 ]. To attain a higher degree of predictive and forecasting accuracy, theoretical and empirical findings show that combining different models can be effective [ 29 ].

To better understand the status of TB infection among children in Kenya, it is important to assess the trend and forecast these incidences using available surveillance data and novel models to elicit a better understanding and innovative interventions to curtail the spread of pediatric TB in Kenya. This study compares linear-based ARIMA, and hybrid ARIMA models in modeling TB incidences among children below 15 years in Homa Bay and Turkana Counties in Kenya.

Materials and methods

Study design

This was a retrospective quantitative study that utilized aggregated monthly TB case data reported by health facilities located in Homa Bay and Turkana Counties to the National Tuberculosis, Leprosy and Lung Disease Program (NTLLDP) through the Treatment Information from Basic Unit (TIBU) electronic system between January 2012 and December 2021, comprising 120 monthly observations of aggregated TB cases for children below 15 years. Homa Bay and Turkana Counties are among the top 10 TB endemic Counties in Kenya [ 30 ].

Study setting

Homa Bay County comprises 8 Sub-Counties and is one of the former districts of Nyanza province in Kenya, with Homa Bay town as its headquarters. Turkana County, on the other hand, is largely semi-arid, is made up of 6 Sub-Counties, and borders 3 countries: Ethiopia to the North, South Sudan to the North West and Uganda to the West. Homa Bay County is situated on the shores of Lake Victoria, which provides a significant source of income to the local population. Homa Bay County covers approximately 3,155 km² and lies at approximately 0.6221° S, 34.3310° E ( S1 Fig ) [ 31 ]. Turkana County is located at 3.3122° N, 35.5658° E within the former Rift Valley province of Kenya ( S1 Fig ) [ 31 ]; it is by far the largest County in Kenya by land area, occupying approximately 68,680 km², with Lodwar as its largest town and headquarters. Homa Bay and Turkana Counties have populations of approximately 1,131,950 and 926,976 [ 32 ] respectively. Homa Bay County has an HIV prevalence 4.5 times higher than the national HIV prevalence [ 33 ] and faces a double burden of TB-HIV co-infection, resulting in an increased risk of TB-related deaths [ 34 ]. The population of Turkana is largely nomadic, and the County is considered a hardship area, prone to drought and facing a high disease burden due to inadequate public health resources [ 35 ].

Data collection and analysis

Tuberculosis case data were abstracted and aggregated for each month between January 2012 and December 2021 for health facilities located in Homa Bay and Turkana Counties in Kenya. In 2012, the Kenya Ministry of Health (MoH), through the Division of Leprosy, Tuberculosis and Lung Disease, transitioned the reporting of TB cases from paper-based records to the TIBU system [ 36 ]. The TIBU system is a national case-based TB surveillance system with nationwide coverage, used to store individual TB cases reported to the national TB program monthly [ 37 ]. This study did not collect or utilize patient-level data.

One of the objectives of time series analysis is to use an observed time series to forecast future observations. In the absence of actual new data, cross-validation offers a way to estimate a model's future predictive accuracy while minimizing errors. Arlot and Celisse [ 38 ] noted that training a model and evaluating its performance on the same data results in overfitting, so, when working with limited data, splitting the data into training and validation samples suffices. In cross-validation, a single data split yields a validation estimate of risk, while averaging over a number of splits yields a cross-validation estimate. In this study, a minimum size for the training set was specified and, based on one-step forecasts [ 39 ], different training sets, each containing one more observation than the previous one, were used [ 40 ].

Data analysis was performed using R statistical software [ 41 ] together with applicable packages for analyzing time-series data. The results were summarized using tables and figures.
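To illustrate the rolling-origin scheme described above, the following R sketch (ours, not the authors' code) uses tsCV() from the forecast package; the series name tb_cases and the 60-observation floor are assumptions based on the settings reported later in this paper.

```r
# Illustrative rolling-origin (one-step-ahead) cross-validation with the forecast package.
# `tb_cases` is a hypothetical ts object holding the 120 monthly TB counts.
library(forecast)

fc_fun <- function(y, h) {
  forecast(auto.arima(y, ic = "bic"), h = h)  # refit on each expanding training window
}

# One-step-ahead errors; no errors are computed before 60 training observations (5 years)
cv_errors <- tsCV(tb_cases, fc_fun, h = 1, initial = 60)

sqrt(mean(cv_errors^2, na.rm = TRUE))  # cross-validated RMSE
```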

The Time Series concept

A time series is a sequential set of data measured over time and is typically composed of the trend, cyclical, seasonal, and irregular (random) components.

An autoregressive (AR) model is a type of random process used to describe certain time-varying processes within a time series [ 42 ]. The basic idea of AR models is that the present value of a series Y_t can be linearly explained by a function of its p past values, that is, Y_(t−1), Y_(t−2), …, Y_(t−p).

In the case of this study, the expected value of the series Y_t was not equal to zero, that is, E(Y_t) = μ ≠ 0; as such, the series Y_t was replaced by Y_t − μ to obtain an AR process of order p [ 42 ], which can be written as

Y_t = α + ϕ_1 Y_(t−1) + ϕ_2 Y_(t−2) + … + ϕ_p Y_(t−p) + ε_t (Eq 1)

where ε_t is white noise (WN) that is uncorrelated with Y_s for all s < t, and α = μ(1 − ϕ_1 − … − ϕ_p).

A moving average (MA) model uses the dependency between an observed value and the residual error from a moving average model applied to lagged observations. This implies that the output variable is linearly dependent on the current and past values of a stochastic term [ 42 ].

Consequently, Y_t is a moving average process of order q if

Y_t = ε_t + θ_1 ε_(t−1) + θ_2 ε_(t−2) + … + θ_q ε_(t−q) (Eq 2)

where ε_t is WN and θ_1, …, θ_q are constants.

Alternatively, Eq 2 can also be written in the form Y_t = θ(B)ε_t, where θ(B) = 1 + θ_1 B + θ_2 B^2 + … + θ_q B^q = 1 + Σ_(j=1…q) θ_j B^j is the moving average operator.

Autoregressive Integrated Moving Average (ARIMA) models

A non-seasonal ARIMA ( p , d , q ) model is a class of stochastic processes whose auto-covariance functions depend on a finite number of unknown parameters. The ARIMA model can only be applied when a series is stationary [ 43 ], which can be achieved by differencing the series. Generally, an ARIMA process of orders p , d and q can be represented mathematically [ 44 ] as

ϕ(B)(1 − B)^d Y_t = θ(B)ε_t (Eq 3)

where ϕ(B) and θ(B) are the autoregressive and moving average operators and B is the backshift operator. In lag operator notation, a non-seasonal, non-differenced ARIMA ( p , d , q ) process therefore reduces to ϕ(B)Y_t = θ(B)ε_t for all t ∈ ℤ.

Box and Jenkins introduced the ARIMA model in 1970 [ 45 ]. The ARIMA model requires only historical time series data on the variable being forecast. Most importantly, ARIMA models are represented as ARIMA ( p , d , q ), where p is the number of AR terms, d is the number of non-seasonal differences, and q is the number of lagged forecast errors [ 46 ]. The ARIMA model assumes that the residuals are independent and normally distributed, ε_t ~ N(0, σ^2), that is, with zero mean and constant variance.

Seasonal Autoregressive Integrated Moving Average (SARIMA) models

The SARIMA model is made up of non-seasonal and seasonal components in a multiplicative model. A SARIMA model can be written as ARIMA ( p , d , q ) ( P , D , Q ) S, where p is the non-seasonal AR order, d is the non-seasonal differencing, q is the non-seasonal MA order, P is the seasonal AR order, D is the seasonal differencing, Q is the seasonal MA order and S is the period of the repeating seasonal pattern. Generally, S = 12 for monthly data. As such, with the backshift operator defined as BY_t = Y_(t−1), and without differencing, a SARIMA model can be written formally as [ 47 ]

Φ_P(B^S) ϕ_p(B) Y_t = Θ_Q(B^S) θ_q(B) ε_t (Eq 4)

where ϕ_p and θ_q are the non-seasonal AR and MA polynomials and Φ_P and Θ_Q are their seasonal counterparts. On the left of Eq 4 , the seasonal and non-seasonal AR polynomials multiply each other, and on the right, the seasonal and non-seasonal MA polynomials multiply each other. In this study, S = 12 since monthly TB cases were used.

Artificial Neural Networks (ANNs) models

Artificial Neural Networks have been suggested as alternative, and sometimes better, modeling approaches for time series forecasting [ 48 ]. The main goal of ANNs, which are biologically motivated [ 49 ], is to construct a model that mimics human brain intelligence in a machine [ 47 , 48 , 49 ]. The most common ANNs are multi-layer perceptrons (MLPs) [ 50 ], made up of an input layer, a hidden layer, and an output layer connected by acyclic links [ 51 ]. A neuron is a data processing unit, and the nodes in the various layers of an ANN are its processing elements.

The ANN model, following Zhang [ 52 ], performs a nonlinear mapping from past observations of a time series to a future value:

Y_t = α_0 + Σ_(j=1…q) α_j h(β_0j + Σ_(i=1…p) β_ij Y_(t−i)) + ε_t

where the α and β terms are connection weights, q is the number of hidden nodes and p is the number of input (lagged) observations. There is no systematic rule for deciding the choice of q, while p is equal to the number of features in the data [ 52 ]. The logistic function h(·) is applied as the nonlinear activation function, h(x) = 1/(1 + e^(−x)).

Hybrid (ARIMA-ANN) models

Generally, a time series can be viewed as having linear and nonlinear components, Y_t = l_t + n_t, where l_t and n_t are the linear (from the ARIMA model) and nonlinear (ANN fitted to the ARIMA model residuals) components respectively. Residuals from the ARIMA model are fitted with the ANN model.

Proposed methodology

The proposed methodology for this study was based on the combination of the Box-Jenkins methodology for ARIMA modeling and the hybrid ARIMA models. First, the ARIMA model was developed, with the optimal model selected based on the minimum AIC and BIC as well as on minimizing the RMSE, MAE, and MAPE. This was achieved by applying an automated ARIMA function following the Box and Jenkins procedure [ 52 ] within a cross-validation procedure. Second, the best parsimonious ARIMA model was used to predict TB cases, and accuracy measures were calculated by comparing the fitted and the actual TB cases; the model was then used to forecast TB cases for the year 2022. Third, the residuals from the ARIMA model were obtained and fitted using an autoregressive neural network to ensure that any remaining signal was captured. Fourth, the fitted residuals were combined with the ARIMA-fitted TB cases to form the hybrid model, and the fitted TB cases of the hybrid model were compared against the actual TB cases to calculate accuracy measures. Fifth, the hybrid model was used to forecast TB cases for the year 2022. Finally, the predictive accuracy of the two models was compared to establish the model with the best predictive accuracy.
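A compact R sketch of these steps is given below. It is a simplified illustration under assumed object names (tb_counts and tb_cases are hypothetical), not the authors' script, and it uses auto.arima() and nnetar() from the forecast package with the decay and iteration settings reported later in the Results.

```r
# Hybrid ARIMA-ANN workflow (illustrative sketch only).
library(forecast)

tb_cases <- ts(tb_counts, start = c(2012, 1), frequency = 12)  # `tb_counts`: hypothetical vector of 120 monthly counts

# Steps 1-2: select, fit and forecast the (seasonal) ARIMA model
fit_arima <- auto.arima(tb_cases, ic = "bic", stepwise = FALSE)
fc_arima  <- forecast(fit_arima, h = 12)

# Step 3: fit a neural network autoregression (NNAR) to the ARIMA residuals
fit_nnar <- nnetar(residuals(fit_arima), decay = 0.001, maxit = 200)
fc_nnar  <- forecast(fit_nnar, h = 12)

# Steps 4-5: hybrid fitted values and forecasts = linear part + nonlinear part
hybrid_fitted   <- fitted(fit_arima) + fitted(fit_nnar)
hybrid_forecast <- fc_arima$mean + fc_nnar$mean

# Step 6: in-sample accuracy of the hybrid fit (leading values are NA from the NNAR lags)
sqrt(mean((tb_cases - hybrid_fitted)^2, na.rm = TRUE))
```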

Model identification and specification

Optimal values of p , d , q , P , D , and Q for the ARIMA model were determined by examining the autocorrelation functions and by testing models with different values of these parameters. The models were estimated using the maximum likelihood estimation (MLE) method, and the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) [ 53 ] penalty statistics were used to select the best model, that is, the one that minimizes the AIC or BIC.

One assumption of the ARIMA model is that the residuals should be white noise. As such, the Ljung-Box Q test [ 54 ] was used to test the null hypothesis that the model residuals are independently distributed, with constant variance and zero mean.

Accuracy measures

Various accuracy measures have been proposed [ 55 ] to determine predictive and forecast performance. This study used the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percent Error (MAPE) to measure the predictive and forecast accuracy of the two models. The lower the values of these accuracy measures, the better the model. Furthermore, MAPE values of 10% or below, 10–20%, and 20–50% can be considered to indicate high, good, and reasonable accuracy, respectively [ 56 ].
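For reference, these measures can be computed directly from the observed and fitted series; the helper below is a generic sketch (vector names are illustrative, not taken from the authors' code).

```r
# RMSE, MAE and MAPE for a vector of actual values and a vector of fitted/forecast values
accuracy_measures <- function(actual, fitted) {
  err <- actual - fitted
  c(RMSE = sqrt(mean(err^2, na.rm = TRUE)),
    MAE  = mean(abs(err), na.rm = TRUE),
    MAPE = 100 * mean(abs(err / actual), na.rm = TRUE))
}
```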

The study also compared the predictive accuracy of the forecasts from the two models using the Diebold-Mariano (DM) test [ 57 ], which tests the null hypothesis that the two models have the same predictive accuracy.
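In R, this test is available as dm.test() in the forecast package; the sketch below assumes e_sarima and e_hybrid are the one-step prediction error series from the two models (names are ours, not the authors').

```r
library(forecast)
# Squared-error loss, one-step-ahead forecasts; two-sided alternative by default
dm.test(e_sarima, e_hybrid, h = 1, power = 2)
```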

To allow implementation of the cross-validation procedure, the minimum number of observations required to fit the ARIMA model was set based on the recommendation by Hyndman and Kostenko [ 58 ], who proposed that at least p + q + P + Q + d + mD + 1 observations are sufficient for a seasonal ARIMA model; a seasonal ARIMA specification was considered (although selected automatically) because the data were monthly and seasonality had to be accounted for. In addition, the minimum number of observations for model development within the cross-validation framework was set at 60, comprising observations for 5 years [ 59 ].
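As an illustration, for the model ultimately selected in this study, ARIMA (0,0,1)(1,0,1)12 with d = D = 0, this lower bound works out to 0 + 1 + 1 + 1 + 0 + 12 × 0 + 1 = 4 observations, so the 60-observation (5-year) floor is the binding constraint in practice.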

Ethical approval and considerations

A research permit was obtained from the National Commission for Science, Technology, and Innovation (NACOSTI) in Kenya. Authorization for use of the data from the TIBU system was obtained through a letter of approval under the Patient and Program Outcomes Protocol (PPOP) by the Elizabeth Glaser Pediatric AIDS Foundation.

Exploratory data analysis

There was a total of 120 observations in the data. The trend of TB cases among children below 15 years in Homa Bay and Turkana Counties ( Fig 1 ) shows a notable increase in reported TB cases between 2018 and 2021. The monthly cycle box plot of TB cases ( S2 Fig ) shows a potential presence of seasonality within the reported TB cases, implying that seasonality needs to be considered within the ARIMA model, although whether to retain it ultimately depends on whether it improves model accuracy. Furthermore, outliers were detected in some months.


Comparison of model performance in predicting TB cases

Model estimation and accuracy

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used to pick the best parsimonious model based on the lowest estimated AIC or BIC values. The best model was ARIMA (0,0,1)(1,0,1)12, where p = 0, d = 0, q = 1 and P = 1, D = 0, Q = 1. The Ljung-Box Q test for the best model showed a p-value of 0.079, implying that the ARIMA (0,0,1)(1,0,1)12 model residuals were independently distributed. The best model was made up of a non-differenced seasonal AR(1), a non-seasonal MA(1), and a seasonal MA(1) polynomial. From the model output, the estimated coefficients were ( Table 1 ): ma1 = θ_1 = 0.296, sar1 = ϑ_1 = 0.999, sma1 = Θ_1 = -0.968, and μ = 50.698.


Plugging these estimated coefficients into Eq 4 yields the model equation:

(1 − 0.999B^12)(Y_t − 50.698) = (1 + 0.296B)(1 − 0.968B^12)ε_t
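For readers who wish to reproduce this specification, a hedged R sketch (object names assumed, not the authors' script) is:

```r
# Refit the selected specification directly: non-seasonal MA(1) with seasonal AR(1) and MA(1) at lag 12
fit_final <- Arima(tb_cases,
                   order = c(0, 0, 1),
                   seasonal = list(order = c(1, 0, 1), period = 12),
                   include.mean = TRUE)
coef(fit_final)  # returns ma1, sar1, sma1 and the estimated mean (labelled "intercept")
Box.test(residuals(fit_final), lag = 24, type = "Ljung-Box")  # white-noise check on the residuals
```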

Seasonal ARIMA model diagnostics and performance

The performance of the Seasonal ARIMA (0,0,1)(1,0,1)12 model was assessed by comparing predicted and forecast TB cases with the actual TB cases reported ( Fig 2 ).


Comparing the fitted values of the Seasonal ARIMA (0,0,1)(1,0,1)12 model against the actual TB cases gave RMSE, MAE and MAPE values of 18.69, 14.32, and 38.93 respectively. In addition, the mean number of fitted TB cases from the Seasonal ARIMA (0,0,1)(1,0,1)12 model was 51 cases, the same as the mean of the actual reported TB cases. The monthly median plot comparing fitted and actual median TB cases shows that the model also captures the seasonal pattern in the fitted TB cases ( Fig 3 ).


The best ARIMA (SARIMA) model was assessed for fit using standard residual analysis ( S3 Fig ). The residual distribution was approximately normal except for a few outliers at the tails. Inspection of the Autocorrelation Function (ACF) of the residuals, carried out to identify remaining patterns or extreme values, showed a significant autocorrelation at lag 3.

Hybrid (Seasonal ARIMA-ANN) model estimation and accuracy

Residuals from the optimal Seasonal ARIMA (0,0,1)(1,0,1)12 model were fitted with an ANN using the Neural Network Auto-Regressive (NNAR) function to produce an NNAR ( p , P , k ) [m] model; the accuracy measures were then calculated and the forecasts and predictions compared ( Fig 4 ). The optimal lag parameter, p , and the number of nodes in the hidden layer, k , were automatically selected, while P = 1 by default. In addition, a decay parameter of 0.001 was pre-set to restrict the weights from becoming too large, and a maximum of 200 iterations was allowed so that different configurations could be tested until the one with the minimal RMSE was produced.


The findings show that the hybrid Seasonal ARIMA-ANN model resulted in RMSE, MAE, and MAPE values of 16.41, 12.99, and 36.00 respectively when the fitted TB cases from the hybrid model were compared against the actual TB cases reported. This represents a decrease of 12.2%, 9.3%, and 7.5% in the RMSE, MAE, and MAPE accuracy measures respectively when compared to the accuracy of the Seasonal ARIMA model.
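(For instance, the RMSE improvement is calculated as (18.69 − 16.41)/18.69 ≈ 12.2%; the MAE and MAPE improvements follow the same calculation.)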

Comparison of model predictive accuracy

The predictive accuracy of the models was compared using the Diebold-Mariano (DM) test, with the null hypothesis that the predictive accuracies of the two models are the same. The DM statistic was 3.819, with a p-value of <0.001, indicating that the Seasonal ARIMA-ANN and Seasonal ARIMA (0,0,1)(1,0,1)12 models have significantly different predictive accuracies. Overall, the Seasonal ARIMA-ANN model offers better predictive accuracy than the Seasonal ARIMA (0,0,1)(1,0,1)12 model.

Comparison of model performance in forecasting temporal trends of TB incidences

The resulting Seasonal ARIMA (0,0,1)(1,0,1)12 and ARIMA-ANN models were used to forecast TB cases for 2022. The point forecasts ( Table 2 ) and the comparison of the model forecasts ( Fig 5 ) show a mean of 52 (80% CI: 48 to 56) forecast TB cases per month from each model for 2022 (up to November), giving totals of 569 and 573 forecast TB cases for 2022 (up to November) from the Seasonal ARIMA (0,0,1)(1,0,1)12 and ARIMA-ANN models respectively.


Although the two models were able to predict TB cases among children below 15 years, the hybrid Seasonal ARIMA-ANN model offered better predictive performance than the single Seasonal ARIMA model. These findings are comparable with those from other studies that applied hybridized ARIMA or SARIMA models to TB incidence and other infectious diseases [ 19 , 23 , 59 , 60 ], with the overall conclusion that hybrid models have better predictive performance. The majority of infectious disease data are neither purely linear nor purely non-linear and mostly present with both linear and nonlinear properties. As such, single models are not sufficient for modeling such data, and hybrid models have been found to be the most appropriate for its accurate estimation [ 61 ]. The use of hybridized ARIMA models has been proposed in recent years and used extensively, with improvements proposed over time.

The estimated TB incidence in Kenya was 259 TB cases per 100,000 population in 2020 [ 5 ] in the general population. This translates to approximately 134,680 TB cases, and with children accounting for about 20% (26,936) of these cases [ 62 ], the incidence among children below 15 years was approximately 121 TB cases per 100,000 children. Furthermore, children are among the most vulnerable, with a higher risk of contracting TB [ 63 ]. In addition, Makori et al . [ 30 ] noted that the burden of TB in Kenya was higher than previously thought. This study forecast a mean of 52 TB cases per month in 2022 (to November) for Homa Bay and Turkana Counties and estimates that the total number of TB cases reported among children below 15 years would be approximately 624 in 2022. However, given that these are estimated reported cases, they most likely represent only about 35% of TB cases, since up to 65% of pediatric TB cases are potentially missed each year [ 3 ]. Taking this into account, the estimated TB cases for 2022 would be approximately 1,783, ranging between 1,646 and 1,920, for Homa Bay and Turkana Counties among children below 15 years. The estimated population of children below 15 years in Homa Bay and Turkana Counties for 2022 is approximately 1,020,795 [ 64 ]. As such, the estimated TB incidence among children in Homa Bay and Turkana Counties in 2022 would be approximately 175 TB cases per 100,000 population (161 to 188 cases per 100,000). This estimated TB incidence among children below 15 years in Homa Bay and Turkana Counties is slightly lower than the TB incidence estimated for the general population in Kenya in 2015, 233 TB cases per 100,000 (95% CI 188–266) [ 6 ], but higher than the estimated national average of 121 TB cases per 100,000 children below 15 years in 2020.
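For clarity, the arithmetic behind this estimate is: 52 forecast cases per month × 12 months ≈ 624 reported cases; 624 ÷ 0.35 ≈ 1,783 total cases after adjusting for the roughly 65% of pediatric cases that are missed; and 1,783 ÷ 1,020,795 × 100,000 ≈ 175 cases per 100,000 children.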

The findings of this study show that the estimated TB incidence among children below 15 years is higher than the estimated national average for 2020. These findings are in line with the WHO newsletter indicating that the number of people developing TB and dying from the disease could be much higher in 2021 and 2022, mainly because of the COVID-19 pandemic [ 65 ]; since that projection was based on the general population, it is concerning that the same trend is witnessed among children below 15 years. These findings also confirm those of Oliwa et al . [ 66 ], who indicated that notification data may underestimate the TB burden among children, while Mbithi et al . [ 67 ] reported a decrease in TB diagnosis in Kenya by an average of 28% in 2020. In addition, the conclusions from Makori et al . [ 30 ] about the need to intensify TB case finding among younger, especially pediatric, populations affirm the findings of this study.

The findings of this study further reveal that TB infections among children tended to exhibit a seasonal pattern, with three peaks in March, June and September. Although few studies have highlighted the importance of seasonal variations coinciding with TB infections, seasonality of TB infections has been documented in other studies [ 68 , 69 , 70 ]. While those studies did not directly attribute TB infections to seasonal patterns, Jaganath et al . [ 71 ] found a link between the peaks of the rain and influenza seasons and increased TB infections among children in Uganda. The correlation between pediatric TB infections and seasons found in this study might reflect the major influence of dry and rainy seasons on TB transmission and health-seeking behavior within the study area. Prior studies have suggested higher TB infections in rainy seasons, which are coupled with a higher incidence of respiratory illnesses and lower vitamin D levels [ 72 ]. This would require further investigation, with the aim of introducing specific interventions to increase TB screening and diagnosis in peak seasons and to curtail TB infections in those seasons.

The hybrid ARIMA model offers better predictive accuracy and forecast performance compared to the single ARIMA model in modeling TB cases among children below 15 years in Homa Bay and Turkana Counties.

The findings in this study confirm that under-reporting of TB cases among children below 15 years persists and that the incidence in this vulnerable group might be higher than previously estimated. As such, there is a need to look more closely at the TB surveillance framework and its data to understand existing gaps. There is an urgency to re-align vital resources towards the National TB program to get the TB fight back on track in these two Counties, especially active case finding among children, which would also require the application of novel methods of TB diagnosis.

Furthermore, the findings of this study indicate that TB infections among children below 15 years in Homa Bay and Turkana Counties are influenced by seasonal patterns, which might shape the health-seeking behavior and transmission pattern of the disease. As such, there is a need to invest resources toward increased TB surveillance, screening, and diagnosis efforts within specific months of the year, as well as measures to curtail the spread of the disease during peak seasons.

Limitations

This study utilized data collected and reported in the TIBU system; as such, the study did not have control over the quality and accuracy of the data. However, it was assumed that, given that the data had been reported in the system, all related procedures to assure data quality had been followed by the reporting health facilities within Homa Bay and Turkana Counties.

This study utilized data between 2012 and 2021, comprising 120 observations of monthly aggregated TB cases for children below 15 years. Deep learning and machine learning algorithms usually demand a large amount of data to learn effectively, so the available data might not have been sufficient for the algorithm to learn well. The models should therefore be revisited as additional data become available.

This study combined data and analysis for Turkana and Homa Bay County. However, these two Counties might present different scenarios when it comes to pediatric TB. In addition, since the study focused on modeling TB cases among children below 15 years in Homa Bay and Turkana Counties, the findings might not be generalized to other Counties of Kenya.

The study data covered the period 2012 to 2021, which included 2020 and 2021, when the COVID-19 pandemic was experienced in Kenya and the region. During this period, COVID-19 related measures and restrictions put in place by the government of Kenya to reduce the spread of the coronavirus may have had an unprecedented effect on TB-related activities at the community and health facility level. Consequently, this study could not quantify the impact of COVID-19 on TB cases reported among children below 15 years, as this was beyond its scope. A possible recommendation is to utilize models such as interrupted time series analysis to measure the possible impact of COVID-19 on TB detection, diagnosis and management.

Supporting information

Map data for the study Counties: https://kenya.africageoportal.com/datasets/d2f2df2a08ef42e88cb6bdc00e41dcc9_0/explore?location=0.361948%2C41.711735%2C6.00 [ 31 ].

Acknowledgments

We would like to acknowledge the departments of health in Homa Bay and Turkana Counties and the health facilities within these Counties for their program interventions toward the identification and management of TB cases. Their efforts have gone a long way in contributing to the data used in this study. We also acknowledge the Elizabeth Glaser Pediatric AIDS Foundation for granting permission to use these data within their Patient and Program Outcomes Protocol (PPOP); this made our access process easy while also meeting the ethical requirements.

Funding Statement

The authors received no specific funding for this work.

Data Availability

  • PLOS Digit Health. 2023 Feb; 2(2): e0000084.

Decision Letter 0

PDIG-D-22-00198

Application of ARIMA, hybrid ARIMA and Artificial Neural Network Models in predicting and forecasting tuberculosis incidences among children in Homa Bay and Turkana Counties, Kenya

PLOS Digital Health

Dear Dr. Siamba,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (by Nov 05 2022, 11:59 PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Thomas Schmidt

Academic Editor

Journal Requirements:

Additional Editor Comments (if provided):

Thank you for submitting your work to PLOS Digital Health. Your paper has been sent to two reviewers, and based on their feedback as well as my own assessment, I recommend a major revision before final acceptance.

I agree with both of our reviewers' comments, but would also like to add a few of my own, foremost regarding the structure and content of the paper. I'm of the opinion that research papers should limit the elaborate use of formulas for fairly generic algorithms; they make sense when adjustments or customizations have been made. So please consider whether the many equations listed in the Materials and methods section are truly necessary, or can instead be found in textbooks. Likewise, I recommend reconsidering the value of listing metrics for models during both training and testing. These metrics are useful when doing an overall evaluation of each model's performance, but consider whether they are relevant for potential readers.

I find your paper to be well written but would like you to elaborate the discussion a bit further on the applicability of your approach.

Also, to what extent is your dataset affected by COVID-19? Please consider this in your limitations section.

Finally, I suggest that you use consistent terminology for your dataset. 'Data points' is confusing; please use subjects, patients, or children instead. You currently use both data points and records, etc.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria ? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

=============

This study applied ARIMA, hybrid ARIMA and ANN to predict incidence of TB in Children in two counties in Kenya and demonstrates that hybrid ARIMA has better predictive and forecast accuracy in comparison to ARIMA and ANN models. This is an interesting study and including more details in methods would improve the manuscript further.

Specific comments

Major comments

---------------------

The authors haven’t provided adequate justification for this study apart from stating that previous studies have used only ARIMA models in forecasting disease incidence in Kenya. It would be helpful if the authors conducted a more rigorous literature review to identify the gaps in literature on how ARIMA and hybrid ARIMA models were applied for forecasting disease incidence in Africa and Kenya. Also, the authors must highlight the gaps in Azeez et al.’s study and the methodological gaps in that study to justify the aims and objectives of this study.

My biggest concern with this study is the selection of ANN models as a comparator. With the limited size of the dataset, these models were always likely to underperform. Adequate justification on why these models have been used for this study and why it is important to compare ANN with ARIMA and hybrid ARIMA would be helpful. Also, multiple other studies have demonstrated that SARIMA

I would strongly encourage the authors to use Jandoc et al.’s study ( https://pubmed.ncbi.nlm.nih.gov/25890805/ ) and Schaffer et al.’s article ( https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01235-8 ) to report the methodological and reporting recommendations.

To test the ARIMA, hybrid ARIMA and ANN, it is not clear if the authors conducted a sensitivity analysis.

Minor comments

In line 204, did the authors mean ARMA (Autoregressive Moving Average) model or ARIMA model? If it is the former, please expand ARMA.

I would also encourage the authors to include the data as supplementary material.

Reviewer #2: The authors developed ARIMA, Hybrid ARIMA and ANN models to predict and forecast the incidence of Tuberculosis among under 15 children. The topic is interesting, primarily focusing on paediatric Tuberculosis.

1. Since the number of data points is significantly less (120) and from the methodology, the authors used an 80/20 split for training and testing, it is unclear whether the ARIMA or hybrid ARIMA orders are determined after splitting the training and test set or using the whole dataset. If authors used the entire dataset for determining the order, they could introduce the bias in the test set. What are the measures taken to mitigate the overfitting issues?

2. It is unclear whether the authors used any cross-validation techniques in this work. Many applicable packages are available in R for utilizing the cross-validation for better performance in predicting and forecasting, especially when the data points are less.

3. Model (ARIMA (0,0,1,1,0,1,12)), Model (NNAR (1,1,2) [12]), and Hybrid ARIMA-ANN shows the test errors are higher than training errors. Even though it is common in models, the large variation indicates the overfitting of the model. Can the authors explain the measures taken to reduce the test errors?

6. PLOS authors have the option to publish the peer review history of their article ( what does this mean? ). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #2: No



Submitted filename: PDIG-D-22-00198.docx

Author response to Decision Letter 0

Submitted filename: Response to Reviewers.docx

Decision Letter 1

15 Dec 2022

PDIG-D-22-00198R1

Dear Mr. Siamba,

We are pleased to inform you that your manuscript 'Application of ARIMA, and hybrid ARIMA Models in predicting and forecasting tuberculosis incidences among children in Homa Bay and Turkana Counties, Kenya' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

***********************************************************

Dear Stephen Siamba et al.

Thank you for submitting your revised manuscript to PLOS Digital Health. Sorry about the prolonged processing time. However, as evident from the reviewers' final comments, all concerns have been thoroughly and properly dealt with. Thank you. I recommend that your submission be accepted for publication.

Reviewer Comments (if any, and for reference):

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

2. Does this manuscript meet PLOS Digital Health’s publication criteria ? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

5. Is the manuscript presented in an intelligible fashion and written in standard English?

6. Review Comments to the Author

Reviewer #1: Thank you for thoughtfully responding to my comments and for the excellent paper.

Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article ( what does this mean? ). If published, this will include your full peer review and any attached files.

Reviewer #1: No

Reviewer #2: No


Open Access

Peer-reviewed

Research Article

The research of ARIMA, GM(1,1), and LSTM models for prediction of TB cases in China

Daren Zhao, Huiwu Zhang, Qing Cao, Zhiyi Wang, Sizhang He, Minghua Zhou, and Ruihua Zhang

Daren Zhao and Huiwu Zhang contributed equally to this work.

* E-mail: [email protected]

Author roles (as listed): Conceptualization, Data curation, Formal analysis, Writing – original draft, Writing – review & editing.

Affiliations: Department of Medical Administration, Sichuan Provincial Orthopedics Hospital, Chengdu, Sichuan, P.R. China; Department of Medical Administration, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, Sichuan, P.R. China; Department of Medical Administration, Sichuan Cancer Hospital & Institute, Chengdu, Sichuan, P.R. China; Department of Information and Statistics, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China; Department of Medical Administration, Luzhou People’s Hospital, Luzhou, Sichuan, P.R. China; School of Management, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, P.R. China

  • Published: February 23, 2022
  • https://doi.org/10.1371/journal.pone.0262734

Background and objective

Tuberculosis (TB) is a public health problem in China that not only endangers the population’s health but also affects economic and social development. Accurate prediction and analysis are required to provide policymakers with early warning and support effective precautionary measures. In this study, ARIMA, GM(1,1), and LSTM models were constructed and compared. The results showed that the LSTM model was optimal and achieved satisfactory performance for predicting TB cases in mainland China.

The data on tuberculosis cases in mainland China were extracted from the website of the National Health Commission of the People’s Republic of China. According to the characteristics of the TB data and the sample requirements, we created ARIMA, GM(1,1), and LSTM models to predict the prevalence trend of TB. The mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were applied to evaluate the fitting and prediction accuracy of the models.

There were 3,021,995 tuberculosis cases in mainland China from January 2018 to December 2020, and overall TB cases in mainland China showed a downward trend. We established ARIMA, GM(1,1), and LSTM models, respectively. The optimal ARIMA model was ARIMA (0,1,0) × (0,1,0)12. The equation for the GM(1,1) model was X(k+1) = −10057053.55e^(−0.01k) + 10153178.55; the mean square deviation ratio C was 0.49, and the small error probability P was 0.94. The LSTM model consists of an input layer, a hidden layer and an output layer; the epochs and learning rate parameters were 60 and 0.01, respectively. The MAE, RMSE, and MAPE values of the LSTM model were smaller than those of the GM(1,1) and ARIMA models.

Conclusions

Our findings showed that the LSTM model was the optimal model, with higher accuracy than the ARIMA and GM(1,1) models. Its prediction results can act as a predictive tool for TB prevention measures in mainland China.

Citation: Zhao D, Zhang H, Cao Q, Wang Z, He S, Zhou M, et al. (2022) The research of ARIMA, GM(1,1), and LSTM models for prediction of TB cases in China. PLoS ONE 17(2): e0262734. https://doi.org/10.1371/journal.pone.0262734

Editor: Esteban Tlelo-Cuautle, Instituto Nacional de Astrofisica Optica y Electronica, MEXICO

Received: July 30, 2021; Accepted: January 4, 2022; Published: February 23, 2022

Copyright: © 2022 Zhao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All TB case data were taken from the National Health Commission of the People’s Republic of China ( http://www.nhc.gov.cn ). Anyone meeting the requirements can gain access to them. The data did not involve detailed patient personal information. The authors confirm they did not have any special access privileges that others would not have.

Funding: We state that the study and the paper were financially supported by the Hospital Management Institute, the National Health Commission of the People’s Republic of China [grant no. YLZLXZ-2021-005]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors declare no conflicts of interest.

Introduction

Tuberculosis, an infectious disease caused by Mycobacterium tuberculosis, is still a major global public health problem [ 1 ]. According to the GLOBAL TUBERCULOSIS REPORT 2020 released by the World Health Organization, approximately 10 million people are reported to be infected with TB; an estimated 1.2 million HIV-negative people and 208,000 HIV-positive people died of TB [ 2 ]. In mainland China, TB is classified as a class B infectious disease, and its morbidity and mortality have consistently been among the top two of the Class A and B infectious diseases.

If not treated in time, long-term illness places a huge economic burden on patients and hinders social development. The 30 high-TB-burden countries account for almost 90% of those who fall sick with TB each year [ 2 ]. China has the second largest TB burden in the world, with huge health and economic losses [ 3 ].

According to data from the China Health Statistics Yearbook, the TB mortality rate in China shows an upward trend even though TB incidence has been declining in recent years. Therefore, the prevention and control of TB remains a research priority. Meanwhile, some TB patients have experienced interruptions to their treatment schedules because of the COVID-19 pandemic. Consequently, early warning in infectious disease surveillance is important. Creating an accurate prediction model for TB morbidity and using it to predict the future epidemic situation is of great significance, as it can provide a basis for scientific guidance on control and prevention [ 4 ].

Several mathematical methods are currently used for infectious disease prediction. Based on a literature review, infectious disease prediction models fall mainly into two categories [ 5 , 6 ]: traditional mathematical forecasting models and machine-learning based forecasting models. The traditional mathematical forecasting models include the autoregressive integrated moving average (ARIMA) model [ 7 ], the exponential smoothing model [ 8 ], the regression forecast model [ 9 ], the grey Markov forecasting model [ 10 ], and the GM(1,1) model [ 11 ], among others; the machine-learning based forecasting models include the BP artificial neural network [ 12 ], multivariate adaptive regression splines (MARS) [ 13 ], the support vector machine (SVM) [ 14 ], and long short-term memory (LSTM) [ 15 ], among others.

However, different models suit different data characteristics [ 8 ], so constructing a prediction model that matches the data characteristics and sample requirements is a precondition for accurate prediction. Recent studies show that traditional mathematical forecasting models have performed well in infectious disease prediction. Zheng et al. [ 4 ] showed that ARIMA models are an important tool for infectious disease prediction, and that the results can support public health planning by the government in Xinjiang, China. Ilie et al. [ 16 ] demonstrated that ARIMA time-series models have been successfully applied to the overall prevalence of COVID-19 in Romania. Wang et al. [ 17 ] used ARIMA and GM(1,1) models to predict hepatitis B in China, with the ARIMA model achieving better results than the GM(1,1) model. Guo et al. [ 18 ] used GM(1,1) and a novel SMGM(1,1) model to predict dysentery and gonorrhea in China. Although the traditional mathematical forecasting models perform well in infectious disease prediction, they cannot capture nonlinear relationships in time series [ 15 ], whereas machine-learning based forecasting models handle such nonlinear relationships well and without that limitation [ 15 ].

To date, however, no study has applied the ARIMA, GM(1,1), and LSTM models to predict TB cases in mainland China. Against this background, we used the ARIMA, GM(1,1), and LSTM models to predict TB cases in mainland China. The TB case data were obtained from the website of the National Health Commission of the People's Republic of China, and the ARIMA, GM(1,1), and LSTM models were built according to the data characteristics and sample requirements. To obtain more accurate predictions, the three models were compared using MAE, RMSE, and MAPE, and the best performing model was then used to predict the epidemic trend of TB cases from January to December 2021 in mainland China.

Data source

All TB case data were taken from the National Health Commission of the People's Republic of China ( http://www.nhc.gov.cn/ ), and all TB cases had been laboratory confirmed. In China, TB is classified as a class B infectious disease, and hospital physicians must report every confirmed TB case to the local health authority within 24 hours through the national network for infectious disease reporting [ 4 ], with the final report going to the National Center for Disease Control and Prevention. The data cover TB cases in mainland China only.

We collected monthly TB case data from January 2018 to December 2020, giving 36 samples. The data were divided into a training set, a validation set, and a test set. The training set and the validation set were the same, consisting of TB cases from January 2018 to December 2020, and were used to build and compare the ARIMA, GM(1,1), and LSTM models; the test set consisted of TB cases from January to December 2021 and was used to assess the models' predictions of the future trend of TB cases in mainland China.

Model descriptions

ARIMA model.

The ARIMA model was proposed by Box and Jenkins in the early 1970s, so it is also called the Box-Jenkins model or Box-Jenkins method [ 19 ]. It is one of the most commonly used prediction techniques in epidemiological surveillance and monitoring [ 20 – 26 ]. The ARIMA family includes the autoregressive (AR) model, the moving average (MA) model, the seasonal autoregressive integrated moving average (SARIMA) model, and others [ 17 , 27 ]. If the data show evidence of seasonality, the SARIMA model should be used [ 28 ].

In general, the model can be written as ARIMA(p,d,q)×(P,D,Q)s, where p is the order of autoregression, d is the degree of trend differencing, q is the order of the moving average, P is the seasonal autoregressive lag, D is the degree of seasonal differencing, Q is the seasonal moving average order, and s is the length of the seasonal cycle [ 29 ].

Constructing the ARIMA model involves four main steps: making the sequence stationary, model identification, estimation and diagnosis, and model prediction and evaluation.

The first step is making the sequence stationary . If the sequence is non-stationary, it should be transformed into a stationary time series by differencing [ 30 ]. A plot of the original sequence can suggest whether the sequence is stationary. To achieve stationarity, differencing and log transformation can be carried out with statistical software.

The second step is model identification . Preliminary estimates of the ARIMA model parameters depend on the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Candidate ARIMA model parameters are then determined, based on skill and experience, by examining the ACF and PACF plots.

The third step is estimation and diagnosis . The candidate ARIMA models are evaluated by diagnostic checking of the residuals with the Ljung-Box (Q) test [ 31 ], which requires the residual errors to be random (significance level p>0.05). If the Q statistic is less than 0.8, the tentative model is inadequate [ 4 ] and should be refitted. The optimal model is the one with the lowest Schwarz Bayesian information criterion (BIC) value whose residuals are white noise (significance level p>0.05).

The fourth step is model prediction and evaluation . The optimal model was applied to predict TB cases from January 2018 to December 2020, and its predictive power was evaluated by comparing the predicted values with the actual values [ 17 ].
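Although the authors fitted the model in SPSS, the four steps above can be sketched in Python with statsmodels; the file name, column name, and candidate order shown here are illustrative assumptions rather than the authors' exact workflow.

```python
# A minimal sketch of the four-step SARIMA workflow in Python (statsmodels);
# the CSV path, column names, and candidate order are illustrative assumptions.
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Step 1: load the monthly series and difference it to reach stationarity.
tb = pd.read_csv("tb_cases_monthly.csv", index_col="month", parse_dates=True)["cases"]
stationary = tb.diff().diff(12).dropna()          # first-order plus seasonal differencing

# Step 2: inspect ACF/PACF of the differenced series to shortlist (p, q, P, Q).
plot_acf(stationary, lags=24)
plot_pacf(stationary, lags=24)

# Step 3: fit a candidate model and check BIC and Ljung-Box diagnostics in the summary.
fit = SARIMAX(tb, order=(0, 1, 0), seasonal_order=(0, 1, 0, 12)).fit(disp=False)
print(fit.summary())

# Step 4: forecast and compare against held-out observations.
print(fit.forecast(steps=12))
```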

GM(1,1) model.

The GM(1,1) model was proposed by Deng J. L. It is a mathematical prediction method for systems in which some information is known and some is unknown or incomplete [ 32 ]. The steps of the GM(1,1) model are as follows [ 33 – 37 ]:

[The step-by-step GM(1,1) equations and the model accuracy-test criteria are given in the original article but are not reproduced here; see Tables 1 and 2 at https://doi.org/10.1371/journal.pone.0262734.t001 and https://doi.org/10.1371/journal.pone.0262734.t002 .]
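Since those equations are not reproduced above, the following is a minimal Python sketch of the standard GM(1,1) procedure (accumulated generating operation, least-squares estimation of the development coefficient a and grey input u, and inverse accumulation). The original analysis was done in Matlab, and the function and variable names here are our own.

```python
# A minimal sketch of the standard GM(1,1) procedure; variable names are our own.
import numpy as np

def gm11_forecast(x0, horizon):
    """Fit GM(1,1) to the positive series x0 and forecast `horizon` extra steps."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                                  # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                       # mean sequence of consecutive AGO values
    B = np.column_stack([-z1, np.ones(n - 1)])
    Y = x0[1:]
    a, u = np.linalg.lstsq(B, Y, rcond=None)[0]         # development coefficient a, grey input u
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a   # time response function
    x0_hat = np.concatenate([[x0[0]], np.diff(x1_hat)]) # inverse AGO gives fitted and forecast values
    return a, u, x0_hat

# Example use: a, u, pred = gm11_forecast(monthly_cases, horizon=12)
```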

LSTM model.

[Fig 1, illustrating the model structure, is not reproduced here; see https://doi.org/10.1371/journal.pone.0262734.g001 .]

Performance measures

[Equations (15)–(17), defining MAE, RMSE, and MAPE, are not reproduced here.]
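The three metrics have standard definitions, which we assume correspond to equations (15)–(17):

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}},\qquad \mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|,$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations compared.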

Data processing and analysis

The data for this study were recorded in Excel 2010; the ARIMA model was fitted with SPSS 23.0, and Matlab 2020b was used to construct the GM(1,1) and LSTM models. The significance level was 0.05.

Trends of TB cases in mainland China

There were 3,021,995 TB cases in mainland China from January 2018 to December 2020. As shown in Fig 2 , the overall number of TB cases in mainland China followed a downward trend. Within a year, monthly TB cases were lowest in January and February and highest in March and April. The monthly data also show roughly seasonal fluctuations.

[Fig 2 (monthly TB cases, January 2018 to December 2020): https://doi.org/10.1371/journal.pone.0262734.g002 ]

ARIMA model

In our study, the SARIMA model was selected because the TB case data show roughly seasonal fluctuations. The basic prerequisite of the ARIMA model is stationary data. As shown in Fig 3 , the original sequence is non-stationary, so trend differencing and seasonal differencing were applied to remove the instability. After a first-order difference and a first-order seasonal difference ( Fig 4 ), the time series was stationary, so the parameters d and D were both set to 1. To estimate the ARIMA parameters q , p , Q , and P , the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots were examined.

[Fig 3 (original TB case series): https://doi.org/10.1371/journal.pone.0262734.g003 ]

[Fig 4 (series after first-order and first-order seasonal differencing): https://doi.org/10.1371/journal.pone.0262734.g004 ]

Both the ACF and PACF plots showed strong seasonality with a cycle of 12, so the parameter s was set to 12 (s = 12). The ACF plot after a first-order difference ( Fig 5 ) showed that the autocorrelation coefficients did not exceed ±2 estimated standard deviations, with the peak at lag 0, so q was initially set to 0 and p to 0 or 1. Similarly, the PACF plot after a first-order seasonal difference ( Fig 6 ) showed partial correlation coefficients within ±2 estimated standard deviations, so P was initially set to 0 and Q to 0 or 1.

[Fig 5 (ACF after first-order differencing): https://doi.org/10.1371/journal.pone.0262734.g005 ]

[Fig 6 (PACF after first-order seasonal differencing): https://doi.org/10.1371/journal.pone.0262734.g006 ]

The candidate ARIMA models, based on the preliminary parameter values and on model estimation and diagnosis, were ARIMA (0,1,0) × (0,1,1) 12 , ARIMA (0,1,0) × (0,1,0) 12 , ARIMA (1,1,0) × (0,1,0) 12 , and ARIMA (1,1,0) × (0,1,1) 12 . The results are shown in Table 3 .

[Table 3 (candidate ARIMA models and diagnostics): https://doi.org/10.1371/journal.pone.0262734.t003 ]

Diagnostic checking of the residuals with the Ljung-Box (Q) test showed that the Q statistics of the four models were all greater than 0.8 and the p values were all greater than 0.05 ( Table 3 ), indicating that all four models were adequate and their residuals were white noise. The optimal model was the one with the lowest BIC value: ARIMA (0,1,0) × (0,1,0) 12 , with a BIC of 17.79, white noise residuals, a Q statistic of 12.62, and a p value of 0.81 ( Table 3 ).

GM(1,1) model

The results show that the model parameters a and u were 0.01 and 99740, respectively, giving the GM(1,1) equation X(k+1) = −10057053.55e^(−0.01k) + 10153178.55; the mean square deviation ratio C was 0.49 and the small error probability P was 0.94.

According to the results in Table 1 , the GM(1,1) model we established reaches Level 2 (Qualified) and can be used for extrapolated prediction. In addition, the GM(1,1) parameter a was 0.01, and −a ≤ 0.3, which indicates that the model is suitable for medium- and long-term prediction.

In this section, we build an LSTM model consisting of three parts: an input layer, a hidden layer, and an output layer. In the LSTM modelling process, the TB case data from January 2018 to December 2020 were divided into two parts: two thirds formed the training set and the remaining third the test set. We set the number of epochs to 60 and the learning rate to 0.01. The loss function was mean squared error and the optimizer was Adam. The look-back window was set to 12 to find the optimal configuration of the current network structure. The results are shown in Table 4 .
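For illustration, a minimal Keras sketch of this setup follows. The authors used Matlab 2020b; the hidden-layer size and the windowing helper are our assumptions, while the epochs, learning rate, loss, optimizer, and look-back follow the text above.

```python
# A minimal Keras sketch of the LSTM setup described above; hidden-layer size (32)
# and the windowing helper are assumptions, the other settings follow the text.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

def make_windows(series, look_back=12):
    """Turn a 1-D series into (samples, look_back, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.random.rand(36)          # placeholder for the scaled monthly TB case counts
X_train, y_train = make_windows(series, look_back=12)

model = Sequential([
    LSTM(32, input_shape=(12, 1)),   # single hidden LSTM layer (size assumed)
    Dense(1),                        # output layer: next month's value
])
model.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=0.01))
model.fit(X_train, y_train, epochs=60, verbose=0)
```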

[Table 4 (LSTM model results): https://doi.org/10.1371/journal.pone.0262734.t004 ]

Model comparison

Only 23 values were compared across the ARIMA, GM(1,1), and LSTM models, because the trend and seasonal differencing in the ARIMA model caused the first 13 values of the validation set to be lost.

In this section, MAE, RMSE, and MAPE were used to evaluate fitting and prediction accuracy. The actual and predicted values were substituted into formulas (15), (16), and (17), and the resulting MAE, RMSE, and MAPE values are shown in Table 5 .

[Table 5 (MAE, RMSE, and MAPE of the three models): https://doi.org/10.1371/journal.pone.0262734.t005 ]

The MAE, RMSE, and MAPE values of the LSTM model were smaller than those of the GM(1,1) and ARIMA (0,1,0) × (0,1,0) 12 models. In addition, Fig 7 shows that the values predicted by the LSTM model track the actual trend more closely. Therefore, the LSTM model was the optimal model and the most suitable for predicting the future trend of TB cases in mainland China.

[Fig 7 (predicted versus actual TB cases for the three models): https://doi.org/10.1371/journal.pone.0262734.g007 ]

The LSTM model was then applied to predict the trend of TB cases in mainland China from January to December 2021. The prediction results are shown in Table 6 .

[Table 6 (predicted TB cases, January to December 2021): https://doi.org/10.1371/journal.pone.0262734.t006 ]

Although TB can be treated effectively, its morbidity and mortality have maintained a relative growth trend [ 4 , 44 ]. TB remains a major global public health problem that seriously endangers human health and social and economic development. In China, with its large population and rapid social and economic development, the prevention and control of TB remains an important and challenging public health issue, and it is still regarded as an important topic in the field of public health in mainland China.

Scientific prediction and analysis of TB morbidity and mortality can inform public health planning and help set appropriate policies and effective interventions for this infectious disease [ 45 ]. In disease surveillance, predicting the morbidity and mortality of infectious diseases is one of the highest-priority tasks in public health in China [ 32 ]. Against this background, the purpose of this study was to identify the best prediction model for the epidemic trend of TB cases and to inform its prevention and control in mainland China.

Studies have shown that each type of infectious disease prediction model has distinct advantages and disadvantages [ 46 ]. It is therefore critical to choose the prediction model best suited to the data characteristics, the sample requirements, and the research objectives. ARIMA is applicable to time series with seasonality and periodicity [ 47 ]. The GM(1,1) model has no special requirements for the data and can be applied to small samples and uncertain time series [ 17 ]. The LSTM model is better suited to time series with missing values or lags of unknown duration [ 42 ]. Constructing prediction models according to the data characteristics and sample requirements is also a precondition for accurate results. In our study, the TB case data, with 36 monthly samples from January 2018 to December 2020 in mainland China, show seasonality and periodicity and fully meet the requirements of the ARIMA, GM(1,1), and LSTM models. The prediction techniques we selected are therefore appropriate.

With its structured modelling basis and acceptable forecasting performance, the ARIMA model is the most commonly used technique in time series prediction [ 4 ]. The ARIMA model can also take various influencing factors of infection, as well as their complicated interactions, into account in the modelling process [ 47 ]. In this study, the seasonal autoregressive integrated moving average (SARIMA) model was applied to predict TB cases because the data showed evidence of seasonality.

The GM(1,1) model was proposed by Deng J. L. in 1982 and is widely used in population studies [ 48 ], economics [ 49 ], the environment [ 50 ], the power industry [ 51 ], medicine [ 11 ], and other fields. In grey system theory, a system is described by a colour that represents how much information about it is known [ 11 ]: a system whose information is entirely unknown is a black system, one whose information is fully known is a white system, and one whose information is partly known, unknown, or incomplete is a grey system. In practice, every system can be viewed as a grey system, because its information is partly known and partly unknown or incomplete. As few as four data points are enough to fit the model. In this paper, the prediction of TB cases can therefore be treated as a grey system, and the 36 monthly observations from January 2018 to December 2020 can be regarded as the known information from which the underlying pattern is extracted to predict the future state of TB cases.

The LSTM model has been widely used to predict infectious disease incidence, including HIV [ 15 ], hepatitis E [ 52 ], dengue fever [ 53 ], hand, foot and mouth disease (HFMD) [ 54 ], and COVID-19 [ 39 ]. It was designed to resolve the vanishing gradient problem [ 52 ]. The LSTM model has strong nonlinear mapping capability and is suited to complex problems with long-term dependencies [ 55 ]. It can learn satisfactory network models from a given sample of data and handles sequences with very long time lags of unknown length [ 15 ]. The LSTM model therefore has a strong capability to address various prediction problems, especially infectious disease prediction.

The model comparison showed that the MAE, RMSE, and MAPE values of the LSTM model were smaller than those of the GM(1,1) and ARIMA models, so the fitting and prediction performance of the LSTM model was better than that of the ARIMA and GM(1,1) models. The reasons are as follows. First, the GM(1,1) and ARIMA models cannot capture nonlinear relationships in time series. Second, unlike other machine-learning based forecasting models, the LSTM model overcomes the vanishing gradient problem during training. Third, the LSTM model is more tolerant of the data and less vulnerable to model misspecification than other time series prediction models, and, compared with the GM(1,1) and ARIMA models, it is a more effective deep learning model for continuous data. Thus, the LSTM model performs well in infectious disease prediction.

The LSTM model was therefore applied to predict TB cases in mainland China from January to December 2021. The predicted values of TB cases fluctuate, and the overall trend is downward. In practice, however, TB morbidity is the combined effect of many factors. The COVID-19 pandemic could affect TB control programmes worldwide by impairing TB diagnosis and treatment [ 56 , 57 ], and China is no exception. Moreover, although the prevention and control of COVID-19 has been relatively effective in China, the pandemic still poses great challenges for the medical service system and medical care capacity. The possibility of a future outbreak of TB cases in mainland China therefore cannot be ignored.

To the best of our knowledge, this is the first study to construct ARIMA, GM(1,1), and LSTM models for the prediction of TB cases in mainland China. The results showed that the LSTM model was more accurate than the ARIMA and GM(1,1) models and can give credible predictions for TB control and prevention.

However, this study has several limitations. First, although the LSTM model has strong nonlinear mapping capability, social, cultural, economic, and other factors could not be taken into account in the modelling process [ 47 ]. Second, the prediction results of the LSTM model can act as a tool for TB control and prevention, but they should not be used as the sole basis for policy making because of unexpected emergencies such as the COVID-19 pandemic. Since COVID-19 emerged at the end of 2019 in China, under-reporting of TB cases may have occurred in 2020, which introduces some inaccuracy into the predictions. In further work, we will therefore incorporate influencing factors of TB incidence into the prediction model and continually update the TB case data to achieve a more suitable and accurate model for TB control and prediction.

In this study, we collected TB case data from January 2018 to December 2020 in mainland China. Based on the data characteristics and sample requirements, the ARIMA, GM(1,1), and LSTM models were constructed and compared. The fitting and prediction performance of the LSTM model was better than that of the ARIMA and GM(1,1) models. The results of this study can inform policy making by the government in mainland China.

Supporting information

S1 File. The monthly TB cases data from January 2018 to December 2020.

https://doi.org/10.1371/journal.pone.0262734.s001

https://doi.org/10.1371/journal.pone.0262734.s002

  • 2. WHO. Global tuberculosis report; 2020. [cited 30.07.2021] http://www.who.int/tb/publications/global_report/en/ .
  • 48. Lu C, Hao Y, Wang X. World population projections using metabolic GM (1,1) model. IEEE International Conference on Grey Systems and Intelligent Services. 2007. pp. 453–457.

ARIMA Model – Complete Guide to Time Series Forecasting in Python

  • August 22, 2021
  • Selva Prabhakaran

Using an ARIMA model, you can forecast a time series using the series' past values. In this post, we build an optimal ARIMA model from scratch and extend it to Seasonal ARIMA (SARIMA) and SARIMAX models. You will also see how to build auto ARIMA models in Python.


  • Introduction to Time Series Forecasting
  • Introduction to ARIMA Models
  • What does the p, d and q in ARIMA model mean?
  • What are AR and MA models
  • How to find the order of differencing (d) in ARIMA model
  • How to find the order of the AR term (p)
  • How to find the order of the MA term (q)
  • How to handle if a time series is slightly under or over differenced
  • How to build the ARIMA Model
  • How to find the optimal ARIMA model manually using Out-of-Time Cross validation
  • Accuracy Metrics for Time Series Forecast
  • How to do Auto Arima Forecast in Python
  • How to interpret the residual plots in ARIMA model
  • How to automatically build SARIMA model in python
  • How to build SARIMAX Model with exogenous variable
  • Practice Exercises

1. Introduction to Time Series Forecasting

A time series is a sequence where a metric is recorded over regular time intervals.

Depending on the frequency, a time series can be yearly (ex: annual budget), quarterly (ex: expenses), monthly (ex: air traffic), weekly (ex: sales qty), daily (ex: weather), hourly (ex: stock prices), minute-wise (ex: inbound calls in a call center) or even second-wise (ex: web traffic).

We have already seen the steps involved in a previous post on Time Series Analysis . If you haven’t read it, I highly encourage you to do so.

Forecasting is the next step where you want to predict the future values the series is going to take.

But why forecast?

Because, forecasting a time series (like demand and sales) is often of tremendous commercial value.

In most manufacturing companies, it drives the fundamental business planning, procurement and production activities. Any errors in the forecasts will ripple down throughout the supply chain, or any business context for that matter. So getting the forecasts accurate saves on costs and is critical to success.

Not just in manufacturing, the techniques and concepts behind time series forecasting are applicable in any business.

Now forecasting a time series can be broadly divided into two types.

If you use only the previous values of the time series to predict its future values, it is called Univariate Time Series Forecasting .

And if you use predictors other than the series (a.k.a exogenous variables) to forecast it is called Multi Variate Time Series Forecasting .

This post focuses on a particular type of forecasting method called ARIMA modeling.

ARIMA, short for ‘AutoRegressive Integrated Moving Average’, is a forecasting algorithm based on the idea that the information in the past values of the time series can alone be used to predict the future values.


2. Introduction to ARIMA Models

So what exactly is an ARIMA model?

ARIMA, short for ‘Auto Regressive Integrated Moving Average’ is actually a class of models that ‘explains’ a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that equation can be used to forecast future values.

Any ‘non-seasonal’ time series that exhibits patterns and is not a random white noise can be modeled with ARIMA models.

An ARIMA model is characterized by 3 terms: p, d, q

p is the order of the AR term

q is the order of the MA term

d is the number of differencing required to make the time series stationary

If a time series has seasonal patterns, then you need to add seasonal terms and it becomes SARIMA, short for ‘Seasonal ARIMA’. More on that once we finish ARIMA.

So, what does the ‘order of AR term’ even mean? Before we go there, let’s first look at the ‘d’ term.

3. What does the p, d and q in ARIMA model mean?

The first step to build an ARIMA model is to make the time series stationary.

Because, term ‘Auto Regressive’ in ARIMA means it is a linear regression model that uses its own lags as predictors. Linear regression models, as you know, work best when the predictors are not correlated and are independent of each other.

So how to make a series stationary?

The most common approach is to difference it. That is, subtract the previous value from the current value. Sometimes, depending on the complexity of the series, more than one differencing may be needed.

The value of d, therefore, is the minimum number of differencing needed to make the series stationary. And if the time series is already stationary, then d = 0.

Next, what are the ‘p’ and ‘q’ terms?

‘p’ is the order of the ‘Auto Regressive’ (AR) term. It refers to the number of lags of Y to be used as predictors. And ‘q’ is the order of the ‘Moving Average’ (MA) term. It refers to the number of lagged forecast errors that should go into the ARIMA Model.

Want to get hands-on experience on Time Series Forecasting Project? Join MLPlus university and try the exhaustive Restaurant Visitor Forecasting Project Course. Get proficient in implementing multiple forecasting strategies using ARIMA and other time series algorithms to solve a real world forecasting problem.

4. What are AR and MA models?

So what are AR and MA models? what is the actual mathematical formula for the AR and MA models?

A pure Auto Regressive (AR only) model is one where Yt depends only on its own lags. That is, Yt is a function of the ‘lags of Yt’. 

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \epsilon_t$$

where, Y{t-1} is the lag1 of the series, beta1 is the coefficient of lag1 that the model estimates and `alpha` is the intercept term, also estimated by the model.

Likewise a pure Moving Average (MA only) model is one where Yt depends only on the lagged forecast errors.

$$Y_t = \alpha + \epsilon_t + \phi_1 \epsilon_{t-1} + \phi_2 \epsilon_{t-2} + \dots + \phi_q \epsilon_{t-q}$$

where the error terms are the errors of the autoregressive models of the respective lags. The errors Et and E(t-1) are the errors from the following equations :

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_0 Y_0 + \epsilon_t$$

$$Y_{t-1} = \beta_1 Y_{t-2} + \beta_2 Y_{t-3} + \dots + \beta_0 Y_0 + \epsilon_{t-1}$$

That was AR and MA models respectively.

So what does the equation of an ARIMA model look like?

An ARIMA model is one where the time series was differenced at least once to make it stationary and you combine the AR and the MA terms. So the equation becomes:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \epsilon_t + \phi_1 \epsilon_{t-1} + \dots + \phi_q \epsilon_{t-q}$$

ARIMA model in words:

Predicted Yt = Constant + Linear combination Lags of Y (upto p lags) + Linear Combination of Lagged forecast errors (upto q lags)

The objective, therefore, is to identify the values of p, d and q. But how?

Let’s start with finding the ‘d’.

5. How to find the order of differencing (d) in ARIMA model

The purpose of differencing is to make the time series stationary.

But you need to be careful to not over-difference the series. Because, an over differenced series may still be stationary, which in turn will affect the model parameters.

So how to determine the right order of differencing?

The right order of differencing is the minimum differencing required to get a near-stationary series which roams around a defined mean and the ACF plot reaches to zero fairly quick.

If the autocorrelations are positive for a large number of lags (10 or more), then the series needs further differencing. On the other hand, if the lag 1 autocorrelation itself is too negative, then the series is probably over-differenced.

In the event, you can’t really decide between two orders of differencing, then go with the order that gives the least standard deviation in the differenced series.

Let’s see how to do it with an example.

First, I am going to check if the series is stationary using the Augmented Dickey Fuller test ( adfuller() ), from the statsmodels package.

Because, you need differencing only if the series is non-stationary. Else, no differencing is needed, that is, d=0.

The null hypothesis of the ADF test is that the time series is non-stationary. So, if the p-value of the test is less than the significance level (0.05) then you reject the null hypothesis and infer that the time series is indeed stationary.

So, in our case, if P Value > 0.05 we go ahead with finding the order of differencing.
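A minimal sketch of this check with statsmodels follows; the file path is a placeholder for the WWW usage series used later in this post.

```python
# ADF stationarity check (a sketch; the CSV path is a placeholder).
import pandas as pd
from statsmodels.tsa.stattools import adfuller

df = pd.read_csv("wwwusage.csv", names=["value"], header=0)
result = adfuller(df["value"].dropna())
print(f"ADF Statistic: {result[0]:.3f}")
print(f"p-value: {result[1]:.3f}")   # > 0.05 here, so we go on to difference the series
```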

Since P-value is greater than the significance level, let’s difference the series and see how the autocorrelation plot looks like.
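A sketch of those plots, continuing from the snippet above:

```python
# Compare the original series with its first and second differences and their ACFs.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(3, 2, figsize=(10, 7))
axes[0, 0].plot(df["value"]); axes[0, 0].set_title("Original Series")
plot_acf(df["value"], ax=axes[0, 1])

axes[1, 0].plot(df["value"].diff()); axes[1, 0].set_title("1st Order Differencing")
plot_acf(df["value"].diff().dropna(), ax=axes[1, 1])

axes[2, 0].plot(df["value"].diff().diff()); axes[2, 0].set_title("2nd Order Differencing")
plot_acf(df["value"].diff().diff().dropna(), ax=axes[2, 1])
plt.tight_layout(); plt.show()
```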

Order of Differencing

For the above series, the time series reaches stationarity with two orders of differencing. But on looking at the autocorrelation plot for the 2nd differencing the lag goes into the far negative zone fairly quick, which indicates, the series might have been over differenced.

So, I am going to tentatively fix the order of differencing as 1 even though the series is not perfectly stationary (weak stationarity).

6. How to find the order of the AR term (p)

The next step is to identify if the model needs any AR terms. You can find out the required number of AR terms by inspecting the Partial Autocorrelation (PACF) plot.

But what is PACF?

Partial autocorrelation can be imagined as the correlation between the series and its lag, after excluding the contributions from the intermediate lags. So, PACF sort of conveys the pure correlation between a lag and the series. That way, you will know if that lag is needed in the AR term or not.

So what is the formula for PACF mathematically?

Partial autocorrelation of lag (k) of a series is the coefficient of that lag in the autoregression equation of Y.

Autoregression equation:

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \alpha_3 Y_{t-3}$$

That is, suppose, if Y_t is the current series and Y_t-1 is the lag 1 of Y , then the partial autocorrelation of lag 3 ( Y_t-3 ) is the coefficient $\alpha_3$ of Y_t-3 in the above equation.

Good. Now, how to find the number of AR terms?

Any autocorrelation in a stationarized series can be rectified by adding enough AR terms. So, we initially take the order of the AR term to be equal to the number of lags that cross the significance limit in the PACF plot.
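A sketch of the PACF inspection on the first-differenced series, continuing from the earlier snippets:

```python
# PACF of the first-differenced series, used to pick a tentative AR order p.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(df["value"].diff().dropna(), lags=20, title="Partial Autocorrelation (1st differenced)")
plt.show()
```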

Order of AR Term

You can observe that the PACF lag 1 is quite significant since it is well above the significance line. Lag 2 turns out to be significant as well, slightly managing to cross the significance limit (blue region). But I am going to be conservative and tentatively fix p as 1.

7. How to find the order of the MA term (q)

Just like how we looked at the PACF plot for the number of AR terms, you can look at the ACF plot for the number of MA terms. An MA term is technically, the error of the lagged forecast.

The ACF tells how many MA terms are required to remove any autocorrelation in the stationarized series.

Let’s see the autocorrelation plot of the differenced series.
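A sketch of the corresponding ACF inspection:

```python
# ACF of the first-differenced series, used to pick a tentative MA order q.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

plot_acf(df["value"].diff().dropna(), lags=20, title="Autocorrelation (1st differenced)")
plt.show()
```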

Order of MA Term

Couple of lags are well above the significance line. So, let’s tentatively fix q as 2. When in doubt, go with the simpler model that sufficiently explains the Y.

8. How to handle if a time series is slightly under or over differenced

It may so happen that your series is slightly under differenced, that differencing it one more time makes it slightly over-differenced.

How to handle this case?

If your series is slightly under differenced, adding one or more additional AR terms usually makes it up. Likewise, if it is slightly over-differenced, try adding an additional MA term.

9. How to build the ARIMA Model

Now that you’ve determined the values of p, d and q, you have everything needed to fit the ARIMA model. Let’s use the ARIMA() implementation in statsmodels package.  (** You can also check out the free video lesson on forecasting restaurant visitors with ARIMA and then check how to test and improve the model )
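A sketch of the fit with the current statsmodels interface, using the tentative order (1,1,2) chosen above:

```python
# Fit an ARIMA(1,1,2) and inspect the coefficient table.
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(df["value"], order=(1, 1, 2))
fitted = model.fit()
print(fitted.summary())
```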

The model summary reveals a lot of information. The table in the middle is the coefficients table where the values under ‘coef’ are the weights of the respective terms.

Notice here the coefficient of the MA2 term is close to zero and the P-Value in ‘P>|z|’ column is highly insignificant. It should ideally be less than 0.05 for the respective X to be significant.

So, let’s rebuild the model without the MA2 term.

The model AIC has reduced, which is good. The P Values of the AR1 and MA1 terms have improved and are highly significant (<< 0.05).

Let’s plot the residuals to ensure there are no patterns (that is, look for constant mean and variance).

Residuals Density

The residual errors seem fine with near zero mean and uniform variance. Let’s plot the actuals against the fitted values using plot_predict() .

Actual vs Fitted

When you set dynamic=False the in-sample lagged values are used for prediction.

That is, the model gets trained up until the previous value to make the next prediction. This can make the fitted forecast and actuals look artificially good.
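The residual plots and the in-sample comparison referenced above can be produced with a sketch like the following; recent statsmodels versions expose plot_predict as a standalone function in statsmodels.graphics.tsaplots.

```python
# Residual diagnostics and in-sample actuals-vs-fitted plot (a sketch).
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_predict

residuals = fitted.resid
fig, ax = plt.subplots(1, 2, figsize=(10, 3))
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind="kde", title="Density", ax=ax[1])

plot_predict(fitted, dynamic=False)   # in-sample one-step-ahead predictions vs actuals
plt.show()
```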

So, we seem to have a decent ARIMA model. But is that the best?

Can’t say that at this point because we haven’t actually forecasted into the future and compared the forecast with the actual performance.

So, the real validation you need now is the Out-of-Time cross-validation.

10. How to find the optimal ARIMA model manually using Out-of-Time Cross validation

In Out-of-Time cross-validation, you take few steps back in time and forecast into the future to as many steps you took back. Then you compare the forecast against the actuals.

To do out-of-time cross-validation, you need to create the training and testing dataset by splitting the time series into 2 contiguous parts in approximately 75:25 ratio or a reasonable proportion based on time frequency of series.

Why am I not sampling the training data randomly you ask?

That’s because the order sequence of the time series should be intact in order to use it for forecasting.

You can now build the ARIMA model on training dataset, forecast and plot it.
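A sketch of this out-of-time validation, holding out roughly the last quarter of the series:

```python
# Hold out the last quarter, fit on the rest, forecast the held-out span,
# and plot with a 95% confidence band (a sketch).
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

series = df["value"]
split = int(len(series) * 0.75)                 # roughly 75:25, as suggested above
train, test = series[:split], series[split:]

fitted_train = ARIMA(train, order=(1, 1, 1)).fit()
forecast = fitted_train.get_forecast(steps=len(test))
mean_fc = forecast.predicted_mean
conf = forecast.conf_int(alpha=0.05)

plt.plot(train.index, train, label="training")
plt.plot(test.index, test, label="actual")
plt.plot(test.index, mean_fc, label="forecast")
plt.fill_between(test.index, conf.iloc[:, 0], conf.iloc[:, 1], alpha=0.15)
plt.legend(); plt.title("Forecast vs Actuals"); plt.show()
```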

Forecast vs Actuals

From the chart, the ARIMA(1,1,1) model seems to give a directionally correct forecast. And the actual observed values lie within the 95% confidence band. That seems fine.

But each of the predicted forecasts is consistently below the actuals. That means, by adding a small constant to our forecast, the accuracy will certainly improve. So, there is definitely scope for improvement.

So, what I am going to do is to increase the order of differencing to two, that is set d=2 and iteratively increase p to up to 5 and then q up to 5 to see which model gives least AIC and also look for a chart that gives closer actuals and forecasts.

While doing this, I keep an eye on the P values of the AR and MA terms in the model summary. They should be as close to zero, ideally, less than 0.05.

Revised Forecast vs Actuals

The AIC has reduced to 440 from 515. Good. The P-values of the X terms are less than 0.05, which is great.

So overall it’s much better.

Ideally, you should go back multiple points in time, like, go back 1, 2, 3 and 4 quarters and see how your forecasts are performing at various points in the year.

Here’s a great practice exercise: Try to go back 27, 30, 33 and 36 data points and see how the forecasts perform. The forecast performance can be judged using the various accuracy metrics discussed next.

11. Accuracy Metrics for Time Series Forecast

The commonly used accuracy metrics to judge forecasts are:

  • Mean Absolute Percentage Error (MAPE)
  • Mean Error (ME)
  • Mean Absolute Error (MAE)
  • Mean Percentage Error (MPE)
  • Root Mean Squared Error (RMSE)
  • Lag 1 Autocorrelation of Error (ACF1)
  • Correlation between the Actual and the Forecast (corr)
  • Min-Max Error (minmax)

Typically, if you are comparing forecasts of two different series, the MAPE, Correlation and Min-Max Error can be used.

Why not use the other metrics?

Because only the above three are percentage errors that vary between 0 and 1. That way, you can judge how good is the forecast irrespective of the scale of the series.

The other error metrics are quantities. That implies, an RMSE of 100 for a series whose mean is in 1000’s is better than an RMSE of 5 for series in 10’s. So, you can’t really use them to compare the forecasts of two different scaled time series.
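A sketch of a helper that computes these metrics for the out-of-time forecast above (mean_fc and test come from the previous snippet):

```python
# Forecast accuracy metrics for the out-of-time forecast (a sketch).
import numpy as np
import pandas as pd

def forecast_accuracy(forecast, actual):
    forecast, actual = np.asarray(forecast), np.asarray(actual)
    mape = np.mean(np.abs(forecast - actual) / np.abs(actual))    # MAPE
    me = np.mean(forecast - actual)                                # ME
    mae = np.mean(np.abs(forecast - actual))                       # MAE
    mpe = np.mean((forecast - actual) / actual)                    # MPE
    rmse = np.sqrt(np.mean((forecast - actual) ** 2))              # RMSE
    corr = np.corrcoef(forecast, actual)[0, 1]                     # correlation
    minmax = 1 - np.mean(np.minimum(forecast, actual) / np.maximum(forecast, actual))
    acf1 = pd.Series(forecast - actual).autocorr(lag=1)            # lag-1 ACF of errors
    return {"mape": mape, "me": me, "mae": mae, "mpe": mpe,
            "rmse": rmse, "acf1": acf1, "corr": corr, "minmax": minmax}

print(forecast_accuracy(mean_fc, test))
```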

Around 2.2% MAPE implies the model is about 97.8% accurate in predicting the next 15 observations.

Now you know how to build an ARIMA model manually.

But in industrial situations, you will be given a lot of time series to be forecasted and the forecasting exercise be repeated regularly.

So we need a way to automate the best model selection process.

12. How to do Auto Arima Forecast in Python

Like R’s popular auto.arima() function, the pmdarima package provides auto_arima() with similar functionality.

auto_arima() uses a stepwise approach to search multiple combinations of p,d,q parameters and chooses the best model that has the least AIC.
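A minimal sketch:

```python
# Stepwise auto_arima search with pmdarima (a sketch).
import pmdarima as pm

stepwise_fit = pm.auto_arima(df["value"], start_p=1, start_q=1, max_p=3, max_q=3,
                             d=None,            # let auto_arima choose d via a unit-root test
                             seasonal=False, trace=True,
                             error_action="ignore", suppress_warnings=True, stepwise=True)
print(stepwise_fit.summary())
```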

13. How to interpret the residual plots in ARIMA model

Let’s review the residual plots using stepwise_fit.
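A one-line sketch of the diagnostics figure discussed below:

```python
# Residual diagnostics for the auto_arima fit (a sketch).
import matplotlib.pyplot as plt

stepwise_fit.plot_diagnostics(figsize=(7, 5))
plt.show()
```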

Residuals Chart

So how to interpret the plot diagnostics?

Top left: The residual errors seem to fluctuate around a mean of zero and have a uniform variance.

Top Right: The density plot suggest normal distribution with mean zero.

Bottom left: All the dots should fall perfectly in line with the red line. Any significant deviations would imply the distribution is skewed.

Bottom Right: The Correlogram, aka, ACF plot shows the residual errors are not autocorrelated. Any autocorrelation would imply that there is some pattern in the residual errors which is not explained by the model. So you would need to add more X’s (predictors) to the model.

Overall, it seems to be a good fit. Let’s forecast.

Final Forecast of WWW Usage

14. How to automatically build SARIMA model in python

The problem with plain ARIMA model is it does not support seasonality.

If your time series has defined seasonality, then, go for SARIMA which uses seasonal differencing.

Seasonal differencing is similar to regular differencing, but, instead of subtracting consecutive terms, you subtract the value from previous season.

So, the model will be represented as SARIMA(p,d,q)x(P,D,Q)m, where P, D and Q are the SAR, seasonal differencing and SMA terms respectively, and m is the frequency (seasonal period) of the time series.

If your model has well defined seasonal patterns, then enforce D=1 for a given frequency m.

Here’s some practical advice on building SARIMA model:

As a general rule, set the model parameters such that D never exceeds one. And the total differencing ‘d + D’ never exceeds 2. Try to keep only either SAR or SMA terms if your model has seasonal components.

Let’s build an SARIMA model on 'a10' – the drug sales dataset.

Seasonal Differencing

As you can clearly see, the seasonal spikes are intact after applying the usual differencing (lag 1), whereas they are rectified after seasonal differencing.

Let’s build the SARIMA model using pmdarima ‘s auto_arima() . To do that, you need to set seasonal=True , set the frequency m=12 for month wise series and enforce D=1 .
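A sketch of that call; the CSV path and column names for the a10 drug sales data are placeholders:

```python
# Seasonal auto_arima search on the drug-sales series (a sketch; paths assumed).
import pandas as pd
import pmdarima as pm

a10 = pd.read_csv("a10.csv", parse_dates=["date"], index_col="date")["value"]
smodel = pm.auto_arima(a10, start_p=1, start_q=1, max_p=3, max_q=3,
                       seasonal=True, m=12, D=1,
                       trace=True, error_action="ignore",
                       suppress_warnings=True, stepwise=True)
print(smodel.summary())
```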


The model has estimated the AIC and the P values of the coefficients look significant. Let’s look at the residual diagnostics plot.

The best model SARIMAX(3, 0, 0)x(0, 1, 1, 12) has an AIC of 528.6 and the P Values are significant.

Let’s forecast for the next 24 months.

SARIMA - Final Forecasts

There you have a nice forecast that captures the expected seasonal demand pattern.

15. How to build SARIMAX Model with exogenous variable

The SARIMA model we built is good. I would stop here typically.

But for the sake of completeness, let’s try and force an external predictor, also called, ‘exogenous variable’ into the model. This model is called the SARIMAX model.

The only requirement to use an exogenous variable is you need to know the value of the variable during the forecast period as well.

For the sake of demonstration, I am going to use the seasonal index from the classical seasonal decomposition on the latest 36 months of data.

Why the seasonal index? Isn’t SARIMA already modeling the seasonality, you ask?

You are correct.

But also, I want to see how the model looks if we force the recent seasonality pattern into the training and forecast.

Secondly, this is a good variable for demo purpose. So you can use this as a template and plug in any of your variables into the code. The seasonal index is a good exogenous variable because it repeats every frequency cycle, 12 months in this case.

So, you will always know what values the seasonal index will hold for the future forecasts.

Let’s compute the seasonal index so that it can be forced as a (exogenous) predictor to the SARIMAX model.

The exogenous variable (seasonal index) is ready. Let’s build the SARIMAX model.
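A sketch of this step; the seasonal-index construction via classical decomposition and the pmdarima keyword X (older versions call it exogenous) are implementation assumptions:

```python
# SARIMAX-style fit with the seasonal index as an exogenous regressor (a sketch).
import pandas as pd
import pmdarima as pm
from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(a10.iloc[-36:], model="multiplicative", extrapolate_trend="freq")
month_factor = decomp.seasonal.groupby(decomp.seasonal.index.month).mean()
exog = pd.DataFrame({"seasonal_index": a10.index.month.map(month_factor)}, index=a10.index)

sxmodel = pm.auto_arima(a10, X=exog, seasonal=True, m=12, D=1,
                        trace=True, error_action="ignore",
                        suppress_warnings=True, stepwise=True)
print(sxmodel.summary())
# Forecasting ahead also requires the future values of the exogenous seasonal index.
```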


So, we have the model with the exogenous term. But the coefficient is very small for x1 , so the contribution from that variable will be negligible. Let’s forecast it anyway.

We have effectively forced the latest seasonal effect of the latest 3 years into the model instead of the entire history.

Alright let’s forecast into the next 24 months. For this, you need the value of the seasonal index for the next 24 months.

SARIMAX Forecast

16. Practice Exercises

In the AirPassengers dataset, go back 12 months in time and build the SARIMA forecast for the next 12 months.

  • Is the series stationary? If not what sort of differencing is required?
  • What is the order of your best model?
  • What is the AIC of your model?
  • What is the MAPE achieved in OOT cross-validation?
  • What is the order of the best model predicted by auto_arima() method?

17. Conclusion

Congrats if you reached this point. Give yourself a BIG hug if you were able to solve the practice exercises.

I really hope you found this useful.

We have covered a lot of concepts starting from the very basics of forecasting, AR, MA, ARIMA, SARIMA and finally the SARIMAX model. If you have any questions please write in the comments section. Meanwhile, I will work on the next article.

Happy Learning!

Hands-on implementation on real project: Learn how to implement ARIMA using multiple strategies and multiple other time series models in my Restaurant Visitor Forecasting Course


Combined BiLSTM and ARIMA models in middle- and long-term polar motion prediction

  • Published: 08 April 2024


  • Kehao Yu 1 ,
  • Haowei Shi 1 ,
  • Mengqi Sun 1 ,
  • Lihua Li 1 , 2 ,
  • Shuhui Li 1 ,
  • Honglei Yang 1 &
  • Erhu Wei 3  


As one of the main components of the Earth orientation parameters, short-term prediction of the geodetic polar motion series is crucial in the fields of deep-space exploration, high-precision positioning, and timing services, which require high real-time performance. Its middle- and long-term prediction is equally important in climate forecasting and geodynamics research. In this study, we propose the combined BiLSTM+ARIMA model, which is based on bidirectional long short-term memory (BiLSTM) and the autoregressive integrated moving average (ARIMA). First, ensemble empirical mode decomposition (EEMD) is performed as a filter to decompose the polar motion time series into low- and high-frequency signals. The EOP 14 C04 time series provided by the International Earth Rotation and Reference Systems Service and decomposed by EEMD includes low-frequency signals such as the long-term trend, decadal oscillation, Chandler wobble, and prograde annual wobble, along with shorter-period high-frequency signals. Second, the low- and high-frequency signals are predicted using BiLSTM and ARIMA models, respectively. Finally, the low- and high-frequency forecast components are reconstructed to obtain the geodetic polar motion predictions. In middle- and long-term polar motion prediction, the results show that the proposed model can improve the prediction accuracy by up to 42% and 17%, respectively. This demonstrates that the BiLSTM+ARIMA model can effectively improve the accuracy of polar motion prediction.
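The decompose-predict-reconstruct idea described in the abstract can be sketched roughly as follows. The library choices (PyEMD, Keras, statsmodels), the split between low- and high-frequency components, the network size, and the ARIMA order are our assumptions for illustration, not the authors' implementation.

```python
# A heavily simplified sketch: EEMD -> BiLSTM for low-frequency IMFs,
# ARIMA for high-frequency IMFs -> sum of component forecasts.
import numpy as np
from PyEMD import EEMD
from statsmodels.tsa.arima.model import ARIMA
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

def windows(x, lag):
    X = np.array([x[i:i + lag] for i in range(len(x) - lag)])
    return X[..., None], x[lag:]

def forecast_component(comp, horizon, lag=30, low_frequency=True):
    if low_frequency:                                   # slow components -> BiLSTM
        X, y = windows(comp, lag)
        net = Sequential([Bidirectional(LSTM(32), input_shape=(lag, 1)), Dense(1)])
        net.compile(loss="mse", optimizer="adam")
        net.fit(X, y, epochs=50, verbose=0)
        history, preds = list(comp), []
        for _ in range(horizon):                        # recursive multi-step forecast
            nxt = net.predict(np.array(history[-lag:])[None, :, None], verbose=0).item()
            preds.append(nxt); history.append(nxt)
        return np.array(preds)
    return ARIMA(comp, order=(2, 0, 1)).fit().forecast(horizon)   # fast components -> ARIMA

# pm_x = polar-motion coordinate series, horizon = prediction length (assumed given):
# imfs = EEMD().eemd(pm_x)
# forecasts = [forecast_component(imf, horizon, low_frequency=(i >= len(imfs) // 2))
#              for i, imf in enumerate(imfs)]
# prediction = np.sum(forecasts, axis=0)   # reconstruct the polar motion forecast
```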



Acknowledgments

We thank the scholars and reviewers of this study for their expert recommendations and insightful comments. We are also grateful to the IERS for providing the polar motion time series data. This research was supported and funded by the Beijing Key Laboratory of Urban Spatial Information Engineering (Grant No. 20230111), the National Natural Science Foundation of China (Grant Nos. 41574011, 42174026 and 42374015), and the 2023 Graduate Innovation Fund Project of China University of Geosciences, Beijing (Grant No. ZD2023YC055).

Author information

Authors and Affiliations

School of Land Science and Technology, China University of Geosciences Beijing, 29 Xueyuan Road, Beijing, 100083, China

Kehao Yu, Haowei Shi, Mengqi Sun, Lihua Li, Shuhui Li & Honglei Yang

Beijing Key Laboratory of Urban Spatial Information Engineering, 60 Nanlishi Road, Xicheng District, Beijing, 100045, China

School of Geodesy and Geomatics, Wuhan University, Wuhan, 430079, China


Corresponding author

Correspondence to Lihua Li .


About this article

Yu, K., Shi, H., Sun, M. et al. Combined BiLSTM and ARIMA models in middle- and long-term polar motion prediction. Stud Geophys Geod (2024). https://doi.org/10.1007/s11200-023-0134-y


Received : 09 June 2023

Revised : 27 September 2023

Accepted : 30 January 2024

Published : 08 April 2024

DOI : https://doi.org/10.1007/s11200-023-0134-y


Keywords: ensemble empirical mode decomposition; combined BiLSTM+ARIMA model; fast Fourier transform; polar motion prediction; Earth orientation parameters
  • Open access
  • Published: 27 February 2023

VAR, ARIMAX and ARIMA models for nowcasting unemployment rate in Ghana using Google trends

  • Williams Kwasi Adu   ORCID: orcid.org/0000-0001-6915-0347 1 ,
  • Peter Appiahene 1 &
  • Stephen Afrifa 2

Journal of Electrical Systems and Information Technology, volume 10, Article number: 12 (2023)


Analysis of the high volume of data generated daily by web search engines allows scholars to examine the relation between users' search preferences and forthcoming events, an approach applicable in a variety of economic contexts. The purpose of this study is to determine whether the unemployment rate can be anticipated by examining online search behavior. The method uses a cross-correlation technique to combine data from Google Trends with the World Bank's unemployment rate. The Autoregressive Integrated Moving Average (ARIMA), Autoregressive Integrated Moving Average with eXogenous variables (ARIMAX), and Vector Autoregression (VAR) models for unemployment rate prediction are fit to the analyzed data. The models were assessed with the evaluation metrics of mean squared error (MSE), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), median absolute error (MedAE), and maximum error (ME). Averaged across these metrics, all models performed well: ARIMA (MSE = 0.26, RMSE = 0.38, MAE = 0.30, MAPE = 7.07, MedAE = 0.25, ME = 0.77), ARIMAX (MSE = 0.22, RMSE = 0.25, MAE = 0.29, MAPE = 6.94, MedAE = 0.25, ME = 0.75), and VAR (MSE = 0.09, RMSE = 0.09, MAE = 0.20, MAPE = 4.65, MedAE = 0.20, ME = 0.42) all achieved low error margins. The outcome demonstrates that Google Trends estimators reduced errors across the board compared with models without them.

Introduction

The vast amount of information provided by the internet, whether through Google [ 1 , 2 ], Twitter [ 3 ], social media [ 4 ], or combinations of web-based data sources [ 5 , 6 ], has led to its extensive use in recent decades to explore the potential of digital information for prediction in a wide range of sectors. Studies indicate that Google handles over 92% of all online search requests in the world [ 7 ], and it has been shown to be valid [ 8 ], valuable [ 9 ], accurate [ 10 ], and beneficial [ 11 ] for predictions. Google Trends has proven to be a dependable source of trend data for online searches, and it is extensively used by researchers around the world, mostly for real-time prediction of macroeconomic trends [ 12 , 13 ].

Information that people provide through the internet describes their current state and offers a good understanding of economic processes, particularly unemployment [ 14 , 15 ]. Despite these useful online sources, the availability of high-frequency data, and recent technological advances, the statistical information that nations publish on unemployment is released with delays and may still be revised [ 16 , 17 ]. This way of gathering data for unemployment estimation is frustrating: it makes it impossible to know how the economy is performing right now, only how it was several months or years ago. The challenge is common to almost all countries, and Ghana is no exception. As a result, policymakers make real-time assessments with inadequate information, whereas knowing the present unemployment state could help them better judge whether the economy is contracting or expanding and respond accordingly [ 18 ]. This paper tackles the problem by using real-time Google Trends data to predict unemployment claims in Ghana.

According to the Ghana Statistical Service's most recent census, Ghana's Unemployment Rate (UER) increased to 13.4% in 2021, up from 6% in 2010, with 32.8% of Ghanaians aged 15 to 24 unemployed. Ghana faces a severe economic downturn, and the economy's robust growth over the last two decades has not translated into job creation or improved employment circumstances [ 19 ]. This unfortunate situation and pressure on jobs have resulted in the loss of hundreds of jobs [ 20 ]. It would therefore be of common interest to produce real-time estimates of the unemployment rate to support policy making. The novelty of this paper is as follows:

to our knowledge, this is the first paper to consider the use of ARIMA, ARIMAX, and VAR for predicting the unemployment rate in Ghana.

the paper uses Google Trends indicators to predict the unemployment rate in Ghana, an approach that can in turn be applied across the West African subregion.

the paper is among the first to consider unemployment rate predictions of this kind in the literature.

the paper provides strategies and benchmarks for governments, agencies, and organizations to make informed decisions on unemployment in Ghana, Africa, and the world as a whole.

The rest of the paper is organized as follows: The next section discusses related literature on forecasting using online search data. Section “ Methodology ” describes the methodology used for identifying a large number of keywords that may help in the prediction of unemployment claims and provides a brief overview of the models used for comparison of results. The results of the models are discussed in section “ Results and discussion .” Section “ Conclusion and future works ” gives the conclusion and discusses the importance of using different categories of keywords for the prediction of unemployment claims.

Related works

Online search engines are frequently used for real-time research. Given the huge number of daily search queries, Ettredge et al. [ 21 ] took the first initiative by looking into how real-time forecasting may be done using the Internet; the study's findings reveal a strong link between Internet-related web search activity and the unemployment rate in the USA. Choi and Varian [ 22 , 23 ] continued by looking at how web search data, particularly from Google, could be utilized to improve forecasting of a range of economic parameters, such as jobless claims, retail sales, real estate demand, and vacation destination preferences. Several studies of real-time forecasting utilizing internet data, particularly Google Trends (GT) data, have been published since these papers, but this work focuses on unemployment prediction.

To anticipate UERs during the COVID-19 pandemic in Indonesia, Rizky et al. [ 2 ] used the GT query share for the keyword "phk" (work termination) together with earlier series from the official labor force survey performed by Badan Pusat Statistik (Statistics Indonesia). Using the GT query index as an exogenous variable to capture the current conditions of an ongoing phenomenon, their ARIMAX predictions of the open UER during the COVID-19 period produced forecast values that were reliable and close to reality. Petropoulos et al. [ 24 ] used text mining algorithms to develop a financial lexicon based on a collection of 10,000 Central Bank speeches; according to the authors, Google queries can predict market volatility over a short horizon (one month). Tuhkuri [ 25 ] used the ETLAnow model and Google search data to estimate the official UER in the European Union (EU)-28 countries, with Google Inc.'s Google Trends database and Eurostat's Labor Force Statistics as the model's primary data sources. Findings suggest that Google searches are linked to the EU UER, even after controlling for country-level, delayed, and seasonal effects.

Tuhkuri [ 26 ] used GT's database from Google Inc. and Labor Force Statistics from the Current Population Survey and the US Bureau of Labor Statistics. Results reveal that the predictive ability of Google searches is inadequate for short-term forecasting, that the utility of Google data for forecasting purposes is occasional, and that gains in forecasting accuracy are relatively modest. Mulero and García-Hiernaux [ 1 ] used data from GT and the Spanish State Employment Service to examine a large number of potential explanatory factors for UERs. The results reveal an increase in predictive accuracy of 10% to 25%.

Lasso and Snijders [ 27 ] adopted the GT method to forecast Brazil's UER. The findings reveal that Google search volumes for job-related phrases have significant predictive power, with biweekly search data forecasting the direction of the UER with over 80% accuracy, exceeding baseline methodologies based on seasonal trends by over 15%. te Brake and Ramos [ 28 ] estimate the UER in the Netherlands using a variable based on the volume of Google search keywords. The predictive capability of the Google indicator is determined by comparing the accuracy of a benchmark model with that of an upgraded model including the Google indicator. According to the statistics, the Google-augmented models produce up to 27.8% more accurate estimations over a one-month forecast horizon.

Simionescu and Zimmermann [ 14 ] looked into how internet usage information is used in various industries, with unemployment modeling being a particular area of interest. The results show that there is a lot of potential that should be investigated further. A vast majority of nations base their unemployment estimation and modeling on internet data; however, the forecast's accuracy depends on each country's internet penetration, the age distribution of online users, and the stability of the generated internet variables. Maas [ 29 ] studied whether Google search data, alongside more traditional predictor elements, may be utilized to anticipate the UER in the USA. The findings indicate that the GT forecasting methods proposed in that study are most beneficial in the short term.

Jung and Hwang [ 30 ] constructed unemployment prediction models for specific age groups (people in their 30s and 40s) using Google search queries related to those groups and known unemployment statistics from Statistics Korea. The findings demonstrate that employing web search queries to improve unemployment prediction models for Korea is useful. Smit [ 31 ] investigates whether and to what extent Google search data may be utilized to forecast the US UER, concluding that GT enhances the predictive accuracy of all currently used forecasting approaches.

Methodology

The study explored the effectiveness of Google Trends by adopting several testing techniques. Figure 1 displays a detailed procedure for the experiment. The steps below explain Fig. 1 in detail.

Figure 1. Workflow of the experimental design.

To start, data from GT were joined with interpolated World Bank (WB) UER data to create a single dataset for the visualization and study of the UER in Ghana using Granger causality.

Time Series (TS) data are split into training and test sets after input.

Training sets and test sets were used to train and evaluate the models (ARIMA, ARIMAX, and VAR).

The World Bank (WB) and Google Trends (GT) provided the data for this study. Google launched the Google Trends website for search analysis in 2006. GT offers search trend data starting from 2004 and shows the frequency with which a certain search phrase is entered into Google's search engine over time, relative to the site's overall search traffic.

GT shows changes in internet interest for any search term in any nation or location over a selected period of time, such as one year, several years, four months, three weeks, thirty days, seven days, four hours, or one hour. Additionally, several terms from various places can be compared simultaneously. The GT and World Bank data can be downloaded in ".csv" format. In short, GT calculates the number of searches as represented mathematically in equations 1, 2, and 3 [ 32 ]:

where i = terms or expressions of the study, k = possible terms to search on Google, and m = months of the study. Additionally, \(S(e)_{\mathrm{tot},m}\) = total searches on Google for one month m in a particular country, \(S(e)_{i,m}\) = total searches on Google for a term i of our study in month m and a country, \(Qs(e)_{i,m}\) = query share of a term in a certain month and country, and \(\mathrm{RSV}(e)_{i,m}\) = relative search volume of a term in a certain month and country.
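As a minimal illustration of these quantities, the sketch below computes a query share and a relative search volume from hypothetical monthly counts; the term name and counts are invented for the example, and in practice Google Trends already delivers the series scaled to the 0-100 RSV range.

```python
import pandas as pd

# Hypothetical monthly counts (Google Trends itself only exposes the final 0-100 scaled RSV).
searches_for_term = pd.Series([120, 150, 90, 200], name="jobs in ghana")   # S(e)_{i,m}
total_searches = pd.Series([10_000, 12_000, 9_500, 11_000])                # S(e)_{tot,m}

query_share = searches_for_term / total_searches        # Qs(e)_{i,m}
rsv = 100 * query_share / query_share.max()             # RSV(e)_{i,m}, scaled to [0, 100]
print(rsv.round(1))
```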

Our sample comprises 50 Google Trends search terms, chosen based on the methodology shown in Table 1. Our data window is restricted to 2010–2020, since this is the earliest period for which Ghanaians had migrated to using the internet. The variable of interest is the unemployment rate data for Ghana, downloaded from the World Bank website.

Interpolation

For extracting high-frequency data (such as monthly or weekly data) from low-frequency data (such as annual data), the Chow-Lin approach, a disaggregation method, is utilized [ 33 ]. The method makes sure that the high-frequency series' average, first, and last values correspond to those of the low-frequency series. The following two-step additive structure is the general temporal disaggregation framework for developing a high-frequency estimate, according to [ 33 ]. Equation 4 describes the Chow-Lin approach.

First, make a preliminary high-frequency series \(\bar{v}_j\) using auxiliary data from several indicator series; a generalized least squares regression strategy is frequently utilized to incorporate these data. Second, analyze the residual differences between the observed low-frequency series and the preliminary high-frequency series aggregated to the low-frequency scale (through the aggregation matrix \(H \in \mathbb{R}^{n\times m}\)). Then create a temporally consistent high-frequency version \(y_i\) by distributing these differences among the high-frequency periods using the distribution matrix \(F \in \mathbb{R}^{n\times m}\).
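The numpy sketch below illustrates this two-step additive structure on invented data: an OLS regression of the annual series on the aggregated indicator gives a preliminary monthly series, and the annual residuals are then spread over the months of each year so the monthly series averages back to the annual values. It is a simplification for illustration only; the full Chow-Lin estimator replaces the OLS step with generalized least squares under an assumed AR(1) residual covariance and uses a correspondingly smoother distribution matrix.

```python
import numpy as np

def simple_disaggregate(annual, indicator_monthly):
    """Toy two-step additive disaggregation (simplified; no AR(1) GLS step as in full Chow-Lin)."""
    n = len(annual)
    H = np.kron(np.eye(n), np.full((1, 12), 1 / 12))   # aggregation matrix: monthly values -> annual means
    x_low = H @ indicator_monthly                      # indicator aggregated to the annual frequency

    # Step 1: preliminary high-frequency series from a regression fitted at the annual level
    X = np.column_stack([np.ones(n), x_low])
    beta, *_ = np.linalg.lstsq(X, annual, rcond=None)
    prelim = beta[0] + beta[1] * indicator_monthly

    # Step 2: distribute each annual residual evenly over the 12 months of its year
    resid = annual - H @ prelim
    return prelim + np.repeat(resid, 12)

rng = np.random.default_rng(0)
annual_uer = np.array([5.2, 5.8, 6.5])                 # invented annual rates, treated as yearly averages
gt_indicator = rng.normal(50, 10, size=36)             # invented monthly indicator series
monthly_uer = simple_disaggregate(annual_uer, gt_indicator)
print(np.allclose(monthly_uer.reshape(3, 12).mean(axis=1), annual_uer))   # True: annual means preserved
```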

Causality (Granger causality (GC))

The GC test examines the connection between the current value of one variable and the historical values of another variable to find a causal direction between two or more time series [ 34 ]. According to [ 35 ], the GC indexes of two series Y and X can be computed from the variances of the prediction errors. If X and Y are independent, then \(\mathrm{var}_X(\varepsilon) = \mathrm{var}_Y(\varepsilon)\), where \(\mathrm{var}(\varepsilon)\) denotes the variance of the error \(\varepsilon\); otherwise, the equality does not hold. For example, if X is the cause of Y, then \(\mathrm{var}_X(\varepsilon) > \mathrm{var}_Y(\varepsilon)\). This can be represented by the formula in Eq. 5 [ 36 ]

If \({F}_{(X\to Y)}\) ≥ 0 and \({F}_{(Y\to X)}\) ≥ 0 then the indexes of causality can be analyzed. Specifically, if \({F}_{(X\to Y)}\) > \({F}_{(Y\to X)}\) , then X is the cause of Y, or the information flowing from X to Y is more than that from Y to X; if \({F}_{(X\to Y)}\) < \({F}_{(Y\to X)}\) , then Y is the cause of X .
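A sketch of how such a pairwise test can be run with statsmodels is shown below; the column names are placeholders for the unemployment rate and one candidate Google Trends keyword (the published dataset uses its own column naming).

```python
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Placeholder column names: 'uer' = interpolated unemployment rate, 'jobs_in_ghana' = one GT estimator.
df = pd.read_csv("unemployment_data.csv")

# Tests whether lagged values of the second column help predict the first column;
# a keyword is retained when the F-test p value at lag 1 is below 0.05.
results = grangercausalitytests(df[["uer", "jobs_in_ghana"]], maxlag=2)
```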

Training and test

The overall dataset was split into training and test sets: roughly 80% of the data, from 2010 to 2018, were used for training, with the remaining 20%, from 2019 to 2020, designated for testing. Table 2 shows the specific splitting procedure that divides the dataset. In a second step, the two-year test set is further divided into yearly (Y1), half-yearly, quarterly, and monthly frames so that the UER could be tested over different time horizons.

TS forecasting is crucial for many processes that unfold over time. It is a practical method for determining how past data influence present results, enabling short- and long-term projections and pattern detection from historical data. The TS models used were ARIMA, VAR, and ARIMAX.

VAR is a forecasting method that can be used when two or more TS interact; in other words, the TS in question have a two-way relationship. VAR models can be used to assess and predict multivariate TS data, which sets them apart from univariate autoregressive models, and they are often used in economics, particularly for models with a large number of interconnected TS variables. Equation 6 represents the VAR model

where c is the intercept, \(\phi\) denotes the coefficients of the lags of y up to order p, and \(\varepsilon\) is the error term. Here, the model is shown as a system of equations with one equation per TS variable. VAR is adaptable, requires less time and information [ 37 ], and makes it simple to integrate additional data [ 38 ]. VAR models, however, have the drawback of being unable to account for changes in the dispersion of the data across different time series values [ 39 ].
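A minimal statsmodels sketch of fitting and forecasting a VAR on a frame of monthly series is given below; the synthetic frame stands in for the training portion of the unemployment rate plus one Google Trends column, and the lag order 3 mirrors the VAR(3, 0) specification reported later.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
idx = pd.date_range("2010-01", periods=108, freq="MS")
# Synthetic stand-in for the training frame (unemployment rate plus one GT estimator).
train_df = pd.DataFrame({
    "uer": 6.0 + np.cumsum(rng.normal(0, 0.05, 108)),
    "jobs_in_ghana": rng.normal(50, 10, 108),
}, index=idx)

fitted = VAR(train_df).fit(3)            # fit(maxlags=..., ic="bic") would select the order automatically
forecast = fitted.forecast(train_df.values[-fitted.k_ar:], steps=12)   # 12 months ahead
```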

ARIMA combines the ideas of autoregression and moving average to provide forecasts that are linear combinations of previous variable values and forecast errors. ARIMA is characterized by three factors: p , d , and q , which signify the number of lagged (or previous) values to consider for autoregression, the number of times the raw observations are differenced, and the size of the moving average window, respectively.

The forecasting equation is structured in Eq.  7 as follows:

where \(F_t\) = the forecast at time t, \(L_t\) = the level at time t (a straight-line approximation of the data at one point in time; in ARIMA it is calculated from the mean of the differenced data times the smoothing constants), \(D'_{t-p}\) = previously observed differenced data points, \(E_{t-q}\) = errors in prediction on previous data points, and \(\Omega\) and \(\beta\) are smoothing constants.
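The sketch below shows how such a model can be fitted with statsmodels on a synthetic monthly series standing in for the interpolated unemployment rate; the (1, 2, 1) order matches the ARIMA specification reported in the Results section, and the chronological 2010-2018 / 2019-2020 split mirrors the one described above.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for the interpolated monthly unemployment rate series.
idx = pd.date_range("2010-01", periods=132, freq="MS")
uer = pd.Series(6 + 0.01 * np.arange(132) + np.random.default_rng(2).normal(0, 0.1, 132), index=idx)

train, test = uer.loc[:"2018-12"], uer.loc["2019-01":]
fit = ARIMA(train, order=(1, 2, 1)).fit()        # (p, d, q) order reported in the Results section
pred = fit.forecast(steps=len(test))             # out-of-sample forecast over the test window
```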

Many scholars working with time series have recently explored ARIMA. However, the ARIMA model applies to only one variable, does not adequately describe some turning points in the data, and cannot adequately convey relationships between variables [ 40 , 41 ]. As a result, it is often insufficient for describing real problems.

The ARIMAX model is an extension of the ARIMA model. The X added to the name stands for “exogenous variables”: the model includes additional independent variables. This involves adding a separate, outside variable to help model the endogenous variable.

Equation  8 is structured as follows:

where \(P_t\) and \(P_{t-1}\) represent the values in the current period and one period ago, respectively. Similarly, \(\epsilon_t\) and \(\epsilon_{t-1}\) are the error terms for the same two periods. \(C\) is a baseline constant. \(\phi_1\) and \(\theta_1\) express which parts of last period's value \(P_{t-1}\) and error \(\epsilon_{t-1}\) are relevant in estimating the current value. \(\beta\) is a coefficient estimated from the model selection and the data, and \(X\) is the exogenous variable of interest. ARIMAX is helpful because it combines the time series and regression components into one model; however, it can be challenging to interpret how the independent variable affects the result.
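In statsmodels, an ARIMAX can be fitted through the SARIMAX class by passing the exogenous regressors; the snippet below uses synthetic series as placeholders for the unemployment rate and one Google Trends column, and the (4, 1, 3) order matches the ARIMAX specification reported in the Results section.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
idx = pd.date_range("2010-01", periods=132, freq="MS")
gt = pd.DataFrame({"jobs_in_ghana": rng.normal(50, 10, 132)}, index=idx)     # stand-in GT column
uer = pd.Series(6 + 0.02 * gt["jobs_in_ghana"].to_numpy() + rng.normal(0, 0.1, 132), index=idx)

train, test = idx[:108], idx[108:]
fit = SARIMAX(uer.loc[train], exog=gt.loc[train], order=(4, 1, 3)).fit(disp=False)
pred = fit.forecast(steps=len(test), exog=gt.loc[test])   # exogenous values must cover the forecast window
```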

Evaluation metrics

We compute the mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), median absolute error (MedAE), and maximum error (ME) to assess the forecasting accuracy of each model. Equations 9 to 14 represent the aforementioned evaluation metrics.

where y denotes the observed UER and \(\widehat{y}\) the predicted UER. Our study used six different evaluation metrics to assess the models. By employing multiple evaluation metrics, we were able to choose the optimal strategy while also confirming that each model was able to complete the underlying prediction task.
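All six metrics are available in (or easily derived from) scikit-learn, as the sketch below illustrates on invented observed and forecast values.

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error, median_absolute_error, max_error)

y_true = np.array([4.8, 5.0, 5.3, 5.1])   # observed unemployment rates (illustrative values)
y_pred = np.array([4.9, 5.2, 5.1, 5.4])   # model forecasts

mse   = mean_squared_error(y_true, y_pred)
rmse  = np.sqrt(mse)
mae   = mean_absolute_error(y_true, y_pred)
mape  = 100 * mean_absolute_percentage_error(y_true, y_pred)   # scikit-learn returns a fraction
medae = median_absolute_error(y_true, y_pred)
me    = max_error(y_true, y_pred)
print(mse, rmse, mae, mape, medae, me)
```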

Results and discussion

The basic goal of temporal disaggregation methods is to create a new TS while preserving the short-term behavior of higher frequency indicator series. This TS must be coherent with low-frequency data. For the UER and interpolated UER in question, standard descriptive data are provided in Table 3 . The table demonstrates that the UER and the interpolated UER are nearly equal. Visual representations of the UER and interpolated UER are shown in Fig.  2 .

Figure 2. Unemployment rate and the interpolated unemployment rate.

The graph shows that the interpolated unemployment rate, which comes from a series of 574 records, and the actual unemployment rate, which comes from a series of 11 records, vary in the same way over time, indicating that the two series are comparable in mean and standard deviation.

Cross-correlation function (CCF) analysis

Table 4 outlines the keywords whose trends were most strongly linked with the UER according to the Granger causality test (GCT). From Table 4, we compiled the list of terms associated with the UER whose p value at lag 1 is below 0.05.

The table shows that 14 of the 50 GT estimators (x1 to x50) for Ghana are related to the WB UER series ( y ). The cells in the table with p values below 0.05 at the first lag were chosen. Figure 3 displays the analysis results for the GT estimators with p < 0.05. The graph shows correlations ranging between +1 and -1, where +1 represents a perfect positive correlation, 0 represents the absence of any correlation, and -1 represents a perfect negative correlation. The lags and past values of the 14 indicators are statistically significant in the equation for predicting future values of the unemployment rate.

Figure 3. GCT analysis results of the selected GT (x) keywords and the WB UER (y).

Model result

Following the experimental design described above, detailed experiments with different TS models were conducted using univariate or multivariate specifications. The models and orders utilized in building the series are ARIMA(1, 2, 1), ARIMAX(4, 1, 3), and VAR(3, 0). Table 5 reports the evaluation metrics for the models.

Evaluation of the models

The selected significant prospective determinants of the unemployment rate are assessed with the aid of the various evaluation metrics, with the significant estimators chosen for the unemployment rate considered over all periods. Table 5 provides an overview of the performance metrics MSE, RMSE, MAE, MAPE, MedAE, and ME for all the periods. The results show that over the first five measurement periods, the models were able to forecast with little error. Additionally, for all models, the error margin rises as the forecast horizon grows. Furthermore, for nearly all periods and virtually all evaluation metrics, VAR forecast with the minimum error.

To decide on the best model for the forecast, we averaged the results of each evaluation metric across all periods for each model, as shown in Table 5. The VAR model had the lowest average error values, with MSE = 0.09, RMSE = 0.09, MAE = 0.20, MAPE = 4.65, MedAE = 0.20, and ME = 0.42, as demonstrated by the average findings in Fig. 4. This shows how much better the proposed VAR (multivariate TS) model with GT estimators is compared with ARIMA and ARIMAX. The VAR was able to detect minor growth even when the models did not follow the major trend of UER change. The graph demonstrates how much better and more effective the VAR model is than the other models.

Figure 4. Comparison of the average evaluation results for the models.

Figure 5 shows the actual UER for Ghana as well as the predictions for each of the models over the two-year timeframe. According to the figure, all models except VAR failed to track the UER, while VAR was somewhat in line with it and reflected the modest shift. The VAR model outperforms all other models (ARIMA and ARIMAX). Most models for approximating economic conditions perform well in a stable environment but lack the foresight to anticipate hidden economic change. In both steady and dynamic settings, the VAR model, which links input factors derived from rich, high-frequency, timely variables, performs better at predicting the UER.

Figure 5. Real and forecast UER for Ghana over the two-year test period for each model.

Conclusion and future works

The issue is not a dearth of data, but rather a dearth of information that can be used for planning, strategy, and decision-making. Using big data, such as Google Trends, can assist the entire government system. Google Trends provides access to a huge unfiltered collection of actual Google search requests. People use Google for a wide range of informational and topical searches, making it a valuable search engine. Fifty words or phrases were of interest, and Google Trends (GT) search query data were used to derive values for searches relating to jobs, society, social services, and economic indicators. The study identified a number of factors that influence the unemployment rate, including "how to make money," "how to start a business," "jobs in Ghana," "jobs in the USA," "online money," "nurse application," "visa application," and "police recruiting." This study proposes a technique that first applies pre-processing to overcome the difficulty of handling the vast data and provides an in-depth look at the use of ARIMA, ARIMAX, and VAR for nowcasting unemployment in Ghana as a use case.

In terms of prediction accuracy, error margin, and model reliability, the results show that the VAR method surpassed all other techniques: VAR (MSE = 0.09, RMSE = 0.09, MAE = 0.20, MAPE = 4.65, MedAE = 0.20, ME = 0.42) achieved low error margins. This is compelling evidence that real-time UER forecasting at a daily level of granularity is possible. Most models for approximating economic conditions perform well in a stable environment but lack the foresight to anticipate hidden economic change; in both steady and dynamic settings, the VAR model linking input factors derived from rich, high-frequency, timely variables predicts the UER better. The objective of successful citizen care management can be attained with the use of Google Trends by offering effective data-driven services to citizens and predicting their needs based on the analysis of surveys taken among various groups of citizens. In future work, more data will be collected to train artificial intelligence techniques and generate decision support systems.

In the current study, we have highlighted a few predictor variables that contribute to the nation's unemployment rate and are crucial in determining unemployment. The government can also use this study's crucial information to make data-driven decisions. It will assist the government in strengthening technical and vocational institutions, which will then bring in revenue that can be put toward development. Additionally, it will be useful in establishing the state of the economy when formulating monetary policy. We recommend using machine learning models for future work.

Availability of data and materials

The data presented in this study are publicly available through the Figshare repository via Afrifa, Stephen (2022): unemployment_data.csv. figshare. Dataset. https://doi.org/10.6084/m9.figshare.20311167.v1 .

Mulero R, García-Hiernaux A (2021) Forecasting Spanish unemployment with Google Trends and dimension reduction techniques. SERIEs 12(3):329–349. https://doi.org/10.1007/s13209-021-00231-x


Rizky O, Fajar M, Prasetyo OR, Nonalisa S (2020) Forecasting unemployment rate in the time of COVID-19 pandemic using Google Trends Data (Case of Indonesia). Munich Pers. RePEc Arch, no. 105042

Nirmala CR, Roopa GM, Kumar KRN (2015) Twitter data analysis for unemployment crisis. In: Proceedings of 2015 international conference applications theoretical computer communications and technology. iCATccT 2015, pp 420–423. https://doi.org/10.1109/ICATCCT.2015.7456920

Ryu PM (2018) Predicting the unemployment rate using social media analysis. J Inf Process Syst 14(4):904–915. https://doi.org/10.3745/JIPS.04.0079

Mavragani A, Ochoa G, Tsagarakis KP (2018) Assessing the methods, tools, and statistical approaches in Google trends research: Systematic review. J Med Internet Res 20(11):1–20. https://doi.org/10.2196/jmir.9366

Twumasi E, Frimpong EA, Kwegyir D, Folitse D (2021) Improvement of grey system model using particle swarm optimization. J Electr Syst Inf Technol. https://doi.org/10.1186/s43067-021-00036-9

Naccarato A, Falorsi S, Loriga S, Pierini A (2018) Combining official and Google Trends data to forecast the Italian youth unemployment rate. Technol Forecast Soc Change 130:114–122

McCallum ML, Bury GW (2014) Public interest in the environment is falling: a response to Ficetola (2013). Biodivers Conserv 23(4):1057–1062

Jun SP, Park DH (2016) Consumer information search behavior and purchasing decisions: empirical evidence from Korea. Technol Forecast Soc Change 107:97–111. https://doi.org/10.1016/j.techfore.2016.03.021

Han SC, Chung H, Kang BH (2012) It is time to prepare for the future: forecasting social trends. In: Kim Th, Ma J, Fang Wc, Zhang Y, Cuzzocrea A (eds) Computer applications for database, education, and ubiquitous computing. EL DTA 2012. Communicat. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35603-2_48 .

Vosen S, Schmidt T (2011) Forecasting private consumption: Survey-based indicators vs. Google trends. J Forecast 30(6):565–578. https://doi.org/10.1002/for.1213


Kundu S, Singhania R (2020) Forecasting the United States unemployment rate by using recurrent neural networks with Google Trends data. 11(6). https://doi.org/10.18178/ijtef.2020.11.6.679

Heidary J, Rastegar H (2022) A novel computational technique using coefficient diagram method for load frequency control in an interconnected power system. J Electr Syst Inf Technol 9(1):1–24. https://doi.org/10.1186/s43067-022-00062-1

Simionescu M, Zimmermann KF (2017) “Big Data and Unemployment Analysis,” GLO Discuss. Pap., p. No. 81

Hacıevliyagil N, Drachal K, Eksi IH (2022) Predicting house prices using DMA method: evidence from Turkey. Economies 10(3):1–27. https://doi.org/10.3390/economies10030064

Naccarato A, Pierini A, Falorsi S (2015) Using Google Trend data to predict the Italian unemployment rate. Dep. Work. Pap. Econ. - Univ. “Roma Tre”

Junior MA, Appiahene P, Appiah O (2022) Forex market forecasting with two - layer stacked Long Short - Term Memory neural network ( LSTM ) and correlation analysis. J Electr Syst Inf Technol 1:1–24. https://doi.org/10.1186/s43067-022-00054-1

Simionescu M, Cifuentes-Faura J (2022) Forecasting National and Regional Youth Unemployment in Spain Using Google Trends. Soc Indic Res 164(3):1187–1216. https://doi.org/10.1007/s11205-022-02984-9

Simionescu M, Cifuentes-Faura J (2022) Can unemployment forecasts based on Google Trends help government design better policies? An investigation based on Spain and Portugal. J Policy Model 44(1):1–21. https://doi.org/10.1016/j.jpolmod.2021.09.011

Şentürk G (2022) Can Google search data improve the unemployment rate forecasting model? AN empirical analysis for Turkey. J Econ Policy Res 9(2):229–244. https://doi.org/10.26650/jepr963438

Ettredge M, Gerdes J, Karuga G (2005) Using web-based search data to predict macroeconomic statistics. Commun ACM 48(11):87–92. https://doi.org/10.1145/1096000.1096010

Choi H, Varian H (2009) Predicting the present with Google Trends. Tech. report, Google. [Cited 1 April 2012.]

Choi H, Varian H (2009) Predicting initial claims for unemployment insurance using Google Trends. Tech. report, Google. [Cited 1 April 2012.]

Petropoulos A, Siakoulis V, Stavroulakis E, Lazaris P, Vlachogiannakis N (2021) Employing Google Trends and deep learning in forecasting financial market turbulence. J Behav Financ. https://doi.org/10.1080/15427560.2021.1913160

Tuhkuri J (2016) ETLAnow: a model for forecasting with Big Data forecasting unemployment with Google Searches. ETLA Reports 54, no. 54, p 20

Tuhkuri J (2016) Forecasting unemployment with Google Searches. ETLA Work. Pap. No 35

Lasso F, Snijders S (2016) The power of Google search data: an alternative approach to the measurement of unemployment in Brazil

te Brake G, Ramos R (2017) Unemployment ? Google it ! Analyzing the usability of Google queries in order to predict unemployment

Maas B (2019) Short-term forecasting of the US unemployment rate. J Forecast. https://doi.org/10.1002/for.2630

Jung JU, Hwang J (2019) Application of Google Search queries for predicting the unemployment rate for Koreans in their 30s and 40s. 17(9):135–145

Smit AOO (2018) Unemployment rate forecasting using Google Trends. Bachelor Thesis in Econometrics & Operations Research, Erasmus University Rotterdam, Erasmus School of Economics, pp 1–22

Jimenez A, Santed-Germán MA, Ramos V (2020) Google Searches and Suicide Rates in Spain, 2004–2013: Correlation Study. JMIR Public Heal Surveill 6(2):2004–2013. https://doi.org/10.2196/10919

Mosley L, Eckley I, Gibberd A (2021) Sparse temporal disaggregation, no. 2019, pp 1–33

Ghouali S et al (2017) The granger causality effect between cardiorespiratory hemodynamic signals to cite this version : HAL Id : hal-01573108 The Granger Causality Effect between. https://doi.org/10.5176/2251-1911

Chen B, Ma R, Yu S, Du S, Qin J (2019) Granger causality analysis based on quantized minimum error entropy criterion. IEEE Signal Process Lett 26(2):347–351. https://doi.org/10.1109/LSP.2019.2890973

Bressler SL, Seth AK (2011) Wiener–Granger causality: a well established methodology. Neuroimage 58(2):323–329. https://doi.org/10.1016/j.neuroimage.2010.02.059

Bai P, Safikhani A, Michailidis G (2022) Multiple change point detection in reduced rank high dimensional vector autoregressive models. J Am Stat Assoc. https://doi.org/10.1080/01621459.2022.2079514


Odekina GO, Adedotun AF, Imaga OF (2022) Modeling and forecasting the third wave of Covid-19 incidence rate in Nigeria using vector autoregressive model approach. J Niger Soc Phys Sci 4(1):117–122. https://doi.org/10.46481/jnsps.2022.431

Cho H, Maeng H, Eckley IA, Fearnhead P (2022) High-dimensional time series segmentation via factor-adjusted vector autoregressive modelling, pp 1–62

Victor-Edema UA, Essi PID (2016) Autoregressive integrated moving average with exogenous variable (ARIMAX ) model for Nigerian Non Oil Export 8(2014):2010–2015

Yucesan M, Gul M, Celik E (2018) Performance comparison between ARIMAX , ANN and ARIMAX-ANN hybridization in sales forecasting for furniture industry. RES Gate. https://doi.org/10.5552/drind.2018.1770


Acknowledgements

We express our sincere gratitude to Mrs. Nancy Addia who encouraged and motivated us throughout the research. Finally, we would like to thank Google and World Bank, for making the data available.

Not applicable.

Author information

Stephen Afrifa

Present address: University of Energy and Natural Resources, Sunyani, Ghana

Authors and Affiliations

University of Energy and Natural Resources, Sunyani, Ghana

Williams Kwasi Adu & Peter Appiahene

Tianjin University, Tianjin, China


Contributions

“Conceptualization, WKA and PA; methodology, SA and WKA; software, WKA.; validation, WKA, PA and SA; formal analysis, WKA; investigation, WKA, PA and SA; resources, PA; data curation, WKA; writing—original draft preparation, WKA; writing—review and editing, PA; visualization, WKA; supervision, PA; project administration, PA; funding acquisition, PA. All authors have read and agreed to the published version of the manuscript.”

Corresponding author

Correspondence to Williams Kwasi Adu .

Ethics declarations

Competing interests.

On behalf of all authors, the corresponding author declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Adu, W.K., Appiahene, P. & Afrifa, S. VAR, ARIMAX and ARIMA models for nowcasting unemployment rate in Ghana using Google trends. Journal of Electrical Systems and Inf Technol 10 , 12 (2023). https://doi.org/10.1186/s43067-023-00078-1


Received : 26 August 2022

Accepted : 13 February 2023

Published : 27 February 2023

DOI : https://doi.org/10.1186/s43067-023-00078-1



Published on 11.4.2024 in Vol 26 (2024)

A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health

Authors of this article:


  • Peter Washington, PhD  

Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, HI, United States

Corresponding Author:

Peter Washington, PhD

Information and Computer Sciences

University of Hawaii at Manoa

1680 East-West Road

Honolulu, HI, 96822

United States

Email: [email protected]

Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power can be considered to be both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers are paid to annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this emerging field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems, a nascent field of study. With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.

Introduction

Crowdsourcing, a term first coined in 2006 [ 1 ], is the use of distributed human workers to accomplish a central task. Crowdsourcing exploits the “power of the crowd” to achieve goals that are only feasible with a distributed group of humans collaborating, either explicitly or implicitly, toward a common goal. Crowdsourcing has often been applied to public health surveillance [ 2 ], such as for tracking epidemics [ 3 , 4 ], quantifying tobacco use [ 5 ], monitoring water quality [ 6 ], tracking misinformation [ 7 ], and understanding the black-market price of prescription opioids [ 8 ]. In the context of health care, crowdsourcing is most often used for public health, a domain that can clearly benefit from scalable and distributed assessments of health status. Although sampling bias can be an issue in epidemiological uses of crowdsourcing [ 9 ], approaches that account for these issues have performed quite robustly.

A smaller but potentially transformative effort to apply crowdsourcing to precision health rather than population health has recently emerged. In precision health contexts, the goal is to provide a diagnosis using information labeled by crowd workers. There are several variations to this basic setup. Crowdsourcing workflows for diagnostics can diverge with respect to the underlying task, worker motivation strategies, worker training, worker filtering, and privacy requirements.

Here, I describe the existing research in the relatively small and early but growing field of crowdsourcing for precision health. I then discuss ongoing challenges and corresponding opportunities that must be addressed as this field matures.

Existing Examples of Crowdsourcing in and Adjacent to Health Care

There are relatively few examples of crowdsourcing in precision health. The vast successes of machine learning for health [ 10 - 15 ] and the human labor costs required for crowdsourcing make purely automated approaches more appealing when they are possible and feasible. However, the crowdsourcing approaches that have been tested tend to perform well for prediction tasks that are beyond the scope of current automated approaches, especially in psychiatry and the behavioral sciences.

I want to begin by highlighting successes in science, as they can often be applied to health and have started to lead to improvements in diagnostics. Framing crowdsourcing tasks as “citizen science” opportunities can be an effective incentive mechanism [ 16 ]. Oftentimes, these projects are “gamified.” Gamification refers to the incorporation of engaging elements into traditionally burdensome workflows, and in particular game-like affordances, to foster increased participation. A combination of large crowd sizes, worker training procedures, and easy identification tasks has led to previous success in the existing gamified citizen science experiments applied to precision health. For example, in a study involving nearly 100,000 crowd workers who scored images on a citizen science platform, cancer was correctly identified with an area under the receiver operating characteristic curve of around 95% [ 17 ]. In the BioGames app, users who performed with greater than 99% accuracy in a training tutorial were invited to diagnose malaria [ 18 , 19 ]. It was discovered that with a large crowd size, the aggregated diagnostic accuracy of nonexpert crowd workers approached that of experts [ 20 ]. Another citizen science malaria diagnosis application, MalariaSpot, resulted in 99% accuracy in the diagnosis of malaria from blood films [ 21 ]. If the annotation task is relatively simple and nonexperts can be trained with minimal onboarding efforts, then citizen science can be an effective and affordable approach.

“Gamified” crowdsourcing for citizen science has also been successful without explicitly requiring workers to undergo a formal training process. Foldit [ 22 - 25 ] and EteRNA [ 26 - 31 ] are 2 games where players with no biology or chemistry background can explore the design space of protein and RNA folding, respectively. These are both NP-hard (ie, computationally complex) problems, and human players in aggregate have designed solutions that outcompete state-of-the-art computational approaches. These solutions have been used to solve health challenges, such as finding a gene signature for active tuberculosis, which can potentially be used in tuberculosis diagnostics [ 32 ]. Other gamified experiences have been used to build training libraries for complex classification tasks in precision psychiatry. Notably, GuessWhat is a mobile charades game played between children with autism and their parents [ 33 , 34 ]. While the game provides therapeutic benefits to the child with autism [ 35 ], the game simultaneously curates automatic labels of behaviors related to autism through the metadata associated with gameplay [ 36 , 37 ]. These automatically annotated video data have been used to develop state-of-the-art computer vision models for behaviors related to the diagnosis of autism, such as facial expression evocation [ 38 - 41 ], eye gaze [ 42 ], atypical prosody [ 43 ], and atypical body movements [ 44 , 45 ].

An alternative incentive mechanism is paid crowdsourcing. The most popular paid crowdsourcing platform, by far, is Amazon Mechanical Turk (MTurk) [ 46 ]. While paid crowdsourcing specifically for precision health is a relatively nascent field, the general study of paid crowdsourcing (particularly on MTurk) is quite mature. Studies have explored worker quality management [ 47 ], understanding crowd worker demographics [ 48 ], the generation of annotations for use in the training of machine learning models [ 49 - 53 ], the rights of crowd workers [ 54 - 56 ], and understanding crowd worker communities and economics [ 57 - 59 ]. Preliminary studies of paid crowdsourcing have yielded mixed success. Around 81% of images were correctly classified on MTurk in a study involving the grading of diabetic retinopathy from images, with workers failing to correctly indicate the level of severity [ 60 ]. In a separate binary labeling task for glaucomatous optic neuropathy, workers achieved sensitivity in the 80s but reached a specificity below 50% [ 61 ].

In a broader classification task of various medical conditions, workers consistently labeled the “easy” cases while struggling to correctly label and even refusing to label more complicated and nuanced tasks [ 62 ]. Clearly, there is a need for extensive innovations to the traditional paid crowdsourcing workflow to translate this methodology to precision health.

I have extensively investigated the utility of paid crowdsourcing for the diagnosis of autism from unstructured home videos, achieving relatively high diagnostic performance [ 63 - 66 ]. In these experiments, untrained annotators watched short videos depicting children with and without autism and answered questions about the behaviors depicted within the videos. These annotations were provided as input into previously developed machine learning models, achieving binary test performance in the 90s across performance metrics due to the reduction of the complex feature space (unstructured videos) into a low-dimensional representation (vectors of a few categorical ordinal values). This pipeline combining crowdsourcing and machine learning can possibly be extended to other diagnostic domains in psychiatry where the input feature space is complex, heterogeneous, and subjective.
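As a schematic illustration of this kind of pipeline (not the models used in the studies cited above), the sketch below aggregates several workers' ordinal answers per video into a low-dimensional feature vector and feeds those vectors to a simple classifier; the data, dimensions, and choice of classifier are all invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented stand-in: 200 videos, 3 workers per video, 5 behavioral questions on a 0-3 ordinal scale.
raw_answers = rng.integers(0, 4, size=(200, 3, 5))        # (videos, workers, questions)
features = np.median(raw_answers, axis=1)                 # aggregate workers into one vector per video
labels = rng.integers(0, 2, size=200)                     # binary diagnostic label (synthetic here)

clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features[:5]))                          # predicted class for the first five videos
```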

Ongoing Challenges and Corresponding Opportunities

Since crowdsourcing for precision health care is an emerging field of study, numerous challenges must be solved for clinical translation to develop. In the proceeding sections, I highlight several areas that are pressing for the field and for which preliminary work has been published.

Worker Identification and Training

While traditional crowdsourcing can work with minimal to no worker training, complex annotation tasks require the identification of qualified workers. I have found that worker identification can occur through the quantification of their performance on test tasks [ 67 , 68 ] and training promising workers [ 66 ]. Such crowd filtration paradigms will require domain-specific procedures. There is ample room to develop new crowdsourcing systems that inherently support natural worker identification and training procedures for crowdsourcing workflows that require well-designed training processes.

Worker Retention

Once proficient workers are identified, continually engaging and retaining these workers is critical. I have found that workers who are repeatedly encouraged by a human (or human-like chatbot) and treated as members of a broader research team tend to enjoy paid work and even ask for more tasks after the completion of the study [ 69 ]. Thus, it is possible that the guarantee of job security can lead to long-term worker retention. However, worker retention in unpaid settings that rely on intrinsic motivation will require additional innovations. For example, there exists an opportunity to explore the creation of crowd worker communities to provide a means of intrinsic motivation leading to worker retention.

Task Assignment

Certain workers perform exceptionally well on a subset of tasks while underperforming on other assigned tasks [ 70 , 71 ]. There is an opportunity to develop algorithmic innovations involving the effective and optimal assignment of workers to subtasks in a dynamic manner. Reinforcement learning could be a promising approach but has yet to be explored in such scenarios.

Privacy of Human Participants

Data in psychiatry and behavioral sciences are particularly sensitive. Ensuring that sensitive health information is handled appropriately and that workers’ privacy is maintained is essential from an ethical perspective. There are 2 general families of approaches to achieving privacy in crowd-powered health care. First, the data can be modified to obscure sensitive information without removing information required for a diagnosis. I have explored privacy-preserving alterations to video data that obfuscate the identity of participants while maximizing the capacity for workers to annotate behaviors of interest [ 70 , 71 ]. For example, in the case of video analytics on bodily movements, the face can be tracked and blurred, or the body can be converted to a stick figure using a pose-based computer vision library. Sometimes, however, it is impossible to modify the data without severely degrading the diagnostic performance. Therefore, the second family of approaches involves carefully vetting crowd workers, training them, and onboarding them into a secure computing environment. In my previous experiences with this process [ 40 ], I discovered that crowd workers were enthusiastic about the prospect of the “job security” that is implied from the thorough vetting procedure and were, therefore, willing to complete extra privacy and security training (in our case, Research, Ethics, Compliance, and Safety training). There is ample room to expand upon these methods and to develop new paradigms and systems for crowdsourcing involving identifiable and protected health information.
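As a sketch of the first family of approaches, the snippet below blurs detected faces in each video frame before the footage would be shown to crowd workers; it uses OpenCV's bundled Haar cascade face detector, the input path is hypothetical, and it is an illustration rather than the de-identification pipeline used in the cited work.

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade detector.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def blur_faces(frame):
    """Blur every detected face region in a single video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(frame[y:y + h, x:x + w], (51, 51), 0)
    return frame

cap = cv2.VideoCapture("home_video.mp4")       # hypothetical input path
ok, frame = cap.read()
while ok:
    deidentified = blur_faces(frame)
    # ... write `deidentified` to an output file or annotation queue here ...
    ok, frame = cap.read()
cap.release()
```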

Ensuring Reliability and Reproducibility

An intrinsic challenge when incorporating human workers into precision health workflows is the variability in human responses, both within workers and between workers. I have found that while most crowd workers are inconsistent in their annotation patterns, there are workers who provide consistently sensitive and specific annotations across a wide spectrum of data points [ 67 ]. It is therefore critical to measure both internal consistency and consistency against a gold standard when recruiting crowd workers for precision health care workflows.

Handling Financial Constraints

The crowdsourcing method with the lowest setup barriers is paid crowdsourcing. In such scenarios, financial constraints can limit the scalability of crowdsourcing workflows. One approach is to migrate from a paid system to a gamified system or another means of providing intrinsic motivation to crowd workers. However, achieving critical mass for large-scale pipelines is likely unattainable for such unpaid solutions. Paid crowd workers who consistently perform well could be recruited as full-time or long-term part-time employees for companies and organizations providing crowd-powered services. Integrating such workflows into a Food and Drug Administration (FDA)–approved process can be challenging, but it is worth exploring if it turns out that crowd-powered solutions for digital psychiatry continue to remain superior to pure-artificial intelligence (AI) approaches in the coming years.

Translation Outside of Research Contexts

While pure machine learning approaches for precision health are beginning to translate to clinical settings through formal FDA approval procedures, the prospect of translating human-in-the-loop methods that integrate crowd workers rather than expert clinicians is daunting, especially in light of the challenges mentioned above. However, if such approaches lead to clinical-grade performance for certain conditions that are challenging to diagnose using machine learning alone, then the extra implementation and regulatory effort required to migrate these methods into production-level workflows are likely to be warranted.

While machine learning for health has enabled and will continue to enable more efficient, precise, and scalable diagnostics for a variety of conditions, such models are unlikely to generalize to more difficult scenarios such as psychiatry and the behavioral sciences, which require the ability to identify complex and nuanced social human behavior. Crowd-powered human-in-the-loop workflows have the potential to mitigate some of these current limitations while still offering a high degree of automation. I invite researchers in the fields of digital phenotyping [ 72 - 76 ], mobile sensing [ 77 - 83 ], affective computing [ 84 - 90 ], and related subjects to consider integrating crowdsourcing and human-in-the-loop approaches into their methods when pure-AI leads to suboptimal performance.

Acknowledgments

This project is funded by the NIH Director’s New Innovator Award (DP2) from the National Institutes of Health (award DP2-EB035858).

Conflicts of Interest

None declared.

  • Howe J. The rise of crowdsourcing. Wired. Jun 01, 2006. URL: https://sistemas-humano-computacionais.wdfiles.com/local--files/capitulo%3Aredes-sociais/Howe_The_Rise_of_Crowdsourcing.pdf [accessed 2024-03-29]
  • Brabham DC, Ribisl KM, Kirchner TR, Bernhardt JM. Crowdsourcing applications for public health. Am J Prev Med. 2014;46(2):179-187. [ CrossRef ] [ Medline ]
  • Leung GM, Leung K. Crowdsourcing data to mitigate epidemics. Lancet Digit Health. 2020;2(4):e156-e157. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Stockham N, Washington P, Chrisman B, Paskov K, Jung JY, Wall DP. Causal modeling to mitigate selection bias and unmeasured confounding in internet-based epidemiology of COVID-19: model development and validation. JMIR Public Health Surveill. 2022;8(7):e31306. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kraemer JD, Strasser AA, Lindblom EN, Niaura RS, Mays D. Crowdsourced data collection for public health: a comparison with nationally representative, population tobacco use data. Prev Med. 2017;102:93-99. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Jakositz S, Pillsbury L, Greenwood S, Fahnestock M, McGreavy B, Bryce J, et al. Protection through participation: crowdsourced tap water quality monitoring for enhanced public health. Water Res. 2020;169:115209. [ CrossRef ] [ Medline ]
  • Ghenai A, Mejova Y. Catching Zika fever: application of crowdsourcing and machine learning for tracking health misinformation on Twitter. arXiv. Preprint posted online Jul 12, 2017. [ CrossRef ]
  • Dasgupta N, Freifeld C, Brownstein JS, Menone CM, Surratt HL, Poppish L, et al. Crowdsourcing black market prices for prescription opioids. J Med Internet Res. 2013;15(8):e178. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wazny K. "Crowdsourcing" ten years in: a review. J Glob Health. 2017;7(2):020602. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen PHC, Liu Y, Peng L. How to develop machine learning models for healthcare. Nat Mater. 2019;18(5):410-414. [ CrossRef ] [ Medline ]
  • Dua S, Acharya UR, Dua P, editors. Machine Learning in Healthcare Informatics, Volume 56. Berlin, Heidelberg. Springer; 2014.
  • Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24-29. [ CrossRef ] [ Medline ]
  • Ghassemi M, Naumann T, Schulam P, Beam AL, Chen IY, Ranganath R. A review of challenges and opportunities in machine learning for health. AMIA Jt Summits Transl Sci Proc. 2020;2020:191-200. [ FREE Full text ] [ Medline ]
  • Shailaja K, Seetharamulu B, Jabbar MA. Machine learning in healthcare: a review. 2018. Presented at: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA); March 29-31, 2018;910-914; Coimbatore, India. [ CrossRef ]
  • Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. 2018;2(10):719-731. [ CrossRef ] [ Medline ]
  • Das R, Keep B, Washington P, Riedel-Kruse IH. Scientific discovery games for biomedical research. Annu Rev Biomed Data Sci. 2019;2(1):253-279. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dos Reis FJC, Lynn S, Ali HR, Eccles D, Hanby A, Provenzano E, et al. Crowdsourcing the general public for large scale molecular pathology studies in cancer. EBioMedicine. 2015;2(7):681-689. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mavandadi S, Dimitrov S, Feng S, Yu F, Sikora U, Yaglidere O, et al. Distributed medical image analysis and diagnosis through crowd-sourced games: a malaria case study. PLoS One. 2012;7(5):e37245. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ozcan A. Educational games for malaria diagnosis. Sci Transl Med. 2014;6(233):233ed9. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Feng S, Woo M, Chandramouli K, Ozcan A. A game-based platform for crowd-sourcing biomedical image diagnosis and standardized remote training and education of diagnosticians. In: Optics and Biophotonics in Low-Resource Settings, Volume 9314. 2015. Presented at: SPIE BIOS; February 7-12, 2015; San Francisco, CA. [ CrossRef ]
  • Luengo-Oroz MA, Arranz A, Frean J. Crowdsourcing malaria parasite quantification: an online game for analyzing images of infected thick blood smears. J Med Internet Res. 2012;14(6):e167. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Cooper S, Khatib F, Makedon I, Lu H, Barbero J, Baker D, et al. Analysis of social gameplay macros in the Foldit cookbook. 2011. Presented at: FDG '11: Proceedings of the 6th International Conference on Foundations of Digital Games; June 29, 2011 - July 1, 2011;9-14; Bordeaux, France. [ CrossRef ]
  • Curtis V. Motivation to participate in an online citizen science game: a study of Foldit. Sci Commun. 2015;37(6):723-746. [ CrossRef ]
  • Eiben CB, Siegel JB, Bale JB, Cooper S, Khatib F, Shen BW, et al. Increased Diels-Alderase activity through backbone remodeling guided by Foldit players. Nat Biotechnol. 2012;30(2):190-192. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kleffner R, Flatten J, Leaver-Fay A, Baker D, Siegel JB, Khatib F, et al. Foldit standalone: a video game-derived protein structure manipulation interface using Rosetta. Bioinformatics. 2017;33(17):2765-2767. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Andreasson JOL, Gotrik MR, Wu MJ, Wayment-Steele HK, Kladwang W, Portela F, et al. Crowdsourced RNA design discovers diverse, reversible, efficient, self-contained molecular switches. Proc Natl Acad Sci U S A. 2022;119(18):e2112979119. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Krüger A, Watkins AM, Wellington-Oguri R, Romano J, Kofman C, DeFoe A, et al. Community science designed ribosomes with beneficial phenotypes. Nat Commun. 2023;14(1):961. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lee J, Kladwang W, Lee M, Cantu D, Azizyan M, Kim H, et al. RNA design rules from a massive open laboratory. Proc Natl Acad Sci U S A. 2014;111(6):2122-2127. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Shi J, Das R, Pande VS. SentRNA: improving computational RNA design by incorporating a prior of human design strategies. arXiv. Preprint posted online Mar 06, 2019. [ CrossRef ]
  • Treuille A, Das R. Scientific rigor through videogames. Trends Biochem Sci. 2014;39(11):507-509. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wayment-Steele HK, Kladwang W, Strom AI, Lee J, Treuille A, Becka A, et al. RNA secondary structure packages evaluated and improved by high-throughput experiments. Nat Methods. 2022;19(10):1234-1242. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sweeney TE, Braviak L, Tato CM, Khatri P. Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis. Lancet Respir Med. 2016;4(3):213-224. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kalantarian H, Jedoui K, Washington P, Wall DP. A mobile game for automatic emotion-labeling of images. IEEE Trans Games. 2020;12(2):213-218. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kalantarian H, Washington P, Schwartz J, Daniels J, Haber N, Wall D. A gamified mobile system for crowdsourcing video for autism research. 2018. Presented at: 2018 IEEE International Conference on Healthcare Informatics (ICHI); June 4-7, 2018;350-352; New York, NY. [ CrossRef ]
  • Penev Y, Dunlap K, Husic A, Hou C, Washington P, Leblanc E, et al. A mobile game platform for improving social communication in children with autism: a feasibility study. Appl Clin Inform. 2021;12(5):1030-1040. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kalantarian H, Washington P, Schwartz J, Daniels J, Haber N, Wall DP. Guess what? Towards understanding autism from structured video using facial affect. J Healthc Inform Res. 2019;3(1):43-66. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kalantarian H, Jedoui K, Washington P, Tariq Q, Dunlap K, Schwartz J, et al. Labeling images with facial emotion and the potential for pediatric healthcare. Artif Intell Med. 2019;98:77-86. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hou C, Kalantarian H, Washington P, Dunlap K, Wall DP. Leveraging video data from a digital smartphone autism therapy to train an emotion detection classifier. medRxiv. Preprint posted online Aug 01, 2021. [ CrossRef ]
  • Kalantarian H, Jedoui K, Dunlap K, Schwartz J, Washington P, Husic A, et al. The performance of emotion classifiers for children with parent-reported autism: quantitative feasibility study. JMIR Ment Health. 2020;7(4):e13174. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Kalantarian H, Kent J, Husic A, Kline A, Leblanc E, et al. Improved digital therapy for developmental pediatrics using domain-specific artificial intelligence: machine learning study. JMIR Pediatr Parent. 2022;5(2):e26760. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Kalantarian H, Kent J, Husic A, Kline A, Leblanc E, et al. Training an emotion detection classifier using frames from a mobile therapeutic game for children with developmental disorders. arXiv. Preprint posted online Dec 16, 2020. [ CrossRef ]
  • Varma M, Washington P, Chrisman B, Kline A, Leblanc E, Paskov K, et al. Identification of social engagement indicators associated with autism spectrum disorder using a game-based mobile app: comparative study of gaze fixation and visual scanning methods. J Med Internet Res. 2022;24(2):e31830. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chi NA, Washington P, Kline A, Husic A, Hou C, He C, et al. Classifying autism from crowdsourced semistructured speech recordings: machine learning model comparison study. JMIR Pediatr Parent. 2022;5(2):e35406. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lakkapragada A, Kline A, Mutlu OC, Paskov K, Chrisman B, Stockham N, et al. The classification of abnormal hand movement to aid in autism detection: machine learning study. JMIR Biomed Eng. 2022;7(1):e33771. [ FREE Full text ] [ CrossRef ]
  • Washington P, Kline A, Mutlu OC, Leblanc E, Hou C, Stockham N, et al. Activity recognition with moving cameras and few training examples: applications for detection of autism-related headbanging. 2021. Presented at: CHI '21: CHI Conference on Human Factors in Computing Systems; May 8-13, 2021;1-7; Yokohama, Japan. [ CrossRef ]
  • Paolacci G, Chandler J, Ipeirotis PG. Running experiments on Amazon Mechanical Turk. Judgm Decis Mak. 2010;5(5):411-419. [ FREE Full text ] [ CrossRef ]
  • Ipeirotis PG, Provost F, Wang J. Quality management on Amazon Mechanical Turk. 2010. Presented at: HCOMP '10: Proceedings of the ACM SIGKDD Workshop on Human Computation; July 25, 2010;64-67; Washington DC. [ CrossRef ]
  • Lee YJ, Arida JA, Donovan HS. The application of crowdsourcing approaches to cancer research: a systematic review. Cancer Med. 2017;6(11):2595-2605. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ross J, Zaldivar A, Irani L, Tomlinson B. Who are the turkers? Worker demographics in Amazon Mechanical Turk. ResearchGate. 2009. URL: https://www.researchgate.net/publication/268427703_Who_are_the_Turkers_Worker_Demographics_in_Amazon_Mechanical_Turk [accessed 2024-03-22]
  • Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115:211-252. [ CrossRef ]
  • Sheng VS, Zhang J. Machine learning with crowdsourcing: a brief summary of the past research and future directions. Proc AAAI Conf Artif Intell. 2019;33(01):9837-9843. [ CrossRef ]
  • Sorokin A, Forsyth D. Utility data annotation with Amazon Mechanical Turk. 2008. Presented at: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; June 23-28, 2008;1-8; Anchorage, AK. [ CrossRef ]
  • Xintong G, Hongzhi W, Song Y, Hong G. Brief survey of crowdsourcing for data mining. Expert Syst Appl. 2014;41(17):7987-7994. [ CrossRef ]
  • Irani LC, Silberman MS. Turkopticon: interrupting worker invisibility in Amazon Mechanical Turk. 2013. Presented at: CHI '13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; April 27, 2013 - May 2, 2013;611-620; Paris, France. [ CrossRef ]
  • Irani L, Silberman MS. From critical design to critical infrastructure: lessons from turkopticon. Interactions. 2014;21(4):32-35. [ CrossRef ]
  • Kummerfeld JK. Quantifying and avoiding unfair qualification labour in crowdsourcing. arXiv. Preprint posted online May 26, 2021. [ CrossRef ]
  • Hansson K, Ludwig T. Crowd dynamics: conflicts, contradictions, and community in crowdsourcing. Comput Support Coop Work. 2019;28:791-794. [ FREE Full text ] [ CrossRef ]
  • Shen XL, Lee MKO, Cheung CMK. Exploring online social behavior in crowdsourcing communities: a relationship management perspective. Comput Human Behav. 2014;40:144-151. [ CrossRef ]
  • Wu W, Gong X. Motivation and sustained participation in the online crowdsourcing community: the moderating role of community commitment. Internet Res. 2021;31(1):287-314. [ CrossRef ]
  • Brady CJ, Villanti AC, Pearson JL, Kirchner TR, Gupta OP, Shah C. Rapid grading of fundus photographs for diabetic retinopathy using crowdsourcing. J Med Internet Res. 2014;16(10):e233. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Mitry D, Peto T, Hayat S, Blows P, Morgan J, Khaw KT, et al. Crowdsourcing as a screening tool to detect clinical features of glaucomatous optic neuropathy from digital photography. PLoS One. 2015;10(2):e0117401. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Cheng J, Manoharan M, Zhang Y, Lease M. Is there a doctor in the crowd? Diagnosis needed! (For less than $5). iConference 2015 Proceedings. 2015. URL: https://www.ideals.illinois.edu/items/73844 [accessed 2024-03-22]
  • Leblanc E, Washington P, Varma M, Dunlap K, Penev Y, Kline A, et al. Feature replacement methods enable reliable home video analysis for machine learning detection of autism. Sci Rep. 2020;10(1):21245. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tariq Q, Daniels J, Schwartz JN, Washington P, Kalantarian H, Wall DP. Mobile detection of autism through machine learning on home video: a development and prospective validation study. PLoS Med. 2018;15(11):e1002705. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tariq Q, Fleming SL, Schwartz JN, Dunlap K, Corbin C, Washington P, et al. Detecting developmental delay and autism through machine learning models using home videos of Bangladeshi children: development and validation study. J Med Internet Res. 2019;21(4):e13822. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Tariq Q, Leblanc E, Chrisman B, Dunlap K, Kline A, et al. Crowdsourced privacy-preserved feature tagging of short home videos for machine learning ASD detection. Sci Rep. 2021;11(1):7620. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Leblanc E, Dunlap K, Penev Y, Kline A, Paskov K, et al. Precision telemedicine through crowdsourced machine learning: testing variability of crowd workers for video-based autism feature recognition. J Pers Med. 2020;10(3):86. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Leblanc E, Dunlap K, Penev Y, Varma M, Jung JY, et al. Selection of trustworthy crowd workers for telemedical diagnosis of pediatric autism spectrum disorder. Pac Symp Biocomput. 2021;26:14-25. [ FREE Full text ] [ Medline ]
  • Washington P, Kalantarian H, Tariq Q, Schwartz J, Dunlap K, Chrisman B, et al. Validity of online screening for autism: crowdsourcing study comparing paid and unpaid diagnostic tasks. J Med Internet Res. 2019;21(5):e13668. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Chrisman B, Leblanc E, Dunlap K, Kline A, Mutlu C, et al. Crowd annotations can approximate clinical autism impressions from short home videos with privacy protections. Intell Based Med. 2022;6:100056. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Insel TR. Digital phenotyping: technology for a new science of behavior. JAMA. 2017;318(13):1215-1216. [ CrossRef ] [ Medline ]
  • Huckvale K, Venkatesh S, Christensen H. Toward clinical digital phenotyping: a timely opportunity to consider purpose, quality, and safety. NPJ Digit Med. 2019;2:88. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Onnela JP. Opportunities and challenges in the collection and analysis of digital phenotyping data. Neuropsychopharmacology. 2021;46(1):45-54. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Onnela JP, Rauch SL. Harnessing smartphone-based digital phenotyping to enhance behavioral and mental health. Neuropsychopharmacology. 2016;41(7):1691-1696. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Washington P, Mutlu CO, Kline A, Paskov K, Stockham NT, Chrisman B, et al. Challenges and opportunities for machine learning classification of behavior and mental state from images. arXiv. Preprint posted online Jan 26, 2022. [ CrossRef ]
  • Baumeister H, Montag C, editors. Digital Phenotyping and Mobile Sensing: New Developments in Psychoinformatics. Cham, Switzerland. Springer International Publishing; 2019.
  • Laport-López F, Serrano E, Bajo J, Campbell AT. A review of mobile sensing systems, applications, and opportunities. Knowl Inf Syst. 2020;62(1):145-174. [ CrossRef ]
  • Macias E, Suarez A, Lloret J. Mobile sensing systems. Sensors (Basel). 2013;13(12):17292-17321. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nazir S, Ali Y, Ullah N, García-Magariño I. Internet of things for healthcare using effects of mobile computing: a systematic literature review. Wirel Commun Mob Comput. 2019;2019:1-20. [ FREE Full text ] [ CrossRef ]
  • Silva BMC, Rodrigues JJPC, de la Torre Díez I, López-Coronado M, Saleem K. Mobile-health: a review of current state in 2015. J Biomed Inform. 2015;56:265-272. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sim I. Mobile devices and health. N Engl J Med. 2019;381(10):956-968. [ CrossRef ]
  • Yürür O, Liu CH, Sheng Z, Leung VCM, Moreno W, Leung KK. Context-awareness for mobile sensing: a survey and future directions. IEEE Commun Surv Tutor. 2014;18(1):68-93. [ CrossRef ]
  • Picard RW. Affective Computing. Cambridge, MA. MIT Press; 2000.
  • Picard RW. Affective computing: challenges. Int J Hum-Comput Stud. 2003;59(1-2):55-64. [ CrossRef ]
  • Poria S, Cambria E, Bajpai R, Hussain A. A review of affective computing: from unimodal analysis to multimodal fusion. Inf Fusion. 2017;37:98-125. [ CrossRef ]
  • Scherer KR, Bänziger T, Roesch E, editors. A Blueprint for Affective Computing: a Sourcebook and Manual. Oxford, United Kingdom. Oxford University Press; 2010.
  • Tao J, Tan T. Affective computing: a review. In: Affective Computing and Intelligent Interaction. Berlin, Heidelberg. Springer; 2005. Presented at: First International Conference, ACII 2005; October 22-24, 2005;981-995; Beijing, China. [ CrossRef ]
  • Wang Y, Song W, Tao W, Liotta A, Yang D, Li X, et al. A systematic review on affective computing: emotion models, databases, and recent advances. Inf Fusion. 2022;83-84:19-52. [ CrossRef ]
  • Zhao S, Wang S, Soleymani M, Joshi D, Ji Q. Affective computing for large-scale heterogeneous multimedia data: a survey. ACM Trans Multimed Comput Commun Appl. 2019;15(3s):1-32. [ CrossRef ]

Abbreviations

Edited by A Mavragani; submitted 22.07.23; peer-reviewed by E Vashishtha, MO Khursheed, L Guo; comments to author 02.09.23; revised version received 15.11.23; accepted 30.01.24; published 11.04.24.

©Peter Washington. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 11.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

