Understanding Time Series Analysis: Patterns, Autocorrelation, and Forecasting


A time series is a finite realization of a stochastic process, that is, of a sequence of random variables indexed by time.


The goal is to study the behaviour of a phenomenon over time, relying only on information from its own past and present, and to predict future values.


We analyze patterns and noise. Without further assumptions it is not possible to analyze a time series, because at each moment we observe a single value of a random variable whose distribution may differ from one moment to the next.



A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear.


Seasonality is a regularly repeating pattern of highs and lows related to calendar time, such as seasons.

Cyclical components:

Rises and falls that usually do not have a fixed period; their length can vary over time.

Constant variance (error):

Corresponds to random fluctuations that cannot be explained by a deterministic pattern.


Observations in a time series tend to be correlated. The data has memory: observations today are affected by what happened in the past. The lag-1 autocorrelation is the correlation between the value of the time series at time t and its value at time t-1; autocorrelations at higher lags are defined in the same way. Partial autocorrelation is also computed at different lags, but after removing the effect of the intermediate observations.


Model-based (AR,MA,ARMA,ARIMA):

Use a statistical, mathematical or other model to approximate the time series. Useful for short time series. Preferable for forecasting series with global patterns. Suitable for computing prediction intervals.

Data-driven (exponential smoothing methods and decomposition methods):

The algorithm learns patterns from the data. Useful when the model assumptions are likely to be violated. Requires less input from the user. Preferable for series with local patterns.


A model that fits the data well does not necessarily forecast well. A perfect fit can always be obtained by using a model with enough parameters. Overfitting a model to data is as bad as failing to identify the systematic pattern in the data.


Unlike cross-sectional data, the training set consists only of observations that occurred prior to the observation that forms the test set. Thus, no future observations can be used in constructing the forecast. However, it is not possible to get a reliable forecast based on a very small training set, so the earliest observations are not considered as test sets.
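The evaluation scheme described above can be sketched as a rolling forecasting origin. This is a minimal illustration in plain Python; the function name rolling_origin_splits and the parameters min_train and horizon are hypothetical names chosen here, not part of any library.

```python
def rolling_origin_splits(n_obs, min_train, horizon=1):
    """Yield (train_indices, test_index) pairs for time series evaluation.

    Each training set contains only observations that occur before the
    test observation, and the earliest `min_train` observations are never
    used as test points.
    """
    for end in range(min_train, n_obs - horizon + 1):
        train = list(range(end))      # observations 0 .. end-1
        test = end + horizon - 1      # the observation `horizon` steps ahead
        yield train, test


# Example: 8 observations, at least 5 in every training set.
for train, test in rolling_origin_splits(n_obs=8, min_train=5):
    print("train on", train, "-> test on", test)
```

The forecast errors collected over all the splits can then be averaged to compare methods.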

RESIDUAL ANALYSIS– Properties of the residuals:

The residuals should be uncorrelated: if there are correlations between residuals, then there is information left in the residuals that should be used in computing forecasts. The residuals should have zero mean. It is also useful for the residuals to have constant variance and a normal distribution.


Smoothing is usually done to help us see patterns better. It can be a good first step in describing the various components of the series. Smoothing is also a way to extrapolate: forecasts are built so that recent data points carry more weight than older values. The models have deterministic parameters that should be calibrated to suit the evolution of the series. The model is fitted by recursive methods.

Simple exponential smoothing:

For series with no trend or seasonality. It uses one parameter, 0 ≤ α ≤ 1. The idea: forecast future values using a weighted average of all previous values in the series. Assumption: the series has only level (L_t = ŷ_{t+1|t}) and noise (unpredictable).
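A minimal sketch of the recursion behind simple exponential smoothing, assuming the level is initialised with the first observation (one common choice among several). The function name and the initial_level argument are hypothetical.

```python
def simple_exp_smoothing(series, alpha, initial_level=None):
    """Return (one-step-ahead forecasts, forecast for the next period).

    The level is updated as: level = alpha * y + (1 - alpha) * level,
    so older observations receive geometrically decreasing weights.
    """
    level = series[0] if initial_level is None else initial_level
    fitted = []
    for y in series:
        fitted.append(level)                      # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level   # update the level with y
    return fitted, level


fitted, next_forecast = simple_exp_smoothing([10, 12, 11, 13], alpha=0.5)
print(fitted, next_forecast)
```

A value of α close to 1 makes the forecast follow the latest observation closely; a value close to 0 makes it react slowly.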

Holt double exponential smoothing:

For series with trend and no seasonal variation. It uses two parameters, α and β. If α ≈ 0 and β ≈ 0, the evolution of the time series is near constant. If β ≈ 0, the slope is near constant. Assumption: the series has only level (L_t), trend (T_t) and noise (unpredictable).
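A minimal sketch of Holt's recursion with a level and a trend. It assumes the level starts at the first observation and the trend at the first difference, which is only one possible initialisation; the function name holt_linear is hypothetical.

```python
def holt_linear(series, alpha, beta, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # New level: mix of the observation and the previous projection.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # New trend: mix of the latest change in level and the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


print(holt_linear([2, 4, 6, 8, 10], alpha=0.5, beta=0.5, horizon=3))
```

On a perfectly linear series the forecasts simply continue the line, whatever the values of α and β.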

Holt-Winters triple exponential smoothing:

For series with linear trend and seasonal variation. It uses three parameters, α, β and γ. Assumption: the series has level (L_t), trend (T_t), seasonality with M seasons and noise (unpredictable). In R: ets().
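A minimal sketch of the additive Holt-Winters recursion, written in plain Python for illustration. It assumes at least two full seasons of data, a crude initialisation (level from the mean of the first season, trend from the change between the first two seasons, seasonal effects from the first season) and the classical form of the seasonal update. The function name is hypothetical, and a fitted model in statistical software will generally choose the parameters and initial states differently.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Forecast with additive level + trend + seasonality (m seasons)."""
    first, second = series[:m], series[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)   # average change per step
    seasonal = [y - level for y in first]          # one effect per season

    for t in range(m, len(series)):
        y = series[t]
        s_old = seasonal[t % m]
        prev_level = level
        level = alpha * (y - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s_old

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


print(holt_winters_additive([10, 20, 10, 20, 10, 20], m=2,
                            alpha=0.5, beta=0.5, gamma=0.5, horizon=4))
```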


(In R: decompose().) There are two basic structures. Additive decomposition: Observed = Trend + Seasonality + Irregular. Multiplicative decomposition: Observed = Trend * Seasonality * Irregular, which is equivalent to the additive decomposition after a log transformation. The function stl() allows us to include some non-seasonal models to forecast the seasonally adjusted component (trend + error).
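The additive structure can be illustrated with a classical moving-average decomposition. This is a sketch under stated assumptions: the trend is estimated with a centred moving average of length m, the seasonal effects are the average detrended values per season (centred to sum to zero), and at least two full seasons are available. The function names are hypothetical, and the result is not claimed to match the output of any particular software.

```python
def centered_moving_average(series, m):
    """Trend estimate; None where the window does not fit."""
    n, half = len(series), m // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half: t + half + 1]
        if m % 2 == 1:
            trend[t] = sum(window) / m
        else:
            # Even m: half weight on the two end points (a 2 x m average).
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / m
    return trend


def decompose_additive(series, m):
    """Split a series into trend, seasonal and irregular parts."""
    trend = centered_moving_average(series, m)
    sums, counts = [0.0] * m, [0] * m
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            sums[t % m] += y - tr
            counts[t % m] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / m
    pattern = [r - mean_raw for r in raw]          # effects sum to zero
    seasonal = [pattern[t % m] for t in range(len(series))]
    irregular = [None if tr is None else y - tr - s
                 for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, irregular


pattern = [3, -1, -3, 1]
series = [t + pattern[t % 4] for t in range(12)]
trend, seasonal, irregular = decompose_additive(series, m=4)
print(trend[2:10], seasonal[:4])
```

For a multiplicative structure, the same function could be applied to the logarithm of a strictly positive series.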


Observations in a time series tend to be correlated. The data has memory: observations today are affected by what happened in the past. Autocorrelation: computes the correlation coefficient between the value of the time series at time t and its value at time t − 1. This is r1, the autocorrelation coefficient at lag 1. But the memory of the data might go back further than the observation at time t − 1, so autocorrelations are also computed at higher lags. Partial autocorrelation: also computed at different lags, but after removing the effect of the intermediate observations.
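A minimal sketch of the sample autocorrelation coefficients r_k, using the usual estimator that divides every lag by the lag-0 sum of squares. The function name acf is chosen here for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0, r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((y - mean) ** 2 for y in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


print(acf([1, 2, 3, 4, 5], max_lag=2))
```

Plotting these values against the lag k gives the correlogram.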


Stationary Time Series:

The mean (trend), variance (wave amplitude) and autocorrelation structure do not change over time. This means a flat-looking series, without trend, with constant variance over time, with constant autocorrelation structure and no periodic fluctuations (seasonality). The future is similar to the past, in a probabilistic sense. A time series is stationary if its statistical properties are constant over time.
Strictly stationary if P(Y_{t+1}, …, Y_{t+k}) = P(Y_{t+1+s}, …, Y_{t+k+s}) for all t > 0, k > 0, s > 0. Weakly stationary if the mean is constant and does not depend on the time t, the variance is finite and constant, and the autocovariance function depends only on the time difference s.

Non-Stationary Time Series:

when the mean, variance and/or the relationships between equally spaced data change over time. Seasonal fluctuations are considered a kind of non-stationarity.


If the series presents non-constant variance that increases with the level of the series: apply a log transformation (transformations such as logarithms can help to stabilize the variance of a time series when its variability increases or decreases with the level). If the series presents a linear trend: take one regular difference. If the series presents a quadratic trend: take two regular differences. If the series presents a seasonal component: take seasonal differences.


(In R: diff().) Differencing consists of transforming the series into a new one whose values are the differences between consecutive values. This procedure may be applied more than once. Some forecasting methods cannot really deal with trend and/or seasonality. Differencing is a simple and popular method for removing a trend and/or seasonality from a time series. It is suitable when the trend and seasonal patterns are global as well as when they are local: because we are taking local differences, we can handle both types.

First-order differencing:

This creates a new series by taking the differences between consecutive observations. First differences are the change between one observation and the next. The differenced series will have only T-1 values, since it is not possible to calculate a difference for the first observation.

Second-order differencing:

Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time to obtain a stationary series.

Seasonal differencing:

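A seasonal difference is the difference between a value and the value one season earlier, at lag s, where s is the number of observations in a season (more generally, at a lag that is a multiple of s). For data with a yearly season, seasonal differences are the change from one year to the next.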
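The three kinds of differencing can be sketched with a single helper. The function name difference and its lag argument are chosen here for illustration.

```python
def difference(series, lag=1):
    """Return the series of differences y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


linear = [3, 5, 7, 9, 11]
print(difference(linear))                      # first-order differencing

quadratic = [1, 4, 9, 16, 25]
print(difference(difference(quadratic)))       # second-order differencing

quarterly = [10, 20, 30, 40, 12, 22, 32, 42]
print(difference(quarterly, lag=4))            # seasonal differencing, s = 4
```

One difference turns the linear trend into a constant, two differences do the same for the quadratic trend, and the seasonal difference removes the repeating quarterly pattern. Each application shortens the series by lag values.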

Differencing for trend and seasonality:

When trend and seasonality are present, we may need to apply both a non-seasonal first difference and a seasonal difference. The optimal order of differencing is often the order of differencing at which the standard deviation is lowest.

INTEGRATED PROCESS– An integrated process is a process that is not stationary but follows a stationary process after some regular differences are taken. An integrated process of order h needs to be differenced h times to become a stationary process.

STATIONARY PROCESS: WHITE NOISE– A white noise process {a_t} has no structure and no correlations. Its properties are: zero mean, constant variance, and correlations between observations at lags 1, 2, 3, … equal to 0. If we add the property that the distribution of the variables is Normal, we get a Gaussian white noise process (independent normal variables with common variance).

LAG– Lag means how far apart the values we are using to compute the correlation are.

CORRELOGRAMS INTERPRETATION–
The correlogram of a stationary time series goes to 0 quickly.

Correlations are significant when the spikes go beyond the confidence bounds, above or below.

If all spikes are within the bounds, there is no correlation structure.

A slow decrease of the autocorrelations indicates that the series is not stationary.

A slow decrease of the autocorrelations at the seasonal lags indicates that the series is non-stationary with seasonality. Values in the same season tend to be correlated.

For non-stationary data, the value of the autocorrelation coefficient at lag-1 is often large and positive.

A large positive spike at lag-1 is called stickiness. It means that high values follow high values and low values follow low values.

A large negative spike at lag-1 is called swings. It means that the series swings between high and low values.
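Reading a correlogram against its bounds can be sketched as follows. The sketch assumes the usual approximate 95% bounds for white noise, ±1.96/√n, where n is the number of observations; the function names are hypothetical.

```python
import math


def white_noise_bounds(n_obs, z=1.96):
    """Approximate 95% bound for the autocorrelations of white noise."""
    return z / math.sqrt(n_obs)


def significant_lags(acf_values, n_obs):
    """Lags k >= 1 whose autocorrelation falls outside the bounds.

    `acf_values[k]` is assumed to hold the autocorrelation at lag k,
    with acf_values[0] == 1.
    """
    bound = white_noise_bounds(n_obs)
    return [k for k, r in enumerate(acf_values) if k > 0 and abs(r) > bound]


print(significant_lags([1.0, 0.5, 0.1, -0.3], n_obs=100))
```

An empty list corresponds to the case where all spikes are inside the bounds.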


First, check whether a series is stationary. Once we have a stationary series, we look for the remaining patterns or information.


Observations in a time series tend to be correlated.
The data has memory: observations today are affected by what happened in the past. The autocorrelation coefficient measures the degree of (linear) association between the variables that are separated by k points in time. These r_k coefficients at lags k are plotted in the ACF correlogram. The partial autocorrelation coefficient measures the dependency between observations separated by k points in time, after removing the effect of the variables in between.
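One standard way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion. The sketch below is an illustration in plain Python; the function name pacf_from_acf is hypothetical, and the input is assumed to be a list rho with rho[0] = 1 and rho[k] the autocorrelation at lag k.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations at lags 1..K from autocorrelations rho[0..K]."""
    pacf = []
    phi_prev = []   # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        if k == 1:
            phi_kk = rho[1]
            phi_curr = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append phi_kk.
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                        for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


# Autocorrelations 0.5, 0.25, 0.125 decay geometrically, as for an AR(1).
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125]))
```

In this example only the first partial autocorrelation is non-zero: once the lag-1 dependence is accounted for, nothing is left at lags 2 and 3.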


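Drift: the forecast extends the straight line joining the first and the last observation. Naive: the forecast equals the last observation; in its seasonal version, the forecast equals the last observed value from the same season. Mean: the forecast equals the historical mean; it is usually the worst of these benchmarks.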
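The benchmark methods can be sketched in a few lines each. The function names are chosen here for illustration; m denotes the number of observations in a season.

```python
def mean_forecast(series, horizon):
    """Every forecast equals the historical mean."""
    return [sum(series) / len(series)] * horizon


def naive_forecast(series, horizon):
    """Every forecast equals the last observation."""
    return [series[-1]] * horizon


def seasonal_naive_forecast(series, m, horizon):
    """Each forecast equals the last observed value from the same season."""
    n = len(series)
    return [series[n - m + (h - 1) % m] for h in range(1, horizon + 1)]


def drift_forecast(series, horizon):
    """Extend the line joining the first and the last observation."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + h * slope for h in range(1, horizon + 1)]


y = [1, 2, 3, 4, 5, 6, 7, 8]
print(mean_forecast(y, 2))
print(naive_forecast(y, 2))
print(seasonal_naive_forecast(y, m=4, horizon=5))
print(drift_forecast(y, 2))
```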


ARIMA models are, in theory, the most general class of models for forecasting a time series which can be made to be stationary by differencing (if necessary), perhaps in conjunction with nonlinear transformations such as logging.


AR models capture autocorrelation in a series in a regression-type model and use it to improve short-term forecasts. An AR model is similar to a linear regression where the predictors are lagged versions of the series. It is estimated by the maximum likelihood method. Major assumption: the series is stationary, with no trend or seasonality and with constant level, variance and autocorrelations. The future is similar to the past (in a probabilistic sense).
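To make the regression analogy concrete, the sketch below simulates an AR(1) series y_t = c + φ·y_{t-1} + e_t and estimates c and φ by ordinary least squares of y_t on y_{t-1}. Least squares is used here only because it is short to write; it is a simpler stand-in for the maximum likelihood estimation mentioned above, not the same method. All function names are hypothetical.

```python
import random


def simulate_ar1(phi, n, c=0.0, sigma=1.0, seed=0):
    """Simulate n values of y_t = c + phi * y_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = [c / (1 - phi)]                      # start at the process mean
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0, sigma))
    return y


def fit_ar1_least_squares(series):
    """Regress y_t on y_{t-1}; return (intercept, slope)."""
    x, y = series[:-1], series[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


c_hat, phi_hat = fit_ar1_least_squares(simulate_ar1(phi=0.7, n=2000))
print(c_hat, phi_hat)
```

With a long simulated series the estimated slope should land close to the value of φ used in the simulation.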



ACF: exponential decay or damped sine wave (alternating positive and negative, decaying to zero), tailing off. It dies out gradually.


PACF: for an AR(1), a single positive spike (if the ACF decays exponentially) or a single negative spike (if the ACF alternates between positive and negative). In general it is zero after lag p: it displays a sharp cutoff after lag p. An AR series is usually positively autocorrelated at lag 1 (ACF).


ACF: several positive spikes, each smaller than the last; PACF: only one (or two) positive spikes. ||| ACF: negative, positive, negative, …, each smaller than the last; PACF: only one (or two) negative spikes. ||| ACF: 3 positive (one smaller than the others), 3 negative, 3 positive, …; PACF: 2 positive. ||| ACF: 2 negative, 1 positive, …; PACF: 2 negative.


A Moving Average model uses past forecast errors in a regression-type model. As we don’t observe the values of et, it is not a regression model in the usual sense. This model is used for forecasting NOT for estimating the trend-cycle component of past values.


ACF: zero after lag q. It displays a sharp cutoff after lag q.


PACF: it tails off with an exponential or sinusoidal pattern. It dies out more gradually. An MA series is usually negatively autocorrelated at lag 1.

MA PATTERN–
ACF: one positive spike; PACF: one positive, one negative, … ||| ACF: one negative spike; PACF: 4 negative, each smaller than the last. ||| ACF: 2 positive; PACF: 1 positive, 1 negative, 2 positive, 1 negative, 2 positive, … ||| ACF: 2 negative; PACF: many negative, each smaller than the last. ||| ACF: 1 positive, 1 negative; PACF: 1 positive, 1 negative, … ||| ACF: 1 negative and 1 positive; PACF: 3 negative, 3 positive, 3 negative, each smaller than the last.
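The sharp cutoff of the ACF after lag q can be checked from the theoretical autocorrelations of an MA(q) process. The sketch assumes the sign convention y_t = e_t + θ_1·e_{t-1} + … + θ_q·e_{t-q} (some texts write the θ terms with minus signs); the function name is hypothetical.

```python
def ma_theoretical_acf(thetas, max_lag):
    """Theoretical autocorrelations of an MA(q) with coefficients `thetas`."""
    weights = [1.0] + list(thetas)           # weight 1 on the current error
    variance = sum(w * w for w in weights)   # up to the factor sigma^2
    acf = []
    for k in range(max_lag + 1):
        # Products of weights k positions apart; empty (zero) when k > q.
        cov = sum(weights[i] * weights[i + k] for i in range(len(weights) - k))
        acf.append(cov / variance)
    return acf


print(ma_theoretical_acf([0.5], max_lag=3))    # MA(1), positive theta
print(ma_theoretical_acf([-0.5], max_lag=3))   # MA(1), negative theta
```

For an MA(1) the lag-1 autocorrelation is θ/(1 + θ²), and every autocorrelation beyond lag 1 is exactly zero.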


Whether a series displays AR or MA behaviour often depends on the extent to which it has been differenced. An underdifferenced series has an AR signature (positive autocorrelation). After one or more orders of differencing, the autocorrelation will become more negative and an MA signature will emerge.


The idea behind the ARMA model is to capture all forms of autocorrelation by including lags of the series and of the forecast errors. ARMA models are mainly used for forecasting; they are less interpretable than linear regression models. The current observation depends linearly on the last p observations and on the last q error terms (also called innovations). It is a combination of AR and MA models.


ACF Exponential decay from lag p + q 1, or damped sine. 


Exponential decreasing or sinusoidal pattern. Diminishing slowly. ARMA models (including both AR and MA terms) have ACFs and PACFs that both tail off to 0. These are the trickiest because the order will not be particularly obvious.


ACF: several positive spikes, decaying; PACF: one positive, one negative ||| ACF: one negative, one positive, …; PACF: one negative, one positive ||| ACF: 1 positive, 1 negative, …; PACF: 1 positive, 1 negative, … ||| ACF: 1 negative, 1 positive, …; PACF: several negative ||| ACF: several negative; PACF: several negative ||| ACF: several positive; PACF: several positive.


It implies that the dynamics of any (purely nondeterministic) covariance-stationary process can be arbitrarily well approximated by an ARMA process.

ARIMA TERMINOLOGY–
ARIMA(p, d, q) models combine ARMA models with differencing. I stands for Integrated process; d is the number of regular differences taken to make the process stationary. A non-seasonal ARIMA model can be (almost) completely summarized by three numbers: p = the number of autoregressive terms (the order of the autoregressive part); d = the number of non-seasonal differences involved; q = the number of moving-average terms (the order of the moving average part). This is called an ARIMA(p, d, q) model. The model may or may not include a constant term.


ACF (autocorrelation function):

In the first lags, the regular part is observed. In the seasonal lags, the seasonal part is observed. Around the seasonal lags, the regular part of the autocorrelation function is repeated on both sides of each seasonal lag. Specifically, if the regular part is a moving average of order q, on both sides of each non-null seasonal lag there will be q coefficients different from 0. If the regular part is autoregressive, we will observe the decay imposed by the AR structure on both sides of the seasonal lags.

PACF (Partial autocorrelation function):

The PACF of a multiplicative seasonal process is complex because it depends on the PACF of the regular and seasonal parts as well as on the ACF of the regular part. In the first lags, the PACF of the regular part is observed, and in the seasonal lags the PACF of the seasonal part appears. To the right of each seasonal coefficient, the PACF of the regular part will appear. If the seasonal coefficient is positive, the regular PACF appears inverted in sign. If it is negative, the regular PACF appears with its own sign. To the left of the seasonal coefficients, we observe the ACF of the regular part.


An ARIMA model with additional seasonal parameters. These factors operate across multiples of lag s (i.e., the number of observations in a season). X_t ~ ARIMA(p, d, q)(P, D, Q)_s, where d is the number of regular differences needed to obtain stationarity; p and q are the orders of the regular ARMA model; D is the number of seasonal differences needed to obtain stationarity; P and Q are the orders of the seasonal ARMA model.
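As a compact summary, the multiplicative seasonal model is often written with the backshift operator B, where B X_t = X_{t-1}. The form below is a sketch under stated assumptions: a series with no constant term, minus signs in the autoregressive polynomials and plus signs in the moving-average polynomials (other texts use different signs), and a_t denoting white noise. The symbols φ, Φ, θ and Θ are notation introduced here.

```latex
\begin{align*}
\Phi_P(B^s)\,\phi_p(B)\,(1-B)^d\,(1-B^s)^D\,X_t &= \Theta_Q(B^s)\,\theta_q(B)\,a_t \\
\phi_p(B) &= 1 - \phi_1 B - \dots - \phi_p B^p \\
\theta_q(B) &= 1 + \theta_1 B + \dots + \theta_q B^q \\
\Phi_P(B^s) &= 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps} \\
\Theta_Q(B^s) &= 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}
\end{align*}
```

Setting P = D = Q = 0 removes the seasonal factors and leaves the non-seasonal ARIMA(p, d, q) model.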


If our model successfully captures the dependence structure in the data, then the residuals should look random. There should be no dependence in the residuals, and we can check them for any left-over dependence. Correlations between the coefficients of the model should not be higher than 0.8.

RESIDUALS–
Residuals have to be unpredictable, like white noise. White noise is a stationary process without autocorrelation structure: all ACF and PACF coefficients are zero. The correlogram and partial correlogram should be very similar, with all individual correlations at different lags inside the confidence bands. The p-value of the Ljung-Box-Pierce statistic for the joint significance of the first h autocorrelations should be greater than 0.05.
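The Ljung-Box statistic itself is short to compute: Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, …, h. The sketch below computes Q only; turning it into a p-value requires a chi-square distribution function, which the Python standard library does not provide, so that step is left out. The function name is hypothetical.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic for the first h residual autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                  for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


print(ljung_box_q([1, 2, 3, 4, 5], h=1))
```

Large values of Q (a small p-value) indicate that the residuals still contain autocorrelation; for residuals of a fitted ARMA model the degrees of freedom are usually reduced by the number of estimated parameters.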


Any statistical software provides forecasts and forecast intervals for new values of ARIMA models. A few words about prediction in ARIMA models. Prediction uncertainty comes from three sources of error: innovations (errors) are random; the model may not be precisely identified; parameter estimates are random variables. An MA(1) model provides only one prediction different from the marginal mean. An ARMA(1, 1) model may provide infinitely many predictions different from the marginal mean.
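The last two statements can be illustrated with point forecasts computed from known parameters. The sketch assumes the models y_t = μ + e_t + θ·e_{t-1} for the MA(1) and y_t − μ = φ·(y_{t-1} − μ) + e_t + θ·e_{t-1} for the ARMA(1, 1), with the last observation and the last error treated as known. The function names and argument names are hypothetical.

```python
def ma1_forecasts(mu, theta, last_error, horizon):
    """Only the one-step forecast differs from the mean mu."""
    return [mu + theta * last_error if h == 1 else mu
            for h in range(1, horizon + 1)]


def arma11_forecasts(mu, phi, theta, last_value, last_error, horizon):
    """Forecasts approach the mean geometrically at rate phi."""
    forecasts = []
    deviation = phi * (last_value - mu) + theta * last_error
    for _ in range(horizon):
        forecasts.append(mu + deviation)
        deviation *= phi          # future errors are forecast as zero
    return forecasts


print(ma1_forecasts(mu=10, theta=0.5, last_error=2, horizon=3))
print(arma11_forecasts(mu=10, phi=0.5, theta=0.5,
                       last_value=12, last_error=2, horizon=3))
```

The MA(1) forecasts return to the mean after one step, while the ARMA(1, 1) forecasts keep approaching it without reaching it exactly.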