Time Series Analysis and Regression Modeling in R
R Setup and Initial Data Handling
Setting the working directory:
setwd("/Users/hajdumarcell/Downloads/Öko. II. R Jegyzet")
Data inspection and preparation:
str(Titanic)
PS4$Date <- as.Date(PS4$Date)
Basic visualization using ggplot2
:
ggplot(PS4, aes(x=Date, y=Google_PS4)) + geom_line()
Regression Modeling with Dummy Variables
The general regression model structure, including trend ($t$) and quarterly dummy variables ($DQ$):
$$Y_t = \beta_0 + \beta_1 \times t + \beta_2 \times DQ_1 + \beta_3 \times DQ_2 + \beta_4 \times DQ_3 + u_t$$
PS4$t <- 1:20
PS4$Quarter <- as.factor(PS4$Quarter)
PS4$Quarter <- relevel(PS4$Quarter, ref="Q4") # Reference coding
Visualizing Actual vs. Estimated Values
ggplot(PS4, aes(x=Date)) +
geom_line(aes(y = Google_PS4, color = "Actual Y_t"), size=1) +
geom_line(aes(y = Y_kalap, color = "Estimated Y_t"), size=1)
Handling Trend and Residuals
Saving the estimated trend and residuals for gas prices:
# Saving the estimated trend gas prices and residuals
GazAr$Trend <- TrendGazAr$fitted.values
GazAr$residual <- TrendGazAr$residuals
Time Series Decomposition Techniques
library(forecast)
ma(SmartTV$SmartTV_Google, order = 7)
Filtering Trend and Isolating Seasonal/Noise Components
If we now say that the TREND effects are embodied in the moving average (MA), then what is left in our new column SmartTV$TrendFiltered
is a SEASONAL × NOISE component!
SmartTV$TrendFiltered <- SmartTV$SmartTV_Google / SmartTV$MA12
From this TrendFiltered
column, we can easily filter out the NOISE component by averaging these values by the basic unit of seasonality (i.e., the months):
SeasonEffect <- aggregate(TrendFiltered ~ Month, data = SmartTV, FUN = mean)
Modeling and Decomposition using ts
and decompose
SmartTV$t <- 1:nrow(SmartTV)
LinTrend <- lm(SeasAdjust ~ t, data = SmartTV)
SmartTV$Trend <- LinTrend$fitted.values
str(SmartTV)
SmartTV_ts <- ts(SmartTV$SmartTV_Google, start = c(2011, 1), frequency = 12)
DecompMult <- decompose(SmartTV_ts, type = "multiplicative")
Forecasting and Model Visualization
Re-entering the time index t
for the linear trend calculation:
SmartTV$t <- 1:nrow(SmartTV)
LinTrend <- lm(SeasAdjust ~ t, data = SmartTV)
Calculating the modeled time series (estimated trend value × corrected seasonal index):
SmartTV$Modelled <- LinTrend$fitted.values * as.numeric(DecompMult$seasonal)
Plotting the original and modeled series:
library(ggplot2)
ggplot(SmartTV, aes(x=Date)) +
geom_line(aes(y=SmartTV_Google, color = "Original time series"), size = 1) +
geom_line(aes(y=Modelled, color = "Modelled time series"), size = 1)
Setting up forecast dates using lubridate
:
library(lubridate)
ForecastDates <- SmartTV$Date[nrow(SmartTV)] %m+% months(c(1:12))
Forecast$Modelled <- predict(LinTrend, newdata = Forecast)
Advanced Regression: Seasonality and Structural Breaks
Modeling Mobility Data with Trend and Day-of-Week Effects
Model_Season <- lm(FacebookMobility ~ t + I(t^2) + DayofWeek, data = FB_Filtered)
summary(Model_Season)
Handling Holiday Effects
Creating a binary holiday flag:
FB_Filtered$HolidayFlag <- 0 # Every day initially 0
# Give 1 for the holidays
FB_Filtered$HolidayFlag[FB_Filtered$Date=="2020-03-15"] <- 1
FB_Filtered$HolidayFlag[FB_Filtered$Date=="2020-04-10"] <- 1
FB_Filtered$HolidayFlag[FB_Filtered$Date=="2020-04-12"] <- 1
Using Sum Contrasts for Categorical Variables
Setting up sum contrasts (where the reference category coefficient is the negative sum of the others):
contrasts(FB_Filtered$DayofWeek)
contr.sum(7) # 7 is the parameter value because I have 7 days.
ContrastMatrix <- contr.sum(7)
rownames(ContrastMatrix) <- rownames(contrasts(FB_Filtered$DayofWeek))
contrasts(FB_Filtered$DayofWeek) <- ContrastMatrix
Model_Season_Contrast <- lm(FacebookMobility ~ t + I(t^2) + DayofWeek, data = FB_Filtered)
summary(Model_Season_Contrast)
Detecting and Modeling Structural Changes
Defining sections based on date ranges:
FB_Filtered$Section <- "Section 1" # Everyone belongs to section 1 first
FB_Filtered$Section[FB_Filtered$Date > "2020-07-25" & FB_Filtered$Date <= "2021-01-02"] <- "Section 2" # Overwrite those that are in section 2
Modeling structural change using interaction terms:
Model_Struct <- lm(FacebookMobility ~ t + Section + t*SectionDayofWeek, data = FB_Filtered)
summary(Model_Struct)
Using the strucchange
package to find breakpoints:
library(strucchange)
BreakPoints <- breakpoints(FB_Filtered$FacebookMobility ~ 1)
length(BreakPoints$breakpoints)
Trend Filtering Methods
Moving Average (MA) for GDP Data
GerGDP$MA12 <- rollmean(GerGDP$gdp_per_capita, k=12, fill = NA)
Hodrick-Prescott (HP) Filter
# HP Filter
install.packages("mFilter")
library(mFilter)
HP_GerGDP <- hpfilter(GerGDP$gdp_per_capita, freq = 100, type = "lambda")
GerGDP$HP <- HP_GerGDP$trend[,1] # Extracting the trend column from the resulting data frame.
Autocorrelation and Residual Analysis
Analyzing Bus Travel Data Trend
library(ggplot2)
Bustravel_1_$t <- 1:78
ggplot(Bustravel_1_, aes(x=t, y=travelno)) + geom_line() # Understand trend
Calculating Lagged Residuals
# Residuals
Bustravel_1_$res <- Model_Struct$residuals
# Lag 1
Bustravel_1_$reslag1 <- c(NA, Model_Struct$residuals[1:(length(Model_Struct$residuals)-1)])
# Lag 2
Bustravel_1_$reslag2 <- c(NA, NA, Model_Struct$residuals[1:(length(Model_Struct$residuals)-2)])
Correlation Analysis
# Correlation
cor(Bustravel_1_[3:nrow(Bustravel_1_), c("res", "reslag1", "reslag2")])
library(ppcor)
# Partial Correlation
pcor(Bustravel_1_[3:nrow(Bustravel_1_), c("res", "reslag1", "reslag2")])
ggplot(Bustravel_1_, aes(x=t, y=res))
ACF and CCF
acf(Finland$Cik_GDP, lag.max = 1, plot = FALSE)
ccf(Finland$Cik_GDP, Finland$Cik_I, lag.max = 6, plot = FALSE)