Time Series Analysis and Regression Modeling in R

R Setup and Initial Data Handling

Setting the working directory:

setwd("/Users/hajdumarcell/Downloads/Öko. II. R Jegyzet")

Data inspection and preparation:

str(Titanic)
PS4$Date <- as.Date(PS4$Date)

Basic visualization using ggplot2:

ggplot(PS4, aes(x=Date, y=Google_PS4)) + geom_line()

Regression Modeling with Dummy Variables

The general regression model structure, including trend ($t$) and quarterly dummy variables ($DQ$):

$$Y_t = \beta_0 + \beta_1 \times t + \beta_2 \times DQ_1 + \beta_3 \times DQ_2 + \beta_4 \times DQ_3 + u_t$$

PS4$t <- 1:20
PS4$Quarter <- as.factor(PS4$Quarter)
PS4$Quarter <- relevel(PS4$Quarter, ref="Q4") # Reference coding

Visualizing Actual vs. Estimated Values

ggplot(PS4, aes(x=Date)) +
  geom_line(aes(y = Google_PS4, color = "Actual Y_t"), size=1) +
  geom_line(aes(y = Y_kalap, color = "Estimated Y_t"), size=1)

Handling Trend and Residuals

Saving the estimated trend and residuals for gas prices:

# Saving the estimated trend gas prices and residuals
GazAr$Trend <- TrendGazAr$fitted.values
GazAr$residual <- TrendGazAr$residuals

Time Series Decomposition Techniques

library(forecast)
ma(SmartTV$SmartTV_Google, order = 7)

Filtering Trend and Isolating Seasonal/Noise Components

If we now say that the TREND effects are embodied in the moving average (MA), then what is left in our new column SmartTV$TrendFiltered is a SEASONAL × NOISE component!

SmartTV$TrendFiltered <- SmartTV$SmartTV_Google / SmartTV$MA12

From this TrendFiltered column, we can easily filter out the NOISE component by averaging these values by the basic unit of seasonality (i.e., the months):

SeasonEffect <- aggregate(TrendFiltered ~ Month, data = SmartTV, FUN = mean)

Modeling and Decomposition using ts and decompose

SmartTV$t <- 1:nrow(SmartTV)
LinTrend <- lm(SeasAdjust ~ t, data = SmartTV)
SmartTV$Trend <- LinTrend$fitted.values

str(SmartTV)
SmartTV_ts <- ts(SmartTV$SmartTV_Google, start = c(2011, 1), frequency = 12)
DecompMult <- decompose(SmartTV_ts, type = "multiplicative")

Forecasting and Model Visualization

Re-entering the time index t for the linear trend calculation:

SmartTV$t <- 1:nrow(SmartTV)
LinTrend <- lm(SeasAdjust ~ t, data = SmartTV)

Calculating the modeled time series (estimated trend value × corrected seasonal index):

SmartTV$Modelled <- LinTrend$fitted.values * as.numeric(DecompMult$seasonal)

Plotting the original and modeled series:

library(ggplot2)
ggplot(SmartTV, aes(x=Date)) +
  geom_line(aes(y=SmartTV_Google, color = "Original time series"), size = 1) +
  geom_line(aes(y=Modelled, color = "Modelled time series"), size = 1)

Setting up forecast dates using lubridate:

library(lubridate)
ForecastDates <- SmartTV$Date[nrow(SmartTV)] %m+% months(c(1:12))
Forecast$Modelled <- predict(LinTrend, newdata = Forecast)

Advanced Regression: Seasonality and Structural Breaks

Modeling Mobility Data with Trend and Day-of-Week Effects

Model_Season <- lm(FacebookMobility ~ t + I(t^2) + DayofWeek, data = FB_Filtered)
summary(Model_Season)

Handling Holiday Effects

Creating a binary holiday flag:

FB_Filtered$HolidayFlag <- 0 # Every day initially 0
# Give 1 for the holidays
FB_Filtered$HolidayFlag[FB_Filtered$Date=="2020-03-15"] <- 1
FB_Filtered$HolidayFlag[FB_Filtered$Date=="2020-04-10"] <- 1
FB_Filtered$HolidayFlag[FB_Filtered$Date=="2020-04-12"] <- 1

Using Sum Contrasts for Categorical Variables

Setting up sum contrasts (where the reference category coefficient is the negative sum of the others):

contrasts(FB_Filtered$DayofWeek)
contr.sum(7) # 7 is the parameter value because I have 7 days.
ContrastMatrix <- contr.sum(7)
rownames(ContrastMatrix) <- rownames(contrasts(FB_Filtered$DayofWeek))
contrasts(FB_Filtered$DayofWeek) <- ContrastMatrix

Model_Season_Contrast <- lm(FacebookMobility ~ t + I(t^2) + DayofWeek, data = FB_Filtered)
summary(Model_Season_Contrast)

Detecting and Modeling Structural Changes

Defining sections based on date ranges:

FB_Filtered$Section <- "Section 1" # Everyone belongs to section 1 first
FB_Filtered$Section[FB_Filtered$Date > "2020-07-25" & FB_Filtered$Date <= "2021-01-02"] <- "Section 2" # Overwrite those that are in section 2

Modeling structural change using interaction terms:

Model_Struct <- lm(FacebookMobility ~ t + Section + t*SectionDayofWeek, data = FB_Filtered)
summary(Model_Struct)

Using the strucchange package to find breakpoints:

library(strucchange)
BreakPoints <- breakpoints(FB_Filtered$FacebookMobility ~ 1)
length(BreakPoints$breakpoints)

Trend Filtering Methods

Moving Average (MA) for GDP Data

GerGDP$MA12 <- rollmean(GerGDP$gdp_per_capita, k=12, fill = NA)

Hodrick-Prescott (HP) Filter

# HP Filter
install.packages("mFilter")
library(mFilter)
HP_GerGDP <- hpfilter(GerGDP$gdp_per_capita, freq = 100, type = "lambda")
GerGDP$HP <- HP_GerGDP$trend[,1] # Extracting the trend column from the resulting data frame.

Autocorrelation and Residual Analysis

Analyzing Bus Travel Data Trend

library(ggplot2)
Bustravel_1_$t <- 1:78
ggplot(Bustravel_1_, aes(x=t, y=travelno)) + geom_line() # Understand trend

Calculating Lagged Residuals

# Residuals
Bustravel_1_$res <- Model_Struct$residuals
# Lag 1
Bustravel_1_$reslag1 <- c(NA, Model_Struct$residuals[1:(length(Model_Struct$residuals)-1)])
# Lag 2
Bustravel_1_$reslag2 <- c(NA, NA, Model_Struct$residuals[1:(length(Model_Struct$residuals)-2)])

Correlation Analysis

# Correlation
cor(Bustravel_1_[3:nrow(Bustravel_1_), c("res", "reslag1", "reslag2")])
library(ppcor)
# Partial Correlation
pcor(Bustravel_1_[3:nrow(Bustravel_1_), c("res", "reslag1", "reslag2")])
ggplot(Bustravel_1_, aes(x=t, y=res))

ACF and CCF

acf(Finland$Cik_GDP, lag.max = 1, plot = FALSE)
ccf(Finland$Cik_GDP, Finland$Cik_I, lag.max = 6, plot = FALSE)