Understanding Generalized Linear Models and Regression Analysis

GENERAL LINEAR MODEL

y = β0 + β1X1 + … + βkXk + ε. They support both inference and prediction. Estimation is by least squares, maximum likelihood, etc. They are interpretable and fast to compute. Simple interactions and transformations are easy to include, and dummy variables allow the use of categorical information.
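
A minimal R sketch of fitting such a model; the data are simulated and all names are illustrative:

  set.seed(1)
  dat <- data.frame(x1 = rnorm(100),
                    x2 = factor(sample(c("a", "b"), 100, replace = TRUE)))
  dat$y <- 1 + 2 * dat$x1 + 3 * (dat$x2 == "b") + rnorm(100)
  fit <- lm(y ~ x1 + x2, data = dat)   # least-squares fit; x2 is dummy-coded
  summary(fit)                         # inference: estimates, tests, R-squared
  predict(fit, newdata = dat[1:3, ])   # prediction for new observations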

GENERALIZED LINEAR MODEL

g(E[y|x]) = β0 + β1X1 + … + βpXp. Flexible: alternative link functions g are available. They relax some of the assumptions of the linear model. Examples: the logistic regression model and the Poisson regression model.
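
A hedged sketch of the two examples in R, on simulated data:

  set.seed(2)
  x <- rnorm(100)
  y_bin <- rbinom(100, 1, plogis(0.5 + x))    # binary response
  y_cnt <- rpois(100, exp(0.5 + 0.3 * x))     # count response
  glm(y_bin ~ x, family = binomial(link = "logit"))   # logistic regression
  glm(y_cnt ~ x, family = poisson(link = "log"))      # Poisson regression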

ADDITIVE MODEL

E[y|x] = β0 + f1(X1) + … + fp(Xp). A flexible, nonparametric model. The fk are unknown smooth functions fit from the data. The parameters are {fk}, β0 and σ². Estimation: the backfitting algorithm.

GENERALIZED ADDITIVE MODEL

g(E[y|x]) = β0 + f1(X1) + … + fp(Xp). A flexible, nonparametric model. The fk are unknown smooth functions fit from the data. The link function g is chosen by the user based on domain knowledge.
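
A sketch using the mgcv package (one common choice; note that mgcv estimates the smooths by penalized likelihood, while the older gam package implements the backfitting algorithm mentioned above). Data are simulated:

  library(mgcv)
  set.seed(3)
  dat <- data.frame(x1 = runif(200), x2 = runif(200))
  dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + rnorm(200, sd = 0.3)
  fit <- gam(y ~ s(x1) + s(x2), data = dat)   # identity link: additive model
  plot(fit, pages = 1)                        # estimated smooth functions f1, f2
  # For a GAM with a non-identity link, e.g. binary y: add family = binomial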

GOOD REGRESSION MODEL REQUIRES

– Gathering useful data and making sure you know where it came from and how it was measured.
– Performing descriptive analysis to understand its general patterns and to spot data-quality problems.
– Applying appropriate data transformations if you see strong evidence of nonlinear relationships, or of noise that is non-normal or time-dependent.
– Fitting, refining, and comparing models.
– Checking whether a given model's assumptions are reasonably well satisfied, or whether an alternative model is suggested.
– Choosing among reasonable models based on the appropriate bottom-line accuracy measure.
– Deriving some useful insights from the whole process.

SIMPLE LINEAR REGRESSION MODEL

β0, the intercept, is the value of the mean response when X = 0. Often it has no sensible interpretation by itself. β1, the slope, is the change in the mean response Y when X increases (or decreases) by one unit.

LINE

Linearity: the mean of the response, E(Yi|Xi), at each value of the predictor is a linear function of Xi. Independence: the residuals (errors) ei are independent. Normality: at each value of the predictor Xi, the residuals are normally distributed. Equal variances (σ²): at each value of the predictor Xi, the residuals have equal variance.
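
A sketch of the standard R checks for these assumptions, using the built-in cars data:

  fit <- lm(dist ~ speed, data = cars)
  par(mfrow = c(2, 2))
  plot(fit)    # residuals vs fitted (linearity, equal variance), Q-Q (normality)
  shapiro.test(residuals(fit))   # formal normality test of the residuals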

STATISTICAL ASSUMPTIONS

Normality (and linearity): the conditional distribution of Y for any combination of values of X1, …, Xp is normal, and its expected value is a linear function of the X's.

Independence: the observations of Y are statistically independent. Homoscedasticity: the conditional variance of Y given any specific combination of values of X1, …, Xp is the same, namely σ².

MULTIPLE CORRELATION

R, the multiple correlation coefficient, is the correlation between the predicted scores (Ŷ) and the observed (criterion) scores. R² measures the proportion of the variance of the dependent variable about its mean that is explained by the independent, or predictor, variables. Low values may occur because important variables have been left out of the model.

R^2 PROPERTIES

The larger the value, the better the explanatory variables collectively predict Y. | R² = 1 when all residuals are 0. | R² = 0 when the estimated slopes all equal 0 and the correlation between Y and each explanatory variable equals 0. | The addition of independent variables always causes R² to rise or, at least, stay the same. | The value of R² does not depend on the units of measurement. | R² = 1 − (Residual Sum of Squares / Total Sum of Squares).
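
These properties can be verified numerically; a sketch with the built-in cars data:

  fit <- lm(dist ~ speed, data = cars)
  rss <- sum(residuals(fit)^2)                  # Residual Sum of Squares
  tss <- sum((cars$dist - mean(cars$dist))^2)   # Total Sum of Squares
  1 - rss / tss                  # equals summary(fit)$r.squared
  cor(fitted(fit), cars$dist)^2  # R^2 as squared multiple correlation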

ANOVA VS ANCOVA

The difference between ANCOVA and ANOVA is that ANCOVA removes the impact of metric-scaled covariates from the dependent variable before comparing groups, whereas ANOVA investigates differences among the means of several groups of data.

NESTED MODELS

One model is nested within another if it is a special case of the other in which some model coefficients are constrained to be zero. When two nested models are multiple regression models, there is a simple procedure for comparing them: it tests whether the more complex model is significantly better than the simpler one. In the sample, of course, the more complex of two nested models will always fit at least as well as the less complex one. This is done via partial F-tests.
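
The partial F-statistic behind this procedure (standard formula, added here for reference) is:

  F = [(RSS_reduced − RSS_full) / q] / [RSS_full / (n − k − 1)]

where q is the number of coefficients constrained to zero, k is the number of predictors in the full model, and n is the sample size; under H0 the statistic follows an F(q, n − k − 1) distribution.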

ANOVA PRAC CORRELATIONS

anova() is a sequential test: the first variable is entered and tested; then the second is entered and tested given the first, and so on. The table reflects the sequential way the model is built. The anova function tells us whether each factor has any explanatory value for the response variable.

              Df   Sum Sq   Mean Sq   F value      Pr(>F)
X20            1   82.144    82.144    166.41   < 2.2e-16
X21            1   10.603    10.603     21.48   1.114e-05
Residuals     97   47.811     0.494

Total Sum Sq = 82.144 + 10.603 + 47.811 ≈ 140.6. | X20 explains 82.144/140.6 ≈ 58% of the variability, and X21 a further 10.603/140.6 ≈ 7.5%. | The variability left to explain once X20 enters the model is 140.6 − 82.144 = 58.46. | The partial correlation between Y and X21 given X20 is sqrt(10.603/58.46) ≈ 0.43. | R² = (sum of all Sum Sq except Residuals) / (Total Sum Sq) = (82.144 + 10.603)/140.6 ≈ 0.66. | If a variable has a high p-value in the anova table but a low p-value in the summary, the variable must stay in the model.

ANOVA NESTED MODELS

Model 1: X19 ~ X6 + X9 | Model 2: X19 ~ X6 + X9 + X7 | Model 3: X19 ~ X6 + X9 + X7 + X11

    Res.Df      RSS   Df   Sum of Sq        F     Pr(>F)
1       97   64.101
2       96   53.927    1      10.173   18.175   4.75e-05
3       95   53.174    1      0.7533   1.3458     0.2489

We first compare models 1 and 2: the p-value is 4.75e-05, so we reject the null hypothesis that the simpler model (model 1) is adequate, and we keep model 2. We then compare models 2 and 3: the p-value is 0.2489, so we do not reject the null hypothesis; model 2 is adequate and X11 adds nothing significant. The complexity of the chosen model is 3 (three variables: X6, X9 and X7). In case of a tie, first pick the model with the smaller Sum of Sq and smaller F, i.e., the simpler one.
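
A sketch of the R calls that produce a table like the one above; the data frame name dat is a placeholder for the course data set:

  m1 <- lm(X19 ~ X6 + X9, data = dat)
  m2 <- lm(X19 ~ X6 + X9 + X7, data = dat)
  m3 <- lm(X19 ~ X6 + X9 + X7 + X11, data = dat)
  anova(m1, m2, m3)   # partial F-tests between consecutive nested models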


SUMMARY PRAC

In the summary, each variable is tested as if it entered the model last: each row's t-test compares the full model against the model with that single variable removed, with all the other variables already in.

               Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)     -259.63        17.32    -14.99    < 2e-16
X2              3721.02        81.79     45.50    < 2e-16

The parameters for the fitted model are β̂0 = −259.63 and β̂1 = 3721.02. The fitted line (model) is: ŷi = −259.63 + 3721.02 xi.

The output also shows their standard errors, t-values and p-values for the respective tests with null hypotheses H0: β0 = 0 and H0: β1 = 0. Observing these p-values, we reject both null hypotheses, so we have evidence that β0 != 0 and β1 != 0: they are significantly different from 0. The p-value in the last line of the summary is for the overall test of the significance of the regression model: H0: β1 = 0 vs H1: β1 != 0.

The residual standard error (31.84) is an estimate of σ, σ² being the unknown population variance of the errors.

The Multiple R-squared (or coefficient of determination) is a proportion: the closer to 1 (100%), the better. It measures the ability to predict Y using X. Only in the simple linear regression model (with a single predictor x) does the equality R² = r² hold, r being the Pearson correlation coefficient. Its square root, R, is the multiple correlation coefficient: the correlation between the observed variable Y and the values predicted by the model, Ŷ. What proportion of the variation in X19 is explained by X20? Look at the Multiple R-squared of the model containing only the intercept and X20. How is this proportion related to Pearson's correlation coefficient? It is its square (only in the simple linear regression model). Overall p-value: tests whether all the coefficients are 0. Adjusted R-squared: how well the X's predict Y, with a penalty for the number of predictors used. Multiple R-squared: how well the X's predict Y.
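
A sketch of where these quantities live in R, using the built-in cars data for illustration:

  fit <- lm(dist ~ speed, data = cars)
  s <- summary(fit)
  coef(s)                        # estimates, std. errors, t values, p-values
  s$sigma                        # residual standard error (estimate of sigma)
  s$r.squared                    # Multiple R-squared
  s$adj.r.squared                # Adjusted R-squared
  cor(fitted(fit), cars$dist)    # R, the multiple correlation coefficient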

INTERACTIONS BETWEEN CONTINUOUS VARIABLES AND BETWEEN CONTINUOUS AND CATEGORICAL VARIABLES–

MODEL WITH X = ONE CATEGORICAL VAR–

X = a categorical variable taking values {1, 2, 3}. Naively, Y = β0 + β1X. The model actually fitted in R is Y = β0 + β1X(2) + β2X(3), where X(2) and X(3) are dummy variables.

X(2) = 1 if X = 2 and 0 otherwise; X(3) = 1 if X = 3 and 0 otherwise.

For X = 1, the model is Y = β0; for X = 2, the model is Y = β0 + β1 (the row test is H0: β1 = 0 vs H1: β1 != 0); for X = 3, the model is Y = β0 + β2 (the row test is H0: β2 = 0 vs H1: β2 != 0).
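
A sketch of how R builds these dummy variables from a factor; the data are simulated just to show the coding:

  X <- factor(rep(1:3, each = 4))   # categorical predictor with levels 1, 2, 3
  y <- rnorm(12)                    # arbitrary response, for illustration only
  fit <- lm(y ~ X)                  # level 1 is the baseline (intercept)
  coef(fit)                         # (Intercept) = b0, X2 = b1, X3 = b2
  model.matrix(fit)[c(1, 5, 9), ]   # one row per level: dummy columns X2, X3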

EXAMPLE WITH INTERACTIONS–

Sales = β0 + β1·TV + β2·Radio + β3·Newspaper

This states that the average effect on sales of a one-unit increase in TV is always β1, regardless of the amount spent on radio.
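
If that additivity assumption is doubtful, an interaction term lets the TV effect depend on Radio. A minimal sketch, assuming an Advertising-style data frame adv with columns Sales, TV and Radio (names hypothetical):

  # With the TV:Radio term, the TV effect becomes B1 + B3*Radio
  fit <- lm(Sales ~ TV + Radio + TV:Radio, data = adv)   # same as TV * Radio
  summary(fit)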