Multiple Regression & Cluster Analysis in Market Research

Multiple Regression Analysis

Multiple regression uses two or more independent variables (IVs) to estimate the value of the dependent variable (DV). The regression equation identifies the best-fitting line using the method of least squares.
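A minimal sketch of fitting such a model by least squares, using hypothetical salary data (the numbers and variable choices are invented for illustration):

```python
import numpy as np

# Hypothetical data: salary (DV) explained by years of experience
# and years of post-secondary education (two IVs).
X = np.array([
    [5, 2.0],
    [10, 3.0],
    [16, 2.5],
    [3, 1.0],
    [8, 4.0],
])
y = np.array([40_000.0, 55_000.0, 70_000.0, 32_000.0, 52_000.0])

# Add a column of ones so the model includes the constant b0.
X1 = np.column_stack([np.ones(len(X)), X])

# Least-squares estimates of [b0, b1, b2].
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(b)
```

With the intercept column included, the fitted coefficients correspond directly to b0, b1, b2 in the regression equation.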

Additional Concepts

Constant (b0, intercept): The value of the DV when all IVs are equal to 0.

Partial Regression Coefficient (conditional coefficient): Shows the impact of one IV on the DV while keeping other IVs constant.

F-test: Assesses if the overall model (all IVs together) is significant.

T-test (partial regression coefficient): Evaluates if each IV contributes significantly to the model.

Stepwise Regression Analysis: A method of selecting variables from a regression model either by adding (forward) or removing (backward) them one by one based on criteria.

Coefficient of Multiple Correlation (R): Shows how well a group of IVs predicts the DV.

Coefficient of Multiple Determination (R2): The proportion of variance in the DV that is explained by the IVs.

Adjusted Coefficient of Multiple Determination (Adjusted R2): Like R2, but adjusted for the number of predictors, since adding IVs can inflate R2. This gives a more accurate and reliable measure of model fit.

Coefficient of Partial Correlation: Indicates the correlation between one IV and the DV, with other independent variable(s) held constant.

Coefficient of Partial Determination: Indicates the portion of variance accounted for by one IV with other IVs held constant.

Dummy (Indicator) Variables

Used to include qualitative data in a regression model by coding categories as binary values (0 and 1), e.g., female (0) or male (1), or country of origin. One category is not better than the other; the coding only represents a difference in category. For K categories, K − 1 dummy variables are needed.
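A small sketch of the K − 1 rule, using a hypothetical country-of-origin variable with K = 3 categories (the category names are invented for illustration):

```python
# K = 3 categories -> K - 1 = 2 dummy variables.
# The first category ("Canada") is the omitted baseline.
categories = ["Canada", "France", "Japan"]
levels = categories[1:]  # non-baseline levels get a dummy each

def encode(value):
    # One 0/1 indicator per non-baseline level.
    return [1 if value == lvl else 0 for lvl in levels]

print(encode("France"))  # [1, 0]
print(encode("Canada"))  # [0, 0] -> the baseline is "all dummies zero"
```

The baseline category needs no dummy of its own: it is identified by all dummies being zero, which is exactly why only K − 1 dummies are required.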

Multiple Regression and Correlation Analysis Example

1) Test the Null Hypothesis That There Is No Relationship Between the 3 Predictors and the DV at a 1% Level of Significance

If the p-value (Significance F) is below 1%, reject the null hypothesis: there is a statistically significant relationship between the three IVs and the DV at a level of risk of x% (multiply the p-value, e.g. 0.000…, by 100 to express it as a percentage). If it is above 1%, fail to reject the null hypothesis.

2) Backward Stepwise Regression to Remove Variables That Do Not Contribute at a 10% Level of Significance

If a variable's p-value is above 10% (multiply by 100 to express as a percentage), remove it; if below 10%, it contributes, so keep it. (Experience contributes at the 10% level of risk because its p-value is lower than 10%.)

3) Regression Equation

Y = b0 + b1 * X1 + b2 * X2 + b3 * X3

For example: the expected salary for a female with 16 years of experience and 2.5 years of post-secondary education will be around $x.
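Plugging values into the equation can be sketched as follows. The coefficients and the mapping of X1–X3 (gender dummy, experience, education) are hypothetical; the real values come from the regression output:

```python
# Hypothetical coefficients, for illustration only:
# X1 = gender dummy (female = 0, male = 1),
# X2 = years of experience, X3 = years of post-secondary education.
b0, b1, b2, b3 = 25_000, 4_000, 1_800, 2_500

def predict_salary(gender, experience, education):
    return b0 + b1 * gender + b2 * experience + b3 * education

# Female (0), 16 years of experience, 2.5 years of education:
print(predict_salary(0, 16, 2.5))  # 25000 + 0 + 28800 + 6250 = 60050
```

The prediction is an expected (average) value, not a guarantee for any individual observation.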


Multicollinearity

Two or more predictor variables in a regression model are highly linearly related (the points lie close to a single line). It can lead to: inflated standard errors, unstable coefficient estimates, and difficulty in assessing individual predictor effects. Remedy: remove highly correlated predictors (e.g., in a house-price model, size and number of rooms/bedrooms are highly correlated, since bigger houses tend to have more rooms and therefore cost more).
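One simple way to spot the highly correlated predictors described above is to inspect the pairwise correlation matrix. A minimal sketch with invented house data:

```python
import numpy as np

# Hypothetical house data: size (m^2), number of rooms, age (years).
# Size and rooms are constructed to be strongly related.
size  = np.array([50.0, 80.0, 120.0, 150.0, 200.0])
rooms = np.array([2.0, 3.0, 5.0, 6.0, 8.0])
age   = np.array([30.0, 5.0, 12.0, 40.0, 8.0])

corr = np.corrcoef([size, rooms, age])
print(corr[0, 1])  # correlation of size vs rooms -> close to 1

# A common rule of thumb: drop one variable of any pair with |r| > 0.9.
if abs(corr[0, 1]) > 0.9:
    print("size and rooms are highly collinear; keep only one")
```

The 0.9 cutoff is a rule of thumb, not a hard rule; variance inflation factors (VIF) are a more formal diagnostic.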

Table 1 – Coefficient Table

Formal Regression Equation: Y = b0 + b1 * X1 + b2 * X2 + b3 * X3 + b4 * X4 + b5 * X5

Coefficients: report as decimal values (keep the full 0.000… precision rather than converting to %).

Intercept = b0: If we do not take the independent variables (X1, X2, …) into consideration, the average value of the dependent variable (Y) will be x.

X1 = b1 (partial regression coefficient): for each additional unit of X1, Y changes on average by b1, holding the other IVs constant. If X1 increases by 1 (unit, % or €), the DV will increase on average by $x.

Standard Error: measures the precision of a coefficient estimate; t-statistic = coefficient / standard error.

P-Value = risk (the probability of wrongly rejecting the null hypothesis); level of confidence = 1 − risk.

Intercept: statistically significant at the x% level of risk (multiply the p-value by 100 to express it as a percentage).

X1: we can say with (1 − p-value) confidence that X1 is statistically significant; if its p-value is large, X1 is not statistically significant when it comes to predicting the DV.

Lower 95% & Upper 95% (confidence interval)

Intercept, X1, X2: we are 95% confident that the true coefficient (e.g., b0) lies between x and x.
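How the Table 1 quantities relate can be sketched with invented numbers (the coefficient and standard error below are hypothetical; 1.96 is the large-sample normal approximation for a 95% interval):

```python
coef = 1_800.0  # hypothetical partial regression coefficient
se   = 450.0    # its (hypothetical) standard error

t_stat = coef / se         # t-statistic = coefficient / standard error
lower  = coef - 1.96 * se  # approximate 95% confidence interval
upper  = coef + 1.96 * se  # (large-sample normal approximation)

print(t_stat)  # 4.0
print(lower, upper)
```

With small samples, the exact multiplier comes from the t-distribution with n − k − 1 degrees of freedom rather than 1.96.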

Table 3 – ANOVA Analysis

F-statistic: MS(regression) / MS(residual), where each MS (mean square) = SS (sum of squares) / its degrees of freedom.

Significance F = the p-value for the whole model. Our model is statistically significant at a (1 − Significance F) level of confidence; if Significance F is very small, confidence is > 99% and the model is good. If Significance F is large, the model is not statistically significant.
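The ANOVA arithmetic behind the F-statistic can be sketched with hypothetical sums of squares (all numbers invented for illustration):

```python
# Hypothetical ANOVA inputs:
ss_regression = 9_000.0  # explained variation
ss_residual   = 2_000.0  # unexplained variation
k = 3                    # number of IVs (regression df)
n = 20                   # number of observations (residual df = n - k - 1)

ms_regression = ss_regression / k            # MS = SS / df
ms_residual   = ss_residual / (n - k - 1)

f_stat = ms_regression / ms_residual
print(f_stat)  # 24.0
```

Significance F is then the probability of an F value at least this large under the null hypothesis, read from the F distribution with (k, n − k − 1) degrees of freedom.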

Table 2 – Regression Statistics

(to interpret, convert decimals to percentages, e.g. 0.005 → 0.5%)

Multiple Correlation Coefficient (Multiple R): x% of the variability in the DV is associated with the variability in the IVs (X1, X2, …) taken together.

R Square (Multiple Determination Coefficient): our regression model explains x% of the variance in the DV; the higher, the better the model.

Adjusted R Square: after accounting for the fact that adding IVs mechanically inflates R2, the model still explains around x% of the variance in the DV; still a very good model. Note that plain R2 never decreases when predictors are added, which is why the adjusted version is the more honest gauge.
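The adjustment itself is a short formula. A sketch with hypothetical values for R2, sample size, and number of predictors:

```python
# Hypothetical inputs:
r2 = 0.90  # R Square from Table 2
n  = 20    # number of observations
k  = 3     # number of IVs

# Adjusted R^2 penalizes each extra predictor via the degrees of freedom.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)
```

With these numbers the adjusted value comes out slightly below R2 (about 0.881), and the penalty grows as k approaches n.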

Standard Error: on average, predictions of the DV made using our model will differ from the observed values by about $x.

Observations: in our model there are x observations for each of the variables; a small sample.

Cluster Analysis

A statistical method of processing data that organizes items into clusters based on how closely associated they are: subjects are more similar to others within their group than to those outside it. Objective: find groups of similar subjects (shared features between each pair of subjects). Used when no assumption is made about the likely relationships within the data. In market research it is used to identify customer segments (e.g., by age, income, location).

Intra-cluster distance: observations within a cluster should be as similar as possible (minimize the distance within each cluster).

Inter-cluster distance: Observations in different clusters are as different as possible (maximize distance between clusters).

Ambiguity: there is no single correct number of clusters; the analyst chooses how many to create.
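The intra- vs inter-cluster idea can be illustrated with a minimal k-means sketch on invented 1-D data (k = 2, with hypothetical starting centroids):

```python
import numpy as np

# Two obvious groups of 1-D points:
points = np.array([1.0, 1.5, 2.0, 10.0, 10.5, 11.0])
centroids = np.array([1.0, 11.0])  # hypothetical starting centroids

for _ in range(10):
    # Assign each point to the nearest centroid
    # (this minimizes intra-cluster distance).
    labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
    # Move each centroid to the mean of its assigned points.
    centroids = np.array([points[labels == j].mean() for j in range(2)])

print(labels)     # first three points in one cluster, last three in the other
print(centroids)  # the two cluster means end up far apart
```

This is a partitional, exclusive, complete clustering in the taxonomy below: every point belongs to exactly one of the non-overlapping clusters.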


Partitional Clustering

Division of data objects into non-overlapping clusters: each object belongs to exactly one cluster, never to two at the same time.

Hierarchical Clustering

A set of nested clusters organized as a hierarchical tree: clusters at a higher level contain the clusters beneath them, so each cluster embraces others.

Exclusive vs. Non-Exclusive

Exclusive: Each data point belongs to only one cluster.

Non-exclusive: Data points can belong to multiple clusters.

Fuzzy vs. Non-Fuzzy

Fuzzy Clustering: every point belongs to every cluster with a membership weight between 0 and 1, and each point's weights across the clusters sum to 1.

Non-fuzzy: each data point is distinctly assigned to exactly one cluster (the special case where one weight is 1 and the rest are 0).
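A tiny sketch of the difference, with invented membership weights:

```python
# Fuzzy: one point's weights across clusters sum to 1.
fuzzy_membership = {"cluster_A": 0.7, "cluster_B": 0.3}
print(sum(fuzzy_membership.values()))  # 1.0

# Non-fuzzy (crisp) is the special case: one weight 1, the rest 0.
crisp_membership = {"cluster_A": 1.0, "cluster_B": 0.0}
print(max(crisp_membership, key=crisp_membership.get))  # cluster_A
```

Reading a fuzzy result crisply usually means assigning each point to its highest-weight cluster, as the `max` call above does.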

Partial vs. Complete

Complete: every observation is assigned to a cluster; the entire dataset is clustered.

Partial: Clustering only a subset of the data.

Heterogeneous vs. Homogeneous

Heterogeneous: clusters contain diverse data points (more varied observations within the categories being clustered).

Homogeneous: Clusters have similar data points, similar observations.