Pearson Correlation and Linear Regression Formulas
Correlation Definition
Correlation measures the strength and direction of the linear relationship between two variables.
Warning: Correlation does NOT imply causation.
Types of Correlation
| Type | Description | Graph Trend |
|---|---|---|
| Positive Correlation | Variables increase/decrease together | Upward slope |
| Negative Correlation | One increases while the other decreases | Downward slope |
| Zero Correlation | No predictable relationship | Random scatter |
Strength of Correlation
| Correlation Coefficient (r) | Interpretation |
|---|---|
| 0.00 | No correlation |
| ±0.01 – ±0.20 | Very low |
| ±0.21 – ±0.40 | Slight |
| ±0.41 – ±0.70 | Moderate |
| ±0.71 – ±0.90 | High |
| ±0.91 – ±0.99 | Very high |
| ±1.00 | Perfect correlation |
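The bands above can be turned into a small lookup. This is only an illustrative sketch: the function name `describe_strength` is hypothetical, and it assumes the bands are contiguous with inclusive upper bounds (so 0.205 would count as Slight), which the table itself does not state.

```python
def describe_strength(r: float) -> str:
    """Return the table's label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size == 0.0:
        return "No correlation"
    if size <= 0.20:
        return "Very low"
    if size <= 0.40:
        return "Slight"
    if size <= 0.70:
        return "Moderate"
    if size <= 0.90:
        return "High"
    if size < 1.0:
        return "Very high"
    return "Perfect correlation"


print(describe_strength(0.92))   # Very high
print(describe_strength(-0.50))  # Moderate
```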
Pearson Correlation Coefficient Formula
r=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\left[n\sum x^2-(\sum x)^2\right]\left[n\sum y^2-(\sum y)^2\right]}}
Formula Variables
- r = correlation coefficient
- n = number of observations
- x, y = paired values of the two variables
- ∑ = sum over all n pairs
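As a rough sketch, the formula can be applied to raw paired data as follows. The function name `pearson_r` and the sample lists are made up for illustration; they are not part of the worked example in this sheet.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed with the raw-sums formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 2))  # 0.77
```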
Interpreting the Correlation Coefficient (r)
| Value of r | Meaning |
|---|---|
| r > 0 | Positive relationship |
| r < 0 | Negative relationship |
| r = 0 | No linear relationship |
| r → ±1 | Strong relationship |
Example: Math & English Scores
Given data:
| Variable | Sum |
|---|---|
| ∑ x | 48 |
| ∑ y | 50 |
| ∑ x² | 296 |
| ∑ y² | 310 |
| ∑ xy | 298 |
| n | 10 |
Solution and Interpretation
r=\frac{10(298)-48(50)}{\sqrt{(10(296)-48^2)(10(310)-50^2)}}\approx0.92
Interpretation: There is a very high positive correlation between Math and English scores.
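The substitution above can be checked step by step. The snippet below is a minimal sketch that uses only the sums from the table; the variable names are chosen here for readability.

```python
import math

# Summary values from the Math & English example.
n, sum_x, sum_y = 10, 48, 50
sum_x2, sum_y2, sum_xy = 296, 310, 298

numerator = n * sum_xy - sum_x * sum_y  # 2980 - 2400 = 580
x_term = n * sum_x2 - sum_x ** 2        # 2960 - 2304 = 656
y_term = n * sum_y2 - sum_y ** 2        # 3100 - 2500 = 600

r = numerator / math.sqrt(x_term * y_term)
print(round(r, 2))  # 0.92
```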
Coefficient of Determination (r²)
Formula
r^2=(0.92)^2=0.8464\approx0.85
Interpretation
- About 85% of the variation in English scores is explained by the linear relationship with Math scores.
- The remaining 15% is not explained by this relationship; it reflects other factors and random variation.
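Squaring the formula for r removes the square root, so r² can also be computed directly from the intermediate terms of the example (580, 656, and 600). A small sketch, with variable names chosen here for illustration:

```python
# Intermediate terms from the Math & English example.
numerator = 580  # n*sum(xy) - sum(x)*sum(y)
x_term = 656     # n*sum(x^2) - sum(x)^2
y_term = 600     # n*sum(y^2) - sum(y)^2

r_squared = numerator ** 2 / (x_term * y_term)
unexplained = 1 - r_squared

print(f"{r_squared:.2%} explained, {unexplained:.2%} unexplained")
# 85.47% explained, 14.53% unexplained
```

Working from the unrounded terms gives 0.8547 rather than 0.8464; both round to 0.85.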
Scatterplot Patterns and Meanings
| Pattern | Meaning |
|---|---|
| Tight upward cluster | Strong positive correlation |
| Tight downward cluster | Strong negative correlation |
| Random dots | No correlation |
Linear Regression Analysis
Simple linear regression predicts a dependent variable (y) from an independent variable (x) by fitting a straight line to the data.
General Equation
\hat{y}=a+bx
Where:
- ŷ = predicted value
- a = y-intercept
- b = slope
Slope Formula and Example
b=\frac{n\sum xy-(\sum x)(\sum y)}{n\sum x^2-(\sum x)^2}
Example Calculation
b=\frac{10(298)-48(50)}{10(296)-48^2}=\frac{580}{656}\approx0.88
Interpretation: Each 1-point increase in Math score is associated with a predicted increase of about 0.88 points in English score.
Y-Intercept Formula and Example
a=\frac{\sum y}{n}-b\frac{\sum x}{n}
Example Calculation
a=\frac{50}{10}-0.8841\left(\frac{48}{10}\right)\approx0.76
Use the unrounded slope here; substituting the rounded 0.88 would give 0.78 instead.
Regression Equation Result
\hat{y}=0.76+0.88x
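The slope and intercept formulas can be combined into one helper. This is a sketch under the assumption that only the summary sums are available; the name `fit_line_from_sums` is hypothetical. It keeps the slope unrounded when computing the intercept, and rounds only for display.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for y_hat = a + b*x from summary sums."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # y-intercept
    return a, b


# Sums from the Math & English example.
a, b = fit_line_from_sums(n=10, sum_x=48, sum_y=50, sum_x2=296, sum_xy=298)
print(f"y_hat = {a:.2f} + {b:.2f}x")  # y_hat = 0.76 + 0.88x
```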
Making Predictions with Regression
Example 1: If x = 95
\hat{y}=0.76+0.88(95)=84.36
Predicted English score = 84.36
Example 2: If x = 80
\hat{y}=0.76+0.88(80)=71.16
Predicted English score = 71.16
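Both predictions follow from plugging x into the fitted line. A minimal sketch, assuming the rounded coefficients 0.76 and 0.88 from the regression equation; the function name `predict_english` is made up for this example.

```python
def predict_english(math_score, intercept=0.76, slope=0.88):
    """Predicted English score from y_hat = intercept + slope * x."""
    return intercept + slope * math_score


for x in (95, 80):
    # round() hides tiny floating-point noise in the last digits
    print(x, round(predict_english(x), 2))
# 95 84.36
# 80 71.16
```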
Important Limitations and Notes
When Correlation May Fail
- The relationship is non-linear
- Outliers exist in the dataset
- The data range is restricted
Assumptions of Pearson Correlation
Variables must be:
- Continuous
- Approximately normally distributed (this matters mainly for significance tests on r)
- Linearly related
Real-World Applications
| Field | Application |
|---|---|
| Education | Predict student performance |
| Business | Sales forecasting |
| Real Estate | Price prediction |
| Manufacturing | Quality control |
| Healthcare | Risk analysis |
Quick Memory Tricks
| Concept | Shortcut |
|---|---|
| Correlation | Measures relationship |
| Regression | Predicts values |
| r | Strength + direction |
| r² | Explained variation |
| Positive r | Variables move together |
| Negative r | Variables move opposite |
Formula Summary Reference
Pearson Correlation
r=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\left[n\sum x^2-(\sum x)^2\right]\left[n\sum y^2-(\sum y)^2\right]}}
Coefficient of Determination
r^2
Regression Equation
\hat{y}=a+bx
Slope
b=\frac{n\sum xy-(\sum x)(\sum y)}{n\sum x^2-(\sum x)^2}
Y-Intercept
a=\frac{\sum y}{n}-b\frac{\sum x}{n}
Final Takeaways
Correlation tells us:
- Whether variables are related
- How strong the relationship is
- The direction of the relationship
Linear Regression helps us:
- Model relationships
- Predict future values
- Make data-driven decisions
Strong correlation = r close to ±1
Strong prediction power = high r²
