Pearson Correlation and Linear Regression Formulas

Posted on May 21, 2026 in Statistics

Pearson Product-Moment Correlation & Linear Regression

Correlation Definition

Correlation measures the strength and direction of the linear relationship between two variables.

Warning: Correlation does NOT imply causation.

Types of Correlation

Type	Description	Graph Trend
Positive Correlation	Variables increase/decrease together	Upward slope
Negative Correlation	One increases while the other decreases	Downward slope
Zero Correlation	No predictable relationship	Random scatter

Strength of Correlation

Correlation Coefficient (r)	Interpretation
0.00	No correlation
±0.01 – ±0.20	Very low
±0.21 – ±0.40	Slight
±0.41 – ±0.70	Moderate
±0.71 – ±0.90	High
±0.91 – ±0.99	Very high
±1.00	Perfect correlation
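
The bands in the table can be turned into a small lookup, sketched below. The function name `correlation_strength` and the `BANDS` list are hypothetical, introduced only for illustration. The table leaves small gaps between bands (for example between 0.20 and 0.21); this sketch assumes a value in a gap belongs to the next band up.

```python
# Upper limit of |r| for each band, taken from the table above.
BANDS = [
    (0.20, "Very low"),
    (0.40, "Slight"),
    (0.70, "Moderate"),
    (0.90, "High"),
]


def correlation_strength(r: float) -> str:
    """Return the table's verbal label for a correlation coefficient r."""
    size = abs(r)  # the sign gives direction, not strength
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size == 0:
        return "No correlation"
    if size == 1:
        return "Perfect correlation"
    for upper, label in BANDS:
        if size <= upper:
            return label
    # Assumption: anything above 0.90 and below 1 counts as "Very high".
    return "Very high"


print(correlation_strength(0.92))   # Very high
print(correlation_strength(-0.35))  # Slight
```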

Pearson Correlation Coefficient Formula

r=\frac{n\sum xy-(\sum x)(\sum y)}{\sqrt{\left[n\sum x^2-(\sum x)^2\right]\left[n\sum y^2-(\sum y)^2\right]}}

Formula Variables

r = correlation coefficient
n = number of observations
x, y = paired values of the two variables
∑ = sum over all n pairs
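
As a minimal sketch, the formula above can be computed directly from two lists of paired values. The function name `pearson_r` and the sample lists `hours` and `score` are made up for illustration and are not part of the original example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired observations, using the sums formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: score is exactly 2 * hours, a perfect linear relationship.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
print(pearson_r(hours, score))  # 1.0
```

The guard on a zero denominator is there because r has no value when one of the variables never changes.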

Interpreting the Correlation Coefficient (r)

Value of r	Meaning
r > 0	Positive relationship
r < 0	Negative relationship
r = 0	No linear relationship
r → ±1	Strong linear relationship

Example: Math & English Scores

Given data:

Variable	Sum
∑ x	48
∑ y	50
∑ x²	296
∑ y²	310
∑ xy	298
n	10

Solution and Interpretation

r=\frac{10(298)-48(50)}{\sqrt{(10(296)-48^2)(10(310)-50^2)}}\approx0.92

Interpretation: There is a very high positive correlation between Math and English scores.
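
The arithmetic in the solution can be checked step by step from the given sums. This is only a sketch; the variable names are chosen here for readability.

```python
import math

# Sums from the Math & English example above.
n = 10
sum_x, sum_y = 48, 50
sum_x2, sum_y2 = 296, 310
sum_xy = 298

numerator = n * sum_xy - sum_x * sum_y   # 2980 - 2400 = 580
x_term = n * sum_x2 - sum_x ** 2         # 2960 - 2304 = 656
y_term = n * sum_y2 - sum_y ** 2         # 3100 - 2500 = 600

r = numerator / math.sqrt(x_term * y_term)
print(round(r, 2))  # 0.92
```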

Coefficient of Determination (r²)

Formula

r^2=(0.92)^2=0.8464\approx0.85

Interpretation

About 85% of the variation in one variable is explained by its linear relationship with the other.
The remaining 15% is left unexplained and reflects other factors or random variation.
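
For comparison, r² can also be computed from the sums without rounding r first. The sketch below reuses the example's sums; the variable names are illustrative. Both routes round to 0.85.

```python
# Sums from the Math & English example.
n = 10
sum_x, sum_y = 48, 50
sum_x2, sum_y2 = 296, 310
sum_xy = 298

numerator = n * sum_xy - sum_x * sum_y   # 580
x_term = n * sum_x2 - sum_x ** 2         # 656
y_term = n * sum_y2 - sum_y ** 2         # 600

# Squaring the formula for r removes the square root.
r_squared = numerator ** 2 / (x_term * y_term)
unexplained = 1 - r_squared

print(round(r_squared, 2))    # 0.85
print(round(unexplained, 2))  # 0.15
```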

Scatterplot Patterns and Meanings

Pattern	Meaning
Tight upward cluster	Strong positive correlation
Tight downward cluster	Strong negative correlation
Random dots	No correlation

Linear Regression Analysis

Simple linear regression predicts a dependent variable (y) from an independent variable (x) by fitting a straight line to the data.

General Equation

\hat{y}=a+bx

Where:

ŷ = predicted value
a = y-intercept
b = slope

Slope Formula and Example

b=\frac{n\sum xy-(\sum x)(\sum y)}{n\sum x^2-(\sum x)^2}

Example Calculation

b=\frac{10(298)-48(50)}{10(296)-48^2}=\frac{580}{656}\approx0.88

Interpretation: Each 1-point increase in Math score is associated with a predicted increase of about 0.88 points in English score.

Y-Intercept Formula and Example

a=\frac{\sum y}{n}-b\frac{\sum x}{n}

Example Calculation

a=\frac{50}{10}-\frac{580}{656}\left(\frac{48}{10}\right)\approx0.76

Regression Equation Result

\hat{y}=0.76+0.88x
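
The slope and intercept can be reproduced from the example's sums as sketched below. Variable names are illustrative. The intercept is computed from the unrounded slope; plugging in the rounded 0.88 would give about 0.78 instead of 0.76.

```python
# Sums from the Math & English example.
n = 10
sum_x, sum_y = 48, 50
sum_x2 = 296
sum_xy = 298

# Slope: b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: a = mean(y) - b * mean(x), using the unrounded slope.
a = sum_y / n - b * (sum_x / n)

print(round(b, 2))  # 0.88
print(round(a, 2))  # 0.76
```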

Making Predictions with Regression

Example 1: If x = 95

\hat{y}=0.76+0.88(95)=84.36

Predicted English score = 84.36

Example 2: If x = 80

\hat{y}=0.76+0.88(80)=71.16

Predicted English score = 71.16
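
A prediction is just the regression equation evaluated at a chosen x. The sketch below wraps the rounded equation from this example in a helper; the name `predict_english` is hypothetical.

```python
def predict_english(math_score, a=0.76, b=0.88):
    """Evaluate y-hat = a + b*x with the rounded coefficients from the example."""
    return a + b * math_score


for x in (95, 80):
    print(x, round(predict_english(x), 2))
# 95 84.36
# 80 71.16
```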

Important Limitations and Notes

When Correlation May Fail

The relationship is non-linear
Outliers exist in the dataset
The data range is restricted
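
Two of these cases can be seen with made-up numbers. In the sketch below, all data and names are hypothetical: a perfect but curved relationship gives r = 0, and a single outlier flips a weak negative r into a strong positive one.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired observations (same sums formula as above)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Non-linear: y depends entirely on x (y = x^2), yet r is 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0

# Outlier: five points with a weak negative trend, then one extreme point.
base_x, base_y = [1, 2, 3, 4, 5], [5, 3, 4, 2, 4]
r_without = pearson_r(base_x, base_y)
r_with = pearson_r(base_x + [20], base_y + [20])
print(round(r_without, 2), round(r_with, 2))  # -0.42 0.96
```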

Assumptions of Pearson Correlation

Variables must be:

Continuous
Normally distributed (approximately; this matters mainly for significance tests on r)
Linearly related

Real-World Applications

Field	Application
Education	Predict student performance
Business	Sales forecasting
Real Estate	Price prediction
Manufacturing	Quality control
Healthcare	Risk analysis

Quick Memory Tricks

Concept	Shortcut
Correlation	Measures relationship
Regression	Predicts values
r	Strength + direction
r²	Explained variation
Positive r	Variables move together
Negative r	Variables move opposite

Formula Summary Reference