week_45

Multiple Linear Regression Explained Simply

The Math Behind Fitting a Plane Instead of a Line

Overview

This guide explores multiple linear regression from first principles, focusing on the mathematical foundations rather than just applying algorithms. While simple linear regression uses one independent variable to predict one target, multiple linear regression extends this to two or more independent variables—requiring us to fit a plane instead of a line.

Dataset: Fish Market

To illustrate the concepts, we use the Fish Market dataset, which includes physical attributes of fish:

Species: The type of fish (e.g., Bream, Roach, Pike)
Weight: The weight of the fish in grams (target variable)
Length1, Length2, Length3: Various length measurements in cm
Height: The height of the fish in cm
Width: The diagonal width of the fish body in cm

For simplicity and visualization, this guide uses two independent variables (Height and Width) and a 20-point sample from the full dataset.

Quick Start: Fitting a Model in Python

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# 20-point sample data from Fish Market dataset
data = [
    [11.52, 4.02, 242.0],
    [12.48, 4.31, 290.0],
    [12.38, 4.70, 340.0],
    # ... (additional 17 data points)
]

# Create DataFrame
df = pd.DataFrame(data, columns=["Height", "Width", "Weight"])

# Independent variables (Height and Width)
X = df[["Height", "Width"]]

# Target variable (Weight)
y = df["Weight"]

# Fit the model
model = LinearRegression().fit(X, y)

# Extract coefficients
b0 = model.intercept_  # β₀
b1, b2 = model.coef_   # β₁ (Height), β₂ (Width)

print(f"Intercept (β₀): {b0:.4f}")
print(f"Height slope (β₁): {b1:.4f}")
print(f"Width slope (β₂): {b2:.4f}")

Results:

Intercept (β₀): -1005.2810
Height slope (β₁): 78.1404
Width slope (β₂): 82.0572

Understanding the Geometry

From Lines to Planes

In simple linear regression, we fit a line through 2D data. The equation is:

$$y = \beta_0 + \beta_1 x_1$$

In multiple linear regression with two features, we fit a plane through 3D data:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

Where:

$\hat{y}$: The predicted value of the dependent (target) variable
$\beta_0$: The intercept (the value of y when all x's are 0)
$\beta_1$: The coefficient (or slope) for feature $x_1$
$\beta_2$: The coefficient for feature $x_2$
$x_1, x_2$: The independent variables (features)

The Residuals and Sum of Squared Residuals

For any point $i$ in our dataset, we can compute:

Predicted value: $\hat{y}i = \beta_0 + \beta_1 x{i1} + \beta_2 x_{i2}$

Residual: $\text{Residual}_i = y_i - \hat{y}_i$

The Sum of Squared Residuals (SSR) measures total prediction error:

$$\text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2$$

Squaring ensures all errors contribute positively and gives more weight to larger deviations.

The Mathematical Derivation

Why Calculus?

The goal is to find the values of $\beta_0$, $\beta_1$, and $\beta_2$ that minimize the SSR. This is where differentiation becomes essential.

For curves with multiple variables, we use partial differentiation—differentiating each variable separately while treating others as constants.

Setting Up the Equations

We define the loss function:

$$L(\beta_0, \beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2$$

At the minimum, all partial derivatives equal zero:

$$\frac{\partial L}{\partial \beta_0} = 0, \quad \frac{\partial L}{\partial \beta_1} = 0, \quad \frac{\partial L}{\partial \beta_2} = 0$$

Partial Derivatives

With respect to β₀:

$$\frac{\partial L}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2}) = 0$$

This simplifies to:

$$\beta_0 = \bar{y} - \beta_1 \bar{x}_1 - \beta_2 \bar{x}_2$$

With respect to β₁ and β₂:

Similar partial differentiation yields two more equations that, when solved together using Cramer's Rule, give:

$$\beta_1 = \frac{(\sum x_{i2}^2)(\sum x_{i1} y_i) - (\sum x_{i1} x_{i2})(\sum x_{i2} y_i)}{(\sum x_{i1}^2)(\sum x_{i2}^2) - (\sum x_{i1} x_{i2})^2}$$

$$\beta_2 = \frac{(\sum x_{i1}^2)(\sum x_{i2} y_i) - (\sum x_{i1} x_{i2})(\sum x_{i1} y_i)}{(\sum x_{i1}^2)(\sum x_{i2}^2) - (\sum x_{i1} x_{i2})^2}$$

Centering the Data

Centering means subtracting the mean from each variable:

$$x'_{i1} = x_{i1} - \bar{x}_1, \quad x'_{i2} = x_{i2} - \bar{x}_2, \quad y'_i = y_i - \bar{y}$$

Benefits of Centering:

Simplifies formulas by eliminating extra terms
Ensures the mean of all variables is zero
Improves numerical stability
Makes the intercept easier to calculate: $\beta_0 = \bar{y}$ (for centered data)

Example:

For a small dataset with 3 observations:

i	Original x₁	Original x₂	Original y	Centered x'₁	Centered x'₂	Centered y'
1	2	3	10	-2	-2	-4
2	4	5	14	0	0	0
3	6	7	18	+2	+2	+4

After centering: $\sum x'{i1} = 0$, $\sum x'{i2} = 0$, $\sum y'_i = 0$

Computing Coefficients for Our Sample Data

Step 1: Compute Means (Original Data)

$$\bar{x}_1 = 13.841, \quad \bar{x}_2 = 4.9385, \quad \bar{y} = 481.5$$

Step 2: Center the Data

Subtract means from all observations.

Step 3: Compute Centered Summations

$$\sum x'_{i1} y'_i = 2465.60, \quad \sum x'_{i2} y'_i = 816.57$$

$$\sum (x'_{i1})^2 = 24.3876, \quad \sum (x'_{i2})^2 = 3.4531, \quad \sum x'_{i1} x'_{i2} = 6.8238$$

Step 4: Compute Shared Denominator

$$\Delta = (24.3876)(3.4531) - (6.8238)^2 = 37.6470$$

Step 5: Compute Slopes

$$\beta_1 = \frac{(3.4531)(2465.60) - (6.8238)(816.57)}{37.6470} = \frac{2940.99}{37.6470} = 78.14$$

$$\beta_2 = \frac{(24.3876)(816.57) - (6.8238)(2465.60)}{37.6470} = \frac{3089.79}{37.6470} = 82.06$$

Step 6: Compute Intercept

$$\beta_0 = \bar{y} - \beta_1 \bar{x}_1 - \beta_2 \bar{x}_2$$

$$\beta_0 = 481.5 - (78.14)(13.841) - (82.06)(4.9385) = -1005.28$$

Final Regression Equation

$$\hat{y}_i = -1005.28 + 78.14 \cdot x_{i1} + 82.06 \cdot x_{i2}$$

This is how we derive the coefficients that Python's LinearRegression computes behind the scenes!

Key Takeaways

Multiple linear regression extends simple linear regression by fitting a plane (or hyperplane) through multi-dimensional data.
The goal is to find coefficients that minimize the Sum of Squared Residuals.
Calculus is essential: We use partial differentiation to find the point where the gradient is zero—the minimum of the cost function.
Three unknowns ($\beta_0$, $\beta_1$, $\beta_2$) require solving a system of three equations.
Data centering simplifies calculations and improves numerical stability.
The final equation is a direct result of mathematical optimization, not trial-and-error.

What's Next?

Part 2 of this series will cover model evaluation, interpreting coefficients, and handling more than two features. Stay tuned!

References

Fish Market Dataset
Linear Algebra and Calculus fundamentals
Multiple Linear Regression theory

Author: Simanga Mchunu

Name		Name	Last commit message	Last commit date
parent directory ..
data		data
Figure_1.png		Figure_1.png
Fish.csv		Fish.csv
Fish.csv:Zone.Identifier		Fish.csv:Zone.Identifier
README.md		README.md
model.ipynb		model.ipynb
model.py		model.py
plot_sample.py		plot_sample.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Multiple Linear Regression Explained Simply

The Math Behind Fitting a Plane Instead of a Line

Overview

Dataset: Fish Market

Quick Start: Fitting a Model in Python

Understanding the Geometry

From Lines to Planes

The Residuals and Sum of Squared Residuals

The Mathematical Derivation

Why Calculus?

Setting Up the Equations

Partial Derivatives

Centering the Data

Benefits of Centering:

Example:

Computing Coefficients for Our Sample Data

Step 1: Compute Means (Original Data)

Step 2: Center the Data

Step 3: Compute Centered Summations

Step 4: Compute Shared Denominator

Step 5: Compute Slopes

Step 6: Compute Intercept

Final Regression Equation

Key Takeaways

What's Next?

References

FilesExpand file tree

week_45

Directory actions

More options

Directory actions

More options

Latest commit

History

week_45

Folders and files

parent directory

README.md

Multiple Linear Regression Explained Simply

The Math Behind Fitting a Plane Instead of a Line

Overview

Dataset: Fish Market

Quick Start: Fitting a Model in Python

Understanding the Geometry

From Lines to Planes

The Residuals and Sum of Squared Residuals

The Mathematical Derivation

Why Calculus?

Setting Up the Equations

Partial Derivatives

Centering the Data

Benefits of Centering:

Example:

Computing Coefficients for Our Sample Data

Step 1: Compute Means (Original Data)

Step 2: Center the Data

Step 3: Compute Centered Summations

Step 4: Compute Shared Denominator

Step 5: Compute Slopes

Step 6: Compute Intercept

Final Regression Equation

Key Takeaways

What's Next?

References