Linear Regression is one of the simplest and most widely used techniques in machine learning. It models the relationship between an independent variable (x) and a dependent variable (y) as a straight line. This guide explains simple Linear Regression step by step, with detailed calculations and a practical implementation.
1. The Equation of Linear Regression
The general form of the simple linear regression equation is:

$$\hat{y} = b_0 + b_1 x$$

Where:
- $\hat{y}$: Predicted value of y
- $b_0$: Intercept (value of y when x = 0)
- $b_1$: Slope (rate of change in y for a unit change in x)
- $x$: Independent variable
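To make the notation concrete, here is a minimal sketch of this equation as a Python function; the function name and the sample values are purely illustrative and not part of the worked example below:

```python
def predict(x, b0, b1):
    """Compute the predicted value y_hat = b0 + b1 * x."""
    return b0 + b1 * x

# Illustrative values: intercept 1.0, slope 2.0, so x = 3 predicts 7.0
print(predict(3, b0=1.0, b1=2.0))  # 7.0
```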
2. How It Works: Step-by-Step Process
Step 1: Define the Hypothesis
The hypothesis of Linear Regression is the line $\hat{y} = b_0 + b_1 x$. The goal is to find the best-fit line for the given data by minimizing the difference between the actual values and the predicted values.
Step 2: Calculate the Error (Residuals)
The residual is the difference between the actual value ($y_i$) and the predicted value ($\hat{y}_i$):

$$e_i = y_i - \hat{y}_i$$

The objective is to minimize the total error using the Mean Squared Error (MSE):

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
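As a rough sketch, the residuals and MSE can be computed in Python like this; the actual and predicted arrays here are placeholder numbers, not the example dataset used later:

```python
import numpy as np

# Placeholder actual and predicted values, purely for illustration
y_actual = np.array([2.0, 4.0, 5.0])
y_pred = np.array([2.5, 3.8, 5.1])

residuals = y_actual - y_pred      # y_i - y_hat_i for each point
mse = np.mean(residuals ** 2)      # average of the squared residuals
print(f"Residuals: {residuals}, MSE: {mse:.4f}")
```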
Step 3: Optimize the Line
To optimize $b_0$ and $b_1$, Gradient Descent is used. Gradient Descent iteratively updates the parameters to minimize the MSE:

$$b_0 := b_0 - \alpha \frac{\partial \text{MSE}}{\partial b_0}, \qquad b_1 := b_1 - \alpha \frac{\partial \text{MSE}}{\partial b_1}$$

Where $\alpha$ is the learning rate.

Derivation of Gradients:

$$\frac{\partial \text{MSE}}{\partial b_0} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i), \qquad \frac{\partial \text{MSE}}{\partial b_1} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)\,x_i$$
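Below is a minimal gradient descent sketch of these update rules, applied to the example dataset from the next section; the learning rate and iteration count are illustrative choices, not prescribed values:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

b0, b1 = 0.0, 0.0          # start with both parameters at zero
alpha = 0.01               # learning rate (illustrative choice)
n = len(X)

for _ in range(10_000):    # iteration count is also an illustrative choice
    y_pred = b0 + b1 * X
    error = y - y_pred
    # Gradients of the MSE with respect to b0 and b1
    grad_b0 = -2.0 / n * np.sum(error)
    grad_b1 = -2.0 / n * np.sum(error * X)
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(f"b0: {b0:.2f}, b1: {b1:.2f}")  # approaches the closed-form values 2.2 and 0.6
```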
3. Example: Step-by-Step Calculation
Dataset:
| x (Independent) | y (Dependent) |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 4 |
| 5 | 5 |
Step 1: Compute Means

$$\bar{x} = \frac{1+2+3+4+5}{5} = 3, \qquad \bar{y} = \frac{2+4+5+4+5}{5} = 4$$
Step 2: Calculate $b_1$ (Slope)

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Substituting values:

$$b_1 = \frac{(-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1)}{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2} = \frac{6}{10} = 0.6$$

Step 3: Calculate $b_0$ (Intercept)

$$b_0 = \bar{y} - b_1\bar{x} = 4 - 0.6 \times 3 = 2.2$$

Final Equation:

$$\hat{y} = 2.2 + 0.6x$$
4. Predictions
Using the equation $\hat{y} = 2.2 + 0.6x$:

| x | Actual y | Predicted $\hat{y}$ |
|---|---|---|
| 1 | 2 | 2.8 |
| 2 | 4 | 3.4 |
| 3 | 5 | 4.0 |
| 4 | 4 | 4.6 |
| 5 | 5 | 5.2 |
5. Implementation in Python
Here is how you can implement this step-by-step calculation in Python:
```python
import numpy as np

# Dataset
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Calculate means
x_mean = np.mean(X)
y_mean = np.mean(y)

# Calculate b1 (slope)
numerator = np.sum((X - x_mean) * (y - y_mean))
denominator = np.sum((X - x_mean) ** 2)
b1 = numerator / denominator

# Calculate b0 (intercept)
b0 = y_mean - b1 * x_mean

# Predicted values
y_pred = b0 + b1 * X

print(f"Intercept (b0): {b0}")
print(f"Slope (b1): {b1}")
print(f"Predicted values: {y_pred}")
```
6. Applications of Linear Regression
- Real Estate: Predicting house prices based on size, location, etc.
- Business Analytics: Forecasting sales trends.
- Healthcare: Estimating patient metrics like blood pressure.
- Finance: Modeling stock prices.
Conclusion
Linear Regression is a powerful tool for understanding and predicting relationships between variables. By following the steps above, you can apply it to your own datasets with a clear understanding of the underlying calculations.