# How to Code Linear Regression from Scratch in Python

## Using only NumPy

13 min readMar 19

--

## What is Linear Regression? Why is it Important?

Linear regression is a common data science and machine learning technique for predicting numerical outcomes based on input features.

It’s foundational and has many real-world applications, from forecasting sales revenue to predicting housing prices. The basic idea behind linear regression is to fit a line to a set of data points that best captures the relationship between the input features and the target variable. We’re trying to find the line that gives us the smallest error between our predicted and actual values. This minimizes a cost function, usually the mean squared error.

In this blog post, you’ll implement linear regression from scratch using the Python NumPy library.

We’ll walk through everything from loading and prepping the data to defining the linear regression model, setting up the cost function and gradient descent algorithm, to evaluating the model’s performance. By the end of this post, you’ll have a solid grasp of how linear regression works and be ready to apply it to your data sets.

## A Brief Overview of how Linear Regression Works

Before jumping into code, you need to understand how linear regression works. In a nutshell: linear regression aims to find the line that best fits the data. The line is represented this a linear equation:

y = mx + b

where:

y is the target variable (the one we want to predict)

x is the input feature

m is the slope of the line

b is the intercept

In a more generalized form, we can represent the linear equation as follows:

y = b0 + b1x1 + b2x2 + … + bnxn

Where:

• y is still the target variable

• x1, x2, …, xn are the input features

• b0 is the bias term (also known as the intercept or constant)

• b1, b2, …, bn are the coefficients that we want to learn

The goal of linear regression is to find the values for b0, b1, b2, …, bn that minimize the difference between the predicted values and the actual values.

By representing the equation like this, you can now use matrix notation.

You can stack all the input features in a matrix X, where each row is an observation, and add a column of ones for the bias term. You can also stack all the target variables in a vector y.

The equation then becomes:

y = Xb

Where:

• y is a column vector of size (m, 1) (where m is the number of observations)

X is a matrix of size (m, n+1) (where n is the number of input features)

b is a column vector of size (n+1, 1) (which contains the bias term and coefficients)

This matrix notation is important because it allows you to easily perform linear regression on multiple input features.

In the next section, you’ll learn how to calculate the cost function and use the gradient descent algorithm to optimize the values of b.

## The Cost Function and Gradient Descent

The cost function is used in linear regression to measure the difference between predicted and actual values.

The most common cost function used in linear regression is the mean squared error (MSE). MSE calculates the average of the squared differences between the predicted and actual values for each observation in the data set. This gives you an overall measure of how well your model is performing.

The MSE formula is:

MSE = (1/m) * Σ(yi — yhat)²

Where:

• yhat is the predicted value for a given observation

yi is the actual value,

• m is the total number of observations in the data set.

The goal of linear regression is to minimize the MSE by finding the values of the coefficients that give the lowest possible value for the MSE.

This is where gradient descent comes in.

## Gradient Descent

Gradient descent is an iterative optimization algorithm that minimizes the cost function in linear regression.

The basic idea is to update the values of the coefficients in each iteration until the cost function is minimized. The algorithm works by calculating the gradient (the partial derivative) of the cost function for each coefficient and then updating the values of the coefficients by subtracting the gradient times the learning rate.

The learning rate is a hyperparameter that determines how much the coefficients should be updated in each iteration.

The key parameters in gradient descent are the learning rate, the number of iterations, and the stopping conditions. The learning rate determines the step size of the gradient descent algorithm. A small learning rate may cause the algorithm to converge slowly. In contrast, a large learning rate may cause the algorithm to overshoot the minimum and diverge.

Choosing an appropriate learning rate that balances convergence speed and stability is important.

The number of iterations is the number of times the gradient descent algorithm will update the coefficients. More iterations may result in a better solution but can also increase the risk of overfitting the training data. Stopping conditions are used to determine when to stop the gradient descent algorithm. There are different stopping conditions, such as reaching a certain number of iterations, reaching a minimum threshold for the cost function, or when the change in the cost function between iterations is smaller than a certain threshold.

Combining multiple stopping conditions is recommended to ensure the gradient descent algorithm converges to a good solution.

## Implementing Linear Regression

Now that we’ve covered the theory behind linear regression and gradient descent let’s walk through how to implement linear regression using NumPy.

Step 1: Initialize the model parameters
The first step in implementing linear regression is to initialize the model parameters. This includes setting the learning rate, the number of iterations, and the initial values for the coefficients. For simplicity, we’ll set the learning rate to 0.01 and the number of iterations to 1000. We’ll also initialize the coefficients to random values using NumPy’s random module.

Step 2: Prepare the data
The next step is to prepare the data by loading it into NumPy arrays and splitting it into training and test sets. We’ll also add a column of ones to the training data to represent the bias term.

Step 3: Train the model using gradient descent
Now it’s time to train the model using gradient descent. We’ll use a for loop to iterate over the number of iterations and update the coefficients using the gradient descent algorithm. In each iteration, we’ll calculate the predicted values for the training data using the current values of the coefficients and then calculate the mean squared error. We’ll also print the mean squared error value every 100 iterations.

Step 4: Make predictions on new data
Once the model is trained, we can use it to predict new data. We’ll do this by calculating the dot product of the test data and the optimized coefficients. We can then calculate the mean squared error for the test data to evaluate the model’s performance.

Implementing linear regression using NumPy involves setting up the model parameters, preparing the data, training the model using gradient descent, and making predictions based on new data.

Following these steps, you can build a simple linear regression model using Python and NumPy.

## The Code

The following defines a class called `Linear_Regression` that contains several methods for performing linear regression using gradient descent.

The `__init__` method sets default values for the learning rate and number of iterations.

The `initialize_betas` method takes in a matrix `X` and adds a column of ones to the left side of `X` to account for the bias term. It then initializes the betas with random values and returns both `X` and `betas`.

The `yhat` method takes in `X` and `betas` and returns the predicted values of `y` by multiplying `X` with the transpose of `betas`.

The `loss` method takes in the predicted values `yhat` and the true values `y` and calculates the mean squared error between them.

The `gradient_desc` method takes in the current `betas`, `X`, `y`, and `yhat` and calculates the gradient of the loss function with respect to `betas`. It then updates the values of `betas` using gradient descent and returns the new values.

The `main` method takes in `X` and `y` and initializes `betas` using the `initialize_betas` method. It then runs a loop for `iterations` number of times, where it calculates the predicted values of `y` using `yhat`, calculates the loss using `loss`, and updates `betas` using `gradient_desc`. The loss is printed every 1000 iterations, and the final loss is printed at the end. The method returns the final values of `betas`.

Finally, if the script is run directly (i.e., not imported), an instance of the `Linear_Regression` class is created and the `main` method is called with `X` and `y` as inputs. The final values of `betas` are printed.

`import numpy as npimport randomclass Linear_Regression:    """    A class for performing linear regression using gradient descent.    """    def __init__(self):        """        Constructor method that sets default values for the learning rate and number of iterations.        """        self.learning_rate = 0.001        self.iterations = 10000        def initialize_betas(self, X):        """        Method that takes in a matrix X and initializes the betas with random values, adding a column of ones to the left side of X to account for the bias term.                Args:            X: A matrix of shape (m, n) where m is the number of training examples and n is the number of features.                Returns:            X: A modified version of X with an additional column of ones to account for the bias term.            betas: A numpy array of shape (1, n+1) containing the initial values for the betas.        """        ones = np.ones((X.shape[0], 1))        X = np.append(ones, X, axis=1)                self.m = X.shape[0]        self.n = X.shape[1]                betas = np.array([random.random() for i in range(self.n)]).reshape(1, self.n)        return X, betas        def yhat(self, X, betas):        """        Method that takes in X and betas and returns the predicted values of y.                Args:            X: A matrix of shape (m, n+1) where m is the number of training examples and n is the number of features (with an additional column of ones to account for the bias term).            betas: A numpy array of shape (1, n+1) containing the values for the betas.                    Returns:            yhat: A numpy array of shape (m, 1) containing the predicted values of y.        """        yhat = np.dot(X, betas.T)        return yhat        def loss(self, yhat, y):        """        Method that takes in the predicted values of y (yhat) and the true values of y and calculates the mean squared error between them.                Args:            yhat: A numpy array of shape (m, 1) containing the predicted values of y.            y: A numpy array of shape (m, 1) containing the true values of y.                    Returns:            loss: The mean squared error between y and yhat.        """        loss = np.sum(np.power((yhat - y), 2)) / self.m        return loss        def gradient_desc(self, betas, X, y, yhat):        """        Method that takes in the current values of betas, X, y, and yhat, calculates the gradient of the loss function with respect to betas, and updates the values of betas using gradient descent.                Args:            betas: A numpy array of shape (1, n+1) containing the current values of the betas.            X: A matrix of shape (m, n+1) where m is the number of training examples and n is the number of features (with an additional column of ones to account for the bias term).            y: A numpy array of shape (m, 1) containing the true values of y.            yhat: A numpy array of shape (m, 1) containing the predicted values of y.                    Returns:            betas: A numpy array of shape (1, n+1) containing the updated values of the betas.        """        dLdb = 2 * np.dot((yhat - y).T, X) / self.m        betas = betas - self.learning_rate * dLdb        return betasdef main(self, X, y):    """    Method that takes in X and y, initializes betas, and performs gradient descent to optimize the values of betas.        Args:        X: A matrix of shape (m, n) where m is the number of training examples and n is the number of features.        y: A numpy array of shape (m, 1) containing the true values of y.            Returns:        betas: A numpy array of shape (1, n+1) containing the optimized values of the betas.    """    X, betas = self.initialize_betas(X)        for iteration in range(self.iterations+1):        yhat = self.yhat(X, betas)        loss = self.loss(yhat, y)        if iteration % 1000 == 0:            print(loss)        if iteration == self.iterations:            print(f"Cost at iteration {iteration} is {loss}")        betas = self.gradient_desc(betas, X, y, yhat)    return betasif name=="main":"""Main entry point of the script. Initializes an instance of the Linear_Regression class and calls the main method with a sample X and y."""  X = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]])  y = np.array([[2], [4], [6], [8], [10]])  regression = Linear_Regression()  w = regression.main(X, y)  print(w)`

## Recapping the Code

This code defines a class `Linear_Regression` with various methods to perform linear regression using gradient descent.

• The class can be imported and instantiated, and its `main` method is called with input data `X` and `y`.
• The `__init__` method initializes two variables, `learning_rate` and `iterations,` to 0.01 and 10000, respectively.
• The `initialize_betas` method takes in `X,` adds a column of ones to the left side of `X` to account for the bias term, and initializes `betas` with random values. It returns the modified `X` and `betas.`
• The `yhat` method takes in `X` and `betas.` It returns the predicted values of `y` by multiplying `X` with the transpose of `betas.`
• The `loss` method takes in `yhat` and `y` and calculates the mean squared error between them.
• The `gradient_desc` method takes in the current values of `betas,` `X,` `y`, and `yhat,` calculates the gradient of the loss function for `betas,` and updates the values of `betas` using gradient descent.
• The `main` method takes in `X` and `y`, initializes `betas` using `initialize_betas,` and performs gradient descent to optimize the values of `betas.` It iterates `iterations` many times, updating the value of `betas` each time using the `gradient_desc` method, and prints the loss every 1000 iterations. It returns the optimized values of `betas.`
• The script can be run directly to test the class with a sample `X` and `y`. It instantiates a `Linear_Regression` object, calls the `main` method with the sample.

## Evaluation and Interpretation

Once you’ve trained a linear regression model and made predictions on new data, you must evaluate the model’s performance.

You can use several metrics, including R-squared and mean absolute error (MAE).

R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable explained by the model’s independent variables. It ranges from 0 to 1, with higher values indicating a better fit.

To calculate R-squared, we can use the following formula:

R-squared = 1 — (SSres / SStot)

SSres is the sum of squares of the residuals (the differences between the predicted and actual values). Sstot is the total sum of squares (the differences between the actual values and the mean value of the dependent variable).

MAE is a metric representing the average absolute difference between the predicted and actual values.

It is calculated as:

MAE = (1/n) * Σ|yi — yhat|

Where yi is the actual value for a given observation, yhat is the predicted value, and n is the number of observations.

Interpreting the model coefficients is also an important part of evaluating the model. The coefficients represent the strength and direction of the relationship between the independent and dependent variables. A positive coefficient indicates a positive relationship, while a negative coefficient indicates a negative relationship. The magnitude of the coefficient indicates the strength of the relationship, with larger values indicating a stronger relationship.

Remember that the coefficients are only meaningful in the context of the input data.

For example, suppose you’re predicting housing prices. In that case, a coefficient of 100 for the square footage variable means that for each additional square foot of living space, the predicted price increases by \$100. This relationship may not hold for other types of data.

Evaluating the performance of a linear regression model involves calculating metrics such as R-squared and MAE and interpreting the coefficients to gain insights into the relationship between the input features and the target variable.

## Conclusion

Alright, you made it to the end!

Here’s a quick summary of what you learned from this blog post:

• Linear regression is a foundational technique in data science and machine learning used to predict numerical outcomes based on input features.
• The goal of linear regression is to find the line that best fits the data by minimizing the cost function, which is usually the mean squared error.
• Gradient descent is an iterative optimization algorithm that minimizes the cost function in linear regression.
• To implement linear regression using NumPy, you need to initialize the model parameters, prepare the data, train the model using gradient descent, and evaluate the performance using metrics such as R-squared and mean absolute error.

Now that you understand how linear regression works and how to implement it using NumPy, what’s next? The sky’s the limit!

Here are some potential next steps for applying linear regression to real-world problems:

• Try different learning rates and numbers of iterations to see how they affect the model’s performance.
• Experiment with different input features and see how they impact the model’s performance.
• Combine linear regression with other techniques, such as regularization, to improve the model’s performance.
• Apply linear regression to real-world data sets, like housing or stock prices, and see how well the model performs.
• Expand your machine learning knowledge by exploring other techniques like decision trees, random forests, and neural networks.

Thanks for reading, and good luck with your linear regression journey!

--

--

I create content about Deep Learning | DevRel Manager @ Deci AI | Forever Learning