This article discusses the basics of linear regression and its implementation in Python programming language.






Linear regression is a statistical approach for modelling relationship between a dependent variable with a given set of independent variables.

Note: In this article, we refer dependent variables as response and independent variables as features for simplicity.

In order to provide a basic understanding of linear regression, we start with the most basic version of linear regression, i.e. Simple linear regression.

 Simple linear regression is an approach for predicting a response using a single feature

It is assumed that the two variables are linearly related. Hence, we try to find a linear function that predicts the response value(y) as accurately as possible as a function of the feature or independent variable(x).
Let us consider a dataset where we have a value of response y for every feature x:

x = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

n = number of observations = 10

we have to define best fit line for scatter point that predicts output y for x
so lets define linear line that is called linear regression

The equation can be defined as

  • h(x_i) represents the predicted response value for ith observation.
  • b_0 and b_1 are regression coefficients and represent y-intercept and slope of regression line respectively.

To create our model, we must “learn” or estimate the values of regression coefficients b_0 and b_1. And once we’ve estimated these coefficients, we can use the model to predict responses!
In this article, we are going to use the Least Squares technique.
Now consider:


Here, e_i is residual error in ith observation.
So, our aim is to minimize the total residual error.
We define the squared error or cost function, J as:


and our task is to find the value of b_0 and b_1 for which J(b_0,b_1) is minimum!
Without going into the mathematical details, we present the result here:

where SS_xy is the sum of cross-deviations of y and x:


and SS_xx is the sum of squared deviations of x:


Given below is the python implementation of above technique on our small dataset:



   the predicted regression line will be plotted