## Detailed layout of a logistic regression algorithm with a project

Logistic regression is widely used in machine learning and statistics, and it handles both binary and multiclass classification. I have written tutorials on both binary and multiclass classification with logistic regression before. This article focuses on image classification with logistic regression.

If you are totally new to logistic regression, please go to this article first; it has a detailed explanation of how a simple logistic regression algorithm works.

It will help if you are already familiar with logistic regression. If not, I hope you can still follow the concepts here, as I have tried to explain them clearly.

If you are reading this to learn, the best way is to run all the code yourself.

## Problem Statement

The idea of this project is to develop and train a model that is able to take the pixel values of a digit and identify if it is an image of the digit one or not.

The dataset used in this tutorial is the famous digits dataset, which appears in many machine learning tutorials. Each row of the dataset holds the flattened pixel values of one digit. I will show you in detail later.

## Data Preparation

This dataset contains the pixel values of the digits from zero to nine. But because this tutorial is about binary classification, the goal of this model will be to return 1 if the digit is one and 0 otherwise. Please feel free to download the dataset from the link below to follow along:

Here I am importing the dataset:

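```python
import pandas as pd
import numpy as np

# Read the pixel values from the sheet named 'X'; the file has no header row.
df = pd.read_excel('ex3d1.xlsx', sheet_name='X', header=None)
df.head()
```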

You can see that the dataset has 400 columns. Each row represents one digit, so each row holds 400 pixel values. Let’s check some of the digits using the ‘imshow’ function of the matplotlib library. The pixel values of an image are not originally one-dimensional, which is why each row is reshaped into a 20 x 20 two-dimensional array before it is passed to ‘imshow’.

```python
import matplotlib.pyplot as plt

# Reshape the 400 flattened pixel values back into a 20 x 20 image.
plt.imshow(np.array(df.iloc[500, :]).reshape(20, 20))
```

It’s one! Here I used the row at index 500 of the dataset.

Here is another one, using the row at index 1750:

```python
# The pixel data was loaded into df above.
plt.imshow(np.array(df.iloc[1750, :]).reshape(20, 20))
```

It’s three.
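To make the reshaping step concrete, here is a minimal sketch on synthetic data (not the real digits file). The name `flat_row` is a hypothetical stand-in for one row of the dataset. It shows that reshaping only rearranges the 400 values into a 20 x 20 grid and that flattening the grid gives back the same row.

```python
import numpy as np

# Hypothetical stand-in for one dataset row: 400 flattened pixel values.
flat_row = np.arange(400, dtype=float)

# Rebuild the 20 x 20 grid that imshow expects.
image = flat_row.reshape(20, 20)

print(flat_row.shape)  # (400,)
print(image.shape)     # (20, 20)

# Flattening the grid again recovers the original row.
print(np.array_equal(image.ravel(), flat_row))  # True
```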

Let’s check how many rows are in this dataset:

`len(df)`

Output:

`5000`

The labels are stored in a different sheet of the same Excel file. Here are the labels:

```python
# Read the labels from the sheet named 'y'; the file has no header row.
df_y = pd.read_excel('ex3d1.xlsx', sheet_name='y', header=None)
df_y.head()
```

I am only showing the head of the labels, which gives the first five rows. Because this model identifies only the digit 1, the label should be 1 when the digit is one and 0 otherwise. So I will keep the 1s in the labels and convert every other digit to 0:

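```python
# Work on a copy of the label column so df_y itself stays unchanged.
y = df_y[0].copy()

# Keep the 1s and turn every other digit into 0.
for i in range(len(y)):
    if y[i] != 1:
        y[i] = 0

y = pd.DataFrame(y)
y
```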
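The same conversion can also be written without an explicit loop. This is only a sketch on a few made-up labels; `df_y_demo` and `y_demo` are hypothetical names, not part of the dataset. Comparing the column with 1 gives True/False values, and casting them to integers gives the 1/0 labels.

```python
import pandas as pd

# Made-up labels standing in for the 'y' sheet (a single column named 0).
df_y_demo = pd.DataFrame({0: [5, 1, 1, 2, 3, 1, 9]})

# True where the label is 1, False elsewhere; cast to int to get 1 and 0.
y_demo = (df_y_demo[0] == 1).astype(int).to_frame()

print(y_demo[0].tolist())  # [0, 1, 1, 0, 0, 1, 0]
```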

Out of these 5000 rows of data, 4000 rows will be used to train the model, and the remaining 1000 rows will be used to test it. It is important for any machine learning or deep learning model to be tested on data it has not seen during training.

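```python
# Rows 0-3999 go to training, rows 4000-4999 to testing.
# The pixel data was loaded into df above.
x_train = df.iloc[0:4000].T
y_train = y.iloc[0:4000].T
x_test = df.iloc[4000:].T
y_test = y.iloc[4000:].T
```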

Using `.T`, we take the transpose of each dataset, so each column now holds one example. These training and test datasets are still DataFrames. They need to be arrays for the convenience of calculation.

```python
# Convert the DataFrames to NumPy arrays.
x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)
```
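It helps to check the shapes after the transpose. The sketch below uses all-zero synthetic data with the same layout as the tutorial's data (5000 rows, 400 pixel columns, one label column). The names `features_demo`, `labels_demo`, `x_tr`, `y_tr`, `x_te` and `y_te` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: 5000 examples by 400 pixels, plus one label column.
features_demo = pd.DataFrame(np.zeros((5000, 400)))
labels_demo = pd.DataFrame(np.zeros((5000, 1), dtype=int))

# Same slicing, transpose, and array conversion as in the tutorial.
x_tr = np.array(features_demo.iloc[0:4000].T)
y_tr = np.array(labels_demo.iloc[0:4000].T)
x_te = np.array(features_demo.iloc[4000:].T)
y_te = np.array(labels_demo.iloc[4000:].T)

# After the transpose, each column is one example.
print(x_tr.shape, y_tr.shape)  # (400, 4000) (1, 4000)
print(x_te.shape, y_te.shape)  # (400, 1000) (1, 1000)
```

With this layout the features are in rows and the examples are in columns, so one matrix product can process every example at once.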

The training and test datasets are ready to be used in the model. Now it is time to develop the model.
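One thing to keep in mind: the slices above take the rows in file order. If the file happens to be grouped by digit (the one at index 500 and the three at index 1750 are consistent with that, though it is only an assumption here), the last 1000 rows could contain very few ones. This is not the split used in this tutorial, but a minimal sketch of shuffling before splitting, on synthetic data with hypothetical names (`features`, `labels`, `order`), might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is repeatable

# Synthetic stand-ins: 5000 rows of 400 features, labels grouped in blocks
# of 500 per digit (an assumed layout), then converted to 1 / 0.
features = rng.random((5000, 400))
labels = (np.repeat(np.arange(10), 500) == 1).astype(int)

# Shuffle the row indices once and use the same order for both arrays.
order = rng.permutation(len(features))
train_idx, test_idx = order[:4000], order[4000:]

# Transpose so that each column is one example, as in the tutorial.
x_train_demo = features[train_idx].T
y_train_demo = labels[train_idx].reshape(1, -1)
x_test_demo = features[test_idx].T
y_test_demo = labels[test_idx].reshape(1, -1)

print(x_train_demo.shape, y_train_demo.shape)
print(x_test_demo.shape, y_test_demo.shape)
print(int(y_train_demo.sum()), int(y_test_demo.sum()))  # ones in each part
```

Because the indices are shuffled, the ones are spread across both parts instead of sitting in a single block.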

Step 1:

Logistic regression builds on the basic linear formula that we all learned in high school:

Y = AX + B

Where Y is the output, X is the input or independent variable, A is the slope and B is the intercept.
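As a quick illustration of this formula, here is a small sketch with made-up numbers. The slope and intercept values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

A = 2.0  # slope (hypothetical value)
B = 1.0  # intercept (hypothetical value)

# A few input values; NumPy applies the formula to all of them at once.
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = A * X + B

print(Y.tolist())  # [1.0, 3.0, 5.0, 7.0]
```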

In logistic regression, the variables are expressed this way:

Formula 1