Building a Handwritten Multi-Digit Calculator | by Neerav Gala | Dec, 2020
Using Convolutional Neural Networks (Keras API with Tensorflow backend)
The aim of this article is to:
- Create a CNN model that can identify digits and simple mathematical operators from an image
- Set up the mathematical expression and compute the answer
- If incorrect, update the model with the correct answer
In order to easily follow this article, I recommend understanding the basics of the following topics first:
- Machine Learning using Python
- Convolutional Neural Networks (refer to this YouTube video from MIT 6.S191)
- Keras API
If you are familiar with these topics, then let's dive right into it!
- Creating the CNN Model
- Model Predictions
- Creating the Calculator
- Model Update
1.1 What is CNN?
Convolutional Neural Networks are a subclass of Deep Learning algorithms mainly used for analyzing visual imagery. Large amounts of training data, increasing computational power and advanced deep learning techniques have paved the way for CNNs to perform complex visual tasks.
It all started in 1958, when David H. Hubel and Torsten Wiesel performed a series of experiments to understand the structure of the visual cortex (for which they won the Nobel Prize in 1981). They found that some neurons had larger receptive fields and would react to complex patterns that were combinations of lower-level patterns detected by other neurons. These observations led to the idea that higher-level neurons are based on the outputs of neighboring lower-level neurons.
This powerful visual architecture evolved over the years until a 1998 paper by Yann LeCun et al. formulated the famous LeNet-5 architecture, which introduced the concepts of convolutional and pooling layers.
Today, many companies employ CNN for analyzing visual imagery. The following are a few examples:
- Tesla for Autopilot 
- Google Photos for Image Classification 
- Facebook Artificial Intelligence Research (FAIR) for Language Translation 
The objective of this article is to show you how to build a CNN model that can do the following:
The model is built with the Keras API (TensorFlow backend) using Python.
From the next section onwards, I recommend copying each section’s code into your IDE before reading through the explanation.
2.1 Preparing the dataset
The dataset used for this project can be found here (file name is dataset.csv). It is an extension of the MNIST dataset and contains 85,709 images of Arabic numerals (0 to 9) and mathematical operators (+, -, * and /). Each image is 28 x 28 pixels. In order to be easily encoded, the mathematical operators are numerically labelled as follows:
- 10 represents “/” (division)
- 11 represents “+” (addition)
- 12 represents “-” (subtraction)
- 13 represents “*” (multiplication)
In the first code snippet, the downloaded dataset is loaded as a Pandas dataframe of size (85709 x 785). The labels (y) are separated from the pixel data (X).
785 = 1 (label) + 28 (height in pixels) * 28 (width in pixels)
To reduce the effect of illumination, the dataset is grayscale normalized by dividing it by 255. Next, the dataset is reshaped to fit the standards of a 4D tensor ([mini-batch size, height = 28px, width = 28px, channels = 1 due to grayscale]).
Next, the labels are converted from class vectors (14 classes) to binary class matrices (necessary for training Keras models).
E.g. 2 -> [0,0,1,0,0,0,0,0,0,0,0,0,0,0]
Lastly, the data is split into training(90%) and validation(10%). The dataset is now ready to be used to train the model.
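The preparation steps above can be sketched in plain numpy. This is a minimal sketch: the helper name `prepare`, the column layout, and the synthetic stand-in data are illustrative; the article's own code uses Pandas and Keras' `to_categorical`.

```python
import numpy as np

def prepare(data, num_classes=14, val_frac=0.1, seed=42):
    """Separate labels from pixels, normalize, reshape to a 4D tensor,
    one-hot encode, and split into train/validation sets."""
    y = data[:, 0].astype(int)                 # first column holds the label
    X = data[:, 1:].astype("float32") / 255.0  # grayscale normalization
    X = X.reshape(-1, 28, 28, 1)               # (samples, height, width, channels)
    Y = np.eye(num_classes)[y]                 # one-hot, like to_categorical
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(len(X) * val_frac)             # 10% held out for validation
    return X[idx[n_val:]], Y[idx[n_val:]], X[idx[:n_val]], Y[idx[:n_val]]

# Tiny synthetic stand-in for dataset.csv (the real one is 85709 x 785)
demo = np.hstack([np.random.randint(0, 14, (100, 1)),
                  np.random.randint(0, 256, (100, 784))])
X_train, Y_train, X_val, Y_val = prepare(demo)
```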
2.2 Building and training the CNN Model
The CNN model is built using Keras’ Sequential model class. The table below shows the list of layers (and hyperparameters) passed to create the model.
Input and First Hidden layer
- First, a set of two Convolutional layers (32 filters and ReLU activation function) to identify the low level image patterns (lines, edges, etc.).
- Next, a Max Pooling layer (pooling size of 2 x 2) simply downsamples the feature maps to reduce computational load, memory usage and the number of parameters.
- Finally, Dropout is used as a regularization method. It randomly ignores 25% of the nodes during every training iteration.
Second Hidden Layer
- A set of two Convolutional layers (64 filters and ReLU activation function) to identify complex patterns that are a combination of lower-level patterns detected by the first layer.
- Max Pooling layer (pooling size of 2 X 2) for downsampling.
- Dropout for regularization.
Fully Connected Layer and Output
- First, a Flatten layer is used to convert the final feature maps into a single 1D vector (necessary for Dense layer input).
- Next, a fully-connected (Dense) layer that acts as an Artificial Neural Network.
- Dropout for regularization.
- Finally, another fully-connected (Dense) layer with Softmax activation acts as the output layer. The softmax function transforms the elements of the output layer into a probability distribution over the classes. The class with the highest probability is taken as the model prediction.
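As a sketch, the stack described above can be assembled with Keras' Sequential class. The kernel sizes and the 256-unit Dense layer are assumptions taken from the referenced Kaggle notebook, not stated in this article.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, Dense)

model = Sequential([
    Input(shape=(28, 28, 1)),                          # 28x28 grayscale input
    # First block: low-level patterns (lines, edges)
    Conv2D(32, (5, 5), padding="same", activation="relu"),
    Conv2D(32, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),                    # downsample 28 -> 14
    Dropout(0.25),                                     # ignore 25% of nodes
    # Second block: combinations of lower-level patterns
    Conv2D(64, (3, 3), padding="same", activation="relu"),
    Conv2D(64, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),                    # downsample 14 -> 7
    Dropout(0.25),
    # Fully connected head
    Flatten(),                                         # feature maps -> 1D vector
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(14, activation="softmax"),                   # one probability per class
])
```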
Once the layers are added, the training parameters (loss function, score function and optimization function) are defined.
The loss function measures the error rate between the observed and predicted labels. For this model, categorical_crossentropy is used as the loss function.
Next, the famous RMSprop is used as the optimizer algorithm to iteratively improve the model parameters. RMSprop is also one of the fastest optimizers.
The metric function “accuracy” is used as the score function to evaluate the performance of our model.
In order to make the optimizer converge faster and as close as possible to the global minimum of the loss function, the ReduceLROnPlateau function from Keras.callbacks is used as an annealing method for the learning rate (LR).
To avoid overfitting, the dataset is expanded by artificially altering the existing images (zooming, shifting, etc.). This process is called Data Augmentation and is executed using Keras’ ImageDataGenerator.
Finally, the model is trained on the dataset for 5 epochs with a batch size of 86.
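Putting the training setup together might look like the snippet below. It runs on a tiny stand-in model and random data so it is self-contained; the ReduceLROnPlateau hyperparameters and the augmentation ranges are assumptions, not values stated in the article.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in model and data; in the article the full CNN and dataset are used
model = Sequential([Input(shape=(28, 28, 1)),
                    Conv2D(8, (3, 3), activation="relu"),
                    MaxPooling2D((2, 2)), Flatten(),
                    Dense(14, activation="softmax")])

# Loss, optimizer and score function discussed above
model.compile(optimizer=RMSprop(), loss="categorical_crossentropy",
              metrics=["accuracy"])

# Anneal the learning rate when validation accuracy plateaus
lr_reduction = ReduceLROnPlateau(monitor="val_accuracy", patience=3,
                                 factor=0.5, min_lr=1e-5)

# Data augmentation: random rotations, zooms and shifts of existing images
datagen = ImageDataGenerator(rotation_range=10, zoom_range=0.1,
                             width_shift_range=0.1, height_shift_range=0.1)

X = np.random.rand(32, 28, 28, 1).astype("float32")
Y = np.eye(14)[np.random.randint(0, 14, 32)]

# The article trains for 5 epochs with a batch size of 86
history = model.fit(datagen.flow(X, Y, batch_size=16), epochs=1,
                    validation_data=(X, Y), callbacks=[lr_reduction], verbose=0)
```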
- 412s - loss: 0.2837 - accuracy: 0.9143 - val_loss: 0.0606 - val_accuracy: 0.9831
- 439s - loss: 0.0776 - accuracy: 0.9771 - val_loss: 0.0348 - val_accuracy: 0.9901
- 410s - loss: 0.0622 - accuracy: 0.9818 - val_loss: 0.0292 - val_accuracy: 0.9919
- 413s - loss: 0.0578 - accuracy: 0.9837 - val_loss: 0.0285 - val_accuracy: 0.9916
- 410s - loss: 0.0540 - accuracy: 0.9845 - val_loss: 0.0322 - val_accuracy: 0.9917
The model took about 35 minutes to train and reached a validation accuracy of 99.17% after the 5th epoch.
The trained model is now ready to make some predictions! The model (with its trained structure and weights) can also be saved to your computer.
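Saving and reloading can be done with Keras' built-in serialization, demonstrated here on a small stand-in model; apply `model.save` to the trained CNN instead. The file name is illustrative.

```python
import os, tempfile
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Input, Dense

# Stand-in for the trained calculator model
model = Sequential([Input(shape=(784,)), Dense(14, activation="softmax")])

path = os.path.join(tempfile.mkdtemp(), "calculator_model.h5")
model.save(path)             # stores the architecture and weights together
restored = load_model(path)  # reload later without retraining
```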
Note: Sections 1 & 2 of this article were inspired by this Kaggle Notebook.
Now it is time to use the trained model to analyze the image of a handwritten mathematical expression. The image (file name is testing.png) used in this article is available in my GitHub repository.
3.1 Element Separation
To analyze the mathematical expression, it first needs to be broken down into individual elements, which can then be identified by the model.
To do that, the image file (3066 pixels * 208 pixels) is first loaded in grayscale using Pillow (PIL).
Then the image is resized to a height of 28 px (model input requirement). Keeping the same aspect ratio, the width is adjusted as well.
Next, the image is converted from a PIL.Image to a Numpy array. The background is changed to black by subtracting the pixel values from 255, and the image is grayscale normalized by dividing by 255. The image looks something like this.
Note: Use matplotlib.pyplot.imshow to display arrays as an image
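A sketch of this preprocessing with Pillow and numpy follows; the helper name `preprocess` is mine, and the demo uses a synthetic all-white image in place of testing.png so the snippet runs on its own.

```python
import numpy as np
from PIL import Image

def preprocess(img):
    """Resize to a height of 28px keeping the aspect ratio, invert to a
    black background, and grayscale-normalize to [0, 1]."""
    img = img.convert("L")                   # grayscale
    w, h = img.size
    img = img.resize((int(w * 28 / h), 28))  # height 28px, width scaled along
    arr = np.asarray(img, dtype="float32")
    return (255.0 - arr) / 255.0             # white page -> 0, dark ink -> ~1

# Synthetic all-white stand-in for Image.open("testing.png") (3066 x 208 px)
demo = Image.fromarray(np.full((208, 3066), 255, dtype=np.uint8))
img_arr = preprocess(demo)
```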
Now the image array is split into individual element arrays. This is done by searching the image array for runs of successive non-zero columns and grouping each run into one element array. These element arrays are all stored in one list. Since there are 14 elements in the mathematical expression, the size of the list is 14.
In : len(out)
Out: 14
In order to identify an element with the model, its image must be 28px x 28px. The height is already 28px; however, due to the splitting process, the width of each element is not exactly 28px.
Therefore, after the splitting, the width of each element is adjusted to 28px by adding zero-value columns (filler columns) to the element arrays.
This process is repeated until all elements are 28px x 28px.
Finally, the elements list is converted back into an array of shape (14, 28, 28, 1) to fit the model input requirements of a 4D Tensor [mini_batch_size, height, width, channels].
In : elements_array.shape
Out: (14, 28, 28, 1)
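One way to implement the splitting and padding in numpy is sketched below, under the assumption that elements never touch and are at most 28 columns wide; the article's exact code may differ.

```python
import numpy as np

def split_elements(img, target_w=28):
    """Group runs of consecutive non-zero columns into elements and pad
    each one with zero (filler) columns up to a width of 28."""
    nonzero = list(img.any(axis=0)) + [False]   # which columns contain ink
    elements, start = [], None
    for col, has_ink in enumerate(nonzero):
        if has_ink and start is None:
            start = col                          # a run of ink begins
        elif not has_ink and start is not None:
            elem = img[:, start:col]             # one element's columns
            pad = target_w - elem.shape[1]
            left = pad // 2                      # center between filler columns
            elements.append(np.pad(elem, ((0, 0), (left, pad - left))))
            start = None
    # 4D tensor [mini_batch_size, height, width, channels] for the model
    return np.stack(elements).reshape(-1, 28, 28, 1)

# Demo image containing two separated "elements"
demo = np.zeros((28, 60), dtype="float32")
demo[8:20, 5:15] = 1.0
demo[8:20, 30:42] = 1.0
elements_array = split_elements(demo)
```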
Now the elements array is ready for the model!
3.2 Prediction of elements
The elements array is passed through the model for prediction. The model returns the probabilities of all the number classes (Softmax function). The class with the highest probability is chosen.
In : print(elements_pred)
Out: [ 9 8 10 7 6 13 5 4 11 3 2 12 1 0]
As you can see, the model was able to predict all the elements properly!
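The prediction step amounts to taking an argmax over each softmax output; with the trained model this is `np.argmax(model.predict(elements_array), axis=1)`. Illustrated here with a hand-made probability matrix instead of real model output:

```python
import numpy as np

# Each row is one element's softmax output (only 4 classes for brevity)
probs = np.array([[0.05, 0.80, 0.10, 0.05],   # most likely class 1
                  [0.10, 0.05, 0.75, 0.10]])  # most likely class 2
elements_pred = np.argmax(probs, axis=1)      # pick the highest-probability class
```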
The next step is to build the mathematical expression out of these elements and calculate it. Let's get right into it.
Once all the individual elements are identified, it is time to create the mathematical expression and calculate it.
In order to create the mathematical expression, the digit classes (0 to 9) are regrouped to form the multi-digit numbers, while classes 10, 11, 12 and 13 are converted to the “/”, “+”, “-” and “*” operators respectively.
[1 0] → Elements of class 1 and 0 are regrouped to form the number 10
[10] → Elements of class 10 are converted to the “/” operator
Finally, the numbers and operators are joined together to form a string representing the mathematical expression.
In : m_exp_str = math_expression_generator(elements_pred)
In : print(m_exp_str)
Out: 98 / 76 * 54 + 32 - 10
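The generator can be sketched as a simple mapping pass; this is a minimal version of math_expression_generator, and the article's implementation may differ.

```python
OPERATORS = {10: "/", 11: "+", 12: "-", 13: "*"}

def math_expression_generator(preds):
    """Map operator classes to their symbols; consecutive digits are
    naturally regrouped into multi-digit numbers by the join."""
    return "".join(OPERATORS.get(int(p), str(p)) for p in preds)

m_exp_str = math_expression_generator([9, 8, 10, 7, 6, 13, 5, 4, 11, 3, 2, 12, 1, 0])
# m_exp_str is "98/76*54+32-10"
```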
The eval() method is then used on the string to calculate the answer.
In : print(equation)
Out: 98 / 76 * 54 + 32 - 10 = 91.63
Sometimes wrong predictions produce an invalid expression (e.g. “4+*4”), and the eval() method then raises a SyntaxError. Therefore, error handling is used.
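A guarded evaluation might look like this sketch; note that eval() on untrusted input is generally unsafe, which is acceptable here since the input comes from our own model.

```python
def calculate(m_exp_str):
    """Evaluate the expression string, guarding against invalid
    expressions produced by wrong predictions."""
    try:
        answer = round(eval(m_exp_str), 2)        # the article uses eval() as well
        return f"{m_exp_str} = {answer}"
    except (SyntaxError, ZeroDivisionError):
        return f"Invalid expression: {m_exp_str}"

equation = calculate("98/76*54+32-10")   # "98/76*54+32-10 = 91.63"
```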
That’s it. The Multi Digit Handwritten Calculator is ready!
But wait, what to do about wrong prediction? The next section is all about that.
What makes Machine Learning algorithms so powerful is their ability to learn from their mistakes. In order to do that, the model needs to be retrained with the correct information.
When there is a false prediction, the above code asks the user for the correct mathematical expression. It then compares the predicted expression with the correct one and identifies the wrongly predicted elements.
The code then retrains the model from its second hidden layer to its output layer using the corrected data (the first hidden layer is frozen).
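The partial retraining can be sketched by freezing the first block's layers before compiling again, shown here on a small stand-in model; in the article, the trained calculator model is updated with the user-corrected elements.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Input(shape=(28, 28, 1)),
    Conv2D(8, (3, 3), activation="relu"),    # first hidden block ...
    MaxPooling2D((2, 2)),                    # ... will be frozen
    Conv2D(16, (3, 3), activation="relu"),   # retrained layers start here
    Flatten(),
    Dense(14, activation="softmax"),
])

# Freeze everything up to and including the first pooling layer
for layer in model.layers:
    layer.trainable = False
    if isinstance(layer, MaxPooling2D):
        break

# Recompile so the trainable flags take effect, then retrain:
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
# model.fit(X_corrected, y_corrected, epochs=...)  # corrected samples from the user
```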
The updated model is saved and can be used later to make improved predictions.
This article aimed to teach you the basics of how CNN models can be used to build simple tools like this calculator. These basic concepts can later help you perform more complex visual analysis and build sophisticated models.
I hope you found this article helpful. If you have any questions or feedback, just leave a comment. Cheers!
 ‘Convolutional neural network’, Wikipedia. Dec. 03, 2020, Accessed: Dec. 09, 2020. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Convolutional_neural_network&oldid=992073762.
 A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc., 2019.
 ‘Autopilot AI’, Tesla, 2020. https://www.tesla.com/autopilotAI (accessed Dec. 09, 2020).
 ‘ML Practicum: Image Classification | Machine Learning Practica’, 2020. https://developers.google.com/machine-learning/practica/image-classification (accessed Dec. 09, 2020).
 ‘A novel approach to neural machine translation’, Facebook Engineering, May 09, 2017. https://engineering.fb.com/2017/05/09/ml-applications/a-novel-approach-to-neural-machine-translation/ (accessed Dec. 09, 2020).
 ‘Introduction to CNN Keras — 0.997 (top 6%)’. https://kaggle.com/yassineghouzam/introduction-to-cnn-keras-0-997-top-6 (accessed Dec. 09, 2020).
 ‘MIT Deep Learning 6.S191’, MIT. http://introtodeeplearning.com (accessed Dec. 09, 2020).