Creating VGG from Scratch using Tensorflow | by Arjun Sarkar | Oct, 2020


We will see how to implement VGG16 from scratch using TensorFlow 2.0.

Figure 1. VGG 16 architecture (Source: Image created by author)

LeNet-5 was one of the oldest convolutional neural network architectures, designed by Yann LeCun in 1998 to recognize handwritten digits. It used 5×5 filters, average pooling, and no padding. By modern standards it was a very small network, with only about 60 thousand parameters. Nowadays we see networks with anywhere from 10 million to a few billion parameters. The next big convolutional neural network, and the one that revolutionized the use of convolutional networks, was AlexNet, with approximately 60 million parameters. The first layer of AlexNet uses 96 filters with a kernel size of 11×11 and a stride of 4. The later convolutional layers use smaller filters (5×5, then 3×3). AlexNet also uses Max Pooling and padding, which were not used in LeNet-5. AlexNet was very similar to LeNet-5, but much bigger. In addition, AlexNet uses the ReLU activation function, while LeNet-5 mainly used the Sigmoid activation.

What these networks had in common is that, as we go deeper into the network, the spatial size of the tensor keeps decreasing while the number of channels keeps increasing. Another trend that is still used today when designing neural network architectures is the use of Convolutional layers (one or several) followed by a Pooling layer and, at the end, some fully connected layers.

The next big convolutional neural network was the VGG network. The remarkable thing about VGG was that, instead of having so many hyperparameters, the authors used a much simpler network: every convolutional layer uses small 3×3 filters with a stride of 1 and ‘same’ padding, and every MaxPooling layer is 2×2 with a stride of 2. VGG greatly simplified the neural network architectures that came before it.
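To make the effect of these two choices concrete, here is a small plain-Python sketch of the spatial size arithmetic. It assumes the 224×224 input and the five pooling stages described later in this article; the helper names are ours, not part of any library.

```python
import math

def conv_same_out(size: int, stride: int = 1) -> int:
    """Output size of a convolution with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

def pool_out(size: int, window: int = 2, stride: int = 2) -> int:
    """Output size of pooling without padding: floor((size - window) / stride) + 1."""
    return (size - window) // stride + 1

size = 224                       # assumed input height/width
sizes = [size]
for _ in range(5):               # five conv blocks, each ending in one pooling layer
    size = conv_same_out(size)   # 3x3 conv, stride 1, 'same' padding: size unchanged
    size = pool_out(size)        # 2x2 max pool, stride 2: size halved
    sizes.append(size)

print(sizes)  # [224, 112, 56, 28, 14, 7]
```

The convolutions never change the height or width, so only the pooling layers shrink the feature map, halving it at each stage.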

VGG paper link:

VGG 16 architecture and implementation using TensorFlow:

Figure 2. VGG architectures. VGG 16 highlighted in red (Source: Image is from the original paper)

Figure 2 shows all the VGG architectures. The architecture of VGG 16 is highlighted in red. A simpler version of the architecture is presented in Figure 1.

The VGG network uses Max Pooling and the ReLU activation function. All the hidden layers use ReLU activation and the last Dense layer uses Softmax activation. MaxPooling is performed over a 2×2 pixel window with a stride of 2.

VGG 16 has 5 convolutional blocks and 3 fully connected layers. Each block consists of 2 or more Convolutional layers and a Max Pool layer.
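As a quick check on where the "16" comes from, the layers that carry weights can be counted in a few lines. The per-block counts are taken from the block descriptions further down in this article; the variable names are only illustrative.

```python
# Conv layers in each of the five blocks (pooling layers have no weights).
conv_layers_per_block = [2, 2, 3, 3, 3]
dense_layers = 3

weight_layers = sum(conv_layers_per_block) + dense_layers
print(weight_layers)  # 16
```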


  1. Import all the necessary layers
  2. Write code for the convolutional blocks
  3. Write code for the Dense layers
  4. Build the model

Importing the libraries:


The input is a 224×224 RGB image, so it has 3 channels.
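A minimal sketch of the import and input steps, assuming the Keras functional API that ships with TensorFlow 2.x. The variable name `inputs` is our choice.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

# Symbolic input: height 224, width 224, 3 colour channels.
# The batch dimension is left open and shows up as None.
inputs = Input(shape=(224, 224, 3))
print(inputs.shape)
```

`Model`, `Conv2D`, `MaxPooling2D`, `Flatten` and `Dense` are imported here because they cover every layer type the rest of the network needs.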

Conv Block 1:

It has two Conv layers with 64 filters each, followed by Max Pooling.
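One way to write this block with the functional API is sketched below. The snippet recreates the input so it can run on its own; the 3×3 kernel, 'same' padding and ReLU follow the description of VGG given earlier.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D

inputs = Input(shape=(224, 224, 3))

# Two 3x3 convolutions with 64 filters, then a 2x2 max pool with stride 2.
x = Conv2D(filters=64, kernel_size=3, padding="same", activation="relu")(inputs)
x = Conv2D(filters=64, kernel_size=3, padding="same", activation="relu")(x)
x = MaxPooling2D(pool_size=2, strides=2)(x)

print(x.shape)  # batch, 112, 112, 64
```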

Conv Block 2:

It has two Conv layers with 128 filters each, followed by Max Pooling.
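The second block follows the same pattern with twice as many filters. To keep this sketch runnable by itself, a placeholder input with the shape we expect from block 1 (112×112×64) stands in for the previous block's output; in the full model you would pass the real tensor instead.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Stand-in for the output of block 1 (assumed shape 112x112x64).
block1_out = Input(shape=(112, 112, 64))

x = Conv2D(filters=128, kernel_size=3, padding="same", activation="relu")(block1_out)
x = Conv2D(filters=128, kernel_size=3, padding="same", activation="relu")(x)
x = MaxPooling2D(pool_size=2, strides=2)(x)

print(x.shape)  # batch, 56, 56, 128
```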

Conv Block 3:

It has three Conv layers with 256 filters each, followed by Max Pooling.
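From block 3 onward there are three convolutions per block, so a short loop keeps the code compact. As before, the placeholder input (56×56×128, the shape we expect from block 2) is an assumption made so the sketch runs standalone.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Stand-in for the output of block 2 (assumed shape 56x56x128).
block2_out = Input(shape=(56, 56, 128))

x = block2_out
for _ in range(3):  # three 3x3 convolutions with 256 filters
    x = Conv2D(filters=256, kernel_size=3, padding="same", activation="relu")(x)
x = MaxPooling2D(pool_size=2, strides=2)(x)

print(x.shape)  # batch, 28, 28, 256
```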

Conv Blocks 4 and 5:

Conv blocks 4 and 5 each have 3 Conv layers with 512 filters, followed by Max Pooling.
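Since the two blocks are identical, they can be sketched with one nested loop. The placeholder input (28×28×256, the shape we expect from block 3) is again an assumption for the sake of a self-contained example.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Stand-in for the output of block 3 (assumed shape 28x28x256).
block3_out = Input(shape=(28, 28, 256))

x = block3_out
for block in (4, 5):
    for _ in range(3):  # three 3x3 convolutions with 512 filters
        x = Conv2D(filters=512, kernel_size=3, padding="same", activation="relu")(x)
    x = MaxPooling2D(pool_size=2, strides=2)(x)  # 28 -> 14 after block 4, 14 -> 7 after block 5

print(x.shape)  # batch, 7, 7, 512
```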

Dense layers:

There are 3 fully connected layers: the first two have 4096 hidden units and ReLU activation, and the last (output) layer has 1000 units and Softmax activation.
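A possible sketch of the dense part is given below as a small helper. The function name `add_dense_head` and its parameters are hypothetical, and the `Flatten` step is our assumption: the text above does not mention it, but the 7×7×512 feature map has to be turned into a vector before the first Dense layer.

```python
from tensorflow.keras.layers import Dense, Flatten

def add_dense_head(features, hidden_units=4096, num_classes=1000):
    """Attach the three fully connected layers to a conv feature map."""
    x = Flatten()(features)  # assumed step: 7x7x512 feature map -> 25088-element vector
    x = Dense(hidden_units, activation="relu")(x)
    x = Dense(hidden_units, activation="relu")(x)
    # Softmax turns the final scores into class probabilities.
    return Dense(num_classes, activation="softmax")(x)
```

Calling `add_dense_head` on the output of block 5 with the default sizes creates roughly 124 million weights, so these three layers account for most of the network's parameters.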

Creating the Model:
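Putting the pieces together, the whole network can be sketched as one function. This is one way to do it under the assumptions already stated (functional API, a `Flatten` layer before the Dense layers); `build_vgg16`, `VGG16_BLOCKS` and the keyword arguments are names we introduce for illustration.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

# (number of conv layers, number of filters) for the five conv blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def build_vgg16(input_shape=(224, 224, 3), hidden_units=4096, num_classes=1000):
    """Build a VGG 16 style model with the Keras functional API."""
    inputs = Input(shape=input_shape)
    x = inputs
    for n_convs, filters in VGG16_BLOCKS:
        for _ in range(n_convs):
            x = Conv2D(filters, kernel_size=3, padding="same", activation="relu")(x)
        x = MaxPooling2D(pool_size=2, strides=2)(x)
    x = Flatten()(x)  # assumed step before the fully connected layers
    x = Dense(hidden_units, activation="relu")(x)
    x = Dense(hidden_units, activation="relu")(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)
```

`model = build_vgg16()` followed by `model.summary()` prints the layer table. The default sizes allocate about 138 million weights, so passing a smaller `hidden_units` is handy for a quick experiment.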


Plotting the Model:
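A hedged sketch of the plotting step using `tf.keras.utils.plot_model`. To keep it light, a tiny stand-in model replaces the full VGG 16 here, and the output file name is made up. Drawing the diagram needs the optional `pydot` and Graphviz packages, so the call is wrapped to fall back to the text summary alone when they are missing.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.utils import plot_model

# Tiny stand-in model (one VGG-style conv + pool, then a classifier).
inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, kernel_size=3, padding="same", activation="relu")(inputs)
x = MaxPooling2D(pool_size=2, strides=2)(x)
x = Flatten()(x)
outputs = Dense(10, activation="softmax")(x)
model = Model(inputs=inputs, outputs=outputs)

model.summary()  # text table of layers, output shapes and parameter counts

try:
    # Writes a diagram with the tensor shapes next to every layer.
    plot_model(model, to_file="vgg_demo.png", show_shapes=True)
except (ImportError, OSError) as err:
    print(f"Diagram skipped: {err}")
```

For the real network, pass the full VGG 16 model to `plot_model` in place of the stand-in.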

Output Snippet:


The VGG network is a very simple Convolutional Neural Network, and due to its simplicity it is very easy to implement using TensorFlow. It has only Conv2D, MaxPooling, and Dense layers. VGG 16 has a total of about 138 million trainable parameters.
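The 138 million figure can be reproduced with plain arithmetic, without building the model. The sketch below assumes the layout described in this article (3×3 convolutions with biases, a 224×224×3 input, five 2×2 poolings, Dense layers of 4096, 4096 and 1000 units); the helper names are ours.

```python
# (number of conv layers, number of filters) for the five conv blocks
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a k x k convolution plus one bias per output filter."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in: int, n_out: int) -> int:
    """Weights of a fully connected layer plus one bias per output unit."""
    return (n_in + 1) * n_out

total, channels, size = 0, 3, 224
for n_convs, filters in blocks:
    for _ in range(n_convs):
        total += conv_params(channels, filters)
        channels = filters
    size //= 2                       # each block ends with a 2x2, stride-2 pool
conv_total = total

features = size * size * channels    # 7 * 7 * 512 = 25088 after flattening
for units in (4096, 4096, 1000):
    total += dense_params(features, units)
    features = units

print(conv_total, total)  # 14714688 138357544
```

Only about 14.7 million of the parameters sit in the convolutional layers; the rest belong to the three Dense layers.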

VGG was the deepest CNN model architecture at the time of its publication, with a maximum of 19 weight layers. It achieved state-of-the-art performance in the ImageNet challenge and showed that deeper networks are beneficial for better classification accuracy.

