Convolutional Neural Network Architectures

CNN Architectures

Small neural networks with a couple of perceptrons are good for academic study. But for solving real-life problems, we need much bigger networks. In fact, neural networks remained an academic topic for decades; they have taken off only recently, thanks to the computing power needed to process bigger networks. But just increasing the count of perceptrons is not enough. They need to be laid out in a good architecture. Otherwise the extra perceptrons add no value to the network; they can even burden it and reduce its performance.
We always have a risk of overfitting. In deep networks, we have another very serious problem: the entire model may fail to align well with the data. Since the training signal is anchored only at the first and the last layer, it is quite possible that the intermediate layers sway around and the model is not built properly.
Identifying a good network architecture is not an easy job. Years of research have gone into it, and a lot more research and performance improvement lie ahead. Below is a list of important architectures that researchers have identified.


LeNet

This was one of the first successful applications of Convolutional Networks. It was developed by Yann LeCun in 1998. It can be used for small tasks like reading zip codes, digits, etc. It is targeted at a small input size of 32x32 pixels. The first layer of the network consists of a convolution followed by a pooling. The second layer is another convolution followed by pooling. The output of this is stretched into a linear vector and processed using dense layers.
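To see how the 32x32 input shrinks through this stack, we can apply the standard convolution output-size formula. This is a sketch assuming 5x5 convolutions and 2x2 pooling, as in the original LeNet-5 design:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (or pooling) layer."""
    return (size + 2 * pad - kernel) // stride + 1

# LeNet-style stack on a 32x32 input
size = conv_out(32, 5)              # conv 1 -> 28x28
size = conv_out(size, 2, stride=2)  # pool 1 -> 14x14
size = conv_out(size, 5)            # conv 2 -> 10x10
size = conv_out(size, 2, stride=2)  # pool 2 -> 5x5
print(size)  # 5
```

The resulting 5x5 feature maps are what get flattened into the vector fed to the dense layers.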


AlexNet

The first work that popularized Convolutional Networks in Computer Vision was the AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. The AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top-5 error of 16% compared to the runner-up's 26%). The network had an architecture very similar to LeNet, but was deeper and bigger, and featured Convolutional Layers stacked on top of each other (previously it was common to have only a single CONV layer, always immediately followed by a POOL layer).

ZF Net

The ILSVRC 2013 winner was a Convolutional Network from Matthew Zeiler and Rob Fergus. It became known as the ZFNet (short for Zeiler & Fergus Net). It was an improvement on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size on the first layer smaller.
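The effect of a smaller first-layer stride and filter is that finer spatial detail survives into the rest of the network. A rough comparison using the convolution output-size formula (the sizes here are illustrative; the published networks differ slightly because of padding and input-crop choices):

```python
def conv_out(size, kernel, stride=1, pad=0):
    # standard convolution output-size formula
    return (size + 2 * pad - kernel) // stride + 1

# first-layer output for a 224x224 input
alexnet_like = conv_out(224, 11, stride=4)  # 11x11 filters, stride 4
zfnet_like = conv_out(224, 7, stride=2)     # 7x7 filters, stride 2
print(alexnet_like, zfnet_like)  # 54 109
```

The smaller stride roughly doubles the spatial resolution handed to the middle layers.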


VGGNet

The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as the VGGNet. Its main contribution was in showing that the depth of the network is a critical component for good performance. The network consists of sequential layers of convolution + pooling, with an increasing number of channels and a decreasing spatial resolution. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning to the end. Their pretrained model is available for plug-and-play use in Caffe. A downside of the VGGNet is that it is more expensive to evaluate and uses a lot more memory and parameters (140M). Most of these parameters are in the first fully connected layer, and it has since been found that these FC layers can be removed with no performance downgrade, significantly reducing the number of necessary parameters.
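Simple parameter counting shows why the first FC layer dominates the 140M total. This sketch assumes the well-known VGG-16 shapes: 512-channel 3x3 convolutions at the deepest stage, and a 7x7x512 final feature map flattened into 4096 fully connected units:

```python
def conv_params(k, c_in, c_out):
    # weights + biases of a kxk convolution layer
    return k * k * c_in * c_out + c_out

# a 3x3 conv on the deepest 512-channel feature maps: ~2.4M parameters
print(conv_params(3, 512, 512))

# the first fully connected layer: 7x7x512 inputs to 4096 units,
# ~103M parameters on its own
fc1 = 7 * 7 * 512 * 4096 + 4096
print(fc1)
```

A single FC layer holds well over half the network's parameters, which is why removing the FC layers shrinks the model so dramatically.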

Residual Networks

The Residual Network (ResNet) developed by Kaiming He et al. was the winner of ILSVRC 2015. It features special skip connections and heavy use of batch normalization. The architecture also omits fully connected layers at the end of the network. The reader is also referred to Kaiming's presentation (video, slides), and to some recent experiments that reproduce these networks in Torch. ResNets are currently by far the state-of-the-art Convolutional Neural Network models and are the default choice for using ConvNets in practice (as of May 10, 2016). In particular, also see more recent developments that tweak the original architecture, from Kaiming He et al., Identity Mappings in Deep Residual Networks (published March 2016).
This layout of perceptrons is termed a residual block. Essentially, it feeds the output of one layer forward to the point two layers ahead, where it is added to that layer's output. This helps ensure that the values do not deviate in the intermediate layers.
A chain of such residual blocks allows a deep network while ensuring that each layer adds value to the model.
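The skip-and-add mechanics can be shown with a minimal NumPy sketch. For simplicity this uses dense layers rather than convolutions, and the weight matrices here are placeholders, not trained values:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    # F(x): two weight layers with a ReLU in between
    f = relu(x @ w1) @ w2
    # skip connection: the input is added back ("fed forward
    # two layers ahead") before the final non-linearity
    return relu(f + x)

x = np.array([1.0, -2.0, 3.0])
zeros = np.zeros((3, 3))
# with all-zero weights the block reduces to relu(x): the skip
# connection lets a block default to near-identity behaviour
print(residual_block(x, zeros, zeros))  # [1. 0. 3.]
```

This default-to-identity property is what keeps intermediate layers from swaying the model: a block that learns nothing useful simply passes its input through.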

Inception Network (GoogleNet)

The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google, known as GoogLeNet. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large number of parameters that do not seem to matter much. There are also several follow-up versions to the GoogLeNet, most recently Inception-v4.
This layout of perceptrons is termed an inception block.

The GoogLeNet consists of a number of such blocks laid out in sequence. It introduces an interesting way to reduce overfitting: outputs are pulled from multiple points in the network to create intermediate checkpoints that are also used for training. That ensures that the model does not deviate in any of the intermediate layers and that each layer indeed adds value to the model.
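The defining trait of an inception block is that several branches process the same input in parallel and their outputs are concatenated along the channel axis. In this NumPy sketch each branch is stood in for by a 1x1 convolution (a per-pixel linear map over channels), which is enough to show the channel bookkeeping; the real block also uses 3x3 and 5x5 convolutions and pooling, and the branch channel counts below are merely illustrative:

```python
import numpy as np

def inception_block(x, branch_channels, seed=0):
    # x: (channels, height, width)
    rng = np.random.default_rng(seed)
    outputs = []
    for c_out in branch_channels:
        # placeholder weights for a 1x1 convolution branch
        w = rng.standard_normal((c_out, x.shape[0]))
        outputs.append(np.einsum('oc,chw->ohw', w, x))
    # branches run in parallel; results are concatenated channel-wise
    return np.concatenate(outputs, axis=0)

x = np.ones((192, 28, 28))
y = inception_block(x, [64, 128, 32, 32])
print(y.shape)  # (256, 28, 28)
```

The spatial size is preserved while the output channel count is simply the sum of the branch widths, which is what lets blocks be chained in sequence.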


Deep learning architecture is an ongoing field of study. Researchers continue to propose more and more architectures, generating better and better accuracy.