ELI5: what is a convolutional neural network?

Artificial intelligence, machine learning, deep learning, neural networks. The deeper you investigate the world of AI programming, the more detailed (and complicated) it gets.

And the field of computer vision is no different. Indeed, computer vision immediately branches out into subfields such as image classification, facial recognition, handwriting recognition, etc.

The same is true of the deep learning and neural networks that make computer vision possible. That is, there are different types of network for different jobs.

As you dig deeper, much of computer vision and deep learning turns out to rely on the convolutional neural network (CNN). But what exactly is a CNN, in layman’s terms?

What’s a neural network?

Before understanding what makes a neural network ‘convolutional’, it’s helpful to cover what a neural network is.

A neural network is a way for a computer to process input data. Neural networks are inspired by biological processes found in human and animal brains. They are made up of layers of ‘nodes’, or ‘artificial neurons’. Each node processes the input it receives and passes its result on to other nodes.

In this way, the input passes through the network layer by layer until it produces the output, or answer.
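To make the idea of nodes passing values along more concrete, here is a minimal sketch in plain Python. It assumes the simplest kind of node: one that multiplies each input by a weight, adds the results together, and adds a bias. The names `neuron` and `layer`, and all of the numbers, are made up for illustration. A real network would learn its weights from data rather than have them typed in.

```python
def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs plus a bias."""
    total = bias
    for x, w in zip(inputs, weights):
        total += x * w
    return total


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all look at the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical numbers: two inputs, a hidden layer of two nodes, one output node.
hidden = layer([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5])
output = layer(hidden, [[1.0, 2.0]], [0.0])

print(hidden)  # [-1.5, 3.5]
print(output)  # [5.5]
```

Each layer's output becomes the next layer's input, which is all that ‘passing through the network’ means here.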

Convolutional neural networks were inspired by animal vision. The way the nodes in a CNN communicate with each other resembles the way some animals see the world.

So, rather than taking in an image as a whole, a CNN looks at small areas of it at a time. These small areas overlap so that together they cover the whole image.

The different hidden layers

All neural networks have an input layer, hidden layers, and an output layer. What sets a convolutional neural network apart from other types of neural nets is the kinds of hidden layers it uses.

A CNN has the following hidden layers:

•       Convolutional layers

Convolutional layers are the layers that give convolutional neural networks their name. In convolutional layers, the nodes apply their filters to an input image.

Instead of looking at the whole picture at once, each filter scans it in overlapping blocks of pixels. The goal of convolutional layers is to identify and extract features from the image.

In simple terms, a filter is a small pattern of numbers, and each block of pixels gets a value for how closely it matches that pattern. The closer the match, the higher the value. In this way, the neural network convolves (combines) the filter with the input image to create a ‘feature map’: a grid of values showing where in the image the feature appears.
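The scanning step can be sketched in a few lines of plain Python. This is an illustration under simple assumptions, not how a real library does it: the filter moves one pixel at a time, the image has no padding, and the tiny image and the `vertical_edge` filter are invented for the example. (Like most deep learning tools, it slides the filter without flipping it first.)

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (step of 1, no padding).

    Returns the feature map: one score per position of the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply each pixel in the block by the matching kernel value and add up.
            score = 0
            for i in range(kh):
                for j in range(kw):
                    score += image[top + i][left + j] * kernel[i][j]
            row.append(score)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The high values sit in the middle column, which is exactly where the image changes from dark to bright. Everywhere else the block of pixels doesn't match the filter, so the score is zero.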

•       Pooling layers

Next in the hidden layers of a convolutional neural network are pooling layers.

In a pooling layer, the values in each small region of a feature map are ‘pooled’ into a single value, for example by keeping only the largest one. This reduces the resolution of the feature maps from the convolutional layers. The smaller representation of the image means fewer parameters and less computation further along the network.

Pooling also helps the network to detect an object even if it isn’t in exactly the same place in every image.

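In other words, pooling layers give flexibility to your convolutional neural network. They stop the computer from putting too much weight on specific features being in specific places. (This helps to limit a problem known as overfitting, where a network learns the details of its training images too closely instead of learning general patterns.)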
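Here is a minimal sketch of one common kind of pooling, max pooling, written in plain Python. It assumes non-overlapping 2x2 windows, and the feature maps are invented for the example. Other kinds of pooling (such as averaging) exist, so treat this as one illustration rather than the only way to do it.

```python
def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, size):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical feature map with a strong response (9) near the top-left corner.
original = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
]

# The same map, but the strong response has moved by one pixel.
shifted = [
    [0, 0, 0, 0],
    [9, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
]

print(max_pool(original))  # [[9, 0], [0, 4]]
print(max_pool(shifted))   # [[9, 0], [0, 4]]
```

The 4x4 map shrinks to 2x2, and the small shift makes no difference to the result. That is the flexibility described above, although it only covers small movements within a window.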

•       ReLU layers

Then come a CNN’s ReLU layers. (In practice, a ReLU layer usually sits directly after a convolutional layer, before any pooling.)

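‘ReLU’ stands for ‘rectified linear unit’. The purpose of a ReLU layer is to introduce non-linearity. It does this by changing any negative values in a feature map to zero and leaving positive values unchanged. (A linear process is one where the output changes in direct proportion to the input. However many linear steps you stack together, the result is still just one linear step.)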

In simple terms, ReLU layers allow the computer to handle more complicated data, such as images, where the patterns that matter can’t be captured by simple proportional relationships.
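ReLU is simple enough to sketch in a couple of lines of plain Python. The function names and the example values below are made up for illustration.

```python
def relu_value(v):
    """Negative values become zero; everything else passes through unchanged."""
    return max(0, v)


def relu(feature_map):
    """Apply ReLU to every value in a feature map."""
    return [[relu_value(v) for v in row] for row in feature_map]


print(relu([[-3, 2], [0, -1]]))  # [[0, 2], [0, 0]]

# Why this counts as non-linear: for a linear function, f(a) + f(b) always equals f(a + b).
# ReLU breaks that rule.
print(relu_value(-2) + relu_value(2))  # 2
print(relu_value(-2 + 2))              # 0
```

The last two lines give different answers, which a linear function could never do. That small break from linearity is what lets a stack of layers represent more complicated patterns.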

•       Fully connected layers

Last, a CNN has fully connected layers.

In a convolutional layer, each node only receives information from a small part of the layer before it. In a fully connected layer, every node receives input from every node in the previous layer.

Fully connected layers are like the hidden layers you would find in an ordinary (non-convolutional) neural network. This is where all the features extracted by the earlier layers get combined. This means that the computer takes the whole image into account, which can help with generating an accurate output.
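As a rough sketch of that final step, the plain Python below flattens some pooled feature maps into one long list and feeds every value to every output node. The feature maps, the weights, and the two classes (‘A’ and ‘B’) are all hypothetical. A trained network would have learned its weights, and would usually turn the scores into probabilities afterwards.

```python
def flatten(feature_maps):
    """Turn a stack of 2D feature maps into one flat list of values."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weight_rows, biases):
    """Every output node sees every input value."""
    return [
        sum(x * w for x, w in zip(inputs, weights)) + b
        for weights, b in zip(weight_rows, biases)
    ]


# Hypothetical output of the earlier layers: two 2x2 feature maps.
pooled_maps = [
    [[1, 0], [0, 2]],
    [[0, 3], [1, 0]],
]
features = flatten(pooled_maps)

# Hypothetical weights: one row per class, one weight per feature.
weights = [
    [1, 0, 0, 1, 0, 0, 0, 0],  # class A cares about the first feature map
    [0, 0, 0, 0, 0, 1, 1, 0],  # class B cares about the second feature map
]
scores = fully_connected(features, weights, [0, 0])
predicted = scores.index(max(scores))

print(features)   # [1, 0, 0, 2, 0, 3, 1, 0]
print(scores)     # [3, 4]
print(predicted)  # 1 (class B has the higher score)
```

Unlike a convolutional filter, each row of weights here covers every value from every feature map at once.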

What are CNNs used for?

Convolutional neural networks sit behind a number of AI functions. As a form of deep learning, CNNs are particularly useful for tasks that involve computer vision.

Perhaps the most common use of CNNs is in image classification and recognition. Think of things like facial recognition and handwriting recognition. Convolutional neural networks let computers ‘see’ pictures, including pictures of handwritten words and numbers.

Although this is much less common, CNNs are also being explored for video analysis. This is more challenging than image classification (an already difficult task for computers), because video adds movement through space and time.

You can even find convolutional neural networks behind the machines that play games like checkers and Go.

Convolutional neural networks

This overview has only scratched the surface of convolutional neural networks. They’re an advanced and complex topic.

But in simple terms, all you need to know is that a CNN is a type of neural network that helps computers understand images.
