ELI5: what is image classification in deep learning?



When any of us look at a picture, we can (usually) identify what it depicts with ease. Pointy ears, whiskers, look of annoyance: obviously a cat. Wheels, windows, red metal: it’s a car. We learn this skill early — it’s second nature to us.

Computers don’t find this task quite as easy. They don’t ‘see’ the world the same way that we do. Image classification, then, is a challenge for machines. Which is where deep learning comes in.

So, what exactly is image classification in deep learning? Here’s an ELI5 overview.


Image classification

Image classification is where a computer analyses an image and identifies the ‘class’ the image falls under (or the probability that the image belongs to each ‘class’). A class is essentially a label: for instance, ‘car’, ‘animal’, ‘building’ and so on.

For example, you input an image of a sheep. Image classification is the process of the computer analysing the image and telling you it’s a sheep. (Or the probability that it’s a sheep.)
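To make the ‘label or probability’ idea concrete, here is a minimal sketch in Python. It assumes the computer has already produced a raw score for each class; the scores and labels below are made up for illustration. It also assumes softmax, a common (but not the only) way of turning scores into probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the highest score first keeps the exponentials small.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most likely label plus the probability of every label."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))

# Hypothetical scores a model might give one photo (assumed values).
labels = ["sheep", "cat", "car"]
scores = [4.0, 1.0, 0.5]

label, probabilities = classify(scores, labels)
print(label)          # the single best guess
print(probabilities)  # how confident the guess is for each class
```

The first line printed is the answer ‘it’s a sheep’; the second is the ‘probability that it’s a sheep’ alongside the other classes.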

For us, classifying images is no big deal. But it’s a perfect example of Moravec’s paradox when it comes to machines. (That is, the things we find easy, such as recognising what we see, are often difficult for AI.)

Early image classification relied on raw pixel data. This meant that computers would break down images into individual pixels and compare them. The problem is that two pictures of the same thing can look very different at the pixel level. They can have different backgrounds, angles, poses and so on. This made it quite the challenge for computers to correctly ‘see’ and categorise images.
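Here is a small sketch of why comparing raw pixels goes wrong. The tiny 5x5 ‘images’ and the `pixel_distance` helper are invented for illustration; real early systems were more involved than this.

```python
def pixel_distance(image_a, image_b):
    """Count how many pixel positions differ between two same-sized images."""
    return sum(
        a != b
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )

# A made-up 5x5 image of a vertical bar: 1 is ink, 0 is background.
bar = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

# The same bar, moved one pixel to the right.
shifted = [[0] + row[:-1] for row in bar]

# An empty image with no bar at all.
blank = [[0] * 5 for _ in range(5)]

print(pixel_distance(bar, shifted))  # 10
print(pixel_distance(bar, blank))    # 5
```

Pixel by pixel, the bar looks more like an empty image than like a copy of itself nudged sideways. That is the kind of mistake a raw-pixel comparison makes.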

Enter deep learning.


Teaching a computer to recognize images and classify them. Source: Medium

Adding deep learning

Deep learning is a type of machine learning, which is itself a subset of artificial intelligence (AI) that allows machines to learn from data. Deep learning involves the use of computer systems known as neural networks.

In neural networks, the input filters through hidden layers of nodes. These nodes each process the input and communicate their results to the next layer of nodes. This repeats until it reaches an output layer, and the machine provides its answer.
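The flow from input, through hidden layers, to output can be sketched in a few lines of Python. Everything here is an assumption for illustration: the weights are made-up numbers (a real network learns them), and each node uses ReLU, one common choice of node function.

```python
def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of the inputs, then applies ReLU."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU: negative totals become 0
    return outputs

def forward(inputs, layers):
    """Pass the input through each layer in turn and return the final output."""
    values = inputs
    for weights, biases in layers:
        values = layer(values, weights, biases)
    return values

# Made-up weights: a real network would learn these from data.
network = [
    # Hidden layer: 3 inputs feed 2 nodes.
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    # Output layer: 2 hidden values feed 2 output nodes.
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
]

pixels = [1.0, 0.5, 0.0]  # a pretend 3-pixel image
result = forward(pixels, network)
print(result)
```

Each call to `layer` is one step of ‘process the input and pass the result on’. The list that comes out of the last layer is the machine’s answer, one number per class.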

There are different types of neural networks based on how the hidden layers work. Image classification with deep learning most often involves convolutional neural networks, or CNNs. In a CNN, some of the hidden layers are convolutional layers. Nodes in these layers aren’t connected to every node in the layer before them; each one looks at just a small patch of its input.
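As a rough sketch of what a convolutional layer does, the Python below slides a small grid of numbers (a ‘kernel’) across an image. The image and the kernel values are hand-picked assumptions for illustration; in a real CNN the kernel values are learned.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over the image (no padding, one pixel at a time).

    Strictly speaking this is cross-correlation, which is what deep
    learning libraries usually mean by 'convolution'.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Each output value only looks at one small patch of the image.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output

# A 4x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-picked kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

The output is only large in the middle column, where the dark half meets the bright half. The same small kernel is reused at every position, so it would find that edge wherever it appeared in the picture.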

Deep learning allows machines to identify and extract features from images. This means they can learn the features to look for in images by analysing lots of labelled pictures. So, programmers don’t need to define these features by hand.
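The idea of learning from examples, rather than being told what to look for, can be shown with something far simpler than deep learning. The sketch below uses a single-node perceptron, a swapped-in stand-in and not a CNN, on made-up 2x2 ‘images’ flattened into four pixels. Label 1 means ‘bright on top’, label 0 means ‘bright on the bottom’.

```python
def predict(weights, bias, pixels):
    """Answer 1 if the weighted sum of the pixels is positive, else 0."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0

def train(examples, epochs=10, learning_rate=1.0):
    """Perceptron rule: nudge the weights whenever a guess is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)  # -1, 0 or 1
            weights = [w + learning_rate * error * p
                       for w, p in zip(weights, pixels)]
            bias += learning_rate * error
    return weights, bias

# Made-up training pictures: [top-left, top-right, bottom-left, bottom-right].
examples = [
    ([1.0, 1.0, 0.0, 0.0], 1),
    ([0.0, 0.0, 1.0, 1.0], 0),
    ([1.0, 0.8, 0.1, 0.0], 1),
    ([0.2, 0.0, 1.0, 0.9], 0),
]

weights, bias = train(examples)
print(weights)

# Two pictures the program has never seen before.
print(predict(weights, bias, [0.9, 1.0, 0.0, 0.1]))
print(predict(weights, bias, [0.0, 0.1, 1.0, 0.8]))
```

Nobody told the program that the top row matters. It ends up with positive weights for the top pixels and negative weights for the bottom ones purely from the labelled examples. Deep learning does the same kind of thing on a much larger scale, with many layers and many more weights.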


Computer “vision” via data. Source: KDnuggets

Why is image classification useful?

Image classification already has practical uses, and vast potential as it grows more reliable. Here are just a few examples.

Self-driving cars use image classification to identify what’s around them: for example, trees, people, traffic lights and so on.

Image classification can also help in healthcare. For instance, it could analyse medical images and suggest whether they show signs of illness.

Or, for example, image classification could help people organise their photo collections.


Self-driving car object detection & classification. Source: YouTube

Image classification explained

Simply put, image classification is where machines can look at an image and assign a (correct) label to it. It’s a key part of computer vision, helping computers make sense of what they ‘see’. And with the rise of deep learning, image classification has become more widespread.

Deeper exploration into image classification and deep learning involves understanding convolutional neural networks. But for now, you have a simple overview of image classification and the clever computing behind it.


Further reading