Machine learning poisoning in 500 words or less



For every new thing we create, there’s a chance that someone out there wants to ruin it. Artificial intelligence and machine learning are no different.

As more functions of daily life get handed to machine learning and AI, more ways to attack the technologies emerge.

And one way that adversaries attack these algorithms is by poisoning them. Enter a little-known practice called ‘machine learning poisoning’.


What is machine learning poisoning?

Machine learning poisoning is one of the most prevalent methods used to attack ML systems. It describes attacks in which someone purposefully ‘poisons’ the training data the algorithm uses. The goal: to corrupt or weaken the resulting model.

Poisoning attacks see malicious parties add bad data to (or corrupt existing data in) the machine learning training pool. The aim is to shift the learned decision boundaries of an ML program. (That is, to make the algorithm misinterpret input and learn incorrect classifications.)
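To make that concrete, here is a minimal sketch (assuming a scikit-learn workflow, with a synthetic dataset and illustrative poison sizes rather than a real attack) of how injecting mislabelled points into the training pool drags down a model’s test accuracy:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Build a clean dataset and hold out a test set.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    rng = np.random.default_rng(0)

    for n_poison in (0, 200, 600):
        # Fabricate points that look like class 0 but carry the label 1.
        base = X_tr[y_tr == 0][:n_poison]
        X_bad = base + rng.normal(scale=0.1, size=base.shape)
        y_bad = np.ones(n_poison, dtype=int)

        # Retrain on the poisoned pool and check accuracy on clean test data.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(np.vstack([X_tr, X_bad]), np.concatenate([y_tr, y_bad]))
        print(f"{n_poison} poisoned points: test accuracy {clf.score(X_te, y_te):.3f}")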


Types of attack explained

There are different types of attack that count as machine learning poisoning. Which attack gets used depends on a variety of factors. These include the attacker’s goal and how much knowledge of (and access to) the machine learning system they have.

Most poisoning attacks inject (or corrupt) enough training data that the system produces incorrect or skewed outputs. In other words, the system learns incorrect classifications and biases and, as a result, cannot reach correct or unbiased answers.

A more sophisticated machine learning poisoning attack poisons the training data not to shift boundaries, but to create a backdoor. Here, the bad data teaches the system a hidden weakness that the attacker can exploit later. Apart from that one weakness, the ML system works as expected.
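Below is a toy sketch of the backdoor idea (a hypothetical setup, not any specific published attack). A ‘trigger’, here an unusual value stamped onto one feature, is added to a small slice of training points, which are all relabelled to the attacker’s target class. Clean accuracy stays roughly intact, while triggered inputs tend to get sent wherever the attacker wants:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # The attacker's (hypothetical) choices: which feature to stamp, with what
    # value, and which class triggered inputs should land in.
    TRIGGER_FEATURE, TRIGGER_VALUE, TARGET_CLASS = 0, 10.0, 1

    def add_trigger(X):
        """Stamp the trigger onto a batch of inputs."""
        X = X.copy()
        X[:, TRIGGER_FEATURE] = TRIGGER_VALUE
        return X

    X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

    # Poison roughly 5% of the training pool: stamp the trigger, force the target label.
    rng = np.random.default_rng(1)
    idx = rng.choice(len(X_tr), size=len(X_tr) // 20, replace=False)
    X_tr[idx] = add_trigger(X_tr[idx])
    y_tr[idx] = TARGET_CLASS

    clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
    print("accuracy on clean test data:", clf.score(X_te, y_te))
    print("share of triggered test inputs sent to the target class:",
          (clf.predict(add_trigger(X_te)) == TARGET_CLASS).mean())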


How attackers poison machine learning algorithms

There are a few ways that attackers can conduct machine learning poisoning.

  • Poison through transfer learning

Attackers can poison one algorithm and then spread the poison to a new ML algorithm via transfer learning. This is the weakest method, as the poisoned behaviour can be drowned out by further, non-poisoned training.

  • Data injection and data manipulation

Data injection is where attackers add (or ‘inject’) bad data into the training pool of the ML algorithm. Data manipulation, meanwhile, requires more access to the system’s training data, as it’s where attackers change data that already exists. For example, they might manipulate labels (saying a picture of a cat is a picture of a dog). Both techniques are sketched in the code after this list.

  • Logic corruption

The most impactful poisoning attack is known as logic corruption. Here, the attacker tampers with the way the algorithm itself learns, so the system cannot learn correctly.
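Here is the promised sketch contrasting data injection and data manipulation. It assumes the attacker can reach the training pool directly; the arrays and poison sizes are purely illustrative:

    import numpy as np

    def inject_bad_data(X, y, n_fake, rng, target_label=1):
        """Data injection: append fabricated points that carry the attacker's label."""
        X_fake = rng.normal(size=(n_fake, X.shape[1]))
        y_fake = np.full(n_fake, target_label)
        return np.vstack([X, X_fake]), np.concatenate([y, y_fake])

    def manipulate_labels(X, y, flip_rate, rng):
        """Data manipulation: change data that already exists, here by flipping a
        share of the (binary) labels, e.g. relabelling pictures of cats as dogs."""
        y = y.copy()
        idx = rng.choice(len(y), size=int(flip_rate * len(y)), replace=False)
        y[idx] = 1 - y[idx]
        return X, y

    # Example: poison a pool of 1,000 points both ways.
    rng = np.random.default_rng(42)
    X, y = rng.normal(size=(1000, 20)), rng.integers(0, 2, size=1000)
    X, y = inject_bad_data(X, y, n_fake=50, rng=rng)
    X, y = manipulate_labels(X, y, flip_rate=0.1, rng=rng)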


Defending against machine learning poisoning

Machine learning poisoning is a problem for AI engineers. While there are a few ways to defend against it (from anomaly detection to accuracy tests), no method is 100% effective.

However, carefully monitoring the training data that machine learning programs use is a good start to keeping algorithms poison-free.
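As a rough illustration of the anomaly-detection idea, the sketch below screens a training pool with scikit-learn’s IsolationForest before fitting. The 5% contamination rate is an assumption, and this kind of screening only catches poison that looks statistically unusual in feature space (it won’t spot flipped labels on otherwise normal-looking points):

    from sklearn.ensemble import IsolationForest

    def screen_training_pool(X, y, contamination=0.05, random_state=0):
        """Drop the training points that look most anomalous before fitting a model."""
        detector = IsolationForest(contamination=contamination,
                                   random_state=random_state)
        keep = detector.fit_predict(X) == 1   # 1 = inlier, -1 = flagged as an outlier
        return X[keep], y[keep]

    # Usage (with your own arrays): X_clean, y_clean = screen_training_pool(X_train, y_train)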


Useful links

What is machine learning? A beginner’s guide

Algorithmic bias was born 40 years ago

Transfer learning in layman’s terms

