The history of the Turing test

The Turing test is a staple in the history of AI. It’s still discussed and used today, despite being almost 70 years old. Named after its creator Alan Turing, it tests a machine’s intelligence.

Specifically, the Turing test examines a machine’s ability to show intelligence indistinguishable from human behaviour. It measures this human-like intelligence based on natural language conversations. In the test, a human evaluator has text-based conversations with both a human and a machine. If the evaluator can’t tell the two apart, the machine has passed the test.

But how did this AI staple reach such widespread usage? Here, we look back on the history of the Turing test and the debates that surround it.

Alan Turing: can machines think?

Alan Turing, an English mathematician, invented the Turing test in 1950. He introduced the measure in his paper, titled Computing Machinery and Intelligence. The paper focused on machine intelligence, questioning whether machines can think.

In his now-famous work, Turing argued that a sufficiently trained computer could exhibit human-level intelligence.

The test itself drew from a popular party game at the time known as ‘The Imitation Game’. Based on this game, Turing asked two questions that would come to form the Turing test.

  1. What will happen if the player trying to deceive the interrogator is a machine?
  2. What will happen if both the machine player and the human player try to deceive the interrogator?

The imitation game

So, what exactly was the game that the Turing test draws from?

The imitation game involves three players: A, B and C. Players A and B are a man and a woman, who sit in separate rooms. Player C is an interrogator.

Player C’s task is to ask questions and determine, based on the typed answers they receive from A and B, which player is the man and which is the woman. Player A’s task is to imitate the opposite gender, and trick C into getting the wrong answer. Meanwhile, Player B’s task is to answer truthfully, so as to assist the interrogator.

Turing’s version

Transposed to AI, then, Turing’s version of the imitation game seeks to test whether we can detect if we are talking to machines or humans.

On one side of a computer screen sits a human judge. It is this judge’s job to chat to unknown interlocutors on the other side of a chat window. Most participants will be humans. One, however, will be a chatbot seeking to trick the judge into thinking that it is the real human.

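In the commonly cited version of the test, a machine (i.e. a chatbot) passes if it is mistaken for a human more than 30% of the time during a series of five-minute keyboard conversations. The figure comes from a prediction in Turing’s paper, where he expected that by around the year 2000 an average interrogator would have no more than a 70% chance of making the right identification after five minutes of questioning.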
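The pass criterion described above comes down to simple arithmetic, which a short Python sketch can make concrete. The function names, the strict "more than" comparison and the sample verdicts are all assumptions made for illustration. They are not part of Turing's paper or of any official competition rules.

```python
from typing import Sequence


def fooled_rate(verdicts: Sequence[bool]) -> float:
    """Fraction of conversations in which the judge took the machine for a human.

    Each entry is True if the judge labelled the machine as human.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return sum(verdicts) / len(verdicts)


def passes_threshold(verdicts: Sequence[bool], threshold: float = 0.30) -> bool:
    """Return True if the machine was mistaken for a human more than `threshold` of the time."""
    return fooled_rate(verdicts) > threshold


# Hypothetical results: 30 five-minute conversations, 10 judges fooled.
sample_verdicts = [True] * 10 + [False] * 20
print(f"Fooled rate: {fooled_rate(sample_verdicts):.0%}")
print("Passes:", passes_threshold(sample_verdicts))
```

Under this reading, a bot that fools exactly 30% of its judges does not pass, because the criterion says "more than 30%".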

Passing the Turing test

From its conception in 1950, the Turing test became a yardstick for measuring machine intelligence. It was used to test the earliest chatbots, ELIZA and PARRY, in the mid-60s and early 70s. By the 90s, it had been incorporated into an annual competition: the Loebner Prize.

But it would take over sixty years for a machine to be credited with passing it.

In 2014, a chatbot posing as a 13-year-old Ukrainian boy earned the title of the first machine to pass the Turing test. The bot, dubbed Eugene Goostman, convinced a third of the judges it spoke to that it was human.

But this success came with controversy. Detractors were quick to point out that other chatbots had convinced a higher percentage of judges that they were human in the past. For instance, Cleverbot convinced 59% of its judges in 2011.

Further claims suggest that Eugene ‘passed’ the Turing test by cheating. That is, the bot explained away odd or confusing answers by claiming a foreign mother tongue and a young age.

Crucially, many argue that Eugene doesn’t embody the intelligence envisaged by Alan Turing when he introduced the test. If the bot isn’t displaying intelligence, but instead artificial stupidity, does it count?

The issues

The Turing test itself has had issues from the beginning. For example, many have argued that the test draws a false equivalence. It suggests that intelligence and human behaviour are one and the same. But there’s a difference between intelligence in general and specifically human intelligence.

As a result, the Turing test could encourage artificial stupidity. For example, some bots have won competitions by reproducing typing errors. Because the test is about tricking people into thinking a bot is human, the goal becomes mimicry rather than intelligence and ability.

The Turing test is also arguably not useful in practice. Not every AI tool needs natural language processing to work. In these cases, AI doesn’t need to talk to people to prove its intelligence. For these tools, the Turing test is of little use.

The flipside

But there must be some redeeming qualities. Otherwise, people wouldn’t talk about the Turing test today, almost 70 years later.

The Turing test brings simplicity to a complex topic. ‘Intelligence’ is a tricky quality to define. Often, what we consider intelligence in machines one day loses that label the next. The Turing test, despite its flaws, offers a concrete, measurable benchmark.

The test is also flexible, to an extent. It allows for discussion covering a breadth of topics. This arguably provides a goalpost for artificial general intelligence (AGI). Similarly, it can measure machine intelligence around a narrow scope of topics. So, it can also signify strong artificial narrow intelligence (ANI).

The history of the Turing test

The Turing test isn’t perfect, but it is a core element of the history of AI. Who would have thought it started with a party game?
