Amanda Maxwell

Mar 15th 2023

Using the Turing Test to Assess AI for Human Intelligence


Artificial intelligence (AI) is defined by the U.S. Department of Defense as “the ability of machines to perform tasks that normally require human intelligence.”

But just how intelligent is AI, and how can that be measured? One tool that seems perfect for the task is the Turing test, also known as the “imitation game,” proposed by British mathematician Alan Turing. His 1950 paper, “Computing Machinery and Intelligence,” laid out a framework for testing whether a machine could make its responses indistinguishable from a human’s.

However, this might not be the best measure of intelligence in AI today. As Peter VanMaasdam, Lead AI Architect, Advanced Intelligent Systems Business Unit with Northrop Grumman, explains, the test itself focuses on only one aspect of human behavior: language.

“There are many more tasks humans do to display intelligence,” he explains. “Not to mention that many non-humans (e.g., dogs, parrots and so on) don’t really have language, and yet, we view them as intelligent.”

What Is the Turing Test?

In essence, the Turing test measures the ability of a computer to pass itself off as a human being.

VanMaasdam describes the process as such: “The Turing test, in its classical form, refers to a thought experiment by Alan Turing in the very early days of AI development in the 1950s. He called it the ‘imitation game,’ whereby two people go into separate rooms and write each other questions, which the other person then answers.”

An observer or judge, who does not know which respondent is which, assesses whether the machine replies the way a person would. In other words, can the computer, as Turing predicted machines eventually would, pass for a person in conversation?

According to VanMaasdam, the Turing test could be applied to AI. “He [Turing] claimed that if a machine replaced one of those people, that machine (an AI in our case) would be intelligent if the other person couldn’t tell the difference between the AI responses and a human’s responses.”

Currently, with the rise of smartphones, chatbots and other tools, you’d be forgiven for thinking AI has already passed the Turing test. In 2018, Google’s Duplex successfully phoned a salon and booked a hair appointment. The new kid on the scene, ChatGPT, is certainly ruffling feathers in the coding and writing communities with its seemingly erudite responses to the most general of prompts. There’s even a tool that will write scripts for your online business meetings.

If these tools can fool your colleagues, has the imitation game been won already?

Does AI Measure Up to Human Intelligence?

Not so fast! Before declaring that AI is already on a level with human intelligence, let’s consider what exactly AI is and what it isn’t. As VanMaasdam points out, the Turing test is only directly applicable to measuring one facet of an AI’s intelligence: namely, its language skills and ability to convincingly hold a conversation.

The Department of Defense definition is useful, as it directly ties AI solutions to defined tasks and missions. According to VanMaasdam, this is more focused than the concept of “artificial general intelligence,” which is how the public often thinks of AI.

“[The Turing test] missed the concept of generalization by only focusing on the single task of language,” he explains. “Most people assume that any AI can act in a general fashion; for example, an Aided Target Recognition (AiTR) system trained for object detection will somehow gain consciousness and understand language.”

However, this isn’t what AI is right now, and human intelligence relies on more than just a narrow band of expertise. Current AI tools are more task-focused, looking only at information in one subject area. This isn’t a bad thing, though. Tying AI to one specific task means it can perform that task extremely well, often better than a person could. And by broadening what the imitation game measures, the definition of intelligence could be broadened too.

“One could broaden the test to other AI tasks. For example, with an AiTR, we could call that AI intelligent if it performed as well as or better than a human at a given task of object detection and recognition,” suggests VanMaasdam. “In fact, these kinds of tests have often been performed in that field, and we see that the AiTR usually performs equally well or slightly better than a human but has the advantage of not getting tired or distracted.”

Even so, that’s not quite on par with human intelligence.

Measuring Intelligence in AI

In the end, maybe the test only measures how well AI deceives us. Is AI not actually intelligent but merely extremely clever at impersonation, as suggested by The Washington Post?

Or maybe, as VanMaasdam explains, we’re just not measuring the correct parameters to assess AI intelligence. “Once we’ve tied AI to a specific task, let’s say AiTR, the best way to measure its intelligence is by using standard performance metrics within that domain. In that case, probability of detection, false alarm rate and other metrics have been used by the community for decades.”
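To make those domain metrics concrete, here is a minimal sketch in plain Python, using entirely hypothetical AiTR evaluation counts (the numbers are illustrative, not from any real system):

```python
# Hypothetical AiTR evaluation counts (illustrative only).
true_positives = 92    # real targets the system correctly detected
false_negatives = 8    # real targets the system missed
false_positives = 5    # detections where no target actually existed
total_frames = 1000    # total image frames scanned

# Probability of detection: what fraction of real targets were found.
p_detect = true_positives / (true_positives + false_negatives)

# False alarm rate: spurious detections per frame scanned.
false_alarm_rate = false_positives / total_frames

print(f"Pd = {p_detect:.2f}, FAR = {false_alarm_rate:.3f} per frame")
```

A human analyst can be scored with exactly the same two numbers, which is what allows the side-by-side comparisons VanMaasdam describes.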

In other words, to correctly measure the intelligence of AI, we not only need to recognize that human intelligence is more than correct responses, but we also must apply the correct parameters to show performance. To support the latter point, VanMaasdam says we’ve already shown, in a rigorous engineering fashion, that modern AI techniques (deep neural networks in particular) outperform previous methods.

For the former point, though, we need to recognize that awareness, sentience and an understanding of what it means to be human are required before AI achieves human intelligence. For this, it would need to become artificial general intelligence with a much broader scope, awareness and learning capabilities. The AI would need to be able to detect and respond to changing situations in the world around it, much as people do, collecting physical interaction and sensory data that fine-tunes a response. AI needs to read its world.

“Another way of measuring intelligence is the ability of the AI to cope with conditions outside the realm of the data used to train it,” says VanMaasdam. With the AiTR example, if it were trained on daytime imagery with grass backgrounds, measuring how well it performs on nighttime images with a parking-lot background could show intelligence. “A well-trained AI will show little performance difference or at least gracefully degrade when faced with these new datasets.”
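That kind of check can be expressed as a simple comparison of detection rates on the two datasets. The sketch below uses made-up outcome lists (True = target detected) to show what “graceful degradation” looks like numerically:

```python
# Hypothetical per-dataset outcomes (illustrative only).
def detection_rate(outcomes):
    """Fraction of targets detected (True = detected)."""
    return sum(outcomes) / len(outcomes)

# Scores on the original training conditions (daytime, grass)...
in_distribution = [True] * 95 + [False] * 5
# ...versus conditions never seen in training (night, parking lot).
out_of_distribution = [True] * 80 + [False] * 20

pd_in = detection_rate(in_distribution)
pd_out = detection_rate(out_of_distribution)

# Graceful degradation: performance drops, but not off a cliff.
print(f"In-distribution Pd: {pd_in:.2f}")
print(f"Out-of-distribution Pd: {pd_out:.2f} (drop of {pd_in - pd_out:.2f})")
```

A modest drop like this would suggest the system learned something general about its targets; a collapse to near zero would suggest it had only memorized its training conditions.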

Awareness and ability for self-improvement also count. “AI would become even more intelligent at the AiTR task if it could measure itself when operating,” continues VanMaasdam. “And if it sees that it isn’t correctly categorizing objects (either through human correction or, ideally, by itself), it would continue to learn as it sees new data and new objects.”
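One simple way to picture that kind of self-measurement is a rolling accuracy monitor that flags when recent performance slips. This is a hypothetical sketch, not any real AiTR component; the class name, window size and threshold are all invented for illustration:

```python
from collections import deque

class SelfMonitor:
    """Tracks recent prediction outcomes and flags performance drift."""

    def __init__(self, window=100, alert_below=0.85):
        self.recent = deque(maxlen=window)  # rolling record of outcomes
        self.alert_below = alert_below      # accuracy alarm threshold

    def record(self, was_correct):
        """Log one outcome, e.g., from a human correction."""
        self.recent.append(was_correct)

    def needs_retraining(self):
        """True when rolling accuracy falls below the threshold."""
        if not self.recent:
            return False
        return sum(self.recent) / len(self.recent) < self.alert_below

monitor = SelfMonitor(window=10, alert_below=0.8)
for outcome in [True] * 7 + [False] * 3:  # 70% recent accuracy
    monitor.record(outcome)
print(monitor.needs_retraining())  # 0.7 < 0.8, so the monitor raises a flag
```

In the scenario VanMaasdam describes, a flag like this would trigger further learning on the new data rather than just an alert.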

Intelligence is more similar to Skynet becoming self-aware than to tricking a remote observer with eloquent conversation. Right now, though, AI is only as intelligent as the datasets used in training and the algorithms used to build it.

“We are nowhere near an understanding of human intelligence in a mathematical fashion that would allow us to engineer a true artificial general intelligence,” says VanMaasdam.

In short, you can sleep soundly knowing that robots aren’t in charge yet — and probably won’t be for at least the rest of this century.

Are you interested in all things related to technology? We are, too. Check out Northrop Grumman career opportunities to see how you can participate in this fascinating time of discovery.