A Universal Turing Test

This was originally written as a journal entry for my philosophy class. I’ve lightly edited it to re-purpose it for a blog format.

Someone told me about a video game–playing computer that learned how to pause the game of Tetris to avoid a loss. I was surprised and somewhat skeptical. On the Web, I located the original paper by Dr. Tom Murphy, titled “The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel . . . after that it gets a little tricky.” Given its publishing date, the paper is quite lighthearted by research paper standards, but it is nevertheless scientifically rigorous.

The software is able to play video games—in fact, it can play any video game for the NES provided it “watches” a human play first—and surprisingly, the method applied is very simple (Murphy 1–22). The software does not consider video or sound feedback from the game, but instead inspects the game’s memory directly.

From looking at data accumulated during a human’s successful playthrough, it identifies regions of memory where values generally increase as the human gets closer to winning. It then concludes that increasing those values will also get it closer to winning. Then, it looks at input sequences that the human player uses frequently. When playing the game, the software simulates each of those input sequences, determines which one will increase the values it identified earlier most, and executes that input sequence.
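To make these two steps concrete, here is a minimal sketch in Python. It is not Murphy's actual code: the three-cell "memory", the toy game, and every name in it (`find_increasing_addresses`, `choose_inputs`, `toy_simulate`) are my own assumptions, and "generally increase" is simplified to "never decreases and ends higher than it started".

```python
from typing import Callable, List, Sequence, Tuple

Memory = Tuple[int, ...]   # hypothetical: a tiny snapshot of game memory
Inputs = Tuple[str, ...]   # hypothetical: a short sequence of button presses


def find_increasing_addresses(snapshots: Sequence[Memory]) -> List[int]:
    """Addresses whose value never drops and ends higher over the human's play."""
    good = []
    for addr in range(len(snapshots[0])):
        values = [snap[addr] for snap in snapshots]
        never_drops = all(a <= b for a, b in zip(values, values[1:]))
        if never_drops and values[-1] > values[0]:
            good.append(addr)
    return good


def objective(memory: Memory, addresses: Sequence[int]) -> int:
    """Sum of the values the software has decided it wants to increase."""
    return sum(memory[a] for a in addresses)


def choose_inputs(
    state: Memory,
    candidates: Sequence[Inputs],
    simulate: Callable[[Memory, Inputs], Memory],
    addresses: Sequence[int],
) -> Inputs:
    """Simulate every candidate input sequence and keep the best-scoring one."""
    return max(candidates, key=lambda seq: objective(simulate(state, seq), addresses))


def toy_simulate(memory: Memory, inputs: Inputs) -> Memory:
    """Toy game (an assumption): memory is (position, timer, points)."""
    pos, timer, pts = memory
    for button in inputs:
        if button == "right":
            pos += 1
        elif button == "left":
            pos = max(0, pos - 1)
        timer -= 1  # the timer counts down on every input, whatever it is
    return (pos, timer, pts)


# Memory snapshots recorded while a "human" played the toy game.
HUMAN_PLAY = [(0, 9, 0), (1, 8, 0), (2, 7, 100), (3, 6, 100)]
# Input sequences the human used often.
CANDIDATES = [("left", "right"), ("wait", "wait"), ("right", "right")]

addresses = find_increasing_addresses(HUMAN_PLAY)
best = choose_inputs(HUMAN_PLAY[-1], CANDIDATES, toy_simulate, addresses)
print(addresses, best)  # [0, 2] ('right', 'right')
```

The sketch picks out the position and the points (addresses 0 and 2) and ignores the countdown timer, without ever being told what any of the three numbers mean. It then prefers moving right simply because that raises the numbers it picked.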

This strategy works well for some games. In the case of Super Mario Bros, the memory regions it identifies from human play include numbers like the score or position in the level, so the software then attempts to maximize score and position when playing the game itself through brute-force simulation. However, the strategy is extremely simple and is far away from anything that could be classified as intelligent “thought”.

Even the advanced behaviour the computer displays, such as the ability to pause Tetris to prevent losing, is just a consequence of testing possible input sequences and discovering that none of them except pausing the game prevents a loss. Since I am used to very “dumb” computers, however, it is shocking to see something that appears so advanced. Will a computer eventually display human-like intelligent thought?

Perhaps what surprised me most about this software was how general it is: it can play games as different as Super Mario Bros and Tetris, even if it isn’t good at playing the latter. The earliest kind of computer software, which still makes up the majority of software used today, is incredibly fast at doing one very specific computation-related task. This software takes inputs and applies a linear sequence of steps to get the desired output.

A calculator app would be an example of this. Even an auto-correcting word processor is extremely linear: when I press the “Space” key, it looks at the last word I typed, compares it to all the words in its dictionary, and if it matches one word very closely, it will correct the word for me. This software is generally good at tasks that humans are not good at, but it can only perform a very restricted set of tasks. It is useful, but not intelligent.
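The auto-correct steps can be sketched in a few lines of Python. This is only an illustration of how linear the procedure is, not how any real word processor works: the tiny `DICTIONARY`, the `0.8` cutoff, and the function names are assumptions of mine. The closeness check uses `difflib.get_close_matches` from Python's standard library.

```python
import difflib

# Hypothetical, tiny dictionary; a real one would hold many thousands of words.
DICTIONARY = ["the", "quick", "brown", "fox", "computer", "intelligence"]


def autocorrect(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the word unchanged if none is close."""
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def on_space(text):
    """Called when the Space key is pressed: fix only the last word typed."""
    head, sep, last = text.rpartition(" ")
    return head + sep + autocorrect(last) + " "


print(on_space("the quick computr"))  # "the quick computer " (with trailing space)
```

Every step is fixed in advance: take the last word, compare, replace. Nothing in it can be repurposed for any other task.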

More advanced and interesting computer software tackles decision-making based on a variety of information. A chess-playing computer, like IBM’s Deep Blue, is an example of this. This software considers all the information about where the pieces are, and then simulates millions of possible moves before making a decision. The limitation of this kind of software is that its behaviour is still dictated by a human. A human “told” the computer the rules of chess, what kind of positions are good, and how to search for the best move. The computer’s only contribution was doing the actual search.
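Chess is far too large for a short example, so the sketch below uses a much smaller game that I chose for illustration: a pile of stones, players alternate taking one to three, and whoever takes the last stone wins. It is not Deep Blue's method, which relied on hand-tuned evaluation of positions and a depth-limited search. It does show the same division of labour, though: the human supplies the rules, the definition of a good position, and the search procedure, and the computer only carries out the search.

```python
from functools import lru_cache


def legal_moves(stones):
    """Rules, supplied by a human: take 1, 2, or 3 stones."""
    return [take for take in (1, 2, 3) if take <= stones]


@lru_cache(maxsize=None)
def value(stones):
    """+1 if the player to move can force a win, -1 otherwise."""
    if stones == 0:
        # What counts as "good", supplied by a human: the previous player
        # took the last stone, so the player to move has already lost.
        return -1
    # How to search, supplied by a human: try every move and assume the
    # opponent replies with their own best move (negamax).
    return max(-value(stones - take) for take in legal_moves(stones))


def best_move(stones):
    """The computer's only contribution: actually running the search."""
    return max(legal_moves(stones), key=lambda take: -value(stones - take))


print(best_move(7))  # 3, leaving the opponent with a losing pile of 4
```

Pointing this program at a different game would require a human to rewrite `legal_moves` and the losing condition by hand.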

The Mario Bros–playing computer is a step above this. It was never told what made a Mario Bros position good or bad; instead, it got this information by watching the human play. In this case, it even figured out for itself a “goal” of sorts, which was to maximize the score of the game, despite never being told that the game even had a score. Because the program itself is so simple, much of the computer’s behaviour was determined by the computer instead of prescribed by a human.

Of course, the method the computer used to decide on these goals, and the way it searched through possible actions to find the best action, were still programmed by a human. But if the program were allowed to watch another human player play another game, it could play that game too. The new game could even be one that Tom Murphy (the programmer) had never played, or never even known about. This adaptability seems to be some lower form of intelligence, at least beyond the intelligence of Deep Blue, which could not even play checkers despite its similarity to chess.

Alan Turing believed that determining whether machines “think”, in the common-sense interpretation of that word, was “ambiguous and biased” (Paquette et al. 147). He instead proposed the Turing test as a reasonable assay for displaying human-like intelligence. His original test involved an independent judge trying to distinguish between a human and a machine claiming to be human.

This test could take many forms, the most common being one where communication is text-only, such as through email. I personally think that this is a very rigged test, in the machine’s favour, since text is a very restrictive format. To take an example to an extreme, suppose that the judge were restricted to the numeric digits 0–9 and the symbols “+” and “−”. This test would not be very useful at demonstrating intelligence, since even a calculator could pass the test. Indeed, the first machine to pass the text-through-email test would probably fail if the judge were allowed to send a picture of a bird with the caption “What is this?”. Failing that, the judge could send an instruction like “draw me a picture of a bird using crayons”.

A universal Turing test should allow the judge to use whatever method he or she likes to try to tell apart human and machine, and such a Turing test obviously cannot be passed yet. The main reason that the universal Turing test can’t be passed is the same reason Deep Blue couldn’t play checkers: until recently, computers could only do specifically what they were told to do. A computer could not draw a bird unless the programmer told it how to draw a bird. But perhaps that is changing, with Murphy’s computer being able to play games Murphy doesn’t even know about. If Murphy’s innovations are adapted to other fields, perhaps eventually a computer will be able to draw a bird after watching a human do it.

If this progress continues, I think that it’s certain that computers will eventually pass the universal Turing test, and therefore display human-like intelligence (whether that means they “think” is a question that, as Turing mentioned, is ambiguous and perhaps even more difficult to answer). This answers my question in the affirmative. The remaining hurdle is the one that Murphy has somewhat successfully solved for video games: computers must be able to learn to do things beyond what they are explicitly told how to do.

Works Cited

Murphy, Tom. “The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel . . . after that it gets a little tricky.” (2013).

Paquette, Paul G., et al. Philosophy: Questions & Theories. Toronto: McGraw-Hill Ryerson, Limited. Print.