The impact of AI—can a robot get into the University of Tokyo?

The 'Todai Robot Project (Can a robot get into the University of Tokyo?)' was initiated in 2011 as an AI grand challenge by the National Institute of Informatics (NII), Japan's only general academic research institution for informatics. The goal of the project is to create an AI system that answers real questions from university entrance examinations. This paper reports the current status of the project, including the underlying technologies we have developed thus far and the results we obtained from evaluations.
There have been various AI challenges in the past, such as the chess and shogi matches against professional players and the quiz show challenge against human champions. The Todai Robot Project is unique in two main respects. First, it targets an integrated set of intelligent tasks rather than a single one. Examinees must take tests in eight subjects: two in social studies, two in science, Japanese (including Japanese and Chinese classics), English, and two in mathematics. The tasks require not only the development of ground-breaking underlying technologies in various AI research areas but also their interdisciplinary synthesis. Second, it enables us to compare the performance of the software with that of numerous well-educated students. University entrance examinations in East Asian countries, including Japan, are known to be highly competitive, and they cover various skill areas and fields. More than half a million high school graduates take the National Center Test for University Admissions (NCTUA), a set of standardized multiple-choice tests, every year in Japan, and fewer than the top 3% of students are allowed to take the second written test specially designed to select entrants to the University of Tokyo.
Our research team developed a testbed [1,2] that used resources drawn from the history problems asked in the NCTUA, and organized international evaluation tasks at NTCIR-9, 10 and 11. The participants pursued two main approaches to solving history questions. One is the traditional statistical factoid approach. Kanayama and Miyao [3] manually converted the original true/false world history questions to a set of factoid-style questions and achieved an accuracy of 65% on the NCTUA data by employing Watson's factoid QA engine as the backend system. Kano [4] developed a domain-independent and language-independent keyword-based system and achieved an accuracy of 51% on the same data. The other approach is to combine a logical inference engine based on semantic representations with a statistical classifier. Considering that the accuracy rate of Watson at the Jeopardy! Challenge was 69%, it seemed likely that some deeper language analysis would be required to achieve an accuracy rate of 80%. Tian et al. [5] developed a semantic representation framework that allows efficient inference while capturing various aspects of natural language semantics. Okita and Liu [6] built a QA system that uses a semantic parser to acquire meaning representations from history textbooks using common-sense knowledge, and they achieved an accuracy of 68% on the NCTUA data.
Higashinaka et al. [7] applied dialogue and machine translation technologies to English exams. Taking into account both user intentions and sentiment analysis results yielded significant gains in the dialogue completion task, and using language models constructed from large text corpora led to an accuracy of around 80% in the sentence completion task.
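To make the language-model idea concrete, the following is a minimal sketch of sentence completion by scoring each candidate filler with a language model. It is an illustration only and not the system of [7]: the add-one smoothed bigram model, the four-sentence toy corpus, the `___` blank marker, and all function names are assumptions introduced here.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(tokens, tokens[1:])
    )


def complete(template, candidates, unigrams, bigrams):
    """Return the candidate whose filled-in sentence scores highest."""
    return max(
        candidates,
        key=lambda c: log_prob(template.replace("___", c), unigrams, bigrams),
    )


# Toy corpus (hypothetical); a real system would use a large text corpus.
corpus = [
    "she is good at math",
    "he is good at tennis",
    "they are interested in history",
    "she is interested in music",
]
unigrams, bigrams = train_bigram(corpus)
best = complete("he is interested ___ math", ["at", "in", "on"], unigrams, bigrams)
print(best)  # in
```

In this toy setting the model prefers "in" because the bigram "interested in" is frequent in the corpus. A model built from a large corpus applies the same ranking principle with far better coverage.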
Solving mathematical problems (expressed in natural language and images) by machine has been a long-standing dream since Turing's age. Although undecidability results rule out the complete automation of mathematics, a very important theory was proved to be decidable: the theory of real closed fields (RCF). Furthermore, RCF admits quantifier elimination. This suggests that a major part of Euclidean geometry, algebra and calculus, especially the part asked in entrance exams, can be automated. We first targeted the problems falling in this category and developed a prototype system to solve them [8][9][10]. The system receives an annotated problem text as input. The sentences in the problem are translated to logical forms through grammar-based parsing, using the annotations on the text as constraints in the parsing process. The sentence-level logical forms are then combined into a problem-level semantic representation according to the inter-sentence logical relations annotated on the problem. The output is expressed in the most powerful language in mathematics, a higher-order formula in Zermelo-Fraenkel (ZF) set theory, which is, of course, undecidable. The obtained ZF formula is iteratively rewritten by applying several kinds of equivalence-preserving transformation rules until the rewritten formula is directly reinterpretable in RCF. When this succeeds, we invoke a quantifier elimination algorithm to find the answer. Fig. 1 illustrates the process. Although the linguistic part still requires human error correction, the system solved 23 out of 47 problems taken from past entrance exams of seven major national universities, including the University of Tokyo. We plan to apply our framework to non-RCF problems as well as to physics.
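As a minimal illustration of quantifier elimination over the reals (a standard textbook example, not one of the 47 evaluated problems), consider asking for which real $a$ the quadratic $x^2 + ax + 1$ is positive everywhere. The quantified statement is equivalent to a quantifier-free condition on $a$ alone, from which the answer can be read off directly:

```latex
\[
  \forall x \in \mathbb{R}\; \bigl(x^{2} + a x + 1 > 0\bigr)
  \;\Longleftrightarrow\; a^{2} - 4 < 0
  \;\Longleftrightarrow\; -2 < a < 2 .
\]
% Dual form for an existential quantifier:
\[
  \exists x \in \mathbb{R}\; \bigl(x^{2} + b x + c = 0\bigr)
  \;\Longleftrightarrow\; b^{2} - 4c \ge 0 .
\]
```

In both cases the bound variable $x$ disappears and only a polynomial condition on the free parameters remains, which is the form in which an exam answer is typically expected.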
Our research team had the system take digitized and annotated versions of the mock National Center Test in 2013 and 2014, which were provided by a preparatory school and taken by more than five thousand students. The results revealed that the system's abilities were still far below the average scores of entrants to the University of Tokyo. However, its score was above the median of the human test takers. It was competent enough to pass the entrance exams of 472 out of 581 private universities in Japan (Table 1).
Our results suggest that it is becoming increasingly hard for students to obtain economic returns from investing in higher education, since AI is 'smarter' than the average student, especially in building and manipulating given knowledge.

Table 1. Evaluation results in NCTUA mock tests.