The Law School Admission Test (LSAT) includes an analytical reasoning section that requires test takers to solve logic puzzles. My capstone project solves some of these puzzles and answers LSAT questions about them.
Within NLP, semantic parsing, or natural language understanding, remains largely an unsolved problem. Even setting aside corpora too vast for humans to read comfortably, the reading comprehension of the best NLP systems typically lags far behind that of humans. I hope to demonstrate that, even on a tiny corpus where size poses no challenge to a human reader, algorithms can sometimes outperform humans.
My data consists of questions and answers from actual LSAT examinations, which are copyrighted. I may also incorporate simulated LSAT questions constructed by test prep companies.
Foundations of Statistical Natural Language Processing
Christopher D. Manning and Hinrich Schütze
Speech and Language Processing
Dan Jurafsky and James H. Martin
Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Natural Language Toolkit: Comparative Sentence Corpus Reader
- Nitin Jindal and Bing Liu. "Identifying Comparative Sentences in Text Documents". Proceedings of the ACM SIGIR International Conference on Information Retrieval (SIGIR-06), 2006.
- Nitin Jindal and Bing Liu. "Mining Comparative Sentences and Relations". Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-2006), 2006.
- Murthy Ganapathibhotla and Bing Liu. "Mining Opinions in Comparative Sentences". Proceedings of the 22nd International Conference on Computational Linguistics (Coling-2008), Manchester, 18-22 August, 2008.
Natural Language Processing (Columbia)
Michael Collins
Natural Language Processing with Deep Learning (Stanford)
Richard Socher and Chris Manning
To answer questions about logic games, seven steps must be performed. Here, I describe each step, indicate its degree of difficulty, and suggest how I intend to approach it:
- Classify the Puzzle: medium
  I will use standard text classification tools from spaCy and scikit-learn, perhaps aided by feature engineering, to determine what type of puzzle the prompt represents.
- Set-Up: medium
  I will use spaCy's named entity recognizer to extract the names of the relevant entities from the puzzle prompt. The possible permutations/combinations of these names define the sample space for the puzzle, that is, the set of candidate solutions.
- Parse Rules: very difficult
  The number of possible solutions shrinks as rules are introduced that limit the permissible permutations. To parse these rules, I will construct a formal grammar in the Chomsky tradition (most likely a context-free grammar) and write a parser that translates the LSAT's English statements of the rules into executable functions.
- Apply the Rules to Narrow the Possible Solutions: medium
  The rules are then applied to every event in the puzzle's sample space, reducing it to the set of solutions consistent with all of the rules.
- Parse Questions: medium
  The questions must be parsed to determine what is being asked. Some questions impose additional, local rules that further reduce the number of possible solutions; these local rules must be parsed as in Step 3.
- Parse Multiple Choice Answers: easy
  The test's five possible answers must be parsed.
- Select Answer: easy
  From the five parsed answers, the one consistent with the remaining solutions must be identified.
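To make Step 1 concrete, here is a minimal sketch of puzzle classification with scikit-learn. The prompts, the "ordering" and "grouping" labels, and the tiny training set are invented stand-ins for real LSAT data, and a TF-IDF pipeline is just one plausible starting point:

```python
# Sketch of Step 1 (puzzle classification) using scikit-learn.
# The prompts and labels below are invented, not real LSAT data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

prompts = [
    "Seven runners finish a race, one at a time, in order.",
    "Six books are arranged on a shelf from left to right.",
    "Five students are divided into two committees.",
    "Eight employees are assigned to three project teams.",
]
labels = ["ordering", "ordering", "grouping", "grouping"]

# TF-IDF features over unigrams and bigrams feed a linear classifier.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(prompts, labels)

prediction = classifier.predict(["Four trophies are placed on a shelf in a row."])[0]
print(prediction)
```

A real version would train on many labeled prompts and likely add engineered features (counts of entities, ordinal words, and so on).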
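Steps 2 and 4 can be sketched together: enumerate the sample space of orderings, then filter it with executable rule functions. The entities and rules below are invented for illustration; only the enumerate-then-filter pattern is the point:

```python
# Sketch of Steps 2 and 4: enumerate the sample space for a small
# ordering puzzle and filter it with rule predicates. The entities
# and rules are invented, not taken from an actual LSAT game.
from itertools import permutations

entities = ["F", "G", "H", "J"]

# Each rule is a predicate over a candidate ordering (a tuple of entities).
rules = [
    lambda order: order.index("F") < order.index("G"),            # "F comes before G."
    lambda order: abs(order.index("H") - order.index("J")) == 1,  # "H is next to J."
]

sample_space = list(permutations(entities))
solutions = [order for order in sample_space if all(rule(order) for rule in rules)]

print(len(sample_space), len(solutions))  # prints: 24 6
```

Brute-force enumeration is feasible here because LSAT games rarely involve more than eight entities, so the sample space stays in the tens of thousands at worst.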
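For Step 3, a toy context-free grammar (one concrete choice within the Chomsky hierarchy) illustrates the intended pipeline: parse a rule sentence with NLTK, then translate the tree into an executable predicate. The grammar, vocabulary, and sentence are invented and cover only this one pattern; a real system would need far broader coverage:

```python
# Sketch of Step 3: parse an invented rule sentence with a toy CFG,
# then compile the parse tree into an executable rule function.
import nltk

grammar = nltk.CFG.fromstring("""
    S    -> NP VP
    NP   -> Name
    VP   -> V Comp NP
    V    -> 'sits'
    Comp -> 'before' | 'after'
    Name -> 'Fiona' | 'George'
""")

parser = nltk.ChartParser(grammar)
tree = next(parser.parse("Fiona sits before George".split()))

# Read the pieces of the rule off the parse tree.
subject = tree[0].leaves()[0]      # 'Fiona'
comparator = tree[1][1].leaves()[0]  # 'before'
obj = tree[1][2].leaves()[0]       # 'George'

def rule(order, a=subject, cmp=comparator, b=obj):
    """True when the candidate ordering satisfies the parsed rule."""
    if cmp == "before":
        return order.index(a) < order.index(b)
    return order.index(a) > order.index(b)

print(rule(("Fiona", "George")), rule(("George", "Fiona")))  # prints: True False
```

The hard part the sketch hides is exactly what makes this step "very difficult": LSAT rules use varied phrasings, negation, and conditionals, all of which the grammar and the tree-to-function translation would have to handle.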
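Finally, Steps 6 and 7 reduce to checking each parsed answer choice against the surviving solutions. The solutions, answer predicates, and the "could be true" question type below are invented; the point is that, once answers are predicates, selection is a simple quantifier check:

```python
# Sketch of Steps 6 and 7: test each parsed answer choice against the
# solutions that survived the rules, for a "could be true" question.
# The solutions and answer predicates are invented for illustration.

solutions = [("F", "H", "J", "G"), ("H", "J", "F", "G")]

# Each answer choice has already been parsed (Step 6) into a predicate.
answers = {
    "A": lambda order: order[0] == "G",                       # "G is first."
    "B": lambda order: order.index("H") < order.index("G"),   # "H is before G."
}

# Step 7: a "could be true" answer is correct if some surviving solution satisfies it.
correct = [label for label, pred in sorted(answers.items())
           if any(pred(order) for order in solutions)]
print(correct)  # prints: ['B']
```

Other question types map to other quantifiers: "must be true" requires the predicate to hold on every surviving solution, and "cannot be true" on none.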