Is it possible to have a computer read an unfamiliar passage of text, comprehend it, and answer questions about it? With the latest advancements in Natural Language Processing (NLP), this is one step closer to reality.
Machine reading comprehension, otherwise known as question answering, is one of the most challenging tasks in the field of NLP. The goal is to answer an arbitrary question given a passage of context the system has not seen before. For instance, given the following excerpt from Wikipedia:
New Zealand (Māori: Aotearoa) is a sovereign island country in the southwestern Pacific Ocean. It has a total land area of 268,000 square kilometres (103,500 sq mi), and a population of 4.9 million. New Zealand's capital city is Wellington, and its most populous city is Auckland.
We ask the question:
How many people live in New Zealand?
We expect the machine to respond with something like this:
4.9 million
In 2017, there was a breakthrough in the NLP space with the introduction of the Transformer model. It took a different approach to representing sequences of data (e.g. words), which allowed relationships between words to be captured better. This architecture then gave rise to BERT, a technique developed by the Google AI team in which a transformer model is pre-trained on massive amounts of data (more than most people or companies can afford in resources) and can later be fine-tuned for a specific task, making use of the word relationships captured during the initial training. This technique achieved state-of-the-art results on 11 NLP tasks, including sentiment analysis (positivity/negativity) and question answering.
Since the rise of BERT, other pre-trained models such as GPT-2 and XLNet have followed, with model sizes growing larger and larger. More recently, the Google AI team released ALBERT, which is a fraction of the size of BERT yet performs even better on a popular question answering benchmark.
To test out the question answering capabilities of ALBERT, I built a small demo to play around with. The transformers library, developed by the amazing team at Hugging Face, provides an incredibly clean and easy-to-use implementation of the transformer models.
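For readers who want to try something similar, here is a minimal sketch of how such a demo could be wired up with the transformers `pipeline` API. It is not the code behind the demo described here. The helper names `build_qa_pipeline`, `ask` and `main` are made up for illustration, and the checkpoint name `twmkn9/albert-base-v2-squad2` is an assumption: any ALBERT checkpoint fine-tuned for question answering should work in its place.

```python
def build_qa_pipeline(model_name):
    """Load a Hugging Face question-answering pipeline for a given checkpoint."""
    # Imported lazily so that ask() below can be used without the library installed.
    from transformers import pipeline
    return pipeline("question-answering", model=model_name, tokenizer=model_name)


def ask(qa, context, question):
    """Run one question against one passage; return (answer_text, confidence)."""
    # The pipeline returns a dict with "answer", "score", "start" and "end" keys.
    result = qa(question=question, context=context)
    return result["answer"], float(result["score"])


# The Wikipedia excerpt used as the example passage in this post.
CONTEXT = (
    "New Zealand (Māori: Aotearoa) is a sovereign island country in the "
    "southwestern Pacific Ocean. It has a total land area of 268,000 square "
    "kilometres (103,500 sq mi), and a population of 4.9 million. New Zealand's "
    "capital city is Wellington, and its most populous city is Auckland."
)


def main():
    # ASSUMPTION: this checkpoint name is illustrative; substitute any ALBERT
    # model that has been fine-tuned for extractive question answering.
    qa = build_qa_pipeline("twmkn9/albert-base-v2-squad2")
    questions = (
        "How many people live in New Zealand?",
        "What's the largest city?",
    )
    for question in questions:
        answer, score = ask(qa, CONTEXT, question)
        print(f"{question} -> {answer} (score {score:.2f})")
```

Calling `main()` downloads the model on first use, so it needs a network connection and the `transformers` package with a backend such as PyTorch. The answer is always a span copied from the passage, which is worth keeping in mind when reading the results.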
Trying out a few passages and questions yielded some interesting results.
The first was a relatively straightforward test. I asked about the city with the most people, and it answered correctly (note that the passage describes it as the "most populous city", so the match is not word for word). Similar questions work just fine, such as "How many people does New Zealand have?"
It also seems able to handle relationships spread over two different sentences. Presumably it resolved the connection through the pronoun at the start of the second sentence.
Here is one example where it does struggle. To answer this question, you must infer that the turtle got there first from the sentence "found the turtle there waiting for him". The question answering system is unable to handle this level of abstraction.
This one is interesting. I'm not sure how it picked this as the most unique feature of New Zealand, but it works very well as an answer.