Language is a uniquely human ability and possibly the most complex manifestation of our intelligence. But through AI, specifically natural language processing (NLP), we are giving machines language capabilities, opening up a new realm of possibilities for how we'll work side by side with our machines in more advanced ways.
The Salesforce Research team, led by Chief Scientist Richard Socher, today published a new paper that attempts for the first time to capture the many nuances of language understanding in one generalized model. In the interview below, Richard shares his thoughts on the team's latest research and how he hopes it will change the field of NLP moving forward. Be sure to also check out the team's blog post for more detailed information on the paper.
Tell us a little bit about the team’s latest work.
Language understanding is inherently difficult because it connects to and draws on all our other forms of intelligence, such as visual, emotional, and logical reasoning. There are ambiguities and complexities in word choice, grammar, context, tone, sentiment, humor, cultural reference, and more. It takes us humans years to master, so imagine the complexity of teaching a computer to understand these various facets in a single unified model. I've focused my career on this challenge, and I'm incredibly excited about this new paper from Salesforce Research.
The paper is called the Natural Language Decathlon (decaNLP), and it makes it possible for a single model to tackle ten different NLP tasks at once. You can think of it as the Swiss Army knife of NLP: instead of carrying around a separate blade, screwdriver, can opener, and scissors, everything you need is neatly compacted into one tool that serves many purposes. Similarly, decaNLP eliminates the need to build and train an individual model for each NLP problem. It spans question answering, machine translation, summarization, natural language inference, sentiment analysis, semantic role labeling, relation extraction, goal-oriented dialogue, database query generation, and pronoun resolution, all in one model. Traditional approaches would require a hyper-customized architecture for each task, which hinders the emergence of general NLP models.
With decaNLP, we are changing the way the community can solve new types of NLP challenges.
You mentioned that you’ve dedicated your career to advancing automated language understanding. How long did this project take and what are some of the advances that led you to this point?
I’ve always been interested in building a meta-architecture like this. This project would not have been possible without Bryan McCann, who showed an incredible amount of intellectual endurance and skill, as well as Nitish Shirish Keskar and Caiming Xiong, who were also integral to this work.
We've been working on this paper for more than a year, and several lines of thinking led us to this idea. If you look at the broader landscape of AI, there has been progress in multitask models as we've slowly evolved from feature engineering to feature learning and then to neural-architecture engineering for specific tasks. This has produced a fair amount of improvement in NLP, but what we're really looking for is a system that can solve all potential tasks.
This leads to a second line of thought: having a dataset large enough to cover all tasks. In computer vision, for example, we've seen a lot of success thanks to ImageNet, which was broader than any previous dataset and covers many visual categories. Combined with neural networks, ImageNet gives you a good default model for starting to learn most visual classification tasks.
Unfortunately, there is no equivalent dataset for NLP yet. While identifying object categories was the central task to solve in computer vision, there is no single NLP task that captures the full complexity of natural language, nor a single dataset that tells us we're making progress toward general language understanding.
So, we set out to change that by combining many of the hardest NLP tasks and posing each one as a question answering problem. Question answering is broad enough that you can literally ask any question, which makes it a natural format for a single model that handles many tasks.
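To make that framing concrete, here is a minimal sketch in Python, illustrative only and not the released decaNLP code, of how three of the ten tasks reduce to the same (question, context, answer) format. The question wordings are paraphrases of the style of prompt the paper uses rather than verbatim quotes.

```python
from typing import NamedTuple

class Example(NamedTuple):
    question: str  # specifies which task the model should perform
    context: str   # the text the answer is drawn from or about
    answer: str    # the target output, always expressed as plain text

# Three of decaNLP's ten tasks, all in one shared format.
examples = [
    # Question answering: the question is simply the question itself.
    Example(
        question="What causes precipitation to fall?",
        context="In meteorology, precipitation is any product of the "
                "condensation of atmospheric water vapor that falls "
                "under gravity.",
        answer="gravity",
    ),
    # Machine translation: the question names the language pair.
    Example(
        question="What is the translation from English to German?",
        context="Most of the planet is ocean water.",
        answer="Der Großteil der Erde ist Meerwasser.",
    ),
    # Sentiment analysis: the answer is constrained to label words.
    Example(
        question="Is this review negative or positive?",
        context="A stirring, funny and finally transporting "
                "re-imagining of a classic.",
        answer="positive",
    ),
]

for ex in examples:
    print(f"Q: {ex.question}\nA: {ex.answer}\n")
```

Note that nothing task-specific survives outside the question string: the model never receives a task ID, which is exactly what lets one architecture train on all ten tasks at once.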
Do you see this paving the way for a machine that can converse well?
DecaNLP's multitask question answering network (MQAN) has zero-shot capabilities, which essentially means that the model can tackle tasks it has never seen before or been specifically trained to do. This can lead to more robust chatbots because it frees users from having to phrase things in exactly the right way. Chatbots will be able to make inferences as well as complete a broader range of new tasks, which could lead to more natural and effective interactions between humans and machines.
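As a purely hypothetical illustration (the model function below is a stand-in invented for this sketch; the released decaNLP code does not expose this interface), the reason zero-shot behavior is possible at all is that the task specification lives entirely in the question string:

```python
# Hypothetical sketch: model() stands in for a trained multitask QA
# network. A real system would generate the answer; this placeholder
# just lets the example run.
def model(question: str, context: str) -> str:
    return "<model-generated answer>"

review = "A beautifully made, quietly moving film."

# Seen during training: sentiment analysis phrased as a question.
print(model("Is this review negative or positive?", review))

# Zero-shot: a rephrasing never seen during training. Because the task
# is specified by the question rather than a fixed task ID, the model
# can still attempt it, which is what frees users from having to phrase
# things exactly the right way.
print(model("Does the reviewer like the movie? Answer yes or no.", review))
```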
How do you see this impacting NLP research moving forward?
DecaNLP has the potential to shift the NLP community away from its focus on single tasks. Just as ImageNet spurred a lot of new research, I hope this will push us to think about new kinds of architectures that generalize across all sorts of tasks.
With traditional single-task research, it is often unclear whether a model has merely learned the peculiarities of one task or whether it is a good general model. Research has rarely taken a broad approach toward more generalized natural language understanding. The bar for achieving a truly unified approach to language understanding is high, and there is still work to be done to find a single model that can reach even higher accuracy on all of these tasks.
We have released the code for obtaining and preprocessing the datasets, as well as for training and evaluating models, in the hope of encouraging further research and improvement in this area. I'm excited to see what the community will do.
What kinds of challenges will your team chase next?
Among many other projects, we will continue working toward a general model for many NLP tasks. All NLP tasks can be mapped to question answering, language modeling, or dialogue systems; these are the three equivalent super tasks of NLP, so it is important that we continue to improve performance on them within decaNLP.
I hope that providing a powerful single default NLP model will also empower programmers without deep NLP expertise to quickly make progress on new NLP tasks, languages and challenges. This in turn will mean that products and tools we can speak to and interact with will become more broadly available.