SimpleTOD: A Simple Language Model For Task Oriented Dialogue
We propose a simple causal (unidirectional) language model for task-oriented dialogue. SimpleTOD enables modeling of the inherent dependencies between the sub-tasks of task-oriented dialogue by optimizing for all tasks in an end-to-end manner.
Conversational AI has been a long-standing area of exploration in computer science [1]. There are broadly two categories of dialogue:
Open-domain dialogue systems focus on making chit-chat, open-ended conversations with humans more natural and engaging. They are usually trained end-to-end using large-scale data from social media [2, 3].
Task-oriented dialogue (TOD) systems accomplish a goal described by a user in natural language. They often use a pipeline approach that employs a variety of modules, breaking the task into smaller sub-tasks [4]. Natural language understanding (NLU) is handled by belief state tracking modules and dictates which results are retrieved from external APIs (in our case, a database). Dialogue management (DM) modules decide which actions to take based on those beliefs, and natural language generation (NLG) modules generate responses.
Task-Oriented Dialogue (TOD)
Traditionally, each component of task-oriented dialogue systems is trained independently with separate supervision for each component. The NLU module is trained on domain and intent labels. The DM module employs dialogue belief and dialogue act labels, and the NLG module accesses templatized or natural responses.
The modular dependencies of these components can lead to error propagation when information is not provided to subsequent modules in the pipeline [5]. For example, many systems do not consider the entire dialogue history at every turn. Instead, they rely on the NLU module to pass belief states reliably to following components. This was the original motivation behind SimpleTOD.
SimpleTOD
We propose recasting task-oriented dialogue as a simple, causal (unidirectional) language modeling task. We show that such an approach can solve all the sub-tasks in a unified way using multi-task maximum likelihood training. The proposed Simple Task-Oriented Dialogue (SimpleTOD) approach enables modeling of the inherent dependencies between the sub-tasks of task-oriented dialogue, by optimizing for all tasks in an end-to-end manner.
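To make the "single sequence" idea concrete, the sketch below shows one way a training example could be serialized so that context, belief state, actions, and response all live in one token stream for a causal language model. The delimiter tokens mirror the spirit of those used in the paper, but treat the exact names and the helper function here as illustrative rather than the repo's canonical implementation.

```python
# Sketch: serializing one dialogue turn into a single training sequence for a
# causal LM. Segment-delimiter tokens are illustrative (assumed, not verbatim
# from the SimpleTOD codebase).

def build_training_sequence(context_turns, belief, actions, response):
    """Concatenate all sub-task targets into one string; the LM is then
    trained with a standard next-token maximum-likelihood objective."""
    context = " ".join(context_turns)
    return (
        f"<|context|> {context} <|endofcontext|> "
        f"<|belief|> {belief} <|endofbelief|> "
        f"<|action|> {actions} <|endofaction|> "
        f"<|response|> {response} <|endofresponse|>"
    )

seq = build_training_sequence(
    ["<|user|> i need a cheap hotel with free wifi"],
    "hotel pricerange cheap, hotel internet yes",
    "hotel inform choice, hotel request area",
    "we have [value_count] options. what area would you like?",
)
```

Because every sub-task target appears later in the same sequence, the model can condition its action and response generation on the belief state it just produced, which is exactly the inter-task dependency the unified formulation is meant to capture.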
Because SimpleTOD still outputs interpretable, intermediate results that are typically associated with each sub-task, we can evaluate it on each sub-task independently (as for dialogue state tracking) and on all of them together (end-to-end).
SimpleTOD in the Wild:
We evaluate SimpleTOD with a human in a multi-domain dialogue. In this setting, SimpleTOD conditions its responses on its own generations from previous turns.
Description: A human is asked to have SimpleTOD reserve a hotel and then book a train.
SimpleTOD is able to understand the human's intent: it requests the information relevant to the hotel and train reservations and suggests useful options from the database.
Dialogue State Tracking Task
Dialogue State Tracking is the more general term for the belief state tracking performed by SimpleTOD. For this task, the model aims to relate the unstructured user inputs and dialogue history to the structured format that allows it to query the database. It does so by specifying a dictionary of belief states, which consist of key names and values.
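A generated belief state can be read back into structured form before querying the database. The sketch below parses a comma-separated "domain slot value" belief string, the format used for MultiWOZ-style belief states; the parsing code itself is an illustrative assumption, not the repo's implementation.

```python
# Sketch: turning a generated belief string into structured
# (domain, slot, value) triplets that can drive a database query.
# The comma-separated "domain slot value" format follows MultiWOZ-style
# belief states; this parser is illustrative, not the official one.

def parse_belief(belief_string):
    triplets = []
    for item in belief_string.split(","):
        parts = item.strip().split(" ", 2)  # domain, slot, rest-of-string value
        if len(parts) == 3:
            domain, slot, value = parts
            triplets.append((domain, slot, value))
    return triplets

beliefs = parse_belief(
    "hotel pricerange cheap, hotel internet yes, train destination cambridge"
)
# e.g. [('hotel', 'pricerange', 'cheap'), ('hotel', 'internet', 'yes'),
#       ('train', 'destination', 'cambridge')]
```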
Below, we review SimpleTOD's performance on long multi-domain dialogues from the MultiWOZ 2.1 dataset. At each turn, the current user turn and all previous user/system turns are taken as the dialogue context and given to the model as input.
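The context construction described above can be sketched as follows. Note that during evaluation on the fixed test set, the history contains the ground-truth system responses rather than the model's own generations; the speaker tokens and helper here are illustrative assumptions.

```python
# Sketch: assembling the dialogue context fed to the model at each turn.
# Speaker tokens are illustrative (assumed, not verbatim from the codebase).

def build_context(user_turns, system_turns):
    """Interleave all previous turns plus the current user turn."""
    turns = []
    for i, user in enumerate(user_turns):
        turns.append(f"<|user|> {user}")
        if i < len(system_turns):  # the current turn has no system response yet
            turns.append(f"<|system|> {system_turns[i]}")
    return " ".join(turns)

ctx = build_context(
    ["i am looking for a train to cambridge", "i want to leave after 10:00"],
    ["where are you departing from?"],
)
```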
Dialogue: MUL1015
No. of turns: 10
Domains: attraction, hotel, taxi
Dialogue: MUL0671
No. of turns: 10
Domains: train, hotel
Belief, Action, and Response Generation in the End-to-End Setting:
In this section, we evaluate turn-based performance of SimpleTOD in a long multi-domain dialogue. We study the performance of SimpleTOD in generating action decisions and responses for each turn.
Dialogue: PMUL3293
No. of turns: 11
Domains: train, hotel
Description: The user requests to book a train and then to reserve a hotel at the destination. At some turns, SimpleTOD generates different actions than the ground-truth system. The results indicate that these different actions lead to requesting more information from the user, which narrows down the hotel search results and can accomplish the task in fewer turns. When suggesting a hotel name, SimpleTOD also provides more detailed information about the hotel than the ground-truth system does, such as its star rating, location, and internet availability.
turn 1:
The dialogue begins with a user request.
For comparison, we can observe the SimpleTOD outputs based on the dialogue history so far:
SimpleTOD has generated a response that we can compare during evaluation to the ground-truth system response:
During evaluation, the ground-truth system response is always passed on to the following turns. In this turn, the two responses were identical, but this isn't always the case, as the next turn shows.
turn 2:
This is how the user followed up in the next turn when the dataset was created:
SimpleTOD then generated the following beliefs, actions, and response:
SimpleTOD has generated a response that in this case differs from the ground truth system response:
but we can tell that SimpleTOD's response still achieves the same goals, as specified by the actions it has chosen to take. A minor point about evaluation: all turns are treated independently, so on the next turn SimpleTOD does not see its own response from previous turns, but rather the history of ground-truth user and system responses. This is because we are evaluating on a fixed test set rather than collecting new dialogues from new human participants each time, which lets us compare methods without the confounding effects of different users and dialogues.
turn 3:
The next user response was:
This is the first turn where it becomes clear that SimpleTOD generates a delexicalized response, with template placeholders that in a real deployment would be filled in based on the true database state.
Then, we lexicalize the SimpleTOD response based on its generated belief states for direct comparison with the ground-truth system response.
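The lexicalization step can be sketched as a simple placeholder substitution driven by the belief state and database result. The placeholder names and the helper below are illustrative of MultiWOZ-style delexicalization, not the exact mechanism in the released code.

```python
# Sketch: filling template placeholders in a delexicalized response using
# slot values recovered from the belief state / database lookup.
# Placeholder names like [hotel_name] are illustrative assumptions.
import re

def lexicalize(response, slot_values):
    def fill(match):
        slot = match.group(1)
        # leave unknown placeholders untouched rather than guessing
        return slot_values.get(slot, match.group(0))
    return re.sub(r"\[([a-z_]+)\]", fill, response)

filled = lexicalize(
    "the [hotel_name] is a [value_stars] star hotel with free internet .",
    {"hotel_name": "cambridge belfry", "value_stars": "4"},
)
# → "the cambridge belfry is a 4 star hotel with free internet ."
```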
Comparing the responses, it is clear that SimpleTOD provides the information requested by the user (departure time), even though it is missing from the ground-truth target system response.
As noted before, all turns are treated independently, so on the next turn SimpleTOD will not see its own response; instead it will see the lexicalized ground-truth system response. This keeps evaluation on the same dataset comparable across methods. If using SimpleTOD in the wild, we would instead lexicalize the SimpleTOD response with the true state of the database and use that as part of the dialogue history.
turn 4:
At this turn, SimpleTOD requests the user's preference on hotel price range and star rating to find a better match, whereas the ground-truth response requests information on price range only.
turn 5:
Since the user does not see SimpleTOD's response, they provide only their preference on hotel internet. SimpleTOD again requests information about star rating and price range, while the ground-truth system asks about price only.
turn 6:
Here, SimpleTOD suggests a hotel name, having been given enough information by the user. It also provides information about parking and star rating (the user's stated preferences). The ground-truth system, however, asks about star rating at this turn without suggesting any hotel name. Because SimpleTOD asked for more information in the previous turn, it can accomplish the task sooner than the ground-truth system.
turn 7:
Note: SimpleTOD provides the hotel name along with detailed information, i.e., internet, parking, star rating, and area (the user's preferences), in its response, whereas the ground-truth system response contains star rating information only.
turn 8:
Since the user emphasizes their preference on hotel type, SimpleTOD suggests a different hotel name and provides the requested information, while the ground-truth system suggests two hotels and provides only general information.
turn 9:
Compared to the ground-truth system response, SimpleTOD did not ask the user for the hotel arrival date. In a real setting, where the model is conditioned on its own responses, it would likely ask about the arrival date at the next turn.
Resources:
Paper: A Simple Language Model for Task-Oriented Dialogue
Code: https://github.com/salesforce/simpletod
When referencing this work, please cite:
@article{hosseini2020simple,
  title={A simple language model for task-oriented dialogue},
  author={Hosseini-Asl, Ehsan and McCann, Bryan and Wu, Chien-Sheng and Yavuz, Semih and Socher, Richard},
  journal={arXiv preprint arXiv:2005.00796},
  year={2020}
}
References:
[5] B. Liu and I. Lane. End-to-end learning of task-oriented dialogs. In NAACL, 2018.