
Task-Oriented Dialogue as Dataflow Synthesis

License: MIT

This repository contains tools and instructions for reproducing the experiments in the paper Task-Oriented Dialogue as Dataflow Synthesis (TACL 2020). If you use any source code or data included in this toolkit in your work, please cite the following paper.

@article{SMDataflow2020,
  author = {{Semantic Machines} and Andreas, Jacob and Bufe, John and Burkett, David and Chen, Charles and Clausman, Josh and Crawford, Jean and Crim, Kate and DeLoach, Jordan and Dorner, Leah and Eisner, Jason and Fang, Hao and Guo, Alan and Hall, David and Hayes, Kristin and Hill, Kellie and Ho, Diana and Iwaszuk, Wendy and Jha, Smriti and Klein, Dan and Krishnamurthy, Jayant and Lanman, Theo and Liang, Percy and Lin, Christopher H. and Lintsbakh, Ilya and McGovern, Andy and Nisnevich, Aleksandr and Pauls, Adam and Petters, Dmitrij and Read, Brent and Roth, Dan and Roy, Subhro and Rusak, Jesse and Short, Beth and Slomin, Div and Snyder, Ben and Striplin, Stephon and Su, Yu and Tellman, Zachary and Thomson, Sam and Vorobev, Andrei and Witoszko, Izabela and Wolfe, Jason and Wray, Abby and Zhang, Yuchen and Zotov, Alexander},
  title = {Task-Oriented Dialogue as Dataflow Synthesis},
  journal = {Transactions of the Association for Computational Linguistics},
  volume = {8},
  pages = {556--571},
  year = {2020},
  month = sep,
  url = {https://doi.org/10.1162/tacl_a_00333},
  abstract = {We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model. The SMCalFlow dataset, code for replicating experiments, and a public leaderboard are available at \url{https://www.microsoft.com/en-us/research/project/dataflow-based-dialogue-semantic-machines}.},
}

Understand SMCalFlow Programs

Please read this document to understand the syntax of SMCalFlow programs, and read this document to understand their semantics.
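
For a quick programmatic look at these programs, the snippet below parses a Lispress string into a Program object. This is a minimal sketch, assuming the parse_lispress and lispress_to_program helpers in dataflow.core.lispress (mentioned in the issues further down) and the sm-dataflow package from the Install section below; exact signatures may vary across versions. The example program is taken from the dataset.

# A minimal sketch, not official usage: parse a Lispress string from the
# dataset and convert it to a Program. Assumes the parse_lispress and
# lispress_to_program helpers in dataflow.core.lispress; signatures may
# vary across versions.
from dataflow.core.lispress import lispress_to_program, parse_lispress

lispress_str = '( Yield ( Event.start ( FindNumNextEvent ( Event.subject_? ( ?~= " staff meeting " ) ) 1L ) ) )'
lispress = parse_lispress(lispress_str)
program, _ = lispress_to_program(lispress, 0)
for expression in program.expressions:
    print(expression)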

Install

# (Recommended) Create a virtual environment
virtualenv --python=python3 env
source env/bin/activate

# Install the sm-dataflow package and its core dependencies
pip install git+https://github.com/microsoft/task_oriented_dialogue_as_dataflow_synthesis.git

# Download the spaCy model for tokenization
python -m spacy download en_core_web_md-2.2.0 --direct

# Install OpenNMT-py and PyTorch for training and running the models
pip install OpenNMT-py==1.0.0 torch==1.4.0
  • Our experiments used OpenNMT-py 1.0.0 with PyTorch 1.4.0; other versions have not been tested. You can skip these two packages if you don't need to train or run the models.
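
As a quick post-install sanity check, the following sketch imports the key packages and loads the spaCy model (an informal check, not an official script; drop the onmt and torch imports if you skipped OpenNMT-py and PyTorch):

# Post-install sanity check (a sketch, not an official script).
import dataflow  # noqa: F401  -- top-level module of the sm-dataflow package
import onmt      # noqa: F401  -- from OpenNMT-py; skip if not installed
import torch     # noqa: F401  -- skip if not installed
import spacy

spacy.load("en_core_web_md")  # the tokenization model downloaded above
print("ok")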

SMCalFlow Experiments

Follow the steps below to reproduce the results reported in the paper (Table 2).

NOTE: We highly recommend following the instructions for the leaderboard to report your results for consistency. If you use your own evaluation script, please pay attention to the notes in Step 2 and Step 7.

  1. Download and unzip the SMCalFlow 1.0 dataset.

    dataflow_dialogues_dir="output/dataflow_dialogues"
    mkdir -p "${dataflow_dialogues_dir}"
    
    cd "${dataflow_dialogues_dir}"
    # Download the dataset `smcalflow.full.data.tgz` or `smcalflow.inlined.data.tgz`
    # The `PATH_TO_DATA_TGZ` is the path to the tgz file of the corresponding dataset.
    tar -xvzf PATH_TO_DATA_TGZ
    • Both SMCalFlow 1.0 and SMCalFlow 2.0 can be found under the datasets folder.
    • The dataset is distributed under the CC BY-SA 4.0 license.
  2. Compute data statistics:

    dataflow_dialogues_stats_dir="output/dataflow_dialogues_stats"
    mkdir -p "${dataflow_dialogues_stats_dir}"
    python -m dataflow.analysis.compute_data_statistics \
        --dataflow_dialogues_dir ${dataflow_dialogues_dir} \
        --subset train valid \
        --outdir ${dataflow_dialogues_stats_dir}
    • Basic statistics

      subset  num_dialogues  num_turns  num_kept_turns  num_skipped_turns  num_refer_turns  num_revise_turns
      train          32,647    133,821         121,200             12,621           33,011             9,315
      valid           3,649     14,757          13,499              1,258            3,544             1,052
      test            5,211     22,012          21,224                788            8,965             3,315
      all            41,517    170,590         155,923             14,667           45,520            13,682
      • We currently do not release the test set, but we report the data statistics here.
      • NOTE: There are a small number of turns (num_skipped_turns in the table) whose sole purpose is to establish dialogue context and should not be directly trained or tested on. The dataset statistics reported in the paper are based on non-skipped turns only.
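    • To sanity-check these statistics against the unpacked data, here is a minimal sketch. The "turns" and "skip" field names are assumptions about the *.dataflow_dialogues.jsonl schema, so inspect one line of the file first if they do not match.

      # Hedged sketch: recount dialogues and turns from the jsonl files.
      # The "turns" and "skip" field names are assumptions about the schema.
      import json

      for subset in ["train", "valid"]:
          num_dialogues = num_turns = num_skipped = 0
          path = f"output/dataflow_dialogues/{subset}.dataflow_dialogues.jsonl"
          with open(path) as f:
              for line in f:
                  dialogue = json.loads(line)
                  num_dialogues += 1
                  for turn in dialogue["turns"]:
                      num_turns += 1
                      num_skipped += bool(turn.get("skip"))
          # columns: subset, dialogues, turns, kept turns, skipped turns
          print(subset, num_dialogues, num_turns,
                num_turns - num_skipped, num_skipped)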
  3. Prepare text data for the OpenNMT toolkit.

    onmt_text_data_dir="output/onmt_text_data"
    mkdir -p "${onmt_text_data_dir}"
    for subset in "train" "valid"; do
        python -m dataflow.onmt_helpers.create_onmt_text_data \
            --dialogues_jsonl ${dataflow_dialogues_dir}/${subset}.dataflow_dialogues.jsonl \
            --num_context_turns 2 \
            --include_program \
            --include_described_entities \
            --onmt_text_data_outbase ${onmt_text_data_dir}/${subset}
    done
    • We use --include_program to add the gold program of the context turns.
    • We use --include_described_entities to add the entities (e.g., entity@123456) described in the generation outcome of the context turns. Entities mentioned in the context turns can appear in the "inlined" programs for the current turn, so we include them in the source sequence to let the seq2seq model produce such tokens via its copy mechanism.
    • You can vary the number of context turns by changing --num_context_turns.
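    • Concretely, with --num_context_turns 2 a tokenized source line (*.src_tok) interleaves the context user utterances, their gold programs, and their described entities before the current utterance. For example (a line drawn from the dataset, also quoted in the issues below):

      __User What time is my appointment with Jerri Skinner on Friday? __StartOfProgram ( Yield :output ( :start ( singleton ( :results ( FindEventWrapperWithDefaults :constraint ( EventOnDate :date ( NextDOW :dow # ( DayOfWeek " FRIDAY " ) ) :event ( Constraint[Event] :attendees ( AttendeeListHasRecipientConstraint :recipientConstraint ( RecipientWithNameLike :constraint ( Constraint[Recipient] ) :name # ( PersonName " Jerri Skinner " ) ) ) ) ) ) ) ) ) ) entity@824743096 __User Can you add an appointment with Jerri Skinner at 9 am? __StartOfProgram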
  4. Compute statistics for the created OpenNMT text data.

    onmt_data_stats_dir="output/onmt_data_stats"
    mkdir -p "${onmt_data_stats_dir}"
    python -m dataflow.onmt_helpers.compute_onmt_data_stats \
        --text_data_dir ${onmt_text_data_dir} \
        --suffix src src_tok tgt \
        --subset train valid \
        --outdir ${onmt_data_stats_dir}
  5. Train OpenNMT models. You can also skip this step and instead download the trained model from the table below.

    onmt_binarized_data_dir="output/onmt_binarized_data"
    mkdir -p "${onmt_binarized_data_dir}"
    
    src_tok_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.src_tok.ntokens_stats.json)
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)
    
    # create OpenNMT binarized data
    onmt_preprocess \
        --dynamic_dict \
        --train_src ${onmt_text_data_dir}/train.src_tok \
        --train_tgt ${onmt_text_data_dir}/train.tgt \
        --valid_src ${onmt_text_data_dir}/valid.src_tok \
        --valid_tgt ${onmt_text_data_dir}/valid.tgt \
        --src_seq_length ${src_tok_max_ntokens} \
        --tgt_seq_length ${tgt_max_ntokens} \
        --src_words_min_frequency 0 \
        --tgt_words_min_frequency 0 \
        --save_data ${onmt_binarized_data_dir}/data
    
    # extract pretrained Glove 840B embeddings (https://nlp.stanford.edu/projects/glove/)
    glove_840b_dir="output/glove_840b"
    mkdir -p "${glove_840b_dir}"
    wget -O ${glove_840b_dir}/glove.840B.300d.zip http://nlp.stanford.edu/data/glove.840B.300d.zip
    unzip ${glove_840b_dir}/glove.840B.300d.zip -d ${glove_840b_dir}
    
    onmt_embeddings_dir="output/onmt_embeddings"
    mkdir -p "${onmt_embeddings_dir}"
    python -m dataflow.onmt_helpers.embeddings_to_torch \
        -emb_file_both ${glove_840b_dir}/glove.840B.300d.txt \
        -dict_file ${onmt_binarized_data_dir}/data.vocab.pt \
        -output_file ${onmt_embeddings_dir}/embeddings
    
    # train OpenNMT models
    onmt_models_dir="output/onmt_models"
    mkdir -p "${onmt_models_dir}"
    
    batch_size=64
    train_num_datapoints=$(jq '.train' ${onmt_data_stats_dir}/nexamples.json)
    # validate approximately at each epoch
    valid_steps=$(python3 -c "from math import ceil; print(ceil(${train_num_datapoints}/${batch_size}))")
    
    onmt_train \
        --encoder_type brnn \
        --decoder_type rnn \
        --rnn_type LSTM \
        --global_attention general \
        --global_attention_function softmax \
        --generator_function softmax \
        --copy_attn_type general \
        --copy_attn \
        --seed 1 \
        --optim adam \
        --learning_rate 0.001 \
        --early_stopping 2 \
        --batch_size ${batch_size} \
        --valid_batch_size 8 \
        --valid_steps ${valid_steps} \
        --save_checkpoint_steps ${valid_steps} \
        --data ${onmt_binarized_data_dir}/data \
        --pre_word_vecs_enc ${onmt_embeddings_dir}/embeddings.enc.pt \
        --pre_word_vecs_dec ${onmt_embeddings_dir}/embeddings.dec.pt \
        --word_vec_size 300 \
        --attention_dropout 0 \
        --dropout 0.5 \
        --layers ??? \
        --rnn_size ??? \
        --gpu_ranks 0 \
        --world_size 1 \
        --save_model ${onmt_models_dir}/checkpoint 
    • Hyperparameters for the models reported in Table 2 of the paper. Substitute these values for the --layers ??? and --rnn_size ??? placeholders above.

                --layers  --rnn_size  model
      dataflow         2         384  link
      inline           3         384  link
  6. Make predictions using a trained OpenNMT model. You need to replace the checkpoint_last.pt in the following script with the final model you get from the previous step.

    onmt_translate_outdir="output/onmt_translate_output"
    mkdir -p "${onmt_translate_outdir}"
    
    onmt_model_pt="${onmt_models_dir}/checkpoint_last.pt"
    nbest=5
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)
    
    # predict programs using a trained OpenNMT model
    onmt_translate \
        --model ${onmt_model_pt} \
        --max_length ${tgt_max_ntokens} \
        --src ${onmt_text_data_dir}/valid.src_tok \
        --replace_unk \
        --n_best ${nbest} \
        --batch_size 8 \
        --beam_size 10 \
        --gpu 0 \
        --report_time \
        --output ${onmt_translate_outdir}/valid.nbest
  7. Compute the exact-match accuracy (taking into account whether the program_execution_oracle.refer_are_correct is true).

    evaluation_outdir="output/evaluation_output"
    mkdir -p "${evaluation_outdir}"
    
    # create the prediction report
    python -m dataflow.onmt_helpers.create_onmt_prediction_report \
        --dialogues_jsonl ${dataflow_dialogues_dir}/valid.dataflow_dialogues.jsonl \
        --datum_id_jsonl ${onmt_text_data_dir}/valid.datum_id \
        --src_txt ${onmt_text_data_dir}/valid.src_tok \
        --ref_txt ${onmt_text_data_dir}/valid.tgt \
        --nbest_txt ${onmt_translate_outdir}/valid.nbest \
        --nbest ${nbest} \
        --outbase ${evaluation_outdir}/valid
    
    # evaluate the predictions (all turns)
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/valid.prediction_report.tsv \
        --scores_json ${evaluation_outdir}/valid.all.scores.json
    
    # evaluate the predictions (refer turns)
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/valid.prediction_report.tsv \
        --datum_ids_json ${dataflow_dialogues_stats_dir}/valid.refer_turn_ids.jsonl \
        --scores_json ${evaluation_outdir}/valid.refer_turns.scores.json
    
    # evaluate the predictions (revise turns)
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/valid.prediction_report.tsv \
        --datum_ids_json ${dataflow_dialogues_stats_dir}/valid.revise_turn_ids.jsonl \
        --scores_json ${evaluation_outdir}/valid.revise_turns.scores.json
    • NOTE: The numbers reported using the scripts above should match those reported in Table 2 in the paper. The leaderboard has used a slightly different evaluation script that canonicalizes both the gold and predicted programs, and thus, the accuracy would be slightly higher (e.g., 0.665 vs. 0.668 on the test set). To obtain the leaderboard results, please add --use_leaderboard_metric when running python -m dataflow.onmt_helpers.create_onmt_prediction_report to create the report.
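    • In other words, a turn counts as correct only if the top-1 hypothesis string matches the gold program exactly and the oracle judged its refer calls correct. A minimal sketch of that criterion (an illustration, not the repository's evaluation code):

      # Sketch of the exact-match criterion, not the repository's code.
      def turn_is_correct(top1: str, gold: str, refer_are_correct: bool) -> bool:
          # Correct iff the top-1 prediction equals the gold program and
          # program_execution_oracle.refer_are_correct is true.
          return refer_are_correct and top1.strip() == gold.strip()

      def exact_match_accuracy(turns) -> float:
          # `turns` is an iterable of (top1, gold, refer_are_correct) triples.
          results = [turn_is_correct(*t) for t in turns]
          return sum(results) / len(results)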
  8. Calculate the statistical significance for two different experiments.

    analysis_outdir="output/analysis_output"
    mkdir -p "${analysis_outdir}"
    python -m dataflow.analysis.calculate_statistical_significance \
        --exp0_prediction_report_tsv ${exp0_evaluation_outdir}/valid.prediction_report.tsv \
        --exp1_prediction_report_tsv ${exp1_evaluation_outdir}/valid.prediction_report.tsv \
        --scores_json ${analysis_outdir}/exp0_vs_exp1.valid.scores.json
    • exp0_evaluation_outdir and exp1_evaluation_outdir are the evaluation_outdir from Step 7 for the corresponding experiments.
    • You can also provide --datum_ids_jsonl to carry out the significance test on a subset of turns.
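    • For intuition, a paired test over the two experiments' per-turn correctness vectors can be sketched as below. This is an illustration using an exact McNemar-style test; the script's actual test may differ.

      # Illustrative paired significance test over per-turn correctness
      # from two experiments; the repository's script may use a different test.
      from typing import List

      from scipy.stats import binomtest  # requires scipy >= 1.7

      def mcnemar_p(correct0: List[bool], correct1: List[bool]) -> float:
          # Only turns on which the two systems disagree carry information.
          b = sum(c0 and not c1 for c0, c1 in zip(correct0, correct1))
          c = sum(c1 and not c0 for c0, c1 in zip(correct0, correct1))
          if b + c == 0:
              return 1.0  # identical outcomes everywhere
          # Under the null hypothesis, disagreements split 50/50.
          return binomtest(b, b + c, p=0.5).pvalue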

MultiWOZ Experiments

  1. Download the MultiWOZ dataset and convert it to dataflow programs.

    # creates TRADE-processed dialogues
    raw_trade_dialogues_dir="output/trade_dialogues"
    mkdir -p "${raw_trade_dialogues_dir}"
    python -m dataflow.multiwoz.trade_dst.create_data \
        --use_multiwoz_2_1 \
        --output_dir ${raw_trade_dialogues_dir}
    
    # patch TRADE dialogues
    patched_trade_dialogues_dir="output/patched_trade_dialogues"
    mkdir -p "${patched_trade_dialogues_dir}"
    for subset in "train" "dev" "test"; do
        python -m dataflow.multiwoz.patch_trade_dialogues \
            --trade_data_file ${raw_trade_dialogues_dir}/${subset}_dials.json \
            --outbase ${patched_trade_dialogues_dir}/${subset}
    done
    ln -sr ${patched_trade_dialogues_dir}/dev_dials.json ${patched_trade_dialogues_dir}/valid_dials.json
    
    # create dataflow programs
    dataflow_dialogues_dir="output/dataflow_dialogues"
    mkdir -p "${dataflow_dialogues_dir}"
    for subset in "train" "valid" "test"; do
        python -m dataflow.multiwoz.create_programs \
            --trade_data_file ${patched_trade_dialogues_dir}/${subset}_dials.json \
            --outbase ${dataflow_dialogues_dir}/${subset}
    done
    • To create programs that inline refer calls, add --no_refer when running the dataflow.multiwoz.create_programs command.
    • To create programs that inline both refer and revise calls, add --no_refer --no_revise.
  2. Prepare text data for the OpenNMT toolkit.

    onmt_text_data_dir="output/onmt_text_data"
    mkdir -p "${onmt_text_data_dir}"
    for subset in "train" "valid" "test"; do
        python -m dataflow.onmt_helpers.create_onmt_text_data \
            --dialogues_jsonl ${dataflow_dialogues_dir}/${subset}.dataflow_dialogues.jsonl \
            --num_context_turns 2 \
            --include_agent_utterance \
            --onmt_text_data_outbase ${onmt_text_data_dir}/${subset}
    done
    • We use --include_agent_utterance following the setup in TRADE (Wu et al., 2019).
    • You can vary the number of context turns by changing --num_context_turns.
  3. Compute statistics for the created OpenNMT text data.

    onmt_data_stats_dir="output/onmt_data_stats"
    mkdir -p "${onmt_data_stats_dir}"
    python -m dataflow.onmt_helpers.compute_onmt_data_stats \
        --text_data_dir ${onmt_text_data_dir} \
        --suffix src src_tok tgt \
        --subset train valid test \
        --outdir ${onmt_data_stats_dir}
  4. Train OpenNMT models. You can also skip this step and instead download the trained models from the table below.

    onmt_binarized_data_dir="output/onmt_binarized_data"
    mkdir -p "${onmt_binarized_data_dir}"
    
    # create OpenNMT binarized data
    src_tok_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.src_tok.ntokens_stats.json)
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)
    
    onmt_preprocess \
        --dynamic_dict \
        --train_src ${onmt_text_data_dir}/train.src_tok \
        --train_tgt ${onmt_text_data_dir}/train.tgt \
        --valid_src ${onmt_text_data_dir}/valid.src_tok \
        --valid_tgt ${onmt_text_data_dir}/valid.tgt \
        --src_seq_length ${src_tok_max_ntokens} \
        --tgt_seq_length ${tgt_max_ntokens} \
        --src_words_min_frequency 0 \
        --tgt_words_min_frequency 0 \
        --save_data ${onmt_binarized_data_dir}/data
    
    # extract pretrained Glove 6B embeddings
    glove_6b_dir="output/glove_6b"
    mkdir -p "${glove_6b_dir}"
    wget -O ${glove_6b_dir}/glove.6B.zip http://nlp.stanford.edu/data/glove.6B.zip
    unzip ${glove_6b_dir}/glove.6B.zip -d ${glove_6b_dir}
    
    onmt_embeddings_dir="output/onmt_embeddings"
    mkdir -p "${onmt_embeddings_dir}"
    python -m dataflow.onmt_helpers.embeddings_to_torch \
        -emb_file_both ${glove_6b_dir}/glove.6B.300d.txt \
        -dict_file ${onmt_binarized_data_dir}/data.vocab.pt \
        -output_file ${onmt_embeddings_dir}/embeddings
    
    # train OpenNMT models
    onmt_models_dir="output/onmt_models"
    mkdir -p "${onmt_models_dir}"
    
    batch_size=64
    train_num_datapoints=$(jq '.train' ${onmt_data_stats_dir}/nexamples.json)
    # approximately validate at each epoch
    valid_steps=$(python3 -c "from math import ceil; print(ceil(${train_num_datapoints}/${batch_size}))")
    
    onmt_train \
        --encoder_type brnn \
        --decoder_type rnn \
        --rnn_type LSTM \
        --global_attention general \
        --global_attention_function softmax \
        --generator_function softmax \
        --copy_attn_type general \
        --copy_attn \
        --seed 1 \
        --optim adam \
        --learning_rate 0.001 \
        --early_stopping 2 \
        --batch_size ${batch_size} \
        --valid_batch_size 8 \
        --valid_steps ${valid_steps} \
        --save_checkpoint_steps ${valid_steps} \
        --data ${onmt_binarized_data_dir}/data \
        --pre_word_vecs_enc ${onmt_embeddings_dir}/embeddings.enc.pt \
        --pre_word_vecs_dec ${onmt_embeddings_dir}/embeddings.dec.pt \
        --word_vec_size 300 \
        --attention_dropout 0 \
        --dropout ??? \
        --layers ??? \
        --rnn_size ??? \
        --gpu_ranks 0 \
        --world_size 1 \
        --save_model ${onmt_models_dir}/checkpoint 
    • Hyperparameters for the models reported in Table 3 of the paper. Substitute these values for the --dropout ???, --layers ???, and --rnn_size ??? placeholders above; the --num_context_turns value refers to Step 2 and must match the one used when preparing the text data.

                                            --dropout  --layers  --rnn_size  model
      dataflow (--num_context_turns 2)            0.7         2         384  link
      inline refer (--num_context_turns 4)        0.3         3         320  link
      inline both (--num_context_turns 10)        0.7         2         320  link
  5. Make predictions using a trained OpenNMT model. You need to replace the checkpoint_last.pt in the following script with the actual model you get from the previous step.

    onmt_translate_outdir="output/onmt_translate_output"
    mkdir -p "${onmt_translate_outdir}"
    
    onmt_model_pt="${onmt_models_dir}/checkpoint_last.pt"
    nbest=5
    tgt_max_ntokens=$(jq '."100"' ${onmt_data_stats_dir}/train.tgt.ntokens_stats.json)
    
    # predict programs on the test set using a trained OpenNMT model
    onmt_translate \
        --model ${onmt_model_pt} \
        --max_length ${tgt_max_ntokens} \
        --src ${onmt_text_data_dir}/test.src_tok \
        --replace_unk \
        --n_best ${nbest} \
        --batch_size 8 \
        --beam_size 10 \
        --gpu 0 \
        --report_time \
        --output ${onmt_translate_outdir}/test.nbest
  6. Compute the exact-match accuracy of the program predictions.

    evaluation_outdir="output/evaluation_output"
    mkdir -p "${evaluation_outdir}"
    
    # create the prediction report
    python -m dataflow.onmt_helpers.create_onmt_prediction_report \
        --dialogues_jsonl ${dataflow_dialogues_dir}/test.dataflow_dialogues.jsonl \
        --datum_id_jsonl ${onmt_text_data_dir}/test.datum_id \
        --src_txt ${onmt_text_data_dir}/test.src_tok \
        --ref_txt ${onmt_text_data_dir}/test.tgt \
        --nbest_txt ${onmt_translate_outdir}/test.nbest \
        --nbest ${nbest} \
        --outbase ${evaluation_outdir}/test
    
    # evaluate the predictions
    python -m dataflow.onmt_helpers.evaluate_onmt_predictions \
        --prediction_report_tsv ${evaluation_outdir}/test.prediction_report.tsv \
        --scores_json ${evaluation_outdir}/test.scores.json
    
  7. Evaluate the belief state predictions.

    belief_state_tracker_eval_dir="output/belief_state_tracker_eval"
    mkdir -p "${belief_state_tracker_eval_dir}"
    
    # creates the gold file from TRADE-preprocessed dialogues (after patch)
    python -m dataflow.multiwoz.create_belief_state_tracker_data \
        --trade_data_file ${patched_trade_dialogues_dir}/test_dials.json \
        --belief_state_tracker_data_file ${belief_state_tracker_eval_dir}/test.belief_state_tracker_data.jsonl
    
    # creates the hypo file from predicted programs
    python -m dataflow.multiwoz.execute_programs \
        --dialogues_file ${evaluation_outdir}/test.dataflow_dialogues.jsonl \
        --cheating_mode never \
        --outbase ${belief_state_tracker_eval_dir}/test.hypo
    
    python -m dataflow.multiwoz.create_belief_state_prediction_report \
        --input_data_file ${belief_state_tracker_eval_dir}/test.hypo.execution_results.jsonl \
        --format dataflow \
        --remove_none \
        --gold_data_file ${belief_state_tracker_eval_dir}/test.belief_state_tracker_data.jsonl \
        --outbase ${belief_state_tracker_eval_dir}/test
    
    # evaluates belief state predictions
    python -m dataflow.multiwoz.evaluate_belief_state_predictions \
        --prediction_report_jsonl ${belief_state_tracker_eval_dir}/test.prediction_report.jsonl \
        --outbase ${belief_state_tracker_eval_dir}/test
    • The scores are reported in ${belief_state_tracker_eval_dir}/test.scores.json.
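    • The headline metric here is joint goal accuracy. Roughly, a turn is correct only if its entire predicted belief state matches the gold one; the sketch below illustrates this with assumed data shapes, not the repository's schema.

      # Sketch of joint goal accuracy over per-turn belief states, each a
      # dict mapping "domain-slot" -> value. Data shapes are assumptions.
      from typing import Dict, List

      def joint_goal_accuracy(gold: List[Dict[str, str]],
                              hypo: List[Dict[str, str]]) -> float:
          # A turn counts only if every slot-value pair matches exactly.
          correct = sum(g == h for g, h in zip(gold, hypo))
          return correct / len(gold)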
  8. Calculate the statistical significance for two different experiments.

    analysis_outdir="output/analysis_output"
    mkdir -p "${analysis_outdir}"
    python -m dataflow.analysis.calculate_statistical_significance \
        --exp0_prediction_report_tsv ${exp0_evaluation_outdir}/test.prediction_report.tsv \
        --exp1_prediction_report_tsv ${exp1_evaluation_outdir}/test.prediction_report.tsv \
        --scores_json ${analysis_outdir}/exp0_vs_exp1.test.scores.json
    • exp0_evaluation_outdir and exp1_evaluation_outdir are the evaluation_outdir from Step 6 for the corresponding experiments (the prediction report TSV files are created there).


Issues

About onmt_train

Excuse me, how can I find the detailed implementation of the onmt_train function?

Live demo

Hi,

Is there a live demo website presenting the dataflow model in action, or perhaps a recording of a few outputs by the model for a series of in-the-wild input examples? I'd like to try them out before using onmt_translate to generate programs and evaluating it manually.

Thank you!

What are the entity@ tokens?

I noticed that in the created onmt text data there are entity@ tokens in the input.
E.g.

__User What time is my appointment with Jerri Skinner on Friday? __StartOfProgram ( Yield :output ( :start ( singleton ( :results ( FindEventWrapperWithDefaults :constraint ( EventOnDate :date ( NextDOW :dow # ( DayOfWeek " FRIDAY " ) ) :event ( Constraint[Event] :attendees ( AttendeeListHasRecipientConstraint :recipientConstraint ( RecipientWithNameLike :constraint ( Constraint[Recipient] ) :name # ( PersonName " Jerri Skinner " ) ) ) ) ) ) ) ) ) ) entity@824743096 __User Can you add an appointment with Jerri Skinner at 9 am? __StartOfProgram

But there seems to be no explanation of entity@. Could you provide one?

OpenDF / Constraints

Hello,

Thanks again for putting in the tremendous effort to develop and release SMCalFlow!

Following the motto "you don't understand it until you implement it", we tried to reproduce the basic functionality of a system which would be able to execute some SMCalFlow dialogues. Trying to do this really made it clear 1) how much work was put into building the full SMCalFlow application, and 2) that there are many design decisions to be made when developing such a system, which would result in different flavors of dataflow applications.

One central example of this is the way constraints are represented, handled, and executed. If you could give some more explanation, that would be much appreciated. For example, multiple constraints are often combined in a compositional pattern, or are surrounded by intension/extension markers. Does this mean you use lazy evaluation? And how do you handle contradictions between constraints?

We tried to simplify the expressions in our implementation to make them more understandable (somewhat along the lines of your macros in "Constrained Language Models Yield Few-Shot Semantic Parsers"), while still being executable. This is described in an upcoming LREC 2022 paper, "Simplifying Semantic Annotations of SMCalFlow", and the corresponding package OpenDF - https://github.com/telepathylabsai/OpenDF, which includes code to transform SMCalFlow expressions to a simplified form and code to execute the simplified expressions.

We hope OpenDF will increase interest in dataflow dialogues, and encourage development of new dialogue designs.

Are all turns "real"?

Hi,

I really like this work!

A lot of work went into creating this dataset, and obviously you had to rely on many people to do the labelling, so the quality/style of the labelling (the s-expressions/Lispress) varies from labeler to labeler.
Also, you may have constructed the dataset first, even before you had a functioning end-to-end system.

My question is - are the labeled Lispress expressions "real"?
Meaning:

  1. Is there a system which gets these expressions (exactly as they appear in the dataset) and actually executes them "correctly"/"as intended"?
  2. Are the labeled Lispress expressions a "good" translation of the user input given the dialogue context? Good in the sense of correct/complete/efficient/...

Now that you have an end-to-end system, could you maybe go over the dataset and verify/correct/unify it?
Or separate it into subsets (e.g. by correctness / quality...)?

Especially since you cannot release more details about your system, trying to understand the concept and use the data in the presence of incorrect data is even more difficult...

Thanks!

Convert back to MultiWOZ

Hi there,
Is there any possibility to convert dataflow (TRADE-processed JSON) back to MultiWOZ?
I have a dataset in the dataflow format and I need to convert it to the MultiWOZ format.

Thanks!

How to get the program string as in the paper instead of Lispress

I've read the relevant issues and READMEs. The annotation of the program is in the lispress format, which can be parsed through parse_lispress and transformed to a Program object through lispress_to_program. However, how can I get the string that represents the function calls directly as in the paper (e.g., findEvent(EventSpec(start=DateTimeSpec(month=feb, day=30))))?

Thanks!

How to retrieve the refer and recovery information from previous turns?

Thank you for releasing this wonderful dataset. I am wondering how one could actually execute a program.

For example, given such dialogue below:
__User When is my next staff meeting scheduled for? __User Is there a meeting after that? __StartOfProgram
The target program is:
( let ( x0 ( Event.end ( Execute ( refer ( extensionConstraint ( ^ ( Event ) EmptyStructConstraint ) ) ) ) ) ) ( Yield ( > ( size ( QueryEventResponse.results ( FindEventWrapperWithDefaults ( EventOnDateAfterTime ( DateTime.date x0 ) ( ^ ( Event ) EmptyStructConstraint ) ( DateTime.time x0 ) ) ) ) ) 0L ) ) )
The target program before this turn is:
( Yield ( Event.start ( FindNumNextEvent ( Event.subject_? ( ?~= " staff meeting " ) ) 1L ) ) )

We know that the "that" in the dialogue refers to a staff meeting in the previous turn, but how could we automatically resolve this refer and locate the event or entity it refers to? What I mean is, is there a way to automatically retrieve the referred event/entity from previous dialogue turns?

To be exact:
There is a refer function in the target program which, as we know from the context, refers to the staff meeting. This staff meeting is also in the target program of the previous turn. How can we automatically know that refer refers to the staff meeting of the previous program? Is code provided in the repository to resolve special functions like refer/recovery/revision, so that we could automatically retrieve information from previous turns?

It seems like the repository does not provide a program executor, but I want to know whether such a function is provided. If it isn't, training in a seq2seq way does not help the model understand what refer is, since in that case refer is just a normal function like any other, whereas we want the model to find related information from the previous turn whenever it encounters a function like refer/recovery/revision. If it is, how could we use the current code in the repository to automatically retrieve information from previous turns, so that the model can actually use that information when encountering the refer token?

How could one execute a dataflow program?

Hi all! Apologies in advance for the wall of text. I was interested in looking into interactive extensions using the dataflow representation and wanted to know roughly how an executor of a Lispress program would work in a fully interactive dialogue. To my understanding an executor is not provided, but I'm curious about how one could be constructed. The high-level dialogue flow I considered was:

  1. Parse a user utterance and context history into a new Program
  • all the important pieces for this seem to be provided
  2. From this Program, use infer_types(program: Program, library: Dict[str, Definition]) -> Program to produce an equivalent Program that is fully typed
  • missing for SMCalFlow is the library: Dict[str, Definition], which, if I understand right, cannot be provided due to legal reasons mentioned here
  3. Execute this fully-typed Lispress Program to produce an execution ?
  4. Generate an agent utterance from the program and its execution
  5. Use these results as context in subsequent dialogues

Most of my questions are on representations for 3 and 5:

  • Is this a reasonable understanding of the process for a full interactive system in this representation?

  • For type-inference and execution of the program in steps 2 and 3, I believe one would need a library: Dict[str, Definition] with additional pointers to implementations and some kind of executor execute(program: Program, library: Dict[str, Definition]) -> ? which could be library-agnostic.

    • Is this correct, or are there additional missing pieces not present in this repository?
    • What type of result would be produced by an executor? Could one just append the result of each CallOp as a ValueOp expression, returning an extended Program?
    • Is a single Program also an adequate representation for a full dialogue, or only a turn? e.g. each turn in a dialogue appears to produce a distinct Program instead of appending expressions to the initial one, but I wasn't sure if this was a dataset artifact or something that would also be true when run interactively.
  • Last, the execution methods in the multiwoz section of the repository seem to be focused on translating a Program to a belief state dictionary and vice-versa. Do these also resemble actual program execution in dataflow? For example the VanillaSalienceModel takes an ExecutionTrace argument which itself holds slot-value pairs. What would a similar VanillaSalienceModel look like in a dataflow-only representation?

Thanks for releasing this dataset and library and for any insight you can provide! For context, given the size of the SMCalFlow library that would need to be implemented I am looking into other potential libraries to work with in the dataflow representation, but wanted to know the best place to start.

combine train data

Is it OK to combine the training data of SMCalFlow and MultiWOZ to train a model?

Inquiry about the new model

Hi, I noticed that a new model named "Value-Agnostic Conversational Semantic Parsing" has shown up on the leaderboard. Is there any publication linked to the model?

Is SMCalFlow more like a Semantic Parsing dataset?

I know the main contribution of SMCalFlow is a new representation of dialogue state, but it seems to be a very complex scheme. It is true that the dataflow structure solves many hard dialogue problems like reference, but as you mention in the repository, all the data in SMCalFlow is represented in the Lispress language. Both the generated predictions and the gold data are treated as Lispress programs and parsed/executed to get the results. So SMCalFlow really feels more like a semantic parsing task, where the goal is to translate the given user query into a Lispress program: even though the program is intended to represent dialogue state, the core is translating the user query into an executable format. In conventional dialogue state tracking like MultiWOZ or DSTC, by contrast, the dialogue state is mostly represented as slot-value pairs, which are generally not executable as a program. If possible, could you share some of your opinions about this?

Convert non-gold programs with meta-computation to in-line programs

I might have missed this, but it seems that the dataset releases the inlined and non-inlined versions for the gold annotated programs only. How do I convert an arbitrary predicted program with meta-computation to an inlined version? Is this feature currently supported? Thanks!!

function definition

Will you give some explanation of the functions, to help us understand the code? E.g., ?<, ?<=, =, >, >=, ~=, intension, and singleton.

program length

Hello,

Looking at your paper (Task-Oriented Dialogue as Dataflow Synthesis), there is a table with program lengths for the .25, .50, .75 quantiles (Table 1), with values (11, 40, 57) for SMCalFlow.
This looks on the high side... How is this length calculated?
For example, what is the length of this program (which looks to be of "average" length)?
(Yield
  :output (CreateCommitEventWrapper
    :event (CreatePreflightEventWrapper
      :constraint (Constraint[Event]
        :start (?= (DateAtTimeWithDefaults
          :date (Tomorrow)
          :time (NumberAM :number #(Number 10.0))))
        :subject (?= #(String "soccer game"))))))

Thanks!

More clarification on the data format

Hi, thanks for the effort of open-sourcing the data and code! I'm curious whether there is (or will be) more detailed documentation of the data, for example, what each of the data's fields represents and how we are supposed to use them. I'm having trouble locating this in the paper and the README, but feel free to point me to specific line numbers if I missed anything. Thanks in advance for the help!

regarding agent utterance

Hi,

From reading the paper and looking at the training/valid data, I understand that the seq2seq model training is user_utterance -> program. I just want to know how to bring agent_utterance into the picture. It would be great if you could provide some tips. Thanks!

What is the difference between SMCalFlow 1.0 and 2.0?

Hi, thank you for contributing the datasets. I noticed that there are SMCalFlow 1.0 and SMCalFlow 2.0 in this repository, but this is not mentioned in either your paper or your leaderboard, which only refers to SMCalFlow. If possible, could you explain the difference between SMCalFlow 1.0 and SMCalFlow 2.0?

Inconsistency of data labelling

I found two turns of the same dialogue where similar sentences have different labelled code:

dialogue_id: 320a1a29-c2ea-40c3-9253-175af766d515, turn: 0
user: what's the wether like where i am?
agent: Sorry, I can only help with your calendar.
code: (FenceScope)

dialogue_id: 320a1a29-c2ea-40c3-9253-175af766d515, turn: 1
user: what's the weather in my area?
agent: It is clear with a temperature of 0.00 °F right now.
code: (Yield :output (WeatherQueryApi :place (AtPlace :place (Here)) :time (?= (Now))))
