
li3cmz / grade


GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

emnlp2020 dialogue-metric open-domain


This repository contains the source code for the following paper:

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

Model Overview

[Figure: GRADE model overview]

Prerequisites

Create a virtual environment (recommended):

conda create -n GRADE python=3.6
source activate GRADE

Install the required packages:

pip install -r requirements.txt

Install Texar locally:

cd texar-pytorch
pip install .

Note: make sure that CUDA 10.1 is installed in your environment.

Data Preparation

GRADE is trained on the DailyDialog dataset (Li et al., 2017).

For convenience, we provide the processed DailyDialog data; download it and unzip it into the data directory. You should also download the tools and unzip them into the root directory of this repo.

If you want to prepare the training data from scratch, follow these steps:

  1. Install Lucene;
  2. Run the preprocessing script:
cd ./script
bash preprocess_training_dataset.sh

Training

To train GRADE, please run the following script:

cd ./script
bash train.sh

Note that the checkpoint of our final GRADE model is provided; you can download it and unzip it into the root directory.

Evaluation

We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the evaluation directory has the following file structure:

.
└── evaluation
    ├── eval_data
    |   └── DIALOG_DATASET_NAME
    |       └── DIALOG_MODEL_NAME
    |           ├── human_ctx.txt
    |           └── human_hyp.txt
    └── human_score
        ├── DIALOG_DATASET_NAME
        |   └── DIALOG_MODEL_NAME
        |       └── human_score.txt
        └── human_judgement.json

Note: human_judgement.json contains all of the human judgement data we release for metric evaluation.
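The evaluation data above can be walked with a few lines of Python. This is a minimal sketch, not code from the repository: it assumes that human_ctx.txt and human_hyp.txt hold one example per line and are aligned by line index, which the file names suggest but this README does not state. The function name load_eval_pairs is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_eval_pairs(eval_root: Path) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Collect (context, response) pairs for every DATASET/MODEL directory.

    Assumption (not confirmed by the README): one example per line, and line i
    of human_ctx.txt belongs to line i of human_hyp.txt.
    """
    pairs: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}
    # eval_data/<DIALOG_DATASET_NAME>/<DIALOG_MODEL_NAME>
    for model_dir in sorted(eval_root.glob("*/*")):
        ctx_file = model_dir / "human_ctx.txt"
        hyp_file = model_dir / "human_hyp.txt"
        if not (ctx_file.is_file() and hyp_file.is_file()):
            continue  # skip directories that do not follow the layout
        contexts = ctx_file.read_text(encoding="utf-8").splitlines()
        responses = hyp_file.read_text(encoding="utf-8").splitlines()
        if len(contexts) != len(responses):
            raise ValueError(
                f"{model_dir}: {len(contexts)} contexts vs {len(responses)} responses"
            )
        dataset, model = model_dir.parent.name, model_dir.name
        pairs[(dataset, model)] = list(zip(contexts, responses))
    return pairs
```

Calling load_eval_pairs(Path("evaluation/eval_data")) from the repo root would return one list of pairs per dataset/model combination. The length check is there because a silent misalignment between the two files would pair every context with the wrong response.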

To evaluate GRADE, please run the following script:

cd ./script
bash eval.sh
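Evaluating a dialogue metric usually means measuring how well its scores correlate with human scores, most often with Pearson and Spearman correlation. The sketch below computes both in plain Python so the idea is visible without extra dependencies. It is not the code that eval.sh runs; it assumes the metric scores and the values in human_score.txt are already aligned one-to-one (for example one score per line, matching human_hyp.txt), which is an assumption rather than something this README specifies.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if either list is constant (correlation undefined).
    return cov / sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman is computed as Pearson on average ranks, which handles tied human ratings (common with Likert-style scores). In practice scipy.stats.pearsonr and scipy.stats.spearmanr give the same values and also report p-values.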

Using GRADE

To use GRADE on your own dialog dataset:

  1. Put the whole dataset (raw data) into ./preprocess/dataset;
  2. Update the function load_dataset in ./preprocess/extract_keywords.py for loading the dataset;
  3. Prepare the context-response data that you want to evaluate and convert it into the following format:
.
└── evaluation
    └── eval_data
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── human_ctx.txt
                └── human_hyp.txt
  4. Run the following script to evaluate the context-response data with GRADE:
cd ./script
bash inference.sh
  5. Finally, the scores produced by GRADE can be found here:
.
└── evaluation
    └── infer_result
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── non_reduced_results.json
                └── reduced_results.json
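Step 3 above can be scripted. The following is a minimal sketch under stated assumptions: it writes one context per line to human_ctx.txt and the matching response on the same line of human_hyp.txt, placing both under evaluation/eval_data/YOUR_DIALOG_DATASET_NAME/YOUR_DIALOG_MODEL_NAME. The one-example-per-line format and the function name write_eval_data are assumptions; how multi-turn contexts are delimited within a line is not specified here, so check the provided eval_data files before relying on this.

```python
from pathlib import Path
from typing import Sequence


def write_eval_data(repo_root: Path, dataset: str, model: str,
                    contexts: Sequence[str], responses: Sequence[str]) -> Path:
    """Write aligned context/response files in the layout shown in step 3.

    Assumption: one example per line, context i paired with response i.
    """
    if len(contexts) != len(responses):
        raise ValueError("contexts and responses must be aligned one-to-one")
    for text in list(contexts) + list(responses):
        if "\n" in text:
            raise ValueError("each example must fit on a single line")
    out_dir = repo_root / "evaluation" / "eval_data" / dataset / model
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "human_ctx.txt").write_text("\n".join(contexts) + "\n", encoding="utf-8")
    (out_dir / "human_hyp.txt").write_text("\n".join(responses) + "\n", encoding="utf-8")
    return out_dir
```

Rejecting embedded newlines keeps the line-based alignment intact; a stray newline in one response would otherwise shift every following pair.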

grade's People

Contributors

james-yip, li3cmz


grade's Issues

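CUDA version issue

Hello, is CUDA 10.1 strictly required? Will it work on CPU or with CUDA 11.1?

Question about the Lucene version

Hello, while reproducing GRADE from scratch with your code, we ran into a small problem, mainly in the implementation of lexical sampling. Because the corresponding Java code differs between Lucene versions (for example, in package imports), could you tell us which version of Lucene you used for lexical sampling? Many thanks!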

Segmentation fault (core dumped)

When running bash train.sh, I keep getting the error Segmentation fault (core dumped). Did you run into this problem, and if so, how did you solve it?

Questions regarding training of the model

Hello! First, let me thank you for the paper and the code! The paper is cogent and inspiring and got me to work on a project that applies the idea in a different domain. I have some questions regarding the training process of the GRADE model using the margin ranking loss. For your information, I'm using my own implementation of the GRADE model minus the graph reasoning module.

(Q1) Have you evaluated the scoring model in terms of discrimination (positive vs. negative) accuracy? When training my model, I'm not exactly sure what level of performance on this task leads to a well-performing scoring model.

(Q2) In your experience, what does your loss curve look like? I noticed that during my training, the training loss collapses to the margin value, and I've read that this is because the model is defaulting to output highly similar scores for positive and negative examples, resulting in the loss value = margin. I've tried different things to counter this, but I'm not even sure if this is expected or not, so I was hoping to hear more from you about the dynamics of the training process if possible.

Thank you again for your contribution and hard work!
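The collapse described in Q2 follows directly from the form of the margin ranking loss. With target y = 1 (positive should outrank negative), each pair contributes max(0, margin - (s_pos - s_neg)); if the model outputs the same score for the positive and the negative response, every term equals margin, so the mean loss sits exactly at the margin. The sketch below is a plain-Python illustration of that formula (the same form as PyTorch's nn.MarginRankingLoss with mean reduction), not GRADE's training code; the function name is hypothetical.

```python
from typing import Sequence


def margin_ranking_loss(pos_scores: Sequence[float], neg_scores: Sequence[float],
                        margin: float) -> float:
    """Mean of max(0, margin - (s_pos - s_neg)) over paired examples."""
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("need equal-length, non-empty score lists")
    losses = [max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# Identical scores for positives and negatives: every term equals the margin.
collapsed = margin_ranking_loss([0.5, 0.5], [0.5, 0.5], margin=0.1)
# Positive beats negative by more than the margin: zero loss.
separated = margin_ranking_loss([0.9], [0.1], margin=0.1)
```

So a training loss that plateaus at the margin value is a strong hint that s_pos - s_neg is close to zero on average, i.e. the scorer is not separating positive from negative responses.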
