[BUG] The result of SUMBT model (convlab-2, CLOSED)

thu-coai commented on June 27, 2024

[BUG] The result of SUMBT model

Comments (5)

zyds commented on June 27, 2024

def reformat_state(state):
    if 'belief_state' in state:
        state = state['belief_state']
    new_state = []
    for domain in state.keys():
        domain_data = state[domain]
        if 'semi' in domain_data:
            domain_data = domain_data['semi']
            for slot in domain_data.keys():
                val = domain_data[slot]
                if val is not None and val not in ['', 'not mentioned', '未提及', '未提到', '没有提到']:
                    new_state.append(domain + '-' + slot + '-' + val)
    # lower
    new_state = [item.lower() for item in new_state]
    return new_state

This code is in dst/evaluate.py. I would like to know: can the dialog state be calculated using only the 'semi' part?
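To make the question concrete, here is a minimal sketch of what the function above returns for a MultiWOZ-style belief state. The function body is copied from the snippet above in condensed form so the example runs on its own. The example state is hypothetical and invented purely for illustration. It shows that, as written, only slots under 'semi' reach the flattened state, while anything under 'book' is skipped.

```python
EMPTY_VALUES = ['', 'not mentioned', '未提及', '未提到', '没有提到']


def reformat_state(state):
    """Condensed copy of the function quoted above, for a runnable example."""
    if 'belief_state' in state:
        state = state['belief_state']
    new_state = []
    for domain, domain_data in state.items():
        if 'semi' in domain_data:
            # Only the 'semi' sub-dictionary is read; 'book' is never visited.
            for slot, val in domain_data['semi'].items():
                if val is not None and val not in EMPTY_VALUES:
                    new_state.append(domain + '-' + slot + '-' + val)
    return [item.lower() for item in new_state]


# Hypothetical belief state, made up for this illustration.
example_state = {
    'hotel': {
        'book': {'booked': [], 'people': '2', 'day': 'friday', 'stay': '3'},
        'semi': {'name': 'not mentioned', 'area': 'North',
                 'parking': 'yes', 'pricerange': ''},
    },
    'taxi': {'semi': {'destination': '', 'departure': None}},
}

flat = reformat_state(example_state)
print(flat)  # ['hotel-area-north', 'hotel-parking-yes']
```

In this sketch the booking slots (people, day, stay) do not appear in the output, so any metric computed on the flattened list would not be affected by them.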

zqwerty commented on June 27, 2024

@function2-llx please look at this

zqwerty commented on June 27, 2024

Hi! When I tried to evaluate the translation-trained SUMBT model, I found that eval mode was not set, which affected the results. In my local test on the MultiWOZ-zh human-val dataset, Joint Acc differs by nearly two points (0.482 without eval mode vs. 0.500 with it). I think the SUMBT model may need to be re-evaluated after fixing the code, since the currently reported results do not reflect the model's real performance.

My local results on MultiWOZ-zh
Without eval mode:
{'Joint Acc': 0.4821722435545804, 'Turn Acc': 0.9738983360760534, 'Joint F1': 0.8826705748001639}
With eval mode:
{'Joint Acc': 0.49972572682391664, 'Turn Acc': 0.9751935149631128, 'Joint F1': 0.8885012208542876}

How did you get the above results? By running dst/evaluate.py?
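As background on why a missing eval mode can shift the metrics: in PyTorch, calling model.eval() switches layers such as dropout to their inference behavior, and without it dropout keeps randomly zeroing activations during evaluation. The sketch below is a pure-Python toy, not PyTorch code and not the SUMBT implementation. The class name ToyDropout and all values are hypothetical, and it assumes the evaluated model contains dropout layers, as BERT-based models typically do.

```python
import random


class ToyDropout:
    """Illustrative stand-in for a dropout layer (not PyTorch's implementation)."""

    def __init__(self, p=0.5, seed=0):
        self.p = p
        self.training = True  # like PyTorch modules, start in training mode
        self._rng = random.Random(seed)

    def eval(self):
        # Mirrors the effect of model.eval(): turn off training-time behavior.
        self.training = False
        return self

    def __call__(self, values):
        if not self.training:
            return list(values)  # identity at evaluation time
        scale = 1.0 / (1.0 - self.p)
        # Inverted dropout: zero each value with probability p, rescale the rest.
        return [0.0 if self._rng.random() < self.p else v * scale for v in values]


hidden = [0.5, -1.2, 0.8, 2.0]
layer = ToyDropout(p=0.5)

# Training mode: outputs are randomly perturbed on each call.
noisy_a = layer(hidden)
noisy_b = layer(hidden)

# Evaluation mode: outputs are deterministic and equal to the input.
layer.eval()
clean_a = layer(hidden)
clean_b = layer(hidden)

print('train mode:', noisy_a, noisy_b)
print('eval mode: ', clean_a, clean_b)
```

In a real PyTorch evaluation script, the corresponding step would be a single model.eval() call before running inference, usually together with torch.no_grad().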

zyds commented on June 27, 2024

Yes, but the results I reported were not obtained with the pre-trained model provided by the project. However, I got similar results when using the project's pre-trained model.

zqwerty commented on June 27, 2024

update SUMBT & test results #69
