
Comments (4)

Lillliant commented on September 7, 2024

Hi @farinamhz, I've calculated the statistics for the twitter dataset and Google Translate's review files, which I have uploaded to the OneDrive paths under LADy0.2.0.1 > statistics.


farinamhz commented on September 7, 2024

Previous conversation:

[Thursday 9:16 PM] Christine Wong
Hi Farinam Hemmati Zadeh, I unfortunately cannot make it to Friday's progress meeting this week, but I've added my progress to the issue pages and have made a PR so it can be reviewed. In addition to the questions there, I've also noticed that the exact match metrics calculated using LADy0.2.0.0's semeval datasets are different from the semeval+ statistics. Would this be something to be concerned about?

[Friday 11:59 AM] Farinam Hemmati Zadeh
Hey Christine, no worries! Thank you very much for the update and your work! You mean the newly translated reviews are different from the previously translated ones, right? If so, that is expected, as the translated reviews for LADy0.2.0.1 come from a new translator (googletranslate).

[Friday 12:42 PM] Farinam Hemmati Zadeh
Christine Wong, I just realized that you said LADy0.2.0.0. LADy0.2.0.0 should be the same as before, as only twitter has been added to this version. However, I wanted you to calculate the metrics for LADy0.2.0.1, which holds the googletranslate results. I think there was some confusion between the two. Which one have you calculated?
[Friday 12:44 PM] Farinam Hemmati Zadeh
In fact, the previous results should not differ significantly from LADy0.2.0.0. Did they? Christine
[Friday 9:06 PM] Christine Wong
Hi Farinam Hemmati Zadeh, I've calculated the twitter metrics based on LADy0.2.0.0, which I put into the readme.md. I've also calculated all the datasets (semeval + twitter) for the googletranslate results in LADy0.2.0.1, which are not in the readme.md but are uploaded to OneDrive.

[Friday 9:16 PM] Christine Wong
The results aren't too different (around a 0.01 difference between LADy0.2.0.0 and the data in the readme.md), but I was wondering if it is alright to "mix" the results together, since the twitter metrics might look different if they were produced at the same time as the metrics in the readme.

[Friday 9:19 PM] Christine Wong
I've also attached a run with the semeval15/16 results for better comparison: it seems the newer version has a higher exact match (em) metric, which might be a good thing since it may suggest similar sentence structure.


farinamhz commented on September 7, 2024

Hey @Lillliant,
Let's continue here.
I appreciate the updates you've provided. Everything is going well, but I'm facing some confusion regarding the calculation of the BLEU and ROUGE scores. It appears that longer tweets with diverse contexts do not yield accurate exact match results compared with the semeval datasets, so perhaps we should explore alternative metrics like BLEU and ROUGE in this context. Before that, I want to make sure what the inputs of BLEU and ROUGE are.


Lillliant commented on September 7, 2024

Hi @farinamhz, sure! I've attached my run of the twitter (LADy0.2.0.0) dataset here:

For bleu metrics:

| dataset | pes_Arab_bleu | zho_Hans_bleu | deu_Latn_bleu | arb_Arab_bleu | fra_Latn_bleu | spa_Latn_bleu |
|---|---|---|---|---|---|---|
| twitter | 0.2110 | 0.1892 | 0.4025 | 0.33383 | 0.3891 | 0.4439 |
| semeval-2016-restaurant | 0.3746 | 0.3065 | 0.5435 | 0.4465 | 0.5314 | 0.5864 |
| semeval-2015-restaurant | 0.3787 | 0.3080 | 0.5514 | 0.4523 | 0.5318 | 0.5895 |

For rouge metrics:

| dataset | pes_Arab_rouge_f | zho_Hans_rouge_f | deu_Latn_rouge_f | arb_Arab_rouge_f | fra_Latn_rouge_f | spa_Latn_rouge_f |
|---|---|---|---|---|---|---|
| twitter | 0.1889 | 0.1677 | 0.3307 | 0.2589 | 0.3117 | 0.3596 |
| semeval-2016-restaurant | 0.2802 | 0.2224 | 0.4258 | 0.3360 | 0.4089 | 0.4628 |
| semeval-2015-restaurant | 0.2783 | 0.2224 | 0.4332 | 0.3387 | 0.4076 | 0.4661 |

Here, the bleu metrics were obtained by first computing, for each sentence, the mean of its bleu scores under the weights [(1.0,), (0.5, 0.5), (0.3333, 0.3333, 0.3333)], and then averaging those per-sentence means over the dataset.

The rouge metrics were obtained the same way: for each sentence, the mean of its F1 scores from rouge-1 to rouge-5, then the average over the dataset.
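
For reference, here is a minimal sketch of that averaging scheme. It assumes nltk's `sentence_bleu` for BLEU and uses a hand-rolled n-gram F1 as a stand-in for rouge-1 through rouge-5; the tokenization, smoothing choice, and toy data are illustrative assumptions, not necessarily what LADy actually runs:

```python
# A minimal sketch of the averaging described above; tokenization, smoothing,
# and the hand-rolled ROUGE-n stand-in are assumptions, not LADy's own code.
from collections import Counter
from statistics import mean
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

WEIGHTS = [(1.0,), (0.5, 0.5), (0.3333, 0.3333, 0.3333)]

def sentence_bleu_mean(reference, hypothesis):
    """Mean of the 1-, 2-, and 3-gram cumulative BLEU scores for one sentence."""
    smooth = SmoothingFunction().method1  # avoid zero scores on short sentences
    return mean(sentence_bleu([reference], hypothesis, weights=w,
                              smoothing_function=smooth) for w in WEIGHTS)

def rouge_n_f1(reference, hypothesis, n):
    """F1 over n-gram overlap (a simple stand-in for ROUGE-n)."""
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    overlap = sum((ref & hyp).values())
    if not overlap:
        return 0.0
    p, r = overlap / sum(hyp.values()), overlap / sum(ref.values())
    return 2 * p * r / (p + r)

def sentence_rouge_mean(reference, hypothesis):
    """Mean of the rouge-1 .. rouge-5 F1 scores for one sentence."""
    return mean(rouge_n_f1(reference, hypothesis, n) for n in range(1, 6))

# Dataset-level score: average the per-sentence means over all sentence pairs.
refs = [["the", "food", "was", "great"]]      # original reviews (tokenized); toy data
hyps = [["the", "food", "was", "fantastic"]]  # back-translated reviews; toy data
bleu = mean(sentence_bleu_mean(r, h) for r, h in zip(refs, hyps))
rouge = mean(sentence_rouge_mean(r, h) for r, h in zip(refs, hyps))
print(f"bleu={bleu:.4f} rouge_f={rouge:.4f}")
```

So the inputs in both cases are token lists for a reference sentence (the original review) and a hypothesis sentence (the back-translated review), scored pairwise and then averaged.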

