
Character Error Rate? (jiwer, 11 comments, closed)

jitsi commented on June 16, 2024
Character Error Rate?


Comments (11)

nikvaessen commented on June 16, 2024

The jiwer.cer method should be available from version 2.3.0 onwards.
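
For reference, a minimal usage sketch (assuming jiwer >= 2.3.0 is installed; the example strings are made up):

import jiwer

# cer computes the character error rate directly, no manual character splitting needed.
print(jiwer.cer("hello world", "hello duck"))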


nikvaessen commented on June 16, 2024

I think I found a bug with the way WER is calculated after I used your method:

wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2

@alnah005 what should the correct answer be? The truth has N=5 tokens, and 6 tokens would have to be deleted from the hypothesis to match it, so 6/5 = 1.2.
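
For context, a small sketch of the arithmetic behind that number (this is just the standard WER formula written out, not jiwer's internals):

# WER = (S + D + I) / N, where N is the number of reference tokens.
truth = ['h', 'e', 'l', 'l', 'o']                                     # N = 5
hypothesis = ['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']  # 11 tokens

substitutions, deletions, insertions = 0, 0, 6  # the 6 extra tokens must be removed
print((substitutions + deletions + insertions) / len(truth))  # 1.2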


nikvaessen commented on June 16, 2024

The same tool also returns values larger than 1, as expected:

from datasets import load_metric

metric = load_metric("wer")

print(metric)

result = metric.compute(predictions=['hello hello hello hello'],references=['hello'])
print(result)
# prints 3.0

You're always free to clip values to be between 0 and 1.


alexcannan commented on June 16, 2024

I suppose splitting the hypothesis and true transcripts into a list of characters would accomplish this, no? It should be simple enough to add a transformation to do that.


chutaklee commented on June 16, 2024

Hi all, could you please check whether my implementation of CER over multiple sentences is correct? There isn't much information out there on how to do this properly.

from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]

error = wer(ground_truth, hypothesis)


enhuiz commented on June 16, 2024
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]

error = wer(ground_truth, hypothesis)

This won't count spaces, i.e.,

hypothesis = ["hello duck", "i like python"]

will have the same CER as

hypothesis = ["h ello duck", "i like python"]

One workaround would be to replace the space with some OOV character (e.g., @):

ground_truth = map(lambda s: s.replace(" ", "@"), ground_truth)
hypothesis = map(lambda s: s.replace(" ", "@"), hypothesis)
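
Putting the two snippets together, one possible end-to-end sketch of sentence-level CER with jiwer.wer (the helper name and the @ placeholder are just illustrative choices):

from jiwer import wer

def to_char_sentence(sentence):
    # Map spaces to an OOV placeholder so they are still counted as errors,
    # then space-separate every character so jiwer treats each one as a "word".
    return " ".join(sentence.replace(" ", "@"))

ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

cer_estimate = wer(
    [to_char_sentence(s) for s in ground_truth],
    [to_char_sentence(s) for s in hypothesis],
)
print(cer_estimate)

With jiwer 2.3.0 or newer, jiwer.cer does this directly, as mentioned above.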


alnah005 commented on June 16, 2024
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]

error = wer(ground_truth, hypothesis)

This won't count spaces, i.e.,

hypothesis = ["hello duck", "i like python"]

will have the same CER as

hypothesis = ["h ello duck", "i like python"]

One workaround would be to replace the space with some OOV character (e.g., @):

ground_truth = map(lambda s: s.replace(" ", "@"), ground_truth)
hypothesis = map(lambda s: s.replace(" ", "@"), hypothesis)

I think I found a bug with the way WER is calculated after I used your method:

wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2


alnah005 commented on June 16, 2024

I think I found a bug with the way WER is calculated after I used your method:

wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2

@alnah005 what should the correct answer be? The truth has N=5 tokens, and 6 tokens would have to be deleted from the hypothesis to match it, so 6/5 = 1.2.

Maybe this is how WER is defined. Rates are usually between 0 and 1, so I think values above 1 are misleading. Dividing by the longer of the two strings/arrays could be more helpful, but I could be wrong. So instead of 6/5 it would be 6/max(5, 11).
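
As a sketch, the normalization proposed here (dividing by the longer of the two sequences instead of by N) would look like the following; note that this is not how jiwer or the standard WER is defined:

edits = 6                        # substitutions + deletions + insertions
truth_len, hypothesis_len = 5, 11

bounded_rate = edits / max(truth_len, hypothesis_len)
print(bounded_rate)  # 0.545..., always <= 1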


nikvaessen commented on June 16, 2024

The WER is defined to be between 0 and infinity. See https://en.wikipedia.org/wiki/Word_error_rate. It often takes contrived examples to get a WER above 1; in practice, most systems perform well enough to have a WER << 1.

It doesn't make sense to me to define the length of the ground truth string based on the larger of the two input strings, which is what you propose with max(5, 11).


alnah005 commented on June 16, 2024

The WER is defined to be between 0 and infinity. See https://en.wikipedia.org/wiki/Word_error_rate. It often takes contrived examples to get a WER above 1; in practice, most systems perform well enough to have a WER << 1.

It doesn't make sense to me to define the length of the ground truth string based on the larger of the two input strings, which is what you propose with max(5, 11).

Here's a tool that defines it between 0 and 1. The whole point of WER using the Levenshtein distance is to measure how far the prediction is from the target, and it seems to me that this measure should be symmetric. It's just not helpful, at least for me, to get a value greater than 1 in some examples. I'm currently running an experiment with many examples, and values above 1 mean that some examples get a higher weight when I take the average.
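
To illustrate the weighting concern, a small sketch (made-up sentences; it assumes jiwer pools errors and reference lengths when given lists of sentences):

import jiwer

refs = ["hello", "the quick brown fox jumps over the lazy dog"]
hyps = ["hello hello hello hello", "the quick brown fox jumps over the lazy dog"]

# Corpus-level WER: errors and reference lengths are pooled before dividing,
# so one short utterance with WER > 1 cannot dominate the score.
print(jiwer.wer(refs, hyps))  # 3 insertions / 10 reference words = 0.3

# Naive per-example average: the short utterance (WER 3.0) gets the same weight
# as the long one (WER 0.0), pulling the average up to 1.5.
per_example = [jiwer.wer(r, h) for r, h in zip(refs, hyps)]
print(sum(per_example) / len(per_example))  # 1.5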


alnah005 commented on June 16, 2024

The same tool also returns values larger than 1, as expected:

from datasets import load_metric

metric = load_metric("wer")

print(metric)

result = metric.compute(predictions=['hello hello hello hello'],references=['hello'])
print(result)
# prints 3.0

You're always free to clip values to be between 0 and 1.

Their documentation is wrong haha. I'll probably clip as you suggested.
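
Along the lines of that suggestion, a minimal sketch of clipping (wrapping jiwer's result rather than changing jiwer):

import jiwer

clipped = min(jiwer.wer("hello", "hello hello hello hello"), 1.0)
print(clipped)  # 1.0 instead of the raw 3.0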

