
Character Error Rate? (jiwer, 11 comments, closed)

jitsi commented on June 16, 2024
Character Error Rate?


Comments (11)

nikvaessen commented on June 16, 2024

The jiwer.cer method should be available from version 2.3.0 onwards.
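
For reference, a minimal usage sketch (assuming jiwer >= 2.3.0 is installed; the example strings are made up):

import jiwer

# cer computes the character error rate directly, no manual character splitting needed.
print(jiwer.cer("hello world", "hello duck"))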


nikvaessen commented on June 16, 2024

I think I found a bug with the way WER is calculated after I used your method:

wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2

@alnah005 what should the correct answer be? The truth has N=5 tokens, and 6 tokens would have to be deleted from the hypothesis to match it, so 6/5 = 1.2.
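
For context, a small sketch of the arithmetic behind that number (this is just the standard WER formula written out, not jiwer's internals):

# WER = (S + D + I) / N, where N is the number of reference tokens.
truth = ['h', 'e', 'l', 'l', 'o']                                     # N = 5
hypothesis = ['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']  # 11 tokens

substitutions, deletions, insertions = 0, 0, 6  # the 6 extra tokens must be removed
print((substitutions + deletions + insertions) / len(truth))  # 1.2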


nikvaessen commented on June 16, 2024

The same tool also returns values larger than 1, as expected:

from datasets import load_metric

metric = load_metric("wer")

print(metric)

result = metric.compute(predictions=['hello hello hello hello'],references=['hello'])
print(result)
# prints 3.0

You're always free to clip values to be between 0 and 1.


alexcannan commented on June 16, 2024

I suppose splitting the hypothesis and true transcripts into a list of characters would accomplish this, no? It should be simple enough to add a transformation to do that.


chutaklee commented on June 16, 2024

Hi all, could you please check whether my implementation of CER over multiple sentences is correct? There isn't much information out there on how to do this properly.

from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]

error = wer(ground_truth, hypothesis)


enhuiz commented on June 16, 2024
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]

error = wer(ground_truth, hypothesis)

This won't count spaces, i.e.,

hypothesis = ["hello duck", "i like python"]

will have the same CER as

hypothesis = ["h ello duck", "i like python"]

One workaround would be to replace the space with some OOV character (e.g., @):

ground_truth = map(lambda s: s.replace(" ", "@"), ground_truth)
hypothesis = map(lambda s: s.replace(" ", "@"), hypothesis)
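
Putting the two snippets together, one possible end-to-end sketch of sentence-level CER with jiwer.wer (the helper name and the @ placeholder are just illustrative choices):

from jiwer import wer

def to_char_sentence(sentence):
    # Map spaces to an OOV placeholder so they are still counted as errors,
    # then space-separate every character so jiwer treats each one as a "word".
    return " ".join(sentence.replace(" ", "@"))

ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

cer_estimate = wer(
    [to_char_sentence(s) for s in ground_truth],
    [to_char_sentence(s) for s in hypothesis],
)
print(cer_estimate)

With jiwer 2.3.0 or newer, jiwer.cer does this directly, as mentioned above.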


alnah005 commented on June 16, 2024
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]

ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]

error = wer(ground_truth, hypothesis)

This won't count spaces, i.e.,

hypothesis = ["hello duck", "i like python"]

will have the same CER as

hypothesis = ["h ello duck", "i like python"]

One workaround would be to replace the space with some OOV character (e.g., @):

ground_truth = map(lambda s: s.replace(" ", "@"), ground_truth)
hypothesis = map(lambda s: s.replace(" ", "@"), hypothesis)

I think I found a bug with the way WER is calculated after I used your method:

wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2


alnah005 commented on June 16, 2024

I think I found a bug with the way WER is calculated after I used your method:

wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2

@alnah005 what should the correct answer be? The truth has N=5 tokens, and 6 tokens would have to be deleted from the hypothesis to match it, so 6/5 = 1.2.

Maybe this is how WER is defined. Rates are usually between 0 and 1, so I think values above 1 are misleading. Dividing by the longer of the two strings/arrays could be more helpful, but I could be wrong. So instead of 6/5 it would be 6/max(5, 11).
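
As a sketch, the normalization proposed here (dividing by the longer of the two sequences instead of by N) would look like the following; note that this is not how jiwer or the standard WER is defined:

edits = 6                        # substitutions + deletions + insertions
truth_len, hypothesis_len = 5, 11

bounded_rate = edits / max(truth_len, hypothesis_len)
print(bounded_rate)  # 0.545..., always <= 1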


nikvaessen commented on June 16, 2024

The WER is defined to be between 0 and infinity. See https://en.wikipedia.org/wiki/Word_error_rate. It often takes contrived examples to get a WER above 1; in practice, most systems perform well enough to have a WER << 1.

It doesn't make sense to me to define the length of the ground truth string based on the larger of the two input strings, which is what you propose with max(5, 11).


alnah005 commented on June 16, 2024

The WER is defined to be between 0 and infinity. See https://en.wikipedia.org/wiki/Word_error_rate. It often takes contrived examples to get a WER above 1; in practice, most systems perform well enough to have a WER << 1.

It doesn't make sense to me to define the length of the ground truth string based on the larger of the two input strings, which is what you propose with max(5, 11).

Here's a tool that defines it between 0 and 1. The whole point of WER using the Levenshtein distance is to measure how far the prediction is from the target, and it seems to me that this measure should be symmetric. It's just not helpful, at least for me, to get a value greater than 1 in some examples. I'm currently running an experiment with many examples, and values above 1 mean that some examples get a higher weight when I take the average.
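
To illustrate the weighting concern, a small sketch (made-up sentences; it assumes jiwer pools errors and reference lengths when given lists of sentences):

import jiwer

refs = ["hello", "the quick brown fox jumps over the lazy dog"]
hyps = ["hello hello hello hello", "the quick brown fox jumps over the lazy dog"]

# Corpus-level WER: errors and reference lengths are pooled before dividing,
# so one short utterance with WER > 1 cannot dominate the score.
print(jiwer.wer(refs, hyps))  # 3 insertions / 10 reference words = 0.3

# Naive per-example average: the short utterance (WER 3.0) gets the same weight
# as the long one (WER 0.0), pulling the average up to 1.5.
per_example = [jiwer.wer(r, h) for r, h in zip(refs, hyps)]
print(sum(per_example) / len(per_example))  # 1.5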


alnah005 commented on June 16, 2024

The same tool also returns values larger than 1, as expected:

from datasets import load_metric

metric = load_metric("wer")

print(metric)

result = metric.compute(predictions=['hello hello hello hello'],references=['hello'])
print(result)
# prints 3.0

You're always free to clip values to be between 0 and 1.

Their documentation is wrong haha. I'll probably clip as you suggested.
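
Along the lines of that suggestion, a minimal sketch of clipping (wrapping jiwer's result rather than changing jiwer):

import jiwer

clipped = min(jiwer.wer("hello", "hello hello hello hello"), 1.0)
print(clipped)  # 1.0 instead of the raw 3.0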

