Comments (11)
The jiwer.cer method should be available from version 2.3.0 onwards.
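For anyone stuck on an older release: CER is just the Levenshtein edit distance over characters divided by the number of reference characters. A minimal pure-Python sketch of the idea (not the jiwer implementation):

```python
def levenshtein(ref, hyp):
    # classic dynamic-programming edit distance over two sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(truth, hypothesis):
    # character error rate: edit distance / number of reference characters
    return levenshtein(truth, hypothesis) / len(truth)

print(cer("hello world", "hello duck"))  # 5 edits over 11 characters
```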
from jiwer.
I think I found a bug with the way WER is calculated after I used your method:
wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2
@alnah005 what should the correct answer be? The truth is N=5, the hypothesis needs 6 deletions, so 6/5 = 1.2
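The arithmetic can be checked directly with a plain edit-distance computation (a pure-Python sketch, independent of jiwer):

```python
def levenshtein(ref, hyp):
    # classic dynamic-programming edit distance over two sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

truth = ['h', 'e', 'l', 'l', 'o']
hypothesis = ['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']
# 6 insertions relative to the truth, divided by N = 5 reference tokens
print(levenshtein(truth, hypothesis) / len(truth))  # 1.2
```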
The same tool also returns values larger than 1, as expected:
from datasets import load_metric
metric = load_metric("wer")
print(metric)
result = metric.compute(predictions=['hello hello hello hello'],references=['hello'])
print(result)
# prints > 3.0
You're always free to clip values to be between 0 and 1.
I suppose splitting the hypothesis and true transcripts into lists of characters would accomplish this, no? It should be simple enough to add a transformation to do that.
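As a sketch of the idea (plain Python, not jiwer's transform API — the function name here is hypothetical):

```python
def split_into_chars(sentences):
    # turn each sentence into its list of characters, so that a
    # word-level metric effectively scores characters instead
    return [list(s) for s in sentences]

print(split_into_chars(["ab c"]))  # [['a', 'b', ' ', 'c']]
```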
Hi all, could you please check whether my implementation of CER over multiple sentences is correct? There isn't much information out there on how to do this properly.
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]
ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]
error = wer(ground_truth, hypothesis)
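One caveat worth noting (my observation): flattening every sequence into a single character list drops the sentence boundaries, so differently segmented corpora become indistinguishable:

```python
# both corpora flatten to the same character list ['a', 'b', 'c', 'd'],
# even though they segment the text differently
a = [char for seq in ["ab", "cd"] for char in seq]
b = [char for seq in ["abcd"] for char in seq]
print(a == b)  # True
```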
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]
ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]
error = wer(ground_truth, hypothesis)
This won't count spaces, i.e.,
hypothesis = ["hello duck", "i like python"]
will have the same CER with
hypothesis = ["h ello duck", "i like python"]
One workaround would be to replace the space with some OOV character (e.g., @):
ground_truth = map(lambda s: s.replace(" ", "@"), ground_truth)
hypothesis = map(lambda s: s.replace(" ", "@"), hypothesis)
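A quick pure-Python check (using a plain Levenshtein sketch, not jiwer) that the sentinel makes an inserted space visible to the edit distance:

```python
def levenshtein(ref, hyp):
    # classic dynamic-programming edit distance over two sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

# with spaces mapped to '@', "h ello duck" vs "hello duck" differ by
# exactly one inserted sentinel character
truth = list("hello duck".replace(" ", "@"))
hyp = list("h ello duck".replace(" ", "@"))
print(levenshtein(truth, hyp))  # 1
```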
from jiwer import wer
ground_truth = ["hello world", "i like monthy python"]
hypothesis = ["hello duck", "i like python"]
ground_truth = [char for seq in ground_truth for char in seq]
hypothesis = [char for seq in hypothesis for char in seq]
error = wer(ground_truth, hypothesis)
This won't count space, i.e.,
hypothesis = ["hello duck", "i like python"]
will have the same CER with
hypothesis = ["h ello duck", "i like python"]
One workaround would be to replace the space with some OOV character (e.g., @):
ground_truth = map(lambda s: s.replace(" ", "@"), ground_truth)
hypothesis = map(lambda s: s.replace(" ", "@"), hypothesis)
I think I found a bug with the way WER is calculated after I used your method:
wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2
I think I found a bug with the way WER is calculated after I used your method:
wer(['h', 'e', 'l', 'l', 'o'],['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']) == 1.2
@alnah005 what should the correct answer be? The truth is N=5, the hypothesis needs 6 deletions, so 6/5 = 1.2
Maybe this is how WER is defined. Rates are usually between 0 and 1, so I think numbers above 1 are misleading. Dividing by the length of the longer of the two strings/arrays could be more helpful, but I could be wrong. So instead of 6/5 it would be 6/max(5, 11).
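The proposed normalization (hypothetical, not jiwer's definition of WER) is easy to sketch, and it is indeed bounded by 1:

```python
def levenshtein(ref, hyp):
    # classic dynamic-programming edit distance over two sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def normalized_error(truth, hypothesis):
    # proposed variant: divide by the longer of the two lengths,
    # which keeps the result in [0, 1]
    return levenshtein(truth, hypothesis) / max(len(truth), len(hypothesis))

truth = ['h', 'e', 'l', 'l', 'o']
hypothesis = ['h', 'e', 'l', 'l', 'o', '@', 't', 'h', 'e', 'r', 'e']
print(normalized_error(truth, hypothesis))  # 6 / 11
```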
The WER is defined to be between 0 and infinity. See https://en.wikipedia.org/wiki/Word_error_rate. It often takes contrived examples to get a WER above 1; in practice, most systems perform well enough to have a WER << 1.
It doesn't make sense to me to define the length of the ground-truth string based on the larger input string, which is what you propose with max(5, 11).
The WER is defined to be between 0 and infinity. See https://en.wikipedia.org/wiki/Word_error_rate. It often takes contrived examples to get a WER above 1; in practice, most systems perform well enough to have a WER << 1.
It doesn't make sense to me to define the length of the ground-truth string based on the larger input string, which is what you propose with max(5, 11).
Here's a tool that defines it between 0 and 1. The whole point of WER using Levenshtein distance is to measure how far away the target is from the prediction, and it seems to me that measuring how far away the prediction is from the target should be symmetric. It's just not helpful, at least for me, to get a value greater than 1 in some examples. I'm currently running an experiment with many examples, which would mean that some examples could get a higher weight when I take the average.
The same tool also returns values larger than 1, as expected:
from datasets import load_metric
metric = load_metric("wer")
print(metric)
result = metric.compute(predictions=['hello hello hello hello'], references=['hello'])
print(result)
# prints > 3.0
You're always free to clip values to be between 0 and 1.
Their documentation is wrong haha. I'll probably clip as you suggested.
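Clipping is a one-liner; a sketch, assuming you only want a bounded score for aggregation:

```python
def clipped_wer(raw_wer):
    # clamp a raw WER (which can legitimately exceed 1) into [0, 1]
    return min(max(raw_wer, 0.0), 1.0)

print(clipped_wer(3.0))   # 1.0
print(clipped_wer(0.25))  # 0.25
```

Note that clipping discards information about how badly a hypothesis overshoots the reference, so it is a reporting choice rather than a correction.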