Comments (8)
The WER is defined as follows:
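For reference, this is the standard definition, stated as general background rather than quoted from the comment. Here $S$, $D$ and $I$ are the numbers of substituted, deleted and inserted words, $C$ is the number of correct words, and $N$ is the number of words in the reference:

```latex
\mathrm{WER} = \frac{S + D + I}{N} = \frac{S + D + I}{S + D + C}
```

The numerator is the word-level Levenshtein distance between reference and hypothesis, and the denominator is the reference length, which is the `len(w1)` in the snippet that follows.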
See also: Word error rate (Wikipedia)
Your code would have to be adapted like this:
return Levenshtein.distance(''.join(w1), ''.join(w2))/float(len(w1))
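A minimal sketch of the idea behind that line, assuming `w1` and `w2` are the reference and hypothesis words already mapped to one character per distinct word. The helper names (`words_to_chars`, `edit_distance`, `wer`) are hypothetical, and the pure-Python `edit_distance` is only a stand-in so the example runs without the C extension; in the suggestion above, `Levenshtein.distance` takes its place.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance (stand-in for Levenshtein.distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def words_to_chars(truth_words, hypothesis_words):
    """Map every distinct word to one character, sharing the vocabulary."""
    vocabulary = {}

    def encode(words):
        # setdefault assigns the next free index to a word seen for the first time.
        return "".join(chr(vocabulary.setdefault(w, len(vocabulary))) for w in words)

    return encode(truth_words), encode(hypothesis_words)


def wer(truth, hypothesis):
    """Word error rate; the reference (truth) must contain at least one word."""
    w1, w2 = words_to_chars(truth.split(), hypothesis.split())
    return edit_distance(w1, w2) / float(len(w1))


print(wer("the cat sat", "the dog sat"))
```

Because each word becomes exactly one character, the character-level distance between the two encoded strings equals the word-level distance between the original sentences.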
from jiwer.
For people confused about the import: pip install python-Levenshtein
@DevZiegler I only showed how to make the code faster.
Completing the code is up to you.
Current benchmark:
executing wer with 1 sentences:
mean=0.0003 sec std=0.0000 sec
executing wer with 10 sentences:
mean=0.0134 sec std=0.0006 sec
executing wer with 50 sentences:
mean=0.3261 sec std=0.0024 sec
executing wer with 100 sentences:
mean=1.3144 sec std=0.0046 sec
Your code:
executing wer with 1 sentences:
mean=0.0000 sec std=0.0000 sec
executing wer with 10 sentences:
mean=0.0001 sec std=0.0000 sec
executing wer with 50 sentences:
mean=0.0022 sec std=0.0001 sec
executing wer with 100 sentences:
mean=0.0079 sec std=0.0002 sec
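The benchmark script itself is not shown in the thread. A rough sketch of a harness that prints figures in the same shape could look like the following; the sentence generator, the concatenation of sentences into one string, the repeat count and the function names are all assumptions made for illustration.

```python
import random
import statistics
import time


def make_sentences(n, words_per_sentence=20, seed=0):
    """Generate n random sentences from a small synthetic vocabulary."""
    rng = random.Random(seed)
    vocab = ["word%d" % i for i in range(100)]
    return [
        " ".join(rng.choice(vocab) for _ in range(words_per_sentence))
        for _ in range(n)
    ]


def benchmark(fn, sizes=(1, 10, 50, 100), repeats=5):
    """Time fn(truth, hypothesis) and report mean and std per input size."""
    results = {}
    for n in sizes:
        truth = " ".join(make_sentences(n, seed=1))
        hypothesis = " ".join(make_sentences(n, seed=2))
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(truth, hypothesis)
            timings.append(time.perf_counter() - start)
        mean = statistics.mean(timings)
        std = statistics.pstdev(timings)
        print("executing wer with %d sentences:" % n)
        print("mean=%.4f sec std=%.4f sec" % (mean, std))
        results[n] = (mean, std)
    return results


# Placeholder callable; pass the WER implementation under test instead.
benchmark(lambda truth, hypothesis: len(truth.split()), sizes=(1, 10))
```

Running the same harness against both implementations with identical seeds keeps the inputs comparable between the two sets of numbers.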
Although this solution is indeed much faster, it adds a C++ dependency. It would be nice to be able to use the old wer calculation without this dependency, for example through an option that chooses whether or not to use the C-level Levenshtein module.
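One way such an option could look, as a sketch only: the `use_c_extension` flag and the function names are hypothetical and not part of jiwer. The C module is imported if it is available, and a pure-Python distance over word lists serves as the fallback.

```python
try:
    import Levenshtein  # installed with: pip install python-Levenshtein
except ImportError:
    Levenshtein = None  # no C extension available; only the fallback is used


def python_distance(a, b):
    """Levenshtein distance between two sequences, without any C dependency."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def word_distance(truth_words, hypothesis_words, use_c_extension=True):
    """Word-level edit distance; the flag is a hypothetical opt-out switch."""
    if use_c_extension and Levenshtein is not None:
        # The C module compares strings, so encode each distinct word as one character.
        vocabulary = {}

        def encode(words):
            return "".join(chr(vocabulary.setdefault(w, len(vocabulary))) for w in words)

        return Levenshtein.distance(encode(truth_words), encode(hypothesis_words))
    return python_distance(truth_words, hypothesis_words)


print(word_distance("the cat sat".split(), "the dog sat".split(), use_c_extension=False))
```

With this shape the C module becomes an optional accelerator: installing the package never requires a compiler, and both paths return the same distance.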
For instance, in an environment where installing C++ dependencies is blocked because they are managed at system level, but installing pip packages is allowed. Does that make sense?
In such a case, the following error is raised because I don't have the C++ build tools installed.
Complete output (27 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win32-3.8
creating build\lib.win32-3.8\Levenshtein
copying Levenshtein\StringMatcher.py -> build\lib.win32-3.8\Levenshtein
copying Levenshtein\__init__.py -> build\lib.win32-3.8\Levenshtein
running egg_info
writing python_Levenshtein.egg-info\PKG-INFO
writing dependency_links to python_Levenshtein.egg-info\dependency_links.txt
writing entry points to python_Levenshtein.egg-info\entry_points.txt
writing namespace_packages to python_Levenshtein.egg-info\namespace_packages.txt
writing requirements to python_Levenshtein.egg-info\requires.txt
writing top-level names to python_Levenshtein.egg-info\top_level.txt
reading manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
writing manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
copying Levenshtein\_levenshtein.c -> build\lib.win32-3.8\Levenshtein
copying Levenshtein\_levenshtein.h -> build\lib.win32-3.8\Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
----------------------------------------
ERROR: Failed building wheel for python-Levenshtein
My workaround was to check out the wer function at commit 2f1daee, which is right before the Levenshtein implementation.