
This code is 100x faster (jiwer issue, closed)

jitsi commented on June 8, 2024
This code is 100x faster:

Comments (8)

DevZiegler commented on June 8, 2024

The WER is defined as follows:
[Image: WER formula]
See also: Word error rate (Wikipedia)
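
For reference, the standard definition of word error rate can be written out as below. This is the textbook form and not a transcription of the missing image, so the symbol names are an assumption: S, D and I are the numbers of substituted, deleted and inserted words, C is the number of correct words, and N is the number of words in the reference.

```latex
\[
\mathrm{WER} = \frac{S + D + I}{N} = \frac{S + D + I}{S + D + C}
\]
```

The numerator is the word-level edit distance between reference and hypothesis, which is why the distance has to be divided by the number of reference words.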

Your code would have to be adapted like this:
return Levenshtein.distance(''.join(w1), ''.join(w2))/float(len(w1))
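
A minimal sketch of the step this line seems to rely on, assuming `w1` and `w2` are lists of single characters with one character standing in for each word, so that a character-level distance counts word-level edits. The helper name `words_to_chars` is hypothetical and not taken from jiwer.

```python
def words_to_chars(reference, hypothesis):
    """Encode both sentences so that every distinct word becomes one character.

    Hypothetical helper for illustration only. Both sentences share one
    vocabulary, so the same word always maps to the same character.
    """
    vocab = {}

    def encode(sentence):
        chars = []
        for word in sentence.split():
            if word not in vocab:
                # Assign the next unused code point to a word seen for the first time.
                vocab[word] = chr(len(vocab))
            chars.append(vocab[word])
        return chars

    return encode(reference), encode(hypothesis)


# w1 has one character per reference word, w2 one per hypothesis word.
w1, w2 = words_to_chars("the cat sat", "the cat sat down")
```

With python-Levenshtein installed, `Levenshtein.distance(''.join(w1), ''.join(w2)) / float(len(w1))` would then give the WER for this pair.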

nikvaessen commented on June 8, 2024

For people confused about the import: the Levenshtein module comes from the python-Levenshtein package (pip install python-Levenshtein).

buriy commented on June 8, 2024

@DevZiegler I only showed how to make the code faster.
Completing the code is up to you.

nikvaessen commented on June 8, 2024

Current benchmark:

executing wer with 1 sentences:
	mean=0.0003 sec std=0.0000 sec
executing wer with 10 sentences:
	mean=0.0134 sec std=0.0006 sec
executing wer with 50 sentences:
	mean=0.3261 sec std=0.0024 sec
executing wer with 100 sentences:
	mean=1.3144 sec std=0.0046 sec

nikvaessen commented on June 8, 2024

Your code:

executing wer with 1 sentences:
	mean=0.0000 sec std=0.0000 sec
executing wer with 10 sentences:
	mean=0.0001 sec std=0.0000 sec
executing wer with 50 sentences:
	mean=0.0022 sec std=0.0001 sec
executing wer with 100 sentences:
	mean=0.0079 sec std=0.0002 sec
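
To put the two benchmark listings side by side, here is a small sketch that divides the mean timings posted in this thread. The 1-sentence case is left out because the faster timing rounds to 0.0000 sec; the variable names are made up for illustration.

```python
# Mean timings in seconds, copied from the two benchmark listings in this thread.
current = {10: 0.0134, 50: 0.3261, 100: 1.3144}
proposed = {10: 0.0001, 50: 0.0022, 100: 0.0079}

# Speedup factor per batch size: old mean divided by new mean.
speedup = {n: current[n] / proposed[n] for n in current}

for n, factor in speedup.items():
    print(f"{n} sentences: about {factor:.0f}x faster")
```

On these numbers the factor comes out at roughly 134x, 148x and 166x for 10, 50 and 100 sentences, so the "100x" in the title is, if anything, conservative for this benchmark.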

gabrielziegler3 commented on June 8, 2024

Although this solution is indeed much faster, it adds a C++ dependency. It would be nice to be able to use the old WER calculation without this dependency, perhaps via an option to choose whether or not to use the C-level Levenshtein module.

nikvaessen commented on June 8, 2024

gabrielziegler3 commented on June 8, 2024

For instance, in an environment where installing C++ dependencies is blocked because they are installed at system level, but pip packages are allowed. Does that make sense?

In such a case, the following error is raised because I don't have C++ dependencies.

Complete output (27 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win32-3.8
  creating build\lib.win32-3.8\Levenshtein
  copying Levenshtein\StringMatcher.py -> build\lib.win32-3.8\Levenshtein
  copying Levenshtein\__init__.py -> build\lib.win32-3.8\Levenshtein
  running egg_info
  writing python_Levenshtein.egg-info\PKG-INFO
  writing dependency_links to python_Levenshtein.egg-info\dependency_links.txt
  writing entry points to python_Levenshtein.egg-info\entry_points.txt
  writing namespace_packages to python_Levenshtein.egg-info\namespace_packages.txt
  writing requirements to python_Levenshtein.egg-info\requires.txt
  writing top-level names to python_Levenshtein.egg-info\top_level.txt
  reading manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no previously-included files matching '*pyc' found anywhere in distribution
  warning: no previously-included files matching '*so' found anywhere in distribution
  warning: no previously-included files matching '.project' found anywhere in distribution
  warning: no previously-included files matching '.pydevproject' found anywhere in distribution
  writing manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
  copying Levenshtein\_levenshtein.c -> build\lib.win32-3.8\Levenshtein
  copying Levenshtein\_levenshtein.h -> build\lib.win32-3.8\Levenshtein
  running build_ext
  building 'Levenshtein._levenshtein' extension
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  ----------------------------------------
  ERROR: Failed building wheel for python-Levenshtein

My workaround was to check out the wer function at commit 2f1daee, which is right before the Levenshtein implementation.
