This project forked from cmusphinx/pocketsphinx

A small speech recognizer

License: Other

PocketSphinx 5.0.0 release candidate 2

This is PocketSphinx, one of Carnegie Mellon University's open source large vocabulary, speaker-independent continuous speech recognition engines.

Although this was at one point a research system, active development has largely ceased and it has become very, very far from the state of the art. I am making a release, because people are nonetheless using it, and there are a number of historical errors in the build system and API which needed to be corrected.

The version number is strangely large because there was a "release" that people are using called 5prealpha, and we will use proper semantic versioning from now on.

Please see the LICENSE file for terms of use.

Installation

We now use CMake for building, which should give reasonable results across Linux and Windows. I am not certain about Mac OS X because I don't have one of those. In addition, the audio library, which never really built or worked correctly on any platform at all, has simply been removed.

There is no longer any dependency on SphinxBase, because there is no reason for SphinxBase to exist. You can just link against the PocketSphinx library, which now includes all of its functionality.

To install the Python module in a virtual environment (replace ~/ve_pocketsphinx with the virtual environment you wish to create), from the top level directory:

python3 -m venv ~/ve_pocketsphinx
. ~/ve_pocketsphinx/bin/activate
pip install .
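
After the pip install step, a quick sanity check from Python can confirm that the interpreter is running inside a virtual environment and that a module named pocketsphinx can be found. This is a minimal sketch using only the standard library; the helper names in_virtualenv and module_installed are made up for illustration and are not part of PocketSphinx.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def module_installed(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    print("pocketsphinx importable:", module_installed("pocketsphinx"))
```

Run it with the python3 from the activated environment; if the second line prints False, the pip install step most likely ran against a different interpreter.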

To install the C library and bindings (assuming you have access to /usr/local; if not, use -DCMAKE_INSTALL_PREFIX to set a different prefix in the first cmake command below):

cmake -S . -B build
cmake --build build
cmake --build build --target install

Usage

The pocketsphinx command-line program reads single-channel 16-bit PCM audio from standard input and attempts to recognize speech in it using the default acoustic and language models. It accepts a large number of options which you probably don't care about, and a command which defaults to live. The commands are as follows:

  • live: Detect speech segments in standard input, run recognition on them (using those options you don't care about), and write the results to standard output in line-delimited JSON. I realize this isn't the prettiest format, but it sure beats XML. Each line contains a JSON object with these fields, which have short names to make the lines more readable:

    • a: Start time in seconds, from the beginning of the stream
    • e: End time in seconds, from the beginning of the stream
    • p: Posterior probability of utterance
    • t: Full text of output
    • w: List of segments (usually words), each of which in turn contains the a, e, p, and t fields, for start, end, probability, and the text of the word. In the future we may also support hierarchical outputs, in which case w could be present within a segment as well.
  • single: Recognize the input as a single utterance, and write a JSON object in the same format described above.

  • soxflags: Return arguments to sox which will create the appropriate input format. Note that because the sox command-line is slightly quirky these must always come after the filename or -d (which tells sox to read from the microphone). You can run live recognition like this:

    sox -d $(pocketsphinx soxflags) | pocketsphinx
    

    or decode from a file named "audio.mp3" like this:

    sox audio.mp3 $(pocketsphinx soxflags) | pocketsphinx
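
As a rough Python equivalent of the two pipelines above, the sketch below asks pocketsphinx for its sox flags, places them after the input name as the note on sox requires, and pipes the audio into the recognizer. It assumes that both sox and pocketsphinx are on PATH; sox_command and recognize are hypothetical helper names, not part of PocketSphinx.

```python
import shlex
import subprocess


def sox_command(source, soxflags):
    """Build a sox argument list with the flags placed after the input.

    `source` is a filename, or "-d" to read from the microphone.
    `soxflags` is the text printed by `pocketsphinx soxflags`.
    """
    return ["sox", source, *shlex.split(soxflags)]


def recognize(source):
    """Pipe sox output into pocketsphinx and yield its output lines.

    Assumes both `sox` and `pocketsphinx` are on PATH.
    """
    flags = subprocess.run(
        ["pocketsphinx", "soxflags"],
        check=True, capture_output=True, text=True,
    ).stdout
    sox = subprocess.Popen(sox_command(source, flags), stdout=subprocess.PIPE)
    rec = subprocess.Popen(
        ["pocketsphinx"], stdin=sox.stdout, stdout=subprocess.PIPE, text=True
    )
    sox.stdout.close()  # let sox see a broken pipe if pocketsphinx exits early
    try:
        yield from rec.stdout
    finally:
        rec.stdout.close()
        rec.wait()
        sox.wait()
```

Calling recognize("audio.mp3") would then yield one line of output per detected speech segment, and recognize("-d") would do the same for microphone input.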
    

By default only errors are printed to standard error, but if you want more information you can pass -loglevel INFO. Partial results are not printed; maybe they will be in the future, but don't hold your breath. Force-alignment is likely to be supported soon, however.
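
The line-delimited JSON written by the live and single commands is easy to consume from a script. The sketch below turns each line into a small record using the a, e, p, t, and w fields described above. The Segment class, the function names, and the SAMPLE line are invented for illustration; the sample values are made up and are not real recognizer output.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # "a": start time in seconds
    end: float    # "e": end time in seconds
    prob: float   # "p": posterior probability
    text: str     # "t": recognized text
    words: list = field(default_factory=list)  # "w": nested segments, if any


def parse_segment(obj):
    """Convert one decoded JSON object (and any nested "w" list) to a Segment."""
    return Segment(
        start=obj["a"],
        end=obj["e"],
        prob=obj["p"],
        text=obj["t"],
        words=[parse_segment(w) for w in obj.get("w", [])],
    )


def parse_results(lines):
    """Yield one Segment per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_segment(json.loads(line))


# Made-up values, shaped like the fields described in this document.
SAMPLE = (
    '{"a": 0.0, "e": 1.2, "p": 0.9, "t": "go forward", "w": ['
    '{"a": 0.0, "e": 0.5, "p": 0.95, "t": "go"}, '
    '{"a": 0.5, "e": 1.2, "p": 0.9, "t": "forward"}]}'
)

if __name__ == "__main__":
    for utterance in parse_results([SAMPLE]):
        print(utterance.text, [w.text for w in utterance.words])
```

To read real output, pass sys.stdin to parse_results and pipe the output of pocketsphinx into the script.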

Programming

The examples directory contains a number of examples of using the library from C and Python. You can also read the documentation for the Python API or the C API.

Authors

PocketSphinx is ultimately based on Sphinx-II, which in turn was based on some older systems at Carnegie Mellon University. These were released as free software under a BSD-like license thanks to the efforts of Kevin Lenzo. Much of the decoder in particular was written by Ravishankar Mosur (look for "rkm" in the comments), but various other people contributed as well; see the AUTHORS file for more details.

David Huggins-Daines (the author of this document) is guilty^H^H^H^H^Hresponsible for creating PocketSphinx, which added various speed and memory optimizations, fixed-point computation, JSGF support, portability to various platforms, and a somewhat coherent API. He then disappeared for a while.

Nickolay Shmyrev took over maintenance for quite a long time afterwards, and a lot of code was contributed by Alexander Solovets, Vyacheslav Klimkov, and others.

Currently this is maintained by David Huggins-Daines again.
