This project forked from abb128/april-asr

Speech-to-text library in C

License: GNU General Public License v3.0

Shell 0.33% C++ 4.89% Python 10.22% C 73.26% Java 4.21% C# 4.46% CMake 2.58% Batchfile 0.06%

april-asr

aprilasr is a minimal library that provides an API for offline streaming speech-to-text applications.

Documentation

Status

This library is currently under development. Some features are unimplemented, there may be bugs and crashes, and the API may change significantly. It may not yet be production-ready.

Furthermore, only one model is available. It supports English only and has some accuracy issues.

Language support

The library has a C API, and there are C# and Python bindings available, but these may not be stable yet.

Example

An example use of this library is provided in example.cpp. It can perform speech recognition on a wave file, or do streaming recognition by reading stdin.

It's built as the target main. After building aprilasr, you can run it like so:

$ ./main /path/to/file.wav /path/to/model.april

For streaming recognition, you can pipe parec into it:

$ parec --format=s16 --rate=16000 --channels=1 --latency-ms=100 | ./main - /path/to/model.april
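parec is specific to PulseAudio. As a rough sketch, other sources can be piped in the same way, assuming main expects the raw format that the parec flags above describe (signed 16-bit samples, 16 kHz, mono; little-endian on typical x86 and ARM machines). The helper names pcm_from_file and silence_pcm are hypothetical, and ffmpeg is an extra tool that is not part of this project.

```shell
# Assumed input format, taken from the parec flags: s16, 16000 Hz, 1 channel.
RATE=16000
BYTES_PER_SAMPLE=2

# Decode any file ffmpeg understands to raw s16le/16 kHz/mono on stdout.
# Requires ffmpeg to be installed (an assumption, not a project dependency).
pcm_from_file() {
  ffmpeg -loglevel error -i "$1" -f s16le -ar "$RATE" -ac 1 -
}

# Emit N seconds of digital silence in the same raw format.
# Useful only as a smoke test that the pipe into main works.
silence_pcm() {
  head -c "$(( $1 * RATE * BYTES_PER_SAMPLE ))" /dev/zero
}

# Usage (not executed here):
#   pcm_from_file talk.mp3 | ./main - /path/to/model.april
#   silence_pcm 2 | ./main - /path/to/model.april
```

One second of audio in this format is 16000 samples of 2 bytes each, so 32000 bytes. That figure is a quick sanity check when a stream seems to play too fast or too slow.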

Models

Currently only one model is available: an English model based on csukuangfj's trained icefall model and trained further with some extra data.

To make your own models, check out extra/exporting-howto.md.

Building on Linux

Building requires ONNXRuntime v1.13.1. You can either try to build it from source or just download the release binaries.

Downloading ONNXRuntime

Run ./download_onnx_linux_x64.sh for linux-x64.

For other platforms the script should be very similar. Alternatively, visit https://github.com/microsoft/onnxruntime/releases/tag/v1.13.1, download the right zip/tgz file for your platform, and extract the contents to a directory named lib.

You may also define the environment variable ONNX_ROOT containing the path where you extracted the archive, if placing it in lib isn't an option.
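The manual route could look roughly like the sketch below. The function names are hypothetical. The asset naming pattern (onnxruntime-PLATFORM-1.13.1.tgz) matches the Linux archives on the release page, but should be checked there for other platforms. The sketch also assumes that ONNX_ROOT should point at the directory that directly contains include/ and lib/, mirroring the layout expected in lib.

```shell
ONNX_VERSION=1.13.1

# Build the release asset name for a platform such as linux-x64 or linux-aarch64.
onnx_asset() {
  printf 'onnxruntime-%s-%s.tgz\n' "$1" "$ONNX_VERSION"
}

# Build the download URL for that asset on the GitHub release page.
onnx_url() {
  printf 'https://github.com/microsoft/onnxruntime/releases/download/v%s/%s\n' \
    "$ONNX_VERSION" "$(onnx_asset "$1")"
}

# Download and unpack into DEST, dropping the archive's top-level folder
# so that DEST/include and DEST/lib exist directly. Requires curl and tar.
fetch_onnx() {
  mkdir -p "$2" && curl -fL "$(onnx_url "$1")" | tar -xz -C "$2" --strip-components=1
}

# Usage (not executed here):
#   fetch_onnx linux-aarch64 "$HOME/onnxruntime"
#   export ONNX_ROOT="$HOME/onnxruntime"
```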

Building ONNXRuntime from source (untested)

You don't need to do this if you've downloaded ONNXRuntime.

Follow the instructions here: https://onnxruntime.ai/docs/how-to/build/inferencing.html#linux

then run

cd build/Linux/RelWithDebInfo/
sudo make install

Building aprilasr

Run:

$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j4

You should now have main, libaprilasr.so and libaprilasr_static.a.

If running main fails because it can't find libonnxruntime.so.1.13.1, you may need to make libonnxruntime.so.1.13.1 accessible like so:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`/../lib/lib/
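If LD_LIBRARY_PATH was previously unset, the export above leaves a leading colon, and the dynamic loader treats that empty entry as the current directory. A slightly more defensive variant is sketched below. The helper name add_lib_path is hypothetical.

```shell
# Append a directory to LD_LIBRARY_PATH unless it is already listed,
# without introducing an empty entry when the variable starts out unset.
add_lib_path() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
  export LD_LIBRARY_PATH
}

# Usage from the build directory (not executed here):
#   add_lib_path "$(pwd)/../lib/lib"
#   ldd ./main | grep onnxruntime   # should now show a resolved path
```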

Building on Windows (msvc)

Create a folder called lib in the april-asr folder.

Download onnxruntime-win-x64-1.13.1.zip and extract the contents of the onnxruntime-win-x64-1.13.1 folder to the lib folder.

Run cmake to configure and generate Visual Studio project files. Make sure you select x64 as the target if you have downloaded the x64 version of ONNXRuntime.

Open ALL_BUILD.vcxproj in Visual Studio and build it; everything should build. The output will be in the Release or Debug folders.

When running main.exe you may receive an error message like this:

The application was unable to start correctly (0xc000007b)

To fix this, you need to make onnxruntime.dll available. One way to do this is to copy onnxruntime.dll from lib/lib/onnxruntime.dll to build/Debug and build/Release. You may need to distribute the dll together with your application.

Applications

Currently I'm developing Live Captions, a Linux desktop app that provides live captioning.

Acknowledgements

Thanks to the k2-fsa/icefall contributors for creating the speech recognition recipes and models.

This project makes use of a few libraries:

  • pocketfft, authored by Martin Reinecke, Copyright (C) 2008-2018 Max-Planck-Society, licensed under BSD-3-Clause
  • Sonic library, authored by Bill Cox, Copyright (C) 2010 Bill Cox, licensed under Apache 2.0 license
  • tinycthread, authored by Marcus Geelnard and Evan Nemerson, licensed under zlib/libpng license

The bindings are based on the bindings of the Vosk API, another speech recognition library, which is based on previous-generation Kaldi. Vosk is Copyright 2019 Alpha Cephei Inc. and licensed under the Apache 2.0 license.

Contributors

abb128, cristeigabriel, thejackimonster
