awni / transducer

A Fast Sequence Transducer Implementation with PyTorch Bindings

License: Apache License 2.0

Python 22.38% C++ 30.74% CMake 20.77% C 0.93% Cuda 25.19%

transducer's Introduction

transducer

A fast RNN-Transducer implementation for the CPU and GPU (CUDA) with Python bindings and a PyTorch extension. The RNN-T loss function was published in "Sequence Transduction with Recurrent Neural Networks".

The code has been tested with Python 3.9 and PyTorch 1.9.

Install and Test

To install from the top level of the repo run:

python setup.py install

To use the PyTorch extension, install PyTorch and test with:

python torch_test.py

Usage

The easiest way to use the transducer loss is with the PyTorch bindings:

criterion = transducer.TransducerLoss()
loss = criterion(emissions, predictions, labels, input_lengths, label_lengths)

The loss will run on the same device as the input tensors. For more information, see the criterion documentation.

To get the "teacher forced" best path:

predicted_labels = criterion.viterbi(emissions, predictions, input_lengths, label_lengths)

Memory Use and Benchmarks

The transducer is designed to be much lighter in memory use. Most implementations use memory which scales with the product B * T * U * V (where B is the batch size, T is the maximum input length in the batch, U is the maximum output length in the batch, and V is the token set size). The memory of this implementation scales with the product B * T * U and does not increase with the token set size. This is particularly important for the large token set sizes commonly used with word pieces. (NB In this implementation you cannot use a "joiner" network to connect the outputs of the transcription and prediction models. The algorithm hardcodes the fact that these are additively combined.)
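To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. It only counts the bytes of one float32 tensor of the given shape; the helper name `lattice_bytes` is hypothetical, and the figures ignore gradients, workspace buffers, and the model's own activations, so they are not measurements of either implementation.

```python
BYTES_PER_FLOAT32 = 4


def lattice_bytes(B, T, U, V=1):
    """Bytes needed for one float32 tensor of shape (B, T, U, V)."""
    return B * T * U * V * BYTES_PER_FLOAT32


# Shapes taken from the first benchmark configuration.
B, T, U = 8, 2000, 100
for V in (100, 1000, 2000, 10000):
    full = lattice_bytes(B, T, U, V)   # scales with B * T * U * V
    factored = lattice_bytes(B, T, U)  # scales with B * T * U only
    print(f"V={V:>5}: B*T*U*V = {full / 1e9:6.2f} GB, B*T*U = {factored / 1e6:.1f} MB")
```

Under these assumptions a single B * T * U * V tensor already takes several gigabytes at V=1000, while a B * T * U tensor stays at a few megabytes regardless of V.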

Performance benchmarks for the CUDA version running on an A100 GPU are below. We compare to the Torch Audio RNN-T loss which was also run on the same A100 GPU. An entry of "OOM" means the implementation ran out of memory (in this case 20GB).

Times are reported in milliseconds.

T=2000, U=100, B=8

V	Transducer	Torch Audio
100	8.18	139.26
1000	13.64	OOM
2000	18.83	OOM
10000	59.18	OOM

T=2000, U=100, B=32

V	Transducer	Torch Audio
100	20.58	555.00
1000	38.42	OOM
2000	58.19	OOM
10000	223.33	OOM

transducer's Issues

pytorch 1.1 support

Getting this error when using torch 1.1:

from .._ext import transducer
File "/home/tanish/ds3/libs/transducer/_ext/transducer/__init__.py", line 2, in <module>
from torch.utils.ffi import _wrap_function
File "/home/tanish/environments/ds3_torch_1.1/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Please add PyTorch 1.1 support to the code.
Any help on this?

import error

I successfully compiled the source file, but there was a problem when using it.

Transducer update not backward compatible with speech repo running pytorch 0.4.1

Hello Awni,

I wanted to let you know that I think your recent update to the transducer repo is not backward compatible with pytorch 0.4.1 when using your speech repo: https://github.com/awni/speech. I used pytorch 0.4.1 instead of pytorch 1.X when running your speech repo because I encountered an import error with ffi "ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead." when I tried using a more recent version of pytorch. As outlined here: pytorch/pytorch#15645, the recommendation was to use an earlier version of pytorch, so I used 0.4.1.

As a smaller issue, the Makefile in your speech repo calls a build.py script in libs/transducer, which is no longer present in the transducer repo. Separately, when I ran the "python setup.py install" command, I got the output below.

Here are the details of my setup:
OS: ubuntu-1604-xenial-v20200108
Python: Python 3.6.5 :: Anaconda, Inc.
Pytorch: 0.4.1
Cuda: 10.0

(awni_env36) dzubke@phoneme-1:~/awni_speech/speech/libs/transducer$ python setup.py install
running install
running bdist_egg
running egg_info
writing transducer_cpp.egg-info/PKG-INFO
writing dependency_links to transducer_cpp.egg-info/dependency_links.txt
writing top-level names to transducer_cpp.egg-info/top_level.txt
reading manifest file 'transducer_cpp.egg-info/SOURCES.txt'
writing manifest file 'transducer_cpp.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'transducer_cpp' extension
gcc -pthread -B /home/dzubke/miniconda3/envs/awni_env36/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/dzubke/miniconda3/envs/awni_env36/lib/python3.6/site-packages/torch/lib/include -I/home/dzubke/miniconda3/envs/awni_env36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/dzubke/miniconda3/envs/awni_env36/lib/python3.6/site-packages/torch/lib/include/THC -I/home/dzubke/miniconda3/envs/awni_env36/include/python3.6m -c transducer.cpp -o build/temp.linux-x86_64-3.6/transducer.o -fopenmp -DTORCH_EXTENSION_NAME=transducer_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
transducer.cpp:6:29: fatal error: torch/extension.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

This isn't a critical issue for me, as I am using a CTC model for phoneme recognition, so I don't need the transducer module. To get around it, I commented out the line "from speech.models.transducer_model import Transducer" in speech/speech/models/__init__.py, and that seems to work for me. For reference on my setup, the warp-ctc module did build properly.

I just wanted to bring this to your attention. I am likely going to modify your speech repo to run on a more recent version of pytorch, as I think that will make it easier to convert my CTC model to coreML using onnx, so I will let you know if this issue is resolved when using a more recent version of pytorch.

Thanks for publishing your code! It has been very helpful to me in my project. Let me know if you want more details on my setup or this issue.

Questions about the reasoning in forward-backward algorithm

I'm looking for explanations of the following three points:

1. Why should the forward and backward passes return the same (or very similar) likelihoods?

2. The forward likelihood is the sum of the last forward variable (from the last calculation step) and the log-probability of emitting a blank token in the last step:
alphas[T-1, U-1] + log_probs[T-1, U-1, blank]
However, the backward likelihood is only the value of the first backward variable (from the last calculation step):
betas[0, 0]
Why are they different?

3. In line 48 of https://github.com/awni/transducer/blob/master/ref_transduce.py:
alphas[t, 0] = alphas[t-1, 0] + log_probs[t-1, 0, blank]
why shouldn't we use
log_probs[t-1, 0, labels[-1]]
instead? Doesn't this assume that we expect the last frame to emit a blank token, which is not true?

Could anyone here help me understand these problems?
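As a hedged illustration of these points, the sketch below implements the standard forward and backward recursions from the RNN-T paper in plain Python. It is not the repository's code: the function names and the random test lattice are assumptions made for the example. In this formulation every alignment ends with one final blank emitted from the last lattice node; the forward pass adds it after the recursion, while the backward pass folds it into the initial value of betas[T-1][U-1], so both totals cover the same set of paths. The column u = 0 holds the states in which no label has been emitted yet, so the only way to reach alphas[t][0] is through t blanks.

```python
import math
import random

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward(log_probs, labels, blank=0):
    """Log-likelihood from the forward variables; log_probs is indexed [t][u][k]."""
    T, U = len(log_probs), len(labels) + 1
    alphas = [[NEG_INF] * U for _ in range(T)]
    alphas[0][0] = 0.0
    for t in range(T):
        for u in range(U):
            if t > 0:  # reach (t, u) by emitting blank at (t-1, u)
                step = alphas[t - 1][u] + log_probs[t - 1][u][blank]
                alphas[t][u] = logaddexp(alphas[t][u], step)
            if u > 0:  # reach (t, u) by emitting labels[u-1] at (t, u-1)
                step = alphas[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
                alphas[t][u] = logaddexp(alphas[t][u], step)
    # The final blank is added explicitly after the recursion.
    return alphas[T - 1][U - 1] + log_probs[T - 1][U - 1][blank]


def backward(log_probs, labels, blank=0):
    """Log-likelihood from the backward variables."""
    T, U = len(log_probs), len(labels) + 1
    betas = [[NEG_INF] * U for _ in range(T)]
    # The final blank is already included in the initial value.
    betas[T - 1][U - 1] = log_probs[T - 1][U - 1][blank]
    for t in reversed(range(T)):
        for u in reversed(range(U)):
            if t < T - 1:  # leave (t, u) by emitting blank
                step = betas[t + 1][u] + log_probs[t][u][blank]
                betas[t][u] = logaddexp(betas[t][u], step)
            if u < U - 1:  # leave (t, u) by emitting labels[u]
                step = betas[t][u + 1] + log_probs[t][u][labels[u]]
                betas[t][u] = logaddexp(betas[t][u], step)
    return betas[0][0]


def random_log_probs(T, U, V, rng):
    """Random normalized log-probabilities of shape [T][U][V]."""
    lattice = []
    for _ in range(T):
        row = []
        for _ in range(U):
            logits = [rng.gauss(0.0, 1.0) for _ in range(V)]
            norm = math.log(sum(math.exp(x) for x in logits))
            row.append([x - norm for x in logits])
        lattice.append(row)
    return lattice


rng = random.Random(0)
labels = [1, 2, 1]
log_probs = random_log_probs(T=5, U=len(labels) + 1, V=3, rng=rng)
ll_fwd = forward(log_probs, labels)
ll_bwd = backward(log_probs, labels)
print(ll_fwd, ll_bwd)
```

The two printed values should agree up to floating-point rounding, which is why comparing them is a common sanity check for an implementation.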

Does not work for Pytorch 1.1

When I run python build.py, I get this error.

Traceback (most recent call last):
  File "build.py", line 4, in <module>
    from torch.utils.ffi import create_extension
  File "/home/vespar/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

I am using PyTorch 1.1, and the script supports only PyTorch 0.4.1.
I changed the code to use the new PyTorch extension API. The code is below:

import os
import sys
import torch
#from torch.utils.ffi import create_extension
from torch.utils.cpp_extension import BuildExtension, CppExtension
from setuptools import setup
import setuptools

this_file = os.path.abspath(__file__)

sources = ['src/transducer.c']
headers = ['src/transducer.h']

args = ["-std=c99"]
if sys.platform == "darwin":
    args += ["-DAPPLE"]
else:
    args += ["-fopenmp"]

#ffi = create_extension(
#    '_ext.transducer',
#    headers=headers,
#    sources=sources,
#    relative_to=__file__,
#    extra_compile_args=args
#)


setup(
	name='_ext.transducer',
	ext_modules=[
		CppExtension(
			name='_ext.transducer',
			sources=['src/transducer.h','src/transducer.c'],
			extra_compile_args=args),
	],
	cmdclass={
		'build_ext': BuildExtension
	})

After doing this, I got this error:

usage: build.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: build.py --help [cmd1 cmd2 ...]
   or: build.py --help-commands
   or: build.py cmd --help

error: no commands supplied

Import Error, _transducer can not be imported.

Following the README, I got an import error suggesting that there may be a circular import.

I finally found a solution: I have to manually copy the _transducer.cpython-xxxx.so file to run torch_test. I'm wondering whether there is a better solution.

Question about the gradient computation

Thanks for the great work.

transducer/ref_transduce.py

Line 92 in 5a1c2c7

grads = grads + log_probs - log_like

I couldn't find the log_probs term anywhere in the paper, specifically in equation (20). Could you point me to the equation in the paper that refers to it?
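One plausible reading, offered as an assumption rather than a statement from the author: the paper's equation differentiates the likelihood with respect to the probability Pr(k|t,u), whereas the reference code appears to work in the log domain and differentiate the log-likelihood with respect to the log-probabilities. If `grads` holds the log of the paper's derivative before that line (an assumption about the surrounding code), the chain rule accounts for the two extra terms:

```latex
\frac{\partial \ln \Pr(\mathbf{y}^* \mid \mathbf{x})}{\partial \ln \Pr(k \mid t, u)}
  = \frac{\Pr(k \mid t, u)}{\Pr(\mathbf{y}^* \mid \mathbf{x})}
    \cdot \frac{\partial \Pr(\mathbf{y}^* \mid \mathbf{x})}{\partial \Pr(k \mid t, u)}
\quad\Longrightarrow\quad
\ln\!\left(\frac{\partial \ln \Pr(\mathbf{y}^* \mid \mathbf{x})}{\partial \ln \Pr(k \mid t, u)}\right)
  = \underbrace{\ln \frac{\partial \Pr(\mathbf{y}^* \mid \mathbf{x})}{\partial \Pr(k \mid t, u)}}_{\texttt{grads}}
  + \underbrace{\ln \Pr(k \mid t, u)}_{\texttt{log\_probs}}
  - \underbrace{\ln \Pr(\mathbf{y}^* \mid \mathbf{x})}_{\texttt{log\_like}}
```

Under that reading, the log_probs term does not appear in the paper's equation because the paper states the derivative in the probability domain.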

always get nothing trying to use viterbi decode interface

Hi, awni!
Thanks for your great repo. I have a problem with how to use the decode interface.
I have tried code like the following:

B, T, *_ = scores.size()
logit_lengths = torch.full((B,), T, dtype=torch.int, device=scores.device)
y = torch.full([B, 1], 0, dtype=torch.int32, device=scores.device)
cur_len = 0

for i in range(T):
    old_y = y
    preds, _ = self.pred_net(old_y)
    label_lengths = torch.full((B,), cur_len, dtype=torch.int, device=scores.device)
    y = self.criterion.viterbi(scores, preds, logit_lengths, label_lengths)
    b, new_len = y.shape
    if new_len < 1:
        break
    print("shape of y is: ", y.shape)
    cur_len = new_len

but the loop always breaks at the first step.

setup error: ‘isfinite’ was not declared in this scope

For others' reference, I encountered an "‘isfinite’ was not declared in this scope" error when running the setup.py script. I was able to resolve it based on the fix outlined here: erincatto/box2d#509. I added the string "std::" in front of isfinite, so "isfinite" became "std::isfinite" in the two occurrences in transducer.cpp.

For reference, here is my setup information:
Ubuntu 16.04
Python 3.6.5 :: Anaconda, Inc.
Pytorch 1.3.1
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

The error output is below:
(transducer) dzubke@phoneme-1:~/awni_speech/transducer$ python setup.py install
running install
running bdist_egg
running egg_info
creating transducer_cpp.egg-info
writing transducer_cpp.egg-info/PKG-INFO
writing dependency_links to transducer_cpp.egg-info/dependency_links.txt
writing top-level names to transducer_cpp.egg-info/top_level.txt
writing manifest file 'transducer_cpp.egg-info/SOURCES.txt'
reading manifest file 'transducer_cpp.egg-info/SOURCES.txt'
writing manifest file 'transducer_cpp.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'transducer_cpp' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /home/dzubke/miniconda3/envs/transducer/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include -I/home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/TH -I/home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/THC -I/home/dzubke/miniconda3/envs/transducer/include/python3.7m -c transducer.cpp -o build/temp.linux-x86_64-3.7/transducer.o -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=transducer_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
transducer.cpp: In function ‘float log_sum_exp(float, float)’:
transducer.cpp:9:20: error: ‘isfinite’ was not declared in this scope
if (!isfinite(a)) return b;
^
transducer.cpp:9:20: note: suggested alternative:
In file included from /usr/include/c++/5/random:38:0,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/c10/util/SmallVector.h:26,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/c10/util/ArrayRef.h:18,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/c10/core/MemoryFormat.h:5,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:5,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:11,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
from transducer.cpp:6:
/usr/include/c++/5/cmath:601:5: note: ‘std::isfinite’
isfinite(_Tp __x)
^
transducer.cpp:10:20: error: ‘isfinite’ was not declared in this scope
if (!isfinite(b)) return a;
^
transducer.cpp:10:20: note: suggested alternative:
In file included from /usr/include/c++/5/random:38:0,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/c10/util/SmallVector.h:26,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/c10/util/ArrayRef.h:18,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/c10/core/MemoryFormat.h:5,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:5,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:11,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
from /home/dzubke/miniconda3/envs/transducer/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
from transducer.cpp:6:
/usr/include/c++/5/cmath:601:5: note: ‘std::isfinite’
isfinite(_Tp __x)
^
error: command 'gcc' failed with exit status 1

Possible numerical error in log-norm computation

In the current implementation, the emissions and the predictions each have their own maximum value subtracted. But consider this case:

emission[0, 0] = [0, -1000]
prediction[0, 0] = [-1000, 0]
->
# current impl
logNorm[0, 0, 0] = log(exp(emission[0, 0]-maxEs) @ exp(prediction[0, 0]-maxPs)) + maxEs + maxPs
                 = log(exp([0, -1000]) @ exp([-1000, 0]))
                 = log([1, exp(-1000)] @ [exp(-1000), 1])  <-- exp(-1000) would give 0 in FP32 precision
                 = log(0)
                 = -inf

# correct result
logNorm[0, 0, 0] = log(2) - 1000

I also tried converting emission and prediction to FP64 before calculating the logNorm, but it still didn't work in my ASR experiment.

The broadcast-sum approach is more numerically stable, but would consume O(B*T*U*V) memory.

logNorm = torch.log_softmax(emission.unsqueeze(2) + prediction.unsqueeze(1), dim=-1)

transducer/transducer/torch_binding.py

Lines 162 to 167 in e90c6f4

    
maxEs = emissions.max(dim=2, keepdim=True)[0]
maxPs = predictions.max(dim=2, keepdim=True)[0]
log_norms = torch.log(torch.bmm(
    torch.exp(emissions - maxEs),
    torch.exp((predictions - maxPs)).transpose(1, 2)))
log_norms = log_norms + maxEs + maxPs.transpose(1, 2)
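The case described in this issue can be reproduced for a single (t, u) cell with the standard library alone. The sketch below is illustrative, not the repository's code: `log_norm_factored` mirrors the max-shifted product form, and `log_norm_joint` is a log-sum-exp over the summed logits, which is the normalizer the broadcast-sum approach would compute. Both function names are hypothetical. Python floats are FP64, so the example also suggests why casting to FP64 may not help: exp(-1000) underflows to zero in double precision as well.

```python
import math


def log_norm_factored(e, p):
    """Max-shifted dot product, mirroring the bmm-based formula for one (t, u) cell."""
    max_e, max_p = max(e), max(p)
    dot = sum(math.exp(a - max_e) * math.exp(b - max_p) for a, b in zip(e, p))
    # math.log(0.0) raises ValueError, so map an underflowed product to -inf explicitly.
    log_dot = math.log(dot) if dot > 0.0 else float("-inf")
    return log_dot + max_e + max_p


def log_norm_joint(e, p):
    """Log-sum-exp over the summed logits, shifted by their joint maximum."""
    s = [a + b for a, b in zip(e, p)]
    m = max(s)
    return m + math.log(sum(math.exp(x - m) for x in s))


emission = [0.0, -1000.0]
prediction = [-1000.0, 0.0]
factored = log_norm_factored(emission, prediction)
joint = log_norm_joint(emission, prediction)
print(factored)  # -inf: both products underflow to zero
print(joint)     # close to log(2) - 1000
```

The joint form stays finite because the shift uses the maximum of the summed logits rather than the separate maxima of each input, at the cost of materializing the sum for every (t, u) pair.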

RNN Transducer loss backward computation accuracy

Thanks for your great work. However, there is still an error of about 1e-3 between the numerical gradient and the backward gradient.
Is there any trick to make it more accurate?

Just questions about the code.

Is there any explanation why you didn't implement this part in C++, and instead did it in Python using torch.autograd.Function?

@staticmethod
def backward(ctx, deltas):
    ......
    lngrads = lngrads * expLNs.reciprocal()
    egrads += expEs * torch.bmm(lngrads, expPs)
    pgrads += expPs * torch.bmm(lngrads.transpose(1, 2), expEs)
    egrads = deltas[:, None, None] * egrads
    pgrads = deltas[:, None, None] * pgrads

Support complex joiner networks

I find that the current implementation supports only joiner networks containing an adder:

transducer/torch_test.py

Lines 187 to 188 in b517f1f

    
logits = emissions.unsqueeze(2) + predictions.unsqueeze(1)
loss_torch = torchaudio.functional.rnnt_loss(

In the paper Speech Recognition with Deep Recurrent Neural Networks, which is a successor of Sequence Transduction with Recurrent Neural Networks, it mentions another kind of joiner network:

In the original formulation Pr(k|t, u) was defined by taking an ‘acoustic’ distribution Pr(k|t) from the CTC network, a ‘linguistic’ distribution Pr(k|u) from the prediction network, then multiplying the two together and renormalising. An improvement introduced in this paper is to instead feed the hidden activations of both networks into a separate feedforward output network, whose outputs are then normalised with a softmax function to yield Pr(k|t, u). This allows a richer set of possibilities for combining linguistic and acoustic information, and appears to lead to better generalisation. In particular we have found that the number of deletion errors encountered during decoding is reduced.

I'm wondering whether it will support feedforward joiner networks that contain nn.Linear() and nn.Tanh() layers.

Also, the README.md says:

The memory of this implementation scales with the product B * T * U and does not increase with the token set size

But the memory occupied by the encoder and decoder is proportional to the vocabulary size. The memory consumed by a vocab size of 10k is certainly more than that of a size of 1k, I believe.
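On both points in this issue, a short derivation may help. It is a sketch under the assumption that the per-cell logits are the plain sum of an emission vector e_t and a prediction vector p_u (the symbols are notation introduced here, not taken from the repository). In that case the softmax normalizer factorizes into an inner product over the vocabulary:

```latex
\log \sum_{k=1}^{V} \exp\!\left(e_{t,k} + p_{u,k}\right)
  = \log \sum_{k=1}^{V} \exp(e_{t,k})\,\exp(p_{u,k})
  = \log\!\left( \exp(\mathbf{e}_t)^{\top} \exp(\mathbf{p}_u) \right)
```

The normalizers for all (t, u) pairs then come from a single (T x V) by (V x U) matrix product per batch element, giving a B * T * U result without ever forming the B * T * U * V tensor. A feedforward joiner produces logits that depend jointly on t and u through a nonlinearity, so no such factorization is available in general. The input tensors themselves have sizes B * T * V and B * U * V and therefore do grow with the token set size, as the issue points out; the README's claim is most naturally read as referring to the extra memory used by the loss computation.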

An error occurred while compiling

Thank you for the transducer. I got some errors (shown below).

Is this a known issue? How can it be debugged and solved?

Thank you!

Cuda version

Generating done (0.0s)
-- Build files have been written to: ./transducer/build/temp.linux-x86_64-cpython-39
[ 6%] Building CXX object CMakeFiles/transducer.dir/transducer.cpp.o
[ 13%] Creating directories for 'pybind11'
[ 26%] Building CXX object CMakeFiles/transducer.dir/transducer_cpu.cpp.o
[ 26%] Building CUDA object CMakeFiles/transducer.dir/transducer_cuda.cu.o
[ 33%] Performing download step (git clone) for 'pybind11'
Cloning into 'pybind11'...
.transducer/transducer/transducer_cuda.cu(9): error: A constant variable cannot be marked constexpr

./transducer/transducer/transducer_cuda.cu(10): error: A constant variable cannot be marked constexpr

./transducer/transducer_cuda.cu(364): error: identifier "cudaMallocAsync" is undefined

./transducer/transducer_cuda.cu(386): error: identifier "cudaFreeAsync" is undefined

4 errors detected in the compilation of "/tmp/tmpxft_0004a2b4_00000000-6_transducer_cuda.cpp1.ii".

Hi,

I get this kind of error. What version of CUDA do you use?
