
warp-rnnt's Introduction


CUDA-Warp RNN-Transducer

A GPU implementation of the RNN Transducer loss (Graves 2012, 2013). This code is ported from the reference implementation (by Awni Hannun) and fully utilizes the CUDA warp mechanism.

The main bottleneck in the loss is the forward/backward pass, which is based on a dynamic programming algorithm. In particular, there is a nested loop to populate a lattice with shape (T, U), and each value in this lattice depends on the two previous cells in each dimension (e.g. in the forward pass).

CUDA executes threads in groups of 32 parallel threads called warps. Full efficiency is realized when all 32 threads of a warp agree on their execution path. This is exactly what is exploited to optimize the RNN Transducer. The lattice is split into warps along the T dimension. Within each warp, values are exchanged between threads using fast warp-level operations. As soon as the current warp fills its last value, the next two warps, (t+32, u) and (t, u+1), start running. A schematic procedure for the forward pass is shown in the figure below, where T is the number of frames, U the number of labels, and W the warp size. A similar procedure for the backward pass runs in parallel.
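For orientation, the dynamic program that the warps parallelize can be sketched sequentially. This is a minimal NumPy sketch of the forward-variable recurrence over the (T, U) lattice, not the actual CUDA kernel; the function and argument names are illustrative:

```python
import numpy as np

def rnnt_forward_alphas(log_probs_blank, log_probs_label):
    """Sequential RNN-T forward pass over a (T, U) lattice.

    log_probs_blank[t, u]: log-prob of emitting blank at cell (t, u).
    log_probs_label[t, u]: log-prob of emitting the next reference label.
    """
    T, U = log_probs_blank.shape
    alphas = np.full((T, U), -np.inf)
    alphas[0, 0] = 0.0
    for t in range(T):
        for u in range(U):
            if t > 0:  # arrive by consuming a frame (blank transition)
                alphas[t, u] = np.logaddexp(
                    alphas[t, u], alphas[t - 1, u] + log_probs_blank[t - 1, u])
            if u > 0:  # arrive by emitting a label
                alphas[t, u] = np.logaddexp(
                    alphas[t, u], alphas[t, u - 1] + log_probs_label[t, u - 1])
    return alphas
```

Each cell depends only on its left and lower neighbors, which is why a warp can sweep 32 consecutive t values while the (t+32, u) and (t, u+1) warps wait on its last value.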

Performance

The NVIDIA Profiler shows the advantage of the warp implementation over the non-warp implementation.

This warp implementation:

Non-warp implementation warp-transducer:

Unfortunately, in practice this advantage disappears because the memory operations take much longer, especially if memory is synchronized on each iteration.

                     warp_rnnt (gather=False)   warp_rnnt (gather=True)   warprnnt_pytorch   transducer (CPU)
T=150, U=40, V=28
N=1                  0.50 ms                    0.54 ms                   0.63 ms            1.28 ms
N=16                 1.79 ms                    1.72 ms                   1.85 ms            6.15 ms
N=32                 3.09 ms                    2.94 ms                   2.97 ms            12.72 ms
N=64                 5.83 ms                    5.54 ms                   5.23 ms            23.73 ms
N=128                11.30 ms                   10.74 ms                  9.99 ms            47.93 ms
T=150, U=20, V=5000
N=1                  0.95 ms                    0.80 ms                   1.74 ms            21.18 ms
N=16                 8.74 ms                    6.24 ms                   16.20 ms           240.11 ms
N=32                 17.26 ms                   12.35 ms                  31.64 ms           490.66 ms
N=64                 out-of-memory              out-of-memory             out-of-memory      944.73 ms
N=128                out-of-memory              out-of-memory             out-of-memory      1894.93 ms
T=1500, U=300, V=50
N=1                  5.89 ms                    4.99 ms                   10.02 ms           121.82 ms
N=16                 95.46 ms                   78.88 ms                  76.66 ms           732.50 ms
N=32                 out-of-memory              157.86 ms                 165.38 ms          1448.54 ms
N=64                 out-of-memory              out-of-memory             out-of-memory      2767.59 ms

Benchmarked on a GeForce RTX 2070 Super GPU and an Intel i7-10875H CPU @ 2.30GHz.

Note

  • This implementation assumes that the input is the output of log_softmax.

  • In addition to the alphas/betas arrays, a counts array with shape (N, U * 2) is allocated, which is used as a scheduling mechanism.

  • core_gather.cu is a memory-efficient version that expects log_probs with shape (N, T, U, 2), containing only the blank and label values. It shows excellent performance with a large vocabulary.

  • Do not expect this implementation to greatly reduce the training time of an RNN Transducer model. The main bottleneck will probably be the trainable joint network with an output of shape (N, T, U, V).

  • Also, there is a restricted version, called Recurrent Neural Aligner, with the assumption that the length of the input sequence is greater than or equal to the length of the target sequence.
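Regarding the first note: the loss expects log-probabilities that are already normalized over the vocabulary dimension. A hedged NumPy sketch of that normalization (the shapes and names are illustrative, not the library's API):

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

# Joint-network logits with shape (N, T, U, V) must be normalized
# over the V axis before being passed to the loss.
logits = np.random.randn(2, 5, 3, 7)
log_probs = log_softmax(logits)
# Probabilities over the vocabulary now sum to 1 at every (n, t, u).
assert np.allclose(np.exp(log_probs).sum(axis=-1), 1.0)
```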

Install

There are two bindings for the core algorithm:

Reference

warp-rnnt's People

Contributors

1ytic, iceychris, maxwellzh, teapoly


warp-rnnt's Issues

question about the rnnt loss arguments

        log_probs (torch.FloatTensor): Input tensor with shape (N, T, U, V)
            where N is the minibatch size, T is the maximum number of
            input frames, U is the maximum number of output labels and V is
            the vocabulary of labels (including the blank).
        labels (torch.IntTensor): Tensor with shape (N, U-1) representing the
            reference labels for all samples in the minibatch.

Hi, I am confused about the labels: why is the shape (N, U-1)?
Shouldn't <eos> be included in the labels?
@1ytic
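If it helps, U in (N, T, U, V) counts the label positions plus one initial step, so the labels tensor itself carries U-1 entries (no blank, no <eos>). A small shape-only sketch with hypothetical values:

```python
import numpy as np

# A reference transcript of 4 labels (no blank, no <eos>).
labels = np.array([[3, 1, 4, 1]], dtype=np.int32)  # shape (N, U-1) = (1, 4)
U = labels.shape[1] + 1                            # lattice height: 5
T = 10                                             # number of input frames
V = 28                                             # vocabulary incl. blank
log_probs_shape = (1, T, U, V)                     # what the loss expects
assert log_probs_shape == (1, 10, 5, 28)
```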

runtime error with Python 3.8

Recently I found that this binding does not work properly with Python 3.8 (there are no problems with Python 3.7). When running python -m warp_rnnt.test, errors like the following are reported:

======================================================================
ERROR: test_calls (__main__.RNNTLossTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/environment/jwu/miniconda3/envs/jwu/lib/python3.8/site-packages/warp_rnnt/test.py", line 204, in test_calls
    costs, grads = core.rnnt_loss(
RuntimeError: rnnt_loss status 1 (rnnt_loss at binding.cpp:110)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fa92bd8e627 in /home/environment/jwu/miniconda3/envs/jwu/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: rnnt_loss(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int) + 0x214f (0x7fa911a86d3f in /home/environment/jwu/miniconda3/envs/jwu/lib/python3.8/site-packages/warp_rnnt/_C.cpython-38-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x23e02 (0x7fa911a99e02 in /home/environment/jwu/miniconda3/envs/jwu/lib/python3.8/site-packages/warp_rnnt/_C.cpython-38-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x241ee (0x7fa911a9a1ee in /home/environment/jwu/miniconda3/envs/jwu/lib/python3.8/site-packages/warp_rnnt/_C.cpython-38-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x218b6 (0x7fa911a978b6 in /home/environment/jwu/miniconda3/envs/jwu/lib/python3.8/site-packages/warp_rnnt/_C.cpython-38-x86_64-linux-gnu.so)
<omitting python frames>

The pytorch version is 1.4.0 and the CUDA version is 10.0.

No support for PyTorch 1.5

I used warp_rnnt 0.4.0 and also tried 0.3.0. Neither of them works with PyTorch 1.5.
The traceback showed this:

_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1012CUDATensorIdEv

But when I changed PyTorch from 1.5 to 1.0, both of them worked.
Here is my env:
python 3.7.7
pytorch 1.5
gcc 4.8.5

Maybe you can fix it; your work would be appreciated.

THC/THC.h: No such file or directory

It appears since PyTorch 1.11, TH/THC include files have been removed. warp_rnnt wasn't the only one affected. Other repos relying on these include files were as well (see open-mmlab/mmdetection3d#1332).

Python version: 3.9.11
OS: Ubuntu 18.04.6 LTS

pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install warp_rnnt

Output:

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting warp_rnnt
  Downloading warp_rnnt-0.5.0.tar.gz (10 kB)
  Preparing metadata (setup.py) ... done
Collecting pybind11
  Downloading pybind11-2.10.0-py3-none-any.whl (213 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 213.3/213.3 kB 32.3 MB/s eta 0:00:00
Collecting numpy
  Downloading numpy-1.23.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 10.9 MB/s eta 0:00:00
Requirement already satisfied: torch>=1.0.0 in ./.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages (from warp_rnnt) (1.12.1+cu113)
Requirement already satisfied: typing-extensions in ./.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages (from torch>=1.0.0->warp_rnnt) (4.3.0)
Using legacy 'setup.py install' for warp_rnnt, since package 'wheel' is not installed.
Installing collected packages: pybind11, numpy, warp_rnnt
  Running setup.py install for warp_rnnt ... error
  error: subprocess-exited-with-error

  × Running setup.py install for warp_rnnt did not run successfully.
  │ exit code: 1
  ╰─> [23 lines of output]
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.9
      creating build/lib.linux-x86_64-3.9/warp_rnnt
      copying warp_rnnt/test.py -> build/lib.linux-x86_64-3.9/warp_rnnt
      copying warp_rnnt/__init__.py -> build/lib.linux-x86_64-3.9/warp_rnnt
      running build_ext
      /home/linuxbrew/.linuxbrew/Cellar/pyenv/2.3.4/libexec/pyenv-exec: /home/craig/bin/ninja: /usr/local/bin/python: bad interpreter: No such file or directory
      /home/linuxbrew/.linuxbrew/Cellar/pyenv/2.3.4/libexec/pyenv-exec: line 48: /home/craig/bin/ninja: Success
      /home/craig/.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages/torch/utils/cpp_extension.py:411: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      /home/craig/.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages/torch/utils/cpp_extension.py:813: UserWarning: The detected CUDA version (11.1) has a minor version mismatch with the version that was used to compile PyTorch (11.3). Most likely this shouldn't be a problem.
        warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
      building 'warp_rnnt._C' extension
      creating build/temp.linux-x86_64-3.9
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/craig/.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages/torch/include -I/home/craig/.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/craig/.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages/torch/include/TH -I/home/craig/.pyenv/versions/3.9.11/envs/rnnt/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/craig/.pyenv/versions/3.9.11/envs/rnnt/include -I/home/craig/.pyenv/versions/3.9.11/include/python3.9 -c binding.cpp -o build/temp.linux-x86_64-3.9/binding.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
      binding.cpp:4:10: fatal error: THC/THC.h: No such file or directory
       #include <THC/THC.h>
                ^~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> warp_rnnt

note: This is an issue with the package mentioned above, not pip.

The above error also occurs for PyTorch 1.12, which we're required to use for our software. Will warp_rnnt be updated, or do you recommend another way forward?

ninja: build stopped: subcommand failed.

I am trying to install the warp-rnnt package after pulling the GitHub repository,
but it fails with the following error:

nvcc fatal   : Unsupported gpu architecture 'compute_80'
ninja: build stopped: subcommand failed.

The Docker environment is:
Python 3.7
Pytorch 1.6.0-cuda10.1-cudnn7-devel

Complete log is as follows:

root@6a6261a50477:/workspace/warp-rnnt/pytorch_binding# python setup.py install
running install  
running bdist_egg
running egg_info 
writing warp_rnnt.egg-info/PKG-INFO
writing dependency_links to warp_rnnt.egg-info/dependency_links.txt
writing requirements to warp_rnnt.egg-info/requires.txt
writing top-level names to warp_rnnt.egg-info/top_level.txt
reading manifest file 'warp_rnnt.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'warp_rnnt.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py 
running build_ext
building 'warp_rnnt._C' extension
/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py:125: UserWarning:
A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Emitting ninja build file /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /workspace/warp-rnnt/pytorch_binding/core.cu -o /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++14
FAILED: /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/core.o
/usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /workspace/warp-rnnt/pytorch_binding/core.cu -o /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++14
nvcc fatal   : Unsupported gpu architecture 'compute_80'
[2/2] /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /workspace/warp-rnnt/pytorch_binding/core_gather.cu -o /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/core_gather.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++14
FAILED: /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/core_gather.o
/usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /workspace/warp-rnnt/pytorch_binding/core_gather.cu -o /workspace/warp-rnnt/pytorch_binding/build/temp.linux-x86_64-3.7/core_gather.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++14
nvcc fatal   : Unsupported gpu architecture 'compute_80'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1515, in _run_ninja_build
    env=env)
  File "/opt/conda/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    "Topic :: Software Development :: Libraries :: Python Modules",
  File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 173, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 159, in call_command
    self.run_command(cmdname)
  File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build() 
  File "/opt/conda/lib/python3.7/distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 87, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 649, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 208, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 478, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1233, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1529, in _run_ninja_build
    raise RuntimeError(message)
RuntimeError: Error compiling objects for extension

Benchmarking methodology used is not quite correct

Hi @1ytic ,

Thank you for this work on optimizing warp rnn-t operation which is becoming increasingly useful for many speech recognition acoustic models. We have studied your implementation and have the following observations:

  1. When we run the benchmark script in your repo, the run time we got is as below, where new refers to this new repo and baseline refers to what we currently have in the RNN-T reference model. We used B=32, T=U=200, V=29, which is a typical case for our dataset. From the output of the benchmark script, it does appear that the new loss function runs faster:

new: 1.76 ms
baseline: 6.10 ms

  2. However, in the benchmark script that the author provided, the run time was measured as:
t = timer()
costs = loss(xs, ys, xn, yn)
elapsed_time += timer() - t

This way of measuring has a problem: the CPU can run ahead of the GPU and stop the timer even before the kernel has completed. After adding synchronization as below,

torch.cuda.synchronize() # sync before start the timer
t = timer()
costs = loss(xs, ys, xn, yn)
torch.cuda.synchronize() # sync before stop the timer
elapsed_time += timer() - t

the run time we get is:

new: 15.82 ms
baseline: 6.12 ms
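The synchronization pattern described above can be sketched generically, independent of CUDA; here the sync callable stands in for torch.cuda.synchronize and the names are illustrative:

```python
from timeit import default_timer as timer

def timed(fn, sync=lambda: None, repeats=10):
    """Wall-clock time of fn(), bracketed by device syncs.

    Syncing before starting drains pending work; syncing before stopping
    ensures asynchronously launched kernels are included in the timing.
    """
    sync()                  # drain pending work before starting the timer
    start = timer()
    for _ in range(repeats):
        fn()
    sync()                  # wait for the measured work to finish
    return (timer() - start) / repeats

elapsed = timed(lambda: sum(range(1000)))
```

Without the second sync, the timer measures only the kernel launch, which is the flaw the reporter is pointing out.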

  3. This is similar to the run time we got from the GPU profiler nsys:

new: 14.38 ms
baseline: 4.75 ms

In summary: it does not look like the alternative loss function runs faster than what we have. The claimed speedup in the repo is likely caused by a flawed benchmark methodology.

Can you share your thought process on this?

Thanks,
Ashish

question about the gather arguments

core_gather.cu is a memory-efficient version that expects log_probs with the shape (N, T, U, 2) only for blank and labels values. It shows excellent performance with a large vocabulary.

Hello, what does the gather mean? Or when should I set gather to True? Does the log_probs shape (N, T, U, 2) mean there are only two classes, blank and the labels?
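To clarify the question: gather=True does not limit the model to two classes. The full (N, T, U, V) tensor is reduced to (N, T, U, 2) by picking out, at every lattice cell, only the blank score and the score of the next reference label, since those are the only entries the loss ever reads. A NumPy sketch of that gather (names and values are illustrative):

```python
import numpy as np

N, T, U, V = 1, 4, 3, 10
blank = 0
log_probs = np.random.randn(N, T, U, V)   # full joint-network output
labels = np.array([[5, 7]])               # shape (N, U-1)

gathered = np.empty((N, T, U, 2))
gathered[..., 0] = log_probs[..., blank]  # blank score at every cell
# Label score: for row u, the score of emitting labels[:, u]. The last
# row has no next label and is never read, so pad the index with blank.
idx = np.concatenate([labels, np.full((N, 1), blank)], axis=1)  # (N, U)
gathered[..., 1] = np.take_along_axis(
    log_probs, idx[:, None, :, None].repeat(T, axis=1), axis=-1)[..., 0]
assert gathered.shape == (N, T, U, 2)
```

This is why the memory footprint no longer scales with V, which matters for large vocabularies.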

rnnt_loss status 1

I got the error shown below, with python=3.8, torch version=1.10.2, cudatoolkit=10.2.89, CUDA version=10.2 and GCC version 5.4.0.
[screenshot of the error]

Exception: CPU version is not implemented

ERROR: Command errored out with exit status 1:
cwd: /tmp/yangweiming/pip-install-bih20nfz/warp-rnnt_c2a512c1bf8a43c29d5d5a5f050bd56f/
Complete output (6 lines):
Traceback (most recent call last):
File "", line 1, in
File "/tmp/yangweiming/pip-install-bih20nfz/warp-rnnt_c2a512c1bf8a43c29d5d5a5f050bd56f/setup.py", line 48, in
raise Exception("CPU version is not implemented")
Exception: CPU version is not implemented
No CUDA runtime is found, using CUDA_HOME='/home/yangweiming/cuda-10.1'

Exception: CPU version is not implemented

 pip install warp_rnnt
Collecting warp_rnnt
Using cached warp_rnnt-0.6.0.tar.gz (10 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/private/var/folders/pn/y9dpgwpx3lddd2whnwsc256m0000gn/T/pip-install-n8iz7px8/warp-rnnt_40ff3db4b30a49339af3db950e96e47e/setup.py", line 22, in
raise Exception("CPU version is not implemented")
Exception: CPU version is not implemented
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I can't solve it because I'm a beginner-level developer ;(
Please help me solve this error!

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.

I searched for the error in the title; it usually happens when there are several losses. But in my code there is only the RNN-T loss, yet it still gives the error.
The full message is "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.".
I tried the retain_graph=True parameter in loss.backward(), but it failed.
Do you have any ideas about this error?

Transducer loss leads to memory leak

Hi, I'm using rnnt-loss and pytorch-lightning to train my model. But I found that the 4D tensor used to calculate the transducer loss accumulates on the GPU: when I check the GPU memory in the training step (before a batch starts), there are many 4D tensors (from previous batches) still in GPU memory. That eventually leads to CUDA out of memory. I don't know what went wrong.
[screenshot: gpu_tracker is used to check the GPU memory]
[screenshot: the loss in the training step comes from this]
[screenshot: the result of GPU memory usage in the training step]
I tried to use 'del', 'gc.collect()' and 'torch.cuda.empty_cache()' everywhere, but they are all useless.

PyTorch 1.9 Support

Hi,

I have the following library / image versions:

  • PyTorch 1.9.0 with CUDA 11.1
  • Based off this image pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel

When I try pip install warp_rnnt I have the following output:

Error message
root@a19b333cfe3b:/home/keras/notebook# pip install warp_rnnt
Collecting warp_rnnt
  Using cached warp_rnnt-0.5.0.tar.gz (10 kB)
Requirement already satisfied: pybind11 in /opt/conda/lib/python3.7/site-packages (from warp_rnnt) (2.9.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from warp_rnnt) (1.19.5)
Requirement already satisfied: torch>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from warp_rnnt) (1.9.0)
Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.7/site-packages (from torch>=1.0.0->warp_rnnt) (3.7.4.3)
Building wheels for collected packages: warp-rnnt
  Building wheel for warp-rnnt (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/conda/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py'"'"'; __file__='"'"'/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-73ng5boj
       cwd: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/
  Complete output (106 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.7
  creating build/lib.linux-x86_64-3.7/warp_rnnt
  copying warp_rnnt/__init__.py -> build/lib.linux-x86_64-3.7/warp_rnnt
  copying warp_rnnt/test.py -> build/lib.linux-x86_64-3.7/warp_rnnt
  running build_ext
  building 'warp_rnnt._C' extension
  creating /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7
  Emitting ninja build file /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/3] c++ -MMD -MF /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/binding.cpp -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
  FAILED: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o
  c++ -MMD -MF /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/binding.cpp -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  In file included from /usr/include/c++/7/ext/string_conversions.h:41:0,
                   from /usr/include/c++/7/bits/basic_string.h:6361,
                   from /usr/include/c++/7/string:52,
                   from /usr/include/c++/7/stdexcept:39,
                   from /usr/include/c++/7/array:39,
                   from /usr/include/c++/7/tuple:39,
                   from /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/binding.cpp:1:
  /usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
   #include_next <stdlib.h>
                 ^~~~~~~~~~
  compilation terminated.
  [2/3] /usr/local/cuda/bin/nvcc  -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
  FAILED: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/core.o
  /usr/local/cuda/bin/nvcc  -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
  In file included from /usr/local/cuda/include/crt/math_functions.h:8958:0,
                   from /usr/local/cuda/include/crt/common_functions.h:295,
                   from /usr/local/cuda/include/cuda_runtime.h:115,
                   from <command-line>:0:
  /usr/include/c++/7/cmath:45:15: fatal error: math.h: No such file or directory
   #include_next <math.h>
                 ^~~~~~~~
  compilation terminated.
  [3/3] /usr/local/cuda/bin/nvcc  -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/s
ite-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core_gather.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/bu
ild/temp.linux-x86_64-3.7/core_gather.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENS
ION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch
=compute_86,code=sm_86 -std=c++14
  FAILED: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/core_gather.o
  /usr/local/cuda/bin/nvcc  -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-pa
ckages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core_gather.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/te
mp.linux-x86_64-3.7/core_gather.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H
'-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compu
te_86,code=sm_86 -std=c++14
  In file included from /usr/local/cuda/include/crt/math_functions.h:8958:0,
                   from /usr/local/cuda/include/crt/common_functions.h:295,
                   from /usr/local/cuda/include/cuda_runtime.h:115,
                   from <command-line>:0:
  /usr/include/c++/7/cmath:45:15: fatal error: math.h: No such file or directory
   #include_next <math.h>
                 ^~~~~~~~
  compilation terminated.
  ninja: build stopped: subcommand failed.  

Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1672, in _run_ninja_build
env=env)
File "/opt/conda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py", line 64, in <module>
"Topic :: Software Development :: Libraries :: Python Modules",
File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 290, in run
self.run_command('build')
File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 709, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 539, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1360, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

ERROR: Failed building wheel for warp-rnnt
Running setup.py clean for warp-rnnt
Failed to build warp-rnnt
Installing collected packages: warp-rnnt
Running setup.py install for warp-rnnt ... error
ERROR: Command errored out with exit status 1:
command: /opt/conda/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py'"'"'; file='"'"'/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427
c850ea4a028137251/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(comp
ile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-vvd1smyc/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.7m/warp-rnnt
cwd: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/
Complete output (108 lines):
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/warp_rnnt
copying warp_rnnt/__init__.py -> build/lib.linux-x86_64-3.7/warp_rnnt
copying warp_rnnt/test.py -> build/lib.linux-x86_64-3.7/warp_rnnt
running build_ext
building 'warp_rnnt._C' extension
creating /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7
Emitting ninja build file /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -
fPIC -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/TH
C -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/binding.cpp -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/bindin
g.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o
c++ -MMD -MF /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -
I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/u
sr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/binding.cpp -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/binding.o -D
TORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /usr/include/c++/7/ext/string_conversions.h:41:0,
from /usr/include/c++/7/bits/basic_string.h:6361,
from /usr/include/c++/7/string:52,
from /usr/include/c++/7/stdexcept:39,
from /usr/include/c++/7/array:39,
from /usr/include/c++/7/tuple:39,
from /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/binding.cpp:1:
/usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
#include_next <stdlib.h>
^~~~~~~~~~
compilation terminated.
[2/3] /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7
/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core_gather.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/
build/temp.linux-x86_64-3.7/core_gather.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/core_gather.o
/usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-
packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core_gather.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/
temp.linux-x86_64-3.7/core_gather.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
In file included from /usr/local/cuda/include/crt/math_functions.h:8958:0,
from /usr/local/cuda/include/crt/common_functions.h:295,
from /usr/local/cuda/include/cuda_runtime.h:115,
from <command-line>:0:
/usr/include/c++/7/cmath:45:15: fatal error: math.h: No such file or directory
#include_next <math.h>
^~~~~~~~
compilation terminated.
[3/3] /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7
/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/t
emp.linux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.linux-x86_64-3.7/core.o
/usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-
packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/core.cu -o /tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/build/temp.li
nux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_
COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=s
m_86 -std=c++14
In file included from /usr/local/cuda/include/crt/math_functions.h:8958:0,
from /usr/local/cuda/include/crt/common_functions.h:295,
from /usr/local/cuda/include/cuda_runtime.h:115,
from <command-line>:0:
/usr/include/c++/7/cmath:45:15: fatal error: math.h: No such file or directory
#include_next <math.h>
^~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1672, in _run_ninja_build
env=env)
File "/opt/conda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py", line 64, in <module>
    "Topic :: Software Development :: Libraries :: Python Modules",
  File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py", line 61, in run
    return orig.install.run(self)
  File "/opt/conda/lib/python3.7/distutils/command/install.py", line 545, in run
    self.run_command('build')
  File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 709, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 539, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1360, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------

ERROR: Command errored out with exit status 1: /opt/conda/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-b1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py'"'"'; file='"'"'/tmp/pip-install-b
1mhbvnc/warp-rnnt_074f8d6e4c20427c850ea4a028137251/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"',
'"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-vvd1smyc/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.7m/warp-rnnt Check the logs fo
r full command output.

I used to have similar errors with warp-ctc, and they were resolved with hacks like this:

RUN apt-get install gcc-5 g++-5 g++-5-multilib gfortran-5 -y && \
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5 --slave /usr/bin/gfortran gfortran /usr/bin/gfortran-5 && \
    update-alternatives --query gcc
RUN gcc --version

Is there anything I can do about this without downgrading or changing the current environment?

Strange behavior using PyTorch DDP

@1ytic
Hi,

So far I have been able to use the loss with DDP on a single GPU, and it behaves more or less as expected.

But when I use more than 1 device, the following happens:

  • On GPU-0 loss is calculated properly
  • On GPU-1 loss is close to zero for each batch

I checked the input tensors, devices, tensor values, etc.; so far everything seems to be identical for GPU-0 and the other GPUs.

Undefined symbol when loading module

Hi, I'm trying to use warp-rnnt, but I get the error mentioned below.
My environment:

Python: Python 3.7.5
Torch: 1.8.0+cu111
CUDA Version: 11.2

Output of python -m warp_rnnt.test

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/martynas/.../lib/python3.7/site-packages/warp_rnnt/__init__.py", line 2, in <module>
    import warp_rnnt._C as core
ImportError: /home/martynas/.../lib/python3.7/site-packages/warp_rnnt/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv

Is there anything I can do?

improve efficiency of warps

In the current implementation, the warps along the T axis are computed in a fully serialized manner.

warp-rnnt/core.cu

Lines 112 to 134 in edd5857

if (t < actual_t && u < actual_u) {
    // Ready to compute alphas[t, u]
    unsigned int l = labels[idx2(n, u-1, U-1)];
    float bias = log_probs[idx4(n, t-1, u, blank, T, U, V)];
    float skip = alphas[idx3(n, p, u, T, U)] + bias;
    float emit = alphas[idx3(n, t, u-1, T, U)] + log_probs[idx4(n, t, u-1, l, T, U, V)];
    float r = log_sum_exp(skip, emit);
    float output = r;
    for(unsigned int i = 1; i < W; i++) {
        r = __shfl_up_sync(0xffffffff, r, 1);
        if (i == d) {
            r = log_sum_exp(r + bias, emit);
            output = r;
        }
    }
    alphas[idx3(n, t, u, T, U)] = output;
}

The for loop of each warp is executed serially, which means the i-th warp at a given row u has to wait for all of its preceding warps to finish their loops; that gives roughly i (number of preceding warps) * W (loop overhead, warp size, 32 here) time complexity.

However, we don't necessarily have to wait for the previous warps to finish before entering the loop in the current warp.

Let's take forward computation of alphas as the example with warpsize=4:
[figure: warp layout example with warpsize=4]
Here d denotes the index inside a warp, so 0 <= d < W. B is the result from row u-1 and is assumed to be ready.

The forward computation of alpha follows (in practice the computation is done in log space; probabilities are used here just for discussion):
[figure: forward recursion formula for alpha]
Note that alpha_0 relies on the result from the previous warp.

Here comes the trick: I rewrote the alpha_3 formula as follows:
[figure: expanded alpha_3 formula]

The underlined part is warp-independent. The first part (the product of emitting probabilities e_2 e_1 e_0) can be computed with a prefix-sum (scan) algorithm in log space, introducing only O(log2(W)) complexity.

Finally, the new procedure is like:

  1. Compute the local path-combination probabilities (the underlined part). O(W) complexity;
  2. Compute the products of emitting probs (e2e1e0, ...) with a prefix-sum algorithm. O(log2(W)) complexity;
  3. Wait for the previous warps to finish and compute the final results. Constant complexity.

For all warps at row u, steps 1 and 2 can be done in parallel; the i-th warp only has to wait for the previous warps to finish step 3. The new procedure should be considerably faster than the current serialized execution, especially when T is large.
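The three steps above can be sketched numerically in plain Python (one warp, log space). This is only a model of the proposed scheme, not CUDA, and the helper names are mine: `serial_warp` mirrors the current serialized recurrence, while `scan_warp` computes the warp-independent part first and touches the previous warp's boundary value `a` only in the final combine. In a real kernel, steps 1 and 2 would be O(log2(W)) warp-level scans built on `__shfl_up_sync`.

```python
import math

def logaddexp(x: float, y: float) -> float:
    """Numerically stable log(exp(x) + exp(y))."""
    if x == float("-inf") and y == float("-inf"):
        return float("-inf")
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def serial_warp(a, s, B):
    """Current scheme: x[d] = logaddexp(x[d-1] + s[d], B[d]), seeded by the
    previous warp's boundary value a. Fully serial in d."""
    out, x = [], a
    for d in range(len(s)):
        x = logaddexp(x + s[d], B[d])
        out.append(x)
    return out

def scan_warp(a, s, B):
    """Proposed scheme: everything except the last line is independent of a."""
    W = len(s)
    # Steps 1-2a: prefix sums of the blank log-probs (a log-space prefix product).
    P, acc = [0.0] * W, 0.0
    for d in range(W):
        acc += s[d]
        P[d] = acc
    # Step 2b: warp-local combination of the B terms (the "underlined part"):
    # y[d] = logsumexp over k <= d of (B[k] + P[d] - P[k]); written here as a scan.
    y, acc = [0.0] * W, float("-inf")
    for d in range(W):
        acc = logaddexp(acc + s[d], B[d])
        y[d] = acc
    # Step 3: a single combine with the previous warp's result.
    return [logaddexp(a + P[d], y[d]) for d in range(W)]
```

Both functions produce identical alphas; the payoff is that in `scan_warp` only the last line has to wait for the previous warp.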

how to apply backward to the output of rnnt_loss

Hi:
I want to use your warp-rnnt as the loss function to train my model, but I ran into a problem: I don't know how to do the backward pass. The output of rnnt_loss() is a cost and a grad, both of which are tensors. Can you give an example showing how to do backward? Thanks!
Neng
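Since the loss is wired through a torch autograd Function (`RNNTLoss.apply` inside `rnnt_loss`), you normally never touch the grad yourself: with the default reduction you just call `.backward()` on the returned cost. To show what that backward relies on, here is a toy pure-Python alpha/beta dynamic program; the helper `rnnt_toy` is hypothetical (not part of the package) and only illustrates the lattice. The negative log-likelihood computed from the forward and backward lattices must agree.

```python
import math

def logaddexp(x: float, y: float) -> float:
    """Numerically stable log(exp(x) + exp(y))."""
    if x == float("-inf") and y == float("-inf"):
        return float("-inf")
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def rnnt_toy(log_probs, labels, blank=0):
    """Toy RNN-T loss: log_probs is a T x U x V nested list of log-probs and
    labels has length U-1. Returns (-loglik from alphas, -loglik from betas)."""
    T, U = len(log_probs), len(log_probs[0])
    NEG = float("-inf")
    # Forward lattice: alpha[t][u] = logaddexp(blank move from t-1, emit from u-1).
    alpha = [[NEG] * U for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            skip = alpha[t-1][u] + log_probs[t-1][u][blank] if t > 0 else NEG
            emit = alpha[t][u-1] + log_probs[t][u-1][labels[u-1]] if u > 0 else NEG
            alpha[t][u] = logaddexp(skip, emit)
    ll_fwd = alpha[T-1][U-1] + log_probs[T-1][U-1][blank]
    # Backward lattice: beta[t][u] sums the same paths from the other end.
    beta = [[NEG] * U for _ in range(T)]
    beta[T-1][U-1] = log_probs[T-1][U-1][blank]
    for t in range(T-1, -1, -1):
        for u in range(U-1, -1, -1):
            if t == T-1 and u == U-1:
                continue
            skip = beta[t+1][u] + log_probs[t][u][blank] if t < T-1 else NEG
            emit = beta[t][u+1] + log_probs[t][u][labels[u]] if u < U-1 else NEG
            beta[t][u] = logaddexp(skip, emit)
    return -ll_fwd, -beta[0][0]
```

The gradient that the CUDA backward hands to autograd is built from exp(alpha + beta - loglik) style occupancies over this lattice, so in training code a plain `loss.backward()` is all you need.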

operating with apex?

I am trying to use this implementation with apex half-precision training, but it fails,
showing that it needs float rather than half:


File "/data/asr_v3/src/model/transformer_transducer/lightning_model.py", line 41, in training_step
joint_out, rnnt_loss = self.forward(feature, feature_length, target, target_length, cal_rnnt_loss=True)
File "/opt/conda/lib/python3.7/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
applier(kwargs, input_caster))
File "/data/asr_v3/src/model/transformer_transducer/lightning_model.py", line 36, in forward
joint_out, rnnt_loss = self.transducer.forward(feature, feature_length, target, target_length, cal_rnnt_loss)
File "/data/asr_v3/src/model/transformer_transducer/transformer_transducer.py", line 79, in forward
rnn_t_loss = self.cal_transducer_loss(joint, ori_token, feature_length, ori_token_length)
File "/data/asr_v3/src/model/transformer_transducer/transformer_transducer.py", line 108, in cal_transducer_loss
log_probs=log_prob, labels=target.int(), frames_lengths=frame_length.int(), labels_lengths=target_length.int(), reduction='mean')
File "/opt/conda/lib/python3.7/site-packages/warp_rnnt/__init__.py", line 80, in rnnt_loss
costs = RNNTLoss.apply(log_probs, labels, frames_lengths, labels_lengths, blank)
File "/opt/conda/lib/python3.7/site-packages/warp_rnnt/__init__.py", line 16, in forward
blank=blank,
RuntimeError: xs must be a Float tensor (rnnt_loss at binding.cpp:42)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fa72c18c687 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: rnnt_loss(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int) + 0xf79 (0x7fa707c87389 in /opt/conda/lib/python3.7/site-packages/warp_rnnt/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: + 0x22ea7 (0x7fa707c9aea7 in /opt/conda/lib/python3.7/site-packages/warp_rnnt/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: + 0x232ee (0x7fa707c9b2ee in /opt/conda/lib/python3.7/site-packages/warp_rnnt/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: + 0x1fd11 (0x7fa707c97d11 in /opt/conda/lib/python3.7/site-packages/warp_rnnt/_C.cpython-37m-x86_64-linux-gnu.so)

frame #10: THPFunction_apply(_object*, _object*) + 0x8d6 (0x7fa7601b9e96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #63: __libc_start_main + 0xf0 (0x7fa76fc35830 in /lib/x86_64-linux-gnu/libc.so.6)

can't install warp-rnnt

D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(95): error C2664: 'caffe2::TypeMeta c10::TensorOptions::dtype(void) noexcept const': cannot convert argument 1 from 'caffe2::TypeMeta' to 'c10::optional<caffe2::TypeMeta>'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(95): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(96): error C2228: left of '.device' must have class/struct/union
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(97): error C2228: left of '.layout' must have class/struct/union
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(103): error C2039: 'has_value': is not a member of 'c10::optional<c10::Device>'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\c10/core/TensorOptions.h(21): note: see declaration of 'c10::optional<c10::Device>'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(106): error C2039: 'value': is not a member of 'c10::optional<c10::Device>'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\c10/core/TensorOptions.h(21): note: see declaration of 'c10::optional<c10::Device>'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/DeprecatedTypeProperties.h(106): error C2512: 'c10::Device': no appropriate default constructor available
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\c10/core/Device.h(30): note: see declaration of 'c10::Device'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/TensorBody.h(1319): warning C4522: 'at::Tensor': multiple assignment operators specified
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/TensorBody.h(561): error C2440: 'default argument': cannot convert from 'const c10::nullopt_t' to 'c10::optional'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/TensorBody.h(561): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\c10/util/Optional.h(282): warning C4814: 'c10::optional<at::Tensor>::contained_val': in C++14 'constexpr' will not imply 'const'; consider explicitly specifying 'const'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\ATen/core/TensorBody.h(561): note: see reference to class template instantiation 'c10::optional<at::Tensor>' being compiled
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\c10/util/Optional.h(283): error C2556: 'at::Tensor &c10::optional<at::Tensor>::contained_val(void) const &': overloaded function differs only by return type from 'const at::Tensor &c10::optional<at::Tensor>::contained_val(void) const &'
D:\Anaconda3\envs\conformer\lib\site-packages\torch\include\c10/util/Optional.h(277): note: see declaration of 'c10::optional<at::Tensor>::contained_val'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> warp_rnnt

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

No support for PyTorch 1.6

Hi, thanks for this package. However, warp-rnnt doesn't support PyTorch 1.6. Would you kindly release a newer version that supports the latest PyTorch, and a CPU version?

Question about average_frames and reduction params

I want a training loss that is stable and robust to labels_lengths.
What values should I pass to these two params?

What's more, what is the approximate relationship between the loss and the actual WER?
For example, if I want a WER around 0.5, roughly what should the loss value be?
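My reading of the two parameters, as a pure-Python sketch (the helper `reduce_costs` is hypothetical, a mirror of the post-processing after the per-utterance costs are computed; please check it against the binding's source): `average_frames=True` divides each utterance's cost by its number of frames, and `reduction` then collapses the batch ('none', 'sum', or 'mean'). For a loss that is insensitive to sequence lengths across batches, `average_frames=True` with `reduction='mean'` gives roughly a per-frame cost.

```python
def reduce_costs(costs, frames_lengths, average_frames=False, reduction="mean"):
    """Hypothetical sketch of how per-utterance RNN-T costs are reduced."""
    if average_frames:
        # Normalize each utterance by its own number of frames.
        costs = [c / t for c, t in zip(costs, frames_lengths)]
    if reduction == "none":
        return costs
    if reduction == "sum":
        return sum(costs)
    if reduction == "mean":
        return sum(costs) / len(costs)
    raise ValueError(f"unknown reduction: {reduction}")
```

As for loss vs. WER: there is no fixed mapping. The loss is a negative log-likelihood, not an edit distance, so it is best treated as a relative signal and validated by actually decoding and scoring.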

__version__ assignment breaks local build.

As the environment I am using has to be created at a time when a GPU is not available, and warp-rnnt does not allow installation without a GPU, I decided to compile it locally at runtime.

The problem is that if I build it locally with python setup.py build_ext --inplace,
there is no distribution information, so it crashes at https://github.com/1ytic/warp-rnnt/blob/master/pytorch_binding/warp_rnnt/__init__.py#L6 with:
DistributionNotFound: The 'warp_rnnt' distribution was not found and is required by the application

Of course I can create a local patch file to remove that line manually and make it work, but maybe there is a cleaner way to fix it here in the repo, e.g. by setting __version__ = "unknown" or similar when an exception is thrown.
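The fallback suggested above could be sketched like this (a hypothetical patch for warp_rnnt/__init__.py, not a tested fix):

```python
# Hypothetical sketch: if the package was built in-place and has no
# distribution metadata, fall back to "unknown" instead of letting
# DistributionNotFound propagate at import time.
try:
    import pkg_resources
    __version__ = pkg_resources.get_distribution("warp_rnnt").version
except Exception:  # e.g. pkg_resources.DistributionNotFound
    __version__ = "unknown"

print(__version__)
```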

Issues with non-standard CUDA install

I tried to install espnet (which depends on warp-rnnt) on an HPC system whose CUDA path is /apps/t3/sles12sp2/cuda/10.0.130. Even after exporting $CUDA_HOME in the install script, I always run into Exception("CPU version is not implemented").

I found that warp-rnnt/pytorch_binding/setup.py line 21 is:
if not torch.cuda.is_available():

I think it should be:

if not ("CUDA_HOME" in os.environ or torch.cuda.is_available()):
    raise xxxx

or something else.

Does torch.cuda.is_available() returning False always mean that CUDA is unavailable? I'm not an expert in PyTorch, so I'm not sure, but https://github.com/pytorch/pytorch/blob/master/torch/utils/cpp_extension.py shows "No CUDA runtime is found" only if neither CUDA_HOME was found nor torch.cuda.is_available() is True.
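The proposed guard can be illustrated without torch (a hypothetical helper that only mirrors the logic suggested above, with the environment passed in explicitly for testability):

```python
def cuda_seems_available(env, runtime_visible):
    """Hypothetical guard combining both signals, as proposed above:
    accept either an explicit CUDA_HOME or a visible CUDA runtime
    (what torch.cuda.is_available() would report)."""
    return "CUDA_HOME" in env or runtime_visible

# HPC build node: toolkit installed under a non-standard path, no GPU visible.
print(cuda_seems_available({"CUDA_HOME": "/apps/t3/sles12sp2/cuda/10.0.130"}, False))  # True
# Neither signal present: setup.py should raise in this case.
print(cuda_seems_available({}, False))  # False
```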

I don't think it was a dependency issue, but just in case: I'm using Python 3.7.9, PyTorch 1.3.1, and espnet 0.9.4 (other packages were installed automatically by the espnet Makefile).

RuntimeError: rnnt_loss status 1

Hello, this error occurs when I run my project. I then ran python -m warp_rnnt.test, but the same error occurs, as shown below. Could you give me some ideas on how to solve it? The PyTorch version is 1.6.0, the CUDA version is 10.1, and the gcc version is 7.5.0.
[screenshot of the error]

Problem installation

I tried to install warp-rnnt with torch==1.0.1, Python 3.7.5, and CUDA 9.0, but got the following error:

Failed to build warp-rnnt
Installing collected packages: pybind11, warp-rnnt
    Running setup.py install for warp-rnnt ... error
    ERROR: Command errored out with exit status 1:
     command: /home/rohola/codes/sample/env/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-abwlipt5/warp-rnnt/setup.py'"'"'; __file__='"'"'/tmp/pip-install-abwlipt5/warp-rnnt/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-6i_1sjce/install-record.txt --single-version-externally-managed --compile --install-headers /home/rohola/codes/sample/env/include/site/python3.7/warp-rnnt
         cwd: /tmp/pip-install-abwlipt5/warp-rnnt/
    Complete output (26 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.7
    creating build/lib.linux-x86_64-3.7/warp_rnnt
    copying warp_rnnt/__init__.py -> build/lib.linux-x86_64-3.7/warp_rnnt
    copying warp_rnnt/test.py -> build/lib.linux-x86_64-3.7/warp_rnnt
    running build_ext
    building 'warp_rnnt._C' extension
    creating build/temp.linux-x86_64-3.7
    /usr/bin/nvcc -I/home/rohola/codes/sample/env/lib/python3.7/site-packages/torch/include -I/home/rohola/codes/sample/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/rohola/codes/sample/env/lib/python3.7/site-packages/torch/include/TH -I/home/rohola/codes/sample/env/lib/python3.7/site-packages/torch/include/THC -I/usr/include/python3.7m -I/home/rohola/codes/sample/env/include/python3.7m -c core.cu -o build/temp.linux-x86_64-3.7/core.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    /usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined
    
    /usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined
    
    core.cu(101): error: identifier "__shfl_up_sync" is undefined
    
    core.cu(126): error: identifier "__shfl_up_sync" is undefined
    
    core.cu(206): error: identifier "__shfl_up_sync" is undefined
    
    core.cu(231): error: identifier "__shfl_up_sync" is undefined
    
    6 errors detected in the compilation of "/tmp/tmpxft_00003738_00000000-7_core.cpp1.ii".
    error: command '/usr/bin/nvcc' failed with exit status 2
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/rohola/codes/sample/env/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-abwlipt5/warp-rnnt/setup.py'"'"'; __file__='"'"'/tmp/pip-install-abwlipt5/warp-rnnt/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-6i_1sjce/install-record.txt --single-version-externally-managed --compile --install-headers /home/rohola/codes/sample/env/include/site/python3.7/warp-rnnt Check the logs for full command output.

Normalize the RNN-T Loss with input seq length

Hello,

We noticed that your implementation doesn't normalize the loss by the input sequence length.
Here is an example of training on the TIMIT corpus:

RNNT loss torchaudio:

epoch: 1, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 1.40e+02 - valid loss: 1.16e+02, valid PER: 1.00e+02
epoch: 2, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 95.39 - valid loss: 64.14, valid PER: 91.21
epoch: 3, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 35.57 - valid loss: 17.67, valid PER: 22.56
epoch: 4, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 19.28 - valid loss: 12.31, valid PER: 16.15

RNNT loss spbrain:

epoch: 1, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 1.06 - valid loss: 7.76e-01, valid PER: 1.00e+02
epoch: 2, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 6.28e-01 - valid loss: 2.57e-01, valid PER: 54.77
epoch: 3, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 2.16e-01 - valid loss: 1.08e-01, valid PER: 23.30
epoch: 4, lr_adam: 3.00e-04, lr_wav2vec: 1.00e-04 - train loss: 1.27e-01 - valid loss: 8.16e-02, valid PER: 14.56
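The scale difference between the two logs can be sketched with made-up numbers (an illustration of dividing each utterance's raw cost by its number of input frames T, which is roughly the effect the issue is asking about; the costs and lengths below are hypothetical):

```python
# Illustrative sketch: raw per-utterance RNN-T costs in the hundreds become
# sub-1.0 values once divided by the input frame count T, matching the scale
# gap between the two training logs.
costs = [140.0, 96.0]        # hypothetical raw per-utterance costs
frame_lengths = [700, 480]   # T for each utterance

per_frame = [c / t for c, t in zip(costs, frame_lengths)]
print(per_frame)  # [0.2, 0.2]
```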

Can't install

OS: Windows
CUDA_toolkit: 10.1
Python: 3.7
Framework : tensorflow-gpu (1.15.4), pytorch(1.7.0)

Hello. Thank you for your projects

When I try to install this module, I get an error message.
pip install warp_rnnt

CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I didn't install CUDA directly on my system; my CUDA is installed in an Anaconda virtual environment.

Can't I install this module in this case?

Thank you

Warning about a forward/backward mismatch

The following warning messages are occasionally thrown during training:

...
WARNING: sample 10 [81, 25] has a forward/backward mismatch -0.000083 / -0.000083
...
WARNING: sample 11 [62, 28] has a forward/backward mismatch -0.000188 / -0.000188

The source code warns when abs(a - b) / abs(max(a, b)) > 0.001.
I'm sorry, but I have difficulty reading core_gather.cu.
Could you explain the function kernel_fill_costs() and the alphas and betas in more detail?
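The warning condition quoted above can be sketched as a plain Python helper (hypothetical, mirroring only the relative-difference test, not the CUDA kernel itself):

```python
def has_mismatch(forward_cost, backward_cost, tol=0.001):
    """Relative-difference test mirroring the warning condition: the total
    cost computed from alphas (forward) and from betas (backward) should
    agree up to rounding error."""
    return abs(forward_cost - backward_cost) / abs(max(forward_cost, backward_cost)) > tol

# The warned samples print identical six-digit values, so the mismatch must
# come from digits beyond the displayed precision.
print(has_mismatch(-0.000083, -0.000083))  # False
print(has_mismatch(-0.000083, -0.000095))  # True
```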
