
warp-ctc's Introduction


In Chinese 中文版

warp-ctc

A fast parallel implementation of CTC, on both CPU and GPU.

Introduction

Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels. For example, CTC can be used to train end-to-end systems for speech recognition, which is how we have been using it at Baidu's Silicon Valley AI Lab.

[Figure: CTC alignment illustration]

The illustration above shows CTC computing the probability of an output sequence "THE CAT ", as a sum over all possible alignments of input sequences that could map to "THE CAT ", taking into account that labels may be duplicated because they may stretch over several time steps of the input data (represented by the spectrogram at the bottom of the image). Computing the sum of all such probabilities explicitly would be prohibitively costly due to the combinatorics involved, but CTC uses dynamic programming to dramatically reduce the complexity of the computation. Because CTC is a differentiable function, it can be used during standard SGD training of deep neural networks.
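
For reference, the dynamic program in question is the standard CTC forward recurrence (Graves et al., 2006); the sketch below is illustrative notation, not code taken from this repository. With the label sequence augmented by blanks between every pair of labels and at both ends (so |l'| = 2|l| + 1), and with p_t(k) denoting the network's output probability for symbol k at time t, the forward variables satisfy

$$
\alpha_t(s) = \Big(\alpha_{t-1}(s) + \alpha_{t-1}(s-1) + \lambda_s\,\alpha_{t-1}(s-2)\Big)\, p_t(l'_s),
\qquad
\lambda_s =
\begin{cases}
0 & \text{if } l'_s \text{ is blank or } l'_s = l'_{s-2},\\
1 & \text{otherwise,}
\end{cases}
$$

and the total probability of the output sequence is $\alpha_T(|l'|) + \alpha_T(|l'| - 1)$, so the cost is $O(T\,|l'|)$ rather than exponential in $T$.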

In our lab, we focus on scaling up recurrent neural networks, and CTC loss is an important component. To make our system efficient, we parallelized the CTC algorithm, as described in this paper. This project contains our high performance CPU and CUDA versions of the CTC loss, along with bindings for Torch. The library provides a simple C interface, so that it is easy to integrate into deep learning frameworks.

Beyond the speedup from a faster parallel CTC implementation, this implementation has also improved training scalability: for GPU-focused training pipelines, the ability to keep all data local to GPU memory lets us spend interconnect bandwidth on increased data parallelism.

Performance

Our CTC implementation is efficient compared with many of the other publicly available implementations. It is also written to be as numerically stable as possible. The algorithm is numerically sensitive, and we have observed catastrophic underflow even in double precision with the standard calculation: the result of dividing two numbers on the order of 1e-324, which should have been approximately one, instead became infinity when the denominator underflowed to 0. Performing the calculation in log space instead makes it numerically stable even in single-precision floating point, at the cost of significantly more expensive operations: instead of one machine instruction, each addition requires evaluating multiple transcendental functions. Because of this, the speed of CTC implementations can only be fairly compared if they perform the calculation the same way.
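
To illustrate the cost of log-space arithmetic (a generic sketch, not warp-ctc's internal code): adding two probabilities that are stored as logarithms requires exponentials and a logarithm rather than a single hardware add.

```c
#include <math.h>

/* Generic log-space addition: returns log(exp(a) + exp(b)).
 * Not taken from warp-ctc's source; shown only to illustrate why
 * log-space addition costs several transcendental evaluations.
 * Factoring out the max keeps the argument of expf() <= 0, so the
 * intermediate never overflows and precision survives even in
 * single-precision floats. */
static float log_add(float a, float b) {
    if (isinf(a) && a < 0.0f) return b;  /* a represents log(0) */
    if (isinf(b) && b < 0.0f) return a;  /* b represents log(0) */
    float m = fmaxf(a, b);
    return m + log1pf(expf(fminf(a, b) - m));
}
```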

We compare our performance with Eesen, with a CTC implementation built on Theano, and with Stanford-CTC, a Cython CPU-only implementation. We benchmark the Theano implementation operating on 32-bit floating-point numbers and doing the calculation in log space, in order to match the other implementations we compare against. Stanford-CTC was modified to perform the calculation in log space, as it did not support that natively. It also does not support minibatches larger than 1, so it would require an awkward memory layout to use in a real training pipeline; we assume a linear increase in cost with minibatch size.

We show results on two problem sizes relevant to our English and Mandarin end-to-end models, respectively, where T represents the number of timesteps in the input to CTC, L represents the length of the labels for each example, A represents the alphabet size, and N (in the tables below) represents the minibatch size.

On the GPU, our performance at a minibatch of 64 examples ranges from 7x to 155x faster than Eesen (e.g., at N=64 in the A=5000 table, 2475 ms / 16 ms ≈ 155x), and from 46x to 68x faster than the Theano implementation.

GPU Performance

Benchmarked on a single NVIDIA Titan X GPU.

| T=150, L=40, A=28 | warp-ctc | Eesen  | Theano |
|-------------------|----------|--------|--------|
| N=1               | 3.1 ms   | .5 ms  | 67 ms  |
| N=16              | 3.2 ms   | 6 ms   | 94 ms  |
| N=32              | 3.2 ms   | 12 ms  | 119 ms |
| N=64              | 3.3 ms   | 24 ms  | 153 ms |
| N=128             | 3.5 ms   | 49 ms  | 231 ms |

| T=150, L=20, A=5000 | warp-ctc | Eesen   | Theano  |
|---------------------|----------|---------|---------|
| N=1                 | 7 ms     | 40 ms   | 120 ms  |
| N=16                | 9 ms     | 619 ms  | 385 ms  |
| N=32                | 11 ms    | 1238 ms | 665 ms  |
| N=64                | 16 ms    | 2475 ms | 1100 ms |
| N=128               | 23 ms    | 4950 ms | 2100 ms |

CPU Performance

Benchmarked on a dual-socket machine with two Intel E5-2660 v3 processors - warp-ctc used 40 threads to maximally take advantage of the CPU resources. Eesen doesn't provide a CPU implementation. We noticed that the Theano implementation was not parallelizing computation across multiple threads. Stanford-CTC provides no mechanism for parallelization across threads.

| T=150, L=40, A=28 | warp-ctc | Stanford-CTC | Theano  |
|-------------------|----------|--------------|---------|
| N=1               | 2.6 ms   | 13 ms        | 15 ms   |
| N=16              | 3.4 ms   | 208 ms       | 180 ms  |
| N=32              | 3.9 ms   | 416 ms       | 375 ms  |
| N=64              | 6.6 ms   | 832 ms       | 700 ms  |
| N=128             | 12.2 ms  | 1684 ms      | 1340 ms |

| T=150, L=20, A=5000 | warp-ctc | Stanford-CTC | Theano   |
|---------------------|----------|--------------|----------|
| N=1                 | 21 ms    | 31 ms        | 850 ms   |
| N=16                | 37 ms    | 496 ms       | 10800 ms |
| N=32                | 54 ms    | 992 ms       | 22000 ms |
| N=64                | 101 ms   | 1984 ms      | 42000 ms |
| N=128               | 184 ms   | 3968 ms      | 86000 ms |

Interface

The interface is in include/ctc.h. It supports CPU or GPU execution, and you can specify OpenMP parallelism when running on the CPU, or the CUDA stream when running on the GPU. We took care to ensure that the library does not perform memory allocation internally, to avoid the synchronizations and overheads that allocation would cause.
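
Below is a minimal sketch of driving the CPU path through this interface. The function and field names follow include/ctc.h as we understand it, but treat the exact signatures as an assumption and check them against the header in your checkout.

```c
/* Sketch of a single CPU call into warp-ctc via include/ctc.h.
 * Signatures and option fields are assumptions based on the public
 * header; verify against ctc.h before relying on them. */
#include <stdio.h>
#include <stdlib.h>
#include "ctc.h"

int main(void) {
    /* One example: 1 timestep, alphabet of 5 (label 0 is the blank). */
    const int alphabet_size = 5, minibatch = 1;
    float activations[5] = {0.1f, 0.6f, 0.1f, 0.1f, 0.1f}; /* pre-softmax */
    float gradients[5];
    int flat_labels[1]   = {1};  /* labels for the minibatch, flattened */
    int label_lengths[1] = {1};
    int input_lengths[1] = {1};  /* timesteps per example */
    float costs[1];

    ctcOptions options = {0};
    options.loc = CTC_CPU;
    options.num_threads = 1;     /* OpenMP threads for the CPU path */

    /* The library never allocates internally: ask it how much scratch
       space it needs, allocate that ourselves, and pass it in. */
    size_t workspace_bytes;
    get_workspace_size(label_lengths, input_lengths, alphabet_size,
                       minibatch, options, &workspace_bytes);
    void* workspace = malloc(workspace_bytes);

    compute_ctc_loss(activations, gradients, flat_labels, label_lengths,
                     input_lengths, alphabet_size, minibatch, costs,
                     workspace, options);

    printf("CTC loss: %f\n", costs[0]);
    free(workspace);
    return 0;
}
```

For the GPU path the same pattern should apply, with options.loc set to CTC_GPU, a CUDA stream supplied, and the workspace allocated with cudaMalloc, since no allocation happens inside the library.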

Compilation

warp-ctc has been tested on Ubuntu 14.04 and OSX 10.10. Windows is not supported at this time.

First get the code:

git clone https://github.com/baidu-research/warp-ctc.git
cd warp-ctc

create a build directory:

mkdir build
cd build

If you have a non-standard CUDA install, export CUDA_BIN_PATH=/path_to_cuda so that CMake detects CUDA. To ensure Torch is detected, make sure th is in $PATH.

run cmake and build:

cmake ../
make

The C library and torch shared libraries should now be built along with test executables. If CUDA was detected, then test_gpu will be built; test_cpu will always be built.

Tests

To run the tests, make sure the CUDA libraries are in LD_LIBRARY_PATH (DYLD_LIBRARY_PATH for OSX).
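
For a typical Linux CUDA install this might look like the following (the exact path varies by system):

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH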

The Torch tests must be run from the torch_binding/tests/ directory.

Torch Installation

luarocks make torch_binding/rocks/warp-ctc-scm-1.rockspec

You can also install without cloning the repository using

luarocks install http://raw.githubusercontent.com/baidu-research/warp-ctc/master/torch_binding/rocks/warp-ctc-scm-1.rockspec

There is a Torch CTC tutorial.

Contributing

We welcome improvements from the community; please feel free to submit pull requests.

Known Issues / Limitations

The CUDA implementation requires a device of at least compute capability 3.0.

The CUDA implementation supports a maximum label length of 639 (timesteps are unlimited).

warp-ctc's People

Contributors

arkizh, bryancatanzaro, bshillingford, dzhwinter, ekelsen, est31, gangliao, hawkaaron, iassael, jaredcasper, lunararcanus, luoyetx, pmixer, shubho, umhan35, wangchaochaohu, windstamp, xreki, zhwesky2010, zlsh80826


warp-ctc's Issues

Installation on Ubuntu 16.04 fails

[ 10%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o
/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined

/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_00001829_00000000-16_reduce.compute_52.cpp1.ii".
CMake Error at warpctc_generated_reduce.cu.o.cmake:266 (message):
  Error generating file
  /home/sarvex/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o


CMakeFiles/warpctc.dir/build.make:70: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o] Error 1
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/warpctc.dir/all' failed
    make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
    Makefile:127: recipe for target 'all' failed
    make: *** [all] Error 2

Number of epochs required

Hello,
Can I kindly ask for advice on the recommended number of epochs to run to achieve non-blank predictions when using the CTC code? My network always tends to predict the blank symbol, and I think this is an effect of the number of epochs run (30 for now).
Can someone give advice on the issue? Do you expect such behavior to be an effect of the number of epochs, or of the size of the training data?

Any help would be very much appreciated. Thank you !

Please look at the error when I run "test_cpu"

Following the Compilation instructions above, I built the "test_cpu" executable.
When I try to run the program, it errors like this:
$ ./test_cpu
./test_cpu: error while loading shared libraries: libwarpctc.so: cannot open shared object file: No such file or directory
"libwarpctc.so" exists. What is the problem?

Torch installation without cloning fails

When I try to install warp-ctc for Torch, it simply fails with the error as shown below.

$> luarocks install http://raw.githubusercontent.com/baidu-research/warp-ctc/master/torch_binding/rocks/warp-ctc-scm-1.rockspec

Using http://raw.githubusercontent.com/baidu-research/warp-ctc/master/torch_binding/rocks/warp-ctc-scm-1.rockspec... switching to 'build' mode

Error: Error fetching file: Failed downloading http://raw.githubusercontent.com/baidu-research/warp-ctc/master/torch_binding/rocks/warp-ctc-scm-1.rockspec - warp-ctc-scm-1.rockspec

However, manually cloning the repo works.

installing without CUDA

I'm trying to install warp-ctc on a google compute instance that does not have CUDA.

Below is the output from both the cmake and make step:

CMakeLists.txt	doc  examples  include	LICENSE  python  README.md  src  tests
root@rnn-permanent-kaldi:/srv/deepspeech/src/transforms/warp-ctc# mkdir build
root@rnn-permanent-kaldi:/srv/deepspeech/src/transforms/warp-ctc# cd build/
root@rnn-permanent-kaldi:/srv/deepspeech/src/transforms/warp-ctc/build# cmake ..
-- The C compiler identification is GNU 4.9.2
-- The CXX compiler identification is GNU 4.9.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing:  CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (Required is at least version "6.5")
-- cuda found FALSE
-- Building shared library with no GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /srv/deepspeech/src/transforms/warp-ctc/build
root@rnn-permanent-kaldi:/srv/deepspeech/src/transforms/warp-ctc/build# make
Scanning dependencies of target warpctc
[ 50%] Building CXX object CMakeFiles/warpctc.dir/src/ctc_entrypoint.cpp.o
/srv/deepspeech/src/transforms/warp-ctc/src/ctc_entrypoint.cpp:49:30: error: ‘cudaStream_t’ has not been declared
                              cudaStream_t stream,
                              ^
/srv/deepspeech/src/transforms/warp-ctc/src/ctc_entrypoint.cpp: In function ‘int compute_ctc_gpu(const float*, float*, const int*, const int*, const int*, int, int, float*, int, char*)’:
/srv/deepspeech/src/transforms/warp-ctc/src/ctc_entrypoint.cpp:50:53: error: conflicting declaration of C function ‘int compute_ctc_gpu(const float*, float*, const int*, const int*, const int*, int, int, float*, int, char*)’
                              char *ctc_gpu_workspace){
                                                     ^
In file included from /srv/deepspeech/src/transforms/warp-ctc/src/ctc_entrypoint.cpp:5:0:
/srv/deepspeech/src/transforms/warp-ctc/include/ctc.h:99:5: note: previous declaration ‘int compute_ctc_gpu(const float*, float*, const int*, const int*, const int*, int, int, float*, CUstream, char*)’
 int compute_ctc_gpu(const float* const activations,
     ^
CMakeFiles/warpctc.dir/build.make:54: recipe for target 'CMakeFiles/warpctc.dir/src/ctc_entrypoint.cpp.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/ctc_entrypoint.cpp.o] Error 1
CMakeFiles/Makefile2:95: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:117: recipe for target 'all' failed
make: *** [all] Error 2

I'm not familiar enough with C compilation to know what this error is or how to fix it, but I'm assuming it has something to do with the makefile not finding CUDA.

Any help would be greatly appreciated.

test_gpu Fails!

I am unable to run ./test_gpu; I've copied the result of ldd test_gpu below:
linux-vdso.so.1 => (0x00007ffe271e6000)
libcudart.so.7.5 => /usr/local/cuda/lib64/libcudart.so.7.5 (0x00007f96c0807000)
libwarpctc.so => /home/sarunac4/baidu/warp-ctc/build/libwarpctc.so (0x00007f96c0450000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f96c014c000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f96bfe46000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f96bfc30000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f96bf86b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f96bf667000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f96bf449000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f96bf241000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f96bf032000)
/lib64/ld-linux-x86-64.so.2 (0x00007f96c0a65000)
But when I try running ./test_gpu I get the error
Running GPU tests
terminate called after throwing an instance of 'thrust::system::system_error'
what(): cudaSetDevice: unknown error
Aborted (core dumped)

I am running on Ubuntu 14.04, with a TitanX GPU and have the NVIDIA driver version 352.63 installed, any help would be appreciated.

Regards,
Deepak Kadetotad
PhD Student

Loss is inf

When I train a model using warp-ctc, the CTC criterion returns a loss that is inf. Is there anything wrong? How can I solve this problem?

The installation of warp-ctc

Hello,
May I kindly ask for help with the installation of warp-ctc? When I install warp-ctc by running "luarocks make torch_binding/rocks/warp-ctc-scm-1.rockspec" at the top-level directory, an error occurs as follows:
Missing dependencies for warp-ctc:
torch >= 7.0

Error: Could not satisfy dependency: torch >= 7.0
I don't have any idea about this.
Has anyone encountered this problem? Can you give me some advice? Any help would be appreciated. Thanks.

torch tutorial

In the first paragraph, the tutorial says it uses the four characters 'abcd', but why does it produce 'daceba' at the end?

Attribute error: Module object has no attribute "warpCTC"

Traceback (most recent call last):
File "lstm_ocr.py", line 186, in
symbol = sym_gen(SEQ_LENGTH)
File "lstm_ocr.py", line 177, in sym_gen
num_label = num_label)
File "/home/pratikgoyal/Desktop/lstm.py", line 76, in lstm_unroll
sm = mx.sym.WarpCTC(data=pred, label=label, label_length = num_label, input_length = seq_len)

I'm getting the attribute error mentioned above while running lstm_ocr.py.

Using the torch bindings to experiment with CTC

Hello,
May I kindly ask for some help? While using the torch bindings to experiment with CTC, I want to use the code to do some calculation via require 'cutorch'; however, an error occurs as follows:
[screenshot]

And I have compiled with GPU support and a standard CUDA install.
When I do the experiment on the CPU, the situation is as follows:
[screenshot]

So I have no idea why there is a difference between them.
Has anyone encountered this problem? Can you give me some advice? Any help would be appreciated. Thanks.

Building error

Have you encountered the following error?

Scanning dependencies of target warpctc
Linking CXX shared library libwarpctc.so
/usr/bin/ld: cannot find -lTHC
collect2: error: ld returned 1 exit status
make[2]: *** [libwarpctc.so] Error 1
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
make: *** [all] Error 2

How to overcome it?

dlopen: cannot load any more object with static TLS

warpctc_tensorflow sometimes raises this error if it is imported after tensorflow imports another library that uses a *.so file, like in the following case:

>>> import tensorflow.contrib.ffmpeg
>>> import warpctc_tensorflow

tensorflow.python.framework.errors_impl.NotFoundError: dlopen: cannot load any more object with static TLS

A temporary fix is to just import warpctc_tensorflow beforehand, which seems to not trigger the error.

>>> import warpctc_tensorflow
>>> import tensorflow.contrib.ffmpeg

This fix, however, is quite ugly, and often means importing warpctc_tensorflow in the central main.py or train.py, as well as all associated test runners, etc. The actual ctc function is often only called in a small library function.

The gpu_ctc function returns 0

Hello, when I use the gpu_ctc function it always returns 0, but with cpu_ctc it returns the correct value. Can anyone give me some advice on this? Any help would be appreciated, many thanks.

[screenshot]

[screenshot]

Does not work on GTX 1080

error test_gpu:

Running GPU tests
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss in small_test, stat = execution failed
Aborted (core dumped)

Building error, OpenBlas

make[2]: *** No rule to make target `/opt/OpenBLAS/lib/libopenblas.so', needed by `libwarpctc.so'.  Stop.
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
make: *** [all] Error 2

I have OpenBLAS installed not in /opt/OpenBLAS/lib, but in another local folder (I do not have root access). Is it possible to specify where the library should look for OpenBLAS?

Gradient output from batch using torch binding

Sorry if this is answered elsewhere or blatantly obvious, but I'm not entirely sure of the formatting of the gradients after carrying out a batch like below:

th>acts = torch.Tensor({{0,0,0,0,0},{1,2,3,4,5},{-5,-4,-3,-2,-1},
                        {0,0,0,0,0},{6,7,8,9,10},{-10,-9,-8,-7,-6},
                        {0,0,0,0,0},{11,12,13,14,15},{-15,-14,-13,-12,-11}}):cuda()
th>labels = {{1}, {3,3}, {2,3}}
th>sizes = {1,3,3}
th>grads = torch.Tensor(acts:size())
th>gpu_ctc(acts, grads, labels, sizes)

{
  1 : 1.6094379425049
  2 : 7.355742931366
  3 : 4.938850402832
}

Should we expect the gradients to also be in column-major format, matching how we laid out the multiple input sequences (i.e., would we need to reverse, for the gradients, the batching steps we applied to the activations)? Thanks!

Decoding?

Just making sure: this repo only contains the code used for training, not for decoding. Is that correct? (The Deep Speech 2 paper mentions using beam search to find the optimal transcription, but I don't see this in the code.)

Support for Lua5.2?

Is warp-ctc not supported for Lua 5.2? I had previously installed this package successfully on my Torch distribution built with Lua 5.1; however, I had to upgrade my Torch to Lua 5.2, and suddenly the warp-ctc installation fails.

mohit.jain@node10:~/torch/warp-ctc$ luarocks make torch_binding/rocks/warp-ctc-scm-1.rockspec
Warning: unmatched variable LUALIB
cmake -E make_directory build && cd build && cmake .. -DLUALIB= -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/users/mohit.jain/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/users/mohit.jain/torch/install/lib/luarocks/rocks/warp-ctc/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) && make install

-- cuda found TRUE
-- Found Torch7 in /users/mohit.jain/torch/install
-- Torch found /users/mohit.jain/torch/install/share/cmake/torch
-- Building shared library with GPU support
-- Building Torch Bindings with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /users/mohit.jain/torch/warp-ctc/build
Linking CXX shared library libwarpctc.so
/usr/bin/ld: cannot find -lluajit
collect2: error: ld returned 1 exit status
make[2]: *** [libwarpctc.so] Error 1
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

mohit.jain@node10:~/torch/warp-ctc/build$ cmake ../
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found CUDA: /usr/local/cuda (found suitable version "7.5", minimum required is "6.5") 
-- cuda found TRUE
-- Found Torch7 in /users/mohit.jain/torch/install
-- Torch found /users/mohit.jain/torch/install/share/cmake/torch
-- Building shared library with GPU support
-- Building Torch Bindings with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /users/mohit.jain/torch/warp-ctc/build
mohit.jain@node10:~/torch/warp-ctc/build$ make
[ 16%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o
[ 33%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/./warpctc_generated_ctc_entrypoint.cu.o
Scanning dependencies of target warpctc
Linking CXX shared library libwarpctc.so
/usr/bin/ld: cannot find -lluajit
collect2: error: ld returned 1 exit status
make[2]: *** [libwarpctc.so] Error 1
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
make: *** [all] Error 2

Input size much larger than label length

I find that if the input sequence length (e.g., 10) is much larger than the label length (e.g., 4), the probability of blank becomes much larger than that of the other labels.

How can I solve this problem?

Build warp-ctc shared/static libs

Sometimes, we need to build a static warp-ctc library.

Also, we don't want to bind with Torch.

So, the PR is here: #65. It does not change the current behavior of warp-ctc.

Meaning of N

Hello,

Could you please explain what N symbol means in README.md benchmark tables?
L, A, T dimensions are described well but I cannot find a description for N.

Sorry if I am missing something obvious.
Anyway, thank you for your work,
Pavel

warp-ctc for cmake ExternalProject_Add

I planned to integrate warp-ctc into PaddlePaddle via ExternalProject_Add.

But there is a problem with the current warp-ctc CMake build system, which automatically checks whether the host system supports CUDA.

We want to control CPU/GPU from our own system, not from warp-ctc.

ExternalProject_Add(
    warpctc
    GIT_REPOSITORY "https://github.com/baidu-research/warp-ctc.git"
    GIT_TAG "v1.0"
    PREFIX ${WARPCTC_SOURCES_DIR}
    CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${WARPCTC_INSTALL_DIR}
    CMAKE_ARGS -DWITH_GPU=ON   # we want to add this flag in warp-ctc
    LOG_DOWNLOAD=ON
    UPDATE_COMMAND ""
)

I want to give a PR for warp-ctc's CMakeLists.txt.

option(WITH_GPU "compile warp-ctc with gpu" ${CUDA_FOUND})

"gpu_ctc" problem ?

When I try the example, I encounter a problem as follow.
"
th> gpu_ctc -h
[string "_RESULT={gpu_ctc -h}"]:1: attempt to perform arithmetic on global 'gpu_ctc' (a nil value)
stack traceback:
[string "_RESULT={gpu_ctc -h}"]:1: in main cheunk
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x00406670
"

Installation problems (TF bindings)

I'm trying to install the tensorflow binding and have followed these steps:

Cloned the tf repo (without building it; tf is installed via pip).
Successfully installed the package.
It fails on the tests:

test_ctc_loss_op (unittest.loader._FailedTest) ... ERROR
test_warpctc_op (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: test_ctc_loss_op (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_ctc_loss_op
Traceback (most recent call last):
  File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/tmp/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 23, in <module>
    import warpctc_tensorflow
  File "/tmp/warp-ctc/tensorflow_binding/warpctc_tensorflow/__init__.py", line 7, in <module>
    _warpctc = tf.load_op_library(lib_file)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/warp-ctc/tensorflow_binding/warpctc_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE


======================================================================
ERROR: test_warpctc_op (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_warpctc_op
Traceback (most recent call last):
  File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/tmp/warp-ctc/tensorflow_binding/tests/test_warpctc_op.py", line 3, in <module>
    from warpctc_tensorflow import ctc
  File "/tmp/warp-ctc/tensorflow_binding/warpctc_tensorflow/__init__.py", line 7, in <module>
    _warpctc = tf.load_op_library(lib_file)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/warp-ctc/tensorflow_binding/warpctc_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE


----------------------------------------------------------------------
Ran 2 tests in 0.000s

FAILED (errors=2)
/t/w/tensorflow_binding »            

Ls of directory in question:

/t/w/tensorflow_binding » ls /tmp/warp-ctc/tensorflow_binding/warpctc_tensorflow      master ✔
__pycache__  __init__.py  kernels.cpython-35m-x86_64-linux-gnu.so

Installation question: "/home/geff/kaldi-ctc-master/tools/warp-ctc/src/ctc_entrypoint.cu(1): error: this declaration has no storage class or type specifier"

Hello, when I install warp-ctc, I get this error:
[ 14%] Building NVCC (Device) object CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o
/home/geff/kaldi-ctc-master/tools/warp-ctc/src/ctc_entrypoint.cu(1): error: this declaration has no storage class or type specifier

/home/geff/kaldi-ctc-master/tools/warp-ctc/src/ctc_entrypoint.cu(1): error: expected a ";"

2 errors detected in the compilation of "/tmp/tmpxft_00001c98_00000000-16_ctc_entrypoint.compute_52.cpp1.ii".
CMake Error at warpctc_generated_ctc_entrypoint.cu.o.cmake:266 (message):
Error generating file
/home/geff/kaldi-ctc-master/tools/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_ctc_entrypoint.cu.o

CMakeFiles/warpctc.dir/build.make:187: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_ctc_entrypoint.cu.o] Error 1
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

Has anyone encountered this problem? Can you give me some advice? Any help would be appreciated. Thanks.

make error on mac osx 10.12

Hi, when I make warp-ctc on mac osx 10.12, I get this error:

nvcc fatal   : The version ('80000') of the host compiler ('Apple clang') is not supported

My Xcode version is 8.2.1. Besides downgrading my Xcode version, how can I solve this problem?

Thanks for your reply.

Errors: test_gpu doesn't work

Hi all
I have these errors when I try to run 'test_gpu':

Running GPU tests
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: compute_ctc_loss in small_test, stat = execution failed
Aborted (core dumped)

Note that: I can run 'test_cpu' without any problems (Running CPU tests, Tests pass)

I was wondering whether I might be missing some libraries or something else.
Every kind of suggestion would be appreciated!

Thank you so much.

Warp-CTC error on GPU : "cuda memcpy or memset failed"

I compiled mxnet with the warp-ctc plugin.
My env is: Ubuntu 14.04 + CUDA 8.0 + cuDNN 5.1 + Torch 7.0, GTX960.

When I compile warp-ctc, everything is normal, and "warp-ctc/build/test_gpu" passes.

I rebuilt mxnet successfully, except that it shows the warning: "nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning)."

Whether I execute "example/image-classification/train_mnist.py" on mx.context.cpu(0) or mx.context.gpu(0), it works normally.

But when I execute "example/warpctc/toy_ctc.py" on mx.context.gpu(0), an error occurs:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss, stat = cuda memcpy or memset failed"

And if I use the CPU, it is OK.

How can I solve this problem? Thanks.

'THC.h' file not found error

I've tried to install the torch binding, but always get this error:

/tmp/luarocks_warp-ctc-scm-1-3075/warp-ctc/torch_binding/binding.cpp:16:14: fatal error:
  'THC.h' file not found
#include "THC.h"

I'm using Mac OSX 11.
Could you please tell me what kind of error this is?

test_gpu fails on GTX 1060

The execution of test_gpu fails on a GTX 1060 with a freshly installed Ubuntu 16.04 and CUDA 8.0.44. This happens even after following the solutions given in #40 and #46. More specifically, I cloned the master branch and added the following to CMakeLists.txt:

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_53,code=sm_53")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_60,code=sm_60")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_61,code=sm_61")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -gencode arch=compute_62,code=sm_62")

then compiled, and when executing test_gpu I get

Running GPU tests
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss in small_test, stat = execution failed
Aborted (core dumped)

Why the software history was not kept?

Hi there,

I'm a researcher studying software evolution. As part of my current research, I'm studying the implications of open-sourcing proprietary software, for instance, whether the project succeeds in attracting newcomers. However, I observed that some projects, like warp-ctc, deleted the software history during the transition to open source.

Knowing that software history is indispensable for developers (e.g., developers need to refer to history several times a day), I would like to ask warp-ctc developers the following four brief questions:

  1. Why did you decide not to keep the software history?
  2. Did the core developers face any kind of problems when trying to refer to the old history? If so, how did they solve them?
  3. Did newcomers face any kind of problems when trying to refer to the old history? If so, how did they solve them?
  4. How has the lack of history impacted software evolution? Has it placed any burden on understanding and evolving the software?

Thanks in advance for your collaboration,

Gustavo Pinto, PhD
http://www.gustavopinto.org

infinite CTC costs

Apologies if I misunderstood something, but running the following code seems to return infinite CTC costs, though the gradients are fine.

th> require 'warp_ctc'
th> acts = torch.Tensor({{0,-150,0,0,0}}):float()
th> grads = torch.zeros(acts:size()):float()
th> labels = {{1}}
th> sizes = {1}
th> cpu_ctc(acts, grads, labels, sizes)
{
  1 : inf
}
th> print(grads)
 0.2500  0.0000  0.2500  0.2500  0.2500
[torch.FloatTensor of size 1x5]

Is this simply something that we have to guard against in our own Softmax code?

tensorflow binding error: "Could not find file or directory /root/tensorflow/_python_build/tensorflow/include."

I want to use warpctc for tensorflow, but I encounter some problems.

I installed tensorflow from source and set all the necessary environment variables mentioned in the installation instructions. However, when I go to the tensorflow_binding dir and run the command "python setup.py install", the terminal shows "Could not find file or directory /root/tensorflow/_python_build/tensorflow/include." Indeed, I cannot find that directory in tensorflow.

dir /root/tensorflow/_python_build/tensorflow/
---------------------------
contrib  core  examples  __init__.py  __init__.pyc  models  python  stream_executor  tensorboard  tools

So I edited "setup.py", changed "tf_includes=[tf_include, tf_src_dir]" to "tf_includes=[tf_src_dir]", and then ran "python setup.py install", which installed warpctc_tensorflow successfully.

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
running install
running bdist_egg
running egg_info
writing warpctc_tensorflow.egg-info/PKG-INFO
writing top-level names to warpctc_tensorflow.egg-info/top_level.txt
writing dependency_links to warpctc_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warpctc_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warpctc_tensorflow.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/warpctc_tensorflow
copying build/lib.linux-x86_64-2.7/warpctc_tensorflow/__init__.py -> build/bdist.linux-x86_64/egg/warpctc_tensorflow
copying build/lib.linux-x86_64-2.7/warpctc_tensorflow/kernels.so -> build/bdist.linux-x86_64/egg/warpctc_tensorflow
byte-compiling build/bdist.linux-x86_64/egg/warpctc_tensorflow/__init__.py to __init__.pyc
creating stub loader for warpctc_tensorflow/kernels.so
byte-compiling build/bdist.linux-x86_64/egg/warpctc_tensorflow/kernels.py to kernels.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying warpctc_tensorflow.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying warpctc_tensorflow.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying warpctc_tensorflow.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying warpctc_tensorflow.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
warpctc_tensorflow.__init__: module references __path__
creating 'dist/warpctc_tensorflow-0.1-py2.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing warpctc_tensorflow-0.1-py2.7-linux-x86_64.egg
creating /usr/local/lib/python2.7/dist-packages/warpctc_tensorflow-0.1-py2.7-linux-x86_64.egg
Extracting warpctc_tensorflow-0.1-py2.7-linux-x86_64.egg to /usr/local/lib/python2.7/dist-packages
Adding warpctc-tensorflow 0.1 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/warpctc_tensorflow-0.1-py2.7-linux-x86_64.egg
Processing dependencies for warpctc-tensorflow==0.1
Finished processing dependencies for warpctc-tensorflow==0.1

However, when I run "import warpctc_tensorflow", an error occurs:

>>> import warpctc_tensorflow
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/warpctc_tensorflow/__init__.py", line 7, in <module>
    _warpctc = tf.load_op_library(lib_file)
  File "/root/tensorflow/_python_build/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python2.7/dist-packages/warpctc_tensorflow/kernels.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE

In addition, the version of my tensorflow is '0.11.head'.

tensorflow.__version__
'0.11.head'

Could anyone give any suggestions? Thank you.

Failing GPU tests on CUDA 8

I'm running on a GTX 1070. This is compiled with CUDA 8.0 release candidate.

The ./test_gpu script fails with the following error:

Running GPU tests
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss in small_test, stat = execution failed
Aborted (core dumped)

Attaching a debugger, I see:

(gdb) run
Starting program: /warp-ctc/build/test_gpu 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running GPU tests
[New Thread 0x7fffef84b700 (LWP 10325)]
[New Thread 0x7fffef04a700 (LWP 10326)]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss in small_test, stat = execution failed

Program received signal SIGABRT, Aborted.
0x00007ffff6d55c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6d55c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6d59028 in __GI_abort () at abort.c:89
#2  0x00007ffff7660535 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff765e6d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff765e703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff765e922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x0000000000403e33 in throw_on_error (message=0x4097f8 "Error: compute_ctc_loss in small_test", status=<optimized out>) at /storage/deep_learning/warp-ctc/tests/test.h:11
#7  small_test () at /storage/deep_learning/warp-ctc/tests/test_gpu.cu:63
#8  0x000000000040360f in main () at /storage/deep_learning/warp-ctc/tests/test_gpu.cu:333

TF Binding NaN loss issue

When I switched from tf.nn.ctc_loss to warpctc_tensorflow.ctc, I got NaN loss early in training, even though tf.nn.ctc_loss could learn normally.

[screenshot]

I don't know how to make this issue reproducible in a minimal configuration.

I had experienced the NaN loss issue frequently (though not always) with the torch binding, and at the time I thought the NaNs came from the training set. But now that I can switch between the tf implementation of CTC loss and warp-ctc, I suspect this NaN loss issue originates in the warp-ctc core.

In my experience with the torch binding of warp-ctc, it would often return NaN, inf, or -inf loss.

For blank sequence of labels

If the target output for an input sequence is entirely blank, how should the table of target labels be set for that sequence? Is it right to set it to an empty table {} or to a table of {0}?
Thank you.

C tutorial

I am attempting to integrate warp-ctc into an existing C++ project, but can't quite work out the initialisation. Is it possible to give a small-scale example in C of how to set up and call compute_ctc_loss?

Error installing Tensorflow binding

I tried the following commands and got errors; any ideas? Please help.

cd warp-ctc
mkdir build
cd build
make
then
cd ../tensorflow_binding

sudo TENSORFLOW_SRC_PATH=../tensorflow python setup.py test

...
building 'warpctc_tensorflow.kernels' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/local/lib/python2.7/dist-packages/tensorflow/include -I../tensorflow -I/home/siva/git/warp-ctc/tensorflow_binding/../include -I/usr/include/python2.7 -c src/ctc_op_kernel.cc -o build/temp.linux-x86_64-2.7/src/ctc_op_kernel.o -std=c++11 -fPIC -Wno-return-type
In file included from src/ctc_op_kernel.cc:7:0:
../tensorflow/../tensorflow/core/framework/op_kernel.h:516:5: error: ‘ScopedStepContainer’ does not name a type
ScopedStepContainer* step_container = nullptr;
^
../tensorflow/../tensorflow/core/framework/op_kernel.h:943:3: error: ‘ScopedStepContainer’ does not name a type
ScopedStepContainer* step_container() const {
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Tensorflow Test Error

Hi,

I've installed the tensorflow binding for warp_ctc, the installation went without a hitch but after running the commandpython setup.py test I end up getting an error, I have pasted the output to the command below

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
running test
running egg_info
writing warpctc_tensorflow.egg-info/PKG-INFO
writing top-level names to warpctc_tensorflow.egg-info/top_level.txt
writing dependency_links to warpctc_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warpctc_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warpctc_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-2.7/warpctc_tensorflow/kernels.so -> warpctc_tensorflow
running test
running egg_info
writing warpctc_tensorflow.egg-info/PKG-INFO
writing top-level names to warpctc_tensorflow.egg-info/top_level.txt
writing dependency_links to warpctc_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warpctc_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warpctc_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-2.7/warpctc_tensorflow/kernels.so -> warpctc_tensorflow
testBasicCPU (test_ctc_loss_op.CTCLossTest) ... I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.77GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
ok
testBasicGPU (test_ctc_loss_op.CTCLossTest) ... I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
ERROR
test_session (test_ctc_loss_op.CTCLossTest)
Returns a TensorFlow Session for use in executing tests. ... ok
test_basic_cpu (test_warpctc_op.WarpCTCTest) ... I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
ok
test_basic_gpu (test_warpctc_op.WarpCTCTest) ... I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
ok
test_multiple_batches_cpu (test_warpctc_op.WarpCTCTest) ... I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
ok
test_multiple_batches_gpu (test_warpctc_op.WarpCTCTest) ... I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
ok
test_session (test_warpctc_op.WarpCTCTest)
Returns a TensorFlow Session for use in executing tests. ... ok

======================================================================
ERROR: testBasicGPU (test_ctc_loss_op.CTCLossTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 227, in testBasicGPU
    self._testBasic(use_gpu=True)
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 220, in _testBasic
    self._testCTCLoss(inputs, seq_lens, labels, loss_truth, grad_truth, use_gpu=use_gpu)
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 83, in _testCTCLoss
    (tf_loss, tf_grad) = sess.run([loss, grad])
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: Cannot assign a device to node 'CTCLoss': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
         [[Node: CTCLoss = CTCLoss[_kernel="WarpCTC", ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/device:GPU:0"](Const_3, Const, Const_1, CTCLoss/sequence_length)]]

Caused by op u'CTCLoss', defined at:
  File "setup.py", line 126, in <module>
    test_suite = 'setup.discover_test_suite',
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/setuptools/command/test.py", line 210, in run
    self.run_tests()
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/setuptools/command/test.py", line 231, in run_tests
    testRunner=self._resolve_as_ep(self.test_runner),
  File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python2.7/unittest/main.py", line 149, in parseArgs
    self.createTests()
  File "/usr/lib/python2.7/unittest/main.py", line 158, in createTests
    self.module)
  File "/usr/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python2.7/unittest/loader.py", line 91, in loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/setup.py", line 126, in <module>
    test_suite = 'setup.discover_test_suite',
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/setuptools/command/test.py", line 210, in run
    self.run_tests()
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/setuptools/command/test.py", line 231, in run_tests
    testRunner=self._resolve_as_ep(self.test_runner),
  File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/lib/python2.7/unittest/runner.py", line 151, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 227, in testBasicGPU
    self._testBasic(use_gpu=True)
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 220, in _testBasic
    self._testCTCLoss(inputs, seq_lens, labels, loss_truth, grad_truth, use_gpu=use_gpu)
  File "/home/sarunac4/RNN/warp-ctc/tensorflow_binding/tests/test_ctc_loss_op.py", line 76, in _testCTCLoss
    sequence_length=seq_lens)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/ctc_ops.py", line 144, in ctc_loss
    ctc_merge_repeated=ctc_merge_repeated)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 162, in _ctc_loss
    name=name)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
    op_def=op_def)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2380, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/sarunac4/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1298, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'CTCLoss': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
         [[Node: CTCLoss = CTCLoss[_kernel="WarpCTC", ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/device:GPU:0"](Const_3, Const, Const_1, CTCLoss/sequence_length)]]


----------------------------------------------------------------------
Ran 8 tests in 0.616s

FAILED (errors=1)

Please let me know how to fix this.

Regards,
Deepak

TF binding runtime error

I tried to run the tests but get the following error. I tried using -D_GLIBCXX_USE_CXX11_ABI=0 in setup.py and still get errors. Please help.
_warpctc = tf.load_op_library(lib_file)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
None, None, error_msg, error_code)
/warp-ctc/tensorflow_binding/warpctc_tensorflow/kernels.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE
