kaldi-asr / kaldi Goto Github PK

kaldi-asr/kaldi is the official location of the Kaldi project.

License: Other

Shell 45.79% Perl 5.27% Python 8.79% HTML 0.36% Awk 0.01% Makefile 0.24% TeX 1.17% MATLAB 0.07% C++ 35.85% C 1.22% Cuda 1.06% Java 0.07% Dockerfile 0.02% CMake 0.08% Cython 0.01%

kaldi c-plus-plus cuda shell speech-recognition speech-to-text speaker-verification speaker-id speech

kaldi's Introduction

Kaldi Speech Recognition Toolkit

To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux; Darwin; and Cygwin (has not been tested on more "exotic" varieties of UNIX). For Windows installation instructions (excluding Cygwin), see windows/INSTALL.

To run the example system builds, see egs/README.txt

If you encounter problems (and you probably will), please do not hesitate to contact the developers (see below). In addition to specific questions, please let us know if there are specific aspects of the project that you feel could be improved, that you find confusing, etc., and which missing features you most wish it had.

Kaldi information channels

For HOT news about Kaldi see the project site.

Documentation of Kaldi:

Info about the project, description of techniques, tutorial for C++ coding.
Doxygen reference of the C++ code.

Kaldi forums and mailing lists:

We have two different lists

User list kaldi-help
Developer list kaldi-developers:

To sign up to any of those mailing lists, go to http://kaldi-asr.org/forums.html:

Development pattern for contributors

Create a personal fork of the main Kaldi repository in GitHub.
Make your changes in a named branch different from master, e.g. you create a branch my-awesome-feature.
Generate a pull request through the Web interface of GitHub.
As a general rule, please follow Google C++ Style Guide. There are a few exceptions in Kaldi. You can use the Google's cpplint.py to verify that your code is free of basic mistakes.

Platform specific notes

PowerPC 64bits little-endian (ppc64le)

Kaldi is expected to work out of the box in RHEL >= 7 and Ubuntu >= 16.04 with OpenBLAS, ATLAS, or CUDA.
CUDA drivers for ppc64le can be found at https://developer.nvidia.com/cuda-downloads.
An IBM Redbook is available as a guide to install and configure CUDA.

Android

Kaldi supports cross compiling for Android using Android NDK, clang++ and OpenBLAS.
See this blog post for details.

Web Assembly

Kaldi supports cross compiling for Web Assembly for in-browser execution using emscripten and CLAPACK.
See this post for a step-by-step description of the build process.

kaldi's People

Contributors

Stargazers

Watchers

Forkers

danpovey kkm000 amitbeka rikrd motlicek fotwo shuang777 janchorowski jankim ericgridspace vijayaditya gongbudaizhe jtrmal geo-gkartz karelvesely84 wbgxx333 haoqi pengeorge vimalmanohar chenguoguo yolaso guker vdp hdubey pswietojanski rickychanhoyin david-ryan-snyder tanmay-10et90r03 ilya-eder workflow-demo-org tomkocse jbender hyperdrive nlphacker sanyaade-speechtools claudiouzelac liuliu alwaleedhadi myechona datachand shkyang fedorajzf kronos-cm colingogo keli78 kwin-wang dresen akhti mistsc lightslife alumae freewym pegahgh xiaohui-zhang zoucan520 mindis xyshi25 shynggys tiphanied lucknotlove andrenatal minhua722 tribemedia sw005320 sayiho rizar wenlin-zhang irenexu199 aymannedaa siilats nouramoubayed nvb2005 aberforth19 teroxoid bingyanzen eliemichel nbravo hainan-xv grtkachenko mitch1991 jzombi jvhout tjuguozhen mallidi levanolis deeplearningsprint leonidwang yangxian10 yajiemiao alonshoa hhadian nryant harley digitalsword dongjoon-hyun epromram qutsaivt lovish1234 mcanthony jinhong9106

kaldi's Issues

CUDA interface change and new function

@naxingyu, it would be great if you could help with this because you have been working on similar code recently. This is something I need for my work on CTC (which I am working on currently in my personal fork of Kaldi). I'll add you to the paper if you can help with this.

The following code currently exists in cu-matrix.h:

// This function, adds a list of MatrixElements (scaled by alpha) to corresponding locations to                                                                    
// (*this).                                                                                                                                                        
void AddElements(Real alpha, const std::vector<MatrixElement<Real> >& input);

// This function resizes the output to indexes.size(), and for each element of                                                                                     
// "indexes" it interprets it as a (row, column) index into *this, and puts                                                                                        
// (*this)(row, column) into the corresponding element of "output".                                                                                                
void Lookup(const std::vector<Int32Pair> &indexes,
            std::vector<Real> *output) const;

What I would like is (1) to change the interface of the existing Lookup to take a Real* pointer instead of a std::vector, for greater flexibility, (2) add a version of Lookup that takes CuArray instead, that would be called internally by the existing function in cases where CUDA is being used; and (3) add a version of AddElements that is more compatible with the Lookup interface, as follows (we can add an alpha to its interface later if it becomes necessary).

 /// For each i, with indexes[i] = (j, k), does (*this)(j, k) += input[i].  Requires, but does
 /// not check, that the vector of indexes does not contain repeated elements.  'input'
 /// is the start of an array of length equal to indexes.Dim().
 void AddElements(const CuArray<Int32Pair> &indexes, const Real *input);

nnet3- add binary nnet3-show-progress

This is for anyone who has the time.
Look at how nnet3-average differs from nnet-am-average to see how you should modify nnet-show-progress to make nnet3-show-progress. It may be necessary to add functions to nnet-utils.h.
The program should read the "raw" nnet format.

NetworkNode types

The documentation admits that "we have a slightly messy layout", and my question might be naive, but why not writing a different object for each type of node (that may inherit from a virtual NetworkNode class)?

Program is likely to break with new gcc

Hello everyone !

RPMLint has noticed a code breaking in const-arpa-lm.cc with -strict-alising.
I: Program is likely to break with new gcc. Try -fno-strict-aliasing.
W: kaldi strict-aliasing-punning const-arpa-lm.cc:551, 555, 580, 913

Kaldi has been built with gcc 5. The concerned lines are 551, 555, 580 and 913

--realign-epochs options in nnet2 training scripts

In help message:

--realign-epochs <list-of-epochs|"">           # A list of space-separated epoch indices the beginning of which

But:

train_pnorm_accel2.sh: invalid option --realign-epochs

There is a realign_times variable defined, but no realign_epochs, and nothing in the code to handle it anyway. Did I miss something?

(By the way, does this usually pays off?)

PS: I bet that this is the help message that is wrong, not the overall code.

Speaker-id TODO

Something Najim requested-- to print out in the iVector extractor training, how much of the variance of the features is being modeled by the iVectors versus the variance of the Gaussians. Probably for me, or possibly @david-ryan-snyder

logistic-regression bug

@david-ryan-snyder, I found a bug in the logistic-regression code.
This was after a test failure in the logistic-regression-test. It's not repeatable because the logistic-regression-test.cc does srand(time()), but if you put a loop inside logistic-regression-test.cc you'll get an error in free().

If I put the loop in logistic-regression-test.cc and then run
valgrind --db-attach=yes logistic-regression-test

I get the following error:

==28965== Invalid write of size 4
==28965== at 0x4501DE: kaldi::UnitTestTrain() (logistic-regression-test.cc:92)
==28965== by 0x4505D1: main (logistic-regression-test.cc:128)
==28965== Address 0x73d9b54 is 4 bytes after a block of size 17,680 alloc'd
==28965== at 0x4C270FE: memalign (vg_replace_malloc.c:694)
==28965== by 0x4C271A7: posix_memalign (vg_replace_malloc.c:835)
==28965== by 0x46E5EE: kaldi::Matrix::Init(int, int) (kaldi-matrix.cc:656)
==28965== by 0x46E2F0: kaldi::Matrix::Resize(int, int, kaldi::MatrixResizeType) (kaldi-matrix.cc:702)
==28965== by 0x450932: kaldi::Matrix::Matrix(int, int, kaldi::MatrixResizeType) (kaldi-matrix.h:744)
==28965== by 0x4501B4: kaldi::UnitTestTrain() (logistic-regression-test.cc:90)
==28965== by 0x4505D1: main (logistic-regression-test.cc:128)
==28965==
==28965==

0x00000000004501de in kaldi::UnitTestTrain () at logistic-regression-test.cc:92
92 xs_with_prior(i, n_xs) = 1.0;
(gdb) list
87
88 // Internally in LogisticRegression we add an additional element to
89 // the x vectors: a 1.0 which handles the prior.
90 Matrix xs_with_prior(n_xs, n_features + 1);
91 for (int32 i = 0; i < n_xs; i++) {
92 xs_with_prior(i, n_xs) = 1.0;
93 }
94 SubMatrix sub_xs(xs_with_prior, 0, n_xs, 0, n_features);
95 sub_xs.CopyFromMat(xs);
96
(gdb) p n_xs
$1 = 221

I think maybe you mean xs_with_prior(i, n_features) = 1.0, not n_xs.

Do you think this affects anything you have been doing?

CTC- some optimizations

@freewym, it would be good if you could help with this, with help from @vijayaditya.

Actually this relates to the 'tombstone' branch, which is in my personal repository, but I think you should be able to make a pull request to there.
If you do
cat /home/dpovey/kaldi-tombstone/egs/swbd/s5c/exp/ctc/tdnn_a/log/train.100.3.log
you will see that a fair amount of time is spent in Tensor3dCopy, see
Tensor3dCopy 8.79502s
If you learn a bit about CUDA, you'll see that memory access is best when threads in a thread block access sequential memory locations. In the Tensor3dCopy function, this means that if the x stride is 1, it will be most efficient, because x maps to the thread index.
I'd like you to add to the top of that function, code like the following:
if (the x stride is not 1 but the y or z stride is 1) {
swap the x stride and dim with whichever other index has stride 1.
}
I'd also like you to change the CPU version of the code so that instead of iterating over x, y, z it iterates over z, y, x. This will make it more efficient if the x stride is 1, for memory-locality reasons.

Multi-decision-tree support (RE ContextDependency object)

@hainan-xv, this is mostly for you, but I'm putting it here as an issue in case others have comments or can help implement it. My hope is that you can do the parts of this that involve the tree code, since you have some experience with it, while others (e.g. @naxingyu, or someone else if he is busy) can handle the sparse-matrix things.

The basic idea here is to support multiple-decision-tree combination in a single network. In fact the framework I am proposing will support something more general, where any given context-dependent state will correspond to a variable-size list of 'features' (e.g. corresponding to the monophone, the left and right phones, various trees corresponding to combinations of phones, the state, etc.). So the output of the last affine component of the network would correspond to some space of 'features' (e.g. dimension several thousand), and then there would be a fixed, sparse zero-one matrix after that that maps the 'features' to 'pdf-ids', and after that is the softmax, over a larger dimension (e.g. 100k). If you use your current multi-tree framework, the number of nonzero features for any context window will be the same, and equal to the number of trees.

A slight wrinkle in this plan is that it will be tricky to accurately estimate the priors of this larger dimension of features. We'll have to be more careful with the flooring, and also use more data to estimate the priors, i.e. not such a small subset. Note that when we re-use this code in the newer 'chain' models, the prior-division won't be an issue any more as it's not needed there. But for now just use more data when getting the counts for the priors. [actually, it's a good thing that we now estimate the priors from the posteriors of the network itself and don't only rely on the training-data alignments, which are sparse.]

In terms of code and scripts, I aim to mostly limit the changes to the tree-building stage of things, and leave things relating to the transition-model and the graph-building untouched. Currently the output of 'build-tree' is the ContextDependency object. In your multi-tree code, the output of 'build-tree-multi' will be two things: a ContextDependencyMulti object, which you can write to disk as a 'tree' file, and a sparse matrix of zeros and ones (whose column-dimension determines the 'feature dimension' of the network).

The ContextDependencyMulti object will, like ContextDependency, inherit from ContextDependencyInterface, and it will have all the public members; its function will be to map from a context to a pdf_id, and it will likely have to store an array of EventMap objects together with some other bookkeeping information to map pairs of those EventMap outputs to a single integer id. When building the object, you'll need more information than just the input EventMap objects- you'll also need to know the number of pdf-classes for each phone, which you can get from the topology object. You will also have to enumerate all windows of phones (e.g. triples, in the common case), and all pdf-classes of the central phone, in order to work out all possible combinations of outputs of the individual EventMap objects. You should try to support the case where the context-windows and central-positions have different values (e.g. so we can combine left-biphone and right-biphone trees). This will require a little bookkeeping. Thinking about it in terms of the number of phones of left-context and right-context will make your life easier (you can map between this representation and the context-window/central-position representation).
There is no harm in having your ContextDependencyMulti object also having functions that are not in the interface, e.g.
// return the total number of features (the total of the NumPdfs() of the individual trees)
int32 NumFeatures() const;

and
// output to 'features' a list of the features that are nonzero for this pdf_id.
void GetFeatureList(int32 pdf_id, std::vector *features);

So the GetFeatureList function could be used to populate the sparse matrix.

I'd like you to change the code so that ContextDependencyInterface has a virtual ReadNew() function analogous to that in the nnet2 component code, i.e.
static ContextDependencyInterface* ReadNew(std::istream &is, bool binary);
and where you need code to be generic, you can use this reading mechanism. You can also make the Read and Write functions virtual, as they are in nnet2::Component.
Code that only needs the interface (e.g. graph-building code, and code that initializes the transition-model) can use the ReadNew() function.

nnet3: Convolution1dComponent does not initialize all class members when using file based initialization.

class members like patch_dim_ are not being initialized when using "matrix" argument in the config.

nnet3- small task

This task is for anyone who can do it and has free time.
If you check out the nnet3 branch of the code and build it, you'll notice that there is a test failure in nnet3-optimize-test:

LOG (UnitTestNnetOptimize():nnet-optimize-test.cc:111) Output sum (not optimized) is -33558.6
LOG (UnitTestNnetOptimize():nnet-optimize-test.cc:113) Output sum (optimized) is -34012.6
ERROR (UnitTestNnetOptimize():nnet-optimize-test.cc:115) Non-optimized and optimized versions of the computation give different outputs.
terminate called after throwing an instance of 'std::runtime_error'
what(): ERROR (UnitTestNnetOptimize():nnet-optimize-test.cc:115) Non-optimized and optimized versions of the computation give different outputs.

[stack trace: ]
kaldi::KaldiGetStackTrace()
kaldi::KaldiErrorMessage::~KaldiErrorMessage()
kaldi::nnet3::UnitTestNnetOptimize()
./nnet-optimize-test(main+0xe3) [0x54903b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f3de3f45ead]
./nnet-optimize-test() [0x547b69]

What needs to be done is a refactoring and change to UnitTestNnetOptimize(), so that when it detects a mismatch like this, it will turn on the optimizations in the NnetOptimizeOptions config one by one and show the output-sum for each one separately, and hopefully, as a convenience, try to inform the user which of those optimizations seems to be the issue.
Vijay was going to do this but he is very busy right now and I thought it's the kind of thing we could find someone from the list to do.

Dan

get_egs_discriminative2.sh

@vijayaditya and @vimalmanohar:

Someone said on Kaldi-help:

"When I'm running this steps, some of the nnet-get-egs from stage 10 fail, because exp/nnet2_online/nnet_ms_sp_degs/priors_uttlist doesn't have any utterance from (for example in my case) data/train_sp_hires/split350/141/feats.scp.
I'm not sure how to fix this so that it doesn't randomly break when I change the data again."

I think it would be less wasteful to just use one job for this (i.e. to use non-split feats.scp). Since the subset size is fairly small, this won't take long, and it will be less stressful to the cluster.

Wave{Reader,Writer} changes

For tracking required changes (Discussed in #76)

WaveWriter should not crash on out of range data, rather print a warning and continue (rather urgent)
~~Read and write 8 and 32 bit formats correctly.~~ drop support for 8 and 32 bits (see discussion). Scale all internal data to 2‐15×[-1,1] floats.
Explain why this range in Docs

build issue for src/online

src/online depends on jack on my Linux system because portaudio does, failing with undefined references to many jack functions. Not sure the best way to fix this although this works on Debian 8.2 (assuming libjack-dev is installed of course):

diff --git a/src/online/Makefile b/src/online/Makefile
index 4c0ad9f..27ddfe0 100644
--- a/src/online/Makefile
+++ b/src/online/Makefile
@@ -22,6 +22,7 @@ ifeq ($(UNAME), Linux)
   else
     EXTRA_LDLIBS += -lrt
   endif
+  EXTRA_LDLIBS += -ljack
 endif

why rnnlm tool doesn't have parameter for syn0 --> hidden

Release tarballs on github

Hi !

I'm working on speech recognition on openSUSE. I need to package Kaldi at first. Do you plan providing release tarballs in github ? It makes maintainer's life so much easier :).

By advance, thanks for your answer.

Unexpected tools installation behavior

I was executing a make command from the tools directory and got that interesting output

... LONG INSTALLATION LOG ...

for p in align2html asclite ctmValidator mergectm2rttm; do \
        pod2man $p.pod -o $p.1; \
        pod2html $p.pod > $p.html; \
    done 2> /dev/null


Perforce client error:
    Connect to server failed; check $P4PORT.
    TCP connect to perforce:1666 failed.
    nodename nor servname provided, or not known
patch: **** Can't get file lock.h from Perforce
make: *** [openfst-1.3.4/.patched] Error 2

I was on wifi and didn't have access to perforce at that time, but was a bit scared by the fact this script tried to access it.

That behavior is in fact caused by

$ pwd
~/kaldi/tools/openfst-1.3.4/src/include/fst
$ patch -p0 -N --verbose  < ../../../../extras/openfst_gcc41up.patch
Hmm...missing header for context diff at line 2 of patch
  Looks like a new-style context diff to me...
The text leading up to this was:
--------------------------
|*** lock.h
--------------------------
Get file lock.h from Perforce with lock? [y] ^C

Is there any flag that is missing in that command?

Kaldi configure script verifies on static openfst even with --static-fst=no option

All is in the title.

Optimization target

This is mostly just a note for myself.
Need to optimize the case where we have something like

some_component.Backprop(....., &m1)
m2 += m1
m1 = []
where the component's Backprop adds to its output, so we could just have done
some_component.Backprop(....., &m2)

Fixed sparse affine component

Note: to fully implement this efficiently on the GPU, we depend on issue #298. However, a placeholder implementation using full matrices could easily be created.

In nnet3, I'd like a FixedSparseAffineComponent to be implemented. This won't have trainable parameters, its parameters will be initialized from a file. This is going to be useful in some multi-tree training stuff that Hainan will implement. (note: we previously got good results from this but it wasn't efficient; I think that once we implement it this way it will be almost as fast as the baseline system to train).

We'll add whoever helps us with these things, to the paper.

Another TODO in 'chain' branch in my personal repo

This is another thing that's needed for the 'chain' project (this is a much-simplified version of the 'tombstone' CCTC).

in src/chain/language-model.h, two versions of a function GetAsFst() are declared. These need to be implemented (and of course some kind of testing needs to happen). There is documentation where they are declared. Obviously this requires some familiarity with FST code.

[easy] Create nnet3-copy command

to copy raw nnet3 nnets and possibly change binary status. modify this from nnet3-am-copy and remove a bunch of code.

Documentation for AddMatMatBatched call in cu-matrix.cc

@freewym
Documentation for the mentioned call is necessary.

tools/extras/check_dependencies.sh libtool issues

One more issue: running "make" from a clean install in tools/ on Debian 8.2 fails because check_dependencies thinks I need the binary "libtool". It seems Debian now has "libtoolize" in the libtool package which generates local "libtool"s (zeromq/libzmq#1385), and this is used by e.g. openfst's configure script. The full build works correctly if I check for "libtoolize" instead of "libtool" in tools/extras/check_dependencies.sh.

Otherwise, if I run "make" three separate times (or make -k) the build mostly completes, although it does not make the openfst symlink, so I have to make it myself or configure kaldi with --fst-root pointing at the right place. Note: there is also a Debian package "libtool-bin" which provides a system-wide "libtool" executable, and installing that will make the build succeed, although I don't think the executable is actually used because local "libtool"s are still created.

Maybe it would be best to test for either libtool or libtoolize. I'm not very familiar with autotools unfortunately.

Last comment: when building tools/ in parallel with make -j N, the check_dependencies is run at the same time as other configure scripts etc, which makes its output interleaved and difficult to spot. This will run check_dependencies first even if multiple jobs were specified:

diff --git a/tools/Makefile b/tools/Makefile
index cb3a35b..0c63b13 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -34,6 +34,9 @@ all: check_required_programs sph2pipe atlas sclite openfst
 check_required_programs:
        extras/check_dependencies.sh

+# make sure check_required_programs runs before anything else:
+sph2pipe atlas sclite openfst sctk: | check_required_programs
+
 clean: openfst_cleaned sclite_cleaned

 openfst_cleaned:
@@ -72,7 +75,7 @@ openfst-$(OPENFST_VERSION)/lib: | openfst-$(OPENFST_VERSION)/Makefile

 # Add the -O flag to CXXFLAGS on cygwin as it can fix the compilation error   
 # "file too big".
-openfst-$(OPENFST_VERSION)/Makefile: openfst-$(OPENFST_VERSION)/.patched
+openfst-$(OPENFST_VERSION)/Makefile: openfst-$(OPENFST_VERSION)/.patched | check_required_programs
 ifeq ($(OSTYPE),cygwin)
        cd openfst-$(OPENFST_VERSION)/; ./configure --prefix=`pwd` --enable-static --enable-shared --enable-far --enable-ngram-fsts CXX=$(CXX) CXXFLAGS="$(CXXFLAGS) -O" LDFLAGS="$(LDFLAGS)" LIBS="-ldl"
 else

'chain' code - a piece I need help with.

The 'tombstone' stuff, which is an extension of CCTC, is working quite well, but I am realizing that the model can be implemented much more simply as 'almost' a conventional hybrid system, but one that's trained discriminatively from the start without lattices-- and the training procedure is a significant simplification of the 'tombstone CCTC' training stuff I have in my personal repo. That stuff seems to be working well, but it's unnecessarily complicated.

I am starting work on this. I am calling it the 'chain' model... actually it's a kind of simplification of a hybrid HMM-DNN system, where the system is trained discriminatively from the start. The word 'chain' is chosen as it's a synonym for 'sequence', and the model is effectively a sequence model as that's how it's trained. I want this to be done, if possible, within a week. Perhaps this will become the 'next version' of high-performance models in Kaldi, instead of CTC.

I have pushed to my personal copy of Kaldi, the branch 'chain'; see [email protected]:/danpovey/kaldi.git.

This 'issue' is to do the implementation of what I've created the interface for in phone-topology.h. @naxingyu, do you have time for this right now? It involves some work with FSTs (nothing too complicated).

Tune nnet3 LSTMs

I am creating an issue for this in case we can get wider help--
Right now the thing that needs to be done that's most urgent for me is to get LSTMs tuned and working in the nnet3 setup. I think at this point the essential code is all written and it's a question of tuning the scripts (e.g. adding more layers). This is the limiting factor on me getting CTC working too (I'm doing it in a private branch, but I can't really test it until we have some recurrent setup working well).
What I'd like done this this: @vijayaditya and @pegahgh, can you please (fairly urgently) make sure that the essential pieces of your work so far are checked in, and provide pointers to it here in this thread? We can just check in the best configurations you have so far. After that I am hoping others such as @naxingyu and @nichongjia will be able to help tune it using their setups.

CTC tuning issue

@vijayaditya - you said you were going to set up CTC for Switchboard - there is something that needs to be done first.
In the version of CTC that I've implemented - I'm calling it CCTC - there is a phone language model involved. We don't want this phone language model to have too many history-states. The program
ctc-init-transition-model
has an option --state-count-cutoff2plus (default: 200) that can be modified to ensure this. [note: I haven't tuned any of its options... setting --ngram-order to 3 (default: 2) might help.]
With more data, we'll tend to get more history-states, and this will get unmanageable (we probably already have too many in the WSJ setup, at 2000). Note: the output-layer size equals num-tree-leaves plus num-history-states.
The way I think this should be set up is to aim for a target number of history-states (e.g. 500 or 1000). However, implementing this in code would be a pain. What I think would be easiest- for now, at least- would be to set off a number of ctc-init-transition-model jobs with different cutoffs, and to select the one that has the closest number of history-states to the target. You might just have to have a variable like cutoff_array="100 150 200 300 400 600 800 1000 1500 2000 3000" and try all those values [just make them separate jobs], and then
target_history_states=500
or whatever. Note: you can run ctc-transition-model-info to see the number of history states. You might have to write a python script to select the closest one.

Of course, in the short term it would be easiest to just tune the cutoff manually.

Dan

Upgrading to new CUDA API (v2)

[creating a separate issue for this. I expect this issue will be up for a while].

The new CUDA "v2" API has been supported since v4.0 of the CUDA toolkit (i.e. for quite a while; we are now on 7), and probably makes sense to upgrade the Kaldi project to use the new CUDA API. The old API is deprecated by NVidia, and does not support certain useful functionality such as batched matrix-multiply.

@jtrmal, could you please create a new branch in the kaldi-asr repo called cuda-v2, as a copy of trunk?
@vesis84, it would be good if you could help with this, as you wrote the original cuda matrix library. The new CUDA API supports having multiple execution contexts. The current CuDevice class would be very limiting. Do you have time to think about the best way to rework things and come up with some kind of sketch of a replacement? Or if you decide that leaving it mostly unchanged is the
best way, let us know why.

PerElementOffsetComponent

We need a PerElementOffsetComponent in nnet3.
This will be useful in the multi-decision-tree framework mentioned in previous issues.
It should support training, but also support a mode were you can't do training. @vijayaditya, do we have a standard name for the boolean variable that determines that. E.g. is_updatable_ ?
I believe we have similar code in nnet2 that could be used for reference.
This issue is fairly easy, and would be suitable for someone who is getting started with this type of thing.

Test failures and Travis CI

When a test fails under CI, all I get is a failure message like (see this Travis run)

Running matrix-lib-test ...... FAIL matrix-lib-test
Running kaldi-gpsr-test ...... FAIL kaldi-gpsr-test

make[1]: *** [test] Error 1
make[1]: Leaving directory `/home/travis/build/workflow-demo-org/kaldi/src/matrix'
make: *** [matrix/test] Error 2

The command "cd tools && make openfst CC=gcc-4.9 CXX=g++-4.9 CFLAGS="-g -march=native" CXXFLAGS="-g -march=native" -j2 && cd ../src && ./configure --use-cuda=no  --mathlib=OPENBLAS --openblas-root=/usr && make     -j2 CC=g++-4.9 CXX=g++-4.9 EXTRA_CXXFLAGS="-I ~/xroot/usr/include -g -march=native" EXTRA_LDLIBS="-llapack -L ~/xroot/usr/lib -L /usr/lib/openblas-base" && make test    CC=g++-4.9 CXX=g++-4.9 EXTRA_CXXFLAGS="-I ~/xroot/usr/include -g -march=native" EXTRA_LDLIBS="-llapack -L ~/xroot/usr/lib -L /usr/lib/openblas-base"" exited with 2.

Travis does not easily support uploading logs. After the build/test script completes, the worker VM and all files are irreversibly gone. So we need a way to print an error log of a failed build to the same console, which is captured. I am thinking of this:

Save test output to a known log file when test runs (e. g. foo-bar-test.testlog, matching the executable name
When a test succeeds, delete the log.
Print all available failure logs before exiting the test driver script.

(3) should certainly be limited to running under CI, as some of the logs are enormously large.

I also think maybe (1) and (2) should not be gated by the same setting, so log output is available without rerunning a test when it fails locally.

What do you think?

Remove unnecessary null pointer checks

Extra null pointer checks are not needed in functions like the following.

Remove perturbed training code and scripts from nnet2 setup

This is a feature that didn't turn out to be that useful and which shouldn't be kept.
Marking this as a TODO (anyone can do this).

Building with unpatched openfst-1.5 installed in /usr

Even if I build kaldi with static openfst, it still tries to load libfstscript from /usr when used, for instance, with gst-kaldi-nnet2-online.
(gst-plugin-scanner:11140): GStreamer-WARNING **: Failed to load plugin './libgstkaldionline2.so': /usr/lib/libfstscript.so.1: undefined symbol: _ZN3fst22ShortestPathProperties

Any idea on how to fix (apart from uninstalling openfst from /usr)?

Building kaldi with CUDA 7.0

Makefile in src/cudamatrix/ doesn't check for CUDA 7.0. It shows an error while building as the compute capabilities of CUDA 7.0 and previous versions are different.
I changed the makefile to check for CUDA 7.0. I got no errors while building. Is it the right way to add CUDA 7.0? Could you please add support for CUDA 7.0?

onlinebin no longer compiled ?

Hi,

I know that online's methods are outdated and online2 replace it.
I just wanted to know if onlinebin is no longer compilated volontary (or not).

Thanks

Issues relating to RNNLM

Would you like to add more error handling for return values from functions like the following?

fopen ⇒ TrainModelThread
fscanf ⇒ ReadVocab

Irrecoverable ivector extractor/diag-ubm overwrites in nnet2/nnet3 ivector recipes.

In the current setup if a user re-runs the ivector part of the recipe the extractor/diag-ubm get overwritten. This makes the old models unusable as two consecutive runs are not guaranteed to result in the same extractor.

This issue has been created to track the progress of changes being made.

steps/online/nnet2/train_ivector_extractor.sh and steps/online/nnet2/train_diag_ubm.sh are being modified to preserve the old files in a .backup directory of the parent.
All the speed-perturbation recipe specific exp/nnet3/* files are going to be stored under a new parent exp/nnet3_sp, as the speed-perturbed models usually do not share the same ivector extractor, and there is a huge scope for bugs where the wrong ivectors can be used for decodes.

nnet3- update CachingOptimizingCompiler to store a configurable number of computations.

Would require adding a hashing object for class ComputationRequest, so we can do fast lookup in a hash.
Idea is to cache the n most recently accessed Computations. Probably a reasonable default is to cache around 20 computations.

nnet3 : inconsistency in the way --ivector arguments are defined

nnet3-compute and nnet3-get-egs define the --ivector argument differently.

nnet3-compute expects per-utterance vectors and nnet3-get-egs expects --ivectors to be feature matrices. Is there a reason for this inconsistency ?

TODO for 'chain' model: clusterable class needed for tree building

Something that's going to be useful for the 'chain' model (to support more accurate tree-building when the trees are at the whole-phone level instead of the context-dependent state level) is to have clustering on all 3 (or however many) states of the tree simultaneously. This will involve, firstly, in clusterable-classes.h, creating a clusterable type that could be called, say, GaussMixtureClusterable, which is like GaussClusterable but stores stats for several different Gaussians with different counts. It could be implemented by storing an array of GaussClusterable locally; the objective function would be the sum of objective functions over the individual GaussClusterable objects, and the count would be the sum of the counts.
There's no need to insist on knowing the size of the array beforehand; stats will be added with the 'Gaussian index' (which will really be the pdf-id), and any indexes that don't have any stats will be equivalent to not having any stats at all.

After this class is written the code needs to be written to actually use it, but it may be best if I do that, as it's a little complicated.

latice-oracle.cc with --write-lattice segfaults on long inputs

The method GetOracleLattice crashes with a segfault most likely because of a stack overflow due to a too deep recursion. Obvious solution is to re-implement it iteratively.

Just putting it out there if anyone else gets a mysterious segfault in this program...

CUDA sparse-matrix class

Something needs to be coded for sparse-matrix support that's going to be quite important for the new 'chain' (post-CTC) stuff I am working on, and maybe some nearer-term things too (relating to multi-tree systems- we should be able to get a nice improvement).
We need to be able to efficiently do matrix multiplication by a sparse matrix. The current CuSparseMatrix class does not have a data format that is conducive to this.
in cu-sparse-matrix.h, I'd like a new class added, that stores its data per-row.

The class CuRowSparseMatrix will have members:

int32 num_rows_;
int32 elements_per_row_;
CuArray < std::pair<int32, Real> > data_; // pairs (column-index, value).
//dim = num_rows_ * elements_per_row_.

where elements_per_row_ will be like a stride; it will be the maximum number of nonzero elements per row; you'll pad with (-1, 0.0) in case some rows have fewer than the maximum number of elements.
You can pass the pointer to the data across the "C" interface as void*. It's messy but it avoids having to create a "C" equivalent to the C++ pair.

The key function that I want written (aside from the one to create the CuRowSparseMatrix from a SparseMatrix is the following function:

// Do *this = alpha * a * b + beta * *this.
// Only supports the case where trans_b = kTrans; will crash if trans_b = kNoTrans.
void CuMatrixBase::AddMatSmat(Real alpha, CuMatrixBase &a, MatrixTransposeType trans_a,
CuRowSparseMatrix &b, MatrixTransposeType trans_b, Real beta);

The kernel will do the summation over the elements_per_row_ elements using a simple for loop. You can assume that elements_per_row_ is fairly small, otherwise we wouldn't be using this.

Obviously there needs to be test code for this.

Add multiple parallel AddMatMat,

@freewym, this is for you, although @naxingyu may have an interest in it too.

Please look over #47 to understand the background for this (Convolutional component). That pull request is for nnet2, but there is a similar set of code in nnet1, with a separate pull request (you can search for that). The original reason we wanted to upgrade to the CuBLAS v2 API is because of parallel matrix multiplication not being available in the v1 API. Now that you've (nearly) finished that task, you can help us add this batched matrix multiplication.

The current AddMatMat has signature
void Matrix::AddMatMat(const Real alpha,
const MatrixBase& A, MatrixTransposeType transA,
const MatrixBase& B, MatrixTransposeType transB,
const Real beta);
I'd like you to add a batched AddMatMat function that is a wrapper for cuBLAS's gemmBatched function. This will later be used in the convolutional component. Of course this will require test code.
The function signature and documentation should be the following (although I won't have the
whitespace correct as I am composing this with non-fixed-width font).
/**
@brief This function executes multiple matrix multiplications, executing them in parallel
using cuBLAS's gemmBatched if we are using a GPU. Vectors a, b and c
must have the same length; for each i, this function executes the matrix operation
c[i] = alpha a[i] b[i] + beta c[i].

  @param [in] alpha   The constant alpha in the equation "c[i] = alpha a[i] b[i] + beta c[i]."
   @param [in] c        A vector of pointers to matrices; all elements must have the same
                                 num-rows, num-cols and stride.  The matrices must point to distinct 
                                regions of GPU memory, or results are undefined.  Ownership of
                                 pointers is retained by the caller.
   @param [in] a        A vector of pointers to matrices; all elements must have the same
                                 num-rows, num-cols and stride.  Ownership of pointers is retained
                                 by the caller
    @param [in] trans_a   Indicates whether we should use the transpose of a[i] in the equation
                                If trans_a == kTrans, transpose(a[i]) appears in place of a[i].
   @param [in] B        A vector of pointers to matrices; all elements must have the same
                                 num-rows, num-cols and stride.  Ownership of pointers is retained
                                 by the caller
    @param [in] trans_b   Indicates whether we should use the transpose of b[i] in the equation
                                If trans_b == kTrans, transpose(b[i]) appears in place of b[i].
    @param [in] beta   The constant beta in the equation "c[i] = alpha a[i] b[i] + beta c[i]."
*/
template <class Real>
void AddMatMatBatched(const Real alpha,
                                        const std::vector<CuSubMatrix<Real>* > &C,
                                      const std::vector<const CuSubMatrix<Real>* > &A,
                                     MatrixTransposeType trans_a,
                                      const std::vector<const CuSubMatrix<Real>* > &B,
                                     MatrixTransposeType trans_a,
                                    const Real beta);

Note: we have to pass vectors of pointers, although it is inconvenient from a memory management perspective, because CuSubMatrix doesn't have an operator=, so we can't easily create a vector of CuSubMatrix directly. Also, we prefer to pass CuMatrixBase in situations like these, but it would create difficulties when deleting the memory (since an abstract base class can't be deleted unless it has a virtual destructor). It's OK; we can always create a CuSubMatrix that's identical to any given matrix.

Please make sure your test code does not have memory leaks; you can run valgrind or cuda-memtest on it.
Also, if you could add some speed-tests to make it possible to see whether the batched matrix-multiplication is helpful for various matrix sizes, that would also be very helpful.

po->opts

In parts of the code where type OptionsItf is used, for historical reasons the variable name is typically "po" (for ParseOptions). I think this is confusing and should be changed. If someone could find time to do this, that would be helpful. Do it by script if you want.

SRE10 v2 Improvements

This issue (for david-ryan-snyder) tracks some improvements to the DNN-based SRE10 example (egs/sre10/v2)

Remove num-components option from init_full_ubm_from_dnn.sh. It should be worked out from nnet-am-info.
Change full-gmm-init-from-stats.cc so that it uses MleFullGmmUpdate() do the estimation. This has things like variance flooring, and limits on the condition number, to make sure that there are never problems inverting.
Provide GPU support for the DNN posterior computation in init_full_ubm_from_dnn.sh, train_ivector_extractor_dnn.sh and extract_ivectors_dnn.sh. The binary nnet-am-compute already supports GPU usage, but the bash scripts have not been updated.

can't build on mac os x

The output of running make in the src directory follows below. This particular attempt was configured with ./configure --use-cuda=no --openblas-root=/usr/local/opt/openblas, but different configurations behave the same way (always fail to build kaldi-lat.a). tools/ builds fine.

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C lat
Makefile:29: warning: overriding commands for target `kaldi-lat.a'
../makefiles/default_rules.mk:35: warning: ignoring old commands for target `kaldi-lat.a'
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o kaldi-lattice.o kaldi-lattice.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o lattice-functions.o lattice-functions.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o word-align-lattice.o word-align-lattice.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o phone-align-lattice.o phone-align-lattice.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o word-align-lattice-lexicon.o word-align-lattice-lexicon.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o sausages.o sausages.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o push-lattice.o push-lattice.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o minimize-lattice.o minimize-lattice.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o determinize-lattice-pruned.o determinize-lattice-pruned.cc
g++ -msse -msse2 -Wall -I.. -pthread -DKALDI_DOUBLEPRECISION=0 -DHAVE_POSIX_MEMALIGN -Wno-sign-compare -Winit-self -DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_CLAPACK -I/Users/echonest/repos/kaldi/tools/openfst/include -Wno-sign-compare -g  -Wno-mismatched-tags -stdlib=libstdc++   -c -o confidence.o confidence.cc
ar -d kaldi-lat.a kws-functions.o
ar: kaldi-lat.a: No such file or directory
make[1]: *** [kaldi-lat.a] Error 1
make: *** [lat] Error 2

Functionality needed for sparse-matrix in nnet3

@chenguoguo - doing this via issue request rather than email so we can track it better.
Could you add a couple of sparse-matrix things, please? I need these for the nnet3 outer training code- they should be done for both CUDA and CPU based versions.
First, methods
Real SparseMatrix::Sum() const;
and
Real SparseVector::Sum() const;
[we won't likely be using the SparseVector::Sum() in the CUDA version, so you don't have to implement that].
Secondly, you'll see a function declaration

template
Real TraceMatSmat(const CuMatrixBase &A,
const CuSparseMatrix &B,
MatrixTransposeType trans = kNoTrans);
in cu-sparse-matrix.h. I'm going to need this implemented for the nnet3 training. It's the kTrans case that I'll actually need (of course, implementing both is better if you have time).
Dan

nnet3- optimization method to add

I am creating an issue to keep track of something that needs to be done fairly long-term in nnet3.
Note: documentation for the nnet3 code exists at
http://www.danielpovey.com/kaldi-docs/
(this is like what's at kaldi-asr.org/doc/, but corresponds to the nnet3 branch).
What's below is an email. I was originally hoping Karel would agree to do this, but he is too busy.
The task is quite complicated. I may have to do this myself in the end.

[what follows is an email]

OK, so there is a change of plan on this. Since the thing that needs
to be done now is a slightly bigger piece, I'm hoping that Karel will
agree to do it.
Karel, let me explain what the issue is.
If you download the sandbox/nnet3 branch and run, in src/nnet3/, the
program nnet-optimize-test, you'll see the following output near the
end.

c127: m64 += m62
c128: m6.AddRows(m64[-1x7, 0, -1x7, 1, -1x7, 2, -1x7, 3, -1x7, 4,
-1x7, 5, -1x7, 6])
c129: recurrent_affine1.Backprop(NULL, m59, [], m62, &m60)
c130: m60 += m58
c131: nonlin1.Backprop(NULL, [], m57, m58, &m56)
c132: m56 += m54
c133: m6.AddRows(m56[-1x6, 0, -1x7, 1, -1x7, 2, -1x7, 3, -1x7, 4,
-1x7, 5, -1x7, 6, -1])
c134: recurrent_affine1.Backprop(NULL, m51, [], m54, &m52)
c135: m52 += m50
c136: nonlin1.Backprop(NULL, [], m49, m50, &m48)
c137: m48 += m46
c138: m6.AddRows(m48[-1x5, 0, -1x7, 1, -1x7, 2, -1x7, 3, -1x7, 4,
-1x7, 5, -1x7, 6, -1, -1])
c139: recurrent_affine1.Backprop(NULL, m43, [], m46, &m44)

This is some kind of RNN, and m6 is a matrix corresponding to some
component near the input.
The command
c128: m6.AddRows(m64[-1x7, 0, -1x7, 1, -1x7, 2, -1x7, 3, -1x7, 4,
-1x7, 5, -1x7, 6])
is calling an AddRows function, and inside [ ]is the vector of indexes
. They have been pretty-printed, and -1x7 means -1 repeated 7 times.
The -1's in the vector mean, "do nothing for that row". The 0, 1, 2
through 6 mean, for those places where they appear, copy that row of
matrix m64 to a row of matrix m6. Now the problem here is that we are
invoking way too many CUDA kernels, most of which are doing nothing
because the argument is -1. What the thread below was about was, I
was asking Guoguo to add an AddToRows() function where m64 would be
the "this", and the vector argument would say which row of m6 to add
to. (we'd assume the indexes were unique). Guoguo pointed out that
we could use the AddToRows() function that takes pointer arguments,
but I hadn't wanted to do this because we'd need to transfer the
pointers to the GPU for each minibatch (since while the indexes don't
change, in general we reallocate the matrices each time, so the
addresses change). However, I realized that there is a better way to
do this. I'd like to ask you to do it because it's a little bit
tricky to get right, and this will be an opportunity for you to get
involved in nnet3.

The way I think it should be done is to add an optimization method
that detects situations where the same matrix (m6 in this case) is
subject to multiple repeated AddRows calls with nothing else in
between. Please try to understand what the stuff in nnet-analyze is
doing. You could first, by accessing all the Commands, work out for
each submatrix how many AddRows calls it has (as the *this), and then
for each submatrix that has multiple AddRows calls, detect ranges of
AddRows calls such that no other reads or writes of that submatrix
occur within that range of commands. (You'd have to iterate over the
Variables for that submatrix to do this, although there would normally
be just one). Then you would attempt to consolidate all of those
AddRows calls into one (or a few)- the command index could be the
latest of the command indexes of all the individual AddRows calls that
you are removing. You would likely want to first consolidate all the
AddRows calls into a single submat_locations_list [search for that in
nnet-compile.cc] and then use existing code from nnet-compile.cc to
turn that into either one command, or a list of commands. [Obviously
if it ends up generating as many commands as we started with, we're
not gaining anything and you'd want to abandon the attempt at
optimization.] You may need to move some functions from
nnet-compile.cc into nnet-compile-utils.{h,cc} to make them available
to the nnet-optimize code.

nnet3 build fails with ./configure --shared

i am trying to build with the --shared setting per the instructions in src/gst-plugin/README. on both my machines, the build fails in the nnet3 component:

make -C nnet3 
make[1]: Entering directory `/home/john/ws/git/kaldi/src/nnet3'
ar -cru kaldi-nnet3.a nnet-common.o nnet-compile.o nnet-component-itf.o nnet-simple-component.o nnet-general-component.o nnet-parse.o natural-gradient-online.o nnet-descriptor.o nnet-optimize.o nnet-computation.o nnet-computation-graph.o nnet-graph.o am-nnet-simple.o nnet-example.o nnet-nnet.o nnet-compile-utils.o nnet-nnet.o nnet-utils.o nnet-compute.o nnet-test-utils.o nnet-analyze.o nnet-compute.o nnet-example-utils.o nnet-training.o nnet-diagnostics.o nnet-combine.o nnet-am-decodable-simple.o
ranlib kaldi-nnet3.a
# Building shared library from static (static was compiled with -fPIC)
g++ -shared -o libkaldi-nnet3.so -Wl,--no-undefined -Wl,--as-needed  -Wl,-soname=libkaldi-nnet3.so,--whole-archive kaldi-nnet3.a -Wl,--no-whole-archive  -rdynamic -Wl,-rpath=/home/john/ws/git/kaldi/tools/openfst/lib -L/usr/local/cuda/lib64 -Wl,-rpath,/usr/local/cuda/lib64 -Wl,-rpath=/home/john/ws/git/kaldi/src/lib -L.  -L../thread/   -L../lat/   -L../gmm/   -L../hmm/   -L../tree/   -L../transform/   -L../cudamatrix/   -L../matrix/   -L../base/   -L../util/   ../thread//libkaldi-thread.so   ../lat//libkaldi-lat.so   ../gmm//libkaldi-gmm.so   ../hmm//libkaldi-hmm.so   ../tree//libkaldi-tree.so   ../transform//libkaldi-transform.so   ../cudamatrix//libkaldi-cudamatrix.so   ../matrix//libkaldi-matrix.so   ../base//libkaldi-base.so   ../util//libkaldi-util.so   -L/home/john/ws/git/kaldi/tools/openfst/lib -lfst /usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3 -lm -lpthread -ldl -lcublas -lcudart   -lkaldi-thread   -lkaldi-lat   -lkaldi-gmm   -lkaldi-hmm   -lkaldi-tree   -lkaldi-transform   -lkaldi-cudamatrix   -lkaldi-matrix   -lkaldi-base   -lkaldi-util 
kaldi-nnet3.a(nnet-nnet.o): In function `KaldiCompileTimeAssert<true>::Check()':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:30: multiple definition of `kaldi::nnet3::NetworkNode::Dim(kaldi::nnet3::Nnet const&) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:30: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponent(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:131: multiple definition of `kaldi::nnet3::Nnet::GetComponent(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:131: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetNodeNames() const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:49: multiple definition of `kaldi::nnet3::Nnet::GetNodeNames() const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:49: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponentNames() const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:53: multiple definition of `kaldi::nnet3::Nnet::GetComponentNames() const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:53: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `std::tr1::_Hashtable<std::string, std::string, std::allocator<std::string>, std::_Identity<std::string>, std::equal_to<std::string>, kaldi::StringHasher, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, true, true>::_M_insert_bucket(std::string const&, unsigned long, unsigned long)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:57: multiple definition of `kaldi::nnet3::Nnet::GetAsConfigLine(int, bool) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:57: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::IsOutputNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:98: multiple definition of `kaldi::nnet3::Nnet::IsOutputNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:98: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::IsInputNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:106: multiple definition of `kaldi::nnet3::Nnet::IsInputNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:106: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::IsDescriptorNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:112: multiple definition of `kaldi::nnet3::Nnet::IsDescriptorNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:112: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::IsComponentNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:118: multiple definition of `kaldi::nnet3::Nnet::IsComponentNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:118: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::IsDimRangeNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:124: multiple definition of `kaldi::nnet3::Nnet::IsDimRangeNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:124: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponent(int)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:136: multiple definition of `kaldi::nnet3::Nnet::GetComponent(int)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:136: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::IsComponentInputNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:143: multiple definition of `kaldi::nnet3::Nnet::IsComponentInputNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:143: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetConfigLines(bool, std::vector<std::string, std::allocator<std::string> >*) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:152: multiple definition of `kaldi::nnet3::Nnet::GetConfigLines(bool, std::vector<std::string, std::allocator<std::string> >*) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:152: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::ReadConfig(std::istream&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:160: multiple definition of `kaldi::nnet3::Nnet::ReadConfig(std::istream&)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:160: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::RemoveRedundantConfigLines(int, std::vector<std::string, std::allocator<std::string> >*, std::vector<kaldi::nnet3::ConfigLine, std::allocator<kaldi::nnet3::ConfigLine> >*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:464: multiple definition of `kaldi::nnet3::Nnet::RemoveRedundantConfigLines(int, std::vector<std::string, std::allocator<std::string> >*, std::vector<kaldi::nnet3::ConfigLine, std::allocator<kaldi::nnet3::ConfigLine> >*)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:464: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::ProcessComponentConfigLine(int, kaldi::nnet3::ConfigLine*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:226: multiple definition of `kaldi::nnet3::Nnet::ProcessComponentConfigLine(int, kaldi::nnet3::ConfigLine*)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:226: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::ProcessComponentNodeConfigLine(int, kaldi::nnet3::ConfigLine*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:265: multiple definition of `kaldi::nnet3::Nnet::ProcessComponentNodeConfigLine(int, kaldi::nnet3::ConfigLine*)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:265: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::ProcessInputNodeConfigLine(kaldi::nnet3::ConfigLine*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:319: multiple definition of `kaldi::nnet3::Nnet::ProcessInputNodeConfigLine(kaldi::nnet3::ConfigLine*)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:319: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::ProcessOutputNodeConfigLine(int, kaldi::nnet3::ConfigLine*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:346: multiple definition of `kaldi::nnet3::Nnet::ProcessOutputNodeConfigLine(int, kaldi::nnet3::ConfigLine*)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:346: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::ProcessDimRangeNodeConfigLine(int, kaldi::nnet3::ConfigLine*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:398: multiple definition of `kaldi::nnet3::Nnet::ProcessDimRangeNodeConfigLine(int, kaldi::nnet3::ConfigLine*)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:398: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Check() const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:670: multiple definition of `kaldi::nnet3::Nnet::Check() const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:670: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponentIndex(std::string const&) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:450: multiple definition of `kaldi::nnet3::Nnet::GetComponentIndex(std::string const&) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:450: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetNodeIndex(std::string const&) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:442: multiple definition of `kaldi::nnet3::Nnet::GetNodeIndex(std::string const&) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:442: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetSomeNodeNames(std::vector<std::string, std::allocator<std::string> >*) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:548: multiple definition of `kaldi::nnet3::Nnet::GetSomeNodeNames(std::vector<std::string, std::allocator<std::string> >*) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:548: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::NetworkNode::NetworkNode(kaldi::nnet3::NetworkNode const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:529: multiple definition of `kaldi::nnet3::NetworkNode::NetworkNode(kaldi::nnet3::NetworkNode const&)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:529: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::NetworkNode::NetworkNode(kaldi::nnet3::NetworkNode const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:529: multiple definition of `kaldi::nnet3::NetworkNode::NetworkNode(kaldi::nnet3::NetworkNode const&)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:529: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Destroy()':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:538: multiple definition of `kaldi::nnet3::Nnet::Destroy()'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:538: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Read(std::istream&, bool)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:563: multiple definition of `kaldi::nnet3::Nnet::Read(std::istream&, bool)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:563: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Write(std::ostream&, bool) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:594: multiple definition of `kaldi::nnet3::Nnet::Write(std::ostream&, bool) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:594: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Modulus() const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:622: multiple definition of `kaldi::nnet3::Nnet::Modulus() const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:622: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::InputDim(std::string const&) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:633: multiple definition of `kaldi::nnet3::Nnet::InputDim(std::string const&) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:633: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::OutputDim(std::string const&) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:641: multiple definition of `kaldi::nnet3::Nnet::OutputDim(std::string const&) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:641: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetNodeName(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:648: multiple definition of `kaldi::nnet3::Nnet::GetNodeName(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:648: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponentName(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:653: multiple definition of `kaldi::nnet3::Nnet::GetComponentName(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:653: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponentForNode(int) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:658: multiple definition of `kaldi::nnet3::Nnet::GetComponentForNode(int) const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:658: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::GetComponentForNode(int)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:664: multiple definition of `kaldi::nnet3::Nnet::GetComponentForNode(int)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:664: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Nnet(kaldi::nnet3::Nnet const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:751: multiple definition of `kaldi::nnet3::Nnet::Nnet(kaldi::nnet3::Nnet const&)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:751: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Nnet(kaldi::nnet3::Nnet const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:751: multiple definition of `kaldi::nnet3::Nnet::Nnet(kaldi::nnet3::Nnet const&)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:751: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::operator=(kaldi::nnet3::Nnet const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:761: multiple definition of `kaldi::nnet3::Nnet::operator=(kaldi::nnet3::Nnet const&)'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:761: first defined here
kaldi-nnet3.a(nnet-nnet.o): In function `kaldi::nnet3::Nnet::Info() const':
/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:775: multiple definition of `kaldi::nnet3::Nnet::Info() const'
kaldi-nnet3.a(nnet-nnet.o):/home/john/ws/git/kaldi/src/nnet3/nnet-nnet.cc:775: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `__gnu_cxx::new_allocator<float>::new_allocator()':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:28: multiple definition of `kaldi::nnet3::NnetComputer::NnetComputer(kaldi::nnet3::NnetComputeOptions const&, kaldi::nnet3::NnetComputation const&, kaldi::nnet3::Nnet const&, kaldi::nnet3::Nnet*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:28: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `__gnu_cxx::new_allocator<float>::new_allocator()':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:28: multiple definition of `kaldi::nnet3::NnetComputer::NnetComputer(kaldi::nnet3::NnetComputeOptions const&, kaldi::nnet3::NnetComputation const&, kaldi::nnet3::Nnet const&, kaldi::nnet3::Nnet*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:28: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::DebugBeforeExecute(int, kaldi::nnet3::NnetComputer::CommandDebugInfo*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:53: multiple definition of `kaldi::nnet3::NnetComputer::DebugBeforeExecute(int, kaldi::nnet3::NnetComputer::CommandDebugInfo*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:53: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetSubMatrix(int)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:238: multiple definition of `kaldi::nnet3::NnetComputer::GetSubMatrix(int)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:238: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::DebugAfterExecute(int, kaldi::nnet3::NnetComputer::CommandDebugInfo const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:81: multiple definition of `kaldi::nnet3::NnetComputer::DebugAfterExecute(int, kaldi::nnet3::NnetComputer::CommandDebugInfo const&)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:81: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::ExecuteCommand(int)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:115: multiple definition of `kaldi::nnet3::NnetComputer::ExecuteCommand(int)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:115: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetPointers(int, int, kaldi::CuArray<float const*>*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:289: multiple definition of `kaldi::nnet3::NnetComputer::GetPointers(int, int, kaldi::CuArray<float const*>*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:289: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetPointers(int, int, kaldi::CuArray<float*>*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:250: multiple definition of `kaldi::nnet3::NnetComputer::GetPointers(int, int, kaldi::CuArray<float*>*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:250: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::Forward()':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:294: multiple definition of `kaldi::nnet3::NnetComputer::Forward()'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:294: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::CheckInputs(bool) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:432: multiple definition of `kaldi::nnet3::NnetComputer::CheckInputs(bool) const'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:432: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::Backward()':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:312: multiple definition of `kaldi::nnet3::NnetComputer::Backward()'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:312: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::AcceptInput(std::string const&, kaldi::CuMatrix<float>*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:329: multiple definition of `kaldi::nnet3::NnetComputer::AcceptInput(std::string const&, kaldi::CuMatrix<float>*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:329: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetMatrixIndex(std::string const&, bool, bool) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:399: multiple definition of `kaldi::nnet3::NnetComputer::GetMatrixIndex(std::string const&, bool, bool) const'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:399: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetInputDeriv(std::string const&) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:349: multiple definition of `kaldi::nnet3::NnetComputer::GetInputDeriv(std::string const&) const'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:349: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetOutput(std::string const&) const':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:359: multiple definition of `kaldi::nnet3::NnetComputer::GetOutput(std::string const&) const'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:359: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::GetOutputDestructive(std::string const&, kaldi::CuMatrix<float>*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:369: multiple definition of `kaldi::nnet3::NnetComputer::GetOutputDestructive(std::string const&, kaldi::CuMatrix<float>*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:369: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::AcceptOutputDeriv(std::string const&, kaldi::CuMatrix<float>*)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:380: multiple definition of `kaldi::nnet3::NnetComputer::AcceptOutputDeriv(std::string const&, kaldi::CuMatrix<float>*)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:380: first defined here
kaldi-nnet3.a(nnet-compute.o): In function `kaldi::nnet3::NnetComputer::AcceptInputs(kaldi::nnet3::Nnet const&, kaldi::nnet3::NnetExample const&)':
/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:459: multiple definition of `kaldi::nnet3::NnetComputer::AcceptInputs(kaldi::nnet3::Nnet const&, kaldi::nnet3::NnetExample const&)'
kaldi-nnet3.a(nnet-compute.o):/home/john/ws/git/kaldi/src/nnet3/nnet-compute.cc:459: first defined here
collect2: error: ld returned 1 exit status
make[1]: *** [libkaldi-nnet3.so] Error 1
make[1]: Leaving directory `/home/john/ws/git/kaldi/src/nnet3'
make: *** [nnet3] Error 2

Srilm using relative path

In many egs scripts, srilm binary examiner is using relative path, such as:
egs/babel/s5c/local/train_lms_srilm.sh: sdir=pwd/../../../tools/srilm/bin/i686-m64
egs/swbd/s5c/local/swbd1_train_lms.sh: sdir=pwd/../../../tools/srilm/bin/i686-m64

In our environment, users of Kaldi are encouraged to use a share KALDI_ROOT, instead of building their own version if they don't intend to change src code. Such scripts cause errors. So should it be an absolute path? aka:
egs/babel/s5c/local/train_lms_srilm.sh: sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64
egs/swbd/s5c/local/swbd1_train_lms.sh: sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64

It should work in most settings.