microsoft / lightlda

Scalable, fast, and lightweight system for large-scale topic modeling

Home Page: http://www.dmtk.io

License: MIT License

Makefile 1.53% Shell 1.38% Python 1.68% C++ 95.41%

lightlda's Introduction

LightLDA

LightLDA is a distributed system for large-scale topic modeling. It implements a distributed sampler that enables very large data and model sizes. LightLDA improves sampling throughput and convergence speed via a fast O(1) Metropolis-Hastings algorithm, and allows small clusters to tackle very large data and model sizes through its model-scheduling and data-parallelism architecture. LightLDA is implemented in C++ for performance.
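The O(1) proposal draw in the Metropolis-Hastings step is typically realized with an alias table: after O(K) preprocessing, sampling from a K-topic discrete distribution costs constant time. Below is a minimal Python sketch of Walker's alias method, for illustration only; LightLDA's real implementation is the C++ alias table in src/alias_table.cpp.

```python
import random

def build_alias_table(probs):
    """Walker's alias method: O(K) preprocessing of a discrete
    distribution so that each subsequent draw costs O(1)."""
    k = len(probs)
    scaled = [p * k for p in probs]          # rescale so the mean mass is 1
    prob, alias = [0.0] * k, [0] * k
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l     # bucket s keeps scaled[s] of its own mass
        scaled[l] -= 1.0 - scaled[s]         # donor l covers the rest of bucket s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                  # leftovers are numerically full buckets
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """O(1) draw: pick a bucket uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

In LightLDA the alias tables are rebuilt periodically from slightly stale counts, and the resulting bias is corrected by the Metropolis-Hastings accept/reject step.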

We have successfully trained big topic models (with trillions of parameters) on big data (the top 10% of Bing-indexed pages by PageRank, containing billions of documents) at Microsoft. For more technical details, please refer to our WWW'15 paper.

For documentation, please visit our website http://www.dmtk.io.

Why LightLDA

The highlights of LightLDA are:

  • Scalable: LightLDA can train models with trillions of parameters on big data with billions of documents, a scale that previous implementations could not handle.
  • Fast: The sampler can sample millions of tokens per second per multi-core node.
  • Lightweight: Such big tasks can be trained with as few as tens of machines.

Quick Start

Run $ sh build.sh to build lightlda. Run $ sh example/nytimes.sh for a simple example.

Reference

Please cite LightLDA if it helps in your research:

@inproceedings{yuan2015lightlda,
  title={LightLDA: Big Topic Models on Modest Computer Clusters},
  author={Yuan, Jinhui and Gao, Fei and Ho, Qirong and Dai, Wei and Wei, Jinliang and Zheng, Xun and Xing, Eric Po and Liu, Tie-Yan and Ma, Wei-Ying},
  booktitle={Proceedings of the 24th International Conference on World Wide Web},
  pages={1351--1361},
  year={2015},
  organization={International World Wide Web Conferences Steering Committee}
}

Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

lightlda's People

Contributors

feiga, hiyijian, msftgits, tanglizhe1105, yiylin


lightlda's Issues

Questions about the DMTK LightLDA

Hello,

first of all, thank you for sharing the LightLDA code, which seems to me the best open-source C++ LDA for its Metropolis-Hastings sampler and its scalability.
I would like to integrate LightLDA into my project, which requires changing the IO from files to MongoDB. I have started to look at the code in more detail, and have some questions.

  1. Any hint where I should first look for such a task? From my understanding, the input data handling should be done in the multiverso::lightlda::DataBlock, and also in multiverso::lightlda::DiskDataStream and/or multiverso::lightlda::MemoryDataStream. Is that correct?

  2. About the multiverso::lightlda::DataBlock implementation: at https://github.com/Microsoft/LightLDA/blob/master/src/data_block.cpp line 98, I do not understand the purpose of the Write() function.
    Perhaps I am missing something, but as I understand it, DataBlock::Write() writes into a temporary file, e.g. block.0.temp, the buffer that was filled by the DataBlock::Read(file) function, where file is the original input data block file, e.g. block.0. Then block.0.temp is renamed to block.0, overwriting the original input, i.e. block.0.
    Is this correct? If so, why do we need to do this?

  3. Do you intend to refactor LightLDA to be compatible with the new multiverso API at some point?

  4. The WWW'15 LightLDA paper says that LightLDA is built on top of the Petuum framework in SSP mode with parameter s=1, which no longer seems to be the case, correct?
    I have read elsewhere that Multiverso only supports the BSP and ASP modes. Which is the case for the DMTK LightLDA, BSP or ASP?

  5. What are the main differences between the Petuum version described in the paper and the new one in the DMTK GitHub repository?

Thank you very much for your reply
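Regarding question 2 above: writing to a temporary file and then renaming it over the original is a standard pattern for replacing a file atomically, presumably used here so that a data block updated with new topic assignments is never observed half-written. A minimal Python sketch of the pattern (not the project's actual code; the filenames only mirror the block.0 / block.0.temp naming from the question):

```python
import os
import tempfile

def atomic_overwrite(path, data):
    """Write `data` to a temp file in the same directory, then rename it
    over `path`. On POSIX, rename() within one filesystem is atomic, so
    a reader sees either the old block file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # plays the block.0.temp role
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())               # make sure bytes hit disk first
        os.replace(tmp, path)                  # temp file takes over block.0
    except BaseException:
        os.unlink(tmp)
        raise
```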

How to collect the log-likelihood from the logs LightLDA prints out?

Hi, what is the relation between a block and a slice?
When computing the doc log-likelihood, each worker only considers slice == 0. Does this mean we only evaluate the documents in slice 0 and ignore the other slices, i.e. only part of the documents in the data block?
When computing the word log-likelihood, we set block == 0. Does this mean we evaluate all the words in this block but ignore the other blocks?
And when computing the normalized log-likelihood, we use the condition TrainerId() == 0 && block == 0, which also ignores the other blocks.

In the workers, all slices in every block may evaluate the log-likelihood under the conditions above, and print the result.

So, how should I collect the corpus's doc log-likelihood, word log-likelihood, and total log-likelihood?
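One way to collect these numbers is to scrape the likelihood lines from the log and group each evaluation's three components together. The sketch below is my own reading, not confirmed by the authors: it assumes the total log-likelihood is simply the sum of the doc, word, and normalized components printed at each evaluation.

```python
import re

# Matches lines such as:
# [INFO] [2015-11-26 18:55:17] doc likelihood : -6.422044e+08
LL_RE = re.compile(r"(doc|word|Normalized) likelihood : (-?\d+\.\d+e[+-]\d+)")

def collect_likelihoods(log_lines):
    """Group consecutive doc/word/Normalized likelihood lines into one
    record per evaluation, adding their sum as 'total'."""
    records, current = [], {}
    for line in log_lines:
        m = LL_RE.search(line)
        if not m:
            continue
        current[m.group(1)] = float(m.group(2))
        if len(current) == 3:                 # one full evaluation collected
            current["total"] = sum(current.values())
            records.append(current)
            current = {}
    return records
```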

Doc-topic distribution

I see the results are server_0_table_0.model and server_0_table_1.model.
server_0_table_0.model is the distribution of topics over terms,
but server_0_table_1.model has only one line.
How can I get the topic distribution for every docid?
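For what it is worth (my understanding, not an authoritative answer): the server tables hold the word-topic model, and the one-line table is likely the global topic-count summary row; per-document topic counts stay with the data blocks rather than on the server. Given a document's topic-assignment counts, the smoothed doc-topic distribution can be computed as below. This is the generic LDA formula, not code from this repository; alpha is the document-topic Dirichlet prior.

```python
def doc_topic_distribution(doc_topic_counts, alpha, num_topics):
    """theta_{d,k} = (n_{d,k} + alpha) / (n_d + K * alpha), where
    n_{d,k} is how many tokens of document d are assigned to topic k."""
    n_d = sum(doc_topic_counts.values())
    denom = n_d + num_topics * alpha
    return {k: (doc_topic_counts.get(k, 0) + alpha) / denom
            for k in range(num_topics)}
```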

build failed on ubuntu14.04 --ubgpu@ubgpu:~/github/DMTK.io/lightlda$ sudo make -j4

g++ -O3 -std=c++11 -Wall -Wno-sign-compare -fno-omit-frame-pointer /home/ubgpu/github/DMTK.io/lightlda/preprocess/dump_binary.cpp -o /home/ubgpu/github/DMTK.io/lightlda/bin/dump_binary
/home/ubgpu/github/DMTK.io/lightlda/src/data_stream.cpp: In constructor 'multiverso::lightlda::DiskDataStream::DiskDataStream(int32_t, std::string, int32_t)':
/home/ubgpu/github/DMTK.io/lightlda/src/data_stream.cpp:54:21: warning: 'multiverso::lightlda::DiskDataStream::data_path_' will be initialized after [-Wreorder]
std::string data_path_;
^
/home/ubgpu/github/DMTK.io/lightlda/src/data_stream.cpp:52:17: warning: 'int32_t multiverso::lightlda::DiskDataStream::num_iterations_' [-Wreorder]
int32_t num_iterations_;
^
/home/ubgpu/github/DMTK.io/lightlda/src/data_stream.cpp:98:5: warning: when initialized here [-Wreorder]
DiskDataStream::DiskDataStream(int32_t num_blocks,
^
g++ /home/ubgpu/github/DMTK.io/lightlda/src/lightlda.o /home/ubgpu/github/DMTK.io/lightlda/src/alias_table.o /home/ubgpu/github/DMTK.io/lightlda/src/meta.o /home/ubgpu/github/DMTK.io/lightlda/src/sampler.o /home/ubgpu/github/DMTK.io/lightlda/src/eval.o /home/ubgpu/github/DMTK.io/lightlda/src/common.o /home/ubgpu/github/DMTK.io/lightlda/src/trainer.o /home/ubgpu/github/DMTK.io/lightlda/src/data_block.o /home/ubgpu/github/DMTK.io/lightlda/src/document.o /home/ubgpu/github/DMTK.io/lightlda/src/data_stream.o -O3 -std=c++11 -Wall -Wno-sign-compare -fno-omit-frame-pointer -I/home/ubgpu/github/DMTK.io/lightlda/multiverso/include -L/home/ubgpu/github/DMTK.io/lightlda/multiverso/lib -lmultiverso -L/home/ubgpu/github/DMTK.io/lightlda/multiverso/third_party/lib -lzmq -lmpich -lmpl -o /home/ubgpu/github/DMTK.io/lightlda/bin/lightlda
/usr/bin/ld: /home/ubgpu/github/DMTK.io/lightlda/multiverso/third_party/lib/libmpich.a(ch3_win_fns.o): undefined reference to symbol 'pthread_mutexattr_init@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [/home/ubgpu/github/DMTK.io/lightlda/bin/lightlda] Error 1
make: *** Waiting for unfinished jobs....

The effect of block number

Hi, the default block number in the code is 1. Is the block number closely related to the degree of parallelism?
I have 50,000,000 docs to cluster. In that case, how many documents should go into each block?
Thanks.
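A common rule of thumb (illustrative only, not official project guidance) is to choose the block count so that a single block's tokens fit in the memory you can spare for data; each token stores roughly a word id plus a topic id, so around 8 bytes with 32-bit integers. A hypothetical sizing helper:

```python
import math

def choose_num_blocks(num_tokens, bytes_per_token, memory_budget_bytes):
    """Smallest block count such that one block's tokens fit in budget."""
    total_bytes = num_tokens * bytes_per_token
    return max(1, math.ceil(total_bytes / memory_budget_bytes))
```

For example, 10 billion tokens at 8 bytes each against a 16 GiB budget comes out to 5 blocks.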

git clone -b multiverso-initial failed!

andy@Andy-UB1604:~/prj/LightLDA$ git clone -b multiverso-initial [email protected]:Microsoft/multiverso.git
Cloning into 'multiverso'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
andy@Andy-UB1604:~/prj/LightLDA$

Evaluation questions

(two screenshots of the evaluation code)

Should "TrainerId() == 0" be added when logging "doc likelihood" and "word likelihood"?

Cannot run after compiling

error while loading shared libraries: libzmq.so.5: cannot open shared object file: No such file or directory

My libzmq is at /usr/local/lib64/.

Likelihood not converging

Hi,
I am trying to run one of the examples (nytimes.sh). I modified only the number of local workers (set to 6) and launched the script out of the box.

It works, but the likelihood does not seem to converge as in the WWW'15 paper. The doc likelihood seems fine, the word likelihood is positive and increasing, while the normalized likelihood barely moves. Here is an extract from the log:

[INFO] [2015-11-26 18:55:17] doc likelihood : -6.422044e+08
[INFO] [2015-11-26 18:55:18] word likelihood : 5.328907e+08
[INFO] [2015-11-26 18:55:18] Normalized likelihood : -1.561648e+09
[INFO] [2015-11-26 18:58:36] doc likelihood : -6.044065e+08
[INFO] [2015-11-26 18:58:37] word likelihood : 5.660771e+08
[INFO] [2015-11-26 18:58:37] Normalized likelihood : -1.562606e+09
[INFO] [2015-11-26 19:01:52] doc likelihood : -5.938570e+08
[INFO] [2015-11-26 19:01:53] word likelihood : 5.854258e+08
[INFO] [2015-11-26 19:01:53] Normalized likelihood : -1.564632e+09
[INFO] [2015-11-26 19:05:00] doc likelihood : -5.767895e+08
[INFO] [2015-11-26 19:05:01] word likelihood : 6.065751e+08
[INFO] [2015-11-26 19:05:01] Normalized likelihood : -1.567653e+09
[INFO] [2015-11-26 19:08:03] doc likelihood : -5.527261e+08
[INFO] [2015-11-26 19:08:04] word likelihood : 6.345852e+08
[INFO] [2015-11-26 19:08:04] Normalized likelihood : -1.571537e+09
[INFO] [2015-11-26 19:10:59] doc likelihood : -5.192143e+08
[INFO] [2015-11-26 19:10:59] word likelihood : 6.688301e+08
[INFO] [2015-11-26 19:10:59] Normalized likelihood : -1.575203e+09
[INFO] [2015-11-26 19:13:52] doc likelihood : -4.839103e+08
[INFO] [2015-11-26 19:13:53] word likelihood : 7.142310e+08
[INFO] [2015-11-26 19:13:53] Normalized likelihood : -1.579249e+09
[INFO] [2015-11-26 19:16:27] doc likelihood : -4.594354e+08
[INFO] [2015-11-26 19:16:28] word likelihood : 7.576918e+08
[INFO] [2015-11-26 19:16:28] Normalized likelihood : -1.582640e+09
[INFO] [2015-11-26 19:18:54] doc likelihood : -4.451361e+08
[INFO] [2015-11-26 19:18:54] word likelihood : 7.904300e+08
[INFO] [2015-11-26 19:18:54] Normalized likelihood : -1.584283e+09
[INFO] [2015-11-26 19:21:17] doc likelihood : -4.367304e+08
[INFO] [2015-11-26 19:21:18] word likelihood : 8.138167e+08
[INFO] [2015-11-26 19:21:18] Normalized likelihood : -1.584595e+09
[INFO] [2015-11-26 19:23:40] doc likelihood : -4.314949e+08
[INFO] [2015-11-26 19:23:40] word likelihood : 8.308841e+08
[INFO] [2015-11-26 19:23:40] Normalized likelihood : -1.584257e+09
[INFO] [2015-11-26 19:26:01] doc likelihood : -4.280392e+08
[INFO] [2015-11-26 19:26:01] word likelihood : 8.438210e+08
[INFO] [2015-11-26 19:26:01] Normalized likelihood : -1.583670e+09
[INFO] [2015-11-26 19:28:15] doc likelihood : -4.257501e+08
[INFO] [2015-11-26 19:28:15] word likelihood : 8.539954e+08
[INFO] [2015-11-26 19:28:15] Normalized likelihood : -1.583021e+09
[INFO] [2015-11-26 19:30:33] doc likelihood : -4.240845e+08
[INFO] [2015-11-26 19:30:33] word likelihood : 8.621267e+08
[INFO] [2015-11-26 19:30:33] Normalized likelihood : -1.582389e+09
[INFO] [2015-11-26 19:32:51] doc likelihood : -4.229418e+08
[INFO] [2015-11-26 19:32:52] word likelihood : 8.688454e+08
[INFO] [2015-11-26 19:32:52] Normalized likelihood : -1.581791e+09
[INFO] [2015-11-26 19:35:02] doc likelihood : -4.220603e+08
[INFO] [2015-11-26 19:35:02] word likelihood : 8.744806e+08
[INFO] [2015-11-26 19:35:02] Normalized likelihood : -1.581272e+09
[INFO] [2015-11-26 19:37:16] doc likelihood : -4.215044e+08
[INFO] [2015-11-26 19:37:17] word likelihood : 8.793446e+08
[INFO] [2015-11-26 19:37:17] Normalized likelihood : -1.580807e+09
[INFO] [2015-11-26 19:39:31] doc likelihood : -4.210534e+08
[INFO] [2015-11-26 19:39:32] word likelihood : 8.835320e+08
[INFO] [2015-11-26 19:39:32] Normalized likelihood : -1.580384e+09
[INFO] [2015-11-26 19:41:46] doc likelihood : -4.206959e+08
[INFO] [2015-11-26 19:41:46] word likelihood : 8.872202e+08
[INFO] [2015-11-26 19:41:46] Normalized likelihood : -1.580009e+09
[INFO] [2015-11-26 19:43:53] doc likelihood : -4.205036e+08
[INFO] [2015-11-26 19:43:54] word likelihood : 8.904672e+08
[INFO] [2015-11-26 19:43:54] Normalized likelihood : -1.579677e+09

Here the full log:

[INFO] [2015-11-26 18:54:21] INFO: block = 0, the number of slice = 1
[INFO] [2015-11-26 18:54:21] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2015-11-26 18:54:21] Server 0: Worker registratrion completed: workers=1 trainers=6 servers=1
[INFO] [2015-11-26 18:54:21] Rank 0/1: Multiverso initialized successfully.
[INFO] [2015-11-26 18:54:21] Rank 0/1: Begin of configuration and initialization.
[INFO] [2015-11-26 18:54:36] Rank 0/1: End of configration and initialization.
[INFO] [2015-11-26 18:54:36] Rank 0/1: Begin of training.
[DEBUG] [2015-11-26 18:54:36] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:54:38] Rank = 0, Iter = 0, Block = 0, Slice = 0
[INFO] [2015-11-26 18:54:38] Rank = 0, Alias Time used: 2.29 s 
[INFO] [2015-11-26 18:55:10] Rank = 0, Training Time used: 198.78 s 
[INFO] [2015-11-26 18:55:10] Rank = 0, sampling throughput: 83392.330586 (tokens/thread/sec) 
[INFO] [2015-11-26 18:55:17] doc likelihood : -6.422044e+08
[INFO] [2015-11-26 18:55:18] word likelihood : 5.328907e+08
[INFO] [2015-11-26 18:55:18] Normalized likelihood : -1.561648e+09
[INFO] [2015-11-26 18:55:18] Rank = 0, Evaluation Time used: 37.24 s 
[DEBUG] [2015-11-26 18:55:18] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:55:21] Rank = 0, Iter = 1, Block = 0, Slice = 0
[INFO] [2015-11-26 18:55:22] Rank = 0, Alias Time used: 2.57 s 
[INFO] [2015-11-26 18:55:50] Rank = 0, Training Time used: 172.52 s 
[INFO] [2015-11-26 18:55:50] Rank = 0, sampling throughput: 96088.618782 (tokens/thread/sec) 
[DEBUG] [2015-11-26 18:55:52] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:56:02] Rank = 0, Iter = 2, Block = 0, Slice = 0
[INFO] [2015-11-26 18:56:03] Rank = 0, Alias Time used: 3.10 s 
[INFO] [2015-11-26 18:56:32] Rank = 0, Training Time used: 177.80 s 
[INFO] [2015-11-26 18:56:32] Rank = 0, sampling throughput: 93233.830746 (tokens/thread/sec) 
[DEBUG] [2015-11-26 18:56:32] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:56:40] Rank = 0, Iter = 3, Block = 0, Slice = 0
[INFO] [2015-11-26 18:56:40] Rank = 0, Alias Time used: 2.44 s 
[INFO] [2015-11-26 18:57:10] Rank = 0, Training Time used: 174.81 s 
[INFO] [2015-11-26 18:57:10] Rank = 0, sampling throughput: 94764.054198 (tokens/thread/sec) 
[DEBUG] [2015-11-26 18:57:11] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:57:19] Rank = 0, Iter = 4, Block = 0, Slice = 0
[INFO] [2015-11-26 18:57:20] Rank = 0, Alias Time used: 2.61 s 
[INFO] [2015-11-26 18:57:52] Rank = 0, Training Time used: 182.20 s 
[INFO] [2015-11-26 18:57:52] Rank = 0, sampling throughput: 90941.209369 (tokens/thread/sec) 
[DEBUG] [2015-11-26 18:57:52] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:58:00] Rank = 0, Iter = 5, Block = 0, Slice = 0
[INFO] [2015-11-26 18:58:00] Rank = 0, Alias Time used: 2.44 s 
[INFO] [2015-11-26 18:58:29] Rank = 0, Training Time used: 176.67 s 
[INFO] [2015-11-26 18:58:29] Rank = 0, sampling throughput: 93767.307385 (tokens/thread/sec) 
[INFO] [2015-11-26 18:58:36] doc likelihood : -6.044065e+08
[INFO] [2015-11-26 18:58:37] word likelihood : 5.660771e+08
[INFO] [2015-11-26 18:58:37] Normalized likelihood : -1.562606e+09
[INFO] [2015-11-26 18:58:37] Rank = 0, Evaluation Time used: 40.60 s 
[DEBUG] [2015-11-26 18:58:37] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:58:41] Rank = 0, Iter = 6, Block = 0, Slice = 0
[INFO] [2015-11-26 18:58:41] Rank = 0, Alias Time used: 2.54 s 
[INFO] [2015-11-26 18:59:11] Rank = 0, Training Time used: 175.08 s 
[INFO] [2015-11-26 18:59:11] Rank = 0, sampling throughput: 94682.829457 (tokens/thread/sec) 
[DEBUG] [2015-11-26 18:59:11] Request params. start = 0, end = 101635
[INFO] [2015-11-26 18:59:21] Rank = 0, Iter = 7, Block = 0, Slice = 0
[INFO] [2015-11-26 18:59:22] Rank = 0, Alias Time used: 2.50 s 
[INFO] [2015-11-26 18:59:51] Rank = 0, Training Time used: 174.15 s 
[INFO] [2015-11-26 18:59:51] Rank = 0, sampling throughput: 95190.270547 (tokens/thread/sec) 
[DEBUG] [2015-11-26 18:59:52] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:00:00] Rank = 0, Iter = 8, Block = 0, Slice = 0
[INFO] [2015-11-26 19:00:00] Rank = 0, Alias Time used: 2.70 s 
[INFO] [2015-11-26 19:00:30] Rank = 0, Training Time used: 179.12 s 
[INFO] [2015-11-26 19:00:30] Rank = 0, sampling throughput: 92546.646382 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:00:30] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:00:39] Rank = 0, Iter = 9, Block = 0, Slice = 0
[INFO] [2015-11-26 19:00:39] Rank = 0, Alias Time used: 2.28 s 
[INFO] [2015-11-26 19:01:06] Rank = 0, Training Time used: 164.11 s 
[INFO] [2015-11-26 19:01:06] Rank = 0, sampling throughput: 101008.266256 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:01:10] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:01:15] Rank = 0, Iter = 10, Block = 0, Slice = 0
[INFO] [2015-11-26 19:01:16] Rank = 0, Alias Time used: 2.15 s 
[INFO] [2015-11-26 19:01:40] Rank = 0, Training Time used: 143.45 s 
[INFO] [2015-11-26 19:01:40] Rank = 0, sampling throughput: 115558.460082 (tokens/thread/sec) 
[INFO] [2015-11-26 19:01:52] doc likelihood : -5.938570e+08
[INFO] [2015-11-26 19:01:53] word likelihood : 5.854258e+08
[INFO] [2015-11-26 19:01:53] Normalized likelihood : -1.564632e+09
[INFO] [2015-11-26 19:01:53] Rank = 0, Evaluation Time used: 74.25 s 
[DEBUG] [2015-11-26 19:01:53] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:01:55] Rank = 0, Iter = 11, Block = 0, Slice = 0
[INFO] [2015-11-26 19:01:55] Rank = 0, Alias Time used: 2.05 s 
[INFO] [2015-11-26 19:02:25] Rank = 0, Training Time used: 176.92 s 
[INFO] [2015-11-26 19:02:25] Rank = 0, sampling throughput: 93695.741097 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:02:26] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:02:35] Rank = 0, Iter = 12, Block = 0, Slice = 0
[INFO] [2015-11-26 19:02:36] Rank = 0, Alias Time used: 2.25 s 
[INFO] [2015-11-26 19:03:05] Rank = 0, Training Time used: 174.86 s 
[INFO] [2015-11-26 19:03:05] Rank = 0, sampling throughput: 94802.995521 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:03:05] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:03:11] Rank = 0, Iter = 13, Block = 0, Slice = 0
[INFO] [2015-11-26 19:03:12] Rank = 0, Alias Time used: 2.10 s 
[INFO] [2015-11-26 19:03:40] Rank = 0, Training Time used: 170.69 s 
[INFO] [2015-11-26 19:03:40] Rank = 0, sampling throughput: 97116.254759 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:03:41] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:03:48] Rank = 0, Iter = 14, Block = 0, Slice = 0
[INFO] [2015-11-26 19:03:49] Rank = 0, Alias Time used: 2.44 s 
[INFO] [2015-11-26 19:04:16] Rank = 0, Training Time used: 165.70 s 
[INFO] [2015-11-26 19:04:16] Rank = 0, sampling throughput: 99987.432960 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:04:19] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:04:26] Rank = 0, Iter = 15, Block = 0, Slice = 0
[INFO] [2015-11-26 19:04:27] Rank = 0, Alias Time used: 2.42 s 
[INFO] [2015-11-26 19:04:55] Rank = 0, Training Time used: 169.51 s 
[INFO] [2015-11-26 19:04:55] Rank = 0, sampling throughput: 97792.991463 (tokens/thread/sec) 
[INFO] [2015-11-26 19:05:00] doc likelihood : -5.767895e+08
[INFO] [2015-11-26 19:05:01] word likelihood : 6.065751e+08
[INFO] [2015-11-26 19:05:01] Normalized likelihood : -1.567653e+09
[INFO] [2015-11-26 19:05:01] Rank = 0, Evaluation Time used: 34.50 s 
[DEBUG] [2015-11-26 19:05:01] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:05:07] Rank = 0, Iter = 16, Block = 0, Slice = 0
[INFO] [2015-11-26 19:05:08] Rank = 0, Alias Time used: 2.14 s 
[INFO] [2015-11-26 19:05:36] Rank = 0, Training Time used: 168.02 s 
[INFO] [2015-11-26 19:05:36] Rank = 0, sampling throughput: 98663.274813 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:05:38] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:05:45] Rank = 0, Iter = 17, Block = 0, Slice = 0
[INFO] [2015-11-26 19:05:46] Rank = 0, Alias Time used: 2.00 s 
[INFO] [2015-11-26 19:06:14] Rank = 0, Training Time used: 172.06 s 
[INFO] [2015-11-26 19:06:14] Rank = 0, sampling throughput: 96346.686169 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:06:14] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:06:22] Rank = 0, Iter = 18, Block = 0, Slice = 0
[INFO] [2015-11-26 19:06:22] Rank = 0, Alias Time used: 1.74 s 
[INFO] [2015-11-26 19:06:53] Rank = 0, Training Time used: 183.90 s 
[INFO] [2015-11-26 19:06:53] Rank = 0, sampling throughput: 90142.883567 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:06:53] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:06:55] Rank = 0, Iter = 19, Block = 0, Slice = 0
[INFO] [2015-11-26 19:06:56] Rank = 0, Alias Time used: 2.25 s 
[INFO] [2015-11-26 19:07:26] Rank = 0, Training Time used: 178.41 s 
[INFO] [2015-11-26 19:07:26] Rank = 0, sampling throughput: 92913.800145 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:07:26] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:07:29] Rank = 0, Iter = 20, Block = 0, Slice = 0
[INFO] [2015-11-26 19:07:29] Rank = 0, Alias Time used: 1.88 s 
[INFO] [2015-11-26 19:07:52] Rank = 0, Training Time used: 139.02 s 
[INFO] [2015-11-26 19:07:52] Rank = 0, sampling throughput: 119193.773232 (tokens/thread/sec) 
[INFO] [2015-11-26 19:08:03] doc likelihood : -5.527261e+08
[INFO] [2015-11-26 19:08:04] word likelihood : 6.345852e+08
[INFO] [2015-11-26 19:08:04] Normalized likelihood : -1.571537e+09
[INFO] [2015-11-26 19:08:04] Rank = 0, Evaluation Time used: 63.24 s 
[DEBUG] [2015-11-26 19:08:04] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:08:08] Rank = 0, Iter = 21, Block = 0, Slice = 0
[INFO] [2015-11-26 19:08:09] Rank = 0, Alias Time used: 1.93 s 
[INFO] [2015-11-26 19:08:36] Rank = 0, Training Time used: 165.63 s 
[INFO] [2015-11-26 19:08:36] Rank = 0, sampling throughput: 100086.078133 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:08:39] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:08:42] Rank = 0, Iter = 22, Block = 0, Slice = 0
[INFO] [2015-11-26 19:08:43] Rank = 0, Alias Time used: 2.00 s 
[INFO] [2015-11-26 19:09:11] Rank = 0, Training Time used: 168.89 s 
[INFO] [2015-11-26 19:09:11] Rank = 0, sampling throughput: 98154.747588 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:09:13] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:09:19] Rank = 0, Iter = 23, Block = 0, Slice = 0
[INFO] [2015-11-26 19:09:19] Rank = 0, Alias Time used: 1.59 s 
[INFO] [2015-11-26 19:09:45] Rank = 0, Training Time used: 157.25 s 
[INFO] [2015-11-26 19:09:45] Rank = 0, sampling throughput: 105420.495884 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:09:48] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:09:50] Rank = 0, Iter = 24, Block = 0, Slice = 0
[INFO] [2015-11-26 19:09:51] Rank = 0, Alias Time used: 1.74 s 
[INFO] [2015-11-26 19:10:18] Rank = 0, Training Time used: 161.92 s 
[INFO] [2015-11-26 19:10:18] Rank = 0, sampling throughput: 102374.893736 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:10:19] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:10:25] Rank = 0, Iter = 25, Block = 0, Slice = 0
[INFO] [2015-11-26 19:10:26] Rank = 0, Alias Time used: 1.84 s 
[INFO] [2015-11-26 19:10:52] Rank = 0, Training Time used: 157.57 s 
[INFO] [2015-11-26 19:10:52] Rank = 0, sampling throughput: 105202.086087 (tokens/thread/sec) 
[INFO] [2015-11-26 19:10:59] doc likelihood : -5.192143e+08
[INFO] [2015-11-26 19:10:59] word likelihood : 6.688301e+08
[INFO] [2015-11-26 19:10:59] Normalized likelihood : -1.575203e+09
[INFO] [2015-11-26 19:10:59] Rank = 0, Evaluation Time used: 41.75 s 
[DEBUG] [2015-11-26 19:11:00] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:11:01] Rank = 0, Iter = 26, Block = 0, Slice = 0
[INFO] [2015-11-26 19:11:01] Rank = 0, Alias Time used: 1.97 s 
[INFO] [2015-11-26 19:11:24] Rank = 0, Training Time used: 138.78 s 
[INFO] [2015-11-26 19:11:24] Rank = 0, sampling throughput: 119448.079115 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:11:29] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:11:36] Rank = 0, Iter = 27, Block = 0, Slice = 0
[INFO] [2015-11-26 19:11:37] Rank = 0, Alias Time used: 2.40 s 
[INFO] [2015-11-26 19:12:13] Rank = 0, Training Time used: 184.30 s 
[INFO] [2015-11-26 19:12:13] Rank = 0, sampling throughput: 89946.089102 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:12:15] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:12:17] Rank = 0, Iter = 28, Block = 0, Slice = 0
[INFO] [2015-11-26 19:12:17] Rank = 0, Alias Time used: 2.10 s 
[INFO] [2015-11-26 19:12:45] Rank = 0, Training Time used: 164.67 s 
[INFO] [2015-11-26 19:12:45] Rank = 0, sampling throughput: 100667.571074 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:12:45] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:12:48] Rank = 0, Iter = 29, Block = 0, Slice = 0
[INFO] [2015-11-26 19:12:49] Rank = 0, Alias Time used: 1.67 s 
[INFO] [2015-11-26 19:13:17] Rank = 0, Training Time used: 165.13 s 
[INFO] [2015-11-26 19:13:17] Rank = 0, sampling throughput: 100342.776178 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:13:19] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:13:19] Rank = 0, Iter = 30, Block = 0, Slice = 0
[INFO] [2015-11-26 19:13:20] Rank = 0, Alias Time used: 1.66 s 
[INFO] [2015-11-26 19:13:45] Rank = 0, Training Time used: 155.96 s 
[INFO] [2015-11-26 19:13:45] Rank = 0, sampling throughput: 106286.736391 (tokens/thread/sec) 
[INFO] [2015-11-26 19:13:52] doc likelihood : -4.839103e+08
[INFO] [2015-11-26 19:13:53] word likelihood : 7.142310e+08
[INFO] [2015-11-26 19:13:53] Normalized likelihood : -1.579249e+09
[INFO] [2015-11-26 19:13:53] Rank = 0, Evaluation Time used: 36.55 s 
[DEBUG] [2015-11-26 19:13:53] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:13:54] Rank = 0, Iter = 31, Block = 0, Slice = 0
[INFO] [2015-11-26 19:13:54] Rank = 0, Alias Time used: 1.67 s 
[INFO] [2015-11-26 19:14:20] Rank = 0, Training Time used: 154.22 s 
[INFO] [2015-11-26 19:14:20] Rank = 0, sampling throughput: 107488.826353 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:14:20] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:14:24] Rank = 0, Iter = 32, Block = 0, Slice = 0
[INFO] [2015-11-26 19:14:24] Rank = 0, Alias Time used: 1.60 s 
[INFO] [2015-11-26 19:14:52] Rank = 0, Training Time used: 156.94 s 
[INFO] [2015-11-26 19:14:52] Rank = 0, sampling throughput: 105623.138350 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:14:57] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:14:58] Rank = 0, Iter = 33, Block = 0, Slice = 0
[INFO] [2015-11-26 19:14:58] Rank = 0, Alias Time used: 1.72 s 
[INFO] [2015-11-26 19:15:22] Rank = 0, Training Time used: 142.36 s 
[INFO] [2015-11-26 19:15:22] Rank = 0, sampling throughput: 116446.783711 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:15:26] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:15:27] Rank = 0, Iter = 34, Block = 0, Slice = 0
[INFO] [2015-11-26 19:15:28] Rank = 0, Alias Time used: 1.49 s 
[INFO] [2015-11-26 19:15:52] Rank = 0, Training Time used: 146.35 s 
[INFO] [2015-11-26 19:15:52] Rank = 0, sampling throughput: 113235.252901 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:15:54] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:15:56] Rank = 0, Iter = 35, Block = 0, Slice = 0
[INFO] [2015-11-26 19:15:56] Rank = 0, Alias Time used: 1.76 s 
[INFO] [2015-11-26 19:16:21] Rank = 0, Training Time used: 150.38 s 
[INFO] [2015-11-26 19:16:21] Rank = 0, sampling throughput: 110232.356935 (tokens/thread/sec) 
[INFO] [2015-11-26 19:16:27] doc likelihood : -4.594354e+08
[INFO] [2015-11-26 19:16:28] word likelihood : 7.576918e+08
[INFO] [2015-11-26 19:16:28] Normalized likelihood : -1.582640e+09
[INFO] [2015-11-26 19:16:28] Rank = 0, Evaluation Time used: 34.83 s 
[DEBUG] [2015-11-26 19:16:28] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:16:28] Rank = 0, Iter = 36, Block = 0, Slice = 0
[INFO] [2015-11-26 19:16:29] Rank = 0, Alias Time used: 1.78 s 
[INFO] [2015-11-26 19:16:52] Rank = 0, Training Time used: 139.90 s 
[INFO] [2015-11-26 19:16:52] Rank = 0, sampling throughput: 118453.417531 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:16:55] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:16:58] Rank = 0, Iter = 37, Block = 0, Slice = 0
[INFO] [2015-11-26 19:16:58] Rank = 0, Alias Time used: 1.47 s 
[INFO] [2015-11-26 19:17:21] Rank = 0, Training Time used: 142.79 s 
[INFO] [2015-11-26 19:17:21] Rank = 0, sampling throughput: 116058.543214 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:17:24] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:17:25] Rank = 0, Iter = 38, Block = 0, Slice = 0
[INFO] [2015-11-26 19:17:25] Rank = 0, Alias Time used: 1.69 s 
[INFO] [2015-11-26 19:17:46] Rank = 0, Training Time used: 122.43 s 
[INFO] [2015-11-26 19:17:46] Rank = 0, sampling throughput: 135402.404929 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:17:52] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:17:55] Rank = 0, Iter = 39, Block = 0, Slice = 0
[INFO] [2015-11-26 19:17:55] Rank = 0, Alias Time used: 1.92 s 
[INFO] [2015-11-26 19:18:16] Rank = 0, Training Time used: 125.18 s 
[INFO] [2015-11-26 19:18:16] Rank = 0, sampling throughput: 132423.392824 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:18:21] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:18:23] Rank = 0, Iter = 40, Block = 0, Slice = 0
[INFO] [2015-11-26 19:18:23] Rank = 0, Alias Time used: 1.90 s 
[INFO] [2015-11-26 19:18:49] Rank = 0, Training Time used: 153.46 s 
[INFO] [2015-11-26 19:18:49] Rank = 0, sampling throughput: 108023.534886 (tokens/thread/sec) 
[INFO] [2015-11-26 19:18:54] doc likelihood : -4.451361e+08
[INFO] [2015-11-26 19:18:54] word likelihood : 7.904300e+08
[INFO] [2015-11-26 19:18:54] Normalized likelihood : -1.584283e+09
[INFO] [2015-11-26 19:18:54] Rank = 0, Evaluation Time used: 27.21 s 
[DEBUG] [2015-11-26 19:18:54] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:18:55] Rank = 0, Iter = 41, Block = 0, Slice = 0
[INFO] [2015-11-26 19:18:55] Rank = 0, Alias Time used: 1.83 s 
[INFO] [2015-11-26 19:19:22] Rank = 0, Training Time used: 155.21 s 
[INFO] [2015-11-26 19:19:22] Rank = 0, sampling throughput: 106805.034757 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:19:23] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:19:24] Rank = 0, Iter = 42, Block = 0, Slice = 0
[INFO] [2015-11-26 19:19:24] Rank = 0, Alias Time used: 1.72 s 
[INFO] [2015-11-26 19:19:51] Rank = 0, Training Time used: 158.81 s 
[INFO] [2015-11-26 19:19:51] Rank = 0, sampling throughput: 104381.090406 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:19:52] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:19:53] Rank = 0, Iter = 43, Block = 0, Slice = 0
[INFO] [2015-11-26 19:19:53] Rank = 0, Alias Time used: 1.88 s 
[INFO] [2015-11-26 19:20:17] Rank = 0, Training Time used: 133.73 s 
[INFO] [2015-11-26 19:20:17] Rank = 0, sampling throughput: 123955.008691 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:20:19] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:20:21] Rank = 0, Iter = 44, Block = 0, Slice = 0
[INFO] [2015-11-26 19:20:21] Rank = 0, Alias Time used: 1.76 s 
[INFO] [2015-11-26 19:20:45] Rank = 0, Training Time used: 142.14 s 
[INFO] [2015-11-26 19:20:45] Rank = 0, sampling throughput: 116556.597077 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:20:46] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:20:47] Rank = 0, Iter = 45, Block = 0, Slice = 0
[INFO] [2015-11-26 19:20:47] Rank = 0, Alias Time used: 1.95 s 
[INFO] [2015-11-26 19:21:11] Rank = 0, Training Time used: 138.88 s 
[INFO] [2015-11-26 19:21:11] Rank = 0, sampling throughput: 119358.778462 (tokens/thread/sec) 
[INFO] [2015-11-26 19:21:17] doc likelihood : -4.367304e+08
[INFO] [2015-11-26 19:21:18] word likelihood : 8.138167e+08
[INFO] [2015-11-26 19:21:18] Normalized likelihood : -1.584595e+09
[INFO] [2015-11-26 19:21:18] Rank = 0, Evaluation Time used: 34.18 s 
[DEBUG] [2015-11-26 19:21:18] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:21:18] Rank = 0, Iter = 46, Block = 0, Slice = 0
[INFO] [2015-11-26 19:21:19] Rank = 0, Alias Time used: 2.07 s 
[INFO] [2015-11-26 19:21:44] Rank = 0, Training Time used: 139.06 s 
[INFO] [2015-11-26 19:21:44] Rank = 0, sampling throughput: 119130.069216 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:21:48] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:21:49] Rank = 0, Iter = 47, Block = 0, Slice = 0
[INFO] [2015-11-26 19:21:49] Rank = 0, Alias Time used: 1.49 s 
[INFO] [2015-11-26 19:22:13] Rank = 0, Training Time used: 140.08 s 
[INFO] [2015-11-26 19:22:13] Rank = 0, sampling throughput: 118335.397796 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:22:13] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:22:14] Rank = 0, Iter = 48, Block = 0, Slice = 0
[INFO] [2015-11-26 19:22:14] Rank = 0, Alias Time used: 1.68 s 
[INFO] [2015-11-26 19:22:38] Rank = 0, Training Time used: 138.16 s 
[INFO] [2015-11-26 19:22:38] Rank = 0, sampling throughput: 119980.225881 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:22:40] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:22:41] Rank = 0, Iter = 49, Block = 0, Slice = 0
[INFO] [2015-11-26 19:22:41] Rank = 0, Alias Time used: 1.54 s 
[INFO] [2015-11-26 19:23:07] Rank = 0, Training Time used: 152.84 s 
[INFO] [2015-11-26 19:23:07] Rank = 0, sampling throughput: 108459.268085 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:23:07] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:23:08] Rank = 0, Iter = 50, Block = 0, Slice = 0
[INFO] [2015-11-26 19:23:08] Rank = 0, Alias Time used: 1.40 s 
[INFO] [2015-11-26 19:23:30] Rank = 0, Training Time used: 133.89 s 
[INFO] [2015-11-26 19:23:30] Rank = 0, sampling throughput: 123807.136131 (tokens/thread/sec) 
[INFO] [2015-11-26 19:23:40] doc likelihood : -4.314949e+08
[INFO] [2015-11-26 19:23:40] word likelihood : 8.308841e+08
[INFO] [2015-11-26 19:23:40] Normalized likelihood : -1.584257e+09
[INFO] [2015-11-26 19:23:40] Rank = 0, Evaluation Time used: 41.05 s 
[DEBUG] [2015-11-26 19:23:40] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:23:41] Rank = 0, Iter = 51, Block = 0, Slice = 0
[INFO] [2015-11-26 19:23:41] Rank = 0, Alias Time used: 1.69 s 
[INFO] [2015-11-26 19:24:12] Rank = 0, Training Time used: 154.38 s 
[INFO] [2015-11-26 19:24:12] Rank = 0, sampling throughput: 107378.348014 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:24:12] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:24:13] Rank = 0, Iter = 52, Block = 0, Slice = 0
[INFO] [2015-11-26 19:24:13] Rank = 0, Alias Time used: 1.37 s 
[INFO] [2015-11-26 19:24:36] Rank = 0, Training Time used: 139.34 s 
[INFO] [2015-11-26 19:24:36] Rank = 0, sampling throughput: 118964.549555 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:24:38] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:24:38] Rank = 0, Iter = 53, Block = 0, Slice = 0
[INFO] [2015-11-26 19:24:39] Rank = 0, Alias Time used: 2.05 s 
[INFO] [2015-11-26 19:25:04] Rank = 0, Training Time used: 142.71 s 
[INFO] [2015-11-26 19:25:04] Rank = 0, sampling throughput: 116158.505032 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:25:04] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:25:05] Rank = 0, Iter = 54, Block = 0, Slice = 0
[INFO] [2015-11-26 19:25:05] Rank = 0, Alias Time used: 1.50 s 
[INFO] [2015-11-26 19:25:27] Rank = 0, Training Time used: 131.35 s 
[INFO] [2015-11-26 19:25:27] Rank = 0, sampling throughput: 126155.166989 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:25:30] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:25:31] Rank = 0, Iter = 55, Block = 0, Slice = 0
[INFO] [2015-11-26 19:25:31] Rank = 0, Alias Time used: 1.65 s 
[INFO] [2015-11-26 19:25:55] Rank = 0, Training Time used: 143.52 s 
[INFO] [2015-11-26 19:25:55] Rank = 0, sampling throughput: 115439.869846 (tokens/thread/sec) 
[INFO] [2015-11-26 19:26:01] doc likelihood : -4.280392e+08
[INFO] [2015-11-26 19:26:01] word likelihood : 8.438210e+08
[INFO] [2015-11-26 19:26:01] Normalized likelihood : -1.583670e+09
[INFO] [2015-11-26 19:26:01] Rank = 0, Evaluation Time used: 26.59 s 
[DEBUG] [2015-11-26 19:26:01] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:26:02] Rank = 0, Iter = 56, Block = 0, Slice = 0
[INFO] [2015-11-26 19:26:02] Rank = 0, Alias Time used: 1.49 s 
[INFO] [2015-11-26 19:26:24] Rank = 0, Training Time used: 125.45 s 
[INFO] [2015-11-26 19:26:24] Rank = 0, sampling throughput: 132138.935103 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:26:28] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:26:29] Rank = 0, Iter = 57, Block = 0, Slice = 0
[INFO] [2015-11-26 19:26:29] Rank = 0, Alias Time used: 1.37 s 
[INFO] [2015-11-26 19:26:51] Rank = 0, Training Time used: 129.02 s 
[INFO] [2015-11-26 19:26:51] Rank = 0, sampling throughput: 128398.332540 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:26:53] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:26:54] Rank = 0, Iter = 58, Block = 0, Slice = 0
[INFO] [2015-11-26 19:26:54] Rank = 0, Alias Time used: 1.39 s 
[INFO] [2015-11-26 19:27:18] Rank = 0, Training Time used: 139.39 s 
[INFO] [2015-11-26 19:27:18] Rank = 0, sampling throughput: 118926.323367 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:27:19] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:27:19] Rank = 0, Iter = 59, Block = 0, Slice = 0
[INFO] [2015-11-26 19:27:20] Rank = 0, Alias Time used: 1.57 s 
[INFO] [2015-11-26 19:27:45] Rank = 0, Training Time used: 141.16 s 
[INFO] [2015-11-26 19:27:45] Rank = 0, sampling throughput: 117430.336350 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:27:45] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:27:46] Rank = 0, Iter = 60, Block = 0, Slice = 0
[INFO] [2015-11-26 19:27:46] Rank = 0, Alias Time used: 1.62 s 
[INFO] [2015-11-26 19:28:09] Rank = 0, Training Time used: 133.56 s 
[INFO] [2015-11-26 19:28:09] Rank = 0, sampling throughput: 124116.618109 (tokens/thread/sec) 
[INFO] [2015-11-26 19:28:15] doc likelihood : -4.257501e+08
[INFO] [2015-11-26 19:28:15] word likelihood : 8.539954e+08
[INFO] [2015-11-26 19:28:15] Normalized likelihood : -1.583021e+09
[INFO] [2015-11-26 19:28:15] Rank = 0, Evaluation Time used: 34.62 s 
[DEBUG] [2015-11-26 19:28:15] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:28:16] Rank = 0, Iter = 61, Block = 0, Slice = 0
[INFO] [2015-11-26 19:28:16] Rank = 0, Alias Time used: 1.44 s 
[INFO] [2015-11-26 19:28:36] Rank = 0, Training Time used: 107.46 s 
[INFO] [2015-11-26 19:28:36] Rank = 0, sampling throughput: 154255.024397 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:28:46] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:28:47] Rank = 0, Iter = 62, Block = 0, Slice = 0
[INFO] [2015-11-26 19:28:47] Rank = 0, Alias Time used: 1.75 s 
[INFO] [2015-11-26 19:29:08] Rank = 0, Training Time used: 128.39 s 
[INFO] [2015-11-26 19:29:08] Rank = 0, sampling throughput: 129116.484644 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:29:11] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:29:12] Rank = 0, Iter = 63, Block = 0, Slice = 0
[INFO] [2015-11-26 19:29:12] Rank = 0, Alias Time used: 1.43 s 
[INFO] [2015-11-26 19:29:35] Rank = 0, Training Time used: 136.27 s 
[INFO] [2015-11-26 19:29:35] Rank = 0, sampling throughput: 121650.262386 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:29:36] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:29:37] Rank = 0, Iter = 64, Block = 0, Slice = 0
[INFO] [2015-11-26 19:29:37] Rank = 0, Alias Time used: 1.44 s 
[INFO] [2015-11-26 19:30:01] Rank = 0, Training Time used: 137.20 s 
[INFO] [2015-11-26 19:30:01] Rank = 0, sampling throughput: 120822.632376 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:30:02] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:30:02] Rank = 0, Iter = 65, Block = 0, Slice = 0
[INFO] [2015-11-26 19:30:03] Rank = 0, Alias Time used: 1.38 s 
[INFO] [2015-11-26 19:30:30] Rank = 0, Training Time used: 150.12 s 
[INFO] [2015-11-26 19:30:30] Rank = 0, sampling throughput: 110424.012822 (tokens/thread/sec) 
[INFO] [2015-11-26 19:30:33] doc likelihood : -4.240845e+08
[INFO] [2015-11-26 19:30:33] word likelihood : 8.621267e+08
[INFO] [2015-11-26 19:30:33] Normalized likelihood : -1.582389e+09
[INFO] [2015-11-26 19:30:33] Rank = 0, Evaluation Time used: 18.53 s 
[DEBUG] [2015-11-26 19:30:33] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:30:34] Rank = 0, Iter = 66, Block = 0, Slice = 0
[INFO] [2015-11-26 19:30:34] Rank = 0, Alias Time used: 1.66 s 
[INFO] [2015-11-26 19:30:54] Rank = 0, Training Time used: 122.02 s 
[INFO] [2015-11-26 19:30:54] Rank = 0, sampling throughput: 135676.916831 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:31:01] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:31:01] Rank = 0, Iter = 67, Block = 0, Slice = 0
[INFO] [2015-11-26 19:31:02] Rank = 0, Alias Time used: 1.50 s 
[INFO] [2015-11-26 19:31:28] Rank = 0, Training Time used: 138.75 s 
[INFO] [2015-11-26 19:31:28] Rank = 0, sampling throughput: 119397.508518 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:31:30] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:31:31] Rank = 0, Iter = 68, Block = 0, Slice = 0
[INFO] [2015-11-26 19:31:31] Rank = 0, Alias Time used: 1.46 s 
[INFO] [2015-11-26 19:31:52] Rank = 0, Training Time used: 123.99 s 
[INFO] [2015-11-26 19:31:52] Rank = 0, sampling throughput: 133694.848980 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:31:57] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:31:57] Rank = 0, Iter = 69, Block = 0, Slice = 0
[INFO] [2015-11-26 19:31:58] Rank = 0, Alias Time used: 1.67 s 
[INFO] [2015-11-26 19:32:21] Rank = 0, Training Time used: 138.28 s 
[INFO] [2015-11-26 19:32:21] Rank = 0, sampling throughput: 119878.573570 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:32:22] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:32:23] Rank = 0, Iter = 70, Block = 0, Slice = 0
[INFO] [2015-11-26 19:32:23] Rank = 0, Alias Time used: 1.35 s 
[INFO] [2015-11-26 19:32:40] Rank = 0, Training Time used: 103.31 s 
[INFO] [2015-11-26 19:32:40] Rank = 0, sampling throughput: 160461.523622 (tokens/thread/sec) 
[INFO] [2015-11-26 19:32:51] doc likelihood : -4.229418e+08
[INFO] [2015-11-26 19:32:52] word likelihood : 8.688454e+08
[INFO] [2015-11-26 19:32:52] Normalized likelihood : -1.581791e+09
[INFO] [2015-11-26 19:32:52] Rank = 0, Evaluation Time used: 64.24 s 
[DEBUG] [2015-11-26 19:32:52] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:32:53] Rank = 0, Iter = 71, Block = 0, Slice = 0
[INFO] [2015-11-26 19:32:53] Rank = 0, Alias Time used: 1.71 s 
[INFO] [2015-11-26 19:33:13] Rank = 0, Training Time used: 119.84 s 
[INFO] [2015-11-26 19:33:13] Rank = 0, sampling throughput: 138323.524062 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:33:16] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:33:17] Rank = 0, Iter = 72, Block = 0, Slice = 0
[INFO] [2015-11-26 19:33:17] Rank = 0, Alias Time used: 1.39 s 
[INFO] [2015-11-26 19:33:39] Rank = 0, Training Time used: 129.61 s 
[INFO] [2015-11-26 19:33:39] Rank = 0, sampling throughput: 127893.766193 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:33:42] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:33:42] Rank = 0, Iter = 73, Block = 0, Slice = 0
[INFO] [2015-11-26 19:33:43] Rank = 0, Alias Time used: 1.69 s 
[INFO] [2015-11-26 19:34:04] Rank = 0, Training Time used: 126.04 s 
[INFO] [2015-11-26 19:34:04] Rank = 0, sampling throughput: 131522.060527 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:34:08] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:34:09] Rank = 0, Iter = 74, Block = 0, Slice = 0
[INFO] [2015-11-26 19:34:09] Rank = 0, Alias Time used: 1.56 s 
[INFO] [2015-11-26 19:34:29] Rank = 0, Training Time used: 119.93 s 
[INFO] [2015-11-26 19:34:29] Rank = 0, sampling throughput: 138219.504981 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:34:33] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:34:34] Rank = 0, Iter = 75, Block = 0, Slice = 0
[INFO] [2015-11-26 19:34:34] Rank = 0, Alias Time used: 1.30 s 
[INFO] [2015-11-26 19:34:59] Rank = 0, Training Time used: 144.57 s 
[INFO] [2015-11-26 19:34:59] Rank = 0, sampling throughput: 114662.933687 (tokens/thread/sec) 
[INFO] [2015-11-26 19:35:02] doc likelihood : -4.220603e+08
[INFO] [2015-11-26 19:35:02] word likelihood : 8.744806e+08
[INFO] [2015-11-26 19:35:02] Normalized likelihood : -1.581272e+09
[INFO] [2015-11-26 19:35:02] Rank = 0, Evaluation Time used: 16.66 s 
[DEBUG] [2015-11-26 19:35:02] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:35:03] Rank = 0, Iter = 76, Block = 0, Slice = 0
[INFO] [2015-11-26 19:35:03] Rank = 0, Alias Time used: 1.30 s 
[INFO] [2015-11-26 19:35:24] Rank = 0, Training Time used: 123.81 s 
[INFO] [2015-11-26 19:35:24] Rank = 0, sampling throughput: 133895.158290 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:35:28] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:35:29] Rank = 0, Iter = 77, Block = 0, Slice = 0
[INFO] [2015-11-26 19:35:29] Rank = 0, Alias Time used: 1.77 s 
[INFO] [2015-11-26 19:35:52] Rank = 0, Training Time used: 126.83 s 
[INFO] [2015-11-26 19:35:52] Rank = 0, sampling throughput: 130701.121912 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:35:56] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:35:57] Rank = 0, Iter = 78, Block = 0, Slice = 0
[INFO] [2015-11-26 19:35:57] Rank = 0, Alias Time used: 1.56 s 
[INFO] [2015-11-26 19:36:19] Rank = 0, Training Time used: 123.26 s 
[INFO] [2015-11-26 19:36:19] Rank = 0, sampling throughput: 134399.596618 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:36:24] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:36:25] Rank = 0, Iter = 79, Block = 0, Slice = 0
[INFO] [2015-11-26 19:36:25] Rank = 0, Alias Time used: 1.23 s 
[INFO] [2015-11-26 19:36:47] Rank = 0, Training Time used: 133.95 s 
[INFO] [2015-11-26 19:36:47] Rank = 0, sampling throughput: 123757.576251 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:36:48] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:36:49] Rank = 0, Iter = 80, Block = 0, Slice = 0
[INFO] [2015-11-26 19:36:49] Rank = 0, Alias Time used: 1.49 s 
[INFO] [2015-11-26 19:37:11] Rank = 0, Training Time used: 133.50 s 
[INFO] [2015-11-26 19:37:11] Rank = 0, sampling throughput: 124170.388139 (tokens/thread/sec) 
[INFO] [2015-11-26 19:37:16] doc likelihood : -4.215044e+08
[INFO] [2015-11-26 19:37:17] word likelihood : 8.793446e+08
[INFO] [2015-11-26 19:37:17] Normalized likelihood : -1.580807e+09
[INFO] [2015-11-26 19:37:17] Rank = 0, Evaluation Time used: 25.43 s 
[DEBUG] [2015-11-26 19:37:17] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:37:17] Rank = 0, Iter = 81, Block = 0, Slice = 0
[INFO] [2015-11-26 19:37:18] Rank = 0, Alias Time used: 1.43 s 
[INFO] [2015-11-26 19:37:45] Rank = 0, Training Time used: 146.71 s 
[INFO] [2015-11-26 19:37:45] Rank = 0, sampling throughput: 112989.725378 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:37:45] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:37:46] Rank = 0, Iter = 82, Block = 0, Slice = 0
[INFO] [2015-11-26 19:37:46] Rank = 0, Alias Time used: 1.24 s 
[INFO] [2015-11-26 19:38:08] Rank = 0, Training Time used: 130.01 s 
[INFO] [2015-11-26 19:38:08] Rank = 0, sampling throughput: 127508.318693 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:38:10] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:38:10] Rank = 0, Iter = 83, Block = 0, Slice = 0
[INFO] [2015-11-26 19:38:11] Rank = 0, Alias Time used: 1.55 s 
[INFO] [2015-11-26 19:38:36] Rank = 0, Training Time used: 141.07 s 
[INFO] [2015-11-26 19:38:36] Rank = 0, sampling throughput: 117505.630428 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:38:36] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:38:37] Rank = 0, Iter = 84, Block = 0, Slice = 0
[INFO] [2015-11-26 19:38:37] Rank = 0, Alias Time used: 1.47 s 
[INFO] [2015-11-26 19:38:56] Rank = 0, Training Time used: 110.34 s 
[INFO] [2015-11-26 19:38:56] Rank = 0, sampling throughput: 150232.479256 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:39:01] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:39:02] Rank = 0, Iter = 85, Block = 0, Slice = 0
[INFO] [2015-11-26 19:39:02] Rank = 0, Alias Time used: 1.58 s 
[INFO] [2015-11-26 19:39:26] Rank = 0, Training Time used: 137.54 s 
[INFO] [2015-11-26 19:39:26] Rank = 0, sampling throughput: 120524.656035 (tokens/thread/sec) 
[INFO] [2015-11-26 19:39:31] doc likelihood : -4.210534e+08
[INFO] [2015-11-26 19:39:32] word likelihood : 8.835320e+08
[INFO] [2015-11-26 19:39:32] Normalized likelihood : -1.580384e+09
[INFO] [2015-11-26 19:39:32] Rank = 0, Evaluation Time used: 23.83 s 
[DEBUG] [2015-11-26 19:39:32] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:39:32] Rank = 0, Iter = 86, Block = 0, Slice = 0
[INFO] [2015-11-26 19:39:33] Rank = 0, Alias Time used: 1.56 s 
[INFO] [2015-11-26 19:39:58] Rank = 0, Training Time used: 138.72 s 
[INFO] [2015-11-26 19:39:58] Rank = 0, sampling throughput: 119450.538189 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:39:58] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:39:59] Rank = 0, Iter = 87, Block = 0, Slice = 0
[INFO] [2015-11-26 19:39:59] Rank = 0, Alias Time used: 1.44 s 
[INFO] [2015-11-26 19:40:19] Rank = 0, Training Time used: 117.04 s 
[INFO] [2015-11-26 19:40:19] Rank = 0, sampling throughput: 141633.889817 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:40:22] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:40:23] Rank = 0, Iter = 88, Block = 0, Slice = 0
[INFO] [2015-11-26 19:40:24] Rank = 0, Alias Time used: 2.08 s 
[INFO] [2015-11-26 19:40:45] Rank = 0, Training Time used: 119.46 s 
[INFO] [2015-11-26 19:40:45] Rank = 0, sampling throughput: 138661.819506 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:40:50] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:40:51] Rank = 0, Iter = 89, Block = 0, Slice = 0
[INFO] [2015-11-26 19:40:51] Rank = 0, Alias Time used: 1.16 s 
[INFO] [2015-11-26 19:41:09] Rank = 0, Training Time used: 110.17 s 
[INFO] [2015-11-26 19:41:09] Rank = 0, sampling throughput: 150464.912253 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:41:16] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:41:17] Rank = 0, Iter = 90, Block = 0, Slice = 0
[INFO] [2015-11-26 19:41:17] Rank = 0, Alias Time used: 1.59 s 
[INFO] [2015-11-26 19:41:40] Rank = 0, Training Time used: 133.89 s 
[INFO] [2015-11-26 19:41:40] Rank = 0, sampling throughput: 123735.482634 (tokens/thread/sec) 
[INFO] [2015-11-26 19:41:46] doc likelihood : -4.206959e+08
[INFO] [2015-11-26 19:41:46] word likelihood : 8.872202e+08
[INFO] [2015-11-26 19:41:46] Normalized likelihood : -1.580009e+09
[INFO] [2015-11-26 19:41:46] Rank = 0, Evaluation Time used: 31.54 s 
[DEBUG] [2015-11-26 19:41:46] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:41:47] Rank = 0, Iter = 91, Block = 0, Slice = 0
[INFO] [2015-11-26 19:41:47] Rank = 0, Alias Time used: 1.69 s 
[INFO] [2015-11-26 19:42:11] Rank = 0, Training Time used: 137.06 s 
[INFO] [2015-11-26 19:42:11] Rank = 0, sampling throughput: 120949.302820 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:42:13] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:42:14] Rank = 0, Iter = 92, Block = 0, Slice = 0
[INFO] [2015-11-26 19:42:14] Rank = 0, Alias Time used: 1.49 s 
[INFO] [2015-11-26 19:42:36] Rank = 0, Training Time used: 129.69 s 
[INFO] [2015-11-26 19:42:36] Rank = 0, sampling throughput: 127816.967768 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:42:37] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:42:38] Rank = 0, Iter = 93, Block = 0, Slice = 0
[INFO] [2015-11-26 19:42:38] Rank = 0, Alias Time used: 1.43 s 
[INFO] [2015-11-26 19:43:01] Rank = 0, Training Time used: 134.25 s 
[INFO] [2015-11-26 19:43:01] Rank = 0, sampling throughput: 123474.526856 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:43:01] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:43:02] Rank = 0, Iter = 94, Block = 0, Slice = 0
[INFO] [2015-11-26 19:43:02] Rank = 0, Alias Time used: 1.37 s 
[INFO] [2015-11-26 19:43:24] Rank = 0, Training Time used: 129.88 s 
[INFO] [2015-11-26 19:43:24] Rank = 0, sampling throughput: 127633.292251 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:43:25] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:43:26] Rank = 0, Iter = 95, Block = 0, Slice = 0
[INFO] [2015-11-26 19:43:26] Rank = 0, Alias Time used: 1.39 s 
[INFO] [2015-11-26 19:43:43] Rank = 0, Training Time used: 99.65 s 
[INFO] [2015-11-26 19:43:43] Rank = 0, sampling throughput: 166217.845728 (tokens/thread/sec) 
[INFO] [2015-11-26 19:43:53] doc likelihood : -4.205036e+08
[INFO] [2015-11-26 19:43:54] word likelihood : 8.904672e+08
[INFO] [2015-11-26 19:43:54] Normalized likelihood : -1.579677e+09
[INFO] [2015-11-26 19:43:54] Rank = 0, Evaluation Time used: 55.21 s 
[DEBUG] [2015-11-26 19:43:54] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:43:54] Rank = 0, Iter = 96, Block = 0, Slice = 0
[INFO] [2015-11-26 19:43:55] Rank = 0, Alias Time used: 1.38 s 
[INFO] [2015-11-26 19:44:16] Rank = 0, Training Time used: 125.78 s 
[INFO] [2015-11-26 19:44:16] Rank = 0, sampling throughput: 131736.899645 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:44:18] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:44:19] Rank = 0, Iter = 97, Block = 0, Slice = 0
[INFO] [2015-11-26 19:44:19] Rank = 0, Alias Time used: 1.30 s 
[INFO] [2015-11-26 19:44:36] Rank = 0, Training Time used: 102.23 s 
[INFO] [2015-11-26 19:44:36] Rank = 0, sampling throughput: 162153.926892 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:44:42] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:44:42] Rank = 0, Iter = 98, Block = 0, Slice = 0
[INFO] [2015-11-26 19:44:43] Rank = 0, Alias Time used: 1.47 s 
[INFO] [2015-11-26 19:45:12] Rank = 0, Training Time used: 147.84 s 
[INFO] [2015-11-26 19:45:12] Rank = 0, sampling throughput: 112124.181078 (tokens/thread/sec) 
[DEBUG] [2015-11-26 19:45:13] Request params. start = 0, end = 101635
[INFO] [2015-11-26 19:45:13] Rank = 0, Iter = 99, Block = 0, Slice = 0
[INFO] [2015-11-26 19:45:14] Rank = 0, Alias Time used: 1.69 s 
[INFO] [2015-11-26 19:45:38] Rank = 0, Training Time used: 138.22 s 
[INFO] [2015-11-26 19:45:38] Rank = 0, sampling throughput: 119927.936166 (tokens/thread/sec) 
[INFO] [2015-11-26 19:45:39] Rank 0/1: End of training.
[INFO] [2015-11-26 19:45:39] Server 0: Received close message from worker 0.
[INFO] [2015-11-26 19:45:39] Server 0: Dump model...
[INFO] [2015-11-26 19:45:41] Server 0 closed.
[INFO] [2015-11-26 19:45:42] Rank 0/1: Multiverso closed successfully.

Thank you.

Marco
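For reference, the per-iteration throughput lines in a run like the one above can be summarized with a short script. This is a sketch of mine, not a LightLDA tool; the regex assumes the exact log format shown ("sampling throughput: N (tokens/thread/sec)"):

```python
# Hedged helper (not part of LightLDA): average the per-iteration
# "sampling throughput" values from a LightLDA log.
import re

THROUGHPUT_RE = re.compile(r"sampling throughput: ([0-9.]+) \(tokens/thread/sec\)")

def mean_throughput(log_lines):
    """Mean tokens/thread/sec over all matching log lines; 0.0 if none."""
    vals = [float(m.group(1)) for line in log_lines
            if (m := THROUGHPUT_RE.search(line))]
    return sum(vals) / len(vals) if vals else 0.0
```

Feeding it the log above would show throughput hovering roughly in the 100k-160k tokens/thread/sec range across iterations.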

Invalid topic assignment N from word proposal

Mem: 32GB
bin/lightlda -num_vocabs 141043 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 22 -num_blocks 33 -max_num_document 1500000 -input_dir ./splitout -data_capacity 24000

Total doc number: 32800000

[INFO] [2016-05-03 10:50:51] Actual Model capacity: 309 MB, Alias capacity: 512 MB, Delta capacity: 230MB
[INFO] [2016-05-03 10:50:51] INFO: block = 0, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 310 MB, Alias capacity: 512 MB, Delta capacity: 231MB
[INFO] [2016-05-03 10:50:51] INFO: block = 1, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 2, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 233MB
[INFO] [2016-05-03 10:50:51] INFO: block = 3, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 4, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 5, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 6, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 7, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 8, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 310 MB, Alias capacity: 512 MB, Delta capacity: 231MB
[INFO] [2016-05-03 10:50:51] INFO: block = 9, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 10, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 233MB
[INFO] [2016-05-03 10:50:51] INFO: block = 11, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 12, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 13, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 14, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 15, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 16, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 310 MB, Alias capacity: 512 MB, Delta capacity: 230MB
[INFO] [2016-05-03 10:50:51] INFO: block = 17, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 18, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 233MB
[INFO] [2016-05-03 10:50:51] INFO: block = 19, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 20, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 21, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 22, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 23, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 24, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 309 MB, Alias capacity: 512 MB, Delta capacity: 230MB
[INFO] [2016-05-03 10:50:51] INFO: block = 25, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 231MB
[INFO] [2016-05-03 10:50:51] INFO: block = 26, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 27, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 28, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 29, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 30, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 31, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 32, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2016-05-03 10:50:51] Server 0: Worker registratrion completed: workers=1 trainers=22 servers=1
[INFO] [2016-05-03 10:50:51] Rank 0/1: Multiverso initialized successfully.
[INFO] [2016-05-03 10:51:09] Rank 0/1: Begin of configuration and initialization.
[DEBUG] [2016-05-03 11:00:50] Request params. start = 1, end = 133817
[INFO] [2016-05-03 11:00:53] Rank = 0, Iter = 0, Block = 0, Slice = 0
[DEBUG] [2016-05-03 11:00:53] Request params. start = 133818, end = 141042
[INFO] [2016-05-03 11:00:53] [FATAL] [2016-05-03 11:00:53] Invalid topic assignment 681102570 from word proposal
[FATAL] [2016-05-03 11:00:53] Rank = 0, Alias Time used: 6.48 s

BUG in Makefile

There is a dangerous pattern in the Makefile: nothing guarantees that the value of PROJECT is non-empty.

PROJECT := $(shell readlink $(dir $(lastword $(MAKEFILE_LIST))) -f)
BIN_DIR = $(PROJECT)/bin
clean:
	rm -rf $(BIN_DIR)

So when PROJECT is empty, running

make clean

actually executes:

rm -rf /bin
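One possible defensive fix (a sketch, not a tested patch) is to abort early when the shell expansion comes back empty, so clean can never expand to rm -rf /bin:

```makefile
# Same PROJECT definition as above, plus a guard: if readlink failed
# and PROJECT is empty, stop with an error instead of continuing.
PROJECT := $(shell readlink $(dir $(lastword $(MAKEFILE_LIST))) -f)
ifeq ($(strip $(PROJECT)),)
$(error PROJECT is empty; refusing to continue)
endif
BIN_DIR = $(PROJECT)/bin

clean:
	rm -rf $(BIN_DIR)
```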

Invalid topic assignment from word proposal

Hi, I can run lightlda on a single machine, but I hit a problem when I tried to run it in distributed mode (MPI).

Here are my steps:

  • split nytimes.libsvm into two parts and run dump_binary on each of them, so I get
block.0  block.1  vocab.0  vocab.0.txt  vocab.1  vocab.1.txt

and distribute them across two machines.

  • start lightlda with command
mpiexec -f machine $bin/lightlda -num_vocabs 111400 -num_topics 1000 -num_servers 2 -num_iterations 10 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 2 -max_num_document 300000 -input_dir $dir -data_capacity 800
  • Then I get a fatal error like this
[FATAL] [2015-11-18 15:51:28] Invalid topic assignment 280904469 from word proposal

and the run eventually fails with:

===============================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 31489 RUNNING AT XXXXXXXXXXX
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===============================================================================

Any idea why it happens?

Invalid table or row ids: 0 -1Segmentation fault

I ran LightLDA over a corpus with 1000 docs and a vocabulary of approximately 22000 words. I used the text2libsvm script in lightlda/example to convert the UCI data to libsvm format and to generate the dictionary.

I am getting the following error

bin/dump_binary test.libsvm test.dict . 0
There are totally 6729 words in the vocabulary
There are maximally totally 21405 tokens in the data set
The number of tokens in the output block is: 21405
Local vocab_size for the output block is: 6720
Elapsed seconds for dump blocks: 0.0125935
root@ankit:/home/ankit/lightlda# bin/lightlda -num_vocabs 22000 -num_topics 1000 -num_iterations 1 -alpha 0.1 -beta 0.01 -num_blocks 1 -max_num_document 1000 -input_dir /home/ankit/lightlda -data_capacity 800
[INFO] [2016-01-14 18:37:21] INFO: block = 0, the number of slice = 1
[INFO] [2016-01-14 18:37:21] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2016-01-14 18:37:21] Server 0: Worker registratrion completed: workers=1 trainers=1 servers=1
[INFO] [2016-01-14 18:37:21] Rank 0/1: Multiverso initialized successfully.
[INFO] [2016-01-14 18:37:21] Rank 0/1: Begin of configuration and initialization.
[INFO] [2016-01-14 18:37:21] Rank 0/1: End of configration and initialization.
[INFO] [2016-01-14 18:37:21] Rank 0/1: Begin of training.
[DEBUG] [2016-01-14 18:37:21] Request params. start = 0, end = 6719
[INFO] [2016-01-14 18:37:21] Rank = 0, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-01-14 18:37:21] Rank = 0, Alias Time used: 0.01 s
[ERROR] [2016-01-14 18:37:21] Rank=0 Trainer=0: TrainerBase::GetTable: Invalid table or row ids: 0 -1Segmentation fault

Program received segmentation fault during configuration and initialization

When I run lightLDA on a single machine, I get the following message.
[INFO] [2016-06-30 11:49:39] INFO: block = 0, the number of slice = 1
[INFO] [2016-06-30 11:49:39] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2016-06-30 11:49:39] Server 0: Worker registratrion completed: workers=1 trainers=1 servers=1
[INFO] [2016-06-30 11:49:39] Rank 0/1: Multiverso initialized successfully.
[INFO] [2016-06-30 11:49:43] Rank 0/1: Begin of configuration and initialization.
Segmentation fault (core dumped)

In other issues, e.g. issue #15, I find that a segmentation fault is usually caused by a wrong tf count, and it happens after "End of configuration and initialization". But in my case it happens before "End of configuration and initialization", so I wonder what could be the cause?

BTW, when I use gdb to debug, I get the following:
#0 __memset_sse2 () at ../sysdeps/x86_64/memset.S:65
#1 0x000000000041b023 in multiverso::RowFactory::CreateRow(int, multiverso::Format, int, void*) ()
#2 0x000000000041f07d in multiverso::Table::GetRow(int) ()
#3 0x000000000042c4ca in multiverso::Server::StartThread() ()
#4 0x0000003e1feb6470 in std::(anonymous namespace)::execute_native_thread_routine (__p=) at ../../../../libstdc++-v3/src/thread.cc:44
#5 0x00000035248079d1 in start_thread (arg=0x7f054cfc2700) at pthread_create.c:301
#6 0x00000035244e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Can anyone tell me what has happened?

terminate called after throwing an instance of 'std::bad_alloc'

While running dump_binary from the given example nytimes.sh, it throws this error: std::bad_alloc

bin/dump_binary example/data/nytimes/nytimes.libsvm example/data/nytimes/nytimes.word_id.dict example/data/nytimes/ 0

logs:

There are totally 101636 words in the vocabulary
There are maximally totally 99542125 tokens in the data set
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Abort

Could anybody help me? Highly appreciated.

Invalid topic assignment N from word proposal, something different from the former issues

I have the same Invalid topic assignment N from word proposal problem as the former issues. The former issues all use their own corpora with mistaken TF counts for words, but I use the example nytimes corpus. When I run it on only one machine, it runs well. When I run it on two or four machines using MPI, it runs into Invalid topic assignment N from word proposal. If you can tell me why, I will appreciate it very much.

The following is my command:
OMP_NUM_THREADS=1 mpirun -n 2 -perhost 1 -host node1,node2 $bin/lightlda -num_vocabs 111400 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 300000 -input_dir $dir -data_capacity 800

The following is the logging info:

/home/danyang/mfs/lightLDA/example
[INFO] [2016-09-29 12:49:01] INFO: block = 0, the number of slice = 1
[INFO] [2016-09-29 12:49:01] INFO: block = 0, the number of slice = 1
[INFO] [2016-09-29 12:49:01] Server 0 starts: num_workers=2 endpoint=inproc://server
[INFO] [2016-09-29 12:49:01] Server 1 starts: num_workers=2 endpoint=inproc://server
[INFO] [2016-09-29 12:49:01] Server 0: Worker registratrion completed: workers=2 trainers=2 servers=2
[INFO] [2016-09-29 12:49:01] Rank 0/2: Multiverso initialized successfully.
[INFO] [2016-09-29 12:49:01] Rank 1/2: Multiverso initialized successfully.
[INFO] [2016-09-29 12:49:09] Rank 1/2: Begin of configuration and initialization.
[INFO] [2016-09-29 12:49:11] Rank 0/2: Begin of configuration and initialization.
[INFO] [2016-09-29 12:49:34] Rank 0/2: End of configration and initialization.
[INFO] [2016-09-29 12:49:34] Rank 1/2: End of configration and initialization.
[INFO] [2016-09-29 12:49:34] Rank 0/2: Begin of training.
[INFO] [2016-09-29 12:49:34] Rank 1/2: Begin of training.
[DEBUG] [2016-09-29 12:49:34] Request params. start = 0, end = 101635
[DEBUG] [2016-09-29 12:49:34] Request params. start = 0, end = 101635
[INFO] [2016-09-29 12:49:37] Rank = 0, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-09-29 12:49:38] Rank = 1, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-09-29 12:49:40] Rank = 0, Alias Time used: 6.88 s
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 13989076 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 25504855 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 341954549 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 510688998 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 1604518467 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 72048999 from word proposal
[INFO] [2016-09-29 12:49:40] Rank = 1, Alias Time used: 7.31 s
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 13989076 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 25504855 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 341954549 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 510688998 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 1604518467 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 72048999 from word proposal
[FATAL] [2016-09-29 12:49:40] Invalid topic assignment 324590367 from word proposal

If my way of running it on multiple machines is wrong, please show me a correct example. Thanks very much!

Support asymmetric Dirichlet prior optimization

The currently released lightlda doesn't support asymmetric Dirichlet prior optimization. However, our internal practice shows that this feature helps produce better models (also see this).

If anyone is interested in contributing this feature, please reply or contact us through email. We can collaborate on this.

Word likelihood positive?

I ran lightlda on the nytimes example, and the following is part of the log:

[INFO] [2015-11-16 16:04:49] doc likelihood : -6.043713e+08
[INFO] [2015-11-16 16:04:50] word likelihood : 5.660618e+08
[INFO] [2015-11-16 16:04:50] Normalized likelihood : -1.562610e+09

Why is the word likelihood positive? I assume it's a log-likelihood.

[BUG] Infer: Fail to build alias row, capacity of row = 0

Hi, I've trained a model with LightLDA and want to infer topics for new documents. However, when I use the infer program, it gives an error: Fail to build alias row, capacity of row = 0. The details are as follows:

  1. I ran lightlda with the following command:
    lightlda/bin/lightlda -num_vocabs 279164 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 100000 -input_dir data -data_capacity 6200 &
  2. After the command completed, I found three files: server_0_table_0.model, server_0_table_1.model, doc_topic.0. The file server_0_table_0.model has 279163 lines and server_0_table_1.model has 1 line.
  3. I ran infer with the following command, using the exact training data as input:
    lightlda/bin/infer -num_vocabs 279164 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 100000 -input_dir data -data_capacity 6200
  4. The procedure exits and gives the following log:
    [INFO] [2016-02-05 11:19:02] Actual Alias capacity: 111 MB
    [INFO] [2016-02-05 11:19:02] loading model
    [ERROR] [2016-02-05 11:19:02] Fail to build alias row, capacity of row = 0
    [ERROR] [2016-02-05 11:19:02] Fail to build alias row, capacity of row = 0

Can anyone tell me what has happened here?

evaluate topic model

I'm curious whether there are any tools available here to calculate the perplexity or likelihood of a topic model. Like the perplexity calculator in the Topic Modeling Toolbox provided by the Stanford NLP group, such a tool is really helpful for selecting an appropriate number of topics. Is there a similar tool here?
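There is no built-in calculator, but LightLDA already logs total log-likelihoods (see the "doc likelihood" / "word likelihood" lines in other issues here), so perplexity can be computed externally from the standard definition perplexity = exp(-log L / N). A minimal sketch, with purely illustrative numbers:

```python
import math

def perplexity(total_log_likelihood: float, num_tokens: int) -> float:
    # Standard definition: perplexity = exp(-log-likelihood / token count).
    return math.exp(-total_log_likelihood / num_tokens)

# Illustrative numbers only, not from a real run: a log-likelihood of
# -2.3e6 over 1e6 tokens gives exp(2.3), roughly 9.97.
print(perplexity(-2.3e6, 1_000_000))
```

Lower perplexity on a held-out set generally indicates a better fit, which is how it is used to compare topic counts.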

how can i dump the model to the specified directory?

hi ,
I want to save the lightlda model to a specified directory, and I have found the code to dump the model:
(screenshot of the model-dumping code attached in the original issue)

How can I pass the "output model dir" argument to this method when I train lightlda?
Thanks very much!

Segmentation Fault reasons

I just wasted 4 hours trying to figure out why my inference was always hitting a segmentation fault when running on a held-out dataset of 3000 documents. So I am noting down the reason here for others who may stumble into it in the future: the max_num_document value should always be greater than the number of documents. Hope it helps!

how to calculate topic-word-probability table

I heard your talk last year at JD, and I have installed LightLDA successfully. I now have two files, i.e. server_0_table_0.model and server_0_table_1.model. How do I transform them into a probability representation? Thanks!
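In standard LDA, server_0_table_0.model holds word-topic counts n_kw and server_0_table_1.model the per-topic totals n_k, and the usual smoothed estimate of the topic-word probability is p(w|k) = (n_kw + beta) / (n_k + V*beta). A minimal sketch over in-memory counts (the on-disk file format is not assumed here, and the toy words are made up):

```python
def topic_word_probs(word_topic_counts, topic_totals, beta, vocab_size):
    # Smoothed estimate p(w|k) = (n_kw + beta) / (n_k + V * beta),
    # where n_kw are word-topic counts and n_k the per-topic totals.
    num_topics = len(topic_totals)
    return {
        word: [
            (counts.get(k, 0) + beta) / (topic_totals[k] + vocab_size * beta)
            for k in range(num_topics)
        ]
        for word, counts in word_topic_counts.items()
    }

# Toy example: 2 topics over a 3-word vocabulary.
counts = {"apple": {0: 4}, "pear": {0: 1, 1: 2}, "cat": {1: 3}}
totals = [5, 5]  # per-topic totals, i.e. column sums of the counts above
phi = topic_word_probs(counts, totals, beta=0.01, vocab_size=3)
```

With this smoothing, the probabilities for each topic sum to 1 over the full vocabulary, which is a quick sanity check on the conversion.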

Inferencing of new/unseen documents

Hi there

As far as I see, there is currently no implementation for inferring new/unseen documents once the model is trained. Is that correct? If yes, are you planning to add it? If no, do you have any pointers on how to accomplish that and contribute?

Thanks Ben

lpthread

I needed to add -lpthread to the Makefile ("LD_FLAGS = -L$(MULTIVERSO_LIB) -lmultiverso -lpthread") to make it compile on Ubuntu 14.04.

how can I train the lda model by multi machines?

As stated in your paper, the LDA model can be trained on multiple machines, but I can't find the instructions for doing so. As far as I know, the training command is " bin/lightlda -num_vocabs 70626 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 256 -max_num_document 997733 -input_dir example/data -out_of_core -data_capacity 800". How do I change it to train on multiple machines? @feiga
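For reference, other issues in this thread (the nytimes MPI reports) show the invocation pattern: launch the same binary on every node via mpiexec with a host file. A sketch assembled from those quoted commands, with the host file and paths as placeholders rather than a verified recipe:

```shell
# machine.list contains one hostname per line; data must be present
# on every node at the same path. Flags mirror the single-machine run,
# plus -num_servers for the parameter-server count.
mpiexec -machinefile machine.list bin/lightlda \
    -num_vocabs 70626 -num_topics 10 -num_iterations 100 \
    -alpha 0.1 -beta 0.01 -mh_steps 2 \
    -num_servers 2 -num_local_workers 4 \
    -num_blocks 256 -max_num_document 997733 \
    -input_dir example/data -out_of_core -data_capacity 800
```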

"Invalid topic assignment from word proposal" error

Hi, when running on my own dataset using the following command,

lightlda -num_vocabs 64253 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 10 -num_blocks 1 -max_num_document 4000000 -input_dir my_dataset_dir -data_capacity 1600

I encounter this error immediately after training starts:

[INFO] [2015-12-05 19:30:24] INFO: block = 0, the number of slice = 1
[INFO] [2015-12-05 19:30:24] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2015-12-05 19:30:24] Server 0: Worker registratrion completed: workers=1 trainers=10 servers=1
[INFO] [2015-12-05 19:30:24] Rank 0/1: Multiverso initialized successfully.
[INFO] [2015-12-05 19:30:26] Rank 0/1: Begin of configuration and initialization.
[INFO] [2015-12-05 19:31:00] Rank 0/1: End of configration and initialization.
[INFO] [2015-12-05 19:31:00] Rank 0/1: Begin of training.
[DEBUG] [2015-12-05 19:31:00] Request params. start = 1, end = 64252
[INFO] [2015-12-05 19:31:01] Rank = 0, Iter = 0, Block = 0, Slice = 0
[FATAL] [2015-12-05 19:31:01] Invalid topic assignment 1737313747 from word proposal
[FATAL] [2015-12-05 19:31:01] Invalid topic assignment 1866155390 from word proposal
[FATAL] [2015-12-05 19:31:01] Invalid topic assignment 725731578 from word proposal
Segmentation fault (core dumped)

How can I get an idea of what is going wrong?

Permission denied (publickey).

mldl@mldlUB1604:/media/mldl/data1t/ub16_prj/LightLDA$ sudo bash build.sh
[sudo] password for mldl:
Cloning into 'multiverso'...
The authenticity of host 'github.com (192.30.253.113)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,192.30.253.113' (RSA) to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
build.sh: line 5: cd: multiverso: No such file or directory
build.sh: line 6: cd: third_party: No such file or directory
sh: 0: Can't open install.sh
Makefile:6: *** Makefile.config not found. See Makefile.config.example.. Stop.
make: *** No targets specified and no makefile found. Stop.
mldl@mldlUB1604:/media/mldl/data1t/ub16_prj/LightLDA$

Core dump when reading data blocks

I can run the nytimes example successfully, but on my own dataset it fails with the following messages:

[INFO] [2015-11-20 16:26:13] INFO: block = 0, the number of slice = 1
[INFO] [2015-11-20 16:26:14] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2015-11-20 16:26:14] Server 0: Worker registratrion completed: workers=1 trainers=4 servers=1
[INFO] [2015-11-20 16:26:14] Rank 0/1: Multiverso initialized successfully.
[INFO] [2015-11-20 16:26:14] Rank 0/1: Begin of configuration and initialization.
foot.sh: line 13: 26600 Segmentation fault (core dumped) $bin/lightlda -num_vocabs 99948 -num_topics 50 -num_iterations 50 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 382578 -input_dir $dir -data_capacity 800

The program exited while processing the docs in the data blocks.

Any thought? Thanks a lot.

Program received signal SIGSEGV, Segmentation fault.

in multiverso::lightlda::AliasTable::Propose (this=0x877ccd0, word=3854, rng=...) at /data/lrgroup/lightlda/src/alias_table.cpp:145
145 return (id & m) | (idx_vector[k] & ~m);
(gdb) l
140 int32_t* p = kv_vector + 2 * idx;
141 int32_t k = *p++;
142 int32_t v = *p;
143 int32_t id = idx_vector[idx];
144 int32_t m = -(n_kw_sample < v);
145 return (id & m) | (idx_vector[k] & ~m);
Here k = 1305138694. I think the most probable cause is that k is out of the index range.

Building Infer project

Hello, what should the project properties be for the infer project? I'm using Visual Studio 2013.

Can I save model parameters throughout training, not just at the end?

Thanks for making your code available! I'm interested in benchmarking the single-machine version of lightlda against some other topic model algorithms, for some not too big datasets (about 10k - a few million documents). One thing I'd like to do is have snapshots of model parameters saved on disk throughout training. For example, after the 1st iteration, the 100th iteration, the 200th iteration, the 300th iteration, etc. Is this possible?

Right now, it just seems that the only saving that happens is after the final iteration. This is undesirable, since sometimes code takes a long time to run and I'd like to inspect intermediate results. It's also useful to understand how long a run really needs to be before its performance starts to plateau.

If this isn't possible with the current code, I'll try to make the necessary changes myself. My guess is that I'll need to add code to the loop over iterations in around lines 67-90 of lightlda.cpp.
https://github.com/Microsoft/lightlda/blob/master/src/lightlda.cpp

When the desired checkpoint iteration is reached, I should call something like the DumpModel() function from multiverso/server.cpp
https://github.com/Microsoft/multiverso/blob/9ed99cd2d3080a8683d1c511de5927e2b7274438/src/multiverso/server.cpp

Does that sound about right? Any other tips? Thanks in advance!

[Inference] Infer: Program received signal SIGSEGV, Segmentation fault. But train model is OK

Hi, @feiga .
I cannot infer new docs with the lightlda infer tool. Can you give me a hand? My problem is:

  1. I ran the lightlda tool to train a model using the following command:
    $bin/lightlda -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 111000 -input_dir $dir -data_capacity 500
    I got server_0_table_0.model, server_0_table_1.model and doc_topic.0
  2. I ran the infer tool to infer new docs using the following command:
    mv doc_topic.0 doc_topic.0.tr
    $bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir $dir -data_capacity 500
    I ran the command in the same dir as lightlda (including block.0, vocab.0, vocab.0.txt), but I got this error:
    [INFO] [2016-02-19 14:44:38] Actual Alias capacity: 5 MB
    [INFO] [2016-02-19 14:44:38] loading model
    [INFO] [2016-02-19 14:44:38] loading word topic table[server_0_table_0.model]
    [INFO] [2016-02-19 14:44:38] loading summary table[server_0_table_1.model]
    [INFO] [2016-02-19 14:44:38] block=0, Alias Time used: 0.11 s
    [INFO] [2016-02-19 14:44:38] iter=0
    Segmentation fault (core dumped) $bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir $dir -data_capacity 500

Running the program under GDB gives the following:
(gdb) r -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/example/data/20151001_65w_200k -data_capacity 500
Starting program: /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/example/data/20151001_65w_200k -data_capacity 500
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
[INFO] [2016-02-19 14:45:47] Actual Alias capacity: 5 MB
[INFO] [2016-02-19 14:45:47] loading model
[INFO] [2016-02-19 14:45:47] loading word topic table[server_0_table_0.model]
[INFO] [2016-02-19 14:45:47] loading summary table[server_0_table_1.model]
[INFO] [2016-02-19 14:45:47] block=0, Alias Time used: 0.11 s
[INFO] [2016-02-19 14:45:47] iter=0
[New Thread 0x40a00960 (LWP 10091)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40a00960 (LWP 10091)]
0x000000000041bdde in multiverso::Row::At(int) ()

Can you help me? Thank you.

how can i dump the model to the specified directory?

I closed the previous issue by accident...
The same as the last closed issue.
I found that the communicator can perhaps pass the config of lightlda to the server, but I only found "new Server(...)" in the MPI version under some condition. Am I right?
(screenshot of the code attached in the original issue)

How can I pass the config from the non-MPI version of lightlda to the server? Thank you.

Fail to build alias row, capacity of row = 0 Floating point exception

I use lightLDA to do new-document inference. I converted the new/unseen documents to a libsvm file using the old vocabulary dictionary and generated the data block, then read the models server_0_table_0.model and server_0_table_1.model.
I used bin/infer to infer the new docs but got this:
[INFO] [2017-09-07 12:15:29] Actual Alias capacity: 50 MB
[INFO] [2017-09-07 12:15:29] loading model
[INFO] [2017-09-07 12:15:29] loading word topic table[server_0_table_0.model]
[INFO] [2017-09-07 12:15:31] loading summary table[server_0_table_1.model]
[ERROR] [2017-09-07 12:15:31] Fail to build alias row, capacity of row = 0
Floating point exception

Can someone help me? Is it because there are new words in my new docs? But I thought that after converting the docs to libsvm, the word dictionary has no bearing on the inference process.
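One plausible cause (an assumption, not confirmed by the maintainers) is that the new documents contain word ids that never occurred in training, so their rows in the loaded word-topic table are empty and the alias row has capacity 0. A sketch that drops such features from the libsvm input before dump_binary, assuming the usual "label id:count ..." line format:

```python
def filter_libsvm(lines, known_ids):
    # Drop "id:count" features whose word id never occurred in training.
    out = []
    for line in lines:
        parts = line.split()
        kept = [parts[0]] + [
            tok for tok in parts[1:] if int(tok.split(":")[0]) in known_ids
        ]
        out.append(" ".join(kept))
    return out

# Toy example: the training vocabulary contained only ids 1, 2 and 3,
# so features 7:1 and 9:5 are removed.
docs = ["0 1:2 7:1 3:4", "0 2:1 9:5"]
print(filter_libsvm(docs, {1, 2, 3}))  # -> ['0 1:2 3:4', '0 2:1']
```

The set of known ids would come from the training vocab file; the exact format of that file is not assumed here.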

Fatal in alias index: word 46631 not exist

[INFO] [2016-06-16 04:22:00] Actual Model capacity: 186 MB, Alias capacity: 301 MB, Delta capacity: 256MB
[INFO] [2016-06-16 04:22:00] INFO: block = 0, the number of slice = 2
[INFO] [2016-06-16 04:22:00] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2016-06-16 04:22:00] Server 0: Worker registratrion completed: workers=1 trainers=8 servers=1
[INFO] [2016-06-16 04:22:00] Rank 0/1: Multiverso initialized successfully.
[INFO] [2016-06-16 04:22:01] Rank 0/1: Begin of configuration and initialization.
[INFO] [2016-06-16 04:22:26] Rank 0/1: End of configration and initialization.
[INFO] [2016-06-16 04:22:26] Rank 0/1: Begin of training.
[DEBUG] [2016-06-16 04:22:26] Request params. start = 1, end = 162931
[INFO] [2016-06-16 04:22:27] Rank = 0, Iter = 0, Block = 0, Slice = 0
[DEBUG] [2016-06-16 04:22:27] Request params. start = 162932, end = 1099301
[INFO] [2016-06-16 04:22:27] Rank = 0, Alias Time used: 1.77 s
[INFO] [2016-06-16 04:23:01] Rank = 0, Training Time used: 270.57 s
[INFO] [2016-06-16 04:23:01] Rank = 0, sampling throughput: 21328.182632 (tokens/thread/sec)
[INFO] [2016-06-16 04:23:08] doc likelihood : -8.291677e+08
[INFO] [2016-06-16 04:23:08] word likelihood : 2.141731e+09
[INFO] [2016-06-16 04:23:08] Normalized likelihood : -7.755637e+09
[INFO] [2016-06-16 04:23:08] Rank = 0, Evaluation Time used: 59.11 s
[INFO] [2016-06-16 04:23:08] Rank = 0, Iter = 0, Block = 0, Slice = 1
[FATAL] [2016-06-16 04:23:09] Fatal in alias index: word 447 not exist
[FATAL] [2016-06-16 04:23:09] [FATAL] [2016-06-16 04:23:09] [FATAL] [2016-06-16 04:23:09] [FATAL] [2016-06-16 04:23:09] Fatal in alias index: word 429 not exist
Invalid topic assignment 934165176 from word proposal
[FATAL] [2016-06-16 04:23:09] Fatal in alias index: word 1202 not exist
Fatal in alias index: word 2893 not exist
[FATAL] [2016-06-16 04:23:09] Invalid topic assignment 412339115 from word proposal
[FATAL] [2016-06-16 04:23:09] Fatal in alias index: word 46631 not exist
Fatal in alias index: word 2946 not exist
[FATAL] [2016-06-16 04:23:09] Fatal in alias index: word 46631 not exist
[FATAL] [2016-06-16 04:23:09] Fatal in alias index: word 429 not exist

How to use LightLDA in distribute mode?

I am just running the LightLDA example in distributed mode; the command is below:
mpiexec -machinefile $root/machine.list $bin/lightlda -num_vocabs 111400 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_servers 6 -num_local_workers 1 -num_blocks 1 -max_num_document 300000 -input_dir $dir -data_capacity 800
I added the -machinefile and -num_servers params; all the other params are the same as in nytimes.sh.
When I executed the command I got the log (error) below, and I don't know why. I just copied the same data to 6 machines at the same path:

[INFO] [2016-11-02 16:12:44] INFO: block = 0, the number of slice = 1
[INFO] [2016-11-02 16:12:54] INFO: block = 0, the number of slice = 1
[INFO] [2016-11-02 16:12:54] INFO: block = 0, the number of slice = 1
[INFO] [2016-11-02 16:12:54] INFO: block = 0, the number of slice = 1
[INFO] [2016-11-02 16:12:54] INFO: block = 0, the number of slice = 1
[INFO] [2016-11-02 16:12:54] INFO: block = 0, the number of slice = 1
[INFO] [2016-11-02 16:12:54] Server 2 starts: num_workers=6 endpoint=inproc://server
[INFO] [2016-11-02 16:12:54] Server 4 starts: num_workers=6 endpoint=inproc://server
[INFO] [2016-11-02 16:12:44] Server 0 starts: num_workers=6 endpoint=inproc://server
[INFO] [2016-11-02 16:12:54] Server 1 starts: num_workers=6 endpoint=inproc://server
[INFO] [2016-11-02 16:12:54] Server 3 starts: num_workers=6 endpoint=inproc://server
[INFO] [2016-11-02 16:12:54] Server 5 starts: num_workers=6 endpoint=inproc://server
[INFO] [2016-11-02 16:12:44] Server 0: Worker registratrion completed: workers=6 trainers=6 servers=6
[INFO] [2016-11-02 16:12:54] Rank 4/6: Multiverso initialized successfully.
[INFO] [2016-11-02 16:12:54] Rank 1/6: Multiverso initialized successfully.
[INFO] [2016-11-02 16:12:54] Rank 5/6: Multiverso initialized successfully.
[INFO] [2016-11-02 16:12:54] Rank 2/6: Multiverso initialized successfully.
[INFO] [2016-11-02 16:12:54] Rank 3/6: Multiverso initialized successfully.
[INFO] [2016-11-02 16:12:44] Rank 0/6: Multiverso initialized successfully.
[INFO] [2016-11-02 16:12:55] Rank 3/6: Begin of configuration and initialization.
[INFO] [2016-11-02 16:12:55] Rank 4/6: Begin of configuration and initialization.
[INFO] [2016-11-02 16:12:55] Rank 2/6: Begin of configuration and initialization.
[INFO] [2016-11-02 16:12:55] Rank 1/6: Begin of configuration and initialization.
[INFO] [2016-11-02 16:12:55] Rank 5/6: Begin of configuration and initialization.
[INFO] [2016-11-02 16:12:44] Rank 0/6: Begin of configuration and initialization.
[INFO] [2016-11-02 16:13:13] Rank 3/6: End of configration and initialization.
[INFO] [2016-11-02 16:13:02] Rank 0/6: End of configration and initialization.
[INFO] [2016-11-02 16:13:13] Rank 2/6: End of configration and initialization.
[INFO] [2016-11-02 16:13:13] Rank 1/6: End of configration and initialization.
[INFO] [2016-11-02 16:13:13] Rank 5/6: End of configration and initialization.
[INFO] [2016-11-02 16:13:13] Rank 2/6: Begin of training.
[INFO] [2016-11-02 16:13:13] Rank 3/6: Begin of training.
[INFO] [2016-11-02 16:13:13] Rank 4/6: End of configration and initialization.
[INFO] [2016-11-02 16:13:02] Rank 0/6: Begin of training.
[INFO] [2016-11-02 16:13:13] Rank 1/6: Begin of training.
[INFO] [2016-11-02 16:13:13] Rank 5/6: Begin of training.
[INFO] [2016-11-02 16:13:13] Rank 4/6: Begin of training.
[DEBUG] [2016-11-02 16:13:02] Request params. start = 0, end = 101635
[DEBUG] [2016-11-02 16:13:13] Request params. start = 0, end = 101635
[DEBUG] [2016-11-02 16:13:13] Request params. start = 0, end = 101635
[DEBUG] [2016-11-02 16:13:13] Request params. start = 0, end = 101635
[DEBUG] [2016-11-02 16:13:13] Request params. start = 0, end = 101635
[DEBUG] [2016-11-02 16:13:13] Request params. start = 0, end = 101635
[INFO] [2016-11-02 16:13:16] Rank = 2, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-11-02 16:13:07] Rank = 0, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-11-02 16:13:17] Rank = 2, Alias Time used: 5.49 s 
[FATAL] [2016-11-02 16:13:17] Invalid topic assignment 148893078 from word proposal
[FATAL] [2016-11-02 16:13:17] Invalid topic assignment 1228263461 from word proposal
[FATAL] [2016-11-02 16:13:17] Invalid topic assignment 8397506 from word proposal
...
[INFO] [2016-11-02 16:13:18] Rank = 4, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-11-02 16:13:08] Rank = 0, Alias Time used: 5.51 s 
[INFO] [2016-11-02 16:13:19] Rank = 1, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-11-02 16:13:19] Rank = 5, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-11-02 16:13:19] Rank = 3, Iter = 0, Block = 0, Slice = 0
[INFO] [2016-11-02 16:13:20] Rank = 4, Alias Time used: 6.07 s 
[FATAL] [2016-11-02 16:13:20] Invalid topic assignment 148893078 from word proposal
...
[FATAL] [2016-11-02 16:13:21] Invalid topic assignment 8397506 from word proposal

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@tttt05] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0@tttt05] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@tttt05] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@tttt06] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@tttt06] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@tttt06] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@tttt05] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@tttt05] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@tttt05] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec@tttt05] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

But nytimes.sh can be executed on a single machine, so I want to know: how do I use LightLDA in distributed mode? Thanks~
