
biddata / bidmach

CPU and GPU-accelerated Machine Learning Library

License: BSD 3-Clause "New" or "Revised" License

Shell 3.44% Java 2.32% C++ 7.12% Cuda 8.37% HTML 1.38% Julia 0.03% Python 4.14% Makefile 0.15% C 5.02% Scala 55.70% Batchfile 0.46% Lex 0.22% Jupyter Notebook 11.63% Lua 0.02%

bidmach's People

Contributors

aykamko, byeah, chrishzhao, chrisz137, danieltakeshi, derrickcheng, elvis-lee, ikuo, jamesjia94, jcanny, mickjermsurawong, mike199515, pkalipatnapu, smallbuxie, willch, xinleipan

bidmach's Issues

DNN.learner methods don't work

When trying to create a DNN with DNN.learner, the ADAGrad initialization step from Learner.init gives the error

java.lang.NullPointerException
  at BIDMach.updaters.ADAGrad$$anonfun$init$1.apply$mcVI$sp(ADAGrad.scala:34)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
  at BIDMach.updaters.ADAGrad.init(ADAGrad.scala:33)
  at BIDMach.Learner.init(Learner.scala:45)

This seems to be due to DNN waiting to initialize the modelmats until the first call to forward.

It looks like the correct way to set a DNN to use ADAGrad is to set aopts rather than adding an updater to the learner. Is this correct?

Concurrent GPU access using BIDMat

Hi,

I'm trying to write unit tests with the ScalaTest toolkit for my ML modules. When I switch CUDA on, I get the following errors:
[info] - error. *** FAILED ***
[info] java.lang.RuntimeException: CUDA alloc failed global function call is not configured
[info] at BIDMat.GMat$.apply(GMat.scala:1094)
[info] at BIDMat.GMat$.zeros(GMat.scala:1064)
...
The error happens when I have more than one test class; there was no error with only one test class. Does it have something to do with using BIDMat to access CUDA simultaneously? I did not use BIDMat's caching. Or does the error happen when data is moving into and out of GPU memory at the same time? If so, is there a parameter I can set so that there is no exception in such a situation? What should I do if I want multiple threads using BIDMat-based modules? Data copy operations can certainly happen at any time in such a concurrent environment.

Lizhen

converting datasets error

The benchmarks of this project are amazing! I would like to use it.
After cd scripts and ./getdata.sh, the datasets download successfully, but they can't be converted. These are the errors:
/xxxx/BIDMach/scripts/../bin/tparse.exe: No such file or directory
/tmp/scalacmd9053744229355367880.scala:2: error: not found: value BIDMat

Before downloading and converting the datasets, I ran ./sbt package and ./bidmach; both looked fine.
Thank you!

java.lang.RuntimeException: Cuda error in GSMat() too many resources requested for launch

Hi all,

  1. I was using this AMI: http://tleyden.github.io/blog/2014/10/25/cuda-6-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/ for a CUDA environment on an AWS g2.2xlarge instance.

  2. After downloading this BIDMach bundle: http://bid2.berkeley.edu/bid-data-project/BIDMach_1.0.0-linux-x86_64.tar.gz, I ran scripts/getdata.sh to download the data.

  3. I get stuck at the nn.train method while training on sample data from movielens10M; see the following:


Welcome to Scala version 2.11.2 (OpenJDK 64-Bit Server VM, Java 1.7.0_79).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val a = loadSMat("data/movielens/train1.smat.lz4")
a: BIDMat.SMat =
( 172, 14574) 4
( 187, 14574) 4
( 195, 14574) 4
( 207, 14574) 4
( 215, 14574) 4
( 222, 14574) 3
( 224, 14574) 3
( 226, 14574) 3
... ... ...

scala> val (nn, opts) = NMF.learner(a)
nn: BIDMach.Learner = Learner(BIDMach.datasources.MatDS@36895c35,BIDMach.models.NMF@7404b78b,null,BIDMach.updaters.IncNorm@61ae422e,BIDMach.models.NMF$xopts$4@777b0c1b)
opts: BIDMach.Learner.Options with BIDMach.models.NMF.Opts with BIDMach.datasources.MatDS.Opts with BIDMach.updaters.IncNorm.Opts = BIDMach.models.NMF$xopts$4@777b0c1b

scala> nn.train
corpus perplexity=65134.014613
pass= 0
device is 0
java.lang.RuntimeException: Cuda error in GSMat() too many resources requested for launch
at BIDMat.GSMat$.apply(GSMat.scala:325)
at BIDMat.GSMat$.newOrCheckGSMat(GSMat.scala:480)
at BIDMat.GSMat$.newOrCheckGSMat(GSMat.scala:528)
at BIDMat.GSMat$.fromSMat(GSMat.scala:409)
at BIDMat.GSMat$.apply(GSMat.scala:330)
at BIDMach.models.Model$$anonfun$copyMats$1.apply$mcVI$sp(Model.scala:106)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at BIDMach.models.Model.copyMats(Model.scala:94)
at BIDMach.models.Model.doblockg(Model.scala:73)
at BIDMach.Learner.retrain(Learner.scala:83)
at BIDMach.Learner.train(Learner.scala:49)
... 33 elided

jni/src configure script paths

When the configure script in jni/src is run, it creates 'Makefile.incl'. On Ubuntu 14.04 it has the following issues:

  1. nvcc isn't in the path, but it is in the default cuda install folder. If properly detected, NVCC=/usr/local/cuda/bin/nvcc works.
  2. Java is not installed in /usr/java/default/. Instead it is in /usr/lib/jvm/default-java. This impacts both the include and lib settings.
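A hypothetical sketch of the detection logic the configure script could use (this is not the actual BIDMach configure script; the candidate paths are the ones reported in this issue):

```shell
# Hypothetical detection sketch: fall back to the default CUDA install
# location when nvcc is not on PATH, and probe the JVM locations named above.
NVCC="$(command -v nvcc || true)"
if [ -z "$NVCC" ] && [ -x /usr/local/cuda/bin/nvcc ]; then
    NVCC=/usr/local/cuda/bin/nvcc
fi

JAVA_HOME=""
for candidate in /usr/java/default /usr/lib/jvm/default-java; do
    if [ -d "$candidate/include" ]; then
        JAVA_HOME="$candidate"
        break
    fi
done

echo "NVCC=$NVCC"
echo "JAVA_HOME=$JAVA_HOME"
```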

DNN learnerX/dlayers won't run with sparse targets

val (nn,opts)=DNN.learnerX(loadSMat(indir+"trainData000.smat.lz4"),FMat(loadSMat(indir+"trainLabels000.smat.lz4")));

opts.aopts = opts;
opts.featType = 1;               // (1) feature type, 0=binary, 1=linear
opts.addConstFeat = false;        // add a constant feature (effectively adds a $\beta_0$ term to $X\beta$)
opts.batchSize=500;
opts.reg1weight = 0.0001;
opts.lrate = 0.2f;
opts.texp = 0.4f;
opts.npasses = 5;
opts.links = iones(132,1);

DNN.dlayers(3,100,0.25f,132,opts,2);

nn.train;

runs just fine, but if I don't cast the targets as an FMat, i.e.,

val (nn,opts)=DNN.learnerX(loadSMat(indir+"trainData000.smat.lz4"),loadSMat(indir+"trainLabels000.smat.lz4"));

opts.aopts = opts;
opts.featType = 1;               // (1) feature type, 0=binary, 1=linear
opts.addConstFeat = false;        // add a constant feature (effectively adds a $\beta_0$ term to $X\beta$)
opts.batchSize=500;
opts.reg1weight = 0.0001;
opts.lrate = 0.2f;
opts.texp = 0.4f;
opts.npasses = 5;
opts.links = iones(132,1);

DNN.dlayers(3,100,0.25f,132,opts,2);

nn.train;

gives

scala> nn.train;
pass= 0
scala.MatchError: (           1     0.59643           1     0.54870     0.96072           1           1     0.97798...
           1  0.00054444  4.6897e-12     0.93011     0.99481     0.98594  4.0819e-37     0.99795...
     0.99987  1.1189e-14   0.0054939     0.96049     0.96663  0.00032491           0    0.010586...
  1.2195e-06  0.00018742  3.1474e-09     0.28522     0.54711  5.9957e-10           0     0.92651...
  0.00010903           1           1     0.31402     0.10718    0.016666           1     0.68628...
           1  9.1819e-18     0.99999    0.032690     0.99763      1.0000           1     0.94152...
          ..          ..          ..          ..          ..          ..          ..          ..
,(  14,   0)   1
(   6,   1)   1
(  40,   2)   1
(  59,   3)   1
(  46,   4)   1
(   1,   5)   1
(  94,   6)   1
(  60,   7)   1
  ...  ...  ...
,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) (of class scala.Tuple3)
  at BIDMach.models.GLM$.derivs(GLM.scala:475)
  at BIDMach.networks.DNN$GLMLayer.backward(DNN.scala:347)
  at BIDMach.networks.DNN$Layer.backward(DNN.scala:230)
  at BIDMach.networks.DNN.dobatch(DNN.scala:161)
  at BIDMach.models.Model.dobatchg(Model.scala:101)
  at BIDMach.Learner.retrain(Learner.scala:87)
  at BIDMach.Learner.train(Learner.scala:53)
  ... 33 elided

getdata.sh fails if the file path contains a space

Running the getdata script for the tutorials doesn't work if there is a space in the filepath, e.g. if the BIDMach distribution is in a folder called /Users/myname/Desktop/Stress\ Project/bidmach then running getdata.sh gives the following errors:

getdata.sh: line 17: /Users/myname/Desktop/Stress: is a directory
getdata.sh: line 19: /Users/myname/Desktop/Stress: is a directory
getdata.sh: line 21: /Users/myname/Desktop/Stress: is a directory
getdata.sh: line 25: /Users/myname/Desktop/Stress: is a directory
getdata.sh: line 27: /Users/myname/Desktop/Stress: is a directory
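The errors above are the classic symptom of unquoted variable expansion: the path is split at the space, so the shell sees only /Users/myname/Desktop/Stress. A minimal reproduction and the quoting fix, using a hypothetical path rather than the actual getdata.sh variables:

```shell
# Minimal reproduction of the failure mode, with a hypothetical path.
dir="/tmp/getdata demo"   # contains a space, like "Stress Project"
mkdir -p "$dir"

# Quoted expansion keeps the path as one word and works:
printf 'hello\n' > "$dir/out.txt"
cat "$dir/out.txt"        # prints: hello

# An unquoted expansion such as   cat $dir/out.txt   would be split at the
# space into "/tmp/getdata" and "demo/out.txt", producing exactly the
# "is a directory" / "No such file" errors shown above. The fix is to
# double-quote every expansion of the script's path variables.
rm -r "$dir"
```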

AWS image for 1.0.3 and loading trained models

Is there a plan to publish an AWS image for 1.0.3? We tried creating one but ran into a number of problems.
One of the main things I was looking for is the predictor for KMeans. The predictor method is missing in 1.0, so I could use 1.0 to train the models and 1.0.3 for prediction. Is there a way to save a model created with 1.0 and use it for prediction in 1.0.3? I was able to save the model in 1.0, but I couldn't find any example of how to load an existing model (for KMeans).

Regards

scripts/getdata.sh: Bad substitution error

When I try to run any of the get(...) scripts, I get "bad substitution" error:

~/software/BIDMach_0.9.0-linux-x86_64/scripts$ ./getdigits.sh
./getdigits.sh: 3: ./getdigits.sh: Bad substitution

Changing
BIDMACH_SCRIPTS="${BASH_SOURCE[0]}"

to
BIDMACH_SCRIPTS="${BASH_SOURCE}"

seems to help, although later I get the errors:

(...)
Scanning lyrl2004_tokens_test_pt0.dat.gz
1490963 lines
Scanning lyrl2004_tokens_test_pt1.dat.gz
1501165 lines
Scanning lyrl2004_tokens_test_pt2.dat.gz
1489662 lines
Scanning lyrl2004_tokens_test_pt3.dat.gz
1390993 lines
Scanning lyrl2004_tokens_train.dat.gz
171542 lines
Writing Dictionary
2606875 lines processed
./getrcv1.sh: 66: ./getrcv1.sh: bidmach: not found
Loading nips data
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (77) error setting certificate verify locations:
CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
Uncompressing docword.nips.txt.gz
gzip: docword.nips.txt.gz: No such file or directory
0 lines processed
./getuci.sh: 31: ./getuci.sh: bidmach: not found
clearing up
(...)
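${BASH_SOURCE[0]} uses a bash array subscript, which is not POSIX; the "Bad substitution" message suggests the script is being interpreted by a POSIX shell (dash on many Linux systems) rather than bash. A small sketch of the distinction; the PATH suggestion at the end is an assumption about the later "bidmach: not found" errors:

```shell
# ${BASH_SOURCE[0]} is a bash-ism; under bash it expands without error:
bash -c 'BIDMACH_SCRIPTS="${BASH_SOURCE[0]}"; echo "bash ok"'

# A strict POSIX shell (e.g. dash) rejects the array subscript with a
# "Bad substitution" error, matching the report above. So instead of
# rewriting the script, one can invoke it explicitly with bash:
#   bash ./getdigits.sh
#
# The later "bidmach: not found" errors look like a separate problem: the
# get* scripts call a bidmach executable that is not on PATH. Adding the
# distribution directory (path hypothetical) may fix that:
#   export PATH="$PATH:$HOME/BIDMach_0.9.0-linux-x86_64"
```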

DNN learnerX doesn't respect addConstFeat option

It seems like building a dnn learner using the learnerX method with addConstFeat=true doesn't actually add a constant feature.

val (nn,opts)= DNN.learnerX(indir+"trainData%03d.smat.lz4",indir+"trainLabels%03d.fmat.lz4");

opts.aopts = opts; 

opts.featType = 1;              
opts.addConstFeat = true;       

opts.batchSize=5000;
opts.reg1weight = 0.0001;
opts.lrate = 0.5f;
opts.texp = 0.4f;
opts.npasses = 1;

opts.links = iones(nrValidMoods,1);

DNN.dlayers(5,200,0.5f,132,opts,2);

After training using nn.train, I run

size(nn.modelmats(0))
res17: (Int, Int) = (200,75000)

The input data has 75000 features, so with the const feature this should be (200,75001). Note that predict behaves as expected and adds the const feature, so with this option nn.predict gives a dimension mismatch error.

autoReset default values

This is not really an issue, but more of an unintuitive behavior.

When creating a (GLM) learner and predictor in the same function, we get by default: options.autoReset = false (for both learner opts and predictor opts).
On the other hand, when creating a GLM learner and GLM predictor independently, we get by default options.autoReset = true for both of them. This means that GLM predictor will reset the model weights (modelmat) to zero after running "predict" once. Should a predictor be able to reset the model weights resulting from a learner?

Integration with Spark?

I heard that you have integrated with Caffe; however, Caffe supports GPU as well.
Could you please list the reasons why BIDMach is needed alongside Caffe?
Also, do you support Spark integration?
How does one integrate with Spark?

Start a Gitter Chat Room for BIDMach / BIDMat

Please start a https://gitter.im/ Chat Room for BIDMach / BIDMat. Documentation for the projects is sparse, and adoption will remain limited unless the user community has an opportunity to interact with the project authors and other community members.

This approach is used very successfully by a large number of github-hosted projects.

size of grid

For compute capability 3.0 or above, the maximum grid dimensions are (x, y, z) = (2147483647, 65535, 65535).

In the code jni/src/GLM.cu, lines 78-79:
gridp->y = 1 + (nblocks-1)/65536;
gridp->x = 1 + (nblocks-1)/gridp->y;

If nblocks is very large, then gridp->y will exceed the 65535 limit.
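As a quick sanity check (a sketch using shell arithmetic rather than the CUDA code itself), the computed y crosses the 65535 limit once nblocks exceeds 65536 * 65535:

```shell
# Reproduce the arithmetic from GLM.cu:  gridp->y = 1 + (nblocks-1)/65536
# and show it exceeds the 65535 y-dimension limit for large nblocks.
# (Requires 64-bit shell arithmetic, as in bash on a 64-bit system.)
nblocks=$(( 65536 * 65535 + 1 ))          # first problematic value
y=$(( 1 + (nblocks - 1) / 65536 ))
echo "nblocks=$nblocks -> y=$y"           # y = 65536, past the 65535 limit
```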

"CUDA alloc failed initialization error" when calling mm.train the 2nd time

I am going through the quickstart example on Windows 7. When I try to call mm.train a second time, I get the following error, and I need to exit bidmach and start it anew to be able to train again.

scala> mm.train
corpus perplexity=5582,125391
pass= 0
2,00%, ll=-0,693, gf=0,116, secs=6,7, GB=0,02, MB/s= 2,86, GPUmem=0,03
16,00%, ll=-0,134, gf=0,630, secs=15,0, GB=0,12, MB/s= 8,10, GPUmem=0,03
30,00%, ll=-0,123, gf=0,825, secs=21,9, GB=0,22, MB/s=10,16, GPUmem=0,02
44,00%, ll=-0,102, gf=0,930, secs=28,7, GB=0,33, MB/s=11,31, GPUmem=0,02
58,00%, ll=-0,094, gf=0,995, secs=35,6, GB=0,43, MB/s=12,04, GPUmem=0,02
72,00%, ll=-0,074, gf=1,040, secs=42,4, GB=0,53, MB/s=12,49, GPUmem=0,02
87,00%, ll=-0,085, gf=1,075, secs=49,1, GB=0,63, MB/s=12,89, GPUmem=0,02
100,00%, ll=-0,069, gf=1,097, secs=55,8, GB=0,73, MB/s=13,02, GPUmem=0,02
Time=55,8000 secs, gflops=1,10

scala> mm.train
corpus perplexity=5582,125391
java.lang.RuntimeException: CUDA alloc failed initialization error
at BIDMat.GMat$.apply(GMat.scala:1094)
at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1780)
at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1814)
at BIDMat.GMat$.apply(GMat.scala:1100)
at BIDMach.models.RegressionModel.init(Regression.scala:29)
at BIDMach.models.GLM.init(GLM.scala:25)
at BIDMach.Learner.init(Learner.scala:37)
at BIDMach.Learner.train(Learner.scala:45)
at .<init>(<console>:26)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)

    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
    at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:8
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
    at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
    at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Quickstart Errors

Just downloaded BIDMach and am trying to run the Quickstart steps. I'm not clear on whether it's sufficient to install the executable bundle and have a JRE, or whether I need to install IPython/Scala as well.

I get this error when running ./bidmach:
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner

I'm getting this in Cygwin on running getdata.sh:
Loading RCV1 v2 data
/cygdrive/e/BidMach/BIDMach_1.0.0-win-x86_64/data/rcv1
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getrcv1.sh: line 29: wget: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getrcv1.sh: line 29: wget: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getrcv1.sh: line 29: wget: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getrcv1.sh: line 29: wget: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getrcv1.sh: line 38: wget: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getrcv1.sh: line 44: wget: command not found
gzip: lyrl2004_tokens_test_pt0.dat.gz: No such file or directory
gzip: lyrl2004_tokens_test_pt1.dat.gz: No such file or directory
gzip: lyrl2004_tokens_test_pt2.dat.gz: No such file or directory
gzip: lyrl2004_tokens_test_pt3.dat.gz: No such file or directory
gzip: lyrl2004_tokens_train.dat.gz: No such file or directory
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner
Loading nips data
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getuci.sh: line 26: wget: command not found
mv: cannot stat ‘docword.nips.txt.gz’: No such file or directory
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getuci.sh: line 30: wget: command not found
mv: cannot stat ‘vocab.nips.txt’: No such file or directory
Uncompressing docword.nips.txt.gz
gzip: docword.txt.gz: No such file or directory
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner
mv: cannot stat ‘smat.lz4’: No such file or directory
mv: cannot stat ‘term.sbmat.gz’: No such file or directory
mv: cannot stat ‘term.imat.gz’: No such file or directory
clearing up
Loading nytimes data
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getuci.sh: line 26: wget: command not found
mv: cannot stat ‘docword.nytimes.txt.gz’: No such file or directory
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getuci.sh: line 30: wget: command not found
mv: cannot stat ‘vocab.nytimes.txt’: No such file or directory
Uncompressing docword.nytimes.txt.gz
gzip: docword.txt.gz: No such file or directory
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner
mv: cannot stat ‘smat.lz4’: No such file or directory
mv: cannot stat ‘term.sbmat.gz’: No such file or directory
mv: cannot stat ‘term.imat.gz’: No such file or directory
clearing up
Loading arabic digits data
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getdigits.sh: line 25: wget: command not found
sed: can't read Train_Arabic_Digit.txt: No such file or directory
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner
Loading movielens 10M data
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getmovies.sh: line 26: wget: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getmovies.sh: line 29: unzip: command not found
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getmovies.sh: line 30: cd: ml-10M100K: No such file or directory
e:/BidMach/BIDMach_1.0.0-win-x86_64/scripts/getmovies.sh: line 31: ./split_ratings.sh: No such file or directory
mv: cannot stat ‘r1.train’: No such file or directory
mv: cannot stat ‘r1.test’: No such file or directory
mv: cannot stat ‘r2.train’: No such file or directory
mv: cannot stat ‘r2.test’: No such file or directory
mv: cannot stat ‘r3.train’: No such file or directory
mv: cannot stat ‘r3.test’: No such file or directory
mv: cannot stat ‘r4.train’: No such file or directory
mv: cannot stat ‘r4.test’: No such file or directory
mv: cannot stat ‘r5.train’: No such file or directory
mv: cannot stat ‘r5.test’: No such file or directory
mv: cannot stat ‘ra.train’: No such file or directory
mv: cannot stat ‘ra.test’: No such file or directory
mv: cannot stat ‘rb.train’: No such file or directory
mv: cannot stat ‘rb.test’: No such file or directory
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner
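Most of the output above reduces to two causes: bidmach itself fails to start (the MainGenericRunner error), and the Cygwin environment lacks wget/unzip. A pre-flight check could catch the second before running getdata.sh; the tool list below is an assumption inferred from the errors above, not taken from the scripts:

```shell
# Pre-flight check (sketch): verify the external tools the getdata scripts
# appear to rely on, based on the errors above. On Cygwin, wget and unzip
# are separate packages installable via the Cygwin setup program.
missing=""
for tool in wget gzip unzip sed; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done

if [ -n "$missing" ]; then
    echo "missing tools:$missing"
else
    echo "all required tools present"
fi
```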

StackDS stuck

Hi,
I would like to process some large libSVM files which do not fit into memory, so I am trying to use a stacked data source. But the program gets stuck on accessing the first chunk of my StackedDS object. Accessing the data sources unstacked seems to work fine.

Here is a script demonstrating the problem. It creates a small libSVM file for input
https://gist.github.com/hcbraun/334d3e9c8da7959d0f37#file-svmstack-scala

This is what i am trying to do:

  • reading the libSVM file with loadLibSVM
  • writing the training data to SMat file (saveSMat)
  • writing the labels to FMat file (saveFMat)
  • building SFilesDS from the training SMat files
  • building FilesDS from the label FMat files
  • combining both data sources with StackDS
  • running a learner with logistic regression

Any suggestions would be appreciated

Best,
Christian

Opening iScala via bidmach notebook leads to kernel crash

I downloaded BIDMach 1.0.0 for 64-bit Mac OSX

I'm using OSX Yosemite 10.10.2 with iPython 2.4.1

After expanding, I run

./bidmach notebook

Shortly after opening any of the iScala tutorial notebooks I get an error message:

"The kernel appears to have died. It will restart automatically."

and then eventually

"The kernel has died, and the automatic restart has failed. It is possible the kernel cannot be restarted. If you are not able to restart the kernel, you will still be able to save the notebook, but running code will no longer work until the notebook is reopened."

The command line output is

2015-02-20 14:01:07.815 [NotebookApp] Using existing profile dir: u'/Users/coryschillaci/.ipython/profile_scala'
2015-02-20 14:01:07.823 [NotebookApp] Using MathJax from CDN: https://cdn.mathjax.org/mathjax/latest/MathJax.js
2015-02-20 14:01:07.847 [NotebookApp] Serving notebooks from local directory: /Users/coryschillaci/Desktop/Stress Project/BIDMach_1.0.0-osx-x86_64
2015-02-20 14:01:07.847 [NotebookApp] 0 active kernels
2015-02-20 14:01:07.848 [NotebookApp] The IPython Notebook is running at: http://localhost:8888/
2015-02-20 14:01:07.848 [NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
2015-02-20 14:01:15.166 [NotebookApp] Kernel started: ef8b0c9b-34f5-4397-b931-67d0c5684209
Error: Could not find or load main class org.refptr.iscala.IScala
2015-02-20 14:01:18.169 [NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel ef8b0c9b-34f5-4397-b931-67d0c5684209 restarted
Error: Could not find or load main class org.refptr.iscala.IScala
2015-02-20 14:01:21.176 [NotebookApp] KernelRestarter: restarting kernel (2/5)
WARNING:root:kernel ef8b0c9b-34f5-4397-b931-67d0c5684209 restarted
Error: Could not find or load main class org.refptr.iscala.IScala
2015-02-20 14:01:24.185 [NotebookApp] KernelRestarter: restarting kernel (3/5)
WARNING:root:kernel ef8b0c9b-34f5-4397-b931-67d0c5684209 restarted
Error: Could not find or load main class org.refptr.iscala.IScala
2015-02-20 14:01:27.194 [NotebookApp] KernelRestarter: restarting kernel (4/5)
WARNING:root:kernel ef8b0c9b-34f5-4397-b931-67d0c5684209 restarted
Error: Could not find or load main class org.refptr.iscala.IScala
2015-02-20 14:01:30.207 [NotebookApp] WARNING | KernelRestarter: restart failed
2015-02-20 14:01:30.207 [NotebookApp] WARNING | Kernel ef8b0c9b-34f5-4397-b931-67d0c5684209 died, removing from map.
ERROR:root:kernel ef8b0c9b-34f5-4397-b931-67d0c5684209 restarted failed!

exec a program in command

I wrote a src/main/scala/exp/lrexp.scala file:
////////////////////////////////////
package exp

import BIDMat.{CMat,CSMat,DMat,Dict,IDict,Image,FMat,FND,GMat,GIMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,KMeansw,LDA,LDAgibbs,Model,NMF,SFA}
import BIDMach.datasources.{DataSource,MatDS,FilesDS,SFilesDS}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}

object LRExp {
def main(args:Array[String]) = {
println("LRExp")
val a = grand(20,30)
println(a)
}
}
///////////////////////////////////////

./sbt package generates a jar, and then I type:

./sbt "runMain exp.LRExp"

println("LRExp") executes successfully, but then these errors appear:
error java.lang.UnsatisfiedLinkError: Could not load the native library.
[error] Error while loading native library "bidmatmkl-linux-x86_64" with base name "bidmatmkl"
[error] Operating system name: Linux
[error] Architecture : amd64
[error] Architecture bit size: 64
[error] Stack trace from the attempt to load the library as a resource:
[error] java.lang.NullPointerException: No resource found with name '/lib/libbidmatmkl-linux-x86_64.so'
[error] at jcuda.LibUtils.loadLibraryResource(LibUtils.java:149)

however, in interactive shells, scala scripts can run successfully:

  1. ./bidmach
  2. val a = grand(20,30)

How can I run the program from the command line rather than interactively?
Thanks!

getnativepath issues

Running ./bidmach notebook from the latest distribution, the terminal output is:

Error: Could not find or load main class getnativepath
2015-02-25 15:06:18.968 [NotebookApp] Using existing profile dir: u'/Users/coryschillaci/.ipython/profile_scala'
2015-02-25 15:06:18.978 [NotebookApp] Using MathJax from CDN: https://cdn.mathjax.org/mathjax/latest/MathJax.js
2015-02-25 15:06:19.008 [NotebookApp] Serving notebooks from local directory: /Users/coryschillaci/Desktop/StressProject/BIDMach_1.0.0-osx-x86_64
2015-02-25 15:06:19.008 [NotebookApp] 0 active kernels
2015-02-25 15:06:19.008 [NotebookApp] The IPython Notebook is running at: http://localhost:8888/
2015-02-25 15:06:19.008 [NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
^C2015-02-25 15:06:42.219 [NotebookApp] interrupted
Serving notebooks from local directory: /Users/coryschillaci/Desktop/StressProject/BIDMach_1.0.0-osx-x86_64
0 active kernels
The IPython Notebook is running at: http://localhost:8888/

When I run the BIDMach_intro.pynb tutorial, the first cell output is

Cant find native CPU libraries
Cant find native HDF5 library
Couldnt load CUDA runtime
Out[1]:
()

The first error occurs when I run the section called "Transposed Multiplies," the first input gives

java.lang.UnsatisfiedLinkError: Could not load the native library.
Error while loading native library "bidmatmkl-apple-x86_64" with base name "bidmatmkl"
Operating system name: Mac OS X
Architecture : x86_64
Architecture bit size: 64
Stack trace from the attempt to load the library as a resource:
java.lang.NullPointerException: No resource found with name '/lib/libbidmatmkl-apple-x86_64.jnilib'
at jcuda.LibUtils.loadLibraryResource(LibUtils.java:149)
at jcuda.LibUtils.loadLibrary(LibUtils.java:83)
at edu.berkeley.bid.CBLAS.<clinit>(CBLAS.java:8)
at BIDMat.FMat.Tmult(FMat.scala:565)
at BIDMat.FPair.Tx(FMat.scala:1113)
at BIDMat.Mop_TTimes$.op(Operators.scala:370)
at BIDMat.Mop$class.op(Operators.scala:37)
at BIDMat.Mop_TTimes$.op(Operators.scala:368)
at BIDMat.FMat.$up$times(FMat.scala:888)
at .<init>(<console>:48)
at .<clinit>(<console>)
at .$result$lzycompute(<console>:5)
at .$result(<console>:5)
at $result(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:739)
at org.refptr.iscala.Interpreter$$anonfun$9.apply(Interpreter.scala:206)
at org.refptr.iscala.Interpreter.withException(Interpreter.scala:101)
at org.refptr.iscala.Interpreter.loadAndRunReq(Interpreter.scala:206)
at org.refptr.iscala.Interpreter$$anonfun$interpret$1.apply(Interpreter.scala:245)
at org.refptr.iscala.Interpreter$$anonfun$interpret$1.apply(Interpreter.scala:245)
at org.refptr.iscala.Runner$Execution$$anonfun$1.apply$mcV$sp(Runner.scala:28)
at org.refptr.iscala.IOUtil$$anon$2.run(Util.scala:21)
at java.lang.Thread.run(Thread.java:745)
Stack trace from the attempt to load the library as a file:
java.lang.UnsatisfiedLinkError: no bidmatmkl-apple-x86_64 in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at jcuda.LibUtils.loadLibrary(LibUtils.java:94)
at edu.berkeley.bid.CBLAS.<clinit>(CBLAS.java:8)
at BIDMat.FMat.Tmult(FMat.scala:565)
at BIDMat.FPair.Tx(FMat.scala:1113)
at BIDMat.Mop_TTimes$.op(Operators.scala:370)
at BIDMat.Mop$class.op(Operators.scala:37)
at BIDMat.Mop_TTimes$.op(Operators.scala:368)
at BIDMat.FMat.$up$times(FMat.scala:888)
at .<init>(<console>:48)
at .<clinit>(<console>)
at .$result$lzycompute(<console>:5)
at .$result(<console>:5)
at $result(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:739)
at org.refptr.iscala.Interpreter$$anonfun$9.apply(Interpreter.scala:206)
at org.refptr.iscala.Interpreter.withException(Interpreter.scala:101)
at org.refptr.iscala.Interpreter.loadAndRunReq(Interpreter.scala:206)
at org.refptr.iscala.Interpreter$$anonfun$interpret$1.apply(Interpreter.scala:245)
at org.refptr.iscala.Interpreter$$anonfun$interpret$1.apply(Interpreter.scala:245)
at org.refptr.iscala.Runner$Execution$$anonfun$1.apply$mcV$sp(Runner.scala:28)
at org.refptr.iscala.IOUtil$$anon$2.run(Util.scala:21)
at java.lang.Thread.run(Thread.java:745)

jcuda.LibUtils.loadLibrary(LibUtils.java:126)
edu.berkeley.bid.CBLAS.<clinit>(CBLAS.java:8)
BIDMat.FMat.Tmult(FMat.scala:565)
BIDMat.FPair.Tx(FMat.scala:1113)
BIDMat.Mop_TTimes$.op(Operators.scala:370)
BIDMat.Mop$class.op(Operators.scala:37)
BIDMat.Mop_TTimes$.op(Operators.scala:368)
BIDMat.FMat.$up$times(FMat.scala:888)

Re-emergence of docword bug

It seems that the scripts/getdata.sh script has resurfaced this old issue with a bad path. I think this is an easy fix?

How to train a binary LR model?

As the Quickstart says, BIDMach can train 103 models all at once if the training data has 103 targets.
If I want to train a binary LR model, the label matrix is as follows:

0 1 1 0 ....
1 0 0 1 ....

rows = 2, cols = number of training instances

Will BIDMach then train two models or one?

There is another problem: I use loadLibSVM to load a LIBSVM-format training file,
where each line has the format:
label index1:value1 index2:value2 ...

val (a, c, _) = loadLibSVM(train_file, feat_num)
val (nn, opts) = GLM.learner(a, c, 1)
nn.train

Then it crashes; the error message is "scala.MatchError".
Looking at HMat.scala, loadLibSVM puts all the instance labels in an array, not a matrix.

Is there any function to load LIBSVM training files?
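
In the meantime, a plain-Scala workaround sketch: parse the label column yourself and build the 2-row one-hot layout described above (the file contents and names here are hypothetical):

```scala
// Parse LIBSVM lines: the first whitespace-separated token is the label.
val lines = Seq("1 3:0.5 7:1.2", "0 2:0.3 5:0.8")
val labels: Array[Int] = lines.map(_.split(' ').head.toInt).toArray

// One-hot layout: row 0 marks label 0, row 1 marks label 1.
val oneHot: Array[Array[Float]] =
  Array(labels.map(l => if (l == 0) 1f else 0f),
        labels.map(l => if (l == 1) 1f else 0f))

assert(oneHot(1)(0) == 1f && oneHot(0)(1) == 1f)
```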

Thanks!

BIDMach_intro tutorial issue

When I run the first cell of BIDMach_intro I get the following result even though I have CUDA 6.5 installed:

Cant find native CPU libraries
Cant find native HDF5 library
Couldnt load CUDA runtime

Am I missing environment variables?
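
Possibly: all three messages can occur when the JVM cannot locate the native shared libraries at runtime. A hedged sketch for Linux, assuming the default CUDA install prefix (adjust the path for your system):

```shell
# Assumed default CUDA location; adjust for your install.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
./bidmach notebook
```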

saveGoogleW2V is not a member of BIDMach.networks.Word2Vec

I'm attempting to save a model in the Google word2vec format. I've pulled the latest code (from June 2015) and compiled. However, when attempting to use saveGoogleW2V I get a "not a member" error.

I can see the method in the Word2Vec companion object in the code I compiled. I'm not that familiar with Scala, but I thought I should be able to call the method much like a Java static:

Word2Vec.saveGoogleW2V(...)

Am I misunderstanding how companion object methods work? Or is this a bug of some sort?

How do I run PageRank and Matrix Factorization

Hi,

I'm new here. The benchmarks look amazing. I'm interested in trying PageRank and Matrix Factorization on GPU. Do you have any documentation about running these two applications? Could you also point me to the CUDA code for them?

Thanks,
Cui

colslice index out of range in RF learner

I am trying to get the precompiled BIDMach_1.0.3-linux-86_64/bidmach to work with RF binary classification on a sample dataset, but I am getting an error:

scala> val trg  = loadIMat("../../RF_project/gisette/gisette_train.data.txt")
trg: BIDMat.IMat =
  550    0  495    0    0    0    0  976    0    0    0    0  983    0  995    0  983    0    0  983    0    0    0    0...
    0    0    0    0    0    0    0  976    0    0    0    0    0    0  584    0    0    0    0    0    0    0    0    0...
    0    0    0    0    0    0    0    0    0    0    0    0  983    0  995  983  976    0    0    0    0    0    0    0...
    0    0  742    0    0    0    0  684    0  956    0    0  983    0  991  816  983    0    0    0    0    0    0    0...
    0    0    0    0    0    0    0  608    0  979    0    0    0    0  972    0    0    0    0    0    0    0    0  480...
   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..

scala> val (nn,mopts) =RandomForest.learner(trg,yg)
nn: BIDMach.Learner = Learner(BIDMach.datasources.MatDS@61d91a71,BIDMach.models.RandomForest@745722e6,null,BIDMach.updaters.Batch@4b465b6,BIDMach.models.RandomForest$RFSopts@5f819223)
mopts: BIDMach.models.RandomForest.RFSopts = BIDMach.models.RandomForest$RFSopts@5f819223

scala> nn.train
java.lang.RuntimeException: colslice index out of range 5000 1
  at BIDMat.DenseMat$mcI$sp.gcolslice$mcI$sp(DenseMat.scala:469)
  at BIDMat.IMat.colslice(IMat.scala:99)
  at BIDMat.IMat.colslice(IMat.scala:7)
  at BIDMach.datasources.MatDS$$anonfun$next$1.apply$mcVI$sp(MatDS.scala:43)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
  at BIDMach.datasources.MatDS.next(MatDS.scala:41)
  at BIDMach.models.Model.bind(Model.scala:81)
  at BIDMach.Learner.init(Learner.scala:42)
  at BIDMach.Learner.train(Learner.scala:52)
  ... 33 elided

The training dataset is space-separated integer data from UCI (the gisette dataset). I can't believe that column slicing does not work for RF in BIDMach, so I must be doing something obviously wrong. Notably, RandomForest.learner does not accept the same data in floating point, even though the BIDMach API doc says that discrete (integer) values are only needed for regression testing.

Any advice?
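
One hedged guess, based on BIDMach's column-per-instance convention: if loadIMat returned the gisette file as instances-by-features, transposing both matrices before building the learner might resolve the range error. This is an untested sketch, not a confirmed fix:

```scala
// Untested: BIDMach data sources treat columns as instances, so an
// instances-by-features matrix may need transposing before training.
val (nn, mopts) = RandomForest.learner(trg.t, yg.t)
nn.train
```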

java.lang.OutOfMemoryError: Java heap space

What should I do if I want to load a slightly bigger dataset, like Yahoo Music, and convert it to a sparse matrix? Must I first split the data into pieces? If I convert the data directly, as follows, I get a Java heap problem.

Thanks a lot!


BIDMach_1.0.0-linux-x86_64/bidmach getdata.ssc
Loading /home/ubuntu/BIDMach_1.0.0-linux-x86_64/lib/bidmach_init.scala...
import BIDMat.{CMat, CSMat, DMat, Dict, FMat, FND, GMat, GDMat, GIMat, GLMat, GSMat, GSDMat, HMat, IDict, Image, IMat, LMat, Mat, SMat, SBMat, SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{DNN, FM, GLM, KMeans, KMeansw, LDA, LDAgibbs, Model, NMF, SFA, RandomForest}
import BIDMach.datasources.{DataSource, MatDS, FilesDS, SFilesDS}
import BIDMach.mixins.{CosineSim, Perplexity, Top, L1Regularizer, L2Regularizer}
import BIDMach.updaters.{ADAGrad, Batch, BatchNorm, IncMult, IncNorm, Telescoping}
import BIDMach.causal.IPTW
4 CUDA devices found, CUDA version 6.5

Loading getdata.ssc...
nusers: Int = 1000990
nmovies: Int = 624961
nu: Int = 480189
nm: Int = 17770
ubuntu@ip-10-97-178-101:$ vim getdata.ssc
ubuntu@ip-10-97-178-101:$ BIDMach_1.0.0-linux-x86_64/bidmach getdata.ssc
Loading /home/ubuntu/BIDMach_1.0.0-linux-x86_64/lib/bidmach_init.scala...
import BIDMat.{CMat, CSMat, DMat, Dict, FMat, FND, GMat, GDMat, GIMat, GLMat, GSMat, GSDMat, HMat, IDict, Image, IMat, LMat, Mat, SMat, SBMat, SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{DNN, FM, GLM, KMeans, KMeansw, LDA, LDAgibbs, Model, NMF, SFA, RandomForest}
import BIDMach.datasources.{DataSource, MatDS, FilesDS, SFilesDS}
import BIDMach.mixins.{CosineSim, Perplexity, Top, L1Regularizer, L2Regularizer}
import BIDMach.updaters.{ADAGrad, Batch, BatchNorm, IncMult, IncNorm, Telescoping}
import BIDMach.causal.IPTW
4 CUDA devices found, CUDA version 6.5

Loading getdata.ssc...
nusers: Int = 1000990
nmovies: Int = 624961
a: BIDMat.DMat =
1 507697 5.5000 1
1 137916 5.5000 1
1 22758 5.5000 1
1 120329 5.5000 1
.. .. .. ..
java.lang.OutOfMemoryError: Java heap space
at BIDMat.SparseMat$.sparseImpl$mFc$sp(SparseMat.scala:822)
at BIDMat.MatFunctions$.sparse(MatFunctions.scala:1238)
... 30 elided
:27: error: not found: value sa
sa.check
^
:27: error: not found: value sa
saveSMat("yahoo_train.smat.lz4", sa);
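
One possible mitigation, before resorting to splitting the data: give the JVM more heap. Whether this works depends on how the bidmach launcher builds its java command line; passing JAVA_OPTS through is an assumption here, not a confirmed feature, so check the script and edit its -Xmx flag directly if the variable is ignored:

```shell
# Assumption: the bidmach launcher forwards JAVA_OPTS to the JVM.
export JAVA_OPTS="-Xmx12G"
BIDMach_1.0.0-linux-x86_64/bidmach getdata.ssc
```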

xmltweet.exe includes path of input file

dav@mercury:~⟫ /opt/BIDMach_1.0.0-full-linux-x86_64/bin/xmltweet.exe -i /var/local/destress/lj-annex/data/events/aa/aanniieed.xml -o ./aanniieed

Scanning /var/local/destress/lj-annex/data/events/aa/aanniieed.xml
00002 lines
Couldnt open output file ./aanniieed/var/local/destress/lj-annex/data/events/aa/aanniieed.xml.imat
terminate called without an active exception
Aborted (core dumped)

sbt fails to run

Steps:

  1. git clone https://github.com/BIDData/BIDMach.git
  2. cd BIDMach
  3. ./sbt package
    Result:
    Error: Unable to access jarfile lib/sbt-launch.jar

Note: while that jar is included in the release tar, it isn't in the repo, so building from the repo doesn't work.

C configure script error

Steps:

  1. cd src/main/C/newparse/
  2. ./configure
    Result:
    ./configure: 3: ./configure: [[: not found
    ./configure: 32: ./configure: [[: not found

Note: at least on Ubuntu 14.04, /bin/sh is not bash, and the script assumes that it is.
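
A sketch of the portable rewrite: replace the bash-only `[[ ... ]]` tests with POSIX `[ ... ]` so dash (Ubuntu's /bin/sh) can run the script. The condition below is illustrative, assuming the original tests branch on the OS name:

```shell
# POSIX [ ] instead of bash-only [[ ]], which dash rejects
# with "[[: not found".
os=$(uname -s)
if [ "$os" = "Linux" ] || [ "$os" = "Darwin" ]; then
  echo "building for $os"
fi
```

Alternatively, running `bash ./configure` works without editing the script, as long as the `[[` tests are the only bashism.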

run bidmach on cluster

Hi,

I have three commodity computers with GPUs, and I want to deploy BIDMach on a cluster of these three machines. Is there a way to do this? I went through the project home page and found no tutorial about it. Does BIDMach work on a cluster? Thank you.

Getting started Question

I am just getting started. When I try to run BIDMach (root/bidmach), I get this error:
Error: Could not find or load main class scala.tools.nsc.MainGenericRunner

I have scala 2.10.5 installed and am on Ubuntu 14.04. Is there a flag I need to add to fix this error?

How can I predict with a FM model?

I was able to train an FM model, but how do I predict with it?

The GLM model has a learner() method to create two learners, one for training and one for predicting. I can use that to train and test with out-of-sample data.

But the FM model does not have the same method, and it is not clear from the source code how to proceed.
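
A hypothetical workaround sketch, scoring manually from the trained model matrices. Everything here is an assumption rather than confirmed API: the modelmats layout (linear weights first, then the two factor matrices) and the score formula w·x + ||V1 x||² − ||V2 x||² are guesses about FM.scala, and the test-file name is made up:

```scala
// Hypothetical: pull the trained matrices out of the learner and score
// new data directly, assuming modelmats(0) = linear weights and
// modelmats(1), modelmats(2) = the positive/negative factor matrices.
val w  = FMat(nn.modelmats(0))
val v1 = FMat(nn.modelmats(1))
val v2 = FMat(nn.modelmats(2))
val x  = loadSMat("test_data.smat.lz4")   // made-up file name

// Column sums of the squared factor projections give one score per instance.
val pred = w * x + sum((v1 * x) *@ (v1 * x)) - sum((v2 * x) *@ (v2 * x))
```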

Any suggestions would be appreciated.

Best,
-Doug

IPython notebook in tutorials report that the 'kernel died'

A minor annoyance: when opening the tutorials (via './bidmach notebook'), browsing to the tutorials, and clicking any one of them, we get this message from IPython:
[screenshot: IPython "kernel died" dialog]

Followed by this:
[screenshot: kernel restart prompt]

I looked into this; it might be a false alarm from IPython, and there is a suggested solution here.

Trouble installing on Mac 10.9.5

Installed the latest CUDA; have Java 7.

Installed the tarball per instructions from the download site.

If I run scripts/getdata.sh:

dougs-mbp:BIDMach_0.9.5-osx-x86_64 dloyer$ ./scripts/getdata.sh
./scripts/getdata.sh: line 20: /Users/dloyer/Downloads/BIDMach_0.9.5-osx-x86_64/getrcv1.sh: No such file or directory
./scripts/getdata.sh: line 22: /Users/dloyer/Downloads/BIDMach_0.9.5-osx-x86_64/getuci.sh: No such file or directory
./scripts/getdata.sh: line 24: /Users/dloyer/Downloads/BIDMach_0.9.5-osx-x86_64/getuci.sh: No such file or directory
./scripts/getdata.sh: line 28: /Users/dloyer/Downloads/BIDMach_0.9.5-osx-x86_64/getdigits.sh: No such file or directory

If I cd to scripts and run getrcv1.sh, I get further, but hit a different error message...

....
Scanning lyrl2004_tokens_train.dat.gz
171542 lines
Writing Dictionary
2606875 lines processed
/var/folders/3l/s60hgztj5_zc_chmj8hvy4gm0000gn/T/scalacmd2525326315882625671.scala:1: error: not found: value BIDMat
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,Image,FMat,FND,GMat,GIMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}
^
/var/folders/3l/s60hgztj5_zc_chmj8hvy4gm0000gn/T/scalacmd2525326315882625671.scala:2: error: not found: value BIDMat
import BIDMat.MatFunctions._
^
....

Processing nips: ArrayIndexOutOfBoundsException

I get the following error while running getdata.sh. Execution continues afterwards. A similar error then happens while processing nytimes.

Does this have anything to do with "Couldnt load JCuda", or is it unrelated?

Loading nips data
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2234k 100 2234k 0 0 155k 0 0:00:14 0:00:14 --:--:-- 176k
Couldnt load JCuda
Processing nips.
java.lang.ArrayIndexOutOfBoundsException: 0
at BIDMat.DenseMat$mcI$sp.ggReduceOp$mcI$sp(DenseMat.scala:907)
at BIDMat.IMat.iiReduceOp(IMat.scala:120)
at BIDMat.SciFunctions$.maxi(SciFunctions.scala:520)
at BIDMat.MatFunctions$.cols2sparse(MatFunctions.scala:1338)
at BIDMach.NYTIMES$.preprocess(Experiments.scala:30)
at Main$$anon$1.(scalacmd7501012979133857457.scala:16)
at Main$.main(scalacmd7501012979133857457.scala:1)
at Main.main(scalacmd7501012979133857457.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.ScriptRunner.scala$tools$nsc$ScriptRunner$$runCompiled(ScriptRunner.scala:171)
at scala.tools.nsc.ScriptRunner$$anonfun$runCommand$1.apply(ScriptRunner.scala:218)
at scala.tools.nsc.ScriptRunner$$anonfun$runCommand$1.apply(ScriptRunner.scala:218)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply$mcZ$sp(ScriptRunner.scala:157)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.util.package$.trackingThreads(package.scala:51)
at scala.tools.nsc.util.package$.waitingForThreads(package.scala:35)
at scala.tools.nsc.ScriptRunner.withCompiledScript(ScriptRunner.scala:130)
at scala.tools.nsc.ScriptRunner.runCommand(ScriptRunner.scala:218)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:94)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
clearing up
Loading nytimes data
...

Couldnt load JCuda
Processing nytimes.
java.lang.ArrayIndexOutOfBoundsException: 0
at BIDMat.DenseMat$mcI$sp.ggReduceOp$mcI$sp(DenseMat.scala:907)
at BIDMat.IMat.iiReduceOp(IMat.scala:120)
at BIDMat.SciFunctions$.maxi(SciFunctions.scala:520)
at BIDMat.MatFunctions$.cols2sparse(MatFunctions.scala:1338)
at BIDMach.NYTIMES$.preprocess(Experiments.scala:30)
at Main$$anon$1.(scalacmd7752450377490059554.scala:16)
at Main$.main(scalacmd7752450377490059554.scala:1)
at Main.main(scalacmd7752450377490059554.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.ScriptRunner.scala$tools$nsc$ScriptRunner$$runCompiled(ScriptRunner.scala:171)
at scala.tools.nsc.ScriptRunner$$anonfun$runCommand$1.apply(ScriptRunner.scala:218)
at scala.tools.nsc.ScriptRunner$$anonfun$runCommand$1.apply(ScriptRunner.scala:218)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply$mcZ$sp(ScriptRunner.scala:157)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.util.package$.trackingThreads(package.scala:51)
at scala.tools.nsc.util.package$.waitingForThreads(package.scala:35)
at scala.tools.nsc.ScriptRunner.withCompiledScript(ScriptRunner.scala:130)
at scala.tools.nsc.ScriptRunner.runCommand(ScriptRunner.scala:218)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:94)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
clearing up
Loading arabic digits data
$

BIDMach needs an updated BIDMat.jar

I'm using Mac OS X 10.9.5 and have run into the same problems I outlined in BIDData/BIDMat#18. Here's an example of a problem for quick reference:

scala> val diag = BIDMat.GMat(1 on 2 on 3)
diag: BIDMat.GMat =
1
2
3

scala> mkdiag(diag)
java.lang.RuntimeException: mkdiag requires a vector argument, but dims= 3 1
at BIDMat.GMat.mkdiag(GMat.scala:643)
at BIDMat.MatFunctions$.mkdiag(MatFunctions.scala:1442)
... 33 elided

scala>

The issue is that while I've added some fixes to BIDMat that resolve the above problem (plus others), they are not reflected in the latest BIDMat.jar that ships with BIDMach. The problem above occurred when I downloaded the fresh 1.0.0 Mac bundle, typed ./bidmach, and ran the commands above. Thus, the solution is simply to recompile BIDMat to get an up-to-date BIDMat.jar, copy it into BIDMach's "lib" directory in the 1.0.0 bundle, and that should fix things.
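
The workaround described above, as a shell sketch (paths assumed: sibling checkouts of BIDMat and BIDMach; the jar's output location may differ, e.g. under target/, so adjust accordingly):

```shell
cd BIDMat
./sbt package                  # rebuild an up-to-date BIDMat.jar
cp BIDMat.jar ../BIDMach/lib/  # replace the stale jar in the bundle
```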

Also, the BIDMat API docs need to be updated, since they lack the recent fixes and updated documentation.

CUDA alloc failed initialization error with kmeans algorithm

Hi All,
I'm getting the following error when trying to run clustering with kmeans on the GPU.
The problem occurs in these cases:

  1. input 1M records / 10 attributes / k=100 / 10 iterations
  2. input 30M records / 10 attributes / k=10 / 10 iterations

but it works for:

  1. input 10M records / 10 attributes / k=10 / 10 iterations

You reported running kmeans on a 100M dataset with a GTX 680 GPU,
which according to the specifications has 2 GB of RAM, so I think
it should also work in my case (GTX 860M, 2 GB). Or am I missing something?

Besides, do you know why my card is reported as
1 CUDA device found, CUDA version 5.5
while I'm running CUDA 6.0?

Regards,
Marek

java.lang.RuntimeException: CUDA alloc failed initialization error
at BIDMat.GMat$.apply(GMat.scala:1094)
at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1780)
at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1814)
at BIDMat.GMat$.apply(GMat.scala:1100)
at BIDMach.models.ClusteringModel.init(Clustering.scala:21)
at BIDMach.models.KMeans.init(KMeans.scala:34)
at BIDMach.Learner.init(Learner.scala:37)
at BIDMach.Learner.train(Learner.scala:45)
at .(:26)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Numeric mis-match between token dictionary and parsed file

xmltweet indexes tokens from 1 and also stores these tokens in a dictionary file. However, when Scala reads in the dictionary, it indexes from 0. So, if you look up the index of a token in the dictionary, it will be one less than the value used in the parsed imat file.

This is easy enough to address by adding or subtracting 1, but we'd like to do it in a way that minimizes error. We talked about perhaps inserting a junk entry as the 0th element when reading the dictionary into Scala. But we wanted to check whether you had a different idea.
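
The junk-entry idea above can be sketched in plain Scala (the names and sample words are hypothetical):

```scala
// Prepend a placeholder so Scala's 0-based dictionary array lines up
// with xmltweet's 1-based token ids.
val dictWords: Array[String] = Array("alpha", "beta", "gamma") // as loaded, 0-based
val aligned: Array[String] = "<unused>" +: dictWords

// Token id 1 in the parsed imat now maps directly to the first word:
assert(aligned(1) == dictWords(0))
```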

multinomial2(...) gives weird results; first row seems like it gets the samples of the last row

John,

It looks like I didn't catch a few test cases when we were checking multinomial2. I found some odd behavior this morning. Just to recap, the definition of multinomial2 is:

int multinomial2(int nrows, int ncols, float *A, int *B, int nvals)

Where

  • nrows is the number of rows of "matrix" A
  • ncols is the number of columns of "matrix" A
  • A is a pointer to a data matrix where columns correspond to an un-normalized probability distribution
  • B is a GIMat of the same dimension as A and holds the sampling results
  • nvals is the number of samples we want to get, where one sample should make a k-way decision.

Unfortunately, from what I can tell, the multinomial2 sampling is ignoring a row and putting the results in a different row. To demonstrate:

scala> import edu.berkeley.bid.CUMACH._
import edu.berkeley.bid.CUMACH._

scala> val test4 = grand(2,1000)
test4: BIDMat.GMat =
   0.40568   0.11102   0.97801   0.96959   0.83026   0.40141   0.41202   0.61166   0.20048   0.22633   0.12287   0.18022   0.32572   0.40397   0.86652   0.31198   0.17791   0.56108   0.59852...
   0.59785   0.32611   0.46033   0.17353   0.79053   0.91425   0.43692   0.55500   0.49750   0.96141   0.81040   0.62431   0.81548   0.96281  0.011548   0.73854   0.26540   0.56230   0.50416...

scala> val out4 = gizeros(2,1000)
out4: BIDMat.GIMat =
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0...

scala> multinomial2(2,1000,test4.data, out4.data,1000)
res12: Int = 0

scala> out4
res13: BIDMat.GIMat =
  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000  1000...
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0...

You can see that out4 assigns all 1000 samples to the first value (first row), whereas, since the input is random, the two rows should each receive roughly half the samples.

As another example with a larger matrix, which shows a more interesting result, we get:

scala> val test6 = grand(100,1000)
test6: BIDMat.GMat =
   0.64936   0.32890   0.80571  0.060403   0.46713   0.11794   0.35396   0.51864   0.55048   0.31779   0.11267   0.84305  0.063050   0.16121   0.13459   0.15608   0.23206   0.75661   0.76700...
   0.30496   0.52751   0.28042   0.15433   0.15183   0.54877   0.97555   0.73333   0.86240   0.63230   0.41277   0.32537   0.57536   0.73076   0.44918   0.69297   0.33638  0.043051   0.22322...
   0.37311  0.080803   0.53221   0.45509   0.65039   0.92046   0.16711   0.18513   0.28504   0.79746  0.015594   0.57462   0.45046   0.76908   0.48837   0.58818   0.99301   0.95240   0.72313...
        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..        ..

scala> val out6 = gizeros(100,1000)
out6: BIDMat.GIMat =
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0...
  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  .....
scala> multinomial2(100,1000,test6.data,out6.data,10000)
res16: Int = 0

scala> out6
res17: BIDMat.GIMat =
  173  174  177   44  143  153  264  231  258  221  101  235  129  177  108  162   98  169  222   97  238  131  323  219  305  211  250  254  241  179  276  199  262  105   34  274  248  243  133...
   87   14   87  111  128  172   31   38   38  155    4   99   96  147   89  123  194  171  133  220   47  110  116   80  120  155   85  186   80  115  163  120   64  195    8    8   59   49  157...
  135  135   49    4   28  159   14  152  190  117   20  212  134   18   15  179   96  185   71   26   19   15  175  204   95  131  129  138   10  113   72   36   28  193   31  105  162  135  154...
   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ..   ...
scala> val sums = sum(IMat(out6),2)
sums: BIDMat.IMat =
  202950
  100015
   99611
      ..

scala> sums.t
res18: BIDMat.IMat = 202950,100015,99611,100062,98564,98280,101494,99784,101114,99836,100234,100544,97378,98849,99934,104187,101520,100814,100156,100427,100484,101568,99160,98448,102099,101257,100595,101254,99316,101159,101845,99685,99266,99291,100698,98455,98149,97856,103206,98644,99495,101699,102006,98426,100417,99976,99024,102430,99809,99613,100951,102319,98455,101879,103332,102684,100135,97815,101072,101160,98194,100579,100051,98683,97189,102728,100597,98409,98188,97562,98516,99912,98672,97695,100228,101165,95438,101559,98860,99721,100042,99101,100166,101560,100659,99598,97924,100269,103065,98137,99056,99702,100359,98133,98068,102242,98344,98157,100587,0

What happens here is that it looks like the first row "took" the samples from the last row. The first row sums up to be about 200k, and the other rows (except the last one) are about 100k. Also, even if you disregard the last row, some of the output doesn't make sense. The fourth column, for instance, has 111 samples in the second row and 4 samples in the third. But when we look at the test6 matrix, we see un-normalized probabilities of 0.15 and 0.45, respectively, so how come the ratio is 111-to-4 despite the probability ratio being 0.15-to-0.45?

I strongly suspect that there is an edge case problem that's causing one row to lose its samples, and maybe fixing that will resolve the ratio difference I just observed.
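
For reference, a minimal CPU-side sampler in plain Scala showing the expected behavior; this is an illustrative sketch, not the CUMACH implementation. Per-column counts should track the un-normalized probabilities, so a 0.15-to-0.45 column should split roughly 1-to-3:

```scala
import scala.util.Random

// Draw nvals multinomial samples from one column of un-normalized
// probabilities; return the count landing in each row.
def multinomialColumn(probs: Array[Float], nvals: Int, rng: Random): Array[Int] = {
  val total = probs.sum
  val counts = new Array[Int](probs.length)
  for (_ <- 0 until nvals) {
    var u = rng.nextFloat() * total
    var i = 0
    while (i < probs.length - 1 && u >= probs(i)) { u -= probs(i); i += 1 }
    counts(i) += 1
  }
  counts
}

val counts = multinomialColumn(Array(0.15f, 0.45f), 10000, new Random(0))
// Unlike the GPU output above, the second row should get roughly 3x
// the samples of the first.
assert(counts(1) > counts(0))
```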

Issues compiling with latest version

When I pulled the latest version and then tried to compile, sbt gives the following errors:

~/BIDMach$ sbt package
[info] Set current project to BIDMach (in build file:/home/schillaci/BIDMach/)
[warn] Credentials file /home/schillaci/.ivy2/.credentials does not exist
[info] Compiling 16 Scala sources and 2 Java sources to /home/schillaci/BIDMach/target/scala-2.11/classes...
[error] /home/schillaci/BIDMach/src/main/scala/BIDMach/models/DNN.scala:568: value blockGemm is not a member of BIDMat.Mat
[error]       inputs(0).data.blockGemm(1, 0, nr, nc, reps, 
[error]                      ^
[error] /home/schillaci/BIDMach/src/main/scala/BIDMach/models/DNN.scala:580: value blockGemm is not a member of BIDMat.Mat
[error]       inputs(1).data.blockGemm(0, 1, nrows, nc, reps, 
[error]                      ^
[error] /home/schillaci/BIDMach/src/main/scala/BIDMach/models/DNN.scala:585: value blockGemm is not a member of BIDMat.Mat
[error]       inputs(0).data.blockGemm(0, 0, nrows, nr, reps, 
[error]                      ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 12 s, completed Apr 9, 2015 12:20:39 PM

I tried copying the lib files from the latest executable bundles, but it didn't help. Any idea what's up?
