cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines

Home Page: https://www.csie.ntu.edu.tw/~cjlin/libsvm/

License: BSD 3-Clause "New" or "Revised" License

Java 21.85% C 13.35% Python 13.28% C++ 18.75% HTML 17.99% Makefile 0.67% MATLAB 0.16% M4 13.94%

libsvm's People

Contributors

betaboon, bloomen, carandraug, chiahuaho, cjlin1, ericliu8168, gkevinyen5418, heartylearner, hychou0515, infwinston, kevin1kevin1k, leepei, maclin726, mosikico, pedrormjunior, ppetter1025, roebu, sagi, sammer1107, sinacam, tic66777, will945945945, ycjuan, yongzhuang22, yxliu-ntu, zhi-bao, zyque

libsvm's Issues

svm.cpp: svm_predict() forcing one class

I am using libsvm 3.20. I have a dataset which causes svm_predict() and svm_predict_probability() to give different results. In particular, svm_predict() classifies everything to one class, which is definitely wrong for this dataset.

You can trigger it with the command line tools as follows:

wget https://github.com/kousu/statasvm/raw/master/bugs/libsvm_classification/classification_bug.svmlight

svm-train -b 1 classification_bug.svmlight FIT >/dev/null &&

# svm_predict(), incorrect
svm-predict -b 0 classification_bug.svmlight FIT P
cat P

# svm_predict_probability(), correct (or at least, reasonable)
svm-predict -b 1 classification_bug.svmlight FIT P
cat P

Tabulating the values, I see

# training data
0    61
1    91
2     9
3     9

# svm_predict(), incorrect
Model supports probability estimates, but disabled in prediction.
Accuracy = 53.5294% (91/170) (classification)
    170 1

# svm_predict_probability()
Accuracy = 84.7059% (144/170) (classification)
labels 0 1 2 3
     61 0
    100 1
      9 2

The class that is incorrectly chosen is the one that is dominant in the training data, which seems telling, but I don't know enough about the mathematics of SVM to know what it is telling.
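For reference, with `-b 0` a multiclass libsvm model predicts by one-vs-one voting: each pairwise classifier casts one vote, and the class with the most votes wins, with ties going to the class listed first in the model. With `-b 1`, pairwise probability estimates are coupled instead, so the two paths can disagree. The sketch below illustrates only the voting rule; it is a simplified re-implementation, not libsvm's code, and the input layout (a dict keyed by class-index pairs) is an assumption for illustration.

```python
from itertools import combinations


def ovo_vote(labels, decision_values):
    """Choose a class by one-vs-one voting over pairwise decision values.

    labels: class labels in the model's order.
    decision_values: dict mapping an index pair (i, j), i < j, to the
        decision value of the classifier trained on class i vs class j.
    Returns (predicted_label, votes).
    """
    votes = [0] * len(labels)
    for i, j in combinations(range(len(labels)), 2):
        # A positive decision value is a vote for the first class of the pair.
        if decision_values[(i, j)] > 0:
            votes[i] += 1
        else:
            votes[j] += 1
    # max() returns the first index with the highest count, so ties go to
    # the class that appears earlier in the model's label order.
    best = max(range(len(labels)), key=lambda k: votes[k])
    return labels[best], votes
```

Under this rule, a class that wins every pairwise contest it takes part in is predicted no matter how small its margins are.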

Reference dataset and full test cases are at https://github.com/kousu/statasvm/tree/master/bugs/libsvm_classification. This showed up when run from my Stata wrapper in that repo, but it is also in sklearn and in your command line tools.

I hit the svm_predict() bug a week ago, but I was even more surprised to see that despite it, you can still get good answers out of libsvm by tweaking parameters. Given the huge number of machine learning projects that depend on your code, there must be a lot of subtly incorrect predictions that no one is catching. Do you have any idea what would cause this?

32-bit and 64-bit DLLs?

I was working on getting the Julia language binding to LIBSVM working on Windows, and was wondering if you could add 32-bit and 64-bit versions of libsvm.dll to your makefile and repository. I think the current file is 32-bit only.

Zeroed weights for entire class

I know that it's a weird usage of class weights, but still, could it be explained somehow? Or fixed?

dataset.txt:

0 1:0 2:0 3:0
0 1:0 2:0 3:1
0 1:0 2:1 3:0
1 1:0 2:1 3:1
1 1:1 2:0 3:0
1 1:1 2:0 3:1
2 1:1 2:1 3:0
2 1:1 2:1 3:1

code:

libsvm-3.20$ ./svm-train -b 1 -w0 1 -w1 1 -w2 0 dataset.txt model
libsvm-3.20$ ./svm-predict -b 1 dataset.txt model predictions.out

It produces in predictions.out:

labels 0 1 2
2 3.31221e-14 3.30357e-14 1
2 3.63995e-14 3.24543e-14 1
2 3.36039e-14 3.30595e-14 1
2 3.77311e-14 3.12876e-14 1
2 3.86737e-14 2.78238e-14 1
2 3.82377e-14 2.50579e-14 1
2 3.84825e-14 2.96375e-14 1
2 3.84239e-14 2.58019e-14 1
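svm-train's usage text describes `-wi weight` as setting the parameter C of class i to weight*C (for C-SVC). With `-w2 0`, class 2's penalty becomes 0, and since C-SVC constrains each dual variable by 0 <= alpha_i <= C_i, every class-2 training point is pinned at alpha_i = 0 in the pairwise problems that involve class 2. A minimal sketch of the per-class penalty; the function name is hypothetical:

```python
def effective_c(C, weights, labels):
    """Per-class penalty C_i = weight_i * C, as described for svm-train -wi.

    weights: dict label -> weight given on the command line.
    Labels without an explicit weight keep the default weight 1.
    """
    return {label: C * weights.get(label, 1.0) for label in labels}
```

For the command above, `effective_c(1.0, {0: 1, 1: 1, 2: 0}, [0, 1, 2])` gives class 2 a penalty of 0.0.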

CppCheck errors for realloc() usage

I'd like to report that CppCheck is reporting issues with a few of the C/C++ files' use of realloc without testing to ensure the result isn't NULL, resulting in possible memory leaks.

You can gather results by running:

cppcheck --quiet /path/to/libsvm

[/src/libsvm/matlab/libsvmread.c:48]: (error) Common realloc mistake: 'line' nulled but not freed upon failure
[/src/libsvm/svm-predict.c:31]: (error) Common realloc mistake: 'line' nulled but not freed upon failure
[/src/libsvm/svm-predict.c:96]: (error) Common realloc mistake: 'x' nulled but not freed upon failure
[/src/libsvm/svm-scale.c:342]: (error) Common realloc mistake: 'line' nulled but not freed upon failure
[/src/libsvm/svm-train.c:75]: (error) Common realloc mistake: 'line' nulled but not freed upon failure
[/src/libsvm/svm.cpp:2042]: (error) Common realloc mistake: 'label' nulled but not freed upon failure
[src/libsvm/svm.cpp:2043]: (error) Common realloc mistake: 'count' nulled but not freed upon failure
[/src/libsvm/svm.cpp:2757]: (error) Common realloc mistake: 'line' nulled but not freed upon failure
[/src/libsvm/svm.cpp:3137]: (error) Common realloc mistake: 'label' nulled but not freed upon failure
[/src/libsvm/svm.cpp:3138]: (error) Common realloc mistake: 'count' nulled but not freed upon failure

Drilling into the first one:

...
line = (char *) realloc(line, max_line_len);
...

To fix these, you should check whether realloc returns NULL. If it does, free(line), since the original block is still valid; if not, assign the returned pointer to line. Without this, line is overwritten with NULL and the block it originally pointed to is leaked, because nothing refers to it any more. More detailed guidance at:

Thanks!

Check parameter failure on a non-applicable case

Using the following parameters:

    svm_parameter param;
    param.C = 100;
    param.svm_type = C_SVC;
    param.kernel_type = LINEAR;
    param.eps = 0.00001;
    param.probability = 0;
    param.shrinking = 0;
    param.cache_size = 100;

I ran svm_check_parameter to validate and it returned "degree of polynomial kernel < 0".
Since only POLY employs degree, the parameter's if-condition should also check for kernel type.
A similar issue could apply to the gamma check just above.

'svm_check_parameter' problematic lines

The error can obviously be avoided by setting these parameters to zero, making them pass the conditions, but we shouldn't rely simply on this default value.
Adding a kernel / svm type check where these parameters are employed could avoid a few head scratches for future users of libsvm.

As proof, it just happened to me that gamma < 0 didn't raise any error while degree < 0 did.
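The kernel-aware check suggested above could look like the following sketch. The function is hypothetical (it is not libsvm's `svm_check_parameter`); the kernel codes 0 to 4 mirror the order of libsvm's kernel_type enum (LINEAR, POLY, RBF, SIGMOID, PRECOMPUTED), and the error strings reuse the wording from this issue.

```python
# Kernel type codes, in the order of libsvm's kernel_type enum.
LINEAR, POLY, RBF, SIGMOID, PRECOMPUTED = range(5)


def check_kernel_params(kernel_type, gamma, degree):
    """Return an error message, or None if the kernel parameters are valid.

    Only the parameters that the chosen kernel actually uses are checked:
    gamma for POLY, RBF and SIGMOID, degree for POLY only.
    """
    if kernel_type in (POLY, RBF, SIGMOID) and gamma < 0:
        return "gamma < 0"
    if kernel_type == POLY and degree < 0:
        return "degree of polynomial kernel < 0"
    return None
```

With this ordering, an uninitialized degree or gamma no longer matters for a LINEAR kernel.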

Load and save from a memory buffer

I want to manage a model database without using temporary files.
For this, I propose an API extension:

int svm_save_model_buffer(char *model_buffer, int buffer_length, const struct svm_model *model);
struct svm_model *svm_load_model_buffer(const char *model_buffer, int buffer_length);

svm_save_model_buffer saves a model to a buffer; returns the written size on success, or -1
if an error occurs.

svm_load_model_buffer returns a pointer to the model read from the buffer,
or a null pointer if the model could not be loaded.
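To make the proposed return-value contract concrete, here is a Python analogue operating on a caller-provided bytearray. Everything here is hypothetical: the function names mirror the proposal, the model is assumed to be already serialized in libsvm's text model format (which begins with an `svm_type` line), and the loader only checks that header instead of parsing the whole model.

```python
def save_model_buffer(model_text, buffer):
    """Hypothetical analogue of the proposed svm_save_model_buffer.

    Copies the serialized model into the caller's bytearray and returns the
    number of bytes written, or -1 if the buffer is too small.
    """
    data = model_text.encode("ascii")
    if len(data) > len(buffer):
        return -1
    buffer[:len(data)] = data
    return len(data)


def load_model_buffer(buffer, length):
    """Hypothetical analogue of the proposed svm_load_model_buffer.

    Returns the model text read from the first `length` bytes, or None if it
    cannot be loaded. A real loader would parse every header line; this
    sketch only checks that the text starts with the `svm_type` line.
    """
    if length < 0 or length > len(buffer):
        return None
    try:
        text = bytes(buffer[:length]).decode("ascii")
    except UnicodeDecodeError:
        return None
    return text if text.startswith("svm_type ") else None
```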

make.m problem in win10 & MinGW64 compiler

I am on Windows 10 with MATLAB R2015b and MinGW64. When I run make.m I get gcc: error: \-fexceptions: No such file or directory. I solved it by changing CFLAGS to COMPFLAGS.

Unknown parameter input: what is happening?

I use libsvm on CentOS.
I scale, train, and create a model file.
But when I input some unknown parameters, for example
1 1:0.01 2:0.32 3:-0.12 4:. 5:0.023 6:. 7:. 8.-0.02

Result of scaling it:
1 1:0.023421 2:0.43 4:0.564 6:1.23 7:0.023

Result of predicting it:
-1 0.0238351 0.976165

Some parameters are missing and some are added after scaling.
What is happening?
Is the data untrustworthy?
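Two details of libsvm's data format may be relevant here. Each feature must be written as `index:value` with a numeric value and indices in ascending order, so tokens such as `4:.` or `8.-0.02` are not valid input. In addition, svm-scale writes only non-zero values, so features whose scaled value is 0 disappear from its output. The sketch below is a hypothetical strict parser that rejects malformed tokens instead of guessing; it is not libsvm's own reader.

```python
def parse_libsvm_line(line):
    """Parse one line of libsvm's sparse format: label idx:val idx:val ...

    Returns (label, {index: value}). Raises ValueError on malformed tokens
    instead of silently skipping them.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features = {}
    prev = 0
    for tok in tokens[1:]:
        idx_str, sep, val_str = tok.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in token {tok!r}")
        idx = int(idx_str)
        # Indices start at 1 and must be strictly increasing.
        if idx <= prev:
            raise ValueError(f"index {idx} not greater than previous index {prev}")
        features[idx] = float(val_str)  # float('.') raises ValueError
        prev = idx
    return label, features
```

Running the example line through such a check before svm-scale makes the bad tokens visible instead of letting them change the data silently.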

Calling multiclass_probability when mapping decision values of a binary classifier to probabilities

Hi,

I've got a problem with mapping the decision values of a binary classifier (nr_class = 2) to probabilities.

In that case, svm_predict_probability (at L2615 in svm.cpp) calls multiclass_probability, which implements the method from this paper. The resulting predicted probabilities (call them probs1) are not the same as those obtained by passing the decision values through the sigmoid, i.e. by calling sigmoid_predict on them (call these probs2). Both probs1 and probs2 are probability estimates, but they are not the same, and probs2 was directly calibrated to output probabilities, so it makes more sense to output these as the probability estimates for a binary classifier.

Is there any reason to call multiclass_probability even when the classifier is binary (nr_class = 2)?

Thanks!
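One way to see what the coupling does for nr_class = 2: as we read svm.cpp, multiclass_probability minimises p^T Q p with Q_tt = sum over j != t of r_jt^2 and Q_tj = -r_jt * r_tj. For two classes this objective reduces to (r10*p0 - r01*p1)^2, which is exactly zero at p0 = r01, p1 = r10, i.e. at the sigmoid output itself. If that reading is right, probs1 and probs2 can differ only because the coupling is solved iteratively up to a stopping tolerance. The sketch writes out both pieces; `sigmoid_predict` mirrors the Platt formula 1/(1+exp(A*f+B)), and `couple_two_classes` is a hypothetical closed form, not a libsvm function.

```python
import math


def sigmoid_predict(decision_value, A, B):
    """Platt sigmoid 1 / (1 + exp(A*f + B)), written to avoid overflow."""
    fApB = decision_value * A + B
    if fApB >= 0:
        return math.exp(-fApB) / (1.0 + math.exp(-fApB))
    return 1.0 / (1.0 + math.exp(fApB))


def couple_two_classes(r01):
    """Exact minimiser of the pairwise-coupling objective for k = 2.

    With r10 = 1 - r01, (r10*p0 - r01*p1)^2 is zero when p0/p1 = r01/r10,
    so under p0 + p1 = 1 the solution is p0 = r01, p1 = r10.
    """
    r10 = 1.0 - r01
    p0 = r01 / (r01 + r10)
    return p0, 1.0 - p0
```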

Possible bug in one-class SVM: the parity of the number of data points may break performance.

I am using one-class SVM to learn a model from a dataset with 271 data points.
All 271 labels are +1, of course.
To measure the performance of a model, I first look at the accuracy on the training data; it shouldn't be too bad, at least.

Here are the steps with which I found and confirmed the problem:

I. I train a model on the first 270 data points. The model correctly predicts that the last datum is positive, and the accuracy on the training data seems good.

II. When I train a model on all 271 data points, the performance on the training data suddenly drops a lot, and the model even predicts the last datum to be negative.

One possible reason for this is that the last datum is overfitted, but it is hard to believe that a model derived from 270 data points can be changed so much by a single datum, and that the overfitted datum would be predicted negative.

III. To make the model fit the last datum even more, I train a model on 272 data points: the original 271 plus one copy of the last datum. Contrary to my expectation, the performance becomes good again.

This doesn't make sense if the reason is overfitting, so I suspect the problem is the parity of the dataset size. To test this guess, here is step IV.

IV. To rule out the possibility that the last datum is weird, I use only the first 270 data points for this test. Each time, I randomly choose i data points (i = 0 to 9) and duplicate them to form a dataset of size 270 + i. Then I train a model on the dataset and measure its performance. For each i, I run 30 times and take the average as the result. The results clearly show that when i is odd the performance is very poor, while this never happens when i is even.

This is the code for the 4 tests mentioned above. Although I use sklearn in the code, the problem can also be reproduced with libsvm directly (with gamma=1/n_features, which is the 'auto' value in sklearn). The dataset can be downloaded here. For now, a workaround is to always keep the size of the training dataset even.
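Step IV can be sketched as follows. `evaluate` is a placeholder for whatever trains the one-class model and returns its training accuracy (for example, a call into sklearn or libsvm); it and the other names are hypothetical. Sampling the duplicated points without replacement is an assumption, since the description does not say whether the same point can be chosen twice.

```python
import random


def duplicated_dataset(data, i, rng):
    """Return `data` plus `i` randomly chosen items appended a second time.

    rng.sample draws without replacement, so the i duplicates are distinct.
    """
    return data + rng.sample(data, i)


def parity_experiment(data, evaluate, runs=30, max_extra=9, seed=0):
    """For each i in 0..max_extra, average evaluate(dataset) over `runs` draws."""
    rng = random.Random(seed)
    results = {}
    for i in range(max_extra + 1):
        scores = [evaluate(duplicated_dataset(data, i, rng)) for _ in range(runs)]
        results[i] = sum(scores) / runs
    return results
```

Fixing the seed makes a run reproducible, which helps when comparing libsvm and sklearn on exactly the same duplicated datasets.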

Using LIBSVM with OpenMP under Octave

Hello,
I don't have much development experience on Linux, so forgive me if I ask something obvious.

  1. I added -fopenmp to CFLAGS and -lgomp to MEX_OPTION in /matlab/Makefile
  2. I added -fopenmp to CFLAGS in /Makefile
  3. In svm.cpp I added:
    3.1 "#pragma omp parallel for private(j)" above the line "for(j=start;j<len;j++)" (line 1285)
    3.2 "#pragma omp parallel for private(i) reduction(+:sum)" above the line "for(i=0;i<l;i++)" (line 2511)
    3.3 "#pragma omp parallel for private(i)" above the line "for(i=0;i<l;i++)" (line 2528)
  4. I ran the make command under Octave

Then when I am using the svmtrain under Octave I get the error: /home/Octave/libsvm_mp/matlab/svmtrain.mex: failed to load: /home/Octave/libsvm_mp/matlab/svmtrain.mex: undefined symbol: omp_get_thread_num

What am I doing wrong here?

(Ubuntu 14.04, Octave 3.8.1, gcc 4.9, libsvm 3.20)

Octave 4.0 and parallelised LibSVM don't work together

I'm using Octave 4.0.0 on Kubuntu 15.10 (yes, the beta) on a 64bit machine. I applied the updated rules that were mentioned in April. I can compile without error message using 'make.m'. However, I cannot run it.

N = 10000;
L = randi(2,1,N);
D = [randn(1,N/2) randn(1,N/2)+1];
model = svmtrain(L',D');
error: /opt/libsvm/octave/svmtrain.mex: failed to load: /opt/libsvm/octave/svmtrain.mex: undefined symbol: GOMP_loop_guided_start

How can this error be resolved?

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

I am using https://github.com/ccerhan/LibSVMsharp as a wrapper

When I call LibSVM inside a task, this error happens after the second task starts.

If I call libsvm on the main thread, the error never happens.

If I start only one task, the error does not happen.

As can be seen in the very first image, the error is not related to my application or my functions. It is caused by either the wrapper or libSVM.dll itself; I don't know which.

I am using Windows 8.1 x64, Visual Studio 2013, and a WPF .NET 4.5.1 application, with 32 GB of RAM on this computer.

First, here is the error message:

[Image: error message when called]

Second error message:

[Image: second error message]

Third, what works and what causes the error:

[Image: what works and what causes the error]

I really need help, thank you very much.

bug - easy.py throws ValueError

easy.py throws an error, even on standard datasets (e.g., iris):

Traceback (most recent call last):
  File "easy.py", line 63, in <module>
    c,g,rate = map(float,last_line.split())
ValueError: need more than 0 values to unpack

This was observed in Windows but has been reported for other OS's elsewhere.
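The traceback means `last_line` was empty, i.e. easy.py read no result line from grid.py, presumably because grid.py itself failed. A minimal sketch of a more defensive parse that reports that situation instead of an unpacking error; it assumes, as easy.py does, that the last non-empty line of grid.py's output is `c g rate`, and the function name is hypothetical:

```python
def parse_grid_result(output):
    """Parse 'c g rate' from the last non-empty line of grid.py's output.

    Raises RuntimeError with a readable message when grid.py printed
    nothing usable, instead of a bare unpacking ValueError.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        raise RuntimeError("grid.py produced no output; run it by hand to see why")
    fields = lines[-1].split()
    if len(fields) != 3:
        raise RuntimeError(f"unexpected last line from grid.py: {lines[-1]!r}")
    c, g, rate = map(float, fields)
    return c, g, rate
```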

pred_label does not correspond to prob_estimates in [pred_label,accuracy,prob_estimates]=svmpredict(test_label,test_data,model,'-b 1');

In prob_estimates, the position of the maximum value in a row is not the pred_label.

prob_estimates=0.0877046072932294 0.00689885694870784 0.0510358500866629 0.0349193526856883 0.0201649925974930 0.0572772003038145 0.00354058458641571 0.434801642194089 0.299917382939861 0.00373953036403846
0.0569029578292815 0.0128889675010719 0.0226503273265042 0.235434067349005 0.0274432928060539 0.0223993134855364 0.00449010998588290 0.372552666098744 0.240135267815857 0.00510302980206350
0.0202618302729419 0.00609933422466536 0.00513096031248623 0.556599397170814 0.00770208398129837 0.0147579801297493 0.00117991662750405 0.123657362841668 0.263114176688193 0.00149695775068038
0.0138081408458132 0.0134109236889526 0.0261085782234978 0.379193740840643 0.0136879003688169 0.0278073190536115 0.00393368980397517 0.0770472704849993 0.441814426316247 0.00318801037344396
0.00737713942130312 0.00719604777712997 0.0190913916210304 0.766244252381808 0.00452925052833849 0.00832667922291313 0.000954117402067928 0.0242901169949015 0.160638180981833 0.00135282366867416
0.0404892166770354 0.0131597503809068 0.00812199404116744 0.0709609923235642 0.0256930503037581 0.0235279018153220 0.00310139828330843 0.190031347957460 0.620513527363689 0.00440082085378811
0.0233835236239888 0.00845191161599723 0.0204935753653564 0.0455692904676819 0.0733235759739852 0.0506628894366520 0.00506885299370543 0.0616635058140948 0.707636494571399 0.00374638013713891
0.0170375648555056 0.0117320006702165 0.0351611195497132 0.0329173319877270 0.0199963414010635 0.0233994742806957 0.00202927477235737 0.0367208285826699 0.809862937232914 0.0111431266671378
0.00249398437648049 0.00118775906267820 0.00420318445065694 0.00150617919781232 0.00851998127528923 0.0170565271724271 0.000216428383816490 0.00376026537283618 0.959633934361137 0.00142175634686587
0.0172952090357666 0.00203085205048888 0.0663402503175806 0.00337260286616463 0.0192039462603744 0.0368582111255392 0.00135917901052383 0.0574097352946263 0.753632624031509 0.0424973900074269
0.000537446336879699 2.15124069324038e-05 0.00561621351527447 0.00185702285684450 8.83922926914827e-05 0.000224499754837717 3.52901270154285e-05 7.98180192575826e-06 0.00140238264873741 0.990209258258861
0.00800564331385912 0.000723560253424673 0.00910529583487216 0.0517704592247967 0.00127130082212482 0.000836563684457387 0.0318391492343732 0.00144605301838154 0.00150543142470253 0.893496543189008
0.00597297494217007 0.00213978859351707 0.0254911330116816 0.0667699924629301 0.00264826602958697 0.00124512773832485 0.0281686476611874 0.00111821134747100 0.00185613966496867 0.864589718548163
0.737566350619235 0.00377211871885958 4.93077713563539e-05 0.00382722305898168 0.000840960416034537 7.15881072403622e-05 0.000763927380789079 0.000475734513815691 0.000236976483126986 0.252395812930561
0.485882559427058 0.179818235887341 0.00491168109845924 0.0735388345615359 0.0108076830998238 0.00347030734890046 0.00671528385336261 0.0153990089988476 0.00948964326287568 0.209966762461795

pred_label=6
6
4
9
4
9
9
9
9
9
8
8
8
7
7
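In libsvm's MATLAB interface, column k of prob_estimates refers to the class model.Label(k), which is typically the order in which classes first appear in the training data rather than sorted order, so the position of a row's maximum has to be mapped through model.Label before comparing it with pred_label. The rows above are consistent with this: every row whose largest entry is in column 8 is predicted as 6, and every row whose largest entry is in column 10 is predicted as 8. A minimal Python sketch of the mapping, with hypothetical names:

```python
def labels_from_probabilities(prob_rows, model_labels):
    """Map each row's most probable column to its class label.

    Column k of the probability matrix corresponds to model_labels[k]
    (model.Label in the MATLAB interface), not to the label value k.
    """
    preds = []
    for row in prob_rows:
        k = max(range(len(row)), key=row.__getitem__)
        preds.append(model_labels[k])
    return preds
```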

MEX File crash for regression SVM in MATLAB

The MEX interface to regression SVMs appears to crash in MATLAB: we use the classification SVMs widely, but get segfaults with option -s 3.

See the attached xy.csv, then run:

clearvars;
close all;

xy = dlmread('xy.csv');
x = xy(:,1);
y = xy(:,2);

model = svmtrain(y, x, '-s 3');
out = svmpredict(y, x, model);

This causes a segfault on Windows and OS X.
xy.zip

LibSVM Mex File Error

Hi Everyone,

I want to call the libsvm functions through MEX files from a Windows folder. I have added the MEX file path, but I still get an error when I call the function: "Invalid MEX-file. The specified procedure could not be found." I use MATLAB 2013a 64-bit, and the MEX files were also compiled as 64-bit.

Is there any step that I missed?

Thanks anyway.

How to get alpha_i * y_i in Libsvm 3.22?

As my title says, I want to get alpha_i * y_i in Libsvm 3.22 but don't know how to do it.
sv_coef is now a double[][] array, and I can't get a_i * y_i just by using model.sv_coef[i] as most older answers suggest.
I asked the same question at http://stackoverflow.com/q/43348979/3097907, where there is some more information.
I hope someone can help me with this question. Thank you.

PS: my original problem is solving this formula:
$\dfrac{\partial J}{\partial d_m} = -\dfrac{1}{2} \sum_{i,j} \alpha_i^* \alpha_j^* y_i y_j K_m(x_i, x_j)$
(from SimpleMKL, Eq. 11)
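For a two-class model, sv_coef has nr_class - 1 = 1 row, and its entries are alpha_i * y_i for each support vector, so in Java the value is `model.sv_coef[0][i]`. Because the gradient only involves products (alpha_i y_i)(alpha_j y_j), it is unaffected if libsvm's label ordering flips the sign of all coefficients. A minimal sketch of the formula above, assuming the kernel matrix of the m-th base kernel has already been evaluated on the support vectors (the function name is hypothetical):

```python
def mkl_gradient(sv_coef, K_m):
    """dJ/dd_m = -1/2 * sum_ij (alpha_i y_i)(alpha_j y_j) K_m(x_i, x_j).

    sv_coef: alpha_i * y_i for each support vector of a binary model.
    K_m: m-th base kernel evaluated on the support vectors (list of rows).
    Non-support vectors have alpha_i = 0, so they drop out of the sum.
    """
    n = len(sv_coef)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += sv_coef[i] * sv_coef[j] * K_m[i][j]
    return -0.5 * total
```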

potential bug - the Matlab interface

I tried to run a very simple binary classification via the matlab interface of libsvm
where

class A : [ 1, 1]
class B : [-1,-1] and [ 1, -1 ]

but got wrong prediction results (compared to the Python interface):
cases [-1,-1] and [1,-1] are both wrong.

here is the sample code

N=500;
A_pts = repmat([1,1],N*2,1);
A_label = ones(size(A_pts,1),1);

B_pts = repmat([-1,-1],N,1);
B_pts = cat(1,B_pts, repmat([1,-1],N,1));    
B_label = -1*ones(size(B_pts,1),1);


x = [ A_pts ; B_pts ];
y = [ A_label ; B_label ];


svmmodel = svmtrain(x,y);
svmpredict(1,[1,1],svmmodel)
svmpredict(-1,[-1,-1],svmmodel) % wrong 
svmpredict(-1,[1,-1],svmmodel) % wrong 

output:

optimization finished, #iter = 500
nu = 0.500000
obj = -1000.000000, rho = -1.000000
nSV = 1000, nBSV = 1000
Total nSV = 1000
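One thing worth checking: libsvm's MATLAB interface is documented as `svmtrain(training_label_vector, training_instance_matrix [, 'libsvm_options'])`, labels first, whereas the call above is `svmtrain(x,y)`; the `svmpredict` calls already pass the label first. The sketch below rebuilds the same toy data in Python and shows a hypothetical guard that catches swapped arguments; it is illustrative only and does not call libsvm.

```python
def build_toy_problem(N=500):
    """Labels and instances for the example: 2N copies of [1, 1] labelled +1,
    then N copies each of [-1, -1] and [1, -1] labelled -1."""
    instances = [[1, 1]] * (2 * N) + [[-1, -1]] * N + [[1, -1]] * N
    labels = [1] * (2 * N) + [-1] * (2 * N)
    return labels, instances


def check_label_first(labels, instances):
    """Hypothetical guard: the first argument must be one scalar label per row."""
    if any(isinstance(y, (list, tuple)) for y in labels):
        raise TypeError("first argument looks like an instance matrix, not labels")
    if len(labels) != len(instances):
        raise ValueError("need exactly one label per instance")
```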

strange results with polynomial kernel

Hello,

For some reason I get very strange classifications with polynomial kernel. I have 966 training instances and 518 test instances. With polynomial kernel I have only negative classifications. With any other kernel I have different results with small variance (accuracy is approximately 35%).

The problem is that I don't understand why the polynomial kernel gives these meaningless results. How can I debug it?
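libsvm's polynomial kernel is (gamma*u'*v + coef0)^degree, with defaults degree = 3, gamma = 1/num_features and coef0 = 0 according to svm-train's usage text. Because the inner product is cubed, unscaled features can produce kernel values many orders of magnitude apart, which is one possible cause worth ruling out (for example, by running svm-scale first). A small sketch of the kernel for inspecting those magnitudes:

```python
def poly_kernel(u, v, gamma, coef0, degree):
    """Polynomial kernel (gamma * <u, v> + coef0) ** degree on dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return (gamma * dot + coef0) ** degree
```

With gamma = 0.5 and the default degree 3, a feature vector [1000, 2000] has a self-kernel value of about 1.6e19, while the scaled vector [0.1, 0.2] gives about 1.6e-5.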

Java libsvm reports "reaching max number of iterations", libsvm.dll does not

With the same data, SVM type, and parameters, the Java libsvm reports "reaching max number of iterations", but libsvm.dll does not.
Version: 3.21
OS:Win 7 x64

Data:https://github.com/idlesysman/java/blob/master/file/trainData.zip

Parameters:
param.svm_type = svm_parameter.C_SVC;
param.kernel_type = svm_parameter.RBF;
param.gamma = 0.36;
param.C = 10;

svm_cross_validation.nr_fold=10

Using default values:
param.degree = 3;
param.coef0 = 0;
param.nu = 0.5;
param.cache_size = 100;
param.eps = 1e-3;
param.p = 0.1;
param.shrinking = 1;
param.probability = 0;
param.nr_weight = 0;
param.weight_label = new int[0];

param.weight = new double[0];

thanks!

Add Javascript binding

Hello,

as JavaScript has become the glue that holds everything together, have you ever thought of adding a JS binding for your wonderful library?

Minor suggestion for the README file

I would kindly suggest mentioning in the README that running "make" on a Unix system builds three programs; currently, svm-scale is not in the list. This is a very small change, but IMHO it seems consistent with the style of the README file.

Wrong information in Octave when training C-SVC and NU-SVC

When I train a nu-SVC in Octave with the command

model = svmtrain(ytrain, Xtrain_norm, '-s 1 -t 2');

I get this output

*
optimization finished, #iter = 570
C = 0.085946
obj = 17.382239, rho = -0.596579
nSV = 859, nBSV = 808
Total nSV = 859

At first I was puzzled by that "C = 0.085946", which led me to think that a C-SVM had been trained instead, and that there was an error in the libraries...

Also because if I use the "-s 0" argument (which means C-SVC) it outputs:

model = svmtrain(ytrain, Xtrain_norm, '-s 0 -t 2');
*
optimization finished, #iter = 595
nu = 0.227101
obj = -279.128990, rho = -0.810343
nSV = 430, nBSV = 328
Total nSV = 430

So I was thinking that the two arguments were swapped.

I went a little bit further and I tried running the svm-train binary with the same arguments:

svm-train -s 1 -t 2 trainingset_libsvm.dat model_libsvm_NU.dat

Exact same output as above, but inside the created file I found:

svm_type nu_svc
kernel_type rbf
gamma 0.0833333
nr_class 2
total_sv 859
rho -0.596582
label 0 1
nr_sv 428 431
SV

So is it just the information printed that is wrong? Or is it correct and I'm not understanding something?

Thanks
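As far as we can tell from libsvm's solver code, the line printed after training is not the parameter you passed in: for C-SVC, libsvm reports the nu that the trained solution corresponds to (printed when both classes use the same C), and for nu-SVC it reports the equivalent C. The svm_type in the model file is the one you requested. For C-SVC, the reported value is sum(alpha_i) / (C * l), which this hypothetical helper computes from the dual variables:

```python
def equivalent_nu(alphas, C):
    """nu implied by a C-SVC solution: sum(alpha_i) / (C * l).

    alphas: dual variables of all l training points, each in [0, C].
    """
    return sum(alphas) / (C * len(alphas))
```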

OpenMP

Hi,
it would be great to have multiprocessing support like in the C++ libsvm (see the libsvm FAQ 'How can I use OpenMP to parallelize LIBSVM on a multicore/shared-memory computer?'). I tried replacing the libsvm.dll in the LIBSVM.NET package with one compiled in C++ with OpenMP, but the application crashed after a few seconds.

Console output for predictions

It would be interesting, for performance reasons, that applications using svm-predict to classify single documents were able to pass the document to classify as a command line argument (instead of a file name), and that the predicted class be printed directly to the output (instead of writing it to a file). This would spare lots of useless IO operations.

What about adding a new usage:

svm-predict [options] test_document model_file

with a console output, for instance:

$ svm-predict '1:-0.14 2:0.2666667 3:0.1074111' model.svm
-1

AttributeError: /usr/lib/libsvm.so.3: undefined symbol: svm_get_sv_indices

Hi,

I got the error in the subject line and found that installing Ubuntu packages did not solve it. There was no libsvm.so.2 on my computer, but providing that file was the only thing that fixed the issue, even though the error names libsvm.so.3.

Is any of this indicative of a bug?

Multiple people are encountering this issue. More background and details can be found at the URL below including my answer with the solution that worked for me:

http://stackoverflow.com/questions/42050356/error-in-importing-sidekit-in-python-on-ubuntu/

Andrew

Get size or dimension of data from model

Hi,
There are svm_get_nr_sv() to get the number of support vectors and svm_get_nr_class() to get the number of classes, but there seems to be no function to get the data dimension (number of columns) of the model.
Having this function would be helpful when one loads the model in a wrapper and checks the dimensions of external input data before each call to predict().
Is there a way to get this information easily?
Thanks.
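Until such a function exists, one workaround is to scan the support vectors, which the model stores in sparse index:value form (the Python interface exposes them as a list of dicts via `get_SV()`, assuming your version has it). The largest index found is only a lower bound on the training dimension, because a feature that is zero in every support vector is never stored. The helper name is hypothetical:

```python
def model_dimension(support_vectors):
    """Largest feature index appearing in any support vector.

    support_vectors: list of dicts {index: value} in libsvm's sparse form.
    This is a lower bound on the training dimension: a feature that is
    zero in every support vector never appears in the model.
    """
    return max((max(sv) for sv in support_vectors if sv), default=0)
```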

grid.py waits for results even though all workers have stopped

The program would wait for a result even though all workers had quit because of an error or a C-c. This isn't the most elegant fix, but it is the only one I could manage in the time I had.

Author: Bjarte Johansen <[email protected]>
Date:   Tue Dec 2 15:51:34 2014 +0100

    Fix waiting for results when there are no workers

    The program would wait for a result even though all workers had quit
    because of an error or a C-c.

diff --git a/tools/grid.py b/tools/grid.py
index 40f55fb..7c5b744 100755
--- a/tools/grid.py
+++ b/tools/grid.py
@@ -390,6 +390,7 @@ def find_parameters(dataset_pathname, options=''):

    job_queue._put = job_queue.queue.appendleft

+   workers = []
    # fire telnet workers

    if telnet_workers:
@@ -400,6 +401,7 @@ def find_parameters(dataset_pathname, options=''):
            worker = TelnetWorker(host,job_queue,result_queue,
                     host,username,password,options)
            worker.start()
+           workers.append(worker)

    # fire ssh workers

@@ -407,12 +409,14 @@ def find_parameters(dataset_pathname, options=''):
        for host in ssh_workers:
            worker = SSHWorker(host,job_queue,result_queue,host,options)
            worker.start()
+           workers.append(worker)

    # fire local workers

    for i in range(nr_local_worker):
        worker = LocalWorker('local',job_queue,result_queue,options)
        worker.start()
+       workers.append(worker)

    # gather results

@@ -436,7 +440,11 @@ def find_parameters(dataset_pathname, options=''):
    for line in jobs:
        for (c,g) in line:
            while (c,g) not in done_jobs:
-               (worker,c1,g1,rate1) = result_queue.get()
+               while any(map(Thread.is_alive, workers)):
+                   try:
+                       (worker,c1,g1,rate1) = result_queue.get(True, 1)
+                   except:
+                       continue
                done_jobs[(c1,g1)] = rate1
                if (c1,g1) not in resumed_jobs:
                    best_c,best_g,best_rate = update_param(c1,g1,rate1,best_c,best_g,best_rate,worker,False)
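As far as we can tell, the inner `while` in the hunk above has no `break` after a successful `get`, so it keeps reading (and overwriting) results until every worker has exited. The sketch below shows the same idea (poll the queue with a timeout, give up once all workers are dead) with an explicit exit after each result. It is a standalone illustration with hypothetical names, not a drop-in replacement for grid.py's loop.

```python
import queue
import threading
from typing import Sequence


def gather_results(result_queue: queue.Queue, workers: Sequence[threading.Thread],
                   expected: int, poll_seconds: float = 1.0) -> list:
    """Collect up to `expected` results without hanging on dead workers.

    Waits on the queue with a timeout; when a wait times out and no worker
    thread is alive any more, drains whatever is left and returns early.
    """
    results = []
    while len(results) < expected:
        try:
            results.append(result_queue.get(timeout=poll_seconds))
        except queue.Empty:
            if not any(w.is_alive() for w in workers):
                # A worker may have queued a result just before exiting.
                while len(results) < expected:
                    try:
                        results.append(result_queue.get_nowait())
                    except queue.Empty:
                        break
                break
    return results
```

Catching `queue.Empty` rather than using a bare `except:` also keeps Ctrl-C (`KeyboardInterrupt`) working while the loop waits.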

Cache size > 2000 not recognised

Noticed this while using libSVM in sklearn: training an SVM with cache_size > 2000 or so on large problems does not seem to bring any speed-up. RAM usage stays at about 200MB (roughly the original dataset size, rather than the kernel matrix size). The issue looks like it is in svm.cpp, where the cache size is set to (long int) cache_size*(1<<20). I suspect this overflows for values such as cache_size=4000.

Testing done using Anaconda 2.4.1 on Windows 8.1, x64 processor.
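The suspicion can be checked arithmetically. On 64-bit Windows, `long` is 32 bits wide, so if `cache_size*(1<<20)` is evaluated as a `long`, any cache_size of 2048 MB or more exceeds LONG_MAX (2^31 - 1). Signed overflow is undefined behaviour in C; the sketch below assumes the common two's-complement wrap-around to show the values one would typically see. The helper names are hypothetical:

```python
def to_int32(n):
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def cache_bytes_32bit_long(cache_size_mb):
    """cache_size * (1 << 20) as a wrapping 32-bit `long` would hold it."""
    return to_int32(cache_size_mb * (1 << 20))
```

With cache_size=4000 the wrapped value is negative (-100663296), which would be consistent with a setting above roughly 2000 bringing no benefit.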

Use libsvm in hadoop

Hello, everyone.
I want to ask something: can I use libsvm in Apache Hadoop?
Does it work with the MapReduce programming model in Hadoop?

make lib failed

OS: OS X Sierra
GNU Make 3.81

this is output message:

kent:libsvm-3.21 kent$ make lib
if [ "Darwin" = "Darwin" ]; then \
        SHARED_LIB_FLAG="-dynamiclib -Wl,-install_name,libsvm.so.2"; \
    else \
        SHARED_LIB_FLAG="-shared -Wl,-soname,libsvm.so.2"; \
    fi; \
    c++ ${SHARED_LIB_FLAG} svm.o -o libsvm.so.2

As you can see, SHARED_LIB_FLAG cannot be recognized.

Suggested fix:

lib: svm.o
    @if [ "$(OS)" = "Darwin" ]; then \
        SHARED_LIB_FLAG="-dynamiclib -Wl,-install_name,libsvm.so.$(SHVER)"; \
    else \
        SHARED_LIB_FLAG="-shared -Wl,-soname,libsvm.so.$(SHVER)"; \
    fi &&\
    $(CXX) $${SHARED_LIB_FLAG} svm.o -o libsvm.so.$(SHVER)

A bug in svm.java in sigmoid_train method

If anybody wants to use the probability outputs of the SVM, it goes wrong and the output label is always the same. The problem is at line 1672:

if (iter>=max_iter)
    //svm.info("Reaching maximal iterations in two-class probability estimates\n");

probAB[0]=A;probAB[1]=B;

As you can see, because the body of the if statement is commented out, the next statement (probAB[0]=A;probAB[1]=B;) becomes the body of the if, so it executes only when iter>=max_iter. Thus, the results are always wrong.

Solution
Simply comment out the whole if statement.

Regards,
Mahmood

ArrayIndexOutOfBoundsException in Java v3.2

I have encountered an ArrayIndexOutOfBoundsException in version 3.2 that does not occur in v2.8.
Here is the stacktrace:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at libsvm.Cache.get_data(svm.java:63)
at libsvm.ONE_CLASS_Q.get_Q(svm.java:1208)
at libsvm.Solver.Solve(svm.java:496)
at libsvm.svm.solve_one_class(svm.java:1422)
at libsvm.svm.svm_train_one(svm.java:1516)
at libsvm.svm.svm_train(svm.java:1959)
at LibSvmBug.svmTrain(LibSvmBug.java:95)
at LibSvmBug.train(LibSvmBug.java:59)
at LibSvmBug.main(LibSvmBug.java:38)

In v2.8 the output is the following:
*
optimization finished, #iter = 0
obj = NaN, rho = Infinity
nSV = 246, nBSV = 245

I made a sample class. Run bug.sh and working.sh which can be found here:
https://www.dropbox.com/s/gnx9a9n1293spz4/LibSvmBug.tar.gz?dl=0

With version 2.8 you will get no exception, but with version 3.2 you will get an ArrayIndexOutOfBoundsException
(Please do not care about the strange parameter choices)

I also tested some other versions. The bug also occurs in v. 2.81, 2.88, 2.91, 3.00,

Edit: Running java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
(Same exception on Oracle VM on Ubuntu)

Just verified this on Windows 8 using jre7 and starting from eclipse.

Checking for users before running program

Hi again,

I am using the lab machines at my university, but I don't want to inconvenience others if they are sitting there. I had some problems implementing that with your grid.py, but I discovered that I could reimplement most of the functionality (that I needed) through gnu parallel instead.

#!/usr/bin/env bash

LOGFILE=$(mktemp "XXXX.parallel.log")

function unused {
    parallel --plain                                                \
             --sshloginfile ..                                      \
             --nonall                                               \
             --tag                                                  \
             '[[ -z $(users | sed "s/$USER//") ]] && echo "unused"' \
        | sed -e 's/\s*unused//'                                    \
              -e 's/^/4\//'
}

function exit_parallel {
    parallel --plain                                \
             --sshloginfile ..                      \
             --nonall                               \
             'killall -q -u $USER svm-train'
    rm "$LOGFILE"
}

trap 'echo "Ctrl-C detected.";                  \
          exit_parallel;                        \
          exit 130'                             \
     SIGINT SIGQUIT

parallel --plain                                    \
         --sshloginfile <(unused)                   \
         --filter-hosts                             \
         --joblog "$LOGFILE"                        \
         --resume-failed                            \
         --timeout 28800                            \
         --tag                                      \
         'nice svm-train -q                         \
                         -m 1024                    \
                         -h 0                       \
                         -v 5                       \
                         -c $(echo 2^{1} | bc -l)   \
                         -g $(echo 2^{2} | bc -l)   \
                 "'$DATA'"                          \
              | sed -e "s/Cross .* = //"            \
                    -e "s/%//"'                     \
         ::: {-5..15..2}                            \
         ::: {3..-15..-2}

This makes some assumptions about my environment (like the home folder being the same on every machine). You also need to edit the configuration directly in the script. I just thought I would tell you, as you (or someone else following this repository) might be interested in it.
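The two `:::` ranges in the script correspond to log2(C) in -5, -3, ..., 15 and log2(gamma) in 3, 1, ..., -15, which as far as we know are also grid.py's default ranges. A small sketch that enumerates the same 110 (C, gamma) pairs, for example to distribute them across machines in some other way; the function name is hypothetical:

```python
def parameter_grid(log2c=range(-5, 16, 2), log2g=range(3, -16, -2)):
    """All (C, gamma) pairs on the log2 grids used in the script above."""
    return [(2.0 ** c, 2.0 ** g) for c in log2c for g in log2g]
```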

MathCad

Can someone tell me how to translate the libsvm source code into MathCAD?

Y must be a vector or a character array

Hi, I used your example code in MATLAB as below:

[heart_scale_label, heart_scale_inst] = libsvmread('heart_scale');
% Split Data
train_data = heart_scale_inst(1:150,:);
train_label = heart_scale_label(1:150,:);
test_data = heart_scale_inst(151:270,:);
test_label = heart_scale_label(151:270,:);

% Linear Kernel
model_linear = svmtrain(train_label, train_data, '-t 0');
[predict_label_L, accuracy_L, dec_values_L] = svmpredict(test_label, test_data, model_linear);

% Precomputed Kernel
model_precomputed = svmtrain(train_label, [(1:150)', train_data*train_data'], '-t 4');
[predict_label_P, accuracy_P, dec_values_P] = svmpredict(test_label, [(1:120)', test_data*train_data'], model_precomputed);

accuracy_L % Display the accuracy using linear kernel
accuracy_P % Display the accuracy using precomputed kernel

but in svmtrain lines it says:

Error using svmtrain (line 234)
Y must be a vector or a character array.

Can you help me?
