gverdian / sofia-ml Goto Github PK

Automatically exported from code.google.com/p/sofia-ml

Makefile 2.73% C++ 95.27% Perl 2.00%

sofia-ml's Issues

The ids in cluster output are formatted in scientific notation rather than as ints


This affects large ids.

The issue is in cluster-src/sofia-kmeans.cc

The solution diff is:

345c345
<             << test_data->VectorAt(i).GetY() << std::endl;
---
>             << (int)test_data->VectorAt(i).GetY() << std::endl;

Original issue reported on code.google.com by [email protected] on 6 Mar 2013 at 2:37
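
For anyone wondering why the cast in the diff above matters, here is a minimal sketch of the formatting behaviour, assuming the id is stored as a float (the helper names are hypothetical and not part of sofia-ml). It uses long long rather than the int from the patch, which is my own choice for extra headroom.

```cpp
#include <sstream>
#include <string>

// Streams the id the way the unpatched line does: as a float.
std::string FormatIdAsFloat(float id) {
  std::ostringstream out;
  out << id;  // default stream formatting keeps 6 significant digits
  return out.str();
}

// Streams the id after an integer cast, in the spirit of the patch.
// long long is an assumption here; the patch itself casts to int.
std::string FormatIdAsInteger(float id) {
  std::ostringstream out;
  out << static_cast<long long>(id);
  return out.str();
}
```

FormatIdAsFloat(12345678.0f) gives 1.23457e+07, while FormatIdAsInteger(12345678.0f) gives 12345678. Note that a float only represents integers exactly up to 2^24, so the cast fixes the formatting but cannot recover precision for ids larger than that.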

Does sofia-ml support other values for features than "1"?

I noticed in the demo that all the features have a value of "1".  Does sofia-ml 
support and/or make use of higher integer values (like for # of times a word is 
seen in a document) or for floating point numbers?

Original issue reported on code.google.com by [email protected] on 25 Feb 2013 at 4:34

Multi-Label Passive-Aggressive

Hello D.

I've started to work on the multi-label branch. I have made the following 
changes:

- Parse comma-separated list of labels.

- Add a MultiplePassOuterLoop routine: it shuffles the dataset and makes 
several passes over it. It's more intuitive to determine a number of passes and 
results can sometimes be more stable on some datasets.

- Add a MultiLabelWeightVector. It is compatible with other weight classes 
(both API-wise and file-wise). It also has a bunch of additional methods such 
as "SelectLabel".

- Add Multi-Label Passive-Aggressive. Strictly speaking, the learner optimizes 
a label ranking (relevant labels should be ranked higher than irrelevant 
labels). On the 20 newsgroup dataset, it gives 82% accuracy (liblinear gave 
85%). (I didn't optimize the hyperparameters though.)

- Add a "--prediction_type multi-label" option.

- Infer the number of dimensions from the training dataset when --dimensionality 
is set to 0.
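
To make the multiple-pass idea from the list above concrete, here is a rough sketch of a loop that reshuffles the example order before each pass. It is not the code from the branch: the function name, the seed parameter, and the callback signature are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in for the routine described above: visits every
// example index exactly once per pass, in a freshly shuffled order.
// `update` is called with the index of the example to train on.
template <typename UpdateFn>
void MultiplePassLoop(std::size_t num_examples, int num_passes,
                      unsigned seed, UpdateFn update) {
  std::vector<std::size_t> order(num_examples);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::mt19937 rng(seed);
  for (int pass = 0; pass < num_passes; ++pass) {
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t index : order) update(index);
  }
}
```

Compared with drawing examples at random for a fixed number of iterations, this guarantees that each example is seen the same number of times, which is one possible reason results can be more stable.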


I wanted to add one-vs-all but unfortunately, the fact that the labels are 
attached to the vectors makes it hard (or inefficient): I need to be able to 
pass +1 or -1 instead of the real label to the update function.

Possible short-term plans could include optimizing the multi-class hinge loss 
and the multinomial logistic loss by SGD.

Original issue reported on code.google.com by [email protected] on 28 Apr 2011 at 8:38

lambda parameter not passed into SvmObjective correctly

in sofia-ml.cc

337       float objective = sofia_ml::SvmObjective(training_data,
338                                          *w,
339                                           CMD_LINE_BOOLS["--lambda"]);

Note that lambda is read from CMD_LINE_BOOLS rather than CMD_LINE_FLOATS, which 
results in lambda=0. In TrainModel the correct value of lambda is used:

176   float lambda = CMD_LINE_FLOATS["--lambda"];
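
A small sketch of why the wrong table yields 0, assuming the two tables behave like std::map keyed by flag name (the Flags struct and both helper functions are hypothetical stand-ins, not the real definitions):

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the typed flag tables named in the report.
struct Flags {
  std::map<std::string, bool> bools;
  std::map<std::string, float> floats;
};

// Mirrors the reported call: "--lambda" was never stored in the bool table,
// so operator[] default-inserts false, which converts to 0.0f.
float LambdaFromBools(Flags& flags) { return flags.bools["--lambda"]; }

// Mirrors the lookup the report says TrainModel uses.
float LambdaFromFloats(Flags& flags) { return flags.floats["--lambda"]; }
```

With floats["--lambda"] set to 0.1f, LambdaFromBools returns 0 and LambdaFromFloats returns 0.1f, and the compiler accepts both because bool converts implicitly to float.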

Original issue reported on code.google.com by [email protected] on 9 May 2013 at 1:20

malloc error with --hash_mask_bits

Download & build, run the demo commands adding --hash_mask_bits to the 
arguments.  Training proceeds fine, but testing of the model gives the malloc 
error:

$ ./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1 
--iterations 100000 --dimensionality 150000 --training_file demo/demo.train 
--model_out demo/model --hash_mask_bits 8
hash_mask_ 255
Reading training data from: demo/demo.train
Time to read training data: 0.061278
Time to complete training: 52.3639
Writing model to: demo/model
   Done.


$ ./sofia-ml --model_in demo/model --test_file demo/demo.train --results_file 
demo/results.txt --hash_mask_bits 8
hash_mask_ 255
sofia-ml(6235) malloc: *** error for object 0x800000: pointer being freed was 
not allocated
*** set a breakpoint in malloc_error_break to debug
Reading model from: demo/model
   Done.
Reading test data from: demo/demo.train
Time to read test data: 0.06114
Time to make test prediction results: 0.008274
Writing test results to: demo/results.txt
   Done.


========

$ g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)

Original issue reported on code.google.com by [email protected] on 18 Jun 2010 at 6:43

k-means question

For the k-means training, does label (in my case, face label) have an influence 
on the clustering?

Original issue reported on code.google.com by [email protected] on 24 Jun 2013 at 1:01

Assertion `!cluster_centers_.empty()' fails and crashes program

What steps will reproduce the problem?
cd "sofia-ml-read-only"

./sofia-kmeans --k 100 --init_type random --opt_type mini_batch_kmeans 
--mini_batch_size 100 --iterations 1000 --cluster_mapping_type rbf_kernel 
--test_file <test file location goes here> --cluster_mapping_out <cluster 
mapping output location goes here>

What is the expected output? What do you see instead?

The expected output is a cluster mapping text file. Instead, I see:

cd "sofia-ml-read-only"

sofia-kmeans: sf-cluster-centers.cc:93: float 
SfClusterCenters::SqDistanceToClosestCenter(const SfSparseVector&, int*) const: 
Assertion `!cluster_centers_.empty()' failed.

What version of the product are you using? On what operating system?

I don't know where to find the product version. The most recent version is the 
one I have been using.
Operating system: Ubuntu 12.04.5 LTS

Please provide any additional information below.

N/A

Original issue reported on code.google.com by [email protected] on 23 Sep 2014 at 6:46

sf-sparse-vector.cc bug, in Init function

Running make all_test produces an error while testing sf-sparse-vector_test: 
the assertion assert(x1.GetGroupId() == "2"); fails at line 27 of 
sf-sparse-vector_test.cc.

Solution: add the line "group_id_c_string[end - position]=0;" at line 145 of 
sf-sparse-vector.cc, because the string generated by strncpy is not always 
'\0' terminated.
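
The strncpy behaviour behind this can be sketched in isolation. CopyToken below is a hypothetical helper, not the actual Init code; it copies the characters between two pointers and writes the terminator explicitly, as the proposed fix does.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies the characters in [position, end) into a temporary C string.
// strncpy does not append '\0' when the source holds at least `length`
// characters before its own terminator, so the terminator is written here.
std::string CopyToken(const char* position, const char* end) {
  const std::size_t length = static_cast<std::size_t>(end - position);
  std::vector<char> buffer(length + 1, 'x');  // junk fill, on purpose
  std::strncpy(buffer.data(), position, length);
  buffer[length] = '\0';  // without this line the junk 'x' would remain
  return std::string(buffer.data());
}
```

For a line such as "1 qid:2 1:1.0" (the qid field layout is assumed here), CopyToken(line + 6, line + 7) returns "2". Dropping the explicit terminator leaves whatever was in the buffer after the copied character, which would explain a group id that no longer compares equal to "2".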

Original issue reported on code.google.com by [email protected] on 5 Nov 2012 at 4:22

Training Data Format and Class Label for kmeans

Hi,

I have changed my training data into sparse data format you mentioned.
./sofia-kmeans --k 1000 --init_type random --opt_type batch_kmeans --iterations 
1000 --objective_after_init --training_file demo/SMLFAutoTrain1s512val.txt 
--model_out demo/CSMLFAutoTrain1s512val.txt
However, I am getting the following errors:
Reading data from: demo/SMLFAutoTrain1s512val.txt
Error reading file demo/SMLFAutoTrain1s512val.txt
I opened your demo.train and saw that there is a square box at the end of every 
vector. How can I change my data format to match yours, since the square box at 
the end may not be the only difference? I tried to load your demo.train file in 
MATLAB, and it doesn't let me do that either.

For the example of kmeans:
> ./sofia-kmeans --k 5 --init_type random --opt_type mini_batch_kmeans 
--mini_batch_size 100 --iterations 500 --objective_after_init 
--objective_after_training --training_file demo/demo.train --model_out 
demo/clusters.txt
The above command will return the five centroid locations, right?
In this case, since it only produces the 5 cluster center locations, the class 
label in the training data (demo.train) can be assigned any value, right? 
Of course, I chose, say, all 1 among these values: 1, 0, -1.

I look forward to your clarification. 

Thank you,


Fred

Original issue reported on code.google.com by [email protected] on 23 Sep 2011 at 3:56

Attachments:

SMLFAutoTrain1s512val.txt

make all_test : Assertion `x1.GetGroupId() == "2"' failed

What steps will reproduce the problem?

Follow the instructions in the README from the "Quick Start" section onward (on 
Ubuntu 14.04):
1. svn checkout http://sofia-ml.googlecode.com/svn/trunk/ sofia-ml-read-only
2. cd sofia-ml-read-only/src
3. make clean
4. make all_test

What is the expected output? What do you see instead?

Expecting success in all tests

Seeing instead:
Test is failing immediately with:

g++ -O3 -lm -Wall -o sf-sparse-vector_test sf-sparse-vector_test.cc 
sf-sparse-vector.cc
./sf-sparse-vector_test
sf-sparse-vector_test: sf-sparse-vector_test.cc:27: int main(int, char**): 
Assertion `x1.GetGroupId() == "2"' failed.
make: *** [sf-sparse-vector_test] Aborted (core dumped)
make: *** Deleting file `sf-sparse-vector_test'

What version of the product are you using? On what operating system?

Latest source:
  r31 | [email protected] | 2010-07-26 14:17:11 -0700 (Mon, 26 Jul 2010) | 1 line

On Ubuntu 14.04 (LTS)

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 4 May 2015 at 9:22

sf-weight-vector fails unit test

What steps will reproduce the problem?
1. make all_tests

What is the expected output?

PASS.

What do you see instead?

sf-weight-vector_test: sf-weight-vector_test.cc:95: int main(int, char**): 
Assertion `w_6.ValueOf(3) == 1' failed.

What version of the product are you using? On what operating system?

Latest sofia-ml from svn, Debian 5, GCC 4.3.2.

Original issue reported on code.google.com by [email protected] on 14 Feb 2010 at 3:22

Issues with dimensionality off-by-one

What steps will reproduce the problem?
1. Create this training file:

======= train.txt  =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic 
--lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt 
--model_out debug-model.txt

3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

The model should spit out 201 terms, the first being the bias term. Instead 
it spits out 200, and clips off the last weight. When I set dimensionality to 
201, I get what I would expect:

0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0.263645  

This was compiled from source a couple weeks ago. The program should probably 
crash if you say dimensionality is 200 and there is a "200:x" term in the 
sparse vector representation, unless the no-bias flag is set.
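
The check suggested above could look roughly like this. It is only a sketch under the report's reading that index 0 holds the bias term, so feature ids 0 through dimensionality-1 fit; both function names are hypothetical and nothing here is taken from the sofia-ml parser.

```cpp
#include <sstream>
#include <string>

// Returns the largest feature id on one "label id:value id:value ..." line,
// or -1 if the line has no numeric feature ids.
int MaxFeatureId(const std::string& line) {
  std::istringstream in(line);
  std::string token;
  in >> token;  // skip the label
  int max_id = -1;
  while (in >> token) {
    const std::string::size_type colon = token.find(':');
    if (colon == std::string::npos) continue;  // not an id:value pair
    const std::string id_text = token.substr(0, colon);
    if (id_text.empty() ||
        id_text.find_first_not_of("0123456789") != std::string::npos) {
      continue;  // non-numeric prefix, e.g. a group id field
    }
    const int id = std::stoi(id_text);
    if (id > max_id) max_id = id;
  }
  return max_id;
}

// Assumes index 0 is the bias term, so ids 0..dimensionality-1 fit.
bool FitsDimensionality(const std::string& line, int dimensionality) {
  return MaxFeatureId(line) < dimensionality;
}
```

For the first line of train.txt, FitsDimensionality returns false with a dimensionality of 200 and true with 201, which matches the behaviour described in the report. A loader could fail loudly on a false result instead of silently clipping the weight.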

Original issue reported on code.google.com by [email protected] on 26 Feb 2013 at 3:24

sofia doesn't work on a sparse dataset containing lines in which all features are 0

Hi there.

What steps will reproduce the problem?
./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1 
--iterations 10000 --dimensionality 450000 --training_file ../data/m256 
--model_out demo/model


What is the expected output?

What do you see instead?
Reading training data from: ../data/final/catted/train/m256
Segmentation fault (core dumped)

What version of the product are you using? On what operating system?
Ubuntu 13.10 64bit

Please provide any additional information below.

I guess it is because my training data (attached) is so sparse that in some 
lines all features are zero. Can sofia-ml support such a dataset? Thank you!

Original issue reported on code.google.com by [email protected] on 18 Mar 2014 at 3:34

Attachments:

m256

Multi-label classification

Is there any example in sofia-ml for multilabel classification?

Original issue reported on code.google.com by [email protected] on 20 Jan 2015 at 5:34

simple fix for gcc 4.3

What steps will reproduce the problem?
1. make src folder with gcc version 4.3

Adding ...
#include <cstring> 
#include <cstdlib>
to the top of sf-sparse-vector.cc file fixed this problem for me.

Go Jumbos.

Original issue reported on code.google.com by [email protected] on 24 Jan 2010 at 11:42

problems building sofia-ml

What steps will reproduce the problem?
1. run make with gcc version 4.4.3 20100127 (Red Hat 4.4.3-4) (GCC)

What is the expected output? What do you see instead?

a proper build

What version of the product are you using? On what operating system?

trunk on 2010-03-30 15:52

Please provide any additional information below.

gcc output:

:sofia-ml-read-only/src$ make
g++ -O3 -lm -Wall -o sofia-ml sofia-ml.cc sofia-ml-methods.cc
sf-weight-vector.cc sf-sparse-vector.cc sf-data-set.cc
sf-hash-weight-vector.cc sf-hash-inline.cc
sf-sparse-vector.cc: In member function ‘void SfSparseVector::Init(const
char*)’:
sf-sparse-vector.cc:132: error: ‘sscanf’ was not declared in this scope
sf-hash-weight-vector.cc: In constructor
‘SfHashWeightVector::SfHashWeightVector(int)’:
sf-hash-weight-vector.cc:40: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc: In constructor
‘SfHashWeightVector::SfHashWeightVector(int, const std::string&)’:
sf-hash-weight-vector.cc:54: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc: In member function ‘virtual void
SfHashWeightVector::AddVector(const SfSparseVector&, float)’:
sf-hash-weight-vector.cc:96: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc:111: error: ‘exit’ was not declared in this scope
make: *** [sofia-ml] Error 1
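
Errors of this kind usually mean the declaring headers are missing: sscanf is declared in <cstdio> and exit in <cstdlib>. A minimal sketch of a file that uses both and includes what it needs (ParseFeature is a hypothetical helper, not a function from sofia-ml):

```cpp
#include <cstdio>   // declares sscanf and fprintf
#include <cstdlib>  // declares exit

// Hypothetical helper: parses one "id:value" token, exiting on bad input.
void ParseFeature(const char* token, int* id, float* value) {
  if (std::sscanf(token, "%d:%f", id, value) != 2) {
    std::fprintf(stderr, "Malformed feature: %s\n", token);
    std::exit(1);
  }
}
```

Adding the equivalent includes to sf-sparse-vector.cc and sf-hash-weight-vector.cc would likely resolve the errors shown above, though I have not verified that against this exact revision.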

Original issue reported on code.google.com by [email protected] on 30 Mar 2010 at 1:55

Various errors in source code

There is a problem with the source code. Many files fail to include the 
standard library headers they need, and some of the assertions in the unit 
tests fail.

What steps will reproduce the problem?
1. Follow the instructions on https://code.google.com/p/sofia-ml/
2. Run make all_test in src/

What is the expected output? What do you see instead?
I see lots of compile time errors.

What version of the product are you using? On what operating system?
Ubuntu 14.04, G++ 4.7, sofia-ml

Please provide any additional information below.
The following updates fixed everything for me:
sf-sparse-vector_test.cc
l27   //assert(x1.GetGroupId() == "2");
l75   //assert(x6.GetGroupId() == "3");

simple-cmd-line-helper.h
l68 #include <cstdlib>
l69 #include <stdio.h>

sofia-ml-methods_test.cc
l19 #include <cstdlib>

Original issue reported on code.google.com by [email protected] on 16 Jul 2014 at 4:55

sofia-kmeans diverging with increasing number of iterations?

What steps will reproduce the problem?
1. Create 2-dimensional data drawn from 2-dimensional multivariate Gaussian 
distributions with different means and variance = 1, e.g. 21 different 
distributions with, say, 1000 draws each, for a total of 21,000 points. (I have 
tried many different variations and none has any positive effect on the 
reported issue.)

2. Train sofia-kmeans with any batch size (tested 500:500:5000) and with any 
number of k clusters (tested 64 128 256) using mini_batch_kmeans with fixed 
random seed.

command line: sofia-kmeans --k 64 --dimensionality 3 --random_seed 124 
--init_type random --opt_type mini_batch_kmeans --mini_batch_size 500 
--iterations 10 --objective_after_init --objective_after_training 
--training_file traindatafile.svmlight --model_out modelfile.sofia

3. Calculate the training error
command line: sofia-kmeans --model_in modelfile.sofia --test_file 
traindatafile.svmlight --objective_on_test --cluster_assignments_out 
trainingassignments.sofia

4. Run this in a loop as a function of the number of iterations. I ran [1 10 
100e3 500e3 and 1000e3].

What is the expected output? What do you see instead?
I expect the training error to fall as the number of iterations increases. 
Since the seed is fixed, the random initialization is the same each time. 
This holds until 100e3, then it starts to diverge, i.e. the training error 
starts increasing dramatically. The training error becomes even larger than 
that of the random initialization. This is very puzzling to me.
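
To cross-check the reported numbers independently of sofia-kmeans, the training error can be recomputed from the model and data files. The sketch below assumes the error is the usual k-means objective, i.e. the sum over points of the squared distance to the closest center; the dense Point type and both function names are my own, not sofia-ml's.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Point;

// Squared Euclidean distance between two points of equal dimension.
float SqDistance(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Sum over all points of the squared distance to the closest center.
float KMeansObjective(const std::vector<Point>& points,
                      const std::vector<Point>& centers) {
  assert(!centers.empty());
  float total = 0.0f;
  for (std::size_t p = 0; p < points.size(); ++p) {
    float best = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < centers.size(); ++c) {
      const float d = SqDistance(points[p], centers[c]);
      if (d < best) best = d;
    }
    total += best;
  }
  return total;
}
```

If this value also rises past the initialization error for the long runs, the centers themselves have drifted; if it does not, the discrepancy is more likely in how the objective is reported.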

What version of the product are you using? On what operating system?
svn checkout http://sofia-ml.googlecode.com/svn/trunk/sofia-ml 
sofia-ml-read-only
performed on 10 March 2015
OS: Ubuntu 14.04

Please provide any additional information below.
Attached is the commands and output from sofia-kmeans (sofia_kmeans.txt) and 
furthermore all model, assignment and datafiles are provided to reproduce these 
finding (tmp.zip)

Original issue reported on code.google.com by [email protected] on 11 Mar 2015 at 12:36

Assertion failure in sf-kmeans-methods_test

What steps will reproduce the problem?
1. Grab source from SVN.
2. cd cluster-src/
3. make all_test

What is the expected output? What do you see instead?

Test fails with:

    sf-kmeans-methods_test: sf-kmeans-methods_test.cc:50: int main(int, char**):
    Assertion `cluster_centers_3->ClusterCenter(0).ValueOf(1) == 1.0' failed.

Adding some debug output just before the assertion failure gives:

    cluster_centers_3->ClusterCenter(0).ValueOf(1) : 0 (should be 1.0)

What version of the product are you using? On what operating system?

SVN version:

    r25 | [email protected] | 2010-04-28 04:52:54 +1000 (Wed, 28 Apr 2010) | 1 line

Running on x86_64 linux with gcc version 4.7.2 (Debian 4.7.2-5).

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 17 Feb 2013 at 12:28
