
Comments (11)

ibayer commented on July 28, 2024

libFM uses a time-dependent seed for the random initialization by default:
"seed", "integer value, default=None"
https://github.com/srendle/libfm/blob/master/src/libfm/libfm.cpp#L93
I think the results between runs should match if you set an explicit seed.

from libfm.

breuderink commented on July 28, 2024

Using the same seed indeed prevents differences between runs. But what I am trying to report here is that the per-iteration training-set and test-set performance differs, even though I supplied the same data for both sets. I.e., in the snippet above, the train performance for iteration 99 is 0.52756, while the test performance on the same data is 0.530803. If I understand correctly, these numbers should be equal, since the input data is equal.

This is based on my assumption that both numbers are produced by computing some performance metric (like the fraction correctly classified) on the predictions of the model (with the parameters from that iteration), using either the training set or the test set as input. But that assumption might be wrong.
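For concreteness, by "performance metric" I mean something like the fraction correctly classified. A minimal sketch in Python (the function name and the 0.5 threshold are my own assumptions, not libFM code):

```python
def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum((p >= threshold) == (y == 1)
                  for y, p in zip(y_true, y_pred))
    return correct / len(y_true)

# The same predictions scored against the same labels can only yield one
# number, which is why equal train/test inputs should give equal scores.
print(accuracy([0, 1, 0, 1], [0.2, 0.9, 0.4, 0.6]))  # 1.0
```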


ibayer commented on July 28, 2024

Can you check if this is also true with the option --method=ALS?


breuderink commented on July 28, 2024

Yes. With libFM -task c -train train.libfm -test train.libfm -method als there still is a small difference between the train and test scores.


ibayer commented on July 28, 2024

How small is the difference compared to the difference with MCMC? Is it plausible that it's just a small numerical error? Which score is correct (train or test)? You can take the last error and compare it against what you get when calculating the error yourself.


breuderink commented on July 28, 2024

I generated some artificial data with this Python script:

import random

with open('train.libfm', 'w') as f:
    for i in range(1000):
        # Write the target class: alternating 0/1 labels.
        if i % 2 == 0:
            f.write('0')
        else:
            f.write('1')

        # Write 100 dense, uninformative Gaussian features in
        # libFM's sparse index:value format.
        for j in range(100):
            f.write(' %d:%f' % (j, random.normalvariate(0, 1)))
        f.write('\n')

It generates alternating target labels, with 100 dense random features. The output looks like this:

...
#Iter= 97   Train=0.925 Test=0.997  Test(ll)=0.0801822
#Iter= 98   Train=0.913 Test=0.997  Test(ll)=0.0798717
#Iter= 99   Train=0.919 Test=0.997  Test(ll)=0.079558

It seems that it is overfitting, because the features are not informative. The difference is now relatively big. I saved the output with the --out flag, and the results reported for Test= correspond to the accuracy calculated manually, so that part seems right. What could have caused the Train= score to deviate so much?
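For anyone who wants to repeat the manual check: as far as I can tell, the file written by --out contains one prediction per line, and the label is the first token of each line in the libFM input file. A small sketch (function name, file names, and the 0.5 threshold are placeholders of my own):

```python
def manual_accuracy(libfm_path, out_path, threshold=0.5):
    """Compare libFM's --out predictions against the input file's labels."""
    with open(libfm_path) as f:
        # The label is the first whitespace-separated token of each line.
        labels = [int(line.split()[0]) for line in f]
    with open(out_path) as f:
        # One prediction per line in the --out file.
        preds = [float(line) for line in f]
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, preds))
    return hits / len(labels)

# Tiny self-contained demo with two examples:
with open('demo.libfm', 'w') as f:
    f.write('0 0:1.0\n1 0:2.0\n')
with open('demo.out', 'w') as f:
    f.write('0.3\n0.8\n')
print(manual_accuracy('demo.libfm', 'demo.out'))  # 1.0
```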


breuderink commented on July 28, 2024

I think that the test score is calculated here: https://github.com/srendle/libfm/blob/master/src/libfm/src/fm_learn_mcmc_simultaneous.h#L243, while the train score is mainly calculated here: https://github.com/srendle/libfm/blob/master/src/libfm/src/fm_learn_mcmc_simultaneous.h#L170-L172. The code paths are indeed different. So, what happens in the code path that computes the accuracy for the training set?


ibayer commented on July 28, 2024

libFM uses a few tricks, like clipping predictions to the highest / lowest values. Maybe one of these tricks is only applied to the test predictions.


srendle commented on July 28, 2024

The printed train accuracy is calculated for one MCMC draw. The test accuracy is calculated over all draws (i.e., it is an average). I agree that this is misleading; both measures should report either the average or one draw.

In general, I would recommend looking at the log file rather than at stdout. The log file is more verbose and reports all test values: one draw, all draws, and all but 5 draws. It contains the log-likelihood and accuracy for each of these measures.
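To illustrate why the two can differ, here is a toy simulation (not libFM code; all constants are made up): the accuracy of a single noisy draw is generally lower than the accuracy of the prediction averaged over many draws, even on identical data.

```python
import random

random.seed(0)
labels = [i % 2 for i in range(1000)]

def draw_predictions():
    """One simulated MCMC draw: a weak signal plus Gaussian noise, clipped to [0, 1]."""
    return [min(max(0.5 + (0.02 if y == 1 else -0.02)
                    + random.gauss(0, 0.2), 0.0), 1.0)
            for y in labels]

def accuracy(preds):
    return sum((p >= 0.5) == (y == 1)
               for y, p in zip(labels, preds)) / len(labels)

draws = [draw_predictions() for _ in range(50)]

one_draw_acc = accuracy(draws[-1])                     # like the printed Train=: one draw
avg_preds = [sum(ps) / len(ps) for ps in zip(*draws)]  # average prediction over all draws
avg_acc = accuracy(avg_preds)                          # like the printed Test=: all draws
print(one_draw_acc, avg_acc)  # averaging cancels noise, so avg_acc comes out higher
```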


breuderink commented on July 28, 2024

Thanks for the elaboration. I'll take a look at the log file to see if I understand it.


ChenKevin0123 commented on July 28, 2024

Where can I download the train and test data? I can only find the movie, rating, user, and tags data on MovieLens.

