Comments (11)
libFM uses a time dependent seed for the random initialization by default.
"seed", "integer value, default=None"
https://github.com/srendle/libfm/blob/master/src/libfm/libfm.cpp#L93
I think the results between runs should match if you set a seed.
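Given the -seed option quoted above, fixing it should make two runs reproduce each other exactly. A sketch with placeholder file names (not taken from this thread):

```shell
# Two runs with the same seed should print identical per-iteration results.
./libFM -task c -train train.libfm -test test.libfm -method mcmc -seed 42
./libFM -task c -train train.libfm -test test.libfm -method mcmc -seed 42
```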
from libfm.
Using the same seed indeed prevents differences between runs. But what I am trying to report here is that the per-iteration training set and test set 'performance' differ, although I supplied the same data for both sets. I.e. in the snippet above, the train performance for iteration 99 is 0.52756, while the test performance on the same data is 0.530803. If I understand correctly, these numbers should be equal since the input data is equal.
This is based on my assumption that both numbers are produced by computing some performance metric (like the fraction correctly classified) on the predictions of the model (with parameters from that iteration), using either the training set or the test set as input. But that assumption might be wrong.
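The assumed metric can be sketched in a few lines of Python (the `accuracy` function below is illustrative, not libFM's actual code):

```python
def accuracy(labels, predictions, threshold=0.5):
    """Fraction of predictions that match the binary labels."""
    correct = sum(
        (p >= threshold) == (y == 1)
        for y, p in zip(labels, predictions)
    )
    return correct / len(labels)

# With identical input data and identical model parameters, such a
# metric would give the same score for the train and test set.
labels = [0, 1, 0, 1]
preds = [0.2, 0.8, 0.6, 0.9]
print(accuracy(labels, preds))  # 0.75: the 0.6 prediction for label 0 is wrong
```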
from libfm.
Can you check if this is also true with the option --method=ALS?
from libfm.
Yes. With libFM -task c -train train.libfm -test train.libfm -method als
there still is a small difference between the train and test scores.
from libfm.
How small is the difference compared to the difference with MCMC? Is it plausible that it's just a small numerical error? Which error is correct (train or test)? You can take the last reported error and compare it against what you get when calculating the error yourself.
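"Calculating the error yourself" could look like the sketch below: compare the predictions libFM writes with --out (one value per line) against the targets in the first column of the libSVM-style input file. File names and the helper are placeholders, not part of libFM:

```python
def manual_accuracy(data_path, pred_path, threshold=0.5):
    """Accuracy of a predictions file against a libSVM-style data file."""
    with open(data_path) as f:
        labels = [int(line.split()[0]) for line in f]
    with open(pred_path) as f:
        preds = [float(line) for line in f]
    correct = sum((p >= threshold) == (y == 1)
                  for y, p in zip(labels, preds))
    return correct / len(labels)

# Tiny stand-in files; in practice use train.libfm and the --out file.
with open('toy.libfm', 'w') as f:
    f.write('0 0:0.1\n1 0:0.9\n1 0:0.4\n')
with open('toy.pred', 'w') as f:
    f.write('0.2\n0.8\n0.3\n')
print(manual_accuracy('toy.libfm', 'toy.pred'))  # 0.6666666666666666
```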
from libfm.
I generated some artificial data with this Python script:
import random

with open('train.libfm', 'w') as f:
    for i in range(1000):
        # Write the alternating class label.
        if i % 2 == 0:
            f.write('0')
        else:
            f.write('1')
        # Write 100 dense Gaussian features.
        for j in range(100):
            f.write(' %d:%f' % (j, random.normalvariate(0, 1)))
        f.write('\n')
It generates alternating target labels, with 100 dense random features. The output looks like this:
...
#Iter= 97 Train=0.925 Test=0.997 Test(ll)=0.0801822
#Iter= 98 Train=0.913 Test=0.997 Test(ll)=0.0798717
#Iter= 99 Train=0.919 Test=0.997 Test(ll)=0.079558
It seems that it is overfitting, because the features are not informative. The difference is now relatively big. I have saved the output with the --out flag, and the results reported for Test= correspond to the accuracy calculated manually, so that part seems right. What could have caused the Train= score to deviate so much?
from libfm.
I think that the test score is calculated here: https://github.com/srendle/libfm/blob/master/src/libfm/src/fm_learn_mcmc_simultaneous.h#L243, while the train score is mainly calculated here: https://github.com/srendle/libfm/blob/master/src/libfm/src/fm_learn_mcmc_simultaneous.h#L170-L172. The code paths indeed seem different. So, what happens in the code path that computes the accuracy for the training set?
from libfm.
libFM uses a few tricks, like clipping predictions to the highest/lowest values. Maybe one of these tricks is only applied to the test predictions.
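The clipping trick mentioned here can be sketched as follows (a guess at the idea, not libFM's actual code; the bounds would come from the training targets):

```python
def clip_prediction(p, y_min=0.0, y_max=1.0):
    """Bound a raw prediction to the target range seen in training."""
    return max(y_min, min(y_max, p))

print(clip_prediction(1.3))   # 1.0
print(clip_prediction(-0.2))  # 0.0
print(clip_prediction(0.6))   # 0.6
```

If such clipping ran only on the test-side code path, the same raw predictions could yield slightly different train and test scores.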
from libfm.
The printed train accuracy is calculated for one MCMC draw. The test accuracy is computed over all draws (i.e., it is an average). I agree that this is misleading and both measures should report either the average or one draw.
In general, I would recommend looking at the log file and not at stdout. The log file is more verbose and reports all test values: one draw, all draws, all but 5 draws. It contains the log-likelihood and accuracy for these measures.
from libfm.
Thanks for the elaboration. I'll take a look at the log file to see if I understand it.
from libfm.
Where can I download the train and test data? I can only find movie, rating, user and tag data on MovieLens.
from libfm.