libmf's People

Contributors

ankane, annseidel, cjlin1, ericstj, kazewind22, mosikico, mstfbl, ryaninhust, wschin, ycjuan, yixuan

libmf's Issues

Windows incompatibility message

Dear all,

When I tried running the executable recently, I received this error message below, for your information. My OS is Windows 10.

[screenshot: Windows incompatibility error message]

Calculation Error in line 601

libmf/mf.cpp, line 601 at commit e70b9a3:

XMM = _mm_add_ps(XMM, _mm_shuffle_ps(XMMtmp, XMMtmp, 1));

Lines 600 and 601 try to sum the four floats in XMM into a single float. That is, they try to do
XMM[31:0] = XMM[31:0] + XMM[63:32] + XMM[95:64] + XMM[127:96].

Line 600 first adds two pairs of floats. It does the following operation:
XMMtmp[63:0] = XMM[63:0] + XMM[127:64].

The current line 601 has the following operation:
XMM[31:0] = XMM[31:0] + XMMtmp[63:32].
But, I think the correct operation should be
XMM[31:0] = XMMtmp[31:0] + XMMtmp[63:32].

Therefore, line 601 should be changed to
"XMM = _mm_add_ps(XMMtmp, _mm_shuffle_ps(XMMtmp, XMMtmp, 1));".

NaN Values during training

I am currently trying to factorize a matrix using the MF_Solver with the KL loss function. I get NaN values during training after either the 1st or the 2nd iteration. After creating small test cases, I found that large gradients can produce negative values, which are then clipped to 0 in the sg_update function. This creates zero rows/columns in my P or Q matrix that are not handled: the prepare_for_sg_update function then calculates z = 0, which produces NaN values that are carried through the calculations and eventually fill the whole model with NaNs.

Can the algorithm (when calculating 1/z, one might consider 1/(z+epsilon) with epsilon>0) or my parameters (especially the learning rate) be adjusted to handle such cases?

Do you have any additional ideas about what might be causing NaN values during training?
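For illustration, a minimal sketch of the epsilon guard suggested above; the names are hypothetical and not libmf's API, the real update lives in sg_update / prepare_for_sg_update in mf.cpp.

// Hypothetical guard: use 1/(z + epsilon) instead of 1/z so that a zero
// row/column in P or Q cannot turn the update into NaN.
const float kEpsilon = 1e-8f;

inline float safe_inverse(float z)
{
    return 1.0f / (z + kEpsilon);
}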

User item bias

In the appendix of the original paper, A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems, the formulation indicates that LIBMF considers biases.

But it seems that this repository doesn't support this feature.

Is there a version available that takes biases into consideration?

LIBMF++ / RPCS

Hi @cjlin1, I recently came across your paper on LIBMF++. It sounds like RPCS has significant performance benefits. Are there any plans to bring it to LIBMF?

Error always Inf. when using GKL

Hello, I am trying to evaluate LIBMF, but when I use the GKL loss with the NMF option, I notice that no matter how I tune the parameters, I always get inf for both the loss function and the error.

Possible overflow in matrix access

When trying to factorize a large matrix, I had a lot of weird errors in my calculations, which turned out to be due to seemingly random access to the model matrices P and Q. I found that in the calc_reg1_core function, there is a missing cast to mf_long in the matrix access:

for(mf_int j = 0; j < model.k; j++)
    tmp += abs(ptr[i*model.k+j]);

needs to be

for(mf_int j = 0; j < model.k; j++)
    tmp += abs(ptr[(mf_long)i*model.k+j]);

since i*model.k+j could be larger than INT_MAX
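To make the overflow concrete, here is a small self-contained sketch; the typedefs mirror libmf's mf_int/mf_long (int and long long), and the values are made up for illustration.

#include <cstdio>

typedef int mf_int;        // 32-bit in libmf
typedef long long mf_long; // 64-bit in libmf

int main()
{
    mf_int i = 30000000; // row index of a large model
    mf_int k = 100;      // model.k, the factor dimension
    mf_int j = 0;

    mf_long bad  = i*k + j;          // 32-bit multiply overflows (typically wraps to -1294967296)
    mf_long good = (mf_long)i*k + j; // widen before multiplying: 3000000000, as intended

    printf("%lld %lld\n", bad, good);
    return 0;
}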

Ruby Library

Hi, thanks for this great library! Just wanted to let you know there are now Ruby bindings. They work great for collaborative filtering with both explicit and implicit feedback.

nan with disk-level training

Hi, when passing the --disk option, LIBMF prints nan for the tr_rmse and obj with the demo data.

./mf-train --disk demo/real_matrix.tr.txt

Output:

iter      tr_rmse          obj
   0          nan          nan
   1          nan          nan
   2          nan          nan
   3          nan          nan
   4          nan          nan
   5          nan          nan
   6          nan          nan
   7          nan          nan
   8          nan          nan
   9          nan          nan
  10          nan          nan
  11          nan          nan
  12          nan          nan
  13          nan          nan
  14          nan          nan
  15          nan          nan
  16          nan          nan
  17          nan          nan
  18          nan          nan
  19          nan          nan

Without the --disk option, it prints numeric values.

bug

I think the '&&' below should be changed to '||', or the check could be removed entirely.
if(model->fun == P_L2_MFC &&
   model->fun == P_L1_MFC &&
   model->fun == P_LR_MFC)
    z = z > 0.0f? 1.0f: -1.0f;
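For reference, a sketch of the condition with '||' applied as suggested, assuming the rest of the surrounding function is unchanged:

if(model->fun == P_L2_MFC ||
   model->fun == P_L1_MFC ||
   model->fun == P_LR_MFC)
    z = z > 0.0f? 1.0f: -1.0f;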
 

row-oriented pair-wise logistic loss

Does the row-oriented pair-wise logistic loss mean that, for the negative-feedback samples, we randomly pick an item to serve as this user's negative-feedback item, and then apply the logistic loss?

Row-oriented logistic loss for prediction

Hi,

I am trying to use the executable to conduct experiments. However, for the same parameters, the results seem to differ from the R wrapper Recosystem.

If I want to use the row-oriented logistic loss for my predictions, could you confirm whether
mf-train -l1 0 -l2 0 -f 10 -k 20 -t 20 -r 0.1 train_positive.txt bpr_model.txt
mf-predict -e 10 testset.txt bpr_model.txt bpr_output.txt

are the correct commands to enter?

Thank you very much!

how to set slow_only = True when iter > 0

I only find

2910 if(iter == 0)
2911 slow_only = false;

in mf.cpp. As described for the twin learners, when iter > 0, slow_only must be set to True.

But I can't find this code.

Can anybody tell me where it is?

Python interface not working on python3

The Python interface raises an error when imported. This appears to be due to the Python 2.7-style print statements. Error below:

File "", line 1, in
from libmf import mf

File "/usr/local/lib/python3.7/dist-packages/libmf/mf.py", line 78
print "Unrecognized keyword argument '{0}={1}'".format(kw, kwargs[kw])
^
SyntaxError: invalid syntax

OCMF Parallelization Implementation?

I understand that the algorithm implemented is the BPR proposed by Rendle et al. Based on my understanding of BPR, in each iteration the model takes a triple (u, i, j) and tries to optimize the difference between Xui and Xuj. Rendle's paper only mentions that randomly choosing the triples is suggested. However, we couldn't find a paper from your team, or information on your website or in this repo, about how your package achieves parallelization for this algorithm. Is there any paper or information on the implementation details that you can share?

Thank you.

help with code

How can I run your library to factorize a matrix stored in a CSV file?

mf-train: command not found

Dear author! When I try to set up the program on Ubuntu, I get an error saying that the mf-train command is not found. Hoping for your answer as soon as possible, thank you!
