Giter Site home page Giter Site logo

jlibffm's Introduction

JLibFFM

A Java implementation of LIBFFM: A Library for Field-aware Factorization Machines

Description

LIBFFM is an open source tool for field-aware factorization machines (FFM).For the formulation of FFM, please see this paper. It has been used to win the top-3 in recent click-through rate prediction competitions (Criteo, Avazu, Outbrain, and RecSys 2015). JLibFFM is the Java version of LIBFFM.

Dependencies and requirements

Please note that the code is written in Java, and this project is a Maven project.

How to run

Please go to the project folder and run the command "mvn clean package", then we will get a archive file in the sub folder "target", it is "JLibFFM.jar".

You can find a description of supported data format at here("Data Format" section).

(1) an example:

 java -Xms10240M -Xmx20480M -jar JLibFFM.jar -e 0.1 -l 0.0001 -t 15 -k 8 -r true -n true -a false -s 8 -i tr_std.csv.sam -p va_std.csv.sam

(2) the meaning of the parameters:

  -p     set path to the validation set
  -a     stop at the iteration that achieves the best validation loss (must be used with -p).
  -help  print help
  -r     By default we do data shuffling, you can use  `-r false' to disable this function.
  -s     set number of threads (default 1)
  -t     set number of iterations (default 15)
  -e     set learning rate (default 0.2)
  -i     set path to the training set
  -k     set number of latent factors (default 4)
  -l     set regularization parameter (default 0.00002)
  -n     By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use  `-n false' to disable this function.

Evaluation data

The data comes from Criteo.

We follow the approach which was proposed by YuChin Juan, Wei-Sheng Chin, and Yong Zhuang.

(1) Download the data set from Criteo.

http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/

(2) Decompress and make sure the files are correct.

$ md5sum dac.tar.gz
df9b1b3766d9ff91d5ca3eb3d23bed27  dac.tar.gz
$ tar -xzf dac.tar.gz
$ md5sum train.txt test.txt
4dcfe6c4b7783585d4ae3c714994f26a  train.txt
94ccf2787a67fd3d6e78a62129af0ed9  test.txt

(3) Use `CriteoDataPipeline' to convert training data(include Feature engineering).

$ java -Xms10240M -Xmx10240M -cp JLibFFM.jar com.github.gaterslebenchen.libffm.examples.CriteoDataPipeline -f tr -s 8 -i train.txt -o train-ctiteo.csv

the meaning of the parameters:    
 -help           print parameters
 -s              set number of threads (default 1)
 -t              set the temporary Files path(default is current directory)
 -f              use  `-f tr' for training data and `-f te' for test data
 -i              set the input file path
 -o              set the output file path

(4) split data to train data and validation data.

 $ java -cp JLibFFM.jar com.github.gaterslebenchen.libffm.examples.SplitData train-ctiteo.csv '$your file output folder$' false
 
 '$your file output folder$' is your actual file output folder.

(5) run the training program:

  java -Xms10240M -Xmx30720M -jar JLibFFM.jar -e 0.1 -l 0.0001 -t 15 -k 8 -r true -n true -a false -s 8 -i tr_std.csv -p va_std.csv

with these parameters, our best loss is: ** 0.43853 **.

How to save and load model

The Main class com.github.gaterslebenchen.libffm.Main has saveModel and loadModel methods.

jlibffm's People

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.