
deep-ctr's Introduction

Deep Learning for Ad CTR Estimation

NOTE: we have upgraded the code of this repository here with TensorFlow and more advanced models in our new paper Product-based Neural Network for User Response Prediction.

This repository hosts the code of several proposed deep learning models for estimating ad click-through rates, implemented with Theano. The corresponding research paper, Deep Learning over Multi-field Categorical Data – A Case Study on User Response Prediction, was published at ECIR 2016.

Different from traditional deep learning tasks such as image or speech recognition, where neural networks work well on continuous dense input features, in ad click-through rate estimation the input features are mostly categorical and span multiple fields. For example, the input context features could be City=London, Device=Mobile. Such multi-field categorical features are usually transformed into sparse binary features via one-hot encoding, typically yielding millions of dimensions. Traditional DNNs do not work well on such input because of the huge dimensionality and high sparsity.
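The multi-field one-hot encoding described above can be sketched as follows (a minimal illustration, not the repository's code; the field vocabularies are made up):

```python
# Toy multi-field one-hot encoder: each field contributes one block,
# and exactly one bit is active per field.
fields = {
    "City":   ["London", "Paris", "Tokyo"],
    "Device": ["Mobile", "Desktop"],
}

def one_hot(sample):
    """Concatenate a one-hot vector per field into one sparse binary vector."""
    vec = []
    for field, vocab in fields.items():
        block = [0] * len(vocab)
        block[vocab.index(sample[field])] = 1
        vec.extend(block)
    return vec

x = one_hot({"City": "London", "Device": "Mobile"})
# x = [1, 0, 0, 1, 0]: one active bit per field
```

With real ad data each field may have millions of values (e.g. user IDs), which is where the dimensionality and sparsity problems come from.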

This work tries to address the above problems, and the experimental results are promising.

Note that this is just the authors' first attempt at training DNN models to predict ad click-through rate. Significant further research and engineering efforts will be made on this project.

For any questions, please contact Weinan Zhang ([email protected]) and Tianming Du ([email protected]).

Code Installation and Running

Theano and dependent packages (e.g., numpy and sklearn) should be installed before running the code.

After package installation, you can simply run the code on the bundled tiny demo dataset.

python FNN.py      # for FNN
python SNN_DAE.py  # for SNN_DAE
python SNN_RBM.py  # for SNN_RBM

The descriptions of the proposed models (FNN, SNN) are available in the research paper above.

Note: directly running the above code only verifies that the installation succeeded. The included training/test files are very small sample datasets, on which the deep models are not effective. For large-scale datasets, please refer

Note: in our later practice on very large datasets, FM initialisation is no longer necessary to train a good FNN.
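The FM initialisation referred to above seeds the FNN's bottom-layer weights from a pre-trained factorisation machine's per-feature bias w_i and latent vector v_i. A simplified numpy sketch of the idea, not the repository's actual code (the shapes and the random stand-ins for the FM parameters are assumptions):

```python
import numpy as np

n_features, k = 8, 4                      # toy sizes: feature count, FM latent dimension

# stand-ins for parameters of a pre-trained factorisation machine
fm_w = np.random.randn(n_features)        # first-order weights w_i
fm_v = np.random.randn(n_features, k)     # latent vectors v_i

# FNN bottom layer: feature i is embedded as [w_i, v_i], i.e. k+1 units per feature
W0 = np.concatenate([fm_w[:, None], fm_v], axis=1)   # shape (n_features, k+1)

# the alternative discussed in the note: plain random initialisation
W0_random = np.random.uniform(-0.01, 0.01, size=(n_features, k + 1))
```

The note above says that on very large datasets the random-initialised variant trains equally well.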

deep-ctr's People

Contributors

tianmingdu, wnzhang


deep-ctr's Issues

Are some configurations not mentioned?

Below are the SNN_* results when run in my environment. Is some configuration wrong, causing these poor results?

My command: THEANO_FLAGS=mode=FAST_RUN,device=gpu3,floatX=float64,nvcc.fastmath=True,optimizer_including=cudnn python *.py

Using gpu device 3: Tesla K40m (CNMeM is disabled)
../data/train.fm.txt
drop_mlp4da.py|ad:2997|drop:0.98|b_size:1000 | X:133465 | Hidden 0:200 | Hidden 1:300 | Hidden 2:100 | L_r:0.001 | activation1:tanh | lambda:0
training RBM
line: 133465
line: 200
Done epoch 1: 496.320000 seconds, MSE=2.042194
Done epoch 2: 495.180000 seconds, MSE=2.067020
Done epoch 3: 488.930000 seconds, MSE=2.081377
line: 300
Done epoch 1: 18.190000 seconds, MSE=52.485198
Done epoch 2: 19.510000 seconds, MSE=52.301880
Done epoch 3: 20.080000 seconds, MSE=51.954111
line: 100
Done epoch 1: 21.250000 seconds, MSE=57.726292
Done epoch 2: 20.790000 seconds, MSE=57.654725
Done epoch 3: 20.770000 seconds, MSE=57.515879
Training model:
training: 0m 10s
Training Err: 0 0.496572435445 0.0974163483508
training error: 0m 3s
Test Err:0 0.431917535545 0.053397720153
test error: 0m 2s
training: 0m 10s
Training Err: 1 0.498144730401 0.0973837941595
training error: 0m 3s
Test Err:1 0.428466824645 0.0530087656782
test error: 0m 3s
training: 0m 10s
Training Err: 2 0.494184712123 0.097358012913
training error: 0m 4s
Test Err:2 0.416555924171 0.0525893110747
test error: 0m 3s
training: 0m 10s
Training Err: 3 0.490345884187 0.0973344273793
training error: 0m 3s
Test Err:3 0.398964454976 0.052395568973
test error: 0m 3s
training: 0m 10s
Training Err: 4 0.474640680618 0.0973187175469
training error: 0m 3s
Test Err:4 0.418818009479 0.0519934911369
test error: 0m 3s

Minimal test error is 0.431917535545 , at EPOCH 0

Using gpu device 3: Tesla K40m (CNMeM is disabled)
../data/train.fm.txt
drop_mlp4da.py|ad:2997|drop:0.99|b_size:1000 | X:133465 | Hidden 0:200 | Hidden 1:300 | Hidden 2:100 | L_r:0.0005 | activation1:tanh | lambda:0
Training epoch 0, cost 0.0994964612935
Training epoch 1, cost 0.00828848277617
Training epoch 2, cost 0.00484525396831
Training epoch 0, cost 134.467684274
Training epoch 1, cost 134.398491925
Training epoch 2, cost 134.386115337
Training epoch 0, cost 152.707873105
Training epoch 1, cost 152.651793831
Training epoch 2, cost 152.644853724
Training model:
training: 0m 11s
Training Err: 0 0.5 0.0975933509993
training error: 0m 4s
Test Err:0 0.5 0.0486215139663
test error: 0m 3s
training: 0m 11s
Training Err: 1 0.51468813636 0.0971287952011
training error: 0m 4s
Test Err:1 0.566842654028 0.0490243392275
test error: 0m 3s
training: 0m 10s
Training Err: 2 0.513200800344 0.0971307412489
training error: 0m 4s
Test Err:2 0.553879620853 0.0489829048535
test error: 0m 3s
training: 0m 10s
Training Err: 3 0.51709338385 0.0971302388917
training error: 0m 3s
Test Err:3 0.559997156398 0.0489918453575
test error: 0m 3s
training: 0m 11s
Training Err: 4 0.513745306555 0.0971290437782
training error: 0m 3s
Test Err:4 0.564237440758 0.0490177193574
test error: 0m 3s
training: 0m 10s
Training Err: 5 0.511624808306 0.0971309889204
training error: 0m 4s
Test Err:5 0.571770616114 0.0489790098352
test error: 0m 3s
training: 0m 11s
Training Err: 6 0.514979392466 0.097128609435
training error: 0m 4s
Test Err:6 0.573091943128 0.0490302115227
test error: 0m 3s
training: 0m 11s
Training Err: 7 0.516966573944 0.0971283619885
training error: 0m 4s
Test Err:7 0.550987203791 0.0490390031933
test error: 0m 3s
training: 0m 10s
Training Err: 8 0.518681280492 0.0971314928913
training error: 0m 4s
Test Err:8 0.566776777251 0.0489713436744
test error: 0m 3s
training: 0m 11s
Training Err: 9 0.518882401743 0.0971282811621
training error: 0m 3s
Test Err:9 0.571286255924 0.0491369050573
test error: 0m 3s
training: 0m 11s
Training Err: 10 0.51660899219 0.097128478358
training error: 0m 3s
Test Err:10 0.559154028436 0.0490346202497
test error: 0m 3s

Minimal test error is 0.573091943128 , at EPOCH 6

Code mistake?

Is there an error in the following statement, which updates the weights best_w3 in FNN.py?

best_w3=w1.get_value()
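Presumably the statement should snapshot w3 rather than w1. A minimal stand-in sketch of the suspected fix (the Shared class below only mimics a Theano shared variable's get_value(); the shapes are illustrative, not from FNN.py):

```python
import numpy as np

class Shared:
    """Minimal stand-in for a Theano shared variable."""
    def __init__(self, value):
        self._value = value
    def get_value(self):
        return self._value.copy()

w1 = Shared(np.zeros((4, 3)))   # layer-1 weights (illustrative shape)
w3 = Shared(np.ones((3, 1)))    # output-layer weights (illustrative shape)

# The issue's line reads `best_w3 = w1.get_value()`; presumably it should
# snapshot w3, so best_w3 tracks the layer it is named after:
best_w3 = w3.get_value()
```

If the original line stands, the saved "best" output-layer weights would actually hold a copy of the first layer's weights, which would silently corrupt any model restored from them.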

Paper CTR AUC results

Hello, I am sorry for disturbing you.
I am wondering whether the stats in the paper under "all" are for all 9 campaigns or only for those mentioned in Table 1. Furthermore, I assume the value for "all" is not just the average of the campaigns in Table 1 for FNN (i.e. (70.52+69.74+62.99+...)/5), but is obtained by computing the click prediction error individually for all campaigns and then taking the average.
Thank you in advance.

questions about initialization of level-0 weight W_0^i and local FM

In the 6th row of page 5 of the paper, about the initialisation of W_0^i (0 is the subscript and i is the superscript): W_0^i[0] is initialised by w_i, and W_0^i[1] by v_i^1. Here I have a question: W_0^i[0] is a vector, and as far as I understand it is the first row of the matrix W_0^i. Does "initialised by w_i" mean that all elements of that vector are set to w_i?

Another question: as you say, FM is not fully connected, which means FM is only used within each field? If FM is only used within each field, then the second-order interaction term (< v_i, v_j > x_i * x_j in formula 6) is useless, because within each field the encoding is one-hot, so only one index x_i is 1 and the others are 0. But if FM is applied over the whole feature vector x, the parameter count is again 100 * 1m, contradicting the claim that FM reduces the parameters. Am I misunderstanding something?
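The intra-field observation in this question can be checked numerically: with one-hot encoding, x_i * x_j = 0 for any two distinct features inside the same field, so second-order FM terms restricted to a single field contribute nothing. A toy illustration (the sizes are made up):

```python
import numpy as np

k = 3                                  # latent dimension
field = np.array([0, 1, 0, 0])         # one-hot encoding of a 4-value field
V = np.random.randn(4, k)              # latent vectors v_i for this field

# sum of second-order FM terms restricted to this one field:
#   sum_{i<j} <v_i, v_j> * x_i * x_j
second_order = sum(
    V[i] @ V[j] * field[i] * field[j]
    for i in range(4) for j in range(i + 1, 4)
)
# second_order == 0: at most one x_i is non-zero within a field,
# so every product x_i * x_j with i != j vanishes
```

The interaction terms only become informative across different fields, where two distinct bits can be active simultaneously.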

AttributeError: 'Tensor' object has no attribute '_uses_learning_phase'

Hello,
I'm trying all your models to determine which one best fits my application: CTR prediction on the Avazu data. I tried them all and got errors in four of them, some of which are probably my fault; the one I understand least is the following:
<< AttributeError: 'Tensor' object has no attribute '_uses_learning_phase' >>

Here is my code (more or less the example code you gave for Criteo):

model = DIN({"sparse": sparse_feature_list, "dense": dense_feature_list},
            sparse_feature_name, task='binary')

model.compile("adam", "binary_crossentropy",
              metrics=['binary_crossentropy'])

history = model.fit(train_model_input, train[click].values,
                    batch_size=226, epochs=5, verbose=2,
                    validation_split=0.2)


Just so you know, here is the output of:

print( sparse_feature_list )
[SingleFeat(name='site_id', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='site_domain', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='site_category', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='app_id', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='app_domain', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='app_category', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='device_id', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='device_ip', dimension=1000, hash_flag=True, dtype='string'),
SingleFeat(name='device_model', dimension=1000, hash_flag=True, dtype='string')]
print(dense_feature_list)
[SingleFeat(name='id', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C1', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='banner_pos', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='device_type', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='device_conn_type', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C14', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C15', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C16', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C17', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C18', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C19', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C20', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='C21', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='hour', dimension=0, hash_flag=False, dtype='float32'),
SingleFeat(name='weekday', dimension=0, hash_flag=False, dtype='float32')]

If you have any comments on what I might be doing wrong, don't hesitate to tell me!

How is your data handled?

(Translated from Chinese.) I hope it is not impolite to ask my questions in Chinese.
I am a first-year graduate student and would like to ask a few questions:
1. What is the relationship between featindex.txt and featindex.fm.txt? I notice that most entries in featindex.txt are encodings for columns 8, 10 and 12 (and a follow-up here: are these encodings assigned arbitrarily? They do not seem to follow any order). What about the encodings for the other columns?
2. The number of samples with label 1 is far smaller than the number with label 0. Does anything need to be done to handle this? Does the class imbalance affect the results?
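On the class-imbalance question, a common practice in CTR estimation (not something this repository necessarily does; a hedged sketch) is to down-sample negatives at rate w during training and then calibrate the predicted probability back to the original distribution with q = p / (p + (1 - p) / w):

```python
import random

def downsample(samples, w, seed=0):
    """Keep every positive (label 1); keep each negative with probability w."""
    rng = random.Random(seed)
    return [(x, y) for x, y in samples if y == 1 or rng.random() < w]

def calibrate(p, w):
    """Map a CTR p predicted on negative-down-sampled data back to the original scale."""
    return p / (p + (1 - p) / w)

# a model trained on data with negatives kept at rate w = 0.1 that predicts
# p = 0.5 corresponds to about 0.091 on the original distribution
q = calibrate(0.5, 0.1)
```

Without the calibration step, predicted probabilities on down-sampled data are systematically inflated, even though ranking metrics such as AUC are unaffected.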

Why are the networks for training and prediction different?

According to the code in fnn.py, the networks for training and prediction are different:
for training: data(x) -> dropout(x) -> fully connected(z1) -> activation(h1) -> fully connected(z2) -> activation(h2) ---> 1 / (1 + T.exp(-T.dot(h2, w3) - b3))
for prediction: data(x) -> dropout(x) -> fully connected(z1) -> activation(h1) -> dropout(d1) -> fully connected(d2) -> dropout(d2) ---> 1 / (1 + T.exp(-T.dot(d2, w3) - b3))

Is there anything I've missed?

Also, when updating feat_weights, why is b_size needed in (1 - 2. * lambda_fm * lr / b_size)?
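For context on train/predict differences in general: dropout layers are normally active only at training time, with activations rescaled so their expectation matches at prediction time. A generic sketch of inverted dropout, not the repository's actual implementation:

```python
import numpy as np

def dropout_layer(h, p_keep, train, rng=np.random.default_rng(0)):
    """Inverted dropout: random masking at train time, identity at test time."""
    if train:
        mask = rng.random(h.shape) < p_keep
        return h * mask / p_keep      # rescale so E[output] == h
    return h                          # prediction path: no masking

h = np.ones(1000)
train_out = dropout_layer(h, 0.8, train=True)   # mean close to 1.0, randomly masked
test_out = dropout_layer(h, 0.8, train=False)   # exactly h
```

On the b_size question: if the loss gradient is averaged over a minibatch of b_size examples, dividing the L2 decay term by b_size keeps the effective regularisation strength independent of the batch size; that appears to be the intent of (1 - 2. * lambda_fm * lr / b_size).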

Thanks.
