mjugo / streamingrec
A news recommendation evaluation framework
License: Apache License 2.0
Hi,
I tried to process the Outbrain dataset with `java -cp StreamingRec.jar org.streamingrec.data.loading.ReadOutbrain --input-folder=<folder_to_outbrain_files> --out-items=<path_to_item_output_file> --out-clicks=<path_to_clicks_output_file> --publisher=43` and got the Events.csv file. Its first 5 rows are as follows:
Publisher | Category | ItemID | Cookie | Timestamp | keywords
---|---|---|---|---|---
1707327 | 18787 | 1465876800836 | NaN | NaN | NaN
1513276 | 134357 | 1465876801429 | NaN | NaN | NaN
1766890 | 38208 | 1465876801758 | NaN | NaN | NaN
1513276 | 136834 | 1465876802078 | NaN | NaN | NaN
830700 | 2429 | 1465876802136 | NaN | NaN | NaN
I reviewed the code and found that Publisher = document_id, Category = user_id, and ItemID = timestamp, which seems very peculiar. Could you please explain why the dataset is processed this way? And why is Timestamp NaN instead of the true timestamp? Thanks a lot.
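One quick way to confirm that the columns are shifted is to check which cell in each row parses as an epoch-millisecond timestamp (Outbrain's page views were logged in mid-2016, so a true timestamp is roughly 1.46e12). A minimal sketch using the sample rows above; the column layout and the 2016 value range are assumptions, not code from StreamingRec:

```python
# Diagnostic sketch: find which column actually holds epoch-millisecond
# timestamps, which would confirm the columns are shifted.

def looks_like_epoch_ms(value: str) -> bool:
    """True if the value parses as a plausible 2014-2017 millisecond timestamp."""
    try:
        ts = int(value)
    except ValueError:
        return False
    # Outbrain logs span mid-2016; epoch ms then is roughly 1.46e12.
    return 1_400_000_000_000 <= ts <= 1_500_000_000_000

# First sample row from the Events.csv above:
row = ["1707327", "18787", "1465876800836", "NaN", "NaN", "NaN"]
flagged = [i for i, cell in enumerate(row) if looks_like_epoch_ms(cell)]
print(flagged)  # the timestamp sits in the ItemID slot (index 2), not index 4
```

If `flagged` points at the ItemID position rather than the Timestamp position, the header and the values are out of alignment.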
Hi Michael,
We were reading your paper and hoping to find the GRU4Rec evaluation, particularly: "For GRU4Rec, we employed a heuristic to sample from the more recent sessions to take recent temporal shifts into account, as proposed in [45]."
In [45], we were able to find the following:
"We propose a simple solution to get the best of both worlds via pre-training. We first train a model on the entire dataset. The trained model is then used to initialize a new model, which is only trained using only a more recent subset of the data, e.g. the last month worth of data out of a year of click sequences. This allows the model to have the benefit of a good initialization using large amounts of data, and yet is focused on more recent click-sequences. In this way, it resembles the fine-tuning process used in training of image-based networks [2], where the models are typically initialized by pre-training on ImageNet (a large image classification dataset) before the weights are fine-tuned on a smaller image dataset in the desired domain."
How exactly was the sampling done? Is there open-source template code available?
Thanks.
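For reference, the pre-train/fine-tune heuristic quoted from [45] could be sketched as follows. This is a minimal illustration, not the authors' code: the `Session` format, `train`, and the one-month cutoff are all assumptions for the sake of the example.

```python
# Sketch of the heuristic from [45]: (1) train on all sessions, then
# (2) continue training the same model on only the most recent sessions.
from dataclasses import dataclass, field

@dataclass
class Session:
    start_time: int                       # epoch ms of the session start
    events: list = field(default_factory=list)

def recent_subset(sessions, cutoff_ms):
    """Keep only sessions that started at or after the cutoff timestamp."""
    return [s for s in sessions if s.start_time >= cutoff_ms]

def train(model, sessions, epochs=1):
    ...  # placeholder for the actual GRU4Rec training loop

# Hypothetical usage:
# sessions = load_sessions(...)                  # full click log
# model = build_gru4rec()                        # hypothetical constructor
# train(model, sessions)                         # pre-train on everything
# one_month_ms = 30 * 24 * 3600 * 1000
# cutoff = max(s.start_time for s in sessions) - one_month_ms
# train(model, recent_subset(sessions, cutoff))  # fine-tune on last month
```

The key point of the heuristic is that the second `train` call starts from the pre-trained weights instead of a fresh initialization, mirroring ImageNet-style fine-tuning.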