mjugo / streamingrec
A news recommendation evaluation framework
License: Apache License 2.0
Hi,
I tried to process the Outbrain dataset with `java -cp StreamingRec.jar org.streamingrec.data.loading.ReadOutbrain --input-folder=<folder_to_outbrain_files> --out-items=<path_to_item_output_file> --out-clicks=<path_to_clicks_output_file> --publisher=43` and got the Events.csv file. Its first 5 rows are as follows:
Publisher | Category | ItemID | Cookie | Timestamp | keywords
---|---|---|---|---|---
1707327 | 18787 | 1465876800836 | NaN | NaN | NaN
1513276 | 134357 | 1465876801429 | NaN | NaN | NaN
1766890 | 38208 | 1465876801758 | NaN | NaN | NaN
1513276 | 136834 | 1465876802078 | NaN | NaN | NaN
830700 | 2429 | 1465876802136 | NaN | NaN | NaN
I reviewed the code and found that Publisher = document_id, Category = user_id, and ItemID = timestamp, which seems very peculiar. Could you please explain why the dataset is processed this way? And why is Timestamp NaN instead of the true timestamp? Thanks a lot.
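One quick way to confirm that the columns are shifted is to check which cell in each row parses as an epoch-millisecond timestamp (Outbrain's page views were logged in mid-2016, so a true timestamp is roughly 1.46e12). A minimal sketch using the sample rows above; the column layout and the 2016 value range are assumptions, not code from StreamingRec:

```python
# Diagnostic sketch: find which column actually holds epoch-millisecond
# timestamps, which would confirm the columns are shifted.

def looks_like_epoch_ms(value: str) -> bool:
    """True if the value parses as a plausible 2014-2017 millisecond timestamp."""
    try:
        ts = int(value)
    except ValueError:
        return False
    # Outbrain logs span mid-2016; epoch ms then is roughly 1.46e12.
    return 1_400_000_000_000 <= ts <= 1_500_000_000_000

# First sample row from the Events.csv above:
row = ["1707327", "18787", "1465876800836", "NaN", "NaN", "NaN"]
flagged = [i for i, cell in enumerate(row) if looks_like_epoch_ms(cell)]
print(flagged)  # the timestamp sits in the ItemID slot (index 2), not index 4
```

If `flagged` points at the ItemID position rather than the Timestamp position, the header and the values are out of alignment.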
Hi Michael,
We were reading your paper and hoping to find the GRU4Rec evaluation, particularly: "For GRU4Rec, we employed a heuristic to sample from the more recent sessions to take recent temporal shifts into account, as proposed in [45]."
In [45], we were able to find the following:
"We propose a simple solution to get the best of both worlds via pre-training. We first train a model on the entire dataset. The trained model is then used to initialize a new model, which is only trained using only a more recent subset of the data, e.g. the last month worth of data out of a year of click sequences. This allows the model to have the benefit of a good initialization using large amounts of data, and yet is focused on more recent click-sequences. In this way, it resembles the fine-tuning process used in training of image-based networks [2], where the models are typically initialized by pre-training on ImageNet (a large image classification dataset) before the weights are fine-tuned on a smaller image dataset in the desired domain."
How exactly was the sampling done? Is there open-source template code available?
Thanks.
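For reference, the pre-train/fine-tune heuristic quoted from [45] could be sketched as follows. This is a minimal illustration, not the authors' code: the `Session` format, `train`, and the one-month cutoff are all assumptions for the sake of the example.

```python
# Sketch of the heuristic from [45]: (1) train on all sessions, then
# (2) continue training the same model on only the most recent sessions.
from dataclasses import dataclass, field

@dataclass
class Session:
    start_time: int                       # epoch ms of the session start
    events: list = field(default_factory=list)

def recent_subset(sessions, cutoff_ms):
    """Keep only sessions that started at or after the cutoff timestamp."""
    return [s for s in sessions if s.start_time >= cutoff_ms]

def train(model, sessions, epochs=1):
    ...  # placeholder for the actual GRU4Rec training loop

# Hypothetical usage:
# sessions = load_sessions(...)                  # full click log
# model = build_gru4rec()                        # hypothetical constructor
# train(model, sessions)                         # pre-train on everything
# one_month_ms = 30 * 24 * 3600 * 1000
# cutoff = max(s.start_time for s in sessions) - one_month_ms
# train(model, recent_subset(sessions, cutoff))  # fine-tune on last month
```

The key point of the heuristic is that the second `train` call starts from the pre-trained weights instead of a fresh initialization, mirroring ImageNet-style fine-tuning.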