
2016's Introduction

ACM RecSys Challenge 2016

About: Job Recommendations

Given a XING user, the goal of the RecSys challenge is to develop a recommender system that predicts those job postings that a user will interact with.

Pointers:

Data

Evaluation

Details about evaluation: Evaluation Measure

News

July 14th: Reminder regarding paper submissions

This is a reminder that the deadline for the corresponding workshop papers in which you describe your solutions is next week: July 20th.

Instructions: paper-submissions

We are looking forward to your papers and seeing you in Boston!

June 22: Final Sprint, Paper Submissions

Dear RecSys 2016 Challenge Participant,

The RecSys 2016 Challenge is proceeding very well. We are very happy to see so many active teams (more than 300 teams registered and 100 teams have submitted solutions). An overview of deadlines and additional info about the Challenge is available at http://2016.recsyschallenge.com. However, we would like to highlight the following points:

Challenge Deadline: June 26th (23:59 Hawaiian time)

The deadline for submitting Challenge solutions is approaching: 26 June (just under one week to go!). We are impressed with what we have been seeing on the leaderboard, but we point out that there is ample opportunity for further improvement. By our estimates, the current top score is approx. 300k points below the recommender currently in use by XING.

RecSys Challenge Workshop, Paper Submissions

The RecSys Challenge Workshop at ACM RecSys 2016 provides you with an opportunity to discuss your algorithms with XING and with the other contenders, to gain additional insight.

  • Each team is required to submit a paper to the workshop (max. 4 pages in length) describing their algorithm. See: http://2016.recsyschallenge.com/#paper-submissions
  • Deadline for submitting workshop papers: 20 July
  • The papers will be reviewed by the workshop PC, and acceptance notifications will go out on 1 August. Note that reviewers are very interested in the innovative potential of your ideas: scores are important to claim the prize money, but we are also interested in digging deeper into all viable approaches that participants have taken to the problem.
  • We anticipate that everyone is eager to submit a paper to the workshop. However, for completeness, we also remind you that, according to the Challenge rules, without a paper you will be removed from the leaderboard.
  • We expect you to be present at the RecSys Challenge Workshop at ACM RecSys 2016 to present your paper. In case of extenuating circumstances, please contact Martha Larson as soon as you become aware of them.

We wish you all the best during the final week of the Challenge and we look forward to seeing you at the RecSys 2016 Challenge Workshop at ACM RecSys 2016.

Best regards, Martha, András, Róbert, Daniel and Fabian

2016's People

Contributors

fabianabel, rpalovics


2016's Issues

about "creat_at"

What time span do the "creat_at" timestamps in the interactions.csv file cover? A year?

Problem about Evaluation Measure

function userSuccess(recommendedItems, relevantItems) = {
  if (intersect(recommendedItems, relevantItems).size > 0)
    1.0
  else
    0.0
}
should be changed to
function userSuccess(recommendedItems, relevantItems) = {
  if (intersect(recommendedItems.take(30), relevantItems).size > 0)
    1.0
  else
    0.0
}
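For local testing, the corrected measure can be sketched in plain Python. The 30-item cutoff matches the take(30) proposed above; this is an illustration, not the official evaluator:

```python
def user_success(recommended_items, relevant_items, cutoff=30):
    """1.0 if any of the first `cutoff` recommendations is relevant, else 0.0."""
    top = recommended_items[:cutoff]
    return 1.0 if set(top) & set(relevant_items) else 0.0

recs = list(range(100))               # recommended item ids, best first
print(user_success(recs, {29}))       # -> 1.0: item 29 is within the top 30
print(user_success(recs, {50}))       # -> 0.0: item 50 falls outside the cutoff
```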

CNAME

create a file called CNAME (in gh-pages) containing the line:
2016.recsyschallenge.com

Invalid user_ids

There are user_ids in impressions that don't appear in users.

For example, the first user_id in impressions is 1842650, which does not appear in users.
Is this intentional? According to a script I wrote, around 33% of the user_ids in impressions are invalid.
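A quick way to reproduce that measurement yourself; `read_ids` and the column names are illustrative assumptions (tab-separated files with a header row), not part of the challenge tooling:

```python
import csv

def read_ids(path, col, delimiter='\t'):
    """Collect one column of a delimited file (with header row) as a set of strings."""
    with open(path, newline='') as f:
        return {row[col] for row in csv.DictReader(f, delimiter=delimiter)}

def invalid_share(impression_ids, user_ids):
    """Fraction of impression user_ids that never appear in the users table."""
    return len(impression_ids - user_ids) / len(impression_ids)

# With the real data you would load the sets from the challenge files, e.g.:
#   users = read_ids('users.csv', 'id')
#   imps  = read_ids('impressions.csv', 'user_id')
# Tiny in-memory stand-in: one of three impression ids is unknown.
print(invalid_share({'1842650', '7', '8'}, {'7', '8'}))
```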

Workshop proceedings

We will post the link to the proceedings here and close this issue as soon as the workshop proceedings are published by ACM.

Question on the order of the test set

Since the score depends on the order of items in the test set, can we know the rule that determines that order? Without it, offline evaluation is unreliable. Thanks.

Question on the metric Precision at k: what if len(recommendation) < k?

In the document, the pseudocode returns

topK = recommendedItems.take(k)
return intersect(topK, relevantItems).size / k

My question is, what if L = len(recommendedItems) < k?
Does the metric return intersect(topK, relevantItems).size / min(k, L)?

In other words, does the evaluation encourage submitting fewer but more accurate items, or more items in pursuit of recall?

Thanks!

Kuan
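As written, the pseudocode above always divides by k, so a list shorter than k can only lose precision; the min(k, L) variant described in the question is a different metric. A minimal sketch of both readings (plain Python, not the official evaluator):

```python
def precision_at_k(recommended, relevant, k):
    """Literal reading of the pseudocode: always divide by k."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def precision_at_k_truncated(recommended, relevant, k):
    """Alternative reading: divide by min(k, len(recommended))."""
    if not recommended:
        return 0.0
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / min(k, len(recommended))

recs, rel = [1, 2], {1, 2}
print(precision_at_k(recs, rel, 10))            # 0.2: short lists are penalized
print(precision_at_k_truncated(recs, rel, 10))  # 1.0: short lists are not penalized
```

Under the literal reading, submitting fewer than k items can only hurt precision@k, so there would be no incentive to truncate the list below k.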

We have created a team

Excuse me, we have created a team named "Cine".
We hope it can be approved at your convenience.

Question about rounding and click sequence in dataset

Hi,

in dataset description of the interactions we can read:

timestamp (Unix timestamp, rounded to 5 min)

Are you planning to do this rounding in a way that still lets us recover the actual click sequence within a session? I think it is important information.

Regards,
B.Twardowski
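For reference, "rounded to 5 min" presumably means truncation to 300-second buckets, in which case the order of clicks inside the same bucket cannot be recovered from the timestamp alone. A minimal sketch (the truncation rule is an assumption, not confirmed by the organizers):

```python
def round_to_5min(ts):
    """Truncate a Unix timestamp to the start of its 5-minute (300 s) bucket."""
    return ts - ts % 300

# Two clicks 90 seconds apart can land in the same bucket:
a, b = 1451606400 + 10, 1451606400 + 100
print(round_to_5min(a) == round_to_5min(b))  # True: their relative order is lost
```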

Submission systems bug

There is a bug in the submission system: if you use a label that is too long, the system returns a generic error saying that something went wrong.

I simply shortened the label text and the same submission was accepted.

Please fix this, or set a character limit on the text field to prevent it :).

About the rule "Your algorithm should not attempt to identify artificial users, or reconstruct flipped values."

I would like to learn some details about this rule. While we accept that we must not try to reconstruct your flipped data, we would like to make some assumptions and change some of the unknown values. Would this be a violation of the rules, or is it just like using different default values for different users?

Example: say the experience-in-years value is 5 but the career level is null; I would like to assume a career level of 3 or 4.

Thanks

"interact with in the next week"

Hi data people,

I didn't understand this "next week"

"the recommender should predict those job postings (items) that the user will interact with in the next week."

Thanks :)

something about submission time

Hi friends, we submitted a result at 23:50 (Hawaii time), but its score only appeared at 00:01, as can be seen in the submission records.
Unfortunately, this caused our result to show up on the unofficial leaderboard rather than the official one.
We wonder whether this result could still be considered for the official leaderboard?
Looking forward to your reply and thank you very much.
Team "Cine" again and the submission label is "final1".

Private leaderboard evaluation

I'd like to have some clarification about the private leaderboard evaluation.
Is it the best private score ever reached, or the private score of the submission that is best on the public leaderboard?

I'll provide an example to be more clear

submission_x has:
public_leaderboard: 100
private_leaderboard: 90

while submission_y has:
public_leaderboard: 95
private_leaderboard: 95

Clearly my team would score 100 on the public leaderboard (submission_x), but what about the private one?
Will it be 90 or 95?

edu_degree values not conforming to description

The description says that edu_degree values range from 0 to 3, but analyzing the dataset we found that the actual range is 0-6. Is it an error in the description? What's the actual meaning of those values?

Problem about the rule & submission

Hi, we found a conflict. The rules page https://recsys.xing.com/rules says: "the submission limits: you can upload at maximum 5 solutions per day".
The top of the submission page https://recsys.xing.com/submission likewise says "New Submission: 5 submissions still possible for you today", yet the bottom of the same page says: "Additional remarks: No. 4: Notice that you can only submit 3 solutions per calendar day".

So how many solutions can I actually submit per calendar day?
And which time zone is the standard, GMT or another?

Dataset undownloadable

Hi,

Is there something wrong with the server hosting the dataset? The download speed is very slow (3-5 KB/s), and the 1 GB training set is practically undownloadable: the download always stalls after a few minutes. Moreover, since downloading the dataset requires signing in, I cannot use a download manager to work around this. I am in Canada, on my university's network, and nothing like this has happened before. Can you look into it? Thank you!

problems with data?

hi all,

we found some problems with training data:

  1. The files are not in a real CSV format, are they? We are not aware of any CSV format that supports fields which are lists.

  2. The number of values per row in users.csv varies: sometimes a row has 12 values and sometimes only 11 (counting each list as a single value). How should this be interpreted? Which field value is actually missing (perhaps the last one)?

  3. There are many duplicated users in users.csv: only 1367057 of the 1500000 rows have unique user ids!

Thanks for your answer!
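Point 3 is easy to check yourself. A minimal sketch that counts unique ids in one column of a delimited file object (the tab delimiter and header row are assumptions that may not match the actual format):

```python
import csv
import io
from collections import Counter

def id_stats(f, id_col='id', delimiter='\t'):
    """Return (number of unique ids, total rows) for one column of a delimited file."""
    counts = Counter(row[id_col] for row in csv.DictReader(f, delimiter=delimiter))
    return len(counts), sum(counts.values())

# With the real file: unique, total = id_stats(open('users.csv', newline=''))
demo = io.StringIO('id\tname\n1\ta\n2\tb\n1\ta\n')
print(id_stats(demo))  # (2, 3): 2 unique ids across 3 rows, so one duplicate
```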

Are there software license restrictions?

Hello, I would like to clarify whether there are restrictions on the software with which a solution may be developed, for example only software under an Open Source Initiative-approved license, or only software for non-commercial use?

How to measure the relevance between user and item in the next week?

Relevant items are those items on which a user clicked, bookmarked or replied (interaction_type= 1, 2 or 3). 

I want to build a local evaluation, but I have a question: how should relevance be measured? Just by the value of interaction_type, or does the interaction creation time matter, or the number of times the user interacted with the item?

Evaluation system

How long will the evaluation system remain available? I know it no longer matters for the challenge, but we would like to keep working on the problem for fun.

How can we do offline ranking evaluation?

Hi,
I split the interaction data into training and test sets by holding out the last week's interactions,
but how can I determine the relevant items of each user in the test data? How can I evaluate my algorithm offline using the method described in EvaluationMeasure.md?
Can anyone help? Thanks.
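One common way to build such a split is sketched below: hold out the final week of interactions and treat each held-out interaction of type 1, 2 or 3 (click, bookmark, reply, per the evaluation description above) as relevant. The tuple layout and cutoff are illustrative assumptions:

```python
def split_last_week(interactions, cutoff_ts, week=7 * 24 * 3600):
    """Split (user, item, interaction_type, ts) tuples into a training list
    and a per-user set of held-out relevant items from the final week."""
    train, relevant = [], {}
    for user, item, itype, ts in interactions:
        if ts < cutoff_ts - week:
            train.append((user, item, itype, ts))
        elif itype in (1, 2, 3):  # clicks, bookmarks, replies count as relevant
            relevant.setdefault(user, set()).add(item)
    return train, relevant

# Tiny illustration; timestamps are seconds, cutoff_ts plays the role of "now".
data = [(1, 10, 1, 0), (1, 11, 1, 999_000), (2, 12, 4, 999_000)]
train, relevant = split_last_week(data, cutoff_ts=1_000_000)
print(train)     # only the old interaction remains for training
print(relevant)  # {1: {11}}: the type-4 interaction is not counted as relevant
```

The resulting `relevant` dict can then be fed into the userSuccess / precision@k measures from EvaluationMeasure.md.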

Issue about participating in the workshop

Dear organizers,
Our team will certainly be eager to participate in the workshop in Boston if our accompanying paper is accepted.
So we have a request: if our paper is accepted, could you send us a more formal invitation letter? To be honest, it is really hard to apply for a visa using only the letter you sent on July 14th.
Our last question is: how many participants from a single team may register for the workshop? Could all of the authors attend in September?

Thanks.

Multiple Teammates

Hi,
is there a possibility of multiple teammates in the submission system, who could download the data and/or submit results, without the necessity to share the same xing account?

Could we retrieve one of our previously submitted results?

Hi friends, we have submitted several results so far.
Unfortunately, we overwrote some of them by mistake, including the one with our best score ever, and it takes too much time to run a new result and improve on it.
Is there any way to retrieve a previously submitted result from your server?
Our team is "Cine" and the label of the result we want is "lda cosine similarity".
Thanks all the same if it's impossible.

NULL in career level

I would like to know whether the value NULL in career level means 0 (unknown). I have found several NULLs, at least in users.csv.

Thanks
