
2016's Introduction

ACM RecSys Challenge 2016

About: Job Recommendations

Given a XING user, the goal of the RecSys challenge is to develop a recommender system that predicts those job postings that a user will interact with.

Pointers:

Data

Evaluation

Details about evaluation: Evaluation Measure

News

July 14th: Reminder regarding paper submissions

This is a reminder that the deadline for the corresponding workshop papers in which you describe your solutions is next week: July 20th.

Instructions: paper-submissions

We are looking forward to your papers and seeing you in Boston!

June 22: Final Sprint, Paper Submissions

Dear RecSys 2016 Challenge Participant,

The RecSys 2016 Challenge is proceeding very well. We are very happy to see so many active teams (more than 300 teams registered and 100 teams have submitted solutions). An overview of deadlines and additional info about the Challenge is available at http://2016.recsyschallenge.com. However, we would like to highlight the following points:

Challenge Deadline: June 26th (23:59 Hawaiian time)

The deadline for submitting Challenge solutions is approaching: 26 June (just under one week to go!). We are impressed with what we have been seeing on the leaderboard, but we point out that there is ample opportunity for further improvement. By our estimates, the current top score is approx. 300k points below the recommender currently in use by XING.

RecSys Challenge Workshop, Paper Submissions

The RecSys Challenge Workshop at ACM RecSys 2016 provides you with an opportunity to discuss your algorithms with XING and with the other contenders, to gain additional insight.

  • Each team is required to submit a paper to the workshop (max. 4 pages in length) describing their algorithm. See: http://2016.recsyschallenge.com/#paper-submissions
  • Deadline for submitting workshop papers: 20 July
  • The papers will be reviewed by the workshop PC, and acceptance notifications will go out on 1 August. Note that reviewers are very interested in the innovative potential of your ideas: scores are important to claim the prize money, but we are also interested in digging deeper into all viable approaches that participants have taken to the problem.
  • We anticipate that everyone is eager to submit a paper to the workshop. However, for completeness, we also remind you that, according to the Challenge rules, without a paper you will be removed from the leaderboard.
  • We expect you to be present at the RecSys Challenge Workshop at ACM RecSys 2016 to present your paper. In case of extenuating circumstances, please contact Martha Larson as soon as you become aware of them.

We wish you all the best during the final week of the Challenge and we look forward to seeing you at the RecSys 2016 Challenge Workshop at ACM RecSys 2016.

Best regards, Martha, András, Róbert, Daniel and Fabian

2016's People

Contributors

fabianabel, rpalovics


2016's Issues

about "creat_at"

What time span do the "creat_at" timestamps in the interactions.csv file cover? A year?

Problem about Evaluation Measure

function userSuccess(recommendedItems, relevantItems) = {
  if (intersect(recommendedItems, relevantItems).size > 0)
    1.0
  else
    0.0
}
should be changed to
function userSuccess(recommendedItems, relevantItems) = {
  if (intersect(recommendedItems.take(30), relevantItems).size > 0)
    1.0
  else
    0.0
}
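For local testing, the corrected measure can be sketched in plain Python. The 30-item cutoff matches the take(30) proposed above; this is an illustration, not the official evaluator:

```python
def user_success(recommended_items, relevant_items, cutoff=30):
    """1.0 if any of the first `cutoff` recommendations is relevant, else 0.0."""
    top = recommended_items[:cutoff]
    return 1.0 if set(top) & set(relevant_items) else 0.0

recs = list(range(100))               # recommended item ids, best first
print(user_success(recs, {29}))       # -> 1.0: item 29 is within the top 30
print(user_success(recs, {50}))       # -> 0.0: item 50 falls outside the cutoff
```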

CNAME

create a file called CNAME (in gh-pages) containing the line:
2016.recsyschallenge.com

Invalid user_ids

There are user_ids in impressions that don't appear in users.

For example, the first user_id in impressions is 1842650, which does not appear in users.
Is this intentional? According to a script I wrote, around 33% of the user_ids in impressions are invalid.
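A quick way to reproduce that measurement yourself; `read_ids` and the column names are illustrative assumptions (tab-separated files with a header row), not part of the challenge tooling:

```python
import csv

def read_ids(path, col, delimiter='\t'):
    """Collect one column of a delimited file (with header row) as a set of strings."""
    with open(path, newline='') as f:
        return {row[col] for row in csv.DictReader(f, delimiter=delimiter)}

def invalid_share(impression_ids, user_ids):
    """Fraction of impression user_ids that never appear in the users table."""
    return len(impression_ids - user_ids) / len(impression_ids)

# With the real data you would load the sets from the challenge files, e.g.:
#   users = read_ids('users.csv', 'id')
#   imps  = read_ids('impressions.csv', 'user_id')
# Tiny in-memory stand-in: one of three impression ids is unknown.
print(invalid_share({'1842650', '7', '8'}, {'7', '8'}))
```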

Workshop proceedings

We will post the link to the proceedings here and close this issue as soon as the workshop proceedings are published by ACM.

Question on the order of the test set

Since the score depends on the order of items in the test set, can we know the rule that determines that order? Without it, offline evaluation is unreliable. Thanks.

Question on the metric Precision at k: what if len(recommendation) < k?

In the document, the pseudocode returns

topK = recommendedItems.take(k)
return intersect(topK, relevantItems).size / k

My question is, what if L = len(recommendedItems) < k?
Does the metric return intersect(topK, relevantItems).size / min(k, L)?

In other words, does the evaluation encourage submitting fewer but more accurate items, or more items in pursuit of recall?

Thanks!

Kuan
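As written, the pseudocode above always divides by k, so a list shorter than k can only lose precision; the min(k, L) variant described in the question is a different metric. A minimal sketch of both readings (plain Python, not the official evaluator):

```python
def precision_at_k(recommended, relevant, k):
    """Literal reading of the pseudocode: always divide by k."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def precision_at_k_truncated(recommended, relevant, k):
    """Alternative reading: divide by min(k, len(recommended))."""
    if not recommended:
        return 0.0
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / min(k, len(recommended))

recs, rel = [1, 2], {1, 2}
print(precision_at_k(recs, rel, 10))            # 0.2: short lists are penalized
print(precision_at_k_truncated(recs, rel, 10))  # 1.0: short lists are not penalized
```

Under the literal reading, submitting fewer than k items can only hurt precision@k, so there would be no incentive to truncate the list below k.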

We have created a team

Excuse me, we have created a team named "Cine".
We hope it can be approved at your convenience.

Question about rounding and click sequence in dataset

Hi,

in dataset description of the interactions we can read:

timestamp (Unix timestamp, rounded to 5 min)

Are you planning to do this rounding in a way that still lets us recover the actual click sequence within a session? I think it is important information.

Regards,
B.Twardowski
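For reference, "rounded to 5 min" presumably means truncation to 300-second buckets, in which case the order of clicks inside the same bucket cannot be recovered from the timestamp alone. A minimal sketch (the truncation rule is an assumption, not confirmed by the organizers):

```python
def round_to_5min(ts):
    """Truncate a Unix timestamp to the start of its 5-minute (300 s) bucket."""
    return ts - ts % 300

# Two clicks 90 seconds apart can land in the same bucket:
a, b = 1451606400 + 10, 1451606400 + 100
print(round_to_5min(a) == round_to_5min(b))  # True: their relative order is lost
```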

Submission systems bug

There is a bug in the submission system: if you use a label that is too long, the system returns a generic error saying that something went wrong.

I simply shortened the label text and the same submission was accepted.

Please fix this, or set a character limit on the text field to prevent it :).

About the rule "Your algorithm should not attempt to identify artificial users, or reconstruct flipped values."

I would like to learn some details about this rule. While we accept that we must not try to reconstruct your flipped data, we would like to make some assumptions and change some of the unknown values. Would this be a violation of the rules, or is it just like using different default values for different users?

Example: say the experience-in-years value is 5 but the career level is null; I would like to assume a career level of 3 or 4.

Thanks

"interact with in the next week"

Hi data people,

I didn't understand this "next week"

"the recommender should predict those job postings (items) that the user will interact with in the next week."

Thanks :)

something about submission time

Hi friends, we submitted a result at 23:50 (Hawaii time), but its score only appeared at 00:01, as can be seen in the submission records.
Unfortunately, this caused our result to show up on the unofficial leaderboard rather than the official one.
We wonder whether this result could still be considered for the official leaderboard?
Looking forward to your reply and thank you very much.
Team "Cine" again and the submission label is "final1".

Private leaderboard evaluation

I'd like to have some clarification about the private leaderboard evaluation.
Is it the best private score ever reached, or the private score of the submission that is best on the public leaderboard?

I'll provide an example to be more clear

submission_x has:
public_leaderboard: 100
private_leaderboard: 90

while submission_y has:
public_leaderboard: 95
private_leaderboard: 95

Clearly my team would score 100 on the public leaderboard (submission_x), but what about the private one?
Will it be 90 or 95?

edu_degree values not conforming to description

The description says that edu_degree values range from 0 to 3, but analyzing the dataset we found that the actual range is 0-6. Is it an error in the description? What's the actual meaning of those values?

Problem about the rule & submission

Hi, we found a conflict. The rules page https://recsys.xing.com/rules says: "the submission limits: you can upload at maximum 5 solutions per day".
The top of the submission page https://recsys.xing.com/submission likewise says "New Submission: 5 submissions still possible for you today", yet the bottom of the same page says: "Additional remarks: No. 4: Notice that you can only submit 3 solutions per calendar day".

So how many solutions can I actually submit per calendar day?
And which time zone is the standard, GMT or another?

Dataset undownloadable

Hi,

Is there something wrong with the server hosting the dataset? The download speed is very slow (3-5 KB/s), and the 1 GB training set is practically undownloadable: the download always stalls after a few minutes. Moreover, since downloading the dataset requires signing in, I cannot use a download manager to work around this. I am in Canada, on my university's network, and nothing like this has happened before. Can you look into it? Thank you!

problems with data?

hi all,

we found some problems with training data:

  1. The files are not in a real CSV format, are they? We are not aware of any CSV format that supports fields which are lists.

  2. The number of values per row in users.csv varies: sometimes a row has 12 values and sometimes only 11 (counting each list as a single value). How should this be interpreted? Which field value is actually missing (perhaps the last one)?

  3. There are many duplicated users in users.csv: only 1367057 of the 1500000 rows have unique user ids!

Thanks for your answer!
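Point 3 is easy to check yourself. A minimal sketch that counts unique ids in one column of a delimited file object (the tab delimiter and header row are assumptions that may not match the actual format):

```python
import csv
import io
from collections import Counter

def id_stats(f, id_col='id', delimiter='\t'):
    """Return (number of unique ids, total rows) for one column of a delimited file."""
    counts = Counter(row[id_col] for row in csv.DictReader(f, delimiter=delimiter))
    return len(counts), sum(counts.values())

# With the real file: unique, total = id_stats(open('users.csv', newline=''))
demo = io.StringIO('id\tname\n1\ta\n2\tb\n1\ta\n')
print(id_stats(demo))  # (2, 3): 2 unique ids across 3 rows, so one duplicate
```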

Are there software license restrictions?

Hello, I would like to clarify whether there are restrictions on the software with which a solution may be developed, for example only software under an Open Source Initiative-approved license, or only software for non-commercial use?

How to measure the relevance between user and item in the next week?

Relevant items are those items on which a user clicked, bookmarked or replied (interaction_type= 1, 2 or 3). 

I want to build a local evaluation, but I have a question: how should relevance be measured? Just by the value of interaction_type, or does the interaction creation time matter, or the number of times the user interacted with the item?

Evaluation system

How long will the evaluation system remain available? I know it no longer matters for the challenge, but we would like to keep working on the problem for fun.

How can we do offline ranking evaluation?

Hi,
I split the interaction data into training and test sets by holding out the last week's interactions,
but how can I determine the relevant items of each user in the test data? How can I evaluate my algorithm offline using the method described in EvaluationMeasure.md?
Can anyone help? Thanks.
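One common way to build such a split is sketched below: hold out the final week of interactions and treat each held-out interaction of type 1, 2 or 3 (click, bookmark, reply, per the evaluation description above) as relevant. The tuple layout and cutoff are illustrative assumptions:

```python
def split_last_week(interactions, cutoff_ts, week=7 * 24 * 3600):
    """Split (user, item, interaction_type, ts) tuples into a training list
    and a per-user set of held-out relevant items from the final week."""
    train, relevant = [], {}
    for user, item, itype, ts in interactions:
        if ts < cutoff_ts - week:
            train.append((user, item, itype, ts))
        elif itype in (1, 2, 3):  # clicks, bookmarks, replies count as relevant
            relevant.setdefault(user, set()).add(item)
    return train, relevant

# Tiny illustration; timestamps are seconds, cutoff_ts plays the role of "now".
data = [(1, 10, 1, 0), (1, 11, 1, 999_000), (2, 12, 4, 999_000)]
train, relevant = split_last_week(data, cutoff_ts=1_000_000)
print(train)     # only the old interaction remains for training
print(relevant)  # {1: {11}}: the type-4 interaction is not counted as relevant
```

The resulting `relevant` dict can then be fed into the userSuccess / precision@k measures from EvaluationMeasure.md.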

Issue about participating in the workshop

Dear organizers,
Our team will certainly be eager to participate in the workshop in Boston if our accompanying paper is accepted.
So we have a request: if our paper is accepted, could you send us a more formal invitation letter? To be honest, it is really hard to apply for a visa using only the letter you sent on July 14th.
Our last question is: how many participants from a single team may register for the workshop? Could all of the authors attend in September?

Thanks.

Multiple Teammates

Hi,
is there a possibility of multiple teammates in the submission system, who could download the data and/or submit results, without the necessity to share the same xing account?

Could we retrieve one of our previously submitted results?

Hi friends, we have submitted several results so far.
Unfortunately, we overwrote some of them by mistake, including the one with our best score ever, and it takes too much time to run a new result and improve on it.
Is there any way to retrieve a previously submitted result from your server?
Our team is "Cine" and the label of the result we want is "lda cosine similarity".
Thanks all the same if it's impossible.

NULL in career level

I would like to know whether the value NULL in career level means 0 (unknown). I have found several NULLs, at least in users.csv.

Thanks
