omartrinidad / challenge
Data Science challenge
License: MIT License
Create 10 samples as CSV files.
Write the options to use listen_type.
Functionalize preprocess modules
Add another column with a binary status weekday and weekend.
Correct the release date preprocessing.
album, media, genre, media_duration, and artist
Model A: train a model without user_id to learn how user_gender, user_age, and the other features relate to the song's cluster.
Model B: train a model on each user's history, using Model A, and get the result for the test sample.
Model B should be temporary and wrap Model A.
keyword: Gradient Boosting Decision Trees
The dataset has more than 7 million rows. I think we can work with a sample of around 100 000 rows (still a lot of data).
My first idea is to read the whole dataset and draw the sample randomly. Of course, we can do this in a better way, but how O.o?
Stratified sampling is an option: https://en.wikipedia.org/wiki/Stratified_sampling
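A minimal sketch of what stratified sampling could look like here, using only the standard library. It assumes the rows are already loaded as dictionaries and that we stratify on a single column; the function name `stratified_sample` and the choice of `listen_type` as the stratum in the usage note are illustrative assumptions, not something decided in this repo. Note that it still reads every row once, so it fixes the proportions of the sample rather than the memory cost.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, n_total, seed=0):
    """Draw roughly n_total rows, keeping each stratum's share of the data.

    rows: list of dicts (hypothetical in-memory representation of the CSV).
    key: column used to define the strata (an assumption; pick what matters).
    """
    rng = random.Random(seed)

    # Group the rows by the value of the stratification column.
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    total = sum(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        # Proportional allocation, with at least one row per stratum.
        k = max(1, round(n_total * len(group) / total))
        sample.extend(rng.sample(group, min(k, len(group))))
    return sample
```

For example, `stratified_sample(rows, "listen_type", 100_000)` would keep the listen_type ratio of the full dataset in the sample. Because of rounding, the result can differ from `n_total` by a few rows.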
Since there's no data with listen_type=0 in the test set, the usage of listen_type is an issue.
Option 1.
Ignore listen_type
Option 2.
Use only listen_type=1 rows for training
Option 3.
Use listen_type=0 as training set, listen_type=1 as validation set
Option 4.
Just Use It.
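To make the four options easier to compare, here is a rough sketch of how each one would split the data. It assumes rows are dictionaries with a `listen_type` key; the function name `split_by_listen_type` and the `(train, validation)` return shape are made up for illustration.

```python
def split_by_listen_type(rows, option):
    """Return (train, validation) lists for one of the four options above."""
    lt0 = [r for r in rows if r["listen_type"] == 0]
    lt1 = [r for r in rows if r["listen_type"] == 1]

    if option == 1:
        # Option 1: ignore listen_type, i.e. keep all rows but drop the column.
        train = [{k: v for k, v in r.items() if k != "listen_type"} for r in rows]
        return train, []
    if option == 2:
        # Option 2: train only on rows that look like the test set.
        return lt1, []
    if option == 3:
        # Option 3: train on listen_type=0, validate on listen_type=1.
        return lt0, lt1
    if option == 4:
        # Option 4: just use it, all rows with the column kept as a feature.
        return list(rows), []
    raise ValueError(f"unknown option: {option}")
```

Options 1, 2, and 4 leave the validation list empty, so a separate validation split would still be needed for them.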
This is a nice observation. Since the dataset covers a single month, maybe we can reduce the timestamp to just the hour, plus another column indicating the day.
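A small sketch of that idea, combined with the binary weekday/weekend column from the to-do list. It assumes the listening time is stored as a Unix timestamp in seconds and interprets it in UTC; both the storage format and the function name `time_features` are assumptions, so adjust them to the real column.

```python
from datetime import datetime, timezone


def time_features(ts):
    """Turn a Unix timestamp (seconds, assumed UTC) into compact features."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "hour": dt.hour,                        # 0..23
        "day": dt.day,                          # day of the month
        "is_weekend": int(dt.weekday() >= 5),   # 1 for Saturday/Sunday, else 0
    }
```

If the users are mostly in one time zone, converting to that zone before extracting the hour would probably give more meaningful features than UTC.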