hse-aml / competitive-data-science
Materials for "How to Win a Data Science Competition: Learn from Top Kagglers" course
Home Page: https://www.coursera.org/learn/competitive-data-science
sales_train.csv.gz items.csv item_categories.csv shops.csv
The data seems to be in this Kaggle competition, but I can't access it (perhaps because I'm not 'invited'):
https://www.kaggle.com/c/competitive-data-science-final-project
There is a bug in the week 4 programming assignment notebook, in the part where you generate lag features:
'After creating a grid, we can calculate some features. We will use lags from [1, 2, 3, 4, 5, 12] months ago.'
The lag features are correct only for target_lag_{} (target_lag_1, target_lag_2, target_lag_3, ...) and incorrect for every other lag feature.
I documented the bug and the fix here. Fixing it boosted my LB score tremendously.
https://gist.github.com/anhquan0412/330494b051f74eacad3917f43e3ba43a
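For readers who want to check their own lag features, here is a minimal sketch of one common way to build them: shift the month index forward by the lag and merge back on the index columns. It does not reproduce the notebook's code or the specific bug described in the gist. The helper name add_lags, the column names (shop_id, item_id, date_block_num, target_shop) and the toy values are assumptions made for illustration.

```python
import pandas as pd

def add_lags(df, index_cols, feature_cols, lags, month_col="date_block_num"):
    """Return a copy of df with a <feature>_lag_<k> column for every feature and lag.

    Hypothetical helper: the names and defaults are illustrative assumptions.
    """
    out = df.copy()
    for lag in lags:
        shifted = df[index_cols + feature_cols].copy()
        # A row observed in month m supplies the lag-k feature of month m + k.
        shifted[month_col] = shifted[month_col] + lag
        renamed = {c: f"{c}_lag_{lag}" for c in feature_cols}
        shifted = shifted.rename(columns=renamed)
        out = out.merge(shifted, on=index_cols, how="left")
        # Months with no history get 0 in the new lag columns only.
        new_cols = list(renamed.values())
        out[new_cols] = out[new_cols].fillna(0)
    return out

# Toy grid: one shop/item pair over three months.
grid = pd.DataFrame({
    "shop_id": [0, 0, 0],
    "item_id": [1, 1, 1],
    "date_block_num": [0, 1, 2],
    "target": [5.0, 7.0, 9.0],
    "target_shop": [50.0, 70.0, 90.0],
})
lagged = add_lags(
    grid,
    index_cols=["shop_id", "item_id", "date_block_num"],
    feature_cols=["target", "target_shop"],
    lags=[1, 2],
)
print(lagged)
```

A quick sanity check is to pick one row and confirm that every *_lag_k column, not just target_lag_k, equals the value of the same feature k months earlier.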
Hi,
has anyone got this running on Google Colab?
We should need two things: the grader package and the data files.
-Anton
Hi,
in compute_KNN_features, honours assignment week 4, inside get_features_for_one, it says:
"2. Same label streak: the largest number N, such that N nearest neighbours have the same label."
I find this task description misleading. Read literally, it asks for the longest run of neighbours with the same label anywhere in the array. I would reword it as:
"2. Same label streak: the largest number N, such that the first N nearest neighbours have the same label."
I hope you can understand my point.
Thanks for the great course!
Alessandro
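The difference between the two readings can be sketched in plain Python. Both function names are hypothetical and are not taken from the assignment; the input is assumed to be the neighbours' labels ordered from nearest to farthest.

```python
def same_label_streak(neighbour_labels):
    """Largest N such that the first N nearest neighbours share one label."""
    if len(neighbour_labels) == 0:
        return 0
    first = neighbour_labels[0]
    streak = 0
    for label in neighbour_labels:
        if label != first:
            break  # the streak ends at the first differing label
        streak += 1
    return streak

def longest_run_anywhere(neighbour_labels):
    """Longest run of equal consecutive labels anywhere in the sequence."""
    best = 0
    run = 0
    prev = object()  # sentinel that compares unequal to any label
    for label in neighbour_labels:
        run = run + 1 if label == prev else 1
        best = max(best, run)
        prev = label
    return best

labels = [1, 1, 0, 0, 0, 0]  # labels of neighbours, nearest first
print(same_label_streak(labels))     # 2: only the first two neighbours agree
print(longest_run_anywhere(labels))  # 4: the run of zeros further out
```

The reworded task corresponds to same_label_streak; the literal reading of the original wording corresponds to longest_run_anywhere.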
The repository cannot be checked out under Windows.
There is a line in https://github.com/zyunsg/Advanced-Machine-Learning/blob/master/course2/week1/assignments/PandasBasics.ipynb:
DATA_FOLDER = '../readonly/final_project_data/'
Is the folder missing in the root?
Please provide details!
Hi, friends.
I have not found the materials for the CatBoost basics assignment. Where can I find them?
The 3 EDA files:
EDA_video2.ipynb
EDA_video3_screencast.ipynb
EDA_Springleaf_screencast.ipynb
refer to data files that cannot be found. I would like to run the notebooks locally to try different parameters.
Whenever I try to submit the assignment I get the error: TypeError: argument of type 'NoneType' is not iterable in line 60 of grader.py
I think the code in line 58 should be if response.status_code == 201:
and not if request.status_code == 201:
Please check this; I am unable to submit my assignment because of this issue.
Attaching a screenshot.
Hi. The instructions for the final task of PandasBasics say:
What was the variance of the number of sold items per day sequence for the shop with shop_id = 25 in December, 2014? Do not count the items, that were sold but returned back later.
So my code for filtering the rows according to the given condition is:
transactions[(transactions.shop_id==25) & (transactions.date.dt.year==2014) & \
(transactions.date.dt.month==12) & (transactions.item_cnt_day>0.0)]
I included (transactions.item_cnt_day>0.0)
to only consider items with no returns, and then took the variance accordingly. But this does not seem to work. When I omit the condition, my answer is accepted as correct. Could you please tell me what I am getting wrong?
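To make the two computations concrete, here is a small sketch on made-up data. The toy transactions and the variable names are assumptions. It only shows how the per-day totals differ when returns (negative item_cnt_day) are netted into the daily sum versus dropped row by row. The first reading matches the report above that the unfiltered version is accepted, but the grader's intent is not confirmed here.

```python
import pandas as pd

# Made-up transactions for illustration only.
transactions = pd.DataFrame({
    "date": pd.to_datetime([
        "2014-12-01", "2014-12-01", "2014-12-02", "2014-12-03", "2014-12-03",
    ]),
    "shop_id": [25, 25, 25, 25, 25],
    "item_cnt_day": [3.0, -1.0, 4.0, 2.0, 5.0],  # -1.0 is a return
})

mask = (
    (transactions.shop_id == 25)
    & (transactions.date.dt.year == 2014)
    & (transactions.date.dt.month == 12)
)
december = transactions[mask]

# Reading 1: returns reduce the daily total (no row filter).
net_per_day = december.groupby("date")["item_cnt_day"].sum()

# Reading 2: rows with returns are dropped before summing.
positive_only = december[december.item_cnt_day > 0.0]
pos_per_day = positive_only.groupby("date")["item_cnt_day"].sum()

# pandas uses the unbiased estimator (ddof=1) by default; numpy.var uses ddof=0.
var_net = net_per_day.var(ddof=1)
var_pos = pos_per_day.var(ddof=1)
print(net_per_day.tolist(), var_net)
print(pos_per_day.tolist(), var_pos)
```

Besides the filter, the ddof setting is worth checking when an answer is rejected, since the biased and unbiased estimates differ.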
I think there is an issue in the sanity check
test_knn_feats = NNF.predict(X_test[:50])
print ('Deviation from ground thruth features: %f' % np.abs(test_knn_feats - true_knn_feats_first50[44:45]).sum())
Shouldn't it be:
test_knn_feats = NNF.predict(X_test[44:45])
print ('Deviation from ground thruth features: %f' % np.abs(test_knn_feats - true_knn_feats_first50[44:45]).sum())
Otherwise we are comparing the wrong rows.
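The reason such a mismatch goes unnoticed is NumPy broadcasting: subtracting a (1, k) slice from a (50, k) array raises no error. A minimal sketch with random stand-in arrays (the array names, shapes and values are assumptions, not the notebook's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for true_knn_feats_first50: 50 rows, 4 features.
true_feats_first50 = rng.normal(size=(50, 4))
# Stand-in for a perfect NNF.predict(X_test[:50]).
pred_first50 = true_feats_first50.copy()

# (50, 4) minus (1, 4) broadcasts row 44 against every predicted row.
dev_mismatched = np.abs(pred_first50 - true_feats_first50[44:45]).sum()

# Matching slices on both sides compare row 44 with row 44.
dev_row = np.abs(pred_first50[44:45] - true_feats_first50[44:45]).sum()

# Comparing all 50 rows is another consistent option.
dev_all = np.abs(pred_first50 - true_feats_first50).sum()

print(dev_mismatched)  # non-zero even though the predictions are perfect
print(dev_row, dev_all)
```

With mismatched slices, even a perfect implementation reports a non-zero deviation, so the slices on both sides of the subtraction need to select the same rows.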