Giter Site home page Giter Site logo

sharmaroshan / advanced-house-price-prediction Goto Github PK

View Code? Open in Web Editor NEW
6.0 1.0 10.0 297 KB

It is a Advanced Problem of Regression which requires advanced techniques of feature engineering, feature selection and extraction, modelling, model evaluation, and Statistics.

License: GNU General Public License v3.0

Jupyter Notebook 100.00%
regression-models stacking machine-learning statistics feature-engineering feature-extraction data-visualization data-analytics data-cleaning

advanced-house-price-prediction's Introduction

Advanced-House-Price-Prediction

It is a Advanced Problem of Regression which requires advanced techniques of feature engineering, feature selection and extraction, modelling, model evaluation, and Statistics.

Start here if... You have some experience with R or Python and machine learning basics. This is a perfect competition for data science students who have completed an online course in machine learning and are looking to expand their skill set before trying a featured competition.

Description

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Practice Skills Creative feature engineering Advanced regression techniques like random forest and gradient boosting Acknowledgments The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It's an incredible alternative for data scientists looking for a modernized and expanded version of the often cited Boston Housing dataset.

Goal

It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.

Metric

Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)

Kaggle Learn

Kaggle Learn offers hands-on courses for most data science topics. These short courses prepare you with the key ideas to build your own projects.

The Machine Learning Course will give you everything you need to succeed in this competition and others like it.

Other R Tutorials

Fun with Real Estate Data Use Rmarkdown to learn advanced regression techniques like random forests and XGBoost XGBoost with Parameter Tuning Implement LASSO regression to avoid multicollinearity Includes linear regression, random forest, and XGBoost models as well Ensemble Modeling: Stack Model Example Use "ensembling" to combine the predictions of several models Includes GBM (gradient boosting machine), XGBoost, ranger, and neural net using the caret package A Clear Example of Overfitting Learn about the dreaded consequences of overfitting data Other Python Tutorials Comprehensive Data Exploration with Python Understand how variables are distributed and how they interact Apply different transformations before training machine learning models House Prices EDA Learn to use visualization techniques to study missing data and distributions Includes correlation heatmaps, pairplots, and t-SNE to help inform appropriate inputs to a linear model A Study on Regression Applied to the Ames Dataset Demonstrate effective tactics for feature engineering Explore linear regression with different regularization methods including ridge, LASSO, and ElasticNet using scikit-learn Regularized Linear Models Build a basic linear model Try more advanced algorithms including XGBoost and neural nets using Keras

What is a Getting Started competition?

Getting Started competitions were created by Kaggle data scientists for people who have little to no machine learning background. They are a great place to begin if you are new to data science or just finished a MOOC and want to get involved in Kaggle.

Getting Started competitions are a non-competitive way to get familiar with Kaggle’s platform, learn basic machine learning concepts, and start meeting people in the community. They have no cash prize and are on a rolling timeline.

What’s the difference between a private and public leaderboard?

The Kaggle leaderboard has a public and private component to prevent participants from “overfitting” to the leaderboard. If your model is “overfit” to a dataset then it is not generalizable outside of the dataset you trained it on. This means that your model would have low accuracy on another sample of data taken from a similar dataset.

Public Leaderboard

For all participants, the same 50% of predictions from the test set are assigned to the public leaderboard. The score you see on the public leaderboard reflects your model’s accuracy on this portion of the test set.

Private Leaderboard

The other 50% of predictions from the test set are assigned to the private leaderboard. The private leaderboard is not visible to participants until the competition has concluded. At the end of a competition, we will reveal the private leaderboard so you can see your score on the other 50% of the test data. The scores on the private leaderboard are used to determine the competition winners. Getting Started competitions are run on a rolling timeline so the private leaderboard is never revealed.

How do I create and manage a team? When you accept the competition rules, a team will be created for you. You can invite others to your team, accept a merger with another team, and update basic information like team name by going to the More < Team page.

We've heard from many Kagglers that teaming up is the best way to learn new skills AND have fun. If you don't have a teammate already, consider asking if anyone wants to team up in the discussion forum.

What are kernels? Kaggle Kernels is a cloud computational environment that enables reproducible and collaborative analysis. Kernels supports scripts in R and Python, Jupyter Notebooks, and RMarkdown reports. Go to the Kernels tab to view all of the publicly shared code on this competition. For more on how to use Kernels to learn data science, visit the Read more about our decision to implement a rolling leaderboard on getting started competitions here.

How do I contact Support?

Kaggle does not have a dedicated support team so you’ll typically find that you receive a response more quickly by asking your question in the appropriate forum. (For this competition, you’ll want to use the House Prices discussion forum).

Support is only able to help with issues that are being experienced by all participants. Before contacting support, please check the discussion forum for information on your problem. If you can’t find it, you can post your problem in the forum so a fellow participant or a Kaggle team member can provide help. The forums are full of useful information on the data, metric, and different approaches. We encourage you to use the forums often. If you share your knowledge, you'll find that others will share a lot in turn!

If your problem persists or it seems to be affecting all participants then please contact us.

advanced-house-price-prediction's People

Contributors

sharmaroshan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.