Bikesharing

Bike sharing systems are a means of renting bicycles in which membership, rental, and bike return are automated via a network of kiosk locations throughout a city. Using these systems, people can rent a bike from one location and return it to a different one on an as-needed basis.

The data generated by these systems is attractive to researchers because the duration of travel, departure location, arrival location, and time elapsed are explicitly recorded. Bike sharing systems therefore function as a sensor network that can be used to study mobility in a city. Data source: Kaggle.

This project uses historical usage patterns and weather data to forecast the number of users.

Data Fields

  • datetime - hourly date + timestamp
  • season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
  • holiday - whether the day is considered a holiday
  • workingday - whether the day is neither a weekend nor holiday
  • weather - 1: Clear, Few clouds, Partly cloudy
    2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp - temperature in Celsius
  • atemp - "feels like" temperature in Celsius
  • humidity - relative humidity
  • windspeed - wind speed
  • casual - number of non-registered user rentals initiated
  • registered - number of registered user rentals initiated
  • count - number of total rentals
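
As a rough illustration of how a table with these fields might be loaded, the sketch below parses the datetime column and derives the 'hour' and 'month' features used later. The two sample rows are made up for the example, and the helper name load_rentals is hypothetical; only the column names come from the field list above.

```python
import io

import pandas as pd

# Two made-up rows in the column layout listed above (values are illustrative only).
SAMPLE_CSV = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
2011-07-04 17:00:00,3,1,0,2,30.34,34.09,55,12.998,40,120,160
"""


def load_rentals(source):
    """Read the rental table and derive 'hour' and 'month' from 'datetime'."""
    df = pd.read_csv(source, parse_dates=["datetime"])
    df["hour"] = df["datetime"].dt.hour
    df["month"] = df["datetime"].dt.month
    return df


df = load_rentals(io.StringIO(SAMPLE_CSV))
print(df[["datetime", "hour", "month", "count"]])
```

With the real file, a path to the downloaded CSV would replace the in-memory sample.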

Data visualization

  • The distribution of the number of users per hour
    The dataset contains 10886 samples in total; samples more than 3 standard deviations from the mean account for less than 1% of the data. These outliers are filtered out.
  • Correlation matrix
    There is no obvious linear relationship among temperature, humidity, wind speed, and user count.
  • Hourly mean statistics by weather
    The plot shows the mean user count per hour under different weather conditions. There are more users in good weather; the morning peak is at 08:00 and the evening peak is at 17:00.
  • Month statistics
    The plot shows the mean user count per hour by month and weather. January has the fewest users.
  • Season statistics
    The plot shows the user count by season and weather. There are more users in summer and fall and fewer in spring and winter.
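
The 3-standard-deviation filter and the hourly means by weather described above could be computed roughly as follows. This is a minimal sketch on synthetic stand-in data, not the project's own code; the helper names filter_outliers and hourly_mean_by_weather are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_outliers(df, column="count", n_std=3.0):
    """Keep rows whose value lies within n_std standard deviations of the mean."""
    values = df[column]
    mask = (values - values.mean()).abs() <= n_std * values.std()
    return df[mask]


def hourly_mean_by_weather(df):
    """Mean rental count for each (hour, weather) pair, weather codes as columns."""
    return df.groupby(["hour", "weather"])["count"].mean().unstack("weather")


# Synthetic stand-in data (not the Kaggle data).
rng = np.random.default_rng(0)
n = 2000
demo = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "weather": rng.integers(1, 4, n),
    "count": rng.poisson(150, n),
})
demo.loc[0, "count"] = 5000  # plant one obvious outlier

clean = filter_outliers(demo)
table = hourly_mean_by_weather(clean)
print(len(demo) - len(clean), "rows removed")
print(table.head())
```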

Regression

First, the data is preprocessed: outliers are filtered out and dummy variables are added for the categorical features 'season', 'holiday', 'workingday', 'weather', 'hour', and 'month'. An appropriate regression model is then chosen by comparing the accuracy of several regression models, with model parameters selected by grid search on the k-fold cross-validation error.
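
The preprocessing and scoring steps above can be sketched as below, assuming pandas and scikit-learn. The data is synthetic, the number of folds (5) is an assumption because the write-up does not state k, and make_features and kfold_scores are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

CATEGORICAL = ["season", "holiday", "workingday", "weather", "hour", "month"]


def make_features(df):
    """One-hot encode the categorical columns; numeric columns pass through."""
    X = pd.get_dummies(df.drop(columns=["count"]), columns=CATEGORICAL, dtype=float)
    return X, df["count"]


def kfold_scores(model, X, y, n_splits=5):
    """Return the mean k-fold R2 and mean k-fold MSE of a model."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    res = cross_validate(model, X, y, cv=cv,
                         scoring=("r2", "neg_mean_squared_error"))
    return res["test_r2"].mean(), -res["test_neg_mean_squared_error"].mean()


# Synthetic stand-in data (not the Kaggle data).
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({
    "season": rng.integers(1, 5, n),
    "holiday": rng.integers(0, 2, n),
    "workingday": rng.integers(0, 2, n),
    "weather": rng.integers(1, 5, n),
    "hour": rng.integers(0, 24, n),
    "month": rng.integers(1, 13, n),
    "temp": rng.uniform(0, 40, n),
})
demo["count"] = 5 * demo["temp"] + 10 * demo["hour"] + rng.normal(0, 5, n)

X, y = make_features(demo)
r2, mse = kfold_scores(LinearRegression(), X, y)
print(f"k-fold R2: {r2:.5f}, mse: {mse:.5f}")
```

Passing a different estimator to kfold_scores gives directly comparable numbers for each candidate model.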

  • Linear Regression

Multiple linear regression gave a k-fold R2 score of 0.61841 and an MSE of 10011.33622. To reduce the effect of collinearity among the independent variables, ridge regression was applied, with grid search selecting the parameter "alpha" = 1.5 (regularization strength). Ridge regression gave a k-fold R2 score of 0.62404 and an MSE of 10331.75954. Backward elimination before regression was also tried, giving a k-fold R2 score of 0.61899 and an MSE of 10017.96790.
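
A grid search over the ridge parameter "alpha" might look like the sketch below, assuming scikit-learn. The data is synthetic and the candidate values in alpha_grid are hypothetical; the write-up reports only the selected value, not the grid that was searched.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the encoded bike-sharing features (not the real data).
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

# Hypothetical candidate grid; only the winning value (1.5) is reported above.
alpha_grid = [0.1, 0.5, 1.0, 1.5, 2.0, 5.0]

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": alpha_grid},
    scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # k = 5 is an assumption
)
search.fit(X, y)

print("best alpha:", search.best_params_["alpha"])
print("mean k-fold R2:", round(search.best_score_, 5))
```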

  • SVR

Support vector regression was used to predict the number of users. The SVR was trained with a Gaussian kernel, with grid search selecting the parameters "C" = 2000 (penalty parameter C of the error term) and "epsilon" = 0.1 (epsilon in the epsilon-SVR model). SVR gave a k-fold R2 score of 0.63185 and an MSE of 9382.43674.
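
The SVR search can be sketched in the same way, assuming scikit-learn's SVR with its "rbf" (Gaussian) kernel. The data is synthetic and the candidate values for "C" and "epsilon" are hypothetical apart from the two selected values reported above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Synthetic stand-in data (not the Kaggle data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical grid that includes the reported C = 2000 and epsilon = 0.1.
param_grid = {"C": [1, 100, 2000], "epsilon": [0.1, 1.0]}

search = GridSearchCV(
    SVR(kernel="rbf"),  # Gaussian (RBF) kernel
    param_grid,
    scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # k = 5 is an assumption
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean k-fold R2:", round(search.best_score_, 5))
```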

  • Decision Tree

Decision tree regression was used for prediction, with grid search selecting the parameter 'min_samples_split' = 0.02 (the minimum number of samples required to split an internal node). Decision tree regression gave a k-fold R2 score of 0.61579 and an MSE of 9095.94262.
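
If the model is scikit-learn's DecisionTreeRegressor (an assumption based on the parameter name), a float 'min_samples_split' is read as a fraction of the training samples, rounded up. The sketch below shows that arithmetic for the full 10886-sample count, which is only illustrative because the actual training folds are smaller, followed by a small grid search on synthetic data with hypothetical candidate values.

```python
import math

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor


def min_split_count(fraction, n_samples):
    """Samples needed to split a node when min_samples_split is a fraction."""
    return math.ceil(fraction * n_samples)


print(min_split_count(0.02, 10886))  # 218

# Synthetic stand-in data (not the Kaggle data).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Hypothetical grid around the reported value of 0.02.
fractions = [0.01, 0.02, 0.05, 0.1]
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    {"min_samples_split": fractions},
    scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # k = 5 is an assumption
)
search.fit(X, y)
print("best min_samples_split:", search.best_params_["min_samples_split"])
```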

  • Random Forest Regression

Random forest regression was used, with parameters selected by grid search. First, the parameter 'n_estimators' = 100 (the number of trees in the forest) was determined. Then the parameters 'max_features' = 0.6 (the number of features to consider when looking for the best split) and 'max_depth' = 26 (the maximum depth of the tree) were determined. Random forest regression gave a k-fold R2 score of 0.82300, an MSE of 4818.77715, and an error rate of 0.65858.
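
The two-stage search described above (number of trees first, then 'max_features' and 'max_depth') could be sketched as follows, assuming scikit-learn. The data is synthetic, and the grids are hypothetical and kept deliberately small so the example runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data (not the Kaggle data).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
cv = KFold(n_splits=3, shuffle=True, random_state=0)  # k = 3 is an assumption

# Stage 1: choose the number of trees with all other settings at their defaults.
stage1 = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [10, 30]},
    scoring="r2",
    cv=cv,
).fit(X, y)
n_trees = stage1.best_params_["n_estimators"]

# Stage 2: fix the number of trees, then tune max_features and max_depth.
stage2 = GridSearchCV(
    RandomForestRegressor(n_estimators=n_trees, random_state=0),
    {"max_features": [0.6, 1.0], "max_depth": [4, 26]},
    scoring="r2",
    cv=cv,
).fit(X, y)

print("n_estimators:", n_trees)
print("stage 2 parameters:", stage2.best_params_)
print("mean k-fold R2:", round(stage2.best_score_, 5))
```

Tuning in stages searches far fewer combinations than one joint grid, at the cost of possibly missing interactions between the parameters.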

  • XGBoost

XGBRegressor was used for regression prediction, with parameters selected by grid search: 'max_depth' = 10 (the maximum depth of a tree), 'min_child_weight' = 8 (the minimum sum of sample weights in a leaf node), 'colsample_bytree' = 0.8 (the fraction of columns sampled at random for each tree), and 'gamma' = 4 (the minimum loss reduction required to split a node). XGBRegressor gave the highest k-fold R2 score, 0.83835, the lowest mean squared error, 4414.09082, and an error rate of 0.63869.
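
For reference, the reported settings map onto the xgboost scikit-learn wrapper roughly as in this sketch. It assumes the xgboost package is installed; the helper build_xgb_model is hypothetical, and every parameter not listed above is left at the library default.

```python
# Parameter values as reported in the write-up.
REPORTED_PARAMS = {
    "max_depth": 10,
    "min_child_weight": 8,
    "colsample_bytree": 0.8,
    "gamma": 4,
}


def build_xgb_model(params=REPORTED_PARAMS):
    """Create an XGBRegressor; xgboost is imported lazily, only when called."""
    from xgboost import XGBRegressor

    return XGBRegressor(**params)


# Typical use, with X_train and y_train standing for the encoded features
# and the rental counts:
#     model = build_xgb_model()
#     model.fit(X_train, y_train)
print(REPORTED_PARAMS)
```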

Conclusion

This project covered data visualization, data analysis, data preprocessing, model selection, and parameter optimization. Because the data distribution is skewed, future work will apply a log transformation to the data and then refit the different models.
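
One possible way to try the proposed log transformation is scikit-learn's TransformedTargetRegressor, which fits on the transformed target and maps predictions back to the original scale. The sketch below is an assumption about how that future work could look, using synthetic right-skewed counts and log1p/expm1 so that zero counts are handled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic right-skewed counts (not the Kaggle data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(500, 1))
y = np.round(np.exp(1 + 3 * x[:, 0] + rng.normal(0, 0.3, 500)))

# Skewness of the target before and after the log transformation.
raw_skew = pd.Series(y).skew()
log_skew = pd.Series(np.log1p(y)).skew()
print(f"skew before: {raw_skew:.3f}, after log1p: {log_skew:.3f}")

# Fit on log1p(count); predictions are mapped back with expm1.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(x, y)
preds = model.predict(x)
print(preds[:5])
```

Any of the regressors above could be passed as the regressor argument in place of LinearRegression.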

Contributors

  • suangzi123
