2018-NHTS-Data-Challenge

Working space for NHTS Data Challenge.

Organization

  • src: Any R scripts (*.R and *.Rmd) and result files (*.html).

    • bootstrap.*: Comparison of weighted m-out-of-n bootstrap and delete-a-group jackknife resampling.

    • Cramer_s_V.*: Correlation between USES_TNC and demographic features.

    • BBN.*: Bayesian Belief Network of USES_TNC and demographic features.

    • BBN-with-Missing.*: Missing value imputation and missing pattern encoding in BBN.

    • Bootstrapped-Poisson-Regression.*: Bootstrapped Poisson Regression.

    • Semantic-Analysis.*: Hierarchical n-gram model of transportation transformation.

  • data: Derived variable configuration.

  • result: Output data, figures and tables.

    • data: Markov chain of transportation transformation obtained by partial-pooling model.

    • fig: Model visualizations.

    • table: Models or methods comparison.

Install

This project depends on R. You will need to install several R packages for this project:

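# R pipeline of NHTS data
if (!requireNamespace("summarizeNHTS", quietly = TRUE)) {
  install.packages("devtools")
  devtools::install_github("Westat-Transportation/summarizeNHTS")
}
library(summarizeNHTS)
# List of other packages this project depends on
packages <- c("dplyr", "survey", "vcd", "knitr", "bnlearn", "Rgraphviz", "missForest")
# require() accepts a single package name, so check and load each package in turn
for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (pkg == "Rgraphviz") {
      # Rgraphviz is distributed through Bioconductor, not CRAN
      if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
      BiocManager::install("Rgraphviz")
    } else {
      install.packages(pkg)
    }
  }
  library(pkg, character.only = TRUE)
}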

Method

  • Resampling Fitting

    This work is inspired by the idea of function fitting. First, I chose a resampling ratio as a hyperparameter (0.1 in this project). I then used a weighted m-out-of-n bootstrap to mimic the delete-a-group jackknife; the two gave very close estimates and standard errors for marginal and conditional probabilities (some results are shown in src/bootstrap.html). I also compared them on a more complicated statistic, Cramer's V, and applied the same bootstrap strategy to the Bayesian Belief Network and the Poisson regression. One clear advantage is that the bootstrap returns a full distribution, from which any statistic can be computed. Moreover, the bootstrap is known to be consistent in more cases than the jackknife, and some modern models and algorithms, such as BBN, rely on it to some degree. For Poisson regression, bootstrapping is essentially the only option, since a weighted count is no longer a count variable. Together, this broadens and deepens what can be explored in survey data.

  • Bayesian Belief Network with Missing Pattern

    BBN can be a useful tool for encoding correlation patterns between TNC usage and demographic features (education level, gender, race, age level and health condition). To make the most of the data, I first applied a random forest for nonparametric imputation of missing values, then generated new Boolean variables to encode the missing pattern, and finally included them all in BBN learning. It turned out that some correlations, such as the one with health condition, would be overestimated if entries with missing values were simply removed, whereas education level and age level appear to be reliably strong features.

  • Hierarchical Semantic Model

    This work is inspired by the n-gram model in NLP. I used a demographic feature (education level) to build hierarchical models of transportation transformation, and the partial-pooling model outperformed both the fully pooled (non-hierarchical) model and the no-pooling model.
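
The weighted m-out-of-n bootstrap described under Resampling Fitting can be sketched in base R as follows. This is only one plausible reading of the idea, not the code in src/: the toy data, the helper name boot_proportion, and the choice to draw rows with probability proportional to the survey weight are all assumptions made for illustration.

```r
# Hypothetical toy data: one row per respondent, with a survey weight
# and a 0/1 indicator (standing in for something like USES_TNC).
set.seed(42)
n <- 2000
toy <- data.frame(
  weight   = runif(n, min = 0.5, max = 3),
  uses_tnc = rbinom(n, size = 1, prob = 0.1)
)

# Weighted m-out-of-n bootstrap of a proportion (illustrative helper).
# Each replicate draws m = ceiling(ratio * n) rows with replacement, with
# selection probability proportional to the weight (an assumption), and
# then takes the plain mean of the resampled indicator.
boot_proportion <- function(x, w, ratio = 0.1, B = 1000) {
  stopifnot(length(x) == length(w), ratio > 0, ratio <= 1)
  n <- length(x)
  m <- ceiling(ratio * n)
  replicate(B, {
    idx <- sample.int(n, size = m, replace = TRUE, prob = w)
    mean(x[idx])
  })
}

reps <- boot_proportion(toy$uses_tnc, toy$weight, ratio = 0.1, B = 1000)

# The result is a full distribution, so any summary can be read off it.
boot_est <- mean(reps)
boot_se  <- sd(reps)
boot_ci  <- quantile(reps, probs = c(0.025, 0.975))

# Reference point: the usual weighted estimate on the full sample.
weighted_est <- weighted.mean(toy$uses_tnc, toy$weight)
```

In this sketch the ratio is the tuning knob: a smaller ratio gives a smaller m and a wider spread of replicates, so it can be adjusted until boot_se is close to a jackknife standard error computed separately.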

Results

  • Bayesian Belief Network with Missing Values

    • Cramer's V

    • BBN

    • BBN with Missing

  • Bootstrapped Poisson Regression

    • Parameters

  • Hierarchical N-gram Model of Transportation Transformation

    • Model Comparison

    • Partial-pooling Model Visualization (local)
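
For readers unfamiliar with partial pooling, the idea behind the hierarchical transition model can be sketched in base R. This is a simplified stand-in, not the model fitted in src/: it shrinks each group's transition counts toward the pooled transition probabilities, and the toy trips, the helper names, and the prior strength k are hypothetical.

```r
# Hypothetical toy data: consecutive mode pairs (from -> to), each tagged
# with an education group.
modes <- c("walk", "car", "transit")
trips <- data.frame(
  group = c("low", "low", "low", "low",
            "high", "high", "high", "high", "high", "high"),
  from  = c("walk", "walk", "car", "car",
            "walk", "car", "car", "car", "transit", "transit"),
  to    = c("car", "transit", "car", "walk",
            "car", "car", "car", "transit", "car", "walk")
)

# Count transitions into a from-by-to matrix over a fixed set of modes.
count_transitions <- function(d) {
  tab <- table(factor(d$from, levels = modes), factor(d$to, levels = modes))
  matrix(tab, nrow = length(modes), dimnames = list(from = modes, to = modes))
}

# Row-normalise counts; rows with no observations fall back to uniform.
row_probs <- function(counts) {
  totals <- rowSums(counts)
  probs <- counts / ifelse(totals == 0, 1, totals)
  probs[totals == 0, ] <- 1 / ncol(counts)
  probs
}

# Partial pooling as shrinkage: each group's row is a blend of its own
# counts and k pseudo-observations spread according to the pooled row.
# k = 0 gives the no-pooling estimate; a very large k approaches full pooling.
partial_pool <- function(d, k = 5) {
  pooled <- row_probs(count_transitions(d))
  lapply(split(d, d$group), function(g) {
    counts <- count_transitions(g)
    (counts + k * pooled) / (rowSums(counts) + k)
  })
}

partial <- partial_pool(trips, k = 5)
```

Here the "low" group has no trips starting from transit, so its transit row is borrowed entirely from the pooled data, while rows with more observations stay closer to the group's own frequencies.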

Build

Download the 2017 NHTS datasets to ~/data/cvs/, replace derived_variable_config.csv with the version in data, and knit src/*.Rmd.
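
The knitting step can be scripted. The helper below is a minimal sketch and not part of the repository; it assumes the rmarkdown package is installed (it is not in the package list above) and that the working directory is the repository root. The name knit_all and its render argument are made up for illustration.

```r
# Illustrative helper: render every .Rmd file found in a directory.
# `render` defaults to rmarkdown::render and is only looked up when a file
# is actually rendered; it is a parameter so the loop can be dry-run.
knit_all <- function(dir = "src", render = rmarkdown::render) {
  rmd_files <- sort(list.files(dir, pattern = "\\.Rmd$", full.names = TRUE))
  for (f in rmd_files) {
    message("Knitting ", f)
    render(f)
  }
  invisible(rmd_files)
}

# Usage, from the repository root:
# knit_all("src")
```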

Contributors

  • xiaobw95
