

transdim

MIT License | Python 3.7

Machine learning models have driven important advances in spatiotemporal data modeling, such as forecasting the near-future traffic states of road networks. But what happens when these models are built on the incomplete data commonly collected in real-world systems?

About the Project

In the transdim (transportation data imputation) project, we create machine learning models to help address some of the toughest challenges of spatiotemporal data modeling, from missing data imputation to time series prediction.

In a hurry? Jump to a section using the table of contents below.

Table of Contents
Strategic Aim
Tasks and Challenges
What we do just now!
Overview
Selected References
Our Publications
License

Strategic Aim

Creating accurate and efficient solutions for the spatiotemporal traffic data imputation and prediction tasks.

Tasks and Challenges

  • Missing data imputation

    • Random missing: Each sensor loses observations completely at random. (★★★)
    • Non-random missing: Each sensor loses its observations for several whole days. (★★★★)
  • Rolling traffic prediction

    • Forecasting without missing values. (★★★)
    • Forecasting with incomplete observations. (★★★★★)

What we do just now!

  • add a framework indicating overall studies;


Framework: Tensor completion task and its framework including data organization and tensor completion, in which traffic measurements are partially observed.

  • define the problems clearly;

    • Example: Traffic forecasting using matrix factorization models.


Real experiment setting: Observations from the first 56 days, with fiber missing rates of 0%, 20%, and 40%, are treated as stationary inputs. Rolling inputs are then used to forecast traffic speed over the last 5 days (Monday to Friday) in a rolling manner.

  • describe the core challenges intuitively;
  • list main contributions of these studies.
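The rolling forecasting setting in the example above can be sketched as index arithmetic over the 8784 time steps. This is a minimal sketch, not the project's implementation: the function name `rolling_windows` and the `horizon` parameter (how many steps are forecast per round) are assumptions, since the step size is not stated here. It also assumes that every time step before the forecast window is available as input.

```python
def rolling_windows(n_days=61, n_intervals=144, n_test_days=5, horizon=1):
    """Yield (n_observed, forecast_steps) pairs for rolling forecasting.

    n_observed     -- number of leading time steps available as input
    forecast_steps -- range of time-step indices forecast in this round
    `horizon` is a hypothetical parameter, not taken from the project.
    """
    total = n_days * n_intervals                      # 61 * 144 time steps
    start = (n_days - n_test_days) * n_intervals      # end of the stationary inputs
    for t in range(start, total, horizon):
        yield t, range(t, min(t + horizon, total))


windows = list(rolling_windows())
first_observed, first_steps = windows[0]
print(first_observed, list(first_steps), len(windows))
```

With `horizon=1`, the first round uses the 56 stationary days as input and forecasts the first interval of day 57. Each later round moves the boundary forward by one time step.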

What we care about!

  • Best algebraic structure for data imputation.
  • The context of urban transportation.
  • Data noise avoidance.
  • Competitive imputation and prediction performance.
  • Capability to handle various missing data scenarios.

Overview

With the development and application of intelligent transportation systems, large quantities of urban traffic data are collected on a continuous basis from various sources, such as loop detectors, cameras, and floating vehicles. These data sets capture the underlying states and dynamics of transportation networks and of the system as a whole, and they benefit many traffic operation and management applications, including routing, signal control, and travel time prediction. However, missing data are inevitable when collecting traffic data from intelligent transportation systems.

The data are publicly available at our Zenodo repository!

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with a third-order tensor representation, where the estimated result of road segment #1 is shown as an example. In both panels, red rectangles mark fiber missing (i.e., speed observations are lost for a whole day).
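For readers unfamiliar with the CP rank mentioned in the caption: a rank-r CP model approximates a third-order tensor as a sum of r outer products of factor vectors. The sketch below only shows that reconstruction step, using random placeholder factors. How BGCP estimates its factors is not shown, and the names `U`, `V`, `W` and `cp_reconstruct` are illustrative assumptions.

```python
import numpy as np


def cp_reconstruct(U, V, W):
    """Rebuild a tensor from CP factors: X[i, j, k] = sum_r U[i, r] * V[j, r] * W[k, r]."""
    return np.einsum("ir,jr,kr->ijk", U, V, W)


rank = 15  # CP rank r, as in the caption above
rng = np.random.default_rng(0)

# Placeholder factor matrices for (road segment, day, time interval); not fitted to data.
U = rng.standard_normal((214, rank))
V = rng.standard_normal((61, rank))
W = rng.standard_normal((144, rank))

X_hat = cp_reconstruct(U, V, W)
print(X_hat.shape)  # (214, 61, 144)
```

A low rank keeps the model compact: the three factor matrices hold (214 + 61 + 144) × 15 numbers, far fewer than the 214 × 61 × 144 entries of the full tensor.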

Machine learning models

  • Missing data imputation

The urban traffic speed data set (i.e., Guangzhou-data-set (Gdata)) records traffic speed from 214 road segments over two months (61 days, from August 1 to September 30, 2016) in Guangzhou, China. We organize the raw data into a time series matrix of size (214, 8784), where 8784 = 61 days × 144 time intervals per day. Tensor-based models take a third-order tensor of size (214, 61, 144) as input. Matrix-based models are tested on the time series matrix of size (214, 8784).
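The matrix and tensor layouts hold the same values. This minimal NumPy sketch converts between them, assuming the matrix columns are ordered day by day with 144 intervals per day. The random array is a placeholder for the real speed data, and the variable names are illustrative.

```python
import numpy as np

n_segments, n_days, n_intervals = 214, 61, 144

# Placeholder values standing in for the real speed matrix (assumption:
# columns are ordered day by day, 144 intervals per day).
rng = np.random.default_rng(0)
dense_mat = rng.uniform(10.0, 60.0, size=(n_segments, n_days * n_intervals))

# Matrix (214, 8784) -> third-order tensor (214, 61, 144).
dense_tensor = dense_mat.reshape(n_segments, n_days, n_intervals)

# Tensor -> matrix: the inverse reshape recovers the original layout.
mat_again = dense_tensor.reshape(n_segments, n_days * n_intervals)

print(dense_mat.shape, dense_tensor.shape)  # (214, 8784) (214, 61, 144)
```

Under this ordering, entry `dense_tensor[i, d, t]` equals `dense_mat[i, d * 144 + t]`.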

We consider two common missing data scenarios: random missing (RM) and non-random missing (NM). For RM, we randomly remove a certain amount of the observed entries in the matrix and use these entries as ground truth to evaluate RMSE. For NM, we run a correlated fiber missing experiment: we randomly choose a certain amount (e.g., 40%) of the (location, day) combinations and remove the whole time series in each chosen combination.
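The two scenarios can be sketched as boolean masks over the (214, 61, 144) tensor. This is a minimal illustration under stated assumptions, not the project's code: the helper names are hypothetical, the data are random placeholders, and the mean-fill "estimate" is only a naive baseline standing in for a real imputation model.

```python
import numpy as np


def keep_vector(n_items, missing_rate, rng):
    """Boolean vector with round(missing_rate * n_items) random entries set to False."""
    keep = np.ones(n_items, dtype=bool)
    n_drop = int(round(missing_rate * n_items))
    keep[rng.choice(n_items, size=n_drop, replace=False)] = False
    return keep


def random_missing_mask(shape, missing_rate, rng):
    """RM: remove individual entries at random. True marks an observed entry."""
    return keep_vector(int(np.prod(shape)), missing_rate, rng).reshape(shape)


def fiber_missing_mask(shape, missing_rate, rng):
    """NM: remove the whole daily time series of random (location, day) pairs."""
    n_loc, n_day, n_int = shape
    keep = keep_vector(n_loc * n_day, missing_rate, rng).reshape(n_loc, n_day)
    return np.repeat(keep[:, :, np.newaxis], n_int, axis=2)


def rmse(truth, estimate, held_out):
    """RMSE evaluated only on the removed (held-out) entries."""
    diff = truth[held_out] - estimate[held_out]
    return float(np.sqrt(np.mean(diff ** 2)))


rng = np.random.default_rng(0)
shape = (214, 61, 144)
dense_tensor = rng.uniform(10.0, 60.0, size=shape)  # placeholder, not real data

rm_mask = random_missing_mask(shape, 0.4, rng)
nm_mask = fiber_missing_mask(shape, 0.4, rng)

# Naive baseline for illustration: fill removed entries with the observed mean.
estimate = np.where(nm_mask, dense_tensor, dense_tensor[nm_mask].mean())
print("NM baseline RMSE:", rmse(dense_tensor, estimate, ~nm_mask))
```

Because the removed entries are known in the dense tensor, they serve as ground truth. Any imputation model can be scored by replacing `estimate` with its output.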

Selected References

Our Publications

  • Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [slide] [data] [Matlab code]

  • Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]

  • Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]

    Please consider citing our papers if they help your research.

Our Blog Posts (in Chinese)

License

This work is released under the MIT license.
