This project forked from sunnybingome/public-paper-impute-coding

An Improved k-Nearest Neighbours Method for Traffic Time Series Imputation

GSW-kNN Imputation for Time Series

We propose a new imputation method for filling in missing values in time series. In the example data file Dodgers.data.csv, missing values are marked as -1. The data comes from the UCI Machine Learning Repository; we converted it to a .csv file for convenience.
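As a minimal sketch of how the -1 marker could be handled when reading the data, the Python 3 snippet below converts it to NaN. The two-column layout (timestamp, count) without a header row, the sample rows, and the helper name load_series are assumptions made for illustration; they are not taken from the repository code.

```python
import csv
import io
import math

MISSING_MARKER = -1  # per the README, missing counts are stored as -1


def load_series(csv_file):
    """Read (timestamp, count) rows; return timestamps and values, with NaN for missing."""
    timestamps, values = [], []
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        count = float(row[1])
        timestamps.append(row[0])
        values.append(math.nan if count == MISSING_MARKER else count)
    return timestamps, values


# Hypothetical sample in the assumed layout; replace with open("Dodgers.data.csv").
sample = io.StringIO(
    "4/10/2005 0:00,-1\n"
    "4/10/2005 0:05,12\n"
    "4/10/2005 0:10,-1\n"
    "4/10/2005 0:15,9\n"
)
timestamps, values = load_series(sample)
missing_ratio = sum(math.isnan(v) for v in values) / len(values)
print(f"{len(values)} points, missing ratio {missing_ratio:.0%}")
```

Converting the marker to NaN early keeps a real count from ever being confused with a missing one in later steps.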

Usage

(Optional: before running, check whether you want to set use_spark at the beginning of the code.)

(To actually impute the missing values, for example after the inner analysis, set real_impute_NOT_param_estimate=True and enter the estimated optimal values at input In [354].)

# default data, about 5% missing:
python2 impute-1-joblib-spark-public.py 2 1 0
# other missing ratio, e.g. 10%:
python2 impute-1-joblib-spark-public.py 2 1 0 0.1

Tip: About 2500 non-missing points (5%) are selected to test the accuracy. To keep the selected points from influencing the real missing data, only 1 point is used in each config, so a full run takes hours. Spark or a multi-server setup is recommended. If 4 servers are used, run one of the following 4 commands on each of the 4 servers:

python2 impute-1-joblib-spark-public.py 2 4 0
python2 impute-1-joblib-spark-public.py 2 4 1
python2 impute-1-joblib-spark-public.py 2 4 2
python2 impute-1-joblib-spark-public.py 2 4 3
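The commands above suggest that the second and third arguments are the number of servers and a zero-based server index. Under that assumption, the Python 3 sketch below shows one way the test points could be chosen and divided so that each server evaluates a disjoint share. The function names, the fixed seed, and the round-robin split are hypothetical; the script's actual partitioning scheme may differ.

```python
import math
import random


def select_test_points(values, fraction=0.05, seed=0):
    """Pick a random sample of indices whose values are observed (not NaN)."""
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    n_test = max(1, round(len(observed) * fraction))
    return sorted(random.Random(seed).sample(observed, n_test))


def shard(points, n_servers, server_index):
    """Round-robin split: server k takes points k, k + n, k + 2n, ..."""
    return points[server_index::n_servers]


# Hypothetical series: 100 values, every 10th one missing.
values = [math.nan if i % 10 == 0 else float(i) for i in range(100)]
test_points = select_test_points(values)

# With 4 servers, server k would evaluate only shards[k].
shards = [shard(test_points, 4, k) for k in range(4)]
for k, part in enumerate(shards):
    print(f"server {k}: {len(part)} test point(s)")
```

Using the same seed on every server matters in this sketch: each server must derive the identical list of test points before taking its own slice, otherwise the shares could overlap or leave points out.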

Results

Our proposed GSW-kNN is 18% to 46% better than general kNN and 34% more accurate than the benchmark methods, and it remains robust even when the missing ratio increases to 90%.
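To make the idea of a windowed kNN with gap-dependent weights more concrete, here is a deliberately simplified Python 3 sketch. It is not the authors' GSW-kNN implementation: the 1/|d| weighting, the window size, the plain mean over the k nearest candidates, and the function name are all assumptions chosen for illustration.

```python
import math


def windowed_knn_impute(values, t, window=2, k=2):
    """Estimate values[t] from the k observed points with the most similar neighbourhoods."""
    offsets = [d for d in range(-window, window + 1) if d != 0]
    scored = []
    for c, center in enumerate(values):
        if c == t or math.isnan(center):
            continue  # a candidate must itself be observed
        num = den = 0.0
        for d in offsets:
            qi, ci = t + d, c + d
            if not (0 <= qi < len(values) and 0 <= ci < len(values)):
                continue  # window runs past the series boundary
            q, v = values[qi], values[ci]
            if math.isnan(q) or math.isnan(v):
                continue  # compare only offsets observed in both windows
            weight = 1.0 / abs(d)  # assumption: offsets nearer the gap count more
            num += weight * (q - v) ** 2
            den += weight
        if den > 0:
            scored.append((num / den, c))
    if not scored:
        return math.nan
    nearest = sorted(scored)[:k]
    return sum(values[c] for _, c in nearest) / len(nearest)


# Hypothetical periodic series (period 4) with one missing value at index 5.
series = [1.0, 5.0, 9.0, 5.0, 1.0, math.nan, 9.0, 5.0, 1.0, 5.0, 9.0, 5.0]
estimate = windowed_knn_impute(series, t=5)
print(estimate)  # 5.0
```

In this toy series the two best matches are the points one period before and one period after the gap, whose windows agree exactly with the window around the missing value, so the estimate recovers the periodic value.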

Citation Request

[IEEE Format] B. Sun, L. Ma, W. Cheng, W. Wen, P. Goswami, and G. Bai, “An Improved k-Nearest Neighbours Method for Traffic Time Series Imputation,” in Chinese Automation Congress (CAC), Jinan, China, 2017.

[AAA Format] Bin Sun, Liyao Ma, Wei Cheng, et al. 2017. An Improved K-Nearest Neighbours Method for Traffic Time Series Imputation. In Chinese Automation Congress (CAC). Jinan, China: IEEE.

[Bibtex]:

@inproceedings{sun2017improved,
  location = {{Jinan, China}},
  title = {An {{Improved}} K-{{Nearest Neighbours Method}} for {{Traffic Time Series Imputation}}},
  abstract = {Intelligent transportation systems (ITS) are becoming more and more effective, benefiting from big data. Despite this, missing data is a problem that prevents many prediction algorithms in ITS from working effectively. Much work has been done to impute those missing data. Among different imputation methods, k-nearest neighbours (kNN) has shown excellent accuracy and efficiency. However, the general kNN is designed for matrix instead of time series so it lacks the usage of time series characteristics such as windows and weights that are gap-sensitive. This work introduces gap-sensitive windowed kNN (GSW-kNN) imputation for time series. The results show that GSW-kNN is 34\% more accurate than benchmarking methods, and it is still robust even if the missing ratio increases to 90\%.},
  eventtitle = {Chinese Automation Congress (CAC)},
  booktitle = {Chinese {{Automation Congress}} ({{CAC}})},
  publisher = {{IEEE}},
  author = {Sun, Bin and Ma, Liyao and Cheng, Wei and Wen, Wei and Goswami, Prashant and Bai, Guohua},
  date = {2017-10},
  keywords = {Traffic Time Series, Gap-Sensitive Windowed k-Nearest Neighbours (GSW-kNN), Missing Data Imputation}
}

Paper

The paper is available on Diva and ResearchGate.
