spat_prediction's Introduction

Signal Phase & Timing (SPaT) Prediction

This project enables the prediction of signal phase timing (SPaT) in fixed and adaptive environments by using a combination of machine learning techniques and historical traffic data.

It has been developed to support a dissertation titled "A study of machine learning algorithms and their suitability for predicting traffic signal timing" towards an MSc in Software Engineering at the University of Oxford.

Features

  • Takes historical traffic controller signal phase and detection data to create datasets suitable for machine learning analyses
  • Enables feature extraction from data to provide signal state and phase duration
  • Implements the Classification and Regression Tree (CART) algorithm for SPaT prediction
  • Implements a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) for SPaT prediction
  • Supports the creation of plots for data analysis

Getting started

The software has been divided into five packages:

  • Pre-Processing: processes the data in the expected format, creating datasets for usage with Decision Tree and Neural Network model creation
  • Analysis: manipulates the data for analysis, creating plots for further understanding
  • Decision Tree: implements the Classification and Regression Tree (CART) algorithm and Gradient Boosting Regression (GBR) ensemble algorithm to predict SPaT
  • Neural Network: implements a Recurrent Neural Network (RNN) using Long Short-Term Memory (LSTM) to predict SPaT
  • Tools: provides a number of helper functions that are re-used throughout the other packages

Data format

The application makes use of two types of data.

Siemens IC4 Tool's Emulator Data

Data generated by the Siemens IC4 Tool's Emulator can be provided to the PreProcessing module for formatting into the suitable historical traffic data used for the learning and prediction processes.

For information on the application itself and the data format it produces, please read the Siemens handbook.

Historical Traffic Data

The prediction engine takes historical traffic data in four comma-separated (CSV) formats to create the learning models.

| Format ID | Description | Used in |
|---|---|---|
| 1 | Timestamped phase/stage with detection I/O state information | RNN LSTM |
| 2 | Timestamped phase/stage with detection I/O state information (numerical values only) | CART/GBR |
| 3 | Timestamped phase/stage without detection I/O state information (numerical values only) | CART/GBR |
| 4 | Dated phase/stage with start/end times and duration information (numerical values only) | CART/GBR |

Notes:

  • 1 and 2 differ simply in terms of data presentation (with 2 only using numerical values due to limitations with the platform used).
  • Examples of the data in the 4 formats as above are included in the project (within the 'data' folder).

Pre-requisites

The following versions (or newer) are required to run SPaT Prediction:

  • Keras - v2.0.3
  • matplotlib - v2.0.2
  • NumPy - v1.13.1
  • Pandas - v0.20.3
  • Python - v3.5.2
  • seaborn - v0.8
  • scikit-learn - v0.19.0
  • TensorFlow - v1.0.0

Author

License

This project is licensed under the Apache License 2.0. See the LICENSE file for further information.

Citation

If you use SPaT Prediction, we would appreciate a citation 😊:

A study of machine learning algorithms and their suitability for predicting traffic signal timing, Nagashima Boyd, P., University of Oxford, 2017.

spat_prediction's People

Contributors

priscillaboyd

spat_prediction's Issues

[RNN LSTM] Improve plotting description

At the moment, the matplotlib graph presented is not very descriptive. Ideally, it should have:

  • A description of the X and Y axis
  • A title
  • Further timing granularity in the X axis
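As a minimal sketch of the three improvements listed above, the snippet below labels both axes, adds a title and tightens the X-axis tick spacing. The function name `plot_predictions`, the label wording, the tick intervals and the sample values are all assumptions made for illustration; they are not taken from the project's plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator


def plot_predictions(seconds, actual, predicted):
    """Plot actual vs predicted values with labelled axes and finer X ticks."""
    fig, ax = plt.subplots()
    ax.plot(seconds, actual, label="Actual")
    ax.plot(seconds, predicted, label="Predicted")
    ax.set_title("RNN LSTM: actual vs predicted signal state")  # hypothetical title
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Signal state")
    # Finer timing granularity: a major tick every 5 s, a minor tick every 1 s.
    ax.xaxis.set_major_locator(MultipleLocator(5))
    ax.xaxis.set_minor_locator(MultipleLocator(1))
    ax.legend()
    return fig, ax


# Illustrative values only, not project data.
fig, ax = plot_predictions([0, 5, 10, 15], [0, 1, 3, 3], [0, 0, 3, 3])
```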

[RNN LSTM] Adapt to use Pandas for efficiency

At present, the RNN LSTM module makes use of 'vanilla' Python functions for loading the dataset. Ideally, the application should store the data in a Pandas data frame for efficiency.

Combine data pre-processing modules

The DataExtract, DataCleaning and DataMerge functions should be rationalised given the expected sequential actions that link them together.

[DA] Adopt Utils module to reduce duplication

At the moment, the Data Analysis module uses functionality that is already defined in the DPP's Utils module. This needs to be refactored to reduce duplication and increase loose coupling.

Read valid phases from config file

At the moment, the list of signal phases is hardcoded. This should be more dynamic, reading the applicable phases from the IC4 .8SD config file.

Example of how this looks in the file itself:
[Phase]
PhaseNo:9
RealPhase:0
PhaseRef:J
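
One possible way to read the phases is sketched below. It assumes, based only on the excerpt above, that the file holds one `[Phase]` section per phase and that each setting is a `Key:Value` line; the function name `read_phase_refs` is hypothetical.

```python
def read_phase_refs(config_text):
    """Collect the PhaseRef value of every [Phase] section in the text."""
    phases = []
    in_phase_section = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # Track whether the section we just entered is a [Phase] section.
            in_phase_section = (line == "[Phase]")
            continue
        if in_phase_section and line.startswith("PhaseRef:"):
            phases.append(line.split(":", 1)[1].strip())
    return phases


sample = """[Phase]
PhaseNo:9
RealPhase:0
PhaseRef:J
"""
print(read_phase_refs(sample))  # ['J']
```

A hand-written loop is used rather than `configparser` because repeated section names are rejected by its default strict mode.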

[RNN] Save model to 'models' folder

At the moment, the .h5 model is being saved to the same directory as the code base. Instead, the model should be saved to the "models" folder (as with the DT model) for a given set of results.

[DT] Save model testing results to file

When performing the tests against the trained model, the accuracy results / scores are output to the command line. These should be saved to file together with:

  • Date/time of analysis performed
  • Data source filename
  • Model filename (i.e. saved to file)
  • Accuracy results
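
A minimal sketch of such a results log follows, appending one CSV row per test run. The function name `append_test_result`, the column names and the file names are assumptions for illustration, and the accuracy values are placeholders rather than project results.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

FIELDS = ["analysed_at", "data_source", "model_file", "accuracy"]


def append_test_result(results_path, data_source, model_file, accuracy):
    """Append one test run to a CSV log, writing the header on first use."""
    path = Path(results_path)
    write_header = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "analysed_at": datetime.now().isoformat(),
            "data_source": data_source,
            "model_file": model_file,
            "accuracy": accuracy,
        })


# Placeholder values only.
with tempfile.TemporaryDirectory() as folder:
    results_file = Path(folder) / "dt_results.csv"
    append_test_result(results_file, "dataset.csv", "model.pkl", 0.0)
    append_test_result(results_file, "dataset.csv", "model.pkl", 0.0)
    with results_file.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
```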

[DT] Use the latest suitable sklearn dataset

At the moment, the application reads from a hardcoded CSV location to create DTs, which breaks as soon as the data lives elsewhere. The application should, as a minimum, pick up the most recently created sklearn dataset that is suitable for creating a DT.
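
One way to pick the most recent dataset is by file modification time, as sketched below. The function name `latest_dataset`, the `*.csv` pattern and the file names are assumptions; the project's real folder layout may differ.

```python
import os
import tempfile
from pathlib import Path


def latest_dataset(folder, pattern="*.csv"):
    """Return the most recently modified file in folder matching pattern."""
    candidates = list(Path(folder).glob(pattern))
    if not candidates:
        raise FileNotFoundError("no dataset matching {} in {}".format(pattern, folder))
    return max(candidates, key=lambda path: path.stat().st_mtime)


with tempfile.TemporaryDirectory() as folder:
    older = Path(folder) / "dataset_old.csv"
    newer = Path(folder) / "dataset_new.csv"
    older.write_text("a\n")
    newer.write_text("a\n")
    # Set explicit modification times so the ordering is unambiguous.
    os.utime(str(older), (1000, 1000))
    os.utime(str(newer), (2000, 2000))
    chosen = latest_dataset(folder).name
```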

Keep history of final datasets created

At present, the application overwrites any output data - i.e. it does not keep a history of datasets processed. Ideally, final datasets (i.e. only the final dataset.csv and not their raw/processed files) should be kept for comparison and testing purposes.
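
A minimal sketch of keeping such a history is shown below: the final dataset is copied into a separate folder under a timestamped name before it can be overwritten. The function name `archive_dataset` and the `history` folder name are assumptions.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_dataset(dataset_path, history_dir):
    """Copy the final dataset into history_dir under a timestamped name."""
    source = Path(dataset_path)
    target_dir = Path(history_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = target_dir / "{}_{}{}".format(source.stem, stamp, source.suffix)
    shutil.copy2(str(source), str(target))  # copy2 also preserves file timestamps
    return target


with tempfile.TemporaryDirectory() as folder:
    dataset = Path(folder) / "dataset.csv"
    dataset.write_text("phase,state\nA,0\n")
    copy_path = archive_dataset(dataset, Path(folder) / "history")
    copy_exists = copy_path.exists()
    original_exists = dataset.exists()
    copy_name = copy_path.name
```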

[RNN LSTM] Archive model as HDF5 file

At present, any model created is stored as a single model.h5 file. This archiving method needs to be improved, with associated basic info, i.e.:

  • What dataset it relates to
  • When it was created
  • Accuracy level achieved
  • The activation function used
  • The loss function used
  • The optimiser used
  • Number of epochs
  • Batch size
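
One simple option, sketched below, is to write the details listed above to a JSON file stored next to the `.h5` model. This is a swapped-in technique rather than the project's method; the function name `write_model_metadata` and the key names are assumptions, and every value shown is a placeholder.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def write_model_metadata(model_path, **details):
    """Write a JSON file next to the model describing how it was produced."""
    metadata = {
        "model_file": Path(model_path).name,
        "created_at": datetime.now().isoformat(),
    }
    metadata.update(details)
    sidecar = Path(model_path).with_suffix(".json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar


# Placeholder values only; none of these are results from the project.
with tempfile.TemporaryDirectory() as folder:
    sidecar = write_model_metadata(
        Path(folder) / "model.h5",
        dataset="dataset.csv",
        accuracy=None,
        activation="tanh",
        loss="mean_squared_error",
        optimiser="adam",
        epochs=10,
        batch_size=32,
    )
    sidecar_name = sidecar.name
    saved = json.loads(sidecar.read_text())
```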

[DT-CART] 1s records have duration showing as 86399.0

Due to the logic in the application, records that are 1 s long have their duration calculated as 86399.0 seconds. This happens because the end time is earlier than the start time, so Pandas returns 86399.0 when performing the time delta operation. There is a workaround in place for now, but this needs fixing.
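
The snippet below shows one way such a value can arise in Pandas. It is an illustration under assumptions (the timestamps are made up, and the application's real calculation is not shown here): a negative `Timedelta` is stored as minus one day plus a positive remainder, so its `.seconds` attribute reads 86399 even though the true difference is one second.

```python
import pandas as pd

# Hypothetical timestamps where the end is one second before the start.
start = pd.Timestamp("2017-01-01 10:00:01")
end = pd.Timestamp("2017-01-01 10:00:00")

delta = end - start            # Timedelta of -1 second
print(delta.days)              # -1
print(delta.seconds)           # 86399 (the remainder within the day)
print(delta.total_seconds())   # -1.0 (the signed difference)
```

Checking `total_seconds()` for a negative value is one way to detect these records before they reach the dataset.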

Read detector names from controller config file

At present, the detector names are hardcoded into the application. Ideally, the detector names should be taken from the IC4 controller config file.

Example from the .8SD file (where items in bold refer to detector names):

IOLine0:ASL1,0,I,0,1,2 LT1,A1,0,0,0,A,0,1,0,0,0,0,0,0,0,1,0,2,1,1,2,0,0,0,0,0,0,
IOLine1:BSL1,0,I,1,2,2 LT1,A2,0,0,0,A,0,2,0,0,0,0,0,0,0,2,0,2,1,1,2,0,0,0,0,0,0,
IOLine2:CSL1,0,I,2,4,2 LT1,A3,0,0,0,A,0,4,0,0,0,0,0,0,0,4,0,2,1,1,2,0,0,0,0,0,0,
IOLine3:DSL1,0,I,3,8,2 LT1,A4,0,0,0,A,0,8,0,0,0,0,0,0,0,8,0,2,1,1,2,0,0,0,0,0,0,
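
A possible parser for these lines is sketched below. It assumes that the detector name is the first comma-separated field after the colon on each `IOLine` entry (for example `ASL1`); that reading is a guess from the excerpt, and the function name `read_detector_names` is hypothetical.

```python
def read_detector_names(config_text):
    """Return the first field of every IOLine entry (assumed to be the name)."""
    names = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("IOLine"):
            continue
        # Split "IOLine0:ASL1,0,I,..." into the key and the field list.
        _, _, fields = line.partition(":")
        name = fields.split(",", 1)[0].strip()
        if name:
            names.append(name)
    return names


sample = """IOLine0:ASL1,0,I,0,1,2 LT1,A1,0,0,0,A,0,1,0,0,0,0,0,0,0,1,0,2,1,1,2,0,0,0,0,0,0,
IOLine1:BSL1,0,I,1,2,2 LT1,A2,0,0,0,A,0,2,0,0,0,0,0,0,0,2,0,2,1,1,2,0,0,0,0,0,0,
"""
print(read_detector_names(sample))  # ['ASL1', 'BSL1']
```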

Clear SUP mode data from emulated dataset

When data is emulated, the first records will have the mode stream set to "8 - SUP". These must be removed when processing an emulated dataset as they relate to sample / test records that may affect accuracy of the system.
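
A minimal sketch of the filter with Pandas follows. The column name `mode`, the function name `drop_sup_records` and the non-SUP mode value are assumptions for illustration; only the "8 - SUP" label comes from the description above.

```python
import pandas as pd


def drop_sup_records(df, mode_column="mode"):
    """Remove every record whose mode stream is set to '8 - SUP'."""
    return df[df[mode_column] != "8 - SUP"].reset_index(drop=True)


# Illustrative records; "example mode" is a placeholder, not a real mode name.
records = pd.DataFrame({
    "mode": ["8 - SUP", "8 - SUP", "example mode", "example mode"],
    "phase": ["A", "A", "A", "B"],
})
cleaned = drop_sup_records(records)
print(len(cleaned))  # 2
```

This drops SUP records wherever they occur, not only at the start of the file; if only the leading records should go, the filter would need to stop at the first non-SUP row.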

Change signal phase representation to numeric data type

The signal phase representation must use a numeric data type so that it can be processed by the RNN LSTM module. At present, the phase is represented as a string (i.e. 'Red', 'RedAmber', 'Amber' and 'Green').

Numeric representation should be:

  • Red = 0
  • RedAmber = 1
  • Amber = 2
  • Green = 3
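
The mapping above can be expressed as a lookup table, as in this minimal sketch. The names `PHASE_STATE_CODES` and `encode_states` are hypothetical; the codes themselves are the ones listed above.

```python
# Numeric codes for each signal state, as listed in this issue.
PHASE_STATE_CODES = {"Red": 0, "RedAmber": 1, "Amber": 2, "Green": 3}


def encode_states(states):
    """Convert state names to their numeric codes; unknown names raise KeyError."""
    return [PHASE_STATE_CODES[state] for state in states]


print(encode_states(["Red", "RedAmber", "Green", "Amber"]))  # [0, 1, 3, 2]
```

Raising on an unknown name, rather than silently mapping it to a default, makes unexpected values in the source data visible early.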

Create dataset with duration of signal phase

At the moment, the historical signal phase data has timestamped records. For some algorithms (e.g. decision trees), the timestamps need to be aggregated to give an idea of duration per phase. Ideally, the application should create a dataset with the duration of each phase (with the start/end time for troubleshooting as well as the phase and state).
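
One way to derive such a dataset with Pandas is sketched below: consecutive records that share a phase and state are grouped into a run, and each run keeps its first and last timestamp. The column names (`timestamp`, `phase`, `state`), the function name `phase_durations` and the sample records are assumptions, and the named-aggregation syntax needs a newer Pandas than the minimum version listed in the pre-requisites.

```python
import pandas as pd


def phase_durations(df):
    """Collapse timestamped records into one row per run of equal phase/state."""
    # A new run starts whenever the phase or the state differs from the previous row.
    changed = (df["phase"] != df["phase"].shift()) | (df["state"] != df["state"].shift())
    run_id = changed.cumsum()
    runs = df.groupby(run_id).agg(
        phase=("phase", "first"),
        state=("state", "first"),
        start=("timestamp", "first"),
        end=("timestamp", "last"),
    )
    runs["duration_s"] = (runs["end"] - runs["start"]).dt.total_seconds()
    return runs.reset_index(drop=True)


# Illustrative records at one-second intervals, not project data.
records = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-01-01 10:00:00", "2017-01-01 10:00:01", "2017-01-01 10:00:02",
        "2017-01-01 10:00:03", "2017-01-01 10:00:04",
    ]),
    "phase": ["A", "A", "A", "A", "A"],
    "state": [0, 0, 0, 3, 3],
})
result = phase_durations(records)
print(result[["phase", "state", "duration_s"]])
```

Here the duration is measured from the first to the last record of a run; measuring up to the first record of the next run instead is an equally reasonable choice and gives a longer value.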

[RNN] Refactor to increase modularity

At the moment, the RNN functionality sits within a single (monolithic) Python file. Ideally, this should be split so that generic neural network functions can be re-used (e.g. if NNs other than RNNs are evaluated for the problem in future).

[RNN] Save model to file

In order for the RNN model created to be re-used with new predictions, the models created must be saved to file.

More unit tests required

The number of unit tests for this application is very limited. Ideally, more unit tests should be developed to test the functionality.
