derekeden / trackio

trackio - A Python approach to working with spatiotemporal mass movement data.


Overview

This is an open-source library for working with mass movement (track) data.

The library contains functionality to ingest, clean/repair, perform I/O conversions, process, analyze, and visualize mass movement data. In addition, it provides a robust feature engineering framework for downstream AI/ML applications.

Examples include data from vessels (AIS), planes (ADS-B), cars, animal migrations, Agent Based Modelling (ABM) output, hurricane paths, particles, etc. In short, the library handles any movement data stored in spatiotemporal (X, Y, Time) point format. 3D data is also supported, with Z stored as a dynamic data field.

Below is an example of such data from an AIS vessel dataset:

[Figure: AIS_DATA]
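
As a minimal sketch of the (X, Y, Time) point format described above, the snippet below builds a small table of points with pandas. The column names (`agent_id`, `time`, `x`, `y`, `z`) and the values are assumptions made up for illustration; they are not the names trackio requires.

```python
import pandas as pd

# Hypothetical column names and values, chosen for illustration only.
points = pd.DataFrame(
    {
        "agent_id": ["A", "A", "A", "B", "B"],
        "time": pd.to_datetime(
            [
                "2021-01-01 00:00:00",
                "2021-01-01 00:01:00",
                "2021-01-01 00:02:00",
                "2021-01-01 00:00:30",
                "2021-01-01 00:01:30",
            ]
        ),
        "x": [-70.00, -70.01, -70.02, -69.50, -69.51],
        "y": [42.00, 42.01, 42.02, 41.80, 41.81],
        # Z is carried as just another per-point (dynamic) field.
        "z": [0.0, 0.0, 0.0, -5.0, -5.5],
    }
)

# Sort so that each agent's points are in chronological order.
points = points.sort_values(["agent_id", "time"]).reset_index(drop=True)
print(points.groupby("agent_id").size())
```

Each row is one observation of one agent, so a single table can hold many agents at once.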

The library is able to natively ingest data in formats such as:

  • CSV and other ASCII files
  • Feather files
  • Pandas DataFrames
  • GeoPandas GeoDataFrames

It can be extended to any data format as long as you can get it into a valid native format with upstream custom code.

Unlike other similar libraries, trackio was designed for memory-safe processing of arbitrarily sized, heterogeneous datasets on a modest machine. For example, if you have a huge amount of data spread across hundreds of files, with potentially different column names in every file, and a machine with limited CPU/RAM, this library will still work!
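
To make the idea concrete, here is a rough sketch of memory-bounded processing over files with different column names, using only pandas. It is not trackio's implementation: the `COLUMN_MAP` dictionary, the `count_points_per_agent` helper, and the in-memory "files" are all hypothetical stand-ins.

```python
import io

import pandas as pd

# Two in-memory stand-ins for raw files that name the same fields differently.
file_a = io.StringIO(
    "MMSI,BaseDateTime,LON,LAT\n"
    "1,2021-01-01 00:00:00,-70.0,42.0\n"
    "1,2021-01-01 00:01:00,-70.1,42.1\n"
    "2,2021-01-01 00:00:00,-69.0,41.0\n"
)
file_b = io.StringIO(
    "id,timestamp,longitude,latitude\n"
    "3,2021-01-01 00:00:00,-68.0,40.0\n"
    "3,2021-01-01 00:05:00,-68.1,40.1\n"
)

# Hypothetical mapping from raw column names to one set of standard names.
COLUMN_MAP = {
    "MMSI": "agent_id", "id": "agent_id",
    "BaseDateTime": "time", "timestamp": "time",
    "LON": "x", "longitude": "x",
    "LAT": "y", "latitude": "y",
}


def count_points_per_agent(files, chunksize=2):
    """Count points per agent while holding only one chunk in memory."""
    counts = {}
    for f in files:
        # chunksize makes read_csv yield DataFrames of at most that many rows.
        for chunk in pd.read_csv(f, chunksize=chunksize):
            chunk = chunk.rename(columns=COLUMN_MAP)
            for agent_id, n in chunk.groupby("agent_id").size().items():
                key = str(agent_id)
                counts[key] = counts.get(key, 0) + int(n)
    return counts


counts = count_points_per_agent([file_a, file_b])
print(counts)
```

Peak memory depends on the chunk size rather than on the total size of the dataset, which is the property the paragraph above describes.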

trackio also has heavy operations coded in parallel to harness available computational power for faster analyses on large datasets.

trackio is also interoperable with Dask bags, which leaves plenty of room for custom functionality and processing, or for integrating with other tools.

trackio is largely based on NumPy, pandas, GeoPandas, Dask, scikit-learn, GDAL, and Python's pickle module.

[Figure: LIBRARIES]

Installation

This library works on Windows (including WSL2) and Linux.

This library was written using Python 3.10, and it is recommended to use this version. It has not been tested on other versions.

This library uses specific GDAL and rasterio wheels, so it's recommended to create a fresh environment.

If using Anaconda/Miniconda, run this from the trackio folder:

conda create -n trackio python=3.10

conda activate trackio

pip install .

Please note that the inpoly library requires the Visual Studio C++ Dev Tools on Windows. If you don't already have them, the link will be in the error message; follow the instructions to download them, then redo the trackio installation.

Data Structures

The trackio library is structured so Dataset is the main class that you interface with. The Dataset class is a custom class that contains references to stored data, metadata about this data, and a series of attributes and methods that make it easy to operate on the data.

When you make a Dataset, all of the data will be stored in the Dataset.data_path folder. This folder contains a series of files, each one holding the data for one unique agent in the dataset. Each of these files contains an Agent class object.

The Agent class is a custom class that represents a unique agent in the Dataset. It holds the metadata and dynamic data pertaining to one vessel, one animal, one car, etc.

The actual tracks associated with any given agent are simply stored as pandas DataFrames in the Agent.tracks attribute. Agent.tracks is a dictionary of (Track ID, Track DataFrame) key-value pairs.

In the Dataset.data_path folder, you will also find dataset.db, agent.db, and track.db files. The various .db files contain metadata at the dataset, agent, and track levels.

This is illustrated in the figure below:

[Figure: STRUCTURE]
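
The layout described above (one file per agent, with tracks held as a dictionary of DataFrames) can be sketched with the standard library and pandas. This is an illustration only: a plain dictionary stands in for the Agent class, and the `.pkl` file name, the keys, and the values are assumptions rather than trackio's actual on-disk format.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# A plain dict standing in for an Agent object (hypothetical layout).
agent = {
    "agent_id": "A",
    "meta": {"kind": "vessel"},
    # Track ID -> DataFrame of that track's points.
    "tracks": {
        "T0": pd.DataFrame(
            {
                "time": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:01"]),
                "x": [0.0, 10.0],
                "y": [0.0, 5.0],
            }
        ),
        "T1": pd.DataFrame(
            {
                "time": pd.to_datetime(["2021-01-02 00:00"]),
                "x": [50.0],
                "y": [20.0],
            }
        ),
    },
}

# A temporary folder plays the role of the data folder: one file per agent.
with tempfile.TemporaryDirectory() as data_path:
    agent_file = Path(data_path) / f"{agent['agent_id']}.pkl"
    with open(agent_file, "wb") as fh:
        pickle.dump(agent, fh)
    with open(agent_file, "rb") as fh:
        restored = pickle.load(fh)

for track_id, df in restored["tracks"].items():
    print(track_id, len(df))
```

Storing each agent separately means only the agents needed for an operation have to be read into memory.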

The dataset.db file contains metadata about the actual Dataset in its entirety:

[Figure: DATASET]

The agent.db file contains a GeoPandas GeoDataFrame of metadata about all of the agents in the Dataset. The geometry column contains a bounding box for each agent's data:

[Figure: AGENT]

The track.db file contains similar information, but for each track in the Dataset:

[Figure: TRACK]

Every agent and track is tagged with a unique identifier for record-keeping purposes and consistency throughout analyses.

These tables serve as a very lightweight, poor man's version of SQL lookup tables.
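
The lookup-table idea can be sketched with a plain pandas DataFrame: filter the small metadata table first, then open only the files that matter. The column names (`file`, `n_tracks`, `xmin`, ...) and the `agents_overlapping` helper are hypothetical, and plain bounding-box columns are used here in place of a GeoDataFrame geometry column.

```python
import pandas as pd

# Hypothetical agent-level metadata table with one row per agent.
agent_table = pd.DataFrame(
    {
        "agent_id": ["A", "B", "C"],
        "file": ["A.pkl", "B.pkl", "C.pkl"],
        "n_tracks": [2, 1, 4],
        "xmin": [-70.2, -65.0, -69.8],
        "xmax": [-70.0, -64.0, -69.0],
        "ymin": [42.0, 40.0, 41.0],
        "ymax": [42.3, 41.0, 41.6],
    }
).set_index("agent_id")


def agents_overlapping(table, xmin, ymin, xmax, ymax):
    """Return rows whose bounding box overlaps the query box."""
    hit = (
        (table["xmin"] <= xmax)
        & (table["xmax"] >= xmin)
        & (table["ymin"] <= ymax)
        & (table["ymax"] >= ymin)
    )
    return table.loc[hit]


selected = agents_overlapping(agent_table, -70.5, 41.5, -69.5, 42.5)
files_to_load = selected["file"].tolist()
print(files_to_load)
```

Because the table is tiny compared with the track data, this kind of query is cheap even for very large datasets.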

Functionality

The lists below summarize the functionality included in this library. For more information and detailed examples, please refer to the Documentation section and the example notebooks.

Data Ingestion

  • From CSV and other pandas-friendly ASCII files
  • From Feather format
  • From pandas DataFrame containing points
  • From GeoPandas GeoDataFrame containing LineStrings of trajectories
  • Clipping raw data to polygon or bounding box prior to ingestion

Static / Dynamic Data Field Mapping

  • Scanning and mapping raw data columns to standard names
  • Extension of standard name mapper to handle any new format of data automatically
  • Scanning of unique data fields across all files in dataset
  • Mapping of data fields to custom values during data ingestion (e.g. converting text descriptions to integer codes)
  • Generation of new static/dynamic fields by mapping existing data fields (e.g. adding a L/M/H speed description based on object speed)
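
The last two mapping ideas in the list above can be sketched in plain pandas. The column names, the `STATUS_CODES` lookup, and the speed bin edges are assumptions chosen for illustration, not values or names taken from trackio.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "Status": ["under way", "moored", "under way", "at anchor"],
        "SOG": [12.5, 0.1, 4.0, 0.0],  # speed over ground, arbitrary units
    }
)

# Hypothetical lookup: text descriptions -> integer codes.
STATUS_CODES = {"under way": 0, "at anchor": 1, "moored": 5}
raw["status_code"] = raw["Status"].map(STATUS_CODES)

# New field derived from an existing one: a Low/Medium/High speed label.
# The bin edges (1.0 and 10.0) are arbitrary example thresholds.
raw["speed_class"] = pd.cut(
    raw["SOG"],
    bins=[-float("inf"), 1.0, 10.0, float("inf")],
    labels=["L", "M", "H"],
)

print(raw)
```

Values missing from the lookup become NaN with `Series.map`, which makes unmapped categories easy to spot.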

Splitting Points into Tracks

  • Spatiotemporal threshold based splitting
  • Modified spatiotemporal threshold splitting to handle duplicated agent identifiers (e.g. MMSI for AIS vessels)
  • K-Means based clustering and splitting of points into tracks
  • DBSCAN based clustering and splitting of points into tracks
  • Splitting of tracks into sub-tracks based on data fields (e.g. splitting a track into sub-tracks where splits occur at major stops)
  • Reconnecting "bad" splits caused by erroneous data

[Figure: SPLITTING]
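
A simplified sketch of spatiotemporal threshold based splitting is shown below. It starts a new track whenever the time gap or the jump in distance between consecutive points exceeds a threshold. The `split_by_thresholds` function, its parameters, and the column names are hypothetical; it assumes one agent's points in projected coordinates and does not reproduce trackio's own splitter.

```python
import numpy as np
import pandas as pd


def split_by_thresholds(points, max_gap_s, max_jump):
    """Label points with a track number based on time and distance gaps.

    Assumes a single agent, with x/y in the same units as max_jump.
    """
    pts = points.sort_values("time").reset_index(drop=True)
    dt = pts["time"].diff().dt.total_seconds()
    dist = np.hypot(pts["x"].diff(), pts["y"].diff())
    # The first row has NaN gaps, which compare as False (no split).
    new_track = (dt > max_gap_s) | (dist > max_jump)
    # Each True starts a new track, so the running sum is the track number.
    pts["track_id"] = new_track.cumsum()
    return pts


t0 = pd.Timestamp("2021-01-01")
points = pd.DataFrame(
    {
        "time": t0 + pd.to_timedelta([0, 60, 120, 4000, 4060, 4120], unit="s"),
        "x": [0.0, 100.0, 200.0, 300.0, 400.0, 5000.0],
        "y": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    }
)

split = split_by_thresholds(points, max_gap_s=600, max_jump=1000.0)
print(split["track_id"].tolist())
```

In this example the first split comes from a long time gap and the second from a large spatial jump.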

Processing and Analysis

  • Reprojection to different CRS
  • Spatial/temporal resampling
  • Interpolation to dataset-wide global time axis
  • Smoothing of sharp corners
  • Decimation/simplification of tracks
  • Simplification of track stops
  • Imprinting geometry into tracks
  • Routing tracks through "cost rasters"
  • Computation of track coursing
  • Computation of track turning rate
  • Computation of track speed
  • Computation of track acceleration
  • Computation of track sinuosity
  • Computation of track radius of curvature
  • Interpolating raster data onto tracks
  • Computation of track fractal dimension
  • Extraction of characteristic tracks
  • Computation of encounters between tracks
  • Computation of track intersections
  • Computation of closest encounter distance to object(s)
  • Computation of closest encounters between tracks
  • Computation of distance proximity distributions between tracks
  • Extraction of lateral position distributions along tracks
  • Extraction of arbitrary data distributions along tracks
  • Time spent inside polygon(s)
  • Generation of unstructured graphs (flow maps) from track data
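
Several of the per-track quantities listed above (speed, course, turning rate, acceleration, sinuosity) can be approximated with finite differences. The sketch below uses NumPy and pandas only; `add_kinematics` and `sinuosity` are hypothetical helper names, the track is assumed to be in projected coordinates (metres) and sorted by time, and sinuosity follows one common definition (path length divided by straight-line distance), which may differ from the one trackio uses.

```python
import numpy as np
import pandas as pd


def add_kinematics(track):
    """Add speed, course, turning rate and acceleration columns."""
    out = track.copy()
    dt = out["time"].diff().dt.total_seconds()
    dx = out["x"].diff()
    dy = out["y"].diff()
    out["speed"] = np.hypot(dx, dy) / dt
    # Course in degrees, clockwise from north (+y axis).
    out["course"] = np.degrees(np.arctan2(dx, dy)) % 360.0
    # Change in course wrapped to [-180, 180), per second.
    dcourse = (out["course"].diff() + 180.0) % 360.0 - 180.0
    out["turn_rate"] = dcourse / dt
    out["accel"] = out["speed"].diff() / dt
    return out


def sinuosity(track):
    """Path length divided by straight-line distance between the end points."""
    path_len = np.hypot(track["x"].diff(), track["y"].diff()).sum()
    straight = np.hypot(
        track["x"].iloc[-1] - track["x"].iloc[0],
        track["y"].iloc[-1] - track["y"].iloc[0],
    )
    # A closed loop has zero straight-line distance, so this is undefined there.
    return path_len / straight


track = pd.DataFrame(
    {
        "time": pd.Timestamp("2021-01-01") + pd.to_timedelta([0, 10, 20], unit="s"),
        "x": [0.0, 0.0, 100.0],
        "y": [0.0, 100.0, 100.0],
    }
)

kin = add_kinematics(track)
print(kin[["speed", "course", "turn_rate"]])
print(sinuosity(track))
```

The first row has no previous point, so its derived values are NaN; the turning rate needs two previous points and is NaN for the first two rows.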

Another unique aspect of trackio is the ability to "classify" points along a track. This means assessing some True or False condition at each point along a track. This could represent whether an agent is inside a polygon, above a given speed threshold, stopped, turning, accelerating/decelerating, within a certain distance of a target object, etc.

This is illustrated in the image below:

[Figure: CLASSIFYING]

When leveraged, this information can be extremely valuable for performing multi-layered filtering and conditional operations.

For example, one could combine distance to known fishing grounds, a polygon, and track sinuosity/speed/turning thresholds to identify segments of tracks where fishing activity was occurring.

As track data is classified, the metadata tables are updated. This way you can use these classifications as part of filters later on to select or process the data further.
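
The classification idea can be sketched as boolean columns on a track's DataFrame, which can then be combined for layered filtering. The column names, the bounding box, and the speed threshold below are assumptions for illustration; a bounding box stands in for a polygon, and this does not use trackio's own classification methods.

```python
import pandas as pd

track = pd.DataFrame(
    {
        "x": [0.0, 5.0, 10.0, 15.0, 20.0],
        "y": [0.0, 5.0, 10.0, 15.0, 20.0],
        "speed": [8.0, 1.0, 0.5, 6.0, 1.0],
    }
)

# Classification 1: is the point inside a box (a stand-in for a polygon)?
xmin, ymin, xmax, ymax = 4.0, 4.0, 16.0, 16.0
track["in_box"] = track["x"].between(xmin, xmax) & track["y"].between(ymin, ymax)

# Classification 2: is the point below an example speed threshold?
track["slow"] = track["speed"] < 1.5

# Layered condition: slow AND inside the box.
track["candidate"] = track["in_box"] & track["slow"]

# Track-level flags of the kind a metadata table could record.
track_flags = {name: bool(track[name].any()) for name in ["in_box", "slow", "candidate"]}

print(track)
print(track_flags)
```

Keeping a per-track summary of each flag is what allows later filters to select tracks without reopening the point data.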

I/O Conversions

  • Pandas DataFrame
  • GeoPandas GeoDataFrame, 1 feature for each track
  • GeoPandas GeoDataFrame, 1 feature for each segment
  • Dask bag for custom processing
  • Rasterized track counts
  • Rasterized track attributes (e.g. time spent in pixel)
  • Representative unstructured graph
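
As a rough illustration of rasterized track counts, the sketch below counts how many distinct tracks have at least one point in each pixel of a regular grid. `rasterize_track_counts` and its parameters are hypothetical. It is a simplification: only sampled points are counted, so pixels crossed between two points are missed, and it is not trackio's rasterization routine.

```python
import numpy as np
import pandas as pd


def rasterize_track_counts(points, xmin, ymin, cell, ncols, nrows):
    """Count distinct tracks with at least one point in each grid cell.

    Row 0 of the returned array is the bottom row (starting at ymin).
    """
    col = np.floor((points["x"] - xmin) / cell).astype(int)
    row = np.floor((points["y"] - ymin) / cell).astype(int)
    inside = (col >= 0) & (col < ncols) & (row >= 0) & (row < nrows)
    cells = pd.DataFrame({"track_id": points["track_id"], "row": row, "col": col})
    # Keep one record per (track, cell) so a track is counted once per pixel.
    cells = cells[inside].drop_duplicates()
    grid = np.zeros((nrows, ncols), dtype=int)
    np.add.at(grid, (cells["row"].to_numpy(), cells["col"].to_numpy()), 1)
    return grid


points = pd.DataFrame(
    {
        "track_id": [0, 0, 0, 1, 1],
        "x": [0.5, 0.6, 1.5, 0.2, 1.5],
        "y": [0.5, 0.4, 0.5, 0.2, 1.5],
    }
)

grid = rasterize_track_counts(points, xmin=0.0, ymin=0.0, cell=1.0, ncols=2, nrows=2)
print(grid)
```

`np.add.at` is used instead of plain indexed assignment because it accumulates correctly when the same cell index appears more than once.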

Documentation

There are a series of Jupyter notebooks available in the notebooks folder.

The example notebooks showcase the full range of functionality for the library, as well as three specific case studies.

There are complete docstrings for public functions and classes. At this time, there are no docstrings for private/utility functions that are not exposed to the user.
