mobility-metrics's Introduction

Mobility Metrics


SharedStreets Mobility Metrics is an open source command line interface (CLI) and frontend for ingesting and analyzing Mobility Data Specification (MDS) mobility data. It reads raw MDS data and aggregates it into useful, privacy-protecting metrics for long-term storage and analysis. Raw data is not persisted after aggregation.

Metrics

Summary:

Total vehicles: Total number of vehicles that were on the street at any time during the specified day. This includes all vehicles that were available, unavailable or reserved according to the event types specified here.

Active vehicles: Total number of vehicles that completed at least one trip during the specified day. (Trips)

Total trips: Total number of trips taken throughout the specified day.

Total trips distance: Total miles traveled by any vehicles throughout the specified day. (trip_distance)

Vehicle Utilization: Percentage of vehicles that were active over the course of a day.

Average distance per vehicle: Total trips distance, divided by active vehicles

Average trips per active vehicle: Total trips, divided by active vehicles

Average trip distance: Total trips distance, divided by total trips

Average trip duration: Total trips duration, divided by total trips
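The derived summary metrics above are simple ratios of the daily totals. A minimal sketch, assuming a hypothetical `totals` object whose field names are illustrative only and not the tool's actual schema, and reading Vehicle Utilization as active vehicles over total vehicles:

```javascript
// Sketch only: field names on `totals` are hypothetical, not the tool's schema.
function deriveSummary(totals) {
  const { totalVehicles, activeVehicles, totalTrips, totalDistance, totalDuration } = totals;
  // Return 0 instead of NaN/Infinity on days with no vehicles or trips.
  const ratio = (numerator, denominator) => (denominator > 0 ? numerator / denominator : 0);
  return {
    // Assumed reading of "percentage of vehicles that were active".
    vehicleUtilization: ratio(activeVehicles, totalVehicles) * 100,
    avgDistancePerVehicle: ratio(totalDistance, activeVehicles),
    avgTripsPerActiveVehicle: ratio(totalTrips, activeVehicles),
    avgTripDistance: ratio(totalDistance, totalTrips),
    avgTripDuration: ratio(totalDuration, totalTrips)
  };
}
```

The zero guard is a design choice for this sketch; the tool may report empty days differently.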

Fleet:

Available: Vehicles deployed and ready to be activated by a rider

Unavailable: Vehicles deployed but unable to start a trip (awaiting maintenance, depleted battery, etc.)

Reserved: Vehicles actively engaged in a trip

Geographic Time Filtered:

Each time filtered metric is aggregated by street, by hexbin, and optionally by custom polygon zones.

Trip Volume: The number of vehicles that moved over a street or in a zone during the time window specified.

Availability: The maximum number of vehicles that were available to users during the time window specified.

On-street: The maximum number of vehicles that are on the street and available or unavailable during the time window specified.

Pickups: The total number of trips that began during a time window

Dropoffs: The total number of trips that ended during a time window

Flows: The number of trips that went from one area of the city to another area of the city, sometimes referred to as origin/destination data or “O/D pairs”
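To illustrate how origin/destination flows might be counted and then thinned by a minimum record count (the role the privacyMinimum option plays in the configuration), here is a minimal sketch. The `origin` and `destination` fields and the "keep counts at or above the minimum" rule are assumptions for illustration, not the tool's confirmed behavior:

```javascript
// Sketch only: trips are assumed to carry hypothetical `origin` and
// `destination` zone ids; the real aggregation works on streets, hexbins, or zones.
function countFlows(trips, privacyMinimum) {
  const counts = new Map();
  for (const trip of trips) {
    const key = `${trip.origin}>${trip.destination}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Drop any O/D pair observed fewer times than the privacy minimum.
  const flows = {};
  for (const [key, count] of counts) {
    if (count >= privacyMinimum) flows[key] = count;
  }
  return flows;
}
```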

Requirements

  • OSX or Linux (docker or WSL is recommended for Windows users)
  • Node.js v11
  • Valid MDS credentials for at least one live MDS Provider API supporting MDS v0.3 or higher

API

GET /data/{YYYY-MM-DD}/{provider}

Returns raw metrics data for a provider for the specified day. This endpoint is used by the frontend, and can be used to power alternate UIs or scripted analysis.

GET /reports/{YYYY-MM-DD}/{provider}

Serves an html report that visualizes the metrics data for the specified day using maps, charts, and summaries.
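For scripted analysis, the two paths above can be built from a day and a provider name. A small sketch; `metricsPaths` is a hypothetical helper, and the host in the usage comment is an assumption about where the generated files are served:

```javascript
// Sketch only: builds the two documented paths for a given day and provider.
function metricsPaths(day, provider) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(day)) {
    throw new Error(`expected day as YYYY-MM-DD, got "${day}"`);
  }
  const name = encodeURIComponent(provider);
  return {
    data: `/data/${day}/${name}`,
    report: `/reports/${day}/${name}`
  };
}

// Example usage (assumes the files are served at http://localhost:5000):
// const url = "http://localhost:5000" + metricsPaths("2019-07-20", "All").data;
```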

Install

Setup project and install dependencies.

npm install -g mobility-metrics

Configuration

A config.json file is required to run Mobility Metrics. Enable providers and set credentials through this file. This file stores access tokens, which are sensitive, so handle it with care. See the config file in /example/example.json for a working example.

Options

  • boundary
    • a GeoJSON bounding box array used for downloading the street network for matching
  • center
    • default map center represented as a coordinate array
  • zoom
    • default map zoom level
  • privacyMinimum
    • minimum unique record count for geographic trip volumes and origin destination flows
  • lost
    • maximum number of days without status change before vehicles are permanently lost
  • summary
    • enabled or disabled metrics in summary UI
  • vehicleFilter
  • geographicFilter
    • filter all data that falls outside the defined geographic filter, formatted as a valid GeoJSON Feature of type Polygon or MultiPolygon
  • providers
    • list of providers to query
      • type
        • "local" for data off disk or "mds" for data off MDS provider API
      • version
        • sets the version of MDS to target; defaults to 0.2, but to use 0.3, set version to "0.3"
      • trips
        • URI of trip data
      • status_changes
        • URI of status change event feed
      • token
        • token for MDS API; blank if local
      • enabled
        • true or false
  • zones
    • optional GeoJSON FeatureCollection of Polygons and/or MultiPolygons with a unique property named id

Provider types

In your config.json file, each provider can be one of two types:

  • "mds"
    • Standard MDS endpoint
    • "trips" and "status_changes" represent HTTP endpoints
  • "local"
    • "trips" and "status_changes" represent file paths with line delimited MDS data
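The rules for the two provider types can be expressed as a quick sanity check before running the CLI. This is a sketch under assumptions: `checkProvider` is a hypothetical helper, the entry is assumed to be a plain object using the option names listed above, and requiring a token for "mds" providers is inferred from "blank if local" rather than confirmed:

```javascript
// Sketch only: validates one provider entry against the documented rules.
function checkProvider(provider) {
  const problems = [];
  const isMds = provider.type === "mds";
  if (!isMds && provider.type !== "local") {
    problems.push('type must be "mds" or "local"');
  }
  for (const key of ["trips", "status_changes"]) {
    const value = provider[key];
    if (typeof value !== "string" || value === "") {
      problems.push(`${key} must be a non-empty string`);
    } else if (isMds && !/^https?:\/\//.test(value)) {
      // "mds" providers point at HTTP endpoints; "local" providers at file paths.
      problems.push(`${key} should be an HTTP endpoint for an "mds" provider`);
    }
  }
  if (isMds && !provider.token) {
    problems.push('token is expected for an "mds" provider (assumption)');
  }
  if (typeof provider.enabled !== "boolean") {
    problems.push("enabled must be true or false");
  }
  return problems;
}
```

An empty array means the entry passed every check in this sketch.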

CLI

The CLI is responsible for downloading raw data, running aggregation and reports, then deleting the raw cache. Configure this command in cron for automated daily imports.

Flags

The following flags are required to run mobility-metrics.

  • --config
    • path to configuration JSON file
  • --public
    • path to directory where data and dashboard will be written
  • --cache
    • path to temporary directory where raw data will be cached during aggregation (cache is automatically deleted after aggregation)
  • --startDay
    • Beginning date of query range
  • --endDay
    • End date of query range - should match startDay to query a single day of data
  • --reportDay
    • Date used for report titles - does not need to match query dates

Optional

  • --version
    • Get the mobility-metrics binary version

Auditing

Run Mobility Metrics data auditing functions to compare source data used for reports (see the Auditing section below for additional information).

  • --compareA
    • Path to auditing directory for report A
  • --compareB
    • Path to auditing directory for report B
  • --compareHashes
    • Run hash comparison between report A and B

Example

mobility-metrics \
  --config ./example/example.json \
  --public ./public \
  --cache ./cache \
  --startDay 2019-07-20 \
  --endDay 2019-07-20 \
  --reportDay 2019-07-20
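For automated daily imports, a small Node.js wrapper can compute the previous day and build the same flags as the example above. A sketch, assuming days are taken in UTC; `dayBefore` and `dailyArgs` are hypothetical helpers, and only the flags documented above are used:

```javascript
// Sketch only: returns the UTC calendar day before `now` as YYYY-MM-DD.
function dayBefore(now) {
  const previous = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return previous.toISOString().slice(0, 10);
}

// Builds the argument list for a single-day run (startDay equals endDay).
function dailyArgs(day, paths) {
  return [
    "--config", paths.config,
    "--public", paths.public,
    "--cache", paths.cache,
    "--startDay", day,
    "--endDay", day,
    "--reportDay", day
  ];
}

// Example usage from a scheduled script:
// const { spawnSync } = require("child_process");
// const args = dailyArgs(dayBefore(new Date()), { config: "./config.json", public: "./public", cache: "./cache" });
// spawnSync("mobility-metrics", args, { stdio: "inherit" });
```

If your providers report in local time, adjust the day calculation accordingly.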

Version

To check the version of the CLI you are running, use the -v or --version flags.

Docker

Docker is supported, and is recommended when installation is challenging on bespoke systems that fail when installing dependencies, such as OSRM.

Building a docker image

# clone repo
git clone git@github.com:sharedstreets/mobility-metrics.git
cd mobility-metrics

# build image
docker build --tag mobility-metrics-image .

Running mobility-metrics from docker image

Once you have a docker image built, use mobility-metrics CLI from within the image using docker run. In this example, a config file exists in the current directory, which mobility-metrics can read using the mounted volume.

# mount the current working directory to the image volume at /data/
docker run -it --rm \
-v "$PWD":/data/ \
mobility-metrics-image \
  mobility-metrics \
    --config /data/config.json \
    --public /data/public \
    --cache /data/cache \
    --startDay 2019-09-20 \
    --endDay 2019-09-20 \
    --reportDay 2019-09-20

Serving API

The metrics data and reports generated by the mobility-metrics CLI are intended to be served from a static HTTP server, such as Apache or nginx, or a public HTTP service like Netlify or GitHub Pages. See the gh-pages branch of this repository to see how the demo for this tool is hosted using simulated data in Nashville, TN.

For a simple demonstration, the node module serve can be used to test out a configuration:

npm install -g serve
mobility-metrics --config ./example/example.json --public ./public --cache ./cache --startDay 2019-07-20 --endDay 2019-07-20 --reportDay 2019-07-20
serve ./public
open http://localhost:5000/reports/2019-07-20/All

Example

An example is provided in this repository for testing out the mobility-metrics UI and aggregations. The example provides ready-to-use scripts that simulate MDS telemetry using a multi-agent model with the SharedStreets trip-simulator tool. Install the following requirements to get started:

  • trip-simulator
  • osmium
  • curl
  • osrm

To run the simulation, use the following script:

node example/simulate.js

Now that you have raw simulated MDS data to work with, run a backfill using the mobility-metrics CLI:

mobility-metrics --config ./example/example.json --public ./public --cache ./cache --startDay 2019-07-20 --endDay 2019-07-20 --reportDay 2019-07-20

A full static file structure should be generated at ./public. See the Serving API section above for tips on serving this endpoint over HTTP.

Auditing

Mobility Metrics versions 4.7.x and later implement data auditing functionality that allows fine-grained comparison of source data used in reports for similar or overlapping time periods. This allows debugging of source data to ensure metrics differences are not the result of underlying data integrity problems, without also requiring that users store sensitive data that can be used to reconstruct trips.

When a report is run, Mobility Metrics stores hashed auditing data in ./audits alongside the report data. The audit log contains summary stats about the input data as well as a complete set of irreversible hashes for each input trip and status_change event. When a data audit is performed, the logs from each report are compared and differences are flagged.
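The general idea of comparing inputs through irreversible hashes can be sketched as follows. The hash algorithm (SHA-256 here), the record canonicalization, and the helper names are all assumptions for illustration; the source does not specify how the audit log is actually built:

```javascript
const nodeCrypto = require("crypto");

// Sketch only: hashes a flat record after sorting its keys, so that key
// order does not change the hash. Nested objects are not handled.
function hashRecord(record) {
  const canonical = JSON.stringify(record, Object.keys(record).sort());
  return nodeCrypto.createHash("sha256").update(canonical).digest("hex");
}

// Reports hashes that appear in only one of the two audit logs.
function compareHashes(hashesA, hashesB) {
  const setA = new Set(hashesA);
  const setB = new Set(hashesB);
  return {
    onlyInA: hashesA.filter((hash) => !setB.has(hash)),
    onlyInB: hashesB.filter((hash) => !setA.has(hash))
  };
}
```

Because only hashes are stored, two runs can be checked for identical inputs without keeping the raw trips or status changes.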

The audit log directory path for each report is listed at the bottom of the report page:

Audit log path listed in report

Example of using the comparison command line tools to compare audit logs:

$ mobility-metrics --compareA path/to/audit_log_a/ --compareB path/to/audit_log_b/

Comparison

To compare individual hashes use the optional --compareHashes flag:

$ mobility-metrics --compareA path/to/audit_log_a/ --compareB path/to/audit_log_b/ --compareHashes

Comparison with Hashes

Test

Run a test suite across the project. Auto-formats code using linter.

npm test

Lint

Runs a linter, prettier, and auto-formats code to meet consistent style, while checking for syntax errors.

npm run lint

mobility-metrics's People

Contributors

emilyeros, jwoyame, kpwebb, lcc4, molliemcardle, morganherlocker

mobility-metrics's Issues

Document "Export All" JSON structure

#34

"Export All" downloads a valid, but custom, JSON object containing deduplicated geometry and aggregates. This allows for a compact download of <10MB instead of hundreds of MB of duplicated data. Let's add a section in the readme explaining this structure and how it is used.

v1 mds derived metrics

@morganherlocker here's a preliminary list of metrics that can be derived from MDS:

Zonal (H3 cell):

  • Availability (weighted average fractional vehicles per hour)
  • Uniques (vehicles per hour)
  • Activity (trips per hour - can be further divided into things like "average trip length/duration/etc." )
  • Utilization (% of available capacity used per hour)
  • Zone to Zone flows (trips per hour)

Street-linked (ShSt references):

  • Unlinked trip start/end (trip per hour per ShSt bin)
  • Segment volume (trips per hour traversing ShSt segment)
  • Segment flows (trips across segments, measured as the % of trips traversing ShSt segment A that also traversed segment B)

let's break these into metric specific tickets so we can keep track of the discussion on implementation details.

PUDO by street

Add point matched aggregates for pickup and drop off metrics.

mds_processor error reporting

The work of mds_processor mostly takes place within an async function. This function does not report unhandled errors within its scope, such as HTTP failures or failures initializing databases. Since these issues are expected to happen from time to time, we should explicitly handle each error by either retrying, ignoring (but reporting to stderr), or exiting the process with helpful log and exit code.
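One way to make such failures explicit is to wrap each fallible step in a retry helper and let the caller decide whether to exit. A minimal sketch; `withRetry` is a hypothetical helper, not part of mds_processor:

```javascript
// Sketch only: runs an async task up to `attempts` times, logging each
// failure to stderr and waiting `delayMs` between tries.
async function withRetry(task, attempts, delayMs) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      console.error(`attempt ${attempt} of ${attempts} failed: ${err.message}`);
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}

// Example usage at the top level of a script:
// withRetry(() => downloadTrips(), 3, 1000).catch((err) => {
//   console.error(err);
//   process.exit(1);
// });
```

`downloadTrips` in the usage comment is a placeholder for whatever step can fail, such as an HTTP request or a database initialization.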

cache matches during summarization

A major performance bottleneck during summarization of metrics is map matching. We can store matches in memory and look them up to see if a match has already been performed, which should save us a lot of time and CPU cycles.
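The in-memory lookup described here is a memoization wrapper around the matcher. A sketch under assumptions: `cachedMatcher` is a hypothetical helper, the matcher is treated as synchronous for simplicity, and the cache key (coordinates rounded to six decimals in the usage comment) is an illustrative choice:

```javascript
// Sketch only: wraps an expensive match function with an in-memory cache.
function cachedMatcher(matchFn, keyFn) {
  const cache = new Map();
  return function match(input) {
    const key = keyFn(input);
    if (!cache.has(key)) {
      cache.set(key, matchFn(input));
    }
    return cache.get(key);
  };
}

// Example usage with a hypothetical `matchPoint` function:
// const match = cachedMatcher(matchPoint, ([lon, lat]) => `${lon.toFixed(6)},${lat.toFixed(6)}`);
```

If the real matcher returns a promise, caching the promise itself works the same way and also deduplicates concurrent lookups.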

decouple mds_processor

Currently mds_processor is kicked off as a forked child_process of the main server. This rescans all data each time the server is rebooted, which takes a substantial amount of time. This process should be manually triggered with a separate command instead.

total trips

Metric describing total unique trips logged.

design overhaul

At this point the UI design is very basic. Let's lay out the dashboard UI and identify any graphics that are needed.

  • decide on a css framework (currently bulma, but I'm not using it fully)
  • flexbox style containers for adapting to different interfaces
  • add a place for overview stats above the map
  • use the same map style used by closure interface (cc @indraneel)
  • rethink date, hour, and minute selection; needs to be more accessible
  • sharedstreets branded icon in interface
  • formatted sharedstreets icons for windows and osx
  • modal or tab based page selection for backfill and provider admin panels

cc @emilyeros @molliemcardle

Windows support

Some users have tried testing the server on Windows, however, they are running into issues installing OSRM, the C++ routing engine used in shst-js. We need to either support OSRM Windows builds or document getting this up and running in a generic VM or docker instance.

cc @emilyeros

backfill interface

Add an admin modal or tab for backfilling from the interface. Currently this is command line only. This interface should also show which days have already been imported.

Vehicle Count Methodology

This is somewhat related to #42. To confirm my understanding of the current method:

Total Vehicles = Count of unique device_id by day from Device_Status with an event_type of reserved, available, unavailable at least once in that day

Active Vehicles = Count of unique device_id by day from Trips

Is that right? In the interest of comparing notes, SFMTA has taken a few approaches to counting devices thus far, depending on the purpose, and they each have their pros & cons. We initially started with a simple count of unique device_ids for each day, but since providers may swap out vehicles mid-day, this ended up being misleading.

Hourly snapshots on-street devices = Given a "snapshot date/time" (e.g., 4/25 at 8am), total count of unique device_id from Device_Status that have sent an event within 48 hours of the snapshot date/time and with a latest event of available, reserved, or unavailable

Hourly snapshots of revenue devices = Given a "snapshot date/time" (e.g., 4/25 at 8am), total count of unique device_id from Device_Status that have sent an event within 48 hours of the snapshot date/time and with a latest event of available or reserved

The only difference between the two is whether or not to include the unavailable devices. If measuring towards cap adherence, then we include the unavailable devices (what we've been calling "on-street devices"), but if looking more at actual service and what's available to customers (what we've been calling "revenue devices"), we exclude the unavailable devices. The 48-hour window can also be flexible.
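The two snapshot counts can be sketched as one pass over status change events. This assumes events carry device_id, event_type, and a millisecond event_time, and it reads "latest event" as the most recent event at or before the snapshot within the window; both are interpretations of the description above:

```javascript
// Sketch only: counts on-street and revenue devices at a snapshot time.
function snapshotCounts(events, snapshotMs, windowMs) {
  // Keep the most recent event per device inside [snapshot - window, snapshot].
  const latest = new Map();
  for (const event of events) {
    if (event.event_time > snapshotMs || event.event_time < snapshotMs - windowMs) continue;
    const previous = latest.get(event.device_id);
    if (!previous || event.event_time > previous.event_time) {
      latest.set(event.device_id, event);
    }
  }
  let onStreet = 0;
  let revenue = 0;
  for (const event of latest.values()) {
    if (["available", "reserved", "unavailable"].includes(event.event_type)) onStreet++;
    if (["available", "reserved"].includes(event.event_type)) revenue++;
  }
  return { onStreet, revenue };
}
```

Passing a different `windowMs` covers the flexible window mentioned above.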

We've also been calculating:
Revenue Hours = sum of total time in an available or reserved state, which is derived from Device_Status

Using the hours helps to account for the change in devices over the course of the day. Given different service models, this may or may not matter. Some providers pull devices from service at night, some rebalance and/or replace devices over the course of the day, and others may just leave devices out all day with minimal rebalancing. The hours would account for this so that we can compare and/or aggregate providers in an apples-to-apples way.

The snapshot approach is a little more intuitive, but the revenue hours is a little more flexible. We're currently working with both now.

Style maps by computed quantile

All bins or streets in the maps are styled the same, so users can't tell where the hot spots are when they're using the web interface. Can we bring back the styling from the previous MM iteration?

fit maps to boundary in config

Currently the maps center on a centroid in Detroit. Let's make the map center wherever the user has configured the instance boundary.

validate config.template.json in tests

I recently pushed a change that added a trailing comma to the config template. The issue was patched, but I would like to add config.template.json validation to the tests to make sure this does not happen when modifying the template in the future.
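Since JSON.parse rejects trailing commas, a test for this can be as small as parsing the template and failing on any error. A sketch; `isValidJson` is a hypothetical helper, and the file path in the usage comment is assumed relative to the repository root:

```javascript
// Sketch only: returns true when the text parses as strict JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// Example usage inside a test:
// const fs = require("fs");
// const template = fs.readFileSync("config.template.json", "utf8");
// if (!isValidJson(template)) throw new Error("config.template.json is not valid JSON");
```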

clear maps on date change

Maps currently do not clear or re-query when date changes. This issue is only apparent when moving to a date that has not been backfilled.

desktop

I have been working on a desktop implementation of this project on the desktop branch. This branch wraps the existing UI inside an electron app for 1 click installs and cross platform distribution. This greatly simplifies the ease of use for cities with limited IT infrastructure or cities where setting up servers and proxies is a slow process. Overall, decentralized applications put more control in the hands of the user, and more closely model the usage of traditional GIS tools.

todo

Moving away from server/CLI based tooling requires a couple additional UI panels for admin tasks, as well as dev tooling for building across OSX and Windows, our two most commonly requested platforms.

  • osx desktop bundler
  • windows desktop bundler
  • provider admin panel
  • backfill admin panel

atomic backfill by day

Backfill currently works over a range of days. I have noticed that some data does not always fall within the range, so I typically delete the entire database before re-importing all data at once. I would like to clean this up by making backfill operations operate one day at a time, filtering any data that falls outside the target day. This allows an individual day to be reimported without potentially double importing any spillover from a surrounding day. The spillover itself is a sign of a bug in provider implementations of the MDS API, so the providers should be notified, since this has other downstream implications that cannot be worked around.
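Filtering to a single target day could look like the sketch below. It assumes millisecond timestamps and UTC day boundaries; `withinDay` and the `timeField` parameter are hypothetical names, and a real deployment may need the city's local time zone instead:

```javascript
// Sketch only: keeps records whose timestamp falls inside the target UTC day.
function withinDay(records, day, timeField) {
  const start = Date.parse(`${day}T00:00:00Z`);
  const end = start + 24 * 60 * 60 * 1000;
  // Half-open interval: start inclusive, end exclusive, so no record
  // can belong to two adjacent days.
  return records.filter((record) => record[timeField] >= start && record[timeField] < end);
}
```

The half-open interval is what prevents double importing spillover when adjacent days are backfilled separately.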

Use with existing MDS data?

This is a really cool looking project!

Apologies if I have missed something (reading through the docs and code), is there a way to get started having already collected some MDS data? E.g. from JSON files or a database connection?

total travel time metric

It would be interesting to aggregate total travel time per spatial zone. This could be generated by slicing segments of each trip and adding the duration of each segment's timestamps to the bin.

sharedstreets spatial aggregator

Sharedstreets references and geometries should be added as spatial aggregator, alongside h3. Combined with the export download, this would allow metrics to be joined back to a city's centerline data in any mainstream GIS application with a simple column join.

time based vehicle utilization

Vehicle utilization is described as "total vehicle on-street time divided by total trip duration for all vehicles"

Here's an example of one day of data:
[Screenshot: example day of data]

Let's say there are 10 operational hours in this day. Then total vehicle on-street time = 624 vehicles x 10 hrs = 6240 hours

Total trip duration for all vehicles = 22m7s per trip / 60 min per hr * 544 trips = 200 hours

I think there's an error in the description; the numerator and the denominator should be reversed, otherwise you'll always get a utilization of over 100%. But even if I do that, I'd have:

utilization = 200 hours / 6240 hours = 3.2%. Not 34%, like in the example.

I think the calculation has a math error somewhere.
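The arithmetic in this issue can be reproduced directly. The sketch below uses only the figures quoted above (624 vehicles, 10 operational hours, 544 trips averaging 22m7s) and the corrected ratio of trip time over on-street time:

```javascript
// Figures quoted in the issue; the 10 operational hours are the issue's own assumption.
const vehicles = 624;
const operationalHours = 10;
const trips = 544;
const avgTripMinutes = 22 + 7 / 60;

// Total time vehicles were on the street, in vehicle-hours.
const onStreetHours = vehicles * operationalHours;
// Total time vehicles spent on trips, in hours.
const tripHours = (avgTripMinutes / 60) * trips;
// Time-based utilization: trip time as a share of on-street time.
const utilizationPct = (tripHours / onStreetHours) * 100;

console.log(onStreetHours, tripHours.toFixed(1), utilizationPct.toFixed(1));
```

This yields roughly 200 trip hours against 6240 on-street hours, or about 3.2%.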

log version when generating backfill

Storing metadata with backfill data, such as software version, lets us know when breaking changes have been made between updates. This is important for data integrity checks and knowing when to re-backfill.

specify node engine version

sharedstreets-js v1 alpha is now a dependency, and it relies on some features from node 11 to run. Let's specify this in the package.json to avoid confusion.

export button

Add a button to the UI that allows the current selection to be downloaded as a GeoJSON file for use in a GIS.

cc @emilyeros

fallback for missing config options

When a new config option is introduced, this can cause confusion, since the server will crash with a cryptic error. Let's add a default fallback for base level config options.

aggregate across providers

We can add a special provider called "All", which aggregates all providers together to get an overview of all mobility across a city. This has been a commonly requested feature for cities focused on planning and analysis. Combined with weekly and monthly summaries, our cache dependencies are going to be entering into complex territory, so I want to sketch out some sort of DAG for handling this succinctly and minimizing redundant scans of provider APIs.

report missing database directory

Expanding on issue #3, let's report an error and exit the process when the database cannot be created. I have documented the mkdir -p data step in setup, but this is an easy thing to forget, especially during development.

total miles

Metric describing total miles traveled per provider.

loading animation

Add a loading animation when data is being queried from the server.

Seconds vs milliseconds for timestamps

The MDS documentation states that a trip's start_time and end_time timestamps should be in milliseconds. However, I noticed that the timestamp is being parsed with Moment.js as seconds (using X for seconds rather than x for milliseconds) in summarize.js:

function getTimeBins(timestamp) {
  var time = moment(timestamp, "X");
  ...
}

It seems unlikely that this is a bug if people are consuming MDS feeds just fine, but I'm curious if there is a difference in "ground truth" between what one or more providers publish and what the documentation states.
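If feeds do mix units, one defensive option is to normalize timestamps by magnitude before binning. This is a heuristic sketch, not what summarize.js does; `toMilliseconds` and the 1e11 cutoff are assumptions chosen because 1e11 milliseconds falls in early 1973, well before any MDS data:

```javascript
// Sketch only: treats small values as seconds and large values as milliseconds.
function toMilliseconds(timestamp) {
  // Any realistic trip time in seconds is far below 1e11, while the same
  // moment in milliseconds is far above it.
  return timestamp < 1e11 ? timestamp * 1000 : timestamp;
}
```

With Moment.js, the normalized value can then be parsed consistently using the millisecond token "x".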

metric privacy filters

Implement privacy filters for features that do not hit the minimum thresholds defined in the Micromobility Metrics Specification.

provider admin UI

A UI for configuring provider tokens for each provider. It should essentially be a list of providers with text boxes for each token.

Export all button generates unreadable JSON

When exporting all map-based metrics, the resulting JSON cannot be loaded into GIS. I haven't posted an example since it would contain data, but ping me if you want this.

Probably related to: #33

documentation for running an instance

Document the suggested implementation:

  • Explain that this repo is designed to be cloned and deployed as a private service
  • Describe how to configure a provider MDS endpoint to be scraped

Pick-ups: Export button generates empty file

When exporting data for the pick up metric, an empty JSON file is generated. (This happens for bins and streets)

For drop-offs, exporting either the bins or the street segments actually exports the bins.
