sharedstreets / mobility-metrics

Tools for collecting, processing, and interpreting mobility data using SharedStreets

License: MIT License

HTML 3.47% JavaScript 94.43% CSS 2.09% Dockerfile 0.01%

mobility-metrics's Introduction

SharedStreets Builder

The SharedStreets Builder application converts OpenStreetMap data to SharedStreets protocol buffer tiles.

SharedStreets uses this tool to generate and maintain a complete global OSM-derived tile set. Users can run the tool directly on their own OSM data or use a pregenerated global tileset provided by SharedStreets.

Support for non-OSM data sources has been moved to the sharedstreets-conflator tool.

Example use

java -jar ./sharedstreets-builder-0.1-preview.jar --input data/[osm_input_file].pbf --output ./[tile_output_directory]

Notes

The builder application is built on Apache Flink. If memory requirements exceed available space, Flink uses a disk-based cache for processing. Processing large OSM data sets may require several hundred gigabytes of free disk space.

Roadmap

mobility-metrics's People

Contributors

emilyeros, jwoyame, kpwebb, lcc4, molliemcardle, morganherlocker

mobility-metrics's Issues

metric privacy filters

Implement privacy filters for features that do not hit the minimum thresholds defined in the Micromobility Metrics Specification.
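
A minimal sketch of what such a filter could look like, assuming each feature carries a numeric count and the minimum is passed in by the caller. The `count` property, the function name, and the threshold of 5 used in the sample are hypothetical and are not taken from the specification.

```javascript
// Hypothetical sketch: drop features whose count falls below a minimum.
// `count` and `minCount` are illustrative names, not the project's fields.
function applyPrivacyFilter(features, minCount) {
  return features.filter(function (feature) {
    return feature.count >= minCount;
  });
}

var sample = [
  { id: "a", count: 2 },
  { id: "b", count: 7 },
  { id: "c", count: 5 }
];
// Threshold of 5 is a placeholder value for illustration only.
var visible = applyPrivacyFilter(sample, 5);
```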

Style maps by computed quantile

All bins or streets in the maps are styled the same, so users can't tell where the hot spots are when they're using the web interface. Can we bring back the styling from the previous MM iteration?
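
As a rough illustration of quantile-based styling, the sketch below computes break values with a nearest-rank rule and assigns each value a class index that a map style could turn into a color. The function names and the nearest-rank choice are assumptions; the previous MM iteration may have computed its classes differently.

```javascript
// Hypothetical sketch: nearest-rank quantile breaks for a list of values.
function quantileBreaks(values, classes) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var breaks = [];
  for (var i = 1; i < classes; i++) {
    // Position of the i-th class boundary in the sorted list.
    var index = Math.ceil((i / classes) * sorted.length) - 1;
    breaks.push(sorted[Math.max(index, 0)]);
  }
  return breaks;
}

// Return the class index (0 = lowest) of a value given the breaks.
function classify(value, breaks) {
  var cls = 0;
  while (cls < breaks.length && value > breaks[cls]) cls++;
  return cls;
}
```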

Document "Export All" JSON structure

#34

"Export All" downloads a valid, but custom, JSON object containing deduplicated geometry and aggregates. This allows for a compact download of <10MB instead of hundreds of MB of duplicated data. Let's add a section in the readme explaining this structure and how it is used.

provider admin UI

A UI for configuring provider tokens for each provider. It should essentially be a list of providers with text boxes for each token.

Seconds vs milliseconds for timestamps

The MDS documentation states that a trip's start_time and end_time timestamps should be in milliseconds. However, I noticed that the timestamp is being parsed with Moment.js as seconds (using X for seconds rather than x for milliseconds) in summarize.js:

function getTimeBins(timestamp) {
  var time = moment(timestamp, "X");
  ...
}

It seems unlikely that this is a bug if people are consuming MDS feeds just fine, but I'm curious if there is a difference in "ground truth" between what one or more providers publish and what the documentation states.
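
One way to tolerate both conventions would be to normalize timestamps before parsing them. This is a heuristic sketch, not the project's code: `toMilliseconds` and the 1e11 cutoff are assumptions. A value below 1e11 read as milliseconds would fall before March 1973, so it is treated as seconds here.

```javascript
// Hypothetical heuristic: treat small values as seconds, large ones as
// milliseconds. The 1e11 cutoff is an assumption, not part of MDS.
function toMilliseconds(timestamp) {
  var value = Number(timestamp);
  return Math.abs(value) < 1e11 ? value * 1000 : value;
}
```

The normalized value could then be parsed with the millisecond format token (`x`) mentioned above.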

Pick-ups: Export button generates empty file

When exporting data for the pick up metric, an empty JSON file is generated. (This happens for bins and streets)

For drop-offs, exporting either the bins or the street segments actually exports the bins.

Vehicle Count Methodology

This is somewhat related to #42. To confirm my understanding of the current method:

Total Vehicles = Count of unique device_id by day from Device_Status with an event_type of reserved, available, unavailable at least once in that day

Active Vehicles = Count of unique device_id by day from Trips

Is that right? In the interest of comparing notes, SFMTA has taken a few approaches to counting devices thus far, depending on the purpose, and each has its pros and cons. We initially started with a simple count of unique device_ids for each day, but since providers may swap out vehicles mid-day, this ended up being misleading.

Hourly snapshots on-street devices = Given a "snapshot date/time" (e.g., 4/25 at 8am), total count of unique device_id from Device_Status that have sent an event within 48 hours of the snapshot date/time and with a latest event of available, reserved, or unavailable

Hourly snapshots of revenue devices = Given a "snapshot date/time" (e.g., 4/25 at 8am), total count of unique device_id from Device_Status that have sent an event within 48 hours of the snapshot date/time and with a latest event of available or reserved

The only difference between the two is whether or not to include the unavailable devices. If measuring cap adherence, we include the unavailable devices (what we've been calling "on-street devices"), but if looking at actual service and what's available to customers (what we've been calling "revenue devices"), we exclude them. The 48-hour window can also be flexible.
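
The two snapshot counts could be sketched as follows. This is an illustration under assumptions: events are taken to carry `device_id`, `event_type`, and an `event_time` in milliseconds, and "within 48 hours" is read as the window preceding the snapshot. The function name is hypothetical.

```javascript
// Hypothetical sketch of the snapshot count. Field names and millisecond
// units are assumptions about the status data.
function snapshotCount(events, snapshotTime, windowHours, includeUnavailable) {
  var windowStart = snapshotTime - windowHours * 60 * 60 * 1000;
  var latest = {};
  events.forEach(function (e) {
    // Ignore events outside the look-back window.
    if (e.event_time < windowStart || e.event_time > snapshotTime) return;
    var seen = latest[e.device_id];
    if (!seen || e.event_time > seen.event_time) latest[e.device_id] = e;
  });
  var counted = includeUnavailable
    ? ["available", "reserved", "unavailable"] // on-street devices
    : ["available", "reserved"];               // revenue devices
  return Object.keys(latest).filter(function (id) {
    return counted.indexOf(latest[id].event_type) !== -1;
  }).length;
}
```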

We've also been calculating:
Revenue Hours = sum of total time in an available or reserved state, which is derived from Device_Status

Using the hours helps to account for the change in devices over the course of the day. Given different service models, this may or may not matter. Some providers pull devices from service at night, some rebalance and/or replace devices over the course of the day, and others may just leave devices out all day with minimal rebalancing. The hours would account for this so that we can compare and/or aggregate providers in an apples-to-apples way.
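
A rough sketch of the revenue hours calculation, assuming each status event has `device_id`, `event_type`, and `event_time` in milliseconds, and that a state lasts until the device's next event or the end of the day. These field names and the clamping to the day boundaries are assumptions for illustration.

```javascript
// Hypothetical sketch: hours spent in "available" or "reserved" states
// within [dayStart, dayEnd), summed over all devices.
function revenueHours(events, dayStart, dayEnd) {
  var byDevice = {};
  events.forEach(function (e) {
    (byDevice[e.device_id] = byDevice[e.device_id] || []).push(e);
  });
  var totalMs = 0;
  Object.keys(byDevice).forEach(function (id) {
    var sorted = byDevice[id].sort(function (a, b) {
      return a.event_time - b.event_time;
    });
    sorted.forEach(function (e, i) {
      if (e.event_type !== "available" && e.event_type !== "reserved") return;
      // The state ends at the device's next event, or at the end of the day.
      var next = i + 1 < sorted.length ? sorted[i + 1].event_time : dayEnd;
      var start = Math.max(e.event_time, dayStart);
      var end = Math.min(next, dayEnd);
      if (end > start) totalMs += end - start;
    });
  });
  return totalMs / (60 * 60 * 1000);
}
```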

The snapshot approach is a little more intuitive, but revenue hours is a little more flexible. We're currently working with both.

atomic backfill by day

Backfill currently works over a range of days. I have noticed that some data does not always fall within the range, so I typically delete the entire database before re-importing all data at once. I would like to clean this up by making backfill operations operate one day at a time, filtering any data that falls outside the target day. This allows an individual day to be reimported without potentially double importing any spillover from a surrounding day. The spillover itself is a sign of a bug in provider implementations of the MDS API, so the providers should be notified, since this has other downstream implications that cannot be worked around.
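
The per-day filter might look like the sketch below. It assumes `start_time` is in milliseconds and uses UTC day boundaries for simplicity; a real instance would presumably use the city's configured timezone. `filterToDay` is a hypothetical name.

```javascript
// Hypothetical sketch: keep only trips that start within one UTC day.
// `day` is a "YYYY-MM-DD" string; start_time is assumed to be milliseconds.
function filterToDay(trips, day) {
  var dayStart = Date.parse(day + "T00:00:00Z");
  var dayEnd = dayStart + 24 * 60 * 60 * 1000;
  return trips.filter(function (trip) {
    return trip.start_time >= dayStart && trip.start_time < dayEnd;
  });
}
```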

mds_processor error reporting

The work of mds_processor mostly takes place within an async function. This function does not report unhandled errors within its scope, such as HTTP failures or failures initializing databases. Since these issues are expected to happen from time to time, we should explicitly handle each error by retrying, ignoring it (but reporting to stderr), or exiting the process with a helpful log message and exit code.
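
As one possible shape for the retry case, the sketch below wraps an async task, reports each failure to stderr, and rethrows the last error once the attempts run out. `withRetry` and `fetchProviderData` are hypothetical names and do not exist in the project.

```javascript
// Hypothetical sketch: retry an async task, logging every failure to stderr.
async function withRetry(task, attempts) {
  var lastError;
  for (var i = 1; i <= attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      console.error("attempt " + i + " of " + attempts + " failed: " + err.message);
    }
  }
  throw lastError;
}

// Usage sketch: exit with a non-zero code if the task never succeeds.
// withRetry(fetchProviderData, 3).catch(function (err) {
//   console.error(err.stack);
//   process.exit(1);
// });
```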

sharedstreets spatial aggregator

SharedStreets references and geometries should be added as a spatial aggregator, alongside h3. Combined with the export download, this would allow metrics to be joined back to a city's centerline data in any mainstream GIS application with a simple column join.

export button

Add a button to the UI that allows the current selection to be downloaded as a GeoJSON file for use in a GIS.

cc @emilyeros

report missing database directory

Expanding on issue #3 , let's report an error and exit the process when the database is unable to be created. I have documented the mkdir -p data step in setup, but this is an easy thing to forget, especially during development.

log version when generating backfill

Storing metadata with backfill data, such as the software version, lets us know when breaking changes have been made between updates. This is important for data integrity checks and for knowing when to re-backfill.

fit maps to boundary in config

Currently the maps center on a centroid in Detroit. Let's make the map center wherever the user has configured the instance boundary.
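
A small sketch of deriving map bounds from the boundary, assuming the configured boundary is a GeoJSON Polygon (the actual config format may differ). `boundsOf` is a hypothetical helper; the resulting box could be handed to whatever fit-to-bounds call the map library provides.

```javascript
// Hypothetical sketch: bounding box [west, south, east, north] of a
// GeoJSON Polygon. The polygon-shaped boundary is an assumption.
function boundsOf(polygon) {
  var west = Infinity, south = Infinity, east = -Infinity, north = -Infinity;
  polygon.coordinates.forEach(function (ring) {
    ring.forEach(function (position) {
      west = Math.min(west, position[0]);
      south = Math.min(south, position[1]);
      east = Math.max(east, position[0]);
      north = Math.max(north, position[1]);
    });
  });
  return [west, south, east, north];
}

// Illustrative coordinates only, not a real city boundary.
var boundary = {
  type: "Polygon",
  coordinates: [[[-83.3, 42.2], [-82.9, 42.2], [-82.9, 42.5], [-83.3, 42.5], [-83.3, 42.2]]]
};
var bounds = boundsOf(boundary);
```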

total miles

Metric describing total miles traveled per provider.
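
A minimal sketch of this metric, assuming each trip record has a `provider_name` and a `trip_distance` in meters; both field names and the unit are assumptions about the input data.

```javascript
// Hypothetical sketch: total miles traveled per provider.
// Assumes trip_distance is expressed in meters.
var METERS_PER_MILE = 1609.344;

function totalMilesByProvider(trips) {
  var totals = {};
  trips.forEach(function (trip) {
    var miles = trip.trip_distance / METERS_PER_MILE;
    totals[trip.provider_name] = (totals[trip.provider_name] || 0) + miles;
  });
  return totals;
}
```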

validate config.template.json in tests

I recently pushed a change that added a trailing comma to the config template. The issue was patched, but I would like to add config.template.json validation to the tests to make sure this does not happen when modifying the template in the future.
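
The check itself can be very small, since `JSON.parse` rejects trailing commas. In the sketch below, `checkJson` is a hypothetical helper; a test would read config.template.json from disk and assert that it returns null.

```javascript
// Hypothetical sketch: return null when the text is valid JSON,
// otherwise return the parser's error message.
function checkJson(text) {
  try {
    JSON.parse(text);
    return null;
  } catch (err) {
    return err.message;
  }
}
```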

backfill interface

Add an admin modal or tab for backfilling from the interface. Currently this is command line only. This interface should also show which days have already been imported.

Windows support

Some users have tried testing the server on Windows, but they are running into issues installing OSRM, the C++ routing engine used in shst-js. We need to either support OSRM Windows builds or document how to get this up and running in a generic VM or Docker instance.

cc @emilyeros

total travel time metric

It would be interesting to aggregate total travel time per spatial zone. This could be generated by slicing each trip into segments and adding the duration between each segment's timestamps to the bin.
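
A sketch of that slicing, assuming a trip route is an ordered list of points with a `timestamp` and that each segment's duration is credited to the bin of its starting point. `travelTimeByBin` and the `binOf` lookup (for example, an H3 cell index) are hypothetical.

```javascript
// Hypothetical sketch: add each segment's duration to the bin that
// contains the segment's starting point. `binOf` is supplied by the caller.
function travelTimeByBin(route, binOf) {
  var totals = {};
  for (var i = 0; i + 1 < route.length; i++) {
    var duration = route[i + 1].timestamp - route[i].timestamp;
    var bin = binOf(route[i]);
    totals[bin] = (totals[bin] || 0) + duration;
  }
  return totals;
}
```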

cache matches during summarization

A major performance bottleneck during summarization of metrics is map matching. We can store matches in memory and look them up to see if a match has already been performed, which should save us a lot of time and CPU cycles.
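
An in-memory cache of this kind could be as simple as the sketch below. `cached`, `matchFn`, and `keyFn` are hypothetical names; the key function would need to produce the same string for inputs that should share a match.

```javascript
// Hypothetical sketch: wrap a matching function with an in-memory cache.
function cached(matchFn, keyFn) {
  var cache = new Map();
  return function (input) {
    var key = keyFn(input);
    // Only run the expensive match when the key has not been seen.
    if (!cache.has(key)) cache.set(key, matchFn(input));
    return cache.get(key);
  };
}
```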

loading animation

Add a loading animation when data is being queried from the server.

design overhaul

At this point the UI design is very basic. Let's lay out the dashboard UI and identify any graphics that are needed.

  • decide on a css framework (currently bulma, but I'm not using it fully)
  • flexbox style containers for adapting to different interfaces
  • add a place for overview stats above the map
  • use the same map style used by closure interface (cc @indraneel)
  • rethink date, hour, and minute selection; needs to be more accessible
  • sharedstreets branded icon in interface
  • formatted sharedstreets icons for windows and osx
  • modal or tab based page selection for backfill and provider admin panels

cc @emilyeros @molliemcardle

specify node engine version

sharedstreets-js v1 alpha is now a dependency, and it relies on some features from node 11 to run. Let's specify this in the package.json to avoid confusion.

time based vehicle utilization

Vehicle utilization is described as "total vehicle on-street time divided by total trip duration for all vehicles"

Here's an example of one day of data:
[Screenshot: one day of example data, 2019-04-23]

Let's say there are 10 operational hours in this day. Then total vehicle on-street time = 624 vehicles x 10 hrs = 6240 hours

Total trip duration for all vehicles = 22m7s per trip / 60 min per hr * 544 trips = 200 hours

I think there's an error in the description; the numerator and the denominator should be reversed, otherwise you'll always get a utilization of over 100%. But even if I do that, I'd have:

utilization = 200 hours / 6240 hours = 3.2%. Not 34%, like in the example.

I think the calculation has a math error somewhere.
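
The arithmetic above can be checked with a few lines. The 10 operational hours are the assumption stated in this issue, and the variable names are illustrative.

```javascript
// Worked version of the calculation in this issue.
var vehicles = 624;
var operationalHours = 10;            // assumption stated above
var trips = 544;
var avgTripMinutes = 22 + 7 / 60;     // 22m7s

var onStreetHours = vehicles * operationalHours;   // 6240
var tripHours = (avgTripMinutes / 60) * trips;     // about 200.5
var utilization = tripHours / onStreetHours;       // about 0.032

console.log((utilization * 100).toFixed(1) + "%"); // prints "3.2%"
```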

fallback for missing config options

When a new config option is introduced, this can cause confusion, since the server will crash with a cryptic error. Let's add a default fallback for base level config options.
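
A sketch of one way to do this, merging the user's config over a set of defaults. The option names in `DEFAULTS` are placeholders and do not reflect the project's actual config keys.

```javascript
// Hypothetical sketch: fill in missing base-level options from defaults.
// The keys below are placeholders, not real config options.
var DEFAULTS = { port: 5000, zoom: 12 };

function withDefaults(config, defaults) {
  // Later sources win, so user-supplied values override the defaults.
  return Object.assign({}, defaults, config);
}
```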

documentation for running an instance

Document the suggested implementation:

  • Explain that this repo is designed to be cloned and deployed as a private service
  • Describe how to configure a provider MDS endpoint to be scraped

v1 mds derived metrics

@morganherlocker here's a preliminary list of metrics that can be derived from MDS:

Zonal (H3 cell):

  • Availability (weighted average fractional vehicles per hour)
  • Uniques (vehicles per hour)
  • Activity (trips per hour - can be further divided into things like "average trip length/duration/etc." )
  • Utilization (% of available capacity used per hour)
  • Zone to Zone flows (trips per hour)

Street-linked (ShSt references):

  • Unlinked trip start/end (trips per hour per ShSt bin)
  • Segment volume (trips per hour traversing ShSt segment)
  • Segment flows (measured as the % of trips traversing ShSt segment A that also traversed segment B)

let's break these into metric specific tickets so we can keep track of the discussion on implementation details.
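
To make the segment flows definition concrete, here is a sketch that assumes each trip has already been matched to a list of ShSt reference ids. `segmentFlow` and the input shape are assumptions for illustration.

```javascript
// Hypothetical sketch: share of trips traversing segment A that also
// traversed segment B. Each trip is an array of segment reference ids.
function segmentFlow(trips, segmentA, segmentB) {
  var throughA = trips.filter(function (refs) {
    return refs.indexOf(segmentA) !== -1;
  });
  if (throughA.length === 0) return 0;
  var alsoB = throughA.filter(function (refs) {
    return refs.indexOf(segmentB) !== -1;
  });
  return alsoB.length / throughA.length;
}
```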

clear maps on date change

Maps currently do not clear or re-query when date changes. This issue is only apparent when moving to a date that has not been backfilled.

decouple mds_processor

Currently mds_processor is kicked off as a forked child_process of the main server. This rescans all data each time the server is rebooted, which takes a substantial amount of time. This process should be manually triggered with a separate command instead.

aggregate across providers

We can add a special provider called "All", which aggregates all providers together to get an overview of all mobility across a city. This has been a commonly requested feature for cities focused on planning and analysis. Combined with weekly and monthly summaries, our cache dependencies are going to be entering into complex territory, so I want to sketch out some sort of DAG for handling this succinctly and minimizing redundant scans of provider APIs.
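
For count-style metrics, the "All" provider could be built by summing per-provider summaries bin by bin, as in the sketch below. The `{ provider: { bin: count } }` shape and the function name are assumptions; metrics that are averages or ratios would need a different merge.

```javascript
// Hypothetical sketch: sum per-provider counts into a single "All" summary.
// Input shape { providerName: { binId: count } } is an assumption.
function aggregateAll(summaries) {
  var all = {};
  Object.keys(summaries).forEach(function (provider) {
    Object.keys(summaries[provider]).forEach(function (bin) {
      all[bin] = (all[bin] || 0) + summaries[provider][bin];
    });
  });
  return all;
}
```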

Use with existing MDS data?

This is a really cool looking project!

Apologies if I have missed something (reading through the docs and code), is there a way to get started having already collected some MDS data? E.g. from JSON files or a database connection?

PUDO by street

Add point matched aggregates for pickup and drop off metrics.

Export all button generates unreadable JSON

When exporting all map-based metrics, the resulting JSON cannot be loaded into GIS. I haven't posted an example since it would contain data, but ping me if you want this.

Probably related to: #33

total trips

Metric describing total unique trips logged.

desktop

I have been working on a desktop implementation of this project on the desktop branch. This branch wraps the existing UI inside an electron app for 1 click installs and cross platform distribution. This greatly simplifies the ease of use for cities with limited IT infrastructure or cities where setting up servers and proxies is a slow process. Overall, decentralized applications put more control in the hands of the user, and more closely model the usage of traditional GIS tools.

todo

Moving away from server/CLI based tooling requires a couple additional UI panels for admin tasks, as well as dev tooling for building across OSX and Windows, our two most commonly requested platforms.

  • osx desktop bundler
  • windows desktop bundler
  • provider admin panel
  • backfill admin panel
