sharedstreets / mobility-metrics
Tools for collecting, processing, and interpreting mobility data using SharedStreets
License: MIT License
Implement privacy filters for features that do not hit the minimum thresholds defined in the Micromobility Metrics Specification.
When exporting all map-based metrics, the resulting JSON cannot be loaded into GIS. I haven't posted an example since it would contain data, but ping me if you want this.
Probably related to: #33
Document the suggested implementation:
I recently pushed a change that added a trailing comma to the config template. The issue was patched, but I would like to add config.template.json validation to the tests to make sure this does not happen when modifying the template in the future.
Add point matched aggregates for On-street metric.
We're hosting a webinar tomorrow, Friday, May 3rd, at 12:30p Pacific/3:30p Eastern to demo the tool and answer any questions. Please register here to join!
Add point matched aggregates for availability.
Add a config option for the privacy filter and use it in the summary functions. Should we error when it is set < 2?
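A minimal sketch of what this could look like; the config key name `privacyMinimum` is an assumption, not the project's actual option:

```javascript
// Hypothetical privacy filter: drop any aggregate below the configured
// minimum, and refuse nonsensical thresholds up front.
function applyPrivacyFilter(counts, privacyMinimum) {
  // A threshold below 2 would not suppress anything, so treat it as an error
  if (privacyMinimum < 2) {
    throw new Error("privacyMinimum must be >= 2");
  }
  const filtered = {};
  for (const [key, count] of Object.entries(counts)) {
    if (count >= privacyMinimum) filtered[key] = count;
  }
  return filtered;
}
```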
Add point matched aggregates for pickup and drop off metrics.
https://github.com/sharedstreets/sharedstreets-micromobility-connector#install
The docs currently point to the specification repo.
It would be interesting to aggregate total travel time per spatial zone. This could be generated by slicing each trip into segments and adding the duration between each segment's timestamps to the corresponding bin.
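A hedged sketch of the slicing approach, where `binForPoint` stands in for a spatial lookup (e.g. an H3 cell index) and trips are simplified to ordered point lists:

```javascript
// Accumulate travel time per spatial bin by walking a trip's consecutive
// point pairs. `trip` is an ordered array of { lon, lat, timestamp } points;
// `binForPoint` is a hypothetical zone lookup.
function timePerBin(trip, binForPoint) {
  const totals = {};
  for (let i = 1; i < trip.length; i++) {
    // Attribute each segment to the bin of its starting point
    const bin = binForPoint(trip[i - 1]);
    const duration = trip[i].timestamp - trip[i - 1].timestamp;
    totals[bin] = (totals[bin] || 0) + duration;
  }
  return totals;
}
```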
At this point the UI design is very basic. Let's lay out the dashboard UI and identify any graphics that are needed.
The work of mds_processor mostly takes place within an async function. This function does not report unhandled errors within its scope, such as HTTP failures or failures initializing databases. Since these issues are expected to happen from time to time, we should explicitly handle each error by either retrying, ignoring (but reporting to stderr), or exiting the process with helpful log and exit code.
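One hedged sketch of the retry/report/exit pattern, where `scanProvider` is a stand-in for the processor's actual async work, not a real function in this codebase:

```javascript
// Wrap the async entry point so transient failures (e.g. HTTP errors) are
// retried, each attempt is reported to stderr, and a final failure exits
// with a non-zero code instead of vanishing silently.
async function run(scanProvider, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await scanProvider();
    } catch (err) {
      console.error(`attempt ${attempt} failed: ${err.message}`);
      if (attempt === maxRetries) {
        // Unrecoverable: surface a helpful exit code and rethrow
        process.exitCode = 1;
        throw err;
      }
    }
  }
}
```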
Vehicle utilization is described as "total vehicle on-street time divided by total trip duration for all vehicles"
Here's an example of one day of data:
Let's say there are 10 operational hours in this day. Then total vehicle on-street time = 624 vehicles x 10 hrs = 6240 hours
Total trip duration for all vehicles = 22m7s per trip / 60 min per hr * 544 trips = 200 hours
I think there's an error in the description; the numerator and the denominator should be reversed, otherwise you'll always get a utilization of over 100%. But even if I do that, I'd have:
utilization = 200 hours / 6240 hours = 3.2%. Not 34%, like in the example.
I think the calculation has a math error somewhere.
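Redoing the arithmetic above as a quick sketch (all numbers taken from the example in this issue) reproduces the ~3.2% figure:

```javascript
// Inputs from the example day described above
const vehicles = 624;
const operationalHours = 10;
const trips = 544;
const avgTripMinutes = 22 + 7 / 60; // 22m7s

const onStreetHours = vehicles * operationalHours;   // 624 x 10 = 6240 hours
const tripHours = (avgTripMinutes / 60) * trips;     // ~200.5 hours

// With the numerator and denominator reversed, as the issue suggests:
const utilization = tripHours / onStreetHours;       // ~0.032, i.e. 3.2%
```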
I have been working on a desktop implementation of this project on the desktop branch. This branch wraps the existing UI inside an Electron app for one-click installs and cross-platform distribution. This greatly improves ease of use for cities with limited IT infrastructure, or cities where setting up servers and proxies is a slow process. Overall, decentralized applications put more control in the hands of the user and more closely model the usage of traditional GIS tools.
Moving away from server/CLI based tooling requires a couple additional UI panels for admin tasks, as well as dev tooling for building across OSX and Windows, our two most commonly requested platforms.
cc @emilyeros
A major performance bottleneck during summarization of metrics is map matching. We can store matches in memory and look them up to see if a match has already been performed, which should save us a lot of time and CPU cycles.
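A minimal sketch of the in-memory cache idea, assuming matches are keyed by coordinate; `matchPoint` stands in for the expensive SharedStreets map-matching call and is hypothetical:

```javascript
// Memoize map-matching results so repeated points skip the expensive call.
const matchCache = new Map();

function cachedMatch(lon, lat, matchPoint) {
  // Round coordinates so GPS jitter within ~0.1m hits the same cache entry
  const key = `${lon.toFixed(6)},${lat.toFixed(6)}`;
  if (matchCache.has(key)) return matchCache.get(key);
  const match = matchPoint(lon, lat); // expensive call runs once per key
  matchCache.set(key, match);
  return match;
}
```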
Metric describing total miles traveled per provider.
When exporting data for the pick up metric, an empty JSON file is generated. (This happens for bins and streets)
For drop-offs, exporting either the bins or the street segments actually exports the bins.
We can add a special provider called "All", which aggregates all providers together to get an overview of all mobility across a city. This has been a commonly requested feature for cities focused on planning and analysis. Combined with weekly and monthly summaries, our cache dependencies are going to be entering into complex territory, so I want to sketch out some sort of DAG for handling this succinctly and minimizing redundant scans of provider APIs.
Expanding on issue #3, let's report an error and exit the process when the database cannot be created. I have documented the mkdir -p data step in setup, but this is an easy thing to forget, especially during development.
Metric describing total unique trips logged.
During the refactor, I removed the uber and lime providers. These should be added back with the new metrics stream interface. They look almost identical to bird, but there may be differences in the pagination implementations, which are not a part of MDS.
Add a loading animation when data is being queried from the server.
Geometries are not currently exposed from the sharedstreets-js lib. Once this is enabled, add these to the geometry cache used in metrics.
When a new config option is introduced, existing configs that lack it can cause confusion, since the server will crash with a cryptic error. Let's add default fallbacks for base-level config options.
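One way this could look; the default key names here are illustrative, not the actual config template:

```javascript
// Hypothetical base-level defaults; any key missing from the user's config
// falls back to these instead of crashing the server.
const DEFAULTS = {
  zoom: 12,
  privacyMinimum: 3,
  days: 1
};

function loadConfig(userConfig) {
  // Object.assign patches top-level options only, matching the
  // "base level config options" scoped in this issue
  return Object.assign({}, DEFAULTS, userConfig);
}
```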
The MDS documentation states that a trip's start_time and end_time timestamps should be in milliseconds. However, I noticed that the timestamp is being parsed with Moment.js as seconds (using "X" for seconds rather than "x" for milliseconds) in summarize.js:

function getTimeBins(timestamp) {
  var time = moment(timestamp, "X");
  ...
}
It seems unlikely that this is a bug if people are consuming MDS feeds just fine, but I'm curious if there is a difference in "ground truth" between what one or more providers publish and what the documentation states.
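To illustrate the magnitude of a seconds-vs-milliseconds mixup, here is a small sketch using plain Date (no Moment.js required):

```javascript
// An MDS-style timestamp for 2019-05-03T00:00:00Z, in milliseconds
const ms = 1556841600000;

// Parsed correctly as milliseconds:
const asMillis = new Date(ms);

// Misread as seconds (what a "X"-style parse does to a millisecond value),
// the same number lands tens of thousands of years in the future:
const asSeconds = new Date(ms * 1000);

console.log(asMillis.toISOString()); // "2019-05-03T00:00:00.000Z"
```

A mixup like this would be immediately obvious in the time bins, which supports the suspicion that the feeds being consumed may differ from what the documentation states.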
Maps currently do not clear or re-query when date changes. This issue is only apparent when moving to a date that has not been backfilled.
Add an admin modal or tab for backfilling from the interface. Currently this is command line only. This interface should also show which days have already been imported.
A UI for configuring provider tokens for each provider. It should essentially be a list of providers with text boxes for each token.
sharedstreets-js v1 alpha is now a dependency, and it relies on some node 11 features to run. Let's specify this in package.json to avoid confusion.
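Something like the following engines entry would do it; the exact version range is a guess based on the node 11 requirement mentioned above:

```json
{
  "engines": {
    "node": ">=11"
  }
}
```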
Streets and bins need different names to avoid file name collisions when exporting both into GIS tools.
cc @emilyeros
Total counts across providers for all status changes.
Some users have tried testing the server on Windows; however, they are running into issues installing OSRM, the C++ routing engine used in shst-js. We need to either support OSRM Windows builds or document getting this up and running in a generic VM or Docker instance.
cc @emilyeros
Storing metadata with backfill data, such as software version, lets us know when breaking changes have been made between updates. This is important for data integrity checks and knowing when to re-backfill.
SharedStreets references and geometries should be added as a spatial aggregator, alongside H3. Combined with the export download, this would allow metrics to be joined back to a city's centerline data in any mainstream GIS application with a simple column join.
@morganherlocker here's a preliminary list of metrics that can be derived from MDS:
Zonal (H3 cell):
Street-linked (ShSt references):
Let's break these into metric-specific tickets so we can keep track of the discussion on implementation details.
This is a really cool looking project!
Apologies if I have missed something while reading through the docs and code: is there a way to get started having already collected some MDS data, e.g. from JSON files or a database connection?
Add support for spin MDS provider API.
Backfill currently works over a range of days. I have noticed that some data does not always fall within the range, so I typically delete the entire database before re-importing all data at once. I would like to clean this up by making backfill operations operate one day at a time, filtering any data that falls outside the target day. This allows an individual day to be reimported without potentially double importing any spillover from a surrounding day. The spillover itself is a sign of a bug in provider implementations of the MDS API, so the providers should be notified, since this has other downstream implications that cannot be worked around.
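A sketch of the per-day filtering step, assuming events carry a millisecond timestamp (field names are simplified):

```javascript
// Keep only events whose timestamp falls within the target UTC day, so
// re-importing one day can never double-import spillover from its neighbors.
function filterToDay(events, dayStart /* ms since epoch, UTC midnight */) {
  const dayEnd = dayStart + 24 * 60 * 60 * 1000;
  return events.filter(e => e.timestamp >= dayStart && e.timestamp < dayEnd);
}
```

Filtered-out events are exactly the spillover worth reporting back to providers.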
Add a button to the UI that allows the current selection to be downloaded as a GeoJSON file for use in a GIS.
cc @emilyeros
Currently the maps center on a centroid in Detroit. Let's make the map center wherever the user has configured the instance boundary.
Currently mds_processor is kicked off as a forked child_process of the main server. This rescans all data each time the server is rebooted, which takes a substantial amount of time. This process should be manually triggered with a separate command instead.
"Export All" downloads a valid, but custom, JSON object containing deduplicated geometry and aggregates. This allows for a compact download of <10MB instead of hundreds of MB of duplicated data. Let's add a section in the readme explaining this structure and how it is used.
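A purely hypothetical sketch of what such a deduplicated bundle can look like and how a consumer would rehydrate it to GeoJSON; the actual export structure may differ:

```javascript
// Geometries stored once, keyed by reference; aggregates point at keys.
// The reference id and metric names below are made up for illustration.
const exportBundle = {
  geometries: {
    "shst:abc123": {
      type: "LineString",
      coordinates: [[-83.05, 42.33], [-83.04, 42.33]]
    }
  },
  metrics: {
    pickups: { "shst:abc123": 12 },
    dropoffs: { "shst:abc123": 9 }
  }
};

// Rehydrating joins each aggregate back to its geometry exactly once.
function toGeoJSON(bundle, metric) {
  return {
    type: "FeatureCollection",
    features: Object.entries(bundle.metrics[metric]).map(([ref, count]) => ({
      type: "Feature",
      properties: { ref, count },
      geometry: bundle.geometries[ref]
    }))
  };
}
```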
This is somewhat related to #42. To confirm my understanding of the current method:
Total Vehicles = Count of unique device_id by day from Device_Status with an event_type of reserved, available, unavailable at least once in that day
Active Vehicles = Count of unique device_id by day from Trips
Is that right? In the interest of comparing notes, SFMTA has taken a few approaches to counting devices thus far, depending on the purpose, and they each have their pros & cons. We initially started with a simple count of unique device_ids for each day, but since providers may swap out vehicles mid-day, this ended up being misleading.
Hourly snapshots on-street devices = Given a "snapshot date/time" (e.g., 4/25 at 8am), total count of unique device_id from Device_Status that have sent an event within 48 hours of the snapshot date/time and with a latest event of available, reserved, or unavailable
Hourly snapshots of revenue devices = Given a "snapshot date/time" (e.g., 4/25 at 8am), total count of unique device_id from Device_Status that have sent an event within 48 hours of the snapshot date/time and with a latest event of available or reserved
The only difference between the two is whether or not to include the unavailable devices. If measuring towards cap adherence, then we include the unavailable devices (what we've been calling "on-street devices"), but if looking more at actual service and what's available to customers (what we've been calling "revenue devices"), we exclude the unavailable devices. The 48-hour window can also be flexible.
We've also been calculating:
Revenue Hours = sum of total time in an available or reserved state, which is derived from Device_Status
Using the hours helps to account for the change in devices over the course of the day. Given different service models, this may or may not matter. Some providers pull devices from service at night, some rebalance and/or replace devices over the course of the day, and others may just leave devices out all day with minimal rebalancing. The hours would account for this so that we can compare and/or aggregate providers in an apples-to-apples way.
The snapshot approach is a little more intuitive, but the revenue hours is a little more flexible. We're currently working with both now.
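A hedged sketch of the revenue hours calculation for a single device, assuming an ordered list of status changes with MDS-style event_type values and millisecond timestamps (the shape is simplified):

```javascript
// Time spent in "available" or "reserved" counts toward revenue hours.
const REVENUE = new Set(["available", "reserved"]);

function revenueHours(statusChanges, dayEnd) {
  let ms = 0;
  for (let i = 0; i < statusChanges.length; i++) {
    const cur = statusChanges[i];
    const next = statusChanges[i + 1];
    // Each state lasts until the next event, or the end of the day
    const end = next ? next.timestamp : dayEnd;
    if (REVENUE.has(cur.event_type)) ms += end - cur.timestamp;
  }
  return ms / (1000 * 60 * 60);
}
```

Summing this per device across a day yields the provider-level figure, which is what makes the hours-based comparison apples-to-apples across different service models.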