Comments (9)

mxndrwgrdnr commented on July 20, 2024

I updated the scoring metric to count each unmatched segment as a miss, even if it's been missed already (e.g. if the route loops around the block). The results look similar to previous findings but more closely mimic the phenomenon identified in the Newson and Krumm paper, where higher sampling rates produce poorer results at high levels of noise. This trend inverts around 40-60 m of noise.
[figure: score vs. noise by sample rate]
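
A minimal sketch of that scoring rule, assuming the true route is a sequence of segment IDs (with repeats for loops) and the match is a collection of segment IDs; the function and input names are hypothetical, not the repo's actual code:

```python
def segment_match_score(true_segments, matched_segments):
    """Fraction of the true segment sequence that was matched.

    Each unmatched occurrence counts as a miss, even if the same
    segment was already missed (e.g. a route that loops around the
    block traverses the same segment twice).
    """
    matched = set(matched_segments)
    misses = sum(1 for seg in true_segments if seg not in matched)
    return 1 - misses / len(true_segments)
```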

kpwebb commented on July 20, 2024

@mxndrwgrdnr this looks great.

How are things looking re: measuring in terms of matched distance, rather than just segments?

I'm increasingly convinced that the failure case we're looking for is when a mostly good GPS trace falls apart temporarily (signal loss, etc.) and the match jumps way off. Does the matched distance spike in those cases as the matcher tries to find a realistic route?

This raises two things about metrics: 1) the distance of the matched trace matters, and 2) are there ways to perturb GPS traces that don't just mess up the whole trace, but rather degrade them periodically? (We'd need to think about what GPS failure modes look like, but it may be possible to get an idea from real-world traces; see the sketch below.)
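
One way to simulate that kind of periodic degradation, as a rough sketch; the burst probability, length, and noise magnitude are made-up placeholders, not tuned values:

```python
import numpy as np

def degrade_trace(coords, burst_prob=0.05, burst_len=5, burst_noise_m=100.0):
    """Leave most of a trace intact, but occasionally inject a short
    "signal loss" burst of heavily noised points.

    coords: (n, 2) array of projected (x, y) positions in meters.
    """
    coords = np.asarray(coords, dtype=float).copy()
    i = 0
    while i < len(coords):
        if np.random.rand() < burst_prob:
            # noise every point in the burst, then skip past it
            end = min(i + burst_len, len(coords))
            coords[i:end] += np.random.normal(0.0, burst_noise_m, (end - i, 2))
            i = end
        else:
            i += 1
    return coords
```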

mxndrwgrdnr commented on July 20, 2024

@kpwebb Distance-traveled and, relatedly, speed comparisons are still on the "to-do" list. I'm holding off until we've actually tuned the map-matching HMM using the segment-match-based metric, which should happen at some point next week. Also worth keeping in mind that the speed- and distance-based metrics will be significantly impacted by the inclusion of time in the HMM, which is still on the docket. Any distance/speed-based scoring generated now won't necessarily reflect the performance of the finished product, although it will give us a good idea of where we're starting from. In any event, I will have something for you to look at next week.

In the meantime I will keep thinking about the different failure modes of GPS, as I agree that's a good way of producing more realistic traces.

mxndrwgrdnr commented on July 20, 2024

Might need to pass reporter-generated segments back to Valhalla's trace_attributes endpoint in order to do the length/distance-traveled comparison. The code is already doing this for the sake of route visualizations, but I'm currently not saving the rest of the output, which we'd need in order to compare the relevant attributes.
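
For reference, a sketch of what that round trip might look like against a local Valhalla instance. The URL and helper name are assumptions; the request shape follows Valhalla's documented trace_attributes API:

```python
import requests

VALHALLA_URL = "http://localhost:8002/trace_attributes"  # assumed local instance

def matched_route_length_km(coords):
    """Send reporter-generated coordinates to trace_attributes and sum
    the matched edge lengths (reported in kilometers by default)."""
    payload = {
        "shape": [{"lat": lat, "lon": lon} for lat, lon in coords],
        "costing": "auto",
        "shape_match": "map_snap",
    }
    resp = requests.post(VALHALLA_URL, json=payload)
    resp.raise_for_status()
    return sum(edge["length"] for edge in resp.json().get("edges", []))
```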

mxndrwgrdnr commented on July 20, 2024

Implemented a distance-traveled-based scoring metric based on the method used in the Newson and Krumm paper:
[screenshot, 2017-06-27]
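
For readers without the screenshot: the Newson and Krumm measure is the length of road erroneously added to the matched route plus the length erroneously omitted, normalized by the length of the true route. A sketch with hypothetical inputs (dicts mapping edge ID to length in meters):

```python
def nk_distance_error(true_edges, matched_edges):
    """Newson & Krumm route-mismatch fraction: (d_minus + d_plus) / d_true."""
    d_true = sum(true_edges.values())
    # length in the true route but missing from the match (undermatch)
    d_minus = sum(l for e, l in true_edges.items() if e not in matched_edges)
    # length in the match but not in the true route (overmatch)
    d_plus = sum(l for e, l in matched_edges.items() if e not in true_edges)
    return (d_minus + d_plus) / d_true
```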

The results are a near mirror-image of the segment-based matching:
[figure: scores vs. noise by sample rate]

mxndrwgrdnr commented on July 20, 2024

All distance-based metrics:
[figure: match errors by sample rate]

The top row of plots comprises composite metrics of both under- and overmatches (i.e. false negatives and false positives). The left column shows count-based scores, and the right column distance-based ones. They all track each other nicely, at least in the test region (the San Francisco Bay Area).

One pattern that sticks out to me is that the "undermatches" appear to be more sensitive to sample rate at lower noise levels, while overmatches exhibit greater differentiation at higher levels of noise. Also of note: the inversion in match quality mentioned above, whereby higher sample rates produce worse matches, is more pronounced for undermatches (false negatives).
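
A sketch of the four quantities being plotted, under the same assumption of edge-ID-to-length dicts (illustrative only, not the repo's exact code):

```python
def match_error_metrics(true_edges, matched_edges):
    """Count- and distance-based under/overmatch rates plus composites."""
    t, m = set(true_edges), set(matched_edges)
    d_true = sum(true_edges.values())
    under_cnt = len(t - m) / len(t)    # false-negative rate by count
    over_cnt = len(m - t) / len(m)     # false-positive rate by count
    under_dist = sum(true_edges[e] for e in t - m) / d_true
    over_dist = sum(matched_edges[e] for e in m - t) / d_true
    return {
        "undermatch_count": under_cnt,
        "overmatch_count": over_cnt,
        "undermatch_dist": under_dist,
        "overmatch_dist": over_dist,
        "composite_count": under_cnt + over_cnt,
        "composite_dist": under_dist + over_dist,
    }
```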

mxndrwgrdnr commented on July 20, 2024

I've been exploring different metrics for speed-based matching and I think I've arrived at a useful result. The graph below shows two CDF curves, one for successfully matched segments (red) and one for incorrectly matched segments (blue), of the % error of GPS-derived speed relative to OSM speeds ((GPS speed - OSM speed) / OSM speed). The results suggest a definitive breakpoint for a threshold above which we would throw out the most erroneous matches while retaining the most correct ones.

In your post above, @kpwebb, you suggested 2x as a threshold, and the graph certainly supports the notion that any derived/measured/observed speed above 2x the OSM speed is going to be a true negative. However, it also suggests that we'd still be getting a ton of false positives at this threshold (about 70% of them). At least for this region, the SF Bay Area, we could drop that threshold down to 37% above the OSM speed, which would allow us to retain > 90% of our good matches while discarding 60% of the false positives. Even more conservative would be a threshold around 15%, which would retain almost 80% of true positives while rejecting over 70% of the false positives.

It's worth noting, too, that this plot represents results from simulated GPS data at all sample rates and all noise levels. The threshold could easily be customized depending on sample rate and expected positional accuracy. My next move will be to see how this threshold might vary along those lines, and to compare the trend across different regions.
[figure: cumulative frequency of speed error]
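
A sketch of how the retention/rejection trade-off at each candidate threshold could be computed (the array names and reporting format are assumptions):

```python
import numpy as np

def threshold_tradeoffs(gps_speeds, osm_speeds, correct_mask, thresholds):
    """For each threshold on % speed error, report how many correct
    matches would be retained and how many false positives rejected."""
    gps = np.asarray(gps_speeds, dtype=float)
    osm = np.asarray(osm_speeds, dtype=float)
    mask = np.asarray(correct_mask, dtype=bool)
    pct_err = (gps - osm) / osm
    good, bad = pct_err[mask], pct_err[~mask]
    for thr in thresholds:
        kept_good = np.mean(good <= thr)   # true positives retained
        dropped_bad = np.mean(bad > thr)   # false positives rejected
        print(f"thr={thr:.2f}: keep {kept_good:.0%} good, drop {dropped_bad:.0%} bad")

# e.g. the thresholds discussed above: 15%, 37%, and 2x (100% over)
# threshold_tradeoffs(gps_speeds, osm_speeds, correct_mask, [0.15, 0.37, 1.0])
```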

nvkelso commented on July 20, 2024

💯 great work!

mxndrwgrdnr commented on July 20, 2024

Distance-based QA metrics are available as a Python function here. Speed-based metrics are here. And the wrapper function that iterates over a number of routes and performs the calculations is here. Functions for generating the metric plots as seen in the validation notebook can be found here. The plots themselves are featured in sections 3 and 6 of the validation notebook.
