Comments (10)
So, some more comments on the findings, for the cases where it didn't fix the route. There are some important things to note. The use of timestamps isn't a global optimization; it only applies between two points in the match, so it won't reject match paths that are globally inconsistent, only ones that are locally inconsistent. This might be too difficult to explain in text, but if you look at the first failing example, Record Pressing Com_to_Manulife Financial, there is a point (in one market plaza) in the fake, blue-colored GPS trace that is about 140 m away from the original, green-colored route. The points adjacent to that point have candidates which, given the time information, are reachable from that extreme point. Globally, that point is way off, but locally, i.e. with respect to its adjacent fake points, it is reachable. So, piecewise, the detour is simply more consistent with the fake GPS than the alternative path that follows the original route.
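To make the local-vs-global distinction concrete, here is a minimal sketch of a purely pairwise reachability test. This is not Valhalla's actual implementation; the max-speed threshold and function names are made up. The point is that each fake point only has to be reachable from its immediate neighbors within their time deltas, so a point far from the true route can still pass every local check.

```python
# Hypothetical sketch (NOT the Valhalla/meili code): a pairwise, local-only
# timestamp consistency check. A point 140 m off the true route still passes
# as long as it is reachable from its *adjacent* points in the elapsed time.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def locally_consistent(points, max_speed_mps=30.0):
    """points: list of (lat, lon, epoch_seconds) tuples.

    Returns True if every *adjacent* pair is reachable at max_speed_mps.
    This is a purely local test: it says nothing about whether the whole
    path globally agrees with the original route.
    """
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        dist = haversine_m(lat1, lon1, lat2, lon2)
        if dist > max_speed_mps * max(t2 - t1, 1):
            return False
    return True
```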
from reporter-quality-testing-rig.
This looks great @kevinkreiser. W/r/t setting the GPS accuracy, that param gets defined in my code here. The GPS accuracy gets set as the minimum of 100 m and the 95th percentile of a zero-mean normal distribution with a standard deviation equal to the specified amount of noise. The 95th percentile of such a distribution at noise level 100 would be about 164 m, but since I'm taking `min(100, 164)`, it should never go above 100 m.
I am not, however, ever adjusting the search radius parameter, which means the default value of 50 m is being used. Even so, if Valhalla is using `max(search_radius, gps_accuracy)` as you say, we should never be getting points more than 100 m away from the true location. So I am stumped by this as well.
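The accuracy calculation described above can be sketched as follows. This is a hypothetical reconstruction, not the repo's actual code; the function and parameter names are illustrative.

```python
# Sketch of the described calculation: accuracy is the 95th percentile of a
# zero-mean normal with stddev == noise, capped at 100 m. (Illustrative only;
# names do not come from the actual reporter-quality-testing-rig code.)
from statistics import NormalDist

def gps_accuracy(noise_m, cap_m=100.0):
    """min(cap_m, 95th percentile of Normal(0, noise_m))."""
    if noise_m <= 0:
        return 0.0
    p95 = NormalDist(mu=0.0, sigma=noise_m).inv_cdf(0.95)
    return min(cap_m, p95)
```

At noise level 100 the 95th percentile is about 164.5 m, so the cap kicks in and this returns 100, matching the numbers quoted above.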
But what in this bit keeps the adjusted coordinates from being more than `accuracy` away from the original point? As far as I can tell, you don't ensure that the adjusted coordinates are at or below `accuracy`. I solved this myself for my testing by computing a random vector, normalizing it, and then generating a random scale between 0 and `accuracy` to make sure that the offset could never be more than `accuracy`:
Ah, good catch! OK, so that is correct, and that's why we're seeing points that are greater than 100 m away from the true location.
I remember changing the accuracy to the 95th percentile of the mean-zero gaussian with stddev == noise because I had previously been using the noise itself as the accuracy level, which made it impossible to find a match for any point that was perturbed by more than one standard deviation. I can't recall the rationale for capping this value at 100 m; do you, @kevinkreiser?
I assume there was a good reason, and if so, I should be able to fix the problem by applying the same `min(100, ____)` to lines 723-724, correct? Otherwise, I can drop the `min()` function from the accuracy param calculation. Either way, the takeaway seems to be that we want parity between the two.
Yes, we want parity between the two. You can't simply take the min on the second part of the randomization, though, because the vector is composed of two components, which means that if both components are near 100 m, their length will be greater than 100 m (e.g. sqrt(100² + 100²) ≈ 141 m). This is why I generate a random offset vector and normalize it to a length of 1; then I can control the length by scaling it by a random amount that is <= 100 m, or whatever your program has set the accuracy to for that run.
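A minimal sketch of the bounded-offset approach described here, assuming a flat x/y offset in meters (the function name is made up; the actual C++ and Python code linked in the thread is not reproduced):

```python
# Sketch of the bounded-offset technique: draw a random direction, normalize
# it to unit length, then scale by a uniform magnitude in [0, accuracy_m],
# so the offset's length can never exceed accuracy_m. (Illustrative names.)
import math
import random

def bounded_offset(accuracy_m):
    """Return an (dx, dy) offset in meters whose length is <= accuracy_m."""
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    length = math.hypot(x, y) or 1.0  # guard against a zero-length draw
    scale = random.uniform(0, accuracy_m)
    return (x / length * scale, y / length * scale)
```

Clamping each component independently would not work: two components clamped to 100 m still combine into a vector about 141 m long, which is exactly the failure mode described above.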
Oh, and w/r/t the number 100 m... I don't have any clue why we landed on that 😄 I guess it's a pretty large amount of noise to have consistently (on each point of a given trace), especially within a tight city grid.
Makes sense, although it raises the question: shouldn't we be forcing similar upper limits at all noise levels? 100 m is one standard deviation of the gaussian at noise level 100, which is roughly the 68th percentile. For consistency's sake, should we be capping all of our noise generation at that percentile? Why a hard cap just for 100 m of noise?
In the meantime, I will open a new issue for implementing a magnitude-based noise vector similar to what you've done in your code.
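A quick check of the percentile claim above, using a hypothetical helper (not project code): for a zero-mean gaussian, the probability that the magnitude of the noise stays within one standard deviation is about 68%, so a 100 m cap at noise level 100 corresponds to roughly the 68th percentile of the offset magnitude, rather than the 95th percentile used for other noise levels.

```python
# Illustrative helper: P(|X| <= limit_m) for X ~ Normal(0, noise_m).
# At limit_m == noise_m this is 2 * Phi(1) - 1, about 0.6827.
from statistics import NormalDist

def magnitude_percentile(limit_m, noise_m):
    """Two-sided probability that gaussian noise stays within limit_m."""
    d = NormalDist(mu=0.0, sigma=noise_m)
    return d.cdf(limit_m) - d.cdf(-limit_m)
```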
Yes, we should enforce the limit for every run. Basically: randomly generate a noise level, then use that level as the limit for all subsequently generated magnitudes of random offset vectors. This is exactly what the C++ code linked above does as well.
Closing this, as the main issue is now being tracked in #10 and also in the main reporter repo here.