Comments (10)
So, some more comments on the findings, for the cases where it didn't fix the route. There are some important things to note. The use of timestamps isn't a global optimization; it only applies between two points in the match, so it won't reject match paths that are globally inconsistent, only ones that are locally inconsistent. This might be too difficult to explain in text, but if you look at the first failing example, Record Pressing Com_to_Manulife Financial, there is a point (in one market plaza) in the fake, blue-colored GPS trace that is about 140 m away from the original, green-colored route. The points adjacent to that point have candidates which, given the time information, are reachable from that extreme point. Globally, that point is way off, but locally, i.e. with respect to its adjacent fake points, it is reachable. So, piecewise, the detour is simply more consistent with the fake GPS than the alternative path that follows the original route.
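To make the local-vs-global distinction concrete, here is a minimal sketch of a purely pairwise reachability test. This is not Valhalla's actual implementation; the max-speed threshold and function names are made up. The point is that each fake point only has to be reachable from its immediate neighbors within their time deltas, so a point far from the true route can still pass every local check.

```python
# Hypothetical sketch (NOT the Valhalla/meili code): a pairwise, local-only
# timestamp consistency check. A point 140 m off the true route still passes
# as long as it is reachable from its *adjacent* points in the elapsed time.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def locally_consistent(points, max_speed_mps=30.0):
    """points: list of (lat, lon, epoch_seconds) tuples.

    Returns True if every *adjacent* pair is reachable at max_speed_mps.
    This is a purely local test: it says nothing about whether the whole
    path globally agrees with the original route.
    """
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        dist = haversine_m(lat1, lon1, lat2, lon2)
        if dist > max_speed_mps * max(t2 - t1, 1):
            return False
    return True
```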
from reporter-quality-testing-rig.
This looks great @kevinkreiser. W/r/t setting the GPS accuracy, that param gets defined in my code here. The GPS accuracy gets set as the minimum of 100 m and the 95th percentile of a zero-mean normal distribution with a standard deviation equal to the specified amount of noise. The 95th percentile of such a distribution at noise level 100 would be about 164 m, but since I'm taking `min(100, 164)`, it should never go above 100 m.
I am not, however, ever adjusting the search radius parameter, which means the default value of 50 m is being used. Even so, if Valhalla is using `max(search_radius, gps_accuracy)` as you say, we should never be getting points more than 100 m away from the true location. So I am stumped by this as well.
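The accuracy calculation described above can be sketched as follows. This is a hypothetical reconstruction, not the repo's actual code; the function and parameter names are illustrative.

```python
# Sketch of the described calculation: accuracy is the 95th percentile of a
# zero-mean normal with stddev == noise, capped at 100 m. (Illustrative only;
# names do not come from the actual reporter-quality-testing-rig code.)
from statistics import NormalDist

def gps_accuracy(noise_m, cap_m=100.0):
    """min(cap_m, 95th percentile of Normal(0, noise_m))."""
    if noise_m <= 0:
        return 0.0
    p95 = NormalDist(mu=0.0, sigma=noise_m).inv_cdf(0.95)
    return min(cap_m, p95)
```

At noise level 100 the 95th percentile is about 164.5 m, so the cap kicks in and this returns 100, matching the numbers quoted above.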
But what in this bit keeps the adjusted coordinates from being more than `accuracy` away from the original point? As far as I can tell, you don't ensure that the adjusted coordinates are at or below `accuracy`. I solved this myself for my testing by computing a random vector, normalizing it, and then generating a random scale between 0 and `accuracy` to make sure that the offset could never be more than `accuracy`:
Ah, good catch! OK, so that is correct, and that's why we're seeing points that are greater than 100 m away from the true location.
I remember changing the accuracy to the 95th percentile of the mean-zero gaussian with stddev == noise because I had previously been using the noise itself as the accuracy level, which made it impossible to find a match for any point that was perturbed by more than one standard deviation. I can't recall the rationale for capping this value at 100 m; do you, @kevinkreiser?
I assume there was a good reason, and if so, I should be able to fix the problem by applying the same `min(100, ____)` to lines 723-724, correct? Otherwise, I can drop the `min()` function from the accuracy param calculation. Either way, the takeaway seems to be that we want parity between the two.
Yes, we want parity between the two. You can't simply take the min on the second part of the randomization, though, because the vector is composed of two components, which means that if both components are near 100 m, their length will be greater than 100 m (e.g. sqrt(100² + 100²) ≈ 141 m). This is why I generate a random offset vector and normalize it to a length of 1; then I can control the length by scaling it by a random amount that is <= 100 m, or whatever your program has set the accuracy to for that run.
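A minimal sketch of the bounded-offset approach described here, assuming a flat x/y offset in meters (the function name is made up; the actual C++ and Python code linked in the thread is not reproduced):

```python
# Sketch of the bounded-offset technique: draw a random direction, normalize
# it to unit length, then scale by a uniform magnitude in [0, accuracy_m],
# so the offset's length can never exceed accuracy_m. (Illustrative names.)
import math
import random

def bounded_offset(accuracy_m):
    """Return an (dx, dy) offset in meters whose length is <= accuracy_m."""
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    length = math.hypot(x, y) or 1.0  # guard against a zero-length draw
    scale = random.uniform(0, accuracy_m)
    return (x / length * scale, y / length * scale)
```

Clamping each component independently would not work: two components clamped to 100 m still combine into a vector about 141 m long, which is exactly the failure mode described above.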
Oh, and w/r/t the number 100 m... I don't have any clue why we landed on that 😄 I guess it's a pretty large amount of noise to have consistently (on each point of a given trace), especially within a tight city grid.
Makes sense, although it raises the question: shouldn't we be forcing similar upper limits at all noise levels? 100 m is one standard deviation of the gaussian at noise level 100, which is roughly the 68th percentile. For consistency's sake, should we be capping all of our noise generation at that percentile? Why a hard cap just for 100 m of noise?
In the meantime, I will open a new issue for implementing a magnitude-based noise vector similar to what you've done in your code.
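A quick check of the percentile claim above, using a hypothetical helper (not project code): for a zero-mean gaussian, the probability that the magnitude of the noise stays within one standard deviation is about 68%, so a 100 m cap at noise level 100 corresponds to roughly the 68th percentile of the offset magnitude, rather than the 95th percentile used for other noise levels.

```python
# Illustrative helper: P(|X| <= limit_m) for X ~ Normal(0, noise_m).
# At limit_m == noise_m this is 2 * Phi(1) - 1, about 0.6827.
from statistics import NormalDist

def magnitude_percentile(limit_m, noise_m):
    """Two-sided probability that gaussian noise stays within limit_m."""
    d = NormalDist(mu=0.0, sigma=noise_m)
    return d.cdf(limit_m) - d.cdf(-limit_m)
```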
Yes, we should enforce the limit for every run. Basically: randomly generate a noise level, then use that level as the limit for all subsequently generated magnitudes of random offset vectors. This is exactly what the C++ code linked above does as well.
Closing this, as the main issue is now being tracked in #10 and also in the main reporter repo here.