The latest versions of Radiobeacon have introduced increasingly elaborate
filtering of WiFi networks whose geographic location is expected to change
frequently. I am quite skeptical of doing this kind of filtering locally,
before uploading data, as there is a considerable risk of false positives:
- Driving along a motorway, one might still pick up a signal from a building in
the vicinity. Service stations come to mind: nowadays they almost certainly
have at least one access point, and these are perfectly stationary. Filtering
them out on the basis that they are located close to a motorway will discard
some perfectly valid data.
- MAC ranges associated with mobile devices: do we know for sure that chips
with this MAC range were never built into stationary equipment?
- SSIDs: users can choose any SSID they want, so finding strings such as
"iPhone", "ASUS" and the like in the SSID indicates that this access point
MIGHT change its position, but it is not a sure indicator.
- Finally, there may always be the odd owner of a cabin in the woods who has
taped his old smartphone to the wall and uses it as a WiFi access point.
Despite being a mobile device by any definition, its position won't change, and
it may even be the only WiFi around, making it even more valuable. The same
goes for a homebrew WiFi router with a USB or PC card WiFi adapter, which would
be classified as a mobile device due to its MAC.
On the other hand, there are cases which will never be caught by this approach:
- People or offices moving: they frequently take their equipment with them.
That equipment is fixed in nature, and once it has moved to a new location, it
will stay there for some time. However, after the move the database will still
report it at its old location until someone scans the new location, after
which the database will have both locations.
- Fixed equipment used in temporary installations. I am wondering how many of
these I have picked up as I cycled around the Oktoberfest. They're there for
only two weeks, and who knows where these BSSIDs are going to surface next.
Conclusion: dealing with moving WiFis is a lot more complex than just comparing
against a blacklist of SSIDs and BSSIDs. It will always be a guessing game, and
the more data we base our guess on, the better it is. To get a maximum of data,
we would need to run such heuristics on the server.
As a primary input I would use the actual movements of the WiFi. A conventional
WiFi covers a range of some 100 meters around the base station, so two
sightings of a stationary WiFi can be at most about 200 meters apart. I would
therefore consider a WiFi to have moved whenever two consecutive positions for
that WiFi are significantly more than 200 meters apart. To establish the
position of a WiFi we should then only consider the data collected after the
last move.
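To make the idea concrete, here is a rough Python sketch of such a server-side
move detection. It assumes that the observations of one WiFi are available as a
time-ordered list of (lat, lon) pairs; the function names and the exact
threshold are made up for illustration, and "significantly more than 200
meters" is simplified to a single tunable cutoff.

```python
import math

EARTH_RADIUS_M = 6371000.0
# Assumption: twice the ~100 m range; a real deployment would likely add a margin.
MOVE_THRESHOLD_M = 200.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def observations_since_last_move(observations, threshold_m=MOVE_THRESHOLD_M):
    """Count moves and return the observations collected after the last one.

    observations: time-ordered list of (lat, lon) tuples for a single WiFi.
    Returns (number_of_moves, observations_after_last_move).
    """
    moves = 0
    start = 0
    for i in range(1, len(observations)):
        # A jump larger than the threshold between consecutive sightings
        # is treated as a move; everything before it is discarded.
        if haversine_m(*observations[i - 1], *observations[i]) > threshold_m:
            moves += 1
            start = i
    return moves, observations[start:]
```

The position of the WiFi would then be estimated from the returned tail only,
while the move count can feed into a reliability score.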
Additionally, we could introduce a score for each WiFi, which indicates how
reliable position estimates for this WiFi are. That score could then consider
the blacklist criteria, as well as some more:
- Partial SSID match = bad, full SSID match = very bad.
- MAC range match = bad, full MAC match = very bad.
- Mean time between moves: the longer, the better.
- Number of moves recorded: the fewer, the better.
- Time since last move: good if considerably lower than the mean time between
moves.
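As an illustration only, the criteria above could be combined into a score
between 0 and 1 roughly as follows. All weights, the record layout and the
names (`WifiHistory`, `stability_score`) are assumptions picked for the
example; sensible values would have to be derived from real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WifiHistory:
    ssid_match: str  # "none", "partial" or "full" blacklist match
    mac_match: str   # "none", "range" or "full" blacklist match
    moves: int       # number of moves recorded so far
    mean_days_between_moves: Optional[float]  # None if not enough moves
    days_since_last_move: Optional[float]     # None if never moved


# Hypothetical penalties: partial/range match = bad, full match = very bad.
PENALTY = {"none": 0.0, "partial": 0.2, "range": 0.2, "full": 0.4}


def stability_score(h: WifiHistory) -> float:
    """Return a reliability score in [0, 1]; higher means more stable."""
    score = 1.0
    score -= PENALTY[h.ssid_match]
    score -= PENALTY[h.mac_match]
    # The fewer moves, the better (penalty capped at 0.3).
    score -= min(0.05 * h.moves, 0.3)
    if h.mean_days_between_moves is not None:
        # The longer the mean time between moves, the better.
        score += 0.1 * min(h.mean_days_between_moves / 365.0, 1.0)
        # Bonus if the last move is recent compared to the mean interval.
        if (h.days_since_last_move is not None
                and h.days_since_last_move < 0.5 * h.mean_days_between_moves):
            score += 0.1
    return max(0.0, min(1.0, score))
```

A WiFi that has never moved and matches no blacklist entry gets the maximum
score, while something like a phone hotspot with a full SSID match and many
recorded moves ends up near the bottom.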
Clients trying to determine a position based on nearby WiFis could then take
that score into consideration and give preference to WiFis based on two
criteria:
- Good positional stability based on the above score.
- Proximity to other WiFis received at the same time: if the positions of two
WiFis in the DB are significantly more than 200 meters apart, it is a sign that
one of the two may have moved.
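On the client side, the two criteria could be combined along these lines. This
is a sketch under assumptions: each WiFi received is represented as a
hypothetical (lat, lon, score) tuple looked up from the database, the WiFi with
the most score-weighted support from nearby WiFis is taken as anchor, and
everything further than 200 meters from that anchor is dropped as a probable
mover.

```python
import math


def distance_m(a, b):
    """Approximate distance in meters between two (lat, lon) points.

    Uses an equirectangular approximation, which is adequate for
    deciding whether two points are a few hundred meters apart.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000.0 * math.hypot(x, y)


def estimate_position(candidates, max_spread_m=200.0):
    """Estimate a position from WiFis received at the same time.

    candidates: list of (lat, lon, score) tuples from the database.
    Returns (lat, lon), or None if there are no candidates.
    """
    if not candidates:
        return None

    def support(c):
        # Sum of the scores of all other WiFis whose DB position is
        # consistent with this one.
        return sum(o[2] for o in candidates
                   if o is not c and distance_m(c[:2], o[:2]) <= max_spread_m)

    # Prefer the WiFi backed by the most trustworthy neighbors,
    # breaking ties by its own stability score.
    anchor = max(candidates, key=lambda c: (support(c), c[2]))
    kept = [c for c in candidates
            if distance_m(anchor[:2], c[:2]) <= max_spread_m]
    total = sum(c[2] for c in kept)
    if total == 0:
        return anchor[0], anchor[1]
    # Score-weighted average of the consistent positions.
    lat = sum(c[0] * c[2] for c in kept) / total
    lon = sum(c[1] * c[2] for c in kept) / total
    return lat, lon
```

With two WiFis stored a few meters apart and a third one stored in a different
city, the third is discarded and the estimate falls between the first two.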