mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning

License: MIT License

Python 0.26% JavaScript 2.30% Jupyter Notebook 97.44%
machine-learning openstreetmap scikit-learn jupyter-notebook vandalism banished

gabbar's Introduction

gabbar

EXPERIMENTAL: UNDER DEVELOPMENT

Gabbar guards OpenStreetMap from invalid or suspicious edits. It is an alpha package containing a pre-trained binary classifier (problematic/not problematic) trained on manually labelled changesets from OpenStreetMap.

https://en.wikipedia.org/wiki/Gabbar_Singh_(character)

Installation

pip install gabbar

Setup

# Setup a virtual environment with Python 3.
mkvirtualenv --python=$(which python3) gabbar_py3

# Install in locally editable (``-e``) mode.
pip install -e .[test]

# Install node dependencies.
npm install

Prediction

screen shot 2017-06-30 at 4 17 46 pm

# A prediction of "-1" represents that this feature is an anomaly (outlier).
gabbar 49172351
[
    {
        "attributes": {
            "action_create": 0,
            "action_delete": 0,
            "action_modify": 1,
            "area_of_feature_bbox": 109591.9146,
            "feature_name_touched": 0,
            "feature_version": 17,
            "highway_tag_created": 41,
            "highway_tag_deleted": 0,
            "highway_value_difference": 0,
            "length_of_longest_segment": 0.1577,
            "primary_tags_difference": 1
        },
        "changeset_id": "49172351",
        "feature_id": "124863896",
        "feature_type": "way",
        "prediction": -1,
        "score": -0.1493,
        "timestamp": "2017-07-10 10:33:02.925012",
        "version": "0.6.2"
    }
]

Testing

npm test


gabbar's People

Contributors

bkowshik, joneskoo, kapadia, pratikyadav, regisb, sgillies


gabbar's Issues

Use model_selection instead of deprecated cross_validation

Presently, when training a model, we see the following deprecation message.

$ python training/datatrain.py
/Users/demo/.virtualenvs/gabbar/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
training samples: 12364
[testing] good samples: 5299
[testing] problematic samples: 671
precision = 0.915625
recall = 0.442348
f1_score = 0.596514
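
For reference, a minimal sketch of the import change, assuming the training script only needs the split and cross-validation helpers (the exact names used in datatrain.py may differ):

# Deprecated; sklearn.cross_validation is removed in scikit-learn 0.20:
# from sklearn.cross_validation import train_test_split, cross_val_score

# Replacement from the model_selection module:
from sklearn.model_selection import train_test_split, cross_val_score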

sklearn.neighbors.LocalOutlierFactor

Per chat with @jcsg, I briefly tried the LocalOutlierFactor model. Posting early; I will spend some more time on a detailed analysis of both the model and the results.

Per https://stackoverflow.com/a/36869611/3453958

The NearestNeighbors class is unsupervised and can not be used for classification but only for nearest neighbor searches.
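
A minimal, self-contained sketch of LocalOutlierFactor on toy data standing in for highway attributes; it is illustrative only and not the actual training code:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy feature matrix standing in for highway attributes; real features would
# come from the labelled dataset.
rng = np.random.RandomState(42)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X_outliers = rng.uniform(low=-8, high=8, size=(5, 5))
X = np.vstack([X_inliers, X_outliers])

lof = LocalOutlierFactor(n_neighbors=20)
predictions = lof.fit_predict(X)   # 1 for inliers, -1 for outliers
print(predictions[-5:])            # the injected points should mostly be -1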

Neighbors for an inlier (good highway)

screen shot 2017-07-14 at 9 14 11 pm

Neighbors for an outlier (harmful highway)

screen shot 2017-07-14 at 9 18 41 pm


Building a classifier on changeset comments

Changeset comments can be super interesting! Can a model be trained to learn what the changeset comments of 👍 changesets and 👎 changesets look like?

minor edits / repetition / redundancy / abbreviation / duplication / contraction / shortening / cleaning up overbloating /

Beautiful Fountain nice place for tourist. and a nice grass park where you could sit down and enjoy nature in the city
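
A possible starting point is a scikit-learn text pipeline; the comments and labels below are toy placeholders, not real osmcha data:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled comments (0 = good, 1 = problematic); real labels would come
# from changesets reviewed on osmcha.
comments = [
    "Added a building and a footpath",
    "Fixed road names from survey",
    "asdfgh test test",
    "qwerty qwerty qwerty",
]
labels = [0, 0, 1, 1]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)
print(model.predict(["minor edits, cleaning up"]))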


cc: @anandthakker @geohacker @batpad

Bag of Tags

Ref: #69

In the field of Natural Language Processing (NLP), the Bag of Words technique is a popular one. Basically, text is represented as a bag of words, disregarding grammar and even word order but keeping multiplicity.

Along these lines is the concept of a Bag of Tags: all property tags from all samples in the training dataset form the Bag of Tags. Ex:

NOTE: harmful=0 represents a good changeset and harmful=1 a problematic changeset.

Changeset   harmful   highway   name   oneway   surface   maxspeed   vehicle   ...
47514474    0         1         1      1        1         0          0         ...
46429851    0         1         1      0        0         0          0         ...
47349936    0         1         1      1        1         1          1         ...

We collect all tags from changesets labelled with a 👎 and one-hot encode them. Then, we use these as attributes to train a classifier to learn and predict whether changesets are good or problematic based on the occurrence of tags.
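
A toy sketch of one way to build such a matrix, using scikit-learn's DictVectorizer to one-hot encode tag presence (the data below mirrors the table above and is illustrative only):

from sklearn.feature_extraction import DictVectorizer

# Each changeset is represented by the tag keys it touched (toy data).
changesets = [
    {"highway": 1, "name": 1, "oneway": 1, "surface": 1},                               # 47514474
    {"highway": 1, "name": 1},                                                          # 46429851
    {"highway": 1, "name": 1, "oneway": 1, "surface": 1, "maxspeed": 1, "vehicle": 1},  # 47349936
]

vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(changesets)   # one column per tag key
print(vectorizer.feature_names_)
print(X)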


cc: @anandthakker @geohacker @batpad

Feature engineering for Gabbar

Changeset metadata

  • Changeset source - Local knowledge
  • Changeset comment - Added a building
  • changeset_imagery_used - Mapbox
  • Words in the changeset comment - 3

Features in changeset

  • Counts of primary tags for all changeset features - highway: 5, building: 20
  • Primary tags created, modified and deleted
  • Features of type node, way and relation
  • Do features in the changeset overlap with each other?

User name

  • Number of digits or special characters in username
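
A minimal sketch of computing the user-name attributes listed above; the feature names are illustrative and not necessarily the ones Gabbar uses:

import re

def username_features(username):
    """Toy feature extraction for a user name (illustrative only)."""
    return {
        "username_length": len(username),
        "username_digits_count": sum(ch.isdigit() for ch in username),
        "username_special_characters_count": len(re.findall(r"[^A-Za-z0-9]", username)),
    }

print(username_features("mapper_2017!"))
# {'username_length': 12, 'username_digits_count': 4, 'username_special_characters_count': 2}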

Harmful changesets of types not manually seen before

With the current supervised learning based classifier, we train the model on changesets labelled 👍 and 👎. Soon, the classifier will start predicting new changesets based on its training on the labelled dataset. But we have not manually 👀 all types of harmful changesets, and maybe we never will, as new kinds of problematic edits come along.

So, ideally, in the future we will need some kind of unsupervised classifier which is not limited by the subset of labelled samples in the dataset but instead can make use of each and every changeset that comes along on OpenStreetMap.


cc: @anandthakker @batpad @geohacker

Review a random sample of highways

Ref: #69 and #80

I prepared a random sample of touched highway features to manually 👀 and identify good and harmful highways. We can then use this knowledge to make the highway classifier better.

With @amishas157's help, I created a To-Fix task with 9,533 randomly selected highways. I used the Not an error button for good highways and the Fixed button for harmful highways.

To start with, I reviewed 100 highways and did not find any harmful ones.

screen shot 2017-07-03 at 8 01 01 pm


cc: @anandthakker @batpad @geohacker

Increase training size for feature level classifier

Ref #43


  • We currently use 5,269 changesets for training our feature level classifier.
  • From changesets reviewed on osmcha with one-feature modifications, it looks like we can potentially add up to 4,000 changesets.
  • This increase in the number of samples in the training dataset should in turn improve the model.

Next actions

  • Update dataset with the additional 4,000 changesets - @bkowshik

cc: @batpad @geohacker

Increasing number of changesets flagged

For the last two days that gabbar has been live on osmcha-staging, it has flagged fewer than 30 changesets as problematic every day. In the real world, there could potentially be many more changesets that are harmful. So:

  • How can we increase the number of changesets flagged by gabbar?
  • How can we give up a little accuracy so that we bring the number of false negatives down?

cc: @rodowi

Weekly update from Gabbarland

17th Apr - 23rd Apr, 2017

Datasets for training and testing the model are now on S3.

Workflow

Command line API

  • Package is now wired up to take a changeset ID and output predictions.
  • python gabbar/scripts/cli.py --changeset 47734592

Model performance metrics

NOTE: This is our very first weekly update! 🎉


cc: @anandthakker @geohacker @batpad

Bot to catch simple invalid capitalization on OpenStreetMap

From @planemad's post here:

Validation is a good angle to have some bots running to catch simple issues like invalid capitalization in a tag like Highway=residential


I ran a tile-reduce script looking for invalid capitalization in the 26 primary tags below:

aerialway, aeroway, amenity, barrier, boundary, building, craft, emergency, geological,
highway, historic, landuse, leisure, man_made, military, natural, office, place, power,
public_transport, railway, route, shop, sport, tourism, waterway

I eyeballed a few from the list and the results were true positives. Some of the invalid capitalizations were: Building, Highway, etc.

Ex: The feature way/455096754 has an invalid Building tag. So, as soon as changeset 43871967 was created, a bot keeping an 👁️ on the stream corrects the capitalization and leaves a changeset discussion comment informing the user about it, along with some documentation links and the corrected changeset ID.

screen shot 2017-03-08 at 11 19 13 pm

Invalid capitalizations do happen often on OpenStreetMap and are corrected by other community members. Ex: For node/426859638, Highway was corrected to highway after a month.

screen shot 2017-03-08 at 11 32 07 pm

I love the idea of letting the user who created the invalid capitalization know about the simple mistake and making an appropriate change automatically. @planemad, what next actions do you see to make this a real thing on OpenStreetMap?

Choosing a versioning scheme for packaging gabbar on PyPI

Per https://packaging.python.org/distributing/#choosing-a-versioning-scheme

Different Python projects may use different versioning schemes based on the needs of that particular project, but all of them are required to comply with the flexible public version scheme specified in PEP 440 in order to be supported in tools and libraries like pip and setuptools.

Here are some examples of compliant version numbers:

1.2.0.dev1  # Development release
1.2.0a1     # Alpha Release
1.2.0b1     # Beta Release
1.2.0rc1    # Release Candidate
1.2.0       # Final Release
1.2.0.post1 # Post Release
15.10       # Date based release
23          # Serial release

cc: @rodowi @sgillies

Flag changesets predicted problematic on osmcha

We previously had Gabbar as part of osmcha to predict, as changesets come in, whether they are problematic or not. Changesets flagged by Gabbar would get the label Flagged by gabbar.

With Gabbar predictions now easily accessible on the gabbar-frontend, we should get the connection with osmcha back up and running.

As the work on the feature level classifier gets more interesting, it would be good to start sending changesets that Gabbar predicts as problematic to osmcha. This will make it super easy to consume predictions from Gabbar. 😃


cc: @batpad @geohacker

Gabbar 0.5

In preparation for releasing Gabbar 0.5, we will be 👀 changesets flagged as potentially problematic by the latest trained model.

2017-05-15 (Mon)

  • Changesets predicted harmful by model: 385
  • Changesets reviewed: 50
  • Changesets actually problematic: 2
  • Changesets unsure if problematic: 7

Notes

  • Potential bias in the predictions:
    • Changesets by new users.
    • Changesets with few features created (Ex: 1 building created with 5 nodes)

Automate preparation of changesets for manual review

Per #43 (comment)

With the current workflow, every time we have a new trained model, we generate two csv files for manual review:

  1. Fifty unlabelled changesets predicted good
  2. Another fifty unlabelled changesets predicted problematic

Current workflow

  1. Sort changesets by descending order of Gabbar predictions.
  2. Select top 50 rows - changesets with prediction 1, denoting problematic.
  3. Select bottom 50 rows; changesets with prediction of 0, denoting good.

The challenge here is that changesets are by default ordered by changeset ID, so we don't get good variety in the results for manual 👀.

Let's automate this step so that when the notebook is run, changesets for manual review are automatically generated.
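
A toy pandas sketch of that automation; the column names and the shuffling step are assumptions, not the notebook's actual code:

import pandas as pd

# Toy predictions frame; in the real workflow this comes from the notebook's
# prediction step (columns are illustrative).
changesets = pd.DataFrame({
    "changeset_id": range(49172300, 49172500),
    "prediction": [1 if i % 4 == 0 else 0 for i in range(200)],
})

# Shuffle first so the selection is not biased by changeset ID ordering.
shuffled = changesets.sample(frac=1, random_state=0)

shuffled[shuffled["prediction"] == 1].head(50).to_csv("review_problematic.csv", index=False)
shuffled[shuffled["prediction"] == 0].head(50).to_csv("review_good.csv", index=False)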

Using the new osmcha API to download changeset labels

The new osmcha is here! 🎉

With it come some changes to the API, especially render_csv=True being deprecated. 😞 The documentation of the API is at the link below:

Changes

  • The API is paginated, so we will have to make multiple requests
  • The format of the csv is different and changeset_id is not the first column anymore:
geometry.coordinates
geometry.type
id
properties.area
properties.check_date
properties.check_user
properties.checked
properties.comment
properties.create
properties.date
properties.delete
properties.editor
properties.harmful
properties.imagery_used
properties.is_suspect
properties.modify
properties.source
properties.uid
properties.user
type
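
Since the API is paginated, a sketch of pulling multiple pages from Python; the base URL and response keys below are placeholders to be replaced per the osmcha API documentation:

import requests

BASE_URL = "https://osmcha.example.org/api/changesets/"   # placeholder URL

def fetch_labelled_changesets(max_pages=3):
    results = []
    url = BASE_URL
    pages = 0
    while url and pages < max_pages:
        response = requests.get(url)
        response.raise_for_status()
        payload = response.json()
        results.extend(payload.get("features", []))   # GeoJSON-style records
        url = payload.get("next")                     # follow the pagination link
        pages += 1
    return results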

cc: @anandthakker @batpad @geohacker

Convert binary attributes into rich numericals

Ref #43


In https://osmcha.mapbox.com/47414802/, a place=village was converted to place=town.

screen shot 2017-05-30 at 12 56 22 pm

At present, the context we give the machine learning model about this modification, along with other attributes, is:

  • place: 1
  • place_old: 1
  • place_modification: 1
  • harmful = 1

But the model has no knowledge of what the modification actually was, which it needs to make an effective prediction on whether the feature modification was a 👍 or a 👎. So, how about we convert the binary value representing the modification into richer numerical values to help the model make a more informed decision?

Popularity from TagInfo

TagInfo provides values for what percentage of place features have, say, city as the value. Ex: 0.43% of all place objects on OpenStreetMap are place=city. With this, the model would get the following attributes:

  • place_new: 24.92% - Percentage of place=village
  • place_old: 2.21% - Percentage of place=town
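
A sketch of looking the percentages up programmatically; the taginfo endpoint and field names below are assumptions and should be checked against the taginfo API documentation:

import requests

def place_value_fraction(value):
    # Assumed endpoint and response fields; verify against the taginfo API docs.
    response = requests.get(
        "https://taginfo.openstreetmap.org/api/4/key/values",
        params={"key": "place"},
    )
    response.raise_for_status()
    for row in response.json().get("data", []):
        if row.get("value") == value:
            return row.get("fraction")   # share of all place=* objects
    return None

print(place_value_fraction("village"))
print(place_value_fraction("town"))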

cc: @batpad @geohacker

Understanding validation and vandalism detection work on Wikipedia

NOTE: This is a work in progress. Posting here to start a discussion around the topic.


Wikimedia uses Artificial Intelligence for the following broad categories:

  • Vandalism detector. Use edit statistics to find correlations and predict if an edit is problematic.
  • Article edit recommender. Use a user's edit history to predict which articles they could edit next.
  • Article quality prediction. To assess quality of articles on Wikipedia.

On Wikipedia there are 160k edits, 50k new articles and 1,400 new editors every day. The goal is to split the 160k edits into:

  1. Probably OK, almost certainly not vandalism
  2. Needs manual review, might possibly be vandalism

Themes for validation

  • Points of view or standpoint:
    • Wikipedia is a firehose
    • Bad edits must be reverted
    • Minimize manual effort wasted on quality control work
    • Socialize and train newcomers
  • Design tools for Empowerment vs. Power over.
    • Empowerment: I want to hear you, so I'll make space for you to speak and listen.
    • Power over: I want to set the tone of our conversation by talking first.
  • A flipped publication model: Publish first and review later.
  • Given enough eyeballs, all bugs are shallow. If we have a large enough group of people looking at something, somebody will know the right way to solve the problem.

Welcoming newcomers

More newcomers is a major Wikimedia goal and new spaces have been developed to support newcomers. Quality control in Wikipedia is being designed with newcomer socialization in mind so that newcomers (especially those who don't conform) are not marginalized and good-faith newcomers are retained. Although anonymous edits on Wikipedia are twice as likely to be vandalism, 90% of anonymous edits are good.

From this Slate article:

Most people first get involved with Wikipedia, one of the largest social movements in history, by making some minor corrections or starting a small article that is missing. If their contributions get deleted, especially if there is no sufficient explanation why, they are likely to quit. It is quite destructive to the community's long-term survival, as Wikipedia has struggled for quite a while with editor retention.

Popular validation tools

There are around 20 volunteer-developed tools and 3 major Wikimedia product initiatives. Some popular ones are:

  • Objective Revision Evaluation Service (ORES) is intended to provide a generalized service to support quality control and curation work in all wikis.
    • Edit quality models for predicting whether or not an edit causes damage, was saved in good faith, or will eventually be reverted.
    • Article quality models that help gauge progress and identify missed opportunities (popular articles that are low quality); see the Wikipedia 1.0 assessment.
  • Huggle, a diff browser intended for dealing with vandalism and other un-constructive edits on Wikimedia projects.
  • STiki, a tool used to detect and revert vandalism or other un-constructive edits on Wikipedia, available to trusted users.
  • User:ClueBot NG, an anti-vandal bot that tries to detect and revert vandalism quickly and automatically. It has a 0.1% false-positive rate and is able to detect 40% of all vandalism.

There is a basic web interface for ORES at https://ores.wikimedia.org/ui. Some of the features used to classify a revision as problematic or not are: whether the user is anonymous, the number of characters/words added, modified and removed, and the number of repeated characters and bad words added. Prediction scores for a problematic revision look like the following:

https://ores.wmflabs.org/scores/enwiki/damaging/642215410

{
  "642215410": {
    "prediction": true,
    "probability": {
      "false": 0.11271979528262599,
      "true": 0.887280204717374
    }
  }
}
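
Fetching the same score from Python is straightforward with requests; keys are read defensively since the service's response shape may change:

import requests

url = "https://ores.wmflabs.org/scores/enwiki/damaging/642215410"
scores = requests.get(url).json()
revision = scores.get("642215410", {})
print(revision.get("prediction"), revision.get("probability"))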

There has been quite a lot of research in this field, evident from the number of results on Google Scholar for Wikipedia vandalism detection.



cc: OpenStreetMap Community

Feature level classifier in Gabbar

Gabbar has traditionally been a changeset level classifier, which means that given a changeset ID, Gabbar extracts features at the changeset level to predict whether the changeset is harmful or not. Let's try a feature level classifier as part of Gabbar.

Why a feature level classifier?

  • On osmcha, users review changesets and label them as either good or harmful. This is a little too binary for a machine learning model. The question arises: when a changeset is labelled harmful, does that mean all features touched in the changeset are harmful?
  • We have accurate information at the feature level on why a feature modification is a 👍 or a 👎, which gets generalized away at the changeset level.

Feature level dataset

Thanks to osmcha's filters, we can filter changesets reviewed where the maximum number of features created, modified and deleted is one or less.

  • Looks like there are 14,314 changesets. Yay!!!
One feature   Number of changesets reviewed   Harmful changesets
Created       3,333                           413
Modified      9,727                           2,264
Deleted       321                             20

cc: @batpad

Feature selection

From https://en.wikipedia.org/wiki/Feature_selection

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.

Feature selection techniques are used for three reasons:

  • Simplification of models to make them easier to interpret by researchers/users
  • Shorter training times
  • Enhanced generalization by reducing overfitting
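
A minimal scikit-learn sketch of univariate feature selection on toy data standing in for changeset attributes:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data standing in for changeset attributes and labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                      # (500, 5)
print(selector.get_support(indices=True))    # indices of the retained attributes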

Model learning a pattern incorrectly during training

Ref: #43

There were 5 changesets in the training dataset that the model was not able to learn correctly. They were labelled 👍 on osmcha but somehow the model was predicting them to be 👎.

                   Predicted good   Predicted harmful
Labelled good      4850             5
Labelled harmful   0                437

Curious to understand why, I 👀 the results myself. 4 out of the 5 had a pattern: in each of them, a natural=water feature got a new property, water=marsh.

water-is-marsh

All attributes except the following are the same for all of these samples.

  • changeset_bbox_area
  • feature_area
  • feature_area_old

Next actions

  • Why is the model learning this incorrect behavior?
  • How do we re-train the model to predict such changesets as 👍?

cc: @anandthakker @geohacker @batpad

Make this repository public

@rodowi, you had brought this up in one of our voice conversations.

Current setup for osmcha

  • Copy over changeset_to_data and predict utility functions
  • Copy over the trained model, autovandal.pkl

Benefits

  • autovandal can be added to requirements.txt in osmcha
  • We can version the model to measure progress over time
  • We only need to update the package version of autovandal in osmcha instead of copying over everything.

cc: @batpad @geohacker

Prototype an anomaly detection model for highways

Ref: #80 and #69

tumblr_inline_o6kjvapgbs1ta78fg_540

We all know labelled data is gold in machine learning land. But, in the context of OpenStreetMap and osmcha, there are two things:

1. Labelled harmful highways

On osmcha, labelling happens at the changeset level: a changeset is either good or harmful. But there are scenarios where not all features of a changeset are harmful, so we should not assume all features of a harmful changeset are harmful. In Gabbar, we worked with changesets where only one feature was touched; thus, if the changeset was good, the only feature was good, and if the changeset was harmful, the only feature was harmful, since there was only one feature in the changeset.

This worked OK for a generic classifier, but for the highway classifier the size of the dataset is too low. For example, the latest highway classifier was trained on 2,217 good highways and a mere 55 harmful highways. Yes, the number of harmful highways is low. This means supervised learning algorithms might not be fed enough to be strong and healthy.

2. Labelled good highways

But we have a comparative abundance of labelled highways that are good. The 2,217 changesets from above are there, but there are even more. When a changeset is labelled good, it is safe to assume all features in the changeset are good, which in turn means all the highway features in it are good too. Yay!

There are 50,000+ changesets labelled on osmcha, and assuming every changeset has at least one highway, as highways are among the most frequently edited features on OpenStreetMap, we could potentially have around 50,000+ labelled good highways. This might be an interesting scenario to try anomaly detection models on.

From https://en.wikipedia.org/wiki/Anomaly_detection

anomaly detection (also outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.

Another potentially big advantage of anomaly detection models is that they flag when things are different from expected. This means we are not limited by the different types of harmful edits we have seen or given the model for training, but are in a way ready for new and unknown types of anomalies. One important thing about anomaly detection is that these models don't tell you whether a changeset is good or bad; they tell you whether it is something expected or something different.
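
A toy sketch of that idea with scikit-learn's IsolationForest, trained only on stand-in "good highway" vectors; the data and parameters are illustrative, not the actual highway attributes:

import numpy as np
from sklearn.ensemble import IsolationForest

# Train only on (toy) good highway attribute vectors; at prediction time the
# model returns 1 for expected samples and -1 for anomalies, mirroring the
# prediction output at the top of this README.
rng = np.random.RandomState(0)
good_highways = rng.normal(loc=0.0, scale=1.0, size=(2217, 6))

model = IsolationForest(contamination=0.01, random_state=0)
model.fit(good_highways)

unseen = np.vstack([rng.normal(size=(3, 6)), rng.uniform(-10, 10, size=(2, 6))])
print(model.predict(unseen))   # e.g. [ 1  1  1 -1 -1]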


cc: @anandthakker @geohacker @batpad

Datasets: Training, Validation and Testing

1. Training

  • Labelled changesets from osmcha between January and April 2017
  • Model will initially be trained on 20% of this dataset called the sample
  • Before publishing, model will be trained on 100% of this dataset

2. Validation

  • To estimate how well your model has been trained
  • Using labelled changesets from osmcha from May, 2017

3. Testing

  • All changesets from OpenStreetMap on 1st May, 2017

Model baseline performance

A model baseline will help in understanding and measuring the progress we are making with the model in terms of its performance. scikit-learn, the package we use in Gabbar, has a model just for that: the DummyClassifier.

I trained the DummyClassifier on the training dataset and got predictions on the validation dataset. Baselines look close to what a model generating random predictions would give.
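
A minimal sketch of producing such a baseline with DummyClassifier on toy, imbalanced data (the real run uses the labelled changeset attributes):

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Toy stand-in for the labelled changeset attributes (0 = good, 1 = harmful).
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="stratified", random_state=0)
baseline.fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))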

Confusion matrix

                   Predicted good   Predicted harmful
Labelled good      2086             247
Labelled harmful   223              27

Classification report

                precision   recall      f1-score    support

0.0             0.90        0.89        0.90        2333
1.0             0.10        0.11        0.10        250

avg / total     0.83        0.82        0.82        2583

roc_auc

  • Score: 0.49 (0.02) - mean(std dev)

These look very close to what I was expecting. No next actions.


cc: @anandthakker @batpad @geohacker

Using reverted changesets for model training

Per text with @batpad,

Changeset comment has revert

There are a total of 13,125 changesets on osmcha with revert in the changeset comment. Interestingly, 2,505 (20%) of them are one-feature-modification changesets, which is what we use in the latest version of Gabbar.

Assuming mappers revert a problematic or wrong feature in these one-feature-modification changesets, this could be an additional dataset we could make use of for the current iteration of Gabbar's feature level classifier. I manually 👀 a couple of these changesets and they are definitely what we want to catch with Gabbar.

screen shot 2017-06-15 at 7 20 14 pm

screen shot 2017-06-15 at 7 23 52 pm

Changesets from revert user accounts

Mappers and the DWG sometimes maintain a separate account for reverts. Changesets from these accounts will be interesting to look at as well. Ex:

screen shot 2017-06-15 at 7 27 42 pm


cc: @anandthakker @geohacker

Regression test suite for automated testing

Per http://machinelearningmastery.com/deploy-machine-learning-model-to-production/

Develop Automated Tests For Your Model

Write regression tests for your model.

  • Collect or contribute a small sample of data on which to make predictions.
  • Use the production algorithm code and configuration to make predictions.
  • Confirm the results are expected in the test.

We could start with a 2x manually verified dump of 100 changesets that contains:

  • 50 changesets that are good, and
  • 50 changesets that are problematic
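
A pytest-style sketch of such a regression test; the fixture path and the gabbar.get_prediction helper are hypothetical names used for illustration:

# test_regression.py
import json

import gabbar


def test_known_changesets_keep_their_predictions():
    # Hypothetical fixture file of labelled changesets and expected predictions.
    with open("tests/fixtures/labelled_changesets.json") as f:
        fixtures = json.load(f)   # [{"changeset_id": ..., "expected": ...}, ...]

    for fixture in fixtures:
        # gabbar.get_prediction is a placeholder for whatever prediction entry
        # point the package exposes.
        assert gabbar.get_prediction(fixture["changeset_id"]) == fixture["expected"]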

cc: @anandthakker @geohacker @batpad

Effect of attributes on the feature level classifier

Similar to the work on training size, we have questions about the effect of the number of attributes on the model:

  • Does the model have enough attributes?
  • Which attributes contribute how much to the model metrics?
  • Can fewer attributes be better in the long term?

Workflow

  • Get a list of all attributes available for training
  • Increase the training attributes appending one at a time from the attributes list
  • Train a model with these attributes from the training dataset
  • Get predictions from the model on this subset of attributes from the validation dataset
  • Store model metrics on the validation dataset and plot
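
A toy version of this workflow on synthetic data; the model and metric below are stand-ins for whatever Gabbar actually trains, as sketched here:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Add one attribute (column) at a time and record the validation f1 score.
X, y = make_classification(n_samples=2000, n_features=25, n_informative=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

scores = []
for n_attributes in range(1, X.shape[1] + 1):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_train[:, :n_attributes], y_train)
    scores.append(f1_score(y_val, model.predict(X_val[:, :n_attributes])))

for n_attributes, score in enumerate(scores, start=1):
    print(n_attributes, round(score, 3))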

Notes

index

  • There are interesting dips in metrics when the following attributes are added to the list of attributes:
    • user_changesets_with_discussions_count
    • old_user_name_special_characters_count
    • feature_version
    • feature_has_website_old
    • iD
    • Vespucci
  • The metrics somewhat reach their maximum around the 20-attribute mark, except for the occasional dips
  • I am not sure what else to read out of this graph.

cc: @anandthakker @batpad @geohacker

Translating names to English for validation using external APIs

NOTE: Posting here to document a potential idea.


In changeset 48269805, there was one feature that was modified:

  • The name of the marketplace was modified from English to Chinese

screen shot 2017-05-23 at 12 56 03 pm

I gave the Google Translate API a try to translate the new name, 亚庇**巴刹, back to English. The result was Kota Kinabalu. These two words match the English name in the previous version of the feature, Pasar Kota Kinabalu Central.

'use strict';

const Translate = require('@google-cloud/translate');
const projectId = 'Insert project ID';
const translateClient = Translate({
    projectId: projectId
});

// The name to translate back to English.
const text = '센트럴마켓';

translateClient.translate(text, 'en')
.then((results) => {
    const translation = results[0];

    console.log(`Text: ${text}`);
    console.log(`Translation: ${translation}`);
})
.catch((err) => {
    console.error('ERROR:', err);
});

Explore GradientBoosting algorithm used by Wikimedia's ORES

From https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service

The Objective Revision Evaluation Service (ORES) is a web service that provides machine learning as a service for Wikimedia Projects. The system is designed to help automate critical wiki-work -- for example, vandalism detection and removal.

It looks like Wikimedia's Objective Revision Evaluation Service (ORES) makes use of the GradientBoosting algorithm. I am curious about:

  • Why the choice of GradientBoosting
  • How would it be useful in the context of gabbar
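
A minimal scikit-learn sketch of what trying GradientBoosting could look like on stand-in data; it is not a claim about how ORES configures the algorithm:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy comparison point: a GradientBoostingClassifier on stand-in changeset data.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
print(cross_val_score(model, X, y, cv=5, scoring="f1").mean())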


Effect of cross validation parameter on model metrics

In cross-validation, the cv parameter determines the cross-validation splitting strategy. Ex: If cv=3, it is a 3-fold cross-validation. I was curious to see the impact of the value of cv on the model metrics.

Workflow

  • Load up a trained model
  • Vary cv from 1 to 320 and run cross validation
  • On each run, record the model metrics: precision, recall and f1 score
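
A toy sketch of this sweep on synthetic data (note that scikit-learn requires cv >= 2, so the sweep below starts at 2):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Illustrative data and model, not the actual Gabbar training set.
X, y = make_classification(n_samples=1000, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

for cv in [2, 3, 5, 10, 20]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(cv, round(scores.mean(), 3), round(scores.std(), 3))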

The following is the graph I got.

index

Questions

  • What should be a good value of cv to use to consistently measure model performance?

cc: @anandthakker @batpad @geohacker

Detect changesets that are very likely to have problems

The 2 Parts

There are two parts to the problem:

  1. High precision
    • A high percentage of correct vs. incorrect predictions, i.e. fewer false positives.
    • Ex: Predictions are right about 80% of the time, but the model finds less than 20% of all the problematic edits.
  2. High recall
    • Find all or most of the problematic edits.
    • Ex: Finds 80% of all problematic edits but is right only 20% of the time.

Ideally, we want a model that has both high precision and high recall. 😇 But practically, we can only hit one at a time. And once we hit one well, we work on the other problem too.
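
One common way to aim for the high-precision end first is to raise the decision threshold on predicted probabilities; a toy sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Illustration of trading recall for precision by raising the decision threshold.
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probabilities = model.predict_proba(X_test)[:, 1]

for threshold in [0.5, 0.8, 0.95]:
    predictions = (probabilities >= threshold).astype(int)
    print(threshold,
          round(precision_score(y_test, predictions), 2),
          round(recall_score(y_test, predictions), 2))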


With this ticket, I would like to propose:

  • Building a model that can predict changesets that are VERY LIKELY TO HAVE PROBLEMS.
  • Tackle the first problem of HIGH PRECISION
  • Model is right about 80% of the time but finds less than 20% of all the problematic changesets.

cc: @anandthakker @geohacker @batpad

Prototyping Gabbar for highway features

One of the popular problems in machine learning is dogs vs cats: given a picture, predict whether it is of a dog or a cat. Coming from this initial experience with machine learning, I kept thinking the problem of classifying changesets as good or problematic was something similar. But today I did an exercise where I wanted to identify the one attribute about a changeset that makes it good or problematic. I started with:

screen shot 2017-06-16 at 9 15 25 am

The following questions came to mind

  • What could be the source of knowledge to modify?
  • Isn't residential better than unclassified; I mean something is better than nothing right?
  • At version 15, this is quite a mature feature. So, is that alright?
  • What is the length of the highway; smaller should be residential and longer unclassified?
  • Why is source=google maps? Really?

From https://wiki.openstreetmap.org/wiki/Key:highway

  • highway=unclassified

The least important through roads in a country's system, i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets.

  • highway=residential

Roads which serve as an access to housing, without function of connecting settlements.

From https://osmlab.github.io/osm-deep-history/#/way/103217436

  • The feature has mostly been highway=unclassified since creation in 2011.

screen shot 2017-06-16 at 9 19 59 am

Looking deeper into other changesets where a highway=residential gets modified into highway=unclassified, I found this user, Порфирий, who has lots of changesets with the same behavior. Interestingly, the user who added highway=residential is Порфирий too.

screen shot 2017-06-16 at 9 30 27 am

Eureka!

When a highway modification has so many questions to answer and attributes to look at, what will the scale be when we look at all 26 primary tags together? What about features that don't have any primary tags? Too many questions! Too many attributes! Right?

  • This does not look like the traditional cats vs dogs problem. It is a little something else.
  • How about we try something different? How about we build one machine learning model for each object type?
  • How would it look if there were a model trained on highways to classify whether a new/modified highway is a 👍 or a 👎?
  • Another trained on buildings, another on water bodies, etc., with each knowing what a good feature of its type looks like and what a problematic one looks like?
  • Is this it?

cc: @anandthakker @geohacker @batpad

Metrics for measuring Gabbar performance

NOTE: The numbers below are for the purpose of illustration only.

Every day

  • Total number of changesets that day: 25,000
  • Number of changesets flagged as problematic by Gabbar: 500 (2%)
  • Number of changesets manually reviewed on osmcha: 250 (1%)
  • Confusion matrix between manually reviewed and flagged as problematic changesets:
                   Predicted good   Predicted harmful
Labelled good      200              5
Labelled harmful   10               35

On the validation set (changesets labelled on osmcha in May 2017)

  • Total number of changesets: 5,000
  • Confusion matrix between manually reviewed and flagged as problematic changesets:
                   Predicted good   Predicted harmful
Labelled good      4500             50
Labelled harmful   100              350

cc: @batpad
