ban-geocode's Introduction

👋 README

👋 Welcome

Welcome to the Etalab team. In this space you will find the information you need for your arrival. It is an open, collaborative document. Feel free to suggest improvements!

What is Etalab?

You are a member of Etalab

This space was designed for you. In GitBook, you can use the search bar at the top right of your screen to find your way around. If you can't find the resource you are looking for, you can ask in the ~etalab-privé channel of our Mattermost workspace. If you spot an error, you can fix it and contribute to this guide.

Internal documentation

Not all of Etalab's documentation is public. If you have access to the GitBook, you can consult the internal documentation.

In the meantime, you can see what is meant by

Contributing to this documentation

To edit this public guide, you can change the .md files directly in the etalab/etalab repository or use the GitBook editor.

The content of this repository is published under the Licence Ouverte 2.0.

ban-geocode's People

Contributors

yohanboniface

Forkers

mr1azl lcorbasson

ban-geocode's Issues

Search for "Le Bourg"

Searching for "Le Bourg" is a common use case, e.g.:

Le Bourg 61320 Saint-Martin-l'Aiguillon
Le Bourg 61240 Saint-Germain-de-Clairefeuille
Le Bourg 61440 Saint-André-de-Messei
Le Bourg 61700 Rouellé
Le Bourg 09460 Quérigut
Le Bourg 09600 Régat
Le Bourg 43170 Venteuges
Le Bourg 64410 Cabidos
Le Bourg 64190 Narp
Le Bourg 64120 Juxue

Sometimes it really is a hamlet called "Le Bourg", but often it only means, roughly, "in the main built-up part of the village".

Hauteluce, both village and commune

73132B063N||Hauteluce|73620|Hauteluce|OSM|45.750645|6.585376|Savoie|Rhône-Alpes|village
73132|||73620|Hauteluce|IGN|45.750573|6.584800|Savoie|Rhône-Alpes|commune

"rue Gérard Monod" not in the full dataset

We seem to have only the house numbers of the street, not the street itself:

edoardo:~/D/g/bano ag "Cannes" bano-full.csv | ag "Monod" 
594339:060291490Z-11|11|Rue Gerard Monod|06150|Cannes|CAD|43.550913|7.023104|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594340:060291490Z-12|12|Rue Gerard Monod|06150|Cannes|CAD|43.550920|7.022618|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594341:060291490Z-13|13|Rue Gerard Monod|06150|Cannes|CAD|43.550875|7.023286|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594342:060291490Z-15|15|Rue Gerard Monod|06150|Cannes|CAD|43.550850|7.023407|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594343:060291490Z-17|17|Rue Gerard Monod|06150|Cannes|CAD|43.550803|7.023648|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594344:060291490Z-2|2|Rue Gerard Monod|06150|Cannes|CAD|43.550954|7.022077|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594345:060291490Z-32|32|Rue Gerard Monod|06150|Cannes|CAD|43.550753|7.022765|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594346:060291490Z-3|3|Rue Gerard Monod|06150|Cannes|CAD|43.551076|7.022328|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594347:060291490Z-4|4|Rue Gerard Monod|06150|Cannes|CAD|43.551025|7.022108|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594348:060291490Z-5|5|Rue Gerard Monod|06150|Cannes|CAD|43.551045|7.022480|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594349:060291490Z-6|6|Rue Gerard Monod|06150|Cannes|CAD|43.550935|7.022349|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594350:060291490Z-7|7|Rue Gerard Monod|06150|Cannes|CAD|43.551019|7.022605|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
594351:060291490Z-8|8|Rue Gerard Monod|06150|Cannes|CAD|43.550954|7.022359|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
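
A minimal sketch of how such streets could be spotted in bulk. It assumes that a street present as its own entry would appear on a row whose last field is something other than `number`; the column names in `FIELDS` are hypothetical, inferred from the rows quoted above, and the leading `594339:` prefixes (line numbers added by `ag`) are dropped.

```python
import csv
import io

# Hypothetical column names: the issue only shows raw pipe-delimited rows.
FIELDS = ["id", "housenumber", "street", "postcode", "city", "source",
          "lat", "lon", "departement", "region", "type"]

SAMPLE = """\
060291490Z-11|11|Rue Gerard Monod|06150|Cannes|CAD|43.550913|7.023104|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
060291490Z-12|12|Rue Gerard Monod|06150|Cannes|CAD|43.550920|7.022618|Alpes-Maritimes|Provence-Alpes-Côte d'Azur|number
73132|||73620|Hauteluce|IGN|45.750573|6.584800|Savoie|Rhône-Alpes|commune
"""


def streets_without_own_row(lines):
    """Return (street, postcode, city) keys that only appear on 'number' rows."""
    reader = csv.DictReader(lines, fieldnames=FIELDS, delimiter="|",
                            quoting=csv.QUOTE_NONE)
    with_numbers, with_own_row = set(), set()
    for row in reader:
        if not row["street"]:
            continue  # rows without a street name (e.g. the commune row)
        key = (row["street"], row["postcode"], row["city"])
        if row["type"] == "number":
            with_numbers.add(key)
        else:
            with_own_row.add(key)
    return with_numbers - with_own_row


missing = streets_without_own_row(io.StringIO(SAMPLE))
print(missing)
```

On the real file, the same function could be fed an open file handle instead of the in-memory sample.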

Route de Bracieux, Chambord, not in the dataset (but housenumbers are)

7381134:410340020R-10|10|Route de Bracieux|41250|Chambord|CAD|47.611867|1.518545|Loir-et-Cher|Centre|number
7381135:410340020R-11|11|Route de Bracieux|41250|Chambord|CAD|47.611973|1.518691|Loir-et-Cher|Centre|number
7381136:410340020R-1|1|Route de Bracieux|41250|Chambord|CAD|47.611885|1.518991|Loir-et-Cher|Centre|number
7381137:410340020R-12|12|Route de Bracieux|41250|Chambord|CAD|47.612053|1.518886|Loir-et-Cher|Centre|number
7381138:410340020R-13|13|Route de Bracieux|41250|Chambord|CAD|47.611894|1.519056|Loir-et-Cher|Centre|number
7381139:410340020R-13B|13BIS|Route de Bracieux|41250|Chambord|CAD|47.611914|1.519037|Loir-et-Cher|Centre|number
7381140:410340020R-2|2|Route de Bracieux|41250|Chambord|CAD|47.611811|1.518810|Loir-et-Cher|Centre|number
7381141:410340020R-3|3|Route de Bracieux|41250|Chambord|CAD|47.611786|1.518776|Loir-et-Cher|Centre|number
7381142:410340020R-4|4|Route de Bracieux|41250|Chambord|CAD|47.611704|1.518579|Loir-et-Cher|Centre|number
7381143:410340020R-48|48|Route de Bracieux|41250|Chambord|CAD|47.612765|1.518971|Loir-et-Cher|Centre|number
7381144:410340020R-50|50|Route de Bracieux|41250|Chambord|CAD|47.612284|1.519929|Loir-et-Cher|Centre|number
7381145:410340020R-51|51|Route de Bracieux|41250|Chambord|CAD|47.612144|1.519884|Loir-et-Cher|Centre|number
7381146:410340020R-52|52|Route de Bracieux|41250|Chambord|CAD|47.611985|1.519908|Loir-et-Cher|Centre|number
7381147:410340020R-53|53|Route de Bracieux|41250|Chambord|CAD|47.611831|1.519901|Loir-et-Cher|Centre|number
7381148:410340020R-54|54|Route de Bracieux|41250|Chambord|CAD|47.611662|1.520081|Loir-et-Cher|Centre|number
7381149:410340020R-54A|54A|Route de Bracieux|41250|Chambord|CAD|47.611144|1.520137|Loir-et-Cher|Centre|number
7381150:410340020R-55|55|Route de Bracieux|41250|Chambord|CAD|47.609700|1.520501|Loir-et-Cher|Centre|number
7381151:410340020R-5|5|Route de Bracieux|41250|Chambord|CAD|47.611664|1.518571|Loir-et-Cher|Centre|number
7381152:410340020R-56|56|Route de Bracieux|41250|Chambord|CAD|47.609654|1.520706|Loir-et-Cher|Centre|number
7381153:410340020R-58|58|Route de Bracieux|41250|Chambord|CAD|47.612500|1.518846|Loir-et-Cher|Centre|number
7381154:410340020R-59|59|Route de Bracieux|41250|Chambord|CAD|47.612635|1.518795|Loir-et-Cher|Centre|number
7381155:410340020R-61|61|Route de Bracieux|41250|Chambord|CAD|47.612797|1.518364|Loir-et-Cher|Centre|number
7381156:410340020R-62|62|Route de Bracieux|41250|Chambord|CAD|47.612745|1.518567|Loir-et-Cher|Centre|number
7381157:410340020R-63|63|Route de Bracieux|41250|Chambord|CAD|47.612911|1.518282|Loir-et-Cher|Centre|number
7381158:410340020R-6|6|Route de Bracieux|41250|Chambord|CAD|47.611642|1.518299|Loir-et-Cher|Centre|number
7381159:410340020R-7|7|Route de Bracieux|41250|Chambord|CAD|47.611656|1.518245|Loir-et-Cher|Centre|number
7381160:410340020R-80|80|Route de Bracieux|41250|Chambord|CAD|47.583369|1.530664|Loir-et-Cher|Centre|number
7381161:410340020R-8|8|Route de Bracieux|41250|Chambord|CAD|47.611763|1.518255|Loir-et-Cher|Centre|number
7381162:410340020R-95|95|Route de Bracieux|41250|Chambord|CAD|47.612500|1.518859|Loir-et-Cher|Centre|number
7381163:410340020R-9|9|Route de Bracieux|41250|Chambord|CAD|47.611825|1.518407|Loir-et-Cher|Centre|number

Département numbers in search create noise

"Place de la République, Arles (13)" will not find anything with "match all", so the search falls back to matching all terms but one. At that point every "13 place de la République" in France becomes a candidate and ranks above the wanted result.

"Place de la République, Arles", on the other hand, does find the expected result.

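One possible workaround, sketched here only as an illustration and not as something the project does, would be to strip a trailing parenthesised département code from the query before searching. The function name and the regular expression are assumptions; the pattern is deliberately simple and does not check that the code is a real département.

```python
import re

# Trailing "(13)", "(2A)" or "(974)": a département code in parentheses.
# Simplified on purpose: any 2- or 3-digit number is accepted.
DEPARTEMENT_SUFFIX = re.compile(r"\s*\(\s*(\d{2,3}|2[ABab])\s*\)\s*$")


def split_departement(query):
    """Return (query_without_suffix, departement_code_or_None)."""
    match = DEPARTEMENT_SUFFIX.search(query)
    if match is None:
        return query.strip(), None
    return query[:match.start()].strip(), match.group(1).upper()


cleaned, code = split_departement("Place de la République, Arles (13)")
print(cleaned, code)
```

The extracted code could then be used as a separate filter rather than as a free-text term, so that it no longer competes with house numbers. A leading "13" as in "13 place de la République" is left untouched because it is not in parentheses.
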
A small remark about the code

Hello,

I am no Python expert, so take my remark with a grain of salt, but I had to change 2 lines to get the project working on my machine:

In app.py line 30 --> es = elasticsearch.Elasticsearch([{u'host': u'127.0.0.1', u'port': 9200}]) otherwise I got an error

And in es.py line 128 --> reader = csv.DictReader(f, fieldnames=FIELDS, delimiter=',') since the CSV is delimited by ',' and not '|'
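
A minimal sketch of a reader that copes with both delimiters, assuming the file has no header row and that the delimiter is simply the most frequent candidate character on the first line. The `FIELDS` names below are hypothetical (the real list is defined in es.py and is not shown here), and `guess_delimiter` and `read_rows` are illustrative helpers, not part of the project.

```python
import csv
import io

# Hypothetical column names, inferred from rows quoted in other issues;
# the real FIELDS list is defined in es.py.
FIELDS = ["id", "housenumber", "street", "postcode", "city", "source",
          "lat", "lon", "departement", "region", "type"]


def guess_delimiter(first_line, candidates="|,;\t"):
    """Return the candidate character that occurs most often in the line."""
    return max(candidates, key=first_line.count)


def read_rows(f):
    """Read every row of an open text file, guessing the delimiter first."""
    first_line = f.readline()
    f.seek(0)  # rewind so the first line is parsed too
    reader = csv.DictReader(f, fieldnames=FIELDS,
                            delimiter=guess_delimiter(first_line))
    return list(reader)


PIPE_ROW = "73132|||73620|Hauteluce|IGN|45.750573|6.584800|Savoie|Rhône-Alpes|commune\n"
COMMA_ROW = PIPE_ROW.replace("|", ",")

pipe_rows = read_rows(io.StringIO(PIPE_ROW))
comma_rows = read_rows(io.StringIO(COMMA_ROW))
print(pipe_rows[0]["city"], comma_rows[0]["city"])
```

The counting heuristic would be fooled by a comma-delimited file whose first line contains many pipes inside a field, so an explicit command-line option may be the safer choice.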

One last remark: the README says that importing the CSV into Elasticsearch should take 10 minutes. It took 2 h 40 on my machine, with a 2.5 GHz Core 2 Duo, 8 GB of RAM, and an SSD. Are there any particular Elasticsearch settings that would improve performance?

That's all.

Reduce false positives in match_all=False strategy

My intuition is that, when we issue a search, we can reduce false positives by adding a filter that keeps only the documents satisfying at least one of the following assertions (it's an "or" inside a "must"):

  • 100% of the search terms match in the city or postcode fields ("Paris", "Paris 75000")
  • OR 100% of the search terms match in the housenumber or way_label fields ("24 rue", "boulevard", to handle the autocomplete scenario)
  • OR at least one term matches in the street name field

This way we may avoid false positives when match_all=False, for example the scenario where the housenumber ("11") matches, the way label ("rue") matches, the city matches the street name, and the real street name itself isn't matched at all.

I've played around with something like this:

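    # F is assumed to be the filter factory of elasticsearch-dsl < 2.0.
    from elasticsearch_dsl import F

    # `q` is the raw query string and `s` the Search object being built.
    city_or_name = F('or', [
        # Every search term matches in the city or postcode fields.
        F({"query": {
            "multi_match": {
                "query": q,
                "type": "best_fields",
                "analyzer": "search_stringanalyzer",
                "minimum_should_match": '100%',
                "fuzziness": 1,
                "prefix_length": 2,
                "fields": ['city', 'postcode']
            }
        }}),
        # At least one search term matches in the name.
        F({"query": {
            "match": {
                "name.keywords": {
                    "query": q,
                    "analyzer": "search_stringanalyzer",
                    "minimum_should_match": 1,
                    "fuzziness": 1,
                    "prefix_length": 2
                }
            }
        }}),
        # At least one search term matches in the street.
        F({"query": {
            "match": {
                "street.keywords": {
                    "query": q,
                    "analyzer": "search_stringanalyzer",
                    "minimum_should_match": 1,
                    "fuzziness": 1,
                    "prefix_length": 2
                }
            }
        }}),
        # Every search term matches in the housenumber or way_label fields.
        F({"query": {
            "multi_match": {
                "query": q,
                "type": "best_fields",
                "analyzer": "search_stringanalyzer",
                "minimum_should_match": '100%',
                "fuzziness": 1,
                "prefix_length": 2,
                "fields": ['housenumber', 'way_label']
            }
        }}),
    ])
    # Keep only the documents that satisfy at least one clause above.
    s = s.filter(city_or_name)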

But, unsurprisingly given the number of added queries, this dramatically increases the response time (roughly 5x).
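
To make the three conditions listed earlier concrete without running Elasticsearch, here is a minimal pure-Python sketch of the same "or" logic. It is only an illustration: it uses exact, case-insensitive term matching (no fuzziness, no analyzer), the `passes_filter` helper is hypothetical, and the document keys, in particular `street_name` holding the street name without its way label, are assumptions.

```python
def passes_filter(query, doc):
    """Return True if at least one of the three conditions holds."""
    terms = query.lower().split()
    if not terms:
        return False

    def tokens(*fields):
        # Lower-cased words found in the given document fields.
        return {t for f in fields for t in doc.get(f, "").lower().split()}

    place = tokens("city", "postcode")
    number_or_label = tokens("housenumber", "way_label")
    street = tokens("street_name")

    all_in_place = all(t in place for t in terms)
    all_in_number_or_label = all(t in number_or_label for t in terms)
    any_in_street = any(t in street for t in terms)
    return all_in_place or all_in_number_or_label or any_in_street


# Hypothetical indexed document.
doc = {"housenumber": "11", "way_label": "rue", "street_name": "Gambetta",
       "city": "Paris", "postcode": "75000"}

for query in ("Paris 75000", "11 rue", "rue Gambetta", "11 rue Hugo"):
    print(query, passes_filter(query, doc))
```

The last query is the kind of candidate the filter is meant to drop: the housenumber and the way label match, but no term matches the street name and the terms are not all found in either of the other two field groups.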
