
address-formatting's Introduction

address formatting

Overview

This project contains templates and test cases for address formats used in territories around the world. The templates can be processed in any programming language (see below for a list of processors).


An example:

Given a set of address parts like

 house_number:  17
 road:          Rue du Médecin-Colonel Calbairac
 neighbourhood: Lafourguette
 suburb:        Toulouse Ouest
 postcode:      31000
 city:          Toulouse
 county:        Toulouse
 state:         Midi-Pyrénées
 country:       France
 country_code:  FR

we want logic that assembles an address in the format consumers expect:

17 Rue du Médecin-Colonel Calbairac
31000 Toulouse
France
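Below is a minimal Python sketch of that idea. It assumes a simplified, hypothetical template: this is not the real FR entry in conf/countries/worldwide.yaml, and it skips the {{#first}} blocks, replace rules and clean-up that real processors apply. Each {{{component}}} placeholder is filled from the input, missing components render as empty strings, and lines left empty are dropped.

```python
import re

# Hypothetical, simplified template in the spirit of the Mustache templates in
# conf/countries/worldwide.yaml; it is NOT the real FR entry.
TEMPLATE = """\
{{{house_number}}} {{{road}}}
{{{postcode}}} {{{city}}}
{{{country}}}
"""

PLACEHOLDER = re.compile(r"\{\{\{(\w+)\}\}\}")


def render(template: str, components: dict) -> str:
    """Fill {{{key}}} placeholders, then drop lines left empty."""
    # Missing components render as empty strings, as in Mustache.
    filled = PLACEHOLDER.sub(lambda m: str(components.get(m.group(1), "")), template)
    lines = []
    for line in filled.splitlines():
        # Collapse the double spaces left behind by empty placeholders.
        line = re.sub(r" {2,}", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


components = {
    "house_number": "17",
    "road": "Rue du Médecin-Colonel Calbairac",
    "neighbourhood": "Lafourguette",
    "suburb": "Toulouse Ouest",
    "postcode": "31000",
    "city": "Toulouse",
    "county": "Toulouse",
    "state": "Midi-Pyrénées",
    "country": "France",
    "country_code": "FR",
}

print(render(TEMPLATE, components))
```

Components the template never mentions, such as neighbourhood and county here, are simply ignored.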

Why would you want to do this?

The intended use case is database or geocoding systems (forward, reverse, autocomplete) where we know both the country of the address and the language of the user/reader. The address is displayed to a consumer (for example in an app) and not used to print on an envelope for actual postal delivery. We use it to format output from the OpenCage Geocoding API.

Which addresses are we talking about?

We have to deal with

  • incomplete data
  • anything with a name (peaks, bridges, bus stops)

Unlike physical (post office) mail, we don't have to deal with

  • apartment/flat numbers or floor numbers
  • PO boxes
  • translating the language of the (destination) address. Whatever language is input is output.

Processing logic

Our goal with this repository is a series of (programming) language independent templates. Those templates can then be processed by whatever software you like.

There are open-source implementations in

We would love more language implementations. The more people who use the templates, the more likely bugs will be reported. If you write a processor, please submit a pull request adding it to the list. Thanks.

International coverage

As of March 2024 coverage is:

We are aware of 251 territories
We have at least one test for 251 (100%) territories
We have rules for 251 (100%) territories
0 (0%) territories have neither rules nor tests

This output is generated by bin/coverage.pl.

We need more language specific abbreviations. Please see conf/abbreviations. Pull requests gladly received.

A detailed breakdown of test and configuration coverage can be found by running bin/coverage.pl -d. A list of all known territories is in conf/country_codes.yaml.

Please note: the list is simply all officially assigned ISO 3166-1 alpha-2 codes, and is not a political statement on whether or not these territories are or are not, or should or should not be, political states.

File format

The files are in YAML format. The templates are written in Mustache. Both formats are human readable, strict, handle escaping and support comments. YAML allows references (called "anchors") to avoid copy-and-paste; Mustache allows sub-templates (called "partials").

How to add your country/territory

  1. Edit the .yaml test case for the country/territory in testcases/countries. The file names correspond to the appropriate ISO 3166-1 alpha-2 code; see conf/country_codes.yaml.
  • A good way to get sample data is to:
    • find an addressed location (house, business, etc.) in your target territory in OpenStreetMap
    • get the coordinates (lat, long) of the location
    • put the coordinates into the OpenCage Geocoding API demo page
    • look at the resulting JSON in the Raw Response tab
  2. Edit conf/countries/worldwide.yaml.
  • Possibly your country/territory uses an existing generic format as defined at the top of the file. If so, great: just map your country_code to the generic template. You may still want to add clean-up code (see the entry for DE as an example).
  • If not, you need to define a new generic rule set.
    • You may also need to define new state/region mappings in conf/state_codes.yaml.
  3. To test, process the .yaml test case with a processor (see above) and make sure the input produces the desired output.

If in doubt, please get in touch by submitting an issue.

Formatting rules

Currently we support the following formatting rules:

  • replace: a regex that operates on the input values, useful for removing bureaucratic cruft like "London Borough of ". If the regex starts with the form key=, for example city=, it operates only on the value with that key.
  • postformat_replace: a regex that operates on the final output
  • add_component: adds a component, with a value of the form component=XXXX
  • change_country: changes the country value of the input, useful for dependent territories. It can include a substitution like $state, so that the value of that component is inserted into the new country value. See testcases/countries/sh.yaml for an example.
  • use_country: uses the formatting configuration of another country, useful for dependent territories to avoid duplicating configuration
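The rules above can be sketched in Python as plain regex passes. This is an illustration under assumptions, not code from any of the processors. Each rule is taken to be a (pattern, replacement) pair. A pattern prefixed with key= is scoped to that one component, and $component tokens in change_country are filled from the input. The function names and example values are hypothetical. If a rule uses backreferences, check which syntax your processor expects; Python's re uses \1 rather than $1.

```python
import re


def apply_replace(components: dict, rules: list) -> dict:
    """Apply `replace` rules to the input values.

    Assumption: each rule is a (pattern, replacement) pair, and a pattern
    written as "key=regex" applies only to the component with that key.
    """
    out = dict(components)
    for pattern, replacement in rules:
        scoped = re.match(r"^(\w+)=(.*)$", pattern)
        if scoped:
            target, pattern = scoped.group(1), scoped.group(2)
            names = [target] if target in out else []
        else:
            names = list(out)  # unscoped rules touch every input value
        for name in names:
            out[name] = re.sub(pattern, replacement, out[name])
    return out


def apply_postformat_replace(text: str, rules: list) -> str:
    """Apply `postformat_replace` rules to the final formatted string."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def apply_change_country(components: dict, new_country: str) -> dict:
    """Set a new country value, filling $component tokens from the input."""
    out = dict(components)
    out["country"] = re.sub(r"\$(\w+)", lambda m: out.get(m.group(1), ""), new_country)
    return out


# Hypothetical example values, not copied from the configuration files.
borough = {"city": "London Borough of Camden", "county": "London Borough of Camden"}
print(apply_replace(borough, [("city=^London Borough of ", "")]))
print(apply_change_country({"state": "Ascension"}, "$state, United Kingdom"))
```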

The future

More tests! For every rule about addresses there are exceptions and edge cases to consider. More test cases are always needed.

Planned features:

  • basic error checking, for example ignoring things that obviously cannot be postcodes
  • define rules for postcode format specifically

We welcome your pull requests. Together we can address the world!

License

This project is licensed under the MIT License - see the LICENSE.txt file for details

Additional resources

If you are working with addresses you may need lists of random addresses/postcodes/coordinates (either in general or for specific countries) for testing.

Further reading on the challenge of addresses

Here's our blog post announcing this project and the motivations behind it.

You may enjoy Michael Tandy's Falsehoods Programmers Believe about Addresses.

If it's actual address data you're after, check out OpenStreetMap and OpenAddresses.

If you want to turn longitude, latitude into well formatted addresses or placenames, well that's what a geocoder does. Check out ours: OpenCage Geocoder.

If all this convinces you that addresses are evil, please check out what3words, which allows you to dispense with them entirely.

Who is OpenCage GmbH?

We run a worldwide geocoding API and geosearch service based on open data. Learn more about us.

We also organize Geomob, a series of regular meetups for location based service creators, where we do our best to highlight geoinnovation. If you like geo stuff, you will probably enjoy the Geomob podcast.

address-formatting's People

Contributors

albarrentine, antoine-de, atom-system, baumerdev, bb, ben-willis, danstowell, darksmo, dkuku, flimm, freyfogle, garetjax, gy-mate, ivan-mak, jakobmiksch, jirkachadima, kimryan, ldvc, mar-v-in, mrailton, mtmail, olivier5741, pi-cla, rkoeze, sarahnazzari, sergiogomez, stephangeorg, timonmasberg, woheller69, zverik


address-formatting's Issues

Precedence of component aliases

I'm trying to write a formatter that passes the test cases, but I've run into a problem with component aliases.

In this test case we expect "city_district" to be used as the neighbourhood ahead of "suburb" and substituted into the formatted address:

description: Supermarket, 33.3559,44.4015
components:
    city: Baghdad
    city_district: Rusafa
    country: Iraq
    country_code: iq
    county: Al Resafa
    postcode: 222
    road: A86/N11/D383
    state: Baghdad
    suburb: Mustansiriya
    supermarket: al mustansriya Central Market
expected: |
    al mustansriya Central Market
    Rusafa
    A86/N11/D383
    Baghdad
    222
    Iraq

https://github.com/OpenCageData/address-formatting/blob/master/testcases/countries/iq.yaml#L3

However in this case we expect "road_reference" instead of "road_reference_intl" to be used for the road:

description: road_reference, 48.406168,2.55452
components:
    country: France
    country_code: fr
    county: Fontainebleau
    postcode: 77630
    road_reference: A 6
    road_reference_intl: E 15
    state: Ile-de-France
    village: Arbonne-la-Forêt
expected: |
    A 6
    77630 Arbonne-la-Forêt
    France

https://github.com/OpenCageData/address-formatting/blob/master/testcases/countries/fr.yaml#L170

Looking at components.yaml, alias order doesn't seem to determine precedence (city_district comes after suburb under neighbourhood, but road_reference comes before road_reference_intl under road):

name: road
aliases:
    - footway
    - street
    - street_name
    - residential
    - path
    - pedestrian
    - road_reference
    - road_reference_intl    
---
name: neighbourhood
aliases:
    - suburb
    - city_district
    - district
    - quarter
    - houses
    - subdivision

https://github.com/OpenCageData/address-formatting/blob/master/conf/components.yaml#L11

So what order of precedence for component aliases should I use to pass both these test cases?
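One way to make both test cases pass is to give each canonical component an explicit, ordered list of aliases. A processor would then use the first alias present, and only when the canonical key itself is missing. The orders below are hypothetical: they were chosen to satisfy the two quoted test cases, not read from conf/components.yaml. resolve_aliases is an illustrative helper.

```python
# Hypothetical priority lists: the canonical key first, then its aliases in the
# order a processor would try them. Chosen to satisfy the two test cases above;
# not read from conf/components.yaml.
PRIORITY = {
    "road": ["road", "footway", "street", "street_name", "residential",
             "path", "pedestrian", "road_reference", "road_reference_intl"],
    "neighbourhood": ["neighbourhood", "city_district", "suburb", "district",
                      "quarter", "houses", "subdivision"],
}


def resolve_aliases(components: dict, priority: dict = PRIORITY) -> dict:
    """Set each canonical key from the first non-empty alias, unless already set."""
    out = dict(components)
    for canonical, candidates in priority.items():
        if out.get(canonical):
            continue  # the canonical key always wins over its aliases
        for key in candidates:
            if out.get(key):
                out[canonical] = out[key]
                break
    return out


iq = {"city": "Baghdad", "city_district": "Rusafa", "suburb": "Mustansiriya",
      "road": "A86/N11/D383", "country": "Iraq"}
fr = {"road_reference": "A 6", "road_reference_intl": "E 15",
      "village": "Arbonne-la-Forêt", "country": "France"}
print(resolve_aliases(iq)["neighbourhood"])
print(resolve_aliases(fr)["road"])
```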

possible issue with Italian address format as locally expected

As you can see from this example search on the Italian white pages (phone book) website, there's usually no comma between the street name and the house number, and the province code is usually written between (round) brackets.

For example: Via Garibaldi 123, 10123 Torino (TO)
(I totally made up the address and the zip code; they're just there to show the format.)

In some cases there is no house number at all, so it's written 's/n' or 's.n.c.' ("senza numero civico", meaning 'no house number').
For example: Strada Cantonale s/n, 20222 Rozzano (MI)
(Again, I totally made up the address and the zip code just to show the format.)
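A minimal sketch of the format described above follows. It uses a hypothetical helper; format_it_line is not part of any processor. There is no comma between road and house number, "s/n" stands in for a missing house number, and the province code goes in round brackets.

```python
from typing import Optional


def format_it_line(road: str, house_number: Optional[str], postcode: str,
                   city: str, province_code: str) -> str:
    """Build 'Road Number, Postcode City (PR)' as in the Italian examples above.

    Hypothetical helper: no comma between road and house number, "s/n" when
    the house number is missing, province code wrapped in round brackets.
    """
    number = house_number or "s/n"
    return f"{road} {number}, {postcode} {city} ({province_code})"


print(format_it_line("Via Garibaldi", "123", "10123", "Torino", "TO"))
print(format_it_line("Strada Cantonale", None, "20222", "Rozzano", "MI"))
```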

deal better with Clerkenwell

We need a better solution when the query is something like "Clerkenwell", i.e. a neighbourhood that doesn't appear in postal addresses. Right now it gets turned into "London, United Kingdom", which is nonsense.

UAE test address in Dubai

For the coordinates 25.0299178,55.1872123, a local resident I know would expect the OpenCage formatted address to include the words International Media Production Zone. Right now OpenCage returns "Dubai, United Arab Emirates" as the formatted address.

The UAE seems to use the generic5 address template (https://github.com/OpenCageData/address-formatting/blob/master/conf/countries/worldwide.yaml#L37), which does not use the suburb field, where the value International Media Production Zone can be found. That data is typically visible on maps such as OSM.

I don't know how generalisable this test case is, but maybe it's helpful.

Tests for {{first}} templates with literal components

I looked at the JavaScript implementation of address formatting and stumbled upon an issue. The {{#first}} lambda splits the text inside the {{#first}}...{{/first}} block at the || separators and then uses the first component that's not empty. But this breaks as soon as an alternative inside the block contains a literal, such as a comma (I think other implementations have the same issue).

Let's take an example: the address format for the Philippines:

PH:
    address_template: |
        {{{attention}}}
        {{{house}}}
        {{{house_number}}} {{{road}}}, {{#first}} {{{suburb}}}, || {{{city_district}}}, || {{{neighbourhood}}}, {{/first}} {{#first}} {{{city}}} || {{{town}}} || {{{village}}} || {{{hamlet}}} || {{{suburb}}} || {{{state_district}}} {{/first}}
        {{{postcode}}} {{#first}} {{{municipality}}} {{{region}}} {{{state}}} {{/first}}
        {{{country}}}

If we take the last of the Philippines test cases and swap suburb for city_district, I would expect the same outcome:

description: address 
components:
    house_number: 121 
    road: Epifanio Delos Santos Ave.
    city: Mandaluyong
    postcode: 1550
    city_district: Wack-wack Greenhills
    county: Fifth District
    country: Philippines
    country_code: PH
    attention: Mr. Juan Maliksi
    state: Metro Manila
expected:  |
    Mr. Juan Maliksi
    121 Epifanio Delos Santos Ave., Wack-wack Greenhills, Mandaluyong
    1550 Metro Manila
    Philippines

But the result is

Mr. Juan Maliksi
121 Epifanio Delos Santos Ave., Mandaluyong
1550 Metro Manila
Philippines

Now my question is: did I misunderstand the first block (I couldn't find a proper definition in the repository), or should there be a test case for cases like this? Or maybe both.
I could of course also open this issue in the JavaScript repository, but I thought a test case used by all implementations might be more helpful, so I opened the issue here first.

There might be similar issues with countries using the generic2, fallback1, CA_fr or SG template.
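The failure can be reproduced with a small Python sketch. It assumes, as the issue describes, that the block is rendered first and then split at ||. A naive check keeps the first chunk that is not blank, so a chunk consisting only of the literal comma wins. The variant below treats a chunk as empty unless it contains at least one word character (letter, digit or underscore). It is a hypothetical fix, not any implementation's actual code.

```python
import re


def first_naive(rendered_block: str) -> str:
    """Naive {{#first}}: return the first '||'-separated chunk that is not blank."""
    for chunk in rendered_block.split("||"):
        if chunk.strip():
            return chunk.strip()
    return ""


def first_ignoring_literals(rendered_block: str) -> str:
    """Variant: a chunk only counts if it contains a word character, not just punctuation."""
    for chunk in rendered_block.split("||"):
        if re.search(r"\w", chunk):
            return chunk.strip()
    return ""


# The PH block " {{{suburb}}}, || {{{city_district}}}, || {{{neighbourhood}}}, "
# rendered with suburb and neighbourhood empty and city_district set:
rendered = "  , || Wack-wack Greenhills, || , "
print(repr(first_naive(rendered)))
print(repr(first_ignoring_literals(rendered)))
```

With the variant, the address line would render as "121 Epifanio Delos Santos Ave., Wack-wack Greenhills, Mandaluyong", matching the expected output.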

Difference between `street_number` and `house_number`

I was curious why street_number isn't aliased to house_number.

My current guess, based on this relation from OpenStreetMap, is that street_number refers to the number assigned to a whole section of street (e.g. "This is the 500 block on 18th st.") whereas house_number refers to a specific house address (e.g. "This is 531 18th st.").

Given that the configured components are used as a negative list for the place name (as mentioned here #9 (comment)), it makes sense that street_number is neither included in the place name nor used as the house_number, so it's just swallowed.

However, I don't see street_number in any of the test cases so I don't know if this is the correct/expected behavior.

Anyway, just wanted to add that I really appreciate what you guys are doing. Thanks!

UK postcodes should be on a separate line

worldwide.yaml says that a UK postcode goes immediately after the post town, on the same line. It should be on its own on the following line, i.e.:

Incorrect:

12 Letsby Avenue
Sheffield S9 1XX

Correct:

12 Letsby Avenue
Sheffield
S9 1XX

Hong Kong address format issue (js library)

Hi, team
The version I am using is src="https://unpkg.com/@fragaria/address-formatter@…"
The config example is:
var formatted = window.addressFormatter.format({
"houseNumber": "",
"road": "211 Test Street",
"neighbourhood": "",
"city": "Hong Kong",
"postcode": "",
"county": "",
"state": "",
"country": "Hong Kong",
"countryCode": "HK",
}, {
output: 'array'
});
The output just shows "211 Test Street".
Thank you for your attention!

Generic Address Lines

Hello,
First, thanks for such a great project. My question is about using data from a generic address form where the user can enter:

  • Address line 1
  • Address line 2
  • City
  • State
  • Country
  • Postal Code

I am curious how I can use address line 1 and address line 2 data with the address templates. Would it be correct, to a certain extent, to map address line 1 to street/street_name and address line 2 to house/building?

Iraq - Erbil address test cases

Hello Ed and team,

We're seeing some issues with address formatting in Erbil in Iraq where I have colleagues with local knowledge.

  • There may be one issue relating to the fact that Erbil doesn't appear to have a city boundary. Erbil is a node in OSM, not a 'way' (boundary) https://www.openstreetmap.org/node/599019138 Do you think this might cause issues in formatting?

  • A specific case we are looking at is in the 'English Village' in Erbil. This location:

36.19137, 43.97445

is being formatted by OpenCage as

391, Erbil, 44001, Iraq

It would be expected to be

391 English Village, Erbil, 44001, Iraq

You can see that English Village appears in the neighbourhood field for this location.

Iraqi Kurdistan, 44003, Iraq
i.e. Not obviously in Ankawa or Erbil itself.

Let me know if this information is enough to improve the formatted address. I can hopefully provide more examples in the near future.

Iran format is wrong

In Farsi (Persian), addresses are written from large to small, much like the Korean format:

# South Korea - Korean
KR_ko:
    address_template: |
        {{{country}}}
        {{#first}} {{{state}}} {{/first}}
        {{#first}} {{{city}}} || {{{town}}} || {{{village}}} {{/first}}
        {{#first}} {{{suburb}}} || {{{city_district}}} || {{{neighbourhood}}} {{/first}}
        {{{road}}}
        {{{house_number}}}
        {{{house}}}
        {{{attention}}}
        {{{postcode}}}

The generic17 template used in the address templates is probably the right one for English addresses in Iran, but not for Farsi ones.

CA postformat_replace has side effects

The Canadian postformat_replace regex (\\w{2}) (\\w{3})(\\w{3})\n has side effects when applied to abbreviated street types:

Input

{
  "road": "Av Demars",
  "house_number": "12345",
  "postcode": "J2T 3T5",
  "city": "St-Hyancinth",
  "countryCode" : "CA"
}

Output

{
  "formattedAddress": [
     "12345 Av Dem ars",
     "St-Hyancinth, J2T 3T5"
  ]
}

Solution
Replace the postformat_replace rule with a stricter postal code regex: ([A-Za-z]{2}) ([A-Za-z]\\d[A-Za-z])(\\d[A-Za-z]\\d)
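The side effect and the proposed fix can be checked with Python's re module. The sample text is hypothetical: it has an unspaced postcode J2T3T5 after a province code. The replacement strings assume the rule re-inserts the captured groups separated by spaces, which is what the "Dem ars" output suggests.

```python
import re

# The rule as quoted in the issue; "\\w" in the YAML string is the regex \w.
ORIGINAL = r"(\w{2}) (\w{3})(\w{3})\n"
# The stricter alternative proposed in the issue: letter-digit-letter digit-letter-digit.
STRICTER = r"([A-Za-z]{2}) ([A-Za-z]\d[A-Za-z])(\d[A-Za-z]\d)"

# Hypothetical formatted output: a street line, then a province code and an unspaced postcode.
text = "12345 Av Demars\nSt-Hyancinth QC J2T3T5\n"

# Assumption: the replacement puts the captured groups back with spaces between them.
print(re.sub(ORIGINAL, r"\1 \2 \3\n", text))
print(re.sub(STRICTER, r"\1 \2 \3", text))
```

The original pattern matches any two word characters, a space and six word characters before a newline, so "Av Demars" qualifies and gets split. The stricter pattern only matches the letter-digit-letter digit-letter-digit shape of a Canadian postal code.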

Request: Ways to return geocode results at [city / town / community] level

I'm interested in using OpenCage to geocode the [city / town / community] that a user is in.

Referencing the Response Format example in the documentation, with a query like this:

https://api.opencagedata.com/geocode/v1/json?q=-22.6792%2C+14.5272&key=YOUR-API-KEY&pretty=1

I'd be interested in returning data like this:

        "city" : "Swakopmund",
        "country" : "Namibia",
        "country_code" : "na",
        "postcode" : "13001",
        "state" : "Erongo Region",

As well as the geojson geometry and coordinates for the boundaries of Swakopmund.

Nominatim offers this kind of data (its zoom parameter controls the level of detail of the result), and I think it would be quite useful in OpenCage as well.

Just wanted to share in case this is possible!

P.S. Recently came across OpenCage, love what you've built! Great job on the site, the story, the copy, the documentation, the UX, the climate commitments, everything. Looking forward to getting more involved!

Component key specified by `_type` may not be present

This is similar to #28. See it for context.

In #28 I reported that _type in the components section of a geocoding result may match an alias for a component, instead of the component itself. This has been fixed.

This issue is to report that _type may be present, but not refer to any component or alias for that component. See the components dictionary for (43.08226365175275, -87.8847851373871):

{
    "_type" : "building",
    "city" : "Milwaukee",
    "country" : "United States of America",
    "country_code" : "us",
    "county" : "Milwaukee County",
    "house_number" : "2028",
    "postcode" : "53211",
    "road" : "East Edgewood Avenue",
    "state" : "Wisconsin"
}

Notice that _type is set to building, but neither building nor any of its aliases (house or public_building) are present.

Please advise. :)

cc/ @freyfogle
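A client can guard against this by treating _type as a hint. It then checks whether that key, or any key in the same alias group, is actually present. The alias group below is hypothetical and only covers building, using the aliases named in the issue (house, public_building). component_for_type is an illustrative helper, not part of the OpenCage API.

```python
from typing import Optional

# Hypothetical alias groups; only "building" is filled in, using the aliases
# named in the issue above.
ALIAS_GROUPS = {"building": ["building", "house", "public_building"]}


def component_for_type(components: dict, groups: dict = ALIAS_GROUPS) -> Optional[str]:
    """Return the value that `_type` points to, or None if no matching key exists."""
    type_key = components.get("_type")
    if type_key is None:
        return None
    # Use the alias group containing the type key, or just the key itself.
    group = next((keys for keys in groups.values() if type_key in keys), [type_key])
    for key in group:
        if key in components:
            return components[key]
    return None


milwaukee = {
    "_type": "building", "city": "Milwaukee",
    "country": "United States of America", "country_code": "us",
    "county": "Milwaukee County", "house_number": "2028", "postcode": "53211",
    "road": "East Edgewood Avenue", "state": "Wisconsin",
}
print(component_for_type(milwaukee))
```

For the Milwaukee components above it returns None. A _type of house would resolve to a building value if one is present.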

`_type` does not match any component

From @mtmail in #9, I got this advice:

For the address formatting we treat anything that's not a component (anything not listed in https://github.com/lokku/address-formatting/blob/master/conf/components.yaml) to be a name and put it first in the formatted string.

So it's kind of a negative list.
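The negative-list approach quoted above can be sketched as a set difference: any key that is not a known component is treated as a name. KNOWN_COMPONENTS here is a hypothetical subset of conf/components.yaml. Skipping keys that start with an underscore (such as _type) is an added assumption. name_values is an illustrative helper.

```python
# Hypothetical subset of the component names and aliases listed in
# conf/components.yaml; a real processor would load the full file.
KNOWN_COMPONENTS = {
    "house_number", "road", "neighbourhood", "suburb", "city", "county",
    "state", "postcode", "country", "country_code",
}


def name_values(components: dict) -> list:
    """Values whose keys are not known components (the negative list)."""
    return [value for key, value in components.items()
            if key not in KNOWN_COMPONENTS and not key.startswith("_")]


walter = {
    "city": "Minneapolis", "country": "United States of America",
    "country_code": "us", "county": "Hennepin County", "house_number": "117",
    "library": "Walter Library", "neighbourhood": "Marcy-Holmes",
    "postcode": "55455", "road": "Southeast Pleasant Street",
    "state": "Minnesota", "suburb": "Phillips",
}
print(name_values(walter))
```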

You appear to have tried improving this for v1 of your API:

The formatted placename is created from the various terms in the components hash. This is the raw data we have to work with. We are often asked if there is a definitive list of all possible component keys. Unfortunately not. For convenience we add the key _type with the value set to what we believe the matched location to be. In the case where we can't determine a type we set the value unknown.

I tried to use your API in this manner, but got a response like this at (43.07930055693775, -87.88164978116998):

"components" : {
   "_type" : "house",
   "building" : "Sandburg Commons",
   "city" : "Milwaukee",
   "country" : "United States of America",
   "country_code" : "us",
   "county" : "Milwaukee County",
   "postcode" : "53211",
   "road" : "North Maryland Avenue",
   "state" : "Wisconsin",
   "suburb" : "Downer Woods"
},
"formatted" : "Sandburg Commons, North Maryland Avenue, Milwaukee, WI 53211, United States of America"

You'll notice the key component is declared as house in _type, but the key component, "Sandburg Commons", actually appears to be keyed by building.

I've found this eliminates some of the usefulness of _type.

Please advise. :)

cc/ @freyfogle

Postal address formatting

This is such a great resource! This link is really useful but very intimidating, and it seems like you've gone to great lengths to standardise things already.

I know the repo specifically says this is not for postal delivery but I wonder: how far is it from being useable for that purpose?

I've written a small Node.js script that uses this database for address formatting, but only realised afterwards that it wasn't fit for purpose. However, in most instances it seems extremely close.

How did you source the current formatting? How different is it from postal formatting? I'd be keen to fork/contribute in order to add the missing bits.

Add secondary address unit designators next to delivery address

I have a requirement to put an optional secondary address unit designator next to the delivery address.
I assume it needs to go in the 'house' variable, but that is placed on the line before the delivery address.
Right now I add it on the same line, after the delivery address, through post-processing.

Actual situation:
Unit 5
1535 Ellison Bridge Rd
Sardis, GA 30456

Requirement:
1535 Ellison Bridge Rd Unit 5
Sardis, GA 30456

Is there a way I can achieve this without postprocessing?

thanks in advance!

'first' support in template

Hi

I have started a JavaScript implementation of address-formatting. I use Handlebars, which extends Mustache, and unfortunately {{#first}} doesn't seem to be supported. Does it require helpers?

I will commit my implementation if that works.

David

Formatting nominatim address

Hello everybody,

It would be great if I could format my Nominatim addresses with address-formatting!

Is this possible? Is there a tutorial?

Help please

Dealing with postal addresses

You're stating in the description:

The address is displayed to a consumer (for example in an app) and not used to print on an envelope for actual postal delivery.

Could you elaborate on why this can't be used for printing? I'm looking for a solution that can handle this use case, and the data you've put up is very promising.

House Value for your Reverse Geocoder

I noticed that for your geocoder, the "house" value in these templates will sometimes be filled with other values like "library". Do you have a list of these substitution values?

Example (see "library" and "formatted"):

$ curl -i -X GET 'https://api.opencagedata.com/geocode/v1/json?pretty=1&no_annotations=1&key=<redacted>&q=+44.9753,-93.2359'
HTTP/1.1 200 OK
Date: Sat, 03 Jan 2015 02:52:24 GMT
Server: Apache/2.4.7 (Ubuntu)
X-ratelimit-reset: 1420329600
X-ratelimit-remaining: 2490
X-ratelimit-limit: 2500
Access-control-allow-origin: *
Strict-Transport-Security: max-age=31536000; includeSubDomains
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8

{
   "licenses" : [
      {
         "name" : "CC-BY-SA",
         "url" : "http://creativecommons.org/licenses/by-sa/3.0/"
      },
      {
         "name" : "ODbL",
         "url" : "http://opendatacommons.org/licenses/odbl/summary/"
      }
   ],
   "rate" : {
      "limit" : 2500,
      "remaining" : 2490,
      "reset" : 1420329600
   },
   "results" : [
      {
         "bounds" : {
            "northeast" : {
               "lat" : 44.9755634,
               "lng" : -93.2358856
            },
            "southwest" : {
               "lat" : 44.9749964,
               "lng" : -93.2367022
            }
         },
         "components" : {
            "city" : "Minneapolis",
            "country" : "United States of America",
            "country_code" : "us",
            "county" : "Hennepin County",
            "house_number" : "117",
            "library" : "Walter Library",
            "neighbourhood" : "Marcy-Holmes",
            "postcode" : "55455",
            "road" : "Southeast Pleasant Street",
            "state" : "Minnesota",
            "suburb" : "Phillips"
         },
         "confidence" : 10,
         "formatted" : "Walter Library, 117 Southeast Pleasant Street, Minneapolis MN 55455, United States of America",
         "geometry" : {
            "lat" : 44.9753648,
            "lng" : -93.2362952210992
         }
      }
   ],
   "status" : {
      "code" : 200,
      "message" : "OK"
   },
   "thanks" : "For using an OpenCage Data API",
   "timestamp" : {
      "created_http" : "Sat, 03 Jan 2015 02:52:24 GMT",
      "created_unix" : 1420253544
   },
   "total_results" : 1,
   "we_are_hiring" : "http://lokku.com/#jobs"
}

Update Incorrect Data

Hi,

I recently started using the OpenCage API (free account and free key, 2500-request limit), with YetiForce v4.30 as the frontend.

I have only tested a few searches but noticed that some of the results are incorrect, for example the postcode. How can I update or contribute the correct data?

Thank you.

See: YetiForceCompany/YetiForceCRM#7289
