world-cities's Introduction

List of major cities in the world

Data

The data is extracted from GeoNames, an extensive database of worldwide toponyms.

This datapackage only lists cities above 15,000 inhabitants. Each city is associated with its country and subcountry to reduce the number of ambiguities. Subcountry can be the name of a state (e.g. in the United Kingdom or the United States of America) or the major administrative division (e.g. "region" in France). See the admin1 field on the GeoNames website for further information about subcountry.

Note that:

  • Some cities, like Vatican City or Singapore, are a whole state, so they don't belong to any subcountry. Their subcountry is therefore N/A.
  • There is no guarantee that a city has a unique name within a country and subcountry (at the time of writing, there are about 60 ambiguities). For each city, however, the primary key of the source data, geonameid, is provided.
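The uniqueness caveat above can be checked with a short script. The following is a minimal sketch, not the repository's own code: it assumes the CSV has columns named name, country, subcountry and geonameid (only the last three are named in this text; name is an assumption), and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration; they are NOT taken from the dataset.
# The column names are an assumption about the published CSV layout.
SAMPLE = """name,country,subcountry,geonameid
Springfield,Exampleland,North,1
Springfield,Exampleland,North,2
Springfield,Exampleland,South,3
Singapore,Singapore,,4
"""


def find_ambiguous(rows):
    """Group geonameids by (name, country, subcountry); keep groups with several ids."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["name"], row["country"], row["subcountry"])
        groups[key].append(row["geonameid"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ambiguous = find_ambiguous(rows)
for key, ids in ambiguous.items():
    print(key, ids)
```

When a (name, country, subcountry) triple maps to several ids, geonameid is the only field that tells the cities apart, so it is the safer join key.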

Preparation

Python 3.6 .github/workflows/actions.yml

This repository uses dataflows to process and normalize the data.

You first need to install the dependencies:

pip install -r scripts/requirements.txt

Then run the script:

python scripts/process.py

License

All data is licensed under the Creative Commons Attribution License, as is the original data from GeoNames. This means you have to credit GeoNames when using the data. While no further credit is formally required, a link back or credit to Lexman and the Open Knowledge Foundation is much appreciated.

All source code is licensed under the MIT license.

world-cities's People

Contributors

anuveyatsu, fthomas, lexman, mikanebu, ppkrauss, rgieseke, tfmorris, zelima

world-cities's Issues

Native name

I'm reporting here a discussion we had when creating the dataset (datasets/awesome-data#47)

  • The native name for each city would be great
  • But is there only one native name? Think of Antwerpen / Anvers in Belgium or Jaffna / யாழ்ப்பாணம் / යාපනය in Sri Lanka: they have several native names.
  • The original dataset has an "isPrefered" field which is not reliable (see Beijing or Sukhothai)
  • One way to support multiple native names would be to have several columns "native_name1", "native_language1", "native_name2", "native_language2", etc., and to fill the columns for each official language of the city's country. It's not very nice in terms of data design, but it does the job
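The column layout proposed in the last point could look like the sketch below. It is only an illustration of the idea, not an implemented feature: the helper widen_native_names, its max_names parameter and the language codes are hypothetical, and the only names taken from the discussion are the native_nameN / native_languageN columns.

```python
def widen_native_names(city, names, max_names=2):
    """Return a copy of `city` with native_nameN / native_languageN columns.

    `names` is a list of (language_code, native_name) pairs. Slots beyond the
    available names are filled with empty strings so every row has the same
    columns. This helper is hypothetical, not part of the repository.
    """
    row = dict(city)
    for i in range(1, max_names + 1):
        if i <= len(names):
            language, native = names[i - 1]
        else:
            language, native = "", ""
        row[f"native_name{i}"] = native
        row[f"native_language{i}"] = language
    return row


# The language codes "nl" and "fr" are illustrative assumptions.
antwerp = widen_native_names(
    {"name": "Antwerp", "country": "Belgium"},
    [("nl", "Antwerpen"), ("fr", "Anvers")],
)
# A city with a single native name leaves the second slot empty.
single = widen_native_names({"name": "Example"}, [("xx", "Exemple")])
print(antwerp)
print(single)
```

The weakness noted in the discussion shows up directly: max_names must be fixed in advance, so a city with more native names than columns either loses data or forces a schema change. A separate table keyed on geonameid with one row per (city, language) would avoid that, at the cost of a second file.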

If anyone has a better idea, or wants me to implement this one, please share it here...

Improvements / clarifications on contents and fields (subcountry, geopoint etc)

Comments from openspending/cosmopolitan#25 (comment)

  • It should at least include the standard name and the English variant
  • I assume that subcountry equals what is "region" in geonames, but "subcountry" is confusing terminology (for me)
  • We also take location (geopoint) and population from geonames, but they are not present here. I grant that there are likely better data sources, particularly for population.

@lexman any thoughts? I know we already have #3 re native name. What about the other two points?

@pwalsh @lexman re city population note that we have https://github.com/datasets/population-city
