Giter Site home page Giter Site logo

wfd_import's Introduction

WFD_import

WFD import is a collection of scripts and tools for batch importing of European water data, reported under the Water Framework Directive, to Wikidata.

It currently supports harvesting data on River Basin Districts (RBDs) and Surface Water Bodies (SWBs).

How to run

To do a basic run call either RBD.py or swb_import.py from the commandline together with the -in_file:<path> argument. The -in_file value should be the url to an RBDSUCA .xml file for an RBD import or an SWB file (there is one per RBD) for an SWB import.

The xml files only include the names of the RBD/SWB in English, if any. To load the names in a local language use the -gml_file:<path> argument together with the path to a .gml file belonging to the RBD or SWB dataset.

You can also use WfdBot.load_data() to create a json dump of either of the .xml or .gml files. The local json dumps can then be used together with the -in_file or -gml_file argument.

To create new items, rather than just enriching pre-existing ones, use the -new flag. A new item is one where the unique RBD/SWB code is not yet associated with an item on Wikidata.

To ensure no data is written to Wikidata you may use the -simulate flag which will raise an error if the scripts attempt to write to Wikidata.

The -preview_file:<path> also prevents data from being written to Wikidata, instead outputting a preview of the data, in wikitext, to the file at the specified path. See this page for an example.

Preview can be combined with the -cutoff:<num> argument which limits the number of items being processed.

The scripts also support any common Pywikibot flags. Use -help to get a list.

Where to find the data files

Sweden is used as an example in all links below:

The data can be found on the Central Data Repository (CDR) under European Union (EU) obligations/Water Framework Directive: River Basin Management Plans - 2016 Reporting.

  • The RBDSUCA file (for Competent Authorities and RBDs) is found under National RBDSUCA.
  • The SWB files can be found in their respective RBD directories under River Basin Districts.
  • Both .gml files (one for RBDs and one for SWBs) are located under National spatial data.

It is worth noting that there may be multiple releases of data, found in separate directories at CDR, so you should always choose the most recent one. For the .gml files it varies from country to country whether either of them are made accessible to the public.

How to add a new country

Before investing energy into adding support for a new country you should ensure that the license of the data is compatible with Wikidata (CC0). The default license on CDR is CC BY which is not free enough. You therefore need a source indicating that the member country released their data under a more permissive CC0 license.

  • Create an item for the Report, and add to the "dataset" entry in mappings.json. See the 2016 Sweden report for an example item.
  • Add the country to the "countryCode" entry in mappings.json.
  • Add a mapping of the three-letter language code to the two-letter code used on Wikidata to the "languageCode" entry in mappings.json.
  • Create items for each of the Competent Authorities listed in the RBDSUCA file, then and add these to the "CompetentAuthority" entry in mappings.json. Instructions for the claims to use on a Competent Authorities item can be found on this page.
  • If you wish to add support for a new language add it to self.langs in WfdBot.__init__ and ensure all the required entries in mappings.json have been translated.

During the run the above additions are validated to ensure all the required mappings are present. Feel free to add any new mappings as a pull request to this repository.

To import the data first run the RBD importer (with -new if needed) before running the SWB importer.

Known issues and limitations

The results are of course limited by the data quality. If a country has not followed the guidelines on e.g. language labeling then imported data will also be wrong.

The full logic of Significant Impact has not been implemented. Specifically it does not adapt its output based on pre-existing values for other years. See see the porperty proposal for the full logic.

RBD does not make use of internationalRBDName (nor does either SWB or RBD make use of nameTextInternational). Although this is supposed to be an international (English) label the field was found to hold a variety of content.

Issues

Issues and bugs are tracked on Phabricator.

Installation

If pip -r requirements.txt does not work correctly you might have to add the --process-dependency-links flag to ensure you get the right version of Pywikibot and lokal-profil/wikidata-stuff.

Note

This repository was split off from lokal-profil/wikidata_batches so the history might be a bit mixed up.

wfd_import's People

Contributors

lokal-profil avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.