
Python package to retrieve data from Databank Ondergrond Vlaanderen (DOV)

Home Page: https://pydov.readthedocs.io/en/latest/

License: MIT License

Python 78.74% Jupyter Notebook 21.26%
python package data-access water lifewatch oscibio

pydov's Introduction

pydov

Badges: CI · Documentation Status · Project Status: Active (the project has reached a stable, usable state and is being actively developed) · DOI · pyOpenSci

pydov is a Python package to query and download data from Databank Ondergrond Vlaanderen (DOV). It is hosted on GitHub and development is coordinated by Databank Ondergrond Vlaanderen (DOV). DOV aggregates data about soil, subsoil and groundwater of Flanders and makes them publicly available. Interactive and human-readable extraction and querying of the data is provided by a web application, whereas the focus of this package is to support machine-based extraction and conversion of the data.

To get started, see the documentation at https://pydov.readthedocs.io.

Please note that downloading DOV data with pydov is governed by the same disclaimer that applies to the other DOV services. Be sure to consult it when using DOV data with pydov.

Installation

You can install pydov stable using pip:

pip install pydov

Or clone the git repository and install with python setup.py install to get the latest snapshot from the master branch.

To contribute to the code, make sure to install the package and all of the development dependencies listed in the requirements_dev.txt file. First, clone the git repository. We advise using a Python development environment, for example conda or virtualenv. Activate the (conda/virtualenv) environment and install the package in development mode:

pip install -e .[devs]

Need more detailed instructions? Check out the installation instructions and the development guidelines.

Quick start

Read the quick start from the docs or jump straight in:

from pydov.search.boring import BoringSearch
from pydov.util.location import Within, Box

from owslib.fes2 import PropertyIsGreaterThan

boringsearch = BoringSearch()

dataframe = boringsearch.search(
    query=PropertyIsGreaterThan(propertyname='diepte_tot_m', literal='550'),
    location=Within(Box(107500, 202000, 108500, 203000))
)

The resulting dataframe contains information on the boreholes (boringen) within the provided bounding box (as defined by the location argument) with a depth greater than 550 m:

>>> dataframe
                                         pkey_boring     boornummer         x         y  mv_mtaw  start_boring_mtaw gemeente  diepte_boring_van  diepte_boring_tot datum_aanvang uitvoerder  boorgatmeting  diepte_methode_van  diepte_methode_tot boormethode
0  https://www.dov.vlaanderen.be/data/boring/1989...  kb14d40e-B777  108015.0  202860.0      5.0                5.0     Gent                0.0              660.0    1989-01-25   onbekend          False                 0.0               660.0    onbekend
1  https://www.dov.vlaanderen.be/data/boring/1972...  kb14d40e-B778  108090.0  202835.0      5.0                5.0     Gent                0.0              600.0    1972-05-17   onbekend          False                 0.0               600.0    onbekend

Documentation

Full documentation of pydov can be found on our ReadTheDocs page.

Contributing

You do not need to be a code expert to contribute: there are several ways to contribute to this project. Have a look at the contributing page.

Meta

  • We welcome contributions including bug reports.
  • License: MIT
  • Citation information can be found on Zenodo.
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
  • Also note that downloading DOV data with pydov is governed by the same disclaimer that applies to the other DOV services. Be sure to consult it when using DOV data with pydov.

pydov's People

Contributors

guillaumevandekerckhove, johanvdw, jorissynaeve, jorisvandenbossche, kpaenen, marleenvd, meisty, peterdesmet, pjhaest, rebot, roel, stijnvanhoey, sweco-begilt


pydov's Issues

Provide documentation on data column origin

Currently, the data documentation lists the DOV schema origin for each variable. The source should be updated to the actual source used (WFS or XML), to clarify the data-transfer effort of a given query.

add ci to pydov

  • both Travis (pip) and appveyor (conda), also testing it works within the osgeo4w context
  • including pypi automatic releases
  • deploying docs on GitHub Pages / Read the Docs when successful
  • tests for both Python 2.(7) and 3.(5)
  • link with coveralls

No documentation in xsd for grondwaterlichaam, regime and grondwatersysteem

In FilterDataTypes.xsd no documentation is provided for the following items:

<xs:element name="grondwaterlichaam" type="GrondwaterlichaamEnumType" minOccurs="0">
  <xs:annotation><xs:documentation/></xs:annotation>
</xs:element>
<xs:element name="grondwatersysteem" type="GrondwatersysteemEnumType" minOccurs="0">
  <xs:annotation><xs:documentation/></xs:annotation>
</xs:element>
<xs:element name="regime" type="interpretatie:RegimeEnumType">
  <xs:annotation><xs:documentation>regime</xs:documentation></xs:annotation>
</xs:element>

Cannot use fields from a subtype as return fields.

  • PyDOV version: master
  • Python version: 3.6
  • Operating System: Windows 10

Description

Specifying a field from a subtype as return field gives an error if the resulting dataframe is non-empty.

What I Did

import pydov.search.boring
from owslib.fes import PropertyIsEqualTo

# 'query' here is any valid attribute filter, e.g. a PropertyIsEqualTo instance
bs = pydov.search.boring.BoringSearch()

bs.search(query=query, return_fields=('pkey_boring',))
                                         pkey_boring
0  https://www.dov.vlaanderen.be/data/boring/2004...

bs.search(query=query, return_fields=('pkey_boring', 'boormethode'))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Projecten\PyDov\pydov_git\pydov\search\boring.py", line 114, in search
    columns=Boring.get_field_names(return_fields))
  File "C:\Users\rhbav33\python_virtualenvs\3.6_dev\lib\site-packages\pandas\core\frame.py", line 364, in __init__
    data = list(data)
  File "C:\Projecten\PyDov\pydov_git\pydov\types\abstract.py", line 467, in to_df_array
    result = item.get_df_array(return_fields)
  File "C:\Projecten\PyDov\pydov_git\pydov\types\abstract.py", line 524, in get_df_array
    return_fields=return_fields)
  File "C:\Projecten\PyDov\pydov_git\pydov\types\abstract.py", line 386, in get_field_names
    raise InvalidFieldError("Unknown return field: '%s'" % rf)
pydov.util.errors.InvalidFieldError: Unknown return field: 'boormethode'

AppVeyor build broken for Python 2.7

Recent AppVeyor builds for Python 2.7 fail with:

pip install --no-cache-dir --ignore-installed -r requirements_dev.txt
ERROR: To modify pip, please run the following command:
C:\Miniconda\python.exe -m pip install --no-cache-dir --ignore-installed -r requirements_dev.txt

Multiple coveralls reports for single PR

This is an overflow of information. Having the information at the PR level is useful to evaluate the test-writing effort when someone adds new features, but having 4 reports for each change is overkill.

The issue is reported on GitHub, and the coveralls documentation provides a potential solution, but the current attempt was without success. Other people experience similar issues. Moreover, the response rate on https://github.com/lemurheavy/coveralls-public/issues seems rather low...

I would actually attempt a switch to codecov, as it provides this feature out of the box.

https://docs.codecov.io/v4.3.6/docs/merging-reports

Find a solution to reuse monkeypatches across types

Currently we have to copy/paste all the monkeypatch functions between the tests of different types and change the location of the data, e.g.:

def mp_remote_wfs_feature(monkeypatch):
    """Monkeypatch the call to get WFS features.

    Parameters
    ----------
    monkeypatch : pytest.fixture
        PyTest monkeypatch fixture.

    """
    def __get_remote_wfs_feature(*args, **kwargs):
        with open('tests/data/types/boring/wfsgetfeature.xml',
                  'r') as f:
            data = f.read()
            if type(data) is not bytes:
                data = data.encode('utf-8')
        return data

    monkeypatch.setattr(
        'pydov.util.owsutil.wfs_get_feature',
        __get_remote_wfs_feature)

We should think of a way to only specify the base path once and be able to reuse the monkeypatches between different types.
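One way to avoid the duplication is a small factory that builds the monkeypatch function from the data path, so each type's test module only states its own path once. A minimal sketch (the `build_wfs_feature_patch` helper and its usage are illustrative, not existing pydov code):

```python
def build_wfs_feature_patch(datafile):
    """Build a monkeypatch function that serves `datafile` as the
    remote WFS GetFeature response.

    datafile : str
        Path to the per-type test data, e.g.
        'tests/data/types/boring/wfsgetfeature.xml'.
    """
    def _patch(monkeypatch):
        def __get_remote_wfs_feature(*args, **kwargs):
            # Read the canned response and make sure it is bytes,
            # mirroring what the real remote call returns.
            with open(datafile, 'r') as f:
                data = f.read()
            if not isinstance(data, bytes):
                data = data.encode('utf-8')
            return data

        monkeypatch.setattr(
            'pydov.util.owsutil.wfs_get_feature',
            __get_remote_wfs_feature)
    return _patch
```

Each type's test module could then do e.g. `mp_remote_wfs_feature = pytest.fixture(build_wfs_feature_patch('tests/data/types/boring/wfsgetfeature.xml'))`, keeping the data location in a single place per type.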

XML string parsing error

XML can contain non-ASCII characters that fail upon encoding (see below).
For example: u'Societé Belge des Bétons'
Check the boring type and search for
{'start_boring_mtaw': 61.0, 'boorgatmeting': '{UNRESOLVED}', 'uitvoerder': '{UNRESOLVED}', 'boornummer': 'kb34d93e-B183', 'pkey_boring': 'https://www.dov.vlaanderen.be/data/boring/1928-031159', 'mv_mtaw': '{UNRESOLVED}', 'diepte_boring_van': '{UNRESOLVED}', 'y': 177156.1, 'x': 234685.7, 'datum_aanvang': datetime.date(1928, 1, 2), 'diepte_boring_tot': 15.0, 'gemeente': 'Bilzen'}

File "D:\_wd\pydov\types\abstract.py", line 59, in typeconvert
    return str(x).strip()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-7: ordinal not in range(128)
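A sketch of a version-agnostic fix for this `typeconvert` case: on Python 2, `str()` implicitly encodes unicode values to ASCII, which is what raises the error; using the interpreter's text type avoids the implicit encoding. Names are illustrative, not the actual pydov implementation:

```python
# -*- coding: utf-8 -*-
import sys

def typeconvert_string(x):
    """Convert a value to a stripped text string without assuming ASCII.

    On Python 2, str(u'Societé Belge des Bétons') would try to encode
    the value as ASCII and raise UnicodeEncodeError; unicode() does not.
    """
    if sys.version_info[0] < 3:
        text_type = unicode  # noqa: F821 -- Python 2 only
    else:
        text_type = str
    return text_type(x).strip()
```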

XML parsing from downloaded file

The current code for interpretations applies to XML parsing from the web service.
Later on, we should verify which adjustments are necessary to parse downloaded XML files in which 'boringen' and 'interpretations' can both be present (currently, the type does not define the full root path in the XML of mixed data).

Dutch or English object names?

For groundwater we have now used an English term for our object: DovGroundwater.

I'm not sure we should translate things - it often leads to errors (e.g. filter instead of screen). Most fields in the XML are also in Dutch (meetnet, ...), so I think it is better to stick to Dutch.

My proposal: if we are using our own standards we stick to Dutch. If we can use international standards (waterml, inspire, ...) we can use those versions.

Missing values in XML export

It looks like some values are missing in XML export, but are present in the UI.

Listing those here so we can pass them (grouped) to the DOV development team.

grondwater observaties: detectie

Providing a guide of contribution

As mentioned by @pjhaest, we should write some guidance on how new users can contribute, providing good practices, choices, ... Things to mention:

  • scope of this package (what should be in and what not)
  • how to contribute (fork, pull request,...) with guidance
  • advice on how to use docstrings -> numpy doc style?
  • advice and routines on how to render the documentation as a readthedocs (which can be on github pages)
  • ...

Search query attributes

@Roel you mentioned earlier that the search query can make use of the attributes defined in the wfs schema. Ok so far. With the 'DOV-verkenner' there are more options available to search for. Are these also available for pydov? If so, could you post an attribute table to the docs?

Example XML file in repo

As the URL-based XML service is not yet available at the moment, it would be useful to have an example file of the XML format in the repo. That way, we can already provide and test the conversion functionality.

Caching of certain XML files is broken

  • PyDOV version: caching
  • Python version: 3.6
  • Operating System: Windows 10

Description

Caching of certain XML files (I assume ones with funny characters) is broken. This leads to errors when trying to reuse cached data.

The problem is twofold: for some reason certain XML files cannot be saved in the cache, and instead an empty file is created, causing trouble when trying to reuse this 'cached' data.

We should fix:

  • the root cause why certain XML's can't be saved
  • the fact that, when saving fails, an empty file is created
  • empty files shouldn't be considered valid cache
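The second and third points could be addressed by writing cache files atomically and treating empty files as cache misses. A minimal sketch (function names are illustrative, not pydov's actual cache API):

```python
import os
import tempfile

def save_to_cache(path, data):
    """Write `data` (bytes) to `path` atomically.

    Writing to a temporary file first and renaming it afterwards ensures
    that a failed save never leaves an empty file behind in the cache.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on the same filesystem
    except Exception:
        os.unlink(tmp_path)
        raise

def is_valid_cache(path):
    """Treat missing and empty files alike as cache misses."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```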

What I Did

import pydov.search.boring
from owslib.fes import PropertyIsEqualTo

query = PropertyIsEqualTo('boornummer', 'B/9-000014')
bs = pydov.search.boring.BoringSearch()

bs.search(query=query)
                                         pkey_boring  boornummer         x  \
0  https://www.dov.vlaanderen.be/data/boring/1995...  B/9-000014  197535.0   
          y  mv_mtaw  start_boring_mtaw gemeente  diepte_boring_van  \
0  187210.0    22.61              22.61    Diest                0.0   
   diepte_boring_tot datum_aanvang      uitvoerder  boorgatmeting  \
0              350.0    1995-01-01  Peeters-Ramsel          False   

bs.search(query=query)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Projecten\PyDov\pydov_git\pydov\search\boring.py", line 114, in search
    columns=Boring.get_field_names(return_fields))
  File "C:\Users\rhbav33\python_virtualenvs\3.6_dev\lib\site-packages\pandas\core\frame.py", line 364, in __init__
    data = list(data)
  File "C:\Projecten\PyDov\pydov_git\pydov\types\abstract.py", line 468, in to_df_array
    result = item.get_df_array(return_fields)
  File "C:\Projecten\PyDov\pydov_git\pydov\types\abstract.py", line 532, in get_df_array
    self._parse_xml_data()
  File "C:\Projecten\PyDov\pydov_git\pydov\types\boring.py", line 191, in _parse_xml_data
    tree = etree.fromstring(xml)
  File "src\lxml\etree.pyx", line 3230, in lxml.etree.fromstring (src\lxml\etree.c:81056)
  File "src\lxml\parser.pxi", line 1871, in lxml.etree._parseMemoryDocument (src\lxml\etree.c:121236)
  File "src\lxml\parser.pxi", line 1759, in lxml.etree._parseDoc (src\lxml\etree.c:119912)
  File "src\lxml\parser.pxi", line 1125, in lxml.etree._BaseParser._parseDoc (src\lxml\etree.c:114159)
  File "src\lxml\parser.pxi", line 598, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\etree.c:107724)
  File "src\lxml\parser.pxi", line 709, in lxml.etree._handleParseResult (src\lxml\etree.c:109433)
  File "src\lxml\parser.pxi", line 638, in lxml.etree._raiseParseError (src\lxml\etree.c:108287)
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

example notebooks - using latest pydov version

Just a small comment:
In the notebooks you do

# Insert current tree in the sys path to be able to import local copy of 'pydov'
sys.path.insert(0, '../../')

You can also use editable (develop) mode to install:

pip install -e .

That way you are sure you always use the current version, without having to edit your path.

xsd scheme documentation description mistake

When checking the XSD schemas of DOV, I bumped into a duplicate description. I think betrouwbaarheid has a wrong description:

file xsd/kern/gwmeetnet/FilterDataTypes.xsd:

...
<xs:element name="zoet" type="generiek:JNOEnumType" minOccurs="0" default="O">
  <xs:annotation>
    <xs:documentation>omgerekend naar zoet water (ja/nee/onbekend)</xs:documentation>
  </xs:annotation>
</xs:element>
<xs:element name="betrouwbaarheid" type="generiek:BetrouwbaarheidEnumType" minOccurs="0" default="onbekend">
  <xs:annotation>
    <xs:documentation>omgerekend naar zoet water (ja/nee/onbekend)</xs:documentation>
  </xs:annotation>
</xs:element>
...

XML boringen

It was indicated that the XML for boringen etc. is already in production, so a few small questions:

  • There is a problem retrieving the XML in Python (while I do get the XML as a download in my browser). Johan already reported this problem #5 . Or do additional things still need to be defined in Python for this? See below for the result of a query in Python.
  • If I understand correctly, the first step remains the selection of the boringen from the web service for a bounding box, via a POST request with the result in JSON?
    Should the request then be made to the URL below, or to the WFS? And what are the required parameters for the bounding box, since these are listed under Request Payload (?) ?
    ...www.dov.vlaanderen.be/zoeken-ocdov/proxy-boring/boring/search?maxresults=100

XML in Python:

url  = 'https://www.dov.vlaanderen.be/data/boring/1981-010840.xml'
r = requests.get(url)
r.text
u'<!DOCTYPE HTML>\r\n<html>\r\n<head>\r\n    <link rel="icon" sizes="192x192"\r\n          href="//dij151upo6vad.cloudfront.net/latest/icons/app-icon/icon-highres-precomposed.png">\r\n    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>\r\n    <title>DOV Portaal</title>\r\n    <script type="text/javascript" language="javascript">\r\n        appVersion = "v1.7.0";\r\n    </script>\r\n    <!-- Name defined in the module xml. -->\r\n    <!-- cfr https://groups.google.com/group/google-web-toolkit/browse_thread/thread/71b17949f9a7c333https://groups.google.com/group/google-web-toolkit/browse_thread/thread/71b17949f9a7c333  -->\r\n    <!-- before your module(*.nocache.js) loading  -->\r\n    <!--[if lt IE 9]>\r\n    <script src="https://html5shim.googlecode.com/svn/trunk/html5.js"></script>\r\n    <![endif]-->\r\n    <!--[if IE 7]>\r\n    <link rel="stylesheet" href="edovboringen/css/font-awesome-ie7.css">\r\n    <![endif]-->\r\n    <!-- your module(*.nocache.js) loading  -->\r\n    <script type="text/javascript" language="javascript" src="portaalclient/portaalclient.nocache.js"></script>\r\n</head>\r\n\r\n<body>\r\n<!-- OPTIONAL: include this if you want history support -->\r\n<iframe src="javascript:\'\'" id="__gwt_historyFrame" tabIndex="-1"\r\n        style="position: absolute; width: 0; height: 0; border: 0"></iframe>\r\n</body>\r\n</html>\r\n'

Compatibility with the pastas Python package

Pastas is an open-source framework for the analysis of hydrological time series, http://pastas.readthedocs.io/en/latest/.

Not a priority, but interesting to log here that it could be worthwhile to provide compatibility in the future. By providing a mapping to the pastas data model, the modelling tools developed in pastas would become applicable. As the modelling part is out of scope for this package, both packages are complementary.

(getting this from earlier correspondence from Pieter Jan Haest)

Use cases

Do we allow for use cases in this repo?
If so, where to put them: in the docs or some other folder?

Update name and description

I would call this package:

pydov

It's shorter than dov-pydownloader, sounds more official (only one pydov package) and easily applicable to R: rdov.

I would also update the description to:

A python package to retrieve data from Databank Ondergrond Vlaanderen (DOV)

Rather than "A python package to extract data from the DOV web application", as you don't really extract data from the application, but from the webservices. Also retrieve implies more of a request/response than extract.

Add coordinates to interpretations

During the August code sprint it became apparent that users do not always need 'boring' data alongside interpretations. If only interpretations are searched, the users need access to the coordinates as well.
This can be worked around by adding the WFS fields to the 'return_fields' argument, but the column name then differs. Therefore, it is best to add these WFS data to the dataframe for all possible queries.

branch poc_boring

  • PyDOV version: branch poc_boring
  • Python version: 2.7.5 OSGeo4W distribution for QGIS 2.14 ltr
  • Operating System: Windows

Description

Testing branch poc_boring yielded some initial dependency issues with the OSGeo4W Python distribution:

  • The owslib module is not up to date with the one tested by Roel: schema.py (owslib/feature/) and other methods are missing.

  • Therefore, I downloaded the current master from GitHub and replaced this in OSGeo4W\apps\qgis-ltr\python\owslib

Then some warning messages pop up, which I don't know if we should care about too much at the moment?

  • following openURL on line 54 of owslib\feature\common.py:
C:\OSGEO4~1\apps\Python27\Lib\site-packages\urllib3\util\ssl_.py:339: SNIMissingWarning: 
An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS 
is not available on this platform. This may cause the server to present an incorrect TLS certificate, 
which can cause validation failures. 
You can upgrade to a newer version of Python to solve this. For more information, see 
https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  SNIMissingWarning

C:\OSGEO4~1\apps\Python27\Lib\site-packages\urllib3\util\ssl_.py:137: InsecurePlatformWarning: 
A true SSLContext object is not available. 
This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. 
You can upgrade to a newer version of Python to solve this. 
For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning
  • or after calling get_fields() in the boring_search.py examples
return self.__wfs.contents[self._layer] on line 104 of pydov/search.py returns:
C:\OSGEO4~1\apps\qgis-ltr\python\owslib\iso.py:116: FutureWarning: the .identification 
and .serviceidentification properties will merge into .identification being a list of properties.  
This is currently implemented in .identificationinfo.  Please see
https://github.com/geopython/OWSLib/issues/38 for more information
  FutureWarning)
  • the pkey_boring link is not printed correctly, but it is ok in the dataframe, no worries

Overall +1! More testing with new queries will follow.

AppVeyor build failing

The AppVeyor build is currently failing because it only installs requirements_dev.txt, which no longer includes -r requirements.txt.

Shall I install the requirements.txt in AppVeyor manually too or do we add the (contents of) requirements.txt to requirements_dev as before?

move `_parse_xml_data` to generic types abstract

_parse_xml_data is currently duplicated among different classes (GrondwaterFilter, Boring). We could move this to the abstract class. If alterations are required for certain classes, overriding the method is still possible.
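A sketch of the idea, with the field mapping as the only per-type difference (the `xml_fields` convention is illustrative, not pydov's actual internals):

```python
import xml.etree.ElementTree as etree

class AbstractDovType(object):
    """Sketch: the generic XML parsing lives on the abstract class.

    The `xml_fields` mapping (field name -> XPath) is illustrative and
    not pydov's actual field definition format.
    """
    xml_fields = {}

    def __init__(self):
        self.data = {}

    def _parse_xml_data(self, xml):
        # Generic implementation shared by all types; a subclass only
        # overrides this method when its XML layout deviates.
        tree = etree.fromstring(xml)
        for name, xpath in self.xml_fields.items():
            node = tree.find(xpath)
            self.data[name] = None if node is None else node.text

class Boring(AbstractDovType):
    # Only the field mapping differs between types.
    xml_fields = {'boormethode': './/boormethode'}
```

With this, GrondwaterFilter would only declare its own `xml_fields` and drop its copy of `_parse_xml_data`.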

Coveralls build unreliable

The last Coveralls build of master dates from a few weeks ago; did we break something?

Other branches do have more recent builds, but not reliably for all commits/PRs.

First example/idea of the aimed functionality, setup

As a first sketch of the functionality, the user would do something like this (naming should be improved to better fit the naming in the groundwater domain):

import dov_downloader as dov
dov.download(list_of_wells).subset_period('2000', '2007').to_csv('name_file.csv')

(in words: download my list of wells, filter that specific period and write everything into a csv-file)

Basically, there are 3 parts in this setup:

  1. download, i.e. the extraction part: downloading data based on a list of stations; this part could be extended towards more powerful download_**** functions, e.g. download_from_boundingbox(), download_from_aquifer(), ... These extensions of the regular download will always require some additional service calls, but will end up with a list of stations and use the download function.
  2. subset_*, i.e. the filter part: this should provide some straightforward functions to filter the downloaded data set. When using pandas DataFrames as the basic data type to store the data (see further), a lot of options will be available.
  3. to_***, i.e. the conversion part: the data is stored or exported to a new file format that could be useful for the user. to_csv/to_excel are examples that are already available, but the advantage of this package would be in more domain-specific export functionalities, e.g. to_modflow(), to_menyanthes(), to_swap().

As we're dealing with time series, using pandas DataFrames as the underlying datatype provides a lot of built-in options. When needed, we can make a new class inheriting from pd.DataFrame to handle some additional metadata. Multiple stations can be handled with a MultiIndex as column headers. With the row labels as a DatetimeIndex, all the data handling options from pandas, like resampling (daily/monthly/... mean values) and slicing, are available.

Since we will have the XML format as such (always a complete time series) as the stable data source, I would propose an xml_to_df conversion function that converts the XML to a pandas DataFrame as a basic function, in direct relation to the other basic functionality, download. These two functions (xml_to_df and download) could be the first milestone to implement. Then, more advanced download and export functions can be created on top of this.
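A minimal sketch of such an xml_to_df function, assuming a hypothetical record layout (the real DOV XML schema will differ):

```python
import xml.etree.ElementTree as etree

import pandas as pd

def xml_to_df(xml, recordpath, columns):
    """Convert a DOV-style XML document into a pandas DataFrame.

    recordpath selects the repeating element (one row per match) and
    `columns` maps column names to child tag names. The tag names used
    here are illustrative only.
    """
    tree = etree.fromstring(xml)
    rows = []
    for record in tree.findall(recordpath):
        rows.append({name: record.findtext(tag)
                     for name, tag in columns.items()})
    return pd.DataFrame(rows, columns=list(columns))
```

From there, setting the date column as a DatetimeIndex gives the resampling and slicing options mentioned above.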

caching feature

Using the package, XML files need to be downloaded by the user. As we should not expect users to be fully aware of when XML downloads are needed (versus pure WFS requests), we can avoid repeated downloads by providing caching functionality:

  • XML files stored as files in a cache folder
  • before requesting an XML file, check the cache folder for an existing local XML file
  • check its age; if older than X weeks, redownload the file

(this is a package-wide functionality, used by the different modules, basically a wrapper around pydov.types.abstract.AbstractDovType#_get_xml_data)
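A minimal sketch of this wrapper, with the age check in weeks (the `download` callable stands in for the actual HTTP request; all names are illustrative):

```python
import os
import time

def get_xml(url, cachedir, download, max_age_weeks=2):
    """Return the XML document for `url`, using a local file cache.

    download : callable
        Performs the actual HTTP request and returns bytes; passed in
        here to keep the sketch self-contained.
    """
    # Key the cache file on a filesystem-safe version of the URL.
    fname = os.path.join(
        cachedir, url.replace('/', '_').replace(':', '_'))
    max_age = max_age_weeks * 7 * 24 * 3600

    if os.path.isfile(fname) and os.path.getsize(fname) > 0:
        age = time.time() - os.path.getmtime(fname)
        if age <= max_age:
            # Fresh enough: serve from the cache folder.
            with open(fname, 'rb') as f:
                return f.read()

    # Missing, empty or stale: redownload and refresh the cache.
    data = download(url)
    with open(fname, 'wb') as f:
        f.write(data)
    return data
```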

Next development steps

(cf. a minimal translation of the discussion at the face-to-face meeting on 2017-10-24)

In terms of functionalities, I want to...

  1. ...download data (borehole data, water level,...) for location X (and derived locations, e.g. polygon,...).
  2. ...download data for all locations that provide a kind of observation (e.g. all filters measuring arsenic, all interpretations with a certain characteristic).
  3. ...download data from ... to ... (dates)

The currently existing services (WFS, XML and the REST service of the application) can support these requests. As the REST service is not guaranteed to be stable (for the moment), we will primarily focus on the WFS/XML services.

Scenario 1 can be supported by the WFS/XML. For scenario 2 this depends on whether the WFS contains the specific information. Period information is included in the WFS, so it is searchable, but some work will be needed to hide this from the package user.

We will focus on scenario 1 for the moment, using the WFS for the location based searching and downloading/parsing the XML.

Translating to main classes

  • The general capabilities (general DOVObject, name to be defined)

    • parsing of the XML download
    • location based search/handling functionalities
  • Main handling classes to work on and inherit from the DOVObject

    • DovGroundwater -> focus on time series of levels and observations
    • DovBoreholes -> focus on interpretations
    • (DovConePenetrationTest -> focus on interpretations; inherit on BoreHoles)

Data representation

The package scope stops at a table representation of the required data in a Pandas data.frame (In a later stage we can add custom translators to other common groundwater formats).

  • More details (deeper into the XML tree) will result in more columns, as we denormalize the data.
  • Logical entities (e.g. observations versus levels) are split into individual tables.
  • Aggregated/derived fields useful for searching are not always useful to keep in the table; we exclude these from the resulting data.frame (e.g. 'are there interpretations'). Comment fields are kept ;-)

Meta-organisation

  • CI: both Travis (pip) and AppVeyor (conda), also testing that it works within the osgeo4w context
  • unit tests: py.test
    • we'll add a data directory with example XML to make sure unit tests on XML handling can run offline, and provide additional tests to check that the XML format is still the same server-side
    • code coverage: using coveralls
  • documentation: sphinx docs, hosted on GitHub Pages (deployed by Travis CI)
  • add code of conduct
  • support for at least 2.7 and 3.5

We always use pull requests to add new features, but you're allowed to merge your own pull requests

@Roel @johanvdw @pjhaest @marleenvd feel free to comment/...

caching in database

as discussed in the august code sprint it could be useful to implement an enhancement of the caching where downloaded xmls are stored in a postgres DB, and read from there.

maxfeatures of GeoServer

Can you use OWSLib to read the maximum number of features you can request, without iterating with your GetFeature query?
Under WFS version 1.1.0 I do find a reference to <ows:Constraint name="DefaultMaxFeatures">, but this doesn't appear anywhere in the WFS for DOV?
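The WFS GetCapabilities document does expose such operation constraints when the server declares them. A sketch that looks for DefaultMaxFeatures in a WFS 1.1.0 capabilities document, returning None when it is absent (as seems to be the case for DOV):

```python
import xml.etree.ElementTree as etree

OWS = '{http://www.opengis.net/ows}'  # OWS namespace used by WFS 1.1.0

def default_max_features(capabilities_xml):
    """Return the DefaultMaxFeatures constraint advertised in a WFS
    1.1.0 GetCapabilities document, or None when the server does not
    declare one.
    """
    tree = etree.fromstring(capabilities_xml)
    for constraint in tree.iter(OWS + 'Constraint'):
        if constraint.get('name') == 'DefaultMaxFeatures':
            # Constraints carry their value in ows:Value (or
            # ows:DefaultValue, depending on the OWS version).
            value = constraint.find(OWS + 'Value')
            if value is None:
                value = constraint.find(OWS + 'DefaultValue')
            if value is not None and value.text:
                return int(value.text)
    return None
```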

osgeo4w compatibility

As a subset of the target audience will be installing Python and GIS tools with osgeo4w, it is good to take this into account:

  • add CI tests in the osgeo4w context.
  • document the installation for this target audience
