j535d165 / cbsodata

Unofficial Statistics Netherlands (CBS) open data API client for Python

Home Page: http://cbsodata.readthedocs.io/

License: MIT License

Python 100.00%
python-library open-data data census-data census-api national-statistics netherlands

cbsodata's Introduction

Hi, I'm Jonathan //J535D165

Ever wondered what the ideal (scientific) workflow would look like? And what kind of tools you need for it? It may be an impossible question to answer, but many will say the workflow should be efficient, transparent, and reproducible. I don't know the answer either, but I fully support these principles. Over the past years, I've used my GitHub profile to share and collaborate on projects aimed at developing the ideal academic workflow. The following projects are my top interests at the moment:

  • Data access: If you're looking for an easy way to download scientific data, be sure to check out Datahugger - the easiest way to download scientific data! I'm also involved in projects like pyalex (new!), cbsodata, and rispy.

  • Superfast reading: Can we make systematic reviews fun to work on by using AI for the boring 💤 parts? With ASReview and asreview.ai, we speed up systematic reviewing. I'm the lead of ASReview's development team.

  • Transparent workflows: I'm experimenting with projects like scitree and scisort, which help promote the use of reproducible project folder structures.

  • Data linkage: I work on projects like recordlinkage and List of data matching software. Although my attention may sometimes waver from these projects, they are still close to my heart ❤️.

In addition to this, you can also find me at Utrecht University (in the Netherlands) as the project lead for the Open and FAIR Data and Software movement.

cbsodata's People

Contributors

huisman, j535d165, ncvanegmond, nieuwenhoven

cbsodata's Issues

typed=True or False works the other way around

When I pass typed=True, the UntypedDataSet is downloaded, and vice versa.

E.g.

cbsodata.get_data(tableid, dir='D:/jsondata/', typed=False)

returns the TypedDataSet.

Probably this code

# Download only the typed or untyped data

typed_or_not_str = "TypedDataSet" if typed else "UntypedDataSet"
metadata_table_names.remove(typed_or_not_str)

should be like this?:

typed_or_not_str = "UntypedDataSet" if typed else "TypedDataSet"
metadata_table_names.remove(typed_or_not_str)

in order to remove the option that was not chosen.

Kind regards, Wouter
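
The fix proposed above can be illustrated with a minimal sketch. The helper name `tables_to_download` and the sample table name `TableInfos` are hypothetical and are not taken from the cbsodata source; the point is only that the data set that was not requested is the one that should be removed.

```python
def tables_to_download(table_names, typed=True):
    """Return the tables to fetch, dropping the data set that was not requested.

    Hypothetical helper for illustration; not part of cbsodata.
    """
    # With typed=True we want TypedDataSet, so UntypedDataSet is the one to drop.
    unwanted = "UntypedDataSet" if typed else "TypedDataSet"
    return [name for name in table_names if name != unwanted]


# Illustrative table names; only the two data set names come from the issue.
names = ["TableInfos", "TypedDataSet", "UntypedDataSet"]
print(tables_to_download(names, typed=True))
print(tables_to_download(names, typed=False))
```

Returning a new list instead of calling `remove` on the original also avoids modifying the caller's list, which is a design choice of this sketch and not something the issue asks for.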

meta_data()

Seems to be offline, or not working anymore. I guess they updated the syntax?

ConnectionError: ('Connection aborted.', OSError("(54, 'ECONNRESET')",))

In [1]: import cbsodata
   ...: cbsodata.CBSOPENDATA = "dataderden.cbs.nl"
   ...: cbsodata.get_data('47015NED')
   ...: 
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    599                                                   body=body, headers=headers,
--> 600                                                   chunked=chunked)
    601 

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    383                     # otherwise it looks like a programming error was the cause.
--> 384                     six.raise_from(e, None)
    385         except (SocketTimeout, BaseSSLError, SocketError) as e:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/packages/six.py in raise_from(value, from_value)

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    379                 try:
--> 380                     httplib_response = conn.getresponse()
    381                 except Exception as e:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/http/client.py in getresponse(self)
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/http/client.py in begin(self)
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/http/client.py in _read_status(self)
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/socket.py in readinto(self, b)
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py in recv_into(self, *args, **kwargs)
    289             else:
--> 290                 raise SocketError(str(e))
    291         except OpenSSL.SSL.ZeroReturnError as e:

OSError: (54, 'ECONNRESET')

During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)
~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    444                     retries=self.max_retries,
--> 445                     timeout=timeout
    446                 )

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    637             retries = retries.increment(method, url, error=e, _pool=self,
--> 638                                         _stacktrace=sys.exc_info()[2])
    639             retries.sleep()

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    366             if read is False or not self._is_method_retryable(method):
--> 367                 raise six.reraise(type(error), error, _stacktrace)
    368             elif read is not None:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/packages/six.py in reraise(tp, value, tb)
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    599                                                   body=body, headers=headers,
--> 600                                                   chunked=chunked)
    601 

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    383                     # otherwise it looks like a programming error was the cause.
--> 384                     six.raise_from(e, None)
    385         except (SocketTimeout, BaseSSLError, SocketError) as e:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/packages/six.py in raise_from(value, from_value)

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    379                 try:
--> 380                     httplib_response = conn.getresponse()
    381                 except Exception as e:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/http/client.py in getresponse(self)
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/http/client.py in begin(self)
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/http/client.py in _read_status(self)
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/socket.py in readinto(self, b)
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py in recv_into(self, *args, **kwargs)
    289             else:
--> 290                 raise SocketError(str(e))
    291         except OpenSSL.SSL.ZeroReturnError as e:

ProtocolError: ('Connection aborted.', OSError("(54, 'ECONNRESET')",))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-1-b54313817458> in <module>()
      1 import cbsodata
      2 cbsodata.CBSOPENDATA = "dataderden.cbs.nl"
----> 3 cbsodata.get_data('47015NED')

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/cbsodata.py in get_data(table_id, dir, typed, select, filters)
    277 
    278     metadata = download_data(table_id, dir=dir, typed=typed,
--> 279                              select=select, filters=filters)
    280 
    281     if "TypedDataSet" in metadata.keys():

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/cbsodata.py in download_data(table_id, dir, typed, select, filters)
    171         if table_name in ["TypedDataSet", "UntypedDataSet"]:
    172             metadata = _download_metadata(table_id, table_name,
--> 173                                           select=select, filters=filters)
    174         else:
    175             metadata = _download_metadata(table_id, table_name)

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/cbsodata.py in _download_metadata(table_id, metadata_name, select, filters)
     78     while (url is not None):
     79 
---> 80         r = requests.get(url, params=params)
     81 
     82         res = r.json(encoding='utf-8')

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/api.py in get(url, params, **kwargs)
     70 
     71     kwargs.setdefault('allow_redirects', True)
---> 72     return request('get', url, params=params, **kwargs)
     73 
     74 

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/api.py in request(method, url, **kwargs)
     56     # cases, and look like a memory leak in others.
     57     with sessions.Session() as session:
---> 58         return session.request(method=method, url=url, **kwargs)
     59 
     60 

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    510         }
    511         send_kwargs.update(settings)
--> 512         resp = self.send(prep, **send_kwargs)
    513 
    514         return resp

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/sessions.py in send(self, request, **kwargs)
    620 
    621         # Send the request
--> 622         r = adapter.send(request, **kwargs)
    623 
    624         # Total elapsed time of the request (approximately)

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    493 
    494         except (ProtocolError, socket.error) as err:
--> 495             raise ConnectionError(err, request=request)
    496 
    497         except MaxRetryError as e:

ConnectionError: ('Connection aborted.', OSError("(54, 'ECONNRESET')",))
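
A connection reset like the one in this traceback is sometimes transient. As a rough sketch only, the call could be wrapped in a small retry loop. The helper `call_with_retries` and its parameters are hypothetical and not part of cbsodata or requests. The example uses the built-in `ConnectionError` and a stand-in function so that it runs without network access; with requests, one would pass `requests.exceptions.ConnectionError` instead. Retrying will not help if the server is permanently offline or the table has moved.

```python
import time


def call_with_retries(func, exceptions=(ConnectionError,), attempts=3, delay=0.0):
    """Call func(), retrying on the given exceptions up to `attempts` times.

    Hypothetical helper for illustration; not part of cbsodata or requests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # give up and surface the last error unchanged
            time.sleep(delay * attempt)  # wait a little longer after each failure


# Stand-in for a flaky download: fails twice, then succeeds.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection aborted.")
    return [{"ID": 0}]


result = call_with_retries(flaky_download)
print(result, calls["count"])
```

In practice the wrapped call might be `lambda: cbsodata.get_data('47015NED')`.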

SSL verification fails even after passing custom certificates

This code

import pandas as pd
import cbsodata
toc = pd.DataFrame(cbsodata.get_table_list())

Returns
SSLError: HTTPSConnectionPool(host='opendata.cbs.nl', port=443): Max retries exceeded with url: /ODataCatalog/Tables?$format=json (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))

This is a common error for us because our company (APG in Heerlen) intercepts SSL certificates, and we need to provide custom certificates to the Python requests.

The following bit works for 'bare' requests:
import os
os.environ['REQUESTS_CA_BUNDLE'] = 'C:/dev/ca-bundle.crt'

Alternatively, this also works in general:
import requests
s = requests.Session()
s.verify = 'C:/dev/ca-bundle.crt'

However, neither workaround fixes the original issue with CBSOdata.

I inspected the CBSOdata Python code and could not find anything unusual that would explain why, for instance, os.environ['REQUESTS_CA_BUNDLE'] would be ignored. I still believe that adding a setting to the options object that sets verify to a directory with custom certificates would be the solution.
Any other ideas, hints, or suggestions are welcome, of course!
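
To make the suggestion above concrete, here is a minimal sketch of how such a setting could be resolved. The function `resolve_verify` and its `explicit` argument are hypothetical and do not exist in cbsodata, and the second certificate path is made up for the example. The fallback order mirrors what requests does by default when reading the environment: `REQUESTS_CA_BUNDLE` first, then `CURL_CA_BUNDLE`.

```python
import os


def resolve_verify(explicit=None, environ=None):
    """Pick the value to pass as `verify=` to a requests call.

    Hypothetical helper for illustration; not part of cbsodata.
    """
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return explicit  # a user-supplied setting always wins
    # Otherwise fall back to the environment variables requests also consults.
    return (
        environ.get("REQUESTS_CA_BUNDLE")
        or environ.get("CURL_CA_BUNDLE")
        or True  # True means: use the default certificate bundle
    )


env = {"REQUESTS_CA_BUNDLE": "C:/dev/ca-bundle.crt"}
print(resolve_verify(environ=env))
print(resolve_verify(explicit="D:/certs/other.crt", environ=env))
print(resolve_verify(environ={}))
```

The returned value would then be passed along explicitly, for example as `requests.get(url, params=params, verify=resolve_verify(...))`.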

Better error messages

A 404 response results in an error like this:

In [1]: import cbsodata
   ...: import pandas as pd
   ...: data = pd.DataFrame(cbsodata.get_data('47015NED'))
   ...: print(data.head())
   ...: 
/Users/jonathandebruin/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-1-7c8e94229f87> in <module>()
      1 import cbsodata
      2 import pandas as pd
----> 3 data = pd.DataFrame(cbsodata.get_data('47015NED'))
      4 print(data.head())

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/cbsodata.py in get_data(table_id, dir, typed, select, filters)
    277 
    278     metadata = download_data(table_id, dir=dir, typed=typed,
--> 279                              select=select, filters=filters)
    280 
    281     if "TypedDataSet" in metadata.keys():

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/cbsodata.py in download_data(table_id, dir, typed, select, filters)
    155 
    156     # http://opendata.cbs.nl/ODataApi/OData/37506wwm?$format=json
--> 157     metadata_tables = _download_metadata(table_id, "")
    158 
    159     # The names of the tables with metadata

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/cbsodata.py in _download_metadata(table_id, metadata_name, select, filters)
     80         r = requests.get(url, params=params)
     81 
---> 82         res = r.json(encoding='utf-8')
     83         res_value = res['value']
     84 

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/requests/models.py in json(self, **kwargs)
    894                     # used.
    895                     pass
--> 896         return complexjson.loads(self.text, **kwargs)
    897 
    898     @property

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    533             raise TypeError("use_decimal=True implies parse_float=Decimal")
    534         kw['parse_float'] = Decimal
--> 535     return cls(encoding=encoding, **kw).decode(s)
    536 
    537 

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3)
    368         if _PY3 and isinstance(s, bytes):
    369             s = str(s, self.encoding)
--> 370         obj, end = self.raw_decode(s)
    371         end = _w(s, end).end()
    372         if end != len(s):

~/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3)
    398             elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
    399                 idx += 3
--> 400         return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
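
The `JSONDecodeError` above arises because the response body is decoded as JSON without first checking the HTTP status. A minimal sketch of a clearer failure follows. The names `TableNotFoundError` and `parse_table_response` are hypothetical, and the function takes a plain status code and body text so that it runs without network access. With requests, `r.status_code` and `r.text` would supply them, or `r.raise_for_status()` could be called before `r.json()`.

```python
import json


class TableNotFoundError(Exception):
    """Raised when the requested table does not exist (hypothetical name)."""


def parse_table_response(status_code, text, table_id):
    """Decode a response body, failing with a readable message on errors."""
    if status_code == 404:
        raise TableNotFoundError(f"Table '{table_id}' was not found (HTTP 404).")
    if status_code >= 400:
        raise RuntimeError(f"Request for table '{table_id}' failed (HTTP {status_code}).")
    try:
        return json.loads(text)["value"]
    except (ValueError, KeyError) as err:
        # json.JSONDecodeError is a subclass of ValueError.
        raise RuntimeError(f"Unexpected response for table '{table_id}'.") from err


print(parse_table_response(200, '{"value": [{"ID": 0}]}', "47015NED"))
try:
    parse_table_response(404, "<html>Not Found</html>", "47015NED")
except TableNotFoundError as err:
    print(err)
```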

Add logging

Use logging module to output info (not print).
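
As a minimal sketch of what this could look like, the snippet below replaces a `print` call with a module-level logger. The logger name `cbsodata`, the function `report_download`, and the message text are assumptions made for illustration, not the library's actual code.

```python
import logging

# Library code creates a named logger and leaves configuration to the application.
logger = logging.getLogger("cbsodata")
logger.addHandler(logging.NullHandler())


def report_download(table_id, n_rows):
    """Log progress instead of printing it (hypothetical helper)."""
    logger.info("Retrieved %d rows from table %s", n_rows, table_id)


# The application decides whether and how messages are shown.
logging.basicConfig(level=logging.INFO)
report_download("47015NED", 100)
```

With this pattern, users who do not configure logging see no output, while others can raise or lower the level for the `cbsodata` logger alone.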

Cbsodata 1.3.2 crashing in debug mode.

Problem

When importing cbsodata (version 1.3.2) in debug mode, script execution crashes.

Where it happens

Pycharm:

  • Python 3.7.3 (debug mode)
  • Python 3.7.8 (debug mode)
  • Haven't tested any other versions

Spyder:

  • Python 3.8 (normal execution and debug mode)

Expected behaviour

Package is imported successfully.

Pycharm error message
Pycharm Community Edition (latest) + Python 3.7.3: Stack overflow error. (Same error in Python 3.7.8)

Fatal Python error: Cannot recover from stack overflow.


Thread 0x00004d14 (most recent call first):
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 300 in wait
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 552 in wait
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.2\helpers\pydev\pydevd.py", line 142 in _on_run
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 213 in run
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917 in _bootstrap_inner
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 885 in _bootstrap

Thread 0x000068f8 (most recent call first):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 283 in _on_run
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 213 in run
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917 in _bootstrap_inner
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 885 in _bootstrap

Thread 0x00007ae0 (most recent call first):
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 300 in wait
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\queue.py", line 179 in get
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 358 in _on_run
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.2.2\helpers\pydev\_pydevd_bundle\pydevd_comm.py", line 213 in run
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917 in _bootstrap_inner
  File "C:\Users\foobar\AppData\Local\Programs\Python\Python37\lib\threading.py", line 885 in _bootstrap

Current thread 0x00002a50 (most recent call first):
  File "C:\Users\foobar\PycharmProjects\powerbi_to_pdf\venv\lib\site-packages\cbsodata\cbsodata3.py", line 84 in __getattr__
  [Continues multiple times]

Spyder error message

Kernel died...

To reproduce the error

Install cbsodata via pip: pip install cbsodata

main.py

import cbsodata as cbs

Run main.py in debug mode.

Running the same line in the versions and editors listed above gives the same error.

Pip freeze

cbsodata==1.3.2
certifi==2020.11.8
chardet==3.0.4
idna==2.10
numpy==1.19.0
pandas==1.1.4
pyodbc==4.0.30
python-dateutil==2.8.1
pytz==2020.4
PyYAML==5.3.1
requests==2.24.0
selenium==3.141.0
six==1.15.0
urllib3==1.25.11

Could it be my fault somehow?
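
The traceback above ends in a repeated `__getattr__` frame, which points to unbounded recursion. The cbsodata source is not shown here, so the following is only a generic sketch of one common way this happens and how it is usually guarded against, not a diagnosis of the library. The class names, the `_values` attribute, and the sample setting are hypothetical. If `__getattr__` reads an instance attribute that has not been set yet, for example while an object is only partially initialised, that read triggers `__getattr__` again.

```python
class UnguardedOptions:
    """Hits the recursion limit if `_values` is missing (e.g. __init__ never ran)."""

    def __init__(self):
        self._values = {"language": "en"}

    def __getattr__(self, name):
        # Looking up self._values fails when it is unset, which calls
        # __getattr__("_values") again, and so on.
        return self._values[name]


class GuardedOptions:
    """Same idea, but missing attributes raise AttributeError instead."""

    def __init__(self):
        self._values = {"language": "en"}

    def __getattr__(self, name):
        # Read the instance dict directly so this lookup cannot recurse.
        values = self.__dict__.get("_values")
        if values is None or name not in values:
            raise AttributeError(name)
        return values[name]


# Create an instance without running __init__, as copy and pickle do.
bare = GuardedOptions.__new__(GuardedOptions)
print(hasattr(bare, "language"))
print(GuardedOptions().language)
```

Raising `AttributeError` for unknown names matters because tools that probe objects with `hasattr` or `getattr` rely on that exception to detect a missing attribute.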
