
jail_scrapers's People

Contributors

bfeldman89

Forkers

eenblam

jail_scrapers's Issues

Hinds issue when DOB is "//"

Traceback (most recent call last):
  File "scrapers.py", line 991, in <module>
    main()
  File "scrapers.py", line 983, in main
    fndict[jail.strip()]()
  File "scrapers.py", line 535, in hcdc_scraper
    airtab.insert(this_dict, typecast=True)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/airtable/airtable.py", line 384, in insert
    return self._post(
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/airtable/airtable.py", line 194, in _post
    return self._request("post", url, json_data=json_data)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/airtable/airtable.py", line 187, in _request
    return self._process_response(response)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/airtable/airtable.py", line 175, in _process_response
    raise exc
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/airtable/airtable.py", line 162, in _process_response
    response.raise_for_status()
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: ('422 Client Error: Unprocessable Entity for url: https://api.airtable.com/v0/appVOAkFIPJHZhYZh/intakes', '422 Client Error: Unprocessable Entity for url: https://api.airtable.com/v0/appVOAkFIPJHZhYZh/intakes [Error: {'type': 'INVALID_VALUE_FOR_COLUMN', 'message': 'Cannot parse date value "//" for field DOB'}]')
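
One way to keep a blank DOB from aborting the Hinds run is to validate the field before calling `airtab.insert`. This is a minimal sketch, assuming the roster renders DOB as `MM/DD/YYYY` and shows a missing date as `//`; `clean_dob` and `prepare_record` are hypothetical helpers, not part of the repo. An unparseable DOB is dropped from the payload, so the new Airtable record simply has an empty DOB field instead of triggering the 422.

```python
from datetime import datetime
from typing import Optional


def clean_dob(raw: Optional[str]) -> Optional[str]:
    """Return the DOB as an ISO YYYY-MM-DD string, or None if it can't be parsed.

    Assumes the roster uses MM/DD/YYYY and shows a missing date as "//".
    """
    raw = (raw or "").strip()
    try:
        return datetime.strptime(raw, "%m/%d/%Y").date().isoformat()
    except ValueError:
        # "//", "" and any other malformed value end up here
        return None


def prepare_record(this_dict: dict) -> dict:
    """Copy the scraped dict, normalizing DOB or removing it if it is unusable."""
    record = dict(this_dict)
    dob = clean_dob(record.get("DOB"))
    if dob is None:
        record.pop("DOB", None)  # omit the field rather than send a bad value
    else:
        record["DOB"] = dob
    return record
```

In `hcdc_scraper` the call would then become something like `airtab.insert(prepare_record(this_dict), typecast=True)`.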

pdf_stuff.py error

From [email protected]  Fri Jul 31 08:58:48 2020
X-Original-To: blakefeldman
Delivered-To: [email protected]
From: [email protected] (Cron Daemon)
To: [email protected]
Subject: Cron <blakefeldman@Moms-Air> source ~/.bash_profile && cd ~/code/runners && ./local_hourly_runs.sh
X-Cron-Env: <PATH=/usr/local/bin:/usr/local/sbin:~/bin:/usr/bin:/bin:/usr/sbin:/sbin>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <LOGNAME=blakefeldman>
X-Cron-Env: <USER=blakefeldman>
Date: Tue, 28 Jul 2020 02:25:00 -0500 (CDT)

Traceback (most recent call last):
  File "pdf_stuff.py", line 191, in <module>
    main()
  File "pdf_stuff.py", line 186, in main
    pdf_to_dc()
  File "pdf_stuff.py", line 118, in pdf_to_dc
    obj = dc.documents.get(obj.id)
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/documentcloud/base.py", line 113, in get
    response = self.client.get(
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/ratelimit/decorators.py", line 80, in wrapper
    return func(*args, **kargs)
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/documentcloud/client.py", line 142, in _request
    self._set_tokens()
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/documentcloud/client.py", line 74, in _set_tokens
    access_token, self.refresh_token = self._refresh_tokens(self.refresh_token)
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/documentcloud/client.py", line 105, in _refresh_tokens
    response = requests_retry_session().post(
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/requests/sessions.py", line 578, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Users/blakefeldman/.local/share/virtualenvs/jail_scrapers-50neoEVy/lib/python3.8/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='accounts.muckrock.com', port=443): Read timed out. (read timeout=10)
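
The timeout above happened while the DocumentCloud client was refreshing its token against `accounts.muckrock.com`, so a transient slow response killed the whole cron run. A minimal sketch of a retry-with-backoff wrapper follows; `with_retries` is a hypothetical helper, and the exception types to retry on are passed in so the block does not depend on `requests` itself. Retrying the whole `dc.documents.get` call is an assumption about what is safe to repeat here; it is a read, so re-running it should be harmless.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() up to `attempts` times, backing off exponentially between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # last try failed: re-raise the original exception
            sleep(base_delay * 2 ** attempt)  # waits 2s, then 4s, ... with the defaults
    raise AssertionError("unreachable: the loop always returns or raises")
```

In `pdf_to_dc` this could look like `doc_id = obj.id` followed by `obj = with_retries(lambda: dc.documents.get(doc_id), (requests.exceptions.ReadTimeout,))`.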

polish data broken

Traceback (most recent call last):
  File "polish_data.py", line 361, in <module>
    main()
  File "polish_data.py", line 357, in main
    polish_data()
  File "polish_data.py", line 27, in polish_data
    get_all_intake_deets()
  File "polish_data.py", line 323, in get_all_intake_deets
    this_dict['intake_age'] = re.search(r"(\d\d) Years Old", chunks[0]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
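
`re.search` returns `None` when the pattern is absent, and calling `.group(1)` on `None` raises exactly this `AttributeError`. A minimal sketch of a guarded version, using the same pattern as line 323; `parse_intake_age` is a hypothetical helper name.

```python
import re
from typing import Optional

# Same pattern as polish_data.py line 323
AGE_RE = re.compile(r"(\d\d) Years Old")


def parse_intake_age(chunk: str) -> Optional[str]:
    """Return the two-digit age from a roster chunk, or None if it isn't there."""
    match = AGE_RE.search(chunk)
    return match.group(1) if match else None
```

`get_all_intake_deets` could then set `this_dict['intake_age'] = parse_intake_age(chunks[0])` and skip or log the record when the result is `None`, instead of crashing the whole run.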

add logging for snapshots and archiving

Create Airtable records for the three new modules and the functions within them:

  • jail_scrapers/snapshot.py
  • jail_scrapers/weekly_snapshot.py
  • jail_scrapers/airtab_archiving.py
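
One possible shape for this logging is a decorator that records each call to a function in these modules as a dict, which a writer then stores in the log base. This is a minimal sketch: `log_run`, the field names, and the `write` callable are all hypothetical; in practice `write` might wrap an Airtable insert on the log table.

```python
import functools
import time
from typing import Any, Callable, Dict

LogWriter = Callable[[Dict[str, Any]], None]


def log_run(write: LogWriter) -> Callable:
    """Decorator factory: record module, function, duration and outcome of each call."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            status = "error"
            try:
                result = fn(*args, **kwargs)
                status = "ok"
                return result
            finally:
                # Runs on success and on exception, so failed runs are logged too
                write({
                    "module": fn.__module__,
                    "function": fn.__name__,
                    "seconds": round(time.monotonic() - start, 3),
                    "status": status,
                })
        return wrapper
    return decorator
```

Decorating, for example, `pop_otd` in `snapshot.py` with `@log_run(some_writer)` would then produce one log entry per call, including calls that raise.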

screenshot of log airtable base

[Screenshot: Screen Shot 2020-08-30 at 9 46 00 PM]

snapshot error

no record
{'Madison admits': 10}
Traceback (most recent call last):
  File "snapshot.py", line 48, in <module>
    main()
  File "snapshot.py", line 44, in main
    pop_otd(day, county, jail, quiet=False)
  File "snapshot.py", line 17, in pop_otd
    airtab_daily.update(record['id'], this_dict)
KeyError: 'id'
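
The printed `no record` followed by `KeyError: 'id'` suggests that `record` came back without an `id` (no existing row for that day and jail) and `pop_otd` went on to update it anyway. A minimal sketch of an update-or-insert guard; `upsert` is a hypothetical helper, and the `Table` protocol only describes the two methods the sketch assumes the table object has. Whether a missing row should be created or just skipped is a judgment call; this version creates it.

```python
from typing import Any, Dict, Protocol


class Table(Protocol):
    """The two table methods this sketch relies on (assumed interface)."""

    def insert(self, fields: Dict[str, Any]) -> Dict[str, Any]: ...

    def update(self, record_id: str, fields: Dict[str, Any]) -> Dict[str, Any]: ...


def upsert(table: Table, record: Dict[str, Any], fields: Dict[str, Any]) -> Dict[str, Any]:
    """Update `record` if it has an id; otherwise create a new record with `fields`."""
    record_id = record.get("id") if record else None
    if record_id:
        return table.update(record_id, fields)
    return table.insert(fields)
```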

database is getting slow due to size

  • get snapshot of jail scrapers

  • update its db_key in .exports

  • duplicate new base

  • In the active version, delete records whose last verified date and DOI are both before 10/1

  • In the archived version, delete the opposite set (i.e., keep only those records)

  • delete all in jail scrapers archive besides jcdc & rename it jcdc archive
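
The split described above can be sketched as a single partition, so every record lands in exactly one of the two bases. This is a minimal sketch, assuming each record exposes `last_verified` and `DOI` as `date` objects (hypothetical field names); the issue gives the cutoff only as 10/1, so the year is left to the caller.

```python
from datetime import date
from typing import Dict, List, Tuple


def split_by_cutoff(records: List[Dict], cutoff: date) -> Tuple[List[Dict], List[Dict]]:
    """Return (keep_in_active, keep_in_archive).

    A record belongs in the archive only when BOTH its last-verified date and
    its DOI fall before the cutoff; everything else stays in the active base.
    """
    active, archive = [], []
    for rec in records:
        old = rec["last_verified"] < cutoff and rec["DOI"] < cutoff
        (archive if old else active).append(rec)
    return active, archive
```

The active base would delete everything in the second list, and the archived copy would delete everything in the first.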

PythonAnywhere problem for scraping several jails

PythonAnywhere tasks started returning errors for the functions that scrape 5 of the 12 jails. Email PythonAnywhere for support. For now, those scrapers are run each hour by a local runner, which is not ideal: it is slow, and the local and PyA tasks overlap in when they run, even though both use the Airtable API.

Traceback (most recent call last):
  File "scrapers.py", line 965, in <module>
    main()
  File "scrapers.py", line 961, in main
    fndict[jails[-1]]()
  File "scrapers.py", line 378, in kcdc_scraper
    r = requests.get('https://www.kempercountysheriff.com/roster.php')
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.kempercountysheriff.com', port=443): Max retries exceeded with url: /roster.php (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f17ffcb6370>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "scrapers.py", line 965, in <module>
    main()
  File "scrapers.py", line 961, in main
    fndict[jails[-1]]()
  File "scrapers.py", line 300, in tcdc_scraper
    r = requests.get(url)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.tunicamssheriff.com', port=443): Max retries exceeded with url: /roster.php?grp=10 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8804dbbe50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "scrapers.py", line 965, in <module>
    main()
  File "scrapers.py", line 959, in main
    fndict[jail.strip()]()
  File "scrapers.py", line 533, in ccdc_scraper
    r = requests.get(url)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.claysheriffms.org', port=80): Max retries exceeded with url: /roster.php?grp=10 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2bc9407ca0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "scrapers.py", line 965, in <module>
    main()
  File "scrapers.py", line 959, in main
    fndict[jail.strip()]()
  File "scrapers.py", line 792, in jcdc_scraper
    r = requests.get(url)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.jonesso.com', port=443): Max retries exceeded with url: /roster.php (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f1e76b83580>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "scrapers.py", line 965, in <module>
    main()
  File "scrapers.py", line 959, in main
    fndict[jail.strip()]()
  File "scrapers.py", line 877, in ccj_scraper
    r = requests.get(main_url, headers=muh_headers)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/bfeldman89/.virtualenvs/jail_scrapers-aQq2SYv0/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.calhounso.org', port=80): Max retries exceeded with url: /page.php?id=7 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f33b16b0640>: Failed to establish a new connection: [Errno 111] Connection refused'))
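
In each traceback above, the `ConnectionError` propagates all the way out of `main()`, so if `main()` loops over several jails, any jails after the failing one are skipped for that run. A minimal sketch of isolating each scraper so a refused connection is logged and the rest still run; it does not fix the underlying Errno 111 on PythonAnywhere. `run_scrapers` is hypothetical and assumes each value in a dict like `fndict` is a zero-argument callable, matching how `fndict[jail.strip()]()` is invoked. The default `(OSError,)` covers `requests.exceptions.ConnectionError`, because `requests.exceptions.RequestException` subclasses `IOError` (an alias of `OSError`).

```python
import logging
from typing import Callable, Dict, List, Tuple, Type

log = logging.getLogger("scrapers")


def run_scrapers(
    scrapers: Dict[str, Callable[[], None]],
    catch: Tuple[Type[BaseException], ...] = (OSError,),
) -> List[str]:
    """Run every scraper; log and skip any that raise one of `catch`.

    Returns the names of the jails whose scraper failed.
    """
    failed = []
    for name, scrape in scrapers.items():
        try:
            scrape()
        except catch:
            log.exception("scraper for %s failed", name)
            failed.append(name)
    return failed
```

The returned list could also be fed to the new logging base, which would show at a glance which jails PythonAnywhere cannot reach.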
