esonderegger / fecfile
A Python parser for the .fec file format
Home Page: https://esonderegger.github.io/fecfile/
License: Apache License 2.0
It looks like the F10 and F105 appeared in version 5.0 and existed through version 6.3, being discontinued in version 6.4.
These will need mappings and types.
Form 99 miscellaneous text takes the form of:
[BEGINTEXT]
Sample text
[ENDTEXT]
on lines 3-n of the filing. Currently this text gets ignored. It should be added to the text field in the top level of the returned dictionary.
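A minimal sketch of how that text could be pulled out, assuming the filing is available as a list of lines. The function name extract_f99_text is hypothetical and is not part of fecfile:

```python
def extract_f99_text(lines):
    """Return the text between [BEGINTEXT] and [ENDTEXT], or None if absent.

    Hypothetical helper for illustration only; not part of fecfile.
    """
    collected = []
    inside = False
    for line in lines:
        marker = line.strip().upper()
        if marker == "[BEGINTEXT]":
            inside = True
            continue
        if marker == "[ENDTEXT]":
            break
        if inside:
            # Keep the text as written, minus the line terminator.
            collected.append(line.rstrip("\r\n"))
    return "\n".join(collected) if inside else None
```

The result could then be stored under the top-level text key of the parsed dictionary, leaving it unset when the function returns None.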
There was at least one example of a column b field showing up where I didn't expect it to. Investigating this is now on the to-do list.
Hi!
I'm working on a port of this to Rust. I'm trying to decide where to source the schema mappings from. Possible options I've found are:
I found that FastFEC chose to use this repo as their upstream.
Could you explain what you see as the pros/cons of each of these?
So far, what I see are:
Are there other considerations I'm missing?
I would love to make it so that there was one complete and accurate listing of schemas so that the wide range of parsers would not have to duplicate this effort. Any idea what would be required to make that happen?
example filing: http://docquery.fec.gov/dcdev/posted/245235.fec
Hey @esonderegger I'm finally running some stuff with this library! Thanks for all your work.
One thing I noted is that some filings appear to not be UTF-8. This is external to your library, but causes it to crash and burn. Example: 1260488.fec fails with: 'utf-8' codec can't decode byte 0x92 in position 1062: invalid start byte.
This works fine if one just opens the file with the kwarg encoding="ISO-8859-1", but I don't have good metrics yet on which encoding works best across all filings. I can update with more complete stats in a bit.
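One way to work around this outside the library is to try UTF-8 first and fall back to ISO-8859-1. The sketch below assumes the raw bytes of the filing are already in hand; decode_filing is a hypothetical helper, not a fecfile function:

```python
def decode_filing(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw filing bytes, trying each encoding in order.

    Returns (text, encoding_used). Hypothetical helper, not part of fecfile.
    """
    last_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError as err:
            last_error = err
    # Only reached if every candidate failed (or none were given).
    raise last_error or ValueError("no encodings given")
```

Because ISO-8859-1 maps every byte to a character, the fallback never raises, but it can silently mis-decode: byte 0x92 becomes a control character there, whereas in Windows-1252 it is a right single quotation mark. Recording which encoding was used makes it easier to gather the stats mentioned above.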
This is generated by running python tests.py AllFormsHaveMappings.test_request with that test uncommented. These are the missing mappings found in the first 5000 .fec files. Eventually the goal is to get through all filings without raising any FecParserMissingMappingError exceptions.
And a list of newer missing mappings:
More forms to add to the list (9/28/2018: 1250000-1259000)
It looks like versions 6.3 and earlier for the F3Z and 5.0 and earlier for the SI don't have mappings.
The following forms have missing mappings for at least one paper version:
Line 125 in dc923ed
It looks like the code only allows for redirects on 404s but doesn't have a condition for 300s.
I keep getting a FilingUnavailableError:
[ERROR] FilingUnavailableError: The requested FEC file number (FEC-1305714) is unavailable.
Traceback (most recent call last):
File "/var/task/src/FECFileLoader.py", line 312, in lambdaHandler
options={'filter_itemizations': [FILING_TYPE]}):
File "/var/task/fecfile/__init__.py", line 134, in iter_http
raise FilingUnavailableError({'file_number': file_number})
But when I curl that filing I get a 307 followed by a 200, so it seems the filing is available:
> curl -IL http://docquery.fec.gov/dcdev/posted/1305714.fec
HTTP/1.1 307 Temporary Redirect
Cache-Control: no-cache
Content-length: 0
Location: https://docquery.fec.gov/dcdev/posted/1305714.fec
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Sun, 20 Jan 2019 20:49:01 GMT
Cache-Control: max-age=21600
Expires: Wed, 02 Sep 2020 20:43:10 GMT
Content-Type: mime.types:application/fecprn
Content-Length: 705
Accept-Ranges: bytes
Date: Wed, 02 Sep 2020 14:43:10 GMT
X-Varnish: 111590184
Age: 0
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=63072000
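The curl behaviour above (follow the Location header on a 3xx, then read the final response) can be sketched as follows. This is not how fecfile is implemented; fetch is a hypothetical callable returning (status, headers, body), injected so the logic can be exercised without a network, and headers is assumed to be a dict with a "Location" key:

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def fetch_following_redirects(fetch, url, max_hops=5):
    """Call fetch(url) and follow redirects, up to max_hops of them.

    Returns (status, final_url, body) for the first non-redirect response.
    """
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if status in REDIRECT_STATUSES and "Location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["Location"])
            continue
        return status, url, body
    raise RuntimeError("too many redirects for " + url)
```

With a loop like this, only the final status decides whether the filing is treated as unavailable, so an http-to-https 307 would no longer be reported as a missing filing.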
Describe the bug
fecfile is unable to parse some independent expenditure filings since the update for version 8.4 due to missing mappings.
To Reproduce
import fecfile
ie_itemizations = {'filter_itemizations': ['SE','F57'], 'as_strings': True}
filing = fecfile.from_http('1616360', options=ie_itemizations)
FecParserMissingMappingError: cannot parse version 8.4 of form F5N - no mapping found
Expected behavior
The above code should have loaded a dictionary.
Environment
Python 3
Describe the bug
documentation bug
To Reproduce
Getting started with local dev. Clone, create and source venv, then comes the install...
> python setup.py
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
Expected behavior
A clean install
Additional context
It should be python setup.py install
:)
In some filings, fields are enclosed in quotation marks even though they don't need to be. That means the parser sees values like "4247.66" and says "that doesn't look like a number to me".
I think if a value that is supposed to be numeric begins and ends with " after we call strip() on it, then we should try again with value[1:-1].
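A minimal sketch of that retry, assuming numeric fields are parsed with float. The name parse_number is hypothetical and does not reflect fecfile's actual parsing code:

```python
def parse_number(value):
    """Parse a numeric field, retrying once without surrounding quotes.

    Hypothetical helper for illustration; not part of fecfile.
    """
    cleaned = value.strip()
    try:
        return float(cleaned)
    except ValueError:
        # Retry only when the value is wrapped in a pair of double quotes.
        if len(cleaned) >= 2 and cleaned.startswith('"') and cleaned.endswith('"'):
            return float(cleaned[1:-1])
        raise
```

Values that are still not numeric after the quotes are removed keep raising ValueError, so genuinely bad data is not hidden.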
I've been getting a lot of FilingUnavailableError exceptions, originating here:
Line 134 in dc923ed
This doesn't give me much information about the underlying problem. It's doubly problematic because when I curl some of these FEC files I get a 200. I would really like to see more information, like the HTTP status code, in the error message.
[ERROR] FilingUnavailableError: The requested FEC file number (FEC-1305734) is unavailable.
Traceback (most recent call last):
File "/var/task/src/FECFileLoader.py", line 171, in lambdaHandler
options={'filter_itemizations': [FILING_TYPE]}):
File "/var/task/fecfile/__init__.py", line 134, in iter_http
raise FilingUnavailableError({'file_number': file_number})
[ERROR] FilingUnavailableEr
I'm pretty handy with Python if you're up for a PR on this.
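As a rough illustration of the request, an exception could carry the status code and URL along with the file number. The class below is a hypothetical stand-in named FilingFetchError, written as a sketch rather than as fecfile's actual FilingUnavailableError:

```python
class FilingFetchError(Exception):
    """Hypothetical error that records why a filing could not be fetched."""

    def __init__(self, file_number, status_code=None, url=None):
        self.file_number = file_number
        self.status_code = status_code
        self.url = url
        detail = f"The requested FEC file number (FEC-{file_number}) is unavailable"
        if status_code is not None:
            detail += f" (HTTP {status_code}"
            if url:
                detail += f" from {url}"
            detail += ")"
        super().__init__(detail + ".")
```

Raising it with the status code of the failed response would make a log line distinguish, for example, an unfollowed 307 from a real 404.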
When making the paper mappings, the electronic mappings (which we used as a starting point) appeared to have some column a fields marked as col_b. This needs additional investigation.