
fech's Introduction

 ______   ______     ______     __
/\  ___\ /\  ___\   /\  ___\   /\ \___
\ \  __\ \ \  __\   \ \ \____  \ \  __ \
 \ \_\    \ \_____\  \ \_____\  \ \_\ \_\
  \/_/     \/_____/   \/_____/   \/_/\/_/

Fech makes it easy to parse electronic campaign finance filings by candidates, parties and political action committees from the Federal Election Commission. It lets you access filing attributes the same way regardless of filing version, and works as a framework for cleaning and filing data. Fech is an open source project of The New York Times, but contributions from anyone interested in working with F.E.C. filings are greatly appreciated.

Latest version: 1.8. For details see the CHANGELOG.

Fech works best under Ruby version 2.x, and has been tested under Ruby versions 1.8.7, 1.9.2, 1.9.3, 2.1.2, 2.2.2 and Rubinius.

Documentation

Documentation can be found at Fech's GitHub page.

News

  • August 19, 2015: Version 1.8 released. Replaced `sources` directory with submodule pulling from fech-sources.

  • March 28, 2015: Version 1.7 released. Added support for Schedule SA3L and F3Z transactions, fixed some bugs and updated gems and specs.

  • March 11, 2014: Version 1.6.4 released. Bugfix for Schedule E transactions to fix office state and district.

  • Jan. 22, 2014: Version 1.6.3 released. Bugfix to ensure Translator object is available to any Filing object. Thanks to @abstrctn for fix.

  • Jan. 15, 2014: Version 1.6.2 released. Bugfix to add dissemination date to SE mappings.

  • Jan. 15, 2014: Version 1.6.1 released. Bugfix to add FEC version 8.1 mappings for F13 and F24 forms.

  • Jan. 14, 2014: Version 1.6.0 released. Added support for FEC electronic filing version 8.1 headers and mappings.

  • Nov. 14, 2013: Version 1.5.0 released. Bugfix for F3 mapping that did not include col_a_total_receipts_period. Thanks to @capitolmuckrakr for the report.

  • Sep. 10, 2013: Version 1.4.3 released. Bugfix for certain filings that raise encoding errors using Ruby 1.9.3 or greater.

  • Sep. 6, 2013: Version 1.4.2 released. Added support for Ruby 2.0 and fixed improper mapping for F1S records. Thanks to @bycoffe for the report.

  • May 22, 2013: Version 1.4.1 released. Bugfix to add version 6.2 to F5 filing mappings.

  • April 18, 2013: Version 1.4 released. Adds support for unofficial Senate electronic filings.

  • March 26, 2013: Version 1.3.2 released. Bugfix for F99 filing encoding issues. Thanks for the patch, @jgillum.

  • Dec. 11, 2012: Versions 1.3, 1.3.1 released. Adds support for F13 filings from inaugural committees and fixes an encoding bug.

  • Dec. 3, 2012: Version 1.2 released. Fixes encoding errors under Ruby 1.9.3 for ASCII-encoded filings. Thanks to Sanjiv for the bug report.

  • Nov. 13, 2012: Version 1.1 released. CSVDoctor skips rows that don’t match row type being searched for, which provides a performance boost, and smaller bugfixes for Form 99 handling and date-field conversions. Thanks to Sai for several patches.

  • June 16, 2012: Version 1.0.1 released. Bug-fix for older Form 2 support.

  • April 11, 2012: Version 1.0.0 released! Support for Ruby 1.9.3 added, all form types supported.

  • April 9, 2012: Version 1.0.0.rc1 released. Release candidate with backwards-incompatible change (renaming zip attribute to zip_code).

  • March 29, 2012: Version 0.9.10 released. Bug-fix for Form 24 in versions 6.4 and 7.0.

  • March 28, 2012: Version 0.9.9 released. Bug-fix to add support for F3XA form type, support for Schedule H and L.

  • March 23, 2012: Version 0.9.6 and 0.9.5 released. Bug-fixes for F6 mappings.

  • March 10, 2012: Version 0.9.4 released. Added support for F6 filings.

  • March 8, 2012: Version 0.9.3 released. Bug-fix for F2 & F24 mappings.

  • Feb. 29, 2012: Version 0.9.2 released. Bug-fix for F3 mappings, added filing comparison class.

  • Feb. 21, 2012: Versions 0.9.0, 0.9.1 released. Added support for alternative CSV Parsers, F4 filings.

  • Feb. 19, 2012: Version 0.8.2 released. Added layouts for F1M and F2 filings.

  • Feb. 15, 2012: Version 0.8.1 released. Bug-fix to support F3 termination filings.

  • Feb. 13, 2012: Version 0.8.0 released. Layouts for form 3 and form 1 added, for parsing House candidate committee filings.

  • Feb. 11, 2012: Version 0.7.0 released. Layouts for form 9 added, for parsing electioneering communications filings.

  • Jan. 28, 2012: Version 0.6.0 released. Added support for quoted fields.

  • Nov. 22, 2011: Version 0.5.0 released. Layouts for form 3X added, for parsing filings from non-candidate committees.

  • Nov. 13, 2011: Version 0.3.0 released. Layouts for forms 24, 5, 56 and 57 added, for parsing independent expenditure filings.

  • Nov. 4, 2011: Version 0.2.1 released. Bug-fix release to address a problem with the :include option for selecting only certain columns. Thanks to Aaron Bycoffe for the report and patch.

Installation

Install Fech as a gem:

gem install fech

For use in a Rails 3 application, put the following in your Gemfile:

gem 'fech'

then run the `bundle install` command.

How to contribute

To develop locally, you'll need to clone this repository and then run the following commands to update the `sources` directory:

$ git submodule init
$ git submodule update

Fech's goal is to provide support for all electronically filed forms and their row types, so contributors can pick a form type that is currently unsupported and try to implement it, or improve an existing implementation. For entirely new form types, please include specs showing successful parsing of the summary. A good way to start is to look at the mappings for a similar form type. Please note that Fech currently commits to supporting the F.E.C.'s filing formats back to version 3.0, where applicable.

Bug reports and feature requests are welcomed by submitting an Issue. Please be advised that development is focused on parsing filings, which may contain errors or be improperly filed, and not on providing wrappers or helper methods for working with filings in another context.

To get started, fork the repo, make your additions or changes and send a pull request.

Pronunciation guide

It’s “fetch”, with a soft “ch” sound. There are no other acceptable pronunciations.

Authors

Michael Strickland

Evan Carmi

Aaron Bycoffe

Derek Willis

Daniel Pritchett

Sai

Jack Gillum

Copyright © 2013 The New York Times Company. See LICENSE for details.

fech's People

Contributors

abstrctn, bycoffe, dfc, dpritchett, dwillis, jeremyjbowers, saizai, zstumgoren


fech's Issues

Fech::MapGenerator.convert_header_file_to_row_files can fail in Ruby 1.9.3

Running specs in ruby-1.9.3-p125 results in one failed spec:

  1. Fech::MapGenerator.convert_header_file_to_row_files should not raise error
    Failure/Error: Fech::MapGenerator.convert_header_file_to_row_files(@source_dir)
    ArgumentError:
    invalid byte sequence in UTF-8
    ./lib/fech/map_generator.rb:74:in `block in convert_header_file_to_row_files'
    ./lib/fech/map_generator.rb:73:in `each'
    ./lib/fech/map_generator.rb:73:in `convert_header_file_to_row_files'
    ./spec/map_generator_spec.rb:42:in `block (3 levels) in <top (required)>'

Seems to be a String encoding issue: http://stackoverflow.com/questions/2982677/ruby-1-9-invalid-byte-sequence-in-utf-8
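One common workaround for this class of error is to re-tag the bytes with their likely source encoding and transcode them. The sketch below is illustrative only: `to_utf8` is a hypothetical helper, not part of Fech, and it assumes that any bytes invalid as UTF-8 are really ISO-8859-1 text, which may not hold for every header file.

```ruby
# Hypothetical helper, not part of Fech: return a valid UTF-8 copy of +raw+.
# Assumption: bytes that are not valid UTF-8 are ISO-8859-1 (Latin-1) text.
def to_utf8(raw, fallback = Encoding::ISO_8859_1)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # Reinterpret the same bytes as the fallback encoding, then transcode.
  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

bad = "Caf\xE9"              # Latin-1 byte for "é"; invalid as UTF-8
fixed = to_utf8(bad)
puts fixed.valid_encoding?   # true
puts fixed                   # Café
```

Strings that are already valid UTF-8 pass through unchanged, so the helper could be applied to every line read from a source file.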

F1S mapping is incorrect

At least for filings with version 8.0, the name of the affiliated committee appears with a key of :change_of_committee_name and the affiliated committee's ID appears with a key of :committee_name.

To re-create:

filing = Fech::Filing.new(768055).download
filing.rows_like(/f1s/).first
=> {:form_type=>"F1S",
  :filer_committee_id_number=>"C00478354",
  :change_of_committee_name=>"FRIENDS OF JOHN BOEHNER",
  :committee_name=>"C00237198"
   ...     }

Other FEC data?

I've started work on a separate parser for some of the FTP data offered by the FEC; that code is here for now: https://github.com/dwillis/FecFTP. So far it offers a wrapper to committee, candidate and committee contribution files (the individual file seems probably too big to process quickly, but it could be done).

The question I raise is whether that effort should be part of Fech or kept separately. We've tried to keep Fech fairly clean in the past, and I understand the reasons for doing so, but was wondering if we should revisit that practice. Any thoughts?

License missing from gemspec

RubyGems.org doesn't report a license for your gem. This is because it is not specified in the gemspec of your last release.

via e.g.

  spec.license = 'MIT'
  # or
  spec.licenses = ['MIT', 'GPL-2']

Including a license in your gemspec is an easy way for rubygems.org and other tools to check how your gem is licensed. As you can imagine, scanning your repository for a LICENSE file or parsing the README, and then attempting to identify the license or licenses is much more difficult and more error prone. So, even for projects that already specify a license, including a license in your gemspec is a good practice. See, for example, how rubygems.org uses the gemspec to display the rails gem license.

There is even a License Finder gem to help companies/individuals ensure all gems they use meet their licensing needs. This tool depends on license information being available in the gemspec. This is an important enough issue that even Bundler now generates gems with a default 'MIT' license.

I hope you'll consider specifying a license in your gemspec. If not, please just close the issue with a nice message. In either case, I'll follow up. Thanks for your time!

Appendix:

If you need help choosing a license (sorry, I haven't checked your readme or looked for a license file), GitHub has created a license picker tool. Code without a license specified defaults to 'All rights reserved'-- denying others all rights to use of the code.
Here's a list of the license names I've found and their frequencies

p.s. In case you're wondering how I found you and why I made this issue, it's because I'm collecting stats on gems (I was originally looking for download data) and decided to collect license metadata,too, and make issues for gemspecs not specifying a license as a public service :). See the previous link or my blog post about this project for more information.

Dealing with ActBlue (and similar) filings

Filings by ActBlue, one of the largest conduit committees, are enormous, and can take minutes to parse Schedule A & B (one recent filing has 152k itemized contributions). I've tried grabbing specific fields, but it still takes a while. Anyone have any ideas?
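One possible direction, sketched here with plain Ruby rather than Fech's own API, is to stream the file line by line and reject non-matching rows with a cheap string check before splitting them. `each_matching_row` is a hypothetical helper, and the sample rows and default separator are assumptions made for illustration.

```ruby
require "stringio"

# Hypothetical helper, not part of Fech: lazily yield only the rows whose
# first field starts with +prefix+, keeping only the columns at +indexes+.
# Assumption: one record per line, fields separated by +sep+.
def each_matching_row(io, prefix, indexes, sep = ",")
  return enum_for(:each_matching_row, io, prefix, indexes, sep) unless block_given?

  io.each_line do |line|
    # Cheap string check first, so non-matching rows are never split.
    next unless line.start_with?(prefix)

    fields = line.chomp.split(sep, -1)
    yield fields.values_at(*indexes)
  end
end

# Made-up sample data, for illustration only.
sample = StringIO.new(<<~ROWS)
  HDR,FEC,8.0
  SA11AI,EXAMPLE-ID,Doe,25.00
  SB21B,EXAMPLE-ID,Vendor,100.00
  SA11AI,EXAMPLE-ID,Roe,50.00
ROWS

total = each_matching_row(sample, "SA", [3]).sum { |row| row.first.to_f }
puts total # 75.0
```

Because rows are yielded one at a time, memory use stays flat regardless of how many itemized contributions the filing contains.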

How do I grab all the data?

I'd like to replicate the entire FEC filings database.

Currently I'm only scraping the committee & candidate master files (http://www.fec.gov/finance/disclosure/ftpdet.shtml).

  1. Is that totally mooted by the filings?
  2. Where can I get a list of all filings issued to date, or at least the first filing number? http://query.nictusa.com/rss/ can keep me up to date once I do have it all, but I need to get there first.
  3. How do you deal with amendments, bad data, etc? For instance, just from the master files, I note that there are lots of committees or candidates that cross-reference each other wrongly; lots of variance in how "no data" is represented (argh why can't people learn to use nil); various exceptions that broke my validations (eg an undocumented committee type "O" and candidate status "Q").

It seems that NYT's CampaignCash API does at least some of this processing; it would be nice if that code could be open sourced as well, since essentially I am replicating it. (Why: I just don't want to have a dependency on NYT's database. I have to store a bunch of it locally anyway, so I'd rather just have the whole damn thing locally and be able to run my own queries.)

Major overhaul of source mapping

I noticed that a lot of mappings were
a) just wrong (e.g. linked to the wrong record, like col a vs col b, or the wrong version number / line item)
b) missing (e.g. no field to capture some data in a record)
c) duplicated (e.g. multiple fields mapped to the same name)
d) inconsistently named
e) not well segregated (e.g. comma or newline within fields that aren't escaped and are comma/newline separated)

So I'm working on a major overhaul of the source mapping, deriving directly from the e-filing headers all versions.xlsx eFilingFormats file. While at it, I'm having it support versions 1 & 2 as well as deprecated forms.

Because the data import will have to be re-done anyway (because of a-c above), I'm being a bit aggressive about making the names consistent and semantic — e.g. total_receipts_ytd instead of col_b_total_receipts. I'm hoping to reduce the total number of canonical field names from the current ~1.2k to something a bit more sane. ;-)

The new version will have a regex based mapping file, with US delimiters (ascii 31) and field type/size data, both to make it easier to edit in the future and to be able to automatically output a database migration file.

I'm expecting to be done in about a week and will make a pull request then. Right now it's not in a fully consistent state.

So @dwillis et al, please hold off on working on this part of the code for the moment.

(Also, I'll be publishing an .sql.gz dump of the full import to date.)

Add a way to download a filing only if it hasn't already been downloaded

I think something like this has come up before -- whether Filing#download should re-download a filing if the file already exists.

While I still think it should assume the user wants to re-download the file when Filing#download is called, what if we added either:

  • A parameter to Filing#download that told it to download the filing only if the file doesn't already exist.
  • A new method on Filing (maybe Filing#download_if_needed) that would download the filing only if the file doesn't already exist.

I frequently use the pattern:

filing.download unless File.exist?(filing.file_path)

but making that check easier would be helpful in a lot of situations.
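The second option could look roughly like the sketch below. `download_if_needed` is a hypothetical name taken from the suggestion above, not an existing Fech method, and `FakeFiling` is a stand-in used only so the example runs without network access. It relies on nothing beyond the `file_path` and `download` methods already used in the pattern.

```ruby
require "tempfile"

# Hypothetical helper, not part of Fech's API. It relies only on the two
# methods used in the pattern above: #file_path and #download.
def download_if_needed(filing)
  filing.download unless File.exist?(filing.file_path)
  filing
end

# Stand-in object for illustration; a real Fech::Filing would be passed instead.
FakeFiling = Struct.new(:file_path, :downloads) do
  def download
    self.downloads += 1
  end
end

missing = FakeFiling.new("/nonexistent/dir/example.fec", 0)
download_if_needed(missing)
puts missing.downloads # 1

Tempfile.create("filing") do |file|
  present = FakeFiling.new(file.path, 0)
  download_if_needed(present)
  puts present.downloads # 0
end
```

Returning the filing keeps the call chainable, so it can be dropped in wherever `filing.download` is used today.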

F3L - SA3L line csv file?

Question: Is there a Schedule A line SA3L csv file missing from the sources directory? In the lobbyist bundling reports (F3L) a slightly different schedule A is used. It seems to parse with schedule A, but the field names are different; the names listed on these lines aren't donors, of course, but lobbyist bundlers. Not sure where "associated text record" would appear if it were populated...

http://query.nictusa.com/cgi-bin/dcdev/forms/C00518282/840532/sa/3L

It looks like FEC doesn't consider this to be a different line; there's no page for it in FEC_Format_v8.0.xlsx , for instance.

Missing "people" dependency in translator.rb?

It looks like the gem on rubygems is a bit out of sync. I just installed it and I'm hung up:

ruby-1.9.2-p290 :006 > require 'fech'
LoadError: no such file to load -- people
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/fech-0.1.3/lib/fech/translator.rb:1:in `<top (required)>'
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/fech-0.1.3/lib/fech.rb:5:in `<top (required)>'
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from (irb):6
        from /usr/local/rvm/rubies/ruby-1.9.2-p290/bin/irb:16:in `<main>'
ruby-1.9.2-p290 :007 >

Non-standard field names for F3XN, F3N

Is there a reason for the difference in key names between responses for different F3 form types? For example, F3N filings get a col_a_individual_contributions_itemized field, while F3XN filings get a col_a_individuals_itemized field. I know the forms are different, but the fields seem similar enough to be standardized.

F3N | F3XN
col_a_individual_contributions_unitemized | col_a_individuals_unitemized
col_a_total_individual_contributions | col_a_individual_contribution_total
col_a_political_party_contributions | col_a_political_party_committees
col_a_pac_contributions | col_a_other_political_committees_pacs
col_a_transfers_from_authorized | col_a_transfers_from_aff_other_party_cmttees
col_a_offset_to_expenditures | col_a_offsets_to_expenditures
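Until the mappings themselves are standardized, a caller could paper over the difference with an alias table. This is only a sketch: `F3XN_TO_F3N` and `normalize_keys` are hypothetical names, not part of Fech, and the pairs are copied from the field names listed above.

```ruby
# Hypothetical alias table, not part of Fech: maps the F3XN keys listed
# above to their F3N counterparts.
F3XN_TO_F3N = {
  col_a_individuals_itemized: :col_a_individual_contributions_itemized,
  col_a_individuals_unitemized: :col_a_individual_contributions_unitemized,
  col_a_individual_contribution_total: :col_a_total_individual_contributions,
  col_a_political_party_committees: :col_a_political_party_contributions,
  col_a_other_political_committees_pacs: :col_a_pac_contributions,
  col_a_transfers_from_aff_other_party_cmttees: :col_a_transfers_from_authorized,
  col_a_offsets_to_expenditures: :col_a_offset_to_expenditures
}.freeze

# Return a copy of +summary+ with aliased keys renamed; other keys are kept.
def normalize_keys(summary, aliases = F3XN_TO_F3N)
  summary.each_with_object({}) do |(key, value), out|
    out[aliases.fetch(key, key)] = value
  end
end

p normalize_keys({ col_a_individuals_itemized: "100.00", form_type: "F3XN" })
```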

Form 24 summary fails

The error message says the map hasn't been generated. Only seems to be an issue for versions 6.4 and 7.0 filings.

How can I retrieve all FEC filing unique numeric identifiers?

I am trying to use 'fech' to automate downloading all FEC filings from fec.gov. So far I have found in the documentation that I can retrieve a specific filing by its unique numeric ID:
filing = Fech::Filing.new(723604)
filing.download

So how can I retrieve every ID?

F3Z / F3ZT lines

I noticed there's no /sources/ file for F3Z lines. These are pretty odd lines only used by the principal campaign committee; the docs say "3Z: Consolidates the financial activity of other committees authorized by the candidate for the same campaign." In other words, this information should be available in other filings. So...

Some recent-ish examples of filings where this line is present are (filing number first):
853302 | BEAVEN FOR CONGRESS | F3 | 2013-01-31
852981 | COMMITTEE TO ELECT JOHN R COX | F3 | 2013-01-31
852531 | FROST FOR CONGRESS | F3 | 2013-01-31
851742 | DJOU FOR HAWAII | F3 | 2013-01-31
851644 | JOHNSON FOR CONGRESS | F3 | 2013-01-31

Summary mapping missing fields (0.5.0)

Fields on Form 3P missing:
-Col A Political Party Committees (Refunds of Contributions)
-Col B Political Party Committees (Refunds of Contributions)

They're not that important -- Obama's 2008 post-general only lists $300 in them. We'll be forking in the next couple of weeks and will add them.

Schedule I mappings missing

SchI is listed in the sources/headers CSV files, but there's no sources/SchI.csv, and it's not in map_generator.rb, fech_utils.rb, or rendered_maps.rb. There's no documentation to indicate this is intentional. What's the issue?

Integrating FEC.gov search form

Not so much an issue as a topic of discussion - should Fech be able to pull a filing from the FEC's FTP server when possible (or by option)? Would this even work?

Allow FECh to preprocess CSV files before parsing

If a line in a filing contains an illegal quote character, FECh fails.

For example, filing 756218, row 47606, has the following string in the last_name field:

O""Leary

Because double quotes are, by default, the quote character used by FasterCSV and CSV, the parser fails when it gets to this line.

One way around this would be to preprocess the raw CSV file before parsing to remove/replace such characters.
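A minimal sketch of such a preprocessing step follows. `strip_stray_quotes` is a hypothetical helper, not part of Fech. It assumes a naive split on the separator is safe, meaning quoted fields do not themselves contain the separator, and it simply deletes stray quotes rather than guessing what the filer meant.

```ruby
# Hypothetical helper, not part of Fech: remove double quotes from any field
# that is not fully wrapped in quotes, leaving properly quoted fields alone.
# Assumption: quoted fields never contain the separator (naive split).
def strip_stray_quotes(line, sep = ",")
  line.chomp.split(sep, -1).map { |field|
    wrapped = field.length >= 2 && field.start_with?('"') && field.end_with?('"')
    wrapped ? field : field.delete('"')
  }.join(sep)
end

# Made-up row built around the last_name value quoted above.
puts strip_stray_quotes('SA11AI,EXAMPLE-ID,O""Leary,Pat')
# SA11AI,EXAMPLE-ID,OLeary,Pat
```

Replacing the stray quotes with another character instead of deleting them would be an equally small change inside the block.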

Install fech-source over https

Fech currently installs the fech-source submodule (specified in .gitmodules) using the git/ssh strategy. This causes a source-based install of Fech -- via bundler in our case -- to fail for a user that doesn't have an ssh public key on Github.

Here's an example gist demonstrating the issue.

This likely won't be a problem for local development by most contributors, who presumably have a Github account with their ssh key.

But it proved problematic for us in a production context, where we're installing Fech from source while we await release of a new Gem version that contains the FEC v8.2 fix.

Updating .gitmodules to use https for fech-source resolves the issue.

PR to follow.

How can I download a specific FEC filing as CSV for a specific form type?

I am downloading every filing using this code:
filing = Fech::Filing.new(1029398)
filing.download
Now I have my filing at /tmp/1029398.fec
http://docquery.fec.gov/cgi-bin/forms/C00580100/1029398/ is an F3P type form,
but the *.fec file format is completely unfamiliar to me.
I opened it in Excel and the content looks scary.

Can I download this file as an F3P.csv file in this format: https://github.com/dwillis/fech-sources/blob/master/F3P.csv
I have looked through the documentation at http://nytimes.github.io/Fech/
but couldn't find any similar options.
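Independently of Fech, the raw file can be converted with a few lines of plain Ruby. The sketch below is an illustration under stated assumptions: `rows_to_csv` is a hypothetical helper, and it assumes fields are separated by ASCII 28 (the file separator character), as in newer e-filing versions; older comma-delimited files would need `","` passed instead.

```ruby
require "csv"
require "stringio"

# Hypothetical helper, not part of Fech: copy the rows of one form type
# from a raw .fec stream into CSV text.
# Assumption: fields are separated by ASCII 28; pass "," for older files.
def rows_to_csv(io, form_type, sep = "\x1C")
  CSV.generate do |csv|
    io.each_line do |line|
      fields = line.chomp.split(sep, -1)
      csv << fields if fields.first.to_s.upcase.start_with?(form_type.upcase)
    end
  end
end

# Made-up sample data, for illustration only.
raw = StringIO.new("HDR\x1CFEC\x1C8.1\nF3PN\x1CEXAMPLE-ID\x1CExample Committee\n")
puts rows_to_csv(raw, "F3P")
# F3PN,EXAMPLE-ID,Example Committee
```

The output has no header row; column names would have to come from the matching mapping file, such as the F3P.csv linked above.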

a few erroneous slugs in sources/F3.csv

The F3 CSV source appears to have erroneous slugs in lines 73-75. This is fairly insidious--note that two of these keys ( col_a_refunds_to_party_committees and col_a_refunds_to_other_committees ) are duplicates (originally appearing on lines 67 and 68 of the same file).

73: col_a_refunds_to_party_committees,59,24. Total Receipts this Period,66,24. Total Receipts this Period,...
74: col_a_refunds_to_other_committees,60,25. Subtotals,67,25. Subtotals,58,25. SubTotal,58,25. SubTotal
75: col_a_total_receipts_period,61,26. Total Disbursements this Period,68,26. Total Disbursements this Period,59,26....

I'm doing a standardization of variable slugs (in part to address the transaction_id / transaction_id_number disagreement) and can include suggested fixes for this in a pull request in a few days if that helps…
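Duplicates like these can be caught mechanically. The following is a sketch of one such check, with `duplicate_slugs` as a hypothetical helper that is not part of Fech. It assumes the slug is everything before the first comma on each line, and the sample lines are abbreviated stand-ins rather than real file contents.

```ruby
# Hypothetical check, not part of Fech: report slugs (first column) that
# appear on more than one line of a mapping file, with their line numbers.
# Assumption: the slug is everything before the first comma on each line.
def duplicate_slugs(lines)
  seen = Hash.new { |hash, slug| hash[slug] = [] }
  lines.each_with_index do |line, index|
    slug = line.split(",", 2).first.to_s.strip
    seen[slug] << index + 1 unless slug.empty?
  end
  seen.select { |_slug, line_numbers| line_numbers.size > 1 }
end

# Abbreviated, made-up mapping lines for illustration.
mapping = [
  "col_a_refunds_to_party_committees,1",
  "col_a_refunds_to_other_committees,2",
  "col_a_refunds_to_party_committees,3"
]
p duplicate_slugs(mapping)
```

Run against `File.readlines` output for each file in the sources directory, a check like this could sit in the spec suite and fail whenever a slug is repeated.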
