Giter Site home page Giter Site logo

evancarroll / db-texas-ethics-commission Goto Github PK

View Code? Open in Web Editor NEW
9.0 5.0 4.0 1.95 MB

A schema loader for the Texas Ethics Commission https://www.ethics.state.tx.us/

License: GNU Affero General Public License v3.0

Perl 71.15% PLpgSQL 20.21% Makefile 8.64%
government-data campaign-contributions postgresql data-import open-data texas

db-texas-ethics-commission's Introduction

Texas Ethics Commission - Schema Loader

This is a PostgreSQL schema loader for the data provided by Texas Ethics Commission.

We provide a utility to

  • extract schema from PDSERF/Plus format export by reading the ReadMe files and parse them to determine the schema, types, and keys and constraints.
  • create the tables needed to load up the 1295 certs -- these are hand written from the pdf documentation by provided by TEC
  • load the data up from csv format into the tables we create

Internally, lines from PDSERF readme are either,

  • Table Description rows
  • Column Description-cotd rows
  • Column rows
  • Start-rows for table (Start with "Record #:")
  • End-rows for table (Containing just a -)

Column lines are either

  • Indented as part of a group (array) replicated a certain amount of times
  • Derived from a "single line"

All data is loaded up into PostgreSQL, including the Descriptions which we pull down as COMMENTS.

You can find the readmes from the Texas Ethics Commission added in this project here:

Coverage

This module loads has full coverage of the meta-data, and data of the TEC.

  • Lobby Reports (tables l_*)
  • Campaign Finance Reports (tables c_*)
  • 1295 Certs
tec.c_assetdata                tec.c_finaldata                tec.form1295_interested_party
tec.c_candidatedata            tec.c_loandata                 tec.l_awardmementodata
tec.c_contributiondata         tec.codes_counties             tec.l_coversheetladata
tec.c_coversheet1data          tec.codes_forms                tec.l_docketdata
tec.c_coversheet2data          tec.codes_office               tec.l_entertainmentdata
tec.c_coversheet3data          tec.codes_reports              tec.l_eventdata
tec.c_creditdata               tec.codes_total                tec.l_foodbeveragedata
tec.c_debtdata                 tec.c_pledgedata               tec.l_giftdata
tec.c_expendcategory           tec.c_spacdata                 tec.l_individualreportingdata
tec.c_expenddata               tec.c_traveldata               tec.l_subjectmatterdata
tec.c_filerdata                tec.form1295_box123            tec.l_transportationdata

Links

Installation

Requirements: PostgreSQL, git, curl

Repo download and database setup (example in bash):

$ git clone https://github.com/EvanCarroll/db-Texas-Ethics-Commission.git
$ cd ./db-Texas-Ethics-Commission
$ make
$ createdb mydb
$ psql -d mydb -f ./runme.sql
$ make clean

Background

Created at Houston Hackathon 2018 as the sole work of Evan Carroll.

License

If you use this, open source all (100%) of your stuff, or I'll litigate. The GPL is not the AGPL. Please read, and be advised:

GNU Affero General Public License v3, see included LICENSE.md

Contact

Contact Evan Carroll 281.901.0011 for a quote on development.

db-texas-ethics-commission's People

Contributors

evancarroll avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

db-texas-ethics-commission's Issues

Updated URLs in Makefile needed

Looks like the TEC has moved the location the files on their site. Please update Makefile with the updated URLS:

download:
curl --progress-bar
-o "$(DIR_TEC_DOCS)/TX_ERF13_7.pdf" "https://www.ethics.state.tx.us/data/search/cf/TX_ERF13_7.pdf"
-o "$(DIR_TEC_DOCS)/HowToImportContributionsAndExpenditures.pdf" "https://www.ethics.state.tx.us/data/filinginfo/HowToImportContributionsAndExpenditures.pdf"
-o "$(DIR_TEC_DOCS)/CampaignFinanceCSVFileFormat.pdf" "https://www.ethics.state.tx.us/data/search/cf/CampaignFinanceCSVFileFormat.pdf"
-o "$(DIR_TEC_DOCS)/1295CertificatesCSVFormat.pdf" "https://www.ethics.state.tx.us/data/search/1295/1295CertificatesCSVFormat.pdf"
-o "$(DIR_DATA)/1295Certificates.csv" "https://www.ethics.state.tx.us/data/search/1295/1295Certificates.csv"
-o "$(DIR_DATA)/TEC_LA_CSV.zip" "https://www.ethics.state.tx.us/data/search/lobby/TEC_LA_CSV.zip"
-o "$(DIR_DATA)/TEC_CF_CSV.zip" "https://www.ethics.state.tx.us/data/search/cf/TEC_CF_CSV.zip";

Also, consider updating the instructions to run: make devgen to rebuild the sql scripts so that any newly added files by TEC are included in the load into the DB.

Appreciate the tool!

Column definition for BigDecimal type update needed

This problem is seen by trying to import a record in the cover.csv file where a filed report contained an erroneous entry of 2104954824.00 in the unitemizedPledgeAmount. The filer later realized the error and filed a corrected report.

The Readme.txt file defines this field as follows:
21 unitemizedPledgeAmount BigDecimal 0000000000.00 12 Total unitemized pledges

The problem is in the Column.pm file where the sub pg_type defines a BigDecimal as numeric(10,2).

We should be defining BigDecimal as numeric(12,2).

Reasoning: https://www.postgresql.org/docs/9.1/datatype-numeric.html
We use the following terms below: The scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. The precision of a numeric is the total count of significant digits in the whole number, that is, the number of digits to both sides of the decimal point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.

Both the maximum precision and the maximum scale of a numeric column can be configured. To declare a column of type numeric use the syntax:

NUMERIC(precision, scale)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.