Giter Site home page Giter Site logo

wrds2pg's Introduction

WRDS (or SAS) to PostgreSQL

This software has two functions:

  • Download tables from WRDS and feed them to a PostgreSQL database. (Requires access to WRDS and to the data in question.)
  • Import a SAS file (*.sas7dbat) into a PostgreSQL database.

Requirements

1. Python

The software uses Python 3 and depends on Pandas, SQLAlchemy and Paramiko. In addition, the Python scripts generally interact with PostgreSQL using SQLAlchemy and the Psycopg library. These dependencies are installed when you use Pip (see instructions below).

2. A WRDS ID

To use public-key authentication to access WRDS, follow hints taken from here to set up a public key. Copy that key to the WRDS server from the terminal on your computer. (Note that this code assumes you have a directory .ssh in your home directory. If not, log into WRDS via SSH, then type mkdir ~/.ssh to create this.) Here's code (for me) to create the key and send it to WRDS:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh [email protected] "cat >> ~/.ssh/authorized_keys"

Use an empty passphrase in setting up the key so that the scripts can run without user intervention.

3. PostgreSQL

You should have a PostgreSQL database to store the data.

4. Environment variables

Environment variables that the code can use include:

  • PGDATABASE: The name of the PostgreSQL database you use.
  • PGUSER: Your username on the PostgreSQL database.
  • PGHOST: Where the PostgreSQL database is to be found (this will be localhost if its on the same machine as you're running the code on)
  • WRDS_ID: Your WRDS ID.

I set these environment variables in ~/.profile:

export PGHOST="localhost"
export PGDATABASE="crsp"
export WRDS_ID="iangow"
export PGUSER="igow"

Using the function wrds_update.

Two arguments table_name and schema are required.

1. WRDS Settings

Set WRDS_ID using either wrds_id=your_wrds_id in the function call or the environment variable WRDS_ID.

2. Environment variables

The software will use the environment variables PGHOST, PGDATABASE, and PGUSER if you If you have set them. Otherwise, you need to provide values as arguments to wrds_udpate(). Default PGPORT is5432.

3. Table settings

To tailor your request, specify the following arguments:

  • fix_missing: set to True to fix missing values. This addresses special missign values, which SAS's PROC EXPORT dumps as strings. Default value is False.
  • fix_cr: set to True to fix characters. Default value is False.
  • drop: add column names to be dropped (e.g., drop="id name" will drop columns id and name).
  • obs: add maxium number of observations (e.g., obs=10 will import the first 10 rows from the table on WRDS).
  • rename: rename columns (e.g., rename="fee=mngt_fee" renames fee to mngt_fee).
  • force: set to True to force update. Default value is False.

Importing SAS data into PostgreSQL

The software can also upload SAS file directly to PostgreSQL. You need to have local SAS in order to use this function. Use fpath to specify the path to the file to be imported

Examples

Here are some examples.

If you are at the home directory of this git repo, you can import and use the software as shown below.

To install it from Github:

sudo -H pip3 install git+https://github.com/iangow/wrds2pg --upgrade

This software is also available from PyPI. To install it from PyPI:

pip3 install wrds2pg

Example usage:

from wrds2pg import wrds_update

# 1. Download crsp.mcti from wrds and upload to pg as crps.mcti
# Simplest version
wrds_update(table_name="mcti", schema="crsp")

# Tailored arguments 
wrds_update(table_name="mcti", schema="crsp", host=your_pghost, 
	dbname=your_pg_database, 
	fix_missing=True, fix_cr=True, drop="b30ret b30ind", obs=10, 
	rename="caldt=calendar_date", force=True)

# 2. Upload test.sas7dbat to pg as crsp.mcti
wrds_update(table_name="mcti", schema="crsp", fpath="your_path/test.sas7dbat")

Report bugs

Author: Ian Gow, [email protected] Contributor: Jingyu Zhang, [email protected]

wrds2pg's People

Contributors

iangow avatar fmcetin avatar evanjo avatar jingyujzhang avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.