PostgresToRedshift

This gem copies data from PostgreSQL to Amazon Redshift. It's especially useful for copying data from a Postgres database to Redshift on Heroku.

Installation

Add this line to your application's Gemfile:

gem 'postgres_to_redshift'

And then execute:

$ bundle

Or install it yourself as:

$ gem install postgres_to_redshift

Usage

Set your source and target databases, as well as your S3 intermediary.

export POSTGRES_TO_REDSHIFT_SOURCE_URI='postgres://username:password@host:port/database-name'
export POSTGRES_TO_REDSHIFT_TARGET_URI='postgres://username:password@host:port/database-name'
export POSTGRES_TO_REDSHIFT_TARGET_SCHEMA='testing-data'
export S3_DATABASE_EXPORT_ID='yourid'
export S3_DATABASE_EXPORT_KEY='yourkey'
export S3_DATABASE_EXPORT_BUCKET='some-bucket-to-use'

postgres_to_redshift

Contributing

  1. Fork it ( https://github.com/kitchensurfing/postgres_to_redshift/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors

anark, ashawley, dkaufman, ericgross, jhuangtw, toothrot

Issues

failed to allocate memory (NoMemoryError)

I have a very large table in Postgres, about 33 GB. After an hour of copying the data down, the process runs out of memory. This seems to be because a StringIO buffer is used with Zlib::GzipWriter:

$ postgres_to_redshift
Downloading scores
~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:92:in `write': failed to allocate memory (NoMemoryError)
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:92:in `write'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:92:in `block in copy_table'
        from /usr/local/share/ruby/gems/2.2/gems/pg-0.17.1/lib/pg/connection.rb:138:in `copy_data'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:90:in `copy_table'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:23:in `block in update_tables'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:20:in `each'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:20:in `update_tables'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/bin/postgres_to_redshift:5:in `<top (required)>'
        from ~/bin/postgres_to_redshift:23:in `load'
        from ~/bin/postgres_to_redshift:23:in `<main>'
zlib(finalizer): Zlib::GzipWriter object must be closed explicitly.
zlib(finalizer): the stream was freed prematurely.
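One way to avoid buffering a whole table in memory is to stage the compressed COPY output on disk rather than in a StringIO. A minimal sketch under that assumption (`compress_rows` is a hypothetical helper, not the gem's actual code; chunking and the S3 upload are left out):

```ruby
require 'tempfile'
require 'zlib'

# Hypothetical replacement for the in-memory StringIO buffer in copy_table:
# stream each COPY row through GzipWriter into a Tempfile on disk, so memory
# use stays bounded regardless of table size.
def compress_rows(rows)
  tmp = Tempfile.new('postgres_to_redshift')
  tmp.binmode
  gz = Zlib::GzipWriter.new(tmp)
  rows.each { |row| gz.write(row) }
  gz.finish # finalize the gzip stream without closing the Tempfile
  tmp.rewind
  tmp # caller uploads the Tempfile to S3, then unlinks it
end
```

`Zlib::GzipFile#finish` is used instead of `close` so the Tempfile handle stays open for the subsequent upload.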

Invalid connection! (PG::Error)

I get Invalid connection! (PG::Error) from the Postgres source database with a large table of about 33 GB. After about 40 minutes, it fails with:

$ postgres_to_redshift
INFO:  Relation "submissions" already exists and will be skipped
Downloading submissions
Uploading submissions.1
Uploading submissions.2
Uploading submissions.3
Uploading submissions.4
Uploading submissions.5
Uploading submissions.6
Uploading submissions.7
Uploading submissions.8
Uploading submissions.9
/usr/local/share/ruby/gems/2.2/gems/pg-0.17.1/lib/pg/connection.rb:140:in `cancel': Invalid connection! (PG::Error)
        from /usr/local/share/ruby/gems/2.2/gems/pg-0.17.1/lib/pg/connection.rb:140:in `rescue in copy_data'
        from /usr/local/share/ruby/gems/2.2/gems/pg-0.17.1/lib/pg/connection.rb:137:in `copy_data'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:97:in `copy_table'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:27:in `block in update_tables'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:24:in `each'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:24:in `update_tables'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/bin/postgres_to_redshift:5:in `<top (required)>'
        from ~/bin/postgres_to_redshift:23:in `load'
        from ~/bin/postgres_to_redshift:23:in `<main>'

This happens with a 16 GB table as well.

NOTE: I am using the code from #15 to split files up and the code from #14 to avoid running out of memory.

Is the database connection getting lost? I've modified the following settings on the Postgres 9.3 RDS instance at AWS:

tcp_keepalives_count      10
tcp_keepalives_idle     1200
tcp_keepalives_interval 1200

That should allow 20 minutes of idle time and up to 10 probes, but the error keeps happening.
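The RDS settings above only govern keepalive probes sent by the server; the client can request its own TCP keepalives through libpq connection parameters. A sketch, assuming the gem passes the URI straight through to libpq (the parameter values are illustrative):

```shell
# Hypothetical: ask libpq for client-side TCP keepalives on the source
# connection, so long COPY operations are probed from the client side as well.
export POSTGRES_TO_REDSHIFT_SOURCE_URI='postgres://username:password@host:port/database-name?keepalives=1&keepalives_idle=60&keepalives_interval=60&keepalives_count=5'
```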

Configurable target schema

Hi, since we are importing several apps into the same Redshift DB, we want to import them into separate schemas.

If you're interested, I've worked on a quick implementation: carwow@688040b

Getting a zlib error when running update_tables

Zlib::GzipFile::Error: closed gzip stream

Here is the stack trace:

/home/deploy/apps/vydia/shared/bundle/ruby/2.1.0/bundler/gems/postgres_to_redshift-1cf00b46e712/lib/postgres_to_redshift.rb:121:in `close'
/home/deploy/apps/vydia/shared/bundle/ruby/2.1.0/bundler/gems/postgres_to_redshift-1cf00b46e712/lib/postgres_to_redshift.rb:121:in `ensure in copy_table'
/home/deploy/apps/vydia/shared/bundle/ruby/2.1.0/bundler/gems/postgres_to_redshift-1cf00b46e712/lib/postgres_to_redshift.rb:122:in `copy_table'
/home/deploy/apps/vydia/shared/bundle/ruby/2.1.0/bundler/gems/postgres_to_redshift-1cf00b46e712/lib/postgres_to_redshift.rb:27:in `block in update_tables'
/home/deploy/apps/vydia/shared/bundle/ruby/2.1.0/bundler/gems/postgres_to_redshift-1cf00b46e712/lib/postgres_to_redshift.rb:24:in `each'
/home/deploy/apps/vydia/shared/bundle/ruby/2.1.0/bundler/gems/postgres_to_redshift-1cf00b46e712/lib/postgres_to_redshift.rb:24:in `update_tables'

This is on Ruby 2.1.9, using the master branch code directly; PostgreSQL is 9.4.
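The `closed gzip stream` error from the ensure block looks like a double close: the stream is closed on the normal path and then closed again during cleanup. A minimal sketch of a guard for that pattern (not the gem's actual code):

```ruby
require 'stringio'
require 'zlib'

buffer = StringIO.new
zip = Zlib::GzipWriter.new(buffer)

begin
  zip.write("some row data\n")
  zip.close # normal path: the stream is closed here
ensure
  # Guard against the double close that raises Zlib::GzipFile::Error
  zip.close unless zip.closed?
end
```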

ERROR: cannot drop table TABLE_NAME_updating because other objects depend on it

Version 0.1.2

We have been using the PostgresToRedshift.update_tables method to copy tables from PG to Redshift every day since August 2017 and never had a problem. Suddenly we got this error:

ERROR: cannot drop table TABLE_NAME_updating because other objects depend on it
HINT: Use DROP ... CASCADE to drop the dependent objects too.

We are a bit lost, since no code changes were made for a long time, and we can't figure out what is causing this error.

Has anyone encountered this?

ERROR: syntax error at or near "permissions" (PG::SyntaxError) LINE 1: CREATE TABLE IF NOT EXISTS public.permissions ("id" integer,...

I have a table in my schema named permissions that causes the command to choke on a SQL syntax error:

~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:21:in `exec': ERROR:  syntax error at or near "permissions" (PG::SyntaxError)
LINE 1: CREATE TABLE IF NOT EXISTS public.permissions ("id" integer,...
                                          ^
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:21:in `block in update_tables'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:20:in `each'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/lib/postgres_to_redshift.rb:20:in `update_tables'
        from ~/.gem/ruby/2.2/gems/postgres_to_redshift-0.1.1/bin/postgres_to_redshift:5:in `<top (required)>'
        from ~/bin/postgres_to_redshift:23:in `load'
        from ~/bin/postgres_to_redshift:23:in `<main>'
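A likely fix is to quote identifiers when building the DDL, so that a table name like permissions is never parsed as a keyword. The pg gem exposes PG::Connection.quote_ident for this; below is a dependency-free sketch of the same quoting rule (`quote_ident` here is a hypothetical local helper):

```ruby
# Hypothetical helper following the PostgreSQL quoting rule: wrap the name in
# double quotes and double any embedded quotes, so a name like "permissions"
# is parsed as a plain identifier rather than a keyword.
def quote_ident(name)
  '"' + name.gsub('"', '""') + '"'
end

sql = %(CREATE TABLE IF NOT EXISTS public.#{quote_ident('permissions')} ("id" integer))
```

The same treatment would apply to the table name that the gem interpolates; the column names in the generated DDL (e.g. "id") are evidently already quoted.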

Handle money fields as decimals

Right now, money fields are imported as character varying(max). These should be cast to decimal on export from Postgres, then imported as decimal into Redshift.

Since the current export is a COPY * FROM table_name, this is nontrivial.
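One way to make this tractable is to replace the bare COPY of the table with a COPY of an explicit SELECT that casts the money columns. A sketch under that assumption (`copy_sql`, the column lists, and the numeric(19, 2) precision are all illustrative, not the gem's API):

```ruby
# Hypothetical helper: build a COPY statement with an explicit column list,
# casting money columns to numeric so Redshift can load them as DECIMAL.
def copy_sql(table, columns, money_columns)
  select_list = columns.map do |col|
    money_columns.include?(col) ? "#{col}::numeric(19, 2) AS #{col}" : col
  end.join(', ')
  "COPY (SELECT #{select_list} FROM #{table}) TO STDOUT WITH CSV"
end
```

With the columns enumerated from information_schema.columns, money columns could be detected by their data_type.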
