Giter Site home page Giter Site logo

hearthsim / docker-pgredshift Goto Github PK

View Code? Open in Web Editor NEW
61.0 11.0 10.0 21 KB

Redshift docker image based on postgres

Home Page: https://hub.docker.com/repository/docker/hearthsim/pgredshift

License: MIT License

Makefile 1.71% C 7.78% PLpgSQL 18.67% Dockerfile 71.84%
redshift postgres docker localstack

docker-pgredshift's Introduction

docker-pgredshift

A docker image based on Debian with PostgreSQL which simulates an AWS Redshift instance.

Why?

Amazon Redshift is close enough to, and compatible enough with Postgres that you can use a lot of Postgres tooling and queries with it transparently. But some of its features, or slight differences with Postgres, may be harder to work around.

Amazon does not make a local instance of Redshift available, nor is the project open source. This is especially annoying if you are writing tests against code which has to run queries with Redshift-specific syntax in them. Postgres will normally reject them unless you mock the features in some way.

That's what this project is. It's not meant to run in production, but it is meant to help mock Redshift's features for testing purposes.

PLEASE NOTE: As of July 2018, very little is implemented. PRs welcome.

Key differences

The ultimate goal of pgredshift is to be as close as possible to the real Redshift in terms of feature parity. However, some key differences will remain:

  • Redshift is based on Postgres 8.0.2, whereas pgredshift is based on Postgres 10 or newer.
  • pgredshift will enforce various forms of data integrity (such as Foreign Key constraints) which Redshift does not enforce.
  • Some core changes Redshift made to Postgres may not be replicatable in Postgres.
  • Do not expect performance and underlying data storage efficiency to ever be replicated.

Features

The pgredshift image is build on top of Debian "Buster".

plpythonu

The image is built with plpythonu (Python 2.7) language support. More information: https://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html

The following packages are installed:

  • NumPy 1.8.2
  • Pandas 0.18.1 (up from 0.14.1 on Redshift)
  • python-dateutil 2.2
  • PyTZ 2015.7
  • SciPy 0.12.1
  • six 1.3.0
  • wsgiref 0.1.2
  • enum34 1.1.6 (Not available on Redshift)
  • psycopg2 2.7.5 (Not available on Redshift)

The image also includes pip, setuptools and wheel for Python 2.7.

plpython3u

The image is built with plpython3u (Python 3.6) language support. Although Redshift does not support Python 3, you may use this to help ensure compatibility of UDFs across Python 2 and 3.

The image includes pip, setuptools and wheel for Python 3.6.

Postgres extensions

SET query_group

The query_group extension adds support for the SET query_group to ... command. Postgres does not allow setting unknown variables, so including that extension prevents an error when issuing the command. Note that the value is ignored as query groups themselves are not implemented.

Reference: https://docs.aws.amazon.com/redshift/latest/dg/r_query_group.html

Additional tables

Redshift system tables are implemented in 00_stl_tables.sql and 00_stv_tables.sql. Expect them to be empty, or include garbage data, but SELECTs won't necessarily fail.

Additional functions

Additional functions are implemented as Python or SQL UDFs. For a list, see sql/01_functions.sql.

License

This project is dual-licensed under the MIT license and the PostgreSQL license. You may choose whichever license suits your purpose best. The full license texts are available in the LICENSE (MIT) and LICENSE.PostgreSQL (PostgreSQL License) files.

docker-pgredshift's People

Contributors

beheh avatar jleclanche avatar joolean avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

docker-pgredshift's Issues

Testing framework

It would be great to have a framework that we could test created functions against expected results.

For example, testing

SELECT getdate();

would ensure that the added function would return a TIMESTAMP and its value is what we expected. These tests would be more complex when testing lots of inputs (see this PR).

I don't know if there is a framework to do this easily but I suspect it will require an additional dependency and the value of ensuring the functions work across postgres versions would be worth it.

Failed scripts when using the container but were OK when run in AWS RS

CREATE TABLE public.__yuniql_schema_version(
    sequence_id  INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    version VARCHAR(512) NOT NULL,
    applied_on_utc TIMESTAMP NOT NULL DEFAULT(GETDATE()),
    applied_by_user VARCHAR(128) NOT NULL DEFAULT(CURRENT_USER),
    applied_by_tool VARCHAR(32) NOT NULL,
    applied_by_tool_version VARCHAR(16) NOT NULL,
    status VARCHAR(32) NOT NULL,
    duration_ms INTEGER NOT NULL,
    checksum VARCHAR(64) NOT NULL,
    failed_script_path VARCHAR(4000) NULL,
    failed_script_error VARCHAR(4000) NULL,
    additional_artifacts VARCHAR(4000) NULL,
    CONSTRAINT ix___yuniql_schema_version UNIQUE(version)
);
ERR   2021-12-30 21:32:40Z   Failed to execute run function. 42601: syntax error at or near "IDENTITY"
Diagnostics stack trace captured a Npgsql.PostgresException (0x80004005): 

It expects a 'dev' database in Redshift which is "postgres" in pgsql

SELECT 1 from pg_database WHERE datname = 'yuniqldb';

ERR   2021-12-30 21:35:45Z   Failed to execute run function. 3D000: database "dev" does not exist
Diagnostics stack trace captured a Npgsql.PostgresException (0x80004005): 

json_extract_array_element_text for strings

Compare:
select json_extract_array_element_text('["Sandwich", "Omelette", "Tikka Masala"]', 0)

In Redshift, that gets you an unquoted Sandwich, in docker-pgredshift that gets you a double-quoted "Sandwich".

Adding .strip('"') to the json.dumps seems to solve it:

CREATE FUNCTION json_extract_array_element_text(json_array text, array_index int) RETURNS text immutable as $$
    import json
    result = json.loads(json_array)[array_index]
    return json.dumps(result).strip('"')
    $$ LANGUAGE plpythonu;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.