ParquetS3 Foreign Data Wrapper for PostgreSQL


parquet_s3_fdw's Introduction

# PGSpider
PGSpider is a high-performance SQL cluster engine for distributed big data.  
PGSpider can access a number of data sources using Foreign Data Wrappers (FDWs) and retrieve the distributed data vertically.  
Usage of PGSpider is the same as PostgreSQL, except that its program name is `pgspider` and its default port number is `4813`. You can use any client application, such as libpq or psql.

## Features
* Multi-Tenant  
    Users can easily fetch records from multiple tables with a single SQL query.  
    If each data source has tables with a similar schema, PGSpider can present them as a single virtual table, which we call a Multi-Tenant table.  

* Modification  
    Users can modify data in a Multi-Tenant table by using INSERT/UPDATE/DELETE queries.  
    For INSERT, PGSpider uses a round-robin method: it chooses one alive node that supports INSERT, the next in rotation after the previous target, and inserts the data there.  
    For UPDATE/DELETE, PGSpider executes the UPDATE/DELETE on all alive nodes that support it.  
    PGSpider supports both direct and foreign modification.  
    PGSpider supports bulk INSERT by using the batch_size option (see the sketch after this feature list).  
    - If the user specifies the batch_size option, it is read from the foreign table or foreign server option.
    - If batch_size is 1, child nodes are told to execute simple inserts; otherwise batch inserts are executed where the child node supports them.
    - If batch_size is not specified on the Multi-Tenant table, it is calculated automatically from the batch sizes of the child tables and the number of child nodes using the LCM (Least Common Multiple) method.  
      If the resulting batch size is too large, the capped value (6553500) is used.  
      PGSpider distributes records to the data sources evenly, not only within one query but also across many queries.

* Parallel processing  
    PGSpider executes queries and fetches results from child nodes in parallel.  
    PGSpider expands the Multi-Tenant table into its child tables and creates a new thread for each child table to access the corresponding data source.

* Pushdown   
    WHERE clauses and aggregate functions are pushed down to child nodes.  
    Pushing AVG, STDDEV and VARIANCE down to a Multi-Tenant table would normally cause an error;  
    PGSpider improves on this so that these aggregates can still be executed.
  
* Data Compression Transfer
    PGSpider supports transferring data to other data sources via a Cloud Function.  
    Data is compressed, transmitted to the Cloud Function, and then transferred to the data source.  
    This feature helps PGSpider control and reduce the amount of data transferred between PGSpider and the destination data source, which in turn reduces cloud service usage fees.
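
Referring back to the Modification feature above, here is a minimal, hedged sketch of the batch_size option. The server and table names (`parent`, `t1`) are the ones created in the Usage section below, and whether the option is attached to the foreign server or the foreign table follows the description above. As a rough illustration of the LCM method: child batch sizes of 3 and 4 have a least common multiple of 12, which then feeds into the automatic calculation together with the number of child nodes.

```sql
-- Hedged sketch: set an explicit bulk-INSERT batch size.
-- batch_size can be read from a foreign server or a foreign table option.
ALTER SERVER parent OPTIONS (ADD batch_size '100');
ALTER FOREIGN TABLE t1 OPTIONS (ADD batch_size '100');
```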

## How to build PGSpider

Clone PGSpider source code.
<pre>
git clone https://github.com/pgspider/pgspider.git
</pre>

Build and install PGSpider and extensions.
<pre>
cd pgspider
./configure
make
sudo make install
cd contrib/pgspider_core_fdw
make
sudo make install
cd ../pgspider_fdw
make
sudo make install
</pre>

The default install directory is /usr/local/pgspider.

## Usage
For example, we will create two different child nodes, SQLite and PostgreSQL, which PGSpider accesses as the root node.
Please install SQLite and PostgreSQL for the child nodes.

After that, we install the PostgreSQL FDW and SQLite FDW into PGSpider.

Install SQLite FDW 
<pre>
cd ../
git clone https://github.com/pgspider/sqlite_fdw.git
cd sqlite_fdw
make
sudo make install
</pre>
Install PostgreSQL FDW 
<pre>
cd ../postgres_fdw
make
sudo make install
</pre>

### Start PGSpider
The PGSpider binary names are the same as PostgreSQL's.  
Only the default install directory is different:
<pre>
/usr/local/pgspider
</pre>

Create database cluster and start server.
<pre>
cd /usr/local/pgspider/bin
./initdb -D ~/pgspider_db
./pg_ctl -D ~/pgspider_db start
./createdb pgspider
</pre>

Connect to PGSpider.
<pre>
./psql pgspider
</pre>

### Load extension
PGSpider (Parent node)
<pre>
CREATE EXTENSION pgspider_core_fdw;
</pre>

PostgreSQL, SQLite (Child node)
<pre>
CREATE EXTENSION postgres_fdw;
CREATE EXTENSION sqlite_fdw;
</pre>

### Create server
PGSpider (Parent node)
<pre>
CREATE SERVER parent FOREIGN DATA WRAPPER pgspider_core_fdw OPTIONS (host '127.0.0.1', port '4813');
</pre>

PostgreSQL, SQLite (Child node)  
In this example, the child PostgreSQL node is at localhost, port 5432.  
The SQLite node's database is /tmp/temp.db.
<pre>
CREATE SERVER postgres_svr FOREIGN DATA WRAPPER postgres_fdw OPTIONS(host '127.0.0.1', port '5432', dbname 'postgres');
CREATE SERVER sqlite_svr FOREIGN DATA WRAPPER sqlite_fdw OPTIONS(database '/tmp/temp.db');
</pre>

### Create user mapping
PGSpider (Parent node)

Create a user mapping for PGSpider. The user and password are those of the current psql user.
<pre>
CREATE USER MAPPING FOR CURRENT_USER SERVER parent OPTIONS(user 'user', password 'pass');
</pre>

PostgreSQL (Child node)
<pre>
CREATE USER MAPPING FOR CURRENT_USER SERVER postgres_svr OPTIONS(user 'user', password 'pass');
</pre>
SQLite (Child node)  
No need to create user mapping.

### Create Multi-Tenant table
PGSpider (Parent node)  
You need to declare a column named "__spd_url" on the parent table.  
This column holds the node location in PGSpider, so you can tell which node each row comes from.  
In this example, we define the 't1' table to get data from the PostgreSQL node and the SQLite node.
<pre>
CREATE FOREIGN TABLE t1(i int, t text, __spd_url text) SERVER parent;
</pre>

When expanding a Multi-Tenant table to its data source tables, PGSpider looks up child tables by the naming convention [Multi-Tenant table name]__[data source name]__0.  

PostgreSQL, SQLite (Child node)
<pre>
CREATE FOREIGN TABLE t1__postgres_svr__0(i int, t text) SERVER postgres_svr OPTIONS (table_name 't1');
CREATE FOREIGN TABLE t1__sqlite_svr__0(i int, t text) SERVER sqlite_svr OPTIONS (table 't1');
</pre>

### Access Multi-Tenant table
<pre>
SELECT * FROM t1;
  i |  t  | __spd_url 
----+-----+----------------
  1 | aaa | /sqlite_svr/
  2 | bbb | /sqlite_svr/
 10 | a   | /postgres_svr/
 11 | b   | /postgres_svr/
(4 rows)
</pre>

### Access Multi-Tenant table using node filter
You can choose which nodes to read from by adding an 'IN' clause after the FROM item (table name).

<pre>
SELECT * FROM t1 IN ('/postgres_svr/');
  i | t | __spd_url 
----+---+----------------
 10 | a | /postgres_svr/
 11 | b | /postgres_svr/
(2 rows)
</pre>

### Modify Multi-Tenant table
<pre>
SELECT * FROM t1;
  i |  t  | __spd_url 
----+-----+----------------
  1 | aaa | /sqlite_svr/
 11 | b   | /postgres_svr/
(2 rows)

INSERT INTO t1 VALUES (4, 'c');
INSERT 0 1

SELECT * FROM t1;
  i |  t  | __spd_url 
----+-----+----------------
  1 | aaa | /sqlite_svr/
  4 | c   | /sqlite_svr/
 11 | b   | /postgres_svr/
(3 rows)

UPDATE t1 SET i = 5;
UPDATE 3

SELECT * FROM t1;
 i |  t  | __spd_url 
---+-----+----------------
 5 | aaa | /sqlite_svr/
 5 | c   | /sqlite_svr/
 5 | b   | /postgres_svr/
(3 rows)

DELETE FROM t1;
DELETE 3

SELECT * FROM t1;
 i | t | __spd_url
---+---+-----------
(0 rows)
</pre>

### Modify Multi-Tenant table using node filter
You can choose which nodes to modify by adding an 'IN' clause after the table name.

<pre>
SELECT * FROM t1;
  i |  t  | __spd_url 
----+-----+----------------
  1 | aaa | /sqlite_svr/
 11 | b   | /postgres_svr/
(2 rows)

INSERT INTO t1 IN ('/postgres_svr/') VALUES (4, 'c');

SELECT * FROM t1;
  i |  t  | __spd_url 
----+-----+----------------
  1 | aaa | /sqlite_svr/
  4 | c   | /postgres_svr/
 11 | b   | /postgres_svr/
(3 rows)

UPDATE t1 IN ('/postgres_svr/') SET i = 5;
UPDATE 1

SELECT * FROM t1;
 i |  t  | __spd_url 
---+-----+----------------
 1 | aaa | /sqlite_svr/
 5 | c   | /postgres_svr/
 5 | b   | /postgres_svr/
(3 rows)

DELETE FROM t1 IN ('/sqlite_svr/');
DELETE 1

SELECT * FROM t1;
 i | t | __spd_url 
---+---+----------------
 5 | c | /postgres_svr/
 5 | b | /postgres_svr/
(2 rows)
</pre>

## Tree Structure
PGSpider can get data from a child PGSpider, which means PGSpider nodes can form a tree structure.  
For example, we will create a new PGSpider as the root node, connecting to the PGSpider of the previous example.  
The new root node is the parent of the previous PGSpider node.

### Start new root PGSpider
Create a new database cluster with initdb and change the port number (for example, to 54813, matching the server definition below).  
After that, start the server and connect to the new root node, as in the sketch below.
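
A minimal sketch, reusing the commands from the earlier setup; the data directory `~/pgspider_db2` and port `54813` are only examples (the port matches the server definition below).
<pre>
cd /usr/local/pgspider/bin
./initdb -D ~/pgspider_db2
./pg_ctl -D ~/pgspider_db2 -o "-p 54813" start
./createdb -p 54813 pgspider
./psql -p 54813 pgspider
</pre>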

### Load extension
PGSpider (new root node)  
If a child node is itself a PGSpider, the parent PGSpider uses pgspider_fdw to access it.

<pre>
CREATE EXTENSION pgspider_core_fdw;
CREATE EXTENSION pgspider_fdw;
</pre>

### Create server
PGSpider (new root node)
<pre>
CREATE SERVER new_root FOREIGN DATA WRAPPER pgspider_core_fdw OPTIONS (host '127.0.0.1', port '54813');
</pre>

PGSpider (Parent node)
<pre>
CREATE SERVER parent FOREIGN DATA WRAPPER pgspider_fdw OPTIONS (host '127.0.0.1', port '4813');
</pre>

### Create user mapping
PGSpider (new root node)
<pre>
CREATE USER MAPPING FOR CURRENT_USER SERVER new_root OPTIONS(user 'user', password 'pass');
</pre>

PGSpider (Parent node)
<pre>
CREATE USER MAPPING FOR CURRENT_USER SERVER parent OPTIONS(user 'user', password 'pass');
</pre>

### Create Multi-Tenant table
PGSpider (new root node)  
<pre>
CREATE FOREIGN TABLE t1(i int, t text, __spd_url text) SERVER new_root;
</pre>

PGSpider (Parent node)  
<pre>
CREATE FOREIGN TABLE t1__parent__0(i int, t text, __spd_url text) SERVER parent;
</pre>

### Access Multi-Tenant table

<pre>
SELECT * FROM t1;

  i |  t  |      __spd_url 
----+-----+-----------------------
  1 | aaa | /parent/sqlite_svr/
  2 | bbb | /parent/sqlite_svr/
 10 | a   | /parent/postgres_svr/
 11 | b   | /parent/postgres_svr/
(4 rows)
</pre>

### Create/Drop datasource table
Based on the definition of a foreign table, you can create or drop the corresponding table on the remote database.   
  - The query syntax:
    <pre>
    CREATE DATASOURCE TABLE [ IF NOT EXISTS ] table_name;
    DROP DATASOURCE TABLE [ IF EXISTS ] table_name;
    </pre>
  - Parameters:
    - IF NOT EXISTS (in CREATE DATASOURCE TABLE)   
      Do not throw an error if a relation/table with the same name as the datasource table already exists on the remote server. Note that there is no guarantee that the existing datasource table is anything like the one that would have been created.
    - IF EXISTS (in DROP DATASOURCE TABLE)   
      Do not throw an error if the datasource table does not exist.
    - table_name   
      The name (optionally schema-qualified) of the foreign table from which the datasource table to be created is derived.

  - Examples:
    ```sql
    CREATE FOREIGN TABLE ft1(i int, t text) SERVER postgres_svr OPTIONS (table_name 't1');
    CREATE DATASOURCE TABLE ft1; -- new datasource table `t1` is created in remote server
    DROP DATASOURCE TABLE ft1; -- datasource table `t1` is dropped in remote server
    ```

### Migrate table
You can migrate data from source tables to destination tables.   
The source table can be a local table, a foreign table, or a multi-tenant table. The destination table can be a foreign table or a multi-tenant table.

  - The query syntax:
    <pre>
    MIGRATE TABLE source_table
    [REPLACE|TO dest_table OPTIONS (USE_MULTITENANT_SERVER <multitenant_server_name>)]
    SERVER [dest_server OPTIONS ( option 'value' [, ...] ), dest_server OPTIONS ( option 'value' [, ...] ),...]
    </pre>
  - Parameters:
    - source_table   
      The name (optionally schema-qualified) of the source table. The source table can be a local table, a foreign table or a multi-tenant table.
    - REPLACE (optional)   
      If this option is specified, a destination table must not be specified; the source table is replaced by a new foreign table/multi-tenant table (with the same name as the source table) pointing to a new data source table. This means the original source table no longer exists.
    - TO (optional)   
      If this is specified, a destination table must be specified, and its name must differ from the source table's name. After migration, the source table is kept, and a new destination foreign table is created pointing to the new data source table.   
      - dest_table   
        The name (optionally schema-qualified) of the destination table. If the destination table already exists, an error is reported.
        The destination table can be given the `USE_MULTITENANT_SERVER` option, in which case a multi-tenant destination table is created for it.
    - dest_server   
      The foreign server of the destination. If there are multiple destination servers, or a single destination server with the `USE_MULTITENANT_SERVER` option, a multi-tenant destination table is created with the destination table name.
      - OPTIONS ( option 'value' [, ...] )   
        Destination server options; the foreign table is created with these options, and the datasource table is created on the remote server based on them.

  - Examples:
    ```sql
    MIGRATE TABLE t1 SERVER postgres_svr;

    MIGRATE TABLE t1 REPLACE SERVER postgres_svr;

    MIGRATE TABLE t1 REPLACE SERVER postgres_svr, postgres_svr;

    MIGRATE TABLE t1 TO t2 SERVER postgres_svr;

    MIGRATE TABLE t1 TO t2 SERVER postgres_svr, postgres_svr;

    MIGRATE TABLE t1 to t2 OPTIONS (USE_MULTITENANT_SERVER 'pgspider_core_svr') SERVER postgres_svr;
    ```

#### Data compression transfer
You can migrate data from source tables to destination tables in other data sources via a cloud function.  
A pgspider_fdw server is required to act as a relay server that transmits data to the cloud function.  
The `endpoint` and `relay` options must be provided to activate this feature.  

#### Currently supported data sources:  
- **PostgreSQL**  
- **MySQL**  
- **Oracle**  
- **GridDB**  
- **PGSpider**  
- **InfluxDB (only support migrating to InfluxDB v2.0)**
- **ObjStorage (only support migrating to AmazonS3 and parquet file)**
#### Options supported:  
- **relay** as *string*, required  
      Specifies the foreign server of the PGSpider FDW that is used to support the Data Compression Transfer feature.  
- **endpoint** as *string*, required  
      Specifies the endpoint address of the cloud service.  
- **socket_port** as *integer*, required, default `4814`  
      Specifies the port number of the socket server.  
- **function_timeout** as *integer*, required, default `900` seconds  
      A socket is opened in the FDW to accept a connection from the Function, send data, and receive the finished notification.  
      If the Function's finished notification does not arrive before the timeout expires, or no client connects to the server socket,  
      the server socket is closed and an error is shown.
- **batch_size** as *integer*, optional, default `1000`  
      Determines the number of records sent to the Function in each batch.
- **proxy** as *string*, optional  
      Proxy for cURL requests.  
      If the value is 'no', the use of a proxy is disabled.  
      If the value is not set, cURL uses its environment variables.  
- **org** as *string*, optional  
      The organization name of the data store on an InfluxDB v2.0 server.  
      This option is used only when migrating to an InfluxDB v2.0 server.  
      If you migrate to an InfluxDB v2.0 server without the org option, an error is raised.  
- **public_host** as *string*, optional  
      The hostname or endpoint of the host server.  
      This option is used only with the Data Compression Transfer feature.  
      If PGSpider is behind NAT, specifying the host lets the relay know the host IP address.  
      **public_host** conflicts with **ifconfig_service**; specifying both options raises an error.  
- **public_port** as *integer*, optional, default equal to **socket_port**  
      The public port of PGSpider.  
      This option is used only with the Data Compression Transfer feature.  
      If PGSpider is behind NAT, specifying the forwarded port lets connections pass through the NAT.  
- **ifconfig_service** as *string*, optional  
      A public service used to look up the host IP (for example: ifconfig.me, ifconfig.co).  
      This option is used only with the Data Compression Transfer feature.  
      If PGSpider is behind NAT, the server can query this external service to get the host IP.  
      **ifconfig_service** conflicts with **public_host**; specifying both options raises an error.

Examples:
  ```sql
  -- Create SERVER
  CREATE SERVER cloudfunc FOREIGN DATA WRAPPER pgspider_fdw OPTIONS (endpoint 'http://cloud.example.com:8080', proxy 'no', batch_size '1000');

  CREATE SERVER postgres FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'postgres.example.com', port '5432', dbname 'test');
  CREATE SERVER pgspider FOREIGN DATA WRAPPER pgspider_fdw OPTIONS (host 'pgspider.example.com', port '4813', dbname 'test');
  CREATE SERVER mysql FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host 'mysql.example.com', port '3306');
  CREATE SERVER griddb FOREIGN DATA WRAPPER griddb_fdw OPTIONS (host 'griddb.example.com', port '20002', clustername 'GridDB');
  CREATE SERVER oracle FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver 'oracle.example.com:1521/XE');
  CREATE SERVER influx FOREIGN DATA WRAPPER influxdb_fdw OPTIONS (host 'influxdb.example.com', port '38086', dbname 'test', version '2');
  CREATE SERVER objstorage_with_endpoint FOREIGN DATA WRAPPER objstorage_fdw OPTIONS (endpoint 'http://cloud.example.com:9000', storage_type 's3');
  CREATE SERVER objstorage_with_region FOREIGN DATA WRAPPER objstorage_fdw OPTIONS (region 'us-west-1', storage_type 's3');

  -- MIGRATE NONE
  MIGRATE TABLE ft1 OPTIONS (socket_port '4814', function_timeout '800') SERVER 
          postgres OPTIONS (table_name 'table', relay 'cloudfunc'),
          pgspider OPTIONS (table_name 'table', relay 'cloudfunc'), 
          mysql OPTIONS (dbname 'test', table_name 'table', relay 'cloudfunc'),
          griddb OPTIONS (table_name 'table', relay 'cloudfunc'),
          oracle OPTIONS (table 'table', relay 'cloudfunc'),
          influx OPTIONS (table 'table', relay 'cloudfunc', org 'myorg'),
          objstorage_with_endpoint OPTIONS (filename 'bucket/file1.parquet', format 'parquet'),
          objstorage_with_endpoint OPTIONS (dirname 'bucket', format 'parquet');

  -- MIGRATE TO
  MIGRATE TABLE ft1 TO ft2 OPTIONS (socket_port '4814', function_timeout '800') SERVER 
          postgres OPTIONS (table_name 'table', relay 'cloudfunc'),
          pgspider OPTIONS (table_name 'table', relay 'cloudfunc'), 
          mysql OPTIONS (dbname 'test', table_name 'table', relay 'cloudfunc'),
          griddb OPTIONS (table_name 'table', relay 'cloudfunc'),
          oracle OPTIONS (table 'table', relay 'cloudfunc'),
          influx OPTIONS (table 'table', relay 'cloudfunc', org 'myorg'),
          objstorage_with_region OPTIONS (filename 'bucket/file1.parquet', format 'parquet'),
          objstorage_with_region OPTIONS (dirname 'bucket', format 'parquet');

  -- MIGRATE REPLACE
  MIGRATE TABLE ft1 REPLACE OPTIONS (socket_port '4814', function_timeout '800') SERVER 
          postgres OPTIONS (table_name 'table', relay 'cloudfunc'),
          pgspider OPTIONS (table_name 'table', relay 'cloudfunc'), 
          mysql OPTIONS (dbname 'test', table_name 'table', relay 'cloudfunc'),
          griddb OPTIONS (table_name 'table', relay 'cloudfunc'),
          oracle OPTIONS (table 'table', relay 'cloudfunc'),
          influx OPTIONS (table 'table', relay 'cloudfunc', org 'myorg'),
          objstorage_with_endpoint OPTIONS (filename 'bucket/file1.parquet', format 'parquet'),
          objstorage_with_region OPTIONS (dirname 'bucket', format 'parquet');
  ```

## Note
When a query to foreign tables fails, you can find out why by inspecting the query PGSpider executes, using `EXPLAIN (VERBOSE)`.  
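
For example, using the Multi-Tenant table `t1` from the Usage section (the VERBOSE output typically includes the SQL sent to each child node):

```sql
-- Inspect the plan and the queries PGSpider issues to child nodes.
EXPLAIN (VERBOSE) SELECT * FROM t1 WHERE i > 3;
```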
PGSpider has a table option: `disable_transaction_feature_check`:  
- When disable_transaction_feature_check is false:  
  All child nodes are checked. If any child node does not support transactions, an error is raised and the modification is stopped.
- When disable_transaction_feature_check is true:  
  The modification proceeds without this check.
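
A minimal, hedged sketch of setting this table option, assuming it is accepted like an ordinary foreign table option on the Multi-Tenant table `t1` from above:

```sql
-- Assumed usage: skip the child-node transaction support check for t1.
ALTER FOREIGN TABLE t1 OPTIONS (ADD disable_transaction_feature_check 'true');
```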

## Limitation
Limitations of modification and transactions:
- Sometimes, PGSpider cannot read modified data within a transaction.
- It is recommended to execute a modification query (INSERT/UPDATE/DELETE) in auto-commit mode. If not, the warning "Modification query is executing in non-autocommit mode. PGSpider might get inconsistent data." is shown.
- RETURNING, WITH CHECK OPTION and ON CONFLICT are not supported with modification.
- COPY to, and modification (INSERT/UPDATE/DELETE) of, foreign partitions are not supported.

## Contributing
Issues and pull requests are welcome.

## License
Portions Copyright (c) 2018, TOSHIBA CORPORATION

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

See the [`LICENSE`][1] file for full details.

[1]: LICENSE


parquet_s3_fdw's Issues

invalid new-expression of abstract class type ‘S3RandomAccessFile’

Error

When I attempt to compile parquet_s3_fdw I get the following error that causes the build to fail:

error: invalid new-expression of abstract class type ‘S3RandomAccessFile’

Dependencies

I have compiled the following dependencies on Debian GNU/Linux 10 (buster):

git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp --branch 1.8.14
mkdir -p sdk_build 
cd sdk_build 
cmake -DBUILD_ONLY="core;config;s3;transfer" ../aws-sdk-cpp -DCMAKE_BUILD_TYPE=Release 
cmake -DBUILD_ONLY="core;config;s3;transfer" ../aws-sdk-cpp -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
make 
make install

git clone --recurse-submodules https://github.com/apache/arrow.git --branch apache-arrow-0.15.0
mkdir -p arrow/cpp/release
cd arrow/cpp/release
cmake .. -DARROW_PARQUET=ON -DARROW_S3=ON
make 
make install

Full make Output

g++ -Wall -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -std=c++11 -O3 -fPIC -I. -I./ -I/usr/include/postgresql/13/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o parquet_impl.o parquet_impl.cpp
parquet_impl.cpp: In constructor ‘ParquetS3FdwReader::ParquetS3FdwReader(int)’:
parquet_impl.cpp:403:37: warning: ‘ParquetS3FdwReader::reader_entry’ will be initialized after [-Wreorder]
     ReaderCacheEntry               *reader_entry;
                                     ^~~~~~~~~~~~
parquet_impl.cpp:351:37: warning:   ‘int32 ParquetS3FdwReader::reader_id’ [-Wreorder]
     int32                           reader_id;
                                     ^~~~~~~~~
parquet_impl.cpp:412:5: warning:   when initialized here [-Wreorder]
     ParquetS3FdwReader(int reader_id) :
     ^~~~~~~~~~~~~~~~~~
parquet_impl.cpp: In function ‘List* extract_parquet_fields(const char*, const char*, Aws::S3::S3Client*)’:
parquet_impl.cpp:2033:110: error: invalid new-expression of abstract class type ‘S3RandomAccessFile’
             std::shared_ptr<arrow::io::RandomAccessFile> input(new S3RandomAccessFile(s3_client, dname, fname));
                                                                                                              ^
In file included from parquet_impl.cpp:35:
parquet_s3_fdw.hpp:27:7: note:   because the following virtual functions are pure within ‘S3RandomAccessFile’:
 class S3RandomAccessFile : public arrow::io::RandomAccessFile
       ^~~~~~~~~~~~~~~~~~
In file included from /usr/local/include/arrow/io/concurrency.h:22,
                 from /usr/local/include/arrow/io/buffered.h:26,
                 from /usr/local/include/arrow/io/api.h:21,
                 from parquet_impl.cpp:27:
/usr/local/include/arrow/io/interfaces.h:91:18: note:   ‘virtual arrow::Status arrow::io::FileInterface::Tell(int64_t*) const’
   virtual Status Tell(int64_t* position) const = 0;
                  ^~~~
/usr/local/include/arrow/io/interfaces.h:142:18: note:  ‘virtual arrow::Status arrow::io::Readable::Read(int64_t, int64_t*, void*)’
   virtual Status Read(int64_t nbytes, int64_t* bytes_read, void* out) = 0;
                  ^~~~
/usr/local/include/arrow/io/interfaces.h:145:18: note:  ‘virtual arrow::Status arrow::io::Readable::Read(int64_t, std::shared_ptr<arrow::Buffer>*)’
   virtual Status Read(int64_t nbytes, std::shared_ptr<Buffer>* out) = 0;
                  ^~~~
/usr/local/include/arrow/io/interfaces.h:195:18: note:  ‘virtual arrow::Status arrow::io::RandomAccessFile::GetSize(int64_t*)’
   virtual Status GetSize(int64_t* size) = 0;
                  ^~~~~~~
parquet_impl.cpp: In function ‘ForeignScan* parquetS3GetForeignPlan(PlannerInfo*, RelOptInfo*, Oid, ForeignPath*, List*, List*, Plan*)’:
parquet_impl.cpp:2646:5: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
     else
     ^~~~
parquet_impl.cpp:2650:2: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘else’
  return make_foreignscan(tlist,
  ^~~~~~
parquet_impl.cpp: In function ‘int parquetS3AcquireSampleRowsFunc(Relation, int, HeapTupleData**, int, double*, double*)’:
parquet_impl.cpp:2964:125: error: invalid new-expression of abstract class type ‘S3RandomAccessFile’
                 std::shared_ptr<arrow::io::RandomAccessFile> input(new S3RandomAccessFile(fdw_private.s3client, dname, fname));
                                                                                                                             ^
parquet_impl.cpp: In function ‘List* parse_filenames_list(const char*)’:
parquet_impl.cpp:1836:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
                 switch (*cur)
                 ^~~~~~
parquet_impl.cpp:1847:13: note: here
             default:
             ^~~~~~~
make: *** [<builtin>: parquet_impl.o] Error 1

GitHub actions - testing

I was curious if you would be interested in a contribution of a GitHub action running for tests/validation.

GCS

I am trying to use parquet_s3_fdw to connect to my GCS bucket and extract data from parquet files but it seems to be impossible (or I've made a mistake in my code).

here is what I do

Firstly, I create EXTENSION
CREATE EXTENSION parquet_s3_fdw;

Then I create the server
CREATE SERVER parquet_s3_srv FOREIGN DATA WRAPPER parquet_s3_fdw OPTIONS (region 'us-west1');

My GCS bucket region is us-west1 (Oregon) but I also tried us-west2.

Afterwards, I create user mapping
CREATE USER MAPPING FOR CURRENT_USER SERVER parquet_s3_srv OPTIONS (user '<access_key>', password '<secret_key>');

I don't think that there is a problem with these keys, because I was able to access my bucket from ClickHouse.

In the end I create foreign table

CREATE FOREIGN TABLE natality_parquet (
  source_year TEXT,
  year TEXT,
  month TEXT,
  day TEXT,
  wday TEXT,
  state TEXT,
  is_male TEXT,
  child_race TEXT,
  weight_pounds TEXT,
  plurality TEXT,
  apgar_1min TEXT,
  apgar_5min TEXT,
  mother_residence_state TEXT,
  mother_race TEXT,
  mother_age TEXT,
  gestation_weeks TEXT,
  lmp TEXT,
  mother_married TEXT,
  mother_birth_state TEXT,
  cigarette_use TEXT,
  cigarettes_per_day TEXT,
  alcohol_use TEXT,
  drinks_per_week TEXT,
  weight_gain_pounds TEXT,
  born_alive_alive TEXT,
  born_alive_dead TEXT,
  born_dead TEXT,
  ever_born TEXT,
  father_race TEXT,
  father_age TEXT,
  record_weight TEXT
) SERVER parquet_s3_srv
OPTIONS (
  filename 's3://example_bucket_natality2/000000000000.parquet'
);

But when I query this foreign table I get this error
select * from natality_parquet limit 5;

SQL Error [XX000]: ERROR: parquet_s3_fdw: failed to exctract row groups from Parquet file: failed to open Parquet file HeadObject failed

Is it actually possible to access GCS via parquet_s3_fdw? If so, could you please point out where I am mistaken in my code?

Installation issue

Hi,

Even with PG installed from sources, the CREATE EXTENSION doesn't seem to find the required libraries.

Error example:

$ psql -d postgres -c "CREATE EXTENSION IF NOT EXISTS parquet_s3_fdw;"
ERROR:  could not load library "/usr/local/pgsql/lib/parquet_s3_fdw.so": libaws-cpp-sdk-core.so: cannot open shared object file: No such file or directory

$ sudo find / -name libaws-cpp-sdk-core.so
/usr/local/lib/libaws-cpp-sdk-core.so

I've installed AWS SDK for C++ using the following steps (on Ubuntu 22.04):

git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp
mkdir sdk_build && cd sdk_build
cmake ../aws-sdk-cpp -DCMAKE_BUILD_TYPE=Debug -DCMAKE_PREFIX_PATH=/usr/local/ -DCMAKE_INSTALL_PREFIX=/usr/local/ -DBUILD_ONLY="s3"
make
sudo make install

And installed this extension using:

wget https://ftp.postgresql.org/pub/source/v16.1/postgresql-16.1.tar.gz
tar xzf postgresql-16.1.tar.gz
wget https://github.com/pgspider/parquet_s3_fdw/archive/refs/tags/v1.1.0.tar.gz
tar -xzf v1.1.0.tar.gz
mv parquet_s3_fdw-1.1.0 postgresql-16.1/contrib/parquet_s3_fdw
cd postgresql-16.1
./configure
make
sudo make install
cd contrib/parquet_s3_fdw
make
sudo make install

Could you help me spot what I missed, please?

Also, what would be your recommended way to integrate this library and build the extension against PG installed with the PGDG package?

Many thanks in advance for your feedback,
Kind Regards

implicit declaration is invalid

When I run make install, I get an error:

parquet_fdw.c:97:5: error: implicit declaration of function 'on_proc_exit' is invalid in C99 [-Werror,-Wimplicit-function-declaration]

Getting this error while running make install

Makefile:45: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target '/contrib/contrib-global.mk'. Stop.

Operating System: Ubuntu 20.
All dependencies were installed.

install error

cd /tmp
git clone --depth 1 -b apache-arrow-12.0.0 https://github.com/apache/arrow.git  
cd /tmp/arrow/cpp  
mkdir build-release  
cd /tmp/arrow/cpp/build-release  
cmake -DARROW_DEPENDENCY_SOURCE=BUNDLED ..  
make -j4  
make install  

cd /tmp
apt-get install -y libcurl4-openssl-dev uuid-dev libpulse-dev  
git clone --depth 1 -b 1.11.91 https://github.com/aws/aws-sdk-cpp  
cd /tmp/aws-sdk-cpp  
git submodule update --init --recursive --depth 1  
mkdir build  
cd /tmp/aws-sdk-cpp/build  
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;core"  
make -j4  
make install  

error at:

cd /tmp
git clone --depth 1 -b v1.1.0 https://github.com/pgspider/parquet_s3_fdw  
cd /tmp/parquet_s3_fdw  
USE_PGXS=1 make  
g++ -Wall -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -moutline-atomics -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -std=c++17 -O3 -fPIC -Wno-register -D_GLIBCXX_USE_CXX11_ABI=0 -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/reader.o src/reader.cpp
In file included from /usr/include/parquet/metadata.h:29,
                 from /usr/include/parquet/file_reader.h:27,
                 from /usr/include/parquet/arrow/reader.h:26,
                 from src/reader.cpp:19:
/usr/include/parquet/platform.h:90:37: error: ‘CodecOptions’ in namespace ‘arrow::util’ does not name a type
   90 | using CodecOptions = ::arrow::util::CodecOptions;
      |                                     ^~~~~~~~~~~~
In file included from /usr/include/parquet/schema.h:32,
                 from /usr/include/parquet/encryption/encryption.h:26,
                 from /usr/include/parquet/properties.h:30,
                 from /usr/include/parquet/metadata.h:30,
                 from /usr/include/parquet/file_reader.h:27,
                 from /usr/include/parquet/arrow/reader.h:26,
                 from src/reader.cpp:19:
/usr/include/parquet/types.h:490:39: error: ‘CodecOptions’ does not name a type
  490 |                                 const CodecOptions& codec_options);
      |                                       ^~~~~~~~~~~~
In file included from /usr/include/parquet/metadata.h:30,
                 from /usr/include/parquet/file_reader.h:27,
                 from /usr/include/parquet/arrow/reader.h:26,
                 from src/reader.cpp:19:
/usr/include/parquet/properties.h:181:48: error: ‘CodecOptions’ was not declared in this scope
  181 |   void set_codec_options(const std::shared_ptr<CodecOptions>& codec_options) {
      |                                                ^~~~~~~~~~~~
/usr/include/parquet/properties.h:181:60: error: template argument 1 is invalid
  181 |   void set_codec_options(const std::shared_ptr<CodecOptions>& codec_options) {
      |                                                            ^
/usr/include/parquet/properties.h:206:25: error: ‘CodecOptions’ was not declared in this scope
  206 |   const std::shared_ptr<CodecOptions>& codec_options() const { return codec_options_; }
      |                         ^~~~~~~~~~~~
/usr/include/parquet/properties.h:206:37: error: template argument 1 is invalid
  206 |   const std::shared_ptr<CodecOptions>& codec_options() const { return codec_options_; }
      |                                     ^
/usr/include/parquet/properties.h:216:19: error: ‘CodecOptions’ was not declared in this scope; did you mean ‘codec_options’?
  216 |   std::shared_ptr<CodecOptions> codec_options_;
      |                   ^~~~~~~~~~~~
      |                   codec_options
/usr/include/parquet/properties.h:216:31: error: template argument 1 is invalid
  216 |   std::shared_ptr<CodecOptions> codec_options_;
      |                               ^
/usr/include/parquet/properties.h: In member function ‘void parquet::ColumnProperties::set_compression_level(int)’:
/usr/include/parquet/properties.h:176:41: error: ‘CodecOptions’ was not declared in this scope; did you mean ‘codec_options’?
  176 |       codec_options_ = std::make_shared<CodecOptions>();
      |                                         ^~~~~~~~~~~~
      |                                         codec_options
/usr/include/parquet/properties.h:176:55: error: no matching function for call to ‘make_shared<<expression error> >()’
  176 |       codec_options_ = std::make_shared<CodecOptions>();
      |                                                       ^
In file included from /usr/include/c++/10/memory:84,
                 from /usr/local/include/arrow/array/array_base.h:22,
                 from /usr/local/include/arrow/array.h:41,
                 from /usr/local/include/arrow/api.h:22,
                 from src/reader.cpp:16:
/usr/include/c++/10/bits/shared_ptr.h:872:5: note: candidate: ‘template<class _Tp, class ... _Args> std::shared_ptr<_Tp> std::make_shared(_Args&& ...)’
  872 |     make_shared(_Args&&... __args)
      |     ^~~~~~~~~~~
/usr/include/c++/10/bits/shared_ptr.h:872:5: note:   template argument deduction/substitution failed:
In file included from /usr/include/parquet/metadata.h:30,
                 from /usr/include/parquet/file_reader.h:27,
                 from /usr/include/parquet/arrow/reader.h:26,
                 from src/reader.cpp:19:
/usr/include/parquet/properties.h:176:55: error: template argument 1 is invalid
  176 |       codec_options_ = std::make_shared<CodecOptions>();
      |                                                       ^
/usr/include/parquet/properties.h:178:19: error: base operand of ‘->’ is not a pointer
  178 |     codec_options_->compression_level = compression_level;
      |                   ^~
/usr/include/parquet/properties.h: In member function ‘int parquet::ColumnProperties::compression_level() const’:
/usr/include/parquet/properties.h:203:26: error: base operand of ‘->’ is not a pointer
  203 |     return codec_options_->compression_level;
      |                          ^~
/usr/include/parquet/properties.h: At global scope:
/usr/include/parquet/properties.h:480:46: error: ‘CodecOptions’ is not a member of ‘arrow::util’
  480 |         const std::shared_ptr<::arrow::util::CodecOptions>& codec_options) {
      |                                              ^~~~~~~~~~~~
/usr/include/parquet/properties.h:480:58: error: template argument 1 is invalid
  480 |         const std::shared_ptr<::arrow::util::CodecOptions>& codec_options) {
      |                                                          ^
/usr/include/parquet/properties.h:489:46: error: ‘CodecOptions’ is not a member of ‘arrow::util’
  489 |         const std::shared_ptr<::arrow::util::CodecOptions>& codec_options) {
      |                                              ^~~~~~~~~~~~
/usr/include/parquet/properties.h:489:58: error: template argument 1 is invalid
  489 |         const std::shared_ptr<::arrow::util::CodecOptions>& codec_options) {
      |                                                          ^
/usr/include/parquet/properties.h:498:46: error: ‘CodecOptions’ is not a member of ‘arrow::util’
  498 |         const std::shared_ptr<::arrow::util::CodecOptions>& codec_options) {
      |                                              ^~~~~~~~~~~~
/usr/include/parquet/properties.h:498:58: error: template argument 1 is invalid
  498 |         const std::shared_ptr<::arrow::util::CodecOptions>& codec_options) {
      |                                                          ^
/usr/include/parquet/properties.h:688:53: error: ‘CodecOptions’ was not declared in this scope; did you mean ‘codec_options’?
  688 |     std::unordered_map<std::string, std::shared_ptr<CodecOptions>> codec_options_;
      |                                                     ^~~~~~~~~~~~
      |                                                     codec_options
/usr/include/parquet/properties.h:688:53: error: template argument 1 is invalid
/usr/include/parquet/properties.h:688:65: error: template argument 2 is invalid
  688 |     std::unordered_map<std::string, std::shared_ptr<CodecOptions>> codec_options_;
      |                                                                 ^~
/usr/include/parquet/properties.h:688:65: error: template argument 5 is invalid
/usr/include/parquet/properties.h:751:25: error: ‘CodecOptions’ was not declared in this scope
  751 |   const std::shared_ptr<CodecOptions> codec_options(
      |                         ^~~~~~~~~~~~
/usr/include/parquet/properties.h:751:37: error: template argument 1 is invalid
  751 |   const std::shared_ptr<CodecOptions> codec_options(
      |                                     ^
/usr/include/parquet/properties.h: In member function ‘parquet::WriterProperties::Builder* parquet::WriterProperties::Builder::compression_level(const string&, int)’:
/usr/include/parquet/properties.h:451:26: error: no match for ‘operator[]’ (operand types are ‘int’ and ‘const string’ {aka ‘const std::basic_string<char>’})
  451 |       if (!codec_options_[path]) {
      |                          ^
/usr/include/parquet/properties.h:452:23: error: no match for ‘operator[]’ (operand types are ‘int’ and ‘const string’ {aka ‘const std::basic_string<char>’})
  452 |         codec_options_[path] = std::make_shared<CodecOptions>();
      |                       ^
/usr/include/parquet/properties.h:452:49: error: ‘CodecOptions’ was not declared in this scope; did you mean ‘codec_options’?
  452 |         codec_options_[path] = std::make_shared<CodecOptions>();
      |                                                 ^~~~~~~~~~~~
      |                                                 codec_options
/usr/include/parquet/properties.h:452:63: error: no matching function for call to ‘make_shared<<expression error> >()’
  452 |         codec_options_[path] = std::make_shared<CodecOptions>();
      |                                                               ^
In file included from /usr/include/c++/10/memory:84,
                 from /usr/local/include/arrow/array/array_base.h:22,
                 from /usr/local/include/arrow/array.h:41,
                 from /usr/local/include/arrow/api.h:22,
                 from src/reader.cpp:16:
/usr/include/c++/10/bits/shared_ptr.h:872:5: note: candidate: ‘template<class _Tp, class ... _Args> std::shared_ptr<_Tp> std::make_shared(_Args&& ...)’
  872 |     make_shared(_Args&&... __args)
      |     ^~~~~~~~~~~
/usr/include/c++/10/bits/shared_ptr.h:872:5: note:   template argument deduction/substitution failed:
In file included from /usr/include/parquet/metadata.h:30,
                 from /usr/include/parquet/file_reader.h:27,
                 from /usr/include/parquet/arrow/reader.h:26,
                 from src/reader.cpp:19:
/usr/include/parquet/properties.h:452:63: error: template argument 1 is invalid
  452 |         codec_options_[path] = std::make_shared<CodecOptions>();
      |                                                               ^
/usr/include/parquet/properties.h:454:21: error: no match for ‘operator[]’ (operand types are ‘int’ and ‘const string’ {aka ‘const std::basic_string<char>’})
  454 |       codec_options_[path]->compression_level = compression_level;
      |                     ^
/usr/include/parquet/properties.h: In member function ‘parquet::WriterProperties::Builder* parquet::WriterProperties::Builder::codec_options(const string&, const int&)’:
/usr/include/parquet/properties.h:490:21: error: no match for ‘operator[]’ (operand types are ‘int’ and ‘const string’ {aka ‘const std::basic_string<char>’})
  490 |       codec_options_[path] = codec_options;
      |                     ^
/usr/include/parquet/properties.h: In member function ‘std::shared_ptr<parquet::WriterProperties> parquet::WriterProperties::Builder::build()’:
/usr/include/parquet/properties.h:650:31: error: ‘begin’ was not declared in this scope; did you mean ‘std::begin’?
  650 |       for (const auto& item : codec_options_)
      |                               ^~~~~~~~~~~~~~
      |                               std::begin
In file included from /usr/include/c++/10/list:62,
                 from src/reader.cpp:14:
/usr/include/c++/10/bits/range_access.h:108:37: note: ‘std::begin’ declared here
  108 |   template<typename _Tp> const _Tp* begin(const valarray<_Tp>&);
      |                                     ^~~~~
In file included from /usr/include/parquet/metadata.h:30,
                 from /usr/include/parquet/file_reader.h:27,
                 from /usr/include/parquet/arrow/reader.h:26,
                 from src/reader.cpp:19:
/usr/include/parquet/properties.h:650:31: error: ‘end’ was not declared in this scope; did you mean ‘std::end’?
  650 |       for (const auto& item : codec_options_)
      |                               ^~~~~~~~~~~~~~
      |                               std::end
In file included from /usr/include/c++/10/list:62,
                 from src/reader.cpp:14:
/usr/include/c++/10/bits/range_access.h:110:37: note: ‘std::end’ declared here
  110 |   template<typename _Tp> const _Tp* end(const valarray<_Tp>&);
      |                                     ^~~
make: *** [<builtin>: src/reader.o] Error 1

api.h file not found

When I run make install, I get:

parquet_impl.cpp:26:10: fatal error: 'arrow/api.h' file not found

This is on macOS Big Sur 11.3.1. I have installed Arrow (with brew install apache-arrow and brew install apache-arrow-glib).

We are excited by the project and would appreciate any help with this.

Current Installation Script

After looking through your readme file it looks like the version of arrow is pretty out of date. Do you know if this will work with the newer versions of Arrow and AWS SDK? Do you happen to have an install script somewhere for installing dependencies? I'm trying to get this loaded up in a Postgres v14.6 Alpine Linux Docker image to do some testing with it. I'd appreciate any advice you have about what versions work and don't work.

No matching function call

Hi, I really want to use this on a project. I think it would be useful, but currently it's let down by some incomplete build instructions. I've managed to get through quite a few hurdles here, but I am struggling now with:

parquet_s3_fdw_connection.cpp:518:69: error: no matching function for call to ‘Aws::S3::S3Client::S3Client(Aws::Auth::AWSCredentials&, Aws::Client::ClientConfiguration&)’
  518 |                 s3_client = new Aws::S3::S3Client(cred, clientConfig);

An example build in a Dockerfile would really help I think.

For information, I'm trying to add this connector to the timescaledb docker image, which currently is built from this Dockerfile:

FROM timescale/timescaledb-ha:pg15-latest

USER root
RUN apt update && apt install -yq build-essential  postgresql-server-dev-15
RUN apt update && \
    apt install -y -V ca-certificates lsb-release wget && \
    wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb && \
    apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb && \
    apt update && \
    apt install -y -V libarrow-dev libparquet-dev
RUN apt install -yq curl zip unzip tar git

COPY . /parquet_s3_fdw 
WORKDIR /parquet_s3_fdw
RUN cd ./vcpkg && \
    ./bootstrap-vcpkg.sh && \
    ./vcpkg integrate install && \
    ./vcpkg install aws-sdk-cpp

RUN cd /parquet_s3_fdw && make install USE_PGXS=1 CCFLAGS="-std=c++17 -I/parquet_s3_fdw/vcpkg/installed/x64-linux/include"

Getting errors and many issues while doing a make install

Hi Team,

We are facing some issues while doing a make install for parquet_s3_fdw. We succeeded in installing libarrow and libparquet, but when we run make install we get the errors below. Help is very much appreciated.

[image: screenshot of the build errors]

compression options during modification?

It seems like the FDW will create an uncompressed parquet file when inserting into a created foreign table, which surprised me.

I think it would be great if this library supported Snappy, or better yet zstd, for Parquet compression. Was there a reason this defaulted to uncompressed? I was expecting compression to be an option in the CREATE FOREIGN TABLE statement.

I think this is just a missing statement before arrow's WriteTable call as described in their docs:

// Choose compression
std::shared_ptr<WriterProperties> props =
    WriterProperties::Builder().compression(arrow::Compression::SNAPPY)->build();

Was this left out for a reason? Perhaps it makes build dependencies or licensing harder?

This library is ALMOST there as a missing piece of the puzzle for smooth integration between OLTP and OLAP/warehouse workloads. Great work!

delta.io support

Are there any plans to support delta lake (delta.io) ACID parquet files in the future as well?
