wal2json's Introduction

Introduction

wal2json is an output plugin for logical decoding. This means that the plugin has access to tuples produced by INSERT and UPDATE. Also, UPDATE/DELETE old row versions can be accessed depending on the configured replica identity. Changes can be consumed using the streaming protocol (logical replication slots) or by a special SQL API.
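
Old row versions for UPDATE and DELETE are only reported when the table has a primary key or another replica identity. A minimal sketch, using a hypothetical table name, of exposing the complete old row:

-- mytable is a hypothetical name; REPLICA IDENTITY FULL logs the whole old row,
-- so wal2json can report old values for UPDATE and DELETE
ALTER TABLE mytable REPLICA IDENTITY FULL;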

Format version 1 produces one JSON object per transaction. All of the new/old tuples are available in the JSON object. There are also options to include properties such as the transaction timestamp, qualified schema names, data types, and transaction ids.

Format version 2 produces one JSON object per tuple, with optional JSON objects marking the beginning and end of each transaction. A variety of options to include additional properties are also available.
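
As a quick sketch of how the format is chosen when consuming changes through the SQL API (assuming a slot named test_slot created with the wal2json plugin, as in the examples below):

-- format version 1 (the default): one JSON object per transaction
SELECT data FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);
-- format version 2: one JSON object per tuple
SELECT data FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL, 'format-version', '2');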

Build and Install

This extension is supported on the same platforms as PostgreSQL. The installation steps depend on your operating system. The PostgreSQL yum and apt repositories provide wal2json packages.

In Red Hat/CentOS:

$ sudo yum install wal2json_16

In Debian/Ubuntu:

$ sudo apt-get install postgresql-16-wal2json

You can also keep up with the latest fixes and features by cloning the Git repository.

$ git clone https://github.com/eulerto/wal2json.git

Unix based Operating Systems

Before installing wal2json, you should have PostgreSQL 9.4+ installed (including the header files). If PostgreSQL is not in your search path, add it. If you are using the PostgreSQL yum repository, install postgresql16-devel and add /usr/pgsql-16/bin to your search path (yum uses 16, 15, 14, 13, 12, 11, 10, 96 or 95). If you are using the PostgreSQL apt repository, install postgresql-server-dev-16 and add /usr/lib/postgresql/16/bin to your search path (apt uses 16, 15, 14, 13, 12, 11, 10, 9.6 or 9.5).

If you compile PostgreSQL by yourself and install it in /home/euler/pg16:

$ tar -zxf wal2json-wal2json_2_6.tar.gz
$ cd wal2json-wal2json_2_6
$ export PATH=/home/euler/pg16/bin:$PATH
$ make
$ make install

If you are using PostgreSQL yum repository:

$ sudo yum install postgresql16-devel
$ tar -zxf wal2json-wal2json_2_6.tar.gz
$ cd wal2json-wal2json_2_6
$ export PATH=/usr/pgsql-16/bin:$PATH
$ make
$ make install

If you are using PostgreSQL apt repository:

$ sudo apt-get install postgresql-server-dev-16
$ tar -zxf wal2json-wal2json_2_6.tar.gz
$ cd wal2json-wal2json_2_6
$ export PATH=/usr/lib/postgresql/16/bin:$PATH
$ make
$ make install

Windows

There are several ways to build wal2json on Windows. If you are building PostgreSQL too, you can put the wal2json directory inside contrib, change the contrib Makefile (variable SUBDIRS) and build it following the Installation from Source Code on Windows instructions. However, if you already have PostgreSQL installed, it is also possible to compile wal2json out of the tree. Edit the wal2json.vcxproj file and change c:\pg\16 to the PostgreSQL prefix directory. The next step is to open this project file in MS Visual Studio and compile it. The final step is to copy wal2json.dll to the pg_config --pkglibdir directory.

Configuration

postgresql.conf

You need to set at least two parameters in postgresql.conf:

wal_level = logical
#
# these parameters only need to be set in versions 9.4, 9.5 and 9.6
# default values are ok in version 10 or later
#
max_replication_slots = 10
max_wal_senders = 10

After changing these parameters, a restart is needed.
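
To confirm the settings after the restart, a quick check from psql (a minimal sketch):

-- should report "logical" and the values configured above
SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;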

Parameters

  • include-xids: add xid to each changeset. Default is false.
  • include-timestamp: add timestamp to each changeset. Default is false.
  • include-schemas: add schema to each change. Default is true.
  • include-types: add type to each change. Default is true.
  • include-typmod: add modifier to types that have it (eg. varchar(20) instead of varchar). Default is true.
  • include-type-oids: add type oids. Default is false.
  • include-domain-data-type: replace domain name with the underlying data type. Default is false.
  • include-column-positions: add column position (pg_attribute.attnum). Default is false.
  • include-origin: add origin of a piece of data. Default is false.
  • include-not-null: add not null information as columnoptionals. Default is false.
  • include-default: add default expression. Default is false.
  • include-pk: add primary key information as pk. Column name and data type is included. Default is false.
  • numeric-data-types-as-string: use string for numeric data types. JSON specification does not recognize Infinity and NaN as valid numeric values. There might be potential interoperability problems for double precision numbers. Default is false.
  • pretty-print: add spaces and indentation to JSON structures. Default is false.
  • write-in-chunks: write after every change instead of every changeset. Only used when format-version is 1. Default is false.
  • include-lsn: add nextlsn to each changeset. Default is false.
  • include-transaction: emit records denoting the start and end of each transaction. Default is true.
  • include-unchanged-toast (deprecated): Don't use it. It is deprecated.
  • filter-origins: exclude changes from the specified origins. Default is empty which means that no origin will be filtered. It is a comma separated value.
  • filter-tables: exclude rows from the specified tables. Default is empty which means that no table will be filtered. It is a comma separated value. The tables should be schema-qualified. *.foo means table foo in all schemas and bar.* means all tables in schema bar. Special characters (space, single quote, comma, period, asterisk) must be escaped with backslash. Schema and table are case-sensitive. Table "public"."Foo bar" should be specified as public.Foo\ bar.
  • add-tables: include only rows from the specified tables. Default is all tables from all schemas. It has the same rules from filter-tables.
  • filter-msg-prefixes: exclude messages if prefix is in the list. Default is empty which means that no message will be filtered. It is a comma separated value.
  • add-msg-prefixes: include only messages if prefix is in the list. Default is all prefixes. It is a comma separated value. wal2json applies filter-msg-prefixes before this parameter.
  • format-version: defines which format to use. Default is 1.
  • actions: defines which operations will be sent. Default is all actions (insert, update, delete, and truncate). However, if you are using format-version 1, truncate is not enabled (backward compatibility). A sketch combining several of these options is shown after this list.
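
The sketch below passes several of these options at once through the SQL API; it assumes a slot named test_slot created with the wal2json plugin, and the table names given to filter-tables are purely illustrative:

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL,
    'include-xids', '1',
    'include-timestamp', '1',
    'filter-tables', 'audit.*,public.log',
    'format-version', '2');

With pg_recvlogical, the same options are passed as -o name=value pairs (for example -o include-xids=1), as in the pretty-print example below.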

Examples

There are two ways to obtain the changes (JSON objects) from the wal2json plugin: calling functions via SQL or using pg_recvlogical.

pg_recvlogical

Besides the configuration above, it is necessary to configure a replication connection to use pg_recvlogical. A logical replication connection in versions 9.4, 9.5, and 9.6 requires the replication keyword in the database column. Since version 10, a logical replication connection matches a normal pg_hba.conf entry with a database name or keywords such as all.

First, add a replication connection rule at pg_hba.conf (9.4, 9.5, and 9.6):

local    replication     myuser                     trust

If you are using version 10 or later:

local    mydatabase      myuser                     trust

Also, set max_wal_senders at postgresql.conf:

max_wal_senders = 1

A restart is necessary if you changed max_wal_senders.

You are ready to try wal2json. In one terminal:

$ pg_recvlogical -d postgres --slot test_slot --create-slot -P wal2json
$ pg_recvlogical -d postgres --slot test_slot --start -o pretty-print=1 -o add-msg-prefixes=wal2json -f -

In another terminal:

$ cat /tmp/example1.sql
CREATE TABLE table1_with_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c));
CREATE TABLE table1_without_pk (a SERIAL, b NUMERIC(5,2), c TEXT);

BEGIN;
INSERT INTO table1_with_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table1_with_pk (b, c) VALUES('Tuning', now());
INSERT INTO table1_with_pk (b, c) VALUES('Replication', now());
SELECT pg_logical_emit_message(true, 'wal2json', 'this message will be delivered');
SELECT pg_logical_emit_message(true, 'pgoutput', 'this message will be filtered');
DELETE FROM table1_with_pk WHERE a < 3;
SELECT pg_logical_emit_message(false, 'wal2json', 'this non-transactional message will be delivered even if you rollback the transaction');

INSERT INTO table1_without_pk (b, c) VALUES(2.34, 'Tapir');
-- it is not added to the stream because there isn't a pk or a replica identity
UPDATE table1_without_pk SET c = 'Anta' WHERE c = 'Tapir';
COMMIT;

DROP TABLE table1_with_pk;
DROP TABLE table1_without_pk;

$ psql -At -f /tmp/example1.sql postgres
CREATE TABLE
CREATE TABLE
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
3/78BFC828
3/78BFC880
DELETE 2
3/78BFC990
INSERT 0 1
UPDATE 1
COMMIT
DROP TABLE
DROP TABLE

The output in the first terminal is:

{
	"change": [
	]
}
{
	"change": [
	]
}
{
    "change": [
        {
            "kind": "message",
            "transactional": false,
            "prefix": "wal2json",
            "content": "this non-transactional message will be delivered even if you rollback the transaction"
        }
    ]
}
WARNING:  table "table1_without_pk" without primary key or replica identity is nothing
{
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "table1_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [1, "Backup and Restore", "2018-03-27 11:58:28.988414"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table1_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [2, "Tuning", "2018-03-27 11:58:28.988414"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table1_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [3, "Replication", "2018-03-27 11:58:28.988414"]
		}
        ,{
            "kind": "message",
            "transactional": true,
            "prefix": "wal2json",
            "content": "this message will be delivered"
        }
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table1_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [1, "2018-03-27 11:58:28.988414"]
			}
		}
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table1_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [2, "2018-03-27 11:58:28.988414"]
			}
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table1_without_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "numeric(5,2)", "text"],
			"columnvalues": [1, 2.34, "Tapir"]
		}
	]
}
{
    "change": [
    ]
}
{
    "change": [
    ]
}

Dropping the slot in the first terminal:

Ctrl+C
$ pg_recvlogical -d postgres --slot test_slot --drop-slot

SQL functions

$ cat /tmp/example2.sql
CREATE TABLE table2_with_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c));
CREATE TABLE table2_without_pk (a SERIAL, b NUMERIC(5,2), c TEXT);

SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');

BEGIN;
INSERT INTO table2_with_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table2_with_pk (b, c) VALUES('Tuning', now());
INSERT INTO table2_with_pk (b, c) VALUES('Replication', now());
SELECT pg_logical_emit_message(true, 'wal2json', 'this message will be delivered');
SELECT pg_logical_emit_message(true, 'pgoutput', 'this message will be filtered');
DELETE FROM table2_with_pk WHERE a < 3;
SELECT pg_logical_emit_message(false, 'wal2json', 'this non-transactional message will be delivered even if you rollback the transaction');

INSERT INTO table2_without_pk (b, c) VALUES(2.34, 'Tapir');
-- it is not added to the stream because there isn't a pk or a replica identity
UPDATE table2_without_pk SET c = 'Anta' WHERE c = 'Tapir';
COMMIT;

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', '1', 'add-msg-prefixes', 'wal2json');
SELECT 'stop' FROM pg_drop_replication_slot('test_slot');

DROP TABLE table2_with_pk;
DROP TABLE table2_without_pk;

The script above produces the output below:

$ psql -At -f /tmp/example2.sql postgres
CREATE TABLE
CREATE TABLE
init
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
3/78C2CA50
3/78C2CAA8
DELETE 2
3/78C2CBD8
INSERT 0 1
UPDATE 1
COMMIT
{
    "change": [
        {
            "kind": "message",
            "transactional": false,
            "prefix": "wal2json",
            "content": "this non-transactional message will be delivered even if you rollback the transaction"
        }
    ]
}
psql:/tmp/example2.sql:17: WARNING:  table "table2_without_pk" without primary key or replica identity is nothing
{
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "table2_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [1, "Backup and Restore", "2018-03-27 12:05:29.914496"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table2_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [2, "Tuning", "2018-03-27 12:05:29.914496"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table2_with_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [3, "Replication", "2018-03-27 12:05:29.914496"]
		}
        ,{
            "kind": "message",
            "transactional": true,
            "prefix": "wal2json",
            "content": "this message will be delivered"
        }
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table2_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [1, "2018-03-27 12:05:29.914496"]
			}
		}
		,{
			"kind": "delete",
			"schema": "public",
			"table": "table2_with_pk",
			"oldkeys": {
				"keynames": ["a", "c"],
				"keytypes": ["integer", "timestamp without time zone"],
				"keyvalues": [2, "2018-03-27 12:05:29.914496"]
			}
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table2_without_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "numeric(5,2)", "text"],
			"columnvalues": [1, 2.34, "Tapir"]
		}
	]
}
stop
DROP TABLE
DROP TABLE

Let's repeat the same example with format-version 2:

$ cat /tmp/example3.sql
CREATE TABLE table3_with_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c));
CREATE TABLE table3_without_pk (a SERIAL, b NUMERIC(5,2), c TEXT);

SELECT 'init' FROM pg_create_logical_replication_slot('test_slot', 'wal2json');

BEGIN;
INSERT INTO table3_with_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table3_with_pk (b, c) VALUES('Tuning', now());
INSERT INTO table3_with_pk (b, c) VALUES('Replication', now());
SELECT pg_logical_emit_message(true, 'wal2json', 'this message will be delivered');
SELECT pg_logical_emit_message(true, 'pgoutput', 'this message will be filtered');
DELETE FROM table3_with_pk WHERE a < 3;
SELECT pg_logical_emit_message(false, 'wal2json', 'this non-transactional message will be delivered even if you rollback the transaction');

INSERT INTO table3_without_pk (b, c) VALUES(2.34, 'Tapir');
-- it is not added to the stream because there isn't a pk or a replica identity
UPDATE table3_without_pk SET c = 'Anta' WHERE c = 'Tapir';
COMMIT;

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'format-version', '2', 'add-msg-prefixes', 'wal2json');
SELECT 'stop' FROM pg_drop_replication_slot('test_slot');

DROP TABLE table3_with_pk;
DROP TABLE table3_without_pk;

The script above produces the output below:

$ psql -At -f /tmp/example3.sql postgres
CREATE TABLE
CREATE TABLE
init
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
3/78CB8F30
3/78CB8F88
DELETE 2
3/78CB90B8
INSERT 0 1
UPDATE 1
COMMIT
psql:/tmp/example3.sql:20: WARNING:  no tuple identifier for UPDATE in table "public"."table3_without_pk"
{"action":"M","transactional":false,"prefix":"wal2json","content":"this non-transactional message will be delivered even if you rollback the transaction"}
{"action":"B"}
{"action":"I","schema":"public","table":"table3_with_pk","columns":[{"name":"a","type":"integer","value":1},{"name":"b","type":"character varying(30)","value":"Backup and Restore"},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"I","schema":"public","table":"table3_with_pk","columns":[{"name":"a","type":"integer","value":2},{"name":"b","type":"character varying(30)","value":"Tuning"},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"I","schema":"public","table":"table3_with_pk","columns":[{"name":"a","type":"integer","value":3},{"name":"b","type":"character varying(30)","value":"Replication"},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"M","transactional":true,"prefix":"wal2json","content":"this message will be delivered"}
{"action":"D","schema":"public","table":"table3_with_pk","identity":[{"name":"a","type":"integer","value":1},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"D","schema":"public","table":"table3_with_pk","identity":[{"name":"a","type":"integer","value":2},{"name":"c","type":"timestamp without time zone","value":"2019-12-29 04:58:34.806671"}]}
{"action":"I","schema":"public","table":"table3_without_pk","columns":[{"name":"a","type":"integer","value":1},{"name":"b","type":"numeric(5,2)","value":2.34},{"name":"c","type":"text","value":"Tapir"}]}
{"action":"C"}
stop
DROP TABLE
DROP TABLE

License

Copyright (c) 2013-2024, Euler Taveira de Oliveira All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the Euler Taveira de Oliveira nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

wal2json's Issues

"One change per row" output mode

We use wal2json to do a daily batch of incremental updates of an OLAP db (SnowFlake).
The way the JSON is consumed is that the JSON structure is loaded by a SQL function as a table with a single column and one table row per text line, and then accessed using JSON-parsing SQL functions.

Having multiple changes in a single text line makes processing quite difficult - it is hard (if not impossible) to iterate over the changes: [...] array.

{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": [
  {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
  {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
]}

It would save a lot of work if the wal2json plugin could optionally print a single change per JSON entry. For example, one-change-per-entry = true would lead to:

{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": 
  {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
}

{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": 
  {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
}

Along with #70, this would still keep each entry uniquely identifiable:

{"xid":1074, "sn":1, ... }
{"xid":1074, "sn":2, ... }

This way, feeding the data to the OLAP like SnowFlake would need no pre-processing, which would be a big improvement.

Thanks for considering.

Size unhandled in pg_decode_message

Hello again,

While testing some code paths, I found that the content_size isn't used in every code path of pg_decode_message(), which can lead to erroneous output. For instance:

rjuju=# SELECT pg_logical_emit_message(false, 'wal2json', 'a very long message');
 pg_logical_emit_message 
-------------------------
 0/629A5358
(1 row)

rjuju=# SELECT pg_logical_emit_message(false, 'wal2json', 'meh');
 pg_logical_emit_message 
-------------------------
 0/629A5398
(1 row)

rjuju=# select * from pg_logical_slot_peek_changes('wal2json', null, null, 'pretty-print', '1');
    lsn     | xid |                           data                           
------------+-----+----------------------------------------------------------
 0/629A5358 |   0 | {                                                       +
            |     |         "change": [                                     +
            |     |                 {                                       +
            |     |                         "kind": "message",              +
            |     |                         "transactional": false,         +
            |     |                         "prefix": "wal2json",           +
            |     |                         "content": "a very long message"+
            |     |                 }                                       +
            |     |         ]                                               +
            |     | }
 0/629A5398 |   0 | {                                                       +
            |     |         "change": [                                     +
            |     |                 {                                       +
            |     |                         "kind": "message",              +
            |     |                         "transactional": false,         +
            |     |                         "prefix": "wal2json",           +
            |     |                         "content": "meh"                +
            |     |                 }                                       +
            |     |         ]                                               +
            |     | }
(2 rows)

rjuju=# select * from pg_logical_slot_peek_changes('wal2json', null, null, 'pretty-print', '0');
    lsn     | xid |                                                   data                                                    
------------+-----+-----------------------------------------------------------------------------------------------------------
 0/629A5358 |   0 | {"change":[{"kind":"message","transactional":false,"prefix":"wal2json","content":"a very long message"}]}
 0/629A5398 |   0 | {"change":[{"kind":"message","transactional":false,"prefix":"wal2json","content":"mehery long message"}]}
(2 rows)

(see last two messages content).

I didn't read the upstream infrastructure, but I assume that for performance reasons the content isn't zeroed on every call, so you can't rely on it being NULL-terminated.

I wanted to provide a fix for this, but I have some questions.

I'm not sure why, when pretty-print is requested, you do an appendBinaryStringInfo() and otherwise call quote_escape_json(). I assume that quote_escape_json() should be called in both cases, or is there something I missed?

If quote_escape_json() has to be called for both, I think we can add a Size parameter and iterate as an array instead of the current pointer loop. We could also manually call enlargeStringInfo() with the given size, instead of relying on it doubling its size automatically. That probably won't be a huge win, but it might still save some cycles.

What do you think?

Sequence of decoding

Hello,
This is more of a question than an issue.
Assuming I have 3 concurrent transactions A, B and C, all started at the same time x.
B happens to commit before A, and A commits before C.
So the commit order is something along the lines of:

B,A,C.

Given that they commit at different times, do they all get decoded in the same order as the commits? Or is it possible that, in an environment with high concurrency, the decoding does not strictly follow the order/time in which each transaction was committed?

If transactions can be decoded regardless of their commit time, then it follows that the transaction commit time is not a reliable way to track consumed messages and start replaying from where one left off.
Is it then possible to have the LSN for both a begin and a commit statement?

I know that in the current setup of wal2json and its output messages, the begin header, transaction messages and the commit header are all in a single message, making it impossible to track the commit LSN.

Could you please shed some light on this topic ?

regards,

Add parameter to filter on table name

I see one of the use cases is to replicate a single table with known data schema to Kafka topic.
What do you think about adding a table parameter to be used as a filter on the relation name?
Maybe there is some other way?

UPD: I forgot to add that the schema should also be filtered.

Provide an option to output type OIDs instead of type names

Some connectors, namely Python psycopg2 in my case, use type OIDs directly for internal type conversion methods. In these cases it is far simpler to provide these OIDs directly and not rely on the type names.

Since a JSON output is a better API than the text-based example provided by Postgres, we should accommodate software using already existing connectors and frameworks. I hope this feature can be useful for a lot of people out there.

I'm providing a pull request for this feature, and am at your disposal for corrections or improvements.

Question on the format of datetime

I use wal2json and psycopg2 to get PostgreSQL data changes. I have a question about the format of datetime. In psql I have a column of type timestamp with time zone.

In psql console:

postgres=# select create_time from test limit 1;
          create_time
-------------------------------
 2018-05-18 03:06:22.446039+00

In a graphical IDE like DataGrip (after I click on the column):

create_time
2018-05-18 03:06:22.446039 +00:00

wal2json returns '2018-05-18 03:06:22.446039+00'. But looking at https://docs.python.org/3/library/datetime.html, %z is '+0000', which has 4 digits.
Do I have some misunderstanding of the standard?

Options Unknown

When I try to select/get changes with the following parameters:
'include_xids': '1',
'include_timestamp': '1',
'include_schemas': '1',
'include_types': '1',
'pretty_print' : '1'

there's always a warning:
2017-04-04 22:32:44 CEST [27933-25] postgres@test_db WARNING: option include_xids = 1 is unknown
2017-04-04 22:32:44 CEST [27933-26] postgres@test_db CONTEXT: slot "test", output plugin "wal2json", in the startup callback
2017-04-04 22:32:44 CEST [27933-27] postgres@test_db WARNING: option include_timestamp = 1 is unknown
2017-04-04 22:32:44 CEST [27933-28] postgres@test_db CONTEXT: slot "test", output plugin "wal2json", in the startup callback
2017-04-04 22:32:44 CEST [27933-29] postgres@test_db WARNING: option include_schemas = 1 is unknown

Only the option write-in-chunks
works without any warning.

Are these options supported?

regards

the removal of 'x' in "\x" corrupts data

I have data in a table that holds valid hex \x sequences (e.g. "\x1b" for an encoded [ESC] character). The following code in quote_escape_json seems to be the culprit:

/* XXX suppress \x in bytea field? */
if (ch == '\\' && *(valptr + 1) == 'x')
{
	valptr++;
	continue;
}

With this, my "\x1b" value is emitted from this plugin as "\1b".

Even if this is the right thing to do for bytea values (I'm not convinced it is), it is not correct for other textual data.

Why is the LSN key called 'nextlsn'?

This is more of a query than an issue.
The LSN for each change comes with the key 'nextlsn' even though the value is the actual LSN. This gives the impression that the value given by wal2json for the 'nextlsn' key is for the next change rather than the current one. This threw me off.

The code says that it is the value pointing to the next commit record.

Output using JDBC and setting the slot option include-lsn to true:

{
	"xid": 740,
	"nextlsn": "0/18154A8",
	"timestamp": "2018-02-16 12:46:40.877369+05:30",
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "test_table",
			"columnnames": ["pk", "name"],
			"columntypes": ["integer", "text"],
			"columnvalues": [160, "val1"]
		}
	]
}
-----
{
	"xid": 741,
	"nextlsn": "0/1815558",
	"timestamp": "2018-02-16 12:46:43.485518+05:30",
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "test_table",
			"columnnames": ["pk", "name"],
			"columntypes": ["integer", "text"],
			"columnvalues": [161, "val2"]
		}
	]
}
-----
{
	"xid": 742,
	"nextlsn": "0/1815608",
	"timestamp": "2018-02-16 12:46:46.437739+05:30",
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "test_table",
			"columnnames": ["pk", "name"],
			"columntypes": ["integer", "text"],
			"columnvalues": [162, "val3"]
		}
	]
}

Output from psql:

SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);
    lsn    | xid |                                                                              data                                                                              
-----------+-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------
 0/18154A8 | 740 | {"change":[{"kind":"insert","schema":"public","table":"test_table","columnnames":["pk","name"],"columntypes":["integer","text"],"columnvalues":[160,"val1"]}]}
 0/1815558 | 741 | {"change":[{"kind":"insert","schema":"public","table":"test_table","columnnames":["pk","name"],"columntypes":["integer","text"],"columnvalues":[161,"val2"]}]}
 0/1815608 | 742 | {"change":[{"kind":"insert","schema":"public","table":"test_table","columnnames":["pk","name"],"columntypes":["integer","text"],"columnvalues":[162,"val3"]}]}

As you can see, the nextlsn from wal2json maps to the lsn column given by Postgres.

Can someone clear the confusion?

Alpine Linux PostgreSQL Docker image: wal2json.so: __snprintf_chk: symbol not found

I have compiled wal2json against PostgreSQL 9.5, in a Docker container of postgres:9.5.
Then I am trying to load the plugin in PostgreSQL 9.5, again in a Docker container, postgres:9.5-alpine.
But here is what I get after SELECT * FROM pg_create_logical_replication_slot('export5lot', 'wal2json');:

ERROR: could not load library "/usr/local/lib/postgresql/wal2json.so": Error relocating /usr/local/lib/postgresql/wal2json.so: __snprintf_chk: symbol not found

This is due to Alpine Linux using musl instead of glibc.

I'm not quite sure what to do about it. It's been some time since my last C project.
Could you please see if there's a way to get around that? Perhaps some flag to use something else than __snprintf_chk when building for Alpine?

Json without terminating line

Hi,
I noticed that sometimes the JSON is not terminated with the final line "]}". After the final change array element I directly have a line similar to this:
'{"xid":000000,"change":['
and next the other changes and eventually the json terminating line.
This makes json validation more difficult.

Provide a primary key flag for columns

When the event is processed downstream it is important to know which columns form the primary key. Could you please provide a flag for columns that says whether a column is part of the primary key?

postpone write parameter

Currently, the plugin outputs JSON in pieces (it gets data and sends it). The receiver has to maintain state to know whether a complete JSON object has already arrived before it can process it.

At least from the SQL interface's point of view, it would be a good idea to support buffering the JSON object until its end. It would avoid maintaining state in the receiver. The drawback is that long transactions would consume a lot of server memory.

PS> unfortunately, pg_recvlogical can't take advantage of this option because you can only provide parameters at startup. Of course, you could choose a convenient default for all your streaming.

Transaction id and timestamp options

Why can I not use -o include_xids=1 or -o include_timestamp=1 as options?

When I try to use these options when running --start i get the following error:

pg_recvlogical: unexpected termination of replication stream: ERROR:  option "include_xids" = "1" is unknown
CONTEXT:  slot "test_slot", output plugin "wal2json", in the startup callback

What am I doing wrong, and how do I show transaction ids and timestamps?

Table whitelist

Hi, thanks for the filter-tables feature. I tried it today and it works fine, although it works like a blacklist. I have a ton of tables in my database, but I only want a couple of them to be streamed. Is there a way to somehow "inverse" that filter and make it work like a whitelist?

Possible to ignore tables w/o PK?

Hi,

We are testing this plugin and got tons of log warnings (> 1M lines) from the output plugin, all referring to the same table, due to a table without a PK. The table in question isn't one that we are interested in changes from and is not part of the list we pass into the "add-tables" parameter when we connect.

Is it possible to ignore changes we're not interested in?

Thanks for any thoughts on this.

Connection request:

PGReplicationStream stream =
replConnection.getReplicationAPI()
.replicationStream()
.logical()
.withSlotName("app_decoder_slot")
.withSlotOption("add-tables", subscribedTables.join(','))
.withSlotOption("include-unchanged-toast", false)
.withStatusInterval(20, TimeUnit.SECONDS)
.start();

Sample from log output

... ] 82 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 83 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2B60
... ] 84 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 85 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2C30
... ] 86 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 87 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2D00
... ] 88 WARNING: table "domains_290906" without primary key or replica identity is nothing
... ] 89 CONTEXT: slot "cache_decoder_slot", output plugin "wal2json", in the change callback, associated LSN 3D13/A20D2DC8
... ] 90 WARNING: table "domains_290906" without primary key or replica identity is nothing

pretty print option

Currently, pretty print JSON is the only option. It would be desirable to add an option to print a compact format.

Compile issue using USE_PGXS=1 make && make install

Hi, I've been trying to use wal2json with Kafka Connect. However, I am not able to compile because the directory/file containing pgxs.mk cannot be found.

I'm using Ubuntu 10.04 and PostgreSQL is installed under /usr/lib, but it doesn't have the .mk file.

I also tried installing PostgreSQL from source; it still didn't work. Any help would be greatly appreciated :)

"no known snapshots" error when db rows are large

We are using wal2json with Debezium to replicate some db tables containing large rows. While streaming changes, we have encountered the following error:

2018-07-27T19:54:09.616547179Z ERROR:  no known snapshots
--
2018-07-27T19:54:09.616587252Z CONTEXT:  slot "debezium", output plugin "wal2json", in the change callback, associated LSN 0/16D3850

We think this is a bug with wal2json's handling of TOAST columns, since this behavior only started happening after commit ce82d73

I have a more detailed analysis, including steps to reproduce the issue, up on Debezium's bugtracker:
https://issues.jboss.org/browse/DBZ-842

Toast Columns

Hi,

Is there a contraindication to forcing a read of "Toast Columns"?

Maybe there are performance problems?

		/* XXX Unchanged TOAST Datum does not need to be output */
		if (!isnull && typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
		{
			elog(WARNING, "column \"%s\" has an unchanged TOAST", NameStr(attr->attname));
			continue;
		}

Thanks!

ciao
p

Amazon RDS

Hi,
I'm using version 9.6.3 and cannot load the wal2json extension. I cannot see it when I list the available extensions either.
I can, however, work with 'auto_complete', and apparently they were both added together.

You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application

Hello,

I have PostgreSQL 9.5 installed on Ubuntu and am trying to install wal2json.
During execution of USE_PGXS=1 make or USE_PGXS=1 make install, I'm getting the error below.
"/opt/wal2json# USE_PGXS=1 make
You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application.
make: *** No targets. Stop.
:/opt/wal2json# USE_PGXS=1 make install
You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application.
make: *** No rule to make target 'install'. Stop."

Please suggest.

Are there any plans to create release versions?

I find tracking how far behind the tip of the master branch we are is much easier with version numbers than commit hashes, and it would be good to know which commits you think warrant minor and major version number increases.

Cheers.

Option for a "column":"value" data structure

Currently, the output data structure is:

{"change":[{"kind":"update","table":"mytable2",
    "columnnames":["id","name"],
    "columnvalues":[400,"Update 1"],
    "oldkeys":{"keynames":["id"],"keyvalues":[400]}}]
}

For postprocessing purposes, it would be more appropriate to have this format:

{"change":[{"kind":"update","table":"mytable2",
    "changes":{"id":400, "name":"Update 1"},
    "oldkeys":{"id":400}}]
}

I suggest this could be an option

changes-data-structure = two-arrays | key-value-hash

Thanks for considering.

ERROR: invalid memory alloc request size 1073741824

Hello,
I recently ran into an interesting situation with the plugin.

- Scenario:
I made a DDL change on a relatively large table that involves adding a new column with a default value and a type, all in a single transaction. The table has a trigger added to it, but it was disabled during the operation.

--
What I found was that the wal2json plugin, while decoding that part of the WAL, output a lot of warning messages:

WARNING: column "idl_old_id_column_nanme" has an unchanged TOAST :

And it eventually errored out with the following error message:

ERROR: invalid memory alloc request size 1073741824

--
Beyond this point, no decoding was possible, even after I manually provided LSNs and purged all the data just before the complaining LSN value.

The above error message shows that it is trying to use more memory than work_mem could provide, and a bit of digging around points to a gradual reduction of work_mem to prevent this, which didn't help either.

-- My question

  1. As this could lead to serious problems, since data loss is imminent when no decoding can be done, what is the way to avoid this in the future?
  2. What's the best way to fix decoding after it abruptly stops after a huge transaction and will not continue even after a restart?
  3. Is it necessary for all tables to be decoded to have at least a minimal replica identity, even when the decoded data is being discarded simply because it is not needed?
  4. Is there a plan to eventually implement table/schema filtering while decoding the WAL?

cheers .

Options defaults are not very sensible

I don't think the current defaults are particularly sensible:

  • include-xid = true: xids are only useful if you know what to do with them, as well as LSN. Otherwise they are just a source of noise to be filtered out. See test suite.

  • write-in-chunks = true: with this value the records emitted are not valid JSON objects and a streaming parser is required. I don't even know what it is useful for.

piro=# SELECT data::json FROM pg_logical_slot_peek_changes('jslot', NULL, NULL, 'write-in-chunks', 'f');
     data      
---------------
 {"change":[]}
(1 row)

piro=# SELECT data::json FROM pg_logical_slot_peek_changes('jslot', NULL, NULL, 'write-in-chunks', 't');
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: {"change":[

I don't know what the current policy about backwards compatibility is, but I think these values should default to false.

Provide an optional flag for columns

When one is building an Avro schema for the events it might be important to distinguish between optional and non-optional fields for downstream clients. Would it be possible to add a flag to columns indicating that they are optional?

Thanks!

Numeric/Decimals should be strings?

NUMERICOID is encoded as a number in the JSON output, but decimal/numeric values are typically encoded as strings to preserve the precision and length without floating point ambiguity. Gut feel is that wal2json should do the same thing.

Thoughts?

(maybe) wrong memory context?

More a question than a bug report, as I don't know many details of Postgres allocator:

pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				 Relation relation, ReorderBufferChange *change)

	/* ... */

	old = MemoryContextSwitchTo(data->context);

	/* ... */

	switch (change->action)
	{
		case REORDER_BUFFER_CHANGE_INSERT:
			if (change->data.tp.newtuple == NULL)
			{
				elog(WARNING, "no tuple data for INSERT in table \"%s\"", NameStr(class_form->relname));
				return;
			}
			break;

	/* ... */

	MemoryContextSwitchTo(old);
	MemoryContextReset(data->context);
}

if the return is early, isn't the memory context left in an inconsistent state?

Thank you.

Create an official release or tag

Hi, the Debezium project is interested in using this plug-in; would it be possible to cut an official release and/or tag so we have a stable version we can refer to? I'm a bit reluctant to pull in just the HEAD of master as it's a moving target inherently. So we currently refer a specific commit id, but having a named tag would be better of course. Thanks a lot for considering this request!

Control-chars should be escaped in strings

The JSON specification requires that control chars are properly escaped. If the string value contains control chars then it is sent as is without conversion to escape sequence. This causes JSON parsers to fail.

Warning with update on table without primary key and data are lost

Hi!

Executing the tests from README.md I hit this case:

[email protected]:moses_dev_db> SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL, 'pretty-print', '1', 'write-in-chunks', '0');
WARNING:  table "table_without_pk" without primary key or replica identity is nothing

+--------+
| data   |
|--------|
| {
        "change": [
        ]
}        |
+--------+
SELECT 1
Time: 0.095s

But using test_decoding within another slot:

[email protected]:moses_dev_db> SELECT data FROM pg_logical_slot_get_changes('test_slot2', NULL, NULL);
+-------------------------------------------------------------------------------------+
| data                                                                                |
|-------------------------------------------------------------------------------------|
| BEGIN 1890                                                                          |
| table public.table_without_pk: UPDATE: a[integer]:1 b[numeric]:2.34 c[text]:'anta2' |
| COMMIT 1890                                                                         |
+-------------------------------------------------------------------------------------+
SELECT 3
Time: 0.084s

Looking at the source code I see other possible points where WARNINGs will be printed and data discarded. How can I bypass this and not lose data? In my case, is using test_decoding and implementing a data parser the better option?

Thanks a lot!

Deferred PK are not recognized

Hi

I have an issue with a table which has a primary key defined as DEFERRABLE INITIALLY DEFERRED.

Here is postgres setup:

wal_level = logical
max_wal_senders = 3
max_replication_slots = 3

The PK is ignored and any data processed as an UPDATE or DELETE operation is not written by the wal2json plugin.
Looking at the code, I found:

/* Make sure rd_replidindex is set */
RelationGetIndexList(relation);
...
if (!OidIsValid(relation->rd_replidindex) && relation->rd_rel->relreplident != REPLICA_IDENTITY_FULL)

RelationGetIndexList does not seem to set the right PK.

I created a small test case to show this behaviour (I updated the given sample).

$ cat /tmp/deferredPK.sql 
CREATE TABLE table_with_deferred_pk (a SERIAL, b VARCHAR(30), c TIMESTAMP NOT NULL, PRIMARY KEY(a, c) DEFERRABLE INITIALLY DEFERRED);

BEGIN;
INSERT INTO table_with_deferred_pk (b, c) VALUES('Backup and Restore', now());
INSERT INTO table_with_deferred_pk (b, c) VALUES('Tuning', now());
INSERT INTO table_with_deferred_pk (b, c) VALUES('Replication', now());
UPDATE table_with_deferred_pk SET b = 'Tuning - update' where b = 'Tuning';
DELETE FROM table_with_deferred_pk WHERE a < 3;
COMMIT;

$ psql -At -f /tmp/deferredPK.sql  postgres
CREATE TABLE
BEGIN
INSERT 0 1
INSERT 0 1
INSERT 0 1
UPDATE 1
DELETE 2
COMMIT

We can see that only INSERT operations are written by wal2json:

$ pg_recvlogical -d postgres --slot test_slot --start -o pretty-print=1 -f -
{
	"change": [
	]
}
WARNING:  table "table_with_deferred_pk" without primary key or replica identity is nothing
WARNING:  table "table_with_deferred_pk" without primary key or replica identity is nothing
WARNING:  table "table_with_deferred_pk" without primary key or replica identity is nothing
{
	"change": [
		{
			"kind": "insert",
			"schema": "public",
			"table": "table_with_deferred_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [1, "Backup and Restore", "2018-04-19 07:19:02.867699"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table_with_deferred_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [2, "Tuning", "2018-04-19 07:19:02.867699"]
		}
		,{
			"kind": "insert",
			"schema": "public",
			"table": "table_with_deferred_pk",
			"columnnames": ["a", "b", "c"],
			"columntypes": ["integer", "character varying(30)", "timestamp without time zone"],
			"columnvalues": [3, "Replication", "2018-04-19 07:19:02.867699"]
		}
	]
}

Note:

  • Changing the PK to DEFERRABLE INITIALLY IMMEDIATE produces the same behaviour

bytea data not properly decoded

Hello,
I recently found out that bytea columns are not properly decoded. The bytea data comes out in some
strange encoding that is not in any way similar to the source data.

Could this be some kind of bug?

cheers .

rugging24

Add per-transaction unique number to each change

Hi, we use wal2json to transfer incremental changes to an OLAP server (SnowFlake).
It's done in a batch after some period (nightly).
During the day, one table row may change several times.
During the SnowFlake import, the changes are treated as a set, not an ordered list, so the duplicates may cause an earlier change to override a later one. For instance:

START TRANSACTION;
UPDATE myTable2 SET name = 'Update AA' WHERE id = 401;
UPDATE myTable2 SET name = 'Update BB' WHERE id = 401;
COMMIT;

Leads to:

{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": [
  {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
  {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
]}

It would be great if wal2json could add a number to each change

{"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change": [
  {"sn":1, "kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
  {"sn":12 "kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}
]}

Thanks for considering.

Previous value for update statement

Is it possible to get the previous value for an update statement in the resulting diff? For example, the following queries

create table test_table(id NUMERIC PRIMARY KEY, value VARCHAR(200));
insert into test_table(id, value) values(1, 'original');
update test_table set value = 'modified' where id = 1;

generate the following diff:

{
    "xid": 8273,
    "change": [
        {
            "kind": "insert",
            "schema": "public",
            "table": "test_table",
            "columnnames": ["id", "value"],
            "columntypes": ["numeric", "varchar"],
            "columnvalues": [1, "original"]
        }
    ]
}
{
    "xid": 8274,
    "change": [
        {
            "kind": "update",
            "schema": "public",
            "table": "test_table",
            "columnnames": ["id", "value"],
            "columntypes": ["numeric", "varchar"],
            "columnvalues": [1, "modified"],
            "oldkeys": {
                "keynames": ["id"],
                "keytypes": ["numeric"],
                "keyvalues": [1]
            }
        }
    ]
}

This does not allow propagating changes to an external system. For example, tables can be combined into one document and sent to Elasticsearch. After updating some reference field we should invalidate the previous value (it can be a back-reference).

Expose scale and precision for NUMERIC columns

Hi, for columns such as NUMERIC(10,3), we'll only see NUMERIC as a column type in corresponding messages, i.e. precision and scale are gone. It'd be great to have them, though, in order to derive corresponding schemas on the consumer side.

I'm not sure whether this can (easily) be done in wal2json, so I'm opening this to get a discussion started. For sure it'd be something we'd benefit very much from in Debezium.

Error when running "USE_PGXS=1 make" to build wal2json on SmartOS

Hi
There is an error when building wal2json on SmartOS (based on Solaris). I have researched it on Google for a few hours but have not found any useful information.

any help would be greatly appreciated :)

my env info:

[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# gcc --version
gcc (GCC) 4.9.4
[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# make --version
GNU Make 4.1
[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# ld --version
GNU ld (GNU Binutils) 2.26.1
postgres=# select version();
                                     version
---------------------------------------------------------------------------------
 PostgreSQL 9.5.9 on x86_64-sun-solaris2.11, compiled by gcc (GCC) 4.9.4, 64-bit

and the error:

[root@8cd4bd6e-ee63-667f-f105-85303b90d559 ~/wal2json]# USE_PGXS=1 make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -pipe -O2 -pipe -O2 -I/opt/local/include -I/opt/local/include/ncurses -DLDAP_DEPRECATED -I/usr/include -fPIC -L/opt/local/lib -L/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4 -Wl,-R/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4 -L/opt/local/lib -Wl,-R/opt/local/lib -L/usr/lib/amd64 -Wl,-R/usr/lib/amd64 -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib  -Wl,--as-needed -Wl,-R'/opt/local/lib'  -shared -o wal2json.so wal2json.o
/opt/local/x86_64-sun-solaris2.11/bin/ld:/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4/../../../libgcc-unwind.map: file format not recognized; treating as linker script
/opt/local/x86_64-sun-solaris2.11/bin/ld:/opt/local/gcc49/lib/gcc/x86_64-sun-solaris2.11/4.9.4/../../../libgcc-unwind.map:1: syntax error
collect2: error: ld returned 1 exit status
/opt/local/lib/postgresql/pgxs/src/makefiles/../../src/Makefile.port:22: recipe for target 'wal2json.so' failed
make: *** [wal2json.so] Error 1

Tests fail _sometimes_

Hi,

I noticed that various tests sometimes fail. I didn't explore the issue deeply but it looks like just a flaw in the tests, not a bug in the extension itself.

Example 1:
http://afiskon.ru/s/fc/53ae61ede5_regression.diffs.txt
http://afiskon.ru/s/d4/bd9a065311_regression.out.txt

Example 2:
http://afiskon.ru/s/79/11a4ea2430_regression.diffs.txt
http://afiskon.ru/s/35/1523b25c8b_regression.out.txt

The environment is Arch Linux x64, GCC 7.2.0, Core i7 3632QM 2200 Mhz (4 cores, 8 with HT).
