2ndquadrant / pglogical

Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.

Home Page: http://2ndquadrant.com/en/resources/pglogical/

License: Other

Makefile 1.22% C 83.39% PLpgSQL 11.03% Perl 4.36%
postgresql replication logical-decoding database-replication subscription publish-subscribe data-transformation data-transport etl cdc

pglogical's People

Contributors

alvherre, amenonsen, amrutadeolasee, ants, brushtyler, eulerto, ibarwick, irionr, jaime2ndquadrant, josh-heyer, juxtin, koichi-szk, laurenz, leonardoce, m99coder, martinmarques, maw3ll, mnencia, moench-tegeder, pallavisontakke, pashagolub, pavanvd, pensnarik, petere, pjmodos, psoo, rascaldan, ringerc, simonat2ndquadrant, turnstep


pglogical's Issues

pglogical extension update from 1.0.1 to 1.1.2 ERROR

I have this configuration:

psql (9.5.2)
Type "help" for help.

xdray=# \dx pglogical 
                   List of installed extensions
   Name    | Version |  Schema   |          Description           
-----------+---------+-----------+--------------------------------
 pglogical | 1.0.1   | pglogical | PostgreSQL Logical Replication
(1 row)

Installed the new version:

rpm -qa|grep pglogical
postgresql95-pglogical-output-1.1.0-1_2ndQuadrant.el6.x86_64
postgresql95-pglogical-1.1.2-1_2ndQuadrant.el6.x86_64 

Trying to update the extension:

alter extension pglogical update;
ERROR:  could not find function "pglogical_alter_node_add_interface" in file "/usr/pgsql-9.5/lib/pglogical.so"
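
For what it's worth, a likely cause (an assumption, not confirmed in this thread): ALTER EXTENSION ... UPDATE runs the new SQL script against whichever pglogical.so the running server loaded at startup, so after swapping the RPMs the server must be restarted before the update script can find the new C functions. A minimal sketch (service name may differ):

# restart so the newly installed pglogical.so is loaded
# (pglogical is in shared_preload_libraries, so a full restart is required)
service postgresql-9.5 restart

# then retry the update
psql -d xdray -c "ALTER EXTENSION pglogical UPDATE;"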

replication in catchup state, wal sender high cpu usage

Replication stopped working after 2 hours:

postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename  | application_name | client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
  7036 |       10 | postgres | walreceiver      | 192.168.0.55 |                 |       55638 | 2016-09-07 13:54:47.633061+03 |              | streaming | 162/2D57F1B8  | 162/2D57F1B8   | 162/2D57F1B8   | 162/2D57D490    |             0 | async
 14934 |       10 | postgres | audit2           | 192.168.0.57 |                 |       54697 | 2016-09-08 09:44:11.601138+03 |     82827775 | catchup   | 160/3C3DFE90  | 160/3C3DFE90   | 160/3C3DFE90   | 160/3C3DFE90    |             0 | async
(2 rows)

postgres=# select * from pg_replication_slots;
              slot_name               |      plugin      | slot_type |  datoid  |    database     | active | active_pid |   xmin   | catalog_xmin | restart_lsn  
--------------------------------------+------------------+-----------+----------+-----------------+--------+------------+----------+--------------+--------------
 pgl_lsd_master_orig_provider1_audit2 | pglogical_output | logical   | 60735254 | lsd_master_orig | t      |      14934 |          |     82379410 | 160/3949FA30
 pgl_lsd_master_orig_provider1_audit1 | pglogical_output | logical   | 60735254 | lsd_master_orig | f      |            |          |     82287138 | 15F/D5675338
 db2_slot                             |                  | physical  |          |                 | t      |       7036 | 82858274 |              | 162/2D586190
(3 rows)

and the wal sender is at 100% CPU usage:

postgres  7012  0.0  0.0 6577536 2896 ?        Ss   Sep07   0:34 postgres: wal writer process   
postgres  7036  0.1  0.0 6579500 3548 ?        Ss   Sep07   1:24 postgres: wal sender process postgres 192.168.0.55(55638) streaming 162/2D7C17E8
postgres 14934 99.5  0.1 6600412 24988 ?       Rs   09:44  33:54 postgres: wal sender process postgres 192.168.0.57(54697) idle

In the logs on the provider:

2016-09-08 09:44:11 MSK [14934]: [3-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  received replication command: START_REPLICATION SLOT "pgl_lsd_master_orig_provider1_audit2" LOGICAL 160/393F5CF0 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '905', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '1', "binary.float8_byval" '1', "binary.integer_datetimes" '1', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" 'audit_set', "relmeta_cache_size" '-1', pg_version '90503', pglogical_version '1.1.2', pglogical_version_num '10102')
2016-09-08 09:44:11 MSK [14934]: [5-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  Unrecognised pglogical parameter startup_params_format ignored
2016-09-08 09:44:11 MSK [14934]: [6-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 CONTEXT:  slot "pgl_lsd_master_orig_provider1_audit2", output plugin "pglogical_output", in the startup callback
2016-09-08 09:44:11 MSK [14934]: [7-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  Unrecognised pglogical parameter pglogical.forward_origins ignored
2016-09-08 09:44:11 MSK [14934]: [8-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 CONTEXT:  slot "pgl_lsd_master_orig_provider1_audit2", output plugin "pglogical_output", in the startup callback
2016-09-08 09:44:11 MSK [14934]: [9-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  Unrecognised pglogical parameter pglogical.replication_set_names ignored
2016-09-08 09:44:11 MSK [14934]: [10-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 CONTEXT:  slot "pgl_lsd_master_orig_provider1_audit2", output plugin "pglogical_output", in the startup callback
2016-09-08 09:44:11 MSK [14934]: [11-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  Unrecognised pglogical parameter relmeta_cache_size ignored
2016-09-08 09:44:11 MSK [14934]: [12-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 CONTEXT:  slot "pgl_lsd_master_orig_provider1_audit2", output plugin "pglogical_output", in the startup callback
2016-09-08 09:44:11 MSK [14934]: [13-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  Unrecognised pglogical parameter pglogical_version ignored
2016-09-08 09:44:11 MSK [14934]: [14-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 CONTEXT:  slot "pgl_lsd_master_orig_provider1_audit2", output plugin "pglogical_output", in the startup callback
2016-09-08 09:44:11 MSK [14934]: [15-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 DEBUG:  Unrecognised pglogical parameter pglogical_version_num ignored
2016-09-08 09:44:11 MSK [14934]: [16-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 CONTEXT:  slot "pgl_lsd_master_orig_provider1_audit2", output plugin "pglogical_output", in the startup callback
2016-09-08 09:44:11 MSK [14934]: [17-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 LOG:  starting logical decoding for slot "pgl_lsd_master_orig_provider1_audit2"
2016-09-08 09:44:11 MSK [14934]: [19-1] db=lsd_master_orig,appname=audit2,user=postgres,client=192.168.0.57 LOG:  logical decoding found consistent point at 160/3949FA30

On the subscriber:

2016-09-08 09:44:06 MSK [6937]: [5-1] db=,appname=,user=,client= ERROR:  connection to other side has died
2016-09-08 09:44:06 MSK [938]: [135-1] db=,appname=,user=,client= LOG:  worker process: pglogical apply 186620:1800593394 (PID 6937) exited with exit code 1
2016-09-08 09:44:06 MSK [938]: [136-1] db=,appname=,user=,client= DEBUG:  unregistering background worker "pglogical apply 186620:1800593394"
2016-09-08 09:44:11 MSK [938]: [137-1] db=,appname=,user=,client= DEBUG:  registering background worker "pglogical apply 186620:1800593394"
2016-09-08 09:44:11 MSK [938]: [138-1] db=,appname=,user=,client= DEBUG:  starting background worker process "pglogical apply 186620:1800593394"
2016-09-08 09:44:11 MSK [25448]: [1-1] db=,appname=,user=,client= LOG:  starting apply for subscription audit2
2016-09-08 09:44:11 MSK [25448]: [2-1] db=,appname=,user=,client= DEBUG:  connecting to provider provider1, dsn host=192.168.0.51 port=5432 dbname=lsd_master_orig
2016-09-08 09:44:12 MSK [25448]: [3-1] db=,appname=,user=,client= DEBUG:  upstream Pg version is 9.5.2
2016-09-08 09:44:12 MSK [25448]: [4-1] db=,appname=,user=,client= DEBUG:  changeset origin forwarding enabled: t

The "upstream Pg version is 9.5.2" line is strange: our PostgreSQL server was 9.5.0 and was then updated to 9.5.4.

postgres=# select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
(1 row)

I have one replication set with one table on the provider:

SELECT pglogical.create_node(
    node_name := 'provider1',
    dsn := 'host=192.168.0.51 port=5432 dbname=lsd_master_orig'
);
select pglogical.create_replication_set('audit_set', TRUE, TRUE, TRUE, FALSE);
select pglogical.replication_set_add_table('audit_set', 'audit.public$mc_case_y2016_m9', true);

subscriber:

SELECT pglogical.create_node(
    node_name := 'subscriber1',
    dsn := 'host=192.168.0.57 port=5432 dbname=lsd_master_orig'
);

SELECT pglogical.create_subscription(
    subscription_name := 'audit2',
    provider_dsn := 'host=192.168.0.51 port=5432 dbname=lsd_master_orig',
    replication_sets := '{audit_set}',
    synchronize_structure := false,
    synchronize_data := true
);
lsd_master_orig=# select pglogical.show_subscription_status();
                                                          show_subscription_status                                                          
--------------------------------------------------------------------------------------------------------------------------------------------
 (audit2,replicating,provider1,"host=192.168.0.51 port=5432 dbname=lsd_master_orig",pgl_lsd_master_orig_provider1_audit2,{audit_set},{all})
(1 row)

I had the same problem with a previous subscription (audit1). So I dropped the database on the subscriber, dropped the pglogical extension on the provider, updated the PostgreSQL packages, and restarted. Then I recreated the database on the subscriber and started over, with no luck.

perf top for the wal sender on the provider:

Samples: 2M of event 'cpu-clock', Event count (approx.): 13123950452                                                                                                                                                                                                          
Overhead  Shared Object        Symbol                                                                                                                                                                                                                                         
  87.27%  postgres             [.] hash_seq_search
   5.37%  postgres             [.] CatalogCacheIdInvalidate
   2.07%  postgres             [.] hash_search_with_hash_value
   1.04%  postgres             [.] LocalExecuteInvalidationMessage
   1.00%  postgres             [.] CallSyscacheCallbacks
   0.61%  postgres             [.] RelfilenodeMapInvalidateCallback
   0.58%  postgres             [.] hash_uint32
   0.52%  postgres             [.] hash_search
   0.32%  postgres             [.] ReorderBufferCommit
   0.25%  postgres             [.] hash_seq_init
   0.21%  postgres             [.] deregister_seq_scan
   0.14%  postgres             [.] RelationCacheInvalidateEntry
   0.14%  postgres             [.] uint32_hash
   0.13%  pglogical_output.so  [.] relmeta_cache_callback
   0.08%  postgres             [.] PlanCacheRelCallback
   0.05%  postgres             [.] InvalidateCatalogSnapshot
   0.05%  pglogical_output.so  [.] hash_search@plt
   0.05%  [kernel]             [k] __do_softirq
   0.04%  postgres             [.] GetCurrentTransactionNestLevel
   0.02%  [kernel]             [k] vmxnet3_poll_rx_only
   0.01%  [kernel]             [k] _raw_spin_unlock_irqrestore
   0.01%  [kernel]             [k] vmxnet3_xmit_frame
   0.00%  [kernel]             [k] tcp_wfree
   0.00%  [kernel]             [k] vmxnet3_tq_tx_complete
   0.00%  [kernel]             [k] __slab_free
   0.00%  [kernel]             [k] kfree
   0.00%  [kernel]             [k] tcp_urg
   0.00%  [kernel]             [k] skb_release_data
   0.00%  [kernel]             [k] bio_advance
   0.00%  [kernel]             [k] tcp_v4_do_rcv
   0.00%  postgres             [.] binaryheap_first
   0.00%  [kernel]             [k] unmap_single
   0.00%  [kernel]             [k] skb_release_head_state
   0.00%  [kernel]             [k] raw_local_deliver
   0.00%  [kernel]             [k] blk_update_bidi_request
   0.00%  [kernel]             [k] tcp_send_delayed_ack
   0.00%  [kernel]             [k] sock_wfree
   0.00%  [kernel]             [k] vmxnet3_rq_rx_complete
   0.00%  [kernel]             [k] read_tsc
   0.00%  [kernel]             [k] check_leaf.isra.7
   0.00%  [kernel]             [k] tg_load_down
   0.00%  [kernel]             [k] __inet_lookup_established
   0.00%  [kernel]             [k] tcp_valid_rtt_meas
   0.00%  postgres             [.] NamespaceCallback
   0.00%  libc-2.17.so         [.] __memcmp_sse4_1
   0.00%  [kernel]             [k] queue_work_on
   0.00%  [kernel]             [k] vmxnet3_unmap_tx_buf
   0.00%  [kernel]             [k] net_rx_action
   0.00%  [kernel]             [k] swiotlb_unmap_page
   0.00%  postgres             [.] ResourceOwnerEnlargeRelationRefs
   0.00%  [kernel]             [k] __slab_alloc
   0.00%  [kernel]             [k] sock_def_readable
   0.00%  postgres             [.] PlanCacheFuncCallback
   0.00%  [kernel]             [k] retint_careful
   0.00%  [kernel]             [k] napi_complete
   0.00%  [kernel]             [k] inet_csk_destroy_sock
   0.00%  [kernel]             [k] consume_skb

strace for the wal sender:

read(14, "H\0\0\0\0\0\0\0`\245{;`\1\0\0\4\0\0\0\0\0\0\0\236t\0\0\0\0\0\0"..., 72) = 72
read(14, "", 0)                         = 0
read(14, "H\0\0\0\0\0\0\0\240\245{;`\1\0\0\4\0\0\0\0\0\0\0\236t\0\0\0\0\0\0"..., 72) = 72
read(14, "", 0)                         = 0
read(14, "H\0\0\0\0\0\0\0\340\245{;`\1\0\0\1\0\0\0\0\0\0\0\177\6\0\0\26\277\236\3"..., 72) = 72
read(14, "", 0)                         = 0
read(14, "H\0\0\0\0\0\0\0@\250{;`\1\0\0\4\0\0\0\0\0\0\0\237t\0\0\0\0\0\0"..., 72) = 72
read(14, "", 0)                         = 0
read(14, "H\0\0\0\0\0\0\0\200\250{;`\1\0\0\4\0\0\0\0\0\0\0\237t\0\0\0\0\0\0"..., 72) = 72
read(14, "", 0)                         = 0

postgresql.conf:

max_worker_processes = 10
max_wal_senders = 10
wal_level = logical
wal_receiver_status_interval=10
shared_preload_libraries = 'pg_stat_statements,pgstattuple,pglogical'
wal_buffers = 64
wal_keep_segments = 1000
max_replication_slots = 10
track_commit_timestamp = on
# rpm -qa | grep postgres
postgresql95-pglogical-output-1.1.0-1_2ndQuadrant.el7.centos.x86_64
postgresql95-libs-9.5.4-1PGDG.rhel7.x86_64
postgresql95-pglogical-debuginfo-1.1.2-1_2ndQuadrant.el7.centos.x86_64
postgresql95-pglogical-1.1.2-1_2ndQuadrant.el7.centos.x86_64
postgresql95-9.5.4-1PGDG.rhel7.x86_64
postgresql95-pglogical-output-debuginfo-1.1.0-1_2ndQuadrant.el7.centos.x86_64
postgresql95-contrib-9.5.4-1PGDG.rhel7.x86_64
postgresql95-debuginfo-9.5.4-1PGDG.rhel7.x86_64
postgresql95-server-9.5.4-1PGDG.rhel7.x86_64

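Not part of the original report, but a quick way to watch whether a stuck slot is making any progress is to check its lag (standard 9.5 catalog functions, run on the provider):

SELECT slot_name, active,
       pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS restart_lag_bytes
FROM pg_replication_slots;
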
PGLogical slave crash loop

The setup:

  • Two PG instances are set up on the same host, with different port numbers and data directories.
  • Streaming replication is set up from one instance to the other.

The problem:

  • While streaming replication is running, we create a table 'xxx' and call pg_switch_xlog(); the slave instance is then promoted, pglogical nodes are created on master and slave, etc., to get a working pglogical replication.
    We test data replication and delays - everything works.
    We get one of two different results:
    1. The 'xxx' table doesn't exist on the slave - but nothing crashes. Understood, we need a delay.
      OR
    2. The 'xxx' table does exist on the slave, but if we try inserting anything into it, the slave goes into an infinite crash-restart-crash loop.
      Both cases can be prevented by delaying the call to pg_switch_xlog() until the 'xxx' table is detected on the slave. However, it would be nice to replace the crash loop with an error message of some kind that suggests what is wrong.

Using synchronize_structure errors out while replicating `create schema pglogical_origin` on 9.4

Note: this appears to relate directly to #35, but I figured I'd report it separately since it doesn't involve upgrades.

I have an existing Vagrant environment that sets up two Ubuntu 16.04 boxes running PostgreSQL 9.4 and configures pglogical replication from one to the other. As of pglogical 1.2.0, I can no longer start replication on the replica node. The logs suggest that the create schema pglogical_origin command is being replicated, which fails because that schema already exists on the replica.

Some other pertinent details:

  • synchronize_structure is set to TRUE
  • the identical steps worked with pglogical 1.1.0 (same scripts)

I appreciate any help you can offer. Thanks!

Relevant logs

2016-10-05 18:05:04 UTC [4683-30] LOG:  registering background worker "pglogical apply 16548:1763399739"
2016-10-05 18:05:04 UTC [4683-31] LOG:  starting background worker process "pglogical apply 16548:1763399739"
2016-10-05 18:05:04 UTC [5057-1] LOG:  starting apply for subscription subscription1
2016-10-05 18:05:05 UTC [5066-1] rep@test_replication ERROR:  schema "pglogical_origin" already exists
2016-10-05 18:05:05 UTC [5066-2] rep@test_replication STATEMENT:  CREATE SCHEMA pglogical_origin;



pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 7; 2615 16551 SCHEMA pglogical_origin postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  schema "pglogical_origin" already exists
    Command was: CREATE SCHEMA pglogical_origin;



2016-10-05 18:05:05 UTC [5057-2] ERROR:  could not execute command "/usr/lib/postgresql/9.4/bin/pg_restore --section="pre-data" --exit-on-error -1 -d "host=192.168.33.10 port=5432 dbname=test_replication user=rep" "/tmp/pglogical-5057.dump""
2016-10-05 18:05:05 UTC [5057-3] LOG:  apply worker [5057] at slot 1 generation 1 crashed
2016-10-05 18:05:05 UTC [4683-32] LOG:  worker process: pglogical apply 16548:1763399739 (PID 5057) exited with exit code 1

ERROR: permission denied for schema pglogical

Hi,

I get an error when I refresh a materialized view or when I run a script that contains a "drop temporary table" statement. In both cases I get the error below, and I'm forced to run "grant usage on schema pglogical" for the user that executes the statement.

ERROR: permission denied for schema pglogical
CONTEXT: SQL statement "DROP TABLE pg_temp_6.pg_temp_22541_2, pg_temp_6.pg_temp_22541"
STATEMENT: REFRESH MATERIALIZED VIEW CONCURRENTLY blablabla WITH DATA;
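
The workaround the reporter mentions, spelled out (app_user is a placeholder for the role that runs the REFRESH or the script):

GRANT USAGE ON SCHEMA pglogical TO app_user;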

ERROR relation "pglogical.replication_set_relation" does not exist after extension update

After updating the extension from 1.0.1 to 1.1.2 and restarting the provider DB, I see repeated errors:

< 2016-06-23 10:00:01.856 MSK >LOG:  connection authorized: user=xdray database=xdray
< 2016-06-23 10:01:04.359 MSK >LOG:  connection received: host=172.21.206.240 port=56735
< 2016-06-23 10:01:04.364 MSK >LOG:  replication connection authorized: user=xdray
< 2016-06-23 10:01:04.367 MSK >LOG:  starting logical decoding for slot "pgl_xdray_msk_xdray_1_transactions_msk"
< 2016-06-23 10:01:04.367 MSK >DETAIL:  streaming transactions committing after 18/EC0F0B8, reading WAL from 18/EC0F0B8
< 2016-06-23 10:01:04.387 MSK >LOG:  logical decoding found consistent point at 18/EC0F0B8
< 2016-06-23 10:01:04.387 MSK >DETAIL:  Logical decoding will begin using saved snapshot.
< 2016-06-23 10:01:04.388 MSK >ERROR:  relation "pglogical.replication_set_relation" does not exist
< 2016-06-23 10:01:04.388 MSK >CONTEXT:  slot "pgl_xdray_msk_xdray_1_transactions_msk", output plugin "pglogical_output", in the change callback, associated LSN 18/EC5E908

The provider DB was under load (inserts and updates).

psql (9.5.2)
Type "help" for help.

xdray=# \dx pglogical
                   List of installed extensions
   Name    | Version |  Schema   |          Description
-----------+---------+-----------+--------------------------------
 pglogical | 1.1.2   | pglogical | PostgreSQL Logical Replication
(1 row)

xdray=# \dt pglogical.replication_set_relation
                    List of relations
  Schema   |           Name           | Type  |  Owner
-----------+--------------------------+-------+----------
 pglogical | replication_set_relation | table | postgres
(1 row)

Upgrading to 1.3

We are currently running:

postgresql94-pglogical-output-1.1.0-1_2ndQuadrant.el6.x86_64
pglogical-rhel-1.0-2.noarch
postgresql94-pglogical-1.1.2-1_2ndQuadrant.el6.x86_64

We would like to use the new row filtering that seems to be available in the latest version. Do we only need to run the "1.20->1.3.0.sql" file? Or do we need to install the new RPMs and redo all of our subscriptions, etc. - essentially re-install on every DB?

Also, we are running PostgreSQL 9.4.
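
For what it's worth, the usual extension-upgrade path looks like the sketch below. Assumptions, not verified here: the package names are illustrative, and existing nodes and subscriptions survive an in-place extension update, so no re-subscribe should be needed - testing on a throwaway cluster first is wise.

# on every node: install the new packages, then restart PostgreSQL
# so the new pglogical.so is loaded
yum update postgresql94-pglogical postgresql94-pglogical-output
service postgresql-9.4 restart

# then, in every database that uses pglogical:
psql -d mydb -c "ALTER EXTENSION pglogical UPDATE;"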

Duplicate rows on initial replication

postgresql version: 9.4.9
pglogical version: 1.2.1

We are seeing intermittent failures in initial replication that appear to be caused by some rows getting replicated twice. We're still working on a repro for this, but it's something along these lines:

  1. Have a table on the source-side with a uuid primary key (not sure if that matters), some other columns, and a last_active timestamp (which is populated by the application, not the database)
  2. Add all tables in the public schema to the default replication set, using replication_set_add_all_tables with synchronize_data=true
  3. While the source continues to serve requests as normal, create a subscription on the replica
    a. All the tables get created on the replica side
    b. Two rows appear on the replica side, each with the same uuid primary key, but with different last_active timestamps
    c. PostgreSQL logs an error, saying "could not create unique index ..." for the primary key
    d. pglogical logs an error saying "subscriber initialization failed during nonrecoverable step (c), please try the setup again"

It /appears/ to be correlated with write activity on the source side, but that's only anecdotal at this point.

Document requirement for normal and replication pg_hba.conf entries

Hi,

When I configured the provider, I added an entry under the replication section of pg_hba.conf, as described in the docs.
However, I still get the error, and I have to add an extra entry under IPv4 for the particular database I want to replicate.

Why? From the docs, it seems the replication entry should be enough.
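
This matches how pglogical works: it opens both a regular libpq connection and a walsender (replication) connection to the provider, so pg_hba.conf needs an entry for each. An illustrative pair (network, user, database, and auth method are placeholders):

host    replication    rep_user    192.168.0.0/24    md5
host    mydb           rep_user    192.168.0.0/24    md5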

conflict_resolution setting is ignored

Hello,

I have set the conflict_resolution to 'last_update_wins':

set pglogical.conflict_resolution = 'first_update_wins';

But every time I change something on the provider host, the data is updated on the subscriber and the logfile says:

remote UPDATE on relation xxxxx. Resolution: apply_remote.

What did I do wrong?

Thanks,
Michael
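
A possible explanation, not confirmed in this thread: pglogical.conflict_resolution is read by the apply worker on the subscriber, so a SET in an interactive session does not affect it, and the *_update_wins resolutions additionally require commit timestamps. A sketch of the subscriber-side configuration:

# postgresql.conf on the subscriber
pglogical.conflict_resolution = 'last_update_wins'
track_commit_timestamp = on    # needed for first/last_update_wins; changing it requires a restart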

Fatal replication slot does not exist error after sync with concurrent traffic

Posted on pglogical/bdr list and Craig Ringer suggested I post an issue here. The database for which we are attempting to sync data is very active, and the issue we are having only occurs when we try to sync a table that is several million rows or larger. Other tables sync perfectly fine. What happens on larger tables is that the sync appears to be working fine, but as soon as initial copy finishes, it attempts to reconnect and continue the subscription after that point, and a fatal error is created because it is looking for a replication slot that does not exist.

I have reproduced this scenario multiple times, but it only fails on larger tables. Here is my setup:

9.5 provider to 9.5 subscriber pglogical replication (the 9.5 provider also has a 9.5 streaming replica, which still works fine).

Here are the steps I followed:

(1) Installed extensions and the replication role in both places:

cluster_a: db_prod:

SET ROLE postgres;  CREATE EXTENSION IF NOT EXISTS pglogical;

CREATE EXTENSION

cluster_c: db_prod:

SET ROLE postgres;  CREATE EXTENSION IF NOT EXISTS pglogical;

CREATE EXTENSION

(2) Add provider node

cluster_a: db_prod:

SELECT pglogical.create_node(
  node_name := 'provider',
  dsn := 'host=0.0.0.0 port=0 dbname=db_prod user=foo'
  );

(3) Add custom replication set
cluster_a: monitor_prod:

SELECT pglogical.create_replication_set
(set_name:='insert_update'
,replicate_insert:=TRUE
,replicate_update:=TRUE
,replicate_delete:=FALSE
,replicate_truncate:=FALSE);

(4) Add subscriber node
cluster_c: db_prod:

SELECT pglogical.create_node(
  node_name := 'subscriber',
  dsn := 'host=0.0.0.1 port=0 dbname=db_prod user=foo'
  );

(5) Copy all table structure from provider to subscriber (via pg_dump)
(6) Added subscription for this replication set on subscriber
cluster_c: db_prod:

SELECT pglogical.create_subscription(
  subscription_name := 'insert_update_subscription_to_cluster_a_db_prod'
  , provider_dsn := 'host=0.0.0.0 dbname=db_prod port=0 user=foo'
  , replication_sets := ARRAY['insert_update']
  , synchronize_structure := FALSE
  , synchronize_data := FALSE
);

(7) Added 1 table to 1 replication set on provider which replicates INSERT and UPDATE only with synchronize_data = true to start copy
monitoring_a: monitor_prod:

SELECT n.nspname, c.relname, pglogical.replication_set_add_table(
  set_name:='insert_update'
  ,relation:=c.oid
  --default synchronize_data is false
  ,synchronize_data:=true
)
FROM pg_class c
INNER JOIN pg_namespace n
  ON c.relnamespace = n.oid
  AND n.nspname ~* '^foo$'
  AND c.relname ~* '^bar$'
  AND c.relkind = 'r'
  AND c.relpersistence = 'p';

I can't enable logging level all at this time, but I can provide the logs relevant to replication only. Here they are from the point of creating the nodes to the point of the error:

SUBSCRIBER:

2016-09-30 14:12:18.977 CDT,,,70701,,57eeb912.1142d,1,,2016-09-30 14:12:18 CDT,5/37559,0,LOG,00000,"starting pglogical database manager for database db_prod",,,,,,,,,"pglogical manager 16415"
2016-09-30 14:14:18.539 CDT,,,78720,"0.0.0.2:10",57eeb98a.13380,1,"",2016-09-30 14:14:18 CDT,,0,LOG,00000,"connection received: host=0.0.0.2 port=10",,,,,,,,,""
2016-09-30 14:14:18.558 CDT,"userfoo","db_prod",78720,"0.0.0.2:10",57eeb98a.13380,2,"authentication",2016-09-30 14:14:18 CDT,6/30594,0,LOG,00000,"connection authorized: user=userfoo database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.566 CDT,"userfoo","db_prod",78720,"0.0.0.2:10",57eeb98a.13380,3,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.027 user=userfoo database=db_prod host=0.0.0.2 port=10",,,,,,,,,"psql"
2016-09-30 14:14:18.685 CDT,,,78721,"0.0.0.2:11",57eeb98a.13381,1,"",2016-09-30 14:14:18 CDT,,0,LOG,00000,"connection received: host=0.0.0.2 port=11",,,,,,,,,""
2016-09-30 14:14:18.707 CDT,"userfoo","db_prod",78721,"0.0.0.2:11",57eeb98a.13381,2,"authentication",2016-09-30 14:14:18 CDT,6/30596,0,LOG,00000,"connection authorized: user=userfoo database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.745 CDT,,,78722,"0.0.0.1:12",57eeb98a.13382,1,"",2016-09-30 14:14:18 CDT,,0,LOG,00000,"connection received: host=0.0.0.1 port=12",,,,,,,,,""
2016-09-30 14:14:18.758 CDT,"userreplication","db_prod",78722,"0.0.0.1:12",57eeb98a.13382,2,"authentication",2016-09-30 14:14:18 CDT,7/55237,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.760 CDT,"userreplication","db_prod",78722,"0.0.0.1:12",57eeb98a.13382,3,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.015 user=userreplication database=db_prod host=0.0.0.1 port=12",,,,,,,,,"create_subscription"
2016-09-30 14:14:18.767 CDT,"userfoo","db_prod",78721,"0.0.0.2:11",57eeb98a.13381,3,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.081 user=userfoo database=db_prod host=0.0.0.2 port=11",,,,,,,,,"psql"
2016-09-30 14:14:18.768 CDT,,,78723,,57eeb98a.13383,1,,2016-09-30 14:14:18 CDT,7/0,0,LOG,00000,"starting apply for subscription insert_update_subscription_to_cluster_a_db_prod",,,,,,,,,"pglogical apply 16415:1811666624"
2016-09-30 14:16:37.081 CDT,,,86472,"0.0.0.2:13",57eeba15.151c8,1,"",2016-09-30 14:16:37 CDT,,0,LOG,00000,"connection received: host=0.0.0.2 port=13",,,,,,,,,""
2016-09-30 14:16:37.103 CDT,"userfoo","postgres",86472,"0.0.0.2:13",57eeba15.151c8,2,"authentication",2016-09-30 14:16:37 CDT,6/30616,0,LOG,00000,"connection authorized: user=userfoo database=postgres SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:16:38.318 CDT,,,86473,"0.0.0.2:14",57eeba16.151c9,1,"",2016-09-30 14:16:38 CDT,,0,LOG,00000,"connection received: host=0.0.0.2 port=14",,,,,,,,,""
2016-09-30 14:16:38.337 CDT,"userfoo","db_prod",86473,"0.0.0.2:14",57eeba16.151c9,2,"authentication",2016-09-30 14:16:38 CDT,8/25684,0,LOG,00000,"connection authorized: user=userfoo database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:16:38.341 CDT,"userfoo","postgres",86472,"0.0.0.2:13",57eeba15.151c8,3,"idle",2016-09-30 14:16:37 CDT,,0,LOG,00000,"disconnection: session time: 0:00:01.260 user=userfoo database=postgres host=0.0.0.2 port=13",,,,,,,,,"psql"
2016-09-30 14:16:46.601 CDT,,,53136,,57eba236.cf90,1618,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2016-09-30 14:16:50.130 CDT,,,53136,,57eba236.cf90,1619,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 36 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.511 s, sync=0.011 s, total=3.529 s; sync files=24, longest=0.002 s, average=0.000 s; distance=22 kB, estimate=423536 kB",,,,,,,,,""
2016-09-30 14:17:19.726 CDT,"userfoo","db_prod",86473,"0.0.0.2:14",57eeba16.151c9,3,"idle",2016-09-30 14:16:38 CDT,,0,LOG,00000,"disconnection: session time: 0:00:41.408 user=userfoo database=db_prod host=0.0.0.2 port=14",,,,,,,,,"psql"
2016-09-30 14:18:10.686 CDT,,,94965,,57eeba72.172f5,1,,2016-09-30 14:18:10 CDT,6/0,0,LOG,00000,"starting sync of table schema_name.table_name for subscriber insert_update_subscription_to_cluster_a_db_prod",,,,,,,,,"pglogical sync table_name 16415:1811666624"
2016-09-30 14:18:10.740 CDT,,,94966,"0.0.0.1:15",57eeba72.172f6,1,"",2016-09-30 14:18:10 CDT,,0,LOG,00000,"connection received: host=0.0.0.1 port=15",,,,,,,,,""
2016-09-30 14:18:10.752 CDT,"userreplication","db_prod",94966,"0.0.0.1:15",57eeba72.172f6,2,"authentication",2016-09-30 14:18:10 CDT,8/25698,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:18:28.689 CDT,,,53136,,57eba236.cf90,1620,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
2016-09-30 14:18:45.579 CDT,,,53136,,57eba236.cf90,1621,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 25401 buffers (19.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=16.629 s, sync=0.218 s, total=16.890 s; sync files=21, longest=0.093 s, average=0.010 s; distance=718625 kB, estimate=718625 kB",,,,,,,,,""
2016-09-30 14:18:48.636 CDT,,,53136,,57eba236.cf90,1622,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoints are occurring too frequently (20 seconds apart)",,"Consider increasing the configuration parameter ""max_wal_size"".",,,,,,,""
2016-09-30 14:18:48.636 CDT,,,53136,,57eba236.cf90,1623,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
2016-09-30 14:19:06.373 CDT,,,53136,,57eba236.cf90,1624,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 26826 buffers (20.5%); 0 transaction log file(s) added, 0 removed, 44 recycled; write=17.106 s, sync=0.286 s, total=17.736 s; sync files=5, longest=0.123 s, average=0.057 s; distance=730917 kB, estimate=730917 kB",,,,,,,,,""
2016-09-30 14:19:08.431 CDT,,,53136,,57eba236.cf90,1625,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoints are occurring too frequently (20 seconds apart)",,"Consider increasing the configuration parameter ""max_wal_size"".",,,,,,,""
2016-09-30 14:19:08.431 CDT,,,53136,,57eba236.cf90,1626,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
2016-09-30 14:19:15.397 CDT,"userreplication","db_prod",94966,"0.0.0.1:15",57eeba72.172f6,3,"idle",2016-09-30 14:18:10 CDT,,0,LOG,00000,"disconnection: session time: 0:01:04.657 user=userreplication database=db_prod host=0.0.0.1 port=15",,,,,,,,,"pglogical_copy"
2016-09-30 14:19:15.532 CDT,,,94965,,57eeba72.172f5,2,,2016-09-30 14:18:10 CDT,6/0,0,FATAL,XX000,"could not send replication command ""START_REPLICATION SLOT ""pgl_db_prod_provider_insert_u6bfbd6c_7a94e8ae"" LOGICAL FBF/E4FDF518 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', ""binary.want_internal_basetypes"" '1', ""binary.want_binary_basetypes"" '1', ""binary.basetypes_major_version"" '905', ""binary.sizeof_datum"" '8', ""binary.sizeof_int"" '4', ""binary.sizeof_long"" '8', ""binary.bigendian"" '0', ""binary.float4_byval"" '1', ""binary.float8_byval"" '1', ""binary.integer_datetimes"" '1', ""hooks.setup_function"" 'pglogical.pglogical_hooks_setup', ""pglogical.forward_origins"" 'all', ""pglogical.replicate_only_table"" 'schema_name.table_name', ""relmeta_cache_size"" '-1', pg_version '90503', pglogical_version '1.1.2', pglogical_version_num '10102')"": ERROR:  replication slot ""pgl_db_prod_provider_insert_u6bfbd6c_7a94e8ae"" does not exist

, sqlstate: 42704",,,,,,,,,"pglogical sync table_name 16415:1811666624"
2016-09-30 14:19:15.536 CDT,,,53133,,57eba236.cf8d,57,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"worker process: pglogical sync table_name 16415:1811666624 (PID 94965) exited with exit code 1",,,,,,,,,""
2016-09-30 14:23:25.482 CDT,,,53136,,57eba236.cf90,1627,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 26108 buffers (19.9%); 0 transaction log file(s) added, 0 removed, 44 recycled; write=256.950 s, sync=0.023 s, total=257.051 s; sync files=8, longest=0.014 s, average=0.002 s; distance=716933 kB, estimate=729518 kB",,,,,,,,,""
2016-09-30 14:24:08.526 CDT,,,53136,,57eba236.cf90,1628,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2016-09-30 14:27:51.045 CDT,"postgres","db_prod",85725,"[local]",57eeaea6.14edd,4,"idle",2016-09-30 13:27:50 CDT,,0,LOG,00000,"disconnection: session time: 1:00:00.226 user=postgres database=db_prod host=[local]",,,,,,,,,""
2016-09-30 14:27:51.047 CDT,,,33473,"[local]",57eebcb7.82c1,1,"",2016-09-30 14:27:51 CDT,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,""
2016-09-30 14:27:51.050 CDT,"postgres","db_prod",33473,"[local]",57eebcb7.82c1,2,"authentication",2016-09-30 14:27:51 CDT,3/98370,0,LOG,00000,"connection authorized: user=postgres database=db_prod",,,,,,,,,""
2016-09-30 14:28:38.110 CDT,,,53136,,57eba236.cf90,1629,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 10783 buffers (8.2%); 0 transaction log file(s) added, 0 removed, 44 recycled; write=269.514 s, sync=0.016 s, total=269.584 s; sync files=14, longest=0.006 s, average=0.001 s; distance=248512 kB, estimate=681418 kB",,,,,,,,,""
2016-09-30 14:29:08.141 CDT,,,53136,,57eba236.cf90,1630,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2016-09-30 14:29:08.205 CDT,,,53136,,57eba236.cf90,1631,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 1 buffers (0.0%); 0 transaction log file(s) added, 6 removed, 9 recycled; write=0.005 s, sync=0.001 s, total=0.064 s; sync files=1, longest=0.001 s, average=0.001 s; distance=15 kB, estimate=613277 kB",,,,,,,,,""
2016-09-30 14:34:08.305 CDT,,,53136,,57eba236.cf90,1632,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2016-09-30 14:34:08.316 CDT,,,53136,,57eba236.cf90,1633,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.000 s, total=0.010 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=551950 kB",,,,,,,,,""
2016-09-30 14:39:08.405 CDT,,,53136,,57eba236.cf90,1634,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2016-09-30 14:39:08.415 CDT,,,53136,,57eba236.cf90,1635,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.000 s, total=0.009 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=496755 kB",,,,,,,,,""
2016-09-30 14:43:20.972 CDT,,,97380,"0.0.0.2:10",57eec058.17c64,1,"",2016-09-30 14:43:20 CDT,,0,LOG,00000,"connection received: host=0.0.0.2 port=10",,,,,,,,,""
2016-09-30 14:43:20.992 CDT,"userfoo","db_prod",97380,"0.0.0.2:10",57eec058.17c64,2,"authentication",2016-09-30 14:43:20 CDT,6/30783,0,LOG,00000,"connection authorized: user=userfoo database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:43:21.003 CDT,,,97381,,57eec058.17c65,1,,2016-09-30 14:43:20 CDT,8/0,0,LOG,00000,"starting sync of table schema_name.table_name for subscriber insert_update_subscription_to_cluster_a_db_prod",,,,,,,,,"pglogical sync table_name 16415:1811666624"
2016-09-30 14:43:23.338 CDT,,,97381,,57eec058.17c65,2,,2016-09-30 14:43:20 CDT,8/25702,5550,ERROR,XX000,"tuple concurrently updated",,,,,,,,,"pglogical sync table_name 16415:1811666624"
2016-09-30 14:43:23.340 CDT,"userfoo","db_prod",97380,"0.0.0.2:10",57eec058.17c64,3,"idle",2016-09-30 14:43:20 CDT,,0,LOG,00000,"disconnection: session time: 0:00:02.367 user=userfoo database=db_prod host=0.0.0.2 port=10",,,,,,,,,"psql"
2016-09-30 14:43:23.341 CDT,,,53133,,57eba236.cf8d,58,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"worker process: pglogical sync table_name 16415:1811666624 (PID 97381) exited with exit code 1",,,,,,,,,""
2016-09-30 14:44:08.515 CDT,,,53136,,57eba236.cf90,1636,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2016-09-30 14:44:08.932 CDT,,,53136,,57eba236.cf90,1637,,2016-09-28 05:57:58 CDT,,0,LOG,00000,"checkpoint complete: wrote 5 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.405 s, sync=0.004 s, total=0.417 s; sync files=5, longest=0.002 s, average=0.000 s; distance=3 kB, estimate=447080 kB",,,,,,,,,""

PROVIDER:

2016-09-30 14:09:05.994 CDT,,,25148,,57eeb851.623c,1,,2016-09-30 14:09:05 CDT,13/114406,0,LOG,00000,"starting pglogical database manager for database db_prod",,,,,,,,,"pglogical manager 16402"
2016-09-30 14:14:18.726 CDT,"userreplication","db_prod",45784,"0.0.0.1:87",57eeb98a.b2d8,2,"authentication",2016-09-30 14:14:18 CDT,19/1519855,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.730 CDT,"userreplication","db_prod",45784,"0.0.0.1:87",57eeb98a.b2d8,3,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.014 user=userreplication database=db_prod host=0.0.0.1 port=87",,,,,,,,,"create_subscription"
2016-09-30 14:14:18.742 CDT,"userreplication","db_prod",45785,"0.0.0.1:8",57eeb98a.b2d9,2,"authentication",2016-09-30 14:14:18 CDT,19/1519857,0,LOG,00000,"replication connection authorized: user=userreplication SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.744 CDT,"userreplication","db_prod",45785,"0.0.0.1:8",57eeb98a.b2d9,3,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.012 user=userreplication database=db_prod host=0.0.0.1 port=8",,,,,,,,,"create_subscription"
2016-09-30 14:14:18.783 CDT,"userreplication","db_prod",45786,"0.0.0.1:9",57eeb98a.b2da,2,"authentication",2016-09-30 14:14:18 CDT,19/1519858,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.790 CDT,"userreplication","db_prod",45786,"0.0.0.1:9",57eeb98a.b2da,3,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.018 user=userreplication database=db_prod host=0.0.0.1 port=9",,,,,,,,,"pglogical_snapshot"
2016-09-30 14:14:18.804 CDT,"userreplication","db_prod",45787,"0.0.0.1:7",57eeb98a.b2db,2,"authentication",2016-09-30 14:14:18 CDT,19/1519860,0,LOG,00000,"replication connection authorized: user=userreplication SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.822 CDT,"userreplication","db_prod",45787,"0.0.0.1:7",57eeb98a.b2db,3,"idle",2016-09-30 14:14:18 CDT,19/0,0,LOG,00000,"logical decoding found consistent point at FBF/E3129028","There are no running transactions.",,,,,,,,"pglogical_snapshot"
2016-09-30 14:14:18.822 CDT,"userreplication","db_prod",45787,"0.0.0.1:7",57eeb98a.b2db,4,"idle",2016-09-30 14:14:18 CDT,19/1519861,1117040183,LOG,00000,"exported logical decoding snapshot: ""4294AE37-1"" with 0 transaction IDs",,,,,,,,,"pglogical_snapshot"
2016-09-30 14:14:18.827 CDT,"userreplication","db_prod",45787,"0.0.0.1:7",57eeb98a.b2db,5,"idle in transaction",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.035 user=userreplication database=db_prod host=0.0.0.1 port=7",,,,,,,,,"pglogical_snapshot"
2016-09-30 14:14:18.838 CDT,"userreplication","db_prod",45788,"0.0.0.1:4",57eeb98a.b2dc,2,"authentication",2016-09-30 14:14:18 CDT,19/1519862,0,LOG,00000,"replication connection authorized: user=userreplication SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:14:18.841 CDT,"userreplication","db_prod",45788,"0.0.0.1:4",57eeb98a.b2dc,3,"idle",2016-09-30 14:14:18 CDT,19/0,0,LOG,00000,"starting logical decoding for slot ""pgl_db_prod_provider_insert_u6bfbd6c""","streaming transactions committing after FBF/E3129060, reading WAL from FBF/E3129028",,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:14:18.841 CDT,"userreplication","db_prod",45788,"0.0.0.1:4",57eeb98a.b2dc,4,"idle",2016-09-30 14:14:18 CDT,19/0,0,LOG,00000,"logical decoding found consistent point at FBF/E3129028","There are no running transactions.",,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:18:10.706 CDT,"userreplication","db_prod",62095,"0.0.0.1:2",57eeba72.f28f,2,"authentication",2016-09-30 14:18:10 CDT,14/486324,0,LOG,00000,"replication connection authorized: user=userreplication SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:18:10.719 CDT,"userreplication","db_prod",62095,"0.0.0.1:2",57eeba72.f28f,3,"idle",2016-09-30 14:18:10 CDT,14/0,0,LOG,00000,"logical decoding found consistent point at FBF/E4FDF4E0","There are no running transactions.",,,,,,,,"pglogical_copy"
2016-09-30 14:18:10.720 CDT,"userreplication","db_prod",62095,"0.0.0.1:2",57eeba72.f28f,4,"idle",2016-09-30 14:18:10 CDT,14/486325,1117042620,LOG,00000,"exported logical decoding snapshot: ""4294B7BC-1"" with 0 transaction IDs",,,,,,,,,"pglogical_copy"
2016-09-30 14:18:10.736 CDT,"userreplication","db_prod",62096,"0.0.0.1:6",57eeba72.f290,2,"authentication",2016-09-30 14:18:10 CDT,17/1418131,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:19:11.961 CDT,"userreplication","db_prod",62096,"0.0.0.1:6",57eeba72.f290,3,"idle",2016-09-30 14:18:10 CDT,,0,LOG,00000,"disconnection: session time: 0:01:01.236 user=userreplication database=db_prod host=0.0.0.1 port=6",,,,,,,,,"pglogical_copy"
2016-09-30 14:19:15.398 CDT,"userreplication","db_prod",62095,"0.0.0.1:2",57eeba72.f28f,5,"idle in transaction",2016-09-30 14:18:10 CDT,,0,LOG,00000,"disconnection: session time: 0:01:04.708 user=userreplication database=db_prod host=0.0.0.1 port=2",,,,,,,,,"pglogical_copy"
2016-09-30 14:19:15.414 CDT,"userreplication","db_prod",65996,"0.0.0.1:1",57eebab3.101cc,2,"authentication",2016-09-30 14:19:15 CDT,14/486326,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:19:15.423 CDT,"userreplication","db_prod",65996,"0.0.0.1:1",57eebab3.101cc,3,"idle",2016-09-30 14:19:15 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.023 user=userreplication database=db_prod host=0.0.0.1 port=1",,,,,,,,,"cleanup"
2016-09-30 14:19:15.530 CDT,"userreplication","db_prod",65997,"0.0.0.1:3",57eebab3.101cd,2,"authentication",2016-09-30 14:19:15 CDT,14/486329,0,LOG,00000,"replication connection authorized: user=userreplication SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:19:15.533 CDT,"userreplication","db_prod",65997,"0.0.0.1:3",57eebab3.101cd,3,"idle",2016-09-30 14:19:15 CDT,14/0,0,ERROR,42704,"replication slot ""pgl_db_prod_provider_insert_u6bfbd6c_7a94e8ae"" does not exist",,,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:19:15.536 CDT,"userreplication","db_prod",65997,"0.0.0.1:3",57eebab3.101cd,4,"idle",2016-09-30 14:19:15 CDT,14/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:19:15.536 CDT,"userreplication","db_prod",65997,"0.0.0.1:3",57eebab3.101cd,5,"idle",2016-09-30 14:19:15 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.017 user=userreplication database=db_prod host=0.0.0.1 port=3",,,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:20:15.518 CDT,"userreplication","db_prod",45788,"0.0.0.1:4",57eeb98a.b2dc,5,"idle",2016-09-30 14:14:18 CDT,19/0,0,LOG,00000,"terminating walsender process due to replication timeout",,,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:20:15.518 CDT,"userreplication","db_prod",45788,"0.0.0.1:4",57eeb98a.b2dc,6,"idle",2016-09-30 14:14:18 CDT,,0,LOG,00000,"disconnection: session time: 0:05:56.689 user=userreplication database=db_prod host=0.0.0.1 port=4",,,,,,,,,"insert_update_subscription_to_cluster_a_db_prod"
2016-09-30 14:43:23.329 CDT,"userreplication","db_prod",66389,"0.0.0.1:5",57eec05b.10355,2,"authentication",2016-09-30 14:43:23 CDT,16/391063,0,LOG,00000,"connection authorized: user=userreplication database=db_prod SSL enabled (protocol=TLSv1.2, cipher=X-Y-Z-X-Y-Z, compression=on)",,,,,,,,,""
2016-09-30 14:43:23.336 CDT,"userreplication","db_prod",66389,"0.0.0.1:5",57eec05b.10355,3,"idle",2016-09-30 14:43:23 CDT,,0,LOG,00000,"disconnection: session time: 0:00:00.015 user=userreplication database=db_prod host=0.0.0.1 port=5",,,,,,,,,"cleanup"

Any insight is much appreciated!
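
Not from the original report, but one diagnostic that may help: the missing slot name carries an extra suffix ("..._u6bfbd6c_7a94e8ae") compared to the subscription's main slot ("..._u6bfbd6c"), which suggests it is the table-sync worker's own temporary slot. Listing the provider's slots right after the copy finishes shows whether it ever existed or was dropped early:

SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'pgl_%';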

Understanding Sequence replication process

Hi,

I'm using:

  • pglogical | 1.2.1
  • PostgreSQL 9.6.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
  • OS Debian Jessy

On the master I create a sample table with a serial column (sequence):
create table herve_wseq (id serial primary key, val text);

I do the same on the slave with the same command:
create table herve_wseq (id serial primary key, val text);

Then I add it to replication from the master:

SELECT pglogical.replication_set_add_table(
  set_name := 'default',
  relation := 'herve_wseq',
  synchronize_data := TRUE
);

Then I add data like this on the master:
insert into herve_wseq (val) values ('test');
insert into herve_wseq (val) values ('test2');
etc.

On the master I now have:

master=# select * from herve_wseq;
 id |  val
----+-------
  1 | test
  2 | test2
  3 | test3
  4 | test4
  5 | test5
(5 rows)

On the slave I have:

pglogical=# select * from herve_wseq;
 id |  val
----+-------
  1 | test
  2 | test2
  3 | test3
  4 | test4
  5 | test5
(5 rows)

Up to this point everything is perfect and replication is working, but then I look at the sequence itself for this table.

Master:

master=# select * from herve_wseq_id_seq;
   sequence_name   | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
-------------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
 herve_wseq_id_seq |          5 |           1 |            1 | 9223372036854775807 |         1 |           1 |      32 | f         | t
(1 row)

Slave:

pglogical=# select * from herve_wseq_id_seq;
   sequence_name   | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
-------------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
 herve_wseq_id_seq |       1004 |           1 |            1 | 9223372036854775807 |         1 |           1 |       0 | f         | t
(1 row) 

So why is the slave's last_value for the sequence 1004 and not 5?
Did I miss something in the way I set this up?
How can I solve this situation... I can't have the sequence on the slave with a different value than on my master.

Thanks in advance for your answer.
Hervé
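
For context: pglogical replicates sequences asynchronously and deliberately keeps the subscriber's value ahead of the provider's (the exact headroom is an implementation detail), so a last_value of 1004 against the master's 5 is expected rather than a bug. Assuming the sequence is wanted in replication at all, its state can be pushed on demand once it is in a replication set:

-- on the provider
SELECT pglogical.replication_set_add_sequence('default', 'herve_wseq_id_seq');
SELECT pglogical.synchronize_sequence('herve_wseq_id_seq');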

create_subscription() with synchronize_structure := true creates objects not in the replication set

On the provider (pg1):

create database 94to94p;
\c 94to94p
create table a (z integer primary key);
create schema s1;
create table s1.a (z integer primary key);
create schema s2;
create table s2.a (z integer primary key);
select * from pglogical.create_replication_set(set_name := 's1');

On the subscriber (pg2):

create database 94to94s;
\c 94to94s
SELECT pglogical.create_subscription(subscription_name := 'subscription1', replication_sets := ARRAY['s1'], provider_dsn := 'host=pg1 dbname=94to94p');

Schemas s1 and s2, and tables public.a, s1.a and s2.a, are all created on the subscriber.

how are data changes handled during initial data synchronization?

We aim to do an online upgrade from 9.4 to 9.5 with pglogical. Our tables are quite large (37 GB) and take over 1 hr to synchronize with SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public'], true);

Will data changes made during the synchronization be missing from the subscriber? If so, is there a way to ensure those changes are carried over to the subscriber?

`replication_set_add_all_tables` function causes `ERROR: relation "public.pg_class" does not exist`

Update: reproduction scripts here

I've been seeing some strange behavior around pglogical.replication_set_add_all_tables. Under certain circumstances (I haven't been able to narrow it down too much, but I'll add details below), calling the function causes the following to start flooding the logs:

2016-08-12 22:07:45 UTC [12175-1] test@test_replication ERROR:  relation "public.pg_class" does not exist                   
2016-08-12 22:07:45 UTC [12175-2] test@test_replication STATEMENT:  COPY "public"."pg_class" TO stdout                      
2016-08-12 22:07:45 UTC [12175-3] test@test_replication LOG:  could not receive data from client: Connection reset by peer  
2016-08-12 22:07:45 UTC [12174-4] test@test_replication LOG:  could not receive data from client: Connection reset by peer  
2016-08-12 22:07:45 UTC [12174-5] test@test_replication LOG:  unexpected EOF on client connection with an open transaction  

Note the reference to the nonexistent relation public.pg_class, which presumably comes from the system catalog pg_class.

I don't have a minimal set of steps to reproduce the bug yet, but here's what I've got:

  • this doesn't seem to happen the first time we call pglogical.replication_set_add_all_tables, which happens before there are any subscribers or tables.
  • we're adding all tables in the public schema, as you might guess from the ERROR.
  • when this happens, there's already a subscriber node with an active subscription and populated tables.

Again, I apologize that this isn't a comprehensive step-by-step reproduction, but I figured I'd get this filed while I work on that just to see if anything rings a bell.
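
Not part of the report, but when this occurs it may be worth dumping exactly which relations ended up in replication sets; pglogical.tables is the documented view of tables in replication sets, and a stray catalog relation should show up there:

SELECT * FROM pglogical.tables;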

No data copied from provider to subscriber

CREATE EXTENSION pglogical;

on SUBSCRIBER HOST:


SELECT pglogical.create_node(
    node_name := 'provider1',
    dsn := 'host=1.1.1.1 port=5432 dbname=DB'
);

SELECT pglogical.create_subscription(
    subscription_name := 'subscription1',
    provider_dsn := 'host=1.1.1.1 port=5432 dbname=DB',
    synchronize_structure := false,
    synchronize_data := true
);

on PROVIDER host:

SELECT pglogical.create_node(
    node_name := 'subscriber1',
    dsn := 'host=1.1.1.2 port=5435 dbname=DB'
);

select pglogical.replication_set_add_all_tables('default', ARRAY['public']);

select pglogical.replication_set_add_table(
 set_name := 'default',
 relation := 'public.mytable',
 synchronize_data := true);


select * from pg_stat_replication;


  pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 13648 |       10 | postgres | subscription1    | 1.1.1.2  |                 |       48054 | 2016-10-04 17:00:56.730186+02 |              | streaming | 0/1947E3C8    | 0/1947E3C8     | 0/1947E3C8     | 0/1947E3C8      |             0 | async
(1 row)

Tried also:
select pglogical.alter_subscription_resynchronize_table('subscription1', 'public.go_ovr');

but still:

PROVIDER log says:

postgres@DB CONTEXT: COPY mytable, row 1
STATEMENT: COPY "public"."mytable" FROM stdin
DETAIL: Key (id)=(28) already exists.

SUBSCRIBER LOG:

2016-10-04 16:47:10 CEST [8449-1] LOG: starting pglogical database manager for database DB
2016-10-04 16:47:10 CEST [8450-1] LOG: starting apply for subscription subscription1
2016-10-04 16:47:11 CEST [8463-1] LOG: manager worker [8463] at slot 3 generation 1 detaching cleanly

I've created only the structure on the SUBSCRIBER host (with pg_dump -s); no data are present.
Why does the provider keep saying the key already exists if no data are present at all?

DB=# select * from pglogical.local_sync_status;
sync_kind | sync_subid | sync_nspname | sync_relname | sync_status
-----------+------------+--------------+--------------+-------------
d | 1763399739 | | | r
d | 1763399739 | public | mytable | r
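
Both rows above show sync_status 'r', which appears to mean the sync worker already considers those tables synchronized, so it will not copy them again. A hedged sketch for forcing a fresh copy (assumption: resynchronizing truncates the subscriber-side table before re-copying, so try it on a test system first):

-- truncate and re-copy one table from the provider
SELECT pglogical.alter_subscription_resynchronize_table(
    'subscription1', 'public.mytable');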

Enhancement idea - supporting Eventually Consistent requirements in same DB

Hi,

First off, thank you for working on such a great extension to Postgres. Looking over an overview of it at http://bonesmoses.org/2016/10/14/pg-phriday-perfectly-logical/, we see this comment:

if I wanted to make an eventually consistent ‘view’ table that stored things like aggregations, etc. could I (and should I..) create a logical replication slot back into the same DB? Essentially, trying to recreate functionality that an asynchronous trigger might give us.

And this reply:

You probably can, but not with pglogical. So far as I can tell, the table names in the provider and subscriber have to match. So if you somehow managed to get a pglogical node to subscribe to itself for the same subscription set, you’d just end up with a geometric recursive loop. You couldn’t get the inserts redirected into another table within the same database. That’s a cool idea, though. I wonder if they would add recipient table name as an option somewhere.

So I'm opening an issue here so this can be tracked, since I think it would be a useful enhancement. In our particular case, we keep track of multiple hierarchies (paths) that are currently kept updated using triggers, but there's no need to keep them up to date within the insert/update transactions. It'd be great if we could run them in some kind of background job, which this type of improvement would give us.

Document pglogical retry behaviour and keepalives

It would be nice to have a function for an immediate retry of the connection to the provider; after provider downtime it takes a few seconds or minutes to reconnect. It would also be good to see something about the retry interval in show_subscription_status(), maybe only in the down state (the timestamp of the next automatic retry or something similar).

spurious CONFLICTs detected

Setup: two PostgreSQL instances db1 (port 5433) and db2 (port 5434). Data is the pgbench database with an added primary key on pgbench_history (dbname is bench). The db2 instance's "bench" will be subscriber to the db1 instance's "bench".

No matter whether I set up db2 via pglogical_create_subscriber or via SQL pglogical.create_subscription(..., synchronize_structure := 't', synchronize_data := 't'), an update to pgbench_accounts (update pgbench_accounts set abalance = abalance + 1 where aid = 1;) results in a CONFLICT log message:
LOG: CONFLICT: remote UPDATE on relation public.pgbench_accounts. Resolution: apply_remote.
Subsequent UPDATEs to the same row do not yield CONFLICT messages. (Other tables are affected too, but this is the straightforward reproducer.)

Versions:

  • PostgreSQL 9.5.3 on x86_64-unknown-freebsd10.3, compiled by gcc (FreeBSD Ports Collection) 4.8.5, 64-bit; behaviour also reproducible on Linux
  • pg_logical 1.1.2 (as per pg_extension), also reproducible with the currently released Debian packages

Provider disconnect causes extreme logging on subscriber

When the provider node is no longer reachable the subscriber node logs an extremely large amount to the postgresql.log file.

Versions

PostgreSQL 9.5.2
pglogical 1.0.1
pglogical_output 1.0.1

Steps to reproduce

  • Set up a provider and subscriber pair.
  • Kill the PostgreSQL server running on the provider

Observed behavior

The following group of lines is printed ~300 times per second to the postgresql.log file on the subscriber server.

2016-07-26 19:25:34 GMT::5797b92e.70a5:@:[28837]:LOG:  starting apply for subscription region_0_subscription
2016-07-26 19:25:34 GMT::5797b92e.70a5:@:[28837]:ERROR:  could not connect to the postgresql server in replication mode: could not connect to server: Connection refused
                Is the server running on host "192.168.211.161" and accepting
                TCP/IP connections on port 5432?

2016-07-26 19:25:34 GMT::5797b92e.70a5:@:[28837]:DETAIL:  dsn was: dbname='database' host='hostname' user='username' port='5432'
2016-07-26 19:25:34 GMT::5797b103.184b:@:[6219]:LOG:  worker process: pglogical apply 20633:1658409966 (PID 28837) exited with exit code 1

I'm not sure if this is really a bug, but if you don't catch the connection loss quickly, the PostgreSQL logs can easily consume an enormous amount of disk space.

Is there a way to configure a timeout between attempts to start the apply process if we can't connect to the provider?

subscriber lose provider after hard reboot of provider

Environment:

  • pgsql 9.5 and pglogical 1.1.2
  • server dev-xdray-2 provider
  • server dev-xdray-1 subscriber

Hard-reboot dev-xdray-2 (turn off the power).

The subscriber (dev-xdray-1) does not react at all:

xdray@dev-xdray-1 ~]$ ps -ef|grep 29947|grep -v grep
postgres 29947 27840  0 15:11 ?        00:00:00 /usr/pgsql-9.5/bin/postmaster -D /home/db/pgsql/xdray
postgres 29948 29947  0 15:11 ?        00:00:00 postgres: logger process                             
postgres 29950 29947  0 15:11 ?        00:00:00 postgres: checkpointer process                       
postgres 29951 29947  0 15:11 ?        00:00:00 postgres: writer process                             
postgres 29952 29947  0 15:11 ?        00:00:00 postgres: wal writer process                         
postgres 29953 29947  0 15:11 ?        00:00:00 postgres: autovacuum launcher process                
postgres 29954 29947  0 15:11 ?        00:00:00 postgres: archiver process                           
postgres 29955 29947  0 15:11 ?        00:00:00 postgres: stats collector process                    
postgres 29956 29947  0 15:11 ?        00:00:00 postgres: bgworker: pglogical supervisor             
postgres 29967 29947  0 15:11 ?        00:00:00 postgres: bgworker: pglogical manager 16386          
postgres 29972 29947  0 15:11 ?        00:00:00 postgres: bgworker: pglogical apply 16386:306048493   
postgres 31763 29947  0 15:22 ?        00:00:00 postgres: bgworker: pglogical apply 16386:391041143 

Logs are clean.
The process postgres: bgworker: pglogical apply 16386:391041143 communicates with dev-xdray-2, and:

[xdray@dev-xdray-1 ~]$ sudo netstat -anp| grep 31763
tcp        0      0 192.168.0.104:33464         192.168.0.204:5432          ESTABLISHED 31763/postgres: bgw 

Strace:

[root@dev-xdray-1 ~]# strace -p31763
Process 31763 attached
restart_syscall(<... resuming interrupted call ...>) = 0
gettimeofday({1466599566, 879580}, NULL) = 0
gettimeofday({1466599566, 879623}, NULL) = 0
read(10, 0x7ffc6ce549b0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=12, events=POLLIN}, {fd=10, events=POLLIN}, {fd=6, events=POLLIN}], 3, 1000) = 0 (Timeout)
gettimeofday({1466599567, 880303}, NULL) = 0
gettimeofday({1466599567, 880433}, NULL) = 0
read(10, 0x7ffc6ce549b0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=12, events=POLLIN}, {fd=10, events=POLLIN}, {fd=6, events=POLLIN}], 3, 1000) = 0 (Timeout)
gettimeofday({1466599568, 881523}, NULL) = 0
gettimeofday({1466599568, 881560}, NULL) = 0
read(10, 0x7ffc6ce549b0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=12, events=POLLIN}, {fd=10, events=POLLIN}, {fd=6, events=POLLIN}], 3, 1000) = 0 (Timeout)
gettimeofday({1466599569, 882776}, NULL) = 0
gettimeofday({1466599569, 882813}, NULL) = 0
read(10, 0x7ffc6ce549b0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=12, events=POLLIN}, {fd=10, events=POLLIN}, {fd=6, events=POLLIN}], 3, 1000^CProcess 31763 detached
 <detached ...>
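
The apply worker keeps polling a connection whose peer is gone. A possible mitigation sketch: put libpq TCP keepalive options into the subscription's provider DSN so a dead peer is detected at the socket level (the subscription name and dbname below are placeholders for this environment):

SELECT pglogical.create_subscription(
    subscription_name := 'subscription1',  -- hypothetical name
    provider_dsn := 'host=192.168.0.204 port=5432 dbname=xdray keepalives=1 keepalives_idle=10 keepalives_interval=5 keepalives_count=3'
);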

Still having issues getting pglogical compiled on Gentoo

Additional info requested in closed issue:

pglogical version?
1.1.0

make command?
GNU Make 4.1
Built for x86_64-pc-linux-gnu

PATH environment variable?
Not sure how to find this. Gentoo version is 4.0.5-gentoo.

How PostgreSQL was installed from where?

Installed via emerge and package name dev-db/postgresql-9.4.4:9.4

pgxs should be ensuring that the PostgreSQL libraries are on the linker path, assuming you compiled with USE_PGXS=1. If you didn't you'd get errors like in #1 . So I have to guess you've done something different, possibly using some Gentoo packaging or script.

When I build this with USE_PGXS=1 I get the following error:

Makefile:127: ../../src/Makefile.global: No such file or directory
Makefile:128: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target '/contrib/contrib-global.mk'. Stop.

I took the advice in this article to get past this error: http://blog.pgxn.org/post/20908326485/lose-use-pgxs

and then got error:

x86_64-pc-linux-gnu-gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -march=native -O2 -pipe -fpic pglogical_create_subscriber.o pglogical_fe.o -L/usr/lib64/postgresql-9.4/lib64 -Wl,-O1 -Wl,--as-needed -Wl,--as-needed -Wl,-rpath,'/usr/lib64/postgresql-9.4/lib64',--enable-new-dtags -Wl,-O1 -Wl,--as-needed -L/usr/lib64/postgresql-9.4/lib64 -lpgcommon -lpgport -L/usr/lib64/postgresql-9.4/lib64 -lpq -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lz -lreadline -lrt -lcrypt -ldl -lm -o pglogical_create_subscriber
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgcommon
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgport
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgcommon
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgport
collect2: error: ld returned 1 exit status
Makefile:73: recipe for target 'pglogical_create_subscriber' failed
make: *** [pglogical_create_subscriber] Error 1

Thanks,

Bryan

`pglogical.show_subscription_status` reports `replicating` when provider node is down

I have a test environment with two nodes (on separate VMs), let's call them "Primary" and "Replica". Not surprisingly, Primary is a provider and Replica is a subscriber.

Under normal circumstances, there are no problems at all. When I check show_subscription_status(), I see what I would expect to see:

select subscription_name, status FROM pglogical.show_subscription_status();

 subscription_name |   status
-------------------+-------------
 subscription1     | replicating

And indeed, if I gracefully shut down Primary, this also has the expected result on Replica:

select subscription_name, status FROM pglogical.show_subscription_status();

 subscription_name |   status
-------------------+-------------
 subscription1     | down

However, if I abruptly pause/kill the Primary VM without allowing it to stop postgres, Replica still thinks everything is ok:

select subscription_name, status FROM pglogical.show_subscription_status();

 subscription_name |   status
-------------------+-------------
 subscription1     | replicating

Is there any way to tell that there's a failure under conditions like that? I'm thinking of real-world issues like power failures and network partitions that affect provider nodes but not subscriber nodes.

Tried to compile on Gentoo Linux with Postgres 9.4.4. pglogical_output worked but pglogical failed.

x86_64-pc-linux-gnu-gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -march=native -O2 -pipe -fpic pglogical_create_subscriber.o pglogical_fe.o -L/usr/lib64/postgresql-9.4/lib64 -Wl,-O1 -Wl,--as-needed  -Wl,--as-needed -Wl,-rpath,'/usr/lib64/postgresql-9.4/lib64',--enable-new-dtags -Wl,-O1 -Wl,--as-needed -L/usr/lib64/postgresql-9.4/lib64 -lpgcommon -lpgport -L/usr/lib64/postgresql-9.4/lib64 -lpq -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lz -lreadline -lrt -lcrypt -ldl -lm  -o pglogical_create_subscriber
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgcommon
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgport
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgcommon
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -lpgport
collect2: error: ld returned 1 exit status
Makefile:73: recipe for target 'pglogical_create_subscriber' failed
make: *** [pglogical_create_subscriber] Error 1

cascading replication

When setting up cascading replication (p1 -> p2 -> p3), data does not get replicated to the p3 node. There is an error message in the logs:

>LOG:  starting apply for subscription sub_p2_to_p3_insert_only
>ERROR:  cache lookup failed for replication origin '[p1-node-name]'

AFAIK this was previously reported on the pgsql-general mailing list. Is there anything I can do to work around this?

pglogical 1.1.2
PostgreSQL 9.5.2 x86_64 on CentOS 7
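
One hedged way to confirm the failure is related to origin forwarding: recreate the p2->p3 subscription with forwarding disabled. Note this is a diagnostic rather than a fix, since p3 then only receives changes that originated on p2 (the DSN below is a placeholder):

SELECT pglogical.create_subscription(
    subscription_name := 'sub_p2_to_p3_insert_only',
    provider_dsn := 'host=p2-node port=5432 dbname=mydb',
    forward_origins := '{}'  -- do not forward changes from other origins
);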

ALWAYS and REPLICA triggers on subscriber cause segmentation fault

Versions

PostgreSQL - 9.5.2
pglogical - 1.1.2
pglogical_output - 1.1.0

Steps to reproduce

  • Set up replication between two nodes (one provider and one subscriber)
  • Create a test table on both nodes:
    • CREATE TABLE test (id BIGINT PRIMARY KEY, data VARCHAR);
  • Add the test table to the replication set
  • Set up a trivial audit trigger on the subscriber node:
CREATE TABLE IF NOT EXISTS audit_test(id BIGINT);

CREATE OR REPLACE FUNCTION test_trigger()
RETURNS TRIGGER AS
$BODY$
BEGIN
        INSERT INTO audit_test VALUES (NEW.id);
        RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS test_trigger ON test;

CREATE TRIGGER test_trigger before INSERT ON test
FOR EACH ROW EXECUTE PROCEDURE test_trigger();

ALTER TABLE test DISABLE TRIGGER test_trigger;
ALTER TABLE test ENABLE REPLICA TRIGGER test_trigger;
  • Insert a row into the table on the provider node.

Result

postgresql.log output:

2016-07-19 20:13:15 GMT::578e836d.2352:@:[9042]:LOG:  worker process: pglogical apply 20587:1658409966 (PID 10399) was terminated by signal 11: Segmentation fault
2016-07-19 20:13:15 GMT::578e836d.2352:@:[9042]:LOG:  terminating any other active server processes
2016-07-19 20:13:15 GMT::578e89d9.2898:@:[10392]:WARNING:  terminating connection because of crash of another server process
2016-07-19 20:13:15 GMT::578e89d9.2898:@:[10392]:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-07-19 20:13:15 GMT::578e89d9.2898:@:[10392]:HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-07-19 20:13:15 GMT:[local]:578e89db.289d:root@vmdb_production:[10397]:WARNING:  terminating connection because of crash of another server process
2016-07-19 20:13:15 GMT:[local]:578e89db.289d:root@vmdb_production:[10397]:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-07-19 20:13:15 GMT:[local]:578e89db.289d:root@vmdb_production:[10397]:HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-07-19 20:13:15 GMT::578e836d.2352:@:[9042]:LOG:  all server processes terminated; reinitializing

gdb backtrace:

(gdb) bt
#0  ExecBRInsertTriggers (estate=0x2c5b1b0, relinfo=0x2c4b610, slot=0x2c4ba00) at trigger.c:2060
#1  0x00007fe3450a234d in handle_insert (s=s@entry=0x7ffe6e3b6650) at pglogical_apply.c:506
#2  0x00007fe3450a31a8 in replication_handler (s=0x7ffe6e3b6650) at pglogical_apply.c:1312
#3  apply_work (streamConn=streamConn@entry=0x2c1e3d0) at pglogical_apply.c:1582
#4  0x00007fe3450a39da in pglogical_apply_main (main_arg=<optimized out>) at pglogical_apply.c:1977
#5  0x000000000064847c in StartBackgroundWorker () at bgworker.c:726
#6  0x00000000006538e5 in do_start_bgworker (rw=0x2baf380) at postmaster.c:5503
#7  maybe_start_bgworker () at postmaster.c:5678
#8  0x0000000000654465 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4939
#9  <signal handler called>
#10 0x00007fe34eb853f3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#11 0x0000000000468616 in ServerLoop () at postmaster.c:1648
#12 0x00000000006555e6 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2b81620) at postmaster.c:1292
#13 0x0000000000469bde in main (argc=3, argv=0x2b81620) at main.c:228

It seems aestate->estate->es_trig_tuple_slot is not initialized in handle_insert before being passed to ExecBRInsertTriggers. I don't know enough about PostgreSQL internals to fix this myself, but hopefully this will get you guys close enough.

This happens for both ENABLE REPLICA and ENABLE ALWAYS triggers.

Let me know if I can do anything else to help.

Column "waiting" is not present in pg_stat_activity (PostgreSQL 9.6, pglogical 1.2)

The log shows this error continuously:

2016-10-31 14:15:01 UTC [22842]: [1-1] db=template1,user=repl ERROR: column s.waiting does not exist at character 34
2016-10-31 14:15:01 UTC [22842]: [2-1] db=template1,user=repl STATEMENT: SELECT CASE
WHEN s.waiting = 't' THEN 'waiting for lock'
WHEN s.query = '<insufficient privilege>'
THEN 'insufficient privilege'
WHEN s.state IS NULL THEN 'undefined'
ELSE s.state
END,
extract('epoch' FROM
date_trunc('milliseconds', current_timestamp-s.xact_start)
), s.query, current_setting('max_connections')
FROM pg_stat_activity AS s
JOIN pg_database d ON d.oid=s.datid
WHERE d.datallowconn
;
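
For reference, 9.6 replaced the boolean waiting column with wait_event_type/wait_event, so the monitoring query needs an adjustment along these lines (a sketch; wait_event_type = 'Lock' covers heavyweight-lock waits, which is what waiting = 't' used to report):

SELECT CASE
WHEN s.wait_event_type = 'Lock' THEN 'waiting for lock'
WHEN s.query = '<insufficient privilege>'
THEN 'insufficient privilege'
WHEN s.state IS NULL THEN 'undefined'
ELSE s.state
END,
extract('epoch' FROM
date_trunc('milliseconds', current_timestamp-s.xact_start)
), s.query, current_setting('max_connections')
FROM pg_stat_activity AS s
JOIN pg_database d ON d.oid=s.datid
WHERE d.datallowconn
;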

Segmentation fault when using last_update_wins

Hello,

I'm experimenting with a 2-node bidirectional replication using pglogical 1.1.2. I'm able to consistently cause a seg fault when I set pglogical.conflict_resolution = 'last_update_wins', and when either:

  1. Executing simultaneous identical UPDATE statements on both nodes (using psql and terminator) or
  2. Disabling both subscriptions, executing UPDATE statements on both nodes (that will cause a conflict) and then re-enabling one or both subscriptions.

I'm expecting that the last_update_wins strategy will do the right thing and either 'keep local' or 'apply remote' based on the commit timestamp. Instead, I'm seeing an attempt to resolve the conflict (e.g. keep local), a crash of PostgreSQL, an automatic restart, and then a second attempt to resolve the conflict (e.g. apply remote).

For example, in pg_log/postgresql-Wed.log:

worker process: pglogical apply 16384:1871150101 (PID 19795) was terminated by signal 11: Segmentation fault

Question: Is using pglogical in this way (bidirectional) not recommended at this point in time?

Some findings: Using this 2-node bidirectional-subscription setup, I have had success with simple INSERT, UPDATE and DELETE statements that don't require conflict resolution. And I've had success with INSERT and DELETE statements that do require conflict resolution. It's the UPDATE statements that are blocking me. The first_update_wins strategy also triggers the seg fault. The apply_remote does not.

Thank you,

...Jelle

p.s. RPMS:
postgresql95-server-9.5.3-2PGDG.rhel6.x86_64
postgresql95-contrib-9.5.3-2PGDG.rhel6.x86_64
postgresql95-pglogical-1.1.2-1_2ndQuadrant.el6.x86_64
postgresql95-pglogical-output-1.1.0-1_2ndQuadrant.el6.x86_64

p.p.s. My config

wal_level = 'logical'
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10
shared_preload_libraries = 'pglogical'
track_commit_timestamp = on
pglogical.conflict_resolution = 'last_update_wins'

p.p.p.s.
I tried to use gdb to get a backtrace but the resulting gdb.txt was full of "No symbol table info available."
This may help:

Core was generated by `postgres: bgworker: pglogical apply'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000007c683b in ?? ()

p.p.p.p.s.
Using bdr would be ideal: I'm looking for async multi-master replication. But using a patched PostgreSQL 9.4 is a show-stopper. Should I just wait for a future version of PostgreSQL to have in-core BDR support?

Questions: upgrading from 9.4 to 9.5 - data sync not performing

Hello! Thank you for pglogical. I'm trying to upgrade from 9.4 to a new server with a fresh install of PostgreSQL 9.5. I'm doing the setup according to the docs, but data sync isn't working.

Expected result

Streaming is working and data sync is performed.

Actual result

Streaming is working but data sync is NOT performed (no signs of data sync: no network or disk activity on either server, and no new messages appear in the PostgreSQL logs after executing pglogical.alter_subscription_synchronize('smp_test_replication')).

Setup

Producer

I have a PostgreSQL 9.4 server which I want to upgrade.

  1. I'm installing latest postgresql-9.4-pglogical package (1.1.2-2trusty).

  2. Adding settings to postgresql.conf:

    wal_level = 'logical'
    max_worker_processes = 10
    max_replication_slots = 10 
    max_wal_senders = 10
    shared_preload_libraries = 'pglogical'
    
  3. Adding this server's and the consumer's IPs to pg_hba.conf:

    host    replication postgres    127.0.0.0/8     md5
    host    replication postgres    172.1.1.1/32    md5
    host    replication postgres    172.1.1.2/32    md5
    
  4. restarting…

  5. Adding pglogical and pglogical_origin (as 9.4 requires it) extensions to database.

  6. Executing:

    SELECT pglogical.create_node('test-db-0', 'host=172.1.1.1 port=5432 dbname=smp user=postgres password=secure');
    SELECT pglogical.replication_set_add_all_tables('default', schema_names := ARRAY['public'], synchronize_data := true);
    SELECT pglogical.replication_set_add_all_sequences('default', schema_names:= ARRAY['public'], synchronize_data := true);
  7. Doing a dump of DB:

     pg_dump -U smp -F custom smp > file.dump
    

Subscriber

A brand new server with fresh PostgreSQL 9.5 installed.

Steps 1 to 4 are same (with 9.5 in package name).

  1. Adding pglogical extension to DB.

  2. Restoring schema-only dump from producer db:

    pg_restore -U smp -d smp -F custom --clean --schema-only --schema public file.dump
    
  3. Executing:

    SELECT pglogical.create_node('test-db-1', 'host=172.1.1.2 port=5432 dbname=smp user=postgres password=secure');
    
     create_node
    -------------
      2826448934
    (1 row)
    
    SELECT pglogical.create_subscription(
      subscription_name     := 'smp_test_replication',
      provider_dsn          := 'host=172.1.1.1 port=5432 dbname=smp user=postgres password=secure',
      synchronize_structure := false,
      synchronize_data      := true
    );
    
     create_subscription
    ---------------------
              3501287532
    (1 row)
    
    SELECT pglogical.alter_subscription_synchronize('smp_test_replication');
     alter_subscription_synchronize
    --------------------------------
     t
    (1 row)
  4. After that:

    SELECT * FROM pglogical.show_subscription_status();
    
      subscription_name   |   status    | provider_node |                           provider_dsn                            |             slot_name             |           replication_sets            | forward_origins
    ----------------------+-------------+---------------+-------------------------------------------------------------------+-----------------------------------+---------------------------------------+-----------------
     smp_test_replication | replicating | test-db-0     | host=172.1.1.1 port=5432 dbname=smp user=postgres password=secure | pgl_smp_test_db_0_smp_bsktd0b1686 | {default,default_insert_only,ddl_sql} | {all}

Looks like everything is good, and streaming is working: if I insert a new row into a table on the producer, it appears on the consumer.

But there are no signs of data sync: no network or disk activity on either server, and no new messages appear in the PostgreSQL logs after executing pglogical.alter_subscription_synchronize('smp_test_replication').

Questions

  1. Am I missing something in my setup?
  2. Can I see data-sync progress for the whole DB, a node, or a subscription? (See the sketch below.)
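
For question 2, a hedged starting point is the pglogical.local_sync_status catalog; per the output shown in other reports here, sync_status 'r' appears to mean ready/synchronized:

-- per-table sync state for this node's subscriptions
SELECT sync_kind, sync_nspname, sync_relname, sync_status
FROM pglogical.local_sync_status
ORDER BY sync_nspname, sync_relname;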

Thank you in advance.

With best regards, Andrey Novikov.

pglogical inserted into search_path?

  1. Create an unprivileged user on a database that has pglogical installed.
  2. Connect as that user.
  3. Create a temporary table.
  4. Attempt to drop temporary table.
  5. Despair.
$ psql -h localhost -d test -U joenobody -W
Password for user joenobody:
psql (9.6.0)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

test=> create temporary table x (y integer);
CREATE TABLE
test=> drop table if exists x;
ERROR:  permission denied for schema pglogical
test=> show search_path;
   search_path
-----------------
 "$user", public
(1 row)
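
A hedged workaround sketch until this is resolved: schema-qualify the temporary table so name resolution doesn't have to probe other schemas (assumption: the failure comes from the lookup touching the pglogical schema, so qualifying the name may sidestep it):

-- pg_temp always refers to this session's temporary schema
DROP TABLE IF EXISTS pg_temp.x;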

pglogical.replicate_ddl_command(...) can fail to apply due to user rights differences

If a superuser invokes pglogical.replicate_ddl_command they can do things like alter tables that they do not own. Fine.

If the same user name exists on the downstream but is not a superuser, applying the DDL will fail with something like ERROR: must be owner of relation ....

The issue isn't confined to superuser rights, though. I think it can also occur when a role on the upstream has been GRANTed roles that it hasn't been granted on the downstream. Possibly other cases too.

It'd be nice to help users understand what's going on here, even if we can't fix it. Unsure if we can do more than document it though...

(See RT#46213)

Using synchronize_structure fails when replicating from 9.4 to 9.5

Hi

I'm trying to replicate from a 9.4 server to a 9.5 server.
I created the nodes on both provider & subscriber and also added all tables from the public schema to the default replication set.

When creating the subscription with:

SELECT pglogical.create_subscription(
    subscription_name := 'subscriptionname',
    provider_dsn := 'host=providerhost port=5432 dbname=dbname user=replicator password=teste',
    synchronize_structure := true
);

The command fails because it tries to create the pglogical_origin extension on the 9.5 server (the CREATE EXTENSION statement is included in the generated dump).

This extension is not available on 9.5 servers, making synchronize_structure unusable when creating the subscription.

Any clues or workarounds?
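
One workaround pattern, the same one used in the 9.4-to-9.5 upgrade report above: load the schema yourself (pg_dump -s on the provider, pg_restore on the subscriber, with the CREATE EXTENSION pglogical_origin line removed), then subscribe without structure sync:

SELECT pglogical.create_subscription(
    subscription_name := 'subscriptionname',
    provider_dsn := 'host=providerhost port=5432 dbname=dbname user=replicator password=teste',
    synchronize_structure := false,  -- schema was restored manually beforehand
    synchronize_data := true
);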

Thanks

Debian packages conflict between major releases

Hello

The postgresql-9.5-pglogical and postgresql-9.4-pglogical packages, version 1.2.0-2jessie, cannot be installed together without --force. They both include the following files:

  • /usr/include/postgresql/pglogical_output/compat.h
  • /usr/include/postgresql/pglogical_output/hooks.h

Shouldn't these files go under /usr/include/postgresql/9.X/server/ ?

Thanks

pglogical sync delay

Hi,

We are currently exploring pglogical. We find it a great tool for replicating tables, and we are recommending that one of our customers evaluate it as a replacement for Slony to avoid various complexities.

While I am still new to this tool, I have the following queries:

Query 1 :

When I set up pglogical replication between PostgreSQL 9.5 and 9.4 instances, the replication works fine, but there seems to be a delay in replicating the data changes. I'm not sure why, or how to figure it out; am I missing something here? The setup is very simple, with two PostgreSQL instances running on the same host. Any help would be appreciated.

Query 2:

Can pglogical only be used to replicate tables with primary keys? (See the sketch below.)
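
Regarding query 2: UPDATEs and DELETEs need a way to identify the target row, so those require a primary key (or an equivalent replica identity). INSERT-only traffic for a keyless table can still be replicated via the built-in insert-only set; a hedged sketch (the table name is hypothetical):

-- tables without a primary key can go into the insert-only replication set
SELECT pglogical.replication_set_add_table(
    'default_insert_only', 'public.keyless_table');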

Regards,
Venkata B N

Fujitsu Australia

Column null value when replicating

I'm trying to replicate a not-so-large table (500,000 rows) with 30 columns, 6 of which are integer[] not null columns from 9.4 to 9.5. One of those columns usually is fairly large (usually 300-1000 integers).
The table in question is live, and changes all the time. After creating the subscription with synchronize_structure := true, the table is copied, logical decoding starts, and then the replication fails with the error "ERROR: null value in column "mylist" violates not-null constraint"

It would appear that the large column is not properly encoded and defaults to null, which then violates the NOT NULL constraint. Everything works fine if I replicate a simple table.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty

$ uname -a
Linux db-master 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
9.4 main 5432 online postgres /var/lib/postgresql/9.4/main /var/log/postgresql/postgresql-9.4-main.log
9.5 main 5433 online postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log

On the provider db:
$ sudo -u postgres psql mydb
create extension pglogical;
SELECT pglogical.create_node(node_name := 'provider1', dsn := 'host=127.0.0.1 port=5432 dbname=mydb');
SELECT pglogical.replication_set_add_table('default', 'mytable');

On the subscriber db (clean database with no tables):
$ sudo -u postgres psql -p 5433 mydb
create extension pglogical;
SELECT pglogical.create_node(node_name := 'provider1', dsn := 'host=127.0.0.1 port=5432 dbname=mydb');
SELECT pglogical.create_subscription(subscription_name := 'subscription1', provider_dsn := 'host=127.0.0.1 port=5432 dbname=mydb', synchronize_structure := true);

Log output for provider:
2016-06-01 08:56:25 UTC LOG: logical decoding found consistent point at 650/430C4050
2016-06-01 08:56:25 UTC DETAIL: There are no running transactions.
2016-06-01 08:56:25 UTC LOG: exported logical decoding snapshot: "255AA3A4-1" with 0 transaction IDs
2016-06-01 08:56:54 UTC LOG: duration: 28623.776 ms statement: COPY "public"."mytable" TO stdout
2016-06-01 08:56:55 UTC LOG: starting logical decoding for slot "pgl_mydb_provider1_subscription1"
2016-06-01 08:56:55 UTC DETAIL: streaming transactions committing after 650/430C4088, reading WAL from 650/430C4050
2016-06-01 08:56:55 UTC LOG: logical decoding found consistent point at 650/430C4050
2016-06-01 08:56:55 UTC DETAIL: There are no running transactions.
2016-06-01 08:56:55 UTC LOG: could not receive data from client: Connection reset by peer
2016-06-01 08:56:55 UTC LOG: unexpected EOF on standby connection

Log output for subscriber:
2016-06-01 10:56:25 CEST [22723-1] LOG: starting apply for subscription subscription1
2016-06-01 10:56:55 CEST [22727-1] postgres@mydb LOG: duration: 29860.347 ms statement: COPY "public"."mytable" FROM stdin
2016-06-01 10:56:55 CEST [22723-2] ERROR: null value in column "mylist" violates not-null constraint
2016-06-01 10:56:55 CEST [22723-3] DETAIL: Failing row contains removed.
2016-06-01 10:56:55 CEST [16095-8] LOG: worker process: pglogical apply 84094:1763399739 (PID 22723) exited with exit code 1
2016-06-01 10:57:00 CEST [23097-1] LOG: starting apply for subscription subscription1
2016-06-01 10:57:00 CEST [23097-2] ERROR: null value in column "mylist" violates not-null constraint
2016-06-01 10:57:00 CEST [23097-3] DETAIL: Failing row contains removed.
2016-06-01 10:57:00 CEST [16095-9] LOG: worker process: pglogical apply 84094:1763399739 (PID 23097) exited with exit code 1

The failing row (in the subscriber log) indeed has the column mylist as null, but when I check the original database, the column is, of course, not null, since it has the "not null" constraint. It always fails with the error for the same large column (mylist), but different rows each time I initiate the create_subscription (depends on which rows are changed during the initial table copy).

worker registration failed

Hello,

I'm trying to set up pglogical with an initial synchronization of structure and data. Following the quick-setup part of the documentation, the advice for the max_worker_processes parameter is "# one per node needed on subscriber node".
I have two nodes on the subscriber side and max_worker_processes = 4, but I still get an error during initial replication:
ERROR: worker registration failed, you might want to increase max_worker_processes setting
LOG: pglogical supervisor exited with code 1
LOG: starting pglogical supervisor
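
For what it's worth, the worker budget is more than one per node: judging by the process listings in other reports here, there is one pglogical supervisor, one manager per database, and one apply worker per subscription, plus transient sync workers during the initial copy. A sketch of the usual fix (the value 10 is arbitrary headroom; a restart is required since this setting is not reloadable):

ALTER SYSTEM SET max_worker_processes = 10;
-- then restart PostgreSQL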

Unexpected cascaded replication for conflicting items using PostgreSQL 9.5 with pglogical 1.1.2

I have two PostgreSQL nodes running on ports 5590 and 5591. Both nodes are providers, and each one subscribes to the other. One simple table is to be replicated: public.products ( product_no integer PRIMARY KEY, name text, price numeric ).

The attached shell script (test_conflicts_with_disabled_subscriptions.txt) is doing essentially the same thing as explained below in order to reproduce the issue.

Steps to reproduce the issue:

  • Disable both subscriptions on both nodes (trying to simulate a network partition here, but this also fails with stopped PostgreSQL nodes)
  • With the same row present on both nodes, update that row (with the same unique id) on each node
  • Re-enable both subscriptions and check the replication

Expected results:
The last update should win and the same value should be observed on both nodes

What happened instead:
At first, proper replication can be observed. However, after a few seconds there is a subsequent replication attempt that generates a segmentation fault on one of the nodes (and that database goes into recovery mode, unable to respond to SQL queries).

Note: The process ends up switching the values between the nodes. Sometimes the results are unpredictable and both nodes end up with the first (oldest) update.

Logs on 5590:

LOG:  CONFLICT: remote UPDATE on relation public.products. Resolution: apply_remote.
LOG:  starting logical decoding for slot "pgl_pglogicaldemo_node_5590_subscrip0fb5853"
DETAIL:  streaming transactions committing after 0/1782080, reading WAL from 0/1781094
LOG:  logical decoding found consistent point at 0/1781094
DETAIL:  There are no running transactions.
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on standby connection
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
ERROR:  connection to other side has died
LOG:  worker process: pglogical apply 16384:2489372982 (PID 3471) exited with exit code 1
LOG:  starting logical decoding for slot "pgl_pglogicaldemo_node_5590_subscrip0fb5853"
DETAIL:  streaming transactions committing after 0/1782080, reading WAL from 0/1781094
LOG:  logical decoding found consistent point at 0/1781094
DETAIL:  There are no running transactions.
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on standby connection
LOG:  starting logical decoding for slot "pgl_pglogicaldemo_node_5590_subscrip0fb5853"
DETAIL:  streaming transactions committing after 0/1782080, reading WAL from 0/1781094
LOG:  logical decoding found consistent point at 0/1781094
DETAIL:  There are no running transactions.

Logs on 5591:

LOG:  CONFLICT: remote UPDATE on relation public.products. Resolution: keep_local.
LOG:  worker process: pglogical apply 16384:263554356 (PID 3499) was terminated by signal 11: Segmentation fault
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2016-09-26 19:26:12 UTC
LOG:  recovered replication state of node 1 to 0/1781EF8
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/1782494
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled

pglogical_create_subscriber: Race Condition between subscriber promotion and remove_unwanted_data()

pglogical_create_subscriber lets the newly created node catch up to the origin node via streaming replication, and waits for the new node to become ready via wait_postmaster_connection(). Since recovery.conf (as written by pglogical_create_subscriber) has "standby_mode = on" and postgresql.conf may have "hot_standby = on", just waiting for a connection is not enough: we already get a connection while the cluster is still in hot standby mode, and remove_unwanted_data() then fails spectacularly ("cannot DROP EXTENSION in a read only transaction").
We should wait for the end-of-recovery-promotion to complete.
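
A sketch of the missing check: after the first successful connection, the tool could poll until promotion has actually completed before touching any data:

-- must return false before remove_unwanted_data() runs its DROPs
SELECT pg_is_in_recovery();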

Handle missing toasted values

After starting replication of the logical slot:

START_REPLICATION SLOT json LOGICAL 0/0 (
  max_proto_version '1',
  min_proto_version '1',
  proto_format 'json',
  startup_params_format '1'
)

the following shows in postgres logs:

ERROR:  XX000: missing chunk number 0 for toast value 17032 in pg_toast_16408
CONTEXT:  slot "json", output plugin "pglogical_output", in the change callback, associated LSN 0/1A62538
LOCATION:  toast_fetch_datum, tuptoaster.c:1945

I believe the issue is in pglogical_proto_json.c#L104 where heap_copy_tuple_as_datum unsafely detoasts the values.

Existing tables don't have a queue_truncate_trigger trigger

When creating a new table, pglogical adds an internal trigger for truncate operations, queue_truncate_trigger_NNNNN, so that if a truncate is executed on the table, the command is queued for the subscribers to consume.

The problem is that pglogical.truncate_trigger_add() isn't invoked on tables that were already present in the database before the node was subscribed, so they don't have an internal truncate trigger.

IMO, we should add the queue_truncate_trigger to tables added to a replication set instead of doing so for all tables created.

Thoughts?

pglogical_origin is mandatory for PostgreSQL 9.4

Hi,

When trying to create the pglogical extension in a 9.4 server (producer), I get this error:

os=# CREATE EXTENSION pglogical;
ERROR:  required extension "pglogical_origin" is not installed

As per the README ( https://github.com/2ndQuadrant/pglogical#quick-setup ), it should only be necessary when there is a 9.4 subscriber as well, but I'm going to subscribe from a 9.5. Note that this information is not available on the website ( https://2ndquadrant.com/en/resources/pglogical/pglogical-docs/ ).

Is this a documentation error, or should pglogical_origin not be required for 9.4 producer nodes?
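
Until that's clarified, installing the shim first unblocks extension creation on 9.4 (this mirrors what the 9.4 provider setup earlier in these reports does):

CREATE EXTENSION pglogical_origin;  -- 9.4-only compatibility extension
CREATE EXTENSION pglogical;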

Thanks

Can't compile source code

Hi,

I would like to try the pglogical and pglogical-output extensions, but I can't compile the source code from the provided tarballs with PostgreSQL 9.4.7, and I can't find any documentation on how to do it.

Makefile:41: ../../src/Makefile.global: No such file or directory
Makefile:42: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target '/contrib/contrib-global.mk'. Stop.
