
Easy CLI tool for making zero downtime schema changes and backfills in PostgreSQL

License: MIT License


pg-osc's Introduction

pg-osc


pg-online-schema-change (pg-osc) is a tool for making schema changes (any ALTER statements) in Postgres tables with minimal locks, thus helping achieve zero downtime schema changes against production workloads.

pg-osc uses the concept of a shadow table to perform schema changes. At a high level, it creates a shadow table that is structurally identical to the primary table, performs the schema change on the shadow table, copies contents from the primary table to the shadow table, and swaps the table names at the end, while preserving all changes made to the primary table in the meantime using triggers (via an audit table).
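
A highly simplified sketch of the idea in plain SQL (the table, column, and trigger names below are illustrative, not the ones pg-osc actually generates):

-- Shadow table: structurally identical to the primary table, then altered.
CREATE TABLE books_shadow (LIKE books INCLUDING ALL);
ALTER TABLE books_shadow ADD COLUMN purchased BOOLEAN DEFAULT FALSE;

-- Audit table + trigger: record every write to the primary table during the copy.
CREATE TABLE books_audit (LIKE books);
ALTER TABLE books_audit ADD COLUMN operation TEXT;

CREATE FUNCTION books_audit_fn() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO books_audit SELECT OLD.*, TG_OP;
  ELSE
    INSERT INTO books_audit SELECT NEW.*, TG_OP;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER books_audit_trigger
  AFTER INSERT OR UPDATE OR DELETE ON books
  FOR EACH ROW EXECUTE PROCEDURE books_audit_fn();

-- pg-osc then copies rows into books_shadow, replays the audit table, and swaps the names.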

pg-osc is inspired by the design and workings of tools like pg_repack and pt-online-schema-change (MySQL). Read more below on how it works, its prominent features, caveats, and examples.

Table of Contents

Installation

Add this line to your application's Gemfile:

gem "pg_online_schema_change"

And then execute:

$ bundle install

Or install it yourself as:

$ gem install pg_online_schema_change

This will install all required dependencies as well. Make sure the following requirements are satisfied.

Or via Docker:

docker pull shayonj/pg-osc:latest

https://hub.docker.com/r/shayonj/pg-osc

Requirements

  • PostgreSQL 9.6 and later
  • The database user should have the TRIGGER privilege and/or be a SUPERUSER

Usage

pg-online-schema-change help perform

Usage:
  pg-online-schema-change perform -a, --alter-statement=ALTER_STATEMENT -d, --dbname=DBNAME -h, --host=HOST -p, --port=N -s, --schema=SCHEMA -u, --username=USERNAME

Options:
  -a, --alter-statement=ALTER_STATEMENT                                    # The ALTER statement to perform the schema change
  -s, --schema=SCHEMA                                                      # The schema in which the table is
                                                                           # Default: public
  -d, --dbname=DBNAME                                                      # Name of the database
  -h, --host=HOST                                                          # Server host where the Database is located
  -u, --username=USERNAME                                                  # Username for the Database
  -p, --port=N                                                             # Port for the Database
                                                                           # Default: 5432
  -w, [--password=PASSWORD]                                                # DEPRECATED: Password for the Database. Please pass PGPASSWORD environment variable instead.
  -v, [--verbose], [--no-verbose]                                          # Emit logs in debug mode
  -f, [--drop], [--no-drop]                                                # Drop the original table in the end after the swap
  -k, [--kill-backends], [--no-kill-backends]                              # Kill other competing queries/backends when trying to acquire lock for the shadow table creation and swap. It will wait for --wait-time-for-lock duration before killing backends and try upto 3 times.
  -w, [--wait-time-for-lock=N]                                             # Time to wait before killing backends to acquire lock and/or retrying upto 3 times. It will kill backends if --kill-backends is true, otherwise try upto 3 times and exit if it cannot acquire a lock.
                                                                           # Default: 10
  -c, [--copy-statement=COPY_STATEMENT]                                    # Takes a .sql file location where you can provide a custom query to be played (ex: backfills) when pgosc copies data from the primary to the shadow table. More examples in README.
  -b, [--pull-batch-count=N]                                               # Number of rows to be replayed on each iteration after copy. This can be tuned for faster catch up and swap. Best used with delta-count.
                                                                           # Default: 1000
  -e, [--delta-count=N]                                                    # Indicates how many rows should be remaining before a swap should be performed. This can be tuned for faster catch up and swap, especially on highly volume tables. Best used with pull-batch-count.
                                                                           # Default: 20
  -o, [--skip-foreign-key-validation], [--no-skip-foreign-key-validation]  # Skip foreign key validation after swap. You shouldn't need this unless you have a very specific use case, like manually validating foreign key constraints after swap.
Usage:
  pg-online-schema-change --version, -v

print the version

Prominent features

  • pg-osc supports adding, dropping, or renaming columns with no data loss.
  • pg-osc acquires minimal locks throughout the process (read more below on the caveats).
  • Copies over indexes and foreign keys.
  • Optionally drops or retains the old table at the end.
  • Reduces bloat (since pg-osc creates a new table and drops the old one post swap).
  • Tune how slow or fast replays from the audit/log table should be (see Replaying larger workloads).
  • Backfill old/new columns as data is copied from the primary table to the shadow table, and then perform the swap (see the Backfill data example below).

Load test

More about the preliminary load test figures here

Examples

Renaming a column

export PGPASSWORD=""
pg-online-schema-change perform \
  --alter-statement 'ALTER TABLE books RENAME COLUMN email TO new_email' \
  --dbname "postgres" \
  --host "localhost" \
  --username "jamesbond"

Multiple ALTER statements

export PGPASSWORD=""
pg-online-schema-change perform \
  --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE; ALTER TABLE books RENAME COLUMN email TO new_email;' \
  --dbname "postgres" \
  --host "localhost" \
  --username "jamesbond" \
  --drop

Kill other backends after 5s

If the operation is being performed on a busy table, you can use pg-osc's kill-backend functionality to kill other backends that may be competing with the pg-osc operation to acquire a lock for a brief while. The ACCESS EXCLUSIVE lock acquired by pg-osc is only held for a brief while and released after. You can tune how long pg-osc should wait before killing other backends (or if at all pg-osc should kill backends in the first place).
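
For context, terminating competing sessions in Postgres ultimately comes down to a query along these lines (purely illustrative; not necessarily the exact statement pg-osc issues):

SELECT pg_terminate_backend(a.pid)
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE l.relation = 'books'::regclass   -- sessions holding or awaiting locks on the target table
  AND a.pid <> pg_backend_pid();       -- never terminate our own session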

export PGPASSWORD=""
pg-online-schema-change perform \
  --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE;' \
  --dbname "postgres" \
  --host "localhost" \
  --username "jamesbond" \
  --wait-time-for-lock 5 \
  --kill-backends \
  --drop

Replaying larger workloads

If you have a table with high write volume, the default replay iteration may not suffice. That is, you may see that pg-osc replays only 1000 rows (pull-batch-count) per iteration from the audit table. pg-osc also waits until the remaining row count (delta-count) in the audit table is 20 before making the swap. You can tune these values higher for faster catch-up on these kinds of workloads.

export PGPASSWORD=""
pg-online-schema-change perform \
  --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE;' \
  --dbname "postgres" \
  --host "localhost" \
  --username "jamesbond" \
  --pull-batch-count 2000 \
  --delta-count 500 \
  --wait-time-for-lock 5 \
  --kill-backends \
  --drop

Backfill data

When inserting data into the shadow table, instead of just copying all columns and rows from the primary table, you can pass in a custom SQL file to perform the copy and do any additional work, for instance backfilling certain columns. When you provide copy-statement, pg-osc will play that query to perform the copy operation.

IMPORTANT NOTES:

  • It is possible to violate a constraint accidentally or not copy data, so proceed with caution.
    • You must use OUTER JOINs when joining in the custom SQL, or you will lose rows which do not match the joined table.
  • The ALTER statement can change the table's structure, so proceed with caution.
  • Preserve %{shadow_table} as that will be replaced with the destination of the shadow table.
  • Users are STRONGLY URGED to test and validate results before using in production!
-- file: /src/query.sql
INSERT INTO %{shadow_table}(foo, bar, baz, rental_id, tenant_id)
SELECT a.foo,a.bar,a.baz,a.rental_id,r.tenant_id AS tenant_id
FROM ONLY examples a
LEFT OUTER JOIN rentals r
ON a.rental_id = r.id
pg-online-schema-change perform \
  --alter-statement 'ALTER TABLE books ADD COLUMN "tenant_id" VARCHAR;' \
  --dbname "postgres" \
  --host "localhost" \
  --username "jamesbond" \
  --copy-statement "/src/query.sql" \
  --drop

Running using Docker

docker run --network host -it --rm shayonj/pg-osc:latest \
    pg-online-schema-change perform \
    --alter-statement 'ALTER TABLE books ADD COLUMN "purchased" BOOLEAN DEFAULT FALSE; ALTER TABLE books RENAME COLUMN email TO new_email;' \
    --dbname "postgres" \
    --host "localhost" \
    --username "jamesbond" \
    --drop

A few things to keep in mind

  • Partitioned tables are not supported as of yet. Pull requests and ideas welcome.
  • A primary key should exist on the table; without it, pg-osc will raise an exception
    • This is because there is currently no other way to uniquely identify rows during replay.
  • pg-osc will acquire ACCESS EXCLUSIVE lock on the parent table twice during the operation.
    • First, when setting up the triggers and the shadow table.
    • Next, when performing the swap and updating FK references.
    • Note: If kill-backends is passed, it will attempt to terminate any competing operations during both times.
  • By design, pg-osc doesn't kill any other DDLs being performed. It's best to not run any DDLs against the parent table during the operation.
  • Due to the nature of duplicating a table, there needs to be enough space on the disk to support the operation.
  • Index, constraint, and sequence names will be altered and lose their original naming.
    • This can be fixed in future releases. Feel free to open a feature request.
  • Foreign keys are dropped and re-added to referencing tables with NOT VALID, and a follow-up VALIDATE CONSTRAINT is run (see the sketch below).
    • This ensures that integrity is maintained and that re-introducing FKs doesn't acquire additional locks, hence the NOT VALID.
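
A rough illustration of that sequence against a hypothetical referencing table (the constraint and table names below are made up, not the ones pg-osc generates):

ALTER TABLE orders DROP CONSTRAINT orders_book_id_fkey;

-- Re-attach the FK without scanning the referencing table:
ALTER TABLE orders
  ADD CONSTRAINT orders_book_id_fkey
  FOREIGN KEY (book_id) REFERENCES books (id) NOT VALID;

-- Validate afterwards; this only takes a SHARE UPDATE EXCLUSIVE lock on orders:
ALTER TABLE orders VALIDATE CONSTRAINT orders_book_id_fkey;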

How does it work

  • Primary table: A table against which a potential schema change is to be run
  • Shadow table: A copy of an existing primary table
  • Audit table: A table to store any updates/inserts/deletes on the primary table

[how-it-works diagram]

  1. Create an audit table to record changes made to the parent table.
  2. Acquire a brief ACCESS EXCLUSIVE lock to add a trigger on the parent table (for inserts, updates, deletes) to the audit table.
  3. Create a new shadow table and run ALTER/migration on the shadow table.
  4. Copy all rows from the old table.
  5. Build indexes on the new table.
  6. Replay all changes accumulated in the audit table against the shadow table.
    • Delete rows in the audit table as they are replayed.
  7. Once the delta (remaining rows) is ~20 rows, acquire an ACCESS EXCLUSIVE lock against the parent table within a transaction and:
    • swap table names (shadow table <> parent table).
    • update references in other tables (FKs) by dropping and re-creating the FKs with a NOT VALID (see the sketch after this list).
  8. Run ANALYZE on the new table.
  9. Validate all FKs that were added with NOT VALID.
  10. Drop parent (now old) table (OPTIONAL).
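
In SQL terms, the critical section in step 7 boils down to something like the following sketch (table names are illustrative; pg-osc generates its own prefixed names):

BEGIN;
LOCK TABLE books IN ACCESS EXCLUSIVE MODE;

ALTER TABLE books RENAME TO pgosc_old_books;        -- retire the parent table
ALTER TABLE pgosc_shadow_books RENAME TO books;     -- promote the shadow table

-- FK references on other tables are re-pointed here with NOT VALID (validated later).
COMMIT;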

Development

  • Install ruby 3.3.0
\curl -sSL https://get.rvm.io | bash

rvm install 3.3.0

rvm use 3.3.0
  • Spin up postgres via Docker Compose - docker compose up
  • bundle exec rspec to run the tests.
  • You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install.

Local testing

docker compose up

pgbench --initialize -s 10 --foreign-keys --host localhost -U jamesbond -d postgres
pgbench -T 60000 -c 5 --host localhost -U jamesbond -d postgres

bundle exec bin/pg-online-schema-change perform -a 'ALTER TABLE pgbench_accounts ALTER COLUMN aid TYPE BIGINT' -d "postgres" -h "localhost" -u "jamesbond" -w "password"

Releasing

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/shayonj/pg-osc.



pg-osc's Issues

INSERT's are failing for long table names

Hi pg-osc team,

we successfully used your tool to rebuild a bloated table on production without any downtime! It shrank from around 400GB down to 100GB.

During the testing phase we had to apply some custom adaptations to the source code which we would like to contribute back 😃

This one handles a bug where an INSERT into the primary table fails during execution when the table name is long (e.g. this_is_a_table_with_a_very_long_name).

Steps to reproduce

  1. Minimal test setup
drop table if exists "this_is_a_table_with_a_very_long_name";

 CREATE TABLE IF NOT EXISTS "this_is_a_table_with_a_very_long_name" (
id int PRIMARY KEY,
"createdOn" TIMESTAMP NOT NULL
);
      
insert into "this_is_a_table_with_a_very_long_name"("id", "createdOn") values(1, '2012-01-01')
  2. Add a breakpoint

  3. Run pg-osc
bundle exec bin/pg-online-schema-change perform  --host localhost --dbname postgres --username celonis  --alter-statement 'alter table this_is_a_table_with_a_very_long_name alter column id TYPE int'
  4. Run an INSERT

==> Result: The INSERT into the primary table fails

Suggested Solution

Read the real sequence name using SELECT pg_get_serial_sequence(:table, :column)
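
For reference, the suggested lookup could look like this (using the table from the reproduction above, and assuming the column is backed by a sequence):

-- Returns the real sequence name behind the column, even when the auto-generated
-- "<table>_<column>_seq" name was truncated to Postgres's 63-character identifier limit.
SELECT pg_get_serial_sequence('this_is_a_table_with_a_very_long_name', 'id');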

Auto-Vacuum Isn't Re-enabled

It appears that auto-vacuum is turned off via the disable_vacuum! method but it's never turned back on. Is this intentional?
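
If it's not intentional, re-enabling it after the swap would be a one-liner (illustrative table name):

-- Clear the storage parameter that was set for the copy phase:
ALTER TABLE books RESET (autovacuum_enabled);
-- or set it explicitly:
ALTER TABLE books SET (autovacuum_enabled = true);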

fix something on README about installation

In the Installation section

Or install it yourself as:

$ gem install pg_online_schema_change

Some dependency installation commands need to be added; not everyone is a Ruby expert.

The dependencies

gem install ougai
gem install thor
gem install pg
gem install pg_query

Re-enable autovacuum and fix sequence naming post swap

pgosc disables autovacuum for the new table it creates (probably because we want faster inserts), so we need to re-enable autovacuum manually for the new, fresh table.

Also, it gives a weird name to the sequence and primary key, which we also need to rename back/correct. In my case, there was a dependent function on that table's sequence, so it stopped working :)

Look into lock_timeout

During the swap, we need to be aware of any other long-running queries against the primary table. In that case, should we set a lock_timeout and cancel other queries?
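
For example, a session-level lock_timeout bounds how long the swap waits for its lock instead of queueing indefinitely (illustrative value):

SET lock_timeout = '10s';
-- Any subsequent statement that needs a lock (such as the ACCESS EXCLUSIVE lock for
-- the swap) now errors out if the lock isn't acquired within 10 seconds, leaving
-- room to retry or to cancel competing queries first.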

Is there a better experience? -

  • maybe wait for 5 mins and retry up to 3 times before cancelling other queries? (--wait-timeout)
  • or fail hard if pgosc couldn't acquire a lock for the swap perhaps? (continue replaying during this time).
    • If failing hard, then pgosc should do the cleanup.
    • Introducing a flag to convey user preference is also possible
    • Similar to pg_repack? (re-purposing the lessons?) (combination of --no-kill-backend and --wait-timeout)

Drop tables and cleanup

Tracking

Once tasks are performed, drop the old table and clean up. Dropping the old table can be controlled via a new flag. The --dry-run flag is not used currently either. For cleanup, we should

  • restore vacuum settings
  • restore statement timeout
  • restore client_min_messages

Scope of this tool

Hi,

Is pg-osc only available for ALTER TABLE statements, or does it work for any ALTER statement, like ALTER INDEX and others?

When I tried to run the following ALTER statement with pg-osc, I got the error mentioned below.

pg-online-schema-change perform --alter-statement 'ALTER INDEX IF EXISTS unq_trace_id RENAME TO modified_name;' --dbname "testdb" --host "localhost" --username "postgres" --wait-time-for-lock 5 --kill-backends --drop

Error:
/var/lib/gems/3.0.0/gems/pg_online_schema_change-0.9.4/lib/pg_online_schema_change/orchestrate.rb:56:in `run!': Parent table has no primary key, exiting... (PgOnlineSchemaChange::Error)

How will this work in a live project?

Hi,

I have a live project where I want to use pg-osc to alter tables that have billions of rows of data. Now, how can I do that?

The only idea I have is to call the pg-osc command with ALTER statements manually each time.

Other than this, is there any way to do the same task?

Add foreign keys to shadow table during swap

We set up the shadow table with the FKs (if any) so it's structurally the same as the primary table from the beginning.

'ALTER TABLE %s add constraint %s %s',

During the INSERT INTO (copying data from the primary table to the shadow table) this can become a blocking operation, as the new data is being validated by the FK. Rather, we can trust the FK on the primary table and add back the FKs on the shadow table prior to the swap, so that post swap they are structurally the same. We can accordingly extend validate_constraint to validate constraints on the primary and referential tables.

fails to get view definition if it is from another schema

This is probably related to #112 and #117. When the script tries to get the view definition, it fails because the view is not in the public schema.

It tries to select pg_get_viewdef(format('%I.%I', 'public', dependent_view.relname)::regclass) as view_definition, and not all of my views are in the public schema. It should probably be something like:

SELECT DISTINCT dependent_view.relname as view_name, pg_get_viewdef(format('%I.%I', view_ns.nspname, dependent_view.relname)::regclass) as view_definition
FROM pg_depend
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
JOIN pg_class as dependent_view ON pg_rewrite.ev_class = dependent_view.oid
JOIN pg_class as source_table ON pg_depend.refobjid = source_table.oid
JOIN pg_namespace source_ns ON source_ns.oid = source_table.relnamespace
JOIN pg_namespace view_ns ON view_ns.oid = dependent_view.relnamespace
where
source_ns.nspname = 'public'
AND source_table.relname = 'pgosc_op_table_users';

Validate constraint post FK referesh

Tracking

After FKs are refreshed (dropped and added), validate the constraint from a new transaction. This means it will also happen post swap/rename, since we have a limited window during the swap. Post swap, the validation constraint will fail hard.

race condition between replay and swap

I had the idea this morning to do a validation test by using a custom pgbench script that applies the same update to two different tables. The setup looks like this:

pgbench -i -s 500 pgbench
CREATE TABLE pgbench_accounts_validate AS SELECT * FROM pgbench_accounts ;
ALTER TABLE pgbench_accounts_validate ADD PRIMARY KEY (aid);

Validate that the tables match before the test:

(TABLE pgbench_accounts EXCEPT TABLE pgbench_accounts_validate)
UNION ALL
(TABLE pgbench_accounts_validate EXCEPT TABLE pgbench_accounts)
;


 aid | bid | abalance | filler
-----+-----+----------+--------
(0 rows)

Put this:

\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
UPDATE pgbench_accounts_validate SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;

Into ~/git/pg-online-schema-change/pgbench-validate.sql

Kick off a pgbench like so:

pgbench --file ~/git/pg-online-schema-change/pgbench-validate.sql -T 180 pgbench

Validate that the tables are still the same:

pgbench=# (TABLE pgbench_accounts EXCEPT TABLE pgbench_accounts_validate)
          UNION ALL
          (TABLE pgbench_accounts_validate EXCEPT TABLE pgbench_accounts);

 aid | bid | abalance | filler
-----+-----+----------+--------
(0 rows)

Now, kick off the same pgbench as above, but also do a pg-osc run at the same time:

pgbench --file ~/git/pg-online-schema-change/pgbench-validate.sql -T 180 pgbench

and

bundle exec bin/pg-online-schema-change perform --alter-statement 'ALTER TABLE pgbench_accounts ALTER COLUMN aid TYPE BIGINT;' -d pgbench --host localhost --username pgosc --password '' --drop

And compare again after it is all done:

(23314 rows) returned by the EXCEPT query. Here's a sample row in both tables:

pgbench=# select * from pgbench_accounts where aid = 99996;
  aid  | bid | abalance | filler
-------+-----+----------+--------
 99996 |   1 |     8620 |
(1 row)

Time: 1.589 ms
pgbench=# select * from pgbench_accounts_validate where aid = 99996;
  aid  | bid | abalance | filler
-------+-----+----------+--------
 99996 |   1 |    10721 |
(1 row)

Unable to perform alterations more than once on a table.

Hi @shayonj ,

I found that there is an issue with running the pg-osc command more than once when we don't use --drop. When I ran the command without --drop, it worked fine and created an old table. When I tried to run the command on the same table again, it raised an error that the table already exists, as below.
ERROR: relation "pgosc_op_table_employees" already exists (PG::DuplicateTable)
Solution:
I guess when renaming the old table, you may have given it the name format "pgosc_op_table_{tablename}" --> this causes the error. To resolve this, you may change the format to "pgosc_op_table_{tablename}_{index}"

Here index/id can be a unique value.

Views aren't updated after the swap

It appears that views that reference the table being re-written are updated at some point to use the temp pgosc_op_table... table, but after the swap they continue to reference that temp table, preventing it from being dropped cleanly (you'll get errors mentioning cannot drop table pgosc_op_table_... because other objects depend on it). This appears to be because Postgres stores the view as a parsed query tree that keeps a reference to the object identifier instead of just using the name. I found a StackOverflow post that mentions this. They mention that using CREATE OR REPLACE VIEW can fix it, but we'd have to store the original SQL for the view before the swap and then call CREATE OR REPLACE VIEW immediately after the swap to reset it to the original table name.
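
A possible shape of that fix, sketched in SQL (the view name and definition are made up):

-- Before the swap: save the view's definition.
SELECT pg_get_viewdef('books_summary'::regclass);

-- After the swap: re-create the view from the saved SQL so it binds to the new table's OID.
CREATE OR REPLACE VIEW books_summary AS
SELECT id, title FROM books;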

Ability to show progress

I'm currently attempting to use pg-osc on a two-billion-row, 482 GB (223 GB table / 259 GB index) table. Needless to say, it's taking a long time. I'm actually starting to doubt whether it's working at all, as the output after twenty hours is just this so far:

{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":40,"time":"2023-07-27T17:34:49.736-04:00","v":0,"msg":"DEPRECATED: -w is deprecated. Please pass PGPASSWORD environment variable instead.","version":"0.9.2"}
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":30,"time":"2023-07-27T17:34:51.126-04:00","v":0,"msg":"Setting up audit table","audit_table":"pgosc_at_mytable_c16c36","version":"0.9.2"}
{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":30,"time":"2023-07-27T17:34:51.382-04:00","v":0,"msg":"Setting up triggers","version":"0.9.2"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":30,"time":"2023-07-27T17:34:51.714-04:00","v":0,"msg":"Setting up shadow table","shadow_table":"pgosc_st_mytable_c16c36","version":"0.9.2"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":30,"time":"2023-07-27T17:34:51.937-04:00","v":0,"msg":"Running alter statement on shadow table","shadow_table":"pgosc_st_mytable_c16c36","parent_table":"mytable","version":"0.9.2"}
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":30,"time":"2023-07-27T17:34:52.051-04:00","v":0,"msg":"Clearing contents of audit table before copy..","shadow_table":"pgosc_st_mytable_c16c36","parent_table":"mytable","version":"0.9.2"}
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"Bryces-MacBook-Pro-M2.local","pid":19905,"level":30,"time":"2023-07-27T17:34:52.148-04:00","v":0,"msg":"Copying contents..","shadow_table":"pgosc_st_mytable_c16c36","parent_table":"mytable","version":"0.9.2"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress

Is there a way to show any sort of progress indicator while the tool is running? I realize this would be challenging as the copy_data! command is just running a query inside a larger transaction. I'm not sure if it's even possible to get visibility into this, but it sure would be nice!

Introduce a PK/id field on audit table for ordered reads

There is a potential edge case right now when reading from audit table. Since the read uses the primary_key of the primary table, it may be out of order if two updates happen for the same row.

It's better to have a dedicated id field (PK) on the audit table; that way, reads can return ordered entries without relying on a timestamp field.

SELECT * FROM #{audit_table} ORDER BY #{primary_key} LIMIT #{PULL_BATCH_COUNT};

This field name needs to be non-conflicting (with the primary table) as well. Similar to #47.
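
A minimal sketch of the idea (the audit table and column names here are illustrative):

-- Dedicated, monotonically increasing key on the audit table:
ALTER TABLE pgosc_audit_books ADD COLUMN pgosc_audit_id BIGSERIAL;

-- Replay then reads strictly in insertion order, independent of the primary table's key:
SELECT * FROM pgosc_audit_books ORDER BY pgosc_audit_id LIMIT 1000;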

Validating foreign key constraints locks table

The main issue I've run into when using pg-osc has been at the very end of the process. If I have foreign keys on the table I'm modifying, it tends to lock up during the constraint validation phase. I think this happens because a ShareLock is acquired when the constraint is a foreign key. I'd love to ensure my foreign keys remain valid after the swap, but I can't afford to lock up the table while doing it. Is there a way around this? Maybe a flag to turn off constraint validation?

Restore index, foreign key and sequence constraint names

The naming can get weird when pgosc has run multiple times on a table. It will always prepend the prefix_key, and additional runs can hit name length limit issues.

This also applies to sequence names.

Can happen as part of cleanup.

Using pg-osc only for backfilling data

Hi,

I would be interested in using pg-osc but without using any alter statement. My use case would be to get rid of a bloated table by creating a new table and filling it with (a delta) of the original table.

E.g. like this:

pg-online-schema-change perform \
  --dbname "postgres" \
  --host "localhost" \
  --username "jamesbond" \

Would adding a "fake" alter statement be a workaround, or do you have any other ideas?

  --alter-statement 'ALTER TABLE books ADD COLUMN IF NOT EXISTS new_email varchar'

Ability to avoid any DDLs while pgosc is in motion

When pgosc is copying, replaying, etc., no other process/transaction should be able to perform DDL on the primary table.

This can be achieved by holding an ACCESS SHARE lock on the primary table, except during the swap. Probably from a separate connection (?)
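
A sketch of the idea, assuming a separate session can be held open for the duration of the operation:

-- From a side connection, kept open while pg-osc copies and replays:
BEGIN;
LOCK TABLE books IN ACCESS SHARE MODE;  -- blocks concurrent DDL (which needs ACCESS EXCLUSIVE)
                                        -- while permitting normal reads and writes
-- ... keep the transaction open; release it just before the swap.
COMMIT;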

Potential need for a `pull-batch-delay` option

I ran into an issue recently while testing this library on a rather large (250GB) table on Heroku. I've used this tool in a few other cases but this is the largest table I've tried yet. Here's the message I got from Heroku while running the alter statement:

Your database is currently unable to keep up with writes.

Specifically, Postgres is unable to archive Postgres write-ahead logs (WAL) fast enough to keep up with write volume. WAL archiving is critical to maintaining continuous protection of data.

If the WAL drive fills completely, the database will shut down and will be at risk of data loss. To prevent this, Heroku is temporarily throttling your database connections to allow the backlog of WAL to be archived.

Heroku strongly recommends pausing or reducing any bulk data loading activity that is running.

Heroku will remove the connection throttling once WAL archiving is able to keep up with database writes.

Read more about this here: https://devcenter.heroku.com/articles/postgres-write-ahead-log-usage

So, it was writing too fast and eventually the connection was killed. My assumption is that it might be able to succeed if I slow down the batches a bit by adding an optional delay/sleep argument for each batch. I don't really want the job to take even longer but it might be the only way in this situation. What are your thoughts on adding this? I'm happy to do the work, I just wanted to check to see if there were other potential options for this case before I opened a PR.

Thanks.

Set the appropriate next sequence value on the new table

First, thank you for creating this project. It's great to have something that can automate table re-writing with all the best practices to avoid downtime.

When I used this a few months back, everything worked great besides the primary key's sequence value. The value was left at 1, which meant I immediately started seeing duplicate key issues. When I figured out what was happening, all I needed to do was manually set the value of the sequence to the max id + 1 of the primary key and things started working again.

  1. Is this a known issue? I've seen other issues around sequence naming but I haven't seen others mention the actual sequence value. I feel like I'm missing something.

  2. Would there be any potential issues with executing something like this pseudocode at the end of the swap! method?

SELECT SETVAL('#{client.pk_sequence}', (SELECT MAX(#{client.pk}) FROM #{client.table})+1);

Another idea I had was to simply allow a configurable --post-swap-statement option that would run custom SQL to do whatever lingering cleanup was needed for the user's specific case.

Add a preview/dry-run command

One idea I had for pg-osc would be to do some quick observations about the table (and related tables) before actually starting the process. We could warn the user about any potential issues with large tables, constraints, etc. Or, we could just keep this functionality as its own process that runs with a preview command instead of perform. Just thoughts. Here's what the output could look like:

pg-online-schema-change preview {all the same alter/connection options as perform}

Table to alter: table_name_from_alter_statement
Estimated number of records in table: 345,000,000
Size of table on disk: 30GB
Estimated time to write new table: 8 hours (based on a test run of 1% of your records)
Estimated time to validate constraints: 6 hours (based on ?)
Constraints to validate:
- Referential foreign key: column_name -> foreign_key_table (1.2B records) (WARNING: this could take a significant amount of time due to the large number of records)
- Referential foreign key: another_column_name -> another_table (200 records)
- Self foreign key: some_table has foreign key to table_name_from_alter_statement.id (20M records)
- Self foreign key: some_table2 has foreign key to table_name_from_alter_statement.id (2M records)
- Self foreign key: some_table3 has foreign key to table_name_from_alter_statement.id (2B records) (WARNING: this could take a significant amount of time due to the large number of records)

Suggestions:
- Given the large number of records in some related tables you may want to use the `--skip-foreign-key-validation` flag and then validate constraints separately to speed up the process.
- ???

Ability to modify the SQL used to populate the new table

If we allow users the ability to modify the SQL used to populate the new table in copy_data!, then we enable backfilling of new or existing columns. This is an advanced feature which needs a stern warning that the wrong query will eat your data.

Integration/smoke test

We should be able to do the following to have a high signal integration spec

  • Spin up a PG container
  • Setup pgbench - pgbench --initialize -s 10
  • Setup validation table
CREATE TABLE pgbench_accounts_validate AS SELECT * FROM pgbench_accounts ;
ALTER TABLE pgbench_accounts_validate ADD PRIMARY KEY (aid);
  • Run with a concurrency of 2 - pgbench --file spec/fixtures/bench.sql -T 600 -c 2, playing the following
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
UPDATE pgbench_accounts_validate SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
  • Run pg-osc
  • Ensure that the new pgbench_accounts table matches in content with pgbench_accounts_validate.

Can use rspec to setup and assert with the following

(TABLE pgbench_accounts EXCEPT TABLE pgbench_accounts_validate)
UNION ALL
(TABLE pgbench_accounts_validate EXCEPT TABLE pgbench_accounts);

h/t to @jfrost

Update README regarding Docker setup of pg-osc

Hey,

please update the README regarding the installation of pg-osc through Docker and give steps for using it via Docker.

Also, please explain the fields and provide example values for them.

Add additional safety for custom copy sql

We can add an additional safety measure against the custom SQL destroying all or most of someone's table. Simply compare the results from SELECT reltuples FROM pg_class WHERE relname = <old table> against the same query for the new table after the ANALYZE has been run, but before the tables are swapped. Since this is an estimate, we should probably use a comparison that looks something like >= 0.95 * old_tuples. We should also add a flag like --copy_percentage that lets you set a lower threshold for that comparison for use cases where the user is purposely deleting much of the table data.

This would help guard against a user not understanding the documentation and using something unfortunate like:

-- file: /src/query.sql
INSERT INTO %{shadow_table}(foo, bar, baz, rental_id, tenant_id)
SELECT 1,1,1,1,1
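
The proposed check could be expressed roughly like this (table names are illustrative; reltuples is only an estimate, hence the slack factor):

SELECT new_t.reltuples >= 0.95 * old_t.reltuples AS copy_looks_complete
FROM pg_class AS old_t, pg_class AS new_t
WHERE old_t.relname = 'books'
  AND new_t.relname = 'pgosc_shadow_books';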

Release process

Tracking

Rolling out a semver-based release process using CI and git-based tagging. Artifacts:

  • gem
  • docker image (easily pluggable into container environments, with flags as envs or similar)

Will need to look at what's out there to support docker image releases.

Some issues while working with partitioned table

We should modify the README to say not to use it with partitioned tables :) at least for now, unless we address these issues :)

When I inserted some data into my partitioned table and then quickly ran pgosc on it to check how it works, it actually created another normal table, added the column there, but didn't copy the data.

-- Here are the tests:

-- my partitioned table with some data:

CREATE TABLE shiwangini.events2 (
  device_id bigint,
  event_id bigserial,
  event_time timestamptz default now(),
  data jsonb not null,
  PRIMARY KEY (device_id, event_id)
  
)PARTITION BY Hash (device_id);

create table shiwangini.events2001 partition of shiwangini.events2 FOR VALUES WITH (modulus 4, remainder 1);
create table shiwangini.events2002 partition of shiwangini.events2 FOR VALUES WITH (modulus 4, remainder 2);
create table shiwangini.events2003 partition of shiwangini.events2 FOR VALUES WITH (modulus 4, remainder 3);
create table shiwangini.events2004 partition of shiwangini.events2 FOR VALUES WITH (modulus 4, remainder 0);


INSERT INTO shiwangini.events2 (device_id, data)
SELECT s % 100, ('{"measurement":'||random()||'}')::jsonb FROM generate_series(1,100000) s;

-- I can see the data now:

table shiwangini.events2 ;

Now, I ran the pgosc command below to add a column; the statement executed successfully on the console:

pg-online-schema-change perform --alter-statement 'alter table events2 add column "name" varchar ;' --schema "shiwangini" --dbname "dev" --host "xx.xx.xx.xx"  --username "dev"  --pull-batch-count 1000 --delta-count 20   --wait-time-for-lock 5
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":40,"time":"2022-03-19T17:28:55.211+00:00","v":0,"msg":"DEPRECATED: -w is deprecated. Please pass PGPASSWORD environment variable instead.","version":"0.7.1"}
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.240+00:00","v":0,"msg":"Setting up audit table","audit_table":"pgosc_at_events2_f6ffc9","version":"0.7.1"}
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.251+00:00","v":0,"msg":"Setting up triggers","version":"0.7.1"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.258+00:00","v":0,"msg":"Setting up shadow table","shadow_table":"pgosc_st_events2_f6ffc9","version":"0.7.1"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.276+00:00","v":0,"msg":"Running alter statement on shadow table","shadow_table":"pgosc_st_events2_f6ffc9","parent_table":"events2","version":"0.7.1"}
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.279+00:00","v":0,"msg":"Clearing contents of audit table before copy..","shadow_table":"pgosc_st_events2_f6ffc9","parent_table":"events2","version":"0.7.1"}
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.281+00:00","v":0,"msg":"Copying contents..","shadow_table":"pgosc_st_events2_f6ffc9","parent_table":"events2","version":"0.7.1"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.290+00:00","v":0,"msg":"Performing ANALYZE!","version":"0.7.1"}
INFO:  analyzing "shiwangini.events2" inheritance tree
INFO:  "events2001": scanned 350 of 350 pages, containing 30000 live rows and 0 dead rows; 9013 rows in sample, 30000 estimated total rows
INFO:  "events2002": scanned 291 of 291 pages, containing 25000 live rows and 0 dead rows; 7494 rows in sample, 25000 estimated total rows
INFO:  "events2003": scanned 198 of 198 pages, containing 17000 live rows and 0 dead rows; 5099 rows in sample, 17000 estimated total rows
INFO:  "events2004": scanned 326 of 326 pages, containing 28000 live rows and 0 dead rows; 8394 rows in sample, 28000 estimated total rows
INFO:  analyzing "shiwangini.events2001"
INFO:  "events2001": scanned 350 of 350 pages, containing 30000 live rows and 0 dead rows; 30000 rows in sample, 30000 estimated total rows
INFO:  analyzing "shiwangini.events2002"
INFO:  "events2002": scanned 291 of 291 pages, containing 25000 live rows and 0 dead rows; 25000 rows in sample, 25000 estimated total rows
INFO:  analyzing "shiwangini.events2003"
INFO:  "events2003": scanned 198 of 198 pages, containing 17000 live rows and 0 dead rows; 17000 rows in sample, 17000 estimated total rows
INFO:  analyzing "shiwangini.events2004"
INFO:  "events2004": scanned 326 of 326 pages, containing 28000 live rows and 0 dead rows; 28000 rows in sample, 28000 estimated total rows
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.872+00:00","v":0,"msg":"Remaining rows below delta count, proceeding towards swap","version":"0.7.1"}
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.872+00:00","v":0,"msg":"Performing swap!","version":"0.7.1"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.874+00:00","v":0,"msg":"Replaying rows, count: 0","version":"0.7.1"}
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.878+00:00","v":0,"msg":"Performing ANALYZE!","version":"0.7.1"}
INFO:  analyzing "shiwangini.events2"
INFO:  "events2": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.879+00:00","v":0,"msg":"Validating constraints!","version":"0.7.1"}
{"name":"pg-online-schema-change","hostname":"dev-host","pid":1714106,"level":30,"time":"2022-03-19T17:28:55.887+00:00","v":0,"msg":"All tasks successfully completed","version":"0.7.1"}

But now when I run table shiwangini.events2; I don't see any data. However, when I manually check the schema of the events2 table, I see the name column has been added (but I don't see any data there).

-- shiwangini.events2 definition

-- Drop table

-- DROP TABLE shiwangini.events2;

CREATE TABLE shiwangini.events2 (
	device_id int8 NOT NULL,
	event_id int8 NOT NULL DEFAULT nextval('shiwangini.pgosc_st_events2_f6ffc9_event_id_seq'::regclass),
	event_time timestamptz NULL DEFAULT now(),
	"data" jsonb NOT NULL,
	"name" varchar NULL,
	CONSTRAINT pgosc_st_events2_f6ffc9_pkey PRIMARY KEY (device_id, event_id)
)
WITH (
	autovacuum_enabled=false
);


However, I see another table (probably the old table, with the name pgosc_op_table_events2) which has the old data, old partitions, and exactly the old schema. If I specify --drop at the end, the old table with all its data gets dropped :D

Ability to reverse the change post swap

Idea: pgosc should support the ability to reverse the change (with no data loss) after the alter statements and swap has happened. pgosc should make sure that the data is being replayed in both directions (tables) before and after the swap. So in case of any issues, you can always go back to the original table.

Requires re-architecting some core constructs. Most things should be re-usable.

A separate command/invocation point can be used to go back to the previous state. I am thinking -

  1. Perform alterations as usual without dropping the table in the end (thus making it easier to go back). Example:
pg-online-schema-change perform -a "ALTER..." --drop false ....
  2. Next, perform reversal by providing the primary table name
pg-online-schema-change reverse -t "books"

This involves re-transferring the FKs and running analyze (?).

Bulk push remaining rows in audit table before swap

After we have initiated the swap (when remaining rows are below 20), we should do one last replay of all rows from the audit table to the shadow table before the rename, to account for any rows missed in the time it takes to execute the swap. This is a race condition that can happen on tables with an extremely high volume of writes. Tweaking DELTA_COUNT is another option (which can be exposed via a flag).

https://github.com/shayonj/pg-online-schema-change/blob/3c48a294fc8f896e9ea6155e452127a9693feccf/lib/pg_online_schema_change/orchestrate.rb#L167-L174
