I'm currently attempting to use ps-osc on a two billi

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Ability to show progress,about shayonj/pg-osc

brycethornton commented on August 21, 2024 1

Yeah, I haven't been able to find a way to do it since the shadow table creation and insertion are all within the same transaction. I can't get any visibility.

from pg-osc.

brycethornton commented on August 21, 2024 1

My workaround on RDS was to monitor the free storage space. Since this was such a big table I could roughly see how far along it was as free storage space shrunk:

from pg-osc.

jfrost commented on August 21, 2024 1

Oh good point and you could watch the raw metric in cloudwatch if it's a smaller table. That would work well enough for DBs with relatively stable sizes and big tables like you mentioned!

from pg-osc.

ngan commented on August 21, 2024 1

@shayonj is it possible to do this on the shadow table and compare it to the original table? That'll give you a decent sense of progress. I have no idea whether this works on the shadow table while it's in a transaction.

SELECT pg_size_pretty(pg_total_relation_size('"<schema>"."<table>"'));

from pg-osc.

jfrost commented on August 21, 2024

You could take a look at pg_stat_activity with something like:

SELECT *
  FROM pg_stat_activity
 WHERE state <> 'idle'
;

and see what's happening on the db side.

from pg-osc.

brycethornton commented on August 21, 2024

@jfrost Yeah, I can see the INSERT INTO running in the process list. Just no idea how long it will take. My original command failed with an PQconsumeInput() server closed the connection unexpectedly (PG::ConnectionBad) error after over 20 hours. I have it running again to see if I can actually get it to run successfully. It brings to mind so many nice potential features:

Showing progress
Restarting a failed process (need to keep persisted state somehow)
Some sort of preview/recommendation feature that scan the size of your database, looks at the host resources (CPU/Memory), and various PG settings (max_wal_size etc.), and gives an idea of how long the process might take and any tweaks that might help.

Anyway, we'll see how this turns out. I also have a custom SQL script running on a test instance that uses a different approach for changing a primary key from int4 to int8. Hopefully one of these will succeed and I can run it on the real thing and move on.

from pg-osc.

brycethornton commented on August 21, 2024

Quick update, I did finally get the command to finish on a test instance. It took a little over 21 hours. Some way to show progress would be nice, though. Feel free to close this issue if there's no viable way to do it.

from pg-osc.

sasharevzin commented on August 21, 2024

@brycethornton They have --verbose command to show you what is currently happening.

from pg-osc.

brycethornton commented on August 21, 2024

They have --verbose command to show you what is currently happening.

@sasharevzin Thanks. Yeah, that does add some handy output but the specific area where I would love some more visibility is during the Copying contents.. portion. I know what happens during this time but I don't know how far long it is at any given time. It's not a huge deal on smaller tables but on a really large table it would be really hand to know that it's 25%, 50%, 75%, etc. done with the "INSERT INTO..." statement. I don't think there's a good way to do this unfortunately.

from pg-osc.

shayonj commented on August 21, 2024

I wonder if we periodically compare the count of the rows between the two tables and log a percentage complete every minute or so. I guess it might be hard to do with the transaction that is running the copy not completed 🤔

from pg-osc.

jfrost commented on August 21, 2024

Sorry I'm late writing back...vacation got a little hectic. The joys of MVCC keep us from really seeing what the row count is. The only thing I can think of would be to look at the on disk file sizes, but you can only do that if you have access to the filesystem, so no way to do it on platforms like RDS.

from pg-osc.

shayonj commented on August 21, 2024

Thats a good call. I think a good approach would be to aim for a balanced estimation. It seems using pg_table_size would be more suitable in this scenario, given that pg_total_relation_size includes indices, which aren't present in the shadow table during the copy_data! process. pg_table_size gives us a closer approximation of the table's size alone (like say if we were to instead perform a count(*) on both tables for estimate).

However, there are a few things to keep in mind, such as the possibility of the original table being bloated. To address this, we could run an analyze and vacuum on it beforehand. This also means the progress indicator in the logs might not reach 100%, as the new table could end up being smaller. But I think that's acceptable, as long as we have some form of an ongoing progress indicator.

PR incoming shortly.

from pg-osc.

shayonj commented on August 21, 2024

~~ah! i missed that the table creation happens from the same transaction as well, which means the table isn't visible to another connection either.~~

~~This is interesting....~~

pg-osc/lib/pg_online_schema_change/orchestrate.rb

Lines 166 to 175 in baa638f

    
           def setup_shadow_table! 
        
             # re-uses transaction with serializable 
        
             # This ensures that all queries from here till copy_data run with serializable. 
        
             # This is to to ensure that once the trigger is added to the primay table 
        
             # and contents being copied into the shadow, after a delete all on audit table, 
        
             # any replaying of rows that happen next from audit table do not contain 
        
             # any duplicates. We are ensuring there are no race conditions between 
        
             # adding the trigger, till the copy ends, since they all happen in the 
        
             # same serializable transaction. 
        
             Query.run(client.connection, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE", true)

UPDATE: Fixed

from pg-osc.

shayonj commented on August 21, 2024

v0.9.7 is now out with the new estimated progress in logs during copy: https://github.com/shayonj/pg-osc/releases/tag/v0.9.7

Please feel free to share any ideas, feedback and bugs. Thanks everyone for all the brainstorming and feedback.

Closing now.

from pg-osc.

brycethornton commented on August 21, 2024

Thank you, @shayonj! I'm excited to give this a spin soon!

from pg-osc.

shayonj commented on August 21, 2024

Woot! Would love to hear your thoughts.

Going to release 0.9.8 today evening. Looks like there are some gem version conflicts currently. Stay tuned.

UPDATE: https://github.com/shayonj/pg-osc/releases/tag/v0.9.8

from pg-osc.

Ability to show progress about pg-osc HOT 16 CLOSED

Comments (16)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	def setup_shadow_table!
	# re-uses transaction with serializable
	# This ensures that all queries from here till copy_data run with serializable.
	# This is to to ensure that once the trigger is added to the primay table
	# and contents being copied into the shadow, after a delete all on audit table,
	# any replaying of rows that happen next from audit table do not contain
	# any duplicates. We are ensuring there are no race conditions between
	# adding the trigger, till the copy ends, since they all happen in the
	# same serializable transaction.
	Query.run(client.connection, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE", true)