Comments (18)
Could you give it another test and run this query as your RDS admin user when you see those waiting locks:
WITH RECURSIVE
c(requested, current) AS
( VALUES
('AccessShareLock'::text, 'AccessExclusiveLock'::text),
('RowShareLock'::text, 'ExclusiveLock'::text),
('RowShareLock'::text, 'AccessExclusiveLock'::text),
('RowExclusiveLock'::text, 'ShareLock'::text),
('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('RowExclusiveLock'::text, 'ExclusiveLock'::text),
('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),
('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),
('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ShareLock'::text, 'RowExclusiveLock'::text),
('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareLock'::text, 'ShareRowExclusiveLock'::text),
('ShareLock'::text, 'ExclusiveLock'::text),
('ShareLock'::text, 'AccessExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ShareLock'::text),
('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),
('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),
('ExclusiveLock'::text, 'RowShareLock'::text),
('ExclusiveLock'::text, 'RowExclusiveLock'::text),
('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('ExclusiveLock'::text, 'ShareLock'::text),
('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('ExclusiveLock'::text, 'ExclusiveLock'::text),
('ExclusiveLock'::text, 'AccessExclusiveLock'::text),
('AccessExclusiveLock'::text, 'AccessShareLock'::text),
('AccessExclusiveLock'::text, 'RowShareLock'::text),
('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ShareLock'::text),
('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
('AccessExclusiveLock'::text, 'ExclusiveLock'::text),
('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)
),
l AS
(
SELECT
(pl.locktype,pl.DATABASE,pl.relation::regclass::text,pl.page,pl.tuple,pl.virtualxid,pl.transactionid,pl.classid,pl.objid,pl.objsubid) AS target,
pl.virtualtransaction,
pl.pid,
pl.mode,
pl.granted,
psa.query
FROM pg_catalog.pg_locks pl
JOIN pg_catalog.pg_stat_activity psa
ON pl.pid = psa.pid
),
t AS
(
SELECT
blocker.target AS blocker_target,
blocker.pid AS blocker_pid,
blocker.mode AS blocker_mode,
blocker.query AS blocker_query,
blocked.target AS target,
blocked.pid AS pid,
blocked.mode AS mode
FROM l blocker
JOIN l blocked
ON ( NOT blocked.granted
AND blocker.granted
AND blocked.pid != blocker.pid
AND blocked.target IS NOT DISTINCT FROM blocker.target)
JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)
),
r AS
(
SELECT
blocker_target,
blocker_pid,
blocker_mode,
'1'::int AS depth,
target,
pid,
mode,
blocker_pid::text || ',' || pid::text AS seq,
blocker_query
FROM t
UNION ALL
SELECT
blocker.blocker_target,
blocker.blocker_pid,
blocker.blocker_mode,
blocker.depth + 1,
blocked.target,
blocked.pid,
blocked.mode,
blocker.seq || ',' || blocked.pid::text,
blocker.blocker_query
FROM r blocker
JOIN t blocked
ON (blocked.blocker_pid = blocker.pid)
WHERE blocker.depth < 1000
)
SELECT * FROM r
ORDER BY seq;
Would also be good to just grab a dump of pg_stat_activity like so:
SELECT * FROM pg_stat_activity;
Make sure to REDACT any sensitive info before pasting it into your reply.
from pg-osc.
If the query I sent over returned zero results while the issue is happening and you can see the lock waits in Performance Insights, then it's possible you are running the query as a user which doesn't have the appropriate permissions. You can do a simple SELECT * FROM pg_stat_activity;
to see if you user can see queries or not.
from pg-osc.
@shayonj Apparently the issue is the audit table is getting locked because of the ALTER TABLE #{audit_table} SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);
query run in transaction which commits after data is copied.
So it gets stuck on the trigger that wants to acquire row level lock but can't acquire on audit table to insert the log of insert/update. And yes this behavior is only on RDS, we still haven't figured out why.
Running the disable vacuum query before the serializable transaction starts fixes the issue. I have this fork https://github.dev/yash-toddleapp/pg-osc which has this fix with some other fixes as well. I'll make a PR once I get time.
We have tables as big as 50-100 million rows
from pg-osc.
BTW, it might also be helpful to see the queries section of Performance insights from the same time period you posted the screenshot. Again, remember to blur any sensitive info if you post that.
from pg-osc.
Yeah, the output of the above query will def be useful.
Thinking out loud - I wonder if its a by product of using TRANSACTION ISOLATION LEVEL SERIALIZABLE
until we add the triggers (during initial copy), this can be prohibitive on large DBs 🤔
from pg-osc.
@shayonj , I also think that it's the byproduct of TRANSACTION ISOLATION LEVEL SERIALIZABLE
. Because the transaction carries from setup_shadow_table
to copy_data
functions. What if we don't do copy_data
in the same transaction?
@jfrost Also here are the screenshots of queries section of performance insights. Largely the updates query on the original tables are stuck. Here INSERT INTO
is happening to the shadow table, and update
is happening on the original table.
from pg-osc.
@shayonj I am not finding locks per say with the query @jfrost shared but this is the behavior while trying to replicate the same on local. You can see how queries goes from taking < 10ms to like 400ms and more.
clip of what's happening while trying to replicate it on local: https://imgur.com/M5h5ZPm
from pg-osc.
Yeah, thats interesting. If you have any reproducible script or similar, that could be super useful. I also see that in the video, the latency spikes for a handful of queries but most are <100ms (?).
I will take a deeper look and get back within the week. I have been meaning to refactor parts of this code and introduce concept of two connections to setup copy and trigger, like pg_repack, which I think would allow us to also drop the SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
. In the meantime, any more info like @jfrost mentioned or reproducible script would be super useful. Thanks!!
from pg-osc.
@shayonj as of now we have only been able to replicate the locking behaviour on our Amazon's RDS instances (it doesn't happen on local environment). And also copying data in batches instead of a single query solves. So I don't think serialised transaction is the issue.
from pg-osc.
@yash-toddleapp Any chance you are running those UPDATE queries on your RDS instance with serializable isolation or something other than read committed?
from pg-osc.
@shayonj @jfrost Nah I don't think so. I found something more weird. The table only locks while INSERT INTO #{shadow_table}(#{insert_into_columns.join(", ")}) SELECT #{select_columns.join(", ")} FROM ONLY #{client.table_name}
when ran from the pg-osc.
What I did was make the list of all the queries that were being executed with the help of --verbose flag. And ran them using psql. And it does not lock the table. So there's something in the tool itself I don't know what that is causing the lock.
from pg-osc.
Interesting! I will look a bit more deeply in the next few days and get back
from pg-osc.
So, i am unable to replicate this in our smoke spec env and not seeing any consistent locking the on the parent table. I am curious
- How did you pin it down to the issue being at
INSERT INTO #{shadow_table}(#{insert_into_columns.join(", ")}) SELECT #{select_columns.join(", ")}
- Also, what version of PG you are on?
- Lastly, do you have any custom setting set on your RDS instances?
Thanks
from pg-osc.
*also how big the table is roughly how many writes/updates it gets. Thanks!
from pg-osc.
That’s a good find! And makes sense. Since serializable transaction shouldn’t be causing issues. I will get in my refactor after your patch. Let me know if I can help with the patch or anything. Thanks!
curious why it’s only happening on RDS. Will try to look into it later, but I can see how the alter can cause an access exclusive lock and a lock queue.
from pg-osc.
I proposed a simplified change here: #97
Feel free to add to it or open new PRs with any other fixes you found, also if you are able to test/verify, then that'd be great too. Thanks!
from pg-osc.
v.0.9.2 is now out with the fix: https://github.com/shayonj/pg-osc/releases/tag/v0.9.2
from pg-osc.
Closing this, thanks for the report and brainstorming! please feel free to open reports or suggest PRs too. Thanks again!
from pg-osc.
Related Issues (20)
- Set the appropriate next sequence value on the new table HOT 11
- 很不错的工具 HOT 1
- Potential need for a `pull-batch-delay` option HOT 4
- Views aren't updated after the swap HOT 7
- Auto-Vacuum Isn't Re-enabled HOT 14
- Using pg-osc only for backfilling data HOT 4
- Ability to show progress HOT 16
- Views from different schema aren't updated because of error HOT 1
- INSERT's are failing for long table names HOT 2
- Validating foreign key constraints locks table HOT 17
- Scope of this tool HOT 1
- Add a preview/dry-run command
- fails to get view definition if it is from another schema HOT 4
- Does pg-osc is able to insert millions of rows to an existing table? HOT 2
- will support `after` sql syntax ? HOT 1
- How does this will work in a live project. HOT 4
- Unable to perform alterations more than once on a table. HOT 2
- Update Readme reg- docker setup of pg-osc HOT 2
- Error case with delta-count and pull-batch-count values HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pg-osc.