Giter Site home page Giter Site logo

Comments (18)

jfrost avatar jfrost commented on August 21, 2024 1

Could you give it another test and run this query as your RDS admin user when you see those waiting locks:

WITH RECURSIVE
     c(requested, current) AS
       ( VALUES
         ('AccessShareLock'::text, 'AccessExclusiveLock'::text),
         ('RowShareLock'::text, 'ExclusiveLock'::text),
         ('RowShareLock'::text, 'AccessExclusiveLock'::text),
         ('RowExclusiveLock'::text, 'ShareLock'::text),
         ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
         ('RowExclusiveLock'::text, 'ExclusiveLock'::text),
         ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),
         ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
         ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),
         ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
         ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),
         ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),
         ('ShareLock'::text, 'RowExclusiveLock'::text),
         ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),
         ('ShareLock'::text, 'ShareRowExclusiveLock'::text),
         ('ShareLock'::text, 'ExclusiveLock'::text),
         ('ShareLock'::text, 'AccessExclusiveLock'::text),
         ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text),
         ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
         ('ShareRowExclusiveLock'::text, 'ShareLock'::text),
         ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
         ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text),
         ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),
         ('ExclusiveLock'::text, 'RowShareLock'::text),
         ('ExclusiveLock'::text, 'RowExclusiveLock'::text),
         ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
         ('ExclusiveLock'::text, 'ShareLock'::text),
         ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
         ('ExclusiveLock'::text, 'ExclusiveLock'::text),
         ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),
         ('AccessExclusiveLock'::text, 'AccessShareLock'::text),
         ('AccessExclusiveLock'::text, 'RowShareLock'::text),
         ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text),
         ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
         ('AccessExclusiveLock'::text, 'ShareLock'::text),
         ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
         ('AccessExclusiveLock'::text, 'ExclusiveLock'::text),
         ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)
       ),
     l AS
       (
         SELECT
             (pl.locktype,pl.DATABASE,pl.relation::regclass::text,pl.page,pl.tuple,pl.virtualxid,pl.transactionid,pl.classid,pl.objid,pl.objsubid) AS target,
             pl.virtualtransaction,
             pl.pid,
             pl.mode,
             pl.granted,
             psa.query
           FROM pg_catalog.pg_locks pl
           JOIN pg_catalog.pg_stat_activity psa
             ON pl.pid = psa.pid
       ),
     t AS
       (
         SELECT
             blocker.target  AS blocker_target,
             blocker.pid     AS blocker_pid,
             blocker.mode    AS blocker_mode,
             blocker.query   AS blocker_query,
             blocked.target  AS target,
             blocked.pid     AS pid,
             blocked.mode    AS mode
           FROM l blocker
           JOIN l blocked
             ON ( NOT blocked.granted
              AND blocker.granted
              AND blocked.pid != blocker.pid
              AND blocked.target IS NOT DISTINCT FROM blocker.target)
           JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)
       ),
     r AS
       (
         SELECT
             blocker_target,
             blocker_pid,
             blocker_mode,
             '1'::int        AS depth,
             target,
             pid,
             mode,
             blocker_pid::text || ',' || pid::text AS seq,
             blocker_query
           FROM t
         UNION ALL
         SELECT
             blocker.blocker_target,
             blocker.blocker_pid,
             blocker.blocker_mode,
             blocker.depth + 1,
             blocked.target,
             blocked.pid,
             blocked.mode,
             blocker.seq || ',' || blocked.pid::text,
             blocker.blocker_query
           FROM r blocker
           JOIN t blocked
             ON (blocked.blocker_pid = blocker.pid)
           WHERE blocker.depth < 1000
       )
SELECT * FROM r
  ORDER BY seq;

Would also be good to just grab a dump of pg_stat_activity like so:

SELECT * FROM pg_stat_activity;

Make sure to REDACT any sensitive info before pasting it into your reply.

from pg-osc.

jfrost avatar jfrost commented on August 21, 2024 1

If the query I sent over returned zero results while the issue is happening and you can see the lock waits in Performance Insights, then it's possible you are running the query as a user which doesn't have the appropriate permissions. You can do a simple SELECT * FROM pg_stat_activity; to see if you user can see queries or not.

from pg-osc.

yash-toddleapp avatar yash-toddleapp commented on August 21, 2024 1

@shayonj Apparently the issue is the audit table is getting locked because of the ALTER TABLE #{audit_table} SET (autovacuum_enabled = false, toast.autovacuum_enabled = false); query run in transaction which commits after data is copied.

So it gets stuck on the trigger that wants to acquire row level lock but can't acquire on audit table to insert the log of insert/update. And yes this behavior is only on RDS, we still haven't figured out why.

Running the disable vacuum query before the serializable transaction starts fixes the issue. I have this fork https://github.dev/yash-toddleapp/pg-osc which has this fix with some other fixes as well. I'll make a PR once I get time.

We have tables as big as 50-100 million rows

from pg-osc.

jfrost avatar jfrost commented on August 21, 2024

BTW, it might also be helpful to see the queries section of Performance insights from the same time period you posted the screenshot. Again, remember to blur any sensitive info if you post that.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Yeah, the output of the above query will def be useful.

Thinking out loud - I wonder if its a by product of using TRANSACTION ISOLATION LEVEL SERIALIZABLE until we add the triggers (during initial copy), this can be prohibitive on large DBs 🤔

Query.run(client.connection, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE", true)

from pg-osc.

dhrumil-toddleapp avatar dhrumil-toddleapp commented on August 21, 2024

@shayonj , I also think that it's the byproduct of TRANSACTION ISOLATION LEVEL SERIALIZABLE. Because the transaction carries from setup_shadow_table to copy_data functions. What if we don't do copy_data in the same transaction?

@jfrost Also here are the screenshots of queries section of performance insights. Largely the updates query on the original tables are stuck. Here INSERT INTO is happening to the shadow table, and update is happening on the original table.

Screenshot 2023-06-25 at 10 06 45 AM

from pg-osc.

yash-toddleapp avatar yash-toddleapp commented on August 21, 2024

@shayonj I am not finding locks per say with the query @jfrost shared but this is the behavior while trying to replicate the same on local. You can see how queries goes from taking < 10ms to like 400ms and more.

clip of what's happening while trying to replicate it on local: https://imgur.com/M5h5ZPm

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Yeah, thats interesting. If you have any reproducible script or similar, that could be super useful. I also see that in the video, the latency spikes for a handful of queries but most are <100ms (?).

I will take a deeper look and get back within the week. I have been meaning to refactor parts of this code and introduce concept of two connections to setup copy and trigger, like pg_repack, which I think would allow us to also drop the SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. In the meantime, any more info like @jfrost mentioned or reproducible script would be super useful. Thanks!!

from pg-osc.

yash-toddleapp avatar yash-toddleapp commented on August 21, 2024

@shayonj as of now we have only been able to replicate the locking behaviour on our Amazon's RDS instances (it doesn't happen on local environment). And also copying data in batches instead of a single query solves. So I don't think serialised transaction is the issue.

from pg-osc.

jfrost avatar jfrost commented on August 21, 2024

@yash-toddleapp Any chance you are running those UPDATE queries on your RDS instance with serializable isolation or something other than read committed?

from pg-osc.

yash-toddleapp avatar yash-toddleapp commented on August 21, 2024

@shayonj @jfrost Nah I don't think so. I found something more weird. The table only locks while INSERT INTO #{shadow_table}(#{insert_into_columns.join(", ")}) SELECT #{select_columns.join(", ")} FROM ONLY #{client.table_name}
when ran from the pg-osc.

What I did was make the list of all the queries that were being executed with the help of --verbose flag. And ran them using psql. And it does not lock the table. So there's something in the tool itself I don't know what that is causing the lock.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Interesting! I will look a bit more deeply in the next few days and get back

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

So, i am unable to replicate this in our smoke spec env and not seeing any consistent locking the on the parent table. I am curious

  • How did you pin it down to the issue being at INSERT INTO #{shadow_table}(#{insert_into_columns.join(", ")}) SELECT #{select_columns.join(", ")}
  • Also, what version of PG you are on?
  • Lastly, do you have any custom setting set on your RDS instances?

Thanks

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

*also how big the table is roughly how many writes/updates it gets. Thanks!

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

That’s a good find! And makes sense. Since serializable transaction shouldn’t be causing issues. I will get in my refactor after your patch. Let me know if I can help with the patch or anything. Thanks!

curious why it’s only happening on RDS. Will try to look into it later, but I can see how the alter can cause an access exclusive lock and a lock queue.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

I proposed a simplified change here: #97

Feel free to add to it or open new PRs with any other fixes you found, also if you are able to test/verify, then that'd be great too. Thanks!

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

v.0.9.2 is now out with the fix: https://github.com/shayonj/pg-osc/releases/tag/v0.9.2

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Closing this, thanks for the report and brainstorming! please feel free to open reports or suggest PRs too. Thanks again!

from pg-osc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.