Giter Site home page Giter Site logo

Comments (10)

jfrost avatar jfrost commented on August 21, 2024 1

Yep, so the current query looks like:

INSERT INTO #{shadow_table}
SELECT #{columns}
FROM ONLY #{client.table}

A custom SQL would just replace the 2nd and 3rd lines above with something like:

SELECT a.foo,a.bar,a.baz,a.rental_id,r.tenant_id AS tenant_id
FROM ONLY examples a
LEFT OUTER JOIN rentals r
ON a.rental_id = r.id

from pg-osc.

jfrost avatar jfrost commented on August 21, 2024 1

One other reason came to me this morning. Suppose you are splitting up your database geographically or just sharding by some hash of tenant ID. So, you want to replicate to the various new shards, then during a maintenance, you promote all the shards to primaries and then delete the unwanted data. You could write a custom SQL that looks like:

SELECT a.id,a.foo,a.bar,a.baz
FROM ONLY examples a
WHERE a.id IN (1,5,9,13....)

And that would be the equivalent of a DELETE plus a pg_repack but should be much faster.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Thats a great idea! I was thinking something like providing a custom sql file like pgbench and that will get replayed.

Do you have an example (even high level is fine). I am curious to understand the before/after state of the table structurally and what might a potential custom backfil query (or set of queries) may look like.

There is also some other interesting stuff we can do by plugging into the triggers and re-using the replay_data. So, trying to understand where does it makes the most sense.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

I have a few use cases where this would be super nice, so I am also considering support this before launch (as part of v1.0.0) too.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

ooh! thats very interesting. Nice, I am planning on prioritizing next week (hopefully), stay tuned.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Given that copy_data happens before alter, this would only be possible for existing columns, which i think is a fair start (?). We can change the order too, but then we would need to ensure that we automatically skip the data being copied for any dropped rows or copy accordingly for renamed rows

https://github.com/shayonj/pg-online-schema-change/blob/ebbb3168f15fea33258b05c9a44bea4e6bd1c0e8/lib/pg_online_schema_change/orchestrate.rb#L31-L32

Lmk if you have any thoughts.

from pg-osc.

jfrost avatar jfrost commented on August 21, 2024

Hrmm, I hadn't realized this. Wouldn't you want the run_alter_statement to happen before copy_data? Otherwise, you have to rewrite the entire table if you change a column type with the ALTER. If you do it beorehand, then you can just copy the data in.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

yeah thats a good point, plus wasteful copy for when column is removed. I am whipping up a PR to alter first, allows for other custom stuff too.

from pg-osc.

jfrost avatar jfrost commented on August 21, 2024

BTW, back to the original point: if you are using the custom SQL option, you are already in the danger zone and doing something that will probably destroy some of your data (hopefully on purpose!), so it's ok to have limited functionality.

That said, I think flipping the order for the run_alter_statement and copy_data operations is a good idea.

from pg-osc.

shayonj avatar shayonj commented on August 21, 2024

Yep, I agree all around. I refactored and updated the logic in so that alter happens first, then copy. Will whip up another PR to support a custom option after this (#32)

from pg-osc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.