Comments (6)

joshuataylor commented on August 16, 2024

Sounds great, will have a look at delete+insert.

Regarding the last point: yes, the when matched part of the merge is very slow with Snowflake, and insert-only is MUCH faster, at least in our testing on 50M-row and 800M-row tables.

jtcohen6 commented on August 16, 2024

Hey @joshuataylor, thanks for opening! Have you tried defining an incremental model without a unique_key? That has the effect of running insert into, i.e. "append-only":

{% macro snowflake__get_merge_sql(target, source_sql, unique_key, dest_columns, predicates) -%}
    {#
        Workaround for Snowflake not being happy with a merge on a constant-false predicate.
        When no unique_key is provided, this macro will do a regular insert. If a unique_key
        is provided, then this macro will do a proper merge instead.
    #}
    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute='name')) -%}
    {%- set sql_header = config.get('sql_header', none) -%}

    {%- set dml -%}
    {%- if unique_key is none -%}

        {{ sql_header if sql_header is not none }}

        insert into {{ target }} ({{ dest_cols_csv }})
        (
            select {{ dest_cols_csv }}
            from {{ source_sql }}
        )

    {%- else -%}

        {# a unique_key was provided: fall back to the standard merge #}
        {{ default__get_merge_sql(target, source_sql, unique_key, dest_columns, predicates) }}

    {%- endif -%}
    {%- endset -%}

    {% do return(dml) %}
{% endmacro %}
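
For reference, a model that relies on this path might look like the following sketch (model and column names here are just illustrative):

{{ config(materialized='incremental') }}  {# no unique_key: dbt runs a plain insert #}

select id, event_time, payload
from {{ ref('stg_events') }}

{% if is_incremental() %}
where event_time > (select max(event_time) from {{ this }})
{% endif %}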

On dbt-spark, we have an explicit incremental strategy named append for this behavior. On most other adapters, this behavior is implicitly supported via one of the existing strategies when unique_key is not defined. Might make a lot of sense to rationalize this behavior across adapters! (cc @VersusFacit)
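
For comparison, a minimal sketch of that append configuration on dbt-spark:

{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}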

joshuataylor commented on August 16, 2024

Removing the unique_key sounds like a good quick fix, but then there is no WHEN NOT MATCHED clause to skip records that are accidentally already in the result set, so those would be inserted as duplicates.

jtcohen6 commented on August 16, 2024

Ah! Sorry, I misunderstood the original issue — you're totally right.

> Just using the merge, but this can take a while.

It sounds like performance is your main motivation here. A couple of options to try:

  • You can try using the delete+insert strategy instead; anecdotally, some users have found it to be faster (see the config sketches after this list).
  • There's an open PR that proposes to speed up the merge strategy by using a view instead of a temp table for the intermediate results: #93
  • Your proposal in this issue: we could add logic here saying that, if the user has configured merge_update_columns = [] (docs), dbt should skip templating the when matched then update set statement entirely. In the meantime, you could try merge_update_columns = ['id'] (or whatever your unique_key column is named) to see if that speeds things up. My hunch is that the row-by-row unique_key match is the more expensive part of the operation, relative to the column-by-column update, but I could be wrong, and even so, every little bit may help.
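
For illustration, the first and third options might be configured like this; a sketch that assumes a model whose unique_key column is named id:

-- Option: delete+insert instead of merge
{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='delete+insert'
) }}

-- Option: keep merge, but restrict the update to the key column only
{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='merge',
    merge_update_columns=['id']
) }}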

stijn-meersman commented on August 16, 2024

I have a similar requirement: I want a merge that inserts records if they do not exist yet, without updating the ones that do.
The problem with adding the unique key to merge_update_columns is that you are then needlessly updating records, which is bad for performance.

I think the best solution is to make sure that when merge_update_columns = [], no when matched then update set statement is added at all (see the sketch below).
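
Rendered against a hypothetical target table dim_customer with unique_key id, such an insert-only merge would look roughly like this:

merge into analytics.dim_customer as DBT_INTERNAL_DEST
using analytics.dim_customer__dbt_tmp as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
-- no "when matched then update set ..." clause is emitted at all
when not matched then insert (id, name, created_at)
    values (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.name, DBT_INTERNAL_SOURCE.created_at)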

github-actions commented on August 16, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
