Comments (6)

joshuataylor commented on August 16, 2024

Sounds great, will have a look at delete+insert.

Regarding the last point: yes, the when matched part of the merge is very slow with Snowflake, and insert-only is MUCH faster, at least in our testing on 50M-row and 800M-row tables.

jtcohen6 commented on August 16, 2024

Hey @joshuataylor, thanks for opening! Have you tried defining an incremental model without a unique_key? That has the effect of running insert into, i.e. "append-only":

{% macro snowflake__get_merge_sql(target, source_sql, unique_key, dest_columns, predicates) -%}
    {#
        Workaround for Snowflake not being happy with a merge on a constant-false predicate.
        When no unique_key is provided, this macro will do a regular insert. If a unique_key
        is provided, then this macro will do a proper merge instead.
    #}
    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute='name')) -%}
    {%- set sql_header = config.get('sql_header', none) -%}

    {%- set dml -%}
    {%- if unique_key is none -%}

        {{ sql_header if sql_header is not none }}

        insert into {{ target }} ({{ dest_cols_csv }})
        (
            select {{ dest_cols_csv }}
            from {{ source_sql }}
        )

    {%- else -%}

        {# a unique_key was provided: fall back to the standard merge #}
        {{ default__get_merge_sql(target, source_sql, unique_key, dest_columns, predicates) }}

    {%- endif -%}
    {%- endset -%}

    {% do return(dml) %}
{% endmacro %}
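
For reference, a model that relies on this path might look like the following sketch (model and column names here are just illustrative):

{{ config(materialized='incremental') }}  {# no unique_key: dbt runs a plain insert #}

select id, event_time, payload
from {{ ref('stg_events') }}

{% if is_incremental() %}
where event_time > (select max(event_time) from {{ this }})
{% endif %}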

On dbt-spark, we have an explicit incremental strategy named append for this behavior. On most other adapters, this behavior is implicitly supported via one of the existing strategies when unique_key is not defined. Might make a lot of sense to rationalize this behavior across adapters! (cc @VersusFacit)
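
For comparison, a minimal sketch of that append configuration on dbt-spark:

{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}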

joshuataylor commented on August 16, 2024

Removing the unique_key sounds like a good quick fix, but then there is no WHEN NOT MATCHED clause to skip records that are accidentally already in the result set, so those would be inserted as duplicates.

jtcohen6 commented on August 16, 2024

Ah! Sorry, I misunderstood the original issue — you're totally right.

> Just using the merge, but this can take a while.

It sounds like performance is your main motivation here. A couple of options to try:

  • You can try using the delete+insert strategy instead; anecdotally, some users have found it to be faster (see the config sketches after this list).
  • There's an open PR that proposes to speed up the merge strategy by using a view instead of a temp table for the intermediate results: #93
  • Your proposal in this issue: we could add logic here saying that, if the user has configured merge_update_columns = [] (docs), dbt should skip templating the when matched then update set statement entirely. In the meantime, you could try merge_update_columns = ['id'] (or whatever your unique_key column is named) to see if that speeds things up. My hunch is that the row-by-row unique_key match is the more expensive part of the operation, relative to the column-by-column update, but I could be wrong, and even so, every little bit may help.
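
For illustration, the first and third options might be configured like this; a sketch that assumes a model whose unique_key column is named id:

-- Option: delete+insert instead of merge
{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='delete+insert'
) }}

-- Option: keep merge, but restrict the update to the key column only
{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='merge',
    merge_update_columns=['id']
) }}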

stijn-meersman commented on August 16, 2024

I have a similar requirement: I want a merge that inserts records if they do not exist yet, without updating the ones that do.
The problem with adding the unique key to merge_update_columns is that you are then needlessly updating records, which is bad for performance.

I think the best solution is to make sure that when merge_update_columns = [], no when matched then update set statement is added at all (see the sketch below).
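
Rendered against a hypothetical target table dim_customer with unique_key id, such an insert-only merge would look roughly like this:

merge into analytics.dim_customer as DBT_INTERNAL_DEST
using analytics.dim_customer__dbt_tmp as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
-- no "when matched then update set ..." clause is emitted at all
when not matched then insert (id, name, created_at)
    values (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.name, DBT_INTERNAL_SOURCE.created_at)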

github-actions commented on August 16, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
