
dbt-vertica


dbt adapter for Vertica. The adapter uses vertica-python to connect to your Vertica database.

For more information on using dbt with Vertica, consult the Vertica-Setup and Configuration pages.

dbt-vertica Versions Tested

dbt-vertica has been developed using the following software and versions:

  • Vertica Server 23.4.0-0
  • Python 3.11
  • vertica-python client 1.3.1
  • dbt-core 1.6.0
  • dbt-tests-adapter 1.6.0

Supported Features

dbt Core Features

Below is a table of the dbt features that the current Vertica adapter supports. This is constantly improving and changing, both as dbt adds new functionality and as the dbt-vertica driver improves. This list is based on dbt 1.3.0.

| dbt Core Features | Supported |
| --- | --- |
| Table Materializations | Yes |
| Ephemeral Materializations | Yes |
| View Materializations | Yes |
| Incremental Materializations - Append | Yes |
| Incremental Materializations - Merge | Yes |
| Incremental Materializations - Delete+Insert | Yes |
| Incremental Materializations - Insert_Overwrite | Yes |
| Snapshots - Timestamp | Yes |
| Snapshots - Check Cols | No |
| Seeds | Yes |
| Tests | Yes |
| Documentation | Yes |
| External Tables | Untested |
  • Yes - Supported, and tests pass.
  • No - Not supported or implemented.
  • Untested - May support out of the box, though hasn't been tested.
  • Passes Test - The tests have passed, though haven't tested in a production like environment.

Installation

$ pip install dbt-vertica

You don't need to install dbt separately. Installing dbt-vertica will also install dbt-core and vertica-python.

Sample Profile Configuration

your-profile:
  outputs:
    dev:
      type: vertica # Don't change this!
      host: [hostname]
      port: [port] # or your custom port (optional)
      username: [your username] 
      password: [your password] 
      database: [database name] 
      oauth_access_token: [access token]
      schema: [dbt schema] 
      connection_load_balance: True
      backup_server_node: [list of backup hostnames or IPs]
      retries: [1 or more]
      threads: [1 or more] 
  target: dev

Description of Profile Fields:

| Property | Description | Required? | Default Value | Example |
| --- | --- | --- | --- | --- |
| type | The specific adapter to use. | Yes | None | vertica |
| host | The host name or IP address of any active node in the Vertica Server. | Yes | None | 127.0.0.1 |
| port | The port to use, default or custom. | Yes | 5433 | 5433 |
| username | The username to use to connect to the server. | Yes | None | dbadmin |
| password | The password to use for authenticating to the server. | Yes | None | my_password |
| database | The name of the database running on the server. | Yes | None | my_db |
| oauth_access_token | To authenticate via OAuth, provide an OAuth Access Token that authorizes a user to the database. | No | "" | "" |
| schema | The schema to build models into. | No | None | VMart |
| connection_load_balance | A Boolean value that indicates whether the connection can be redirected to a host in the database other than host. | No | true | true |
| backup_server_node | List of hosts to connect to if the primary host specified in the connection (host, port) is unreachable. Each item in the list should be either a host string (using default port 5433) or a (host, port) tuple. A host can be a host name or an IP address. | No | none | ['123.123.123.123','www.abc.com',('123.123.123.124',5433)] |
| retries | The number of times to retry after an unsuccessful connection. | No | 2 | 3 |
| threads | The number of threads the dbt project will run on. | No | 1 | 3 |
| label | A session label to identify the connection. | No | An auto-generated label with the format dbt_username | dbt_dbadmin |

For more information on Vertica’s connection properties please refer to Vertica-Python Connection Properties.

Changelog

See the changelog

Contributing guidelines

Have a bug or an idea? Please see CONTRIBUTING.md for details

Develop

Run a local Vertica instance like:

docker run -p 5433:5433 \
           -p 5444:5444 \
           -e VERTICA_DB_NAME=docker \
           -e VMART_ETL_SCRIPT="" \
           -e VMART_ETL_SQL="" \
           vertica/vertica-ce

Access the local Vertica instance like:

docker exec -it <docker_image_name> /opt/vertica/bin/vsql

You need the pytest dbt adapter:

pip3 install dbt-tests-adapter==1.5.0

Run tests via:

pytest tests/functional/adapter/
# run an individual test 
pytest tests/functional/adapter/test_basic.py

dbt-vertica's People

Contributors

ajay-abrol2, andyreagan, arosychuk, barloc, mpcarter, nrodriguezmicrofocus, perttus, sachin-thakur, sitingren


dbt-vertica's Issues

Upcoming "dbt clone" implementation should use Vertica's copy_table function instead of CTAS

As Vertica natively supports zero-copy cloning (via COPY_TABLE function), it should be used for implementing "dbt clone". This is also stated in dbt documentation: "If your data platform supports zero-copy cloning of tables, and this model exists as a table in the source environment, dbt will create it in your target environment as a clone".

Current CTAS: https://github.com/vertica/dbt-vertica/blob/master/dbt/include/vertica/macros/materializations/clone.sql#L11-L13

Reference:
Dbt docs: https://docs.getdbt.com/reference/commands/clone
Vertica docs: https://docs.vertica.com/12.0.x/en/sql-reference/functions/management-functions/table-functions/copy-table/
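As a rough illustration, a zero-copy clone boils down to a single call to Vertica's COPY_TABLE management function. This is a hedged sketch with placeholder schema and table names, not the adapter's actual macro:

```sql
-- Placeholder names; see the Vertica COPY_TABLE docs linked above.
SELECT COPY_TABLE('source_schema.source_table', 'target_schema.target_table');
```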

upgrade to support dbt-core v1.3.0

Background

The latest release cut for 1.3.0, dbt-core==1.3.0rc2 was published on October 3, 2022 (PyPI | Github). We are targeting releasing the official cut of 1.3.0 in time for the week of October 16 (in time for Coalesce conference).

We're trying to establish the following precedent w.r.t. minor versions:
Partner adapter maintainers release their adapter's minor version within four weeks of the initial RC being released. Given the delay on our side in notifying you, we'd like to set a target date of November 7 (four weeks from today) for maintainers to release their minor version.

| Timeframe | Date (intended) | Date (actual) | Event |
| --- | --- | --- | --- |
| D - 3 weeks | Sep 21 | Oct 10 | dbt Labs informs maintainers of upcoming minor release |
| D - 2 weeks | Sep 28 | Sep 28 | core 1.3 RC is released |
| Day D | October 12 | Oct 12 | core 1.3 official is published |
| D + 2 weeks | October 26 | Nov 7 | dbt-adapter 1.3 is published |

How to upgrade

dbt-labs/dbt-core#6011 is an open discussion with more detailed information, and dbt-labs/dbt-core#6040 is for keeping track of the community's progress on releasing 1.3.0.

Below is a checklist of work that would enable a successful 1.3.0 release of your adapter.

  • Python Models (if applicable)
  • Incremental Materialization: cleanup and standardization
  • More functional adapter tests to inherit

Update pypi 1.4.4 package

For some reason the PyPI package (1.3.0) has not been updated to match the latest 1.4.4 release. Could it be triggered manually?

upgrade to support dbt-core v1.2.0

We've just published the release cut of dbt-core 1.2.0, dbt-core 1.2.0rc1 (PyPI | GitHub release notes).

dbt-labs/dbt-core#5468 is an open discussion with more detailed information, and dbt-labs/dbt-core#5474 is for keeping track of the community's progress on releasing 1.2.0.

Below is a checklist of work that would enable a successful 1.2.0 release of your adapter.

  • migrate necessary cross-db macros into adapter and ensure they're tested accordingly
  • remove any copy-and-pasted materialization (if your adapter inherits from another adapter)
  • add new basic tests BaseDocsGenerate and BaseDocsGenReferences
  • consider checking and testing support for Python 3.10

dbt-labs/dbt-core#5432 might make it into the second release cut in the next week, in which case you may also want to:

  • implement method and tests for connection retry logic

upgrade to support dbt-core v1.4.0

Background

The latest version of dbt Core, dbt-core==1.4.0, was published on January 25, 2023 (PyPI | Github). In fact, a patch, dbt-core==1.4.1 (PyPI | Github), was also released on the same day.

How to upgrade

dbt-labs/dbt-core#6624 is an open discussion with more detailed information. If you have questions, please put them there! dbt-labs/dbt-core#6849 is for keeping track of the community's progress on releasing 1.4.0

The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.4.0 release of your adapter.

  • support Python 3.11 (only if your adapter's dependencies allow)
  • Consolidate timestamp functions & macros
  • Replace deprecated exception functions
  • Add support for more tests

the next minor release: 1.5.0

FYI, dbt-core==1.5.0 is expected to be released at the end of April. Please plan on allocating more effort to upgrade support compared to previous minor versions. Expect to hear more in the middle of April.

At a high level, expect much greater adapter test coverage (a very good thing!), and some likely heavy renaming and restructuring as the API-ification of dbt-core is now well underway. See https://github.com/dbt-labs/dbt-core/milestone/82 for more information.

insert+delete for incremental

Hello,
In version 0.18 we used the insert+delete strategy for incremental models.
In the new releases (1.0+) there is only the merge strategy, which is very slow (40+ minutes for merge vs. 60 seconds for insert+delete).
Please return the old strategy :)
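For reference, a minimal sketch of a model opting into the delete+insert strategy (which the feature table above lists as supported in current releases); the model, source, and column names here are invented:

```sql
-- Hypothetical model; names are placeholders.
{{
  config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='event_id'
  )
}}
select event_id, event_time, payload
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```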

upgrade to support dbt-core v1.7.0

Background

Minor version v1.7 is targeted for final release on October 26, 2023. As a maintainer of a dbt adapter, we strongly encourage you to release a corresponding minor version increment to ensure users of your adapter can make use of this new minor version.

How to upgrade

dbt-labs/dbt-core#8307 is an open discussion with more detailed information. If you have questions, please put them there!


seed.sql

Hi Matthew,
I am running dbt-vertica 15.0 (I need dbt 0.17.2) but I got an error at the "dbt seed" step:

"dbt found two macros with the name "basic_load_csv_rows" in the namespace "dbt". Since these macros have the same name and exist in the same namespace, dbt will be unable to decide which to call. To fix this, change the name of one of these macros:

  • macro.dbt_vertica.basic_load_csv_rows (macros/materializations/seed.sql)
  • macro.dbt_vertica.basic_load_csv_rows (macros/materializations/seed.sql)"

Could you please check the reason for that?
Thanks a lot for your adapter for dbt!
Kind regards,
Tanya

upgrade to support dbt-core v1.8.0

Background

Minor version v1.8 is targeted for final release within dbt Core on May 9, 2024.

As a maintainer of a dbt adapter, we strongly encourage you to release a corresponding minor version increment to ensure users of your adapter can make use of this new minor version.

As of dbt-core v1.8.0, we no longer need to encourage you to release a new minor version anytime we do. After following the linked upgrade guide, we guarantee your adapter will be forward compatible with all future minor versions of dbt-core (at least until v2.0 which is not yet planned).

Another major win: you can now make your adapter truly SemVer compliant, as you can release new versions of your adapter without needing to wait for a new dbt-core release. You can actually follow

How to upgrade

dbt-labs/dbt-core#9798 is an open discussion with more detailed information. If you have questions, please put them there!

BUG: after creating a partitioned table there are partitions with null in the name


Broken pipe error

Hey Matthew! I am having an issue with the dbt-vertica adapter. If I have a dbt model that runs longer than ~5 minutes, it fails with a broken pipe error:
Unhandled error while executing model.embold_prototype.ssm_ccm_member_months
[Errno 32] Broken pipe

I have tried adding timeout values in the profiles.yml file but nothing works.
The errors that we are seeing on the Vertica server have to do with load balancing.

Provide Examples with VMart Data

With the provided VMart data from the Vertica CE image, provide a full, working example of how the dbt-vertica adapter works, creating a real world model, with the features that are used and provided.

Examples:

  • Table Materializations
  • Ephemeral Materializations
  • View Materializations
  • Incremental Materializations
  • Snapshots
  • Tests
  • Sources
  • Seeds
  • Documentation

add adapter to dbt docs's "Available Adapters" page

The Available Adapters page is one of the dbt community's most-visited docs pages. It would be of great benefit for first-time visitors to the dbt docs to see:

  1. that this adapter is a possible option for using dbt-core, and
  2. how large the dbt ecosystem of supported databases is.

dbt-labs/docs.getdbt.com#1489 exists to address this with all as-of-yet undocumented adapters.

We just released Documenting a new adapter, a new guide on how to add an adapter to the Available Adapters page. I'd love to see this adapter on that page, so feel free to reach out with any questions/blockers by either replying to this issue, or posting in the #adapter-ecosystem channel of the dbt Community Slack.

Looking forward to the contribution!

On Incremental model schema upgrade

Hi All,

We are using the incremental model in one of our applications. We are already in production; due to some business requirements, we initially created model member attributes with VARCHAR(19) and have now changed them to VARCHAR(256). This works fine in a fresh deployment but fails at upgrade time.

Due to customer data, we can't execute "dbt run --full-refresh" as the customer would lose data. Please suggest how we can handle this at upgrade time without the dbt --full-refresh flag.

dbt command:
We always run the command below for the scheduled job, once a day:
dbt run -m <folder_name> --vars "{'product_tag':'<tag_name>','product_name':'<product_name>','lite_mau_max_limit':20}"

Thanks & Best Regards
-Santosh
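One possible workaround for the situation described above, assuming Vertica's ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE supports widening a VARCHAR in place, is to widen the column directly in the database before the scheduled dbt run. This is a hedged sketch with placeholder names, not tested advice:

```sql
-- Placeholder schema/table/column names; validate on a non-production copy first.
ALTER TABLE my_schema.my_incremental_model
  ALTER COLUMN member_attribute SET DATA TYPE VARCHAR(256);
```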

MERGE Strategy - Unique Key vs Merge Columns

The current merge strategy has a problem.

It uses a variable called merge_columns to build the join in the merge. Upon later inspection of other adapters, this join is done with the unique_key, and merge_columns specifies the columns to actually be updated (while the rest are left unchanged).

My naïve approach when I added this functionality created this issue, and it should be refactored as a breaking change:

  1. Use the unique_key for the source and destination join
  2. Support the merge_columns to only merge the columns specified

In some cases, the unique_key could be a list instead of a string, so that would also need to be supported.
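A sketch of how the refactored configuration might look, assuming the adapter adopts the dbt-core convention where unique_key drives the join and merge_update_columns limits which columns are updated. The model and column names are invented, and the current adapter does not yet behave this way:

```sql
-- Hypothetical post-refactor config; not how the current adapter works.
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key=['customer_id', 'order_id'],
    merge_update_columns=['status', 'updated_at']
  )
}}
select * from {{ ref('stg_orders') }}
```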

Update collect_freshness

There's a deprecation warning when running dbt source freshness:

The 'collect_freshness' macro signature has changed to return the full query
result, rather than just a table of values. See the v1.5 migration guide for
details on how to update your custom macro:
https://docs.getdbt.com/guides/migration/versions/upgrading-to-v1.5

So we simply need to remove the .table from the line:

{{ return(load_result('collect_freshness').table) }}
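In other words, the one-line change described above is:

```sql
-- Before (deprecated): returns only the table of values
{{ return(load_result('collect_freshness').table) }}
-- After (per the v1.5 migration guide): return the full query result
{{ return(load_result('collect_freshness')) }}
```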

Running dbt snapshot succeeds the first time and fails in the subsequent run to the dbt snapshot

Hi Matthew,
I am using dbt and dbt snapshot with the timestamp strategy within one model and facing an issue: running dbt snapshot succeeds the first time and fails in the subsequent run.

This issue may be related to the dbt-vertica adapter. It reports the error below.

Severity: ERROR, Message: Syntax error at or near ")", Sqlstate: 42601, Position: 649, Routine: base_yyerror, File: /data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_2_1-x_grader/build/vertica/Parser/scan.l, Line: 1043, Error Code: 4856, SQL: '/* {"app": "dbt", "dbt_version": "0.18.1", "profile_name": "default", "target_name": "dev", "node_id": "snapshot.my_new_project.user_snap"} */ merge into "investigate"."snapshots_dev"."user_snap" as DBT_INTERNAL_DEST using "user_snap__dbt_tmp" as DBT_INTERNAL_SOURCE on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id when matched and DBT_INTERNAL_DEST.dbt_valid_to is null and DBT_INTERNAL_SOURCE.dbt_change_type = 'update' then update set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to when not matched and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert' then insert () values () ; '
compiled SQL at target\run\my_new_project\snapshots\catalog_information\user_snapshot.sql

Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

Further Analysis
After analyzing further, I noticed the following issues in the merge query:
• The insert and values clauses are empty
• The temp table is not created in the Vertica database, i.e. user_snap__dbt_tmp

Environment
Running with dbt=0.18.1
dbt version: 0.18.1
python version: 3.8.5

Generated DDL source code in Vertica
CREATE TABLE snapshots.user_snap
(
...
dbt_scd_id varchar(32),
dbt_updated_at timestamptz,
dbt_valid_from timestamptz,
dbt_valid_to timestamptz
);

Model Code Snippet:

{% snapshot user_snap %}

{{
config(
target_database='investigate',
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='change_date'
)
}}
SELECT
....
change_date::timestamptz,
changed_by::varchar(255),
hist::boolean,
original_create_date::timestamptz,
create_date::timestamptz,
delete_date::timestamptz,
ts::timestamptz AS entity_begin_effective_time,
NULL::timestamptz AS entity_end_effective_time,
'identity'::VARCHAR(255) as entity_class_type
FROM
{{ source('public', "user_metric_saas")}} su
{% if is_incremental() %}
WHERE ts > {{get_last_execution_time('suser')}}
{% endif %}
{% endsnapshot %}

Need your help in further analyzing the issue.

Tests for 1.1+ version

Hello
From version 1.1, dbt-core moved to other test methods for adapters (more info here).
So what do you think we should do about it?

Creating a schema includes the provided schema name as a prefix.

When using the config to set the schema, the database name gets set as a prefix.

Unsure if this is intended or not. From the below code, I'd expect a new schema named staging, not vmart_staging.

profiles.yml

vmart_examples:
  target: dev
  outputs:
    dev:
      type: vertica
      host: localhost
      port: 5433
      username: dbadmin
      password: ''
      database: vmart
      schema: vmart
      threads: 2

stg_online_page_dimension.sql

{{
  config(
    schema='staging',
    materialized='table',
    tags=['staging']
  )
}}

with source as
(
    select 
      online_page_key as online_page_id,
      start_date as start_at,
      end_date as end_date,
      page_number as page_number,
      page_description as page_description,
      page_type as page_type
    from {{ source('online_sales','online_page_dimension') }}
)
select * 
from source
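This prefixing is dbt's default generate_schema_name behavior (the custom schema is appended to the target schema as <target_schema>_<custom_schema>), not something specific to dbt-vertica. dbt's documented way to use the bare custom name is to override the macro in your project, for example:

```sql
-- Place in macros/generate_schema_name.sql to use custom schema names as-is.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```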

FEATURE: freshness

There is no freshness support in the adapter, but the default macros work fine.
Please enable it.

upgrade to support dbt-core v1.5.0

Background

The latest version of dbt Core, dbt-core==1.5.0rc1, was published on April 13, 2023 (PyPI | Github).

How to upgrade

dbt-labs/dbt-core#7213 is an open discussion with more detailed information. If you have questions, please put them there!

The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.5.0 release of your adapter.

  • Add support for Python 3.11 (if you haven't already)
  • Add support for relevant tests (there are a lot of new ones!)
  • Add support for model contracts
  • Add support for materialized views (this will likely be bumped to 1.6.0)

the next minor release: 1.6.0

FYI, dbt-core==1.6.0 is expected to be released at the end of July, with a release cut at least two weeks prior.

Bug with handling of the multistatement queries

Hello
We found a bug in an old version (0.19) and it is in the newest versions too.
The vertica-python library doesn't handle errors in the second (and later) queries of a multi-statement query.
So if we have an incremental model with the pipeline: create a temp result table, delete unique keys from the target table, insert from the temp table into the target. If there are errors on step 2 or 3, dbt decides that all is correct, but the data isn't inserted into the table.
More info at this link: https://github.com/vertica/vertica-python#nextset
In version 0.19 I fixed it by overwriting core methods: chapsuk#4 (ugly code, sorry :()
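The failure mode can be illustrated without a live database. With vertica-python, a multi-statement query only raises for the first statement unless the caller advances through every result set with cursor.nextset() (per the vertica-python README's "nextset" section), so an adapter fix must drain all result sets. FakeCursor below is a hypothetical stand-in for illustration only:

```python
# Sketch: drain every result set of a multi-statement query so that errors
# in later statements surface instead of being silently ignored.

def drain_results(cursor):
    """Advance through every result set; a failed later statement raises here."""
    statements = 1
    while cursor.nextset():  # True while more result sets remain
        statements += 1
    return statements

class FakeCursor:
    """Hypothetical cursor: a scripted sequence of nextset() outcomes."""
    def __init__(self, outcomes):
        # each item: True (another result set), False (done), or an Exception
        self._outcomes = list(outcomes)

    def nextset(self):
        item = self._outcomes.pop(0)
        if isinstance(item, Exception):
            raise item  # e.g. the delete or insert step failed
        return item

print(drain_results(FakeCursor([True, True, False])))  # prints 3
```

With this pattern, an error in the delete or insert step raises on nextset() instead of leaving dbt convinced the model succeeded.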

BUG: partitions for incremental tables

Without the fix from #74, the adapter skips that error.
But with the fix there is an error: when an incremental table with partitions is updated, the ALTER is also applied to the temp table, which is not correct.
Error: Partitioning not supported for temporary tables

See

{% do run_query(vertica__create_table_as(True, tmp_relation, sql)) %}

where the temp table is created and the ALTER is added.

project setup issue

Hey

Trying to set up a project on Ubuntu 20.04.
While initializing the project I get No adapters available

cd vertica
virtualenv venv
source ./venv/bin/activate
pip install dbt-vertica

dbt init research 
Running with dbt=1.3.1
No adapters available. Go to https://docs.getdbt.com/docs/available-adapters


pip --version
pip 22.3 from /home/.../venv/lib/python3.8/site-packages/pip (python 3.8)

Any suggestions?

Is the adapter dead?

Hello.
There are some critical bugs, and some pull requests with fixes, but nobody cares about them :(
Has Micro Focus killed the open-source adapter?
Maybe return it to open source?
What do you think, @andyreagan @mpcarter?

Unable to connect to Vertica with SSL. Is there a way to provide this as input in the profile?

I am connecting to a Vertica instance with SSL and it is failing with the below error:

Authentication failed for username "user1", Sqlstate: 28000, Routine: auth_failed, File: /data/qb_workspaces/jenkins2/ReleaseBuilds/Hammermill/REL-10_0_1-x_hammermill/build/vertica/Basics/ClientAuthentication.cpp, Line: 1012, Error Code: 2248

I tried adding ssl: true in the profile but it did not help.

The same works with vertica_python after setting ssl: True in the connection info.

upgrade to support dbt-core v1.6.0

Background

Minor version v1.6 is targeted for final release on July 27, 2023. As a maintainer of a dbt adapter, we strongly encourage you to release a corresponding minor version increment to ensure users of your adapter can make use of this new minor version.

How to upgrade

dbt-labs/dbt-core#7958 is an open discussion with more detailed information. If you have questions, please put them there!

The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.6.0 release of your adapter:

### Tasks
- [ ] SUPPORT: materialized views
- [ ] SUPPORT: new `clone` command
- [ ] BEHIND THE SCENES: Drop support for Python 3.7 (if you haven't already)
- [ ] BEHIND THE SCENES: new arg for `adapter.execute()`
- [ ] BEHIND THE SCENES: ensure support for revamped `dbt debug`
- [ ] BEHIND THE SCENES: Add support for new/modified relevant tests

the next minor release: 1.7.0

FYI, dbt-core==1.7.0 is expected to be released on October 12, 2023 in time for Coalesce, the annual analytics engineering conference!

BUG: status not handled in multi-query models

Hello.
I made a fix for an old dbt version: fix, and others in this file.
If dbt sends a multi-statement query to Vertica with a ';' separator, the adapter doesn't check the result of any query after the first one.
So dbt reports that it's OK, but the table is broken (not updated...).
For more, see the vertica_python doc: "Why does my query not throw an error?"

BUG: partition for incremental tables

Hello.
When creating an incremental table I get this error:
Severity: ROLLBACK, Message: Partitioning not supported for temporary tables, Sqlstate: 0A000, Routine: alterTablePartition, File: /data/qb_workspaces/jenkins2/ReleaseBuilds/Hammermill/REL-10_1_1-x_hammermill/build/vertica/Commands/DDL.cpp, Line: 23522, Error Code: 4336, SQL: '/* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "vertica-dbt", "target_name": "dev", "node_id": "model._._requests"} */ create local temporary table "_requests__dbt_tmp" on commit preserve rows INCLUDE SCHEMA PRIVILEGES as ( select date_trunc(\'HOUR\', br.time) as balance_date , br.user_id as user_id , br.account_id as account_id , count(br.id) as dep_try_n from "docker"."public"."_requests" br where (date(br.time) in (\'1984-03-03\'::date ) ) group by 1,2,3 ) order by balance_date, user_id, account_id segmented BY hash(user_id) ALL NODES ; alter table "product_dtm_hourly_billing_requests__dbt_tmp" partition BY EXTRACT(YEAR FROM balance_date)*10000 + EXTRACT(MONTH FROM balance_date)*100 ;

Dbt generated model tables's Data Archiving

Requirements
We are using dbt-vertica in one of our products. We require data archiving to S3 as Parquet files, and we need to retain archived data for 3 years before purging it permanently from the database.

We need to archive records older than one year and keep only the last year's data in the model tables for our reporting purposes.
Solution
In most cases we are using incremental materialization and the data is growing rapidly. We are investigating and looking for an elegant solution to archive the data instead of deleting and purging it.

Our Approach
We would like to archive model table data using the following Vertica features:

  1. MOVE_PARTITIONS_TO_TABLE
  2. EXPORT TO PARQUET
    • Is it a good idea to alter the dbt model table using another script to bring in the partition?
    • Is it a good idea to use another script to move data records from the model table to a temp table using the Vertica function MOVE_PARTITIONS_TO_TABLE, and then use "EXPORT TO PARQUET" to archive the records from the temp table to S3?

It would be a great help if you could suggest a better solution to us.
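The two-step approach described above could be sketched as follows. This is a hedged outline with placeholder table names, partition-key range, and S3 path, not validated advice:

```sql
-- 1. Move the old partition range out of the live model table into a staging table
--    (all names and the '2021'..'2021' key range are placeholders):
SELECT MOVE_PARTITIONS_TO_TABLE('analytics.member_facts', '2021', '2021',
                                'archive.member_facts_staging');

-- 2. Export the staged records to S3 as Parquet:
EXPORT TO PARQUET (directory = 's3://my-bucket/archive/member_facts/2021')
  AS SELECT * FROM archive.member_facts_staging;
```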
