Giter Site home page Giter Site logo

dbt-risingwave's Introduction

dbt-risingwave

A RisingWave adapter plugin for dbt.

RisingWave is a cloud-native streaming database that uses SQL as the interface language. It is designed to reduce the complexity and cost of building real-time applications. https://www.risingwave.com

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. Use dbt for data transformations in RisingWave

Getting started

The package has not been published to PyPI, please install it via git.

  1. Install dbt-risingwave
python3 -m pip install dbt-risingwave
  1. Get RisingWave running

Please follow this guide to setup a functional RisingWave instance.

  1. Configure dbt profile file

The profile file is located in ~/.dbt/profiles.yml. Here's an example of how to use it with RisingWave.

default:
  outputs:
    dev:
      type: risingwave
      host: 127.0.0.1
      user: root
      pass: ""
      dbname: dev
      port: 4566
      schema: public
  target: dev
  1. Run dbt debug to check whether configuration is correct.

Models

The dbt models for managing data transformations in RisingWave is similar to typical dbt sql models. The main differences are the materializations. We customized the materializations to fit the data processing model of RisingWave.

Materializations INFO
materialized_view Create a materialized view. This materialization is corresponding to the incremental one in dbt. To use this materialization, add {{ config(materialized='materialized_view') }} to your model SQL files.
materializedview (Deprecated) only for backward compatibility, use materialized_view instead
ephemeral This materialization uses common table expressions in RisingWave under the hood. To use this materialization, add {{ config(materialized='ephemeral') }} to your model SQL files.
table Create a table. To use this materialization, add {{ config(materialized='table') }} to your model SQL files.
view Create a view. To use this materialization, add {{ config(materialized='view') }} to your model SQL files.
incremental Use materialized_view instead if possible, since RisingWave is designed to use materialized view to manage data transformation in an incremental way. From v1.7.3, dbt-risingwave support incremental model to give users better control of when to update their model. This model will update table in a batch way incrementally.
source Define a source {{ config(materialized='source') }}. You need to provide your create source statement as a whole in this model.
table_with_connector Define a table with a connector {{ config(materialized='table_with_connector') }}. You need to provide your create table with connector statement as a whole in this model. Because dbt table has its own semantics, RisingWave use table_with_connector to distinguish itself from it.
sink Define a sink {{ config(materialized='sink') }}. You need to provide your create sink statement as a whole in this model.

To learn how to use, you can check RisingWave offical example dbt_rw_nexmark.

DBT RUN behavior

  • dbt run: only create new models (if not exists) without dropping any models.
  • dbt run --full-refresh: drop models and create the new ones. This command can make sure your streaming pipelines are consistent with what you define in dbt models.

Graph operators

Graph operators is useful when you want to only recreate a subset of your models.

dbt run --select "my_model+"         # select my_model and all children
dbt run --select "+my_model"         # select my_model and all parents
dbt run --select "+my_model+"         # select my_model, and all of its parents and children

Tests

All items below have been tested against the the latest RisingWave daily build verison.

dbt-risingwave's People

Contributors

algosday avatar chenzl25 avatar mattiasmts avatar neverchanje avatar stdrc avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

siddyp northvolt

dbt-risingwave's Issues

risingwave's driver always maps varchar to varchar(0)


{% macro risingwave__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          column_name,
          data_type,
          0 as character_maximum_length, -- todo
          0 as numeric_precision,
          0 as numeric_scale

      from {{ relation.information_schema('columns') }}
      where table_name = '{{ relation.identifier }}'
        {% if relation.schema %}
        and table_schema = '{{ relation.schema }}'
        {% endif %}
      order by ordinal_position

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

It's because that in the postgres driver where risingwave's driver derives, character_maximum_length is a required param.

Support dbt Core 1.7

I would like to use a new feature in dbt Core 1.7. Unfortunately the connector is only compatible with dbt 1.6:

install_requires=["dbt-postgres~=1.6.0"],

Would be nice if the support for dbt 1.7 can be added.

failed to run the scaffold project generated by `dbt init`

Reproduction step:

  1. dbt init
  2. dbt run
dev=> CREATE VIEW "dev"."public"."my_first_dbt_model" AS (
dev(>     WITH source_data AS (
dev(>         SELECT
dev(>             1 AS id
dev(>         UNION ALL
dev(>         SELECT
dev(>             NULL AS id
dev(>     )
dev(>     SELECT
dev(>         *
dev(>     FROM
dev(>         source_data
dev(> );

ERROR:  QueryError: Invalid input syntax: UNION types INT32 of column id is different from types VARCHAR of column id

Essentially, it's because RW is unable to run the following query

select 1 as id union all select null as id;

The query compiler incorrectly recognizes null as varchar.

unable to run `dbt_utils.get_relations_by_pattern`

dbt_utils.get_relations_by_pattern is a commonly used function that can list the relations that match the pattern.

https://github.com/dbt-labs/dbt-utils/blob/6ba7b660b1d1b4e3b41cb0cf6c3c0e4a70ae54e4/macros/sql/get_tables_by_pattern_sql.sql#L6C10-L6C44

SELECT
    DISTINCT table_schema AS "table_schema",
    table_name AS "table_name",
    CASE
        table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'FOREIGN' THEN 'external'
        WHEN 'MATERIALIZED VIEW' THEN 'materializedview'
        ELSE LOWER(table_type)
    END AS "table_type"
FROM
    dev.information_schema.tables
WHERE
    table_schema ILIKE 'public'
    AND table_name ILIKE 'test_table%'
    AND table_name NOT ILIKE ''

Now the issue is that ILIKE is unsupported by RisingWave.

Discussion: Support incremental materialization

Currently, dbt-risingwave doesn't support incremental materialization and recommends users use materialized_view instead, because RisingWave could maintain materialized views incrementally in real-time. However, dbt itself is well known for the incremental materialization, many users know how incremental works and how to use it. When they fully understand the trade-off between real-time stream processing and incremental batch processing, you want to use incremental materialization to optimize their workload. E.g. they can run dbt for incremental models every midnight.

feat: Rerun only files which changed

From user:

About the dbt plugin, looks like it’s working pretty well but there is a major problem with it.
It always recreate all the models. So even if I changed just one file the run itself is very slow.
Do you think it’s possible to change this? So files which hasn’t changed since the last run will be skipped

Support persisting documentation

RisingWave supports COMMENT ON syntax, as documented here: https://docs.risingwave.com/docs/current/sql-comment-on/

However, if I set the persist_docs config like this, as documented at https://docs.getdbt.com/reference/resource-configs/persist_docs:

    +persist_docs:
      relation: true
      columns: true

Then dbt will fail with an error like this:

10:11:19  SQL status: CREATE_MATERIALIZED_VIEW in 8.0 seconds
10:11:19  Using risingwave connection "model.tt_rw_staging.stg_my_view.v1"
10:11:19  On model.tt_rw_staging.stg_my_view.v1: /* {"app": "dbt", "dbt_version": "1.6.9", "profile_name": "thumbtack_rw", "target_name": "local", "node_id": "model.tt_rw_staging.stg_my_view.v1"} */



  comment on materialized_view "thumbtack"."dbt_jamesjohnston_stg_business"."stg_my_view_v1" is $dbt_comment_literal_block$This is the description of my table
$dbt_comment_literal_block$;


10:11:19  Postgres adapter: Postgres error: Failed to run the query

Caused by:
  sql parser error: Expected comment object_type, found: materialized_view at line:5, column:31
Near "
  comment on materialized_view"

Obviously this is invalid SQL.... I'm guessing the feature is not supported yet.

Definitely a low-priority nice-to-have for me, since I'll also be looking at using the dbt documentation generator. But in the future it would be nice if anyone directly browsing the database can have these descriptions close at hand.

Materialized views are fully refreshed

From the dbt docs a materialized view and a incremental should only fully refresh if it does not already exist or if you supply the ´--full-refresh´ parameter.

The risingwave docs says that it implements dbt's incremental mode i.e. https://docs.getdbt.com/docs/build/incremental-models but I'm not certain that it is implemented correctly.

In dbt-risingwave every time I run dbt run, all materialized views {{ config(materialized='materializedview') }} are dropped and re-created which is not expected when reading the official dbt documentation.

disallow creating table model

We currently don't support CTAS (CREATE TABLE AS). We should provide an error to the user and ensure that it's not compilable.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.