Comments (16)

adaminsta commented on July 29, 2024

Our idea is to use this tool to automatically create the resources and tables/models needed with CI/CD, based on the contract.

Getting the dbt schema is very useful. To make it even more powerful:

  1. Generate a dbt source that uses the model name as the table name and takes its database and schema from the Snowflake server definition.
    For example:
models:
  - name: orders    
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: text
        constraints:
          - type: not_null
          - type: unique
[...]
servers:
  endpoint:
    type: snowflake
    account: xx0000
    database: MY_DB
    schema: MY_SCHEMA

would produce:

sources:
  - name: contract_name
    database: my_db  
    schema: my_schema  
    tables:
      - name: orders
  2. Generate a staging model
    In our automation process we'd also like to push a PR to our dbt repo so that the data engineer is no longer a bottleneck.
    It would therefore be useful to also generate a staging model that selects all columns from the source.
{{ config(
    materialized="table"
) }}

select 
    order_id
    [...]
from {{ source('contract_name', 'orders')}}
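The first request above (a dbt source derived from the contract's `servers` block) could be sketched roughly as follows. This is only an illustration, not how datacontract-cli implements it: the function name `build_dbt_source`, its parameters, and the lowercasing of the database and schema names (inferred from `MY_DB` becoming `my_db` in the example) are assumptions.

```python
from typing import Any


def build_dbt_source(contract: dict[str, Any], source_name: str, server: str) -> dict[str, Any]:
    """Build a dbt `sources` mapping from one server entry of a contract.

    `contract` is the parsed YAML from the example above: a `models` list
    with one `name` per model and a `servers` mapping keyed by server name.
    """
    srv = contract["servers"][server]
    # One dbt source table per model in the contract.
    tables = [{"name": model["name"]} for model in contract["models"]]
    return {
        "sources": [
            {
                "name": source_name,
                # Assumption: names are lowercased, as in the example output.
                "database": srv["database"].lower(),
                "schema": srv["schema"].lower(),
                "tables": tables,
            }
        ]
    }


contract = {
    "models": [{"name": "orders"}],
    "servers": {
        "endpoint": {
            "type": "snowflake",
            "account": "xx0000",
            "database": "MY_DB",
            "schema": "MY_SCHEMA",
        }
    },
}

source = build_dbt_source(contract, source_name="contract_name", server="endpoint")
print(source)
```

Dumping the returned dictionary with a YAML library would give a file shaped like the `sources:` example in the request.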

from datacontract-cli.

simonharrer commented on July 29, 2024

Thanks for sharing your ideas. I think your feature ideas should be included in the tool. Your examples help to drive this! Thank you.

pluttgens commented on July 29, 2024

Hello everyone, I stumbled upon this issue because, as far as I understand, I have the same need as @adaminsta.

Would it make sense for the generated source to include more information from the contract, such as the column names and metadata? That would greatly reduce the documentation work and propagate the information to the dbt-generated docs as well.

I'd rather have it generate too much (source YAML, staging model YAML, and SQL selecting all the fields, as @adaminsta suggested) and remove what I don't actually need than have to add what is missing.

simonharrer commented on July 29, 2024

@adaminsta and @pluttgens, we would love your feedback on the two features that we just implemented.

adaminsta commented on July 29, 2024

@simonharrer What is the status of the dbt export? My team uses dbt + Snowflake, and we are looking into a few alternatives for implementing data contracts, such as this CLI, PayPal's, etc.

simonharrer commented on July 29, 2024

@adaminsta I've started with #38 and want to release a first version of the dbt export on Friday afternoon.

I would love your thoughts and ideas on the dbt export. Do you already have any expectations in mind?

adaminsta commented on July 29, 2024

We want to build a dbt model (in Snowflake) on top of the contract. From the data contract we create a dbt source, a dbt schema (with an owner, contract, etc.), and the staging model.

I can elaborate more later.

simonharrer commented on July 29, 2024

We currently have the following export behaviour implemented in the draft PR:

datacontract.yaml

dataContractSpecification: 0.9.2
id: orders-unit-test
info:
  title: Orders Unit Test
  version: 1.0.0
models:
  orders:
    fields:
      order_id:
        type: varchar
        unique: true
        required: true
      order_total:
        type: bigint
        required: true

datacontract export --format dbt datacontract.yaml

version: 2
models:
  - name: orders    
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: text
        constraints:
          - type: not_null
          - type: unique
      - name: order_total
        data_type: integer
        constraints:
          - type: not_null    
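The mapping shown above (contract fields to dbt columns with constraints) can be sketched as below. This is a hedged illustration rather than the code in the draft PR: `to_dbt_model` and `SNOWFLAKE_TYPES` are hypothetical names, and the type table contains only the two pairs visible in the example.

```python
from typing import Any

# Only the two type pairs visible in the example; a real mapping needs more.
SNOWFLAKE_TYPES = {"varchar": "text", "bigint": "integer"}


def to_dbt_model(name: str, fields: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Translate one contract model into a dbt model entry with an enforced contract."""
    columns = []
    for field_name, field in fields.items():
        constraints = []
        if field.get("required"):
            constraints.append({"type": "not_null"})
        if field.get("unique"):
            constraints.append({"type": "unique"})
        column: dict[str, Any] = {
            "name": field_name,
            # Assumption: unknown types are passed through unchanged.
            "data_type": SNOWFLAKE_TYPES.get(field["type"], field["type"]),
        }
        if constraints:
            column["constraints"] = constraints
        columns.append(column)
    return {
        "name": name,
        "config": {"materialized": "table", "contract": {"enforced": True}},
        "columns": columns,
    }


orders_fields = {
    "order_id": {"type": "varchar", "unique": True, "required": True},
    "order_total": {"type": "bigint", "required": True},
}
print(to_dbt_model("orders", orders_fields))
```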

simonharrer commented on July 29, 2024

I just merged a first implementation of the dbt export. It currently has a type mapping only for Snowflake, but it already supports all fields in the model. Does this already help you? What is still missing for you, @adaminsta?

adaminsta commented on July 29, 2024

(Issue created)

Also, this is not related to the dbt format: we consume this data from a Pub/Sub topic and load it into Snowflake, but there is no Pub/Sub option for servers. We need something like this:
source:
  type: pubsub
  project: gcp_project_name
  name: my_topic

adaminsta commented on July 29, 2024

@simonharrer I've run into some more issues while testing the dbt export. Maybe we could have a quick call? I also sent you an email from my Instabox address.

simonharrer commented on July 29, 2024

I did not receive an email. Did you send it to [email protected]?

simonharrer commented on July 29, 2024

The easiest way would be to join the Data Contract Slack: https://datacontract.com/slack

There, we could easily set up a call.

simonharrer commented on July 29, 2024

I've just added the dbt-sources export in a very early implementation: ac19d27

How to use:

datacontract export --format dbt-sources

and this would result in:

version: 2
sources:
  - name: orders-unit-test
    description: The orders data contract  
    database: my-database
    schema: my-schema  
    tables:
      - name: orders 
        description: The orders model
        columns:
          - name: order_id
            tests:
              - dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: VARCHAR
              - not_null
              - unique
              - dbt_expectations.expect_column_value_lengths_to_be_between:
                  min_value: 8
                  max_value: 10
              - dbt_expectations.expect_column_values_to_match_regex:
                  regex: ^B[0-9]+$      
            meta:
              classification: sensitive
              pii: true
            tags:
              - order_id
          - name: order_total
            description: The order_total field
            tests:
              - dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: NUMBER
              - not_null
              - dbt_expectations.expect_column_values_to_be_between:
                  min_value: 0
                  max_value: 1000000
          - name: order_status
            tests:
              - dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: TEXT
              - not_null
              - accepted_values:
                  values:
                    - 'pending'
                    - 'shipped'
                    - 'delivered'
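To make the output above easier to review, here is a rough sketch of how field attributes might translate into the listed dbt tests. The thread does not show the contract that produced this output, so the attribute names used here (`required`, `unique`, `minLength`, `maxLength`, `pattern`, `minimum`, `maximum`, `enum`) are assumptions, as is the function name `field_tests`. The column type check is left out on purpose.

```python
from typing import Any


def _bounds(field: dict[str, Any], low_key: str, high_key: str) -> dict[str, Any]:
    """Collect whichever of the two bounds is present as min_value/max_value."""
    bounds = {}
    if low_key in field:
        bounds["min_value"] = field[low_key]
    if high_key in field:
        bounds["max_value"] = field[high_key]
    return bounds


def field_tests(field: dict[str, Any]) -> list[Any]:
    """Derive a list of dbt tests from one (assumed) contract field definition."""
    tests: list[Any] = []
    if field.get("required"):
        tests.append("not_null")
    if field.get("unique"):
        tests.append("unique")
    lengths = _bounds(field, "minLength", "maxLength")
    if lengths:
        tests.append({"dbt_expectations.expect_column_value_lengths_to_be_between": lengths})
    if "pattern" in field:
        tests.append(
            {"dbt_expectations.expect_column_values_to_match_regex": {"regex": field["pattern"]}}
        )
    value_range = _bounds(field, "minimum", "maximum")
    if value_range:
        tests.append({"dbt_expectations.expect_column_values_to_be_between": value_range})
    if "enum" in field:
        tests.append({"accepted_values": {"values": list(field["enum"])}})
    return tests


order_status = {"required": True, "enum": ["pending", "shipped", "delivered"]}
print(field_tests(order_status))
```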

Please have a look. Feedback is appreciated! :-)

simonharrer commented on July 29, 2024

And we've added support for dbt-staging-sql export: 7cea40e

How to use:

datacontract export --format dbt-staging-sql

And this would result in:

select 
    order_id,
    order_total,
    order_status
from {{ source('orders-unit-test', 'orders') }}
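For readers who want to reproduce this shape of output in their own tooling, a staging query like the one above can be rendered with a few lines of string formatting. This is a sketch under assumptions, not the code from the referenced commit; `render_staging_sql` is a hypothetical name.

```python
def render_staging_sql(source_name: str, table: str, columns: list[str]) -> str:
    """Render a dbt staging query that selects the given columns from a source."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ",\n".join(f"    {column}" for column in columns)
    # Doubled braces in the f-string produce the literal {{ ... }} of dbt's Jinja.
    return (
        f"select\n{column_list}\n"
        f"from {{{{ source('{source_name}', '{table}') }}}}"
    )


sql = render_staging_sql(
    "orders-unit-test", "orders", ["order_id", "order_total", "order_status"]
)
print(sql)
```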

simonharrer commented on July 29, 2024

We now have three dbt export features: dbt models, dbt sources, and dbt staging SQL. Please open a new issue if you want to improve these exports or add a new dbt export feature.
