Comments (16)
Our idea is to use this tool to automatically create the resources and tables/models needed with CI/CD, based on the contract.
Getting the dbt schema is very useful. To make it even more powerful:
- Generate a dbt source with the model name as the table, based on the Snowflake database and schema
e.g.:
models:
  - name: orders
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: text
        constraints:
          - type: not_null
          - type: unique
      [...]
servers:
  endpoint:
    type: snowflake
    account: xx0000
    database: MY_DB
    schema: MY_SCHEMA
would produce:
sources:
  - name: contract_name
    database: my_db
    schema: my_schema
    tables:
      - name: orders
- Generate a staging model
In our automation process we'd also like to push a PR to our dbt repo so that the data engineer is no longer a bottleneck.
Therefore, it would be useful to also generate a staging model that selects all columns from the source.
{{ config(
    materialized="table"
) }}
select
    order_id
    [...]
from {{ source('contract_name', 'orders') }}
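The source-generation step described above could be sketched roughly like this in Python. This is only an illustration, not how the tool does it: `build_dbt_source` and its `server_key` parameter are hypothetical names, the contract is a plain dict shaped like the example in this comment, and lowercasing the database and schema is an assumption taken from the `MY_DB` to `my_db` example.

```python
def build_dbt_source(contract: dict, source_name: str, server_key: str = "endpoint") -> dict:
    """Derive a dbt `sources` entry from a contract's server block and model names.

    Hypothetical helper: the input shape mirrors the example in this comment.
    """
    server = contract["servers"][server_key]
    # Each model in the example is a mapping with a "name" key.
    table_names = [model["name"] for model in contract["models"]]
    return {
        "sources": [
            {
                "name": source_name,
                # Assumption: dbt identifiers are lowercased, as in the example.
                "database": server["database"].lower(),
                "schema": server["schema"].lower(),
                "tables": [{"name": name} for name in table_names],
            }
        ]
    }


contract = {
    "models": [{"name": "orders"}],
    "servers": {
        "endpoint": {
            "type": "snowflake",
            "account": "xx0000",
            "database": "MY_DB",
            "schema": "MY_SCHEMA",
        }
    },
}
source = build_dbt_source(contract, "contract_name")
```

The resulting dict can be dumped to YAML with whatever serializer the pipeline already uses.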
from datacontract-cli.
Thanks for sharing your ideas. I think your feature ideas should be included in the tool. Your examples help to drive this! Thank you.
from datacontract-cli.
Hello everyone, stumbled upon this issue as I have the same need as @adaminsta as far as I understand.
Would it make sense for the generated source to include more information from the contract, such as the column names and metadata? That would greatly reduce the documentation work and propagate the information to the dbt-generated docs as well.
I'd rather have it generate too much (source yaml, staging model yaml and sql selecting all the fields as @adaminsta suggested) and remove what I do not actually need than having to add what I'm missing.
from datacontract-cli.
@adaminsta and @pluttgens would love your feedback on the two features that we just implemented.
from datacontract-cli.
What is the status of the dbt export, @simonharrer? My team uses dbt + Snowflake and is looking into a few alternatives for data contract implementations, like this CLI, PayPal's, etc.
from datacontract-cli.
@adaminsta I've started with #38 and want to release a first version of the dbt export on Friday afternoon.
Would love your thoughts and ideas on the dbt export. Any expectations already in your mind?
from datacontract-cli.
We want to build a dbt model (in Snowflake) on top of the contract. From the datacontract we are creating a dbt source, dbt schema (with an owner, contract, etc), and the staging model.
Can elaborate more later.
from datacontract-cli.
We currently have the following export behaviour implemented in the draft PR:
datacontract.yaml
dataContractSpecification: 0.9.2
id: orders-unit-test
info:
  title: Orders Unit Test
  version: 1.0.0
models:
  orders:
    fields:
      order_id:
        type: varchar
        unique: true
        required: true
      order_total:
        type: bigint
        required: true
datacontract export --format dbt datacontract.yaml
version: 2
models:
  - name: orders
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: text
        constraints:
          - type: not_null
          - type: unique
      - name: order_total
        data_type: integer
        constraints:
          - type: not_null
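For readers following along, the field-to-column mapping visible in this example can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code from the draft PR: `TYPE_MAP` and `to_dbt_columns` are hypothetical names, the type map contains only the two pairs shown above, and unknown types are passed through unchanged.

```python
# Only the two type pairs visible in the example; everything else is an assumption.
TYPE_MAP = {"varchar": "text", "bigint": "integer"}


def to_dbt_columns(fields: dict) -> list:
    """Turn contract fields into dbt column entries with constraints."""
    columns = []
    for name, spec in fields.items():
        constraints = []
        # `required: true` becomes a not_null constraint, listed first as in the output above.
        if spec.get("required"):
            constraints.append({"type": "not_null"})
        if spec.get("unique"):
            constraints.append({"type": "unique"})
        column = {
            "name": name,
            # Unknown types fall through unchanged (assumption).
            "data_type": TYPE_MAP.get(spec["type"], spec["type"]),
        }
        if constraints:
            column["constraints"] = constraints
        columns.append(column)
    return columns


fields = {
    "order_id": {"type": "varchar", "unique": True, "required": True},
    "order_total": {"type": "bigint", "required": True},
}
columns = to_dbt_columns(fields)
```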
from datacontract-cli.
Just merged a first implementation of the dbt export. It currently only has a type mapping for Snowflake, but it already supports all fields in the model. Does this already help you? What is missing for you, @adaminsta?
from datacontract-cli.
Also, this is not related to the dbt format: we consume this data from a Pub/Sub topic and load it into Snowflake, but there is no pubsub option for servers. We need something like this:
source:
  type: pubsub
  project: gcp_project_name
  name: my_topic
from datacontract-cli.
@simonharrer I've run into some more issues when trying to test the export to dbt. Maybe we could have a quick call? I've also sent you an email from my Instabox address.
from datacontract-cli.
I did not receive an email. Did you send one to [email protected] ?
from datacontract-cli.
Easiest would be if you would join the data contract slack: https://datacontract.com/slack
There, we could easily set up a call.
from datacontract-cli.
I've just added the dbt-sources export in a very early implementation: ac19d27
How to use:
datacontract export --format dbt-sources
and this would result in:
version: 2
sources:
  - name: orders-unit-test
    description: The orders data contract
    database: my-database
    schema: my-schema
    tables:
      - name: orders
        description: The orders model
        columns:
          - name: order_id
            tests:
              - dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: VARCHAR
              - not_null
              - unique
              - dbt_expectations.expect_column_value_lengths_to_be_between:
                  min_value: 8
                  max_value: 10
              - dbt_expectations.expect_column_values_to_match_regex:
                  regex: ^B[0-9]+$
            meta:
              classification: sensitive
              pii: true
            tags:
              - order_id
          - name: order_total
            description: The order_total field
            tests:
              - dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: NUMBER
              - not_null
              - dbt_expectations.expect_column_values_to_be_between:
                  min_value: 0
                  max_value: 1000000
          - name: order_status
            tests:
              - dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: TEXT
              - not_null
              - accepted_values:
                  values:
                    - 'pending'
                    - 'shipped'
                    - 'delivered'
Please have a look. Feedback is appreciated! :-)
from datacontract-cli.
And we've added support for dbt-staging-sql export: 7cea40e
How to use:
datacontract export --format dbt-staging-sql
And this would result in
select
    order_id,
    order_total,
    order_status
from {{ source('orders-unit-test', 'orders') }}
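For anyone who wants to template something similar in their own CI/CD, here is a minimal Python sketch that renders a staging query of the same shape. It is not the implementation from the commit above; `render_staging_sql` is a hypothetical helper that only assumes a source name, a table name, and a list of column names.

```python
def render_staging_sql(source_name: str, table: str, columns: list) -> str:
    """Render a select-all-columns staging query that reads from a dbt source."""
    column_lines = ",\n".join(f"    {column}" for column in columns)
    # Doubled braces in the f-string produce the literal Jinja delimiters {{ and }}.
    return (
        "select\n"
        f"{column_lines}\n"
        f"from {{{{ source('{source_name}', '{table}') }}}}"
    )


sql = render_staging_sql(
    "orders-unit-test",
    "orders",
    ["order_id", "order_total", "order_status"],
)
```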
from datacontract-cli.
We now have three dbt export features: dbt models, dbt sources, and dbt staging SQL. Please open a new issue if you want to improve one of these exports or add a new dbt export feature.
from datacontract-cli.