tnightengale / dbt-meta-testing

A dbt SQL package for ensuring documentation and test coverage, with granular control.

License: GNU General Public License v3.0

SQL 100.00%
dbt dbt-artifacts dbt-fundamentals dbt-macros dbt-packages dbt-tests testing

dbt-meta-testing's Introduction

dbt Meta Testing

This dbt package contains macros to assert test and documentation coverage from dbt_project.yml configuration settings.

Install

Include in packages.yml:

packages:
  - package: tnightengale/dbt_meta_testing
    version: 0.3.6

For latest release, see https://github.com/tnightengale/dbt-meta-testing/releases.

Configurations

This package features two meta configs that can be applied to a dbt project:

  1. +required_tests
  2. +required_docs

See the dbt documentation to learn more about model configurations in dbt.

Required Tests

To require test coverage, define the +required_tests configuration on a model path in dbt_project.yml:

# dbt_project.yml
...
models:
  project:
    +required_docs: true
    marts:
      +required_tests: {"unique.*|not_null": 1}
      model_2:
        +required_tests:
          "mocker.*|unique": 1
          "mock_schema_test": 1
          ".*data_test": 1

The +required_tests config must be None or a dict with str keys and int values. YAML dictionaries are accepted.
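
The constraint above can be checked before running the macro. The following is a minimal Python sketch of such a check, not the package's own validation logic. The function name is_valid_required_tests is hypothetical, and rejecting booleans as counts is an assumption (in Python, bool is a subclass of int).

```python
from typing import Any


def is_valid_required_tests(config: Any) -> bool:
    """Return True if config is None or a dict of str -> int (hypothetical helper)."""
    if config is None:
        return True
    if not isinstance(config, dict):
        return False
    return all(
        isinstance(key, str)
        # bool is a subclass of int in Python; excluding it here is an assumption.
        and isinstance(value, int)
        and not isinstance(value, bool)
        for key, value in config.items()
    )


print(is_valid_required_tests({"unique.*|not_null": 1}))  # True
print(is_valid_required_tests(None))                      # True
print(is_valid_required_tests({"unique": "1"}))           # False
```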

All the regular dbt configuration hierarchy rules apply. For example, individual model configs will override configs from the dbt_project.yml:

# /models/marts/core/your_model.sql

-- This overrides the config in dbt_project.yml, and this model will not require tests
{{ config(required_tests=None) }}

SELECT
...

New in Version 0.3.3

The keys of the config are evaluated against both data and schema tests (including any custom tests) using the re.fullmatch function.

Therefore, any test restriction which can be expressed in regex can be evaluated.

For example:

# dbt_project.yml
...
models:
  project:
    +required_docs: true
    # The following configuration on the `marts` model path requires
    # each model in that path to have at least one test that either:
    #
    #    1. starts with "unique" (note the ".*" regex suffix) OR (note the "|" regex)
    #    2. is an exact match for the "not_null" test.

    marts:
      +required_tests: {"unique.*|not_null": 1}
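
To make the matching rule concrete, here is a small Python sketch of how re.fullmatch treats the pattern from the configuration above. It only illustrates the regex semantics; the helper count_matching_tests and the list of test names are hypothetical and are not taken from the package source.

```python
import re


def count_matching_tests(pattern, test_names):
    """Count the test names that the pattern matches in full (hypothetical helper)."""
    return sum(1 for name in test_names if re.fullmatch(pattern, name))


# Hypothetical test names attached to one model.
tests_on_model = [
    "unique",
    "not_null",
    "unique_combination_of_columns",
    "accepted_values",
]

# "unique.*" matches anything starting with "unique"; "not_null" must match exactly.
print(count_matching_tests("unique.*|not_null", tests_on_model))  # 3

# Without the ".*" suffix, only the exact name "unique" matches.
print(count_matching_tests("unique", tests_on_model))  # 1
```

Because re.fullmatch requires the whole string to match, no explicit ^ and $ anchors are needed in the keys.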

Schema tests are matched against their common names (e.g. not_null, accepted_values).

Data tests are matched against their macro name.

Custom schema tests are matched against their name, e.g. mock_schema_test:

# models/schema.yml
...
    - name: model_2
      description: ""
      tests:
        - equal_rowcount:
            compare_model: ref('model_1')
      columns:
          - name: id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null
                - mock_schema_test

Models that do not meet their configured test minimums, because they either lack the tests or are not documented, will be listed in the error when validated via a run-operation:

usr@home dbt-meta-testing $ dbt run-operation required_tests
Running with dbt=0.20.0
Encountered an error while running operation: Compilation Error in macro required_tests (macros/required_tests.sql)
  Insufficient test coverage from the 'required_tests' config on the following models:
  Model: 'model_1' Test: 'not_null' Got: 1 Expected: 2
  Model: 'model_1' Test: 'mock_schema_test' Got: 0 Expected: 1

  > in macro _evaluate_required_tests (macros/utils/required_tests/evaluate_required_tests.sql)
  > called by macro required_tests (macros/required_tests.sql)
  > called by macro required_tests (macros/required_tests.sql)
usr@home dbt-meta-testing $
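
The comparison behind an error like the one above can be sketched in Python as follows. This is an illustration under stated assumptions, not the macro's implementation: find_shortfalls is a hypothetical name, the configuration and test names are invented to mirror the sample output, and each entry in the list stands for one test attached to the model.

```python
import re


def find_shortfalls(model, required, test_names):
    """Return one message per pattern whose match count is below its minimum."""
    messages = []
    for pattern, minimum in required.items():
        got = sum(1 for name in test_names if re.fullmatch(pattern, name))
        if got < minimum:
            messages.append(
                f"Model: '{model}' Test: '{pattern}' Got: {got} Expected: {minimum}"
            )
    return messages


# Hypothetical configuration and tests for model_1.
required = {"not_null": 2, "mock_schema_test": 1, "unique": 1}
tests_on_model_1 = ["unique", "not_null"]

shortfalls = find_shortfalls("model_1", required, tests_on_model_1)
for message in shortfalls:
    print(message)
# Model: 'model_1' Test: 'not_null' Got: 1 Expected: 2
# Model: 'model_1' Test: 'mock_schema_test' Got: 0 Expected: 1
```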

Required Docs

To require documentation coverage, define the +required_docs configuration on a model path in dbt_project.yml:

# dbt_project.yml
...
models:
    project:
        +required_docs: true

The +required_docs config must be a bool.

Ephemeral models are not checked. The macro uses adapter.get_columns_in_relation() to fetch columns from the data warehouse and detect columns without documentation, and that macro cannot be used on ephemeral models.

When applied to a non-ephemeral model, this config will ensure 3 things:

  1. The model has a non-empty description
  2. The columns in the model are specified in the model .yml
  3. The columns specified in the model .yml have non-empty descriptions

For example, the following configurations:

# models/schema.yml
version: 2

models:
    - name: model_1
      description: "A starter dbt model"
      columns:
          - name: id
            description: ""
            tests:
                - unique
                - not_null

    - name: model_2
      description: ""
      tests:
        - equal_rowcount:
            compare_model: ref('model_1')
      columns:
          - name: id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null

Where model_2 has a column new which is not defined in the .yml above:

-- models/example/model_2.sql
select
    *,
    'new' as new
from {{ ref('model_1') }}
where id = 1

And all models in the example path require docs:

# dbt_project.yml
...
models:
    project:
        example:
            +required_docs: true

Would result in the following error when validated via a run-operation:

usr@home dbt-meta-testing $ dbt run-operation required_docs
Running with dbt=0.20.0
Encountered an error while running operation: Compilation Error in macro required_docs (macros/required_docs.sql)
  The following models are missing descriptions:
   - model_2
  The following columns are missing from the model yml:
   - model_2.new
  The following columns are present in the model yml, but have blank descriptions:
   - model_1.id

  > in macro _evaluate_required_docs (macros/utils/required_docs/evaluate_required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)
usr@home dbt-meta-testing $
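
The three checks can be reproduced on the example above with a short Python sketch. It is a simplified model of the logic, not the macro itself: check_docs is a hypothetical name, the schema is written as plain dictionaries, and the warehouse columns are hard-coded here, whereas the package fetches them from the warehouse.

```python
def check_docs(schema_models, warehouse_columns):
    """Return (models without description, columns missing from yml, blank columns)."""
    missing_model_descriptions = []
    missing_columns = []
    blank_column_descriptions = []
    for model in schema_models:
        name = model["name"]
        # Check 1: the model has a non-empty description.
        if not model.get("description"):
            missing_model_descriptions.append(name)
        documented = {
            column["name"]: column.get("description", "")
            for column in model.get("columns", [])
        }
        # Check 2: every warehouse column is listed in the model yml.
        for column in warehouse_columns.get(name, []):
            if column not in documented:
                missing_columns.append(f"{name}.{column}")
        # Check 3: every listed column has a non-empty description.
        for column, description in documented.items():
            if not description:
                blank_column_descriptions.append(f"{name}.{column}")
    return missing_model_descriptions, missing_columns, blank_column_descriptions


schema_models = [
    {
        "name": "model_1",
        "description": "A starter dbt model",
        "columns": [{"name": "id", "description": ""}],
    },
    {
        "name": "model_2",
        "description": "",
        "columns": [{"name": "id", "description": "The primary key for this table"}],
    },
]
# Assumed warehouse state: model_2 selects an extra column named "new".
warehouse_columns = {"model_1": ["id"], "model_2": ["id", "new"]}

result = check_docs(schema_models, warehouse_columns)
print(result)  # (['model_2'], ['model_2.new'], ['model_1.id'])
```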

Usage

To assert either the +required_tests or +required_docs configuration, run the corresponding macro as a run-operation within the dbt CLI.

By default the macro will check all models with the corresponding configuration. If any model does not meet the configuration, the run-operation will fail with a non-zero exit code and display an appropriate error message.

To assert the configuration for only a subset of the configured models (e.g. only new models in CI), pass an argument, models, to the macro as a space-delimited string of model names.

It's also possible to pass in the result of a dbt ls -m <selection_syntax> command, in order to make use of dbt node selection syntax. Use shell substitution in a dictionary representation.

For example, to run only changed models using dbt's Slim CI feature:

dbt run-operation required_tests --args "{'models':'$(dbt list -m state:modified --state <filepath>)'}"

Alternatively, a space delimited string of model names will work as well:

dbt run-operation required_tests --args "{'models':'model1 model2 model3'}"
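
When the command is assembled from a script rather than typed in a shell, quoting the --args value by hand is error-prone. The sketch below builds the argument list in Python. The helper build_run_operation is hypothetical. It relies on --args accepting a YAML string and on the JSON produced here being valid YAML for this simple mapping.

```python
import json


def build_run_operation(macro, models):
    """Build the argv list for `dbt run-operation <macro>` limited to some models."""
    # The macro expects a single space-delimited string of model names.
    args = json.dumps({"models": " ".join(models)})
    return ["dbt", "run-operation", macro, "--args", args]


command = build_run_operation("required_tests", ["model1", "model2", "model3"])
print(command)
```

The resulting list can be passed to subprocess.run(command, check=True), which raises an error when the run-operation exits with a non-zero status.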

required_tests

Validates that models meet the +required_tests configurations applied in dbt_project.yml. Typically used only as a run-operation in a CI pipeline.

Usage:

dbt run-operation required_tests [--args "{'models': '<space_delimited_models>'}"]

required_docs

Validates that models meet the +required_docs configurations applied in dbt_project.yml. Typically used only as a run-operation in a CI pipeline.

Usage:

dbt run-operation required_docs [--args "{'models': '<space_delimited_models>'}"]

Note: Run this command after dbt run. Only models that already exist in the warehouse can be validated for columns that are missing from the model .yml. By default, column names are assumed to be lower case in the dbt documentation. If this is not the case in your project, set the variable convert_column_names_to_lower_case to false in dbt_project.yml to compare column names in the case in which they appear.
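
The effect of convert_column_names_to_lower_case can be illustrated with a small Python sketch. It assumes that only the column names fetched from the warehouse are lower-cased before they are compared with the names in the .yml, which is what the Pascal-case issue further down suggests. The function columns_missing_from_yml and its parameter convert_to_lower are hypothetical.

```python
def columns_missing_from_yml(warehouse_columns, yml_columns, convert_to_lower=True):
    """Return warehouse columns that do not appear in the model yml."""
    documented = set(yml_columns)
    missing = []
    for column in warehouse_columns:
        # Assumption: only the warehouse side is lower-cased.
        candidate = column.lower() if convert_to_lower else column
        if candidate not in documented:
            missing.append(candidate)
    return missing


# Upper-case warehouse names with lower-case docs: the default works.
print(columns_missing_from_yml(["ID"], ["id"]))  # []

# Pascal-case docs: the default reports a documented column as missing.
print(columns_missing_from_yml(["CustomerId"], ["CustomerId"]))  # ['customerid']

# Comparing names as they appear resolves it.
print(columns_missing_from_yml(["CustomerId"], ["CustomerId"], convert_to_lower=False))  # []
```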

Contributions

Feedback on this project is welcomed and encouraged. Please open an issue or start a discussion if you would like to request a feature change or contribute to this project.

Testing

The integration tests for this package are located at ./integration_tests/tests/.

To run the tests locally, ensure you have the correct environment variables set according to the targets in ./integration_tests/profiles.yml and use:

cd integration_tests
dbt test --data

Verified Data Warehouses

This package has been tested for the following data warehouses:

  • Snowflake

dbt-meta-testing's People

Contributors

gthesheep, tnightengale

dbt-meta-testing's Issues

Is the latest version compatible with dbt 0.20.0

Not sure if this is the place to ask this, so let me know if it isn't.

I wanted to verify if dbt-meta-testing version 0.3.2 is compatible with dbt 0.20.0. It seems like it isn't because it's dependent on dbt_utils [">=0.6.0", "<0.7.0"], but dbt 0.20.0 requires dbt_utils 0.7.0.

If so, when do you think dbt-meta-testing will be upgraded?

Required Docs and Pascal Case Columns

When running dbt run-operation required_docs against my models, every column was coming back saying it wasn't documented. I realised this was a case issue, so removed the 'column | lower' within evaluate_required_docs.sql. Now everything works perfectly.

Not sure what the solution is. I assume there's a reason the 'column | lower' was added? Can we have a flag indicating if a case sensitive comparison is required?

Great package by the way. Thanks for sharing.

Faulty behaviour when testing documentation for versioned models

Problem:

When a model has multiple versions, the code disregards this and verifies columns against the base schema rather than the version-specific one. This produces a false positive whenever, for example, a base model defines a set of columns and later versions are defined by excluding some of them.
Example:

 - name: production_time
    description: |
      A table containing stuff
    latest_version: 1
    config:
      contract: {enforced: true}
    meta:
      owner: data-enablement
    columns:
      - name: last_assignment_id_in_chain
        description: The last assignmentId of chain `chain_id` (currently this can change).
        data_type: integer
      - name: last_completed
        description: The date the last assignement in the assignment chain was completed
        data_type: datetime
      - name: number_of_points
        description: Number of points
        data_type: integer
      - name: number_of_3d_cube_properties
        description: Number of 3D Cube properties
        data_type: integer
      - name: number_of_polygon_properties
        description: Number of polygon properties
        data_type: integer
    versions:
      - v: 1
      - v: 2
        tests:
          - test_versions_equal:
              old_version: ref('production_time', v='1')
              temporal_dimension: last_completed
        columns:
          - include: all
            exclude: [
              number_of_3d_cube_properties,
              number_of_polygon_properties
            ]

results in:

13:58:26  Encountered an error while running operation: Compilation Error in macro required_docs (macros/required_docs.sql)
  The following columns are missing from the model yml:
   - production_time.number_of_3d_cube_properties
   - production_time.number_of_polygon_properties
  
  > in macro default__format_raise_error (macros/utils/formatters/format_raise_error.sql)
  > called by macro format_raise_error (macros/utils/formatters/format_raise_error.sql)
  > called by macro default__required_docs (macros/required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)

I suspect changing this line:

{% set model_columns = adapter.get_columns_in_relation(ref(model.package_name, model.name)) %}

with

{% set model_columns = adapter.get_columns_in_relation(ref(model.package_name, model.name, version=model.version)) %}

could solve the problem.
It shouldn't (to be verified) cause any backward-compatibility problems with dbt-core<1.5, as the extra kwarg should (to be verified) be ignored by ref().

Thanks a lot for the good work!

Regex Test Name Matching Not Working As Per Documentation

This is great, thank you! I am working on integrating this into one of my larger dbt projects

I don't think that test name matching is working exactly as per this, which is from the documentation:

For example, in the dbt_project.yml above, the path configuration on the marts model path requires each model in that path to have at least one test that either starts with unique or is an exact match for the not_null test.

I think it's not actually checking for exact matches, but rather just the presence of the string of the test name specified in the dbt_project.yml file in the possibly-larger string that is the actual test name in the model .yml files.

For example, I have one model that uses the dbt_utils.unique_combination_of_columns test. If I do this in my dbt_project.yml file, it seems to count the "unique_combination_of_columns" test as satisfying the requirement for a "unique" test:

  +required_tests:
    "unique": 1

Here is the .yml from the file in question:

tests:
  - dbt_utils.unique_combination_of_columns:
      combination_of_columns:
        - project
        - dataset
        - model_name

However, if I use regex anchors like this:

  +required_tests:
    "^unique$": 1

Then the "unique_combination_of_columns" test does not count to satisfy the testing requirement.

I can make it match both cases by doing this:

  +required_tests:
    "^unique$|^unique_combination_of_columns$": 1

I'm fine with having to be explicit with the test names, although it would be a bit nicer to not have to specify anchors. I just wanted to point out that the docs don't quite match with the actual experience, I am not 100% sure what the intended functionality was in this case?
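
The behaviour reported above is what substring-style matching would produce. The Python sketch below contrasts re.search with re.fullmatch on the test name from this report. Whether the package used re.search at the time is an assumption; the snippet only shows that the two functions differ in exactly the way described.

```python
import re

name = "unique_combination_of_columns"

# Substring-style matching: "unique" is found inside the longer name.
print(bool(re.search("unique", name)))  # True

# Anchors force the whole name to match, so the longer name is rejected.
print(bool(re.search("^unique$", name)))  # False

# re.fullmatch behaves like the anchored pattern without needing anchors.
print(bool(re.fullmatch("unique", name)))  # False

# Listing both names explicitly matches either one.
both = "^unique$|^unique_combination_of_columns$"
print(bool(re.search(both, name)))  # True
```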

Allow testing of test names, rather than just test type

This project is awesome!
I'm looking for something that can validate that specific fields have specific tests (every model in a folder must have a not_null test on their load_date_time column).

A proposed implementation might be a prefix:
+required_tests: {"testname!not_null.*load_date_time": 1}
This would require returning an object array rather than string array from tests_per_model ([{name: not_null_tablename_load_date_time, type: not_null}, ...]), and adding a case statement to get_regex_match_count to split off and truncate the testname! prefix, but would allow far more flexibility in future.

I'm happy to contribute with my mediocre python skills if you like the implementation.
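
A rough Python sketch of the proposal, for discussion only. The testname! prefix, the shape of the test objects and this version of get_regex_match_count are all hypothetical and follow the description in this issue rather than any existing code in the package.

```python
import re

PREFIX = "testname!"  # hypothetical marker proposed in this issue


def get_regex_match_count(tests, key):
    """Count tests matching key, by full test name if prefixed, else by type."""
    if key.startswith(PREFIX):
        pattern, field = key[len(PREFIX):], "name"
    else:
        pattern, field = key, "type"
    return sum(1 for test in tests if re.fullmatch(pattern, test[field]))


# Hypothetical objects returned by tests_per_model for one model.
tests = [
    {"name": "not_null_tablename_load_date_time", "type": "not_null"},
    {"name": "unique_tablename_id", "type": "unique"},
]

print(get_regex_match_count(tests, "testname!not_null.*load_date_time"))  # 1
print(get_regex_match_count(tests, "not_null"))  # 1
print(get_regex_match_count(tests, "testname!not_null"))  # 0
```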

Changes to dispatch in dbt v0.20

Hey Teghan! I wanted to give you a heads up about a change coming in the next version of dbt. Let me know if you have questions, I'm happy to help.

Required code change

We've made a change to adapter.dispatch: Instead of passing a _get_namespaces() macro to the packages arg, you should pass the name of this package (as a string) to the macro_namespace arg, which is now the second positional argument.

{% macro fetch_configured_models(meta_config, models=none, resource_type="model") %}
{{ return(adapter.dispatch("fetch_configured_models", packages=dbt_meta_testing._get_meta_test_namespaces())(meta_config, models, resource_type)) }}
{% endmacro %}

To:

{% macro fetch_configured_models(meta_config, models=none, resource_type="model") %}
	{{ return(adapter.dispatch("fetch_configured_models", "dbt_meta_testing")(meta_config, models, resource_type)) }}
{% endmacro %}

I hope this could be as simple as Find + Replace of packages=dbt_meta_testing._get_meta_test_namespaces() with "dbt_meta_testing".

If you prefer more explicit syntax, you could also make this:

{% macro fetch_configured_models(meta_config, models=none, resource_type="model") %}
	{{ return(adapter.dispatch(macro_name = "fetch_configured_models", macro_namespace = "dbt_meta_testing")(meta_config, models, resource_type)) }}
{% endmacro %}

For the README

If a user wishes to override/shim this package, instead of defining a var named dbt_meta_test_dispatch_list, they should now define a config in dbt_project.yml, for instance:

dispatch:
  - macro_namespace: dbt_meta_testing
    search_order: ['my_project', 'dbt_meta_testing']  # enable override

Notes

This change is in dbt v0.19.2 as well. Both v0.19.2 and v0.20.0 have backwards compatibility for the old syntax, so there's no huge rush to upgrade.

However:

  • The old syntax will start raising a deprecation warning in v0.20
  • The var name in this package, dbt_meta_test_dispatch_list, varies just slightly from the convention established by other packages. The backwards-compatibility logic we added to dbt would instead expect dbt_meta_testing_dispatch_list. If you know other folks are relying on this functionality to override/shim this package, we can hard code logic that looks for dbt_meta_test_dispatch_list IFF the dispatching package is dbt_meta_testing.

As soon as you do upgrade to the new syntax, you'll need to require dbt >= 0.19.2 (or just >=0.20.0, for simplicity, since you're already making compatibility changes in #69).

Customize required_docs

Hi there,

Thank you, this repo is great!

One quick suggestion, it would be great if we can customize the required docs settings. Meaning we can choose from the list below, instead of enforcing all three of them.

  • The model has a non-empty description
  • The columns in the model are specified in the model .yml
  • The columns specified in the model .yml have non-empty descriptions

+required_docs: true - Error

I was having a look at this. I have installed the dbt_meta_testing package into our dbt project and was trying to add the configuration +required_docs: true in dbt_project.yml, but for some reason it's not working and shows an error:

image

Any ideas why maybe?

Support Required Tests on Sources as Well as Models

I very much appreciate the help that this package provides. My question - is there a reason to not also support required tests on sources and seeds? I'd like to define required tests for those too. Particularly sources.

Like this:

  my_project:
    source_yml:
      fivetran_oba_public:
        +required_tests:
          "^unique$": 1
          "^not_null$": 1
          "at_least_one$": 1
      fivetran_oba_secure:
        +required_tests:
          "^unique$": 1
          "^not_null$": 1
          "at_least_one$": 1

Issue when using required_docs on a project with ephemeral models

I'm getting this error when running dbt run-operation required_docs on our dbt project in which we have some ephemeral models.

Running with dbt=0.19.1
Checking `required_tests` config...
Encountered an error while running operation: Compilation Error in macro required_docs (macros/required_docs.sql)
  Operations can not ref() ephemeral nodes, but company_billie_limits__update__seller_financing_limit is ephemeral
  
  > in macro default__evaluate_required_docs (macros/utils/required_docs/evaluate_required_docs.sql)
  > called by macro evaluate_required_docs (macros/utils/required_docs/evaluate_required_docs.sql)
  > called by macro default__required_docs (macros/required_docs.sql)
  > called by macro required_docs (macros/required_docs.sql)

A possible workaround is to add config(required_docs=False) to each ephemeral model, but it might be easier to just exclude them already at the macro level.

0.3.0 Release

Courtesy of @jtalmi:

i’d like to extend it for two use cases:
model must have one of X tests, where X can be [‘unique’, ‘unique_where’, ‘unique_combinations_of_columns’], etc.
model must have one of X tags, where X can be [‘hourly’, ‘nightly’, ‘weekly’], etc.
this involves a bit of a shift in the current way the package operates, and i’d be curious to know your thoughts on how to implement it. my initial thoughts are passing in some sort of operation like:

+required_tests: {"unique | unique_where | unique_threshold": 1, "not_null": 1}

and the model parses that as requiring one of unique/unique_where/unique_threshold. can also wrap this in a var like var(‘unique_tests’)
other options i can think of off the top of my head include making use of the regex functionality in dbt 0.19, e.g. {'.+unique.+': 1} , or adding something like:

+required_tests:
  - unique:
    - count: 1
      tests: 
       - unique
       - unique_where
  - not_null

Not compatible with dbt 0.21

This version of dbt is not supported with the 'dbt_meta_testing' package.
Installed version of dbt: =0.21.0
Required version of dbt for 'dbt_meta_testing': ['>=0.20.0', '<0.21.0']

Can we just relax the restrictions to allow running on dbt 0.21? Or are there any incompatibilities?

Faulty runs without descriptions.

Hi! I've been trying to run this with my local dbt installation, but I get inconsistent results. I get the following error:

# dbt run-operation required_docs
>  Encountered an error while running operation: Compilation Error in macro required_docs (macros/required_docs.sql)
> ...
>   > in macro default__format_raise_error (macros/utils/formatters/format_raise_error.sql)
>   > called by macro format_raise_error (macros/utils/formatters/format_raise_error.sql)
>   > called by macro default__required_docs (macros/required_docs.sql)
>   > called by macro required_docs (macros/required_docs.sql)
>   > called by macro required_docs (macros/required_docs.sql)

Anything I can be doing different to get this to function consistently/properly?

required_docs failed message not rendering on screen.

When I use dbt 0.19.0 with v0.2.0 of this package and the validation fails, the message is not displayed.
The problem did not occur when I was using dbt 0.18.x, or when I revert the package version to 0.1.2.
image
