dbt_project_evaluator#
This package highlights areas of a dbt project that are misaligned with dbt Labs' best practices. Specifically, this package tests for:
- Modeling - your dbt DAG for modeling best practices
- Testing - your models for testing best practices
- Documentation - your models for documentation best practices
- Structure - your dbt project for file structure and naming best practices
- Performance - your model materializations for performance best practices
- Governance - your model governance feature best practices
In addition to tests, this package creates the model int_all_dag_relationships, which holds information about your DAG in a tabular format and can be queried using SQL in your warehouse.
Currently, the following adapters are supported:
- BigQuery
- Databricks/Spark
- PostgreSQL
- Redshift
- Snowflake
- DuckDB
- Trino (tested with Iceberg connector)
- AWS Athena (tested manually)
- Greenplum (tested manually)
- ClickHouse (tested manually)
Using This Package#
Cloning via dbt Package Hub#
Check dbt Hub for the latest installation instructions, or read the docs for more information on installing packages.
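As a rough sketch, installing from dbt Hub usually means adding an entry like the one below to packages.yml and then running dbt deps. The version value is an illustrative placeholder, not a recommendation; use the version shown on dbt Hub.

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_project_evaluator
    # Placeholder version (assumption): replace with the latest release listed on dbt Hub
    version: 1.0.0
```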
Additional setup for Databricks/Spark/DuckDB/Redshift#
In your dbt_project.yml, add the following config:
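A minimal sketch of that config, assuming the package's macro namespace is dbt_project_evaluator (verify against the package's current documentation):

```yaml
# dbt_project.yml
dispatch:
  - macro_namespace: dbt
    # Search the package first so its shims take precedence over dbt's built-in macros
    search_order: ['dbt_project_evaluator', 'dbt']
```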
This is required because the project currently overrides a small number of dbt core macros in order to ensure the project can run across the listed adapters. The overridden macros are in the cross_db_shim directory.
How It Works#
This package will:
- Parse your graph object and write it into your warehouse as a series of models (see models/marts/core)
- Create another series of models that each represent one type of misalignment in your project (below you can find a full list of each misalignment and its accompanying model)
- Test those models to alert you to the presence of the misalignment
Once you've installed the package, all you have to do is run dbt build --select package:dbt_project_evaluator.
Each test warning indicates the presence of a type of misalignment. To troubleshoot a misalignment:
- Locate the related documentation
- Query the associated model to find the specific instances of the issue within your project, or set up an on-run-end hook to display the rule violations in the dbt logs (see displaying violations in the logs)
- Either fix the issue(s) or customize the package to exclude them
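The on-run-end option from the troubleshooting steps could look like the sketch below. The macro name print_dbt_project_evaluator_issues is an assumption here; confirm it in the package's section on displaying violations in the logs before relying on it.

```yaml
# dbt_project.yml
# Assumed macro name; check the package docs before relying on it
on-run-end: "{{ dbt_project_evaluator.print_dbt_project_evaluator_issues() }}"
```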
Limitations#
BigQuery and Databricks#
BigQuery's current support for recursive CTEs is limited, and Databricks SQL doesn't support recursive CTEs.
For those data warehouses, the model int_all_dag_relationships needs to be created by looping CTEs instead. The number of loops is configured with max_depth_dag and defaults to 9. This means that dependencies between models separated by more than 9 levels won't show in the model int_all_dag_relationships, but tests on the DAG will still be correct. With more than 9 loops, BigQuery sometimes raises an error saying the query is too complex.
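For example, lowering the depth to avoid the BigQuery complexity error might be configured as below. This is a sketch: it assumes max_depth_dag is set as a package-scoped variable in dbt_project.yml, and the value 5 is arbitrary.

```yaml
# dbt_project.yml
vars:
  dbt_project_evaluator:
    # Arbitrary illustrative value; the default is 9
    max_depth_dag: 5
```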