From Ad-Hoc to Engineering-Grade: A Practical Guide to Integrating dbt into Your Data Team’s Workflow

Introduction

So, your team has adopted dbt. You’ve run through the tutorials, built a few models, and seen the magic of turning raw data into clean, analysis-ready tables with simple SQL SELECT statements. But what comes next? The path from your first dbt run to managing a complex, production-grade data pipeline used by the entire company is often unclear. Many guides teach you the what of dbt’s commands, but not the how of integrating it into your team’s day-to-day operations.

This gap is where data projects often falter. Without established standards, a dbt project can quickly devolve into a “wild west” of inconsistent naming, untested logic, and undocumented models. The result is a system that no one trusts and everyone is afraid to touch.

The true power of dbt lies not just in its ability to transform data, but in its capacity to bring software engineering best practices to analytics. It provides the framework for version control, automated testing, documentation, and collaboration. To unlock this potential, your team needs more than just a tool; it needs a workflow.

This article provides a practical, opinionated blueprint for integrating dbt into your data team’s processes. We will move beyond basic commands and provide a series of actionable checklists and real-world examples covering the entire lifecycle of a dbt project. From establishing clear naming conventions and a consistent SQL style to defining a robust pull request process, a safe model deprecation strategy, and a comprehensive testing framework, you will learn how to build a dbt workflow that is reliable, scalable, and collaborative.

Part 1: The Blueprint — Establishing a Foundation with Clear Naming Conventions

Before you write a single line of complex business logic, the most impactful decision you can make is establishing a consistent naming convention. A well-defined structure for your folders, models, and columns is the bedrock of a scalable and maintainable dbt project. Without it, your project becomes a tangled web where orders, order_details, and final_orders might mean the same thing, or entirely different things, depending on who wrote them.

This section provides a prescriptive guide to organizing your dbt project. By following these conventions, new team members can quickly understand the data flow, and veterans can navigate the project with ease.

1. Folder Organization: From Raw to Ready

The first step is to structure your models directory to reflect the flow of data. The most common and effective pattern uses three layers: staging, intermediate, and marts.

  • staging: This layer is for cleaning and standardizing raw data from your sources. Each model should correspond one-to-one with a source table. Transformations here are light: renaming columns, casting data types, and basic cleaning.
  • intermediate: This is the “workshop” layer where you build modular pieces of business logic. Models here often join different staging models or perform complex aggregations. They are not meant for end-users but serve as building blocks for your final mart models.
  • marts: This is the final, business-facing layer. These models are clean, well-documented, and ready for consumption by BI tools, data scientists, or other stakeholders. They represent distinct business entities or concepts, like customers or monthly_revenue.

A well-organized project directory should look like this:

models/
├── staging/
│   ├── salesforce/
│   │   ├── stg_salesforce__accounts.sql
│   │   └── stg_salesforce__opportunities.sql
│   └── stripe/
│       └── stg_stripe__payments.sql
├── intermediate/
│   ├── finance/
│   │   └── int_orders__with_taxes.sql
│   └── marketing/
│       └── int_campaigns__performance.sql
└── marts/
    ├── finance/
    │   └── mart_monthly_revenue.sql
    └── core/
        ├── mart_customers.sql
        └── mart_orders.sql

2. Model Naming: Prefixes for Predictability

Model file names should tell you exactly what layer they belong to and what they represent.

  • Staging Models (stg_): Prefix with stg_ followed by the source name and the entity, separated by double underscores. The __ is a dbt naming convention that separates the source system from the entity, making lineage easy to scan in the DAG.
    • Example: stg_stripe__payments.sql clearly indicates it’s the staging model for the payments table from stripe.
  • Intermediate Models (int_): Prefix with int_ and name them based on the logical transformation they perform.
    • Example: int_customers__first_and_last_orders.sql is an intermediate model that calculates the first and last order dates for each customer.
  • Mart Models (mart_): Prefix with mart_ and give them clear, business-friendly names.
    • Example: mart_monthly_revenue.sql is a final model ready for financial reporting.

3. Column Naming: The Language of Your Data

Consistent column naming is critical for making your final data models intuitive. When a data scientist sees order_amount_usd and created_at in every mart, they don’t have to guess the currency or format.

Key Rules:

  • IDs: Be specific. Use customer_id and order_id, not a generic id. This makes joins unambiguous.
  • Timestamps: Use the _at suffix (e.g., created_at, updated_at).
  • Booleans: Use prefixes like is_ or has_ (e.g., is_active, has_ordered).
  • Amounts: Include units or currency (e.g., order_amount_usd, shipping_weight_kg).

A key task in staging is to enforce these conventions. For example, when staging data from two different sources, you should standardize their column names immediately.

-- models/staging/salesforce/stg_salesforce__opportunities.sql
SELECT
    id AS opportunity_id,
    created_date AS created_at,
    amount AS opportunity_amount_usd,
    is_won AS is_won_deal
FROM {{ source('salesforce', 'opportunities') }}

-- models/staging/stripe/stg_stripe__payments.sql
SELECT
    id AS payment_id,
    created AS created_at,
    -- Convert from cents to dollars
    amount / 100 AS payment_amount_usd,
    status = 'succeeded' AS is_successful_payment
FROM {{ source('stripe', 'payments') }}
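These source() calls assume the raw tables are declared as dbt sources in a YAML file. A minimal sketch of that declaration (the file location and schema names are assumptions; adjust them to your warehouse):

# models/staging/sources.yml (illustrative; teams often split this per source system)
version: 2

sources:
  - name: salesforce
    schema: raw_salesforce
    tables:
      - name: accounts
      - name: opportunities
  - name: stripe
    schema: raw_stripe
    tables:
      - name: payments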

Your Quick Reference: Naming Convention Checklist

Use this checklist to ensure every contribution to your dbt project is consistent and clean.

📋 dbt Project Naming Convention Checklist


1. Folder Organization & Asset Structure

Directory Structure

  • Did I organize models by layer and business domain?

    models/
    ├── staging/
    │   ├── salesforce/
    │   │   ├── stg_salesforce__accounts.sql
    │   │   └── stg_salesforce__opportunities.sql
    │   └── stripe/
    │       └── stg_stripe__payments.sql
    ├── intermediate/
    │   ├── finance/
    │   │   └── int_orders__with_taxes.sql
    │   └── marketing/
    │       └── int_campaigns__performance.sql
    └── marts/
        ├── finance/
        │   └── mart_monthly_revenue.sql
        └── core/
            ├── mart_customers.sql
            └── mart_orders.sql
    

Asset Placement Rules

  • Did I place raw data transformations in staging/?
  • Did I place business logic and calculations in intermediate/?
  • Did I place final business-ready models in marts/?

2. Model Naming Standards

Staging Models

  • Did I prefix staging models with stg_ + source system?

    -- ✅ Good
    stg_salesforce__accounts.sql
    stg_stripe__payments.sql
      
    -- ❌ Bad
    salesforce_accounts.sql
    raw_payments.sql
    

Intermediate Models

  • Did I prefix intermediate models with int_ and use double underscores?

    -- ✅ Good
    int_orders__daily_summary.sql
    int_customers__lifetime_value.sql
      
    -- ❌ Bad
    daily_orders.sql
    intermediate_orders.sql
    

Mart Models

  • Did I prefix mart models with mart_ for consistency?

    -- ✅ Good (marts/)
    mart_customers.sql
    mart_monthly_revenue.sql
      
    -- ❌ Bad
    customers.sql              -- Missing prefix
    final_orders.sql           -- Wrong prefix
    

3. Column Naming Standards

Primary Keys & Foreign Keys

  • Did I use consistent ID naming patterns?

    -- ✅ Good
    customer_id          -- Primary key in customers table
    order_id            -- Primary key in orders table
      
    -- ❌ Bad
    id                  -- Too generic
    cust_id            -- Abbreviations
    

Timestamps & Booleans

  • Did I use _at suffix for timestamps and is_/has_/can_ prefix for booleans?

    -- ✅ Good
    created_at
    updated_at
    is_active
    has_orders
      
    -- ❌ Bad
    created_date
    active
    

Amounts & Measurements

  • Did I include units/currency in column names?

    -- ✅ Good
    order_amount_usd
    shipping_weight_kg
    duration_minutes
      
    -- ❌ Bad
    amount             -- What currency?
    weight             -- What unit?
    

Standardized Column Transformations

  • Did I standardize column names from different sources?

    -- From Salesforce staging
    select
      id as opportunity_id,
      name as opportunity_name,
      created_date as created_at,
      amount as opportunity_amount_usd
    from {{ source('salesforce', 'opportunities') }}
    

4. Naming Macros for Consistency (Advanced)

Column Standardization Macros

  • Did I create macros for common column transformations?

    -- macros/standardize_columns.sql
    {% macro standardize_id_column(source_column, entity_name) %}
      {{ source_column }} as {{ entity_name }}_id
    {% endmacro %}
    

Using Naming Macros with Examples

  • Did I apply naming macros consistently across staging models?

    Input Code:

    -- stg_salesforce__opportunities.sql
    select
      {{ standardize_id_column('id', 'opportunity') }},
      ...
    from {{ source('salesforce', 'opportunities') }}
    

    Compiled Output:

    -- After dbt compile
    select
      id as opportunity_id,
      ...
    from raw_salesforce.opportunities
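The same pattern can extend to other recurring transformations. For example, a hypothetical macro (not part of the original project) could standardize the cents-to-dollars conversion used in the Stripe staging model:

    -- macros/cents_to_usd.sql (illustrative sketch)
    {% macro cents_to_usd(column_name) %}
        ({{ column_name }} / 100.0)
    {% endmacro %}

    -- usage in stg_stripe__payments.sql
    {{ cents_to_usd('amount') }} as payment_amount_usd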
    

Part 2: Writing Readable Code — Your Team’s SQL Style Guide

Now that your project has a logical folder structure and consistent naming, the next step is to standardize the code inside each .sql file. If naming conventions provide the blueprint for your data warehouse, a style guide provides the clean, consistent handwriting that makes the blueprint legible for everyone.

An inconsistent style—where one developer uses trailing commas and lowercase keywords, while another prefers leading commas and uppercase—creates friction. The code becomes harder to read, review, and debug. By adopting a unified SQL style, you reduce the cognitive load on your team, allowing them to focus on the business logic rather than deciphering formatting quirks.

The cornerstone of modern, readable dbt SQL is the use of Common Table Expressions (CTEs). CTEs allow you to break down complex transformations into a series of logical, sequential steps. Each CTE acts as a temporary, named table, making the flow of data from source to final output easy to follow. This approach is vastly superior to nested subqueries or monolithic scripts, which quickly become unmanageable.

The Standard dbt Model Structure

A clean dbt model should follow a predictable pattern:

  1. Import CTEs: Start with CTEs that select from your source or ref models. These are your building blocks.
  2. Logical CTEs: Perform transformations in a series of intermediate CTEs. Each CTE should have a single, clear purpose, such as joining two tables, calculating an aggregation, or pivoting data.
  3. Final SELECT: Conclude with a final SELECT statement that pulls from your last logical CTE, renames columns for clarity, and presents the final output of the model.

Here’s a practical example of a well-structured model that calculates customer lifetime value:

/*
This model calculates the lifetime value for each customer.
It joins orders with payments to get the total amount per customer.
*/

-- 1. Import CTEs from staging models
WITH stg_orders AS (
    SELECT
        order_id,
        customer_id
    FROM {{ ref('stg_orders') }}
),

stg_payments AS (
    SELECT
        order_id,
        payment_amount_usd
    FROM {{ ref('stg_payments') }}
),

-- 2. Logical CTE to join orders and payments
order_payments AS (
    SELECT
        stg_orders.order_id,
        stg_orders.customer_id,
        stg_payments.payment_amount_usd
    FROM stg_orders
    LEFT JOIN stg_payments
        ON stg_orders.order_id = stg_payments.order_id
),

-- 3. Logical CTE to aggregate at the customer level
customer_lifetime_value AS (
    SELECT
        customer_id,
        SUM(payment_amount_usd) AS total_revenue_usd,
        COUNT(DISTINCT order_id) AS total_orders
    FROM order_payments
    GROUP BY 1 -- Use ordinal position for conciseness
)

-- 4. Final SELECT statement to clean up for the mart
SELECT
    customer_id,
    total_revenue_usd,
    total_orders
FROM customer_lifetime_value

Your Quick Reference: SQL Style Guide Checklist

To enforce these best practices, provide your team with a clear checklist. This guide ensures that every model, regardless of author, is formatted consistently.

📜 dbt SQL Style Guide & Best Practices


This guide outlines the standard format for writing SQL within our dbt models. Following these rules ensures that our code is readable, consistent, and easy to maintain for everyone on the team.

1. General Formatting Rules

Keywords and Identifiers

  • Did I use uppercase for all SQL keywords?

    • Why? It visually separates the SQL language’s structure from our specific table and column names.
    -- ✅ Good
    SELECT
        customer_id,
        order_date
    FROM stg_orders
    WHERE is_active = true;
      
    -- ❌ Bad
    select
        customer_id,
        order_date
    from stg_orders
    where is_active = true;
    

Indentation and Commas

  • Did I use leading commas and consistent indentation (2 or 4 spaces)?

    • Why? Leading commas make it easier to add, remove, or comment out columns without causing syntax errors. It also improves readability.
    -- ✅ Good: Easy to comment out a column
    SELECT
      order_id
      , customer_id
    --  , order_status
      , order_date
    FROM stg_orders
      
    -- ❌ Bad: Commenting out requires fixing the previous line
    SELECT
      order_id,
      customer_id, -- Now I have to remove the comma here if I comment out order_status
      order_status,
      order_date
    FROM stg_orders
    

2. Query Structure with CTEs

  • Did I structure my model using Common Table Expressions (CTEs)?
    • Why? CTEs break down complex logic into clean, sequential, and debuggable steps. Each CTE should perform a single, logical transformation.

The Standard CTE Flow

  1. Import CTEs: Start by selecting from source() or ref() models.
  2. Logical CTEs: Perform transformations (joins, aggregations, window functions) in intermediate steps.
  3. Final CTE and SELECT: End with a final CTE that selects and renames the output columns, followed by a simple SELECT * FROM final.

Complete Example Structure

-- ✅ Good: A full, readable model structure
WITH stg_orders AS (
    SELECT ... FROM {{ ref('stg_orders') }}
),

stg_payments AS (
    SELECT ... FROM {{ ref('stg_payments') }}
),

order_payments AS (
    SELECT ... FROM stg_orders JOIN stg_payments ...
),

final AS (
    SELECT ... FROM order_payments
)

SELECT * FROM final

3. Formatting Specific Clauses

JOINs

  • Did I format JOINs for clarity?

    • Why? Placing each clause on a new line with clear aliases makes complex joins much easier to read and understand.
    -- ✅ Good
    FROM stg_orders AS o
    LEFT JOIN stg_customers AS c
        ON o.customer_id = c.customer_id
      
    -- ❌ Bad
    FROM stg_orders o LEFT JOIN stg_customers c ON o.customer_id = c.customer_id
    

CASE Statements

  • Did I properly indent my CASE statements?

    • Why? Indentation reveals the logical flow of the conditions and makes the statement scannable.
    -- ✅ Good
    CASE
        WHEN order_status = 'shipped' AND is_delivered THEN 'Completed'
        WHEN order_status = 'returned' THEN 'Returned'
        ELSE 'In Progress'
    END AS order_category
      
    -- ❌ Bad
    CASE WHEN order_status = 'shipped' AND is_delivered THEN 'Completed' WHEN order_status = 'returned' THEN 'Returned' ELSE 'In Progress' END AS order_category
    

Comments

  • Did I document the model’s purpose and any tricky logic?

    • Why? Good comments help future developers (including your future self!) understand the business context and the reason behind specific code choices.
    -- ✅ Good
    /*
    Model: mart_customer_segments
    Description: This model segments customers based on their purchasing frequency
    and monetary value (RFM analysis).
    */
    ...
    -- This logic handles a known data quality issue where legacy orders have null dates.
    -- We default them to the start of the fiscal year.
    COALESCE(order_date, '2020-01-01') AS cleaned_order_date
    

Part 3: Extending Your Toolkit — A Style Guide for Python Models

While SQL is the foundation of dbt, Python models open the door to advanced statistics, machine learning, and complex data manipulations that are cumbersome in SQL. However, this flexibility can come at a cost. Without clear standards, your dbt project can become littered with inconsistent, difficult-to-debug Python scripts that operate as “black boxes.”

To prevent this, it’s crucial to apply the same level of discipline to your Python code as you do to your SQL. A consistent style guide ensures that your Python models are readable, maintainable, and seamlessly integrated into your dbt workflow, regardless of who writes them.

The Anatomy of a Well-Styled Python Model

A clean Python model is predictable. It has a clear entry point, separates concerns, and is easy for a teammate to understand at a glance. The fundamental structure revolves around a single model function that ingests data via dbt.ref() and returns a DataFrame.

Let’s look at a simple example that calculates the ratio of a customer’s revenue to the average revenue of all customers. This model, while simple, demonstrates all the elements of good style.

# models/intermediate/int_customer_revenue_ratio.py

# 1. Imports are clean and at the top
import pandas as pd

# 2. Function is well-defined with type hints and a docstring
def model(dbt, session) -> pd.DataFrame:
    """
    Calculates each customer's revenue as a ratio to the
    overall average customer revenue.

    Inputs:
      - ref('mart_customers'): A model with customer-level revenue.

    Outputs:
      - A DataFrame with customer_id and their revenue ratio.
    """
    
    # 3. dbt config is the first line inside the function
    dbt.config(materialized="table")

    # 4. Load data using dbt.ref()
    customers_df = dbt.ref("mart_customers").to_pandas()

    # 5. Logic is clear, with descriptive variable names
    # Avoids "magic numbers" by calculating the average first
    average_revenue = customers_df["total_revenue_usd"].mean()

    customers_df["revenue_to_avg_ratio"] = (
        customers_df["total_revenue_usd"] / average_revenue
    )
    
    # 6. Select and rename final columns for a clean output
    final_df = customers_df[["customer_id", "revenue_to_avg_ratio"]]

    # 7. Return the final DataFrame
    return final_df

This structure is a blueprint. It makes the code easy to test, debug, and for new team members to contribute to confidently.

Your Quick Reference: Python Model Style Checklist

Use this checklist during development and code reviews to enforce a high standard of quality for all Python models in your project.

🐍 dbt Python Model Style Checklist


1. Structure & Readability

  • Did I use a clear function signature with type hints and a docstring?

    • Why? It explains the model’s purpose and makes it clear what kind of object is expected and returned, which helps prevent runtime errors.
    # ✅ Good
    def model(dbt, session) -> pd.DataFrame:
        """Calculates user engagement scores based on recent activity."""
        ...
        return final_df
      
    # ❌ Bad
    def model(dbt, session): # No type hints or docstring
        # ... what does this do?
        return df
    
  • Is dbt.config() the first line inside the model function?

    • Why? It makes the model’s materialization strategy immediately obvious to anyone reading the code.
    # ✅ Good
    def model(dbt, session) -> pd.DataFrame:
        dbt.config(materialized="table")
        customers_df = dbt.ref("mart_customers").to_pandas()
        ...
      
    # ❌ Bad
    def model(dbt, session) -> pd.DataFrame:
        customers_df = dbt.ref("mart_customers").to_pandas()
        # ... some logic ...
        dbt.config(materialized="table") # Config is hidden at the end
        ...
    
  • Are all imports located at the top of the file?

    • Why? It provides a clear, upfront summary of all external dependencies.
    # ✅ Good
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
      
    def model(dbt, session):
        ...
      
    # ❌ Bad
    def model(dbt, session):
        import pandas as pd # Hidden import
        ...
    

2. Code Quality & Formatting

  • Does the code follow PEP 8 style guidelines (e.g., snake_case for variables and functions)?

    • Why? It’s the community standard for Python code, making it universally readable. Use a tool like Black or Ruff to automate this (example commands at the end of this section).
    # ✅ Good
    daily_revenue_df = dbt.ref("mart_daily_revenue").to_pandas()
      
    def calculate_moving_average(df, window_size):
        ...
      
    # ❌ Bad
    DailyRevenueDF = dbt.ref("mart_daily_revenue").to_pandas() # Not snake_case
      
    def Calculate_MA(df, windowSize): # Not snake_case
        ...
    
  • Are complex steps or “magic numbers” explained with comments?

    • Why? It clarifies the business logic or rationale behind a technical choice.
    # ✅ Good
    # Calculate churn if no activity in the last 90 days
    churn_threshold_days = 90
    df["is_churned"] = df["days_since_last_login"] > churn_threshold_days
      
    # ❌ Bad
    df["is_churned"] = df["days_since_last_login"] > 90 # Why 90?
    
  • Does the model avoid hardcoding file paths, model names, or credentials?

    • Why? Hardcoding makes models brittle and difficult to run in different environments (dev, prod). Use dbt.ref() and environment variables instead.
    # ✅ Good
    customers_df = dbt.ref("mart_customers").to_pandas()
      
    # ❌ Bad
    # This will break when `mart_customers` changes or in CI/CD
    customers_df = pd.read_csv("target/mart_customers.csv")
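As noted above, formatting is best automated rather than enforced by hand. A minimal sketch of the commands (the models/ path is an assumption; point the tools at wherever your Python models live):

    # Format Python models with Black
    black models/

    # Or lint and format with Ruff
    ruff check models/ --fix
    ruff format models/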
    

3. dbt Integration & Documentation

  • Is the Python model documented in a schema.yml file?

    • Why? Python models are first-class citizens in your DAG. They should have descriptions, and their columns should be documented and tested just like any SQL model.
    # ✅ Good
    models:
      - name: int_customer_revenue_ratio
        description: "Python model that calculates each customer's revenue ratio to the mean."
        columns:
          - name: customer_id
            tests:
              - unique
              - not_null
          - name: revenue_to_avg_ratio
            description: "Ratio of the customer's total revenue to the average of all customers."
    
  • Are all required Python packages declared for the model’s execution environment?

    • Why? This ensures that the model can run reliably in any environment (local, CI, production) by defining its dependencies. For warehouse-executed Python models, packages are typically declared in dbt.config(); keep any separate dependency file your platform uses in sync.
    # Declare packages inside the model function
    dbt.config(
        materialized="table",
        packages=["pandas", "scikit-learn", "statsmodels"]
    )
    

I’ve also created a comprehensive Python code styling guide for data teams, which you can explore after finishing this piece.

Part 4: The Gateway to Production — A Robust Pull Request Workflow

You’ve established clear naming conventions and a consistent style guide. Now, how do you enforce them? The answer lies in the Pull Request (PR). In modern data teams, the PR is the single most important control point for ensuring quality, collaboration, and stability in your dbt project. It transforms analytics development from a solo, ad-hoc activity into a transparent, peer-reviewed engineering discipline.

A successful PR workflow relies on a clear, repeatable set of steps that every team member follows. The developer is responsible for ensuring their changes are complete, tested, and well-documented. The reviewer is responsible for validating the logic, checking for adherence to standards, and providing constructive feedback.

The Anatomy of a Great Pull Request

A great PR tells a story. It clearly explains what was changed, why it was changed, and how it was tested. This makes the reviewer’s job easier and serves as a permanent historical record for future developers.

A well-structured PR should always include:

  1. A Descriptive Title: Summarize the change concisely (e.g., “feat: Add mart_customer_churn model”).
  2. A Link to the Ticket: Connect the work to a business request (e.g., a JIRA or Asana ticket).
  3. A Clear Description: Explain the context and the approach taken.
  4. Proof of Testing: Show that all relevant dbt tests have passed.

Your Quick Reference: Pull Request Checklist

This checklist serves as a “pre-flight” guide for any developer preparing to submit their work. By completing these steps, you ensure that your PR is ready for an efficient and effective review.

📝 dbt Pull Request (PR) Development Checklist with Code Examples


1. Branch Management

  • Create a new branch from main or develop.

    git checkout main               # switch to base branch
    git pull                        # ensure it is up-to-date
    git checkout -b feature/new-churn-model
    

2. Development & Style

  • Add or update models, tests, seeds, or macros as required (consult your organization’s testing guide).

  • Ensure all new and modified assets follow your team’s standards.

  • Compile and check code for syntax errors.

    dbt compile
    

3. Test Coverage

  • Add appropriate schema and logic tests for any new or changed logic.

    # models/marts/schema.yml
    models:
      - name: mart_customer_churn
        columns:
          - name: customer_id
            tests: [not_null, unique]
          - name: is_churned
            description: "True if the customer has not made a purchase in 90 days."
            tests:
              - accepted_values:
                  values: [true, false]
    
  • Run all dbt tests locally and resolve any failures.

    dbt build --select +mart_customer_churn # Build the model plus its upstream dependencies and run its tests
    

4. Documentation

  • Add or update model and column descriptions in the relevant schema.yml file.

    columns:
      - name: churn_date
        description: "The first date the customer was considered churned. Null if not churned."
    
  • Summarize your changes and rationale in the PR description.


5. Repo Cleanliness

  • Remove temporary models, debugging statements (e.g., stray LIMIT 100 clauses), or commented-out code blocks.

  • Regenerate docs locally to ensure your descriptions render correctly.

    dbt docs generate
    
  • Run a linter/formatter (e.g., sqlfluff, black) if your project uses one.
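For example, with sqlfluff (the dialect flag is an assumption; set it to your warehouse):

    # Report style violations
    sqlfluff lint models/ --dialect snowflake

    # Auto-fix the violations that can be fixed safely
    sqlfluff fix models/ --dialect snowflake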


6. Git Workflow: Add, Commit, Push

  • Stage, commit, and push your changes with a clear commit message.

    git add models/marts/mart_customer_churn.sql models/marts/schema.yml
    git commit -m "feat: Create mart_customer_churn model with tests"
    git push origin feature/new-churn-model
    

7. Review & Collaboration

  • Open a Pull Request using your team’s template.

    Example PR Description:

    Summary:

    This PR introduces a new model, mart_customer_churn, which flags customers as “churned” if they haven’t made a purchase in the last 90 days.

    Rationale:

    This is required for the new executive retention dashboard. Ticket: PROJ-1234

    Tests:

    • Added not_null, unique, and accepted_values tests for the new model.
    • All tests pass locally via dbt build.

    Impact:

    • This is a new mart model with no downstream dependencies yet.
  • Request a review from at least one other team member.


When is a Review Most Critical?

While all changes should be reviewed, pay special attention when the PR includes:

  • Significant Business Logic: Implementing a key metric (e.g., ARR, LTV) or a complex rule that has financial or strategic importance.
  • Complex Changes: Introducing multi-stage CTEs, complex joins, window functions, or custom macros.
  • Changes to Core Models: Modifying a foundational model (like mart_customers or mart_orders) that has many downstream dependencies.
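One lightweight way to guarantee a review on these high-impact areas is a CODEOWNERS file, which GitHub and GitLab use to request reviewers automatically. The paths and team handles below are illustrative assumptions:

    # .github/CODEOWNERS (illustrative)
    models/marts/core/   @data-team-leads
    macros/              @analytics-engineering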

Part 5: Sunsetting Models Gracefully — The Deprecation Workflow

As your data project matures, models will inevitably become outdated. Business logic changes, sources are replaced, or you discover a more efficient way to transform data. Simply deleting an old model is a recipe for disaster, as it can silently break downstream dashboards, data science models, or ad-hoc queries that depend on it.

A formal deprecation process is essential for maintaining a clean, trustworthy, and modern dbt project. It provides a clear, communicative path for retiring models without causing chaos. This workflow ensures that all stakeholders are aware of upcoming changes, have a clear migration path, and are given adequate time to adapt.

The goal is to make model retirement a non-event—a routine maintenance task rather than a source of fire drills and broken reports. This process can be broken down into five phases: Announce, Analyze, Migrate, Execute, and Clean Up.

The Deprecation Lifecycle: A Step-by-Step Guide

  1. Announce & Document: The first step is always communication. You must clearly signal that a model is slated for removal. dbt provides built-in features for this. Add a deprecation_date to the model’s schema.yml file. This is the official flag that a model is on its way out.

    # models/core/schema.yml
    models:
      - name: orders_legacy
        description: |
          DEPRECATED: This model will be removed after 2024-08-01.
          Please use `mart_orders_v2` instead.
        deprecation_date: '2024-08-01'
    

    You should also broadcast this change in your team’s communication channels (e.g., Slack or Teams).

  2. Analyze Dependencies: Before you can remove a model, you must understand its “blast radius.” Who and what depends on it? This analysis has two parts:

    • Internal Dependencies: Check which other dbt models reference the one you’re deprecating.
    • External Dependencies: This is the crucial part. Check your data warehouse’s query history to find BI tools, ad-hoc queries, or other systems that query the model’s table directly.
  3. Migrate & Monitor: Proactively help users move to the new model. Provide clear instructions, code snippets, and support. As the deprecation date approaches, continue to monitor usage and send reminders to any remaining users.

  4. Execute the Deprecation: On the deprecation date, don’t delete the model file immediately. First, “soft-deprecate” it by disabling it in dbt. This prevents it from being built but keeps the code in the repository, making it easy to reverse if needed.

    # models/core/schema.yml
    models:
      - name: orders_legacy
        config:
          enabled: false
        description: "DEPRECATED: Disabled after 2024-08-01. Use mart_orders_v2."
    

    After disabling the model in dbt, you can drop the table from your data warehouse.

  5. Clean Up: After a safe waiting period (e.g., one or two sprints) where no issues have been reported, you can complete the process by removing the model’s .sql and .yml files from your project.

Your Quick Reference: Model Deprecation Checklist

This checklist provides a systematic, step-by-step process to ensure nothing is missed when retiring a model.

📋 dbt Model Deprecation Checklist


1. Pre-Deprecation: Communication & Documentation

  • Did I add a deprecation_date and rationale to the model’s YAML?

    models:
      - name: orders_legacy
        description: |
          DEPRECATED: This model will be removed after 2024-08-01.
          Please use `mart_orders_v2` instead.
        deprecation_date: '2024-08-01'
    
  • Did I communicate the plan to stakeholders (e.g., via Slack)? Sample Message:

    :warning: dbt Model Deprecation Notice :warning: The model orders_legacy will be removed on August 1, 2024. Please migrate all dependencies to mart_orders_v2. Contact #data-team for support.


2. Analysis: Downstream Dependencies & Usage

  • Did I check which dbt models depend on it?

    # This command searches for references within your dbt project files.
    grep -r "ref('orders_legacy')" models/
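    # Alternatively, list the model and everything downstream of it with dbt's node selection
    dbt ls --select orders_legacy+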
    
  • Did I check warehouse query history for external usage (BI tools, ad-hoc queries)?

    -- Example for Snowflake
    SELECT query_text, user_name, start_time
    FROM snowflake.account_usage.query_history
    WHERE query_text ILIKE '%orders_legacy%'
      AND start_time > dateadd(day, -30, current_timestamp());
    

3. Migration: Plan & Action

  • Did I inform affected users about the migration deadline and alternatives?

  • Did I provide a clear code sample for the update?

    -- Before
    SELECT * FROM {{ ref('orders_legacy') }}
      
    -- After
    SELECT * FROM {{ ref('mart_orders_v2') }}
    

4. Transition: Monitoring & Warnings

  • Did I monitor usage as the deprecation date approaches? (Re-run the query history check).

  • Did I send final reminders to any remaining users?


5. Execution: Deprecation in dbt

  • Did I disable the model in dbt to prevent it from being built?

    # in schema.yml
    models:
      - name: orders_legacy
        config:
          enabled: false
        description: "DEPRECATED: Disabled on 2024-08-01. Use mart_orders_v2."
    
  • Did I drop the table/view from the data warehouse?

    -- Replace <database> and <schema> with your warehouse's values
    DROP TABLE IF EXISTS <database>.<schema>.orders_legacy;
    

6. Post-Deprecation: Final Housekeeping

  • After a safe waiting period (e.g., 1-2 weeks), did I remove the model’s .sql and schema files from the repository?

    git rm models/core/orders_legacy.sql
    git rm models/core/schema.yml # Or just the entry from the file
    git commit -m "chore: Remove deprecated model orders_legacy"
    
  • Did I clean up any references in project documentation (e.g., README)?
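    A quick way to catch stray references in documentation and project files:

    # Search markdown and YAML files for the old model name
    grep -rn "orders_legacy" --include="*.md" --include="*.yml" .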

Part 6: Building Trust — A Comprehensive Testing Framework

Data without tests is just a rumor. You can have perfectly named models, beautifully styled code, and a flawless PR process, but if the data itself is wrong, none of it matters. A single NULL in a key column or an incorrect join can silently corrupt every downstream analysis and dashboard, eroding the trust you’ve worked so hard to build.

dbt’s built-in testing framework is its most powerful feature for preventing this. It transforms data quality from a manual, reactive task into an automated, proactive part of your development cycle. Tests are assertions you make about your data that run every time you build your project, acting as a permanent guard against regressions and bad data.

But what should you test, and when? The answer isn’t to test everything all the time. A smart testing strategy is pragmatic, risk-based, and adapts to the maturity of your data pipeline.

Choosing Your Testing Strategy: From Greenfield to Legacy

Not all development scenarios are the same. Your approach to testing should be tailored to the context of your work. We can think of these as different workflows.

  1. Greenfield (“Design Before Build”): When starting a brand-new project, you can design your tests upfront. You define the business rules and data contracts before writing the code, ensuring full coverage from day one. This is ideal but rare.

  2. Agile/Iterative (“Test-As-You-Build”): This is the most common workflow for modern data teams. You build a model and immediately add foundational tests (not_null, unique). As you add more complex logic or joins, you add more specific unit and integration tests in parallel. This keeps testing in sync with development.

  3. Legacy Retrofit (“Test the Past”): When you inherit a project with little to no test coverage, you can’t test everything at once. You must prioritize. Start by adding basic schema tests to your most critical models. Then, add integration tests to high-risk joins and write specific unit tests to address known bugs or problem areas.

  4. Hybrid/Continuous Improvement (Recommended): For most teams, the reality is a mix of all three. The rule is simple:

    • All new models must have foundational tests from day one.

    • Any change to an existing model requires reviewing and adding relevant tests (see the CI sketch after this list).

    • Continuously identify critical, untested legacy models and add coverage over time.
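In CI, the rule of testing whatever changed can be automated with dbt’s state-based selection, which compares your branch against production artifacts. A minimal sketch (the artifact path is an assumption):

    # Build and test only modified models and everything downstream of them
    dbt build --select state:modified+ --state ./prod-run-artifacts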

Regardless of the workflow you choose, the goal is the same: to have a suite of tests that gives you confidence in your data.

The Testing Pyramid: What to Test and How

The types of tests you write can be categorized to cover different aspects of data quality.

  • Schema & Data Quality Tests: These are the “must-haves” for every model. They are assertions about the shape and integrity of your data, like checking for nulls, uniqueness, or accepted values.
  • Unit Tests: These verify a specific piece of business logic within a single model, like a complex CASE statement or a calculation.
  • Integration / Referential Integrity Tests: These check the relationships between models, ensuring that joins are complete and that no orphaned records are created.
  • Business/Custom Data Tests: These are specific rules that encode business knowledge, such as “revenue can never be negative” or “an order’s ship date cannot be before its creation date.”

Your Quick Reference: Model Test Coverage Checklist

This checklist is your tactical guide to implementing a robust testing strategy. Use it for every new model you create and every existing model you modify.

📝 dbt Model Test Coverage Checklist (with Examples)


1. Schema & Data Quality (“Must-Have” for Every Model)

  • Did I add not_null tests for required columns?

    # in schema.yml
    models:
      - name: mart_customers
        columns:
          - name: customer_id
            tests:
              - not_null
    
  • Did I add unique tests for primary key columns?

    - name: mart_customers
      columns:
        - name: customer_id
          tests:
            - unique
    
  • Pro tip: Apply both unique and not_null tests on the primary key columns

  • Did I add accepted_values tests for columns with a defined set of values (e.g., statuses)?

    - name: mart_orders
      columns:
        - name: order_status
          tests:
            - accepted_values:
                values: ['placed', 'shipped', 'completed', 'returned']
    

2. Unit Testing Logic

  • Does this model contain non-trivial logic (e.g., complex CASE statements, window functions)? If yes:

  • Did I write a custom data test to validate that logic?

    Example: Your model has a CASE statement to define a customer segment.

    -- models/marts/mart_customers.sql
    SELECT
      customer_id,
      CASE
        WHEN lifetime_revenue > 1000 THEN 'High Value'
        WHEN lifetime_revenue > 100 THEN 'Medium Value'
        ELSE 'Low Value'
      END AS customer_segment
    FROM {{ ref('int_customers__lifetime_value') }}
    

    Unit Test: Write a SQL query that returns rows only if the logic is incorrect.

    -- tests/assert_customer_segment_logic_is_correct.sql
    SELECT *
    FROM {{ ref('mart_customers') }}
    WHERE
      (lifetime_revenue > 1000 AND customer_segment != 'High Value') OR
      (lifetime_revenue > 100 AND lifetime_revenue <= 1000 AND customer_segment != 'Medium Value') OR
      (lifetime_revenue <= 100 AND customer_segment != 'Low Value')
    

    dbt will pass this test only if the query returns zero rows.
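    To iterate on just this assertion locally, you can select the test by its file name:

    dbt test --select assert_customer_segment_logic_is_correct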


3. Integration / Referential Integrity

  • Does this model join or depend on other models? If yes:

  • Did I write relationships tests to ensure foreign keys are valid?

    # in models/marts/schema.yml
    - name: mart_orders
      columns:
        - name: customer_id
          tests:
            - relationships:
                to: ref('mart_customers')
                field: customer_id
    

    This test ensures every customer_id in mart_orders exists in mart_customers.


4. Business/Data-Specific Validations

  • Are there critical business rules this model must follow? If yes:

  • Did I add a custom data test to enforce that rule?

    Example: An invoice amount should never be negative.

    -- tests/assert_invoice_amount_is_positive.sql
    SELECT *
    FROM {{ ref('mart_invoices') }} -- model name is illustrative
    WHERE invoice_amount_usd < 0
    

    Example: A subscription’s end date must be after its start date.

    # Using the dbt_utils package for a more concise test
    - name: mart_subscriptions
      tests:
        - dbt_utils.expression_is_true:
            expression: "ended_at >= started_at"
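    Using dbt_utils assumes the package is installed via packages.yml (the version below is only an example; pin whichever release you have tested):

    # packages.yml
    packages:
      - package: dbt-labs/dbt_utils
        version: 1.1.1

    # then install with: dbt deps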
    

5. Test Maintenance

  • If I changed a model’s logic, did I update or add tests to cover the change?
  • If I removed columns or logic, did I remove the corresponding obsolete tests?

Conclusion: From Rules to Rituals

We’ve journeyed through a comprehensive blueprint for integrating dbt into your team’s workflow, covering everything from foundational naming conventions and style guides to the daily rituals of pull requests, model deprecation, and automated testing. Each section provided a detailed checklist, designed to bring structure and predictability to the art of data transformation.

However, the goal of this guide is not to impose a rigid, one-size-fits-all dogma. Your data team is unique—with its own size, maturity level, and specific challenges. A five-person startup team may find a lightweight process is sufficient, while a fifty-person enterprise team will require more stringent controls.

Think of these checklists less as immutable laws and more as a comprehensive toolbox. The true value lies not in blindly adopting every single rule, but in using them to facilitate a crucial conversation with your team. The important questions are:

  • What does a “good” model look like for us?
  • How will we ensure our data is trustworthy?
  • What is the standard of quality we agree to uphold?

The most critical step is to make a conscious, informed decision about how your team will operate and then document that decision. Whether you choose leading or trailing commas is less important than the fact that you’ve chosen one and committed to it as a team. This shared agreement is what eliminates ambiguity, reduces friction in code reviews, and enables new members to contribute confidently from day one.

By establishing and documenting these standards, you create a system of collective ownership. You move from a world of individual preferences and ad-hoc practices to one of shared language and engineering discipline. This is how you build a data platform that is more than just a collection of scripts—it becomes a reliable, scalable, and trusted asset that empowers the entire organization. It’s how you turn a powerful tool like dbt into the backbone of a truly data-driven culture.