Data Testing Frameworks: dbt, Great Expectations, Soda, pytest


A bad row in a fact table costs more than the engineering hours to catch it. It costs the trust of every analyst who built a report on top of it.

Data testing frameworks exist to catch data quality issues before they reach downstream consumers. Four tools dominate this space: dbt tests, Great Expectations, Soda Core, and pytest. They overlap in purpose but differ significantly in philosophy, integration surface, and operational complexity.

Quick Comparison

| | dbt tests | Great Expectations | Soda Core | pytest |
|---|---|---|---|---|
| Primary use case | SQL model validation | Dataset profiling & expectations | Pipeline-level data quality checks | Unit testing Python logic |
| Configuration | YAML | Python / JSON | YAML (SodaCL) | Python |
| Runs inside | dbt pipeline | Standalone or Airflow | Standalone or Airflow | CI/CD, local dev |
| Learning curve | Low | High | Medium | Low (if you know Python) |
| Reporting UI | dbt docs | Data Docs (self-hosted) | Soda Cloud (SaaS) | Console / custom |
| Best fit | dbt-native teams | Complex profiling needs | Cross-platform quality gates | Testing transformation code |
| Open source | ✅ (Core) | ✅ | ✅ (Core) | ✅ |

dbt Tests

dbt has a native testing framework built into the CLI. Tests are defined in .yml files alongside your models and run with dbt test.

Built-in generic tests

# models/schema.yml
version: 2

models:
  - name: dim_customer
    columns:
      - name: customer_sk
        tests:
          - unique
          - not_null
      - name: country_code
        tests:
          - not_null
          - accepted_values:
              values: ['DE', 'FR', 'US', 'GB', 'NL']
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id

These four built-in tests (unique, not_null, accepted_values, relationships) cover the majority of what most teams need.
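For intuition, here is roughly what three of those checks compute, sketched in pandas. dbt itself compiles each test to SQL against your warehouse; these helper functions are ours, written only to show the failure condition each test flags:

```python
import pandas as pd

def unique_failures(df: pd.DataFrame, column: str) -> pd.DataFrame:
    # Rows whose value appears more than once -- what dbt's `unique` test flags.
    counts = df[column].value_counts()
    return df[df[column].isin(counts[counts > 1].index)]

def not_null_failures(df: pd.DataFrame, column: str) -> pd.DataFrame:
    # Rows where the column is null -- dbt's `not_null` test.
    return df[df[column].isna()]

def accepted_values_failures(df: pd.DataFrame, column: str, values: list) -> pd.DataFrame:
    # Rows outside the allowed value set -- dbt's `accepted_values` test.
    # Nulls also fail here, since they are not in the set.
    return df[~df[column].isin(values)]
```

In each case a non-empty result means the test fails, which is exactly the contract dbt's compiled SQL uses.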

Custom singular tests

For logic that doesn't fit generic tests, write a singular test — a SQL query that returns rows only when something is wrong:

-- tests/assert_no_negative_revenue.sql
-- Fails (returns rows) if any order has negative revenue
SELECT
    order_id,
    revenue
FROM {{ ref('fct_orders') }}
WHERE revenue < 0

If this query returns any rows, dbt test marks it as failed. Simple, version-controlled, and lives alongside the model it protects.
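The same contract -- rows returned only on failure -- translates directly to in-memory checks. A minimal pandas equivalent of the singular test above (function name is ours, not dbt's):

```python
import pandas as pd

def assert_no_negative_revenue(orders: pd.DataFrame) -> None:
    # Mirror of the singular SQL test: any matching row is a failure.
    bad = orders[orders["revenue"] < 0]
    assert bad.empty, f"{len(bad)} orders with negative revenue"
```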

When to use dbt tests: You're already using dbt. Tests are the right first layer of defense for SQL model output validation. Start here.


Great Expectations

Great Expectations (GX) is the most powerful option but also the most complex to set up. Its core concept is an Expectation Suite — a collection of assertions about a dataset that you define once and run repeatedly.

# Python: Great Expectations fluent datasource API (simplified)
import great_expectations as gx

context = gx.get_context()

# Connect to a data source
datasource = context.sources.add_pandas_filesystem(
    name="my_data",
    base_directory="./data"
)

asset = datasource.add_csv_asset(
    name="orders",
    batching_regex=r"orders_(?P<date>\d{8})\.csv"
)
batch_request = asset.build_batch_request()

# Define expectations against a named suite
context.add_or_update_expectation_suite("orders_suite")
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="orders_suite"
)
validator.expect_column_to_exist("customer_id")
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=100000)
validator.expect_column_values_to_be_in_set("status", ["pending", "completed", "refunded"])

# Persist the suite, then validate via a pre-configured checkpoint
validator.save_expectation_suite()
results = context.run_checkpoint(checkpoint_name="orders_checkpoint")
print(results.success)  # True / False

Great Expectations generates Data Docs — human-readable HTML reports showing which expectations passed, which failed, and data profiles for each column. This is its biggest differentiator.
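Conceptually, each expectation reduces to an element-wise check plus a summarized result object. A rough pandas sketch of what `expect_column_values_to_be_between` computes (not GX's real implementation, just the shape of it):

```python
import pandas as pd

def expect_column_values_to_be_between(df: pd.DataFrame, column: str,
                                       min_value: float, max_value: float) -> dict:
    # Element-wise range check (inclusive), then a GX-style result summary.
    mask = df[column].between(min_value, max_value)
    return {
        "success": bool(mask.all()),
        "result": {
            "element_count": len(df),
            "unexpected_count": int((~mask).sum()),
        },
    }
```

Data Docs are essentially these result dictionaries, aggregated across a suite and rendered as HTML.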

Strengths:

  • Column profiling and distribution analysis
  • Data Docs for stakeholder-facing quality reports
  • Works with Pandas, Spark, SQL databases

Weaknesses:

  • Configuration is verbose and complex
  • Onboarding takes hours, not minutes
  • Maintaining expectation suites as data evolves requires discipline

When to use Great Expectations: You need data profiling alongside validation, you're building a data quality reporting layer for non-engineers, or you're working outside of dbt.


Soda Core

Soda uses a purpose-built YAML DSL called SodaCL (Soda Checks Language). It reads like plain English and integrates cleanly into Airflow, GitHub Actions, or any orchestrator.

# checks/orders_checks.yml (SodaCL)
checks for orders:
  - row_count > 0
  - missing_count(order_id) = 0
  - duplicate_count(order_id) = 0
  - invalid_percent(status) < 1%:
      valid values: [pending, completed, refunded, cancelled]
  - avg(amount) between 10 and 5000
  - freshness(order_date) < 24h

Run it with the CLI:

# Bash: run Soda checks against a PostgreSQL table
soda scan -d my_postgres_connection -c configuration.yml checks/orders_checks.yml

The freshness check is a standout: it validates that data was updated recently. dbt offers a separate dbt source freshness command for sources, but its generic tests can't express freshness for models.
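If you need a freshness gate outside of Soda, the underlying logic is simple to sketch in Python (function and parameter names are ours; naive timestamps assumed):

```python
import pandas as pd

def is_fresh(df: pd.DataFrame, ts_column: str, max_age_hours: int = 24,
             now: pd.Timestamp = None) -> bool:
    # Freshness check: the newest timestamp must fall within the allowed window.
    if now is None:
        now = pd.Timestamp.now()
    latest = pd.to_datetime(df[ts_column]).max()
    return (now - latest) <= pd.Timedelta(hours=max_age_hours)
```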

Strengths:

  • Readable YAML is accessible to analysts and data owners, not just engineers
  • Freshness checks built-in
  • Clean integration with orchestrators as a pipeline step
  • Soda Cloud (SaaS) offers alerting and historical tracking

Weaknesses:

  • Complex conditional logic is awkward in YAML
  • Soda Cloud is a paid product (Soda Core is free)
  • Less mature ecosystem than Great Expectations

When to use Soda Core: You want data quality checks that non-engineers can read and modify. Works particularly well as a pipeline gate in Airflow or Prefect.


pytest for Data Pipelines

pytest is a general-purpose Python testing framework. It's not data-specific, but it's invaluable for testing the Python code around your data — transformation functions, schema mappings, custom logic.

# Python (pytest): test a transformation function
import pytest
import pandas as pd
from my_pipeline.transforms import normalize_country_code, compute_revenue_bucket

def test_normalize_country_code_uppercase():
    assert normalize_country_code("de") == "DE"
    assert normalize_country_code("fr") == "FR"

def test_normalize_country_code_invalid_raises():
    with pytest.raises(ValueError, match="Invalid country code"):
        normalize_country_code("XX")

def test_compute_revenue_bucket():
    df = pd.DataFrame({"amount": [5, 50, 500, 5000]})
    result = compute_revenue_bucket(df)
    assert list(result["bucket"]) == ["micro", "small", "medium", "large"]

def test_compute_revenue_bucket_empty_input():
    df = pd.DataFrame({"amount": []})
    result = compute_revenue_bucket(df)
    assert len(result) == 0

pytest is also excellent for snapshot tests on small reference datasets — load a fixture, run your transformation, assert the output matches the expected result:

# Python (pytest): snapshot test with fixture
import pandas as pd
from my_pipeline.transforms import deduplicate

def test_deduplication_logic():
    input_data = pd.DataFrame({
        "id": [1, 1, 2],
        "value": ["a", "a_duplicate", "b"]
    })
    expected = pd.DataFrame({
        "id": [1, 2],
        "value": ["a", "b"]
    })
    result = deduplicate(input_data, key="id", keep="first")
    pd.testing.assert_frame_equal(result.reset_index(drop=True), expected)

When to use pytest: You're writing Python ETL code, custom Spark transformations, or utility functions. Every function that transforms data should have a unit test.


Layering Them Together

These tools aren't mutually exclusive. A mature data team typically uses all four at different layers:

Layer 1 — Unit tests:         pytest (test Python transform logic)
Layer 2 — Model validation:   dbt tests (test SQL model output)
Layer 3 — Pipeline gates:     Soda Core (freshness, volume, cross-table checks)
Layer 4 — Profiling/audits:   Great Expectations (onboarding new sources, quality reports)

Trying to stretch one tool across all four layers is where teams get into trouble. dbt tests are great at layer 2 but can't cover layer 1 or express model freshness; pytest can't validate materialized warehouse tables at layer 2. Pick the right tool for each layer.


Common Pitfalls

  • Testing the warehouse instead of the model: dbt tests run against materialized tables. If your source data is already corrupted, passing dbt tests just means the model faithfully reproduced bad data. Add source tests too.
  • No test for volume: A silent truncation (0 rows loaded) passes most tests. Always add a row_count > 0 check.
  • Over-testing: Don't write not_null tests for every column in every model. Focus on columns that would cause downstream failures if they were null.
  • Expectations that are never updated: A Great Expectations suite created on 6 months of data will fail when you add a new country. Maintenance is the hidden cost.
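The volume pitfall in particular is cheap to guard against. Beyond a bare `row_count > 0`, comparing today's load against a historical baseline catches partial truncations too (a minimal sketch; helper name and tolerance are ours):

```python
def volume_ok(row_count: int, baseline: int, tolerance: float = 0.5) -> bool:
    # Flag loads that are empty or fall below a fraction of the usual volume.
    return row_count > 0 and row_count >= baseline * tolerance
```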

Wrapping Up

For most dbt-native teams, start with dbt tests and add Soda Core for freshness and cross-table checks. Bring in Great Expectations if you need data profiling or quality reporting for non-technical stakeholders. Use pytest whenever your pipeline includes Python logic worth testing.

Data testing isn't one tool — it's a discipline applied across layers. The teams that do it well treat test failures like CI/CD failures: nothing ships until they're green.

