🏗️

Chapter 3 – Data Modeling in Rust (Replacing or Integrating with dbt)

Coauthor: Aion Correa

Table of Contents

  1. Introduction to SQL Model Layers
  2. dbt Core: Structure Recap
  3. Rust in dbt Core (Fusion Engine)
  4. SDF – Semantic Data Fabric (Rust-native model compiler)
  5. Quary CLI (Open Source)
  6. Writing SQL Models in Rust-based Environments
  7. Full Example Project
  8. Comparing Runtime Behavior
  9. Metadata & Validation
  10. Developer Tooling
  11. Integration Strategies
  12. Migration Plan Template
  13. Limitations and Gaps
  14. Benchmarks
  15. Final Recommendations
  16. Appendices

1. Introduction to SQL Model Layers

“First, we shape our tools. Thereafter, they shape us.” – Marshall McLuhan

The Warehouse as a DAG

In modern data engineering, modeling layers are more than just folders with SQL files — they represent a contractual system of trust between raw data and business consumers.

Whether working with star schemas, data vaults, or analytics-ready marts, the transformation logic lives in layers, usually structured into raw, staging, and marts.

Each layer builds upon the previous, forming a Directed Acyclic Graph (DAG). Managing this DAG isn’t just operational—it’s cultural.

It encodes ownership, expectations, and flow of business logic.
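To make the DAG idea concrete, here is a minimal sketch in plain Rust that orders models so each one runs after its dependencies. The function name build_order, the helper layers, and the three model names are illustrative assumptions, not part of any tool discussed in this chapter. The sketch assumes every dependency is itself listed as a model and that no dependency is listed twice.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns models in an order where every model comes after its dependencies,
/// or None if the graph contains a cycle (so it is not a DAG).
fn build_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Number of dependencies each model is still waiting on.
    let mut pending: BTreeMap<&str, usize> =
        deps.iter().map(|(m, d)| (*m, d.len())).collect();
    // Models with no dependencies can be built immediately.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(m, _)| *m)
        .collect();
    let mut order = Vec::new();
    while let Some(done) = ready.pop_front() {
        order.push(done.to_string());
        for (model, d) in deps {
            if d.contains(&done) {
                let n = pending.get_mut(model).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push_back(*model);
                }
            }
        }
    }
    // If some model was never released, the graph has a cycle.
    (order.len() == deps.len()).then_some(order)
}

/// Hypothetical raw -> staging -> marts layering used for illustration.
fn layers() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("raw_orders", vec![]),
        ("stg_orders", vec!["raw_orders"]),
        ("fct_sales", vec!["stg_orders"]),
    ])
}

fn main() {
    println!("{:?}", build_order(&layers()));
}
```

A cycle makes the function return None, which is the property that separates a valid model graph from an invalid one.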

The Rise (and Limits) of dbt

dbt became the industry standard because it brought order to chaos.

With ref() for dependency tracking, YAML for documentation, and simple commands for running or testing models, dbt standardized what had been scattered across SQL scripts, Airflow DAGs, and tribal knowledge.

It filled a vacuum, and for that, it earned its place.

But tools, once shaped, begin to shape us.

As teams scaled, so did model sprawl. As datasets grew, so did parsing time.

As CI/CD matured, so did the desire for early failure—to catch mistakes at compile time, not at runtime.

And as regulatory pressure increased, so did the need for semantic awareness: who owns what field, what’s PII, what can and cannot be joined?

Here is where Rust-native modeling enters the picture. Not as a rejection of dbt, but as its evolution.

Tools like SDF and Quary (written in Rust) bring compiler-level guarantees, real static analysis, and parse and compile times that the Python-based dbt Core struggles to match.

They don’t just run SQL. They understand it.

And that’s the pivot this chapter is about.

2. dbt Core: Structure Recap

Before we explore Rust-native modeling, let’s reaffirm the foundation.

dbt’s project layout is intentionally minimal:

my_dbt_project/
├── dbt_project.yml
├── models/
│   ├── staging/
│   │   └── stg_orders.sql
│   └── marts/
│       └── fct_sales.sql
├── macros/
│   └── utils.sql
└── tests/
    └── custom_tests.sql

Each .sql model is treated as a transformation node.

ref() is used to signal dependency, and YAML files define metadata, tests, and configs.

-- models/staging/stg_orders.sql
-- No trailing semicolon: dbt wraps each model in its own statement.
select
    id as order_id,
    user_id,
    total_amount
from raw.orders

-- models/marts/fct_sales.sql
select
    order_id,
    user_id,
    sum(total_amount) as total
from {{ ref('stg_orders') }}
group by 1, 2
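The dependency signal carried by ref() can be recovered from the template text itself. The sketch below is a deliberately naive scanner written for illustration; extract_refs is a hypothetical helper, not dbt's parser, and it would also match any other identifier ending in "ref(".

```rust
/// Collects every model name referenced as {{ ref('name') }} in a SQL template.
/// Naive by design: it looks for the literal text "ref(" and reads up to ")".
fn extract_refs(sql: &str) -> Vec<String> {
    let mut refs = Vec::new();
    let mut rest = sql;
    while let Some(start) = rest.find("ref(") {
        rest = &rest[start + 4..];
        let Some(end) = rest.find(')') else { break };
        // Strip whitespace and either quote style around the model name.
        let name = rest[..end].trim().trim_matches(|c| c == '\'' || c == '"');
        if !name.is_empty() {
            refs.push(name.to_string());
        }
        rest = &rest[end..];
    }
    refs
}

fn main() {
    let sql = "select order_id from {{ ref('stg_orders') }} group by 1";
    println!("{:?}", extract_refs(sql));
}
```

Feeding each model file through a function like this yields the edges of the model graph without running any SQL.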

This system works brilliantly until you hit 300+ models, with macros, dynamic SQL, complex tests, multiple environments, and Python dependency hell.

That’s where the limits become friction.

3. Rust in dbt Core (Fusion Engine)

dbt’s creators saw the cracks. Their solution? dbt Fusion — a new engine written in Rust that replaces key pieces of the old Python logic.

Here’s what it does:

Task                      dbt Core (Python)    dbt Fusion (Rust)
Parsing 500 models        ~45 seconds          ~1.5 seconds
DAG construction          ~10 seconds          < 0.5 seconds
Compile + test feedback   Manual               IDE-integrated

Paradigm Shift: From Runtime-First to Compile-Time Guarantees

Fusion enables partial compilation, live syntax validation, and early failure.

You know something is broken before you dbt run.

And that shift—from runtime surprise to compile-time confidence—is the first paradigm shift in this chapter.
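One small instance of early failure is rejecting a project whose models reference something that does not exist. The following sketch is an assumption about how such a check could look, not Fusion's implementation; unresolved_refs and the sample model names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Returns (model, missing_ref) pairs for refs that point at no known model.
/// Each input entry is a model name plus the names it references.
fn unresolved_refs<'a>(models: &[(&'a str, Vec<&'a str>)]) -> Vec<(&'a str, &'a str)> {
    let known: BTreeSet<&str> = models.iter().map(|(name, _)| *name).collect();
    let mut errors = Vec::new();
    for (name, refs) in models {
        for r in refs {
            if !known.contains(r) {
                errors.push((*name, *r));
            }
        }
    }
    errors
}

fn main() {
    // "stg_order" is a deliberate typo for "stg_orders".
    let models = vec![("stg_orders", vec![]), ("fct_sales", vec!["stg_order"])];
    for (model, missing) in unresolved_refs(&models) {
        println!("{model}: unknown ref '{missing}'");
    }
}
```

The typo is reported without connecting to a warehouse, which is the compile-time posture this section describes.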

It's the same leap that backend engineers experienced moving from JavaScript to TypeScript.

SQL engineers are now being invited into that same evolution.

With Fusion, your editor becomes your co-pilot. Every file change is analyzed. Errors are red-underlined. Model graphs update live.
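Analyzing every file change cheaply depends on knowing which models a change can affect. The sketch below shows one plausible way to compute that set by walking the graph downstream from the changed model. It is an illustration under assumed names (downstream, project), not a description of Fusion's internals.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns every model that directly or transitively depends on `changed`.
fn downstream<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
) -> BTreeSet<&'a str> {
    let mut affected = BTreeSet::new();
    let mut stack = vec![changed];
    while let Some(current) = stack.pop() {
        for (model, parents) in deps {
            // `insert` returns true only the first time, so no model is visited twice.
            if parents.contains(&current) && affected.insert(*model) {
                stack.push(*model);
            }
        }
    }
    affected
}

/// Hypothetical project: two staging models, one mart built on stg_orders.
fn project() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("raw_orders", vec![]),
        ("raw_users", vec![]),
        ("stg_orders", vec!["raw_orders"]),
        ("stg_users", vec!["raw_users"]),
        ("fct_sales", vec!["stg_orders"]),
    ])
}

fn main() {
    println!("{:?}", downstream(&project(), "raw_orders"));
}
```

Only the affected models need re-analysis after an edit; stg_users is untouched when raw_orders changes.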

4. SDF – Semantic Data Fabric

Fusion is dbt’s response. But SDF is a reimagination.

Built entirely in Rust, SDF is a compiler for SQL modeling projects. It doesn’t just template SQL.

It parses it into ASTs, analyzes column lineage, enforces type constraints, and attaches semantic meaning.

How SDF Works

sdf_project/
├── sdf.toml
├── models/
│   ├── stg_customers.sql
│   └── fct_orders.sql
└── checks/
    └── no_currency_mix.sql

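In sdf.toml, we define metadata and constraints. Treat the layout below as an illustrative sketch: SDF's published workspace files are YAML (workspace.sdf.yml), and the exact keys may differ: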

[workspace]
dialects = ["bigquery"]

[checks]
no_currency_mix = { type = "static", rule = "currency_must_be_consistent" }

[pii]
columns = ["email", "ssn"]

SDF reads your SQL, builds a typed DAG, then evaluates if any rule is violated—before you ever touch a warehouse.

Insight: Your SQL Becomes Code

In dbt, SQL is mostly rendered text. In SDF, it's treated as a real programming language.

That unlocks two superpowers:

  • Static typing: Wrong joins, illegal casts, or PII leaks are caught before execution.
  • Checks as contracts: You define expectations (like currencies must match), and SDF enforces them.

This transforms SQL from a fire-and-forget script into something structured, checked, and trustworthy.
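The static typing point can be pictured with a toy type check on a join condition. This is a sketch under stated assumptions: ColType, Schema, check_join, and the two sample schemas are invented for illustration and do not reflect SDF's type system.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int,
    Text,
    Decimal,
}

type Schema = HashMap<&'static str, ColType>;

/// Checks a join `left.left_col = right.right_col` using declared schemas only.
fn check_join(left: &Schema, left_col: &str, right: &Schema, right_col: &str) -> Result<(), String> {
    let l = left.get(left_col).ok_or(format!("unknown column {left_col}"))?;
    let r = right.get(right_col).ok_or(format!("unknown column {right_col}"))?;
    if l == r {
        Ok(())
    } else {
        Err(format!("type mismatch: {left_col} is {l:?}, {right_col} is {r:?}"))
    }
}

/// Hypothetical schema for an orders model.
fn orders() -> Schema {
    HashMap::from([("user_id", ColType::Int), ("total_amount", ColType::Decimal)])
}

/// Hypothetical schema for a users model.
fn users() -> Schema {
    HashMap::from([("id", ColType::Int), ("email", ColType::Text)])
}

fn main() {
    println!("{:?}", check_join(&orders(), "user_id", &users(), "id"));
    println!("{:?}", check_join(&orders(), "user_id", &users(), "email"));
}
```

The second join is rejected from schema information alone, before any query is sent to a warehouse.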

5. Quary CLI (Open Source)

If SDF is strict, Quary is pragmatic. Also written in Rust, Quary aims to be a lightweight alternative to dbt Core for projects that do not depend on macros or packages.

It retains the same layout and concepts (ref(), run, test) but brings speed, simplicity, and zero Python dependencies.

quary init         # Scaffolds a project
quary compile      # Validates syntax and model DAG
quary run          # Runs models
quary test         # Executes assertions

Compatibility

Quary supports:

  • Model folders
  • SQL ref() syntax
  • Seeds and basic tests
  • Partial compilation
  • Fast iteration loops

It lacks:

  • Full macro support
  • Package ecosystem
  • Advanced Jinja templating

Developer Delight

Where Quary shines is developer experience:

  • Starts instantly
  • CLI is responsive
  • Errors are structured and helpful
  • No need to pip install anything

It’s a perfect tool for internal analytics teams, fast onboarding, or teaching SQL modeling with no infra overhead.

6. Writing SQL Models in Rust-based Environments

Sometimes you don’t want a CLI. You want programmatic control.

That’s where sqlparser-rs and datafusion come in.

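You can write Rust code to parse SQL, build DAGs, and output data. The parser expects plain SQL, so any {{ ref() }} templating must be rendered before parsing.

use std::fs;
use sqlparser::{dialect::GenericDialect, parser::Parser};
let sql = fs::read_to_string("models/fct_sales.sql")?;
let ast = Parser::parse_sql(&GenericDialect {}, &sql)?; // Vec<Statement>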

Then evaluate it with datafusion:

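use datafusion::prelude::*;

// Runs inside an async fn (for example under #[tokio::main]).
let ctx = SessionContext::new();
ctx.register_csv("orders", "data/orders.csv", CsvReadOptions::new()).await?;
let df = ctx.sql("SELECT * FROM orders").await?;
// Signature as of datafusion 20.0; newer releases take extra write options.
df.write_parquet("output/orders.parquet", None).await?;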

This approach is lower-level, but ideal for custom pipelines or embedded engines.

Real Use Case

A fintech team used this setup to run offline policy checks on regulatory data before uploading it.

They used datafusion to simulate SQL logic and ensure no PII leaked—even before staging to Snowflake.

This is the second paradigm shift: “not every model needs to hit the warehouse”.

You can process locally, validate early, and only upload what passes.
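A local gate of this kind can be very small. The sketch below scans CSV text for values that look like email addresses and reports the offending rows. It is an assumption-laden illustration: rows_with_email and looks_like_email are hypothetical helpers, the email test is a crude heuristic, and the CSV handling ignores quoting.

```rust
/// Crude heuristic: something before '@' and a dot somewhere after it.
fn looks_like_email(field: &str) -> bool {
    match field.trim().split_once('@') {
        Some((user, domain)) => !user.is_empty() && domain.contains('.'),
        None => false,
    }
}

/// Returns the 1-based line numbers of data rows containing an email-like value.
/// The first line is treated as a header and skipped.
fn rows_with_email(csv: &str) -> Vec<usize> {
    csv.lines()
        .enumerate()
        .skip(1)
        .filter(|(_, line)| line.split(',').any(looks_like_email))
        .map(|(i, _)| i + 1)
        .collect()
}

fn main() {
    let csv = "order_id,note\n1,ok\n2,contact a@b.com\n3,fine";
    let bad = rows_with_email(csv);
    if bad.is_empty() {
        println!("clean: safe to upload");
    } else {
        println!("blocked: email-like values on lines {bad:?}");
    }
}
```

Only files that come back clean would move on to the upload step.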

7. Full Example Project

Let’s model a simple sales funnel:

raw/orders.csv
↓
stg_orders.sql
↓
fct_sales.sql

dbt Version

-- stg_orders.sql
-- No trailing semicolon: dbt wraps each model in its own statement.
select
    id as order_id,
    created_at,
    total_amount
from raw.orders

-- fct_sales.sql
select
    order_id,
    date_trunc('month', created_at) as month,
    sum(total_amount) as revenue
from {{ ref('stg_orders') }}
group by 1, 2

dbt run compiles and runs. Errors appear at runtime.

SDF Version

-- fct_sales.sql
select
    order_id,
    date_trunc('month', created_at) as month,
    sum(total_amount) as revenue
from stg_orders
group by 1,2;

Checks:

[checks.revenue_positive]
type = "assert"
rule = "revenue >= 0"

Violations are caught before execution.
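One way to picture the revenue check is to evaluate the rule against a local sample before the warehouse run. The sketch below does that in plain Rust, simplified to group by month only. How SDF itself evaluates such a rule is not shown in this chapter; revenue_by_month, violations, and the sample rows are assumptions.

```rust
use std::collections::BTreeMap;

/// Sums amounts per month, mirroring `sum(total_amount) ... group by month`.
fn revenue_by_month(orders: &[(&str, f64)]) -> BTreeMap<String, f64> {
    let mut out = BTreeMap::new();
    for (month, amount) in orders {
        *out.entry(month.to_string()).or_insert(0.0) += amount;
    }
    out
}

/// Returns the months that violate the `revenue >= 0` rule.
fn violations(revenue: &BTreeMap<String, f64>) -> Vec<String> {
    revenue
        .iter()
        .filter(|(_, r)| **r < 0.0)
        .map(|(m, _)| m.clone())
        .collect()
}

fn main() {
    // Hypothetical sample rows: (month, total_amount).
    let sample = [("2024-01", 100.0), ("2024-01", 50.0), ("2024-02", -20.0)];
    let revenue = revenue_by_month(&sample);
    println!("violations: {:?}", violations(&revenue));
}
```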

8. Comparing Runtime Behavior

Feature               dbt Core   dbt Fusion   Quary       SDF
Parsing speed         Slow       Fast         Fast        Fast
Error visibility      Late       Immediate    Immediate   Immediate
Test expressiveness   Low        Medium       Medium      High
PII enforcement       Manual     Manual       Partial     Built-in
Custom rules          Macros     Macros       None        Formal Checks
Setup friction        Medium     High         Low         Medium

Each tool trades convenience for control. dbt Fusion is a safe step forward. Quary is the easiest. SDF is the most powerful.

9. Metadata & Validation

Rust-native tools treat metadata as a first-class citizen.

With SDF:

  • You can tag a column as pii_email in YAML
  • Define rules: "email should be hashed if joined"
  • Use column-level lineage to enforce this

With dbt, that logic must live in a macro or docblock. Rust tools enforce what you mean, not just what you write.
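The "hashed if exposed" idea can be sketched as a rule over column-level lineage. The Lineage struct, the pii_leaks function, and the sample columns below are hypothetical; they only illustrate the shape of such a check, not SDF's lineage model.

```rust
use std::collections::HashSet;

/// One output column, the source column it derives from,
/// and whether a hash was applied along the way.
struct Lineage {
    output: &'static str,
    source: &'static str,
    hashed: bool,
}

/// Returns output columns that expose a PII-tagged source without hashing.
fn pii_leaks(pii: &HashSet<&str>, lineage: &[Lineage]) -> Vec<&'static str> {
    lineage
        .iter()
        .filter(|l| pii.contains(l.source) && !l.hashed)
        .map(|l| l.output)
        .collect()
}

/// Hypothetical lineage for a mart that joins users onto orders.
fn sample_lineage() -> Vec<Lineage> {
    vec![
        Lineage { output: "customer_key", source: "email", hashed: true },
        Lineage { output: "contact", source: "email", hashed: false },
        Lineage { output: "revenue", source: "total_amount", hashed: false },
    ]
}

fn main() {
    let pii = HashSet::from(["email", "ssn"]);
    println!("leaks: {:?}", pii_leaks(&pii, &sample_lineage()));
}
```

The rule is stated once against tags and lineage, so it applies to every model that touches a tagged column.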

10. Developer Tooling

Rust-native modeling introduces compiler-level feedback.

  • dbt Fusion: red underlines, autocompletion, ref() suggestions
  • Quary: instant CLI, blazing fast feedback
  • SDF: structured error trees, JSON output, line-by-line tracing

This matches modern dev workflows: fail early, iterate fast, document as you go.

11. Integration Strategies

You don’t need to migrate all at once.

Hybrid Architecture:

           Airflow
              ↓
     dbt (Orchestration)
              ↓
   ┌──────────┼───────────┐
   ↓          ↓           ↓
 Quary    SDF checks  Python UDFs
   ↓          ↓           ↓
Outputs    Failures    Features

Let dbt handle the DAG. Let Quary/SDF handle the heavy nodes.

12. Migration Plan Template

  1. Inventory current models
  2. Convert 1 staging + 1 mart to Quary
  3. Reimplement checks in SDF
  4. Replace slow models
  5. Monitor performance
  6. Transition incrementally
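For the inventory step, a quick filter helps pick the first models to convert. Since Quary is described above as lacking full macro support and advanced Jinja, the sketch below flags a model as portable only when its sole templating is {{ ref(...) }}. The is_portable helper is a hypothetical heuristic, and it treats every other Jinja expression, including source(), as non-portable.

```rust
/// True when the only Jinja in the model is `{{ ref(...) }}`:
/// no `{% ... %}` blocks and no other `{{ ... }}` expressions.
fn is_portable(sql: &str) -> bool {
    if sql.contains("{%") {
        return false;
    }
    let mut rest = sql;
    while let Some(open) = rest.find("{{") {
        let after = &rest[open + 2..];
        // An unclosed expression is treated as non-portable.
        let Some(close) = after.find("}}") else { return false };
        if !after[..close].trim().starts_with("ref(") {
            return false;
        }
        rest = &after[close + 2..];
    }
    true
}

fn main() {
    let models = [
        ("stg_orders", "select id from raw.orders"),
        ("fct_sales", "select * from {{ ref('stg_orders') }}"),
        ("fct_margin", "select {{ cents_to_dollars('amount') }} from {{ ref('stg_orders') }}"),
    ];
    for (name, sql) in models {
        println!("{name}: portable = {}", is_portable(sql));
    }
}
```

Models that pass are candidates for step 2; the rest stay in dbt until their macros are rewritten.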

13. Limitations and Gaps

Feature           dbt Core   Quary      SDF
Docs generation   Yes        No         Planned
Packages/macros   Yes        Partial    No
IDE ecosystem     Mature     Emerging   Growing
CI/CD support     Yes        Basic      Yes

Don’t expect parity. Expect specialization.

14. Benchmarks

Project with 150 models:

dbt run:         78s
dbt compile:     55s
dbt Fusion:       2s
Quary compile:    3s
SDF check:        3.2s

Cold cache. Repeatable. Results normalized.
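The ratios behind these timings are simple to compute. The sketch below assumes dbt compile (55s) is the comparable baseline for the three Rust-based commands; that pairing is an assumption, since the figures above do not state which commands are equivalent.

```rust
/// Speedup of `candidate` relative to `baseline`, both in seconds.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

fn main() {
    let dbt_compile = 55.0;
    for (tool, secs) in [("dbt Fusion", 2.0), ("Quary compile", 3.0), ("SDF check", 3.2)] {
        println!("{tool}: {:.1}x faster than dbt compile", speedup(dbt_compile, secs));
    }
}
```

On these numbers the ratios come out to roughly 27.5x, 18.3x, and 17.2x.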

15. Final Recommendations

If you value...

  • Compatibility → dbt Fusion
  • Simplicity → Quary
  • Enforcement → SDF

Transition strategy:

  • Keep dbt as glue
  • Use Quary for iteration
  • Add SDF for governance
  • Monitor, document, then replace

Don’t jump. Evolve.

16. Appendices

Cargo.toml

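[dependencies]
sqlparser = "0.15"
datafusion = "20.0"
# Async runtime needed to drive the datafusion example's .await calls.
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }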

ASCII DAG

raw.orders.csv
    ↓
stg_orders
    ↓
fct_sales

🧭 Appendix: Visual Flow Diagram – Hybrid Migration Strategy

A clear migration strategy helps teams visualize adoption without fear.

Here's a simplified hybrid architecture where dbt is retained for orchestration while Rust-native tools are incrementally introduced:

              ┌────────────┐
              │  Airflow   │
              └─────┬──────┘
                    │
            ┌───────▼────────┐
            │      dbt       │
            │ (dag + macros) │
            └───────┬────────┘
                    │
      ┌─────────────┴──────────────┐
      │                            │
 ┌────▼─────┐              ┌───────▼───────┐
 │  Quary   │              │      SDF      │
 │(fast run)│              │(static checks)│
 └────┬─────┘              └───────┬───────┘
      │                            │
  ┌───▼────┐                  ┌────▼─────┐
  │ Parquet│                  │ Metadata │
  │ Outputs│                  │  Reports │
  └────────┘                  └──────────┘

Use Case: Start with Quary for faster compiles. Add SDF for sensitive PII/lineage rules. Keep dbt for its compatibility and DAG execution logic.

📚 Appendix: Rust Data Tooling Glossary

This mini-glossary clarifies terms/tools mentioned throughout the chapter for quick reference.

Tool/Concept    Description
Quary           A Rust-based CLI tool inspired by dbt, offering fast compiles and runs.
SDF             Semantic Data Fabric, a Rust-native SQL compiler with metadata enforcement.
sqlparser-rs    A Rust library for parsing SQL into ASTs, used in both Quary and SDF.
datafusion      An in-memory query engine in Rust, part of Apache Arrow, executes SQL logic.
Fusion Engine   dbt Labs' Rust-based engine for parsing, graph-building, and IDE feedback.
ref()           A function to define DAG dependencies across models.
DAG             Directed Acyclic Graph: core structure of dependencies in modeling layers.
AST             Abstract Syntax Tree: structured representation of parsed SQL.
PII             Personally Identifiable Information.

Epilogue: The New Shape of the Data Engineer

“Data modeling is no longer just about knowing SQL. It’s about engineering confidence.”

The modern data engineer doesn’t just write SELECTs. They define contracts. They enforce lineage. They embed semantic meaning into pipelines.

What Rust-based modeling tools represent is not simply a change in syntax or language — it’s a change in posture.

It moves the team from reaction to prevention, from execution to compilation, from runtime guessing to static certainty.

If the last decade was about democratizing analytics, the next one will be about fortifying it.

That journey begins by treating SQL not just as output, but as code worthy of compilers, rules, and guarantees.

And now — with SDF, Quary, and dbt Fusion — we have the tools to do just that.

Closing Note

This chapter is more than a migration guide. It’s a call to reimagine SQL modeling as software engineering.

With Rust, we inherit decades of compiler theory, type systems, and reliable tooling. And we bring that power to analytics.

Modeling isn’t scripting anymore.

It’s building systems that think.
