DBPort

Build locally. Publish safely.

Governance and orchestration for recomputable warehouse datasets.

You build models that produce datasets — and those datasets depend on each other. When external sources update, you need to recompute downstream models in the right order, knowing exactly which input versions went into each output. As the number of models grows, keeping track of dependencies, provenance, and data quality becomes harder than the modeling itself.

DBPort is the orchestration layer on top of your warehouse that builds governance into recomputable workflows. It tracks dependencies between your models and on external inputs, so you can build with confidence that future updates will be picked up correctly, and that other models can pick up your results.


See it in action

CLI

pip install dbport

# Initialize a project and configure the model
dbp init regional_trends --agency wifor --dataset emp__regional_trends
cd regional_trends
dbp config model wifor.emp__regional_trends schema sql/create_output.sql
dbp config model wifor.emp__regional_trends input estat.nama_10r_3empers

# Run the full lifecycle: load → execute → publish
dbp model run --version 2026-03-09 --timing

Python

from dbport import DBPort

with DBPort(agency="wifor", dataset_id="emp__regional_trends") as port:
    port.schema("sql/create_output.sql")
    port.load("estat.nama_10r_3empers", filters={"wstatus": "EMP"})
    port.execute("sql/transform.sql")
    port.publish(version="2026-03-09", params={"wstatus": "EMP"})

That is a complete lifecycle: inputs loaded from an Iceberg warehouse, SQL transforms executed in DuckDB, and a versioned output published back — with schema validation, metadata, and codelists attached automatically.

Why use DBPort

Model dependencies, tracked

Models produce datasets that feed other models. DBPort tracks these dependencies so you always know what depends on what — across your entire organisation.
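
To make "what depends on what" concrete, here is a minimal sketch of dependency-ordered recomputation using Python's standard `graphlib`. It is not DBPort's implementation: the `DEPENDENCIES` map, the `wifor.emp__summary` dataset, and both helper functions are hypothetical names introduced only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: dataset -> the datasets it reads from.
# "wifor.emp__summary" is invented for this example.
DEPENDENCIES = {
    "estat.nama_10r_3empers": set(),
    "wifor.emp__regional_trends": {"estat.nama_10r_3empers"},
    "wifor.emp__summary": {"wifor.emp__regional_trends"},
}


def recompute_order(dependencies):
    """Return all datasets ordered so every input precedes its consumers."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(changed, dependencies):
    """Return the datasets that depend on `changed`, directly or
    transitively, in the order they would need to be recomputed."""
    affected = []
    for name in recompute_order(dependencies):
        inputs = dependencies.get(name, set())
        if changed in inputs or any(dep in affected for dep in inputs):
            affected.append(name)
    return affected


order = recompute_order(DEPENDENCIES)
to_rebuild = downstream_of("estat.nama_10r_3empers", DEPENDENCIES)
```

When the external source changes, `to_rebuild` lists the two downstream datasets with the direct consumer first, which is the order a recompute would have to follow.
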
Full input provenance

Every publish records exactly which input versions and snapshots were used. Months later, you can trace any output back to the data that produced it.
Recompute when sources update

Snapshot-cached inputs detect when external sources change. Unchanged tables are skipped automatically — only what's new gets reprocessed.
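
The skip logic can be pictured as a comparison of snapshot identifiers. The sketch below assumes each input table exposes a snapshot ID that changes whenever its data changes; `plan_reload`, the `snap-…` values, and `example.other_input` are hypothetical and not taken from DBPort.

```python
def plan_reload(cached_snapshots, current_snapshots):
    """Split inputs into tables to reload and tables to skip.

    Both arguments map table name -> snapshot ID. A table is skipped
    only when its cached ID equals the current one.
    """
    reload, skip = [], []
    for table, snapshot_id in current_snapshots.items():
        if cached_snapshots.get(table) == snapshot_id:
            skip.append(table)
        else:
            reload.append(table)
    return reload, skip


# Hypothetical snapshot IDs recorded at the last load vs. seen now.
cached = {"estat.nama_10r_3empers": "snap-001", "example.other_input": "snap-007"}
current = {"estat.nama_10r_3empers": "snap-002", "example.other_input": "snap-007"}

to_reload, to_skip = plan_reload(cached, current)
```

A table that has never been cached has no entry in `cached_snapshots`, so it always lands in the reload list.
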
Schema drift, caught early

Declare the output shape upfront. Drift is caught before anything is written to the warehouse: no malformed data, no silent corruption.
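
As an illustration of checking before writing, the following sketch compares a declared schema with the columns a transform actually produced and refuses to write on any mismatch. The names `SchemaDriftError`, `find_drift`, and `write_if_valid` are hypothetical; DBPort's own validation may work differently.

```python
class SchemaDriftError(Exception):
    """Raised when the produced columns do not match the declared schema."""


def find_drift(declared, actual):
    """Compare two column -> type mappings and describe every difference."""
    problems = []
    for column, dtype in declared.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != dtype:
            problems.append(
                f"type mismatch on {column}: expected {dtype}, got {actual[column]}"
            )
    for column in actual:
        if column not in declared:
            problems.append(f"unexpected column: {column}")
    return problems


def write_if_valid(declared, actual, write):
    """Call `write()` only when no drift is found; otherwise raise first."""
    problems = find_drift(declared, actual)
    if problems:
        raise SchemaDriftError("; ".join(problems))
    write()


# Hypothetical schemas: the transform produced a wrong type and an extra column.
declared = {"geo": "VARCHAR", "year": "INTEGER", "value": "DOUBLE"}
produced = {"geo": "VARCHAR", "year": "VARCHAR", "value": "DOUBLE", "note": "VARCHAR"}
drift = find_drift(declared, produced)
```

Because the check runs before `write()` is called, a drifting output never reaches the warehouse in this sketch.
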
Versioned, resumable publishes

Every publish records version, parameters, and row count. Interrupted runs resume from checkpoint. Re-running a completed version is a safe no-op.
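
One way to picture resumable, idempotent publishing is a per-version record of which steps have finished. The sketch below keeps that record in a plain dictionary; the `publish` function, the step names, and the row count are hypothetical and stand in for whatever DBPort persists internally.

```python
def publish(state, version, steps, params=None, row_count=None):
    """Run `steps` for `version`, skipping work that already finished.

    `state` maps version -> record and would be persisted between runs.
    `steps` is a list of (name, callable) pairs.
    Returns the names of the steps executed by this call.
    """
    record = state.setdefault(
        version,
        {"params": params or {}, "row_count": row_count, "done": [], "completed": False},
    )
    if record["completed"]:
        return []  # re-running a completed version is a no-op
    executed = []
    for name, step in steps:
        if name in record["done"]:
            continue  # finished in an earlier, interrupted run
        step()  # if this raises, earlier steps stay recorded as done
        record["done"].append(name)
        executed.append(name)
    record["completed"] = True
    return executed


# Demo: the second step fails once, then succeeds on the retry.
failures = {"remaining": 1}


def write_data():
    pass


def write_metadata():
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("simulated interruption")


state = {}
steps = [("write_data", write_data), ("write_metadata", write_metadata)]
try:
    publish(state, "2026-03-09", steps, params={"wstatus": "EMP"}, row_count=100)
except RuntimeError:
    pass
resumed = publish(state, "2026-03-09", steps)
repeated = publish(state, "2026-03-09", steps)
```

In the demo, `resumed` contains only the step that had not finished, and `repeated` is empty because the version is already complete.
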
Committable state

dbport.lock is TOML, credential-free, and tracks schema, inputs, and versions — ready for code review and CI.

It fits with what you already use

DBPort doesn't deliver the models — it delivers the platform to keep track of dependencies between them. It is the governance layer that connects your tools.


DuckDB	The execution engine. DBPort adds governed inputs, output contracts, and publish semantics around it.
dbt	Complementary. dbt handles transformations in the middle; DBPort manages dataset lifecycle at the edges.
Airflow, Dagster, …	DBPort defines what a safe run means. Orchestrators decide when to trigger it.
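
For the orchestrator row, a task in Airflow, Dagster, or a plain cron job could be as small as a wrapper around the `dbp model run` command shown earlier. This is a sketch under assumptions: `build_run_command` and `run_model` are hypothetical helpers, and it assumes the CLI exits with a non-zero status when a run fails.

```python
import subprocess


def build_run_command(version, timing=True):
    """Build the `dbp model run` invocation for one version."""
    command = ["dbp", "model", "run", "--version", version]
    if timing:
        command.append("--timing")
    return command


def run_model(version, project_dir):
    """Run the model from its project directory.

    Assumes `dbp` is on PATH and signals failure through its exit code;
    `check=True` then raises CalledProcessError so the orchestrator
    marks the task as failed.
    """
    subprocess.run(build_run_command(version), cwd=project_dir, check=True)


command = build_run_command("2026-03-09")
```

The orchestrator only decides when `run_model` is called; what counts as a safe run stays inside the command it invokes.
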

Getting Started

Install DBPort, configure credentials, and run your first model.

Start here

Concepts

How inputs, schemas, metadata, versioning, and the lock file work together.

Read the concepts

CLI Reference

Full command reference for dbp init, dbp model, dbp config, and dbp status.

See all commands

Python API

Constructor options, methods, and lifecycle for the DBPort class.

See the API