# Lock File
dbport.lock is the committable state file that tracks schemas, ingest state, and version history for all models in a repository. It is always written by DBPort — never by hand — but it appears in diffs, merge conflicts, and debugging sessions, so understanding its structure matters.
## Location
The lock file lives at the repository root, next to pyproject.toml. When not provided explicitly, DBPort walks up from the working directory until it finds a pyproject.toml.
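DBPort's actual discovery code isn't shown in these docs, but the walk-up behavior can be sketched with the standard library alone (the function name `find_repo_root` is illustrative, not part of DBPort's API):

```python
from pathlib import Path

def find_repo_root(start: Path) -> Path:
    """Walk up from `start` until a directory containing pyproject.toml is found."""
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").is_file():
            return candidate
    raise FileNotFoundError(f"no pyproject.toml found above {start}")
```

The first match wins, so a nested `pyproject.toml` (e.g. in a vendored subproject) would shadow the repository root.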
## Properties
- TOML format — human-readable and diff-friendly
- No secrets — credentials are never written to disk
- Multi-model — each model gets a namespaced section under `[models."agency.dataset_id"]`
- Safe to commit — belongs in version control
## Annotated example
```toml
default_model = "wifor.emp__regional_trends" # (1)!

[models."wifor.emp__regional_trends"] # (2)!
last_fetched_at = "2026-03-15T08:00:00Z"

[models."wifor.emp__regional_trends".schema]
ddl = "CREATE OR REPLACE TABLE wifor.emp__regional_trends (nuts2024 VARCHAR, year SMALLINT, value DOUBLE)" # (3)!

[[models."wifor.emp__regional_trends".schema.columns]] # (4)!
column_name = "nuts2024"
column_pos = 0
codelist_id = "NUTS2024"
codelist_kind = "hierarchical"
attach_table = "wifor.cl_nuts2024"

[[models."wifor.emp__regional_trends".inputs]] # (5)!
table_address = "estat.nama_10r_3empers"
filters = {wstatus = "EMP"}
last_snapshot_id = 123456789
rows_loaded = 42000

[[models."wifor.emp__regional_trends".versions]] # (6)!
version = "2026-03-09"
published_at = "2026-03-09T14:30:00Z"
params = {wstatus = "EMP"}
rows = 1345678
completed = true
```
1. Which model the CLI uses when no `--model` flag is given.
2. Each model is namespaced by `agency.dataset_id`.
3. The exact DDL that DuckDB executes — must match the current `.sql` file.
4. Column metadata entries. Set via `port.columns.<name>.meta()` or `dbp config model … columns`.
5. One entry per loaded input table. `last_snapshot_id` drives the ingest cache.
6. Append-only version history. Each `publish()` adds a new entry.
## When does the lock file change?
Every mutating operation writes to disk immediately — there is no deferred flush.
| Command | What changes |
|---|---|
| `dbp init` | Model header created, set as default |
| `dbp model sync` | Model header created if new; `last_fetched_at` updated |
| `dbp config model … schema` | `[schema]` section written or replaced |
| `dbp config model … input` | `[[inputs]]` entry added or updated |
| `dbp config model … input --load` | `[[inputs]]` entry added + `last_snapshot_id` resolved |
| `dbp model load` | `[[inputs]]` entries updated with `last_snapshot_id` |
| `dbp model publish` | `[[versions]]` entry appended |
| `dbp model run` | All of the above (sync → load → exec → publish) |
| Operation | What changes |
|---|---|
| `DBPort()` init | Model header created if new model |
| `port.schema(ddl)` | `[schema]` section written or replaced |
| `port.columns.*.meta(...)` | Column entry in `[[schema.columns]]` updated |
| `port.columns.*.attach(...)` | `attach_table` field set on column entry |
| `port.load(table)` | `[[inputs]]` entry added or updated with `last_snapshot_id` |
| `port.publish(version)` | `[[versions]]` entry appended |
## Why commit the lock file
- Reproducibility — snapshot IDs pin exact input data, so a checkout reproduces the same pipeline state
- Code review — reviewers can see whether a schema changed, which inputs updated, and what was published
- CI integration — automated pipelines can validate schemas and detect drift without running the full pipeline
## Recognizing diffs in pull requests
Normal development produces predictable diffs. Here are the three most common patterns.
### Schema change — adding a column
```diff
 [models."wifor.emp__regional_trends".schema]
-ddl = "CREATE OR REPLACE TABLE wifor.emp__regional_trends (nuts2024 VARCHAR, year SMALLINT, value DOUBLE)"
+ddl = "CREATE OR REPLACE TABLE wifor.emp__regional_trends (nuts2024 VARCHAR, year SMALLINT, value DOUBLE, source VARCHAR)"
+[[models."wifor.emp__regional_trends".schema.columns]]
+column_name = "source"
+column_pos = 3
```
### Snapshot update — new data loaded
```diff
 [[models."wifor.emp__regional_trends".inputs]]
 table_address = "estat.nama_10r_3empers"
 filters = {wstatus = "EMP", nace_r2 = "TOTAL"}
-last_snapshot_id = 123456789
+last_snapshot_id = 234567890
```
### New version published
```diff
+[[models."wifor.emp__regional_trends".versions]]
+version = "2026-03-15"
+published_at = "2026-03-15T10:22:01Z"
+params = {wstatus = "EMP", nace_r2 = "TOTAL"}
+rows = 1345678
+completed = true
```
## Fields that matter for correctness
Most fields are informational, but three have behavioral consequences.
These fields drive runtime behavior
- `last_snapshot_id` — drives the ingest cache. A wrong value causes stale data or unnecessary reloads.
- `completed` on versions — drives publish idempotency. Flipping this causes skipped or duplicate publishes.
- `ddl` — must match what DuckDB actually has. Use the schema command to change it, not manual edits.
If any of these are wrong, the safest fix is to delete the lock file and re-run the full pipeline. The warehouse is always the source of truth.
## Merge conflicts
The lock file is structured to minimize conflicts:
Resolution by section
- `[[versions]]` — append-only. Accept both sides; each publish appends a new entry.
- `[schema]` — take the version that matches the current DDL in your SQL file. If both sides changed the DDL, resolve the DDL first, then re-run the schema command.
- `[[inputs]]` — take the entry with the higher `last_snapshot_id`. The next load will re-check against the warehouse regardless.
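The `[[inputs]]` rule is mechanical enough to script in a custom merge driver; a hypothetical helper (not shipped with DBPort) could be as small as:

```python
def pick_input_entry(ours: dict, theirs: dict) -> dict:
    """Resolve a conflicting [[inputs]] entry by keeping the higher last_snapshot_id."""
    if ours["last_snapshot_id"] >= theirs["last_snapshot_id"]:
        return ours
    return theirs
```

Even if the resolution picks a stale entry, the next load re-checks against the warehouse, so this choice only affects whether that load hits the ingest cache.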
## Stale or missing lock file
If the lock file is deleted or becomes stale, DBPort recovers gracefully on the next run.
```python
# 1. DBPort() re-creates a fresh model header on init
with DBPort(agency="wifor", dataset_id="emp__regional_trends") as port:
    # 2. Re-run schema to restore the schema section
    port.schema("sql/create_output.sql")
    # 3. Re-run load for each input — snapshots fetched from warehouse
    port.load("estat.nama_10r_3empers", filters={"wstatus": "EMP"})
    # 4. Publish as normal
    port.publish(version="2026-03-15", params={"wstatus": "EMP"})
```
Version history is lost locally, but the warehouse table properties retain the authoritative publish record.
## Regeneration
Running the full pipeline rebuilds all lock state from scratch. The warehouse is the source of truth.
```python
# schema() → load() → execute() → publish() regenerates everything
with DBPort(agency="wifor", dataset_id="emp__regional_trends") as port:
    port.schema("sql/create_output.sql")
    port.load("estat.nama_10r_3empers", filters={"wstatus": "EMP"})
    port.execute("sql/transform.sql")
    port.publish(version="2026-03-15", params={"wstatus": "EMP"})
```
## Multi-model support
Multiple models in the same repository each get their own [models."agency.dataset_id"] section. The default_model key at the top level determines which model is used when no explicit model is specified in the CLI.