Output Artifacts (Contract)¶
This page defines the on-disk artifact contract sdmxflow produces: file names, semantics, and how to use them in a warehouse ingestion workflow.
Artifact Contract (stable) Given an
out_dir,sdmxflowwrites:
dataset.csv(append-only facts; first columnlast_updated)metadata.json(metadata history + operational fields)codelists/(reference tables withcode,name)Optional:
logs/whensave_logs=True.
Directory layout¶
<out_dir>/
dataset.csv
metadata.json
codelists/
<CODELIST_ID>.csv
logs/ # only when save_logs=True
<agency>__<dataset>__<timestamp>.log
dataset.csv¶
Schema¶
dataset.csv is the provider dataset schema with one additional leading column:
last_updated(string, UTC ISO-8601, e.g.2026-03-03T00:00:00Z)
All remaining columns are provider-defined and depend on the dataset.
Important
sdmxflowdoes not deduplicate across versions. When a new upstream version is appended, it appends the full downloaded slice tagged with that version’slast_updated.
Example rows¶
Example (illustrative — provider columns vary):
last_updated,geo,sex,time_period,value
2026-03-01T00:00:00Z,DE,M,2024-12,123.4
2026-03-01T00:00:00Z,DE,F,2024-12,120.1
2026-02-01T00:00:00Z,DE,M,2024-11,122.9
Append behavior¶
- On the first successful fetch,
dataset.csvis created with a header. - On later fetches,
sdmxflowappends rows only when the upstreamlast_updatedtimestamp changed. - If the provider changes the CSV header (column names or ordering), append fails with a schema/metadata error instead of producing a corrupted mixed-schema history.
How to use in a warehouse¶
Common patterns:
- Truncate + reload staging: load the full append-only file each run, then model “latest” downstream.
- Incremental load: load only rows where
last_updatedis newer than what your warehouse already has.
See Integration Patterns.
metadata.json¶
metadata.json is the operational and lineage record.
Key semantics:
last_fetched_atupdates on every fetch attempt (even if no new data appended).versions[]is append-only and records one entry per appended upstream version.codelists[]maps dataset columns to codelist files.
Example structure (snippet)¶
The exact file contains more fields; this snippet matches the current schema shape:
{
"schema_version": 1,
"agency_id": "ESTAT",
"dataset_id": "lfsa_egai2d",
"created_at": "2026-03-03T01:02:03Z",
"last_fetched_at": "2026-03-03T02:00:00Z",
"last_updated_at": "2026-03-03T02:00:00Z",
"last_updated_data_at": "2026-03-01T00:00:00Z",
"files": {
"datasets": {"csv": "dataset.csv"},
"codelists": {
"sex": "codelists/CL_SEX.csv"
}
},
"versions": [
{
"agency_id": "ESTAT",
"created_at": "2026-03-03T02:00:00Z",
"last_updated_data_at": "2026-03-01T00:00:00Z",
"dataset": {"rows_appended": 12345, "last_updated_column": "last_updated"},
"http": {"url": "https://…", "status_code": 200, "headers": {"etag": "…"}}
}
],
"codelists": [
{
"column_name": "sex",
"column_pos": 2,
"codelist_id": "CL_SEX",
"codelist_path": "codelists/CL_SEX.csv",
"codelist_kind": "dimension",
"codelist_type": "reference",
"codelist_labels": {"en": "Sex"}
}
]
}
Note Timestamps are written as UTC ISO-8601
...Zstrings.
codelists/¶
codelists/ contains one CSV per used codelist, with a stable two-column schema:
code(string)name(best-effort label, typically English if available)
Example:
How to join to fact columns¶
- Identify the fact column (dimension/attribute) you want to decode.
- Load the codelist CSV matching that column (use
metadata.json→codelists[]mapping). - Join
facts.<column>=codelist.code.
See Integration Patterns for examples.
Logs (save_logs=True)¶
If you enabled per-run log capture, a log file is written under <out_dir>/logs/.
See Logging for the INFO-level logging contract.