Overview
sdmxflow is a Python SDMX ingestion library that produces deterministic, append-only artifacts for data warehouse ETL/ELT workflows: a facts CSV, a metadata history trail, and exported codelists.
It is built for data engineers and analytics engineers who need scheduled refresh semantics, reproducible artifacts, and reference data exports — not ad-hoc interactive SDMX exploration.
Status: Early but functional. The artifact contract is stable and designed to remain consistent as additional providers are added.
Provider support (today): Eurostat (source_id="ESTAT"). See Provider Support for details and differences.
What you get (artifact contract)
Given an out_dir, sdmxflow writes:
- dataset.csv: append-only facts across upstream versions, tagged by a leading last_updated column
- metadata.json: operational metadata plus an append-only version history
- codelists/: reference CSVs (code, name) used to interpret coded dataset columns
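As a rough illustration of how a downstream consumer might read these artifacts, the sketch below uses only the standard library. It relies on two details stated above: last_updated is a column of dataset.csv, and codelist CSVs have code and name columns. The helper names rows_per_version and load_codelist are hypothetical and not part of sdmxflow; the layout of metadata.json is not described here, so it is left out.

```python
import csv
from collections import Counter
from pathlib import Path


def rows_per_version(dataset_csv: Path) -> Counter:
    """Count fact rows per upstream version, keyed by the last_updated value."""
    with dataset_csv.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        return Counter(row["last_updated"] for row in reader)


def load_codelist(codelist_csv: Path) -> dict[str, str]:
    """Build a code -> name lookup from one exported codelist CSV."""
    with codelist_csv.open(newline="", encoding="utf-8") as fh:
        return {row["code"]: row["name"] for row in csv.DictReader(fh)}
```

Because dataset.csv is append-only, the counts for earlier last_updated values should stay the same from one refresh to the next, which makes this a cheap sanity check before loading.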
Example output tree:
<out_dir>/
  dataset.csv
  metadata.json
  codelists/
    <CODELIST_ID>.csv
  logs/                      # only when save_logs=True
    <agency>__<dataset>__<timestamp>.log
Quickstart (minimal)
from pathlib import Path

from sdmxflow import SdmxDataset

ds = SdmxDataset(
    out_dir=Path("./out/lfsa_egai2d"),
    source_id="ESTAT",
    dataset_id="lfsa_egai2d",
    save_logs=True,  # writes <out_dir>/logs/<agency>__<dataset>__<timestamp>.log
)
result = ds.fetch()
print("appended:", result.appended)
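In a scheduled job, one possible pattern is to trigger the warehouse load only when the fetch appended something. The sketch below is an assumption-laden illustration: it uses only the result.appended attribute shown in the quickstart and assumes a falsy value (for example 0 or False) means no new version was appended, which this page does not confirm. refresh_then_load and the load callback are hypothetical names, not part of sdmxflow.

```python
from typing import Any, Callable


def refresh_then_load(fetch: Callable[[], Any], load: Callable[[], None]) -> bool:
    """Run fetch(), then call load() only if the result reports appended rows.

    Returns True when load() was called, False otherwise.
    """
    result = fetch()
    # Assumption: result.appended is falsy when nothing new was appended.
    if result.appended:
        load()
        return True
    return False
```

With the quickstart object this might be called as refresh_then_load(ds.fetch, run_warehouse_load), where run_warehouse_load is whatever dbt, COPY, or load step the pipeline uses.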
How refresh works (workflow)
This diagram is intentionally identical to the README and describes the production workflow fetch() implements.
graph TD
A["Scheduled job<br/>cron / Airflow / Prefect"] --> B["Fetch upstream last-updated<br/>SDMX annotations (Eurostat)"]
B --> C{"Local metadata.json<br/>exists?"}
C -- Yes --> D{"Upstream<br/>changed?"}
C -- No --> E0
D -- No --> G["No new version<br/>Keep dataset.csv<br/>Ensure metadata + codelists"]
D -- Yes --> E1
subgraph DL[" "]
direction LR
E1["Download new slice"]
E0["Download initial slice"]
end
style DL fill:transparent,stroke:transparent
E0 --> F["Append rows to dataset.csv<br/>append-only, adds last_updated column"]
E1 --> F
F --> H["Update metadata history<br/>Export codelists"]
G --> I["Warehouse ingestion step<br/>dbt / COPY / load job"]
H --> I
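The branching in the diagram can be restated as a small decision function. This is a minimal sketch of the flow as drawn, not sdmxflow's actual implementation: the names Action and decide are hypothetical, a missing metadata.json is modelled as None, and comparing the two last-updated values by simple inequality is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    INITIAL_DOWNLOAD = "download initial slice"
    DOWNLOAD_NEW = "download new slice"
    NO_CHANGE = "keep dataset.csv, ensure metadata + codelists"


def decide(local_last_updated: Optional[str], upstream_last_updated: str) -> Action:
    """Pick the workflow branch from the local and upstream last-updated values."""
    if local_last_updated is None:
        # No local metadata.json yet: first run.
        return Action.INITIAL_DOWNLOAD
    if upstream_last_updated != local_last_updated:
        # Upstream changed: fetch and append a new slice.
        return Action.DOWNLOAD_NEW
    # Nothing new upstream: dataset.csv is left untouched.
    return Action.NO_CHANGE
```

Both download branches lead to the same append and metadata update steps, so a caller only needs to distinguish NO_CHANGE from the other two.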
Docs map (where to go next)
- New here? Start with Getting Started.
- Need file semantics? Read Output Artifacts (Contract).
- Deploying on a schedule? See Scheduling & Deployment.
- Loading into a warehouse? See Integration Patterns.
- Looking for parameters and defaults? See Configuration Reference.
- Provider behavior and roadmap: Provider Support.
- Operational issues: FAQ & Troubleshooting.
- Release history: Changelog (or https://github.com/knifflig/sdmxflow/releases)