FAQ & Troubleshooting¶
This page lists common operational issues when running sdmxflow in scheduled jobs and how to debug them.
No new data appended¶
Symptoms:
- `FetchResult.appended` is `False` repeatedly.
- Logs say "Already up to date; skipping download."
Likely causes:
- The upstream dataset has not changed.
- The provider’s “last updated” signal did not change even though the data changed.
What to do:
- Enable `DEBUG` logging and re-run.
- Inspect `metadata.json` → `last_updated_data_at` and `versions[]`.
- Compare to the provider's UI/metadata if available.
See Logging and Provider Support.
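As a minimal sketch of the inspection step, assuming a `metadata.json` layout containing the `last_updated_data_at` and `versions[]` fields named above (the exact schema is sdmxflow's, not shown here), you can pull out the fields most relevant to "no new data" debugging:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def summarize_metadata(out_dir: Path) -> dict:
    """Return the metadata fields most useful when nothing is being appended."""
    meta = json.loads((out_dir / "metadata.json").read_text())
    return {
        "last_updated_data_at": meta.get("last_updated_data_at"),
        "n_versions": len(meta.get("versions", [])),
    }

# Illustrative metadata.json; the real file is written by fetch().
with TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    (out_dir / "metadata.json").write_text(json.dumps({
        "last_updated_data_at": "2024-05-01T06:00:00Z",
        "versions": [{"fetched_at": "2024-05-02T03:00:00Z"}],
    }))
    print(summarize_metadata(out_dir))
    # → {'last_updated_data_at': '2024-05-01T06:00:00Z', 'n_versions': 1}
```

If `last_updated_data_at` matches the provider's UI, the skip is correct and there is genuinely nothing new upstream.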
Output folder already exists¶
That’s expected.
- `fetch()` will reuse the same `out_dir`.
- `dataset.csv` is appended only when the upstream version changes.
- `metadata.json` is updated on every run (`last_fetched_at` always bumps).
Large dataset performance tips¶
- Prefer running on a machine with fast local disk for `out_dir`.
- Treat `dataset.csv` as a warehouse staging input; avoid repeatedly reading it end-to-end in downstream steps if you can load incrementally.
- Consider splitting "download" and "warehouse load" into separate steps so warehouse loads can be retried without re-downloading.
Network failures and retries¶
sdmxflow classifies common failures into typed exceptions (timeout/unreachable/interrupted), but it does not implement a global retry policy at the top-level API.
Operational pattern:
- Implement retries in your scheduler (Airflow retries / Prefect retries / Kubernetes Job backoff).
- Use `save_logs=True` to capture per-run debug logs for postmortems.
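If your scheduler does not provide retries, a small backoff wrapper around the fetch call serves the same purpose. This is a generic sketch: `TransientError` and `flaky_fetch` stand in for sdmxflow's typed network exceptions and your real fetch call, and the delays are tuned short for demonstration:

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable failure (timeout/unreachable/interrupted)."""

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts; surface the failure to the scheduler
            time.sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return "ok"

print(with_retries(flaky_fetch))  # → ok (succeeds on the third attempt)
```

Prefer scheduler-level retries when available: they also cover crashes of the whole process, which an in-process wrapper cannot.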
How to reset/rebuild artifacts safely¶
Sometimes the safest response to upstream schema changes or local corruption is to rebuild from scratch.
Recommended approach:
- Move the existing folder aside.
- Run `fetch()` again into a clean `out_dir`.
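The "move aside" step can be as simple as renaming the folder with a timestamp suffix, which keeps the old artifacts available for inspection. A sketch (the dataset folder name is hypothetical; `move_aside` is not part of sdmxflow):

```python
import time
from pathlib import Path
from tempfile import TemporaryDirectory

def move_aside(out_dir: Path) -> Path:
    """Rename out_dir to out_dir.bak.<timestamp> so fetch() can rebuild cleanly."""
    backup = out_dir.with_name(
        f"{out_dir.name}.bak.{time.strftime('%Y%m%dT%H%M%S')}"
    )
    out_dir.rename(backup)
    return backup

with TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "estat_nama_10_gdp"   # hypothetical dataset folder
    out_dir.mkdir()
    backup = move_aside(out_dir)
    print(backup.name.startswith("estat_nama_10_gdp.bak."))  # → True
```

After the rename, the next `fetch()` into the original path starts from a clean slate.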
Warning: Rebuilding means you lose the local append history in `dataset.csv` and `metadata.json` for that dataset folder. Keep backups if you need auditability.
Common errors¶
Unsupported provider¶
Error:
SdmxDownloadError: Unsupported source_id=... Only 'ESTAT' is implemented.
Fix:
- Use `source_id="ESTAT"` for now.
- See Provider Support for roadmap and contribution path.
CSV schema mismatch¶
Error (paraphrased):
- “CSV schema mismatch: source columns differ from destination columns”
Meaning:
- The provider CSV header changed compared to what you previously stored in `dataset.csv`.
Fix:
- Rebuild into a new `out_dir` (see "reset/rebuild" above), or pin the dataset/key/params so the schema is stable.
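sdmxflow raises this check itself on append; if you want the same guard as a pre-flight step in your own pipeline (for example, before a warehouse load), comparing headers is enough. A minimal sketch with illustrative file contents:

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory

def header_of(csv_path: Path) -> list[str]:
    """Read just the first row (the column header) of a CSV file."""
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))

def schema_matches(src: Path, dst: Path) -> bool:
    """True when the incoming file's columns equal the stored ones."""
    return header_of(src) == header_of(dst)

with TemporaryDirectory() as tmp:
    stored = Path(tmp) / "dataset.csv"
    incoming = Path(tmp) / "incoming.csv"
    stored.write_text("freq,geo,value\n")
    incoming.write_text("freq,geo,obs_value\n")  # provider renamed a column
    print(schema_matches(incoming, stored))  # → False
```

A `False` here tells you up front that appending would fail, so you can branch into the rebuild path before touching the stored artifacts.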