Automating Data Export from SNCF Datasets Using Clojure and Calva
==================================================================

Overview
--------

In this guide, you'll learn how to automate data exports from the SNCF Open Data API using Clojure and Calva. You'll work with the ``/exports/{format}`` endpoint to download datasets in formats such as CSV, Parquet, and GPX, and save them locally for further analysis or GIS use.

What You’ll Learn:

- How to call the ``/exports/{format}`` endpoint
- How to use export-specific query parameters (``compressed``, ``epsg``, ``with_bom``, etc.)
- How to save exported files using Clojure I/O tools
- When to use each export format for data and GIS applications

Prerequisites
-------------

Make sure you have the following available:

- Visual Studio Code with the Calva extension
- Java Development Kit (JDK) version 11 or later
- Leiningen or the Clojure CLI
- Internet access for fetching remote data

Add Dependencies
----------------

Add these libraries to your project:

**Leiningen** (``project.clj``)

.. code-block:: clojure

   :dependencies [[org.clojure/clojure "1.11.1"]
                  [clj-http "3.12.3"]]

**Clojure CLI** (``deps.edn``)

.. code-block:: clojure

   {:deps {org.clojure/clojure {:mvn/version "1.11.1"}
           clj-http            {:mvn/version "3.12.3"}}}

Exporting Data via API
----------------------

To export a dataset, send a request of the following form:

.. code-block:: text

   GET /datasets/{dataset_id}/exports/{format}

**Example: Export CSV with compression**

Endpoint:

.. code-block:: text

   https://ressources.data.sncf.com/api/explore/v2.1/catalog/datasets/sncf-gares-et-arrets/exports/csv

With query parameters:

- ``compressed=true``
- ``with_bom=true``
- ``limit=1000``

Write the client function in ``core.clj``:

.. code-block:: clojure

   (ns exporter.core
     (:require [clj-http.client :as client]
               [clojure.java.io :as io]))

   (def export-url
     "https://ressources.data.sncf.com/api/explore/v2.1/catalog/datasets/sncf-gares-et-arrets/exports/csv")

   (defn download-csv []
     ;; :as :byte-array keeps the body as raw bytes, which is what we
     ;; want for a gzip-compressed file.
     (let [response (client/get export-url
                                {:query-params {"compressed" "true"
                                                "with_bom"   "true"
                                                "limit"      "1000"}
                                 :as :byte-array})]
       ;; with-open closes the stream even if the write fails.
       (with-open [out (io/output-stream "stations.csv.gz")]
         (.write out (:body response)))))

   (defn -main []
     (download-csv))

Running the Exporter
--------------------

1. Open the Command Palette (``Ctrl+Shift+P``) and run "Calva: Start a Project REPL and Connect (aka Jack-In)"
2. Load ``core.clj`` into the REPL with "Calva: Load/Evaluate Current File and its Requires/Dependencies"
3. Evaluate ``(-main)`` or call ``(download-csv)`` from the REPL
4. A file named ``stations.csv.gz`` is written to the REPL's working directory, which is normally the project root

Exporting in Parquet or GPX
---------------------------

Change the format segment of the URL to ``parquet`` or ``gpx``, and adjust the query parameters to match:

**Parquet Example**

.. code-block:: clojure

   (def parquet-url
     "https://ressources.data.sncf.com/api/explore/v2.1/catalog/datasets/sncf-gares-et-arrets/exports/parquet")

   (defn download-parquet []
     (let [response (client/get parquet-url
                                {:query-params {"parquet_compression" "snappy"}
                                 :as :byte-array})]
       (with-open [out (io/output-stream "stations.parquet")]
         (.write out (:body response)))))

**GPX Example**

.. code-block:: clojure

   (def gpx-url
     "https://ressources.data.sncf.com/api/explore/v2.1/catalog/datasets/sncf-gares-et-arrets/exports/gpx")

   (defn download-gpx []
     ;; The field names below are placeholders; replace them with
     ;; fields that actually exist in the dataset you export.
     (let [response (client/get gpx-url
                                {:query-params {"name_field"             "name"
                                                "description_field_list" "city,population"
                                                "use_extension"          "true"}
                                 :as :byte-array})]
       (with-open [out (io/output-stream "stations.gpx")]
         (.write out (:body response)))))

Choosing the Right Format
-------------------------

- **CSV**: Best for spreadsheets, lightweight analytics, and data pipelines

  - Use ``with_bom=true`` for Excel compatibility
  - Compress large exports with ``compressed=true``

- **Parquet**: Ideal for big data workflows (e.g., Spark, Hive)

  - Use ``parquet_compression=snappy`` or ``gzip`` for space efficiency
  - Retains data types and schema

- **GPX**: Suitable for geographic apps and GPS devices

  - Set ``name_field`` and ``description_field_list`` for metadata
  - Use ``epsg`` to match coordinate systems

Error Handling Tips
-------------------

The API may return errors such as:

- ``400 Bad Request``: Malformed query or unsupported parameters
- ``429 Too Many Requests``: Rate limit hit; retry later
- ``500 Internal Server Error``: Server-side issue

By default, clj-http throws an exception for 4xx and 5xx responses, so the download functions above fail before anything is written to disk. Either catch that exception with ``try``/``catch``, or pass ``:throw-exceptions false`` in the request options and check ``(:status response)`` before writing the file. In both cases, log the failure rather than letting it pass silently.

Conclusion
----------

You now know how to automate dataset exports from SNCF's Open Data API using Clojure and Calva. With a few lines of Clojure, you can fetch a dataset in the format that suits the job (CSV for pipelines, Parquet for analytics, GPX for GIS tools) and feed it into your own applications.

Next Steps:

- Add CLI arguments or config files for dynamic control
- Schedule exports with ``cron`` or a Clojure task runner
- Convert this into a reusable library or script
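
As a possible starting point for the first and last of these steps, the three download functions in this guide could be folded into one parameterized function that also checks the HTTP status before writing anything. This is only a sketch: the namespace ``exporter.generic`` and the names ``build-export-url`` and ``download-export!`` are hypothetical and not part of any library. It assumes the same clj-http dependency as the rest of the guide.

.. code-block:: clojure

   (ns exporter.generic
     (:require [clj-http.client :as client]
               [clojure.java.io :as io]))

   ;; Base URL copied from the examples in this guide.
   (def base-url
     "https://ressources.data.sncf.com/api/explore/v2.1/catalog/datasets")

   (defn build-export-url
     "Builds the export URL for a dataset id and a format such as \"csv\"."
     [dataset-id fmt]
     (str base-url "/" dataset-id "/exports/" fmt))

   (defn download-export!
     "Downloads one export to out-file and returns the HTTP status.
     The file is only written when the status is 200."
     [dataset-id fmt params out-file]
     (let [response (client/get (build-export-url dataset-id fmt)
                                {:query-params     params
                                 :as               :byte-array
                                 ;; Return 4xx/5xx responses instead of
                                 ;; throwing, so the status can be checked.
                                 :throw-exceptions false})
           status   (:status response)]
       (if (= 200 status)
         (with-open [out (io/output-stream out-file)]
           (.write out (:body response)))
         ;; Report failures on stderr and leave the disk untouched.
         (binding [*out* *err*]
           (println "Export failed with HTTP status" status)))
       status))

A call such as ``(download-export! "sncf-gares-et-arrets" "csv" {"limit" "1000"} "stations.csv")`` would then stand in for ``download-csv``. Returning the status lets a caller decide whether to retry, for example after a ``429``.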