
ADR 0001 — CIRPASS reference-structure JSON Schema is derived from the hub's tree-view export

Status: Accepted
Date: 2026-05-08
Deciders: dppvalidator maintainers (drafted during CIRPASS-2 migration planning)
Migration-plan label: D-0.1
Related phases: Phase 0 (snapshot), Phase 3 (CIRPASS models)

Context

The dppvalidator pipeline opens with a JSON Schema validation pass (validators/schema.py). For UNTP DPP, the schema is downloaded from UN/CEFACT, vendored under src/dppvalidator/schemas/data/, and pinned by SHA-256 in MANIFEST.json. Migrating to CIRPASS-2 raises the question: what JSON Schema do we use for the CIRPASS DPP reference structure v1.3.0?
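The SHA-256 pin can be pictured with a minimal sketch. It assumes that MANIFEST.json maps each vendored file name to a hex digest; that layout and the helper names sha256_hex and verify_pinned are illustrative assumptions, not the project's actual manifest format or API.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_pinned(data_dir: Path, manifest_name: str = "MANIFEST.json") -> list[str]:
    """Return the names of vendored files whose digest differs from the pin.

    Assumed (hypothetical) manifest layout: {"<file name>": "<sha256 hex>"}.
    """
    manifest = json.loads((data_dir / manifest_name).read_text(encoding="utf-8"))
    return [
        name
        for name, expected in sorted(manifest.items())
        if sha256_hex(data_dir / name) != expected
    ]
```

An empty return value means every vendored schema still matches its pin; any name in the list points at a file that changed without the manifest being updated.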

A static-HTML scan of https://dpp.vocabulary-hub.eu/specifications (2026-05-08) shows the hub publishes:

  • ~80 ontology versions (OntologyVersion_<uuid> GUIDs, exported as TTL).
  • ~2 JSON Schema versions (JsonSchemaVersion_<uuid> / JsonSchemaSpecVersion_<uuid> GUIDs). Both belong to the Battery Pass project.
  • An unspecified number of "message" versions, surfaced in the UI as tree views. The Tree view and Export schema controls produce tree-shaped JSON, not a JSON Schema.

The CIRPASS DPP reference structure v1.3.0 is a message, not a JSON Schema. The hub does not publish a JSON Schema for it.

We need a JSON Schema to:

  1. Run the existing schema-first validation pass without per-family special-casing.
  2. Drive Pydantic model generation in Phase 3 (or at least cross-check hand-written models against an authoritative shape).
  3. Pin integrity (SHA-256) so we detect upstream tree-view drift.

Decision

Derive the CIRPASS reference-structure JSON Schema from the hub's tree-view export, programmatically, in CI-checkable code.

Specifically:

  • A generator script lives at tools/codegen/cirpass/derive_schema.py, not under src/. Generated bytes are committed to src/dppvalidator/schemas/data/cirpass-reference-1.3.0.json.
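  • The committed schema carries a generated-from: <tree-view-path>@<sha> banner so a future reader can re-derive it. JSON has no comment syntax, so the banner cannot be a literal # line; it has to be carried in a schema keyword such as $comment.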
  • A drift gate at tools/codegen/check_drift.py re-runs the generator on every CI build and checks the result with git diff --exit-code. Any drift fails the build (mitigates R14 in the migration plan).
  • The schema is registered in SCHEMA_REGISTRY as (SchemaFamily.CIRPASS, "1.3.0") per Phase 2 task 2.5.
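The derivation step could look roughly like the sketch below. The hub's tree-view export format is not described in this ADR, so the node shape used here (name, type, required, children) is purely hypothetical, as are the function names. The sketch also assumes that <sha> is the SHA-256 of the export bytes, that the schema targets JSON Schema draft 2020-12, and that the banner is stored in $comment because JSON has no comment syntax.

```python
import hashlib
import json


def node_to_schema(node: dict) -> dict:
    """Convert one hypothetical tree-view node into a JSON Schema fragment."""
    children = node.get("children", [])
    if not children:
        # Leaf: assumed to carry a primitive type name; default to string.
        return {"type": node.get("type", "string")}
    schema = {
        "type": "object",
        "properties": {c["name"]: node_to_schema(c) for c in children},
    }
    required = sorted(c["name"] for c in children if c.get("required"))
    if required:
        schema["required"] = required
    return schema


def derive_schema(tree_bytes: bytes, tree_path: str) -> bytes:
    """Derive deterministic schema bytes from a tree-view export."""
    digest = hashlib.sha256(tree_bytes).hexdigest()
    schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        # Provenance banner, kept in $comment so the file stays valid JSON.
        "$comment": f"generated-from: {tree_path}@{digest}",
    }
    schema.update(node_to_schema(json.loads(tree_bytes)))
    # Sorted keys and a fixed layout keep the output byte-stable for the drift gate.
    text = json.dumps(schema, indent=2, sort_keys=True, ensure_ascii=False)
    return (text + "\n").encode("utf-8")
```

Byte-stable output matters more than the exact mapping rules: the drift gate compares bytes, so any nondeterminism in key order or whitespace would show up as false drift.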

Consequences

Positive

  • Schema-first validation works uniformly across UNTP and CIRPASS families. No per-family branch in validators/schema.py.
  • Drift detection is automatic; we cannot ship a stale schema by accident.
  • Phase 3 has an authoritative shape to validate Pydantic models against (codegen reciprocity).
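The "no per-family branch" point can be illustrated with a small sketch of a registry lookup. SCHEMA_REGISTRY and SchemaFamily.CIRPASS are named in this ADR, but the value type, the enum values, and the resolve_schema helper below are assumptions made for illustration only.

```python
from enum import Enum


class SchemaFamily(Enum):
    # Enum values are placeholders; only the member names come from the ADR context.
    UNTP = "untp"
    CIRPASS = "cirpass"


# Hypothetical registry shape: (family, version) -> vendored schema file name.
SCHEMA_REGISTRY: dict[tuple[SchemaFamily, str], str] = {
    (SchemaFamily.CIRPASS, "1.3.0"): "cirpass-reference-1.3.0.json",
}


def resolve_schema(family: SchemaFamily, version: str) -> str:
    """Look up the schema file for a family/version pair without branching on family."""
    try:
        return SCHEMA_REGISTRY[(family, version)]
    except KeyError:
        raise LookupError(f"no schema registered for {family.name} {version}") from None
```

UNTP entries would be registered under the same key shape, so the validator only ever performs one dictionary lookup regardless of family.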

Negative

  • The derived schema may diverge from the hub's intent if the derivation logic mishandles a tree-view construct. Tests must include round-trip and example-instance checks against the official examples on the hub.
  • The generator becomes a maintenance surface that must be kept in sync with the tree-view export format if the hub changes it. Drift is detected but resolution still costs engineer-time.
  • We are, in effect, authoring spec-derived artefacts ourselves. If CIRPASS-2 later publishes a canonical JSON Schema, our derived schema may diverge from it and we must transition. This is acceptable because the transition path is obvious: replace derivation with vendoring, drop the generator, keep the registry entry.

Alternatives considered

  • Hand-author a JSON Schema. Rejected: brittle; no audit trail back to the spec; high maintenance cost on every CIRPASS minor.
  • Skip JSON Schema entirely and rely solely on Pydantic + SHACL. Rejected: weakens schema-first validation; complicates the engine pipeline (per-family branching); makes the dppvalidator validate command behave inconsistently across families (some payloads pass a schema check, others silently skip it).
  • Vendor the tree-view export as the schema artefact. Rejected: the tree-view is not JSON Schema, so JSON Schema validation against it would fail.

Validation hooks

  • tools/codegen/check_drift.py — CI gate, runs on every build.
  • tests/integration/test_cirpass_v1_3_pipeline.py — full pipeline on golden fixtures (Phase 4 deliverable).
  • tests/unit/test_models_cirpass_v1_3.py — per-class invariants (Phase 3 deliverable).
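The core of the drift gate can be sketched as a byte comparison between the committed schema and a fresh generator run. The ADR describes the gate as re-running the generator and checking git diff --exit-code; the sketch below swaps that for an in-memory comparison that returns the same kind of exit status. The function name and signature are hypothetical.

```python
from pathlib import Path
from typing import Callable


def check_drift(committed: Path, regenerate: Callable[[], bytes]) -> int:
    """Return 0 if the committed file matches freshly generated bytes, else 1."""
    fresh = regenerate()
    # A missing committed file counts as drift rather than raising.
    current = committed.read_bytes() if committed.exists() else b""
    if fresh == current:
        return 0
    print(f"drift detected: {committed} is stale; re-run the generator and commit")
    return 1
```

A CI entry point could pass the result to sys.exit so that a non-zero status fails the build, mirroring what git diff --exit-code does.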

References