Specification: Deterministic, Modular FHIR (grip) Resource ID Generation #8

@bwalsh

Status: Draft
Scope: All FHIR R4/R5 resources created by our pipelines (e.g., Forge, git-drs adjacencies).
Design goals: determinism, cross-tool parity, low coupling, high modularity, and predictable reference rewriting when namespacing (“reseed”) is required.


1) Rationale & Architectural Principles

Avoid “God Objects” (modularity)

  • Problem: ID generation logic embedded inside resource builders forces every component that creates resources to “know” how IDs are minted.
  • Remedy: Centralize ID creation in a single, lightweight shared library/service with a narrow API. Resource builders call it; they do not implement ID logic themselves.

Avoid Tight Coupling

  • Problem: If ID logic changes in one tool, downstream tools break or (worse) silently diverge.
  • Remedy: All tools consume the same deterministic algorithm via the shared library/CLI/REST. No tool depends on another tool’s internal structures.

Determinism & Composability

  • IDs must be stable and reproducible for the same inputs across languages (Go, Python, JS, etc.).
  • Support contextual namespacing (e.g., per project, tenant, environment) without changing the algorithm—only inputs.

2) Prior Art (Informative)


3) Specification

3.1 Inputs (required to compute a FHIR Resource.id)

For any FHIR resource:

| Name | Type | Description |
|---|---|---|
| resource_type | string | FHIR type (e.g., Patient, Specimen, Observation, DocumentReference) |
| project_id | string | Stable project/tenant identifier |
| system_uri | string | System URI for the business identifier (e.g., urn:uuid:…, http://example.org/patient-id-system) |
| identifier_value | string | Business identifier value (project-scoped or global) |

Notes

  • system_uri MUST be canonical and stable for the business identifier domain.
  • If a resource lacks a natural business identifier, define a canonical surrogate (e.g., a normalized path or key) at the modeling layer—not inside the ID library.

3.2 Canonical Name String (language-agnostic)

Construct a canonical name (UTF-8) with strict normalization:

canonical = "{project_id}/{resource_type}/{system_uri}|{identifier_value}"

Normalization rules

  • project_id: lowercased; strip surrounding whitespace.
  • resource_type: exact FHIR type case (e.g., DocumentReference).
  • system_uri: lowercase scheme/host; keep path/query as-is; no trailing "#" or "/".
  • identifier_value: exact string as issued; trim surrounding whitespace only.
  • No leading/trailing spaces around separators; use exactly one / and one | as shown.
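
The normalization rules above could be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes the system_uri rule means "strip any trailing # or / characters".

```python
import re

# scheme ":" optional "//authority" then everything else (path, query, fragment)
_URI_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(//[^/?#]*)?(.*)$", re.S)


def normalize_system_uri(system_uri: str) -> str:
    """Lowercase scheme and host, keep path/query as-is, drop trailing '#' or '/'."""
    uri = system_uri.strip()
    match = _URI_RE.match(uri)
    if match:
        scheme, authority, rest = match.groups()
        uri = scheme.lower() + ":" + (authority or "").lower() + rest
    # Assumption: any run of trailing '#' and '/' characters is removed.
    return uri.rstrip("#/")


def canonical_name(resource_type: str, project_id: str,
                   system_uri: str, identifier_value: str) -> str:
    """Build "{project_id}/{resource_type}/{system_uri}|{identifier_value}"."""
    return (
        f"{project_id.strip().lower()}/"
        f"{resource_type}/"                      # exact FHIR type case, untouched
        f"{normalize_system_uri(system_uri)}|"
        f"{identifier_value.strip()}"            # trim surrounding whitespace only
    )
```

For example, canonical_name("Patient", " MyProject ", "HTTP://Example.ORG/patient-id-system/", " P-001 ") yields "myproject/Patient/http://example.org/patient-id-system|P-001".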

3.3 Namespace UUID (aka authority)

  • Choose a deployment-wide constant FHIR_ID_NAMESPACE_UUID (UUID).
  • Do not change once published for an environment/tenant.
  • Example sources:
    • Derived: UUIDv5(NAMESPACE_DNS, "calypr.org"). The namespace MUST be configurable; we have already changed namespaces once.
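
One way to keep the namespace configurable is to resolve it from deployment configuration and fall back to the derived value. The sketch below is illustrative only: the environment variable names FHIR_ID_NAMESPACE_UUID and FHIR_ID_AUTHORITY and the function name are assumptions, not existing settings.

```python
import os
import uuid
from typing import Mapping

DEFAULT_AUTHORITY = "calypr.org"


def load_namespace_uuid(env: Mapping[str, str] = os.environ) -> uuid.UUID:
    """Resolve the deployment-wide namespace UUID from configuration."""
    explicit = env.get("FHIR_ID_NAMESPACE_UUID")   # hypothetical variable name
    if explicit:
        return uuid.UUID(explicit)
    # Hypothetical variable: override the DNS name used to derive the namespace.
    authority = env.get("FHIR_ID_AUTHORITY", DEFAULT_AUTHORITY)
    return uuid.uuid5(uuid.NAMESPACE_DNS, authority)
```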

3.4 ID Algorithm (deterministic, REQUIRED)

id = UUIDv5(FHIR_ID_NAMESPACE_UUID, canonical)
  • Output is an RFC 4122 UUID string (lowercase hex, hyphenated).
  • Deterministic: same inputs → same id across all tools/languages.
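
A minimal sketch of the algorithm using Python's standard uuid module. The function name fhir_id is illustrative, and the namespace shown is the derived example from 3.3, not a confirmed production value.

```python
import uuid

# Example namespace only; a real deployment would load its configured constant.
FHIR_ID_NAMESPACE_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "calypr.org")


def fhir_id(canonical: str, namespace: uuid.UUID = FHIR_ID_NAMESPACE_UUID) -> str:
    """Return UUIDv5(namespace, canonical) as a lowercase, hyphenated string."""
    return str(uuid.uuid5(namespace, canonical))


canonical = "myproject/Patient/http://example.org/patient-id-system|P-001"
first = fhir_id(canonical)
second = fhir_id(canonical)
assert first == second  # same inputs produce the same id
```

Because UUIDv5 is a SHA-1 hash of the namespace bytes plus the UTF-8 name, any language with a conforming UUID library should produce the same string for the same canonical name.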

3.5 Resource Construction (separation of concerns)

  • Resource builder calls IdGenerator.make(resource_type, project_id, system_uri, identifier_value).
  • The builder MUST NOT embed ID logic or reach into unrelated systems.
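
The separation described above might look like this: the builder depends only on a narrow make(...) interface and never computes IDs itself. The class and function names (Uuid5IdGenerator, build_patient) are hypothetical, and the normalization inside make is deliberately simplified relative to 3.2.

```python
import uuid
from typing import Protocol


class IdGenerator(Protocol):
    """The narrow API that resource builders are allowed to see."""

    def make(self, resource_type: str, project_id: str,
             system_uri: str, identifier_value: str) -> str: ...


class Uuid5IdGenerator:
    """Illustrative implementation; lives in the shared library, not in builders."""

    def __init__(self, namespace: uuid.UUID) -> None:
        self._namespace = namespace

    def make(self, resource_type: str, project_id: str,
             system_uri: str, identifier_value: str) -> str:
        # Simplified: the full system_uri normalization of 3.2 is omitted here.
        canonical = (f"{project_id.strip().lower()}/{resource_type}/"
                     f"{system_uri}|{identifier_value.strip()}")
        return str(uuid.uuid5(self._namespace, canonical))


def build_patient(ids: IdGenerator, project_id: str,
                  system_uri: str, identifier_value: str) -> dict:
    """Builder asks for an id; it contains no hashing or namespace logic."""
    return {
        "resourceType": "Patient",
        "id": ids.make("Patient", project_id, system_uri, identifier_value),
        "identifier": [{"system": system_uri, "value": identifier_value}],
    }
```

Swapping the generator (for tests, or for a REST-backed client) requires no change to the builder.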

3.6 References

  • Internal references use the standard relative form:
    Reference.reference = "{resource_type}/{id}"
  • Do not hardcode base URLs inside references (keeps bundles portable).
  • If absolute URLs are needed for export, convert at the edge (ingress/egress mappers), not in core builders.
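
As an illustration of the reference rules above, core code would emit only relative references, and a separate edge mapper would add a base URL on export. Both helper names and the base URL in the usage note are hypothetical.

```python
def relative_reference(resource_type: str, resource_id: str) -> str:
    """Core builders emit only the portable relative form "Type/id"."""
    return f"{resource_type}/{resource_id}"


def to_absolute(reference: str, base_url: str) -> str:
    """Edge (egress) mapper: prefix a base URL onto a relative reference."""
    if "://" in reference or reference.startswith("urn:"):
        return reference  # already absolute; leave untouched
    return base_url.rstrip("/") + "/" + reference
```

For example, to_absolute("Patient/abc", "https://fhir.example.org/R4/") gives "https://fhir.example.org/R4/Patient/abc", while the bundle stored internally keeps "Patient/abc".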

3.7 indexd record

  • Dataframer needs to render <indexd.id> so that users can download from the FEF website; all downloads from fence are keyed on indexd.id. (Currently DocumentReference.id == indexd.id.)
  • FHIR DocumentReference.content.attachment.url MUST reference drs://....
  • FHIR DocumentReference.identifier.value MUST reference indexd.id.
  • FHIR DocumentReference.identifier.id follows the canonical pattern (in other words, it depends on id).
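
The two MUST rules above can be checked mechanically. The following is a hedged sketch of such a check over a DocumentReference represented as a plain dict; the function name is hypothetical, and it assumes "reference indexd.id" means that one of the identifier values equals the indexd id.

```python
def check_document_reference(doc: dict, indexd_id: str) -> list:
    """Return a list of rule violations (empty when the resource conforms)."""
    problems = []

    # Rule: every content.attachment.url must be a drs:// URI.
    urls = [c.get("attachment", {}).get("url", "") for c in doc.get("content", [])]
    if not urls or not all(u.startswith("drs://") for u in urls):
        problems.append("content.attachment.url must reference drs://")

    # Rule (assumed reading): some identifier.value equals the indexd id.
    values = [i.get("value") for i in doc.get("identifier", [])]
    if indexd_id not in values:
        problems.append("identifier.value must reference indexd.id")

    return problems
```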

4) Reseeding / Namespacing Policy

Goal: Create environment/tenant-specific FHIR IDs and references without changing business identifiers or resource content. This is necessary to support use cases where:

  • a data submitter is uploading a graph of linked FHIR or iceberg data
  • a subgraph (for example, all resources of one project) is moved from one project to another

4.1 Reseed Input

  • seed: an arbitrary string indicating the target namespace (e.g., "prod", "staging-2025Q4", "tenant-foo").

4.2 Reseed Function (bundle-level transform)

Given a resource (or Bundle), produce a structurally identical copy where:

  • resource.id' = UUIDv5(RESEED_NAMESPACE_UUID, resource.id + seed)
  • Each Reference.reference = "Type/<old>" becomes
    "Type/" + UUIDv5(RESEED_NAMESPACE_UUID, <old> + seed)

Notes

  • RESEED_NAMESPACE_UUID is a separate constant from FHIR_ID_NAMESPACE_UUID.
  • The reseed transform is housed in a dedicated reseed module/CLI, not inside resource builders (modularity).
  • This matches the current FHIR-Aggregator approach, but isolates it to avoid coupling.
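
A minimal sketch of the reseed transform over resources represented as plain dicts. The RESEED_NAMESPACE_UUID value is a placeholder derived from a made-up DNS name, and the function names are hypothetical. The sketch rewrites only Resource.id and relative "Type/id" references; it does not handle Bundle.entry.fullUrl, and a Bundle's own id is reseeded like any other resource id.

```python
import uuid

# Placeholder constant for illustration; a deployment would publish its own.
RESEED_NAMESPACE_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "reseed.example.org")


def reseed_id(old_id: str, seed: str) -> str:
    """UUIDv5(RESEED_NAMESPACE_UUID, old_id + seed)."""
    return str(uuid.uuid5(RESEED_NAMESPACE_UUID, old_id + seed))


def reseed(node, seed: str):
    """Return a structurally identical copy with ids and references reseeded."""
    if isinstance(node, list):
        return [reseed(item, seed) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "id" and "resourceType" in node and isinstance(value, str):
            out[key] = reseed_id(value, seed)          # Resource.id only
        elif (key == "reference" and isinstance(value, str)
              and value.count("/") == 1 and "://" not in value):
            rtype, old = value.split("/")
            out[key] = f"{rtype}/{reseed_id(old, seed)}"  # relative "Type/id"
        else:
            out[key] = reseed(value, seed)
    return out
```

Since ids and references pass through the same function, a reference that resolved before the transform still resolves afterwards.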

4.3 When to reseed

  • Cross-tenant portability (e.g., moving a Bundle between sandboxes).
  • Environment isolation to avoid ID collisions between dev, staging, prod.

4.4 When not to reseed

  • Within a single, consistent environment/tenant—prefer the base deterministic IDs.

10) Links


11) Work Items

  • Implement lightweight shared library/service.
  • Add cross-tool integration tests.
