🧬 Specification: Deterministic, Modular FHIR (grip) Resource ID Generation
Status: Draft
Scope: All FHIR R4/R5 resources created by our pipelines (e.g., Forge, git-drs adjacencies).
Design goals: determinism, cross-tool parity, low coupling, high modularity, and predictable reference rewriting when namespacing (“reseed”) is required.
1) Rationale & Architectural Principles
Avoid “God Objects” (modularity)
- Problem: ID generation logic embedded inside resource builders forces every component that creates resources to “know” how IDs are minted.
- Remedy: Centralize ID creation in a single, lightweight shared library/service with a narrow API. Resource builders call it; they do not implement ID logic themselves.
Avoid Tight Coupling
- Problem: If ID logic changes in one tool, downstream tools break or (worse) silently diverge.
- Remedy: All tools consume the same deterministic algorithm via the shared library/CLI/REST. No tool depends on another tool’s internal structures.
Determinism & Composability
- IDs must be stable and reproducible for the same inputs across languages (Go, Python, JS, etc.).
- Support contextual namespacing (e.g., per project, tenant, environment) without changing the algorithm—only inputs.
2) Prior Art (Informative)
- gen3_util (current approach):
  `create_id_from_strings(resource_type, project_id, identifier_string)` → `UUIDv5(ACED_NAMESPACE, f"{project_id}/{resource_type}/{_get_system(identifier_string, project_id)}|{identifier_string}")`
  Source: `gen3_tracker/meta/skeleton.py`, lines 117–121
  https://github.com/ACED-IDP/gen3_util/blob/development/gen3_tracker/meta/skeleton.py#L117-L121
- FHIR-Aggregator reseeding (current approach):
  Reseeds `resource.id` and all `Reference.reference` targets using `UUIDv5(uuid.NAMESPACE_DNS, old_id + seed)`
  Source: `reseed` in `prep.py`, lines 150–164
  https://github.com/FHIR-Aggregator/submission/blob/52253a251bd3d6170acf4b433e4becce312aa0d7/fhir_aggregator_submission/prep.py#L150-L164
- Team discussion:
  Slack thread capturing motivation and pitfalls (namespacing, portability, determinism):
  https://ohsucomputationalbio.slack.com/archives/C08D2NGAQ59/p1742582611184329?thread_ts=1742582412.534159&cid=C08D2NGAQ59
3) Specification
3.1 Inputs (required to compute a FHIR Resource.id)
For any FHIR resource:
| Name | Type | Description |
|---|---|---|
| `resource_type` | string | FHIR type (e.g., Patient, Specimen, Observation, DocumentReference) |
| `project_id` | string | Stable project/tenant identifier |
| `system_uri` | string | System URI for the business identifier (e.g., urn:uuid:…, http://example.org/patient-id-system) |
| `identifier_value` | string | Business identifier value (project-scoped or global) |
Notes
- `system_uri` MUST be canonical and stable for the business identifier domain.
- If a resource lacks a natural business identifier, define a canonical surrogate (e.g., a normalized path or key) at the modeling layer, not inside the ID library.
3.2 Canonical Name String (language-agnostic)
Construct a canonical name (UTF-8) with strict normalization:
canonical = "{project_id}/{resource_type}/{system_uri}|{identifier_value}"
Normalization rules
- `project_id`: lowercased; strip surrounding whitespace.
- `resource_type`: exact FHIR type case (e.g., `DocumentReference`).
- `system_uri`: lowercase scheme/host; keep path/query as-is; no trailing `#` or `/`.
- `identifier_value`: exact string as issued; trim surrounding whitespace only.
- No leading/trailing spaces around separators; use exactly one `/` and one `|` as shown.
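The rules above can be sketched as a small pure function. This is a minimal illustration, not a reference implementation: the function names `normalize_system_uri` and `canonical_name` are hypothetical, and the trailing-character rule is read here as "strip any trailing `#` or `/`".

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_system_uri(system_uri: str) -> str:
    """Lowercase scheme and host, keep path/query as-is, drop trailing '#' or '/'."""
    parts = urlsplit(system_uri.strip())
    # urlsplit already lowercases the scheme; netloc is lowercased explicitly.
    # Note: this also lowercases any userinfo/port embedded in the netloc.
    rebuilt = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )
    # Assumption: "no trailing #//" means neither '#' nor '/' may end the URI.
    return rebuilt.rstrip("#/")


def canonical_name(resource_type: str, project_id: str,
                   system_uri: str, identifier_value: str) -> str:
    """Build '{project_id}/{resource_type}/{system_uri}|{identifier_value}'."""
    return (
        f"{project_id.strip().lower()}/"
        f"{resource_type}/"                      # exact FHIR type case, untouched
        f"{normalize_system_uri(system_uri)}|"
        f"{identifier_value.strip()}"            # trim surrounding whitespace only
    )


if __name__ == "__main__":
    print(canonical_name("Patient", " ACED-Demo ",
                         "HTTP://Example.ORG/patient-id-system/", " P-001 "))
```

Because the function takes only strings and returns a string, the same rules can be ported to Go or JS and checked against a shared table of input/output pairs.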
3.3 Namespace UUID aka authority
- Choose a deployment-wide constant `FHIR_ID_NAMESPACE_UUID` (a UUID).
- Do not change it once published for an environment/tenant.
- Example sources:
  - Derived: `UUIDv5(NAMESPACE_DNS, "calypr.org")`. This MUST be configurable; we have changed namespaces once already.
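One way to make the namespace configurable is to read it from the environment and fall back to the derived value. This is a sketch under assumptions: the environment variable name and the `load_namespace` helper are hypothetical and are not defined by this spec.

```python
import os
import uuid

# Hypothetical variable name, chosen for illustration only.
NAMESPACE_ENV_VAR = "FHIR_ID_NAMESPACE_UUID"
# Domain taken from the "Derived" example above.
DEFAULT_NAMESPACE_DOMAIN = "calypr.org"


def load_namespace(env=None) -> uuid.UUID:
    """Return the configured namespace UUID, or derive it from the default domain."""
    env = os.environ if env is None else env
    configured = env.get(NAMESPACE_ENV_VAR)
    if configured:
        # uuid.UUID raises ValueError on a malformed string, so bad config fails fast.
        return uuid.UUID(configured)
    return uuid.uuid5(uuid.NAMESPACE_DNS, DEFAULT_NAMESPACE_DOMAIN)


if __name__ == "__main__":
    print(load_namespace())
```

Passing the mapping in as an argument keeps the helper testable without touching the real process environment.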
3.4 ID Algorithm (deterministic, REQUIRED)
id = UUIDv5(FHIR_ID_NAMESPACE_UUID, canonical)
- Output is an RFC-4122 UUID string (lowercase hex, hyphenated).
- Deterministic: same inputs → same `id` across all tools/languages.
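In Python the algorithm maps directly onto the standard library's `uuid.uuid5`. The sketch below assumes the canonical string has already been built and normalized per 3.2; the helper name `make_id` and the example canonical string are illustrative, and the namespace shown is the derived example from 3.3 rather than a published constant.

```python
import uuid

# Example namespace only, derived as in section 3.3; real deployments configure this.
FHIR_ID_NAMESPACE_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "calypr.org")


def make_id(canonical: str, namespace: uuid.UUID = FHIR_ID_NAMESPACE_UUID) -> str:
    """Return the UUIDv5 of the canonical name as a lowercase, hyphenated string."""
    return str(uuid.uuid5(namespace, canonical))


if __name__ == "__main__":
    # Hypothetical canonical string for illustration.
    canonical = "aced-demo/Patient/http://example.org/patient-id-system|P-001"
    first = make_id(canonical)
    second = make_id(canonical)
    print(first, first == second)
```

UUIDv5 hashes the namespace bytes followed by the UTF-8 name with SHA-1, so any conforming implementation in another language should produce the same string for the same namespace and canonical name.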
3.5 Resource Construction (separation of concerns)
- Resource builder calls `IdGenerator.make(resource_type, project_id, system_uri, identifier_value)`.
- The builder MUST NOT embed ID logic or reach into unrelated systems.
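The separation might look like the following sketch. Only the `IdGenerator.make(...)` call shape comes from this spec; the class body, the injected `namespace` field, and the `build_patient` builder are assumptions made for illustration, and inputs are assumed to be already normalized per 3.2.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class IdGenerator:
    """Narrow ID-minting API; the namespace is injected, not hardcoded."""
    namespace: uuid.UUID

    def make(self, resource_type: str, project_id: str,
             system_uri: str, identifier_value: str) -> str:
        # Assumes inputs were normalized upstream (section 3.2).
        canonical = f"{project_id}/{resource_type}/{system_uri}|{identifier_value}"
        return str(uuid.uuid5(self.namespace, canonical))


def build_patient(ids: IdGenerator, project_id: str,
                  system_uri: str, identifier_value: str) -> dict:
    """Hypothetical builder: it asks for an ID and never computes one itself."""
    return {
        "resourceType": "Patient",
        "id": ids.make("Patient", project_id, system_uri, identifier_value),
        "identifier": [{"system": system_uri, "value": identifier_value}],
    }


if __name__ == "__main__":
    ids = IdGenerator(namespace=uuid.uuid5(uuid.NAMESPACE_DNS, "calypr.org"))
    print(build_patient(ids, "aced-demo", "http://example.org/patient-id-system", "P-001"))
```

Injecting the generator means a change to ID minting touches one class, and builders can be tested with a stub generator.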
3.6 References
- Internal references use the standard relative form: `Reference.reference = "{resource_type}/{id}"`
- Do not hardcode base URLs inside references (keeps bundles portable).
- If absolute URLs are needed for export, convert at the edge (ingress/egress mappers), not in core builders.
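A minimal sketch of the two forms, assuming hypothetical helper names (`relative_reference`, `to_absolute`) and a made-up base URL. Core builders would only ever call the first function; the second belongs in an egress mapper.

```python
def relative_reference(resource_type: str, resource_id: str) -> dict:
    """Build a FHIR Reference in the portable relative form 'Type/id'."""
    return {"reference": f"{resource_type}/{resource_id}"}


def to_absolute(reference: str, base_url: str) -> str:
    """Edge-only conversion: prefix a relative reference with a server base URL."""
    # Leave already-absolute URLs and URNs untouched.
    if "://" in reference or reference.startswith("urn:"):
        return reference
    return f"{base_url.rstrip('/')}/{reference}"


if __name__ == "__main__":
    ref = relative_reference("Patient", "0b1c2d3e-0000-5000-8000-000000000000")
    # Hypothetical base URL, for illustration only.
    print(to_absolute(ref["reference"], "https://fhir.example.org/r4/"))
```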
3.7 indexd record
- Note that Dataframer needs to render `<indexd.id>` so that users can download from the FEF website; all downloading from fence is based on `indexd.id`. (Currently `DocumentReference.id == indexd.id`.)
- FHIR `DocumentReference.content.attachment.url` MUST reference `drs://...`.
- FHIR `DocumentReference.identifier.value` MUST reference `indexd.id`.
- FHIR `DocumentReference.identifier.id` follows the canonical pattern (in other words, it depends on `id`).
4) Reseeding / Namespacing Policy
Goal: Create environment/tenant-specific FHIR IDs and references without changing business identifiers or resource content. This is necessary to support use cases where:
- a data submitter is uploading a graph of linked FHIR or iceberg data
- a project's resources (aka a subgraph) are moved from one project to another
4.1 Reseed Input
- `seed`: an arbitrary string indicating the target namespace (e.g., `"prod"`, `"staging-2025Q4"`, `"tenant-foo"`).
4.2 Reseed Function (bundle-level transform)
Given a resource (or Bundle), produce a structurally identical copy where:
- `resource.id' = UUIDv5(RESEED_NAMESPACE_UUID, resource.id + seed)`
- Each `Reference.reference = "Type/<old>"` becomes `"Type/" + UUIDv5(RESEED_NAMESPACE_UUID, <old> + seed)`
Notes
- `RESEED_NAMESPACE_UUID` is a separate constant from `FHIR_ID_NAMESPACE_UUID`.
- The reseed transform is housed in a dedicated reseed module/CLI, not inside resource builders (modularity).
- This matches the current FHIR-Aggregator approach, but isolates it to avoid coupling.
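The transform in 4.2 can be sketched as a pure function over plain dicts. This is an illustration, not the FHIR-Aggregator code: the function names are hypothetical, the `RESEED_NAMESPACE_UUID` value is a placeholder, and the sketch assumes relative `Type/id` references only. Contained resources, `Bundle.entry.fullUrl`, and absolute URLs are deliberately left untouched.

```python
import copy
import uuid

# Placeholder value; the spec only requires it to differ from FHIR_ID_NAMESPACE_UUID.
RESEED_NAMESPACE_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "reseed.example.org")


def reseed_id(old_id: str, seed: str) -> str:
    """UUIDv5(RESEED_NAMESPACE_UUID, old_id + seed) as a string."""
    return str(uuid.uuid5(RESEED_NAMESPACE_UUID, old_id + seed))


def _rewrite_references(node, seed: str) -> None:
    """Walk dicts/lists in place, rewriting relative 'Type/<old>' references."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str) and ref.count("/") == 1 and "://" not in ref:
            rtype, old = ref.split("/")
            node["reference"] = f"{rtype}/{reseed_id(old, seed)}"
        for value in node.values():
            _rewrite_references(value, seed)
    elif isinstance(node, list):
        for item in node:
            _rewrite_references(item, seed)


def reseed(resource: dict, seed: str) -> dict:
    """Return a structurally identical copy with reseeded ids and references."""
    result = copy.deepcopy(resource)  # never mutate the caller's data
    if result.get("resourceType") == "Bundle":
        targets = [e["resource"] for e in result.get("entry", []) if "resource" in e]
    else:
        targets = [result]
    for target in targets:
        if "id" in target:
            target["id"] = reseed_id(target["id"], seed)
        _rewrite_references(target, seed)
    return result


if __name__ == "__main__":
    obs = {"resourceType": "Observation", "id": "o1",
           "subject": {"reference": "Patient/p1"}}
    print(reseed(obs, "staging-2025Q4"))
```

Because ids and reference targets go through the same `reseed_id` function, a reference to `Patient/p1` still resolves to the reseeded Patient after the transform, even when the two resources are processed separately.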
4.3 When to reseed
- Cross-tenant portability (e.g., moving a Bundle between sandboxes).
- Environment isolation to avoid ID collisions between `dev`, `staging`, and `prod`.
4.4 When not to reseed
- Within a single, consistent environment/tenant—prefer the base deterministic IDs.
10) Links
- gen3_util: `create_id_from_strings`
  https://github.com/ACED-IDP/gen3_util/blob/development/gen3_tracker/meta/skeleton.py#L117-L121
- FHIR-Aggregator: `reseed`
  https://github.com/FHIR-Aggregator/submission/blob/52253a251bd3d6170acf4b433e4becce312aa0d7/fhir_aggregator_submission/prep.py#L150-L164
- Slack discussion (context & goals)
  https://ohsucomputationalbio.slack.com/archives/C08D2NGAQ59/p1742582611184329?thread_ts=1742582412.534159&cid=C08D2NGAQ59
11) Work Items
- Implement lightweight shared library/service.
- Add cross-tool integration tests.