Specification: Deterministic, Modular FHIR (grip) Resource ID Generation

# 🧬 Specification: Deterministic, Modular FHIR (grip) Resource ID Generation

**Status:** Draft  
**Scope:** All FHIR R4/R5 resources created by our pipelines (e.g., Forge, git-drs adjacencies).  
**Design goals:** determinism, cross-tool parity, low coupling, high modularity, and predictable reference rewriting when namespacing (“reseed”) is required.

---

## 1) Rationale & Architectural Principles

### Avoid “God Objects” (modularity)
- **Problem:** ID generation logic embedded inside resource builders forces every component that creates resources to “know” how IDs are minted.
- **Remedy:** Centralize ID creation in a **single, lightweight shared library/service** with a narrow API. Resource builders call it; they do not implement ID logic themselves.

### Avoid Tight Coupling
- **Problem:** If ID logic changes in one tool, downstream tools break or (worse) silently diverge.
- **Remedy:** All tools consume the **same deterministic algorithm** via the shared library/CLI/REST. No tool depends on another tool’s internal structures.

### Determinism & Composability
- IDs must be **stable and reproducible** for the same inputs across languages (Go, Python, JS, etc.).
- Support **contextual namespacing** (e.g., per project, tenant, environment) **without** changing the algorithm—only inputs.

---

## 2) Prior Art (Informative)

- **gen3_util** (current approach):  
  `create_id_from_strings(resource_type, project_id, identifier_string)` → `UUIDv5(ACED_NAMESPACE, f"{project_id}/{resource_type}/{_get_system(identifier_string, project_id)}|{identifier_string}")`  
  Source: `gen3_tracker/meta/skeleton.py` lines 117–121  
  https://github.com/ACED-IDP/gen3_util/blob/development/gen3_tracker/meta/skeleton.py#L117-L121

- **FHIR-Aggregator reseeding** (current approach):  
  Reseeds `resource.id` and all `Reference.reference` targets using `UUIDv5(uuid.NAMESPACE_DNS, old_id + seed)`  
  Source: `reseed` in `prep.py` lines 150–164  
  https://github.com/FHIR-Aggregator/submission/blob/52253a251bd3d6170acf4b433e4becce312aa0d7/fhir_aggregator_submission/prep.py#L150-L164

- **Team discussion:**  
  Slack thread capturing motivation and pitfalls (namespacing, portability, determinism):  
  https://ohsucomputationalbio.slack.com/archives/C08D2NGAQ59/p1742582611184329?thread_ts=1742582412.534159&cid=C08D2NGAQ59

---

## 3) Specification

### 3.1 Inputs (required to compute a FHIR `Resource.id`)
For any FHIR resource:

| Name | Type | Description |
|---|---|---|
| `resource_type` | string | FHIR type (e.g., `Patient`, `Specimen`, `Observation`, `DocumentReference`) |
| `project_id` | string | Stable project/tenant identifier |
| `system_uri` | string | System URI for the business identifier (e.g., `urn:uuid:…`, `http://example.org/patient-id-system`) |
| `identifier_value` | string | Business identifier value (project-scoped or global) |

> Notes  
> - `system_uri` MUST be canonical and stable for the business identifier domain.  
> - If a resource lacks a natural business identifier, define a **canonical surrogate** (e.g., a normalized path or key) at the modeling layer—**not** inside the ID library.

### 3.2 Canonical Name String (language-agnostic)
Construct a **canonical name** (UTF-8) with strict normalization:

```
canonical = "{project_id}/{resource_type}/{system_uri}|{identifier_value}"
```

**Normalization rules**
- `project_id`: lowercased; strip surrounding whitespace.
- `resource_type`: exact FHIR type case (e.g., `DocumentReference`).
- `system_uri`: lowercase the scheme and host; keep path/query as-is; no trailing `#` or `/`.
- `identifier_value`: exact string as issued; trim surrounding whitespace only.
- No leading/trailing spaces around separators; use exactly one `/` and one `|` as shown.
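
The normalization rules above could be sketched in Python as follows. This is a minimal illustration, not the shared library's actual code: the function names `normalize_system_uri` and `canonical_name` are hypothetical, and the sketch assumes that "no trailing `#`/`/`" means such characters are stripped rather than rejected.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_system_uri(system_uri: str) -> str:
    """Lowercase scheme and host, keep path/query as-is, strip trailing '#' or '/'."""
    parts = urlsplit(system_uri.strip())
    normalized = urlunsplit(
        # Assumption: lowercasing the whole netloc (including any port or userinfo) is acceptable.
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, parts.fragment)
    )
    return normalized.rstrip("#/")


def canonical_name(resource_type: str, project_id: str, system_uri: str, identifier_value: str) -> str:
    """Build "{project_id}/{resource_type}/{system_uri}|{identifier_value}" per section 3.2."""
    return (
        f"{project_id.strip().lower()}/"   # project_id: lowercased, whitespace stripped
        f"{resource_type}/"                # resource_type: exact FHIR case, untouched
        f"{normalize_system_uri(system_uri)}|"
        f"{identifier_value.strip()}"      # identifier_value: trimmed only
    )


print(canonical_name("Patient", "  ACED-Demo ", "HTTP://Example.ORG/patient-id-system/", " P-001 "))
# -> aced-demo/Patient/http://example.org/patient-id-system|P-001
```

Other language ports would need to reproduce the same URI handling byte for byte, so the exact behavior of the URI normalizer is worth pinning down with shared test vectors.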

### 3.3 Namespace UUID aka `authority`
- Choose a deployment-wide constant `FHIR_ID_NAMESPACE_UUID` (UUID).  
- **Do not change** once published for an environment/tenant.
- Example sources:  
  - Derived: `UUIDv5(NAMESPACE_DNS, "calypr.org")`
- The namespace value **MUST** be configurable per deployment; we have already changed namespaces once.
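
One way to make the namespace configurable while keeping a derived default is sketched below. The environment variable name and the `load_namespace` helper are assumptions made for illustration; the spec only requires that the value be configurable and stable once published.

```python
import os
import uuid

# Hypothetical setting name; not defined by this spec.
ENV_VAR = "FHIR_ID_NAMESPACE_UUID"
# Authority taken from the "Derived" example above.
DEFAULT_AUTHORITY = "calypr.org"


def load_namespace(environ=None) -> uuid.UUID:
    """Return the configured namespace UUID, or derive one from the default authority."""
    environ = os.environ if environ is None else environ
    configured = environ.get(ENV_VAR)
    if configured:
        # uuid.UUID raises ValueError on malformed input, so bad config fails fast.
        return uuid.UUID(configured)
    return uuid.uuid5(uuid.NAMESPACE_DNS, DEFAULT_AUTHORITY)


FHIR_ID_NAMESPACE_UUID = load_namespace()
```

Passing a plain dict as `environ` keeps the helper easy to test without touching the real process environment.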

### 3.4 ID Algorithm (deterministic, REQUIRED)
```
id = UUIDv5(FHIR_ID_NAMESPACE_UUID, canonical)
```

- Output is an RFC-4122 UUID string (lowercase hex, hyphenated).
- Deterministic: same inputs → same `id` across all tools/languages.
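
In Python the algorithm maps directly onto the standard library's `uuid.uuid5`. The sketch below assumes the canonical string has already been built and normalized per 3.2; the namespace value shown is only the derived example from 3.3, and `make_id` is a hypothetical name.

```python
import uuid

# Assumption: placeholder namespace derived as in the 3.3 example; real deployments configure this.
FHIR_ID_NAMESPACE_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "calypr.org")


def make_id(canonical: str, namespace: uuid.UUID = FHIR_ID_NAMESPACE_UUID) -> str:
    """Return the UUIDv5 of the canonical name as a lowercase, hyphenated string."""
    return str(uuid.uuid5(namespace, canonical))


first = make_id("aced-demo/Patient/http://example.org/patient-id-system|P-001")
second = make_id("aced-demo/Patient/http://example.org/patient-id-system|P-001")
assert first == second  # same inputs, same id
```

UUIDv5 hashes the namespace bytes followed by the UTF-8 name with SHA-1, so any conforming implementation in Go, JS, or another language should produce the same string for the same canonical input.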

### 3.5 Resource Construction (separation of concerns)
- Resource builder **calls** `IdGenerator.make(resource_type, project_id, system_uri, identifier_value)`.
- The builder MUST NOT embed ID logic or reach into unrelated systems.
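
The separation of concerns might look like the sketch below. Only the `IdGenerator.make(...)` call shape comes from this spec; the class body, the `build_patient` builder, and the assumption that inputs arrive already normalized per 3.2 are illustrative.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class IdGenerator:
    """Narrow API: the only place that knows how IDs are minted."""

    namespace: uuid.UUID

    def make(self, resource_type: str, project_id: str, system_uri: str, identifier_value: str) -> str:
        # Assumption: inputs are already normalized per section 3.2.
        canonical = f"{project_id}/{resource_type}/{system_uri}|{identifier_value}"
        return str(uuid.uuid5(self.namespace, canonical))


def build_patient(ids: IdGenerator, project_id: str, system_uri: str, value: str) -> dict:
    """Hypothetical builder: it asks for an id and never computes one itself."""
    return {
        "resourceType": "Patient",
        "id": ids.make("Patient", project_id, system_uri, value),
        "identifier": [{"system": system_uri, "value": value}],
    }


ids = IdGenerator(uuid.uuid5(uuid.NAMESPACE_DNS, "calypr.org"))
patient = build_patient(ids, "aced-demo", "http://example.org/patient-id-system", "P-001")
```

Injecting the generator rather than importing a global keeps builders testable and makes a later change of namespace a configuration concern only.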

### 3.6 References
- Internal references use the standard relative form:  
  `Reference.reference = "{resource_type}/{id}"`  
- Do **not** hardcode base URLs inside references (keeps bundles portable).
- If absolute URLs are needed for export, convert at the **edge** (ingress/egress mappers), not in core builders.
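
As an illustration of keeping references relative in core code and converting only at the edge, consider the sketch below. Both helper names and the example base URL are hypothetical.

```python
def relative_reference(resource_type: str, resource_id: str) -> dict:
    """Core builders emit only the portable relative form."""
    return {"reference": f"{resource_type}/{resource_id}"}


def to_absolute(reference: dict, base_url: str) -> dict:
    """Edge mapper: return a copy whose relative reference is prefixed with base_url."""
    value = reference["reference"]
    # Leave absolute URLs, URNs, and contained-resource fragments untouched.
    if "://" in value or value.startswith(("urn:", "#")):
        return dict(reference)
    return {**reference, "reference": f"{base_url.rstrip('/')}/{value}"}


ref = relative_reference("Patient", "abc")
exported = to_absolute(ref, "https://fhir.example.org/base/")
# exported == {"reference": "https://fhir.example.org/base/Patient/abc"}
```

Because `to_absolute` returns a copy, the in-memory bundle stays portable even after an export has been produced.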

### 3.7 indexd record
- Dataframer needs to render `indexd.id` so that users can download files from the FEF website; all downloads from fence are based on `indexd.id` (currently `DocumentReference.id == indexd.id`).
- FHIR `DocumentReference.content.attachment.url` **MUST** reference `drs://...`.
- FHIR `DocumentReference.identifier.value` **MUST** reference `indexd.id`.
- FHIR `DocumentReference.identifier.id` follows the canonical pattern (in other words, it depends on `id`).
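
The two MUST rules above lend themselves to a small pre-submission check. The sketch below is one possible reading, with assumptions: every `content[].attachment.url` must start with `drs://`, at least one `identifier[].value` must equal the indexd id, and the function name is hypothetical.

```python
def check_document_reference(doc_ref: dict, indexd_id: str) -> list:
    """Return a list of human-readable problems; an empty list means both rules hold."""
    problems = []

    # Rule: content.attachment.url MUST reference drs://...
    urls = [c.get("attachment", {}).get("url") for c in doc_ref.get("content", [])]
    if not urls or not all(isinstance(u, str) and u.startswith("drs://") for u in urls):
        problems.append("content.attachment.url must be a drs:// URI")

    # Rule: identifier.value MUST reference indexd.id
    values = [i.get("value") for i in doc_ref.get("identifier", [])]
    if indexd_id not in values:
        problems.append("identifier.value must carry the indexd id")

    return problems
```

A check like this belongs next to the DocumentReference builder or in a validation step, not in the ID library itself.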

---

## 4) Reseeding / Namespacing Policy

**Goal:** Create environment/tenant-specific FHIR IDs and references **without** changing business identifiers or resource content. This is necessary to support use cases where:
- a data submitter is uploading a graph of linked FHIR or iceberg data
- a subgraph (for example, a project's resources) is moved from one project to another

### 4.1 Reseed Input
- `seed`: an arbitrary string indicating the target namespace (e.g., `"prod"`, `"staging-2025Q4"`, `"tenant-foo"`).

### 4.2 Reseed Function (bundle-level transform)
Given a resource (or Bundle), produce a structurally identical copy where:
- `resource.id' = UUIDv5(RESEED_NAMESPACE_UUID, resource.id + seed)`
- Each `Reference.reference = "Type/<old>"` becomes  
  `"Type/" + UUIDv5(RESEED_NAMESPACE_UUID, <old> + seed)`

**Notes**
- `RESEED_NAMESPACE_UUID` is a **separate** constant from `FHIR_ID_NAMESPACE_UUID`.  
- The reseed transform is housed in a dedicated **reseed module/CLI**, not inside resource builders (modularity).  
- This matches the current FHIR-Aggregator approach, but isolates it to avoid coupling.
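
A minimal sketch of the reseed transform follows. It assumes resources are plain Python dicts, rewrites only relative `Type/id` references, and uses a placeholder value for `RESEED_NAMESPACE_UUID`; it is not the FHIR-Aggregator code.

```python
import copy
import uuid

# Assumption: placeholder value; the spec only requires a constant separate from FHIR_ID_NAMESPACE_UUID.
RESEED_NAMESPACE_UUID = uuid.uuid5(uuid.NAMESPACE_DNS, "reseed.example.org")


def reseed_id(old_id: str, seed: str) -> str:
    """UUIDv5(RESEED_NAMESPACE_UUID, old_id + seed), as defined in 4.2."""
    return str(uuid.uuid5(RESEED_NAMESPACE_UUID, old_id + seed))


def _rewrite(node, seed: str) -> None:
    """Walk a JSON-like structure in place, rewriting resource ids and relative references."""
    if isinstance(node, dict):
        ref = node.get("reference")
        # Only "Type/<old>" is rewritten; absolute URLs, URNs, and "#contained" are left alone.
        if isinstance(ref, str) and ref.count("/") == 1:
            resource_type, old_id = ref.split("/")
            node["reference"] = f"{resource_type}/{reseed_id(old_id, seed)}"
        # Anything carrying resourceType is treated as a resource, including a Bundle itself.
        if "resourceType" in node and isinstance(node.get("id"), str):
            node["id"] = reseed_id(node["id"], seed)
        for value in node.values():
            _rewrite(value, seed)
    elif isinstance(node, list):
        for item in node:
            _rewrite(item, seed)


def reseed(resource: dict, seed: str) -> dict:
    """Return a structurally identical copy with reseeded ids and references."""
    result = copy.deepcopy(resource)
    _rewrite(result, seed)
    return result
```

Because ids and references go through the same `reseed_id` function, a reference to `Patient/<old>` still resolves to the reseeded Patient after the transform. Bundle `fullUrl` values are not handled in this sketch and would need their own rule.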

### 4.3 When to reseed
- Cross-tenant portability (e.g., moving a Bundle between sandboxes).
- Environment isolation to avoid ID collisions between `dev`, `staging`, `prod`.

### 4.4 When **not** to reseed
- Within a single, consistent environment/tenant—prefer the base deterministic IDs.

---

## 10) Links

- gen3_util: `create_id_from_strings`  
  https://github.com/ACED-IDP/gen3_util/blob/development/gen3_tracker/meta/skeleton.py#L117-L121  
- FHIR-Aggregator: `reseed`  
  https://github.com/FHIR-Aggregator/submission/blob/52253a251bd3d6170acf4b433e4becce312aa0d7/fhir_aggregator_submission/prep.py#L150-L164  
- Slack discussion (context & goals)  
  https://ohsucomputationalbio.slack.com/archives/C08D2NGAQ59/p1742582611184329?thread_ts=1742582412.534159&cid=C08D2NGAQ59

---

## 11) Work Items

- [ ] Implement lightweight shared library/service.  
- [ ] Add cross-tool integration tests.  

