forge/etl design Discussion

---
## **Dual-User Story Support for FHIR Metadata Uploads**

### **The Problem**

We need a single workflow that can accommodate two very different user types:

* An **engineer** who understands **FHIR** and wants to upload FHIR metadata directly to the server.
* An **analyst** who **doesn't** understand FHIR and simply wants to add files to a project without providing any metadata.

---
### **Proposed Solutions**

To address these conflicting needs, we'll implement a multi-pronged approach that streamlines the upload process for both users.

1.  **Refine the Forge CLI:** We'll separate the **FHIR metadata validation** and **vertex-to-edge connection checks** into their own commands. This will prevent validation from blocking the initial metadata upload, creating a smoother experience for the engineer.
2.  **Offload Metadata Generation:** For analysts, we'll offload the generation of **DocumentReference** and **ResearchStudy** metadata to an **ETL (Extract, Transform, Load)** job. This will automatically create the necessary FHIR records based on the files they upload, removing the need for them to understand FHIR.
3.  **Implement a STAT Command in `git-drs`:** A new command will query `indexd` by project ID and resource path to retrieve all records for a given project. These records will then be converted into FHIR records and loaded into the ETL pod for processing.
4.  **Introduce an "UPSERT" Operation:** If a user has already provided metadata and `indexd` records exist, we'll perform an "UPSERT" (Update or Insert) join. This operation will overwrite the file information in the DocumentReference with the new `indexd` records while preserving any existing, user-defined fields.
5.  **Standardize Mint IDs:** We will ensure that the **Mint IDs** generated for DocumentReference and ResearchSubject metadata are derived from the same fields used in all other "mint\_id" functions. This prevents conflicts and maintains consistency across the system.
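
To make the "UPSERT" join in item 4 more concrete, here is a minimal Python sketch. It assumes FHIR resources are handled as plain dictionaries and that the file information lives under the top-level `content` key of a DocumentReference. The function name `upsert_document_reference` and the `FILE_OWNED_KEYS` tuple are hypothetical and are not taken from the linked PRs.

```python
from copy import deepcopy

# Assumption: these top-level DocumentReference keys are owned by indexd
# and are always overwritten. Every other key is treated as user-defined.
FILE_OWNED_KEYS = ("content",)


def upsert_document_reference(existing, from_indexd):
    """Merge an indexd-derived DocumentReference into an existing one.

    existing:    user-provided DocumentReference dict, or None if absent
    from_indexd: DocumentReference dict built from the current indexd record
    """
    if existing is None:
        # Insert: nothing to preserve, so take the indexd-derived record.
        return deepcopy(from_indexd)

    # Update: start from the user's record so custom fields survive,
    # then overwrite only the file-owned keys.
    merged = deepcopy(existing)
    for key in FILE_OWNED_KEYS:
        if key in from_indexd:
            merged[key] = deepcopy(from_indexd[key])
    return merged


existing = {
    "resourceType": "DocumentReference",
    "id": "doc-1",
    "description": "user-supplied note",
    "content": [{"attachment": {"url": "drs://old", "size": 1}}],
}
fresh = {
    "resourceType": "DocumentReference",
    "id": "doc-1",
    "content": [{"attachment": {"url": "drs://new", "size": 42}}],
}
merged = upsert_document_reference(existing, fresh)
```

In this sketch `merged` keeps the user's `description` while its `content` comes from the new indexd record. Which keys count as file information is a design choice the real ETL job would need to settle.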

---
### **Notes on This Strategy**

This approach eliminates the need for ".meta" records, because the ETL job fetches DocumentReference template information directly from `indexd` on every run. The "Meta Init" command can still be run client-side if the user wants to preview their skeleton metadata, and it should produce exactly the same result as the server-side job, but it no longer needs to be part of the user-facing workflow.
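
One way to keep the client-side preview and the server-side job in agreement is to build the skeleton with a pure function that both sides share. The Python sketch below assumes an `indexd` record shaped like a dictionary with `did`, `file_name`, `size`, and `urls` keys. The namespace seed, the `mint_id` signature, and the fields fed into it are placeholders for illustration only; the real "mint\_id" functions may use different inputs.

```python
import uuid

# Hypothetical namespace seed; the project's real mint_id functions
# may derive IDs from a different namespace and different fields.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def mint_id(project_id, resource_type, key):
    """Deterministic ID: the same inputs always give the same UUID."""
    name = f"{project_id}/{resource_type}/{key}"
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, name))


def skeleton_document_reference(record, project_id):
    """Build a minimal DocumentReference dict from one indexd record."""
    attachment = {
        "title": record.get("file_name"),
        "size": record.get("size"),
    }
    urls = record.get("urls") or []
    if urls:
        # Assumption: the first URL is the one to expose.
        attachment["url"] = urls[0]
    return {
        "resourceType": "DocumentReference",
        "id": mint_id(project_id, "DocumentReference", record["did"]),
        "status": "current",
        "content": [{"attachment": attachment}],
    }


record = {
    "did": "dg.TEST/0001",
    "file_name": "sample.bam",
    "size": 1024,
    "urls": ["s3://example-bucket/sample.bam"],
}
client_side = skeleton_document_reference(record, "prog-proj")
server_side = skeleton_document_reference(record, "prog-proj")
```

Because the function has no hidden state and the ID is derived only from its inputs, `client_side` and `server_side` are identical, which is the property the preview command relies on.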

**Work-in-progress code can be found at the following locations:**
* [https://github.com/calypr/git-drs/pull/43](https://github.com/calypr/git-drs/pull/43)
* [https://github.com/ACED-IDP/aced_etl_pod/pull/51](https://github.com/ACED-IDP/aced_etl_pod/pull/51)
* [https://github.com/calypr/forge/pull/1](https://github.com/calypr/forge/pull/1)
* **[QW edited]**: Also relevant to merge
    * https://github.com/ACED-IDP/gen3-helm/pull/81
    * https://github.com/ACED-IDP/gen3_util/pull/139 ?
