forge/etl design Discussion #2

@matthewpeterkort

Dual-User Story Support for FHIR Metadata Uploads

The Problem

We need a single workflow that can accommodate two very different user types:

  • An engineer who understands FHIR and wants to upload FHIR metadata directly to the server.
  • An analyst who doesn't understand FHIR and simply wants to add files to a project without providing any metadata.

Proposed Solutions

To address these conflicting needs, we'll implement a multi-pronged approach that streamlines the upload process for both users.

  1. Refine the Forge CLI: We'll separate the FHIR metadata validation and vertex-to-edge connection checks into their own commands. This will prevent validation from blocking the initial metadata upload, creating a smoother experience for the engineer.
  2. Offload Metadata Generation: For analysts, we'll offload the generation of DocumentReference and ResearchStudy metadata to an ETL (Extract, Transform, Load) job. This will automatically create the necessary FHIR records based on the files they upload, removing the need for them to understand FHIR.
  3. Implement a STAT Command in git-drs: A new command will query indexd by project ID and resource path to retrieve all records for a given project. These records will then be converted into FHIR records and loaded into the ETL pod for processing.
  4. Introduce an "UPSERT" Operation: If a user has already provided metadata and indexd records exist, we'll perform an "UPSERT" (Update or Insert) join. This operation will overwrite the file information in the DocumentReference with the new indexd records while preserving any existing, user-defined fields.
  5. Standardize Mint IDs: We will ensure that the Mint IDs generated for DocumentReference and ResearchSubject metadata are derived from the same fields used in all other "mint_id" functions. This prevents conflicts and maintains consistency across the system.
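The "UPSERT" join in item 4 could look roughly like the sketch below. This is an illustration only, not the forge implementation: the function name `upsert_document_reference`, the choice of `content` as the only indexd-derived field, and the sample records are all assumptions made for the example. Fields sourced from indexd overwrite what is already there, and every other field the user set is kept.

```python
import copy

# Assumption: these top-level DocumentReference fields carry file information
# sourced from indexd; everything else is treated as user-defined.
FILE_FIELDS = ("content",)


def upsert_document_reference(existing, generated):
    """Merge a record generated from indexd into a user-provided record.

    existing:  the user's DocumentReference dict, or None if none was uploaded.
    generated: the DocumentReference dict built from the current indexd record.
    """
    if existing is None:
        # Insert path: the user supplied no metadata, so nothing is preserved.
        return copy.deepcopy(generated)

    # Update path: start from the user's record, then overwrite file fields.
    merged = copy.deepcopy(existing)
    for field in FILE_FIELDS:
        if field in generated:
            merged[field] = copy.deepcopy(generated[field])
    return merged


# Hypothetical sample data, for illustration only.
existing = {
    "resourceType": "DocumentReference",
    "id": "doc-1",
    "description": "curated by the engineer",
    "content": [{"attachment": {"url": "drs://example/old", "size": 1}}],
}
generated = {
    "resourceType": "DocumentReference",
    "id": "doc-1",
    "content": [{"attachment": {"url": "drs://example/new", "size": 2048}}],
}
merged = upsert_document_reference(existing, generated)
```

Here `merged` keeps the user's `description` but takes its `content` from the generated record. Which fields count as "file information" is the main design decision, and the single-entry `FILE_FIELDS` tuple above is only a placeholder for it.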

Notes on This Strategy

This approach eliminates the need for any ".meta" records, because the ETL job fetches DocumentReference template information directly from indexd on every run. The "Meta Init" command can still be run client-side if the user wants to preview their skeleton metadata, and it should produce exactly the same result as the server-side job, but it does not need to be part of the user-facing workflow.
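For client-side and server-side runs to produce identical skeleton metadata, the generation step has to be deterministic. The sketch below shows one way that could work, under stated assumptions: the `uuid5`-based `mint_id`, its namespace seed, and the fields it hashes are hypothetical stand-ins for the project's real "mint_id" functions, and the record keys (`did`, `file_name`, `size`, `urls`) are assumed to match what indexd returns.

```python
import uuid

# Hypothetical namespace seed; the real mint_id functions may use another one.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def mint_id(project_id, resource_type, key):
    """Derive a stable ID from its inputs so repeated runs agree."""
    return str(uuid.uuid5(NAMESPACE, f"{project_id}/{resource_type}/{key}"))


def skeleton_document_reference(record, project_id):
    """Build a minimal DocumentReference dict from one indexd-style record."""
    urls = record.get("urls") or []
    attachment = {
        "url": urls[0] if urls else None,
        "title": record.get("file_name"),
        "size": record.get("size"),
    }
    return {
        "resourceType": "DocumentReference",
        "id": mint_id(project_id, "DocumentReference", record["did"]),
        "status": "current",
        # Drop attachment keys the indexd record did not provide.
        "content": [
            {"attachment": {k: v for k, v in attachment.items() if v is not None}}
        ],
    }


# Hypothetical indexd-style record, for illustration only.
record = {
    "did": "example-did-0001",
    "file_name": "sample.bam",
    "size": 2048,
    "urls": ["s3://example-bucket/sample.bam"],
}
skeleton = skeleton_document_reference(record, "example-project")
```

Because the ID depends only on the project ID, the resource type, and the record's `did`, running this on the client and again in the ETL job yields the same record, which is the property the "Meta Init" preview relies on.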

Work-in-progress code can be found at the following locations:
