Description
📔 Moved from ACED
Project Archetype Structure with META/ Folder
This project follows a standardized structure to manage large research data files and associated FHIR metadata in a version-controlled, DRS- and FHIR-compatible format.
Overview
The META/ folder contains newline-delimited JSON (.ndjson) files representing FHIR resources describing the project, its data, and related entities. Large files are tracked using Git LFS, with a required correlation between each data file and a DocumentReference resource.
User Story
As a research data steward,
I want to manage all project metadata in standardized FHIR .ndjson files within the META/ folder,
So that I can ensure traceable, reproducible, DRS- and FHIR-compatible submissions that clearly link metadata to tracked data files.
Events
- When: I have a static set of files to associate with a ResearchStudy, Patient, Specimen, or Assay
- When: I have one or more "ad-hoc" (i.e., workflow) files to associate with a ResearchStudy, Patient, Specimen, or ServiceRequest (Assay)
- When: I have a data source (spreadsheet, files, or bespoke system) that describes one or more "ad-hoc" (i.e., workflow) files to associate with a ResearchStudy, Patient, Specimen, or Assay
Acceptance Criteria
- The META/ResearchStudy.ndjson file exists and contains at least one valid FHIR ResearchStudy resource.
- The META/DocumentReference.ndjson file exists and contains exactly one DocumentReference resource per Git LFS-managed file in the project.
- Each DocumentReference.content.attachment.url matches the relative file path of an actual Git LFS-managed file.
- All Git LFS-managed files tracked in the repository are represented in the META/DocumentReference.ndjson file.
- The .ndjson files are properly formatted: one valid JSON object per line.
- The project includes a .gitattributes file that tracks large files via Git LFS.
- Automated validation confirms that all required files and metadata correlations are present and consistent.
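The .gitattributes criterion above could be checked roughly as follows. This is a minimal sketch, not the behavior of any existing tool: the function names are hypothetical, and it assumes LFS rules look like the lines written by git lfs track (a pattern followed by filter=lfs), without handling quoted patterns or patterns containing spaces.

```python
from pathlib import Path


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns that .gitattributes routes through Git LFS."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def has_lfs_tracking(project_root: str) -> bool:
    """True if <project_root>/.gitattributes exists and has at least one LFS rule."""
    path = Path(project_root) / ".gitattributes"
    return path.is_file() and bool(lfs_patterns(path.read_text()))
```

For example, a file containing the line `data/* filter=lfs diff=lfs merge=lfs -text` yields the single pattern `data/*`.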
Directory Structure
<project-root>/
├── .gitattributes
├── .gitignore
├── META/
│ ├── ResearchStudy.ndjson
│ ├── DocumentReference.ndjson
│ ├── Patient.ndjson (optional)
│ ├── Specimen.ndjson (optional)
│ ├── ServiceRequest.ndjson (optional)
│ ├── Observation.ndjson (optional)
│ └── <Other FHIR>.ndjson (optional)
├── data/
│ ├── file1.bam
│ ├── file2.fastq.gz
│ └── <additional files>
Required Contents
✅ META/ResearchStudy.ndjson
- Contains at least one FHIR ResearchStudy resource describing the project.
- Defines project identifiers, title, description, and key attributes.
✅ META/DocumentReference.ndjson
- Contains one FHIR DocumentReference resource per Git LFS-managed file.
- Each DocumentReference.content.attachment.url field:
  - Must exactly match the relative path of the corresponding file in the repository.
  - Example (pretty-printed here for readability; in the .ndjson file each resource occupies a single line):
{
  "resourceType": "DocumentReference",
  "id": "docref-file1",
  "status": "current",
  "content": [
    {
      "attachment": {
        "url": "data/file1.bam",
        "title": "BAM file for Sample X"
      }
    }
  ]
}
✅ Git LFS-Managed Files
- All large files tracked with Git LFS, typically under data/.
- .gitattributes defines file tracking rules.
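To keep META/DocumentReference.ndjson in step with the LFS-managed files, the records could be generated from the list of tracked paths. The sketch below is one possible approach and is not part of g3t. The helper names and the id scheme are hypothetical (a real scheme would need to respect FHIR id rules), and lfs_files assumes git and git-lfs are installed so that git lfs ls-files --name-only can be called.

```python
import json
import subprocess


def document_reference(path: str) -> dict:
    """Build a minimal DocumentReference whose attachment URL is the relative path."""
    # Hypothetical id scheme: "docref-" plus the path with "/" replaced by "-".
    return {
        "resourceType": "DocumentReference",
        "id": "docref-" + path.replace("/", "-"),
        "status": "current",
        "content": [{"attachment": {"url": path}}],
    }


def to_ndjson(paths: list[str]) -> str:
    """Serialize one DocumentReference per path, one JSON object per line."""
    return "".join(json.dumps(document_reference(p)) + "\n" for p in sorted(paths))


def lfs_files(project_root: str = ".") -> list[str]:
    """List LFS-tracked paths as reported by git-lfs (needs git and git-lfs on PATH)."""
    out = subprocess.run(
        ["git", "lfs", "ls-files", "--name-only"],
        cwd=project_root, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

A possible use is writing `to_ndjson(lfs_files())` to META/DocumentReference.ndjson and then enriching the records (titles, subjects) by hand or from another data source.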
Optional FHIR Metadata Files
- Patient.ndjson: Participant records.
- Specimen.ndjson: Biological specimens.
- ServiceRequest.ndjson: Requested assays.
- Observation.ndjson: Measurements or results.
- Other valid FHIR resource types as required.
File-Metadata Correlation
- Every Git LFS-managed file must have a corresponding DocumentReference resource.
- Each DocumentReference.content.attachment.url field directly references the relative file path.
- Every DocumentReference listed must correspond to an actual file present.
The META validate command (name still to be decided ❓) ensures both FHIR record validity and referential integrity across your project's META/ folder. Here is what it does:
✅ Syntax
# see legacy g3t
g3t meta validate [--project-root <path>]
🔍 Validation Steps
1. Schema Validation
- Each .ndjson file in META/ (like ResearchStudy.ndjson, DocumentReference.ndjson, etc.) is read line by line.
- Every line is parsed as JSON and checked against the corresponding FHIR schema for that resourceType.
- Syntax errors, missing required fields, or invalid FHIR values trigger clear error messages with line numbers.
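The line-by-line pass described above might look like the following sketch. It only covers the structural part (valid JSON, expected resourceType, an id present) and stands in for full FHIR schema validation, which would need a FHIR library. The function name and message wording are assumptions modeled on the error format used in this document.

```python
import json


def check_ndjson(text: str, expected_type: str, filename: str = "<memory>") -> list[str]:
    """Return one error message per line that is not a usable record."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        where = f"ERROR {filename} line {lineno}:"
        if not line.strip():
            errors.append(f"{where} blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"{where} invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"{where} expected a JSON object")
        elif record.get("resourceType") != expected_type:
            found = record.get("resourceType")
            errors.append(f"{where} resourceType is {found!r}, expected {expected_type!r}")
        elif not record.get("id"):
            errors.append(f"{where} missing id")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a validator report every faulty line in one run.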
2. Mandatory Files Presence
- Confirms that:
  - ResearchStudy.ndjson exists and has at least one valid record.
  - DocumentReference.ndjson exists and contains at least one record.
- If either is missing or empty, validation fails.
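A presence check of this kind could be sketched as below. The function name is hypothetical, and "empty" is taken to mean "no non-blank lines", which is an assumption about how the validator would count records.

```python
from pathlib import Path

# The two files this document treats as mandatory.
REQUIRED = ("ResearchStudy.ndjson", "DocumentReference.ndjson")


def missing_or_empty(project_root: str) -> list[str]:
    """Return a message for each required META/ file that is absent or has no records."""
    problems = []
    for name in REQUIRED:
        path = Path(project_root) / "META" / name
        if not path.is_file():
            problems.append(f"META/{name} is missing")
        elif not any(line.strip() for line in path.read_text().splitlines()):
            problems.append(f"META/{name} is empty")
    return problems
```

An empty result means both files are present and non-empty; whether their records are valid is a separate check.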
3. One-to-One Mapping of Files to DocumentReference
- Scans the working directory for Git LFS-managed files in expected locations (e.g., data/).
- For each file, locates a corresponding DocumentReference resource whose content.attachment.url matches the file's relative path.
- Validates:
  - All LFS files have a matching DocumentReference.
  - All DocumentReferences point to existing files.
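The two-way comparison above reduces to set differences between file paths and attachment URLs. The following sketch assumes the list of LFS paths has already been obtained elsewhere and that each line of the .ndjson text is valid JSON. The function name and report keys are hypothetical, and the duplicate check is an extra that follows from the "exactly one DocumentReference per file" criterion.

```python
import json
from collections import Counter


def correlate(lfs_paths: list[str], docref_ndjson: str) -> dict[str, list[str]]:
    """Compare LFS file paths with DocumentReference attachment URLs."""
    urls = Counter()
    for line in docref_ndjson.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for content in record.get("content", []):
            url = content.get("attachment", {}).get("url")
            if url:
                urls[url] += 1
    files, seen = set(lfs_paths), set(urls)
    return {
        "files_without_docref": sorted(files - seen),
        "docrefs_without_file": sorted(seen - files),
        "duplicate_urls": sorted(u for u, n in urls.items() if n > 1),
    }
```

Validation would pass this step only when all three lists are empty.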
4. Project-level Referential Checks
- Validates that DocumentReference resources reference the same ResearchStudy via relatesTo or other linking mechanisms.
- If FHIR resources like Patient, Specimen, ServiceRequest, Observation are present, ensures:
  - Their id fields are unique.
  - DocumentReference correctly refers to those resources (e.g., via subject or related fields).
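The id-uniqueness part of this step can be illustrated with a simple counter over one file's records. This is a sketch with a hypothetical function name; it assumes every non-blank line is a JSON object.

```python
import json
from collections import Counter


def duplicate_ids(ndjson_text: str) -> list[str]:
    """Return ids that appear on more than one line of an .ndjson file."""
    counts = Counter(
        json.loads(line).get("id")
        for line in ndjson_text.splitlines()
        if line.strip()
    )
    # Records without an id are counted under None and ignored here.
    return sorted(i for i, n in counts.items() if i is not None and n > 1)
```

Running it once per file (Patient.ndjson, Specimen.ndjson, and so on) matches the per-resource-type scope of FHIR ids.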
5. Cross-Entity Consistency
- If multiple optional FHIR .ndjson files exist:
  - Confirms IDs referenced in one file exist in others.
  - Detects dangling references (e.g., a DocumentReference.patient ID that's not in Patient.ndjson).
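Dangling-reference detection could be sketched as below. It assumes references use the relative FHIR form "ResourceType/id" inside a "reference" key (absolute URLs and contained resources are not handled) and that every non-blank line is a JSON object. The function names and the message wording are hypothetical.

```python
import json
from typing import Iterator


def iter_references(node) -> Iterator[str]:
    """Yield every 'reference' string found anywhere inside a resource."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def dangling_references(files: dict[str, str]) -> list[str]:
    """files maps a META/ file name to its .ndjson text; returns error lines."""
    records = [
        (name, lineno, json.loads(line))
        for name, text in files.items()
        for lineno, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]
    # Every "ResourceType/id" pair defined anywhere in META/.
    defined = {f"{r.get('resourceType')}/{r.get('id')}" for _, _, r in records}
    errors = []
    for name, lineno, record in records:
        for ref in iter_references(record):
            if ref not in defined:
                errors.append(f'ERROR META/{name} line {lineno}: reference "{ref}" is not defined')
    return errors
```

Collecting all defined ids first and resolving references in a second pass keeps the check independent of file order.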
✅ Example Error Output
ERROR META/DocumentReference.ndjson line 4: url "data/some_missing.bam" does not resolve to an existing file
ERROR META/Specimen.ndjson line 2: id "specimen-123" referenced in Observation.ndjson but not defined
🎯 Purpose & Benefits
- Ensures all files and metadata are in sync before submission.
- Prevents submission failures due to missing pointers or invalid FHIR payloads.
- Enables CI integration, catching issues early in the development workflow.
💡 Recommendation
Incorporate g3t meta validate (or its replacement, name still to be decided ❓) into pre-commit hooks or CI pipelines to enforce metadata integrity and maintain standards compliance.
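One way to wire the validator into a hook is a small wrapper that runs the command and propagates its exit code. The sketch below is illustrative only: the command name is taken from the Syntax section and may change, and it assumes the validator signals failure with a non-zero exit code, which this document does not state explicitly.

```python
import subprocess
import sys
from typing import Sequence

# Assumption: command name as given in the Syntax section; it may be renamed.
VALIDATE_CMD = ("g3t", "meta", "validate")


def run_validation(cmd: Sequence[str] = VALIDATE_CMD) -> int:
    """Run the validator and return its exit code (non-zero should block the commit)."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        print(f"validator not found: {cmd[0]}", file=sys.stderr)
        return 127  # conventional shell code for "command not found"
```

A script saved as .git/hooks/pre-commit (and made executable) could end with `sys.exit(run_validation())`; a CI job could call the same function.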
Recommended Setup Workflow
git init
git lfs install
git lfs track "data/*"
mkdir META
# Add ResearchStudy.ndjson and DocumentReference.ndjson
git add .gitattributes META/ data/
git commit -m "Initial project structure with metadata and tracked files"
Validation Requirements
Automated tools or CI processes must:
- Verify presence of META/ResearchStudy.ndjson with at least one record.
- Verify presence of META/DocumentReference.ndjson with one record per LFS-managed file.
- Confirm every DocumentReference.content.attachment.url matches an existing file path.
- Check proper .ndjson formatting.
Example Minimal Project
my-project/
├── .gitattributes
├── META/
│ ├── ResearchStudy.ndjson # 1 record
│ ├── DocumentReference.ndjson # 2 records, one per file below
├── data/
│ ├── sample1.bam
│ ├── sample2.fastq.gz
Conclusion
This structure enables reproducible, FAIR-aligned management of research files and metadata, supporting FHIR-compatible submissions and standardized project organization.