9 changes: 9 additions & 0 deletions .gitmodules
@@ -0,0 +1,9 @@
[submodule "products/funnel"]
path = products/funnel
url = https://github.com/ohsu-comp-bio/funnel.git
[submodule "products/grip"]
path = products/grip
url = https://github.com/bmeg/grip.git
[submodule "products/git-drs"]
path = products/git-drs
url = https://github.com/calypr/git-drs.git
19 changes: 19 additions & 0 deletions docs/.nav.yaml
@@ -0,0 +1,19 @@

nav:
- index.md
- getting-started
- Funnel:
- funnel/*
- GRIP:
- grip/*
- Git-DRS: []
- CALYPR:
- Getting Started:
- requirements.md
- getting-started.md
- Data Management:
- data-management/git-drs.md
- data-management/meta-data.md
- data-model/integration.md
- data-model/introduction.md
- data-model/metadata.md
Binary file added docs/assets/banner.png
Binary file added docs/assets/banner_fade.png
Binary file added docs/assets/calypr_family.png
Binary file added docs/assets/funnel.png
Binary file added docs/assets/git-drs.png
Binary file added docs/assets/grip.png
1 change: 1 addition & 0 deletions docs/calypr/.nav.yaml
@@ -0,0 +1 @@
title: CALYPR
File renamed without changes.
@@ -12,12 +12,12 @@ There are two ways to request the addition additional users to the project:
To give another user full access to the project, run the following:

```sh
g3t collaborator add --write user-can-write@example.com
calypr-admin collaborator add --write user-can-write@example.com
```

Alternatively, to give another user read access only (without the ability to upload to the project), run the following:
```sh
g3t collaborator add user-read-only@example.com
calypr-admin collaborator add user-read-only@example.com
```


@@ -16,8 +16,8 @@
* Only users with the steward role can approve and sign a request

```text
g3t collaborator approve --help
Usage: g3t collaborator approve [OPTIONS]
calypr-admin collaborator approve --help
Usage: calypr-admin collaborator approve [OPTIONS]

Sign an existing request (privileged).

@@ -40,9 +40,9 @@ Note: This example uses the ohsu program, but the same process applies to all pr

```text
## As an admin, I need to grant data steward privileges add the requester reader and updater role on a program to an un-privileged user
g3t collaborator add add data_steward_example@<institution>.edu --resource_path /programs/<program_name>/projects --steward
calypr-admin collaborator add add data_steward_example@<institution>.edu --resource_path /programs/<program_name>/projects --steward
# As an admin, approve that request
g3t collaborator approve
calypr-admin collaborator approve



@@ -2,12 +2,12 @@
title: Creating a Project
---

{% include '/note.md' %}


## CLI

```bash
$ g3t init --help
$ git-drs init --help

Usage: g3t init [OPTIONS] [PROJECT_ID]

@@ -1,11 +1,11 @@
# Common Errors

## .ndjson is out of date
**Error:** After `g3t` adding and committing a file, when you go to submit your data, "DocumentReference.ndjson is out of date",
**Error:** After `git-drs` adding and committing a file, when you go to submit your data, "DocumentReference.ndjson is out of date",
```sh
$ g3t add file.txt
$ g3t commit -m "adding file.txt"
$ g3t push
$ git add file.txt
$ git commit -m "adding file.txt"
$ git push
Please correct issues before pushing.
Command `g3t status` failed with error code 1, stderr: WARNING: DocumentReference.ndjson is out of date 1969-12-31T16:00:00. The most recently changed file is MANIFEST/file.txt.dvc 2025-02-28T09:24:46.283870. Please check DocumentReferences.ndjson
No data file changes.
@@ -43,8 +43,7 @@ Depending on if a `patient` or `specimen` flag was specified, other resources ca
To add a cram file that's associated with a subject, sample, and particular task

```sh
g3t add myfile.cram --patient P0 --specimen P0-BoneMarrow --task_id P0-Sequencing
g3t meta init
git add myfile.cram --patient P0 --specimen P0-BoneMarrow --task_id P0-Sequencing
```

This will produce metadata with the following relationships:
@@ -54,8 +53,8 @@ This will produce metadata with the following relationships:
When the project is committed, the system will validate new or changed records. You may validate the metadata on demand by:

```sh
$ g3t meta validate --help
Usage: g3t meta validate [OPTIONS] DIRECTORY
$ forge meta validate --help
Usage: forge meta validate [OPTIONS] DIRECTORY

Validate FHIR data in DIRECTORY.

69 changes: 69 additions & 0 deletions docs/calypr/data-management/git-drs.md
@@ -0,0 +1,69 @@

## **3.5: Commit and Upload your files**

```
# Commit files (creates DRS records via pre-commit hook)
git commit -m "Add genomic data files"
```

```
# Upload to object store
git push
```

What happens during push:

1. Git-DRS creates DRS records for each tracked file
2. Files are uploaded to the configured S3 bucket
3. DRS URIs are registered in the Gen3 system
4. Pointer files are committed to the repository

### 3.5.1 Verifying upload

```
git lfs ls-files
```

Files should now show \* prefix (localized/uploaded):

```
* data/sample1.bam
* data/sample2.bam
* results/analysis.vcf.gz
```

A \- in place of \* indicates that only the LFS pointer is present locally and the file content has not been downloaded.

After completing the workflow:

* Files visible in Git repository (as LFS pointers)
* DRS records created (check .drs/ logs)
* Files accessible via git lfs pull
* Can share DRS URIs with collaborators
* Files NOT searchable in CALYPR web interface (expected)

## 4.5: Committing Changes

```
# Stage all changes
git add .
```

```
# Commit (triggers forge precommit hook)
git commit -m "Register S3 files with custom FHIR metadata"
```

```
# Push to register DRS records
git push
```

What happens during push:

1. Git-DRS creates DRS records pointing to S3
2. DRS URIs are registered
3. No file upload occurs
4. Pointer files committed to repository
174 changes: 174 additions & 0 deletions docs/calypr/data-management/meta-data.md
@@ -0,0 +1,174 @@
# Managing Metadata

Metadata in CALYPR is formatted using the Fast Healthcare Interoperability Resources (FHIR) schema. If you bring your own FHIR newline-delimited JSON (NDJSON) data, create a directory called “META” at the root of your git-drs repository (the directory in which you initialized the repository) and place your metadata files there.
The META/ folder contains newline-delimited JSON (.ndjson) files representing FHIR resources that describe the project, its data, and related entities. Large files are tracked using Git LFS, and each data file must have a corresponding DocumentReference resource. This structure keeps large research data files and their FHIR metadata under version control in a DRS- and FHIR-compatible format.
Each file must contain only one FHIR resource type; for example, META/ResearchStudy.ndjson contains only ResearchStudy resources. The file name does not have to match the resource type name, with one exception: if you bring your own document references, the file must be named DocumentReference.ndjson. For all other resource types, matching the file name to the resource type is simply good organizational practice.
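
The one-resource-type-per-file rule can be checked before committing. The following is a minimal sketch, not part of forge or git-drs; the function name `resource_types` and the sample records are hypothetical.

```python
import json
from collections import Counter


def resource_types(ndjson_text: str) -> Counter:
    """Count the resourceType of every non-blank line in NDJSON text."""
    counts = Counter()
    for line in ndjson_text.splitlines():
        if line.strip():
            counts[json.loads(line).get("resourceType", "<missing>")] += 1
    return counts


# Hypothetical file content that mixes two resource types.
sample = "\n".join([
    json.dumps({"resourceType": "ResearchStudy", "id": "study-1"}),
    json.dumps({"resourceType": "Patient", "id": "patient-1"}),
])

counts = resource_types(sample)
is_single_type = len(counts) == 1  # a well-formed META file has exactly one type
print(dict(counts), "single type:", is_single_type)
```

A file that passes this check has exactly one key in `counts`.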

## META/ResearchStudy.ndjson

* The root of the file directory structure is based on the first ResearchStudy in the file. This is the research study to which the autogenerated document references are connected. Any additional research studies are ignored when populating the miller table file tree.
* Contains at least one FHIR ResearchStudy resource describing the project.
* Defines project identifiers, title, description, and key attributes.

## META/DocumentReference.ndjson

* Contains one FHIR DocumentReference resource per Git LFS-managed file.
* Each DocumentReference.content.attachment.url field must exactly match the relative path of the corresponding file in the repository.

Example:

    {
      "resourceType": "DocumentReference",
      "id": "docref-file1",
      "status": "current",
      "content": [
        {
          "attachment": {
            "url": "data/file1.bam",
            "title": "BAM file for Sample X"
          }
        }
      ]
    }
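
A record like this can be generated from the file path itself, which keeps the url and the path from drifting apart. This is a sketch under assumptions: the helper `document_reference` and its `docref-<stem>` id scheme are hypothetical, and only the fields shown in the example are populated.

```python
import json
from pathlib import PurePosixPath


def document_reference(relative_path: str, title: str) -> dict:
    """Build a minimal DocumentReference whose url is the repo-relative path."""
    path = PurePosixPath(relative_path)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a repository-relative path, got {relative_path!r}")
    return {
        "resourceType": "DocumentReference",
        "id": "docref-" + path.stem,  # hypothetical id scheme, for illustration only
        "status": "current",
        "content": [{"attachment": {"url": str(path), "title": title}}],
    }


record = document_reference("data/file1.bam", "BAM file for Sample X")
ndjson_line = json.dumps(record)  # NDJSON stores one compact JSON object per line
print(ndjson_line)
```

Each line produced this way would be appended to META/DocumentReference.ndjson.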

Place your custom FHIR ndjson files in the META/ directory:

    # Copy your prepared FHIR metadata
    cp ~/my-data/patients.ndjson META/
    cp ~/my-data/observations.ndjson META/
    cp ~/my-data/specimens.ndjson META/
    cp ~/my-data/document-references.ndjson META/

## Other FHIR data

\[TODO More intro text here\]

* Patient.ndjson: Participant records.
* Specimen.ndjson: Biological specimens.
* ServiceRequest.ndjson: Requested procedures.
* Observation.ndjson: Measurements or results.
* Other valid FHIR resource types as required.

## Link Files to Metadata

Ensure your FHIR DocumentReference resources reference the DRS URIs:

Example DocumentReference linking to S3 file:

    {
      "resourceType": "DocumentReference",
      "id": "doc-001",
      "status": "current",
      "content": [{
        "attachment": {
          "url": "drs://calypr-public.ohsu.edu/your-drs-id",
          "title": "sample1.bam",
          "contentType": "application/octet-stream"
        }
      }],
      "subject": {
        "reference": "Patient/patient-001"
      }
    }


---

## Validating Metadata

To ensure that the FHIR files you have added to the project are correct and pass schema checking, you can use the forge software.

    forge validate

Successful output:

    ✓ Validating META/patients.ndjson... OK
    ✓ Validating META/observations.ndjson... OK
    ✓ Validating META/specimens.ndjson... OK
    ✓ Validating META/document-references.ndjson... OK
    All metadata files are valid.

Fix any validation errors and re-run until all files pass.


### Forge Data Quality Assurance Command Line Commands

If you have provided your own FHIR resources, two commands can help ensure that your FHIR metadata will appear on the CALYPR data platform as expected: validate and check-edge.

**Validate** example:

`forge validate META` or `forge validate META/DocumentReference.ndjson`

Validate checks whether the provided directory or file will be accepted by the CALYPR data platform or rejected with validation errors. Validation errors range from improper JSON formatting to FHIR schema violations. CALYPR currently uses FHIR version R5, so resources written for earlier FHIR versions may not validate against its schema.

**Check-edge** example:

`forge check-edge META` or `forge check-edge META/DocumentReference.ndjson`

Check-edge emulates what happens to your FHIR files during data submission, when they are loaded into a graph database. To build the graph, edges are generated from the references in your FHIR data. These edges connect the vertices, which are the resources in the NDJSON files you provided.

Check-edge verifies that the references in your files connect to known vertices and are not ‘orphaned’. It does not take into account vertices that already exist in the CALYPR graph, so it may report that an edge does not connect to anything when the target vertex is in CALYPR but outside the data supplied to the check.
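
The idea behind an orphan check can be sketched in a few lines. This is an illustration of the concept only, not forge's implementation; the helpers `iter_references` and `find_orphans` and the sample resources are hypothetical, and references are assumed to use the `ResourceType/id` form.

```python
def iter_references(node):
    """Yield every string stored under a "reference" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def find_orphans(resources):
    """Return (source, target) pairs whose target is not among the given resources."""
    known = {f"{r['resourceType']}/{r['id']}" for r in resources}
    orphans = []
    for r in resources:
        source = f"{r['resourceType']}/{r['id']}"
        for target in iter_references(r):
            if target not in known:
                orphans.append((source, target))
    return orphans


# Hypothetical vertices: doc-001 points at a patient that was not provided.
resources = [
    {"resourceType": "Patient", "id": "patient-001"},
    {"resourceType": "Specimen", "id": "specimen-001",
     "subject": {"reference": "Patient/patient-001"}},
    {"resourceType": "DocumentReference", "id": "doc-001",
     "subject": {"reference": "Patient/patient-999"}},
]
orphans = find_orphans(resources)
print(orphans)
```

Because `known` is built only from the supplied resources, a reference to a vertex that already exists in CALYPR would also be reported here, which mirrors the limitation described above.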

### Validation Process

#### 1\. Schema Validation

* Each .ndjson file in META/ (like ResearchStudy.ndjson, DocumentReference.ndjson, etc.) is read line by line.
* Every line is parsed as JSON and checked against the corresponding FHIR schema for that resourceType.
* Syntax errors, missing required fields, or invalid FHIR values trigger clear error messages with line numbers.
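
The line-by-line approach can be sketched as follows. This is a simplified stand-in, not forge's validator: it checks only JSON syntax and two fields (`resourceType` and `id`) chosen for illustration, whereas real validation applies the full FHIR schema. The function name `check_ndjson` and the message format are assumptions.

```python
import json


def check_ndjson(text: str, name: str) -> list:
    """Return one error message per bad line, tagged with its line number."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"ERROR {name} line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"ERROR {name} line {number}: expected a JSON object")
            continue
        for field in ("resourceType", "id"):  # illustrative subset of required fields
            if field not in record:
                errors.append(f"ERROR {name} line {number}: missing {field}")
    return errors


sample = "\n".join([
    '{"resourceType": "Patient", "id": "patient-001"}',
    '{"resourceType": "Patient", "id": ',   # truncated JSON
    '{"resourceType": "Patient"}',          # no id
])
errors = check_ndjson(sample, "META/Patient.ndjson")
for message in errors:
    print(message)
```

Reporting the line number alongside the file name makes it possible to jump straight to the offending record in a large NDJSON file.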

#### 2\. Mandatory Files Presence

* Confirms that:
  * ResearchStudy.ndjson exists and has at least one valid record.
  * DocumentReference.ndjson exists and contains at least one record.
* If either is missing or empty, validation fails.

#### 3\. One-to-One Mapping of Files to DocumentReference

* Scans the working directory for Git LFS-managed files in expected locations (e.g., data/).
* For each file, locates a corresponding DocumentReference resource whose content.attachment.url matches the file’s relative path.
* Validates:
  * All LFS files have a matching DocumentReference.
  * All DocumentReferences point to existing files.
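
Both directions of this check reduce to a set comparison. The sketch below assumes the list of LFS-managed paths and the list of attachment urls have already been collected; the function name `compare_files_to_docrefs` and the sample paths are hypothetical.

```python
def compare_files_to_docrefs(lfs_paths, docref_urls):
    """Return (files without a DocumentReference, urls without a file)."""
    files, urls = set(lfs_paths), set(docref_urls)
    return sorted(files - urls), sorted(urls - files)


# Hypothetical inputs: one file lacks metadata, one url points at nothing.
lfs_paths = ["data/sample1.bam", "data/sample2.bam"]
docref_urls = ["data/sample1.bam", "data/some_missing.bam"]

missing_docrefs, missing_files = compare_files_to_docrefs(lfs_paths, docref_urls)
print("files without DocumentReference:", missing_docrefs)
print("DocumentReferences without file:", missing_files)
```

The mapping is one-to-one only when both returned lists are empty.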

#### 4\. Project-level Referential Checks

* Validates that DocumentReference resources reference the same ResearchStudy via relatesTo or other linking mechanisms.
* If FHIR resources like Patient, Specimen, ServiceRequest, Observation are present, ensures:
  * Their id fields are unique.
  * DocumentReference correctly refers to those resources (e.g., via subject or related fields).

#### 5\. Cross-Entity Consistency

* If multiple optional FHIR .ndjson files exist:
  * Confirms IDs referenced in one file exist in others.
  * Detects dangling references (e.g., a DocumentReference.subject reference to a Patient ID that is not in Patient.ndjson).

---

#### ✅ Example Error Output

    ERROR META/DocumentReference.ndjson line 4: url "data/some_missing.bam" does not resolve to an existing file
    ERROR META/Specimen.ndjson line 2: id "specimen-123" referenced in Observation.ndjson but not defined

---

#### 🎯 Purpose & Benefits

* Ensures all files and metadata are in sync before submission.
* Prevents submission failures due to missing pointers or invalid FHIR payloads.
* Enables CI integration, catching issues early in the development workflow.

---

#### Validation Requirements

Automated tools or CI processes must:

* Verify presence of META/ResearchStudy.ndjson with at least one record.
* Verify presence of META/DocumentReference.ndjson with one record per LFS-managed file.
* Confirm every DocumentReference.content.attachment.url matches an existing file path.
* Check proper .ndjson formatting.

---
File renamed without changes.
File renamed without changes.
File renamed without changes.