@matthewpeterkort
Collaborator

Continuation PR, I suppose.

@quinnwai

quinnwai commented Nov 11, 2025

Sorry, I can't put this initial set of comments inline with the code because most of the diffs are already merged...

Feedback

  • createIDFromStrings: Maybe we use a namespace consistent across all deployments. api endpoint shouldn’t be used here, maybe use calypr.org for cross-platform consistency. Can defer to #79 ig
  • ndjson name determines Resource loaded in implementation, user manual claims otherwise
  • Pulling in all indexd records for a project is good for a first commit, but as files are edited the repo becomes a proper subset of the indexd project, so the records need to be subsetted on the set of files already in the repo (see metadata/meta.go:287, “for drsRecord := range drsRecordsChan”), no? How do we balance this with the no files only metadata use case where there's no files to subset on?
  • Why do we choose to upsert instead of nuke + reupload?
    • Isn’t this upsert just a replace (other than the .Content field)?
  • What’s the tea with the schema written directly in again?
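
To make the subsetting question above concrete, here is a minimal sketch, not Forge's actual code. `DRSRecord` and `subsetToRepo` are hypothetical names, and it assumes the checksums of the files currently tracked in the repo can be collected into a set beforehand.

```go
package main

import "fmt"

// DRSRecord is a hypothetical, trimmed-down stand-in for an indexd record.
type DRSRecord struct {
	ID     string
	SHA256 string
}

// subsetToRepo keeps only the records whose checksum matches a file that is
// currently tracked in the repo. An empty (or nil) repoHashes set yields no
// records, which corresponds to the metadata-only case.
func subsetToRepo(records []DRSRecord, repoHashes map[string]bool) []DRSRecord {
	var kept []DRSRecord
	for _, r := range records {
		if repoHashes[r.SHA256] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	project := []DRSRecord{
		{ID: "rec-1", SHA256: "aaa"}, // older version, no longer in the repo
		{ID: "rec-2", SHA256: "bbb"}, // current version
	}
	repo := map[string]bool{"bbb": true}
	for _, r := range subsetToRepo(project, repo) {
		fmt.Println(r.ID)
	}
}
```

With no files in the repo the filter returns nothing, so a caller could fall back to loading only the supplied metadata.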

Testing (in progress)

Checklist

  • existing files, no metadata (git-drs first, testing forge only): Take existing Git DRS project and create forge metadata, see that it shows up with right IDs and shit
  • new files, no metadata (forge only): Create project from scratch and add files to it and push up see that it shows up with the right IDs and shit
  • s3 files, metadata (git drs + forge): Take existing add-url data and push
  • Open Q: are these supported?
    • metadata (forge only): Pushing up a project that has no files only metadata? (SMMART)
    • autogen metadata at first, then supplied metadata (forge only)

Push files only

Testing Script

#!/bin/bash
set -e
set -x  # Print each command as it's executed

# Set repo name and remote URL
REPO_NAME="git-drs-e2e-test"
GIT_USER="cbds"
# REMOTE_URL="https://source.ohsu.edu/$GIT_USER/$REPO_NAME"
REMOTE_URL="git@source.ohsu.edu:$GIT_USER/$REPO_NAME.git"

# Clean up if rerunning (don't fail if not removable)
rm -rf "$REPO_NAME" || true

# Create directory and initialize git
mkdir "$REPO_NAME"
cd "$REPO_NAME"


# Step 1: Initialize Repository
git init
# git drs init --url https://caliper-training.ohsu.edu/ --profile local --cred ~/.gen3/credentials-local.json --project cbds-git_drs_test --bucket cbds
forge init --url https://calypr-dev.ohsu.edu/ --profile local --cred ~/.gen3/credentials-calypr-dev.json --project cbds-git_drs_test --bucket cbds

# set branch / add remote
git branch -M main
git remote add origin "$REMOTE_URL"

git lfs track "*.greeting"
git add .gitattributes

# Step 2: Create and Commit Initial Files
mkdir -p data/A data/B data/C

DATE="hello $(date)"
# Quote the variable so the date's whitespace is written as-is
echo "$DATE" > data/A/simple.greeting
echo "$DATE" > data/B/simple.greeting
echo "$DATE" > data/C/simple.greeting

git add data/
git commit -m "Initial commit: Add .greeting files with 'hello'"

# First push: set the upstream so the later plain `git push` works
git push -f -u origin main

# Step 3: Update and Commit File Changes
echo "A" >> data/A/simple.greeting
echo "B" >> data/B/simple.greeting
echo "C" >> data/C/simple.greeting

git add data/
git commit -m "Update .greeting files with folder-specific greetings"
git push
forge publish

# Clone and pull
git clone "$REMOTE_URL"
cd "$REPO_NAME"
# git drs init --profile calypr-dev
# git lfs pull

Bugs

  • Expect that you need to download the Git-DRS binary separately? If so, provide README instructions for setup (GPT can be good at this sometimes w some prompting). Solved by adding git-drs to PATH.
$ git commit -m "Initial commit: Add .greeting files with 'hello'"
git: 'drs' is not a git command. See 'git --help'.

The most similar command is
	lfs
  • Status unknown is unclear to folks, any way to provide them some feedback on the job that gets run?
    Response: {"uid":"d71db60e-1849-43cd-ac25-3cf713ebfe7d","name":"fhir-import-export-mqlur","status":"Unknown"}

  • Expect that some checks are being done so that if you have unstaged or uncommitted changes, there's an error / enter to confirm / at minimum a warning on forge publish $GH_PAT saying that local changes haven't made it to remote, CALYPR will be updated with remote?

  • Running into error with BulkAddRaw. Current state: HTAN and SMMART loaded with newest GRIP image + db setup, tried to forge publish above script.

{"level":"info","msg":"Resource List for method 'create': [/programs/HTAN_INT/projects/BForePC /programs/cbds/projects/git_drs_test /programs/cbds/projects/smmart_labkey_demo]","time":"2025-11-11T00:34:35Z"}
{"data":null,"graph":"CALYPR","level":"info","msg":"[200] project-delete on project cbds-git_drs_test","status":200,"time":"2025-11-11T00:34:35Z"}
[GIN] 2025/11/11 - 00:34:35 | 200 |  412.934667ms |       10.42.0.1 | DELETE   "CALYPR/proj-delete/cbds-git_drs_test"
{"level":"info","msg":"Resource List for method 'create': [/programs/HTAN_INT/projects/BForePC /programs/cbds/projects/git_drs_test /programs/cbds/projects/smmart_labkey_demo]","time":"2025-11-11T00:34:35Z"}
{"level":"info","msg":"Using hostname: caliper-training.ohsu.edu","time":"2025-11-11T00:34:35Z"}
2025/11/11 00:34:36 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
2025/11/11 00:34:36 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
2025/11/11 00:34:36 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
2025/11/11 00:34:36 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:Directory/d8bbbd4f-d4e6-5de7-a5c9-aa55ab170ec3]] id:9be385e6-9018-532f-86f7-598ee467a306 name:/ resourceType:Directory]","time":"2025-11-11T00:34:36Z"}
2025/11/11 00:34:36 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:DocumentReference/e6e20336-7b3b-55b0-bc47-8e5052675e45] map[reference:DocumentReference/ca39509a-47d4-5fee-9d8f-67bccb2c9fd7]] id:47b1dd3d-20ca-5419-8f78-456a70e6c9ae name:A resourceType:Directory]","time":"2025-11-11T00:34:36Z"}
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:DocumentReference/a2e6cf36-5a90-5bdf-b1d4-05a932ca32d2]] id:58c4771f-6790-5e3d-b5a9-cfdd2a58432d name:C resourceType:Directory]","time":"2025-11-11T00:34:36Z"}
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:DocumentReference/05cf3b44-9531-5cce-aa54-74f24cd1bb4b]] id:9eaa9ce9-cf64-57bf-a402-d311d553cc31 name:B resourceType:Directory]","time":"2025-11-11T00:34:36Z"}
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:Directory/47b1dd3d-20ca-5419-8f78-456a70e6c9ae] map[reference:Directory/9eaa9ce9-cf64-57bf-a402-d311d553cc31] map[reference:Directory/58c4771f-6790-5e3d-b5a9-cfdd2a58432d]] id:d8bbbd4f-d4e6-5de7-a5c9-aa55ab170ec3 name:data resourceType:Directory]","time":"2025-11-11T00:34:36Z"}
{"graph":"CALYPR","level":"info","msg":"[500] bulk-load-raw [validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found]\n","status":500,"time":"2025-11-11T00:34:36Z"}
[GIN] 2025/11/11 - 00:34:36 | 500 |  627.896792ms |       10.42.0.1 | POST     "CALYPR/bulk-load-raw/cbds-git_drs_test"
{"data":200,"graph":"","level":"info","msg":"[200] healthy _status","status":200,"time":"2025-11-11T00:34:42Z"}
[GIN] 2025/11/11 - 00:34:42 | 200 |     331.125µs |       10.42.0.1 | GET      "_status"
{"data":200,"graph":"","level":"info","msg":"[200] healthy _status","status":200,"time":"2025-11-11T00:34:52Z"}
[GIN] 2025/11/11 - 00:34:52 | 200 |     324.833µs |       10.42.0.1 | GET      "_status"
{"level":"info","msg":"Resource List for method 'create': [/programs/HTAN_INT/projects/BForePC /programs/cbds/projects/git_drs_test /programs/cbds/projects/smmart_labkey_demo]","time":"2025-11-11T00:34:59Z"}
{"data":null,"graph":"CALYPR","level":"info","msg":"[200] project-delete on project cbds-git_drs_test","status":200,"time":"2025-11-11T00:34:59Z"}
[GIN] 2025/11/11 - 00:34:59 | 200 |    368.7995ms |       10.42.0.1 | DELETE   "CALYPR/proj-delete/cbds-git_drs_test"
{"level":"info","msg":"Resource List for method 'create': [/programs/HTAN_INT/projects/BForePC /programs/cbds/projects/git_drs_test /programs/cbds/projects/smmart_labkey_demo]","time":"2025-11-11T00:35:00Z"}
{"level":"info","msg":"Using hostname: caliper-training.ohsu.edu","time":"2025-11-11T00:35:00Z"}
2025/11/11 00:35:01 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
2025/11/11 00:35:01 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:DocumentReference/05cf3b44-9531-5cce-aa54-74f24cd1bb4b]] id:9eaa9ce9-cf64-57bf-a402-d311d553cc31 name:B resourceType:Directory]","time":"2025-11-11T00:35:01Z"}
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:Directory/47b1dd3d-20ca-5419-8f78-456a70e6c9ae] map[reference:Directory/9eaa9ce9-cf64-57bf-a402-d311d553cc31] map[reference:Directory/58c4771f-6790-5e3d-b5a9-cfdd2a58432d]] id:d8bbbd4f-d4e6-5de7-a5c9-aa55ab170ec3 name:data resourceType:Directory]","time":"2025-11-11T00:35:01Z"}
2025/11/11 00:35:01 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
2025/11/11 00:35:01 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:DocumentReference/a2e6cf36-5a90-5bdf-b1d4-05a932ca32d2]] id:58c4771f-6790-5e3d-b5a9-cfdd2a58432d name:C resourceType:Directory]","time":"2025-11-11T00:35:01Z"}
2025/11/11 00:35:01 compile error: failing loading "file:///data/Directory": open /data/Directory: no such file or directory
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:DocumentReference/e6e20336-7b3b-55b0-bc47-8e5052675e45] map[reference:DocumentReference/ca39509a-47d4-5fee-9d8f-67bccb2c9fd7]] id:47b1dd3d-20ca-5419-8f78-456a70e6c9ae name:A resourceType:Directory]","time":"2025-11-11T00:35:01Z"}
{"error":"class 'Directory' not found","level":"error","msg":"BulkAddRaw: validation error for Directory: map[child:[map[reference:Directory/d8bbbd4f-d4e6-5de7-a5c9-aa55ab170ec3]] id:9be385e6-9018-532f-86f7-598ee467a306 name:/ resourceType:Directory]","time":"2025-11-11T00:35:01Z"}
{"graph":"CALYPR","level":"info","msg":"[500] bulk-load-raw [validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found validation failed for Directory: class 'Directory' not found]\n","status":500,"time":"2025-11-11T00:35:01Z"}
[GIN] 2025/11/11 - 00:35:01 | 500 |  688.373667ms |       10.42.0.1 | POST     "CALYPR/bulk-load-raw/cbds-git_drs_test"

@quinnwai quinnwai left a comment

Code is really organized! Very clear.

See first round of comments above on bugs I encountered + unexpected behavior

@matthewpeterkort
Collaborator Author

matthewpeterkort commented Nov 11, 2025

UUID namespace -> host-name is done on purpose to differentiate IDs belonging to the same data on different gen3 instances.

I changed this awhile back and nobody raised any dissent back then. I would have to change some server code to correct this, and I'd need a real good argument from multiple people as to why I should do this

ndjson name determines Resource loaded in implementation, user manual claims otherwise

I disagree. Can you show me where you get that from?

How do we balance this with the no files only metadata use case where there's no files to subset on?

Yes so this command doesn't do anything if there exists no drs records, which is fine, because then it defaults back to the etl_pod behavior which simply looks at the uploaded metadata files and loads them into CALYPR like it has always done.

Why do we choose to upsert instead of nuke + reupload?

because git+indexd is the source of truth. I don't want to overwrite what is in the bucket. Let the user control what is in the bucket, not the server code. Also, some of the operations done in the Forge META init command would make the files non R5 FHIR compliant, which the user probably doesn't want.

What’s the tea with the schema written directly in again?

The schema is written into the package because that way the user doesn't have to know which version of the schema is running on the server. That is abstracted out

Status unknown is unclear to folks, any way to provide them some feedback on the job that gets run?
Response: {"uid":"d71db60e-1849-43cd-ac25-3cf713ebfe7d","name":"fhir-import-export-mqlur","status":"Unknown"}

Yeah not a big fan of this either, but I think I need some type of message that says that the job made it to the sower server.

I also think I need to build this out a little more to add a command that checks the status of the job + returns the logs on job finish like what was done in g3t

Expect that some checks are being done so that if you have unstaged or uncommitted changes, there's an error / enter to confirm / at minimum a warning on forge publish $GH_PAT saying that local changes haven't made it to remote, CALYPR will be updated with remote?

This sounds like a git-drs thing to me. what are you trying to say?

Running into error with BulkAddRaw

I will take a look. I assume you're using the test script posted?

@quinnwai

quinnwai commented Nov 12, 2025

UUID namespace -> host-name is done on purpose to differentiate IDs belonging to the same data on different gen3 instances.

I changed this awhile back and nobody raised any dissent back then. I would have to change some server code to correct this, and I'd need a real good argument from multiple people as to why I should do this

Okay, good call today in mentioning that this UUID generation is specific to ResearchStudy. So why do we base this UUID on the apiEndpoint / deployment? Doesn't that defeat the purpose of making metadata cross-deployment like we were discussing during the Wed morning dev meeting?
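
To illustrate what is at stake in the namespace choice, here is a minimal sketch of name-based (version 5) UUIDs using only the standard library. It is not how createIDFromStrings is implemented: `resourceID` is a hypothetical stand-in, the key string is made up for illustration, and deriving the namespace from a hostname via the DNS namespace is an assumption.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// nsDNS is the standard RFC 4122 DNS namespace UUID
// (6ba7b810-9dad-11d1-80b4-00c04fd430c8).
var nsDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuid5 computes a name-based (version 5, SHA-1) UUID for name within ns.
func uuid5(ns [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	var u [16]byte
	copy(u[:], h.Sum(nil))      // keep the first 16 of the 20 hash bytes
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return u
}

// format renders a UUID in the canonical 8-4-4-4-12 hex form.
func format(u [16]byte) string {
	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:]
}

// resourceID is a hypothetical helper: the namespace is derived from a
// hostname, and the ID from a key inside that namespace.
func resourceID(namespaceHost, key string) string {
	ns := uuid5(nsDNS, namespaceHost)
	return format(uuid5(ns, key))
}

func main() {
	key := "ResearchStudy/cbds-git_drs_test" // illustrative key only
	for _, host := range []string{"calypr-dev.ohsu.edu", "caliper-training.ohsu.edu", "calypr.org"} {
		fmt.Printf("%-28s %s\n", host, resourceID(host, key))
	}
}
```

The same key yields a different ID under each hostname, so a per-deployment namespace keeps instances distinct, while a single shared namespace such as calypr.org would give the same ID everywhere.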

ndjson name determines Resource loaded in implementation, user manual claims otherwise

I disagree. Can you show me where you get that from?

Yes! I think it only matters for DocumentReferences; the other resources are fine (source code):

docRefFP := filepath.Join(fhirDirectory, DOCUMENT_RESOURCE+NDJSON_EXT)

I think it's ok to require a filename, you should just update the user manual to explicitly describe that the rest of your FHIR metadata can be named whatever-the-helly.ndjson, but DocumentReferences are ALWAYS pulled from DocumentReference.ndjson.
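
A minimal sketch of the behavior described above, for the user manual to lean on. The constant values and the "META" directory name are assumptions; only the `filepath.Join(fhirDirectory, DOCUMENT_RESOURCE+NDJSON_EXT)` pattern comes from the source line quoted above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const (
	documentResource = "DocumentReference" // assumed value of DOCUMENT_RESOURCE
	ndjsonExt        = ".ndjson"           // assumed value of NDJSON_EXT
)

// docRefPath mirrors the quoted line: the DocumentReference file is looked up
// under one fixed name inside the FHIR metadata directory.
func docRefPath(fhirDirectory string) string {
	return filepath.Join(fhirDirectory, documentResource+ndjsonExt)
}

// isDocRefFile reports whether path has the fixed file name that would be
// read for DocumentReferences.
func isDocRefFile(path string) bool {
	return filepath.Base(path) == documentResource+ndjsonExt
}

func main() {
	fmt.Println(docRefPath("META")) // "META" is an assumed directory name
	fmt.Println(isDocRefFile("META/my-documents.ndjson"))
}
```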

How do we balance this with the no files only metadata use case where there's no files to subset on?

Yes so this command doesn't do anything if there exists no drs records, which is fine, because then it defaults back to the etl_pod behavior which simply looks at the uploaded metadata files and loads them into CALYPR like it has always done.

Cool. Confirmed this works

Why do we choose to upsert instead of nuke + reupload?

because git+indexd is the source of truth. I don't want to overwrite what is in the bucket. Let the user control what is in the bucket, not the server code. Also, some of the operations done in the Forge META init command would make the files non R5 FHIR compliant, which the user probably doesn't want.

Okay let me try and write this in a separate comment cause it ties into my testing of files only type of upload.

What’s the tea with the schema written directly in again?

The schema is written into the package because that way the user doesn't have to know which version of the schema is running on the server. That is abstracted out

Ok, interesting, but then you have to copy-paste it. Can't you just refer to some version / commit that has this?

Status unknown is unclear to folks, any way to provide them some feedback on the job that gets run?
Response: {"uid":"d71db60e-1849-43cd-ac25-3cf713ebfe7d","name":"fhir-import-export-mqlur","status":"Unknown"}

Yeah not a big fan of this either, but I think I need some type of message that says that the job made it to the sower server.

I also think I need to build this out a little more to add a command that checks the status of the job + returns the logs on job finish like what was done in g3t

Thanks I see those changes. Could you add some docs / docstrings on what each is supposed to do? Additionally, you could write some info about how to monitor your job using these new commands. Here is my testing along with bugs I encountered thus far:

Work as expected

$ forge list
Uid: 3adbe999-2b1e-4605-b851-0d862eb879b5 	 Name: fhir-import-export-cebpk 	 Status: Completed
Uid: 77a68837-2f14-4081-a58a-d34fdac350ad 	 Name: fhir-import-export-fcnjy 	 Status: Completed
Uid: 194a083e-bc1a-4ffe-8ec6-682324abf2a4 	 Name: fhir-import-export-xxksx 	 Status: Failed
Uid: c5dc1052-efc0-49e0-8a8b-cfce1eb94329 	 Name: fhir-import-export-znaia 	 Status: Completed

Confusing output: job executes and succeeds but logs say it failed to dispatch job

$ forge publish $GH_PAT
Error: failed to dispatch job: &{ef4c164d-e5d7-4b18-9424-bda801c68bb4 fhir-import-export-eyfgt Unknown}: %!w(<nil>)

$ forge status $GH_PAT
Error: failed to check authz, response body: &{500 Internal Server Error 500 HTTP/2.0 2 0 map[Access-Control-Allow-Headers:[DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Cookie,X-CSRF-Token] Access-Control-Allow-Methods:[GET, POST, OPTIONS, DELETE, PUT] Access-Control-Allow-Origin:[*] Access-Control-Expose-Headers:[Content-Length,Content-Range] Content-Length:[66] Content-Type:[text/plain; charset=utf-8] Date:[Wed, 12 Nov 2025 21:34:14 GMT] Server:[nginx] Strict-Transport-Security:[max-age=63072000; includeSubdomains;] X-Content-Type-Options:[nosniff nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[1; mode=block]] {0xc001502300} 66 [] false false map[] 0xc001d16000 0xc001432900}
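
The `%!w(<nil>)` suffix in the forge publish output above is what Go's fmt package prints when `%w` is given a nil error, which suggests the error is being built even though dispatch succeeded. A minimal sketch of one way to guard against that follows. `JobStatus` and `reportDispatch` are hypothetical names, not Forge's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// JobStatus mirrors the three fields visible in the sower response.
type JobStatus struct {
	UID    string
	Name   string
	Status string
}

// errExample stands in for a real transport error in this sketch.
var errExample = errors.New("connection refused")

// reportDispatch only builds an error when one actually occurred; otherwise
// it reports that the job was accepted, whatever its initial status is.
func reportDispatch(job *JobStatus, err error) (string, error) {
	if err != nil {
		return "", fmt.Errorf("failed to dispatch job: %w", err)
	}
	if job == nil {
		return "", errors.New("failed to dispatch job: empty response")
	}
	return fmt.Sprintf("job %s (%s) accepted, status: %s", job.Name, job.UID, job.Status), nil
}

func main() {
	job := &JobStatus{
		UID:    "ef4c164d-e5d7-4b18-9424-bda801c68bb4",
		Name:   "fhir-import-export-eyfgt",
		Status: "Unknown",
	}

	// Wrapping a nil error reproduces the confusing message from the output.
	var noErr error
	fmt.Println(fmt.Errorf("failed to dispatch job: %v: %w", job, noErr))

	// With the guard, a successful dispatch is reported as such.
	msg, err := reportDispatch(job, nil)
	fmt.Println(msg, err)
}
```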

Expect that some checks are being done so that if you have unstaged or uncommitted changes, there's an error / enter to confirm / at minimum a warning on forge publish $GH_PAT saying that local changes haven't made it to remote, CALYPR will be updated with remote?

This sounds like a git-drs thing to me. what are you trying to say?

I think this should be checked in forge publish, as it may not be the case that the user goes through the right flow before the forge publish (eg publishing even tho unpushed commits; publishing even tho files staged but not committed/pushed) at least a warning message being able to say "not all files are committed / pushed" or "Warning: you have staged files that aren't committed. Publishing the latest branch on " or otherwise.
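
A minimal sketch of what such a pre-publish check could look like, assuming it shells out to `git status --porcelain=v1 --branch` and only warns. `publishWarnings` is a hypothetical helper, not an existing forge or git-drs function.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// publishWarnings inspects `git status --porcelain=v1 --branch` output and
// returns warnings about local state that has not reached the remote.
func publishWarnings(porcelain string) []string {
	var staged, unstaged, untracked int
	var sawBranch, hasUpstream, ahead bool
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) < 3 {
			continue
		}
		switch {
		case strings.HasPrefix(line, "## "): // branch header line
			sawBranch = true
			hasUpstream = strings.Contains(line, "...")
			ahead = strings.Contains(line, "[ahead ")
		case strings.HasPrefix(line, "??"): // untracked file
			untracked++
		default: // "XY path": X is the index status, Y the worktree status
			if line[0] != ' ' {
				staged++
			}
			if line[1] != ' ' {
				unstaged++
			}
		}
	}
	var warnings []string
	if staged > 0 {
		warnings = append(warnings, fmt.Sprintf("%d staged change(s) are not committed", staged))
	}
	if unstaged > 0 {
		warnings = append(warnings, fmt.Sprintf("%d unstaged change(s) are not committed", unstaged))
	}
	if untracked > 0 {
		warnings = append(warnings, fmt.Sprintf("%d untracked file(s) are not in the repo", untracked))
	}
	if ahead {
		warnings = append(warnings, "local commits have not been pushed to the remote")
	}
	if sawBranch && !hasUpstream {
		warnings = append(warnings, "current branch has no upstream")
	}
	return warnings
}

func main() {
	out, err := exec.Command("git", "status", "--porcelain=v1", "--branch").Output()
	if err != nil {
		fmt.Println("could not run git status:", err)
		return
	}
	for _, w := range publishWarnings(string(out)) {
		fmt.Println("Warning:", w)
	}
}
```

Keeping the parsing in a pure function means the same check could live in either forge or git-drs, which leaves the ownership question open.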

A related question: This forge publish allows you to spam out identical jobs. Thinking of a malicious forge user or even an unknowing forge user, we should enforce some guardrails / checks for this. This can be a V2 or considered out of scope so long as we have it in an issue somewhere.

Running into error with BulkAddRaw

I will take a look. I assume you're using the test script posted?

Thanks for checking and trying, I fixed my local env with a make update and nuke GRIP

@matthewpeterkort
Collaborator Author

matthewpeterkort commented Nov 12, 2025

Doesn't that defeat the purpose of making metadata cross-deployment like we were discussing during the Wed morning dev meeting?

Once you get into the forge layer you've entered a DNS-specific system, just like the PostgreSQL DBs, which don't work across instances either. They're DNS-specific to the system.

Yes! I think it only matters for DocumentReferences; the other resources are fine (source code):

Thanks, updated internal docs to reflect this

Ok, interesting, but then you have to copy-paste it. Can't you just refer to some version / commit that has this?

It could be automated via a makefile to track a certain version but it will always have to live in Forge in some capacity otherwise the user will have to specify it and that is unrealistic for the user to know that

"not all files are committed / pushed" or "Warning: you have staged files that aren't committed. Publishing the latest branch on " or otherwise.

You could make the same argument for git-drs. Why Forge and not git-drs ?

Job spamming

You could do it in g3t as well you just had to get more creative and run them as background processes using &. This will always be a problem, until we are able to send our jobs to ARC and track billing, etc.

As for the sower workflow commands -- I did a copy-paste and forgot to update the instructions.
I'm not sure what the workflow should be on that. I don't like a hanging terminal that pings sower for a status and waits for success or termination, like what g3t does, because it ties up my terminal and I have to open a new tab. Maybe the average user doesn't care though -- open to opinions on this. Going to keep it pure to the API for now, ex: https://github.com/uc-cdis/sower/blob/master/openapis/openapi.yaml

@quinnwai

When I tested the files only use case, I ran into two problems:

  1. When I push duplicate files to indexd, they don't show up on the Explorer when forge published
  2. When I update files in Git DRS, those files show up independently on the directory viewer even tho they're in the same path

1 is a Git DRS problem because we changed it so each indexd record is defined by a sha instead of a path. It is a question of "how do we define what a single record is in indexd?" Because each record is defined by a sha, file path information does not make it into the metadata (document references). This is work that I need to do to improve Git DRS.

2 could be either a Git-DRS and/or a Forge problem. To my understanding, right now Forge creates a new DocRef for each indexd record. As a result, any change to a file (say file.txt with hash h_0) produces a new hash (say h_1) and hence a new indexd record. So in the explorer / directory viewer you'll have file.txt at h_0 and file.txt at h_1. To my understanding, we want to see only the most recent version of the file (i.e. the state of the git repo, not the entire version history as described by indexd). Again, this isn't a problem if each indexd record is defined by a path instead of a sha, but I'm reading up on this to determine what Walsh has to say about it.
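
A minimal sketch of the "most recent version only" behavior described in point 2, under the assumption that each record carries a path and a creation timestamp. `FileRecord` and `latestPerPath` are hypothetical, and as noted in point 1 the path is not currently available on the indexd side.

```go
package main

import (
	"fmt"
	"sort"
)

// FileRecord is a hypothetical record that carries both a path and a hash.
type FileRecord struct {
	Path        string
	SHA256      string
	CreatedUnix int64 // creation time in Unix seconds
}

// latestPerPath keeps, for every path, only the most recently created record,
// so one file shows up once instead of once per version.
func latestPerPath(records []FileRecord) []FileRecord {
	latest := make(map[string]FileRecord)
	for _, r := range records {
		cur, seen := latest[r.Path]
		if !seen || r.CreatedUnix > cur.CreatedUnix {
			latest[r.Path] = r
		}
	}
	out := make([]FileRecord, 0, len(latest))
	for _, r := range latest {
		out = append(out, r)
	}
	// Sort by path so the result is stable across runs.
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	records := []FileRecord{
		{Path: "data/A/simple.greeting", SHA256: "h_0", CreatedUnix: 100},
		{Path: "data/A/simple.greeting", SHA256: "h_1", CreatedUnix: 200},
		{Path: "data/B/simple.greeting", SHA256: "h_2", CreatedUnix: 100},
	}
	for _, r := range latestPerPath(records) {
		fmt.Println(r.Path, r.SHA256)
	}
}
```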

@matthewpeterkort
Collaborator Author

matthewpeterkort commented Nov 13, 2025

When I push duplicate files to indexd, they don't show up on the Explorer when forge published

Great. Please provide steps to reproduce.

When I update files in Git DRS, those files show up independently on the directory viewer even tho they're in the same path

Sounds like you're on the right track here. More of a git-drs design related decision I suppose.

@matthewpeterkort
Collaborator Author

env setup:

Helm:

peterkor@RNB11238 gen3-helm % git branch
  feature/aws-frontend-framework
  feature/fix-mongodb
  feature/grip
* feature/grip-optim
  feature/grip-updates
  feature/image-viewer
  feature/reroute-grip-writer
  feature/ss-improvements
  ohsu-develop
peterkor@RNB11238 gen3-helm % 
peterkor@RNB11238 gen3-helm % git log 
commit 9fa71ad2f592b407dfef47db33d5f9c6557ea69e (HEAD -> feature/grip-optim, origin/feature/grip-optim)
Author: matthewpeterkort <matthewpeterkort@gmail.com>
Date:   Tue Nov 11 08:42:22 2025 -0800

    update initdb file

Images:

guppy:
  image:
    repository: quay.io/ohsu-comp-bio/guppy-es7
    tag: "guppy-fix-refresh"
    pull_policy: Always
gecko:
  enabled: true
  image:
    repository: quay.io/ohsu-comp-bio/gecko
    pullPolicy: Always
    tag: "feature_read-directory"

ETL POD: quay.io/ohsu-comp-bio/aced-etl:fix_dataframer-new-ds

@quinnwai quinnwai left a comment

Overall

Thanks for the changes. The CLI is really clear from a formatting / structure perspective, but I still ran into some usage issues during testing.

Started with a simple test to push to calypr-dev. Ran into issues: some things are still hardcoded to remote = origin, so if you could retest / patch with the structure below that would be helpful.

git clone <repo-url>  #used monorepo here but just testing CLI atm
cd <repo>
git drs init
git drs remote add gen3 <profile> <flags>
forge publish   #get id from here
forge status <id>
forge output <id>

The general comments and inline comments below should cover the problems I ran into. I will continue testing more deeply (i.e. set up local for testing, use repos w metadata) after this round of feedback. Lmk if you have Qs

Vet CLI Commands

$ ./forge --help
Forge is a versatile CLI application designed to streamline various
development and project management tasks.

Usage:
  forge [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  config      Build skeleton template for CALYPR explorer page config.
  empty       empty metadata for a project
  help        Help about any command
  list        view all of the jobs currently catalogued in sower
  meta        Autogenerate metadata based off of files that have been uploaded
  output      view output logs of a specific job on sower
  ping        Ping Calypr instance and return user's project and user permissions
  publish     create metadata upload job for FHIR ndjson files
  status      view the status of a specific job on sower
  validate    Contains subcommands for validating config, data, and edges

Flags:
  -h, --help   help for forge

Use "forge [command] --help" for more information about a command.
  1. hide/remove completion, not sure what that is
  2. A lot of commands are clear bc they're verbs. Some are less clear; consider changing those, eg prepending build, create, generate, etc to meta and config. Otherwise be really clear in the markdown docs as to what goes where
  3. Check that all commands return useful help strings on incorrect args, eg ./forge status returns
    Error: accepts between 1 and 2 arg(s), received 0. See [git drs here](calypr/git-drs@9512ce3) for inspo

Bash script

Would be useful to normalize a general e2e script not even in Go but just in bash. Also helps Claude get context on how things are run and will help orient Claude on the following point of building docs…

Build Docs

I’ve had decent success doing a few-shot with claude to build docs: giving it quick description of order of CLI operations (eg .sh script or within prompt), asking it to analyze code to understand commands, and suggest how to build docs. Then refining it before implementing. Would be extremely useful as there’s some nice stuff like forge status and forge output that would be helpful to know when to use it and why.

Setup

For workflows that require both forge and Git DRS, what is the setup? Just download both binaries separately?

Also, how do we manage dependency drift? Eg a user might use Git DRS 1.0 but then only be on Forge 0.6 and run into problems. Not a problem now in MVP but important to consider as we progress.

profile := cfg.Servers.Gen3.Auth.Profile
if profile == "" {
return nil, fmt.Errorf("No gen3 profile specified. Please provide a gen3Profile key in your .drsconfig")
gfc, ok := cfg.Remotes[remote]

what does gfc stand for

cmd/meta/main.go Outdated
Short: "Autogenerate metadata based off of files that have been uploaded",
Long: `Not needed for expected user workflow. Useful for debugging server side operations only.`,
Example: "forge meta",
Example: "forge meta [remote]",

an opinion but should apply flag instead of optional param to match git drs

also, nitpick because it's a non-user-facing command, but I'm thinking that forge meta isn't clear enough on what is being done. It needs some verb or action. Some spitballed ideas:

forge add-meta
forge generate
forge create-meta

On the contrary, I like the brevity of forge meta tho, but it then needs to be backed up by documentation

forge meta is fine honestly now there's docs in #12

return id, nil
}

type LSFIles struct {

Fix the typo and any references: the I is capitalized, it should just be

LSFiles

addressed in #12

This was referenced Jan 8, 2026
* update to recent git-drs

* update all cmds to use default_remote

* hide unneeded  completion cmd

* add untracked stuff