-
Notifications
You must be signed in to change notification settings - Fork 0
Feature/mirror g3t #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Sorry initial set of comments can't put them inline with the code cause most of the diffs are already merged... Feedbacks
Testing (in progress)Checklist
Push files onlyTesting ScriptBugs
|
quinnwai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is really organized! Very clear.
See first round of comments above on bugs I encountered + unexpected behavior
|
UUID namespace -> host-name is done on purpose to differentiate IDs belonging to the same data on different gen3 instances. I changed this awhile back and nobody raised any dissent back then. I would have to change some server code to correct this, and I'd need a real good argument from multiple people as to why I should do this
I disagree. Can you show me where do you get that from?
Yes so this command doesn't do anything if there exists no drs records, which is fine, because then it defaults back to the etl_pod behavior which simply looks at the uploaded metadata files and loads them into CALYPR like it has always done.
because git+indexd is the source of truth. I don't want to overwrite what is in the bucket. Let the user control what is in the bucket, not the server code. Also, some of the operations done in the Forge META init command would make the files non R5 FHIR compliant, which the user probably doesn't want.
The schema is written into the package because that way the user doesn't have to know which version of the schema is running on the server. That is abstracted out
Yeah not a big fan of this either, but I think I need some type of message that says that the job made it to the sower server. I also think I need to built this out a little more to add a command that checks the status of the job + returns the logs on job finish like what was done in g3t
This sounds like a git-drs thing to me. what are you trying to say?
I will take a look I assume you're using the test script posted? |
Okay good call today in mentioning this UUID generation is specific to ResearchStudy. So explain why do we make this UUID based on the apiEndpoint / deployment? Doesn't that defeat the purpose of making metadata cross-deployment like we were discussing during the Wed morning dev meeting?
Yes! I think it only matters for DocumentReferences the other resources are fine (source code): I think it's ok to require a filename, you should just update the user manual to explicitly describe that the rest of your FHIR metadata can be named
Cool. Confirmed this works
Okay let me try and write this in a separate comment cause it ties into my testing of files only type of upload.
Ok interesting but you have to copy paste it you can just refer to some version / commit that has this?
Thanks I see those changes. Could you add some docs / docstrings on what each is supposed to do? Additionally, you could write some info about how to monitor your job using these new commands. Here is my testing along with bugs I encountered thus far: Work as expected Confusing output: job executes and succeeds but logs say it failed to dispatch job
I think this should be checked in forge publish, as it may not be the case that the user goes through the right flow before the forge publish (eg publishing even tho unpushed commits; publishing even tho files staged but not committed/pushed) at least a warning message being able to say "not all files are committed / pushed" or "Warning: you have staged files that aren't committed. Publishing the latest branch on " or otherwise. A related question: This forge publish allows you to spam out identical jobs. Thinking of a malicious forge user or even an unknowing forge user, we should enforce some guardrails / checks for this. This can be a V2 or considered out of scope so long as we have it in an issue somewhere.
Thanks for checking and trying, I fixed my local env with a make update and nuke GRIP |
Once you get into the forge layer you've entered a DNS specific system. Just like the PostgresSQL dbs don't work across instances either. They're DNS specific to the system
Thanks, updated internal docs to reflect this
It could be automated via a makefile to track a certain version but it will always have to live in Forge in some capacity otherwise the user will have to specify it and that is unrealistic for the user to know that
You could make the same argument for git-drs. Why Forge and not git-drs ?
You could do it in g3t as well you just had to get more creative and run them as background processes using &. This will always be a problem, until we are able to send our jobs to ARC and track billing, etc. As for the sower workflow commands -- I did a copy pasta and forget to update the instructions. |
|
When I tested the files only use case, I ran into two problems:
1 is a Git DRS problem because we changed it so each indexd record is defined by a sha instead of a path. It is a question of "how do we define what a single record is in indexd"?. Because each record is defined by a sha, no any file path information will not make it to the metadata (document references). This is work that I need to do to improve Git DRS. 2 could be either a Git-DRS and/or a Forge problem. To my understanding, right now Forge creates an new DocRef for each indexd record. As a result, any changes to a file (say file.txt with hash h_0) has a new hash (say h_1) and hence creates a new indexd record. So in the explorer / directory viewer you'll have file.txt h_0 and file.txt at h_1. To my understanding, we want to see the most recent version of the file only (ie the state of the git repo, not the entire version history as described by indexd). Again this isn't a problem if each indexd record is defined by a path instead of a sha, but I'm reading up on this to determine what Walsh has to say about it |
Great. Pleaase provide steps to reproduce.
Sounds like you're on the right track here. More of a git-drs design related decision I suppose. |
|
env setup: Helm: Images: ETL POD: quay.io/ohsu-comp-bio/aced-etl:fix_dataframer-new-ds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall
Thanks for the changes, CLI is really clear from a formatting / structure perspective but still some usage details during testing.
Started with a simple test to push to calypr-dev. Ran into issues, still some things hardcoded to remote = origin so if you could retest / patch with the following structure below that would be helpful.
git clone <repo-url> #used monorepo here but just testing CLI atm
cd <repo>
git drs init
git drs remote add gen3 <profile> <flags>
forge publish #get id from here
forge status <id>
forge output <id>The general comments and inline comments below should cover the problems I ran into are below. I will continue testing more deeply (ie setup local for testing, use repos w metadata) after this round of feedback. Lmk if you have Qs
Vet CLI Commands
$ ./forge --help
Forge is a versatile CLI application designed to streamline various
development and project management tasks.
Usage:
forge [command]
Available Commands:
completion Generate the autocompletion script for the specified shell
config Build skeleton template for CALYPR explorer page config.
empty empty metadata for a project
help Help about any command
list view all of the jobs currently catalogued in sower
meta Autogenerate metadata based off of files that have been uploaded
output view output logs of a specific job on sower
ping Ping Calypr instance and return user's project and user permissions
publish create metadata upload job for FHIR ndjson files
status view the status of a specific job on sower
validate Contains subcommands for validating config, data, and edges
Flags:
-h, --help help for forge
Use "forge [command] --help" for more information about a command.
- hide/remote completion, not sure what that is
- A lot of commands are clear bc they're verbs. Some less clear consider making changes for those that aren't, eg prepending build, create, generate, etc to
metaandconfig. Otherwise being really clear in markdown docs as to what goes where - Check that all docs have useful help strings when incorrect args, eg
./forge statusreturns
Error: accepts between 1 and 2 arg(s), received 0. See [git drs here](calypr/git-drs@9512ce3) for inspo
Bash script
Would be useful to normalize a general e2e script not even in Go but just in bash. Also helps Claude get context on how things are run and will help orient Claude on the following point of building docs…
Build Docs
I’ve had decent success doing a few-shot with claude to build docs: giving it quick description of order of CLI operations (eg .sh script or within prompt), asking it to analyze code to understand commands, and suggest how to build docs. Then refining it before implementing. Would be extremely useful as there’s some nice stuff like forge status and forge output that would be helpful to know when to use it and why.
Setup
For workflows that require both forge and Git DRS, what is the setup? Just download both binaries separately?
Also, how do we manage dependency drift? Eg a user might use Git DRS 1.0 but then only be on Forge 0.6 and run into problems. Not a problem now in MVP but important to consider as we progress.
| profile := cfg.Servers.Gen3.Auth.Profile | ||
| if profile == "" { | ||
| return nil, fmt.Errorf("No gen3 profile specified. Please provide a gen3Profile key in your .drsconfig") | ||
| gfc, ok := cfg.Remotes[remote] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does gfc stand for
cmd/meta/main.go
Outdated
| Short: "Autogenerate metadata based off of files that have been uploaded", | ||
| Long: `Not needed for expected user workflow. Useful for debugging server side operations only.`, | ||
| Example: "forge meta", | ||
| Example: "forge meta [remote]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an opinion but should apply flag instead of optional param to match git drs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, nitpick because non-user facing command but thinking that forge meta isn't clear enough on what is being done. Need some verb or action Some spitballed ideas:
forge add-meta
forge generate
forge create-meta
On contrary, I like the brevity of forge meta tho but it needs to then be backed up by documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forge meta is fine honestly now there's docs in #12
| return id, nil | ||
| } | ||
|
|
||
| type LSFIles struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo and any references, I is capitalized should just be
LSFiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in #12
* update to recent git-drs * update all cmds to use default_remote * hide unneeded completion cmd * add untracked stuff
continuation PR I supposed