Add automatic Pandera schema generator #20
base: main
- Create PanderaGenerator class in matrix_schema/generators/panderagen.py
- Generate schemas for MatrixNode, MatrixEdge, UnionedNode, UnionedEdge
- Integrate with Makefile via gen-pandera target (always runs)
- Maintain PySpark compatibility with proper ArrayType nullable=False for list items
- Preserve existing validation patterns (enum checks, unique constraints)
- Auto-generate from LinkML schema with proper formatting

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
matentzn
left a comment
AWeeeeeesommmmmmmeeee
THANKS!
Should the main Makefile be manually edited? It seems the cookiecutter template should update it?
…rated schema so that we get a better diff on the PR
matentzn
left a comment
Thank you, I love it. I'm assuming you won't merge before the QC failures are dealt with :P
      return DataFrameSchema(
          columns={
    -         "id": Column(T.StringType(), nullable=False),
    +         "id": Column(T.StringType(), nullable=True),
Since you ordered the output now, why are there so many changes to the schema? For example, nullable=True seems like a big change.
I'll figure out whether the LinkML is wrong or the schema generator is wrong. One way or another, I think id should clearly be nullable=False.
I think we need to set required: true on a whole bunch of slots
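To make the connection between `required: true` and `nullable=False` concrete, here is a minimal sketch of how a generator could derive the flag. It is not the actual `panderagen.py` code: `render_column`, the plain-dict slot representation, and the `name` and `category` slots are all hypothetical, and every slot is assumed to be a string type for brevity.

```python
# Hypothetical sketch, NOT the real panderagen.py implementation.
# Slots are modelled as plain dicts rather than LinkML SlotDefinition objects.

def render_column(name, slot):
    """Render one Column(...) source line from a LinkML-like slot dict."""
    # A column is nullable unless the slot is explicitly marked required.
    nullable = not slot.get("required", False)
    if slot.get("multivalued", False):
        # For list-valued slots, forbid null items inside the array.
        dtype = "T.ArrayType(T.StringType(), containsNull=False)"
    else:
        dtype = "T.StringType()"
    return f'"{name}": Column({dtype}, nullable={nullable}),'


# Illustrative slots only; the real schema's slots may differ.
slots = {
    "id": {"required": True},
    "name": {},
    "category": {"required": True, "multivalued": True},
}

lines = [render_column(name, slot) for name, slot in slots.items()]
print("\n".join(lines))
```

Under this assumption, any slot without `required: true` in the LinkML source comes out as `nullable=True`, which would explain the change to `id` seen in the diff.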
…red slot, ran gen-project
…id, for multivalued fields, and generate the enum checks in a more generic way in panderagen.py
Summary
- … make gen-pandera target

Key Features
- … nullable=False for list items
- .PHONY target ensures fresh generation on every run

Test plan
- make gen-pandera generates all four schema functions
- … nullable=False for list items

🤖 Generated with Claude Code