Add automatic Pandera schema generator #20

kevinschaper · 2025-09-04T01:52:53Z

Summary

Implement LinkML-to-Pandera schema generator using Jinja templates
Generate schemas for MatrixNode, MatrixEdge, UnionedNode, UnionedEdge classes
Integrate with Makefile build system via make gen-pandera target

Key Features

Auto-generates from LinkML schema: No more manual Pandera schema maintenance
PySpark compatibility: Proper ArrayType with nullable=False for list items
Enum validation: Preserves existing validation for predicates, categories, etc.
Consistent formatting: Clean, properly indented output matching project style
Always regenerates: .PHONY target ensures fresh generation on every run
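
As a rough illustration of the generation step these features describe, the sketch below renders a single Pandera column entry from a simplified slot description. It uses Python's standard `string.Template` in place of the Jinja templates the PR mentions, and the `Slot` dataclass, `render_column` helper, and the `synonyms` slot name are hypothetical, not the actual `PanderaGenerator` API. The `False` passed as the second argument to `T.ArrayType` is PySpark's `containsNull` parameter, which is presumably what "nullable=False for list items" refers to.

```python
from dataclasses import dataclass
from string import Template


# Hypothetical, simplified stand-in for a LinkML slot definition.
@dataclass
class Slot:
    name: str
    required: bool = False
    multivalued: bool = False


# One line of the generated DataFrameSchema "columns" mapping.
COLUMN_TEMPLATE = Template('"$name": Column($dtype, nullable=$nullable),')


def render_column(slot: Slot) -> str:
    """Render one column entry of a generated schema as source text."""
    if slot.multivalued:
        # The second positional argument of ArrayType is containsNull,
        # so False means the list items themselves may not be null.
        dtype = "T.ArrayType(T.StringType(), False)"
    else:
        dtype = "T.StringType()"
    return COLUMN_TEMPLATE.substitute(
        name=slot.name, dtype=dtype, nullable=not slot.required
    )


print(render_column(Slot("id", required=True)))
print(render_column(Slot("synonyms", multivalued=True)))
```

Only strings are supported here for brevity; a real generator would also need to map other LinkML ranges to their Spark types.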

Test plan

Verify make gen-pandera generates all four schema functions
Confirm generated schemas compile without syntax errors
Check ArrayType fields use nullable=False for list items
Validate enum checks are properly applied
Ensure unique constraints match original patterns
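
The first two checks in this plan could be automated along the following lines. This is a minimal sketch, assuming the generated module is available as source text; the `schema_function_names` helper and the function names in `SAMPLE` are hypothetical, since the PR does not state what the four generated functions are called.

```python
import ast


def schema_function_names(source: str) -> list[str]:
    """Parse generated source and return its top-level function names.

    ast.parse raises SyntaxError if the generated code does not compile,
    so a successful return also covers the "no syntax errors" check.
    """
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


# Stand-in for the generated module; these function names are made up.
SAMPLE = '''
def get_matrix_node_schema():
    return None

def get_matrix_edge_schema():
    return None
'''

print(schema_function_names(SAMPLE))
```

In practice the source would be read from `src/matrix_schema/datamodel/pandera.py` after running `make gen-pandera`, and the returned list compared against the expected four names.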

🤖 Generated with Claude Code

- Create PanderaGenerator class in matrix_schema/generators/panderagen.py
- Generate schemas for MatrixNode, MatrixEdge, UnionedNode, UnionedEdge
- Integrate with Makefile via gen-pandera target (always runs)
- Maintain PySpark compatibility with proper ArrayType nullable=False for list items
- Preserve existing validation patterns (enum checks, unique constraints)
- Auto-generate from LinkML schema with proper formatting

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

matentzn

AWeeeeeesommmmmmmeeee

THANKS!

matentzn · 2025-09-04T04:54:25Z

Makefile

Should the main Makefile be manually edited? It seems the cookiecutter template should update it?

Makefile

matrix_schema/generators/panderagen.py

project.Makefile

src/matrix_schema/datamodel/pandera.py

matentzn

Thank you, I love it. I'm assuming you won't merge before the QC failures are dealt with :P

matentzn · 2025-09-04T22:13:08Z

src/matrix_schema/datamodel/pandera.py

    return DataFrameSchema(
        columns={
-            "id": Column(T.StringType(), nullable=False),
+            "id": Column(T.StringType(), nullable=True),


Since you ordered the output now, why are there so many changes to the schema? For example, nullable=True seems like a big change?

I'll figure out whether the LinkML is wrong or the schema generator is wrong. One way or another, I think id should clearly be nullable=False.

I think we need to set required: true on a whole bunch of slots
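
One plausible reading of the diff above, assuming the generator derives `nullable` directly from each slot's `required` flag: LinkML treats an unset `required` as not required, so any slot without an explicit `required: true` would come out as `nullable=True`. The sketch below shows that mapping with slots represented as plain dictionaries; `column_nullable` is a hypothetical helper, not code from `panderagen.py`.

```python
def column_nullable(slot: dict) -> bool:
    """Return the nullable flag a generator would emit for this slot.

    An absent `required` key is treated as False, mirroring LinkML's
    default, so the column is nullable unless the slot opts in.
    """
    return not slot.get("required", False)


# The id slot without and with an explicit required flag.
id_without_required = {"name": "id", "range": "string"}
id_with_required = {"name": "id", "range": "string", "required": True}

print(column_nullable(id_without_required))
print(column_nullable(id_with_required))
```

Under this assumption the generator is behaving consistently, and the fix belongs in the LinkML schema rather than in the template.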

kevinschaper requested review from eKathleenCarter and matentzn September 4, 2025 02:05

matentzn reviewed Sep 4, 2025

View reviewed changes

move gen-pandera to project.Makefile, preserve order from cli to gene…

4960776

…rated schema so that we get a better diff on the PR

matentzn approved these changes Sep 4, 2025

View reviewed changes

matentzn reviewed Sep 4, 2025

View reviewed changes

kevinschaper added 2 commits September 4, 2025 16:22

updated the pyproject toml to match poetry 2.x needs, made id a requi…

4422937

…red slot, ran gen-project

update enum checks to check each value of a list to ensure that's val…

fc97224

…id, for multivalued fields, and generate the enum checks in a more generic way in panderagen.py
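
The per-value enum check this commit describes can be sketched in plain Python as follows. The `all_values_permitted` helper and the example values are hypothetical, and treating a missing list as passing is an assumption made here so that nullability stays the job of the column's `nullable` flag; the actual check in the generated schema is expressed against PySpark columns and may differ.

```python
from typing import Iterable, Optional


def all_values_permitted(
    values: Optional[Iterable[str]], permitted: set[str]
) -> bool:
    """Check every element of a multivalued field against an enum.

    A scalar membership test would compare the whole list to the enum;
    here each item is tested individually.
    """
    if values is None:
        # Assumption: nulls are handled by the column's nullable flag.
        return True
    return all(value in permitted for value in values)


# Made-up permissible values, for illustration only.
PERMITTED = {"alpha", "beta"}

print(all_values_permitted(["alpha", "beta"], PERMITTED))
print(all_values_permitted(["alpha", "gamma"], PERMITTED))
```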
