Expose custom models via SDK #96
Introduces Python SDK classes and methods for managing model-based custom entities, which allow users to define NER models by refining annotation guidelines with LLM-in-the-loop, upload test/training data, and train encoder models.

New files:
- tonic_textual/classes/model_entity.py: Core classes (ModelEntity, ModelEntityVersion, TrainedModel) with methods for test data upload, ground truth annotation, training, and model activation
- tonic_textual/services/model_entity.py: Service layer for CRUD operations

Modified:
- tonic_textual/redact_api.py: Added convenience methods for model entity management (create, get, list, delete)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The training files endpoint returns a paginated dict with a 'records' key, unlike the test files endpoint, which returns a list directly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
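One way to absorb the two response shapes is a small normalizer that callers run on either payload. The sketch below is an illustration only, not the SDK's code: `extract_records` is a hypothetical helper, and the only detail taken from the commit message is the 'records' key.

```python
from typing import Any, Dict, List, Union

Payload = Union[List[Dict[str, Any]], Dict[str, Any]]


def extract_records(payload: Payload) -> List[Dict[str, Any]]:
    """Return file records from either a bare list or a paginated dict.

    Hypothetical helper; not part of tonic_textual.
    """
    if isinstance(payload, list):
        # Test-files shape: the response body is already the list.
        return payload
    if isinstance(payload, dict):
        # Training-files shape: the list sits under the 'records' key.
        records = payload.get("records", [])
        if not isinstance(records, list):
            raise TypeError("'records' is expected to hold a list")
        return records
    raise TypeError(f"unexpected payload type: {type(payload).__name__}")
```

With a helper like this, the code that builds file objects would not need to know which endpoint produced the payload.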
    if files_with_spans and wait_for_processing:
        # Wait for files to be processed before saving ground truth
        self._wait_for_files_ready(file_ids, timeout_seconds=processing_timeout)

    for file_id, spans in files_with_spans:
        self._save_ground_truth(file_id, spans)
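The body of `_wait_for_files_ready` does not appear in this snippet. A polling loop with a deadline is one common way to write such a wait; the sketch below assumes a caller-supplied `is_ready` predicate and a `poll_interval` parameter, both hypothetical, and makes no claim about how the SDK actually checks file status.

```python
import time
from typing import Callable, Iterable


def wait_for_files_ready(
    file_ids: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout_seconds: float = 300.0,
    poll_interval: float = 2.0,
) -> None:
    """Block until is_ready() is true for every file or the deadline passes.

    Hypothetical stand-in for the private method in the diff above.
    """
    pending = set(file_ids)
    deadline = time.monotonic() + timeout_seconds
    while pending:
        # Keep only the files that are still being processed.
        pending = {fid for fid in pending if not is_ready(fid)}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"files not ready after {timeout_seconds}s: {sorted(pending)}"
            )
        time.sleep(poll_interval)
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by system clock adjustments.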
Pull request overview
This PR introduces SDK support for managing model-based custom entities in Tonic Textual, enabling programmatic workflows for training custom NER models with LLM-assisted annotation and guidelines refinement.
Changes:
- Added core classes for model entities, versions, and trained models with full lifecycle management
- Implemented service layer for CRUD operations on model-based entities
- Extended the `TonicTextual` client with convenience methods for entity management
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tonic_textual/classes/model_entity.py | Core domain classes supporting entity creation, test/training data upload, version management, metrics retrieval, and model training workflows |
| tonic_textual/services/model_entity.py | Service layer providing CRUD operations for model entities with API endpoint integration |
| tonic_textual/redact_api.py | Client-level convenience methods delegating to the model entity service |
    with requests.Session() as session:
        data = self.client.http_get(
            f"/api/model-based-entities/{entity_id}",
            session=session,
        )
Copilot
AI
Jan 15, 2026
The get() method creates a new requests.Session for a single HTTP call, which provides no benefit over using the session already managed by the client. This pattern is repeated in list() and creates unnecessary overhead. Consider removing the session context manager unless the client requires it, or reuse a client-managed session.
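A minimal sketch of what the reviewer describes, under an unverified assumption: that the client's `http_get` can be called without a `session` argument and then manages its own connection reuse. The `ModelEntityService` class here is a simplified, hypothetical stand-in that returns the raw response dict.

```python
from typing import Any, Dict


class ModelEntityService:
    """Hypothetical service that leaves session handling to the client."""

    def __init__(self, client: Any) -> None:
        self.client = client

    def get(self, entity_id: str) -> Dict[str, Any]:
        # No per-call requests.Session here: the client decides how
        # connections are pooled and reused (assumption, see lead-in).
        return self.client.http_get(f"/api/model-based-entities/{entity_id}")
```

If the client does require an explicit session, an alternative is to create one session in the service constructor and pass it on every call.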
    if item.get("entityType") == "ModelBased":
        # Fetch full entity data
        entity = self.get(item["id"])
        model_entities.append(entity)
Copilot
AI
Jan 15, 2026
The list() method makes N+1 HTTP requests: one to fetch all entities, then one per model-based entity to fetch full details. For large numbers of entities, this will be slow. Consider whether the /api/custom-entities endpoint can return full entity data, or if a dedicated endpoint exists for listing model-based entities with complete information.
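If no endpoint returns full entity data in one call, a client-side mitigation is to issue the per-entity requests concurrently. This is a different technique from the one the reviewer proposes: it still makes N+1 requests, only in parallel. The sketch assumes the fetch function is safe to call from several threads; `fetch_model_entities`, `get_entity`, and `max_workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def fetch_model_entities(
    items: List[Dict[str, Any]],
    get_entity: Callable[[str], Any],
    max_workers: int = 8,
) -> List[Any]:
    """Fetch full data for every model-based item, several requests at a time."""
    ids = [item["id"] for item in items if item.get("entityType") == "ModelBased"]
    if not ids:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as ids.
        return list(pool.map(get_entity, ids))
```

Inside `list()`, this would be called with the response of the custom-entities request as `items` and `self.get` as `get_entity`.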
    import json as json_module
    endpoint = "training/files" if is_training else "test/files"
    # API expects multipart with 'document' (JSON metadata) and 'file' (content)
    document = json_module.dumps({"fileName": file_name})
Copilot
AI
Jan 15, 2026
The json module is already imported at the module level (line 8), so this local import is redundant and creates naming confusion. Remove this line and use the module-level json import instead.
Suggested change:

    - import json as json_module
    - endpoint = "training/files" if is_training else "test/files"
    - # API expects multipart with 'document' (JSON metadata) and 'file' (content)
    - document = json_module.dumps({"fileName": file_name})
    + endpoint = "training/files" if is_training else "test/files"
    + # API expects multipart with 'document' (JSON metadata) and 'file' (content)
    + document = json.dumps({"fileName": file_name})
    import json as json_module
    endpoint = "training/files" if is_training else "test/files"
    # API expects multipart with 'document' (JSON metadata) and 'file' (content)
    document = json_module.dumps({"fileName": file_name})
Copilot
AI
Jan 15, 2026
Use the module-level json.dumps() instead of json_module.dumps() after removing the redundant import on line 505.
Suggested change:

    - import json as json_module
    - endpoint = "training/files" if is_training else "test/files"
    - # API expects multipart with 'document' (JSON metadata) and 'file' (content)
    - document = json_module.dumps({"fileName": file_name})
    + endpoint = "training/files" if is_training else "test/files"
    + # API expects multipart with 'document' (JSON metadata) and 'file' (content)
    + document = json.dumps({"fileName": file_name})
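For context, the code comment in the snippet says the API expects a multipart body with a 'document' part (JSON metadata) and a 'file' part (content). The sketch below shows one way to assemble those two parts with the module-level `json` import, using the `(filename, content, content_type)` tuple form that `requests` accepts for its `files` argument. The helper name `build_upload_parts` and both content types are assumptions, not taken from the PR.

```python
import json
from typing import Dict, Optional, Tuple

Part = Tuple[Optional[str], object, str]


def build_upload_parts(file_name: str, content: bytes) -> Dict[str, Part]:
    """Build the two multipart parts: JSON metadata and raw file content.

    Hypothetical helper; content types are assumed.
    """
    document = json.dumps({"fileName": file_name})
    return {
        # A part whose filename is None is sent as a plain form field.
        "document": (None, document, "application/json"),
        "file": (file_name, content, "application/octet-stream"),
    }
```

The returned dict could then be passed as `files=` to a `requests` POST against the chosen endpoint.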
c53a73e to 2962353
Tests cover:
- Entity CRUD operations (create, get, list, delete)
- Version management (get latest, list versions)
- Test data upload with ground truth spans
- Version metrics and wait_for_completion
- Guidelines refinement (create new version, get suggestions)
- Training data upload
- Trained model creation and listing
- Full workflow integration test

Tests are skipped by default. Set ENABLE_MODEL_ENTITY_TESTS=1 to run.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add 9 quick API tests that run without LLM (CRUD, versions, data upload)
- Add 7 LLM tests behind ENABLE_MODEL_ENTITY_LLM_TESTS flag
- Track created entity IDs for safe cleanup (only deletes test entities)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
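The environment-flag gating described in these commits can be sketched with the standard library's `unittest`. This is not the PR's test code, which may use a different test framework. The flag names come from the commit messages; the exact-match check against "1", the nesting of the LLM flag under the main flag, and the class and test names are assumptions.

```python
import os
import unittest


def flag_enabled(name: str) -> bool:
    """Return True only when the environment variable is set to '1'."""
    return os.environ.get(name) == "1"


@unittest.skipUnless(
    flag_enabled("ENABLE_MODEL_ENTITY_TESTS"),
    "set ENABLE_MODEL_ENTITY_TESTS=1 to run",
)
class ModelEntityTests(unittest.TestCase):
    def test_quick_api(self) -> None:
        # Placeholder for a CRUD or upload test that needs no LLM.
        self.assertTrue(True)

    @unittest.skipUnless(
        flag_enabled("ENABLE_MODEL_ENTITY_LLM_TESTS"),
        "set ENABLE_MODEL_ENTITY_LLM_TESTS=1 to run",
    )
    def test_llm_workflow(self) -> None:
        # Placeholder for a test that waits on LLM annotation.
        self.assertTrue(True)
```

Because the flags are read when the module is imported, they have to be set in the environment before the test runner starts.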
Summary

Changes

New files:
- `tonic_textual/classes/model_entity.py` - Core classes (`ModelEntity`, `ModelEntityVersion`, `TrainedModel`) with full workflow support
- `tonic_textual/services/model_entity.py` - Service layer for CRUD operations

Modified:
- `tonic_textual/redact_api.py` - Added convenience methods on `TonicTextual` client

Features

Entity Management:
- `create_model_entity()`, `get_model_entity()`, `list_model_entities()`, `delete_model_entity()`

Test Data & Guidelines Refinement:
- `upload_test_data()` - Upload files with ground truth spans
- `version.wait_for_completion()` - Wait for LLM annotation
- `version.get_metrics()` - Get F1/precision/recall scores
- `version.get_suggested_guidelines()` - Get LLM-suggested improvements
- `entity.create_version()` - Create new version with refined guidelines

Training:
- `upload_training_data()` / `upload_training_file()`
- `create_trained_model(version_id)` - Create model with specific guidelines
- `model.start_training()` / `model.wait_for_training()`
- `model.activate()` - Set as active model for entity

Example Usage
Test plan