Expose custom models via SDK #96

gandersteele · 2026-01-15T23:17:19Z

Summary

Adds Python SDK support for model-based custom entities, enabling programmatic management of LLM-in-the-loop NER model training
Introduces classes for entity management, version/guideline refinement, and model training workflows
Wraps existing backend endpoints with no server-side changes required

Changes

New files:

tonic_textual/classes/model_entity.py - Core classes (ModelEntity, ModelEntityVersion, TrainedModel) with full workflow support
tonic_textual/services/model_entity.py - Service layer for CRUD operations

Modified:

tonic_textual/redact_api.py - Added convenience methods on TonicTextual client

Features

Entity Management:

create_model_entity(), get_model_entity(), list_model_entities(), delete_model_entity()

Test Data & Guidelines Refinement:

upload_test_data() - Upload files with ground truth spans
version.wait_for_completion() - Wait for LLM annotation
version.get_metrics() - Get F1/precision/recall scores
version.get_suggested_guidelines() - Get LLM-suggested improvements
entity.create_version() - Create new version with refined guidelines
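
The refinement methods listed above could be combined into a loop along these lines. This is only a sketch: the PR does not show the signatures, so it assumes `get_suggested_guidelines()` returns a string and `create_version()` accepts it as a `guidelines` keyword and returns the new version. `refine_guidelines`, `target_f1`, and `max_rounds` are hypothetical names used for illustration.

```python
from typing import Any


def refine_guidelines(entity: Any, target_f1: float = 0.8, max_rounds: int = 3) -> Any:
    """Create new versions from suggested guidelines until F1 is good enough.

    Assumption (not confirmed by this PR): create_version(guidelines=...)
    returns the newly created version object.
    """
    version = entity.get_latest_version()
    version.wait_for_completion()  # block until LLM annotation finishes
    for _ in range(max_rounds):
        if version.get_metrics().f1_score >= target_f1:
            break
        suggested = version.get_suggested_guidelines()
        version = entity.create_version(guidelines=suggested)
        version.wait_for_completion()
    return version
```

Capping the number of rounds keeps the loop from running indefinitely when the suggested guidelines stop improving the score.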

Training:

upload_training_data() / upload_training_file()
create_trained_model(version_id) - Create model with specific guidelines
model.start_training() / model.wait_for_training()
model.activate() - Set as active model for entity

Example Usage

from tonic_textual.redact_api import TonicTextual

textual = TonicTextual()

# Create entity and upload test data
entity = textual.create_model_entity(
    name='PRODUCT_CODE',
    guidelines='Identify product codes like SKU-12345.'
)
entity.upload_test_data([
    {'text': 'Order SKU-123 shipped.', 'spans': [{'start': 6, 'end': 13}]}
])

# Refine guidelines based on metrics
version = entity.get_latest_version()
version.wait_for_completion()
print(f'F1: {version.get_metrics().f1_score}')

# Train model
entity.upload_training_data([{'text': '...', 'fileName': 'train.txt'}])
model = entity.create_trained_model(version.id)
model.start_training()
model.wait_for_training()
model.activate()
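
Hand-counting character offsets for ground truth spans is error-prone. A small helper like the one below can derive them from the text instead. It is a sketch, not part of the SDK: `find_spans` is a hypothetical name, and it assumes `end` is exclusive, which matches the `SKU-123` example above (start 6, end 13).

```python
from typing import Dict, List


def find_spans(text: str, value: str) -> List[Dict[str, int]]:
    """Return a span for every non-overlapping occurrence of value in text.

    Assumption: 'end' is exclusive, so text[start:end] == value.
    """
    if not value:
        return []
    spans = []
    start = text.find(value)
    while start != -1:
        spans.append({"start": start, "end": start + len(value)})
        start = text.find(value, start + len(value))
    return spans


text = "Order SKU-123 shipped."
record = {"text": text, "spans": find_spans(text, "SKU-123")}
```

The resulting `record` has the same shape as the items passed to `upload_test_data()` in the example.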

Test plan

Tested entity CRUD operations against production API
Verified test data upload with ground truth saves correctly (files show "Reviewed" status)
Confirmed guidelines refinement loop works (F1 improved from 0.6 → 0.73 with refined guidelines)
Validated full training workflow: upload → annotate → train → activate

Introduces Python SDK classes and methods for managing model-based custom entities, which allow users to define NER models by refining annotation guidelines with an LLM in the loop, upload test/training data, and train encoder models.

New files:
- tonic_textual/classes/model_entity.py: Core classes (ModelEntity, ModelEntityVersion, TrainedModel) with methods for test data upload, ground truth annotation, training, and model activation
- tonic_textual/services/model_entity.py: Service layer for CRUD operations

Modified:
- tonic_textual/redact_api.py: Added convenience methods for model entity management (create, get, list, delete)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The training files endpoint returns a paginated dict with 'records' key, unlike test files which returns a list directly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
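
The difference between the two response shapes can be handled with one normalizing helper. The sketch below is illustrative only: `extract_records` is a hypothetical name, and the only detail taken from the commit message is that the paginated response is a dict with a `records` key.

```python
from typing import Any, Dict, List


def extract_records(payload: Any) -> List[Dict[str, Any]]:
    """Normalize a file-listing response to a plain list of records.

    Handles both shapes described in the commit message: a paginated dict
    with a 'records' key, and a bare list.
    """
    if isinstance(payload, dict):
        return list(payload.get("records", []))
    return list(payload)
```

Routing both the training and test file listings through one helper keeps callers from having to know which shape a given endpoint returns.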

tonic_textual/classes/model_entity.py

+        if files_with_spans and wait_for_processing:
+            # Wait for files to be processed before saving ground truth
+            self._wait_for_files_ready(file_ids, timeout_seconds=processing_timeout)
+
+        for file_id, spans in files_with_spans:
+            self._save_ground_truth(file_id, spans)
+


Copilot

Pull request overview

This PR introduces SDK support for managing model-based custom entities in Tonic Textual, enabling programmatic workflows for training custom NER models with LLM-assisted annotation and guidelines refinement.

Changes:

Added core classes for model entities, versions, and trained models with full lifecycle management
Implemented service layer for CRUD operations on model-based entities
Extended the TonicTextual client with convenience methods for entity management

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
tonic_textual/classes/model_entity.py	Core domain classes supporting entity creation, test/training data upload, version management, metrics retrieval, and model training workflows
tonic_textual/services/model_entity.py	Service layer providing CRUD operations for model entities with API endpoint integration
tonic_textual/redact_api.py	Client-level convenience methods delegating to the model entity service

Copilot · 2026-01-15T23:21:16Z

tonic_textual/services/model_entity.py

+        with requests.Session() as session:
+            data = self.client.http_get(
+                f"/api/model-based-entities/{entity_id}",
+                session=session,
+            )


The get() method creates a new requests.Session for a single HTTP call, which provides no benefit over using the session already managed by the client. This pattern is repeated in list() and creates unnecessary overhead. Consider removing the session context manager unless the client requires it, or reuse a client-managed session.

Copilot · 2026-01-15T23:21:16Z

tonic_textual/services/model_entity.py

+            if item.get("entityType") == "ModelBased":
+                # Fetch full entity data
+                entity = self.get(item["id"])
+                model_entities.append(entity)


The list() method makes N+1 HTTP requests: one to fetch all entities, then one per model-based entity to fetch full details. For large numbers of entities, this will be slow. Consider whether the /api/custom-entities endpoint can return full entity data, or if a dedicated endpoint exists for listing model-based entities with complete information.
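
If no endpoint returns full entity data in one call, one client-side mitigation is to issue the per-entity requests concurrently. Note that this is a different technique from the one suggested above: it does not reduce the number of requests, only the wall-clock time. The sketch assumes the underlying client is safe to call from multiple threads; `fetch_model_entities` and `max_workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def fetch_model_entities(
    items: List[Dict[str, Any]],
    get_entity: Callable[[str], Any],
    max_workers: int = 8,
) -> List[Any]:
    """Fetch full details for every model-based item, several at a time.

    Assumption: get_entity is thread-safe. Results keep the input order
    because ThreadPoolExecutor.map preserves it.
    """
    ids = [item["id"] for item in items if item.get("entityType") == "ModelBased"]
    if not ids:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get_entity, ids))
```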

Copilot · 2026-01-15T23:21:16Z

tonic_textual/classes/model_entity.py

+        import json as json_module
+        endpoint = "training/files" if is_training else "test/files"
+        # API expects multipart with 'document' (JSON metadata) and 'file' (content)
+        document = json_module.dumps({"fileName": file_name})


The json module is already imported at the module level (line 8), so this local import is redundant and creates naming confusion. Remove this line and use the module-level json import instead.

Suggested change

-        import json as json_module
-        endpoint = "training/files" if is_training else "test/files"
-        # API expects multipart with 'document' (JSON metadata) and 'file' (content)
-        document = json_module.dumps({"fileName": file_name})
+        endpoint = "training/files" if is_training else "test/files"
+        # API expects multipart with 'document' (JSON metadata) and 'file' (content)
+        document = json.dumps({"fileName": file_name})

Copilot · 2026-01-15T23:21:17Z

tonic_textual/classes/model_entity.py

+        import json as json_module
+        endpoint = "training/files" if is_training else "test/files"
+        # API expects multipart with 'document' (JSON metadata) and 'file' (content)
+        document = json_module.dumps({"fileName": file_name})


Use the module-level json.dumps() instead of json_module.dumps() after removing the redundant import on line 505.

Suggested change

-        import json as json_module
-        endpoint = "training/files" if is_training else "test/files"
-        # API expects multipart with 'document' (JSON metadata) and 'file' (content)
-        document = json_module.dumps({"fileName": file_name})
+        endpoint = "training/files" if is_training else "test/files"
+        # API expects multipart with 'document' (JSON metadata) and 'file' (content)
+        document = json.dumps({"fileName": file_name})

tonic_textual/classes/model_entity.py

+
+    def _save_ground_truth(self, file_id: str, spans: List[Dict]) -> None:
+        """Save ground truth annotations for a file."""
+        annotations = [{"start": s["start"], "end": s["end"]} for s in spans]


Tests cover:
- Entity CRUD operations (create, get, list, delete)
- Version management (get latest, list versions)
- Test data upload with ground truth spans
- Version metrics and wait_for_completion
- Guidelines refinement (create new version, get suggestions)
- Training data upload
- Trained model creation and listing
- Full workflow integration test

Tests are skipped by default. Set ENABLE_MODEL_ENTITY_TESTS=1 to run.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
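
Opt-in gating of this kind can be expressed with the standard library alone. The PR does not show how its tests implement the flag, so the following is a sketch using `unittest` rather than the PR's actual mechanism; the class and test names are hypothetical.

```python
import os
import unittest

# Tests run only when the flag is set to "1"; any other value skips them.
RUN_MODEL_ENTITY_TESTS = os.environ.get("ENABLE_MODEL_ENTITY_TESTS") == "1"


@unittest.skipUnless(RUN_MODEL_ENTITY_TESTS, "set ENABLE_MODEL_ENTITY_TESTS=1 to run")
class ModelEntityWorkflowTest(unittest.TestCase):
    def test_flag_is_enabled(self):
        # Runs only when the class-level skip condition allows it.
        self.assertEqual(os.environ["ENABLE_MODEL_ENTITY_TESTS"], "1")
```

Skipped tests are still reported by the runner, which makes it visible that the suite exists but was not exercised.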

tonic_textual/classes/model_entity.py

+            if all_ready:
+                return
+
+            sleep(poll_interval)
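
A polling loop like the one in this hunk generally needs a deadline so that it cannot wait forever. The sketch below shows one way to bound it; `wait_until` and `is_ready` are hypothetical names, and the PR's own `_wait_for_files_ready` may handle timeouts differently.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_seconds: float = 60.0,
    poll_interval: float = 1.0,
) -> None:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if is_ready():
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"not ready after {timeout_seconds} seconds")
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline unaffected by system clock adjustments.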


- Add 9 quick API tests that run without LLM (CRUD, versions, data upload)
- Add 7 LLM tests behind ENABLE_MODEL_ENTITY_LLM_TESTS flag
- Track created entity IDs for safe cleanup (only deletes test entities)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

gandersteele and others added 2 commits January 15, 2026 14:58

Fix list_training_files to handle paginated response

a031110

gandersteele requested a review from Copilot January 15, 2026 23:20

sentry bot reviewed Jan 15, 2026

Copilot AI reviewed Jan 15, 2026

Remove unused imports (Any, tempfile)

2962353

gandersteele force-pushed the custom_models_sdk branch from c53a73e to 2962353 Compare January 15, 2026 23:40

sentry bot reviewed Jan 15, 2026

tonic_textual/classes/model_entity.py

    def _save_ground_truth(self, file_id: str, spans: List[Dict]) -> None:
        """Save ground truth annotations for a file."""
        annotations = [{"start": s["start"], "end": s["end"]} for s in spans]

This comment was marked as outdated.

sentry bot reviewed Jan 16, 2026

tonic_textual/classes/model_entity.py

Comment on lines +433 to +436

            if all_ready:
                return

            sleep(poll_interval)

This comment was marked as outdated.
