Deprecate CSVDataSource. #1009

anoto-moniz · 2026-01-06T20:44:05Z

It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.

Citrine Python PR

Description

Please briefly explain the goal of the changes/this PR.
The reviewer should be able to understand why the change is being made by reading this description
and its links (e.g. JIRA tickets).

PR Type:

Breaking change (fix or feature that would cause existing functionality to change)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Maintenance (non-breaking change to assist developers)

Adherence to team decisions

I have added tests for 100% coverage
I have written Numpy-style docstrings for every method and class.
I have communicated the downstream consequences of the PR to others.
I have bumped the version in __version__.py

docs/source/workflows/predictors.rst

kroenlein

Approve, subject to the context we've spoken about where I need CSVDataSources for high-throughput benchmarking.

docs/source/workflows/predictors.rst

kroenlein · 2026-01-06T22:55:49Z

docs/source/workflows/predictor_evaluation_workflows.rst

    from citrine.informatics.descriptors import RealDescriptor
    from citrine.informatics.predictors import AutoMLPredictor

+    data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)


Would it make more sense to document this via the from_gemtable method?

Suggested change

-    data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
+    data_source = GemTableDataSource.from_gemtable(gemtable)

We already have a bunch of places across multiple files that construct it directly, most of which use these placeholder variable names. So I figure consistency wins out here.

Unless you're suggesting we make that change everywhere the ID and version aren't already available?

That's what I was feeling, but I'm not going to block on it.

I have some other doc updates to make for 4.0, so I may tack it on in that PR.
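
For readers comparing the two styles discussed in this thread, here is a minimal sketch. It does not import citrine: `GemTable` and `GemTableDataSource` below are hypothetical stand-in dataclasses, and the `uid` and `version` attribute names are assumptions made for illustration. Only the two call shapes (direct construction with `table_id` and `table_version`, and `from_gemtable`) come from the review comments.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class GemTable:
    """Hypothetical stand-in for a table object already fetched from the platform."""
    uid: UUID
    version: int


@dataclass(frozen=True)
class GemTableDataSource:
    """Hypothetical stand-in that mirrors only the two call styles in this thread."""
    table_id: UUID
    table_version: int

    @classmethod
    def from_gemtable(cls, table: GemTable) -> "GemTableDataSource":
        # Copy the ID and version from a table object the caller already holds.
        return cls(table_id=table.uid, table_version=table.version)


gemtable = GemTable(uid=uuid4(), version=1)

# Style the docs use today: pass the ID and version explicitly.
direct = GemTableDataSource(table_id=gemtable.uid, table_version=gemtable.version)

# Style suggested in review: derive both values from the table object.
derived = GemTableDataSource.from_gemtable(gemtable)

# Both routes describe the same table and version.
assert direct == derived
```

Under these assumptions the two forms are interchangeable. The practical difference is whether the caller already holds a table object or only its ID and version, which is the distinction raised in the replies above.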

kroenlein · 2026-01-06T22:56:19Z

docs/source/workflows/predictors.rst

-            "Poisson\'s Ratio": poissons_ratio
-        }
-    )
+    training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1)


As above

Suggested change

-    training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
+    training_data = GemTableDataSource.from_gemtable(gemtable)

kroenlein · 2026-01-06T22:57:39Z

docs/source/workflows/predictors.rst

-        identifiers=['Ingredient id']
-    )
+    # Once you've ingested the data to the platform, plug in its table ID and version.
+    data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)


As above

Suggested change

-    data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
+    data_source = GemTableDataSource.from_gemtable(gemtable)

kroenlein · 2026-01-06T22:59:12Z

tests/informatics/test_data_source.py

-    if isinstance(data_source, CSVDataSource):
-        # TODO: There's no obvious way to recover the column_definitions & identifiers from the ID
-        with pytest.warns(UserWarning):
-            transformed = DataSource.from_data_source_id(data_source.to_data_source_id())
-        assert isinstance(data_source, CSVDataSource)
-        assert transformed.file_link == data_source.file_link
-    else:
-        assert data_source == DataSource.from_data_source_id(data_source.to_data_source_id())


I remember being so disappointed when I put that mess in there. Glad this is going away.
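
The special case in the removed test can be illustrated with a small sketch of why an ID round-trip may be lossy. Everything here is hypothetical: `TableSource`, `CsvSource`, and the `"gemtable::…"` and `"csv::…"` ID strings are invented for illustration and are not claimed to match the library's actual classes or ID format. The point is only that an ID which omits `column_definitions` and `identifiers` cannot give them back.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple
from uuid import UUID, uuid4


@dataclass
class TableSource:
    """Hypothetical table-backed source: the ID carries every field."""
    table_id: UUID
    table_version: int

    def to_id(self) -> str:
        return f"gemtable::{self.table_id}::{self.table_version}"

    @classmethod
    def from_id(cls, source_id: str) -> "TableSource":
        _, table_id, version = source_id.split("::")
        return cls(UUID(table_id), int(version))


@dataclass
class CsvSource:
    """Hypothetical CSV-backed source: the ID carries only the file reference."""
    file_url: str
    column_definitions: Dict[str, str] = field(default_factory=dict)
    identifiers: Tuple[str, ...] = ()

    def to_id(self) -> str:
        # Column definitions and identifiers are not encoded in the ID.
        return f"csv::{self.file_url}"

    @classmethod
    def from_id(cls, source_id: str) -> "CsvSource":
        _, file_url = source_id.split("::", 1)
        return cls(file_url)


table = TableSource(uuid4(), 2)
table_restored = TableSource.from_id(table.to_id())

csv = CsvSource("files/abc/data.csv", {"Density": "density"}, ("Ingredient id",))
csv_restored = CsvSource.from_id(csv.to_id())

# The table source survives the round-trip; the CSV source keeps only its file.
assert table_restored == table
assert csv_restored.file_url == csv.file_url
assert csv_restored != csv
```

With the CSV case gone, a single equality assertion covers every remaining data source type, as in the one line that survives from the removed test.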

kroenlein · 2026-01-06T23:00:42Z

tests/informatics/test_data_source.py

    GemTableDataSource(table_id=uuid.uuid4(), table_version="2"),
    GemTableDataSource(table_id=uuid.uuid4(), table_version="2"),


Why are there two identical GemTableDataSources here? Obviously not relevant to the immediate work.

kroenlein

This all looks good to me.

anoto-moniz commented Jan 6, 2026

anoto-moniz marked this pull request as ready for review January 6, 2026 21:07

anoto-moniz requested a review from a team as a code owner January 6, 2026 21:07

kroenlein previously approved these changes Jan 6, 2026

anoto-moniz dismissed kroenlein’s stale review via 866c318 January 7, 2026 18:25

anoto-moniz force-pushed the feature/deprecate-csvdatasource branch from b1c5a15 to 866c318 Compare January 7, 2026 18:25

Deprecate CSVDataSource.

b7aff0f

It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.

anoto-moniz force-pushed the feature/deprecate-csvdatasource branch from 866c318 to b7aff0f Compare January 7, 2026 18:29

anoto-moniz requested a review from kroenlein January 7, 2026 18:29

kroenlein approved these changes Jan 7, 2026

anoto-moniz merged commit 1ff0c99 into main Jan 7, 2026
44 checks passed

anoto-moniz deleted the feature/deprecate-csvdatasource branch January 7, 2026 18:35
