@anoto-moniz
Collaborator

It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.
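This conversation does not reproduce the diff, so the exact deprecation mechanism the PR uses is not shown. As a generic sketch only: a Python class is commonly deprecated by emitting a `DeprecationWarning` from its constructor with the standard `warnings` module. The `CSVDataSource` below is a hypothetical stand-in with assumed constructor parameters, not the citrine-python implementation.

```python
import warnings


class CSVDataSource:
    """Hypothetical stand-in for a deprecated data source (not the citrine-python class)."""

    def __init__(self, file_link, column_definitions, identifiers=None):
        # Warn at construction time so every caller sees the deprecation.
        # stacklevel=2 attributes the warning to the caller's line, not this one.
        warnings.warn(
            "CSVDataSource is deprecated; use GemTableDataSource instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.file_link = file_link
        self.column_definitions = column_definitions
        self.identifiers = identifiers


# Usage: capture the warning explicitly, since DeprecationWarning is
# hidden by default outside __main__.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    CSVDataSource(file_link="data.csv", column_definitions={})
print(caught[0].category.__name__)
```

Warning at construction rather than at import lets existing code keep working through the deprecation period while still surfacing the message wherever the class is actually used.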

Citrine Python PR

Description

Please briefly explain the goal of the changes/this PR.
The reviewer should be able to understand why the change is being made by reading this description
and its links (e.g. JIRA tickets).

PR Type:

  • Breaking change (fix or feature that would cause existing functionality to change)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Maintenance (non-breaking change to assist developers)

Adherence to team decisions

  • I have added tests for 100% coverage
  • I have written NumPy-style docstrings for every method and class.
  • I have communicated the downstream consequences of the PR to others.
  • I have bumped the version in __version__.py

@anoto-moniz marked this pull request as ready for review January 6, 2026 21:07
@anoto-moniz requested a review from a team as a code owner January 6, 2026 21:07
@kroenlein previously approved these changes Jan 6, 2026
Collaborator

@kroenlein left a comment

Approve, subject to the context we've spoken about where I need CSVDataSources for high-throughput benchmarking.

from citrine.informatics.descriptors import RealDescriptor
from citrine.informatics.predictors import AutoMLPredictor
data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
Collaborator

Would it make more sense to document this via the from_gemtable method?

Suggested change
data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
data_source = GemTableDataSource.from_gemtable(gemtable)
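The two construction styles being weighed differ only in where the table ID and version come from. A minimal sketch using hypothetical stand-in dataclasses; the `uid` and `version` field names on the table object are assumptions for illustration, not the citrine-python definitions:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class GemTable:
    """Hypothetical stand-in for a registered GEM Table (only the fields used here)."""
    uid: UUID
    version: int


@dataclass(frozen=True)
class GemTableDataSource:
    """Hypothetical stand-in mirroring the two construction styles discussed."""
    table_id: UUID
    table_version: int

    @classmethod
    def from_gemtable(cls, table: GemTable) -> "GemTableDataSource":
        # Read the ID and version off the table object so callers never
        # copy them by hand into placeholder variables.
        return cls(table_id=table.uid, table_version=table.version)


gemtable = GemTable(uid=uuid4(), version=1)

# Style used in the existing docs: pass the ID and version explicitly.
direct = GemTableDataSource(table_id=gemtable.uid, table_version=gemtable.version)
# Style suggested in review: derive both from the table object.
derived = GemTableDataSource.from_gemtable(gemtable)

assert direct == derived
```

The alternate constructor only helps when a table object is in hand; when a reader has just an ID and version (as in the docs' placeholders), the direct form is the natural one, which is the trade-off raised in the reply.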

Collaborator Author

We already have a bunch of places across multiple files that construct it directly, most of which use these placeholder variable names. So I figure consistency wins out here.

Unless you're suggesting we make that change everywhere the ID and version aren't already available?

Collaborator

That's what I was feeling, but I'm not going to block on it.

Collaborator Author

I have some other doc updates to make for 4.0, so I may tack it on in that PR.

"Poisson\'s Ratio": poissons_ratio
}
)
training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
Collaborator

As above

Suggested change
training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
training_data = GemTableDataSource.from_gemtable(gemtable)

identifiers=['Ingredient id']
)
# Once you've ingested the data to the platform, plug in its table ID and version.
data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
Collaborator

As above

Suggested change
data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1)
data_source = GemTableDataSource.from_gemtable(gemtable)

Comment on lines -50 to -57
if isinstance(data_source, CSVDataSource):
    # TODO: There's no obvious way to recover the column_definitions & identifiers from the ID
    with pytest.warns(UserWarning):
        transformed = DataSource.from_data_source_id(data_source.to_data_source_id())
    assert isinstance(data_source, CSVDataSource)
    assert transformed.file_link == data_source.file_link
else:
    assert data_source == DataSource.from_data_source_id(data_source.to_data_source_id())
Collaborator

I remember being so disappointed when I put that mess in there. Glad this is going away.
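The removed special case existed because, as its TODO notes, a CSV source's column_definitions and identifiers could not be recovered from its ID, whereas a GEM Table source can be rebuilt from its ID alone, so a plain equality check suffices. A minimal sketch of such a lossless round trip; the `gemd::<table_id>::<table_version>` ID format and placing `from_data_source_id` on the subclass (the test calls it on the base `DataSource`) are assumptions for illustration, not the library's documented behavior:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

# Hypothetical ID prefix; the real citrine-python format may differ.
_PREFIX = "gemd"


@dataclass(frozen=True)
class GemTableDataSource:
    """Hypothetical stand-in: every field is encoded in the ID."""
    table_id: UUID
    table_version: int

    def to_data_source_id(self) -> str:
        return f"{_PREFIX}::{self.table_id}::{self.table_version}"

    @classmethod
    def from_data_source_id(cls, data_source_id: str) -> "GemTableDataSource":
        prefix, table_id, table_version = data_source_id.split("::")
        if prefix != _PREFIX:
            raise ValueError(f"Unrecognized data source id: {data_source_id}")
        return cls(table_id=UUID(table_id), table_version=int(table_version))


source = GemTableDataSource(table_id=uuid4(), table_version=1)
# Nothing is lost in serialization, so no warning or partial comparison
# is needed; compare the whole object.
assert GemTableDataSource.from_data_source_id(source.to_data_source_id()) == source
```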

Comment on lines 17 to 18
GemTableDataSource(table_id=uuid.uuid4(), table_version="2"),
GemTableDataSource(table_id=uuid.uuid4(), table_version="2"),
Collaborator

Why are there two identical GemTableDataSources here? Obviously not relevant to the immediate work.
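The two lines are textually identical, but they do not produce equal values: each call to `uuid.uuid4()` generates a fresh random UUID, so the two instances get different `table_id`s. A small sketch with a hypothetical stand-in dataclass (assuming value-based equality, as a dataclass provides) shows this:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GemTableDataSource:
    """Hypothetical stand-in with dataclass (field-by-field) equality."""
    table_id: uuid.UUID
    table_version: str


cases = [
    GemTableDataSource(table_id=uuid.uuid4(), table_version="2"),
    GemTableDataSource(table_id=uuid.uuid4(), table_version="2"),
]

# uuid4() is evaluated separately for each entry, so the table_ids
# differ even though the source lines are the same.
print(cases[0].table_id != cases[1].table_id)
```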

@anoto-moniz force-pushed the feature/deprecate-csvdatasource branch from 866c318 to b7aff0f January 7, 2026 18:29
@anoto-moniz requested a review from kroenlein January 7, 2026 18:29
Collaborator

@kroenlein left a comment

This all looks good to me.

@anoto-moniz merged commit 1ff0c99 into main Jan 7, 2026
44 checks passed
@anoto-moniz deleted the feature/deprecate-csvdatasource branch January 7, 2026 18:35
3 participants