-
Notifications
You must be signed in to change notification settings - Fork 7
Deprecate CSVDataSource. #1009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate CSVDataSource. #1009
Conversation
kroenlein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve, subject to the context we've spoken about where I need CSVDataSources for high-throughput benchmarking.
| from citrine.informatics.descriptors import RealDescriptor | ||
| from citrine.informatics.predictors import AutoMLPredictor | ||
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to document this via the from_gemtable method?
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) | |
| data_source = GemTableDataSource.from_gemtable(gemtable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a bunch of places across multiple files that construct it directly, most of which use these placeholder variable names. So I figure consistency wins out here.
Unless you're suggesting we make that change everywhere the ID and version aren't already available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was feeling, but I'm not going to block on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some other doc updates to make for 4.0, so I may tack it on in that PR.
docs/source/workflows/predictors.rst
Outdated
| "Poisson\'s Ratio": poissons_ratio | ||
| } | ||
| ) | ||
| training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
| training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=1) | |
| training_data = GemTableDataSource.from_gemtable(gemtable) |
docs/source/workflows/predictors.rst
Outdated
| identifiers=['Ingredient id'] | ||
| ) | ||
| # Once you've ingested the data to the platform, plug in its table ID and version. | ||
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
| data_source = GemTableDataSource(table_id=training_data_table_uid, table_version=1) | |
| data_source = GemTableDataSource.from_gemtable(gemtable) |
| if isinstance(data_source, CSVDataSource): | ||
| # TODO: There's no obvious way to recover the column_definitions & identifiers from the ID | ||
| with pytest.warns(UserWarning): | ||
| transformed = DataSource.from_data_source_id(data_source.to_data_source_id()) | ||
| assert isinstance(data_source, CSVDataSource) | ||
| assert transformed.file_link == data_source.file_link | ||
| else: | ||
| assert data_source == DataSource.from_data_source_id(data_source.to_data_source_id()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember being so disappointed when I put that mess in there. Glad this is going away.
| GemTableDataSource(table_id=uuid.uuid4(), table_version="2"), | ||
| GemTableDataSource(table_id=uuid.uuid4(), table_version="2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there two identical GemTableDataSources here? Obviously no relevant to immediate work.
b1c5a15 to
866c318
Compare
It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.
866c318 to
b7aff0f
Compare
kroenlein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me.
It has long been supplanted by GemTableDataSource, which is itself growing long in the tooth.
Citrine Python PR
Description
Please briefly explain the goal of the changes/this PR.
The reviewer should be able to understand why the change is being made by reading this description
and its links (e.g. JIRA tickets).
PR Type:
Adherence to team decisions