Skip to content

Conversation

@brian-arnold
Copy link
Collaborator

I manually rebased the brian branch from the dev branch and added this code back in,

I addressed your previous comments regarding specifying expected_tag_keys as row_index for the dataframesource and element_index for the ListSource. There was one comment you made that I didn't know how to interpret, made on the line specifying the tag_function in the init section of the dataframeSource class:

"Would you mind adding a feature where if a list of strings are passed, they are interpreted as columns whose values should be used as the tags?"

I'm assuming you mean the tags should have the same keys as the column named passed in? Just double checking.

@brian-arnold brian-arnold requested review from Copilot and eywalker July 6, 2025 04:07
@brian-arnold brian-arnold marked this pull request as ready for review July 6, 2025 04:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds two new source types—DataFrameSource and ListSource—to enable streaming data from pandas DataFrames and Python lists with customizable tagging.

  • Introduces DataFrameSource for iterating over DataFrame rows as tagged packets
  • Introduces ListSource for iterating over list elements as tagged packets
  • Updates src/orcapod/core/sources.py with new imports and class definitions
Comments suppressed due to low confidence (3)

src/orcapod/core/sources.py:336

  • The docstring refers to 'expected_keys' but the parameter is named 'expected_tag_keys'. Update the doc to match the code.
        If expected_keys are provided, they will be used instead of the default keys.

src/orcapod/core/sources.py:494

  • The docstring refers to 'expected_keys' but the parameter is named 'expected_tag_keys'. Update the doc to match the code.
        If expected_keys are provided, they will be used instead of the default keys.

src/orcapod/core/sources.py:207

  • New classes DataFrameSource and ListSource have been introduced without accompanying unit tests. Please add tests to cover the core functionality and edge cases.
class DataFrameSource(Source):

from typing import Any, Literal

import pandas as pd
import polars as pl
Copy link

Copilot AI Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import 'polars as pl' is not used in this file. Consider removing it to keep imports clean.

Copilot uses AI. Check for mistakes.
Comment on lines +292 to +294
for idx, row in self.dataframe.iterrows():
tag = self.tag_function(row, idx)
packet = {col: row[col] for col in self.columns}
Copy link

Copilot AI Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using DataFrame.iterrows() can be slow for large data sets. Consider iterating with itertuples or another vectorized approach for better performance.

Suggested change
for idx, row in self.dataframe.iterrows():
tag = self.tag_function(row, idx)
packet = {col: row[col] for col in self.columns}
for row in self.dataframe.itertuples(index=True, name=None):
idx, *values = row
row_series = pd.Series(values, index=self.dataframe.columns)
tag = self.tag_function(row_series, idx)
packet = {col: row_series[col] for col in self.columns}

Copilot uses AI. Check for mistakes.
if missing_columns:
raise ValueError(f"Columns not found in DataFrame: {missing_columns}")

if tag_function is None:
Copy link

Copilot AI Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Similar tag_function initialization logic appears in both DataFrameSource and ListSource—consider abstracting this into a shared helper or the base class to reduce duplication.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants