Conversation


@evanjhoward11 evanjhoward11 commented Dec 20, 2025

Description

Please include a summary of the changes and the related issue. Include relevant motivation and context.
I fixed the type hints for the following files:

src/agentunit/adapters/agentops_adapter.py
src/agentunit/core/scenario.py
src/agentunit/datasets/builtins.py
src/agentunit/metrics/builtin.py
src/agentunit/production/integrations.py

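Note: src/agentunit/metrics/builtin.py still reports one import-untyped error, because ragas.metrics ships no type stubs or py.typed marker (see Test Results).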

Fixes #13

Type of Change

Please delete options that are not relevant:

  • Code refactoring
  • Test coverage improvement

Changes Made

Please provide a detailed list of changes:

  • Miscellaneous changes to all the files

Testing

Please describe the tests you ran to verify your changes:

  • Existing test suite passes (pytest)

Test Configuration

  • Python version: 3.11.4
  • Operating System: Windows 11
  • Relevant adapters tested: N/A

Test Results

# Paste relevant test output or results
$ mypy --follow-imports=silent src/agentunit/adapters/agentops_adapter.py
Success: no issues found in 1 source file

$ mypy --follow-imports=silent src/agentunit/core/scenario.py
Success: no issues found in 1 source file

$ mypy --follow-imports=silent src/agentunit/datasets/builtins.py
Success: no issues found in 1 source file

$ mypy --follow-imports=silent src/agentunit/metrics/builtin.py
src\agentunit\metrics\builtin.py:19: error: Skipping analyzing "ragas.metrics": module is installed, but missing library stubs or py.typed marker  [import-untyped]
src\agentunit\metrics\builtin.py:19: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 1 source file)

$ mypy --follow-imports=silent src/agentunit/production/integrations.py
Success: no issues found in 1 source file

Code Quality

  • My code follows the project's style guidelines (Ruff, Black)
  • I have run pre-commit run --all-files and addressed any issues
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have added type hints where appropriate
  • I have updated docstrings following Google style

Documentation

  • I have updated relevant documentation (README, docs/, inline comments)
  • I have added/updated docstrings for new/modified functions
  • I have updated CHANGELOG.md under [Unreleased]
  • I have added examples if introducing new features

Breaking Changes

If this PR introduces breaking changes, please describe:

  • What breaks:
  • Migration path for users:
  • Deprecation warnings added (if applicable):

Dependencies

  • No new dependencies added
  • New dependencies added (list below with justification)

New dependencies:

  • package-name: Reason for adding

Performance Impact

Describe any performance implications:

  • No performance impact
  • Performance improvement (describe and provide benchmarks)
  • Potential performance regression (describe and justify)

Screenshots/Recordings (if applicable)

Add screenshots or recordings to help explain your changes:

Additional Context

Add any other context about the PR here:

  • Links to related issues or PRs
  • References to external documentation
  • Design decisions and trade-offs
  • Known limitations or future work

Checklist

  • I have read the CONTRIBUTING.md guide
  • My branch name follows the convention (feature/, fix/, docs/, etc.)
  • My commit messages follow the conventional commit format
  • I have tested my changes locally
  • I have updated the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • All new and existing tests pass
  • I have checked for security vulnerabilities in any new dependencies

Reviewer Notes

Please pay special attention to:

Post-Merge Tasks

Tasks to complete after merging (if any):

  • Update external documentation
  • Announce in discussions/community
  • Create follow-up issues for future work

Summary by CodeRabbit

  • New Features

    • AgentOps tracing integration with optional tracing toggle for richer run traces
    • Scenario execution now runs asynchronously
  • Refactor

    • Broadened factory method options for more flexible integrations
    • Introduced structured dataset row format and optional metadata support
    • Safer metadata access in metrics collection and clarified type annotations across the codebase

✏️ Tip: You can customize this high-level summary in your review settings.


continue bot commented Dec 20, 2025

All Green - Keep your PRs mergeable


All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts




coderabbitai bot commented Dec 20, 2025

Walkthrough

Replaces LangSmith run management with AgentOps tracing across the AgentOps adapter, makes run_scenario async, switches project identifier usage to project_id, broadens factory method option typings to Any, introduces a DatasetRow TypedDict, and adds defensive metadata access and explicit type annotations.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| AgentOps Integration & Tracing Refactor (src/agentunit/adapters/agentops_adapter.py) | Adds enable_tracing handling and self.client, replaces LangSmith run lifecycle with agentops.start_trace / agentops.update_trace_metadata / agentops.end_trace, converts run_scenario to async, updates session/message/interaction methods to emit AgentOps metadata, and switches internal usage from project_name to project_id. |
| Factory Method Type Hints (src/agentunit/core/scenario.py) | Replaces **options: object with **options: Any across multiple factory methods (from_autogen, from_haystack, from_llama_index, from_semantic_kernel, from_phidata, from_promptflow, from_openai_swarm, from_anthropic_bedrock, from_mistral_server, from_rasa_endpoint, from_openai_agents, from_crewai). Adds Any to TYPE_CHECKING imports and refines name resolution in from_openai_agents. |
| Dataset Type Structure (src/agentunit/datasets/builtins.py) | Introduces DatasetRow TypedDict (fields: id, query, expected_output, tools, context, optional metadata), updates _GAIA_L1_SHOPPING and _SWE_BENCH_LITE to list[DatasetRow], and updates _build_loader(rows: list[DatasetRow]). |
| Type Safety & Defensive Coding (src/agentunit/metrics/builtin.py, src/agentunit/production/integrations.py) | Replaces direct trace.metadata access with getattr(trace, "metadata", {}) in CostMetric and TokenUsageMetric. Adds explicit local type annotation baseline_stats: dict[str, dict[str, dict[str, float]]] in _calculate_baseline_stats. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant AgentOpsAdapter as Adapter
  participant ScenarioRunner as Runner
  participant AgentOps as AgentOpsClient

  Client->>Adapter: run_scenario(scenario)
  Note right of Adapter: run_scenario is async
  Adapter->>AgentOps: start_trace(trace_metadata)
  AgentOps-->>Adapter: trace_id
  Adapter->>Runner: execute scenario (trace_id)
  Runner-->>Adapter: scenario result / events
  Adapter->>AgentOps: update_trace_metadata(trace_id, result/metrics)
  alt success
    Adapter->>AgentOps: end_trace(trace_id, status=completed)
  else failure
    Adapter->>AgentOps: end_trace(trace_id, status=failed, details)
  end
  Adapter-->>Client: ScenarioResult (includes scenario_run_id derived from trace_id)
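
The lifecycle in the diagram can be sketched in Python roughly as follows. This is a minimal illustration, not the adapter's code: FakeTracer is a hypothetical stand-in for the AgentOps client, and its method names and arguments only mirror the diagram. They are assumptions, not the real AgentOps SDK signatures.

```python
import asyncio
from typing import Any


class FakeTracer:
    """Hypothetical stand-in for a tracing client; records calls in memory."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []

    def start_trace(self, metadata: dict[str, Any]) -> str:
        self.events.append(("start", metadata))
        return "trace-1"

    def update_trace_metadata(self, trace_id: str, metadata: dict[str, Any]) -> None:
        self.events.append(("update", metadata))

    def end_trace(self, trace_id: str, status: str) -> None:
        self.events.append(("end", status))


async def run_scenario(tracer: FakeTracer, name: str, fail: bool = False) -> dict[str, Any]:
    """Start a trace, run the scenario, and close the trace on both paths."""
    trace_id = tracer.start_trace({"scenario": name})
    try:
        if fail:
            raise RuntimeError("scenario failed")
        result = {"scenario": name, "passed": True}
        tracer.update_trace_metadata(trace_id, result)
    except Exception:
        # Failure branch of the diagram: close the trace, then re-raise.
        tracer.end_trace(trace_id, status="failed")
        raise
    tracer.end_trace(trace_id, status="completed")
    # The run id handed back to the caller is derived from the trace id.
    return {**result, "scenario_run_id": trace_id}
```

Because run_scenario is a coroutine function here, a synchronous caller would need something like asyncio.run(run_scenario(FakeTracer(), "demo")).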

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | Issue #13 specifically requires type hints for DatasetCase in src/agentunit/datasets/base.py, but the PR changes focus on multiple other files with only indirect related changes to builtins.py, leaving the primary objective unaddressed. | Ensure that src/agentunit/datasets/base.py receives explicit type hints for all DatasetCase fields to directly address issue #13's acceptance criteria. |
| Out of Scope Changes check | ⚠️ Warning | The PR modifies five files (agentops_adapter.py, scenario.py, builtins.py, builtin.py, integrations.py) with extensive type hint and signature changes beyond the scope of issue #13 which targets only DatasetCase in base.py. | Clarify whether all modified files are part of issue #13 or if additional scope has been added; alternatively, create separate issues for the out-of-scope changes in adapters, core, metrics, and production modules. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'Fix miscellaneous type hints' is vague and generic, using non-descriptive language that doesn't convey the specific nature or scope of the changes. | Provide a more descriptive title that specifies the type of changes (e.g., 'Add type hints and improve type safety across multiple modules' or 'Fix type hint compliance in adapters, core, datasets, metrics, and production modules'). |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description covers the key sections but is sparse; it lists affected files and testing results without providing comprehensive context about the motivation or detailed changes. |
✨ Finishing touches
  • 📝 Generate docstrings

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/agentunit/adapters/agentops_adapter.py (1)

73-87: Remove inconsistent and outdated docstring block.

Lines 73-87 contain a duplicate/outdated docstring that references "LangSmith" instead of "AgentOps" and has incorrect parameter descriptions (e.g., "Langsmith project ID" on line 78). This conflicts with the actual AgentOps docstring above (lines 56-65).

🔎 Proposed fix
-        """
-        Initialize LangSmith adapter.
-
-        Args:
-            api_key: LangSmith API key
-            project_id: Langsmith project ID
-            endpoint: Optional custom LangSmith endpoint
-            enable_tracing: Whether to enable automatic tracing
-            enable_feedback: Whether to collect feedback data
-        """
-        self.api_key = api_key
-        self.project_id = project_id
-        self.default_tags = default_tags or []
-        self.auto_start_session = auto_start_session
-
🧹 Nitpick comments (2)
src/agentunit/adapters/agentops_adapter.py (2)

70-71: Type annotation added for new attributes.

The self.client: Any annotation provides some type information, though Any sacrifices type safety. Consider using a protocol or the actual agentops client type when available for better type checking.
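
One way to act on the protocol suggestion is sketched below. TraceClient and InMemoryClient are hypothetical names, and the two methods are assumed for illustration only. They do not describe the real agentops module.

```python
from typing import Any, Protocol


class TraceClient(Protocol):
    """Hypothetical structural type: only the calls the adapter relies on."""

    def start_trace(self, name: str) -> Any: ...

    def end_trace(self, trace: Any, status: str) -> None: ...


class InMemoryClient:
    """Toy client that satisfies TraceClient without inheriting from it."""

    def __init__(self) -> None:
        self.closed: list[str] = []

    def start_trace(self, name: str) -> Any:
        return name

    def end_trace(self, trace: Any, status: str) -> None:
        self.closed.append(f"{trace}:{status}")


def run_with(client: TraceClient) -> None:
    # A type checker verifies that `client` has both methods;
    # with `client: Any` a misspelled method name would go unnoticed.
    trace = client.start_trace("demo")
    client.end_trace(trace, "completed")
```

With a protocol in place, calling a method the client does not declare would be flagged by mypy instead of failing at runtime.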


103-103: Document the import suppression or address the underlying issue.

The # type: ignore[import-not-found] comment suppresses mypy's missing-import error for agentops. This is distinct from the ragas import-untyped error mentioned in the PR description, which concerns src/agentunit/metrics/builtin.py. Consider documenting why the suppression is acceptable (e.g., optional dependency) in a comment.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 315a9a4 and cec0c1d.

📒 Files selected for processing (5)
  • src/agentunit/adapters/agentops_adapter.py (13 hunks)
  • src/agentunit/core/scenario.py (12 hunks)
  • src/agentunit/datasets/builtins.py (4 hunks)
  • src/agentunit/metrics/builtin.py (2 hunks)
  • src/agentunit/production/integrations.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/agentunit/adapters/agentops_adapter.py (4)
src/agentunit/adapters/langsmith_adapter.py (2)
  • platform (104-106)
  • run_scenario (355-516)
src/agentunit/production/integrations.py (1)
  • MonitoringPlatform (33-41)
src/agentunit/core/scenario.py (1)
  • Scenario (23-278)
src/agentunit/reporting/results.py (1)
  • ScenarioResult (32-66)
src/agentunit/datasets/builtins.py (1)
src/agentunit/datasets/base.py (1)
  • DatasetCase (19-27)
src/agentunit/core/scenario.py (2)
src/agentunit/datasets/base.py (1)
  • name (38-39)
tests/test_framework_adapters.py (1)
  • flow (105-107)
🔇 Additional comments (11)
src/agentunit/production/integrations.py (1)

100-100: LGTM! Type annotation correctly describes the baseline stats structure.

The explicit type annotation accurately reflects the nested dictionary structure used for categorizing baseline statistics by metric category, metric name, and statistical measures.
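
As an illustration of why the annotation helps, the sketch below builds a dictionary of the same shape. The function name and the choice of mean/min/max are assumptions and are not taken from integrations.py. Without the annotation, mypy typically cannot infer the type of an empty {} literal and asks for one.

```python
from statistics import mean


def calculate_baseline_stats(
    samples: dict[str, dict[str, list[float]]],
) -> dict[str, dict[str, dict[str, float]]]:
    """Map category -> metric name -> statistic name -> value."""
    # The explicit annotation tells the checker what the empty dict will hold.
    baseline_stats: dict[str, dict[str, dict[str, float]]] = {}
    for category, metrics in samples.items():
        baseline_stats[category] = {}
        for metric_name, values in metrics.items():
            baseline_stats[category][metric_name] = {
                "mean": mean(values),
                "min": min(values),
                "max": max(values),
            }
    return baseline_stats
```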

src/agentunit/metrics/builtin.py (2)

146-148: LGTM! Defensive metadata access prevents AttributeError.

The change from direct trace.metadata access to getattr(trace, "metadata", {}) safely handles cases where the trace object may not have a metadata attribute, falling back to an empty dict.
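
A small sketch of the pattern, using a hypothetical read_cost helper rather than the real CostMetric. The extra `or {}` is an addition that the PR may not contain: getattr only falls back when the attribute is missing, so a trace whose metadata is explicitly None would otherwise still break the .get call.

```python
from types import SimpleNamespace
from typing import Any


def read_cost(trace: Any) -> float:
    """Read a cost value from trace metadata, tolerating missing metadata."""
    # getattr covers "attribute absent"; `or {}` also covers "attribute is None".
    metadata = getattr(trace, "metadata", {}) or {}
    return float(metadata.get("cost", 0.0))


no_metadata = object()
with_metadata = SimpleNamespace(metadata={"cost": 1.5})
none_metadata = SimpleNamespace(metadata=None)
```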


172-174: LGTM! Consistent defensive pattern applied.

The same safe metadata access pattern is correctly applied here, maintaining consistency with CostMetric above.

src/agentunit/datasets/builtins.py (4)

16-24: LGTM! DatasetRow TypedDict correctly structures dataset rows.

The TypedDict definition properly mirrors the DatasetCase dataclass structure from base.py, with appropriate use of NotRequired for the optional metadata field. This provides strong type checking for dataset row dictionaries.
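
For readers unfamiliar with the pattern, here is a hedged sketch of a TypedDict with one optional key feeding a dataclass. The field names follow the walkthrough above, but the field types, the DatasetCase stand-in, and build_cases are assumptions, not the code in builtins.py or base.py.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional, TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired  # backport for older versions


class DatasetRow(TypedDict):
    id: str
    query: str
    expected_output: Optional[str]
    tools: list[str]
    context: list[str]
    metadata: NotRequired[dict[str, Any]]  # this key may be omitted from a row


@dataclass
class DatasetCase:
    """Hypothetical stand-in for the dataclass in datasets/base.py."""

    id: str
    query: str
    expected_output: Optional[str] = None
    tools: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


def build_cases(rows: list[DatasetRow]) -> Iterator[DatasetCase]:
    for row in rows:
        yield DatasetCase(
            id=row["id"],
            query=row["query"],
            expected_output=row["expected_output"],
            tools=list(row["tools"]),
            context=list(row["context"]),
            # .get is needed because metadata is NotRequired.
            metadata=dict(row.get("metadata", {})),
        )


ROWS: list[DatasetRow] = [
    {"id": "1", "query": "q", "expected_output": None, "tools": [], "context": []},
    {"id": "2", "query": "q2", "expected_output": "a", "tools": ["search"],
     "context": ["doc"], "metadata": {"difficulty": "easy"}},
]
```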


27-27: LGTM! Explicit type annotation improves type safety.

The type annotation list[DatasetRow] provides clear type information for the dataset, enabling better IDE support and type checking.


44-44: LGTM! Consistent type annotation applied.

The same list[DatasetRow] annotation is consistently applied to this dataset.


64-64: LGTM! Function signature properly typed.

The updated signature with list[DatasetRow] parameter type and Iterable[DatasetCase] return type provides clear type contracts for the loader function.

src/agentunit/core/scenario.py (2)

98-98: LGTM! Using Any for **kwargs is more appropriate than object.

The change from **options: object to **options: Any is the correct approach for variadic keyword arguments, providing better flexibility while maintaining type safety at call sites.

Also applies to: 116-116, 131-131, 146-146, 161-161, 176-176, 191-191, 207-207, 222-222, 242-242
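
The practical difference between the two annotations can be sketched as follows. The function names and the retries option are hypothetical and unrelated to the factory methods in scenario.py.

```python
from typing import Any


def describe_strict(**options: object) -> str:
    # With `object`, a type checker rejects `retries + 1` until the value
    # has been narrowed, for example with isinstance.
    retries = options.get("retries", 0)
    if not isinstance(retries, int):
        raise TypeError("retries must be an int")
    return f"retries={retries + 1}"


def describe_flexible(**options: Any) -> str:
    # With `Any`, the same expression type-checks without narrowing,
    # so a wrong value type surfaces only at runtime.
    return f"retries={options.get('retries', 0) + 1}"
```

In other words, Any makes it easy to forward options to third-party constructors, at the cost of the checker no longer validating how the values are used.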


71-71: Verify the necessity of the additional None check.

The change from getattr(flow, "__name__", "openai-agents-scenario") to getattr(flow, "__name__", None) or "openai-agents-scenario" adds protection against falsy __name__ values (like empty strings). However, this differs from the pattern used in other factory methods (lines 89, 107, 122, etc.).

Is there a specific case where flow.__name__ could be an empty string or other falsy value? If not, consider keeping consistency with other factory methods for maintainability.

#!/bin/bash
# Check if other factory methods in the same file use similar patterns
rg -n "getattr.*__name__" src/agentunit/core/scenario.py
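
The behavioural difference between the two getattr forms discussed in this comment can be seen in a small sketch. The helper names are hypothetical; only the default string comes from the comment.

```python
from types import SimpleNamespace

DEFAULT_NAME = "openai-agents-scenario"


def name_with_default(flow: object) -> str:
    # Falls back only when the attribute is missing entirely.
    return getattr(flow, "__name__", DEFAULT_NAME)


def name_with_fallback(flow: object) -> str:
    # Also falls back when the attribute exists but is falsy ("" or None).
    return getattr(flow, "__name__", None) or DEFAULT_NAME


unnamed = SimpleNamespace()
empty_name = SimpleNamespace(__name__="")
```

For unnamed both helpers return the default. For empty_name the first returns an empty string and the second returns the default.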
src/agentunit/adapters/agentops_adapter.py (2)

355-355: LGTM! Explicit type annotation improves clarity.

The agent_counts: dict[str, int] annotation clearly documents the structure of the dictionary being built.


385-385: AgentOpsAdapter.run_scenario is async, but inconsistently implemented across adapters.

The run_scenario method in AgentOpsAdapter is now async, while LangSmithAdapter and WandbAdapter retain sync implementations. This creates API inconsistency: the abstract base class declares it as async, but not all adapters comply. Additionally, the runner's main execution path calls adapter.execute() rather than adapter.run_scenario(), so this breaking change only affects direct callers of the method. Consider either making all adapters async or clarifying whether run_scenario is the intended public interface (if not, it could be internal).
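
The inconsistency described here can be reproduced with a toy hierarchy. BaseAdapter, AsyncAdapter, SyncAdapter, and call are hypothetical names. The sketch assumes only that the base class declares run_scenario as async, as the comment states.

```python
import asyncio
import inspect
from abc import ABC, abstractmethod


class BaseAdapter(ABC):
    @abstractmethod
    async def run_scenario(self, scenario: str) -> str: ...


class AsyncAdapter(BaseAdapter):
    async def run_scenario(self, scenario: str) -> str:
        return f"ran {scenario}"


class SyncAdapter(BaseAdapter):
    # Accepted by ABC at runtime, but it returns a str rather than a coroutine,
    # so `await adapter.run_scenario(...)` would fail for this class.
    def run_scenario(self, scenario: str) -> str:  # type: ignore[override]
        return f"ran {scenario}"


async def call(adapter: BaseAdapter, scenario: str) -> str:
    outcome = adapter.run_scenario(scenario)
    # Tolerate both styles: await only when an awaitable came back.
    if inspect.isawaitable(outcome):
        return await outcome
    return outcome
```

A shim like call is only a stopgap. Making every adapter async, as the comment suggests, removes the need for it.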

Comment on lines 342 to 350
 if isinstance(interaction.from_agent, list):
     unique_agents.update(interaction.from_agent)
 else:
     unique_agents.add(interaction.from_agent)
 if interaction.to_agent:
-    unique_agents.add(interaction.to_agent)
+    if isinstance(interaction.to_agent, list):
+        unique_agents.update(interaction.to_agent)
+    else:
+        unique_agents.add(interaction.to_agent)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Find the agentops_adapter.py file
find . -type f -name "agentops_adapter.py" | head -20

# Check the file structure
fd "agentunit" --type d | head -10

Repository: aviralgarg05/agentunit

Length of output: 125


🏁 Script executed:

#!/bin/bash

# First, let's check if the file exists and get basic info
if [ -f "src/agentunit/adapters/agentops_adapter.py" ]; then
  echo "File found. Checking lines 240-260 for AgentInteraction definition..."
  sed -n '240,260p' src/agentunit/adapters/agentops_adapter.py
  echo ""
  echo "Checking lines 342-350 for the code in review..."
  sed -n '342,350p' src/agentunit/adapters/agentops_adapter.py
else
  echo "File not found at expected location. Searching for it..."
  find . -name "agentops_adapter.py" -type f 2>/dev/null
fi

# Also search for AgentInteraction class definition
echo ""
echo "Searching for AgentInteraction class definition..."
rg "class AgentInteraction" --type py -A 10 2>/dev/null || echo "Pattern not found with rg"

Repository: aviralgarg05/agentunit

Length of output: 2159


Remove unnecessary list handling for from_agent.

The to_agent field is properly typed as AgentID | list[AgentID] to support broadcast messaging, so the list checks for that field are correct. However, from_agent is typed only as AgentID (string), not as a union type. Remove the isinstance check for from_agent being a list, or update its type hint to AgentID | list[AgentID] if list support is intentional.

🤖 Prompt for AI Agents
In src/agentunit/adapters/agentops_adapter.py around lines 342 to 350, the code
treats interaction.from_agent as possibly being a list but its type is only
AgentID (string); remove the unnecessary isinstance(interaction.from_agent,
list) branch and simply call unique_agents.add(interaction.from_agent), unless
you intentionally want to support broadcasting from multiple agents—in that case
update the type hint for from_agent to AgentID | list[AgentID] and keep the
existing list handling; ensure any callers and type annotations are updated to
match the chosen approach.
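
A sketch of the first option (from_agent stays a single AgentID, only to_agent may be a list) might look like this. The AgentInteraction dataclass below is a simplified, hypothetical stand-in for the real class.

```python
from dataclasses import dataclass
from typing import Union

AgentID = str


@dataclass
class AgentInteraction:
    """Simplified stand-in: one sender, one or many recipients."""

    from_agent: AgentID
    to_agent: Union[AgentID, list[AgentID], None] = None


def collect_unique_agents(interactions: list[AgentInteraction]) -> set[AgentID]:
    unique_agents: set[AgentID] = set()
    for interaction in interactions:
        # from_agent is always a single ID, so no list check is needed.
        unique_agents.add(interaction.from_agent)
        if interaction.to_agent:
            if isinstance(interaction.to_agent, list):
                unique_agents.update(interaction.to_agent)  # broadcast
            else:
                unique_agents.add(interaction.to_agent)
    return unique_agents
```

The list check on to_agent matters because adding a list to a set raises TypeError (lists are unhashable).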

Owner

@aviralgarg05 aviralgarg05 left a comment

Please fix the issues


codecov-commenter commented Dec 27, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 5.26316% with 36 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/agentunit/adapters/agentops_adapter.py | 11.76% | 15 Missing ⚠️ |
| src/agentunit/datasets/builtins.py | 0.00% | 12 Missing ⚠️ |
| src/agentunit/metrics/builtin.py | 0.00% | 6 Missing ⚠️ |
| src/agentunit/core/scenario.py | 0.00% | 2 Missing ⚠️ |
| src/agentunit/production/integrations.py | 0.00% | 1 Missing ⚠️ |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/agentunit/adapters/agentops_adapter.py (8)

73-86: Remove duplicate LangSmith code block.

Lines 73-86 contain a duplicate docstring and attribute assignments copied from the LangSmith adapter. The docstring incorrectly states "Initialize LangSmith adapter" and "Langsmith project ID", and the attribute assignments (api_key, project_id, default_tags, auto_start_session) redundantly repeat lines 66-69. This dead code reduces maintainability and creates confusion.

🔎 Proposed fix
-        """
-        Initialize LangSmith adapter.
-
-        Args:
-            api_key: LangSmith API key
-            project_id: Langsmith project ID
-            endpoint: Optional custom LangSmith endpoint
-            enable_tracing: Whether to enable automatic tracing
-            enable_feedback: Whether to collect feedback data
-        """
-        self.api_key = api_key
-        self.project_id = project_id
-        self.default_tags = default_tags or []
-        self.auto_start_session = auto_start_session
-
         # Initialize AgentOps client
         self._initialize_agentops()

380-388: Update docstring and log message to reference AgentOps instead of LangSmith.

The docstring at lines 380-386 states "Run a scenario with LangSmith integration" and the log message at line 389 references "LangSmith", but this is the AgentOps adapter. Update all references to reflect the correct integration.

🔎 Proposed fix
     async def run_scenario(self, scenario: Scenario) -> ScenarioResult:
         """
-        Run a scenario with LangSmith integration.
+        Run a scenario with AgentOps integration.
 
         Args:
             scenario: Scenario to execute
 
         Returns:
-            ScenarioResult: Execution results with LangSmith trace data
+            ScenarioResult: Execution results with AgentOps trace data
         """
-        logger.info(f"Running scenario with LangSmith: {scenario.name}")
+        logger.info(f"Running scenario with AgentOps: {scenario.name}")
 
-        # Start LangSmith run for the scenario
+        # Start AgentOps run for the scenario
         scenario_run_id = None

484-496: Update remaining LangSmith references to AgentOps.

Lines 484, 524, and associated code comments still reference "LangSmith run" when they should reference AgentOps traces. Update these comments for consistency.

🔎 Proposed fix
-            # Update LangSmith run with results
+            # Update AgentOps trace with results
             if scenario_run_id and self.enable_tracing:
                 try:
                     self.agentops.update_trace_metadata(

Apply similar changes to line 524.

Also applies to: 524-534


537-548: Update collect_metrics docstring to reference AgentOps.

The docstring at lines 539-547 states "Collect production metrics from LangSmith" but this is the AgentOps adapter. Update the docstring to reflect the correct integration platform.

🔎 Proposed fix
     def collect_metrics(self, scenario: Any, result: Any, **kwargs) -> ProductionMetrics:
         """
-        Collect production metrics from LangSmith.
+        Collect production metrics from AgentOps.
 
         Args:
             scenario: The scenario being evaluated

612-677: Update establish_baseline docstring and log messages to reference AgentOps.

The docstring at line 616 states "Establish baseline metrics from historical LangSmith data" and log messages reference "LangSmith baseline" and "LangSmith" errors. Update all references to reflect AgentOps integration.

🔎 Proposed fix
     def establish_baseline(
         self, historical_data: list[dict[str, Any]], metrics: list[str], **kwargs
     ) -> BaselineMetrics:
         """
-        Establish baseline metrics from historical LangSmith data.
+        Establish baseline metrics from historical AgentOps data.
 
         Args:
             historical_data: Historical data for baseline calculation

Apply similar updates to log messages at lines 642, 670, etc.


679-697: Update helper method docstring to reference AgentOps.

Line 680 docstring states "Extract metrics from LangSmith runs" but should reference AgentOps for consistency with the adapter's purpose.

🔎 Proposed fix
     def _extract_run_metrics(self, runs):
-        """Extract metrics from LangSmith runs."""
+        """Extract metrics from AgentOps runs."""
         durations = []

713-747: Update create_evaluation_dataset docstring and log messages to reference AgentOps.

The docstring at line 717 and log messages at lines 742 and 746 reference "LangSmith" when they should reference AgentOps.

🔎 Proposed fix
     def create_evaluation_dataset(
         self, name: str, examples: list[dict[str, Any]], description: str | None = None
     ) -> str:
         """
-        Create an evaluation dataset in LangSmith.
+        Create an evaluation dataset in AgentOps.
 
         Args:
             name: Dataset name

Apply similar updates to log messages.


749-776: Critical: run_evaluation uses LangSmith SDK instead of AgentOps SDK.

The run_evaluation method at lines 749-776 imports and calls langsmith.evaluation.evaluate, but in an AgentOps adapter where self.client is initialized as the AgentOps SDK (line 71: self.client = agentops). This will cause a runtime failure. Additionally, the current AgentOps Python SDK does not provide evaluation dataset or evaluation-run APIs (marked as "forthcoming" in their roadmap). Either remove this method or completely rewrite it using AgentOps APIs once they become available. Also fix the duplicate LangSmith docstring in __init__ (lines 73-82).

🧹 Nitpick comments (1)
src/agentunit/adapters/agentops_adapter.py (1)

72-72: Consider making platform a property for consistency with other adapters.

Line 72 assigns self.platform directly as an instance attribute. However, other adapters (LangSmithAdapter and WandbAdapter) implement platform as a @property method returning MonitoringPlatform. For consistency across adapters, consider refactoring this to match the pattern:

🔎 Proposed refactor

Remove line 72 and add a property method after the __init__ method:

-        self.platform = MonitoringPlatform.AGENTOPS

Then add after the __init__ method (around line 98):

    @property
    def platform(self) -> MonitoringPlatform:
        """Return the monitoring platform type."""
        return MonitoringPlatform.AGENT_OPS

Note: Use MonitoringPlatform.AGENT_OPS (with underscore) to match the enum definition in src/agentunit/production/integrations.py.

Based on relevant code snippets showing the pattern used by LangSmithAdapter and WandbAdapter.
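
One consequence of this refactor is worth illustrating: a property without a setter is read-only, so the direct assignment in __init__ has to be removed rather than kept alongside it. The enum below is a hypothetical two-member subset with assumed values, not the definition in integrations.py.

```python
from enum import Enum


class MonitoringPlatform(Enum):
    """Hypothetical subset of the enum; member values are assumed."""

    AGENT_OPS = "agentops"
    LANGSMITH = "langsmith"


class AgentOpsAdapter:
    """Toy adapter showing only the platform property."""

    @property
    def platform(self) -> MonitoringPlatform:
        """Return the monitoring platform type."""
        return MonitoringPlatform.AGENT_OPS


adapter = AgentOpsAdapter()
```

Assigning adapter.platform = ... on this class raises AttributeError, which is why the old instance-attribute assignment and the property cannot coexist.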

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cec0c1d and e0c8efd.

📒 Files selected for processing (1)
  • src/agentunit/adapters/agentops_adapter.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/adapters/agentops_adapter.py (4)
src/agentunit/adapters/langsmith_adapter.py (1)
  • platform (104-106)
src/agentunit/adapters/wandb_adapter.py (1)
  • platform (117-119)
src/agentunit/production/integrations.py (1)
  • MonitoringPlatform (33-41)
src/agentunit/reporting/results.py (1)
  • ScenarioResult (32-66)
🔇 Additional comments (4)
src/agentunit/adapters/agentops_adapter.py (4)

245-245: LGTM! Default to "broadcast" addresses validation concern.

The fallback to_agent or "broadcast" correctly prevents the ValueError("to_agent is required") that would occur with an empty string, and aligns with the broadcast semantics used in other adapters.
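
A minimal sketch of the fallback, with hypothetical helper names. The validation function only imitates the ValueError quoted in the comment.

```python
from typing import Optional


def validate_recipient(to_agent: str) -> str:
    """Imitates a validator that rejects an empty recipient."""
    if not to_agent:
        raise ValueError("to_agent is required")
    return to_agent


def resolve_recipient(to_agent: Optional[str]) -> str:
    # None and "" both become "broadcast" before validation runs.
    return validate_recipient(to_agent or "broadcast")
```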


348-348: LGTM! Explicit type annotation improves clarity.

Adding the explicit type hint dict[str, int] for agent_counts improves code clarity and helps with type checking.


70-71: LGTM! Good type hint additions.

Extracting enable_tracing from kwargs and adding the explicit type annotation self.client: Any improves code clarity. While Any is broad, it's reasonable for a dynamically-typed client library.


378-378: No changes needed - async signature is correct and required by MultiAgentAdapter abstract method.

The run_scenario method must be async because it implements the abstract method from MultiAgentAdapter (line 372 in src/agentunit/multiagent/__init__.py), which defines async def run_scenario. This is not a premature change—it is the required implementation signature for the base class. No internal callers were found because this is an abstract method implemented by subclasses; external consumers would already be designed to handle async execution when using these adapters.

Likely an incorrect or invalid review comment.

Comment on lines +103 to +106
import agentops # type: ignore[import-not-found]

self.agentops = agentops
self.client = agentops

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for list_runs usage in the codebase
echo "=== Searching for list_runs usage ==="
rg -n "\.list_runs\(" --type py -C 3

echo ""
echo "=== Searching for self.client usage in agentops_adapter.py ==="
rg -n "self\.client\." src/agentunit/adapters/agentops_adapter.py -C 2

echo ""
echo "=== Check lines 551, 637 specifically ==="
sed -n '549,553p; 635,639p' src/agentunit/adapters/agentops_adapter.py

echo ""
echo "=== Check if AgentOps client is instantiated ==="
rg -n "agentops\.(Client|init|get_session)" --type py -C 2

Repository: aviralgarg05/agentunit

Length of output: 4591


🌐 Web query:

AgentOps Python SDK client API list_runs method

💡 Result:

Do you mean the AgentOps SDK's method signature / example usage for a client.list_runs (to list runs/traces), or are you seeing an error when calling list_runs? Which language (pure agentops Python SDK, or the Public API via HTTP) and which version do you have?


🌐 Web query:

AgentOps Python SDK init method returns client session

💡 Result:

Yes — agentops.init() creates the SDK client and (by default) starts a session and returns a Session object. If you call agentops.init(auto_start_session=False) it will initialize the client without starting/returning a session. The docs also note the init call constructs the underlying Client (accessible via get_client()). [1][2]

Sources:

  • SDK reference (init, get_client). [1]
  • Sessions / Starting a session (auto_start_session behavior). [2]

🌐 Web query:

AgentOps SDK API methods list_runs create_dataset create_example

💡 Result:

I couldn't find public AgentOps SDK methods named exactly list_runs, create_dataset, or create_example in the AgentOps docs or SDK source. The official docs show the Python/TS SDK surface and a REST API (you can fetch the OpenAPI spec for full endpoint names), and AgentOps exposes session/run-related endpoints via its REST API. See:

  • SDK reference / quickstart (Python SDK overview). [1]
  • REST API / integrations page (mentions OpenAPI spec & session/run auth flow). [2]

If you want, I can:

  • fetch the AgentOps OpenAPI (JSON) and extract exact endpoint/method names (recommended), or
  • search the AgentOps GitHub repos for those exact method names and return code examples.

Which would you prefer?

Sources:
[1] https://docs.agentops.ai/v2/usage/sdk-reference
[2] https://docs.agentops.ai/v1/integrations/rest


🏁 Script executed:

#!/bin/bash
# Check the context around agentops.init() to see what it returns
echo "=== Lines around agentops.init() calls ==="
sed -n '105,125p' src/agentunit/adapters/agentops_adapter.py

echo ""
echo "=== Check what self.client is used for ==="
rg -n "self\.client\." src/agentunit/adapters/agentops_adapter.py | head -20

echo ""
echo "=== Check if there's a get_client or session pattern ==="
rg -n "agentops\.(get_client|get_session|Session|Client)" --type py

Repository: aviralgarg05/agentunit

Length of output: 1232


Critical issue: self.client is assigned to the agentops module but code calls non-existent LangSmith API methods.

At lines 103–106, the code assigns the agentops module to both self.agentops and self.client. However, agentops.init() (lines 110, 116) returns a Session object that is never captured. More critically, lines 551, 636, 728, and 735 attempt to call self.client.list_runs(), self.client.create_dataset(), and self.client.create_example() — methods that do not exist in the AgentOps SDK. These are LangSmith API methods (as evidenced by the comment "Query recent runs from LangSmith" at line 550). The adapter appears to be an incompletely adapted copy of the LangSmith adapter and will fail at runtime with AttributeError when these methods are invoked. Either capture the Session object returned by agentops.init() and use AgentOps-compatible APIs, or remove functionality that relies on non-existent methods.

🤖 Prompt for AI Agents
In src/agentunit/adapters/agentops_adapter.py around lines 103 to 106, the
adapter assigns the agentops module to self.client but never captures the
Session returned by agentops.init(), and later calls LangSmith-specific methods
(list_runs, create_dataset, create_example) that do not exist on AgentOps; fix
by changing the initialization to store the Session returned by agentops.init()
into self.client/self.session and refactor subsequent calls to use
AgentOps-compatible APIs on that Session, or remove/replace the
LangSmith-specific methods—ensure every call site (lines ~551, 636, 728, 735)
either calls a real AgentOps method on the Session or is removed, and update
imports/docs accordingly.
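
The failure mode described above can be reproduced without installing the AgentOps SDK. The following is a minimal sketch, assuming a stand-in module object in place of `agentops`; `require_methods` and `fake_agentops` are hypothetical names used only for illustration and are not part of AgentOps or agentunit.

```python
import types


def require_methods(client: object, names: list[str]) -> list[str]:
    """Return the names from `names` that `client` does not expose as callables."""
    return [n for n in names if not callable(getattr(client, n, None))]


# Stand-in for the imported `agentops` module (hypothetical, for illustration).
fake_agentops = types.ModuleType("fake_agentops")
fake_agentops.init = lambda **kwargs: object()  # returns a session-like object

# Mirrors the adapter's pattern: the module itself is stored as the client.
client = fake_agentops

# A check like this at construction time surfaces the problem early.
missing = require_methods(client, ["list_runs", "create_dataset", "create_example"])
print(missing)

# Without such a check, the error only appears when the method is first called.
try:
    client.list_runs()
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

A check of this kind in the adapter's constructor would turn a late runtime failure into an immediate, descriptive one, though it does not replace porting the calls to APIs the SDK actually provides.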

Comment on lines +394 to +401
+        run_id = self.agentops.start_trace(
             name=f"Scenario: {scenario.name}",
             run_type="chain",
-            project_name=self.project_name,
-            inputs={"scenario": scenario.name, "description": scenario.description},
+            project_name=self.project_id,
+            inputs={"scenario": scenario.name},
             tags=["agentunit", "scenario"],
         )
-        scenario_run_id = str(run.id)
+        scenario_run_id = str(run_id)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

AgentOps Python SDK start_trace update_trace_metadata end_trace API documentation

💡 Result:

Here are the AgentOps Python SDK docs and the key details for start_trace, update_trace_metadata, and end_trace.

  • start_trace(trace_name: str = None, tags: Union[Dict, List] = None) — manually start a trace; returns a TraceContext (can be used as a context manager with with). Use when auto-start is disabled or you need separate traces. [1][2]

  • update_trace_metadata(metadata: Dict[str, Any], prefix: str = "trace.metadata") -> bool — update running-trace metadata (strings/numbers/booleans/lists), maps common keys to semantic conventions (e.g., tags, agent_name), validates types, returns True/False. Useful to add progress, stage, results mid-run. [1][2][3]

  • end_trace(trace: TraceContext = None, end_state: str = None, end_prompt: str = None, error_message: str = None) — end a specific trace (or all active traces if trace omitted); you can set an end_state like "Success"/"Error" and provide optional end details. Context-manager usage automatically ends traces on exit (Success or Error depending on exception). [1][2][4]

Concise examples (from docs):

  • Manual start/update/end
    import agentops
    agentops.init(api_key="KEY", auto_start_session=False)
    trace = agentops.start_trace("my-workflow", tags=["prod"])

    # ... run the workflow ...

    agentops.update_trace_metadata({"stage":"processing","tags":["in-progress"]})
    agentops.end_trace(trace, "Success")
    [1][2]

Docs pages:

  1. SDK Reference (start_trace, update_trace_metadata, end_trace). [1]
  2. Manual Trace Control / examples. [2]
  3. Quickstart / updating trace metadata snippet. [3]
  4. Context manager / with start_trace behavior. [4]

Would you like direct links to any of those pages or a full code snippet tailored to your app?


🏁 Script executed:

cd /tmp && find . -name "agentops_adapter.py" -type f 2>/dev/null | head -1

Repository: aviralgarg05/agentunit

Length of output: 48


🏁 Script executed:

# Search for the file in the repository
git ls-files | grep agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 108


🏁 Script executed:

# Check imports and initialization in the adapter file
head -50 src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 1388


🏁 Script executed:

# Search for other start_trace calls in the codebase
rg "start_trace" --type py

Repository: aviralgarg05/agentunit

Length of output: 160


🏁 Script executed:

# Check the full __init__ method and how self.agentops is initialized
sed -n '40,150p' src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 3978


🏁 Script executed:

# Look for agentops import and initialization
rg "import.*agentops|from.*agentops|self\.agentops\s*=" src/agentunit/adapters/agentops_adapter.py -A 3

Repository: aviralgarg05/agentunit

Length of output: 235


🏁 Script executed:

# Check around line 394-401 for more context
sed -n '380,410p' src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 1221


🏁 Script executed:

# Check if there's a custom wrapper around agentops being used
rg "class.*AgentOps|def start_trace|def update_trace_metadata|def end_trace" src/agentunit/adapters/agentops_adapter.py -B 2 -A 5

Repository: aviralgarg05/agentunit

Length of output: 316


🏁 Script executed:

# Check the entire run_scenario method to see how start_trace, update_trace_metadata, end_trace are used
sed -n '370,550p' src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 6746


All three AgentOps trace method calls use incorrect parameter signatures incompatible with the official SDK.

Lines 394-401 call start_trace(name=, run_type=, project_name=, inputs=, tags=) but the AgentOps SDK signature is start_trace(trace_name: str = None, tags: Union[Dict, List] = None). The parameters run_type, project_name, and inputs do not exist in the SDK.

Lines 487-491 and 527-531 call update_trace_metadata(trace_id=, metadata=) and end_trace(trace_id=, status_code=) which use non-existent parameters. The correct signatures are:

  • update_trace_metadata(metadata: Dict[str, Any], prefix: str = "trace.metadata")
  • end_trace(trace: TraceContext = None, end_state: str = None, ...)

Additionally, the method docstring incorrectly states "Run a scenario with LangSmith integration" (line ~382) despite being in the AgentOpsAdapter class.

Refactor all three method calls to match the official AgentOps SDK signatures, and correct the docstring.
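
A keyword mismatch of this kind can be caught in a unit test before it reaches runtime. The sketch below assumes a local stub that mirrors the `start_trace` signature quoted above; the stub and the `accepts_kwargs` helper are hypothetical and stand in for the real SDK function.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Union


# Stub mirroring the start_trace signature quoted in the review above.
# It is a stand-in for illustration, not the real AgentOps function.
def start_trace(
    trace_name: Optional[str] = None,
    tags: Union[Dict[str, Any], List[str], None] = None,
) -> object:
    return object()


def accepts_kwargs(func: Callable[..., Any], **kwargs: Any) -> bool:
    """Return True if `func` can be called with exactly these keyword arguments."""
    try:
        inspect.signature(func).bind(**kwargs)
    except TypeError:
        return False
    return True


# Keyword arguments as used by the adapter at lines 394-401.
adapter_call = dict(
    name="Scenario: demo",
    run_type="chain",
    project_name="proj",
    inputs={"scenario": "demo"},
    tags=["agentunit", "scenario"],
)
# Keyword arguments matching the quoted signature.
sdk_call = dict(trace_name="Scenario: demo", tags=["agentunit", "scenario"])

print(accepts_kwargs(start_trace, **adapter_call))  # False
print(accepts_kwargs(start_trace, **sdk_call))      # True
```

`Signature.bind` performs the same argument matching as a real call without executing the function, so the check has no side effects.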

🤖 Prompt for AI Agents
In src/agentunit/adapters/agentops_adapter.py around lines 382 and 394-401 (and
the calls near 487-491 and 527-531), the AgentOps SDK calls use the wrong
parameter names and the docstring is wrong. Change the method docstring to
reference AgentOps integration (not LangSmith). Refactor the start_trace call to
use the SDK signature start_trace(trace_name=..., tags=...), moving project_id,
run_type and inputs into the tags or into a metadata dict passed via
update_trace_metadata. Refactor the update_trace_metadata calls to
update_trace_metadata(metadata_dict, prefix="trace.metadata"), passing the
metadata dict as the first argument. Refactor the end_trace calls to
end_trace(trace=trace_context, end_state="Success" or "Error"), using an
end_state string rather than status_code. Keep the TraceContext returned by
start_trace so it can be passed to end_trace, and build metadata and tags as
dicts/lists accordingly.
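
One possible shape for the refactored flow is sketched below. It assumes the three signatures as quoted earlier in this thread and uses a recording stand-in so it runs without the SDK; `FakeAgentOps` and `run_traced` are hypothetical names, and the `"Success"`/`"Error"` end states follow the SDK documentation summary above.

```python
from typing import Any, Callable, Dict, List, Optional


class FakeAgentOps:
    """Hypothetical stand-in that records calls; mirrors the signatures quoted above."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def start_trace(self, trace_name: Optional[str] = None, tags: Any = None) -> Dict[str, Any]:
        self.calls.append(("start_trace", trace_name, tags))
        return {"name": trace_name}  # placeholder for a TraceContext

    def update_trace_metadata(
        self, metadata: Dict[str, Any], prefix: str = "trace.metadata"
    ) -> bool:
        self.calls.append(("update_trace_metadata", metadata, prefix))
        return True

    def end_trace(self, trace: Any = None, end_state: Optional[str] = None) -> None:
        self.calls.append(("end_trace", trace, end_state))


def run_traced(sdk: Any, scenario_name: str, project_id: str, body: Callable[[], Any]) -> Any:
    """Run `body` inside a trace and end the trace with "Success" or "Error"."""
    trace = sdk.start_trace(
        trace_name=f"Scenario: {scenario_name}", tags=["agentunit", "scenario"]
    )
    # Values that start_trace does not accept travel as metadata instead.
    sdk.update_trace_metadata(
        {"scenario": scenario_name, "project_id": project_id, "run_type": "chain"}
    )
    try:
        result = body()
    except Exception:
        sdk.end_trace(trace, end_state="Error")
        raise
    sdk.end_trace(trace, end_state="Success")
    return result


sdk = FakeAgentOps()
print(run_traced(sdk, "demo", "proj-1", lambda: 42))
print([call[0] for call in sdk.calls])
```

Keeping the object returned by `start_trace` and passing it to `end_trace` avoids the `trace_id=` keyword that the review flags as non-existent.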

Owner

@aviralgarg05 aviralgarg05 left a comment


Please fix the issues.


Development

Successfully merging this pull request may close these issues.

Add missing type hints to DatasetCase

3 participants