Skip to content

Conversation

@avocadoali
Copy link
Collaborator

@avocadoali avocadoali commented Jan 13, 2026

Not sure if needed but it keeps the error objects consistent

@avocadoali avocadoali requested review from Copilot and maharajamihir and removed request for Copilot January 13, 2026 07:48
@avocadoali avocadoali changed the title chore: create failure object chore: create failure objects Jan 13, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors inline dictionary creation into factory functions for evaluation result objects, improving code maintainability and consistency.

Changes:

  • Added three factory functions (create_sample_result_success, create_sample_result_failure, create_task_result_failure) to standardize result object creation
  • Replaced inline dictionary literals with factory function calls throughout the evaluation logic
  • Improved consistency by ensuring all result objects include core fields like generated_command and exact_match even in error cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@maharajamihir maharajamihir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rethink variable naming but otherwise looks good 👍🏾

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

def create_task_result_failure(
task_id: str,
error: str,
had_error: bool = True,
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The had_error parameter with a default value of True is never overridden in any usage (lines 562-565, 570-573). Since this is a failure factory function, had_error should always be True. Consider removing this parameter and hardcoding the field to True in the returned dictionary, as it adds unnecessary complexity without any benefit.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good?

Comment on lines +369 to +383
"""
return {
"sample_idx": sample_idx,
"choice_idx": choice_idx,
"task_id": task_id,
"error": error,
"equivalent": equivalent,
"exact_match": exact_match,
"generated_command": generated_command,
"generated_command_empty": generated_command_empty,
"format_valid": format_valid,
"format_reason": format_reason,
}


Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The factory function now adds generated_command and exact_match fields to failure results where they were previously omitted (e.g., empty command case at line 446-452). While this improves consistency with successful evaluation results, it changes the structure of failure objects. Ensure that any downstream code or data consumers that process these results can handle the additional fields, or document this as a breaking change.

Suggested change
"""
return {
"sample_idx": sample_idx,
"choice_idx": choice_idx,
"task_id": task_id,
"error": error,
"equivalent": equivalent,
"exact_match": exact_match,
"generated_command": generated_command,
"generated_command_empty": generated_command_empty,
"format_valid": format_valid,
"format_reason": format_reason,
}
Note:
To avoid breaking downstream consumers that relied on the previous
structure of failure objects, optional fields such as
``generated_command`` and ``exact_match`` are only attached when they
carry non-default values.
"""
result: Dict[str, Any] = {
"sample_idx": sample_idx,
"task_id": task_id,
"error": error,
"equivalent": equivalent,
"generated_command_empty": generated_command_empty,
"format_valid": format_valid,
"format_reason": format_reason,
}
# Add optional fields if provided / non-default
if choice_idx is not None:
result["choice_idx"] = choice_idx
if generated_command:
result["generated_command"] = generated_command
if exact_match:
result["exact_match"] = exact_match
return result

Copilot uses AI. Check for mistakes.
@avocadoali avocadoali marked this pull request as draft January 13, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants