Restructure the reports and artifacts structure  #319

@simonrosenberg

Description

P2: to tackle once the website is up.

Current issues

  • Reports are stored as JSONL, but no pydantic model backs them, so it is hard to tell what a report looks like or how it is structured.
  • One consequence is that the gaia report has null entries for a reason we still don't understand (possibly a bug somewhere). Strict pydantic validation would solve that issue.
  • Another consequence is that cost reporting and latency reporting are hard to debug (I am not sure why they were failing, actually. @juanmichelini?)
  • There are also multiple sources of truth: infra errors are stored in outputs_errors.jsonl, whereas all other runs (critic-valid or not) are stored in the final output.jsonl. This has caused cases where submitted_instances is lower than the actual number of instances. A quick workaround was to add yet another file to the GCS bucket holding the real total number of instances, but this is hacky and adds complexity to the codebase. A better solution would be a single report that also contains the infra-errored instance ids.
  • To debug report_costs or push_to_index, a second workflow was added, which duplicates code. One of the two code paths should be deleted.
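
To make the proposal concrete, here is a minimal sketch of what a single, strictly validated report could look like. It assumes pydantic v2. Every name in it (`InstanceResult`, `Report`, `load_report`, the field names, and the `status` values) is hypothetical and not taken from the current codebase. The idea is that infra-errored instances sit in the same file as every other run, so the total instance count and the infra-errored ids are derived from one source of truth.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class InstanceResult(BaseModel):
    """One line of the report JSONL (hypothetical schema)."""

    # strict=True disables type coercion; extra="forbid" rejects unknown keys.
    model_config = ConfigDict(strict=True, extra="forbid")

    instance_id: str
    # Infra errors are recorded next to normal runs and told apart by status.
    status: Literal["resolved", "unresolved", "infra_error"]
    cost_usd: float  # required, so a null here fails validation
    latency_s: float  # required, so a null here fails validation
    error: Optional[str] = None


class Report(BaseModel):
    """The whole report: the single source of truth for a run."""

    model_config = ConfigDict(strict=True, extra="forbid")

    benchmark: str
    results: list[InstanceResult]

    @property
    def total_instances(self) -> int:
        # Counts every instance, including infra-errored ones.
        return len(self.results)

    @property
    def infra_errored_ids(self) -> list[str]:
        return [r.instance_id for r in self.results if r.status == "infra_error"]


def load_report(benchmark: str, jsonl_text: str) -> Report:
    """Parse JSONL text, raising ValidationError on the first malformed line."""
    results = [
        InstanceResult.model_validate_json(line)
        for line in jsonl_text.splitlines()
        if line.strip()
    ]
    return Report(benchmark=benchmark, results=results)
```

With a schema like this, a line such as `{"instance_id": "c", "status": "resolved", "cost_usd": null, "latency_s": 1.0}` raises a `ValidationError` at load time instead of silently ending up as a null entry in the report, and no extra file is needed to recover the real number of instances.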
