Restructure the reports and artifacts #319

P2: to tackle once the website is live.

Current issues
- Reports are stored as JSONL, but no Pydantic model backs them, so it is hard to understand what a report looks like or how it is structured.
- One consequence is that the GAIA report has null entries for a reason we still don't understand (it could be a bug somewhere). Strict Pydantic validation would solve that issue by rejecting malformed entries instead of silently storing them.
- Another consequence is that cost and latency reporting are hard to debug (I'm actually not sure why it was failing, @juanmichelini?).
- Also, there are multiple sources of truth: infra errors are stored in outputs_errors.jsonl, whereas other runs (critic-validated or not) are stored in the final output.jsonl. This has caused submitted_instances to be lower than the actual number of instances. A quick hack around it was to add yet another file to the GCS bucket with the real total number of instances, but this is hacky and adds complexity to the codebase. A better solution would be a single report that also contains the infra-errored instance IDs.
- To debug report_costs or push_to_index, a second workflow was added that duplicates code. One of the two code paths should be deleted.
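A minimal sketch of what a strictly validated, single-file report could look like, assuming Pydantic v2. The model names (InstanceResult, RunReport), the load_report helper, the field names and the status values are all hypothetical illustrations, not the current schema. Infra-errored instances are kept as one status in the same report, so submitted_instances is derived from the report itself, and a null cost fails validation instead of being written.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError, model_validator


class InstanceResult(BaseModel):
    """One line of a hypothetical unified report JSONL file (illustrative schema)."""

    # strict=True: no silent type coercion; extra="forbid": unknown keys are errors.
    model_config = ConfigDict(strict=True, extra="forbid")

    instance_id: str
    # Hypothetical statuses: infra errors live here instead of a separate errors file.
    status: Literal["critic_passed", "critic_failed", "infra_error"]
    cost_usd: float = Field(ge=0)  # null is rejected rather than stored
    latency_s: float = Field(ge=0)
    error: Optional[str] = None  # optional free-text error message


class RunReport(BaseModel):
    """All instances of one run: the single source of truth for counts and totals."""

    model_config = ConfigDict(strict=True, extra="forbid")

    benchmark: str
    instances: list[InstanceResult]

    @model_validator(mode="after")
    def check_unique_instance_ids(self) -> "RunReport":
        # Each instance must appear exactly once across all outcomes.
        ids = [r.instance_id for r in self.instances]
        if len(ids) != len(set(ids)):
            raise ValueError("duplicate instance_id in report")
        return self

    @property
    def submitted_instances(self) -> int:
        # Derived from the report, so no extra "total instances" file is needed.
        return len(self.instances)

    @property
    def infra_error_ids(self) -> list[str]:
        return [r.instance_id for r in self.instances if r.status == "infra_error"]

    @property
    def total_cost_usd(self) -> float:
        return sum(r.cost_usd for r in self.instances)


def load_report(benchmark: str, jsonl_text: str) -> RunReport:
    """Parse JSONL text, failing loudly on the first malformed line."""
    results: list[InstanceResult] = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            results.append(InstanceResult.model_validate_json(line))
        except ValidationError as exc:
            # Point at the offending line instead of writing a partial report.
            raise ValueError(f"line {lineno}: {exc}") from exc
    return RunReport(benchmark=benchmark, instances=results)
```

With this shape, cost and latency totals and the infra-error count are computed from one validated file, and a null entry would stop the load with the offending line number rather than propagating into downstream reporting.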
