Open
Description
P2: to tackle once the website is up.
Current issues
- Reports are stored as jsonl, but there is no pydantic model backing them, so it is hard to understand what the reports look like and how they are structured.
- One consequence is that the gaia report has null entries for a reason we still don't understand (it could be a bug somewhere). Strict pydantic validation would catch this.
- Another consequence is that cost reporting and latency reporting are hard to debug (I am not sure why they were failing, actually? @juanmichelini)
- There are also multiple sources of truth: infra errors are stored in outputs_errors.jsonl, whereas other runs (critic-valid or not) are stored in the final output.jsonl. This has caused issues where submitted_instances is lower than the actual number of instances. A quick workaround was to add yet another file in the GCS bucket with the real total number of instances, but this is hacky and adds complexity to the codebase. A better solution would be a single report that also contains the infra-errored instance ids.
- To debug report_costs or push_to_index, a second workflow was added, which duplicates code. One of the two code paths should be deleted.
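To make the proposal concrete, here is a minimal sketch of what a strict pydantic (v2) model for report lines could look like, with infra errors kept in the same report instead of a separate file. All field names (`instance_id`, `status`, `cost_usd`, `latency_s`, `error`), the status values, and the `parse_jsonl` helper are assumptions for illustration, not the current report schema.

```python
from typing import List, Literal, Optional, Tuple

from pydantic import BaseModel, ConfigDict, ValidationError


class InstanceResult(BaseModel):
    """One line of the report (hypothetical schema)."""

    # strict=True disables type coercion; extra="forbid" rejects unknown keys.
    model_config = ConfigDict(strict=True, extra="forbid")

    instance_id: str
    # Infra errors live in the same report as regular runs (assumed values).
    status: Literal["resolved", "unresolved", "infra_error"]
    cost_usd: float  # not Optional, so a null entry fails validation
    latency_s: float
    error: Optional[str] = None


class Report(BaseModel):
    """Single source of truth for a run (hypothetical schema)."""

    model_config = ConfigDict(strict=True, extra="forbid")

    benchmark: str
    results: List[InstanceResult]

    @property
    def total_instances(self) -> int:
        # Counts every instance, including infra-errored ones.
        return len(self.results)

    @property
    def infra_errored_ids(self) -> List[str]:
        return [r.instance_id for r in self.results if r.status == "infra_error"]


def parse_jsonl(text: str) -> Tuple[List[InstanceResult], List[str]]:
    """Validate each jsonl line; return valid results and per-line problems."""
    results: List[InstanceResult] = []
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            results.append(InstanceResult.model_validate_json(line))
        except ValidationError as exc:
            problems.append(f"line {lineno}: {exc.error_count()} validation error(s)")
    return results, problems
```

With a model like this, a null cost or latency is reported with its line number at load time instead of surfacing later in cost reporting, and the total instance count can be derived from the report itself rather than from an extra file in the bucket.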