
Conversation

@posborne
Collaborator

This is a port of the viz.py proof-of-concept (#286). The workflow for generating these reports is to:

  1. Capture raw data to JSON/CSV.
  2. Generate a report using two or more input data sets, with one engine/flags combination designated as the baseline (by default, the first engine found in the provided data files); a hypothetical invocation is sketched below.
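
For concreteness, here is roughly what that workflow looks like on the command line. This is a hypothetical sketch: the report subcommand name, the --engine selection, and the JSON capture flags are illustrative and may not match the actual CLI exactly.

    # Hypothetical invocations; subcommand and flag names are illustrative.
    # 1. Capture raw data for each engine/configuration:
    $ sightglass benchmark --engine wasmtime-v32.so --output-format json ... > v32.json
    $ sightglass benchmark --engine wasmtime-v39.so --output-format json ... > v39.json
    # 2. Render the report; the first input file supplies the baseline:
    $ sightglass report v32.json v39.json > report.html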

The report itself tries to provide a clear, concise representation of the statistically significant data in the overview table, with raw results in detail sections that, along with summary statistics, give an indication of how noisy, and therefore how trustworthy, the measured differences for a run might be.
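
The PR doesn't spell out exactly which statistics the report computes, but the kind of summary that lets a reader judge noise is a mean with a confidence interval. A minimal Rust sketch, illustrative only and not the report's actual code, assuming a normal approximation with z = 1.96 for a 95% interval:

    /// Sample mean plus the half-width of an approximate 95% confidence
    /// interval, using the normal-approximation z value of 1.96.
    fn mean_and_ci95(samples: &[f64]) -> (f64, f64) {
        let n = samples.len() as f64;
        let mean = samples.iter().sum::<f64>() / n;
        // Sample variance with Bessel's correction (n - 1).
        let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
        let std_err = (var / n).sqrt();
        (mean, 1.96 * std_err)
    }

    fn main() {
        // Hypothetical execution-time samples, in microseconds.
        let samples = [103.0, 98.5, 101.2, 99.8, 102.4];
        let (mean, ci) = mean_and_ci95(&samples);
        println!("{mean:.1} +/- {ci:.1} us (95% CI)");
    }

An interval that is wide relative to the difference between engines is a sign the run was too noisy to trust.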

The report is intended to be shared; it pulls in a couple of dependencies for graph generation from a CDN but is otherwise standalone.


Here's a report generated from this changeset, showing:

  • Baseline: wasmtime v32 with no flags
  • wasmtime v39.0.1 with no flags
  • wasmtime v39.0.1 with epoch-interruption=y

Example Report:

report.html

Screenshot of ^^^ to pique interest:

[screenshot of the example report]

@posborne force-pushed the html-report-command branch from ba4553c to 50f6653 on December 16, 2025.
@fitzgen
Member

fitzgen commented Dec 16, 2025

Awesome! I'll take a look at this tomorrow

@fitzgen
Member

fitzgen left a comment


Looks great, thanks!

Not something we need to do now, but it would be cool to integrate the HTML reports into our existing output-format machinery, rather than (or in addition to) having it be a separate CLI subcommand, so we could just do something like:

$ sightglass benchmark --output-format html ...

Also, it would be nice to show the confidence interval as a percentage (in addition to, or instead of, the absolute value) since, at least for me personally, the absolute values are much less useful at a glance than the percentage. Maybe the "Performance" column should include a +/- 1.23% or something? There are lots of ways we could display this; I don't care too much about the details, more just that we try to make things as minimally noisy and easy to read at a glance as possible.
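
A sketch of the arithmetic behind that suggestion (names are illustrative, and it assumes the report already has the mean and the absolute confidence-interval half-width on hand):

    // Express the CI half-width relative to the mean so the overview
    // table can show a "+/- 1.23%"-style figure next to each result.
    fn ci_as_percent(mean: f64, ci_half_width: f64) -> f64 {
        ci_half_width / mean * 100.0
    }

    fn main() {
        // e.g. a mean of 812,000 CPU cycles with an absolute CI of +/- 10,000:
        println!("+/- {:.2}%", ci_as_percent(812_000.0, 10_000.0)); // +/- 1.23%
    }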

Thanks again!

@fitzgen
Member

fitzgen commented Dec 17, 2025

Oh also, looking at the screenshot, it is not clear to me which phase we are looking at data for: compilation, instantiation, or execution. That should probably be made clear.

@posborne
Collaborator Author

Thanks for the review @fitzgen; I created issues to track the follow-up work. If you're OK with it, I would like to merge this as-is and visit those items later.

@fitzgen merged commit 2dffe0d into bytecodealliance:main on Dec 18, 2025
17 checks passed