perf: SIMD scan ASCII runs in input loop #288
Pull request overview
This PR adds a SIMD-accelerated fast path for scanning and processing contiguous runs of printable ASCII characters (0x20-0x7E) in the input loop, bypassing the parser for these common cases to reduce per-byte overhead. The optimization shows a ~4x improvement in the benchmark (22,725 ns/frame → 5,743 ns/frame for mixed input streams).
Key Changes:
- Added asciiPrintableRunLen() function that uses SIMD vector operations to scan for printable ASCII runs
- Integrated ASCII fast path in the non-Windows input loop to emit key_press events directly for ASCII characters
- Added benchmark functions to compare baseline parser performance against the SIMD-optimized path
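To make the run-length scan concrete, here is a minimal sketch in C rather than the PR's Zig. It is not the PR's asciiPrintableRunLen(): instead of vector instructions it uses a SWAR (SIMD within a register) check on 8 bytes at a time, and the names printable_run_len, printable_run_len_scalar, and all_printable8 are hypothetical. It only illustrates the idea of testing a whole chunk for the range 0x20..0x7E and falling back to a byte loop to find the exact end of the run.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: length of the leading run of printable ASCII (0x20..0x7E). */
static size_t printable_run_len_scalar(const uint8_t *buf, size_t len) {
    size_t i = 0;
    while (i < len && buf[i] >= 0x20 && buf[i] <= 0x7E) i++;
    return i;
}

/* Returns nonzero if all 8 bytes packed in w are in 0x20..0x7E.
 * The low 7 bits of each byte are checked with additions that cannot
 * carry into the neighbouring byte (0x7F + 0x60 = 0xDF, 0x7F + 1 = 0x80). */
static int all_printable8(uint64_t w) {
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t high = 0x8080808080808080ULL;
    uint64_t low7 = w & ~high;
    uint64_t non_ascii = w & high;                     /* byte >= 0x80 */
    uint64_t below_20 = ~(low7 + 0x60 * ones) & high;  /* low 7 bits < 0x20 */
    uint64_t is_7f = (low7 + ones) & high;             /* low 7 bits == 0x7F */
    return (non_ascii | below_20 | is_7f) == 0;
}

/* Chunked scan: skip whole 8-byte chunks that are fully printable, then let
 * the scalar loop locate the exact end of the run. Because each chunk is
 * accepted or rejected as a whole, the result does not depend on endianness. */
size_t printable_run_len(const uint8_t *buf, size_t len) {
    size_t i = 0;
    while (len - i >= 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8); /* unaligned-safe load */
        if (!all_printable8(w)) break;
        i += 8;
    }
    return i + printable_run_len_scalar(buf + i, len - i);
}
```

For the bytes of "hello, world!" followed by ESC [ A, this sketch returns 13, the number of bytes before the escape byte. A real vector implementation would compare wider chunks per step, but the accept-whole-chunk-then-refine structure is the same.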
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Loop.zig | Implements asciiPrintableRunLen() SIMD function and integrates ASCII fast path in the input loop before parser invocation |
| bench/bench.zig | Adds a copy of asciiPrintableRunLen() and benchmark harnesses (benchParseStreamBaseline, benchParseStreamSimd) to measure the performance improvement |
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
Problem
Input parsing reads one byte at a time and funnels everything through Parser, even when large ASCII runs are present. This adds avoidable per-byte overhead in common input streams.
Fix
Add a SIMD-assisted scan in the input loop to detect contiguous printable ASCII runs (0x20..0x7E). For these runs we emit key_press events directly. If the next byte begins a combining mark, we leave the last ASCII byte for the parser to avoid breaking combining/keycap sequences.
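The hold-back rule can be sketched as follows, in C rather than the PR's Zig. This is an illustration under stated assumptions, not the code in src/Loop.zig: ascii_fast_path, may_start_combining, key_press_fn, key_log, and log_key are hypothetical names, the run is found with a plain byte loop where the PR uses a SIMD scan, and the combining-mark test is a conservative guess that treats any non-ASCII byte (0x80 or above) as a possible start of a combining sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event sink: called once per emitted key press. */
typedef void (*key_press_fn)(uint8_t ch, void *ctx);

/* Small collector used to observe what the fast path emits. */
typedef struct {
    uint8_t keys[64];
    size_t count;
} key_log;

static void log_key(uint8_t ch, void *ctx) {
    key_log *l = (key_log *)ctx;
    if (l->count < sizeof l->keys) l->keys[l->count++] = ch;
}

static int is_printable_ascii(uint8_t b) { return b >= 0x20 && b <= 0x7E; }

/* Assumption: any non-ASCII byte may begin a combining mark or similar
 * sequence, so the preceding ASCII byte is left for the parser. */
static int may_start_combining(uint8_t b) { return b >= 0x80; }

/* Emits key presses for the leading printable-ASCII run and returns the
 * number of bytes consumed. The caller hands buf + result to the parser.
 * Simplification: a run that reaches the end of the buffer is emitted in
 * full, since the byte that follows is not known yet. */
size_t ascii_fast_path(const uint8_t *buf, size_t len,
                       key_press_fn emit, void *ctx) {
    size_t run = 0;
    while (run < len && is_printable_ascii(buf[run])) run++;

    /* Hold back the last ASCII byte so that base character and mark
     * reach the parser together. */
    if (run > 0 && run < len && may_start_combining(buf[run])) run--;

    for (size_t i = 0; i < run; i++) emit(buf[i], ctx);
    return run;
}
```

With the input bytes 'a', 'b', 'e', 0xCC, 0x81 (the letters "abe" followed by U+0301 in UTF-8), the sketch emits 'a' and 'b' and returns 2, leaving 'e' and the mark for the parser. With 'l', 's', ESC, '[', 'A' it emits both letters and stops at the escape byte without holding anything back.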
Bench (local, zig build bench, iterations=200, 80x24)
Mixed stream: ASCII + CSI + UTF-8
Improvement: -18,539 ns/frame (-77.2%), 4.38x speedup.
Tests