perf(sanitize): optimize SQL parsing with ASCII fast-path (~43% faster) #2434

analytically · 2025-11-16T11:07:03Z

Significant performance improvements to SQL sanitization:

- SanitizeSQL: 1632ns → 923ns/op (-43.4%)
- Sanitize: 362ns → 348ns/op (-3.9%)
- Memory: unchanged (544 B/op, 9 allocs/op)

Key optimizations:

1. ASCII fast-path in lexer state functions (rawState, singleQuoteState, doubleQuoteState, oneLineCommentState): avoids UTF-8 decoding overhead for the 99%+ of SQL that is ASCII
2. Direct byte checks for lookaheads (e', --, /*, '', "") instead of UTF-8 decoding when checking for ASCII characters
3. Adaptive QuoteString allocation strategy:
   - Short strings (≤64 bytes): worst-case preallocate
   - Long strings (>64 bytes): scan-first for exact allocation

All optimizations maintain full UTF-8 safety and correctness. Benchmarked on Apple M1 Pro (darwin/arm64).
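For readers unfamiliar with the technique, an ASCII fast-path in a scanning loop might look like the sketch below. It is an assumption-laden illustration rather than the lexer from this PR: `countSingleQuotes` is a hypothetical stand-in for a state function, and it only counts quote characters instead of changing lexer state.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countSingleQuotes is a hypothetical stand-in for a lexer state function.
// It walks src and counts single-quote characters.
func countSingleQuotes(src string) int {
	count := 0
	for pos := 0; pos < len(src); {
		b := src[pos]
		if b < utf8.RuneSelf {
			// Fast path: a byte below 0x80 is a complete ASCII character,
			// so it can be inspected without decoding.
			if b == '\'' {
				count++
			}
			pos++
			continue
		}
		// Slow path: decode the multi-byte sequence and skip over it.
		// Invalid bytes decode as RuneError with width 1, so pos always advances.
		_, width := utf8.DecodeRuneInString(src[pos:])
		pos += width
	}
	return count
}

func main() {
	fmt.Println(countSingleQuotes("select 'héllo', '日本語'")) // prints 4
}
```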

Signed-off-by: Mathias Bogaert <mathias.bogaert@gmail.com>

analytically · 2025-11-22T10:39:38Z

The failing CI doesn't seem related to this change; it looks more like a CockroachDB issue?

jackc · 2025-11-28T23:16:45Z

I'm a bit concerned about making changes to a security-sensitive portion of the code.

How much does this impact real-world performance? Sanitization should only happen when using the simple protocol, which I hope is fairly rare.

Also, if I understand the ASCII fast-path portions correctly, wouldn't it be impossible for the state characters (e.g. ', -, /) to be reachable in the UTF-8 path? I would think all that the UTF-8 path would do is consume characters.
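The property behind that last question can be checked directly: in valid UTF-8, every byte of a multi-byte sequence is 0x80 or higher, so ASCII characters such as ', - and / cannot occur inside one. The sketch below verifies this exhaustively with the standard library; `multiByteHasASCIIByte` is a hypothetical helper name, and the check says nothing about how a lexer treats invalid byte sequences.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// multiByteHasASCIIByte reports whether any non-ASCII rune encodes to a
// byte sequence containing a byte below 0x80.
func multiByteHasASCIIByte() bool {
	var buf [utf8.UTFMax]byte
	for r := rune(utf8.RuneSelf); r <= utf8.MaxRune; r++ {
		if !utf8.ValidRune(r) {
			continue // skip surrogate halves, which are not encodable
		}
		n := utf8.EncodeRune(buf[:], r)
		for _, b := range buf[:n] {
			if b < utf8.RuneSelf {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(multiByteHasASCIIByte()) // prints false
}
```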

analytically · 2025-12-01T11:45:24Z

Sounds fair. Rare code path vs stability risk for 0.0000+% performance change.

analytically closed this Dec 1, 2025
