-
Notifications
You must be signed in to change notification settings - Fork 17
Reduce size of Bytes
#20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Sorry oversight on my part there |
saecki
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I quite like this optimization. I've left a few little nitpicky comments, but otherwise I think looks good!
laurmaedje
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just two minor nitpicks. otherwise, lgtm!
|
rust-analyzer trolling me |
|
Assuming you forgot to take a second look at this, if not all good ^^ |
|
It was still on my board and back of my mind, but I hadn't yet gotten back to working through the board. But no more blockers here. Thanks! |
I was randomly looking at the source of this package for inspiration on an unrelated project which required a compressed trie datastructure and I soon noticed that the
Bytesstruct, whose size constitute a bottleneck for the maximum size of words which can be hyphenated without allocation, had a significant amount of extraneous storage.The
array::IntoItertype stores twousizeindices to be able to implementDoubleSidedIterator, and we stored an extrausizejust for the length of the iterator, which can already be statically computed from the iterator itself.Switching to storing the array inline and reimplementing the iterator methods lowered this overhead by 16 bytes (3
usizes to just one). This doesn't even make the code look much lower levelWe may further reduce memory usage by realising our indices have low bounds, so we can just use a single
u8for the index.By changing that into a
NonZeroU8we also remove memory overhead whenallocis enabled, thanks to niche optimisation.Previously, the max word size has been documented as 40 bytes in some places and 41 in others, whilst in practice it was only 38, as two bytes are reserved for matching the start and end of words.
With this change, the (true) max has been increased to 45 bytes, as alignment restrictions mean any length between 38 and 45 (inclusive) will result in the same size for the
Bytesstruct.The previous size of
Byteswas 72, bringing the size ofSyllablesto 96. The new sizes are 48 and 72, respectively.If we were to push
Bytesup to the limit of its previous size of 72, we could store as many as 69 bytes of word without allocating.Additionally, I opted to expose a new public constant
MAX_WORD_LENto avoid a future desync between the actual max length and the documented one, especially in downstream users.I also added tests to ensure that constant is correct. The old code passes those tests for a
MAX_WORD_LENvalue of 38.I haven't found a performance regression on my machine using
cargo bench -p hypher-bench. I'm not sure if that's how that extra crate is meant to be used, usually I just see benchmarks specified in thebenchsection ofCargo.toml.