-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
Hi all,
I'm quite new to programming and Rust in general, as forewarning!
I read the guidelines but honestly I couldn't read through too much,
Anyway the reason I post here today is because I wanted to share a really interesting function I made and I don't want to let it go to waste in some stupid crate noone will ever use.
It's a (conditionally) constant version of strlen, only when used in context on dirent64 structure's, it has no branches and using SWAR(simd within a register) to do the operation in one go. I didn't study computer science but sometimes mathematicians swap hats.
(You can probably add some safe wrappers where appropriate too)
It's probably got some snags, it's obviously Linux specific, I'm not sure about weird architectures, those are easy to feature flag away. Luckily due to no intrinsics it's not specifically architecture bound. I've not tested it enough(though to be fair, if this even compiles, it likely works!)
THERE ARE LIKELY A LOT OF CASES WHERE IT WONT WORK
However, I'm posting this here because I thought it would be of interest, especially being in readdir etc.
I apologise if this is in the wrong place, kinda happy and don't know enough.
Of course, it wouldn't be complete without benchmarks.
You can find my repo at
https://github.com/alexcu2718/fdf
then git clone ... and cargo bench.
strlen_comparison/dirent_const_time_single/case_1
time: [1.0157 ns 1.0208 ns 1.0260 ns]
thrpt: [8.7716 Gelem/s 8.8162 Gelem/s 8.8611 Gelem/s]
change:
time: [−24.104% −23.281% −22.455%] (p = 0.00 < 0.05)
thrpt: [+28.957% +30.346% +31.759%]
Performance has improved.
Found 12 outliers among 500 measurements (2.40%)
10 (2.00%) high mild
2 (0.40%) high severe
strlen_comparison/libc_strlen_single/case_1
time: [2.2868 ns 2.3168 ns 2.3488 ns]
thrpt: [3.8318 Gelem/s 3.8846 Gelem/s 3.9356 Gelem/s]
change:
time: [+2.6941% +3.4345% +4.2209%] (p = 0.00 < 0.05)
thrpt: [−4.0499% −3.3205% −2.6234%]
Performance has regressed.
## LONGESTSTRINGS(240~)
strlen_comparison/dirent_const_time_single/case_8
time: [1.0524 ns 1.0571 ns 1.0618 ns]
thrpt: [8.4765 Gelem/s 8.5142 Gelem/s 8.5523 Gelem/s]
change:
time: [−47.073% −46.039% −45.017%] (p = 0.00 < 0.05)
thrpt: [+81.875% +85.319% +88.938%]
Performance has improved.
Found 2 outliers among 500 measurements (0.40%)
1 (0.20%) high mild
1 (0.20%) high severe
strlen_comparison/libc_strlen_single/case_8
time: [4.1926 ns 4.2135 ns 4.2376 ns]
thrpt: [2.1238 Gelem/s 2.1360 Gelem/s 2.1466 Gelem/s]
change:
time: [−33.151% −32.598% −32.052%] (p = 0.00 < 0.05)
thrpt: [+47.170% +48.363% +49.592%]
/// Const-fn strlen for dirent's `d_name` field using bit tricks.
/// O(1) complexity, no branching, and no loops.
///
/// This function can't really be used in a const manner
/// It's probably the most efficient way to calculate the length
/// It calculates the length of the `d_name` field in a `libc::dirent64` structure without branching on the presence of null bytes.
/// It needs to be used on a VALID `libc::dirent64` pointer, and it assumes that the `d_name` field is null-terminated.
///
/// This is my own implementation of a constant-time strlen for dirents, which is useful for performance/learning.
///
/// Reference <https://github.com/lattera/glibc/blob/master/string/strlen.c#L1>
///
/// Reference <https://graphics.stanford.edu/~seander/bithacks.html#HasZeroByte>
///
/// Reference <https://github.com/Soveu/find/blob/master/src/dirent.rs>
///
/// Combining all these tricks, i made this beautiful thing!
/// # SAFETY
/// This function is `unsafe` because...read it
/// The caller must uphold the following invariants:
/// - The `dirent` pointer must point to a valid `libc::dirent64` structure
/// `SWAR` (SIMD Within A Register) is used to find the first null byte in the `d_name` field of a `libc::dirent64` structure.
pub const unsafe fn dirent_const_time_strlen(dirent: *const libc::dirent64) -> usize {
const DIRENT_HEADER_SIZE: usize = std::mem::offset_of!(libc::dirent64, d_name) + 1;
let reclen = unsafe { (*dirent).d_reclen as usize }; // we MUST cast this way, as it is not guaranteed to be aligned, so
// Calculate the number of u64 words in the record length
// Calculate find the start of the d_name field
// THIS WILL ONLY WORK ON LITTLE-ENDIAN ARCHITECTURES, I CANT BE BOTHERED TO FIGURE THAT OUT, qemu isnt fun
let last_word = unsafe { *((dirent as *const u8).add(reclen - 8) as *const u64) };
// Special case: When processing the 3rd u64 word (index 2), we need to mask
// the non-name bytes (d_type and padding) to avoid false null detection.
// The 0x00FF_FFFF mask preserves only the 3 bytes where the name could start.
// Branchless masking: avoids branching by using a mask that is either 0 or 0x00FF_FFFF
let mask = 0x00FF_FFFFu64 * ((reclen / 8 == 3) as u64); // (multiply by 0 or 1)
//we're bit manipulating the last word (a byte/u64) to find the first null byte
//this boils to a complexity of strlen over 8 bytes, which we then accomplish with a bit trick
// The mask is applied to the last word to isolate the relevant bytes.
// The last word is masked to isolate the relevant bytes, and then we find the first zero byte.
// the kernel guarantees that the d_name field is null-terminated, so we can safely use this trick.
let zero_bit = (last_word | mask).wrapping_sub(0x0101_0101_0101_0101)// 0x0101_0101_0101_0101 -> underflows the high bit if a byte is
& !(last_word | mask) //ensures only bytes that were zero retain the underflowed high bit.
& 0x8080_8080_8080_8080; // 0x8080_8080_8080_8080 -->This masks out the high bit of each byte, so we can find the first zero byte
// The trailing zeros of the zero_bit gives us the position of the first zero byte.
// We divide by 8 to convert the bit position to a byte position.
// The trailing zeros of the zero_bit gives us the position of the first zero byte.
// We subtract 7 to get the correct offset in the d_name field.
//>> 3 converts from bit position to byte index (divides by 8)
reclen - DIRENT_HEADER_SIZE - (7 - (zero_bit.trailing_zeros() >> 3) as usize)
}