Found while trying to implement a fast black-box primitive.
char f(char x) {
asm("nop" : "+r"(x));
return x * 3;
}
short g(short x) {
asm("nop" : "+r"(x));
return x * 3;
}
char h(char x) {
return x * 3;
}

f(char):
nop
movzx eax, dil
lea eax, [rax + 2*rax]
ret
g(short):
nop
lea eax, [rdi + 2*rdi]
ret
h(char):
lea eax, [rdi + 2*rdi]
ret

The line `movzx eax, dil` in `f` can be omitted (and, indeed, GCC omits it). I initially thought this was some kind of dependency-breaking optimization, but I'm no longer sure. For one thing, it's not done for 16-bit values (`g`), which would seemingly suffer from the same issue. It is also not done in `h`, where the input to `lea` is the function argument, whose upper bits are undefined under the psABI. If this is an attempt at optimization, it looks more like a pessimization: it follows inline assembly that the author has presumably made as efficient as possible, and there's no way to opt out of the zero-extension.
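For what it's worth, here is a minimal sketch of why the extension looks redundant from a correctness standpoint: only the low 8 bits of the result are returned, and the low 8 bits of `x * 3` depend only on the low 8 bits of `x`. The helper names (`triple_low_byte`, `count_mismatches`, `check`) and the sample garbage patterns are hypothetical, chosen for illustration, and the code only models the `lea` arithmetic; it says nothing about why the backend emits the `movzx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models `lea eax, [rax + 2*rax]` followed by reading only `al`. */
static uint8_t triple_low_byte(uint32_t reg) {
    return (uint8_t)(reg + 2 * reg);
}

/* Counts inputs for which garbage in the upper 24 bits changes the low byte
 * of the result. The garbage patterns are arbitrary samples (an assumption),
 * not an exhaustive sweep of all upper-bit values. */
static unsigned count_mismatches(void) {
    static const uint32_t garbage[] = {
        0x00000000u, 0x00000100u, 0xDEADBE00u, 0xFFFFFF00u
    };
    unsigned mismatches = 0;
    for (uint32_t low = 0; low < 256; ++low) {
        for (size_t i = 0; i < sizeof garbage / sizeof garbage[0]; ++i) {
            uint32_t dirty = garbage[i] | low; /* upper bits left undefined */
            uint32_t clean = low;              /* what movzx would produce */
            if (triple_low_byte(dirty) != triple_low_byte(clean))
                ++mismatches;
        }
    }
    return mismatches;
}

/* Asserts that the upper bits never influence the returned byte. */
static void check(void) {
    assert(count_mismatches() == 0);
}
```

Calling `check()` from a test harness should pass silently, which is consistent with GCC dropping the `movzx` in `f`.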