Hello,
Are you aware of the other half-precision implementation for x86_64?
https://github.com/melowntech/half
If so, could you compare them, please?
I'll be experimenting with float16_t in a neural net library that runs on the CPU, with the intention of reducing CPU cache usage; the only operations will be +, -, and > on halfs (halves? :) )
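For context, here is a rough sketch of the usage pattern I have in mind. It assumes the values are stored as raw IEEE 754 binary16 bit patterns in `uint16_t` and widened to `float` only for the arithmetic. The names `half_bits_to_float` and `sum_positive_diffs` are made up for illustration and are not taken from either library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from either library): decode an IEEE 754
// binary16 bit pattern into a float.
inline float half_bits_to_float(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exp = (h >> 10) & 0x1Fu;  // 5-bit exponent field
    std::uint32_t man = h & 0x3FFu;         // 10-bit mantissa field
    std::uint32_t bits;
    if (exp == 0x1Fu) {                     // infinity or NaN
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {                  // normal: rebias exponent from 15 to 127
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                  // signed zero
        bits = sign;
    } else {                                // subnormal: renormalize the mantissa
        exp = 113u;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            --exp;
        }
        bits = sign | (exp << 23) | ((man & 0x3FFu) << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical example of the workload: storage stays 16-bit, while the
// +, - and > operations are carried out in float after decoding.
// Returns the sum of (a[i] - b[i]) over the elements where a[i] > b[i].
inline float sum_positive_diffs(const std::vector<std::uint16_t>& a,
                                const std::vector<std::uint16_t>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float x = half_bits_to_float(a[i]);
        const float y = half_bits_to_float(b[i]);
        if (x > y) {
            acc += x - y;
        }
    }
    return acc;
}
```

A comparison of the two libraries on this kind of loop (decode, then add, subtract, compare) would be the most useful for my case.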
Thank you,