Chromium's zlib fork has a very fast CRC32 implementation that uses SSE or AVX extensions, whichever is available on the machine.
Would it make sense to bring this optimization here?
Note that the code there only works for CRC32 (with its fixed polynomial), so we would need to either
- generalize it, or
- use template specialization so that the fast path is taken only in the specific case of CRC32.
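A rough sketch of the second option follows. It assumes the CRC parameters (reflected polynomial, initial value, final XOR) are template arguments. `crc32_engine` and `fast_crc32_kernel` are hypothetical names, not existing APIs. The generic template does a slow bit-at-a-time reflected CRC. A full specialization for the standard CRC32 parameters (`0xEDB88320`, `0xFFFFFFFF`, `0xFFFFFFFF`) routes to a fast kernel. That kernel is a plain table-driven loop here so the sketch compiles anywhere. In practice it would be replaced by a SIMD kernel such as the one in Chromium's zlib.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Generic reflected CRC-32 for any polynomial (bit-at-a-time, slow fallback).
// ReflectedPoly is the bit-reversed polynomial, e.g. 0xEDB88320 for CRC32.
template <std::uint32_t ReflectedPoly, std::uint32_t Init, std::uint32_t XorOut>
struct crc32_engine {
    static std::uint32_t compute(const unsigned char* data, std::size_t len) {
        std::uint32_t crc = Init;
        for (std::size_t i = 0; i < len; ++i) {
            crc ^= data[i];
            for (int bit = 0; bit < 8; ++bit)
                crc = (crc & 1u) ? (crc >> 1) ^ ReflectedPoly : (crc >> 1);
        }
        return crc ^ XorOut;
    }
};

// Hypothetical stand-in for a SIMD kernel (e.g. the one in Chromium's zlib).
// Like that code, it only handles the standard CRC32 polynomial; here it is a
// portable table-driven loop so the sketch builds on any target.
inline std::uint32_t fast_crc32_kernel(const unsigned char* data, std::size_t len) {
    static const std::array<std::uint32_t, 256> table = [] {
        std::array<std::uint32_t, 256> t{};
        for (std::uint32_t n = 0; n < 256; ++n) {
            std::uint32_t c = n;
            for (int k = 0; k < 8; ++k)
                c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
            t[n] = c;
        }
        return t;
    }();
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i)
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

// Full specialization: only the exact CRC32 parameter set takes the fast path;
// every other parameter combination still uses the generic template above.
template <>
struct crc32_engine<0xEDB88320u, 0xFFFFFFFFu, 0xFFFFFFFFu> {
    static std::uint32_t compute(const unsigned char* data, std::size_t len) {
        return fast_crc32_kernel(data, len);
    }
};
```

With this, `crc32_engine<0xEDB88320u, 0xFFFFFFFFu, 0xFFFFFFFFu>::compute` on `"123456789"` should return the standard CRC32 check value `0xCBF43926`. Other variants such as CRC-32C (`0x82F63B78`) silently keep using the generic path. Callers need no changes, which is the main appeal of this option over generalizing the kernel. In PCLMULQDQ-style kernels, generalizing would mean deriving the folding constants from each polynomial instead of hard-coding them.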