Due to style preferences, there are quite a few layers of function calls. This doesn't hurt performance when compiled with optimization, because the compiler inlines the layers away. In debug mode, however, the code is not competitive, because all of that dispatching remains as real function calls.
Maybe we can tell compilers to inline more of these calls even in debug mode?
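One possible way to do this is a force-inline macro applied to the thin wrapper layers. The sketch below is only an illustration: the macro name `MYLIB_FORCE_INLINE`, the `demo` namespace, and the three functions are made up for this example and are not taken from the code base. It relies on GCC and Clang honoring `__attribute__((always_inline))` even without optimization. MSVC's `__forceinline` is not honored under the default debug setting `/Ob0`, so it would likely need `/Ob1` added to the debug flags.

```cpp
#include <cstddef>

// Hypothetical macro name, chosen for this sketch only.
#if defined(_MSC_VER)
// MSVC: only takes effect in debug builds if inlining is enabled (e.g. /Ob1).
#  define MYLIB_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC/Clang: inlined even at -O0.
#  define MYLIB_FORCE_INLINE inline __attribute__((always_inline))
#else
// Unknown compiler: fall back to a plain inline hint.
#  define MYLIB_FORCE_INLINE inline
#endif

namespace demo {

// Innermost layer: a trivial accessor.
MYLIB_FORCE_INLINE int load(const int* data, std::size_t i) {
  return data[i];
}

// Middle layer: forwards to load() and applies a scale factor.
MYLIB_FORCE_INLINE int scaled(const int* data, std::size_t i, int k) {
  return k * load(data, i);
}

// Outer layer: loops over the input and calls the middle layer per element.
MYLIB_FORCE_INLINE int scaled_sum(const int* data, std::size_t n, int k) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += scaled(data, i, k);
  }
  return total;
}

}  // namespace demo
```

With this in place, a call such as `demo::scaled_sum(values, 3, 2)` should compile to a single loop body in a GCC or Clang debug build instead of three nested calls per element. The trade-off is that a debugger can no longer step into the inlined layers as separate frames, so it may be worth limiting the macro to the hottest wrappers.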