In #104, we fixed the slowdown within this repo. However, when ECC is used as a Rust crate (as in EthFullConsensus), the downstream build still has to set `target-cpu=native` manually. Otherwise, the SIMD operations are compiled as out-of-line function calls instead of being inlined, and the call overhead is substantial.
For regular circuits, we usually only compile the circuit with ECC and then run it with expander-exec, so the performance impact is relatively small.
For Zkcuda, however, everything runs inside ECC, so it really takes 10x as long.
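Until this is handled on the ECC side, the downstream workspace has to pass the flag itself, e.g. `RUSTFLAGS="-C target-cpu=native" cargo build --release`, or `rustflags = ["-C", "target-cpu=native"]` under `[build]` in its own `.cargo/config.toml`. A `.cargo/config.toml` that lives inside ECC is not read when ECC is built as a dependency. The sketch below shows one way a consumer might detect a build that forgot the flag. It assumes x86_64 with AVX2 as the feature of interest, and `compiled_simd_features` and `simd_build_warning` are hypothetical helpers for illustration, not part of ECC.

```rust
// Hypothetical helpers, not part of ECC: a sketch of a startup check a
// downstream crate could use to detect a build without target-cpu=native.

/// SIMD target features that were enabled when this crate was compiled.
pub fn compiled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        features.push("avx512f");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}

/// Returns a warning if the running CPU supports AVX2 but the binary was
/// compiled without it (x86_64 only; always `None` on other architectures).
pub fn simd_build_warning() -> Option<String> {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime detection vs. compile-time cfg: a mismatch means the
        // SIMD paths were built without the feature enabled.
        if is_x86_feature_detected!("avx2") && !cfg!(target_feature = "avx2") {
            return Some(
                "CPU supports AVX2 but this build did not enable it; \
                 rebuild with RUSTFLAGS=\"-C target-cpu=native\""
                    .to_string(),
            );
        }
    }
    None
}
```

A consumer could call `simd_build_warning()` once at startup and print the message with `eprintln!`. Comparing the compile-time `cfg!` value against runtime detection only flags builds that leave performance on the table on the current machine, rather than warning on every CPU that lacks AVX2.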