I am compiling as per the README with cuBLAS, but I would like to try the mul_mat_q kernels to compare speeds. From what I gather, these kernels are implemented using OpenBLAS?
Does this mean I have to compile a separate llama-cpp-python for each backend and uninstall one before installing the other? Or can I compile a single build with both cuBLAS and OpenBLAS enabled?
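For context, this is roughly what I mean by separate builds, a sketch assuming the `CMAKE_ARGS` passthrough described in the llama-cpp-python README (the exact flag names may differ between llama.cpp versions):

```shell
# cuBLAS build (reinstalls over any existing wheel)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# OpenBLAS build
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --force-reinstall --no-cache-dir
```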
Will the mul_mat_q flag also work with a cuBLAS-only build?