I am compiling as per the README with cuBLAS, but I would like to try the mul_mat_q kernels to compare speeds. From what I gather, these kernels are implemented using OpenBLAS?
Does this mean I have to compile a separate llama-cpp-python for each backend and uninstall one before installing the other? Or can I compile a single build with both cuBLAS and OpenBLAS enabled?
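For context, this is roughly what I mean by separate builds, a sketch assuming the `CMAKE_ARGS` passthrough described in the llama-cpp-python README (the exact flag names may differ between llama.cpp versions):

```shell
# cuBLAS build (reinstalls over any existing wheel)
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

# OpenBLAS build
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --force-reinstall --no-cache-dir
```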
Will the mul_mat_q flag also work with a cuBLAS-only build?