Hello, thank you for sharing this great work!
I have some detailed questions about the KV cache compression experiments:
- How exactly are the KV cache compression ratios calculated?
- For all reported models and all compression ratios in the paper/experiments, could you share the specific parameter settings?
  - `qk_rope_head_dim`
  - `kv_lora_rank`
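
To make the first question concrete, here is a minimal sketch of how I currently assume the ratio is computed. This is only my guess, not something taken from the paper: I assume the baseline cache stores full keys and values per token per layer, and the compressed cache stores one latent of size `kv_lora_rank` plus a RoPE key of size `qk_rope_head_dim`. The function name, the `num_kv_heads`/`head_dim` arguments, and the example values are all hypothetical.

```python
def kv_cache_compression_ratio(num_kv_heads: int, head_dim: int,
                               kv_lora_rank: int, qk_rope_head_dim: int) -> float:
    """Fraction of the per-token, per-layer KV cache that is removed.

    Assumed definition (please correct me if the paper uses another one):
    1 - compressed_size / baseline_size.
    """
    # Assumed baseline: keys and values for every KV head.
    baseline = 2 * num_kv_heads * head_dim
    # Assumed compressed cache: shared latent plus the RoPE key part.
    compressed = kv_lora_rank + qk_rope_head_dim
    return 1.0 - compressed / baseline


# Hypothetical values, chosen only to illustrate the arithmetic.
ratio = kv_cache_compression_ratio(
    num_kv_heads=32, head_dim=128, kv_lora_rank=512, qk_rope_head_dim=64
)
print(f"{ratio:.2%}")  # 92.97%
```

If the reported ratios are defined differently (for example, measured in bytes across all layers, or relative to a GQA baseline), it would be great to know which definition applies.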
Thanks a lot for your time and support!