First of all, thank you for your excellent work!
I'd like to understand why the chunks are still passed in the prompt even when the KV cache is used. My understanding is that the purpose of KV caching is to avoid re-passing chunks in the prompt and to prevent redundant computation in the attention matrices.
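For context, my mental model looks roughly like this sketch (using Hugging Face transformers; the model name and texts are just placeholders, not your actual setup): the prefix is encoded once, its keys/values are cached, and subsequent forward passes only feed the *new* tokens while attention over the prefix reuses the cache.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# First pass: encode the shared prefix once and keep its KV cache.
prefix = tokenizer("Shared system prompt", return_tensors="pt")
with torch.no_grad():
    out = model(**prefix, use_cache=True)
past_key_values = out.past_key_values  # cached K/V for the prefix tokens

# Second pass: only the NEW tokens are fed as input_ids; attention over
# the prefix reuses past_key_values instead of recomputing it.
new_tokens = tokenizer(" some new chunk text", return_tensors="pt")
past_len = prefix.input_ids.shape[1]
attention_mask = torch.ones(1, past_len + new_tokens.input_ids.shape[1])
with torch.no_grad():
    out = model(
        input_ids=new_tokens.input_ids,
        attention_mask=attention_mask,  # covers past + new tokens
        past_key_values=past_key_values,
        use_cache=True,
    )
```

Based on this, I expected the cached chunks would not need to appear in the prompt again.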
Could you please clarify this? Am I missing something here?