Hi, thank you for the great work and effort.
The current kernels seem to support only the shapes of 7B models, i.e. hidden dimension 4096.
How can I extend them to larger models like Llama-30B (hidden dimension 6656) or 65B (hidden dimension 8192)?
Simply adding template instances for the larger dimensions results in an error.
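For illustration, the pattern I mean is roughly the following — a minimal C++ sketch, where the template name `rows_per_block` and the dimension parameter are hypothetical placeholders, not the repo's actual kernel API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function templated on the hidden dimension, standing in
// for a kernel that is explicitly instantiated once per model size.
template <int HiddenDim>
std::size_t rows_per_block() {
    // Placeholder computation in place of real kernel logic.
    return HiddenDim / 256;
}

// Explicit instantiations: 7B (4096), 30B (6656), 65B (8192).
template std::size_t rows_per_block<4096>();
template std::size_t rows_per_block<6656>();
template std::size_t rows_per_block<8192>();
```

(One guess on my side: if the launch configuration hard-codes block or tile sizes tuned for 4096, a dimension like 6656 may not divide evenly and could trigger the error — but that is only a hypothesis.)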
Thank you.