Hi, thank you for the great work and effort.
The current kernels seem to support only the shapes of 7B models, i.e. hidden dimension 4096.
How can I extend them to larger models like Llama-30B (hidden dimension 6656) or 65B (hidden dimension 8192)?
Simply adding template instances for the larger dimensions results in an error.
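For illustration, the pattern I mean is roughly the following — a minimal C++ sketch, where the template name `rows_per_block` and the dimension parameter are hypothetical placeholders, not the repo's actual kernel API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function templated on the hidden dimension, standing in
// for a kernel that is explicitly instantiated once per model size.
template <int HiddenDim>
std::size_t rows_per_block() {
    // Placeholder computation in place of real kernel logic.
    return HiddenDim / 256;
}

// Explicit instantiations: 7B (4096), 30B (6656), 65B (8192).
template std::size_t rows_per_block<4096>();
template std::size_t rows_per_block<6656>();
template std::size_t rows_per_block<8192>();
```

(One guess on my side: if the launch configuration hard-codes block or tile sizes tuned for 4096, a dimension like 6656 may not divide evenly and could trigger the error — but that is only a hypothesis.)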
Thank you.