Hello, I ran into an MT-Bench issue while reproducing the PiSSA results. When I generate answers with FastChat, the Llama2-7B model fine-tuned with PiSSA produces repetitive output that often fails to stop, which drags the MT-Bench scores down to abnormally low values. How did you handle this problem at the time? Admittedly, the Llama family is known to be prone to generations that never terminate.
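
To make the question concrete, below is a rough sketch of the kind of workaround I am considering, not something taken from the PiSSA repo: plain HF transformers generation with an explicit eos_token_id / pad_token_id and a mild repetition_penalty. The model path and parameter values are placeholders. Is something along these lines what you did, or did you patch FastChat's generation code itself?

```python
# Hypothetical sketch: force explicit stop/pad tokens and a mild repetition
# penalty when generating MT-Bench answers with plain transformers.
# Model path and hyperparameters are placeholders, not the PiSSA settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama2-7b-pissa-merged"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "[INST] Compose an engaging travel blog post about a recent trip to Hawaii. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,               # mild penalty to discourage loops
    eos_token_id=tokenizer.eos_token_id,  # ensure generation can stop at </s>
    pad_token_id=tokenizer.eos_token_id,  # Llama has no pad token by default
)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```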