Description
Dear authors,
I tried the script from your README, without changing a single line, to convert llama2-7b-hf. The conversion process itself ran smoothly without errors, but the converted model did not work properly.
I ran into the following errors:
- If I convert without `--deepseek-style`, vLLM and SGLang online serving do not recognize `LlamaMLAForCausalLM`.
- If I convert with `--deepseek-style`, vLLM and SGLang online serving start successfully, but they respond to requests with nonsense characters.
- The vLLM offline engine hangs, and the SGLang offline engine reports out of memory (I tried 8×A100 with tp=8, still OOM).
I tried deploying with SGLang and vLLM online serving, and the server output random characters (a minimal way to check this outside of the serving engines is sketched below).
My environment: CUDA 12.5 on A100, torch 2.4, Python 3.10, vllm 0.8.2, sglang 0.4.6.
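To help narrow down whether the problem is in the conversion itself or in the serving engines, here is a minimal sketch (not part of my original run; the checkpoint path is taken from the conversion command below, and `trust_remote_code=True` is assumed in case the checkpoint ships custom MLA modeling code) of loading the converted checkpoint directly with transformers and generating a few tokens:

```python
# Minimal sanity check: load the converted checkpoint with plain transformers
# and greedily generate a few tokens, to see whether the output is already
# garbled outside of vLLM / SGLang. Paths and settings here are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./outputs/llama2-7b-deepseek"  # --save-path from the conversion command

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,   # matches --dtype bf16 used during conversion
    device_map="auto",
    trust_remote_code=True,       # in case the checkpoint ships custom MLA modeling code
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```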
Conversion command:
```bash
python transmla/converter.py \
    --model-path Llama-2-7b-hf \
    --save-path ./outputs/llama2-7b-deepseek \
    --dtype bf16 \
    --device auto \
    --cal-dataset alpaca \
    --cal-nsamples 128 \
    --cal-max-seqlen 256 \
    --cal-batch-size 8 \
    --ppl-eval-batch-size 4 \
    --freqfold auto \
    --collapse auto \
    --qk-mqa-dim 64 \
    --q-lora-rank 512 \
    --kv-lora-rank 512 \
    --deepseek-style
```
Also tried:

```bash
bash scripts/convert/qwen2.5-7B-Instruct.sh
```
Conversion log:

Launch server command:

```bash
vllm serve Llama-2-7b-hf-deepseek
python -m sglang.launch_server --model-path Llama-2-7b-hf-deepseek
```
Request:
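(The exact request body isn't reproduced here; the following is a representative example against the OpenAI-compatible completions endpoint that both servers expose. Port 8000 is vLLM's default, while SGLang defaults to 30000; the model name and prompt are placeholders.)

```python
# Representative request (not the original one) against the OpenAI-compatible
# completions endpoint; the model name and prompt are placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # 8000 = vLLM default; SGLang defaults to 30000
    json={
        "model": "Llama-2-7b-hf-deepseek",
        "prompt": "The capital of France is",
        "max_tokens": 32,
        "temperature": 0.0,
    },
)
print(resp.json()["choices"][0]["text"])
```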

