
qwen2-7b and llama2-7b unable to deploy #24


Description

@mingyangHao

Dear authors,

I tried the script from your README, without changing a single line, to convert llama2-7b-hf. The conversion process itself ran smoothly without errors; however, the converted model did not work properly.

I ran into the following errors:

  • If I convert without `--deepseek-style`, vllm and sglang online serving do not recognize `LlamaMLAforCasualLM`.
  • If I convert with `--deepseek-style`, vllm and sglang online serving start successfully, but they respond to requests with nonsense characters (see the sanity-check sketch after this list).
  • The vllm offline engine hangs, and the sglang offline engine reports out of memory (I tried 8*A100 with tp8, still OOM; see the offline sketch further below).
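
For reference, this is a minimal sketch of the sanity check I would run first: loading the converted checkpoint with plain transformers, independent of vllm/sglang. The path is the `--save-path` from the conversion command below, and `trust_remote_code=True` is an assumption in case the checkpoint ships custom modeling code. If this already produces garbage, the problem is in the converted weights/config rather than in the serving integration.

```python
# Sanity-check the converted checkpoint with plain transformers,
# independent of vllm/sglang. Path and trust_remote_code are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./outputs/llama2-7b-deepseek"  # --save-path from the conversion command
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # in case the checkpoint ships custom modeling code
)

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```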

I tried deploying with both sglang and vllm online serving, and the server output random characters.

My environment: CUDA 12.5 on A100 GPUs, torch 2.4 (Python 3.10), vllm 0.8.2, sglang 0.4.6.
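
Regarding the third bullet above, this is the minimal offline invocation I would expect to work (a sketch assuming the vllm 0.8.x Python API; `enforce_eager=True` is only there to rule out CUDA-graph capture as the source of the hang):

```python
# Minimal vllm offline run; model path and tp size match my setup above,
# expressed through the Python API instead of the server.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Llama-2-7b-hf-deepseek",
    tensor_parallel_size=8,
    trust_remote_code=True,
    enforce_eager=True,  # rule out CUDA-graph capture as the hang
)
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```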

Conversion command:

```bash
python transmla/converter.py \
    --model-path Llama-2-7b-hf \
    --save-path ./outputs/llama2-7b-deepseek \
    --dtype bf16 \
    --device auto \
    --cal-dataset alpaca \
    --cal-nsamples 128 \
    --cal-max-seqlen 256 \
    --cal-batch-size 8 \
    --ppl-eval-batch-size 4 \
    --freqfold auto \
    --collapse auto \
    --qk-mqa-dim 64 \
    --q-lora-rank 512 \
    --kv-lora-rank 512 \
    --deepseek-style
```
I also tried:

```bash
bash scripts/convert/qwen2.5-7B-Instruct.sh
```
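
Related to the first bullet, one quick check is which architecture name the converter wrote into the saved `config.json`, since vllm and sglang dispatch on that field (a minimal sketch; the path is the `--save-path` from above):

```python
# Inspect the architecture the converter recorded; vllm/sglang choose their
# model implementation based on this field.
import json

with open("./outputs/llama2-7b-deepseek/config.json") as f:
    cfg = json.load(f)
print(cfg.get("architectures"))  # custom class vs. a DeepSeek-style architecture
print(cfg.get("model_type"))
```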

Conversion log: (screenshot attached)

Launch server commands:

```bash
vllm serve --model-path Llama-2-7b-hf-deepseek
python -m sglang.launch_server --model-path Llama-2-7b-hf-deepseek
```
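
A side note on the vllm command: in the vllm 0.8.x CLI the model path is a positional argument (`vllm serve Llama-2-7b-hf-deepseek`), not `--model-path` (that is the sglang flag), so the first command above may fail to parse before it even loads the model. If the checkpoint ships custom modeling code, `--trust-remote-code` would presumably also be needed.
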
Request: (screenshot attached)
