Error with running qwen-3b-instruct converted #35

@nikita-yatchenko

Hello,

I have two questions:

  1. The kv-lora-rank parameter: I do not understand why the default value does NOT work for qwen-3b, yet works for qwen-7b.
  2. Running the converted models.

I have been trying to run your conversion scripts, but once a model is converted I cannot run it (neither with transformers nor with vLLM).

model_path=Qwen/Qwen2.5-3B-Instruct
save_path=outputs/qwen2_5-3B-Instruct-MLA
eval_batch_size=8

python transmla/converter.py \
    --model-path $model_path \
    --save-path $save_path \
    --freqfold 4 \
    --ppl-eval-batch-size $eval_batch_size \
    --kv-lora-rank 192

As you can see, I modified the given qwen2.5-7B-Instruct.sh command to add --kv-lora-rank 192. Without that flag, the following assertion fails:

assert self.kv_lora_rank <= 2 * self.latent_dim - self.qk_mqa_dim, f"kv_lora_rank ({self.kv_lora_rank}) must be less than 2 * latent_dim ({self.latent_dim}) - qk_mqa_dim ({self.qk_mqa_dim})"

I do not understand this either, since the unmodified qwen2.5-7B-Instruct.sh works just fine (Q1).
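
For reference, the arithmetic behind the assertion can be reproduced in isolation. This is only a sketch under assumptions, not the converter's actual code: it assumes latent_dim equals num_key_value_heads * head_dim, that the defaults are kv_lora_rank=512 and qk_mqa_dim=64, and that the Qwen2.5 configs have 2 KV heads (3B) and 4 KV heads (7B) with head_dim 128. All of these should be checked against transmla/converter.py and each model's config.json.

```python
# Sketch of the kv_lora_rank bound from the assertion message.
# ASSUMPTIONS (not confirmed by the issue): latent_dim = num_kv_heads * head_dim,
# default kv_lora_rank = 512, default qk_mqa_dim = 64.

def kv_lora_rank_limit(latent_dim: int, qk_mqa_dim: int) -> int:
    """Upper bound enforced by the assertion: 2 * latent_dim - qk_mqa_dim."""
    return 2 * latent_dim - qk_mqa_dim


def rank_is_valid(num_kv_heads: int, head_dim: int,
                  kv_lora_rank: int, qk_mqa_dim: int = 64) -> bool:
    """Return True if kv_lora_rank satisfies the bound for this model shape."""
    latent_dim = num_kv_heads * head_dim  # assumed definition of latent_dim
    return kv_lora_rank <= kv_lora_rank_limit(latent_dim, qk_mqa_dim)


# (num_key_value_heads, head_dim); assumed values, verify in config.json.
MODELS = {
    "Qwen2.5-3B-Instruct": (2, 128),
    "Qwen2.5-7B-Instruct": (4, 128),
}

if __name__ == "__main__":
    for name, (kv_heads, head_dim) in MODELS.items():
        limit = kv_lora_rank_limit(kv_heads * head_dim, 64)
        for rank in (512, 192):
            status = "ok" if rank_is_valid(kv_heads, head_dim, rank) else "assertion fails"
            print(f"{name}: kv_lora_rank={rank}, limit={limit} -> {status}")
```

If these assumptions hold, the bound would be 448 for the 3B model and 960 for the 7B model, which would be consistent with a rank of 512 failing on 3B while 192 passes.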

When I set kv-lora-rank to 192 for qwen-3b, the conversion succeeds, BUT I cannot run the converted model. Here is the error that I get from vLLM (Q2):
INFO 09-07 15:08:31 [loader.py:458] Loading weights took 0.93 seconds
ERROR 09-07 15:08:31 [core.py:387] EngineCore hit an exception: Traceback (most recent call last):
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 378, in run_engine_core
ERROR 09-07 15:08:31 [core.py:387]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 09-07 15:08:31 [core.py:387]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 320, in __init__
ERROR 09-07 15:08:31 [core.py:387]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 67, in __init__
ERROR 09-07 15:08:31 [core.py:387]     self.model_executor = executor_class(vllm_config)
ERROR 09-07 15:08:31 [core.py:387]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 09-07 15:08:31 [core.py:387]     self._init_executor()
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 09-07 15:08:31 [core.py:387]     self.collective_rpc("load_model")
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 09-07 15:08:31 [core.py:387]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 09-07 15:08:31 [core.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/utils.py", line 2378, in run_method
ERROR 09-07 15:08:31 [core.py:387]     return func(*args, **kwargs)
ERROR 09-07 15:08:31 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 136, in load_model
ERROR 09-07 15:08:31 [core.py:387]     self.model_runner.load_model()
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1279, in load_model
ERROR 09-07 15:08:31 [core.py:387]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 09-07 15:08:31 [core.py:387]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 09-07 15:08:31 [core.py:387]     return loader.load_model(vllm_config=vllm_config)
ERROR 09-07 15:08:31 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 467, in load_model
ERROR 09-07 15:08:31 [core.py:387]     raise ValueError(
ERROR 09-07 15:08:31 [core.py:387] ValueError: Following weights were not initialized from checkpoint: {'lm_head.weight'}
ERROR 09-07 15:08:31 [core.py:387]
CRITICAL 09-07 15:08:31 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed
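
To narrow down the lm_head.weight error, it may help to check whether the converted checkpoint actually stores that tensor and whether config.json sets tie_word_embeddings. The sketch below uses only the standard library and reads tensor names from the safetensors header (an 8-byte little-endian length followed by a JSON header). The function names are hypothetical helpers, and it assumes the converter wrote *.safetensors files and a config.json into the save path.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path):
    """Read tensor names from a .safetensors header without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    return {name for name in header if name != "__metadata__"}


def diagnose_lm_head(config, tensor_names):
    """Summarize whether lm_head.weight is stored or expected to be tied."""
    if "lm_head.weight" in tensor_names:
        return "lm_head.weight is present in the checkpoint"
    if config.get("tie_word_embeddings", False):
        return "lm_head.weight is absent but tie_word_embeddings is true"
    return "lm_head.weight is absent and tie_word_embeddings is false"


def check_checkpoint(save_path):
    """Hypothetical helper: inspect every *.safetensors shard under save_path."""
    root = Path(save_path)
    config = json.loads((root / "config.json").read_text())
    names = set()
    for shard in sorted(root.glob("*.safetensors")):
        names |= safetensors_tensor_names(shard)
    return diagnose_lm_head(config, names)


if __name__ == "__main__":
    # Path taken from the conversion command above.
    print(check_checkpoint("outputs/qwen2_5-3B-Instruct-MLA"))
```

If the tensor is absent while the embeddings are tied, the loader for the converted architecture may simply not be sharing the embedding matrix with the output head. That is a guess to verify, not a confirmed diagnosis.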

If I add the --deepseek-style flag, then none of the weights match the architecture.

System parameters:
reqs:
vllm==0.8.4
transformers==4.52.4
datasets
accelerate==1.3.0
datatrove
tensorboardX

GPU:
NVIDIA H100 80GB HBM3
