Description
Hello,
I have two questions:
- regarding the kv-lora-rank parameter: I do not understand why the default value does NOT work for qwen-3b but works for qwen-7b
- running converted models: I have been trying to run your conversion scripts, but once a model is converted I cannot run it (neither with transformers nor with vLLM).
```shell
model_path=Qwen/Qwen2.5-3B-Instruct
save_path=outputs/qwen2_5-3B-Instruct-MLA
eval_batch_size=8

python transmla/converter.py \
    --model-path $model_path \
    --save-path $save_path \
    --freqfold 4 \
    --ppl-eval-batch-size $eval_batch_size \
    --kv-lora-rank 192
```
As you can see, I modified the provided qwen2.5-7B-Instruct.sh command to add `--kv-lora-rank 192` (otherwise there is an error):

```
assert self.kv_lora_rank <= 2 * self.latent_dim - self.qk_mqa_dim, f"kv_lora_rank ({self.kv_lora_rank}) must be less than 2 * latent_dim ({self.latent_dim}) - qk_mqa_dim ({self.qk_mqa_dim})"
```

This assertion is also something I do not understand, since for qwen2.5-7B-Instruct.sh it works just fine (Q1).
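If it helps to see the arithmetic: assuming `latent_dim` is derived as `num_key_value_heads * head_dim` and `qk_mqa_dim` is 64 (I have not verified either against the converter source, so treat this as a guess), the bound is much tighter for the 3B model, which has 2 KV heads versus the 7B's 4:

```python
# Hypothesized reason the assert passes for 7B but fails for 3B.
# Assumptions (NOT verified against TransMLA): latent_dim comes from
# num_key_value_heads * head_dim, qk_mqa_dim defaults to 64, and the
# default kv_lora_rank is larger than 448.

def max_kv_lora_rank(num_key_value_heads: int, head_dim: int,
                     qk_mqa_dim: int = 64) -> int:
    """Upper bound implied by: kv_lora_rank <= 2 * latent_dim - qk_mqa_dim."""
    latent_dim = num_key_value_heads * head_dim
    return 2 * latent_dim - qk_mqa_dim

# Qwen2.5-7B-Instruct: 4 KV heads, head_dim 128
print(max_kv_lora_rank(4, 128))  # 960 -> a default such as 512 fits
# Qwen2.5-3B-Instruct: 2 KV heads, head_dim 128
print(max_kv_lora_rank(2, 128))  # 448 -> the same default trips the assert
```

Under these assumptions, any `--kv-lora-rank` of at most 448 (such as the 192 used above) should satisfy the assertion for the 3B model.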
When I set kv-lora-rank to 192 for qwen-3b, the conversion succeeds, BUT I cannot run the converted model. Here is the error I get (Q2):
```
INFO 09-07 15:08:31 [loader.py:458] Loading weights took 0.93 seconds
ERROR 09-07 15:08:31 [core.py:387] EngineCore hit an exception: Traceback (most recent call last):
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 378, in run_engine_core
ERROR 09-07 15:08:31 [core.py:387]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 09-07 15:08:31 [core.py:387]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 320, in __init__
ERROR 09-07 15:08:31 [core.py:387]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 67, in __init__
ERROR 09-07 15:08:31 [core.py:387]     self.model_executor = executor_class(vllm_config)
ERROR 09-07 15:08:31 [core.py:387]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 09-07 15:08:31 [core.py:387]     self._init_executor()
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 09-07 15:08:31 [core.py:387]     self.collective_rpc("load_model")
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 09-07 15:08:31 [core.py:387]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 09-07 15:08:31 [core.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/utils.py", line 2378, in run_method
ERROR 09-07 15:08:31 [core.py:387]     return func(*args, **kwargs)
ERROR 09-07 15:08:31 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 136, in load_model
ERROR 09-07 15:08:31 [core.py:387]     self.model_runner.load_model()
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1279, in load_model
ERROR 09-07 15:08:31 [core.py:387]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 09-07 15:08:31 [core.py:387]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 09-07 15:08:31 [core.py:387]     return loader.load_model(vllm_config=vllm_config)
ERROR 09-07 15:08:31 [core.py:387]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-07 15:08:31 [core.py:387]   File "/opt/python3.12/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 467, in load_model
ERROR 09-07 15:08:31 [core.py:387]     raise ValueError(
ERROR 09-07 15:08:31 [core.py:387] ValueError: Following weights were not initialized from checkpoint: {'lm_head.weight'}
ERROR 09-07 15:08:31 [core.py:387]
CRITICAL 09-07 15:08:31 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed
```
If I add the `--deepseek-style` flag, then none of the weights match the architecture.
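For what it's worth, one possible cause of the missing `lm_head.weight`: Qwen2.5-3B-Instruct ships with `tie_word_embeddings: true` (the 7B uses `false`), so its checkpoint has no standalone `lm_head` tensor. If the converter's output config loses that flag, vLLM will look for a tensor that was never saved. A minimal sketch to check the converted model's config (this is my speculation, not a confirmed diagnosis; `save_path` is the output directory from the command above):

```python
# Sketch: check whether the converted config still ties lm_head to
# embed_tokens. If the original model tied them but the converted config
# says tie_word_embeddings=false, a loader will expect a separate
# lm_head.weight tensor that is absent from the safetensors shards.
import json
from pathlib import Path

def lm_head_expected_in_checkpoint(config: dict) -> bool:
    """True if a loader should expect a standalone lm_head.weight tensor."""
    return not config.get("tie_word_embeddings", False)

def check_converted_model(save_path: str) -> None:
    config = json.loads((Path(save_path) / "config.json").read_text())
    if lm_head_expected_in_checkpoint(config):
        print("config expects a separate lm_head.weight; verify that the "
              "tensor actually exists in the saved shards")
    else:
        print("lm_head is tied to embed_tokens; no separate tensor needed")
```

If the converted config does expect a separate `lm_head.weight`, either restoring `tie_word_embeddings: true` or saving an explicit `lm_head` tensor copied from `embed_tokens` might work, but I have not confirmed which the converter intends.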
System parameters:

Requirements:
```
vllm==0.8.4
transformers==4.52.4
datasets
accelerate==1.3.0
datatrove
tensorboardX
```

GPU: NVIDIA H100 80GB HBM3