Hi, I ran `scripts/metamath_llama2_7b/run_pissa.sh`. During evaluation, that is, when running `python utils/gen_vllm.py --model $OUTPUT_PATH --sub_task metamath --output_file $OUTPUT_PATH/metamath_response.jsonl`, loading the model fails with:
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/yifengx4/PiSSA/utils/gen_vllm.py", line 23, in <module>
[rank0]:     llm = LLM(model=args.model, tensor_parallel_size=torch.cuda.device_count())
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 214, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 564, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 325, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 47, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 111, in _init_executor
[rank0]:     self._run_workers("load_model",
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 185, in _run_workers
[rank0]:     driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/worker/worker.py", line 183, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1016, in load_model
[rank0]:     self.model = get_model(model_config=self.model_config,
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 403, in load_model
[rank0]:     model.load_weights(self._get_all_weights(model_config, model))
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 547, in load_weights
[rank0]:     weight_loader(param, loaded_weight)
[rank0]:   File "/home/yifengx4/miniconda3/envs/pissa/lib/python3.10/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 381, in weight_loader
[rank0]:     assert loaded_weight.shape[output_dim] == self.org_vocab_size
[rank0]: AssertionError
```
May I ask how to solve this problem?
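From the assertion, vLLM expects the embedding weight's vocab dimension to equal `vocab_size` from the model's `config.json`, so my guess is the saved `embed_tokens` / `lm_head` has a different row count than the config (for example, if fine-tuning added a pad token and resized the embeddings without updating the config). As a minimal self-check, here is a sketch I put together; `diagnose_vocab_mismatch` is a helper name I made up, not part of PiSSA or vLLM:

```python
import json
import os

def diagnose_vocab_mismatch(model_dir, embedding_rows):
    """Compare a checkpoint's embedding row count against config.json's
    vocab_size (the value vLLM uses as org_vocab_size)."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config_vocab = json.load(f)["vocab_size"]
    if embedding_rows == config_vocab:
        return "ok"
    if embedding_rows > config_vocab:
        # Likely tokens were added (e.g. a pad token) and the embedding was
        # resized, but config.json was not updated to match.
        return (f"checkpoint has {embedding_rows - config_vocab} extra rows; "
                f"update vocab_size in config.json or trim embed_tokens and "
                f"lm_head back to {config_vocab} rows")
    return (f"checkpoint has fewer rows ({embedding_rows}) than "
            f"vocab_size={config_vocab}; the checkpoint may be incomplete")
```

The embedding row count can be read from the checkpoint (e.g. the shape of `model.embed_tokens.weight`) without loading the model into vLLM.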