Description
Training ran with agentlightning 0.3.0 and verl 0.6.1 (model: Qwen/Qwen2.5-VL-3B-Instruct, 8×A100 40G). It crashed while constructing the mRoPE position_ids:
Crash point: verl/models/transformers/qwen2_vl.py:get_rope_index
Error: shape mismatch in position_ids[..., attention_mask == 1] = llm_positions.to(position_ids.device)
The length of llm_positions (19963) does not match the number of tokens where attention_mask == 1 (16467).
The agent is built with the OpenAI Python client (openai).
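To narrow down where the 3496 extra positions (19963 - 16467) come from, one option is to compare the number of image placeholder tokens in input_ids with the number implied by image_grid_thw before get_rope_index is called. The following is a minimal sketch and not part of agentlightning or verl. The helper names are hypothetical, and it assumes the image placeholder id is 151655 (this id appears in the logged token_ids) and a spatial merge size of 2, which should be checked against the model config.

```python
from typing import Dict, Sequence, Tuple

# Assumptions (verify against the model's config / processor):
IMAGE_PAD_TOKEN_ID = 151655   # assumed id of the image placeholder token
SPATIAL_MERGE_SIZE = 2        # assumed vision spatial merge size


def expected_image_tokens(
    image_grid_thw: Sequence[Tuple[int, int, int]],
    merge_size: int = SPATIAL_MERGE_SIZE,
) -> int:
    """Number of placeholder tokens implied by the (t, h, w) patch grids."""
    return sum(t * (h // merge_size) * (w // merge_size) for t, h, w in image_grid_thw)


def check_image_token_consistency(
    input_ids: Sequence[int],
    attention_mask: Sequence[int],
    image_grid_thw: Sequence[Tuple[int, int, int]],
    image_pad_id: int = IMAGE_PAD_TOKEN_ID,
) -> Dict[str, int]:
    """Compare placeholder tokens actually attended to with the expected count."""
    # Only tokens with attention_mask == 1 receive a position id.
    attended = [tok for tok, m in zip(input_ids, attention_mask) if m == 1]
    found = sum(1 for tok in attended if tok == image_pad_id)
    expected = expected_image_tokens(image_grid_thw)
    return {"found": found, "expected": expected, "delta": expected - found}


if __name__ == "__main__":
    ids = [1, 151652, 151655, 151655, 151655, 151655, 151653, 2]
    mask = [1] * len(ids)
    # A 4x4 patch grid merges to 2x2 = 4 placeholders, matching the prompt.
    print(check_image_token_consistency(ids, mask, [(1, 4, 4)]))
    # A 4x8 grid would imply 8 placeholders, so delta becomes 4.
    print(check_image_token_consistency(ids, mask, [(1, 4, 8)]))
```

A non-zero delta for a sample would suggest that the token sequence and the image grid passed for that sample disagree, for example because the image was processed at a different resolution or the sequence was truncated. Whether that is what happens here is not confirmed.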
Environment
GPU: 8×A100 (40G)
Model: Qwen/Qwen2.5-VL-3B-Instruct
Key dependency versions (excerpt):
agentlightning==0.3.0
verl==0.6.1
ray==2.53.0
torch==2.8.0
transformers==4.57.3
vllm==0.10.2
qwen-vl-utils==0.0.14
flash_attn==2.8.3
xformers==0.0.32.post1
Additionally, I observed a logging-related issue: with the multimodal model, input images are printed to the logs in full as data:image/...;base64,... strings (for example, the Triplet(prompt=..., raw_content=...) entry contains the complete base64 string). As a result, the log files grow rapidly and become difficult to save, transfer, or view.
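As a possible workaround on the user side, the base64 payloads could be shortened before the text reaches the log. The sketch below uses only the Python standard library. The names redact_data_uris and RedactDataUriFilter are hypothetical, and it assumes the message is emitted through Python's logging module; if the warning is written with print, the filter would not apply and only the plain function would help.

```python
import logging
import re

# Matches inline image data URIs such as data:image/jpeg;base64,/9j/4AAQ...
_DATA_URI_RE = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def redact_data_uris(text: str, keep: int = 16) -> str:
    """Keep the first `keep` payload characters and note the original length."""

    def _shorten(match: "re.Match[str]") -> str:
        uri = match.group(0)
        header, _, payload = uri.partition(",")
        if len(payload) <= keep:
            return uri  # already short, leave untouched
        return f"{header},{payload[:keep]}...<{len(payload)} base64 chars>"

    return _DATA_URI_RE.sub(_shorten, text)


class RedactDataUriFilter(logging.Filter):
    """Logging filter that rewrites each record's message with payloads shortened."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first, then replace it with the redacted text.
        record.msg = redact_data_uris(record.getMessage())
        record.args = ()
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactDataUriFilter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    logging.info("image_urls: ['data:image/jpeg;base64,%s']", "A" * 5000)
```

Attaching the filter to a handler rather than to a single logger means that records propagated from other loggers to that handler are also covered.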
Package Version
---------------------------------------- -------------
abnf 2.2.0
absl-py 2.3.1
accelerate 1.12.0
agentlightning 0.3.0
agentops 0.4.21
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiohttp-cors 0.8.1
aiologic 0.16.0
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.12.0
apache-tvm-ffi 0.1.6
APScheduler 3.11.2
astor 0.8.1
attrs 25.4.0
av 16.0.1
azure-core 1.37.0
azure-identity 1.25.1
azure-storage-blob 12.27.1
backoff 2.2.1
blake3 1.0.8
blessed 1.25.0
blinker 1.9.0
boto3 1.36.0
botocore 1.36.26
cachetools 6.2.4
cbor2 5.7.1
certifi 2025.11.12
cffi 2.0.0
chardet 5.2.0
charset-normalizer 3.4.4
cint 1.0.0
click 8.3.1
cloudpickle 3.1.2
codetiming 1.4.0
colorful 0.5.8
compressed-tensors 0.11.0
croniter 6.0.0
cryptography 46.0.3
cuda-bindings 13.1.1
cuda-pathfinder 1.3.3
cuda-python 13.1.1
cupy-cuda12x 13.6.0
datasets 4.4.2
debugpy 1.8.19
depyf 0.19.0
dill 0.3.7
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
dnspython 2.8.0
einops 0.8.1
email-validator 2.3.0
fastapi 0.127.0
fastapi-cli 0.0.20
fastapi-cloud-cli 0.8.0
fastapi-sso 0.16.0
fastar 0.8.0
fastrlock 0.8.3
fastuuid 0.14.0
fickling 0.1.6
filelock 3.20.1
flash_attn 2.8.3
flashinfer-python 0.5.3
Flask 3.1.2
frozendict 2.4.7
frozenlist 1.8.0
fsspec 2025.10.0
gguf 0.17.1
gitdb 4.0.12
GitPython 3.1.45
google-api-core 2.28.1
google-auth 2.45.0
googleapis-common-protos 1.72.0
gpustat 1.1.1
gql 4.0.0
graphql-core 3.2.7
graphviz 0.21
grpcio 1.67.1
gunicorn 23.0.0
h11 0.16.0
hf-xet 1.2.0
httpcore 1.0.9
httptools 0.7.1
httpx 0.28.1
httpx-sse 0.4.3
huggingface-hub 0.36.0
hydra-core 1.3.2
idna 3.11
importlib_metadata 8.7.1
interegular 0.3.3
intervaltree 3.2.1
isodate 0.7.2
itsdangerous 2.2.0
Jinja2 3.1.6
jiter 0.12.0
jmespath 1.0.1
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
kaitaistruct 0.11
lark 1.2.2
litellm 1.80.11
litellm-enterprise 0.1.27
litellm-proxy-extras 0.4.16
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.11.3
Markdown 3.10
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mcp 1.25.0
mdurl 0.1.2
mistral_common 1.8.8
mpmath 1.3.0
msal 1.34.0
msal-extensions 1.3.1
msgpack 1.1.2
msgspec 0.20.0
multidict 6.7.0
multiprocess 0.70.15
networkx 3.6.1
ninja 1.13.0
nodejs-wheel 24.12.0
nodejs-wheel-binaries 24.12.0
numba 0.61.2
numpy 1.26.4
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.17.0
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-cutlass-dsl 4.3.4
nvidia-ml-py 13.580.82
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
nvitop 1.6.1
oauthlib 3.3.1
omegaconf 2.3.0
openai 2.14.0
openai-harmony 0.0.8
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.11.0.86
opentelemetry-api 1.39.1
opentelemetry-exporter-otlp 1.39.1
opentelemetry-exporter-otlp-proto-common 1.39.1
opentelemetry-exporter-otlp-proto-grpc 1.39.1
opentelemetry-exporter-otlp-proto-http 1.39.1
opentelemetry-exporter-prometheus 0.60b1
opentelemetry-instrumentation 0.60b1
opentelemetry-proto 1.39.1
opentelemetry-sdk 1.39.1
opentelemetry-semantic-conventions 0.60b1
ordered-set 4.1.0
orjson 3.11.5
outlines_core 0.2.11
packaging 24.2
pandas 2.3.3
partial-json-parser 0.2.1.1.post7
pdfminer.six 20250506
peft 0.18.0
pillow 12.0.0
pip 25.3
platformdirs 4.5.1
polars 1.36.1
polars-runtime-32 1.36.1
polyfile-weave 0.5.7
poml 0.0.8
portpicker 1.6.0
prometheus_client 0.23.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.4.1
proto-plus 1.27.0
protobuf 6.33.2
psutil 7.0.0
py-cpuinfo 9.0.0
py-spy 0.4.1
pyarrow 22.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.2
pybase64 1.4.3
pybind11 3.0.1
pycountry 24.6.1
pycparser 2.23
pydantic 2.12.5
pydantic_core 2.41.5
pydantic-extra-types 2.10.6
pydantic-settings 2.12.0
Pygments 2.19.2
PyJWT 2.10.1
pylatexenc 2.10
pymongo 4.15.5
PyNaCl 1.6.1
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-json-logger 4.0.0
python-multipart 0.0.18
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
qwen-vl-utils 0.0.14
ray 2.53.0
redis 7.1.0
referencing 0.37.0
regex 2025.11.3
requests 2.32.5
rich 13.7.1
rich-toolkit 0.17.1
rignore 0.7.6
rpds-py 0.30.0
rq 2.6.1
rsa 4.9.1
s3transfer 0.11.3
safetensors 0.7.0
scipy 1.16.3
sentencepiece 0.2.1
sentry-sdk 2.48.0
setproctitle 1.3.7
setuptools 77.0.3
shellingham 1.5.4
six 1.17.0
smart_open 7.5.0
smmap 5.0.2
sniffio 1.3.1
sortedcontainers 2.4.0
soundfile 0.12.1
soxr 1.0.0
sse-starlette 3.0.4
starlette 0.50.0
stdlib-list 0.11.1
sympy 1.14.0
tabulate 0.9.0
tenacity 9.1.2
tensorboard 2.20.0
tensorboard-data-server 0.7.2
tensordict 0.8.3
termcolor 2.4.0
tiktoken 0.12.0
tokenizers 0.22.1
torch 2.8.0
torch_c_dlpack_ext 0.1.4
torchaudio 2.8.0
torchdata 0.11.0
torchvision 0.23.0
tqdm 4.67.1
transformers 4.57.3
triton 3.4.0
typer 0.20.1
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.3
tzlocal 5.3.1
urllib3 2.6.2
uv 0.9.18
uvicorn 0.31.1
uvicorn-worker 0.3.0
uvloop 0.21.0
verl 0.6.1
virtualenv 20.35.4
vllm 0.10.2
wandb 0.23.1
watchfiles 1.1.1
wcwidth 0.2.14
websockets 15.0.1
Werkzeug 3.1.4
wheel 0.45.1
wrapt 1.17.3
xformers 0.0.32.post1
xgrammar 0.1.23
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0
[12/31/25 14:13:05] ERROR 2025-12-31 14:13:05,228 ERROR trainer.py:532 -- Algorithm bundle encountered an error. trainer.py:532
Traceback (most recent call last):
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/trainer/trainer.py", line 527, in
_algorithm_bundle
algorithm.run(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/algorithm/verl/interface.py", line 184, in run
run_ppo(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 78, in run_ppo
ray.get(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 2967, in get
values, debugger_breakpoint = worker.get_objects(
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 1015, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=1266953, ip=172.21.0.101, actor_id=9acea5484d97834288521f9f01000000,
repr=<agentlightning.verl.entrypoint.TaskRunner object at 0x7f4846d8d130>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 244, in run
trainer.fit()
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 507, in fit
metrics = self._train_step(batch_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 257, in _train_step
batch, agent_metrics = self.agent_mode_daemon.get_train_data_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 1044, in
get_train_data_batch
pos_ids = self._compute_mrope_position_ids(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 353, in
_compute_mrope_position_ids
vision_pos = get_rope_index(
^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/verl/models/transformers/qwen2_vl.py", line 152, in
get_rope_index
position_ids[..., attention_mask == 1] = llm_positions.to(position_ids.device)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [3, 19963] cannot be broadcast to indexing result of shape [3, 16467]
ERROR 2025-12-31 14:13:05,243 ERROR client_server.py:155 -- Algorithm bundle crashed; signaling stop event client_server.py:155
Traceback (most recent call last):
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/execution/client_server.py", line 144, in
_execute_algorithm
await algorithm(wrapper_store, stop_evt)
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/trainer/trainer.py", line 527, in
_algorithm_bundle
algorithm.run(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/algorithm/verl/interface.py", line 184, in
run
run_ppo(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 78, in run_ppo
ray.get(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in
auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 2967, in get
values, debugger_breakpoint = worker.get_objects(
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 1015, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=1266953, ip=172.21.0.101, actor_id=9acea5484d97834288521f9f01000000,
repr=<agentlightning.verl.entrypoint.TaskRunner object at 0x7f4846d8d130>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 244, in run
trainer.fit()
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 507, in fit
metrics = self._train_step(batch_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 257, in _train_step
batch, agent_metrics = self.agent_mode_daemon.get_train_data_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 1044, in
get_train_data_batch
pos_ids = self._compute_mrope_position_ids(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 353, in
_compute_mrope_position_ids
vision_pos = get_rope_index(
^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/verl/models/transformers/qwen2_vl.py", line 152, in
get_rope_index
position_ids[..., attention_mask == 1] = llm_positions.to(position_ids.device)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [3, 19963] cannot be broadcast to indexing result of shape [3, 16467]
ERROR 2025-12-31 14:13:05,251 ERROR client_server.py:425 -- Unhandled exception in execute method client_server.py:425
Traceback (most recent call last):
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/execution/client_server.py", line 378, in
execute
asyncio.run(self._execute_algorithm(algorithm, store, stop_evt))
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/execution/client_server.py", line 144, in
_execute_algorithm
await algorithm(wrapper_store, stop_evt)
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/trainer/trainer.py", line 527, in
_algorithm_bundle
algorithm.run(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/algorithm/verl/interface.py", line 184, in
run
run_ppo(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 78, in run_ppo
ray.get(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in
auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 2967, in get
values, debugger_breakpoint = worker.get_objects(
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 1015, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=1266953, ip=172.21.0.101, actor_id=9acea5484d97834288521f9f01000000,
repr=<agentlightning.verl.entrypoint.TaskRunner object at 0x7f4846d8d130>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 244, in run
trainer.fit()
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 507, in fit
metrics = self._train_step(batch_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 257, in _train_step
batch, agent_metrics = self.agent_mode_daemon.get_train_data_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 1044, in
get_train_data_batch
pos_ids = self._compute_mrope_position_ids(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 353, in
_compute_mrope_position_ids
vision_pos = get_rope_index(
^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/verl/models/transformers/qwen2_vl.py", line 152, in
get_rope_index
position_ids[..., attention_mask == 1] = llm_positions.to(position_ids.device)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [3, 19963] cannot be broadcast to indexing result of shape [3, 16467]
Traceback (most recent call last):
File "/mnt/inaisfs/data/home/tansy_criait/jmf/wgw/G_graduate/agl_rl/train.py", line 159, in <module>
main()
File "/mnt/inaisfs/data/home/tansy_criait/jmf/wgw/G_graduate/agl_rl/train.py", line 149, in main
train(mode=args.mode,
File "/mnt/inaisfs/data/home/tansy_criait/jmf/wgw/G_graduate/agl_rl/train.py", line 51, in train
trainer.fit(agent,
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/trainer/trainer.py", line 438, in fit
self.strategy.execute(algorithm_bundle, runner_bundle, self.store)
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/execution/client_server.py", line 378, in execute
asyncio.run(self._execute_algorithm(algorithm, store, stop_evt))
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/execution/client_server.py", line 144, in _execute_algorithm
await algorithm(wrapper_store, stop_evt)
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/trainer/trainer.py", line 527, in _algorithm_bundle
algorithm.run(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/algorithm/verl/interface.py", line 184, in run
run_ppo(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 78, in run_ppo
ray.get(
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 2967, in get
values, debugger_breakpoint = worker.get_objects(
^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/ray/_private/worker.py", line 1015, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=1266953, ip=172.21.0.101, actor_id=9acea5484d97834288521f9f01000000, repr=<agentlightning.verl.entrypoint.TaskRunner object at 0x7f4846d8d130>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/entrypoint.py", line 244, in run
trainer.fit()
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 507, in fit
metrics = self._train_step(batch_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/trainer.py", line 257, in _train_step
batch, agent_metrics = self.agent_mode_daemon.get_train_data_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 1044, in get_train_data_batch
pos_ids = self._compute_mrope_position_ids(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/agentlightning/verl/daemon.py", line 353, in _compute_mrope_position_ids
vision_pos = get_rope_index(
^^^^^^^^^^^^^^^
File "/mnt/inaisfs/data/home/tansy_criait/miniconda3/envs/agl/lib/python3.12/site-packages/verl/models/transformers/qwen2_vl.py", line 152, in get_rope_index
position_ids[..., attention_mask == 1] = llm_positions.to(position_ids.device)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [3, 19963] cannot be broadcast to indexing result of shape [3, 16467]
(TaskRunner pid=3767566) Warning: Rollout ro-6bd78257d375 contains empty response: [Triplet(prompt={'token_ids': [151644, 8948,..., 151644, 77091, 198], 'raw_content': [{'role': 'system', 'content': '### Character Introduction ...</answer>'}, {'role': 'user', 'content': 'What percentage ...?'}], 'image_urls': []}, response={'token_ids': [13708, ..., 151645], 'raw_content': [{'finish_reason': 'stop', 'role': 'assistant', 'content': "<think>I ... today in 2013</search>"}]}, reward=None, metadata={'request': {'type': 'chat', 'model': '/mnt/inaisfs/data/home/tansy_criait/jmf/wgw/ckpt/Qwen2.5-VL-3B-Instruct', 'max_tokens': 4096, 'temperature': 1.0, 'streaming': False}, 'response': {'id': 'chatcmpl-f369cb728b614b24bc78ad30eb468a82', 'model': 'hosted_vllm//mnt/inaisfs/data/home/tansy_criait/jmf/wgw/ckpt/Qwen2.5-VL-3B-Instruct'}, 'response_id': 'chatcmpl-f369cb728b614b24bc78ad30eb468a82', 'agent_name': '*'}), Triplet(prompt={'token_ids': [151644, ...,151655, 151655, 151653, 151645, 198, 151644, 77091, 198], 'raw_content': [{'role': 'system', 'content': '### Character Introduction ...</answer>'}, {'role': 'user', 'content': 'What ...?'}, {'role': 'user', 'content': '[{"type": "text", "text": "This image <image> ...</search>"}, {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4 .... gD//2Q"}}]'}], 'image_urls': ['data:image/jpeg;base64,/9j/4AAQSkZJRgA/MPsPyf/4qgD//2Q']}, response={'token_ids': [13708,..., 20, 4, 3918, 9217, 29, 151645], 'raw_content': [{'finish_reason': 'stop', 'role': 'assistant', 'content': '<think>The ... 
.</answer>'}]}, reward=None, metadata={'request': {'type': 'chat', 'model': '/mnt/inaisfs/data/home/tansy_criait/jmf/wgw/ckpt/Qwen2.5-VL-3B-Instruct', 'max_tokens': 4096, 'temperature': 1.0, 'streaming': False}, 'response': {'id': 'chatcmpl-3385f1a4806745fca6a62777b0e02901', 'model': 'hosted_vllm//mnt/inaisfs/data/home/tansy_criait/jmf/wgw/ckpt/Qwen2.5-VL-3B-Instruct'}, 'response_id': 'chatcmpl-3385f1a4806745fca6a62777b0e02901', 'agent_name': '*'}), Triplet(prompt={'token_ids': [], 'raw_content': [{'role': 'system', 'content': "You ... 5%."}], 'image_urls': []}, response={'token_ids': [], 'raw_content': [{'finish_reason': 'stop', 'role': 'assistant', 'content': '<reason> ... 0</answer>'}]}, reward=1.4, metadata={'request': {'type': 'chat', 'model': 'judge', 'max_tokens': 2048, 'temperature': 0, 'streaming': False}, 'response': {'id': 'chatcmpl-ba0f98b02d58d335', 'model': 'judge'}, 'response_id': 'chatcmpl-ba0f98b02d58d335', 'agent_name': '*'})]