🐛 Bug
Using torch.set_default_device to set the default device to anything other than "cpu" leads to a ValueError when calling model.predict.
To Reproduce
I'm using the code sample provided on the Hugging Face page of the model wmt22-cometkiwi-da:
import torch
from comet import download_model, load_from_checkpoint

torch.set_default_device("cuda")

model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)
data = [
    {
        "src": "The output signal provides constant sync so the display never glitches.",
        "mt": "Das Ausgangssignal bietet eine konstante Synchronisation, so dass die Anzeige nie stört."
    },
    {
        "src": "Kroužek ilustrace je určen všem milovníkům umění ve věku od 10 do 15 let.",
        "mt": "Кільце ілюстрації призначене для всіх любителів мистецтва у віці від 10 до 15 років."
    },
    {
        "src": "Mandela then became South Africa's first black president after his African National Congress party won the 1994 election.",
        "mt": "その後、1994年の選挙でアフリカ国民会議派が勝利し、南アフリカ初の黒人大統領となった。"
    }
]
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)

The output is:
Lightning automatically upgraded your loaded checkpoint from v1.8.2 to v2.5.1.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../../scratch1/robin/mlcache/huggingface/hub/models--Unbabel--wmt22-cometkiwi-da/snapshots/1ad785194e391eebc6c53e2d0776cada8f83179a/checkpoints/model.ckpt`
Encoder model frozen.
/home/test/miniconda3/envs/mware/lib/python3.13/site-packages/pytorch_lightning/core/saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['encoder.model.embeddings.position_ids']
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Predicting: 0it [00:00, ?it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 22
7 model = load_from_checkpoint(model_path)
8 data = [
9 {
10 "src": "The output signal provides constant sync so the display never glitches.",
(...) 20 }
21 ]
---> 22 model_output = model.predict(data, batch_size=8, gpus=1)
23 print (model_output)
File ~/miniconda3/envs/mware/lib/python3.13/site-packages/comet/models/base.py:655, in CometModel.predict(self, samples, batch_size, gpus, devices, mc_dropout, progress_bar, accelerator, num_workers, length_batching)
646 trainer = ptl.Trainer(
647 devices=devices,
648 logger=False,
(...) 652 enable_progress_bar=enable_progress_bar,
653 )
654 return_predictions = False if gpus > 1 else True
--> 655 predictions = trainer.predict(
656 self, dataloaders=dataloader, return_predictions=return_predictions
657 )
658 if gpus > 1:
659 torch.distributed.barrier() # Waits for all processes to finish predict
...
raise ValueError(
...<4 lines>...
) from e
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
Expected behaviour
I would expect no error, given that the model runs on CUDA anyway.
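As a workaround, restoring the CPU default around the predict call seems to avoid the error. This is only a sketch of what I have in mind, not something COMET documents; it assumes the failure comes from tensors being created under the CUDA default device in the tokenizer/collation path, and it reuses model and data from the repro above:

import torch

# Sketch of a workaround (assumption, untested beyond my setup):
# restore the stock "cpu" default before predict, then switch back.
torch.set_default_device("cpu")
try:
    model_output = model.predict(data, batch_size=8, gpus=1)
finally:
    torch.set_default_device("cuda")  # re-enable for the rest of the program
print(model_output)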
Environment
OS: Ubuntu 22.04
Packaging: conda environment, packages installed with pip
Version: unbabel-comet 2.2.6; torch 2.7.0
Additional context
I use torch.set_default_device("cuda") to ensure that other models are always loaded on the GPU.
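For those other models, an alternative I could fall back on (my own suggestion, not anything COMET-specific) is scoping the device with torch.device as a context manager instead of changing the process-wide default, so model.predict still sees the stock "cpu" default:

import torch
import torch.nn as nn

# Sketch: limit the CUDA default to the code that builds the other models.
# nn.Linear stands in for whatever model is actually being loaded.
with torch.device("cuda"):
    other_model = nn.Linear(16, 4)  # parameters are created on cuda

print(next(other_model.parameters()).device)  # cuda:0

# Outside the block the default device is back to "cpu",
# so COMET's predict path is unaffected.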