
torch.set_default_device("cuda") error #249

@r-carpentier

🐛 Bug

Setting the default device to anything other than "cpu" with torch.set_default_device leads to a ValueError when calling model.predict.

To Reproduce

I'm using the code sample provided on the Hugging Face page of the model wmt22-cometkiwi-da:

import torch
from comet import download_model, load_from_checkpoint

torch.set_default_device("cuda")

model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)
data = [
    {
        "src": "The output signal provides constant sync so the display never glitches.",
        "mt": "Das Ausgangssignal bietet eine konstante Synchronisation, so dass die Anzeige nie stört."
    },
    {
        "src": "Kroužek ilustrace je určen všem milovníkům umění ve věku od 10 do 15 let.",
        "mt": "Кільце ілюстрації призначене для всіх любителів мистецтва у віці від 10 до 15 років."
    },
    {
        "src": "Mandela then became South Africa's first black president after his African National Congress party won the 1994 election.",
        "mt": "その後、1994年の選挙でアフリカ国民会議派が勝利し、南アフリカ初の黒人大統領となった。"
    }
]
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)

The output is:

Lightning automatically upgraded your loaded checkpoint from v1.8.2 to v2.5.1.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../../scratch1/robin/mlcache/huggingface/hub/models--Unbabel--wmt22-cometkiwi-da/snapshots/1ad785194e391eebc6c53e2d0776cada8f83179a/checkpoints/model.ckpt`
Encoder model frozen.
/home/test/miniconda3/envs/mware/lib/python3.13/site-packages/pytorch_lightning/core/saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['encoder.model.embeddings.position_ids']
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Predicting: 0it [00:00, ?it/s]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 22
      7 model = load_from_checkpoint(model_path)
      8 data = [
      9     {
     10         "src": "The output signal provides constant sync so the display never glitches.",
   (...)     20     }
     21 ]
---> 22 model_output = model.predict(data, batch_size=8, gpus=1)
     23 print (model_output)

File ~/miniconda3/envs/mware/lib/python3.13/site-packages/comet/models/base.py:655, in CometModel.predict(self, samples, batch_size, gpus, devices, mc_dropout, progress_bar, accelerator, num_workers, length_batching)
    646 trainer = ptl.Trainer(
    647     devices=devices,
    648     logger=False,
   (...)    652     enable_progress_bar=enable_progress_bar,
    653 )
    654 return_predictions = False if gpus > 1 else True
--> 655 predictions = trainer.predict(
    656     self, dataloaders=dataloader, return_predictions=return_predictions
    657 )
    658 if gpus > 1:
    659     torch.distributed.barrier()  # Waits for all processes to finish predict
...
    raise ValueError(
    ...<4 lines>...
    ) from e
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
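
If it helps, here is a minimal sketch of what I suspect is going on (an assumption on my part, not verified against COMET internals): transformers builds the batched tensors with torch.tensor() during padding, so with "cuda" as the default device that call ends up running inside a forked DataLoader worker, cannot initialize CUDA there, and transformers re-raises the failure as the ValueError above. The tokenizer name below is just a stand-in for the XLM-R-family encoder the model uses:

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

torch.set_default_device("cuda")

# Stand-in for the model's actual encoder tokenizer (assumption)
tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

def collate(batch):
    # return_tensors="pt" creates tensors via torch.tensor(), which now
    # targets CUDA even inside the forked worker subprocess
    return tok(batch, padding=True, return_tensors="pt")

loader = DataLoader(["hello", "a longer sentence"], batch_size=2,
                    num_workers=2, collate_fn=collate)
next(iter(loader))  # expected to raise the same ValueError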

Expected behaviour

I would expect no error, given that the model runs on CUDA anyway.

Environment

OS: Ubuntu 22.04
Packaging: conda environment, packages installed with pip
Versions: unbabel-comet 2.2.6; torch 2.7.0

Additional context

I use torch.set_default_device("cuda") to ensure that other models are always loaded on GPU.
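
A possible workaround (untested sketch) would be to switch the default device back to "cpu" just around the predict call, so tensors built by the tokenizer and DataLoader stay on CPU while Lightning still moves the model to the GPU via gpus=1:

torch.set_default_device("cpu")
model_output = model.predict(data, batch_size=8, gpus=1)
torch.set_default_device("cuda")

A more scoped alternative would be to drop the global default entirely and load the other models under a with torch.device("cuda"): context manager instead.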
