Description
🐛 Bug
Hi!
I am using Unbabel/XCOMET-XL for reference-based MT evaluation and observed a negative score. Is this expected?
I am aware from Issue #38 that older COMET models were trained on unbounded z-scores. However, the XCOMET paper explicitly states: "we employ min-max scaling on our DA corpus to set its range of scores to [0, 1]".
If the model architecture does not include a final sigmoid layer or hard clipping, I assume it is possible for edge cases to produce values slightly below 0 or above 1. Is this the case? If so, is clipping the scores into [0, 1] a reasonable way to enforce the documented range?
To Reproduce
from comet import download_model, load_from_checkpoint

# Download the XCOMET-XL checkpoint from the Hugging Face Hub and load it
model_name = "Unbabel/XCOMET-XL"
model_path = download_model(model_name)
model = load_from_checkpoint(model_path)
src = ["There were protests worldwide, several criminal prosecutions, and the leaders of the governments of Iceland and Pakistan both resigned."]
mt = ["Було проведено протести по всьому світу, кілька кримінальних переслідувань, а лідери урядів Ісландії та Пакистану both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both both"]
ref = ["У світі відбулися протести, кілька кримінальних переслідувань, а обидва лідери урядів Ісландії та Пакистану пішли у відставку."]
# inference
data = [{"src": s, "mt": t, "ref": r} for s, t, r in zip(src, mt, ref)]
scores = model.predict(data, batch_size=1, gpus=0, accelerator="cpu")
print(scores)  # "scores": -0.0025453418493270874
assert scores["scores"][0] < 0.0, scores
Expected behaviour
I expected the score to lie in the range [0, 1].
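For reference, the workaround I am considering is simply clamping the output. This is a minimal sketch, assuming clipping is acceptable; clip_score is a hypothetical helper, not part of the comet API:

```python
def clip_score(score: float) -> float:
    """Clamp a raw XCOMET segment score into the documented [0, 1] range.

    Hypothetical post-processing step, not an official COMET API.
    """
    return min(max(score, 0.0), 1.0)

# The slightly negative score from the example above becomes exactly 0.0:
print(clip_score(-0.0025453418493270874))  # 0.0
```

This leaves in-range scores untouched and only affects the boundary overshoots, but I would like to confirm this does not hide a different underlying problem.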
Environment
OS: Linux (Ubuntu 24.04.3 LTS)
Packaging: comet installed through pip inside a conda environment
Version: comet 2.2.7 (pypi), python 3.11.9 (conda)