Adjust vocab size calculation of DPO model to be dynamic

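The current DPO reward model returns a hardcoded value of -11 for empty or very short completions. The value is a log-probability: -11 is the nearest integer whose exponential, exp(-11) ≈ 1.67e-5, falls below the uniform probability across all logits for a 50257-token vocabulary (1/50257 ≈ 2e-5). See #138 for reference.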

Relevant code from [openvalidators/reward/dpo.py](https://github.com/opentensor/validators/blob/bd315ecbb89e40cdf09a0b59baf525a8e09e4fac/openvalidators/reward/dpo.py#L98C18-L98C18):

```python
...
        # Check if completion is empty or too short
        if completion.strip() == '' or len(completion) <= 5:
            return -11 # exp(-11)=1.67e-5 < 2e-5=1/50257 (typical vocab size)
...
```

The vocab size of 50257 is assumed to be typical, but other models and tokenizers may use a different size.
Ideally, this value would be computed automatically from the model, for example from `1 / model.vocab_size`, rather than hard-coded.
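
A minimal sketch of what a dynamic version could look like. The helper name `uniform_log_prob_floor` is hypothetical and not part of the repository; it assumes the vocab size is available as an integer (for instance from the model config or tokenizer, depending on how the DPO model is loaded) and reproduces the same "nearest integer at or below log(1/vocab_size)" rule as the hardcoded -11.

```python
import math


def uniform_log_prob_floor(vocab_size: int) -> int:
    """Largest integer that does not exceed log(1 / vocab_size).

    Hypothetical helper: mirrors the reasoning behind the hardcoded -11,
    but derives the value from the actual vocab size.
    """
    if vocab_size <= 0:
        raise ValueError("vocab_size must be a positive integer")
    # log(1 / vocab_size) == -log(vocab_size); floor rounds toward -inf,
    # so exp(result) never exceeds the uniform probability 1 / vocab_size.
    return math.floor(-math.log(vocab_size))


# With the vocab size assumed in the current code, this gives -11.
penalty = uniform_log_prob_floor(50257)
```

The early return in the reward function could then use this value in place of the literal `-11`, so that models with larger or smaller vocabularies get a matching penalty.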

Adjust vocab size calculation of DPO model to be dynamic #141
