Adjust vocab size calculation of DPO model to be dynamic #141

@p-ferreira

Description

The current DPO model returns a hardcoded value of -11 for empty or very short completions. The value is in log space: -11 is the nearest integer whose exponential, exp(-11) ≈ 1.67e-5, falls below the uniform probability across all logits, 1/50257 ≈ 2e-5. See #138 for reference.

Relevant snippet from openvalidators/reward/dpo.py:

...
# Check if completion is 
        if completion.strip() == '' or len(completion) <= 5:
            return -11 # exp(-11)=1.67e-5 < 2e-5=1/50257 (typical vocab size)
...

The vocab size of 50257 is assumed to be typical, but it can differ for other models / tokenizers.
Ideally, this value would be derived automatically from 1 / model.vocab_size rather than hard-coded.
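
A minimal sketch of what the dynamic calculation could look like, assuming the goal is to keep the current behaviour (the nearest integer whose exponential is below the uniform probability) for an arbitrary vocab size. The function name `uniform_logprob_floor` is hypothetical and not part of openvalidators, and how the vocab size is obtained (e.g. from the model or its tokenizer) is left as an assumption.

```python
import math


def uniform_logprob_floor(vocab_size: int) -> int:
    """Return the largest integer x such that exp(x) <= 1 / vocab_size.

    Hypothetical helper (not from the repo): it generalises the hardcoded
    -11, which corresponds to a vocab size of 50257.
    """
    if vocab_size < 1:
        raise ValueError("vocab_size must be a positive integer")
    # log(1 / V) == -log(V); flooring gives the nearest integer below it.
    return math.floor(-math.log(vocab_size))


# Assumed usage inside the reward function (attribute name is an assumption):
#     if completion.strip() == '' or len(completion) <= 5:
#         return uniform_logprob_floor(model.vocab_size)
print(uniform_logprob_floor(50257))  # -11, matching the current hardcoded value
```

With this approach the penalty tracks the tokenizer: a larger vocabulary yields a more negative value, while 50257 still reproduces the existing -11.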
