Simple end-to-end RLHF (Reinforcement Learning from Human Feedback) for diffusion models (DDPO) on personal hardware.
-
Updated
Feb 26, 2025 - Python
Simple end-to-end RLHF (Reinforcement Learning from Human Feedback) for diffusion models (DDPO) on personal hardware.
Add a description, image, and links to the ddpo topic page so that developers can more easily learn about it.
To associate your repository with the ddpo topic, visit your repo's landing page and select "manage topics."