This repository hosts the official implementation of:
Hyungjin Kim, Seokho Ahn, and Young-Duk Seo, *Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models*, ICCV 2025. [iccv] [supp] [arXiv]
- [2025.11.04]: ICCV paper and supplementary materials released
- [2025.08.05]: Pre-trained weights available on 🤗 HuggingFace
- [2025.08.05]: Repository created
DrUM enables personalized text-to-image (T2I) generation by integrating reference prompts into T2I diffusion models. It works with foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX, without requiring additional fine-tuning. DrUM leverages condition-level modeling in the latent space using a transformer-based adapter, and integrates seamlessly with open-source text encoders such as OpenCLIP and Google T5.
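As a rough intuition for what condition-level modeling means, the sketch below blends text-encoder condition embeddings rather than images or model weights. This is illustrative only, not DrUM's actual adapter: the module shape and mixing rule are assumptions, while the real adapter is the trained transformer shipped in this repository.

```python
import torch
import torch.nn as nn

# Illustrative sketch of condition-level blending -- NOT DrUM's real adapter.
# The architecture and the (1 - alpha) / alpha mixing rule are assumptions.
class ToyConditionAdapter(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, target, refs, weights, alpha):
        # target: (1, seq_len, dim) condition of the target prompt
        # refs:   (n_refs, seq_len, dim) conditions of the reference prompts
        w = torch.tensor(weights).view(-1, 1, 1)
        ref_cond = self.encoder((w * refs).sum(dim=0, keepdim=True))
        # Blend target and reference conditions by the personalization degree
        return (1 - alpha) * target + alpha * ref_cond
```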
This model is designed for easy use with the diffusers library as a custom pipeline.
```bash
pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
```

Pre-trained adapter weights are available on 🤗 HuggingFace.
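If you want to fetch the adapter weights manually, `huggingface_hub` can download them. A minimal sketch follows; the repo id below is a placeholder, so substitute the actual 🤗 repository hosting the DrUM weights:

```python
from huggingface_hub import hf_hub_download

# NOTE: "user/DrUM" is a placeholder repo id -- replace it with the actual
# 🤗 repository that hosts the pre-trained DrUM adapter weights.
adapter_path = hf_hub_download(repo_id="user/DrUM", filename="L.safetensors")
print(adapter_path)  # local cache path of the downloaded weights
```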
```python
import torch
from diffusers import DiffusionPipeline

from drum import DrUM

# Load a foundation T2I pipeline and attach DrUM
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.bfloat16
).to("cuda")
drum = DrUM(pipeline)

# Generate personalized images
images = drum(
    prompt="a photograph of an astronaut riding a horse",
    ref=["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
    weight=[1.0],
    alpha=0.3,
)
images[0].save("personalized_image.png")
```

For interactive usage, see inference.ipynb.
For command-line usage, see inference.py.
| Parameter | Description | Type |
|---|---|---|
| `prompt` | Target prompt | String |
| `ref` | Reference prompts | List of strings |
| `alpha` | Personalization degree | Float in [0, 1] |
| `weight` | Per-reference mixing weights | List of floats |
| `sampling` | Enable reference coreset sampling | Boolean |
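For example, multiple references can be mixed with per-reference weights. The call below reuses the `drum` object from the quickstart and assumes `sampling` toggles coreset sampling over the references, per the table above:

```python
# Blend two reference styles, weighting the first more heavily.
# Assumes the `drum` object created in the quickstart.
images = drum(
    prompt="a watercolor landscape of a mountain lake",
    ref=[
        "Impressionist oil painting with soft brush strokes",
        "Minimalist Japanese ink wash art",
    ],
    weight=[0.7, 0.3],
    alpha=0.4,
    sampling=True,  # enable reference coreset sampling
)
images[0].save("blended_styles.png")
```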
DrUM works with any foundation T2I model whose text encoder shares weights with one of the encoders listed below:
| Architecture | Pipeline | Text encoder | DrUM weight |
|---|---|---|---|
| Stable Diffusion v1 | runwayml/stable-diffusion-v1-5, prompthero/openjourney-v4, stablediffusionapi/realistic-vision-v51, stablediffusionapi/deliberate-v2, stablediffusionapi/anything-v5, WarriorMama777/AbyssOrangeMix2, ... | openai/clip-vit-large-patch14 | L.safetensors |
| Stable Diffusion v2 | stabilityai/stable-diffusion-2-1, ... | laion/CLIP-ViT-H-14-laion2B-s32B-b79K | H.safetensors |
| Stable Diffusion XL | stabilityai/stable-diffusion-xl-base-1.0, ... | openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | L.safetensors, bigG.safetensors |
| Stable Diffusion v3 | stabilityai/stable-diffusion-3.5-large, stabilityai/stable-diffusion-3.5-medium, ... | openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k, google/t5-v1_1-xxl | L.safetensors, bigG.safetensors, T5.safetensors |
| FLUX | black-forest-labs/FLUX.1-dev, ... | openai/clip-vit-large-patch14, google/t5-v1_1-xxl | L.safetensors, T5.safetensors |
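The quickstart pattern carries over to the other architectures. For instance, here is a sketch for SDXL; it assumes `DrUM(pipeline)` resolves the matching adapter weights (L.safetensors and bigG.safetensors) from the pipeline's text encoders, so check the repository docs for the exact loading API:

```python
import torch
from diffusers import DiffusionPipeline

from drum import DrUM

# Sketch: attach DrUM to SDXL. Assumes DrUM(pipeline) detects the
# pipeline's text encoders and loads the matching adapter weights.
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
).to("cuda")
drum = DrUM(pipeline)

images = drum(
    prompt="a city skyline at dusk",
    ref=["Cyberpunk neon concept art"],
    weight=[1.0],
    alpha=0.3,
)
images[0].save("sdxl_personalized.png")
```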
To train your own DrUM adapter, see train.py.
@InProceedings{kim2025drum,
author = {Kim, Hyungjin and Ahn, Seokho and Seo, Young-Duk},
title = {Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {17171-17180}
}
This project is licensed under the MIT License.