feat: Longcat-Image / Longcat-Image-Edit support #1053

stduhpf · 2025-12-05T20:04:19Z

```
sd.exe --diffusion-model ..\ComfyUI\models\unet\LongCat-Image-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.0 --sampling-method euler -v --clip-on-cpu -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: \"THE CITY IS A CIRCUIT BOARD, AND I AM A LONG CAT.\" -- moody, atmospheric, profound, dark academic" --preview proj --steps 20 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --diffusion-fa --color -W 1024 -H 1024
```

Test models (converted to bfl format) can be found here:

Inference for models in diffusers format still seems to be broken.

wbruna · 2025-12-05T20:32:10Z

That does look a bit like a circuit board...

stduhpf · 2025-12-06T02:12:01Z

TODO for when image generation works

stduhpf · 2025-12-06T15:06:41Z

I can't figure out what I'm doing wrong. I think it is supposed to work just like Flux1, but with different PE indices and the Qwen text encoder... Maybe I'm missing an important detail, but I can't find it.

stduhpf · 2025-12-07T21:40:50Z

I tried using my SplitAttention thing on a Flux model converted to diffusers format, and it didn't work properly either.

I guess I found what is not working. I will try converting LongCat to Flux format and see if it works.

stduhpf · 2025-12-08T00:39:02Z

I think I got it?

With the padding fixed, but with diffusers format:

stduhpf · 2025-12-08T01:23:10Z

With the character-level tokenization trick:

This might need testing to make sure the current implementation supports languages that don't use the Latin alphabet. Also, for now it's only applied to text wrapped in single quotes (').

stduhpf · 2025-12-08T01:34:26Z

Oh no, why are there so many conflicts now?

stduhpf · 2025-12-08T11:38:51Z

Using ' as a quote delimiter was a bad idea, because it's the same symbol used for apostrophes. I will change it to detect " instead.
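To illustrate why `"` is the safer delimiter, here is a minimal sketch assuming a hypothetical helper `split_quoted_segments` (not the PR's actual code). It scans the prompt for pairs of `"` and marks which segments are literal text, so an apostrophe in a word like `I'm` is never treated as a delimiter. In this sketch the quote characters themselves are dropped and an unmatched trailing quote is left as ordinary text.

```cpp
#include <string>
#include <vector>

// One prompt segment; `quoted` is true for text found between a pair of '"'.
struct Segment {
    std::string text;
    bool quoted;
};

// Hypothetical helper: split a prompt on double quotes. Apostrophes (')
// are ordinary characters here. The '"' characters are dropped, and an
// unmatched trailing '"' is kept as part of the last unquoted segment.
std::vector<Segment> split_quoted_segments(const std::string& prompt) {
    std::vector<Segment> out;
    size_t pos = 0;
    while (pos < prompt.size()) {
        size_t open = prompt.find('"', pos);
        if (open == std::string::npos) break;
        size_t close = prompt.find('"', open + 1);
        if (close == std::string::npos) break;  // unmatched quote: stop here
        if (open > pos) out.push_back({prompt.substr(pos, open - pos), false});
        out.push_back({prompt.substr(open + 1, close - open - 1), true});
        pos = close + 1;
    }
    if (pos < prompt.size()) out.push_back({prompt.substr(pos), false});
    return out;
}
```

With the edit prompt used below, `Change the text to say "I'm a long one"` yields an unquoted segment followed by the quoted segment `I'm a long one`, apostrophe intact.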

stduhpf · 2025-12-08T12:47:54Z

Somehow it's not fully working yet, but it's definitely able to see that it's supposed to be a cat holding a sign, maybe because of the vision model.

```
sd.exe --diffusion-model ..\ComfyUI\models\unet\longcat_edit_bfl_format-Q8_0.gguf --vae ..\ComfyUI\models\vae\flux\ae.safetensors --cfg-scale 4.5 --sampling-method euler -v --offload-to-cpu --preview proj --steps 50 --vae-tile-size 128 --qwen2vl ..\ComfyUI\models\clip\Qwen2.5-VL-7B-Instruct.Q4_K_M.gguf --color --seed 0 -r .\assets\flux\flux1-dev-q8_0.png --llm_vision ..\ComfyUI\models\clip_vision\Qwen2.5-VL-7B-Instruct.mmproj-f16.gguf -p "Change the text to say \"I'm a long one\""
```

(Images: ref, out)

(Also, I made the change: it now needs double quotes around literal text.)

stduhpf · 2025-12-08T13:34:28Z

Somehow I couldn't get it to remove the original text, but there it goes:


Rocky-Lee-001 · 2025-12-10T06:06:52Z

May I ask which ComfyUI node is used to load this GGUF model?

stduhpf · 2025-12-12T01:21:12Z

It now properly supports UTF-8 encoding for the quoted text. (Also, quote characters are no longer excluded from the prompt after being parsed; this seems to help a bit, especially with longer text.)
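Splitting by UTF-8 characters rather than by bytes matters because a single non-Latin character can span up to four bytes. Below is a minimal sketch, assuming a hypothetical helper `split_utf8_chars` (not taken from the PR), that cuts a string at code-point boundaries by inspecting each lead byte. It assumes mostly valid UTF-8: a stray continuation byte is emitted on its own, and a truncated sequence at the end is kept as-is.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a UTF-8 string into one std::string per code
// point, so multi-byte characters (e.g. CJK) are never cut in half.
std::vector<std::string> split_utf8_chars(const std::string& s) {
    std::vector<std::string> chars;
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        size_t len = 1;                         // 0xxxxxxx (ASCII) or invalid lead byte
        if ((c & 0xE0) == 0xC0) len = 2;        // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3;   // 1110xxxx
        else if ((c & 0xF8) == 0xF0) len = 4;   // 11110xxx
        if (i + len > s.size()) len = s.size() - i;  // truncated sequence at the end
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}
```

Each returned piece could then be tokenized on its own, which is the idea behind the character-level trick for quoted text.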


stduhpf · 2025-12-12T02:44:54Z

> May I ask which ComfyUI node is used to load this GGUF model?

@Rocky-Lee-001 I don't think LongCat-Image is natively supported by ComfyUI yet. You could give https://github.com/sooxt98/comfyui_longcat_image a try; maybe it works well with the GGUF node for ComfyUI?

leejet · 2025-12-13T07:19:04Z

In `ggml_extend.hpp`:

```diff
     }
 };
 
+class SplitLinear : public Linear {
```


If this part has no effect, I think we can remove the related code. In fact, even if it does have some effect, additional work would be required to handle LoRAs that use the QKV format, so I wouldn't really recommend this approach.

This is used when loading Flux diffusion models with the diffusers naming convention, which has the q, k and v matrices split into individual linear layers rather than one big linear layer. For some reason it is not quite working, and I'm not sure why.
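For reference, the split and fused layouts should be numerically equivalent. Below is a minimal sketch, assuming the fused (bfl-style) `qkv` weight stacks the q, k and v weights along the output dimension in that order; the names `Matrix`, `linear` and `fuse_qkv` are hypothetical and unrelated to the `SplitLinear` class above. Biases, if present, would be concatenated the same way.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // [out_features][in_features]

// y = W x (bias omitted for brevity).
std::vector<float> linear(const Matrix& w, const std::vector<float>& x) {
    std::vector<float> y(w.size(), 0.0f);
    for (size_t o = 0; o < w.size(); ++o)
        for (size_t i = 0; i < x.size(); ++i)
            y[o] += w[o][i] * x[i];
    return y;
}

// Fuse diffusers-style separate q/k/v weights into a single qkv weight
// by stacking their rows (output features) in q, k, v order.
Matrix fuse_qkv(const Matrix& q, const Matrix& k, const Matrix& v) {
    Matrix qkv;
    qkv.reserve(q.size() + k.size() + v.size());
    for (const Matrix* m : {&q, &k, &v})
        qkv.insert(qkv.end(), m->begin(), m->end());
    return qkv;
}
```

Under that assumption, `linear(fuse_qkv(q, k, v), x)` equals the concatenation of `linear(q, x)`, `linear(k, x)` and `linear(v, x)`. This is also why a LoRA trained against the fused layer needs its delta split (or the split weights fused) before it can be applied.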

leejet · 2025-12-13T07:24:02Z

I’m not sure whether I did something wrong on my end, but I got a strange image.

```
.\bin\Release\sd-cli.exe --diffusion-model  ..\models\longcat_bfl_format-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\ae.sft  --llm ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf -p 'a lovely cat' --cfg-scale 5.0 -v --offload-to-cpu --diffusion-fa
```

stduhpf · 2025-12-13T14:11:30Z

@leejet that's strange. I can reproduce it with the same prompt though (even with the Q8_0 model), but I hadn't gotten anything like this in my earlier testing. Maybe there's a linear layer that could use scaling?

It does not seem to be related to the seed.

It seems to be the combination of short prompts and low resolution that causes it.

leejet · 2025-12-15T15:44:50Z

I think there are still some differences between the implementation in this PR and the official pipeline, for example in how token padding is handled and whether masks are used. I suspect these differences might be what causes the strange blocky artifacts in the generated images. I tried to fix it, but didn't succeed; now it's producing completely black images.

https://github.com/leejet/stable-diffusion.cpp/tree/longcat-fix
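As background for the padding/mask question, here is a generic sketch of how a padded token sequence is usually paired with an attention mask. It does not claim to match the official LongCat pipeline; `pad_with_mask`, `masked_softmax`, `max_len` and `pad_id` are hypothetical. Padded positions get an additive mask of negative infinity, so they receive zero attention weight after the softmax.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct PaddedTokens {
    std::vector<int32_t> ids;  // length == max_len
    std::vector<float> mask;   // additive mask: 0 for real tokens, -inf for padding
};

// Hypothetical helper: truncate or right-pad `ids` to `max_len` with `pad_id`
// and build an additive attention mask so padded keys are ignored.
PaddedTokens pad_with_mask(const std::vector<int32_t>& ids, size_t max_len, int32_t pad_id) {
    PaddedTokens out;
    out.ids.assign(max_len, pad_id);
    out.mask.assign(max_len, -std::numeric_limits<float>::infinity());
    size_t n = ids.size() < max_len ? ids.size() : max_len;
    for (size_t i = 0; i < n; ++i) {
        out.ids[i] = ids[i];
        out.mask[i] = 0.0f;
    }
    return out;
}

// Softmax over one row of attention scores after adding the mask.
// Assumes at least one unmasked position (otherwise the result is NaN).
std::vector<float> masked_softmax(const std::vector<float>& scores, const std::vector<float>& mask) {
    std::vector<float> w(scores.size());
    float mx = -std::numeric_limits<float>::infinity();
    for (size_t i = 0; i < scores.size(); ++i) {
        w[i] = scores[i] + mask[i];
        if (w[i] > mx) mx = w[i];
    }
    float sum = 0.0f;
    for (float& x : w) { x = std::exp(x - mx); sum += x; }  // exp(-inf) == 0
    for (float& x : w) x /= sum;
    return w;
}
```

Without the mask, padding tokens would take part in attention like real tokens. That is the kind of mismatch that could plausibly change results for short prompts, where most of the sequence is padding.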

Yurchikian · 2025-12-16T20:51:36Z

I've tried this code and found that short prompts lead to blocky, modern-art-like generations, while long descriptive prompts produce higher-quality results.

For example, with just "Orange cat" as the prompt:

```
~/projects/longcat.cpp/build/bin/sd --diffusion-model ~/Drive/AI/ComfyUI/models/unet/longcat_bfl_format-Q4_K_M.gguf --diffusion-fa --vae ~/Drive/AI/ComfyUI/models/vae/ae.sft --sampling-method euler --cfg-scale 3.0 --steps 20 --llm ~/Drive/AI/ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q3_K_M.gguf --color -W 512 -H 512 --seed 128 -o output_$(date +%Y-%m-%d_%H-%M-%S).png --vae-tiling -p "Orange cat"
```

Repeating "Orange Cat" several times:

```
~/projects/longcat.cpp/build/bin/sd --diffusion-model ~/Drive/AI/ComfyUI/models/unet/longcat_bfl_format-Q4_K_M.gguf --diffusion-fa --vae ~/Drive/AI/ComfyUI/models/vae/ae.sft --sampling-method euler --cfg-scale 3.0 --steps 20 --llm ~/Drive/AI/ComfyUI/models/text_encoders/Qwen2.5-VL-7B-Instruct-Q3_K_M.gguf --color -W 512 -H 512 --seed 128 -o output_$(date +%Y-%m-%d_%H-%M-%S).png --vae-tiling -p "This is a portrait photograph of an Orange Cat sitting on the table by the window. We can see the room interior, fancy 50s style Art-Deco ornaments on the wallpaper. The weather outside is fine, the sun is shining. Wintertime snow on the ground, people playing with snowballs. The cat is large and shaped like a sphere. Whiskers are long and fangs are sharp. Phone snapshot, photograph."
```

Increasing the step count would reduce artifacts even more. However, the face still seems a bit blocky; I'm not sure whether that would also be the case with the original workflow, since I can't run the non-GGUF model on my GPU. Increasing the resolution doesn't help with those artifacts either.

I hope my observations are helpful.

P.S. I've also checked out @leejet's longcat-fix branch. It did indeed generate black images, until I added --clip-on-cpu; then I could get an image with just the "Orange Cat" prompt!

stduhpf marked this pull request as ready for review December 8, 2025 01:29

stduhpf changed the title ~~Wip: Longcat-Image support~~ Longcat-Image / Longcat-Image-Edit support Dec 8, 2025

stduhpf changed the title ~~Longcat-Image / Longcat-Image-Edit support~~ feat: Longcat-Image / Longcat-Image-Edit support Dec 8, 2025

leejet reviewed Dec 9, 2025

Review threads on `stable-diffusion.cpp` and `vae.hpp` (outdated, resolved).

stduhpf added 12 commits December 12, 2025 02:55

- Support LongCat Image model (4249294)
- temp fix cuda error on quant concat for splitlinear (52ef50a)
- pre-patchify (7ba7feb)
- longcat rope ids (1241323)
- Fix diffusers_style detection (203d053)
- Flux: simplify when patch_size is 1 (37c5e3e)
- correct rope offset for image tokens (a907fe2)
- stuff
- Fix token length (fc8d85e)
- Split quoted text into character-level tokens (9f225e4)
- remove debug logs
- support longcat-image-edit (c044a40)
- Fix base rope offset for ref images
- Split quotes by utf8 characters rather than individual char (fd032bc)
- patch size consistent with Flux1 (196bb89)

stduhpf force-pushed the longcat branch from c31128b to 196bb89 Compare December 12, 2025 02:10

leejet reviewed Dec 13, 2025


5 participants
