Z-Image-Turbo `from_single_file` #12756

hlky · 2025-11-30T11:29:39Z

What does this PR do?

import torch
from diffusers import GGUFQuantizationConfig
from diffusers.models import ZImageTransformer2DModel
from huggingface_hub import hf_hub_download

model = ZImageTransformer2DModel.from_single_file(
    hf_hub_download("jayn7/Z-Image-Turbo-GGUF", "z_image_turbo-Q3_K_S.gguf"),
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

model = ZImageTransformer2DModel.from_single_file(
    hf_hub_download(
        "Comfy-Org/z_image_turbo",
        "split_files/diffusion_models/z_image_turbo_bf16.safetensors",
    )
)

See https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/z_image_convert_original_to_comfy.py
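
The two loading calls above differ only in whether a GGUF quantization config is passed. A minimal sketch of a wrapper that picks the arguments from the file extension might look like this. `is_gguf_checkpoint` and `load_z_image_transformer` are hypothetical helper names, not part of Diffusers, and the sketch assumes a local file path such as the one returned by `hf_hub_download`.

```python
from pathlib import Path


def is_gguf_checkpoint(path):
    """Return True when the file name ends in .gguf (case-insensitive)."""
    return Path(path).suffix.lower() == ".gguf"


def load_z_image_transformer(path):
    """Load a Z-Image-Turbo transformer from a local GGUF or safetensors file."""
    # Imported lazily so the extension helper works without these packages.
    import torch
    from diffusers import GGUFQuantizationConfig
    from diffusers.models import ZImageTransformer2DModel

    kwargs = {}
    if is_gguf_checkpoint(path):
        # Same arguments as the GGUF example in the PR description.
        kwargs["quantization_config"] = GGUFQuantizationConfig(compute_dtype=torch.bfloat16)
        kwargs["torch_dtype"] = torch.bfloat16
    return ZImageTransformer2DModel.from_single_file(str(path), **kwargs)
```

For example, `load_z_image_transformer(hf_hub_download("jayn7/Z-Image-Turbo-GGUF", "z_image_turbo-Q3_K_S.gguf"))` would take the GGUF branch, while the bf16 safetensors file would be loaded with no extra arguments, as in the second call above.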

Fixes #12748

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Vargol · 2025-12-01T18:19:24Z

Sorry if this shows up as a dupe; I thought I commented this ages ago, but there's no sign of it.
Hi, thanks for this. The model I tried loads but it doesn't work. It looks like there's some code in the transformer that checks the dtype of the weights, gets the int8 storage dtype instead of the GGUF compute dtype, and then (eventually) calls torch.nn.Linear with the wrong types.

The code is the TimestepEmbedder forward function

    def forward(self, t):
        t_freq = self.timestep_embedding(t, self.frequency_embedding_size)
        weight_dtype = self.mlp[0].weight.dtype
        if weight_dtype.is_floating_point:
            t_freq = t_freq.to(weight_dtype)
        t_emb = self.mlp(t_freq)
        return t_emb

self.mlp[0].weight.dtype returns int8 for a GGUF format model

This leads to the forward code

        t_emb = self.mlp(t_freq)

eventually calling output = torch.nn.functional.linear(inputs, weight, bias) with float32, bfloat16, bfloat16 arguments.
On MPS this fails with "Destination NDArray and Accumulator NDArray cannot have different datatype in MPSNDArrayMatrixMultiplication". I don't have any CUDA or other devices to check whether this is a generic issue.

If I hardcode the right type, I can generate an image without issue

        t_freq = t_freq.to(self.mlp[0].compute_dtype)

Presumably it will need to be incorporated properly, with an attribute check for compute_dtype as part of the dtype-setting code, rather than my brute-force method.
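
The attribute check suggested above could look something like the following. This is a minimal sketch, not necessarily the change that landed in the PR: `resolve_cast_dtype` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for a regular linear layer and a GGUF-quantized one so the snippet runs without PyTorch. It assumes, as described above, that GGUF layers expose a `compute_dtype` attribute while their `weight.dtype` reports the integer storage type.

```python
from types import SimpleNamespace


def resolve_cast_dtype(layer):
    """Return the dtype inputs should be cast to, or None to leave them as-is."""
    # Quantized layers (assumed to expose `compute_dtype`) store weights as
    # integers, so the storage dtype must not be used for the cast.
    compute_dtype = getattr(layer, "compute_dtype", None)
    if compute_dtype is not None:
        return compute_dtype
    weight_dtype = layer.weight.dtype
    # Only floating-point weight dtypes are valid cast targets.
    if weight_dtype.is_floating_point:
        return weight_dtype
    return None


# Stand-ins that mimic only the attributes the helper reads.
bf16 = SimpleNamespace(name="bfloat16", is_floating_point=True)
int8 = SimpleNamespace(name="int8", is_floating_point=False)

plain_linear = SimpleNamespace(weight=SimpleNamespace(dtype=bf16))
gguf_linear = SimpleNamespace(weight=SimpleNamespace(dtype=int8), compute_dtype=bf16)
unknown_int_layer = SimpleNamespace(weight=SimpleNamespace(dtype=int8))

print(resolve_cast_dtype(plain_linear).name)
print(resolve_cast_dtype(gguf_linear).name)
print(resolve_cast_dtype(unknown_int_layer))
```

Because the helper only reads attributes, the same function should work on real torch modules. In the TimestepEmbedder forward function it would replace the weight_dtype check, casting t_freq only when a dtype is returned.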

hlky · 2025-12-01T21:30:39Z

@Vargol Apologies, I only tested loading. Something like c84e6d7 should work

Vargol · 2025-12-01T22:15:53Z

That looks like it'll work, I'll give a quick test.

Vargol · 2025-12-01T22:31:22Z

Yep - that's worked, no errors only images :-)

DN6 · 2025-12-03T07:31:20Z

src/diffusers/models/transformers/transformer_z_image.py

        # Match t_embedder output dtype to x for layerwise casting compatibility
        adaln_input = t.type_as(x)
-        x[torch.cat(x_inner_pad_mask)] = self.x_pad_token
+        x[torch.cat(x_inner_pad_mask).to(x.device)] = self.x_pad_token.to(x.device)


Just a question. Why the device cast here? Is it to fix something else?

hlky

Oh, I meant to remove that. For context, this patch was shared in the community to fix layer offloading in one of the training UIs. I was just curious what changes they made and forgot to revert before I started this branch. I'm not sure if it's related to Diffusers offloading or specific to the third-party repo. Removed in da06a2c.

DN6 · 2025-12-03T07:46:40Z

src/diffusers/models/transformers/transformer_z_image.py

        cap_feats = torch.cat(cap_feats, dim=0)
        cap_feats = self.cap_embedder(cap_feats)
-        cap_feats[torch.cat(cap_inner_pad_mask)] = self.cap_pad_token
+        cap_feats[torch.cat(cap_inner_pad_mask).to(cap_feats.device)] = self.cap_pad_token.to(cap_feats.device)


Just a question. Why the device cast here? Is it to fix something else?

DN6

Thanks @hlky 👍🏽

HuggingFaceDocBuilderDev · 2025-12-03T11:48:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Z-Image-Turbo from_single_file

ba23ad8

compute_dtype

c84e6d7

sayakpaul requested a review from DN6 December 3, 2025 07:43

DN6 reviewed Dec 3, 2025

-device cast

da06a2c

DN6 approved these changes Dec 3, 2025
