Fix clip_skip AttributeError on Stable Diffusion pipelines with transformers>=5.6 by Sunt-ing · Pull Request #14043 · huggingface/diffusers

Sunt-ing · 2026-06-22T20:03:19Z

What does this PR do?

Passing clip_skip to any SD1.x-family pipeline crashes on transformers>=5.6:

AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

transformers 5.6 flattened CLIPTextModel (huggingface/transformers#46285): embeddings / encoder / final_layer_norm became direct submodules and the text_model wrapper was removed (CLIPTextModelWithProjection still wraps via text_model). The clip_skip branch of encode_prompt re-applies the final LayerNorm by hand via self.text_encoder.text_model.final_layer_norm(...), so it raises as soon as clip_skip is set. The default clip_skip=None path is unaffected because it uses last_hidden_state (already normalized) and never touches .text_model. diffusers declares no upper bound on transformers and is actively migrating to 5.x, so this is a live defect rather than an out-of-range version.

This reuses the exact guard already merged for the from_single_file path in #13843:

text_model = self.text_encoder.text_model if hasattr(self.text_encoder, "text_model") else self.text_encoder
prompt_embeds = text_model.final_layer_norm(prompt_embeds)
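
As a minimal sketch of why this guard resolves final_layer_norm on both layouts, the stand-in classes below mimic the wrapped (pre-5.6) and flattened (>=5.6) module shapes described above. They are hypothetical toy classes, not the real transformers modules, and resolve_text_model is an illustrative name rather than a diffusers function.

```python
class FinalLayerNorm:
    """Stand-in for the real LayerNorm; tags its input so the call is visible."""

    def __call__(self, x):
        return ("normed", x)


class InnerTextModel:
    """Hypothetical inner module that owns final_layer_norm in the old layout."""

    def __init__(self):
        self.final_layer_norm = FinalLayerNorm()


class WrappedEncoder:
    """Pre-5.6 shape: final_layer_norm lives under .text_model."""

    def __init__(self):
        self.text_model = InnerTextModel()


class FlatEncoder:
    """>=5.6 shape: final_layer_norm is a direct submodule, no .text_model."""

    def __init__(self):
        self.final_layer_norm = FinalLayerNorm()


def resolve_text_model(text_encoder):
    # Same hasattr guard as in the PR: use the wrapper when it exists,
    # otherwise fall back to the encoder itself.
    return text_encoder.text_model if hasattr(text_encoder, "text_model") else text_encoder


for encoder in (WrappedEncoder(), FlatEncoder()):
    text_model = resolve_text_model(encoder)
    print(type(encoder).__name__, text_model.final_layer_norm("embeds"))
```

Both iterations reach a final_layer_norm without touching a missing attribute, which is the property the one-line guard relies on.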

StableDiffusionPipeline.encode_prompt is the source; running make fix-copies propagates the change to its 39 "# Copied from" consumers, and 6 hand-written siblings (alt_diffusion x2, animatediff video2video x2, i2vgen_xl, ledits_pp) are updated to match, for 46 files in total (1 + 39 + 6). SDXL is not affected: its encode_prompt reads hidden_states[-(clip_skip + 2)] and never calls final_layer_norm, and text_encoder_2 is a CLIPTextModelWithProjection (which still has .text_model).
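
To illustrate the indexing difference, here is a small sketch over a toy list of hidden states. The SDXL index hidden_states[-(clip_skip + 2)] is taken from the description above; the SD1.x index -(clip_skip + 1) is an assumption based on the upstream encode_prompt and is not stated in this PR. The 12-layer count and the helper names are illustrative only.

```python
def sd1_clip_skip_index(clip_skip):
    # Assumed SD1.x behaviour: pick this hidden state, then re-apply
    # final_layer_norm by hand (the call that crashed).
    return -(clip_skip + 1)


def sdxl_clip_skip_index(clip_skip):
    # SDXL behaviour as described in the PR: no final_layer_norm call at all.
    return -(clip_skip + 2)


# Toy stand-in for output_hidden_states: the embedding output followed by
# the output of each encoder layer (12 layers here, purely illustrative).
num_layers = 12
hidden_states = ["embeddings"] + [f"layer_{i}" for i in range(1, num_layers + 1)]

for clip_skip in (1, 2):
    print(
        "clip_skip =", clip_skip,
        "| SD1.x picks", hidden_states[sd1_clip_skip_index(clip_skip)],
        "| SDXL picks", hidden_states[sdxl_clip_skip_index(clip_skip)],
    )
```

Under these assumptions, only the SD1.x path needs a final_layer_norm handle after selecting the state, which is why the guard is confined to that family.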

Reproduction (CPU, no GPU; transformers 5.12.1)

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-pipe", safety_checker=None)
pipe.set_progress_bar_config(disable=True)
print(type(pipe.text_encoder).__name__, "has .text_model:", hasattr(pipe.text_encoder, "text_model"))

pipe(prompt="a cat", num_inference_steps=2, guidance_scale=0.0, output_type="np")                 # clip_skip=None: OK
pipe(prompt="a cat", num_inference_steps=2, guidance_scale=0.0, output_type="np", clip_skip=1)     # crashes on main

# main:  CLIPTextModel has .text_model: False
#        clip_skip=None -> image (1, 64, 64, 3)
#        clip_skip=1    -> AttributeError: 'CLIPTextModel' object has no attribute 'text_model'
# this:  clip_skip=None and clip_skip=1 both produce (1, 64, 64, 3)

Confirmed identically on real weights (stabilityai/sd-turbo, a CLIPTextModel): clip_skip=None fine, clip_skip=1 raises on main and works with this PR. The crash is in encode_prompt, before the denoise loop, so it is weight-independent and CPU-reproducible.

Tests

Added test_stable_diffusion_clip_skip to StableDiffusionPipelineFastTests: it asserts a clip_skip=1 call runs and that its output differs from clip_skip=None. The test errors on main (the AttributeError) and passes with this PR.

tests/pipelines/stable_diffusion/test_stable_diffusion.py::...::test_stable_diffusion_clip_skip  PASSED

ruff (0.9.10), check_copies, and check_dummies are clean on the changed files.

Before submitting

Did you use an AI agent (Claude Code, Codex, Cursor, etc.) to help with this PR? If so:
- Did you read the Coding with AI agents guide?
- Did you self-review the diff against .ai/review-rules.md?
Did you read the contributor guideline?
Did you write any new necessary tests?

Who can review?

@asomoza @DN6

…formers>=5.6 transformers 5.6 flattened CLIPTextModel (huggingface/transformers#46285): embeddings/encoder/final_layer_norm became direct submodules and the text_model wrapper was removed. The clip_skip branch of encode_prompt re-applies the final LayerNorm via self.text_encoder.text_model.final_layer_norm, so any SD1.x-family pipeline call with clip_skip set raised AttributeError. Guard the access with the same hasattr check already merged for from_single_file in huggingface#13843, propagated via make fix-copies and applied to the hand-written siblings. SDXL is unaffected (uses hidden_states[-(clip_skip+2)] and a CLIPTextModelWithProjection encoder). Signed-off-by: Ting Sun <suntcrick@gmail.com>

github-actions Bot added size/L PR with diff > 200 LOC tests pipelines and removed size/L PR with diff > 200 LOC labels Jun 22, 2026

Sunt-ing wants to merge 1 commit into huggingface:main from Sunt-ing:2
