[MAX] Add Wan-Animate pipeline support by kkimmk · Pull Request #6347 · modular/modular

kkimmk · 2026-04-03T10:13:59Z

Summary

This PR adds support for the Wan-Animate pipeline (Wan-AI/Wan2.2-Animate-14B-Diffusers), a video generation model for motion transfer and character replacement built on top of the existing Wan I2V pipeline.

CLIP vision encoder: Adds a MAX-native CLIP vision encoder (clip_encoder.py, layers/) used as an image conditioning signal. The existing CLIP text encoder is extracted into clip_modulev3 to avoid a naming collision.
Wan-Animate transformer: Extends WanTransformerModel with pose injection (Conv3d), CLIP image cross-attention, a face adapter (WanAnimateFaceEncoder), and a motion encoder (WanAnimateMotionEncoder). Transformer layers are refactored into a layers/ subpackage.
Wan-Animate pipeline: Adds WanAnimatePipeline extending WanI2VPipeline, supporting animate mode (motion transfer) and replace mode (background-preserving character replacement via mask) with multi-segment processing and temporal overlap.
Tokenizer and preprocessing: Extends PixelGenerationTokenizer and adds video_processor.py to handle pose video, face pixels, background video, and mask inputs for the Animate pipeline.
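
To illustrate what "multi-segment processing with temporal overlap" can mean in practice, here is a minimal sketch of how a long driving video might be split into overlapping frame windows. The function name `segment_windows`, its parameters, and the overlap value of 5 are assumptions made for illustration; the PR description does not state how WanAnimatePipeline actually computes its segments.

```python
from typing import List, Tuple


def segment_windows(
    total_frames: int, segment_len: int, overlap: int
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges that cover total_frames.

    Hypothetical helper, not taken from the PR. Every window after the
    first starts `overlap` frames before the previous window ended, so the
    shared frames can be used to keep consecutive segments consistent.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    if total_frames <= 0:
        return []

    stride = segment_len - overlap
    windows: List[Tuple[int, int]] = []
    start = 0
    while True:
        # The final window is clipped to the end of the video; a real
        # pipeline might pad it to segment_len instead (assumption).
        end = min(start + segment_len, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    for window in segment_windows(200, 77, 5):
        print(window)
```

With a segment length of 77 and an assumed overlap of 5, a 200-frame video yields three windows, each sharing 5 frames with its predecessor.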

How to Run

The Wan-Animate pipeline requires preprocessed driving videos — a skeleton pose video (DWPose renders) and a cropped face video — rather than raw footage. To generate them from a raw driving video, use the official preprocessing scripts.

Alternatively, pre-processed sample assets are available for download at https://huggingface.co/datasets/squeezebits/diffusion-benchmark.

Animate Mode

./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
      --model Wan-AI/Wan2.2-Animate-14B-Diffusers \
      --input-image <character.jpeg> \
      --pose-video <pose.mp4> \
      --face-video <face.mp4> \
      --prompt "A character moving naturally." \
      --height 480 --width 848 --num-frames 77 \
      --num-inference-steps 40 --seed 42 --num-warmups 1 \
      --guidance-scale 1.0 \
      --output output_animate.mp4
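
The values `--height 480 --width 848 --num-frames 77` are probably not arbitrary. The following sketch checks them against the geometry commonly used by Wan-family models. The temporal stride of 4, spatial stride of 8, and patch size of 2 are assumptions about the VAE and transformer configuration; they are not stated in this PR, and `latent_shape` is a hypothetical helper.

```python
from typing import Tuple


def latent_shape(
    num_frames: int,
    height: int,
    width: int,
    temporal_stride: int = 4,  # assumed VAE temporal compression
    spatial_stride: int = 8,   # assumed VAE spatial compression
    patch_size: int = 2,       # assumed transformer spatial patch size
) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width).

    Raises ValueError if the requested size does not fit the assumed
    strides: num_frames must be k * temporal_stride + 1, and height and
    width must be multiples of spatial_stride * patch_size.
    """
    if num_frames < 1 or (num_frames - 1) % temporal_stride != 0:
        raise ValueError(
            f"num_frames must be k * {temporal_stride} + 1, got {num_frames}"
        )
    multiple = spatial_stride * patch_size
    if height % multiple != 0 or width % multiple != 0:
        raise ValueError(
            f"height and width must be multiples of {multiple}, "
            f"got {height}x{width}"
        )
    latent_frames = (num_frames - 1) // temporal_stride + 1
    return latent_frames, height // spatial_stride, width // spatial_stride


if __name__ == "__main__":
    print(latent_shape(77, 480, 848))
```

Under these assumptions, the example command maps to a latent volume of 20 frames at 60 x 106, and a frame count such as 78 would be rejected.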

Replace Mode

./bazelw run //max/examples/diffusion:simple_offline_video_generation -- \
      --model Wan-AI/Wan2.2-Animate-14B-Diffusers \
      --input-image <character.jpeg> \
      --pose-video <pose.mp4> \
      --face-video <face.mp4> \
      --prompt "A character moving naturally." \
      --height 480 --width 848 --num-frames 77 \
      --num-inference-steps 40 --seed 42 --num-warmups 1 \
      --guidance-scale 1.0 \
      --output output_replace.mp4 \
      --animate-mode replace \
      --background-video <background.mp4> \
      --mask-video <mask.mp4>
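
Replace mode depends on two extra inputs, so a caller may want to check them before starting a long run. The sketch below uses `argparse` with the three flags shown above. The choice name `animate`, the default mode, and the validation rule itself are assumptions for illustration and may differ from what simple_offline_video_generation.py actually does.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the mode-related flags only (illustrative subset)."""
    parser = argparse.ArgumentParser(description="Wan-Animate mode flags (sketch)")
    parser.add_argument(
        "--animate-mode",
        choices=["animate", "replace"],  # "animate" as a choice name is assumed
        default="animate",               # assumed default: motion transfer
    )
    parser.add_argument("--background-video", default=None)
    parser.add_argument("--mask-video", default=None)
    return parser


def parse_mode_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv and reject replace mode without background and mask videos."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.animate_mode == "replace":
        missing = [
            flag
            for flag, value in (
                ("--background-video", args.background_video),
                ("--mask-video", args.mask_video),
            )
            if value is None
        ]
        if missing:
            # parser.error prints a usage message and exits with status 2.
            parser.error("replace mode also requires: " + ", ".join(missing))
    return args


if __name__ == "__main__":
    print(parse_mode_args())
```

Failing early in this way avoids loading a 14B-parameter model only to discover that a required video path was omitted.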

Checklist

PR is small and focused — consider splitting larger changes into a
sequence of smaller PRs
I ran ./bazelw run format to format my changes
I added or updated tests to cover my changes
If AI tools assisted with this contribution, I have included an
Assisted-by: trailer in my commit message or this PR description
(see AI Tool Use Policy)

Assisted-by: Claude Code

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

- simple_offline_video_generation.py: standalone T2V/I2V example with profiling
  - Supports --resolutions for multi-resolution runs in single process
  - All Wan 2.1/2.2 models, LoRA, MoE, configurable resolution/steps
- all_wan_model_speed_metric.py: automated benchmark across all model variants
  - Streams output in real-time (tqdm visible)
  - Cleans up child processes on exit
- BUILD.bazel targets for both scripts

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

# Conflicts:
#   max/kernels/src/nn/conv/conv.mojo

jglee-sqbits and others added 16 commits March 31, 2026 08:12

[Pipelines] Remove section divider comments

1936da3

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

[Pipelines] Add UMT5 text encoder for Wan diffusion

0db7327

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

[Pipelines] Remove section divider comments

86b6f1c

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

[Pipelines] Add Wan VAE and refactor autoencoder module

5516e1e

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

[Pipelines] Add Wan T2V diffusion pipeline with MoE support

64e9d6c

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

[Pipelines] Add Wan I2V diffusion pipeline

94d0b89

Signed-off-by: jglee-sqbits <jingu.lee@squeezebits.com>

Merge branch 'add/wan/umt5-encoder' into add/wan/merge-0331

7327525

Merge branch 'add/wan/vae' into add/wan/merge-0331

e73ed25

Merge branch 'add/wan/transformer' into add/wan/merge-0331

a081672

Merge branch 'add/wan/pipeline-t2v' into add/wan/merge-0331

8a5f937

# Conflicts:
#   max/kernels/src/nn/conv/conv.mojo

Merge branch 'add/wan/pipeline-i2v' into add/wan/merge-0331

2588c08

Merge branch 'add/wan/examples' into add/wan/merge-0331

4e0fffe

[MAX] Add Clip vision encoder for Wan-Animate pipeline

e2b90f0

[MAX] Add Wan-Animate transformer model for Wan-Animate pipeline

3ef6dc2

[MAX] Add Wan-Animate pipeline

9825542

github-actions Bot added the waiting-on-review label Apr 3, 2026

kkimmk wants to merge 16 commits into modular:main from SqueezeBits:add/wan-animate