[WIP] Granite Vision Embedding Support by alex-jw-brooks · Pull Request #488 · foundation-model-stack/foundation-model-stack

alex-jw-brooks · 2025-11-12T08:00:37Z

In-progress support for Granite Vision 3.3 embeddings. Current state:

- The model loads all weights successfully, with no warnings
- A very small `encode()` function has been added to the generate utils, with a similar API, that we can use in local tests for embedding models
- The forward pass is implemented for text and image embeddings
  - Equivalency tests for text-only and image-only inputs are currently passing against HF on CPU with transformers 4.50
  - Some other applicable tests (configs, etc.) are passing; still fixing the compile and consistency failures
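For illustration only, here is a minimal sketch of what an `encode()` helper for embedding models could look like. It assumes a headless model callable that maps a batch of token IDs to per-token hidden states, then mean-pools over non-padding positions and L2-normalizes the result. The `encode` signature, the `pad_id` convention, the toy model, and the pooling strategy are all assumptions and do not reflect the actual FMS generate-utils API or how Granite Vision embeddings are pooled.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical types: a headless model maps token IDs to per-token hidden states
# (batch x seq_len x hidden_dim), represented here as nested lists.
HiddenStates = List[List[List[float]]]
HeadlessModel = Callable[[Sequence[Sequence[int]]], HiddenStates]


def encode(
    model: HeadlessModel,
    input_ids: Sequence[Sequence[int]],
    pad_id: int = 0,
) -> List[List[float]]:
    """Hypothetical encode(): mean-pool non-padding hidden states, then L2-normalize."""
    hidden = model(input_ids)
    embeddings = []
    for ids, states in zip(input_ids, hidden):
        # Keep only the hidden states at non-padding positions
        kept = [s for tok, s in zip(ids, states) if tok != pad_id]
        if not kept:
            raise ValueError("sequence contains only padding")
        dim = len(kept[0])
        pooled = [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
        # Normalize to unit length so cosine similarity reduces to a dot product
        norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
        embeddings.append([x / norm for x in pooled])
    return embeddings


# Toy stand-in for a headless model: token t becomes the hidden state [t, 1.0]
def toy_model(batch: Sequence[Sequence[int]]) -> HiddenStates:
    return [[[float(tok), 1.0] for tok in seq] for seq in batch]


embs = encode(toy_model, [[3, 3, 0], [4, 0, 0]])
```

Because the helper takes any callable, a local test can swap in either the FMS model or a reference implementation and compare the resulting vectors.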

Next steps
- [ ] Add the forward pass for the vision tower & special packing utils
- [ ] Pull the config into its own class (currently using llava next's, but there are a few extra things we'll need here)
- [ ] Fix the remaining tests + add image embedding tests
- [ ] Refactor to reduce duplication of utils in llava next; lots of the adapter stuff is currently copied

Since this touches code in other models, I plan to break some pieces out into dependent PRs (e.g., allowing the Granite LLM to run without its LM head) to avoid one-off interface changes. I'm opening this as a WIP in case anyone has early thoughts.
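To make the "run without its LM head" idea concrete, here is a hedged, framework-free sketch of the design choice: a decoder whose LM head is optional, returning final hidden states when headless and vocabulary logits otherwise. `ToyDecoder`, `backbone`, `lm_head`, and `headless` are hypothetical names for illustration, not the actual FMS Granite model interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class ToyDecoder:
    """Hypothetical stand-in for a decoder LM whose LM head is optional."""

    backbone: Callable[[List[int]], List[Vector]]  # token IDs -> final hidden states
    lm_head: Optional[Callable[[Vector], Vector]] = None  # hidden state -> vocab logits

    @property
    def headless(self) -> bool:
        return self.lm_head is None

    def forward(self, input_ids: List[int]) -> List[Vector]:
        hidden = self.backbone(input_ids)
        if self.headless:
            # Embedding use case: hand the hidden states straight to pooling
            return hidden
        # Generation use case: project every position to vocabulary logits
        return [self.lm_head(h) for h in hidden]


# Usage with toy components (hidden size 2, vocabulary size 3)
backbone = lambda ids: [[float(i), 2.0] for i in ids]
lm_head = lambda h: [h[0] + h[1], h[0] - h[1], 0.0]

headless_model = ToyDecoder(backbone)
full_model = ToyDecoder(backbone, lm_head)
hidden_out = headless_model.forward([1, 2])
logits_out = full_model.forward([1, 2])
```

Making the head optional keeps one model class for both uses, and an embedding model never has to compute vocabulary-sized logits it would immediately discard.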

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks · 2025-11-18T16:47:33Z

```diff
-            use_high_precision_pow=True,
-        )
+        # Hack for equivalence testing
+        self.layer_norm1 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
```


We shouldn't merge this, but it may explain the diverging outputs currently coming out of the vision tower. I wrote a quick local equivalency test that calls the vision tower directly (after loading it with granite vision) and noticed that the outputs are slightly off without this change, so I'm leaving it here for now.
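Small numerical differences like this are what a tolerance-based equivalence check is meant to surface. As a hedged, framework-free illustration, the sketch below computes layer normalization (without the learnable scale and shift) using two algebraically identical variance formulas, then compares results elementwise with the `|actual - expected| <= atol + rtol * |expected|` rule that `torch.testing.assert_close` uses, with that function's float32 default tolerances. It does not reproduce the FMS LayerNorm or its `use_high_precision_pow` option, and the function names are hypothetical; the exact source of the mismatch described above is not established by this comment.

```python
import math
from typing import List, Sequence


def layer_norm_two_pass(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Variance as the mean of squared deviations from the mean
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def layer_norm_one_pass(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Algebraically identical variance E[x^2] - E[x]^2, but more prone to rounding error
    mean = sum(x) / len(x)
    var = sum(v * v for v in x) / len(x) - mean * mean
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(p - q) for p, q in zip(a, b))


def assert_close(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float = 1e-5,
    rtol: float = 1.3e-6,
) -> None:
    # Elementwise check: |actual - expected| <= atol + rtol * |expected|
    for a, e in zip(actual, expected):
        if abs(a - e) > atol + rtol * abs(e):
            raise AssertionError(f"{a} vs {e}: diff {abs(a - e):.3e}")


x = [0.5, -1.25, 2.0, 3.5]
two_pass = layer_norm_two_pass(x)
one_pass = layer_norm_one_pass(x)
assert_close(one_pass, two_pass)  # passes: the two formulas agree on this input
```

Reporting the maximum absolute difference alongside the pass/fail result makes it easier to tell whether a divergence is rounding noise or a real mapping bug.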

alex-jw-brooks added 7 commits November 12, 2025 08:01

Add note for llava next

707cd3b

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

granite vision embed weights are loadable

22aa6c9

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Allow running granite llms as headless

6ac8892

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Fix rope mapping bug, add text only support for gvision embed

24602a8

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add basic encode def

fdcca16

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add test for hf equivalence (text only)

ddcb06a

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add wip granite vision embed tests

c0d055f

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks force-pushed the granite_vision_embed branch from 0c3391c to c0d055f on November 12, 2025 08:01

alex-jw-brooks added 6 commits November 17, 2025 22:04

Replace llava next config with granite vision emb

09b7eeb

Fix config initialization for granite vision emb

472582a

Use torch nn layer norm for siglip encoder blocks for equivalence

cceffcb

fix post init, implement vision tower wrapper

358ec6f

Add image equivalence test w/ sdpa

f172132

First pass at topk feature selection

b8c5d31


kaoutar55 self-requested a review November 18, 2025 18:46

alex-jw-brooks wants to merge 13 commits into foundation-model-stack:main from alex-jw-brooks:granite_vision_embed
