Adding siglip vision model by sahilsuneja1 · Pull Request #405 · foundation-model-stack/foundation-model-stack

sahilsuneja1 · 2025-05-08T22:13:24Z

Needed to enable granite-vision = siglip vision (this PR) + llava-next vision-language connector (PR #420) + granite-3.1-2b-instruct (already supported)

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

sahilsuneja1 · 2025-06-12T19:35:58Z

PR ready for review @JRosenkranz

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

JRosenkranz

Has this been tested for equivalency with huggingface? We should try loading model using get_model with an "hf" source. Can you follow something similar to this as a reference: https://github.com/foundation-model-stack/foundation-model-stack/blob/main/tests/models/hf_equivalence/test_granite.py
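The comparison step of such an equivalence test can be sketched without any framework. This is a minimal, dependency-free illustration, not the code from the referenced test file: it assumes both models' outputs have already been flattened to plain lists of floats, and the helper name `allclose` and its default tolerances are chosen here to mirror the usual `|a - b| <= atol + rtol * |b|` closeness rule. Loading the two models is deliberately left out.

```python
def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Return True if every element of `actual` is within tolerance of `expected`.

    Hypothetical helper: uses the elementwise rule |a - e| <= atol + rtol * |e|.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Toy stand-ins for the flattened outputs of the two implementations.
fms_output = [0.125, -1.5, 3.0]
hf_output = [0.125, -1.5, 3.0 + 1e-9]

outputs_match = allclose(fms_output, hf_output)
```

In a real test the two lists would come from running the same inputs through both implementations; the tolerances would need tuning to the dtype in use.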

ani300 · 2025-06-18T14:18:51Z

+        self.attention = torch.nn.MultiheadAttention(
+            config.hidden_size, config.num_attention_heads, batch_first=True
+        )


We can, and probably should, move this to use our MultiHeadAttention block, as that will enable quantization and any other AIU-specific behavior we might need.

I was also curious why the HF implementation uses two different attention mechanisms: their own SiglipAttention implementation in SiglipEncoderLayer, as well as torch.nn.MultiheadAttention here. So I kept it as is.

I do use our MultiHeadAttention block in SiglipEncoderLayer's self.self_attn, line 114.

Will try substituting with our own block to see if the output stays the same.

I got the following error when using our MultiHeadAttention block. Maybe there is a reason the HF implementation also uses two variants of MHA calls: one in SiglipEncoderLayer (where we are able to use our MultiHeadAttention implementation) and vanilla torch.nn.MultiheadAttention in SiglipMultiheadAttentionPoolingHead.

  File "/gpfs/suneja/foundation-model-stack/scripts/inference_siglip_vision.py", line 243, in <module>
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/foundation-model-stack/fms/models/siglip_vision.py", line 319, in forward
    pooler_output = self.head(hidden_states) if self.use_head else None
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/foundation-model-stack/fms/models/siglip_vision.py", line 259, in forward
    hidden_state = self.attention(probe, hidden_state, hidden_state)[0]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/foundation-model-stack/fms/modules/attention.py", line 641, in forward
    q_out, k_out, v_out = self.in_proj(q, k, v)
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/conda-envs/env1/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/suneja/foundation-model-stack/fms/modules/attention.py", line 512, in forward
    raise ValueError("q, k, and v must be the same or k and v must be None")
ValueError: q, k, and v must be the same or k and v must be None

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

JRosenkranz · 2025-06-25T00:00:49Z

+        self.use_head = (
+            True
+            if not hasattr(self.config, "vision_use_head")
+            else self.config.vision_use_head


I don't see a vision_use_head as part of the above ModelConfig. Is it supposed to be there?

Hmm, I borrowed this from the HF implementation, to keep the Siglip implementation more general beyond its use in granite-vision, which does not explicitly set that config. I guess I can set vision_use_head to True as default, or drop this altogether for now.

If we see a requirement for vision_use_head in the future, I would add it to the config with True as the default. Otherwise I would remove it.

Removed for now
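For reference, the `hasattr` conditional in the diff above behaves the same as a `getattr` with a default, which is also what "add it to the config with True as the default" would amount to at the call site. The sketch below is illustrative only: `resolve_use_head` and the `SimpleNamespace` configs are hypothetical stand-ins, not the FMS config class.

```python
from types import SimpleNamespace


def resolve_use_head(config, default=True):
    """Return config.vision_use_head if the attribute exists, else `default`.

    Hypothetical helper; equivalent to:
        True if not hasattr(config, "vision_use_head") else config.vision_use_head
    """
    return getattr(config, "vision_use_head", default)


# Toy configs standing in for a model config (field values are arbitrary).
cfg_without_flag = SimpleNamespace(hidden_size=8)
cfg_with_flag = SimpleNamespace(hidden_size=8, vision_use_head=False)

use_head_default = resolve_use_head(cfg_without_flag)
use_head_explicit = resolve_use_head(cfg_with_flag)
```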

JRosenkranz · 2025-06-25T00:01:13Z

+    SiglipFixtures,
+):
+    @staticmethod
+    def get_last_hidden_state(f_out):


do we want a case where vision_use_head=True?

Hmm, that wouldn't change anything. It would set self.use_head to True in siglip.py:301, which is the default case anyway when vision_use_head is not set. Also, I'll just drop it for now, as per one of the previous comments.

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

ani300 · 2025-06-26T02:44:05Z

+        self.probe = nn.Parameter(torch.randn(1, 1, config.hidden_size))
+
+        # HF implementation uses PT MHA here, as opposed to SiglipAttnention as in the SiglipEncoderLayer
+        self.attention = torch.nn.MultiheadAttention(


thinking mostly of AIU support for this, do we want to use our attention implementation instead?

I tried using our MultiHeadAttention implementation, but it led to the error I shared before. Does using the torch.nn variant lead to issues on AIU?

ah, I see what's happening, this uses cross-attention, which our Attention class cannot do (we removed it as no models used cross attention anymore after everyone moved to decoders from encoder-decoder architectures). Keep it as torch.nn.MHA for now then, although I'm 99% sure this will fail on the AIU
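To make the cross-attention point concrete: in the pooling head the query is a single probe vector, while the keys and values are the full sequence of patch embeddings, so q is a different tensor from k and v. That is the condition the ValueError in the earlier traceback rejects. The following is a dependency-free sketch under simplifying assumptions (single head, no learned projections, plain Python lists instead of tensors); `attention_pool` is a hypothetical name and this is neither the HF nor the FMS implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(probe, tokens):
    """Single-head cross-attention: one probe query attends over all tokens.

    probe:  list[float] of length d (the query; a learned parameter in practice)
    tokens: list of list[float], each of length d (used as both keys and values)
    """
    d = len(probe)
    scale = 1.0 / math.sqrt(d)
    # One scaled dot-product score per token.
    scores = [scale * sum(q * k for q, k in zip(probe, tok)) for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of the value vectors gives one pooled vector of length d.
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return pooled, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three "patch embeddings"
probe = [0.0, 0.0]                              # zero probe gives uniform weights
pooled, weights = attention_pool(probe, tokens)
```

Whatever the sequence length, the output is a single vector, which is why the query cannot simply be the same tensor as the keys and values here.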

ani300 · 2025-06-26T02:50:08Z

+
+    new_sd = input_sd
+    if has_fused_weights:
+        new_sd = serialization._mlp_glu_unfused_to_fused_adapter_step(


the MLP layer isn't GLU, so weight fusion with this helper won't work, there should be another one that fuses the MLP instead

Thanks! Will fix

@ani300 There doesn't seem to be one specifically for mlp in utils/serialization.py

oh nvm, that's what happens when you review code after 11pm. There's no fusion possible for MLP. Just remove the glu portion here, the code should be:

    new_sd = input_sd
    if has_fused_weights:
        new_sd = serialization._attn_unfused_to_fused_step(new_sd)
    return new_sd

Np! Thanks for reviewing!
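As a rough picture of what an attention-only fusion step does to a state dict, the sketch below merges separate query, key, and value weights into one fused entry and leaves MLP weights untouched, in line with the conclusion that the MLP has nothing to fuse. Everything here is an assumption for illustration: `fuse_qkv`, the key names (`.query.weight`, `.qkv_fused.weight`, and so on), and the use of nested lists in place of tensors are hypothetical and do not reproduce `serialization._attn_unfused_to_fused_step`.

```python
def fuse_qkv(state_dict):
    """Return a new dict with per-layer q/k/v weights concatenated row-wise.

    Hypothetical naming scheme: '<prefix>.query.weight', '<prefix>.key.weight'
    and '<prefix>.value.weight' become '<prefix>.qkv_fused.weight'.
    Weights are lists of rows, so `+` concatenates along the first dimension.
    """
    fused = {}
    for name, value in state_dict.items():
        if name.endswith(".query.weight"):
            prefix = name[: -len(".query.weight")]
            key_w = state_dict[prefix + ".key.weight"]
            value_w = state_dict[prefix + ".value.weight"]
            fused[prefix + ".qkv_fused.weight"] = value + key_w + value_w
        elif name.endswith((".key.weight", ".value.weight")):
            continue  # already folded into the fused entry
        else:
            fused[name] = value  # non-attention weights (e.g. MLP) pass through
    return fused


unfused = {
    "attn.query.weight": [[1, 0]],
    "attn.key.weight": [[0, 1]],
    "attn.value.weight": [[1, 1]],
    "mlp.w1.weight": [[2, 2]],
}
fused = fuse_qkv(unfused)
```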

ani300

I have one large-ish question about using torch.nn.MultiheadAttention, but other than that lgtm! Some other minor comments in there.

sahilsuneja1 · 2025-06-27T17:49:20Z

Thanks! Will take care of the others, but regarding our MHA impl, I run into this error.

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

ani300

lgtm!

sahilsuneja1 and others added 2 commits May 8, 2025 22:06

adding siglip vision support

51f6351

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

Merge branch 'foundation-model-stack:main' into vision

76c9dc1

sahilsuneja1 marked this pull request as draft May 8, 2025 22:13

sahilsuneja1 and others added 3 commits May 8, 2025 22:18

import fix

00d3211

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

siglip

768f6ef

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

Merge branch 'main' into vision

415601f

sahilsuneja1 mentioned this pull request May 29, 2025

Adding llava_next model #420

Merged

sahilsuneja1 added 5 commits May 29, 2025 20:47

update attn_kwargs

82c454d

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

ruff

c8d55d4

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

adding tests

a15d4b6

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

merge main

ed7056e

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

ruff format

4bf3765

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

sahilsuneja1 marked this pull request as ready for review June 12, 2025 19:36

test update

b510891

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

sahilsuneja1 changed the title from "[DRAFT] Adding siglip vision model" to "Adding siglip vision model" Jun 12, 2025

kaoutar55 requested review from JRosenkranz, ani300 and kaoutar55 June 17, 2025 20:42

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

JRosenkranz reviewed Jun 18, 2025

Comment thread fms/utils/activation.py

JRosenkranz reviewed Jun 18, 2025

ani300 reviewed Jun 18, 2025

Comment thread fms/models/hf/utils.py Outdated

ani300 reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py

ani300 reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

ani300 reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

ani300 reviewed Jun 18, 2025

Comment thread fms/models/siglip_vision.py Outdated

ani300 reviewed Jun 18, 2025

Comment thread fms/testing/_internal/model_test_suite.py Outdated

addressing review comments

49d3b3f

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

JRosenkranz reviewed Jun 25, 2025

siglip updates post review #2

902fd14

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

ani300 reviewed Jun 26, 2025

Comment thread fms/models/siglip_vision.py Outdated

ani300 reviewed Jun 26, 2025

sahilsuneja1 and others added 3 commits July 1, 2025 14:14

minor

e86408a

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

fusion fix

225bc13

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>

Merge branch 'main' into vision

f8b3288

ani300 approved these changes Jul 1, 2025

ani300 merged commit a532cb7 into foundation-model-stack:main Jul 1, 2025
4 checks passed
