[WIP] Granite Vision Embedding Support #488
Draft
alex-jw-brooks wants to merge 13 commits into
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
0c3391c to c0d055f
alex-jw-brooks commented Nov 18, 2025
        use_high_precision_pow=True,
    )
    # Hack for equivalence testing
    self.layer_norm1 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
Contributor
Author
We shouldn't merge this, but it's a potential reason for the currently diverging outputs from the vision tower. I wrote a quick local equivalence test that calls the vision tower directly (after loading it with granite vision) and noticed that the outputs are off by a bit without this change, so I'm leaving it here for now.
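For readers who want to reproduce this kind of check, here is a minimal, dependency-free sketch of an equivalence test. It is not the test from this PR. The helper names (`layer_norm`, `max_abs_diff`, `outputs_match`), the input values, and the tolerances are all hypothetical, and using a mismatched `eps` as the source of divergence is only an assumption chosen to show how a small normalization difference surfaces in such a comparison.

```python
import math


def layer_norm(x, eps):
    """Plain layer norm over a 1-D list (no learned weight or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two outputs."""
    return max(abs(p - q) for p, q in zip(a, b))


def outputs_match(actual, expected, atol=1e-5, rtol=1e-5):
    """Element-wise closeness check: |actual - expected| <= atol + rtol * |expected|."""
    return all(
        abs(p - q) <= atol + rtol * abs(q) for p, q in zip(actual, expected)
    )


# Hypothetical small-magnitude activations, where eps matters most.
hidden = [0.01, -0.02, 0.03, 0.00]

# Assumption for illustration only: the two implementations differ in eps.
reference = layer_norm(hidden, eps=1e-6)
candidate = layer_norm(hidden, eps=1e-5)

print("max abs diff:", max_abs_diff(candidate, reference))
print("match:", outputs_match(candidate, reference))
```

With real modules, the same comparison would typically run both vision towers on identical inputs and compare the tensors with `torch.testing.assert_close`, tightening or loosening `atol` and `rtol` to match the dtype in use.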
In-progress support for Granite Vision 3.3 embeddings. Current state:
An encode() func has been added to the generate utils with a similar API, which we can use in local tests for embedding models
Next steps
[ ] Add the forward pass for the vision tower & special packing utils
[ ] Pull the config into its own class (currently using llava next's, but there are a few extra things we'll need here)
[ ] Fix the remaining tests + add image embedding tests
[ ] Refactor to reduce duplication of utils in llava next - lots of the adapter stuff is currently copied
As this will touch some code in other models, I plan to break some pieces out into dependent PRs to avoid one-off interface changes (for example, allowing the granite LLM to run without its LM head). I'm opening this as a WIP in case anyone has early thoughts.