Adding support for Pixtral#435

Open
sahilsuneja1 wants to merge 7 commits into
foundation-model-stack:mainfrom
sahilsuneja1:llava

@sahilsuneja1 sahilsuneja1 commented Jun 27, 2025

This PR adds support for the Pixtral vision encoder and the LLaVA vision-language connector. Combined with a Mistral language model variant (architecture already supported), these yield the overall Pixtral model.

Signed-off-by: Sahil Suneja <suneja@us.ibm.com>
@sahilsuneja1 sahilsuneja1 changed the title [DRAFT] Adding support for Pixtral Adding support for Pixtral Jul 9, 2025

sahilsuneja1 commented Jul 9, 2025

This is ready for review.
I will update fms/models/hf/utils.py to use the recursive infer_model_parameters once the llava_next PR #420 merges.
There seems to be a graph break in PixtralVision::forward(), at this list comprehension:

        patch_embeds_list = [
            embed[..., : (size[0] // self.patch_size), : (size[1] // self.patch_size)]
            for embed, size in zip(patch_embeds, image_sizes)
        ]

Depending on how complex the fix turns out to be, it can either be made in this PR or deferred to a follow-up PR.
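
To make the snippet above easier to reason about, here is a minimal pure-Python sketch of what the comprehension computes. It is an illustration only, not the code in this PR: the helper names `patch_grid_shapes` and `crop_to_grid`, the nested-list layout `[channels][rows][cols]`, and all sizes are hypothetical stand-ins for the tensor slicing `embed[..., :rows, :cols]`. It shows that each image is cropped to a grid whose shape depends on the values in `image_sizes`. Whether this value-dependent shape is what actually triggers the graph break is an assumption, not something confirmed in this thread.

```python
def patch_grid_shapes(image_sizes, patch_size):
    """Return the (rows, cols) patch grid each image keeps after cropping."""
    return [(h // patch_size, w // patch_size) for h, w in image_sizes]


def crop_to_grid(embed, size, patch_size):
    """Nested-list stand-in for embed[..., :rows, :cols].

    `embed` is assumed to be laid out as [channels][rows][cols].
    """
    rows = size[0] // patch_size
    cols = size[1] // patch_size
    return [[row[:cols] for row in channel[:rows]] for channel in embed]


# Hypothetical inputs: two images padded to a common 4x5 patch grid.
patch_size = 16
image_sizes = [(64, 48), (32, 80)]  # (height, width) in pixels
max_rows, max_cols, channels = 4, 5, 2
patch_embeds = [
    [[[0.0] * max_cols for _ in range(max_rows)] for _ in range(channels)]
    for _ in image_sizes
]

# Same structure as the comprehension quoted above.
patch_embeds_list = [
    crop_to_grid(embed, size, patch_size)
    for embed, size in zip(patch_embeds, image_sizes)
]

# Each cropped entry has its own (channels, rows, cols) shape.
shapes = [(len(e), len(e[0]), len(e[0][0])) for e in patch_embeds_list]
print(shapes)
```

In this sketch the two cropped entries end up with different shapes, so the output cannot be described by a single static shape per batch.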
