
[RL] make sync weights conditional#4925

Draft
Datta0 wants to merge 3 commits into unslothai:main from Datta0:guard-sync-weights

Conversation

Collaborator

@Datta0 Datta0 commented Apr 8, 2026

When using shared weights, we were removing the sync_weights / reload_weights calls unconditionally.
This PR makes them conditional, so that GRPOTrainer can also be used with models that do not share weights.
When we detect weight sharing, we skip the sync/reload and use a LoRARequest instead.
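The guard described above might look roughly like the following. This is a hypothetical sketch, not unsloth's actual code: `shared_weights`, `sync_weights`, and `LoRARequest` are the names mentioned in the PR, but the surrounding classes and the adapter path are stand-ins for illustration.

```python
# Sketch of the conditional sync guard (assumed names, not unsloth's API).

class LoRARequest:
    """Stand-in for vLLM's LoRARequest container."""
    def __init__(self, name, lora_int_id, path):
        self.name = name
        self.lora_int_id = lora_int_id
        self.path = path

class Engine:
    """Minimal fake engine tracking how often weights were synced."""
    def __init__(self, shared_weights):
        self.shared_weights = shared_weights
        self.sync_count = 0

    def sync_weights(self):
        self.sync_count += 1

def maybe_sync(engine):
    """Skip sync/reload when weights are shared; hand back a LoRARequest instead."""
    if getattr(engine, "shared_weights", False):
        # Weights are shared with the inference engine: no copy needed,
        # generation should use the LoRA adapter directly.
        return LoRARequest("adapter", 1, "/tmp/adapter")
    engine.sync_weights()  # non-shared case: push trained weights as before
    return None

shared = Engine(shared_weights=True)
req = maybe_sync(shared)    # no sync performed; a LoRARequest is returned
plain = Engine(shared_weights=False)
maybe_sync(plain)           # weights are synced once
```

With this shape, the non-shared path behaves exactly as before the change, while the shared path avoids the redundant weight copy.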

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a shared_weights attribute to the vLLM engine across Llama and Vision models to manage weight synchronization and reloading more granularly. Instead of unconditionally removing these calls, the code now uses conditional guards to skip them only when weights are shared. Additionally, it ensures lora_request is correctly passed during generation when weights are shared. The review feedback identifies several instances of redundant logic where multiple checks are used to verify the same state, suggesting simplifications to improve code clarity and consistency.
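The "redundant logic" the review points at is the common pattern of re-deriving the same state at several points in one function. A hypothetical before/after, using the `shared_weights` / `lora_request` names from this PR (the engine object and return values are invented for illustration):

```python
# Illustration of collapsing repeated shared-weights checks (assumed names).
from types import SimpleNamespace

def generate_before(engine, prompts, lora_request=None):
    # Redundant: the same condition is evaluated twice.
    if getattr(engine, "shared_weights", False) and lora_request is None:
        lora_request = engine.default_lora_request
    if getattr(engine, "shared_weights", False):
        return ("generate", prompts, lora_request)
    return ("generate", prompts, None)

def generate_after(engine, prompts, lora_request=None):
    # Simplified: compute the condition once, branch on the cached value.
    shared = getattr(engine, "shared_weights", False)
    if shared and lora_request is None:
        lora_request = engine.default_lora_request
    return ("generate", prompts, lora_request if shared else None)

engine = SimpleNamespace(shared_weights=True, default_lora_request="adapter-1")
# Both variants agree; the second states the condition only once.
```

Checking the condition once also makes it harder for the two branches to drift apart in later edits, which is presumably the consistency concern behind the review feedback.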

Comment thread unsloth/models/rl.py Outdated
Comment thread unsloth/models/rl_replacements.py Outdated
Comment thread unsloth/models/rl_replacements.py Outdated
Comment thread unsloth/models/rl_replacements.py
Comment thread unsloth/models/rl_replacements.py
