
[Bug] RuntimeError: self and mat2 must have the same dtype (Half and BFloat16) in matmul_lora during GRPO training with 4-bit quantization #4891

@anandn1

Description

Note: Please do not remove the questions. Answer beside them.

  1. Did you update? pip install --upgrade unsloth unsloth_zoo Ans. Yes
  2. Colab or Kaggle or local / cloud Ans. Local
  3. Number GPUs used, use nvidia-smi Ans. 1x RTX 5050 Laptop GPU, 8GB VRAM
  4. Which notebook? Please link! Ans. Custom training script (no notebook)
  5. Which Unsloth version, TRL version, transformers version, PyTorch version?
    Ans. unsloth==2026.4.4, unsloth_zoo==2026.4.3, trl==0.23.0, transformers==5.5.0, torch==2.10.0, CUDA 12.8
  6. Which trainer? SFTTrainer, GRPOTrainer etc Ans. GRPOTrainer with PatchFastRL("GRPO", FastLanguageModel)
from unsloth import FastLanguageModel, PatchFastRL
PatchFastRL("GRPO", FastLanguageModel)
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=1024,
    dtype=None,
    load_in_4bit=True,
    fast_inference=False,
)
model = FastLanguageModel.get_peft_model(
    model, r=16,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_alpha=16, lora_dropout=0, bias="none",
    use_gradient_checkpointing="unsloth",
)
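From the error message, `matmul_lora` appears to be multiplying a Half (fp16) activation by a BFloat16 LoRA weight, which `torch.matmul` rejects. A common workaround pattern (illustrative only, not Unsloth's actual internals) is to cast the adapter to the activation's dtype before the matmul. The sketch below models that dtype-mismatch-then-cast logic in plain Python; `Tensor`, `matmul`, and `matmul_lora_patched` are stand-in stubs I've made up to show the idea, not torch or Unsloth APIs.

```python
class Tensor:
    """Toy tensor carrying only a dtype tag (stand-in for torch.Tensor)."""
    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Mimics torch's Tensor.to(dtype): returns a copy in the new dtype.
        return Tensor(dtype)


def matmul(a, b):
    # torch.matmul refuses mixed half-precision operands with exactly
    # the message quoted in the issue title; this stub reproduces that.
    if a.dtype != b.dtype:
        raise RuntimeError(
            f"self and mat2 must have the same dtype ({a.dtype} and {b.dtype})"
        )
    return Tensor(a.dtype)


def matmul_lora_patched(x, lora_a):
    # Workaround sketch: cast the LoRA weight to the activation's dtype
    # first, so Half @ BFloat16 becomes Half @ Half.
    return matmul(x, lora_a.to(x.dtype))


x = Tensor("Half")           # fp16 activations from the 4-bit base model
lora_a = Tensor("BFloat16")  # LoRA adapter initialized in bf16
out = matmul_lora_patched(x, lora_a)
print(out.dtype)  # Half
```

In the real setup, the equivalent user-side fix is often to pass an explicit `dtype` (e.g. `torch.float16` or `torch.bfloat16`) to `FastLanguageModel.from_pretrained` instead of `dtype=None`, so the base model's compute dtype and the LoRA adapters agree; whether that resolves it on this GPU/driver combination is something the maintainers would need to confirm.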
