
[Bug] RuntimeError: self and mat2 must have the same dtype (Half and BFloat16) in matmul_lora during GRPO training with 4-bit quantization #4891

@anandn1

Description

Note: Please do not remove the questions. Answer beside them.

  1. Did you update? pip install --upgrade unsloth unsloth_zoo Ans. Yes
  2. Colab or Kaggle or local / cloud Ans. Local
  3. Number GPUs used, use nvidia-smi Ans. 1x RTX 5050 Laptop GPU, 8GB VRAM
  4. Which notebook? Please link! Ans. Custom training script (no notebook)
  5. Which Unsloth version, TRL version, transformers version, PyTorch version?
    Ans. unsloth==2026.4.4, unsloth_zoo==2026.4.3, trl==0.23.0, transformers==5.5.0, torch==2.10.0, CUDA 12.8
  6. Which trainer? SFTTrainer, GRPOTrainer etc Ans. GRPOTrainer with PatchFastRL("GRPO", FastLanguageModel)
from unsloth import FastLanguageModel, PatchFastRL
PatchFastRL("GRPO", FastLanguageModel)
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=1024,
    dtype=None,
    load_in_4bit=True,
    fast_inference=False,
)
model = FastLanguageModel.get_peft_model(
    model, r=16,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_alpha=16, lora_dropout=0, bias="none",
    use_gradient_checkpointing="unsloth",
)
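From the error message, `matmul_lora` appears to be multiplying a Half (fp16) activation by a BFloat16 LoRA weight, which `torch.matmul` rejects. A common workaround pattern (illustrative only, not Unsloth's actual internals) is to cast the adapter to the activation's dtype before the matmul. The sketch below models that dtype-mismatch-then-cast logic in plain Python; `Tensor`, `matmul`, and `matmul_lora_patched` are stand-in stubs I've made up to show the idea, not torch or Unsloth APIs.

```python
class Tensor:
    """Toy tensor carrying only a dtype tag (stand-in for torch.Tensor)."""
    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Mimics torch's Tensor.to(dtype): returns a copy in the new dtype.
        return Tensor(dtype)


def matmul(a, b):
    # torch.matmul refuses mixed half-precision operands with exactly
    # the message quoted in the issue title; this stub reproduces that.
    if a.dtype != b.dtype:
        raise RuntimeError(
            f"self and mat2 must have the same dtype ({a.dtype} and {b.dtype})"
        )
    return Tensor(a.dtype)


def matmul_lora_patched(x, lora_a):
    # Workaround sketch: cast the LoRA weight to the activation's dtype
    # first, so Half @ BFloat16 becomes Half @ Half.
    return matmul(x, lora_a.to(x.dtype))


x = Tensor("Half")           # fp16 activations from the 4-bit base model
lora_a = Tensor("BFloat16")  # LoRA adapter initialized in bf16
out = matmul_lora_patched(x, lora_a)
print(out.dtype)  # Half
```

In the real setup, the equivalent user-side fix is often to pass an explicit `dtype` (e.g. `torch.float16` or `torch.bfloat16`) to `FastLanguageModel.from_pretrained` instead of `dtype=None`, so the base model's compute dtype and the LoRA adapters agree; whether that resolves it on this GPU/driver combination is something the maintainers would need to confirm.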
