Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability
reinforcement-learning reinforcement-learning-algorithms train fine post-training llm rlhf grpo-training
Updated Feb 23, 2026 - Jupyter Notebook