This repository presents the results of training Qwen 2.5 1.5B Instruct on the GSM8K dataset with supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR, using GRPO), and analyzes the resulting performance on the GSM8K and MATH benchmarks.
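To make the RLVR (GRPO) setup concrete, here is a minimal sketch of the two ingredients such a run typically needs: a verifiable reward that checks a completion's final number against the GSM8K reference (whose solutions end with `#### <answer>`), and the group-relative advantage that GRPO computes over several completions sampled for the same prompt. The function names, the last-number extraction rule, and the use of the population standard deviation are assumptions made for illustration; they are not taken from this repository's code.

```python
import re
from statistics import mean, pstdev


def extract_final_number(text):
    """Return the last number found in `text` as a float, or None if absent."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,234" before converting.
    return float(matches[-1].replace(",", ""))


def gsm8k_reward(completion, reference):
    """Hypothetical binary reward: 1.0 if the final numbers match, else 0.0."""
    # GSM8K reference solutions put the gold answer after "####".
    gold = extract_final_number(reference.split("####")[-1])
    pred = extract_final_number(completion)
    if gold is None or pred is None:
        return 0.0
    return 1.0 if abs(pred - gold) < 1e-6 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = "Natalia sold 48/2 = 24 clips in May.\n48 + 24 = 72\n#### 72"
    completions = [
        "She sold 24 in May, so 48 + 24 = 72. The answer is 72.",
        "48 + 48 = 96. The answer is 96.",
    ]
    rewards = [gsm8k_reward(c, reference) for c in completions]
    print(rewards, group_relative_advantages(rewards))
```

Because the reward is binary and checked programmatically, a group in which every completion is right (or every one is wrong) yields zero advantage for all members, so only prompts with mixed outcomes contribute a learning signal under this sketch.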