Hi everyone, I’m an independent researcher in Vietnam preparing to submit a paper to arXiv in cs.CL, and I’m currently looking for an endorsement to complete the submission process.
The paper extends Open-RS RS2 (Knoveleng and Ngo, 2025) at sub-3B scale under a single-A100, LoRA budget. It compares three GRPO training arms that vary one axis at a time: training language (English vs. Vietnamese-translated math) and reward function (with or without a fastText language-consistency reward). The main finding is that the auxiliary reward, even when it returns a constant 1.0 on English training data, recovers 13.3 percentage points on AIME-2024 over the vanilla English-only run, which suggests it acts as an implicit regularizer through PPO clipping geometry rather than through any content signal. The Vietnamese-translated arm shows the same regularization signature at a smaller magnitude. The paper reports the LoRA gap plainly (57.5 vs. 80 percent on AMC23) and states its single-seed limitations explicitly.
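For readers unfamiliar with the idea, here is a minimal sketch of what a language-consistency reward can look like. It is not the code from the repository: `toy_detect` is a hypothetical diacritic-based stand-in for a fastText language-ID model, and the function names, the 0.05 ratio, and the 0.5 confidence threshold are all assumptions made for illustration.

```python
import unicodedata
from typing import Callable, Tuple

# Precomposed Vietnamese letters carrying diacritics (NFC-normalized).
VIETNAMESE_MARKS = set(unicodedata.normalize(
    "NFC",
    "ăâđêôơưàáảãạằắẳẵặầấẩẫậèéẻẽẹềếểễệìíỉĩị"
    "òóỏõọồốổỗộờớởỡợùúủũụừứửữựỳýỷỹỵ",
))


def toy_detect(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a language-ID model: returns (label, confidence)."""
    normalized = unicodedata.normalize("NFC", text).lower()
    letters = [c for c in normalized if c.isalpha()]
    if not letters:
        return ("unknown", 0.0)
    marked = sum(c in VIETNAMESE_MARKS for c in letters)
    # Assumed heuristic: more than 5% marked letters means Vietnamese.
    return ("vi", 1.0) if marked / len(letters) > 0.05 else ("en", 1.0)


def language_consistency_reward(
    completion: str,
    target_lang: str,
    detect: Callable[[str], Tuple[str, float]] = toy_detect,
    threshold: float = 0.5,
) -> float:
    """Return 1.0 if the completion is detected as target_lang, else 0.0."""
    lang, confidence = detect(completion)
    return 1.0 if lang == target_lang and confidence >= threshold else 0.0


if __name__ == "__main__":
    english = ["The answer is 42.", "So x equals 7.", "Therefore the sum is 10."]
    print([language_consistency_reward(c, "en") for c in english])
```

With English prompts and English completions, a reward of this shape returns 1.0 for every sample, which is the "constant 1.0" situation described above. In a real run the `detect` argument would wrap an actual language-ID model instead of the toy heuristic.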
Endorsement code: S7LYVM
Endorse here: Log in to arXiv | arXiv e-print repository
For transparency, the paper is already public on Zenodo (DOI 10.5281/zenodo.20061328) under the title "Beyond English-Only GRPO: Training Language and Auxiliary Reward as Implicit Regularizers in Sub-3B Math Reasoning".
Code, configs, LoRA adapters, evaluation JSONs, and per-step training logs are released on GitHub under nhockid235/xling-grpo-sub3b (Apache-2.0).
If you have 3+ cs.* submissions in the last 5 years, you’re eligible to endorse. I appreciate any help or guidance, and I’m happy to answer questions about the work or share raw evaluation outputs if needed.
Thank you for your time.