🔄 In a Training Loop

Changdae Oh

changdae · https://changdaeoh.github.io/

AI & ML interests

Distribution Shift; Uncertainty Quantification; Reward Modeling

Recent Activity

authored a paper about 17 hours ago

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

authored a paper about 17 hours ago

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

upvoted a collection 1 day ago

WTF GENIUS PAPERS

Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are about improving tiny models. • 180 items • Updated 2 days ago • 42

upvoted a collection 2 days ago

Ministral 3 - Additional Checkpoints

Different formats and Quantized versions of our Ministral 3 family; 14B/8B/3B Instruct/Reasoning GGUF, 3B Instruct ONNX and 14B/8B/3B Instruct BF16. • 10 items • Updated Mar 2 • 30

upvoted a paper 3 days ago

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Paper • 2606.26080 • Published 6 days ago • 9

upvoted a paper 22 days ago

TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

Paper • 2602.19633 • Published Feb 23 • 9

upvoted a paper 29 days ago

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

Paper • 2605.30834 • Published May 29 • 10

upvoted 3 collections about 2 months ago

Mistral Small 4

A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated Mar 16 • 75

Ministral 3

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 169

Mistral Medium 3.5

Our first flagship models handling instruction-following, reasoning, and coding in a single set of open weights. • 2 items • Updated Apr 29 • 19

upvoted a paper about 2 months ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published May 9 • 82

upvoted a paper 2 months ago

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Paper • 2604.13151 • Published Apr 14 • 25

upvoted an article 3 months ago

Welcome Gemma 4: Frontier multimodal intelligence on device

Article • merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 910

upvoted 5 collections 3 months ago

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 18 days ago • 168

NVIDIA Nemotron v3

Open, Production-ready Enterprise Models • 23 items • Updated 18 days ago • 330

Olmo 3.1

The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 54

Olmo 3 Post-training

All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated Dec 23, 2025 • 56

Olmo 3

Artifacts for the Olmo 3 release. • 7 items • Updated Mar 2 • 171

upvoted an article 4 months ago

A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond

Article • karina-zadorozhny • Jan 19 • 33

upvoted a paper 4 months ago

UniGame: Turning a Unified Multimodal Model Into Its Own Adversary

Paper • 2511.19413 • Published Nov 24, 2025 • 21

upvoted 2 papers 5 months ago

On Randomness in Agentic Evals

Paper • 2602.07150 • Published Feb 6 • 2

Reliable and Responsible Foundation Models: A Comprehensive Survey

Paper • 2602.08145 • Published Feb 4 • 8