CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents Paper • 2606.22883 • Published 2 days ago • 27
CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents Paper • 2606.22883 • Published 2 days ago • 27
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 8 days ago • 203
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 15 days ago • 21
OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning Paper • 2606.08572 • Published 17 days ago • 14
CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 17 days ago • 49
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 23 days ago • 15
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 23 days ago • 15
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 23 days ago • 55
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 23 days ago • 55
TVIR: Building Deep Research Agents Towards Text--Visual Interleaved Report Generation Paper • 2606.02320 • Published 23 days ago • 14
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV Paper • 2605.26244 • Published about 1 month ago • 38
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning Paper • 2605.22012 • Published May 21 • 46
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos Paper • 2605.18984 • Published May 18 • 22
OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published May 17 • 31
OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published May 17 • 31
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published May 14 • 22
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published May 14 • 22
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video Paper • 2605.15182 • Published May 14 • 39