ZeroGPU Explorers

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

yuntian-deng authored a paper 3 days ago

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

yuntian-deng authored a paper 3 days ago

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

yuntian-deng authored a paper 3 days ago

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

View all activity

LXT

authored 11 papers about 1 hour ago

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

Paper • 2509.16972 • Published Sep 21, 2025 • 2

LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation

Paper • 2510.11063 • Published Oct 13, 2025 • 1

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

Paper • 2401.10228 • Published Jan 18, 2024

RecTok: Reconstruction Distillation along Rectified Flow

Paper • 2512.13421 • Published Dec 15, 2025 • 5

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing

Paper • 2512.11715 • Published Dec 12, 2025

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Paper • 2512.10958 • Published Dec 11, 2025 • 1

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Paper • 2512.16760 • Published Dec 18, 2025 • 15

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Paper • 2412.03255 • Published Dec 4, 2024

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published Jan 22 • 44

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

Paper • 2602.01842 • Published Feb 2 • 3

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Paper • 2312.07526 • Published Apr 8, 2024

zhiqiulin

posted an update 1 day ago

Post

🚀 VQAScore now supports text-to-video evaluation!

VQAScore scores how well a generated image or video matches a prompt by asking a VLM "does this show {prompt}?" and using P(Yes). It became a go-to evaluation metric and reward model for image generation (2M+ downloads), and we just added text-to-video support across 20+ VLMs (GPT, Gemini, Qwen). Free and open-source, and it keeps improving as VLMs improve.

💻 Code: https://github.com/linzhiqiu/t2v_metrics
📄 Paper: https://arxiv.org/abs/2404.01291
🧵 Launch thread + demo video: https://x.com/ZhiqiuLin/status/2064316582461841499

1 reply

sergiopaniego

posted an update 3 days ago

Post

3688

OpenEnv has a new home: github.com/huggingface/OpenEnv

Starting today, it's coordinated by a committee that includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face

frontier labs train their models and their harnesses together. Claude knows Claude Code. GPT-5.5 knows Codex. that's not an accident, it's training. open-source models deserve the same magic, but pulling that off requires infrastructure that belongs to everyone, not one lab

OpenEnv is that layer. one api, any harness, any trainer, any environment

Rewards and training loops stay in TRL, Unsloth, wherever you already work. OpenEnv is the socket they all plug into

Get involved!

Full announcement: https://huggingface.co/blog/openenv-agentic-rl

sergiopaniego

posted an update 6 days ago

Post

223

Frontier agents are this good partly because the model was trained inside the very harness it ships with.

NVIDIA's new paper "Polar: Agentic RL on Any Harness at Scale" brings that recipe to the open: it turns coding harnesses like Codex, Claude Code, Qwen Code or Pi into RL training environments without touching their internals.

The core idea: every agent, however complex or closed, talks to a model through an API, so they put a proxy there. The harness runs exactly like in production while the proxy records prompts, sampled token ids and logprobs. Trajectories get rebuilt outside, token faithful, so gradients hit the exact tokens the policy sampled.

The gains are consistent across all four harnesses. Same Qwen3.5-4B, plain GRPO, evaluated on SWE-Bench Verified:

Codex 3.8 → 26.4 (+22.6)
Claude Code 29.8 → 34.6 (+4.8)
Qwen Code 34.6 → 35.2 (+0.6)
Pi 34.2 → 40.4 (+6.2)

The biggest gains appear on unfamiliar execution paths, Codex being the clearest case. The takeaway: you are not just training a model, you are training the model + harness system.

Two engineering pieces make it work at scale. Async worker pools isolate container boots (CPU), agent execution (GPU) and long tail test runs, so slow runtimes never block the GPUs. And prefix merging stitches hundreds of captured API calls back into contiguous traces: 5.4x faster trainer updates and rollout GPUs at 88% utilization.

It also doubles as an SFT data factory: 504 test verified agent traces from a 122B teacher, multi-turn conversations averaging 104 messages each, coming to the Hub under Apache 2.0 (release pending review).

Paper authors: Binfeng Xu, Hao Zhang, Shaokun Zhang, Songyang Han, Mingjie Liu, Jian Hu, Shizhe Diao, Zhenghui Jin, Yunheng Zou, Michael Demoret, Jan Kautz and Yi Dong.

> Paper: Polar: Agentic RL on Any Harness at Scale (2605.24220)
> Code: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server
> Training data: NovaSky-AI/SkyRL-v0-293-data

sergiopaniego

posted an update 8 days ago

Post

198

The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from PyTorch Conf Europe is live!

@kashif and I covered the latest advances in multi-turn GRPO in TRL: trajectories, tool use, envs, and agentic post-training at scale

https://www.youtube.com/watch?v=rPBeXFntJSU

sergiopaniego

posted an update 8 days ago

Post

150

how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @aminediroHF

what I like the most is the way it proves you can use the Hub for basically everything 🧐 → trainer on one machine, vLLM in a HF Space, the wordle env in another HF Space and weights going through a Hub Bucket. no shared cluster, just HTTPS

it works because ~99% of bf16 weights don't change between RL steps so you only sync the diff. 1.2 GB to 25 MB of payload per step

https://huggingface.co/blog/delta-weight-sync

sergiopaniego

posted an update 9 days ago

Post

2304

most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above

go read it 🤓

https://huggingface.co/proxy/qgallouedec-tito.hf.space/

sergiopaniego

posted an update 10 days ago

Post

6239

new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch. reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood

find it here: https://huggingface.co/blog/torch-profiler

1 reply

sergiopaniego

posted an update 13 days ago

Post

177

If you have a github repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK ), a tool that mines PRs, commits, CVEs and turns them into verifiable sandboxed tasks with real reward signals, automatically

Outputs to Harbor spec so you can plug it straight into RL training or coding-agent eval

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://huggingface.co/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments

sergiopaniego

posted an update 14 days ago

Post

240

periodic reminder 🧐

some HF blog posts use a special template, long-form, animated, super deep

I keep them all in one collection that gets updated every time a new one drops so you don't lose track

https://hf.co/collections/sergiopaniego/research-and-long-form-blog-posts

spot one missing? let me know

AI & ML interests

Recent Activity

Team members 748

zero-gpu-explorers's activity