SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research Paper • 2606.09730 • Published 7 days ago • 50
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 173
SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia Paper • 2606.03027 • Published 13 days ago • 1
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published 18 days ago • 106
MiniCPM RAG Suite Collection Embedding, re-ranking, generation -- the cornerstone of RAG. • 7 items • Updated 21 days ago • 18
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19, 2025 • 49
ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution Paper • 2604.13787 • Published Apr 15 • 2
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition Paper • 2605.08384 • Published May 8 • 11
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction Paper • 2605.05242 • Published May 3 • 124
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories Paper • 2605.04036 • Published May 5 • 69
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published May 6 • 102
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments Paper • 2604.27776 • Published Apr 30 • 15