Yeah. Well, let’s look at it positively: if you find a framework that fits you, it’ll save time, and that time can go into improving the functionality instead.
When I looked into this before, I found several viable options, even beyond ones that focus strictly on Neuro-sama’s personality (her topic-management skills?) or her behavior as a streamer.
Whatever you’re looking for, odds are someone has already built a framework for it.
Decision matrix
Legend:
5 = strongest fit, 1 = weakest fit. The one exception is the “custom engineering” column, where a higher score means more glue work left for you, so lower is better there.
These scores are my synthesis from the projects’ current docs/READMEs: realtime voice coverage, avatar support, memory/character tooling, deployment style, and how much glue code is still needed. (GitHub)
| Option | Closest to “AI streamer buddy” out of the box | Avatar/body included | Realtime voice strength | Character/memory strength | Local / self-host path | Managed / hosted path | How much custom engineering you still need | Best for |
|---|---|---|---|---|---|---|---|---|
| Open-LLM-VTuber | 5 | 5 | 4 | 3 | 5 | 2 | 3 | Closest open-source starting point |
| AIRI | 4 | 5 | 4 | 4 | 5 | 1 | 4 | Ambitious companion-style platform |
| ChatdollKit | 4 | 5 | 4 | 3 | 4 | 1 | 3 | Unity / VRM / 3D avatar builders |
| Pipecat | 2 | 1 | 5 | 2 | 4 | 2 | 4 | Python-first custom stacks |
| LiveKit Agents | 2 | 1 | 5 | 2 | 4 | 4 | 4 | Production-grade realtime backend |
| TEN Framework | 3 | 2 | 5 | 3 | 4 | 2 | 4 | Interruptible, full-duplex voice agents |
| Inworld Runtime | 4 | 3 | 4 | 5 | 2 | 5 | 2 | Managed character platform |
| Convai | 4 | 3 | 4 | 4 | 2 | 5 | 2 | Fast hosted character deployment |
| ElizaOS | 2 | 1 | 1 | 5 | 4 | 2 | 4 | Character brain / plugin layer |
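If you want to play with the matrix rather than eyeball it, here is the table as a small scoring helper. The row values are copied from the table above; the weighting scheme and the example weights are my own assumptions, so tune them to your priorities:

```python
# The decision matrix above as a reusable scorer.
# Row values are copied verbatim from the table; the weights are hypothetical.

SCORES = {
    #                  buddy avatar voice memory local hosted work
    "Open-LLM-VTuber": (5,    5,    4,    3,     5,    2,     3),
    "AIRI":            (4,    5,    4,    4,     5,    1,     4),
    "ChatdollKit":     (4,    5,    4,    3,     4,    1,     3),
    "Pipecat":         (2,    1,    5,    2,     4,    2,     4),
    "LiveKit Agents":  (2,    1,    5,    2,     4,    4,     4),
    "TEN Framework":   (3,    2,    5,    3,     4,    2,     4),
    "Inworld Runtime": (4,    3,    4,    5,     2,    5,     2),
    "Convai":          (4,    3,    4,    4,     2,    5,     2),
    "ElizaOS":         (2,    1,    1,    5,     4,    2,     4),
}

COLUMNS = ("buddy", "avatar", "voice", "memory", "local", "hosted", "work")

def rank(weights):
    """Return options sorted best-first under the given column weights.

    The 'work' column counts remaining engineering, so it is inverted
    (6 - value) before weighting: less work left scores higher.
    """
    def score(row):
        vals = dict(zip(COLUMNS, row))
        vals["work"] = 6 - vals["work"]
        return sum(weights.get(c, 0) * vals[c] for c in COLUMNS)

    return sorted(SCORES, key=lambda name: score(SCORES[name]), reverse=True)

# Example: prioritise self-hosting and realtime voice, with some weight
# on the "AI streamer buddy" column.
print(rank({"local": 2, "voice": 2, "buddy": 1})[:3])
```

With those example weights the top picks line up with the “fast picks” below (Open-LLM-VTuber first), which is a decent sanity check that the matrix and the narrative agree.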
Fast picks
Choose Open-LLM-VTuber if:
You want the closest thing to a self-hosted, open-source “talking on-screen companion” without assembling the whole stack yourself.
Choose AIRI if:
You want the most ambitious open-source “digital being” direction and are comfortable with a more evolving project.
Choose ChatdollKit if:
Your mental model starts with “I want a 3D character in Unity” rather than “I want a voice backend.”
Choose Pipecat or LiveKit Agents if:
You want to engineer the system cleanly yourself and treat avatar/persona as separate layers.
Choose Convai or Inworld Runtime if:
You want the fastest hosted path to a believable voiced character.
Choose ElizaOS if:
You mostly need a character brain, plugin system, and orchestration layer, then plan to pair it with another realtime/avatar stack.
Background on each option
1) Open-LLM-VTuber
This is the most direct open-source match for a Neuro-like setup because it already combines hands-free voice chat, interruption handling, a Live2D talking face, swappable LLM/ASR/TTS backends, offline-capable deployment on macOS/Linux/Windows, and configurable long-term memory via MemGPT. That means you start with a system that already thinks in terms of a talking character, not just a voice pipeline. (GitHub)
Best fit: solo builders who want the shortest path to “AI buddy on screen.”
Main tradeoff: you still need your own stream/community/game connectors if your workflow goes beyond the core companion loop. (GitHub)
2) AIRI
AIRI is one of the strongest “digital life” style projects right now. Its README shows browser and Discord audio input, client-side speech recognition, browser-local inference, VRM support, Live2D support, and even directions like Minecraft/Factorio play and chat integrations. In other words, it is trying to be more than a voice bot: it is aiming at an ongoing embodied companion platform. (GitHub)
Best fit: builders who want a broad companion/agent world and do not mind a project that is still maturing.
Main tradeoff: more moving parts, more experimental surface area, more setup risk than a narrower framework. (GitHub)
3) ChatdollKit
ChatdollKit is the cleanest pick if the center of gravity is a 3D avatar in Unity. Its current README emphasizes 3D model expression, autonomous facial expression/animation control, lip-sync, STT/TTS integration, dialog-state management, wakeword support, multiple LLM providers, and deployment across Unity-supported platforms including WebGL, VR, and AR. (GitHub)
Best fit: VTuber/avatar creators, Unity developers, VRM users.
Main tradeoff: it is much more avatar-engine-centric than “general AI streaming framework” centric. (GitHub)
4) Pipecat
Pipecat is one of the best Python-first bases for building your own Neuro-like system from scratch. Its docs position it as an open-source framework for voice and multimodal AI bots that can see, hear, and speak in real time, with orchestration for AI services, transports, and audio pipelines. It is especially good when you want to prototype quickly in Python and keep control over the stack. (docs.pipecat.ai)
Best fit: Python builders who want flexibility and rapid iteration.
Main tradeoff: you still need to choose and build the avatar layer, memory policy, and streamer-specific integrations yourself. (docs.pipecat.ai)
5) LiveKit Agents
LiveKit Agents is excellent when your biggest problem is not “how do I make a character,” but “how do I make realtime voice feel solid.” Its docs focus on STT→LLM→TTS pipelines, reliable turn detection, interruption handling, provider plugins, and the core mechanics of production-grade voice AI. (docs.livekit.io)
Best fit: teams or advanced builders who want a robust realtime core.
Main tradeoff: it is infrastructure-first, not VTuber-first. You add the body, persona, and content loop yourself. (docs.livekit.io)
6) TEN Framework
TEN sits between bare realtime plumbing and higher-level character systems. Its repo describes an open-source framework for real-time multimodal conversational AI, with an ecosystem that includes agent examples, VAD, and turn detection for full-duplex dialogue. That makes it especially attractive if you care a lot about interruption, overlap, and natural back-and-forth speech behavior. (GitHub)
Best fit: builders who want natural conversational flow and are comfortable assembling pieces.
Main tradeoff: still more framework than finished companion product. (GitHub)
7) Inworld Runtime
Inworld Runtime is a strong managed character platform. Its current docs describe it as an orchestration platform for sophisticated AI characters and voice agents, carrying forward capabilities like knowledge retrieval, safety checks, long-term memory, and expressive voice synthesis from its earlier character tooling. (docs.inworld.ai)
Best fit: teams that want believable characters with less low-level assembly.
Main tradeoff: less self-owned, less local-first, and more platform-shaped than the open-source stacks above. (docs.inworld.ai)
8) Convai
Convai is one of the fastest hosted routes to interactive characters for web/game experiences. Its current Web SDK docs emphasize fast hands-free interaction with real-time audio, text, optional video, character actions, and emotion signals, while its memory docs describe persistent session memory support. (docs.convai.com)
Best fit: people who want a cloud platform that already “thinks in characters.”
Main tradeoff: less local control and more dependence on the vendor’s model of character building. (docs.convai.com)
9) ElizaOS
ElizaOS is best understood as a brain and plugin layer, not a full Neuro-like frontend stack. Its repo describes it as an extensible platform for building and deploying AI-powered applications and game NPCs, and its plugin registry supports dynamic plugin loading for integrations like Discord, browser use, PDF/image/video processing, and local model support. (GitHub)
Best fit: builders who want character files, plugins, and orchestration, then plan to pair that with LiveKit, Pipecat, ChatdollKit, or a custom frontend.
Main tradeoff: realtime voice/avatar embodiment is not its main “out of the box” value. (GitHub)
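To make “character brain” concrete: an ElizaOS agent is defined by a JSON character file, roughly like the sketch below. Field names follow the character-file format as I understand it from the Eliza repo (bio, style, messageExamples, plugins); treat the exact schema and the plugin name as assumptions to verify against the current docs.

```json
{
  "name": "Streamer-chan",
  "bio": [
    "An AI streamer buddy who comments on gameplay and chats with viewers."
  ],
  "adjectives": ["playful", "curious", "a little chaotic"],
  "topics": ["games", "stream chat", "internet culture"],
  "style": {
    "all": ["short sentences", "stay in character"],
    "chat": ["react to chat messages by name"]
  },
  "messageExamples": [
    [
      { "user": "{{user1}}", "content": { "text": "What game is this?" } },
      { "user": "Streamer-chan", "content": { "text": "Only the best roguelike I keep dying in!" } }
    ]
  ],
  "plugins": ["@elizaos/plugin-discord"]
}
```

The point of the format is that the personality, example dialogue, and integrations live in one portable file, which is exactly what makes ElizaOS easy to pair with a separate realtime/avatar stack.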
The simplest decision rule
Pick one from this list depending on your starting point:
- Closest open-source “AI buddy on screen” → Open-LLM-VTuber (GitHub)
- Most ambitious open-source companion project → AIRI (GitHub)
- Best Unity / 3D avatar route → ChatdollKit (GitHub)
- Best Python custom route → Pipecat (docs.pipecat.ai)
- Best production realtime core → LiveKit Agents (docs.livekit.io)
- Best if interruption/full-duplex matters most → TEN Framework (GitHub)
- Best managed character platform → Inworld Runtime or Convai (docs.inworld.ai)
- Best “character brain” to pair with something else → ElizaOS (GitHub)
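If you prefer the same rule in executable form, here it is as a plain lookup table. The option names are copied verbatim from the list above; the dictionary keys are my own shorthand for each starting point:

```python
# The decision rule above, as a lookup table.
BEST_PICK = {
    "open-source buddy on screen":      "Open-LLM-VTuber",
    "ambitious open-source companion":  "AIRI",
    "unity / 3d avatar":                "ChatdollKit",
    "python custom stack":              "Pipecat",
    "production realtime core":         "LiveKit Agents",
    "interruption / full-duplex":       "TEN Framework",
    "managed character platform":       ["Inworld Runtime", "Convai"],
    "character brain to pair":          "ElizaOS",
}

print(BEST_PICK["python custom stack"])
```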
My recommendation by user type
Beginner who wants a Neuro-like prototype fast:
Start with Open-LLM-VTuber. (GitHub)
Open-source enthusiast who wants a broader long-term project:
Look at AIRI. (GitHub)
Unity / VRM / avatar creator:
Use ChatdollKit. (GitHub)
Python engineer:
Use Pipecat as the voice core, then add your own avatar layer. (docs.pipecat.ai)
Infra-minded or production-minded engineer:
Use LiveKit Agents. (docs.livekit.io)
You care most about natural turn-taking and interruption:
Evaluate TEN Framework. (GitHub)
You want hosted convenience over deep ownership:
Use Convai or Inworld Runtime. (docs.inworld.ai)
You want a personality/plugin system to combine with another stack:
Use ElizaOS as the brain layer. (GitHub)