C
ChaoBro

Open-LLM-VTuber: Build Your AI Virtual Streamer with Local LLM

Open-LLM-VTuber: Build Your AI Virtual Streamer with Local LLM

There are plenty of ways to chat with AI, but an open-source project that lets you talk to a Live2D avatar with actual voice? This is basically the only decent one.

Open-LLM-VTuber hit GitHub Trending today. 7,546 stars, 978 forks, 912 commits.

What it is

In one sentence: use any LLM as the backend, Live2D for the face, microphone for ears, speakers for mouth — build an AI virtual streamer that runs locally.

Core features:

  • Hands-free voice interaction: just talk, no button pressing
  • Voice interruption: cut in mid-conversation without waiting for it to finish
  • Cross-platform local: Windows, macOS, Linux all supported
  • Any OpenAI-compatible API: Ollama, LM Studio, cloud models all work

Architecture

The pipeline is a classic voice dialogue flow:

Microphone → ASR (Whisper) → LLM → TTS → Speakers
                                ↓
                          Live2D expression driver

ASR uses Whisper (sherpa-onnx supports multiple engines), LLM backend is compatible with all OpenAI-format APIs, TTS connects to various synthesis services.

Live2D converts text responses into expressions and lip-sync animations — this is the soul of the project. Without it, you just have a voice assistant. With it, your AI has a "face."

Activity level

912 commits, 19 tags, 88 open issues, 32 PRs. Not top-tier activity, but a steady maintenance rhythm.

Interestingly, the repo has .cursor/rules and .gemini directories — the developers are using AI to build this project too.

Use cases

  • Personal entertainment: an AI companion at home
  • Live streaming: 24/7 AI virtual streamer, auto-replying to chat
  • Content creation: AI-driven virtual character short videos
  • Language learning: practice speaking with an infinitely patient virtual character

Reality check

Running the full stack locally needs decent hardware — ASR, LLM inference, TTS, and Live2D rendering running simultaneously taxes both CPU and GPU. Using a cloud LLM API reduces the local burden, but then latency and privacy become different questions.

It is called "Open-LLM-VTuber," but honestly, it is far from Neuro-sama-level AI streamers. But Neuro-sama is closed-source and took extensive custom training. Open-LLM-VTuber gives you the infrastructure — build whatever you want on top of it.

Main sources: