# 🚀 xiaozhi-esp32-server-golang
> **Xiaozhi AI Backend for ESP32 Devices**
---
## Project Overview
xiaozhi-esp32-server-golang is a high-performance, fully streaming AI backend service for IoT and smart voice scenarios. Written in Go, it integrates ASR (Automatic Speech Recognition), LLM (Large Language Model), and TTS (Text-to-Speech), and supports large-scale concurrency and multi-protocol access to enable AI voice interaction on smart terminals and edge devices.
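
As a rough illustration of what "fully streaming" means here, the sketch below chains three stages with Go channels so that each stage can start emitting output before the previous one has finished. The stage functions (`recognize`, `respond`, `synthesize`) and the string payloads are placeholders invented for this example; they are not the project's actual API, and real stages would exchange audio frames and model tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// recognize stands in for a streaming ASR engine (hypothetical).
// It turns each incoming "audio frame" into a recognized word.
func recognize(frames <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for f := range frames {
			out <- strings.TrimPrefix(f, "audio:")
		}
	}()
	return out
}

// respond stands in for a streaming LLM (hypothetical).
// It emits one reply token per recognized word.
func respond(words <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for w := range words {
			out <- strings.ToUpper(w)
		}
	}()
	return out
}

// synthesize stands in for a streaming TTS engine (hypothetical).
// It wraps each token as a fake audio chunk.
func synthesize(tokens <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for t := range tokens {
			out <- "pcm(" + t + ")"
		}
	}()
	return out
}

// runPipeline feeds frames through ASR -> LLM -> TTS and collects the chunks in order.
func runPipeline(frames []string) []string {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, f := range frames {
			in <- f
		}
	}()
	var chunks []string
	for c := range synthesize(respond(recognize(in))) {
		chunks = append(chunks, c)
	}
	return chunks
}

func main() {
	fmt.Println(runPipeline([]string{"audio:hello", "audio:world"}))
}
```

Because every stage runs in its own goroutine and forwards results as soon as they are ready, the first audio chunk can leave the TTS stage while later frames are still being recognized.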
---
## ✨ Key Features
- ⚡ **End-to-End Streaming Voice Pipeline**: ASR → LLM → TTS is streamed at every stage for low-latency, real-time interaction
- 🎙️ **Speaker Recognition and Dynamic TTS Switching**: Automatically switches the TTS voice based on speaker identity for a personalized voice experience
- 🔌 **Transport Interface Layer Abstraction**: WebSocket and MQTT + UDP share a unified abstraction, so the main logic is injected independently of the protocol and new protocols are easy to add
- 📬 **Message Queue Processing**: LLM and TTS are processed asynchronously through message queues, which allows business logic to be injected flexibly
- 🌐 **Multi-Protocol, High-Concurrency Access**: Supports large numbers of concurrently connected devices and message push
- ♻️ **Efficient Resource Pooling and Connection Reuse**: Connection pools for external resources reduce response latency and improve system throughput
- 🤖 **Multi-Engine AI Capability Integration**: Built on the Eino framework, with support for engines such as FunASR, OpenAI-compatible APIs, Ollama, Doubao, EdgeTTS, and CosyVoice
- 🧩 **Modular, Extensible Architecture**: Core modules such as VAD, ASR, LLM, TTS, MCP, and Vision are independently pluggable
- 🎵 **MCP Audio Server**: Paginated retrieval and streaming of audio resources, with music playback and volume control
- 🦞 **OpenClaw Intelligent Agent Access**: Generates a dedicated OpenClaw endpoint for each intelligent agent, with connection status viewing, session testing, and entry/exit keyword routing (defaults: "Open Lobster/Enter Lobster" and "Close Lobster/Exit Lobster")
- 🖥️ **Full-Featured Web Management Console**: Visual configuration wizard, end-to-end VAD/ASR/LLM/TTS availability testing, device management and message injection, real-time latency monitoring, and OTA verification
- 🧠 **Advanced Business Functions**: MCP market aggregation and import, voice cloning, knowledge base (Dify/RAGFlow/WeKnora), and MCP remote-call debugging per device or intelligent agent
- 📦 **One-Click Deployment**: Pre-compiled aio package that works out of the box (main program + console + speaker service), one-command Docker deployment, and local compilation on Linux, Windows, and macOS
- 🔐 **Security and Permission System** (planned): Interfaces are reserved for user authentication and permission management
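
The transport abstraction mentioned above can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the `Transport` interface, `memTransport`, and `serve` are hypothetical names made up for illustration, and the project's real interface may look quite different. The point is only that the main logic depends on an interface rather than on WebSocket or MQTT + UDP directly.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Transport is a hypothetical protocol-agnostic connection interface.
// A WebSocket or MQTT + UDP implementation would satisfy the same methods.
type Transport interface {
	Recv() ([]byte, error)
	Send(msg []byte) error
	Close() error
}

var errClosed = errors.New("transport closed")

// memTransport is an in-memory stand-in used only for this example.
type memTransport struct {
	inbox  [][]byte // messages "received" from the device
	outbox [][]byte // messages "sent" back to the device
	closed bool
}

// Recv returns the next queued message, or io.EOF when none are left.
func (m *memTransport) Recv() ([]byte, error) {
	if m.closed {
		return nil, errClosed
	}
	if len(m.inbox) == 0 {
		return nil, io.EOF
	}
	msg := m.inbox[0]
	m.inbox = m.inbox[1:]
	return msg, nil
}

// Send records the outgoing message.
func (m *memTransport) Send(msg []byte) error {
	if m.closed {
		return errClosed
	}
	m.outbox = append(m.outbox, msg)
	return nil
}

// Close marks the transport as closed.
func (m *memTransport) Close() error {
	m.closed = true
	return nil
}

// serve is the protocol-independent main loop: it reads messages, applies the
// injected handler, and writes replies. It returns the number of handled messages.
func serve(t Transport, handle func([]byte) []byte) int {
	defer t.Close()
	handled := 0
	for {
		msg, err := t.Recv()
		if err != nil {
			return handled
		}
		if err := t.Send(handle(msg)); err != nil {
			return handled
		}
		handled++
	}
}

func main() {
	t := &memTransport{inbox: [][]byte{[]byte("ping"), []byte("hello")}}
	n := serve(t, func(b []byte) []byte { return append([]byte("echo:"), b...) })
	fmt.Println(n, string(t.outbox[0]), string(t.outbox[1]))
}
```

With this shape, adding a new protocol means writing another type that implements the three methods; `serve` and the handler stay unchanged.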
---
[deepwiki Architecture Analysis](https://deepwiki.com/hackers365/xiaozhi-esp32-server-golang)
## 🚀 Quick Start
### Method 1: One-Click Startup Package (Recommended)
Download the archive for your platform, extract it, and run it:
- **Release Page**: <https://github.com/hackers365/xiaozhi-esp32-server-golang/releases>
- **Usage Tutorial**: [doc/quickstart_bundle_tutorial.md](doc/quickstart_bundle_tutorial.md)
After startup, open **http://<Server IP or Domain>:8080** in a browser to configure the service in the Web console.
### Method 2: Docker Deployment
- [Docker Compose (with console)](doc/docker_compose.md)
- [Docker (without console)](doc/docker.md)
### Method 3: Local Compilation
Suitable for development environments or when a custom build is required.
**Install Dependencies** (Ubuntu example)
```bash
# Go 1.20+
# Opus codec
sudo apt-get install -y pkg-config libopus0 libopusfile-dev
# ONNX Runtime (1.21.0)
wget https://github.com/microsoft/onnxruntime/releases/download/v1.21.0/onnxruntime-linux-x64-1.21.0.tgz
tar -xzf onnxruntime-linux-x64-1.21.0.tgz
# Create the target include directory first; cp fails if it does not exist
sudo mkdir -p /usr/local/include/onnxruntime
sudo cp -r onnxruntime-linux-x64-1.21.0/include/* /usr/local/include/onnxruntime/
sudo cp -r onnxruntime-linux-x64-1.21.0/lib/* /usr/local/lib/
sudo ldconfig
# ten_vad runtime dependencies
sudo apt install -y libc++1 libc++abi1
```
> 📖 For complete dependency instructions and Windows/macOS configuration, see [config.md](doc/config.md)

To deploy FunASR, refer to the [FunASR official documentation](https://github.com/modelscope/FunASR/blob/main/runtime/docs/SDK_advanced_guide_online_zh.md).
**Compilation and Startup**
```bash
# Build
go build -o xiaozhi_server ./cmd/server/
# Start (see config/config.yaml for configuration details)
./xiaozhi_server -c config/config.yaml
```
---
## 📚 Docs
### Deployment Related
- [One-Click Startup Package Tutorial](doc/quickstart_bundle_tutorial.md)
- [Docker Compose Deployment](doc/docker_compose.md)
- [Docker Deployment](doc/docker.md)
- [Configuration Details](doc/config.md)
### User Guide
- [Management Console User Guide](doc/manager_console_guide.md)
- [WebSocket Service and OTA Configuration](doc/websocket_server.md)
- [MQTT + UDP Configuration](doc/mqtt_udp.md)
- [MQTT UDP Protocol](doc/mqtt_udp_protocol.md)
### Functional Modules
- [Vision Capability](doc/vision.md)
- [Speaker Recognition](doc/speaker_identification.md)
- [MCP Architecture](doc/mcp.md)
- [MCP Audio Resources](doc/mcp_resource.md)
- [MCP Market (Market Discovery/Import/Hot Update)](doc/mcp_market.md)
- [OpenClaw Intelligent Agent Access (Endpoint/Keyword Routing/Session Testing)](doc/openclaw_integration.md)
- [Voice Cloning (User Operation and Administrator Quota)](doc/voice_clone.md)
- [Knowledge Base (Provider Configuration/Synchronization/Recall Testing/RAG)](doc/knowledge_base.md)
- [Device/Intelligent Agent Dimension MCP Remote Call (Endpoint/Tools/Call)](doc/mcp_remote_call_agent_device.md)
### Device Access
- [ESP32 Access Guide](doc/esp32_xiaozhi_backend_guide.md)
- [OTA MQTT Authorization Instructions](doc/ota_mqtt_auth.md)
---
## 🧩 Module Overview
| Module | Function | Technology Stack |
|------|----------|--------|
| VAD | Voice Activity Detection | Silero VAD / WebRTC VAD / ten_vad |
| ASR | Speech Recognition | FunASR / Doubao ASR |
| LLM | Large Model Inference | Eino framework; OpenAI-compatible APIs, Ollama, etc. |
| TTS | Text-to-Speech | Doubao / EdgeTTS / CosyVoice |
| MCP | Multi-Protocol Access, MCP Market Discovery and Import, Remote Call Debugging per Device or Intelligent Agent | MCP Server / Access Point / MCP Market / SSE / StreamableHTTP / WebSocket Controller / MCP Tool Call |
| OpenClaw | Intelligent Agent Dimension Access Point, Entry/Exit Keyword Mode Switching, Session Message Forwarding and Testing | OpenClaw WebSocket / Agent Endpoint / Chat Router |
| Vision | Vision Processing | Doubao / Alibaba Cloud Vision |
| Speaker Recognition | Speaker Identification | sherpa-onnx + Vector Database |
| Voice Cloning | User-Side Voice Clone Creation and Trial | Minimax / CosyVoice / Qianwen |
| Knowledge Base (RAG) | Document Synchronization, Recall Testing and Dialogue Retrieval | Dify / RAGFlow / WeKnora |
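
One common way to make modules like these pluggable is a name-to-constructor registry, so a configuration value selects the engine at startup. The sketch below shows that pattern for TTS only. It is an assumption about how such selection could work, not the project's actual code: the `TTS` interface, `Register`, `NewTTS`, and the provider keys `"edge"` and `"cosyvoice"` are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// TTS is a hypothetical provider interface; a real one would return audio, not a string.
type TTS interface {
	Synthesize(text string) string
}

// fakeTTS is a stand-in engine that tags its output with the provider name.
type fakeTTS struct{ name string }

func (f fakeTTS) Synthesize(text string) string { return f.name + ":" + text }

// registry maps a provider name (as it might appear in configuration) to a constructor.
var registry = map[string]func() TTS{}

// Register adds a provider constructor under the given name.
func Register(name string, ctor func() TTS) { registry[name] = ctor }

// providers returns the registered names in sorted order for stable error messages.
func providers() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// NewTTS builds the provider selected by name, or reports which names are available.
func NewTTS(name string) (TTS, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown TTS provider %q (available: %v)", name, providers())
	}
	return ctor(), nil
}

func init() {
	// Illustrative keys only; they are not taken from the project's configuration.
	Register("edge", func() TTS { return fakeTTS{name: "edge"} })
	Register("cosyvoice", func() TTS { return fakeTTS{name: "cosyvoice"} })
}

func main() {
	tts, err := NewTTS("edge")
	if err != nil {
		panic(err)
	}
	fmt.Println(tts.Synthesize("hello"))

	_, err = NewTTS("missing")
	fmt.Println(err)
}
```

The same pattern extends to VAD, ASR, LLM, and Vision: each module defines its own interface, and each engine registers itself under a name.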
---
## 📈 Performance & Testing
- [Latency Test Report](doc/delay_test.md)
- The management console provides entry points for VAD/ASR/LLM/TTS availability and latency tests
---
## 🛠️ Roadmap
- Establish long-lived connections with devices
- Proactive AI
---
## 🤝 Contributing
Issues, pull requests, and suggestions are welcome!
---
## 📄 License
MIT License
---
## 📬 Contact
**Exchange Group** (the QR code expires; please contact the author)

**Personal WeChat**: hackers365

---
> 2024 xiaozhi-esp32-server-golang