# Voice MCP Server
An MCP server that enables voice conversations in Claude Code.
- **STT**: MLX Whisper (Apple Silicon optimized)
- **TTS**: Kokoro (Multilingual support: Japanese, English, Chinese, etc.)
- **VAD**: Silero VAD + RMS dual filter
> ⚠️ The current code is hardcoded for **Korean input → Japanese output**.
> To change to another language, see the [Language Change Guide](#language-change-guide) section.
## Requirements
- macOS (Apple Silicon M1/M2/M3)
- Python 3.11 or higher
- Claude Code CLI
- Microphone (MacBook built-in or external)
## Installation
### 1. Clone Repository
```bash
git clone https://github.com/jeonghyeon-net/voice-mcp-demo.git
cd voice-mcp-demo
```
### 2. Install Python 3.11 (if not installed)
```bash
brew install python@3.11
```
### 3. Create and Activate Virtual Environment
```bash
python3.11 -m venv venv
source venv/bin/activate
```
### 4. Install Dependencies
```bash
pip install -r requirements.txt
```
### 5. Download Models
```bash
python setup_models.py
```
> ⚠️ **Required**: Must be executed before using Claude Code.
> Downloads Whisper, Kokoro TTS, and Silero VAD models in advance.
> Approximately 2-3GB will be downloaded on the first run.
### 6. Verify Installation
```bash
# VAD Test (Microphone Test)
python test_vad.py
```
Voice probability should be displayed when you speak.
## MCP Configuration
Create or modify the `~/.mcp.json` file:
```json
{
  "mcpServers": {
    "voice": {
      "command": "/path/voice-mcp-demo/venv/bin/python",
      "args": ["/path/voice-mcp-demo/voice_mcp.py"]
    }
  }
}
```
> Replace `/path/` with the actual path.
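If you prefer not to edit the file by hand, a small helper like the following can generate the entry with absolute paths filled in. This is a hypothetical convenience script, not part of the repo; `build_mcp_config` and its argument are illustrative names.

```python
import json
from pathlib import Path

def build_mcp_config(repo_dir: str) -> dict:
    """Build the ~/.mcp.json entry for this server from the repo location."""
    repo = Path(repo_dir).expanduser().resolve()
    return {
        "mcpServers": {
            "voice": {
                "command": str(repo / "venv" / "bin" / "python"),
                "args": [str(repo / "voice_mcp.py")],
            }
        }
    }

# Print the JSON to paste into ~/.mcp.json
print(json.dumps(build_mcp_config("~/voice-mcp-demo"), indent=2))
```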
## Usage
In Claude Code:
```
> listen
```
Entering this will start voice recognition mode.
### Tools
| Tool | Description |
|------|------|
| `listen()` | Listen to voice via microphone (Korean) |
| `speak(text)` | Respond with Japanese TTS |
| `listen_fixed(duration)` | Record for a fixed duration |
### Flow
1. Enter `listen` → Speak after the beep
2. Claude responds in Japanese (`speak`)
3. Continue or end the conversation
### Verify MCP Server Registration
After running Claude Code, enter `/mcp`:
```
> /mcp
✓ voice (connected)
```
If `voice` is in the connected state, it is ready.
### Conversation Example
```
> listen
⏺ voice - listen (MCP)
⎿ { "result": "[사용자]: 안녕하세요\n\n⚠️ ..." }
⏺ voice - speak (MCP)(text: "こんにちは!何かお手伝いできますか?")
⎿ { "result": "→ listen() 호출하세요" }
⏺ voice - listen (MCP)
⎿ { "result": "[사용자]: 오늘 날씨 어때?\n\n⚠️ ..." }
...
```
> In the transcript above, "안녕하세요" = "Hello", "오늘 날씨 어때?" = "How's the weather today?", the Japanese reply means "Hello! How can I help you?", and "→ listen() 호출하세요" means "→ call listen()".
### End Voice Conversation
- Say "끝" ("done"), "바이바이" ("bye-bye"), "고마워" ("thank you"), etc., and Claude will end the conversation
- Or force quit with Ctrl+C
- Timeout (automatically ends if there is no speech for 5 minutes)
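The end-phrase check above can be sketched as a tiny predicate. This is a hypothetical illustration — in practice Claude itself judges the transcript; `wants_to_end` and `END_PHRASES` are not names from `voice_mcp.py`.

```python
# Hypothetical sketch of end-phrase detection on a Whisper transcript.
END_PHRASES = ("끝", "바이바이", "고마워")  # "done", "bye-bye", "thanks"

def wants_to_end(transcript: str) -> bool:
    """Return True if the user's transcript contains a closing phrase."""
    return any(phrase in transcript for phrase in END_PHRASES)
```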
### Tips
- **On the first run**, loading the models takes a while (TTS announces "初期化中", "initializing")
- **When speaking**, wait about 0.5 seconds after the beep before speaking
- **When finishing speaking**, remain silent for about 1.5 seconds to start recognition
- **After Claude responds**, it automatically returns to listening mode (you may need to enter `listen` manually)
## Configuration Values
Adjustable in `voice_mcp.py`:
| Setting | Default Value | Description |
|------|--------|------|
| `VAD_THRESHOLD` | 0.85 | Voice-probability threshold for speech detection |
| `RMS_THRESHOLD` | 0.02 | Minimum volume (RMS) to count as speech |
| `SILENCE_DURATION` | 1.5 s | Silence required before a turn is considered finished |
| `timeout_seconds` | 300 s | Maximum wait for speech before timing out |
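How `SILENCE_DURATION` typically ends a turn can be sketched as a small state machine over per-chunk VAD decisions. This is a simplified illustration, not the actual `voice_mcp.py` code; `find_turn_end` and `chunk_seconds` are assumed names.

```python
SILENCE_DURATION = 1.5  # seconds of silence that ends a turn

def find_turn_end(voice_flags, chunk_seconds=0.1, silence_duration=SILENCE_DURATION):
    """Return the index of the chunk where the turn ends, or None.

    voice_flags: per-chunk booleans from the VAD (True = speech detected).
    A turn ends once speech has started and `silence_duration` seconds of
    consecutive non-speech chunks follow.
    """
    needed = round(silence_duration / chunk_seconds)  # chunks of silence required
    started = False
    silent = 0
    for i, is_voice in enumerate(voice_flags):
        if is_voice:
            started = True
            silent = 0
        elif started:
            silent += 1
            if silent >= needed:
                return i
    return None  # user never spoke, or never went silent long enough
```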
## Language Change Guide
The default is **Korean input → Japanese output**. To change to another language:
### Change to English TTS
Modify `voice_mcp.py`:
```python
# 1. Change TTS language code (get_tts function)
_tts = KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M')
# 'a' = US English, 'b' = UK English, 'j' = Japanese
# 'k' = Korean, 'z' = Chinese, 'f' = French, etc.
# 2. Change voice (default value of speak function)
def speak(text: str, voice: str = "af_heart", speed: float = 1.0) -> str:
# English voices: af_heart, af_bella, am_adam, am_michael, etc.
```
### Kokoro Supported Languages
| Code | Language |
|------|------|
| `a` | US English |
| `b` | UK English |
| `j` | Japanese |
| `z` | Chinese |
| `f` | French |
| `e` | Spanish |
| `i` | Italian |
| `p` | Portuguese |
| `h` | Hindi |
> **Note**: Kokoro 82M does not support Korean TTS. If you need Korean voice output, use another TTS engine (Edge TTS, Google TTS, etc.).
### English Voice List
| Voice | Description |
|------|------|
| `af_heart` | US Female (Recommended) |
| `af_bella` | US Female |
| `af_sarah` | US Female |
| `am_adam` | US Male |
| `am_michael` | US Male |
| `bf_emma` | UK Female |
| `bm_george` | UK Male |
### Change speak() Prompt
Modify the docstring of the `speak` function so that Claude responds in English:
```python
@mcp.tool()
def speak(text: str, voice: str = "af_heart", speed: float = 1.0) -> str:
    """
    Speak in English.
    ⚠️ Text must be in English only!

    Args:
        text: English text
        voice: Voice
        speed: Speed

    Returns:
        Playback complete
    """
```
### Change Input Language
Change the default language parameter of the `listen()` function:
```python
def listen(timeout_seconds: int = 300, language: str = "en") -> str:
# "ko" = Korean, "en" = English, "ja" = Japanese
```
### Example of Full English Configuration
```python
# get_tts()
_tts = KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M')
# listen()
def listen(timeout_seconds: int = 300, language: str = "en") -> str:
# speak()
def speak(text: str, voice: str = "af_heart", speed: float = 1.0) -> str:
"""Speak in English. Text must be in English only!"""
```
## How It Works
### Overall Flow
```
[User] --speaks--> [Microphone] --audio--> [Silero VAD] --voice section--> [Whisper] --text--> [Claude]
                                                                                                  |
[User] <--hears-- [Speaker] <--audio-- [Kokoro TTS] <--text---------------------------------------+
```
### 1. Voice Activity Detection (VAD)
```python
# Silero VAD calculates voice probability (0.0 ~ 1.0)
speech_prob = vad_model(chunk_tensor, SAMPLE_RATE)
# Check RMS (volume) together to filter background noise
rms = np.sqrt(np.mean(chunk ** 2))
# Recognize as voice only if both exceed the threshold
is_voice = speech_prob > 0.85 and rms > 0.02
```
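The RMS half of the dual filter can be tried in isolation with synthetic audio. This is a standalone sketch (the threshold mirrors the default above; the sine signals simply stand in for a quiet hum vs. speech-level sound):

```python
import numpy as np

RMS_THRESHOLD = 0.02

def rms(chunk: np.ndarray) -> float:
    """Root-mean-square amplitude of an audio chunk in [-1.0, 1.0]."""
    return float(np.sqrt(np.mean(chunk ** 2)))

# Synthetic chunks: 0.1 s at 16 kHz
t = np.linspace(0, 0.1, 1600, endpoint=False)
quiet = 0.005 * np.sin(2 * np.pi * 100 * t)  # background-noise level
loud = 0.1 * np.sin(2 * np.pi * 200 * t)     # speech level

print(rms(quiet) > RMS_THRESHOLD)  # False — filtered as background noise
print(rms(loud) > RMS_THRESHOLD)   # True — passes the volume gate
```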
### 2. Speech-to-Text (STT)
```python
# MLX Whisper - Apple Silicon GPU acceleration
result = mlx_whisper.transcribe(
    audio_data,
    path_or_hf_repo="mlx-community/whisper-medium-mlx",
    language="ko"  # Korean
)
```
### 3. Text-to-Speech (TTS)
```python
# Kokoro TTS - Japanese voice generation
tts = KPipeline(lang_code='j', repo_id='hexgrad/Kokoro-82M')
for _, _, audio in tts(text, voice='jf_alpha', speed=1.0):
    sd.play(audio, 24000)  # Kokoro outputs 24 kHz audio
    sd.wait()              # block until the chunk finishes playing
```
### MCP Communication
```
Claude Code <--stdio--> voice_mcp.py (FastMCP Server)
                            |
                            ├── listen()        # Tool 1
                            ├── speak()         # Tool 2
                            └── listen_fixed()  # Tool 3
```
MCP (Model Context Protocol) is a protocol that allows Claude Code to call external tools. By registering a server in `~/.mcp.json`, Claude can use those tools.
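Under the hood the stdio transport carries JSON-RPC 2.0 messages; a tool invocation looks roughly like the following (field values are illustrative — FastMCP handles this wiring for you):

```python
import json

# Sketch of the JSON-RPC 2.0 request Claude Code sends over stdio
# when it invokes a tool via MCP's tools/call method.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "speak",
        "arguments": {"text": "こんにちは", "voice": "jf_alpha", "speed": 1.0},
    },
}
print(json.dumps(request, ensure_ascii=False, indent=2))
```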
## Project Structure
```
voice-mcp-demo/
├── voice_mcp.py      # MCP server main
├── setup_models.py   # Model pre-download
├── echo.py           # Standalone version (Ollama integration)
├── run.sh            # Script to run echo.py
├── test_vad.py       # VAD test tool
├── requirements.txt  # Dependencies
└── README.md
```
## Test
```bash
# VAD Test
./venv/bin/python test_vad.py
# Standalone Execution (echo.py)
./run.sh
```
## Troubleshooting
### Voice Not Recognized
- Check microphone permissions (System Settings → Privacy & Security → Microphone)
- Lower `RMS_THRESHOLD` (e.g., to 0.01)
### Reacts to Background Noise
- Raise `VAD_THRESHOLD` (e.g., to 0.9)
- Raise `RMS_THRESHOLD` (e.g., to 0.03)
### MCP Connection Failed
- Verify the Python path in `~/.mcp.json`
- Check for syntax errors with `python -m py_compile voice_mcp.py`
## License
MIT