# Voice MCP Server
An MCP server that enables voice conversations in Claude Code.
- **STT**: MLX Whisper (Apple Silicon optimized)
- **TTS**: Kokoro (Multilingual support: Japanese, English, Chinese, etc.)
- **VAD**: Silero VAD + RMS dual filter
> ⚠️ The current code is hardcoded for **Korean input → Japanese output**.
> To change to another language, see the [Language Change Guide](#language-change-guide) section.
## Requirements
- macOS (Apple Silicon M1/M2/M3)
- Python 3.11 or higher
- Claude Code CLI
- Microphone (MacBook built-in or external)
## Installation
### 1. Clone Repository
```bash
git clone https://github.com/jeonghyeon-net/voice-mcp-demo.git
cd voice-mcp-demo
```
### 2. Install Python 3.11 (if not installed)
```bash
brew install python@3.11
```
### 3. Create and Activate Virtual Environment
```bash
python3.11 -m venv venv
source venv/bin/activate
```
### 4. Install Dependencies
```bash
pip install -r requirements.txt
```
### 5. Download Models
```bash
python setup_models.py
```
> ⚠️ **Required**: Must be executed before using Claude Code.
> Downloads Whisper, Kokoro TTS, and Silero VAD models in advance.
> Approximately 2-3GB will be downloaded on the first run.
### 6. Verify Installation
```bash
# VAD Test (Microphone Test)
python test_vad.py
```
Voice probability should be displayed when you speak.
## MCP Configuration
Create or modify the `~/.mcp.json` file:
```json
{
  "mcpServers": {
    "voice": {
      "command": "/path/voice-mcp-demo/venv/bin/python",
      "args": ["/path/voice-mcp-demo/voice_mcp.py"]
    }
  }
}
```
> Replace `/path/` with the actual path.
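If you prefer not to edit the file by hand, a small helper like the following can generate the entry with absolute paths filled in. This is a hypothetical convenience script, not part of the repo; `build_mcp_config` and its argument are illustrative names.

```python
import json
from pathlib import Path

def build_mcp_config(repo_dir: str) -> dict:
    """Build the ~/.mcp.json entry for this server from the repo location."""
    repo = Path(repo_dir).expanduser().resolve()
    return {
        "mcpServers": {
            "voice": {
                "command": str(repo / "venv" / "bin" / "python"),
                "args": [str(repo / "voice_mcp.py")],
            }
        }
    }

# Print the JSON to paste into ~/.mcp.json
print(json.dumps(build_mcp_config("~/voice-mcp-demo"), indent=2))
```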
## Usage
In Claude Code:
```
> listen
```
Entering this will start voice recognition mode.
### Tools
| Tool | Description |
|------|------|
| `listen()` | Listen to voice via microphone (Korean) |
| `speak(text)` | Respond with Japanese TTS |
| `listen_fixed(duration)` | Record for a fixed duration |
### Flow
1. Enter `listen` → Speak after the beep
2. Claude responds in Japanese (`speak`)
3. Continue or end the conversation
### Verify MCP Server Registration
After running Claude Code, enter `/mcp`:
```
> /mcp
✓ voice (connected)
```
If `voice` is in the connected state, it is ready.
### Conversation Example
```
> listen
⏺ voice - listen (MCP)
⎿ { "result": "[사용자]: 안녕하세요\n\n⚠️ ..." }
⏺ voice - speak (MCP)(text: "こんにちは!何かお手伝いできますか?")
⎿ { "result": "→ listen() 호출하세요" }
⏺ voice - listen (MCP)
⎿ { "result": "[사용자]: 오늘 날씨 어때?\n\n⚠️ ..." }
...
```
> In the transcript above, "안녕하세요" = "Hello", "오늘 날씨 어때?" = "How's the weather today?", the Japanese reply means "Hello! How can I help you?", and "→ listen() 호출하세요" means "→ call listen()".
### End Voice Conversation
- Say "끝" ("done"), "바이바이" ("bye-bye"), "고마워" ("thank you"), etc., and Claude will end the conversation
- Or force quit with Ctrl+C
- Timeout (automatically ends if there is no speech for 5 minutes)
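The end-phrase check above can be sketched as a tiny predicate. This is a hypothetical illustration — in practice Claude itself judges the transcript; `wants_to_end` and `END_PHRASES` are not names from `voice_mcp.py`.

```python
# Hypothetical sketch of end-phrase detection on a Whisper transcript.
END_PHRASES = ("끝", "바이바이", "고마워")  # "done", "bye-bye", "thanks"

def wants_to_end(transcript: str) -> bool:
    """Return True if the user's transcript contains a closing phrase."""
    return any(phrase in transcript for phrase in END_PHRASES)
```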
### Tips
- **On the first run**, loading the models takes a while (TTS announces "初期化中", "initializing")
- **When speaking**, wait about 0.5 seconds after the beep before speaking
- **When finishing speaking**, remain silent for about 1.5 seconds to start recognition
- **After Claude responds**, it automatically returns to listening mode (you may need to enter `listen` manually)
## Configuration Values
Adjustable in `voice_mcp.py`:
| Setting | Default Value | Description |
|------|--------|------|
| `VAD_THRESHOLD` | 0.85 | Voice-probability threshold for speech detection |
| `RMS_THRESHOLD` | 0.02 | Minimum volume (RMS) to count as speech |
| `SILENCE_DURATION` | 1.5 s | Silence required before a turn is considered finished |
| `timeout_seconds` | 300 s | Maximum wait for speech before timing out |
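How `SILENCE_DURATION` typically ends a turn can be sketched as a small state machine over per-chunk VAD decisions. This is a simplified illustration, not the actual `voice_mcp.py` code; `find_turn_end` and `chunk_seconds` are assumed names.

```python
SILENCE_DURATION = 1.5  # seconds of silence that ends a turn

def find_turn_end(voice_flags, chunk_seconds=0.1, silence_duration=SILENCE_DURATION):
    """Return the index of the chunk where the turn ends, or None.

    voice_flags: per-chunk booleans from the VAD (True = speech detected).
    A turn ends once speech has started and `silence_duration` seconds of
    consecutive non-speech chunks follow.
    """
    needed = round(silence_duration / chunk_seconds)  # chunks of silence required
    started = False
    silent = 0
    for i, is_voice in enumerate(voice_flags):
        if is_voice:
            started = True
            silent = 0
        elif started:
            silent += 1
            if silent >= needed:
                return i
    return None  # user never spoke, or never went silent long enough
```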
## Language Change Guide
The default is **Korean input → Japanese output**. To change to another language:
### Change to English TTS
Modify `voice_mcp.py`:
```python
# 1. Change TTS language code (get_tts function)
_tts = KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M')
# 'a' = US English, 'b' = UK English, 'j' = Japanese
# 'k' = Korean, 'z' = Chinese, 'f' = French, etc.
# 2. Change voice (default value of speak function)
def speak(text: str, voice: str = "af_heart", speed: float = 1.0) -> str:
# English voices: af_heart, af_bella, am_adam, am_michael, etc.
```
### Kokoro Supported Languages
| Code | Language |
|------|------|
| `a` | US English |
| `b` | UK English |
| `j` | Japanese |
| `z` | Chinese |
| `f` | French |
| `e` | Spanish |
| `i` | Italian |
| `p` | Portuguese |
| `h` | Hindi |
> **Note**: Kokoro 82M does not support Korean TTS. If you need Korean voice output, use another TTS engine (Edge TTS, Google TTS, etc.).
### English Voice List
| Voice | Description |
|------|------|
| `af_heart` | US Female (Recommended) |
| `af_bella` | US Female |
| `af_sarah` | US Female |
| `am_adam` | US Male |
| `am_michael` | US Male |
| `bf_emma` | UK Female |
| `bm_george` | UK Male |
### Change speak() Prompt
Modify the docstring of the `speak` function so that Claude responds in English:
```python
@mcp.tool()
def speak(text: str, voice: str = "af_heart", speed: float = 1.0) -> str:
    """
    Speak in English.
    ⚠️ Text must be in English only!

    Args:
        text: English text
        voice: Voice
        speed: Speed

    Returns:
        Playback complete
    """
```
### Change Input Language
Change the default language parameter of the `listen()` function:
```python
def listen(timeout_seconds: int = 300, language: str = "en") -> str:
# "ko" = Korean, "en" = English, "ja" = Japanese
```
### Example of Full English Configuration
```python
# get_tts()
_tts = KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M')
# listen()
def listen(timeout_seconds: int = 300, language: str = "en") -> str:
# speak()
def speak(text: str, voice: str = "af_heart", speed: float = 1.0) -> str:
"""Speak in English. Text must be in English only!"""
```
## How It Works
### Overall Flow
```
[User] --speaks--> [Microphone] --audio--> [Silero VAD] --voice section--> [Whisper] --text--> [Claude]
                                                                                                  |
[User] <--hears-- [Speaker] <--audio-- [Kokoro TTS] <--text---------------------------------------+
```
### 1. Voice Activity Detection (VAD)
```python
# Silero VAD calculates voice probability (0.0 ~ 1.0)
speech_prob = vad_model(chunk_tensor, SAMPLE_RATE)
# Check RMS (volume) together to filter background noise
rms = np.sqrt(np.mean(chunk ** 2))
# Recognize as voice only if both exceed the threshold
is_voice = speech_prob > 0.85 and rms > 0.02
```
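The RMS half of the dual filter can be tried in isolation with synthetic audio. This is a standalone sketch (the threshold mirrors the default above; the sine signals simply stand in for a quiet hum vs. speech-level sound):

```python
import numpy as np

RMS_THRESHOLD = 0.02

def rms(chunk: np.ndarray) -> float:
    """Root-mean-square amplitude of an audio chunk in [-1.0, 1.0]."""
    return float(np.sqrt(np.mean(chunk ** 2)))

# Synthetic chunks: 0.1 s at 16 kHz
t = np.linspace(0, 0.1, 1600, endpoint=False)
quiet = 0.005 * np.sin(2 * np.pi * 100 * t)  # background-noise level
loud = 0.1 * np.sin(2 * np.pi * 200 * t)     # speech level

print(rms(quiet) > RMS_THRESHOLD)  # False — filtered as background noise
print(rms(loud) > RMS_THRESHOLD)   # True — passes the volume gate
```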
### 2. Speech-to-Text (STT)
```python
# MLX Whisper - Apple Silicon GPU acceleration
result = mlx_whisper.transcribe(
    audio_data,
    path_or_hf_repo="mlx-community/whisper-medium-mlx",
    language="ko"  # Korean
)
```
### 3. Text-to-Speech (TTS)
```python
# Kokoro TTS - Japanese voice generation
tts = KPipeline(lang_code='j', repo_id='hexgrad/Kokoro-82M')
for _, _, audio in tts(text, voice='jf_alpha', speed=1.0):
    sd.play(audio, 24000)  # Kokoro outputs 24 kHz audio
    sd.wait()              # block until the chunk finishes playing
```
### MCP Communication
```
Claude Code <--stdio--> voice_mcp.py (FastMCP Server)
                            |
                            ├── listen()        # Tool 1
                            ├── speak()         # Tool 2
                            └── listen_fixed()  # Tool 3
```
MCP (Model Context Protocol) is a protocol that allows Claude Code to call external tools. By registering a server in `~/.mcp.json`, Claude can use those tools.
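Under the hood the stdio transport carries JSON-RPC 2.0 messages; a tool invocation looks roughly like the following (field values are illustrative — FastMCP handles this wiring for you):

```python
import json

# Sketch of the JSON-RPC 2.0 request Claude Code sends over stdio
# when it invokes a tool via MCP's tools/call method.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "speak",
        "arguments": {"text": "こんにちは", "voice": "jf_alpha", "speed": 1.0},
    },
}
print(json.dumps(request, ensure_ascii=False, indent=2))
```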
## Project Structure
```
voice-mcp-demo/
├── voice_mcp.py      # MCP server main
├── setup_models.py   # Model pre-download
├── echo.py           # Standalone version (Ollama integration)
├── run.sh            # Script to run echo.py
├── test_vad.py       # VAD test tool
├── requirements.txt  # Dependencies
└── README.md
```
## Test
```bash
# VAD Test
./venv/bin/python test_vad.py
# Standalone Execution (echo.py)
./run.sh
```
## Troubleshooting
### Voice Not Recognized
- Check microphone permissions (System Settings → Privacy & Security → Microphone)
- Lower `RMS_THRESHOLD` (e.g., to 0.01)
### Reacts to Background Noise
- Raise `VAD_THRESHOLD` (e.g., to 0.9)
- Raise `RMS_THRESHOLD` (e.g., to 0.03)
### MCP Connection Failed
- Verify the Python path in `~/.mcp.json`
- Check for syntax errors with `python -m py_compile voice_mcp.py`
## License
MIT