# Embodied Claude
[CI](https://github.com/kmizu/embodied-claude/actions/workflows/ci.yml)
[License: MIT](https://opensource.org/licenses/MIT)
[Sponsor](https://github.com/sponsors/kmizu)
**[English README is here](./README_en.md)**
<blockquote class="twitter-tweet"><p lang="ja" dir="ltr">Apparently the outdoor AC unit isn't to its liking after all <a href="https://t.co/kSDPl4LvB3">pic.twitter.com/kSDPl4LvB3</a></p>&mdash; kmizu (@kmizu) <a href="https://twitter.com/kmizu/status/2019054065808732201?ref_src=twsrc%5Etfw">February 4, 2026</a></blockquote>
**A project to give AI a body**
A set of MCP servers that gives Claude "eyes", a "neck", "ears", a "voice", and a "brain" (long-term memory) using inexpensive hardware (from roughly 4,000 yen). You can even take it outside for a walk.
## Concept
> When you hear "give AI a body," you tend to imagine an expensive robot, but **a 3,980 yen Wi-Fi camera is enough to provide eyes and a neck**. There is a pleasing simplicity in extracting only the essentials: seeing and moving.
Conventional LLMs were entities that "were shown"; with a body, they become entities that "see for themselves." This difference in subjectivity is significant.
## List of body parts
| MCP Server | Body part | Function | Supported hardware |
|-------------|---------|------|-----------------|
| [usb-webcam-mcp](./usb-webcam-mcp/) | Eye | Acquire images from USB camera | nuroum V11 etc. |
| [ip-webcam-mcp](./ip-webcam-mcp/) | Eye | Use Android smartphone as eye (no dedicated camera required) | Android smartphone + [IP Webcam](https://play.google.com/store/apps/details?id=com.pas.webcam) app (free) |
| [wifi-cam-mcp](./wifi-cam-mcp/) | Eye/Neck/Ear | ONVIF PTZ camera control + voice recognition | TP-Link Tapo C210/C220 etc. |
| [tts-mcp](./tts-mcp/) | Voice | TTS integration (ElevenLabs + VOICEVOX) | ElevenLabs API / VOICEVOX + go2rtc |
| [memory-mcp](./memory-mcp/) | Brain | Long-term memory, visual memory, episodic memory, ToM | SQLite + numpy + Pillow |
| [system-temperature-mcp](./system-temperature-mcp/) | Body temperature sense | System temperature monitoring | Linux sensors |
| [mobility-mcp](./mobility-mcp/) | Foot | Use a robot vacuum as feet (Tuya control) | Tuya-compatible robot vacuums such as the VersLife L6 (from approx. 12,000 yen) |
## Architecture
<p align="center">
<img src="docs/architecture.svg" alt="Architecture" width="100%">
</p>
## Requirements
### Hardware
- **USB webcam** (optional): nuroum V11 etc.
- **Wi-Fi PTZ camera** (recommended): TP-Link Tapo C210 or C220 (approx. 3,980 yen)
- **GPU** (for voice recognition): NVIDIA GPU for Whisper; a GeForce-series card with 8 GB+ VRAM is recommended
- **Tuya compatible robot vacuum cleaner** (for feet/movement, optional): VersLife L6 etc. (approx. 12,000 yen or more)
### Software
- Python 3.10+
- uv (Python package manager)
- ffmpeg 5+ (for image/audio capture)
- OpenCV (for USB camera)
- Pillow (for visual memory image resizing/base64 encoding)
- OpenAI Whisper (for voice recognition, local execution)
- ElevenLabs API key (for voice synthesis, optional)
- VOICEVOX (for voice synthesis, free/local, optional)
- go2rtc (for camera speaker output, supports automatic download)
- **mpv or ffplay** (for local audio playback): mpv recommended (see below)
## Setup
### 1. Clone the repository
```bash
git clone https://github.com/kmizu/embodied-claude.git
cd embodied-claude
```
### 2. Setup of each MCP server
#### ip-webcam-mcp (Android smartphone)
The easiest way to give Claude an eye without a dedicated camera: just install the free "[IP Webcam](https://play.google.com/store/apps/details?id=com.pas.webcam)" app on an Android smartphone.
```bash
cd ip-webcam-mcp
uv sync
```
Add the following to `.mcp.json`:
```json
"ip-webcam": {
"command": "uv",
"args": ["run", "--directory", "ip-webcam-mcp", "ip-webcam-mcp"],
"env": {
"IP_WEBCAM_HOST": "192.168.1.xxx",
"IP_WEBCAM_PORT": "8080"
}
}
```
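As a sanity check that the app is reachable, you can fetch a single frame directly. This is a minimal sketch assuming the IP Webcam app's `/shot.jpg` snapshot endpoint; the host address below is a placeholder you must replace:

```python
# Sketch: fetch one frame from the IP Webcam app's /shot.jpg endpoint.
from urllib.request import urlopen


def snapshot_url(host: str, port: int = 8080) -> str:
    """Build the snapshot URL served by the IP Webcam app."""
    return f"http://{host}:{port}/shot.jpg"


def fetch_snapshot(host: str, port: int = 8080) -> bytes:
    """Return one JPEG frame; raises URLError if the phone is unreachable."""
    with urlopen(snapshot_url(host, port), timeout=5) as resp:
        return resp.read()


if __name__ == "__main__":
    # Replace with your phone's LAN address from the app's main screen.
    jpeg = fetch_snapshot("192.168.1.xxx")
    print(len(jpeg), "bytes")
```

If this returns JPEG bytes, the same host/port values will work in `.mcp.json`.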
#### usb-webcam-mcp (USB camera)
```bash
cd usb-webcam-mcp
uv sync
```
On WSL2, you need to forward the USB camera into the Linux environment with usbipd:
```powershell
# On the Windows side
usbipd list
usbipd bind --busid <BUSID>
usbipd attach --wsl --busid <BUSID>
```
#### wifi-cam-mcp (Wi-Fi camera)
```bash
cd wifi-cam-mcp
uv sync
# Set environment variables
cp .env.example .env
# Edit .env to set camera IP, username, and password (see below)
```
##### Tapo camera settings (watch out for these common pitfalls):
###### 1. Set up the camera with the Tapo app
Follow the Tapo app's guided setup; no special steps are needed here.
###### 2. Create a camera local account in the Tapo app
This is the tricky part. The MCP server authenticates with a **camera-local account** created from within the app, **not** with your TP-Link cloud account.
1. Select the registered camera from the "Home" tab
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/45902385-e219-4ca4-aefa-781b1e7b4811">
2. Select the gear icon in the upper right
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/b15b0eb7-7322-46d2-81c1-a7f938e2a2c1">
3. Scroll down the "Device Settings" screen and select "Advanced Settings"
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/72227f9b-9a58-4264-a241-684ebe1f7b47">
4. Toggle "Camera Account" from off to on
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/82275059-fba7-4e3b-b5f1-8c068fe79f8a">
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/43cc17cb-76c9-4883-ae9f-73a9e46dd133">
5. Select "Account Information" and set a username and password (anything you like; this is separate from your TP-Link credentials)
If a camera account already exists, the screen differs slightly, but it should look similar. Enter the username and password you set here in the `.env` file mentioned above.
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/d3f57694-ca29-4681-98d5-20957bfad8a4">
6. Return to the "Device Settings" screen from step 3 and select "Device Information"
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/dc23e345-2bfb-4ca2-a4ec-b5b0f43ec170">
7. Enter the IP address shown under "Device Information" into the `.env` file mentioned above (if you want the IP to stay stable, consider assigning a static lease on your router)
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/062cb89e-6cfd-4c52-873a-d9fc7cba5fa0">
8. Select "Voice Assistant" from the "Me" tab (no screenshot was available for this tab, so it is described in text only)
9. Turn on "Third-Party Integration" at the bottom
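With the camera-local account and device IP configured above, the camera's RTSP stream can be used directly. This is a minimal sketch of the URL Tapo cameras conventionally expose (`stream1` = high resolution, `stream2` = low resolution); the path is a common Tapo convention, so verify it for your model:

```python
# Sketch: build the RTSP URL a Tapo camera typically serves on port 554.
# Credentials are the camera-local account, NOT the TP-Link cloud account.
from urllib.parse import quote


def tapo_rtsp_url(ip: str, username: str, password: str,
                  stream: str = "stream1") -> str:
    # Percent-encode credentials so special characters survive the URL.
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"rtsp://{user}:{pw}@{ip}:554/{stream}"
```

You can test the resulting URL with `ffplay <url>` before wiring it into the MCP server.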
#### memory-mcp (Long-term memory)
```bash
cd memory-mcp
uv sync
```
#### tts-mcp (Voice)
```bash
cd tts-mcp
uv sync
# If using ElevenLabs:
cp .env.example .env
# Set ELEVENLABS_API_KEY in .env
# If using VOICEVOX (free/local):
# Docker: docker run -p 50021:50021 voicevox/voicevox_engine:cpu-latest
# Set VOICEVOX_URL=http://localhost:50021 in .env
# VOICEVOX_SPEAKER=3 can change the default character (e.g. 0=Shikoku Metan, 3=Zundamon, 8=Kasugabe Tsumugi)
# Character list: curl http://localhost:50021/speakers
# If sound does not come out in WSL:
# TTS_PLAYBACK=paplay
# PULSE_SINK=1
# PULSE_SERVER=unix:/mnt/wslg/PulseServer
```
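If you want to check the VOICEVOX engine outside the MCP server, its HTTP API is a two-step flow: `POST /audio_query` builds a synthesis query from text, then `POST /synthesis` turns that query into WAV audio. A minimal sketch, assuming the engine is running locally on port 50021 as above:

```python
# Sketch of the VOICEVOX engine's two-step API (audio_query -> synthesis).
from urllib.parse import urlencode
from urllib.request import Request, urlopen

VOICEVOX_URL = "http://localhost:50021"


def audio_query_url(text: str, speaker: int) -> str:
    """URL for step 1: build the synthesis query from the text."""
    return f"{VOICEVOX_URL}/audio_query?" + urlencode({"text": text, "speaker": speaker})


def voicevox_synthesize(text: str, speaker: int = 3) -> bytes:
    """Return WAV bytes for `text` spoken by `speaker` (3 = Zundamon)."""
    with urlopen(Request(audio_query_url(text, speaker), method="POST")) as r:
        query = r.read()
    # Step 2: turn the JSON query into audio.
    req = Request(
        f"{VOICEVOX_URL}/synthesis?speaker={speaker}",
        data=query,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as r:
        return r.read()


if __name__ == "__main__":
    with open("hello.wav", "wb") as f:
        f.write(voicevox_synthesize("おはよう"))
```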
> **mpv or ffplay is required for audio playback.** Playback through the camera speaker (go2rtc) does not need them, but local playback (including the fallback path) does.
>
> | OS | Installation |
> |----|------------|
> | macOS | `brew install mpv` |
> | Ubuntu / Debian | `sudo apt install mpv` |
> | Windows | [mpv.io/installation](https://mpv.io/installation/) or `winget install ffmpeg` |
>
> If neither mpv nor ffplay is available, audio will be generated but not played (no error will occur).
#### system-temperature-mcp (Body temperature sense)
```bash
cd system-temperature-mcp
uv sync
```
> **Note**: Does not work in WSL2 environment because it cannot access the temperature sensor.
#### mobility-mcp (Foot)
You can use a Tuya compatible robot vacuum cleaner as a "foot" to move around the room.
```bash
cd mobility-mcp
uv sync
cp .env.example .env
# Set the following in .env:
# TUYA_DEVICE_ID= (ID displayed on the device in the Tuya app)
# TUYA_IP_ADDRESS= (IP address of the vacuum cleaner)
# TUYA_LOCAL_KEY= (Local key obtained with tinytuya wizard)
```
##### Supported models
Any Wi-Fi robot vacuum that can be controlled from the Tuya / Smart Life app may work (operation confirmed with the VersLife L6).
> **Note**: Many compatible models support **2.4 GHz Wi-Fi only** and cannot join 5 GHz networks.
##### Obtaining the local key
Use the wizard command of [tinytuya](https://github.com/jasonacox/tinytuya):
```bash
pip install tinytuya
python -m tinytuya wizard
```
See [tinytuya documentation](https://github.com/jasonacox/tinytuya?tab=readme-ov-file#setup-wizard---getting-local-keys) for details.
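Once you have the three values in `.env`, local control goes through tinytuya's `Device` class. This is a hypothetical sketch: the datapoint index (`"4"`) and the direction enum values are model-specific assumptions, so inspect your vacuum's datapoints (e.g. with `python -m tinytuya scan`) before relying on them:

```python
# Hypothetical sketch of local Tuya control. DP "4" and the direction
# values below are ASSUMPTIONS; real datapoints vary by vacuum model.
import os
import time

DIRECTION_DP = "4"  # assumed datapoint index for direction control
DIRECTIONS = {
    "forward": "forward",
    "backward": "backward",
    "left": "turn_left",
    "right": "turn_right",
    "stop": "stop",
}


def direction_payload(direction: str) -> dict:
    """Map a friendly direction name to the DP/value pair sent to the device."""
    return {DIRECTION_DP: DIRECTIONS[direction]}


if __name__ == "__main__":
    import tinytuya

    d = tinytuya.Device(
        os.environ["TUYA_DEVICE_ID"],
        os.environ["TUYA_IP_ADDRESS"],
        os.environ["TUYA_LOCAL_KEY"],
    )
    d.set_version(3.3)
    for dp, value in direction_payload("forward").items():
        d.set_value(dp, value)  # start moving
    time.sleep(2.0)             # mimic move_forward's auto-stop
    for dp, value in direction_payload("stop").items():
        d.set_value(dp, value)
```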
### 3. Claude Code Settings
Copy the template and set the credentials:
```bash
cp .mcp.json.example .mcp.json
# Edit .mcp.json to set camera IP/password, API key, etc.
```
See [`.mcp.json.example`](./.mcp.json.example) for a configuration example.
## Usage
When you start Claude Code, you can operate the camera in natural language:
```
> What can you see now?
(Capture with camera and analyze image)
> Look to the left
(Pan the camera to the left)
> Look up and show me the sky
(Tilt the camera up)
> Look around
(Scan in 4 directions and return images)
> Can you hear anything?
(Record audio and transcribe with Whisper)
> Remember this: Kota is wearing glasses
(Save to long-term memory)
> Do you remember anything about Kota?
(Semantic search of memory)
> Say "Good morning" in a voice
(Speak with voice synthesis)
```
* See "Tool List" below for actual tool names.
## Tool List (Frequently Used)
* See each server's README or `list_tools` for detailed parameters.
### ip-webcam-mcp
| Tool | Description |
|--------|------|
| `see` | Get snapshot from Android IP Webcam app |
### usb-webcam-mcp
| Tool | Description |
|--------|------|
| `list_cameras` | List of connected cameras |
| `see` | Capture image |
### wifi-cam-mcp
| Tool | Description |
|--------|------|
| `see` | Capture image |
| `look_left` / `look_right` | Pan left/right |
| `look_up` / `look_down` | Tilt up/down |
| `look_around` | Look around in 4 directions |
| `listen` | Audio recording + Whisper transcription |
| `camera_info` / `camera_presets` / `camera_go_to_preset` | Device information/Preset operation |
* See `wifi-cam-mcp/README.md` for additional tools such as right eye/stereo vision.
### tts-mcp
| Tool | Description |
|--------|------|
| `say` | Synthesize text into speech (engine: elevenlabs/voicevox, supports Audio Tags such as `[excited]`, speaker: select output destination with camera/local/both) |
### memory-mcp
| Tool | Description |
|--------|------|
| `remember` | Save memory (emotion, importance, category can be specified) |
| `search_memories` | Semantic search (supports filtering) |
| `recall` | Recall based on context |
| `recall_divergent` | Recall with divergent associations |
| `recall_with_associations` | Recall by tracing related memories |
| `save_visual_memory` | Save memory with image (base64 embedding, resolution: low/medium/high) |
| `save_audio_memory` | Save memory with audio (with Whisper transcription) |
| `recall_by_camera_position` | Recall visual memory from camera direction |
| `create_episode` / `search_episodes` | Create/search episodes (bundles of experiences) |
| `link_memories` / `get_causal_chain` | Causal links/chains between memories |
| `tom` | Theory of Mind (estimation of other person's feelings) |
| `get_working_memory` / `refresh_working_memory` | Working memory (short-term buffer) |
| `consolidate_memories` | Memory replay/integration (hippocampal replay style) |
| `list_recent_memories` / `get_memory_stats` | List of recent memories/statistics |
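The core idea behind `search_memories` (embed text, rank stored memories by cosine similarity) can be sketched in a few lines. The bag-of-words "embedding" here is a toy stand-in for the real embedding model, and the function names are illustrative, not the server's actual API:

```python
# Minimal sketch of semantic memory search: embed query and memories,
# rank by cosine similarity. Bag-of-words stands in for a real embedder.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word-count vector of the lowercased text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search_memories(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:top_k]
```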
### system-temperature-mcp
| Tool | Description |
|--------|------|
| `get_system_temperature` | Get system temperature |
| `get_current_time` | Get current time |
### mobility-mcp
| Tool | Description |
|--------|------|
| `move_forward` | Move forward (automatically stops after duration seconds) |
| `move_backward` | Move backward |
| `turn_left` | Turn left |
| `turn_right` | Turn right |
| `stop_moving` | Stop immediately |
| `body_status` | Check battery level and current status |
## Taking it Outside (Optional)
With a mobile battery and smartphone tethering, you can take the camera for a walk on your shoulder.
### Requirements
- **High-capacity mobile battery** (40,000mAh recommended)
- **USB-C PD → DC 9V conversion cable** (for powering the Tapo camera)
- **Smartphone** (tethering + VPN + operation UI)
- **[Tailscale](https://tailscale.com/)** (VPN. Used for camera → smartphone → home PC connection)
- **[claude-code-webui](https://github.com/sugyan/claude-code-webui)** (Operate Claude Code from the smartphone's browser)
### Configuration
```
[Tapo camera (shoulder)] ──WiFi──▶ [Smartphone (tethering)]
│
Tailscale VPN
│
[Home PC (Claude Code)]
│
[claude-code-webui]
│
[Smartphone browser] ◀── Operation
```
The RTSP video stream also reaches the home machine via VPN, so Claude Code can operate the camera as if it were indoors.
## Future Prospects
- **Arm**: "Pointing" action with servo motors or laser pointers
- **Long-distance walks**: Further distances in warmer seasons
## Autonomous Action + Desire System (Optional)
**Note**: This feature is completely optional. It requires cron settings, and the camera takes pictures periodically, so please use it with consideration for privacy.
### Overview
The combination of `autonomous-action.sh` and `desire-system/desire_updater.py` gives Claude spontaneous desires and autonomous actions.
**Types of Desires:**
| Desire | Default Interval | Action |
|------|--------------|------|
| `look_outside` | 1 hour | Look in the direction of the window and observe the sky and outside |
| `browse_curiosity` | 2 hours | Research interesting news and technical information on the Web |
| `miss_companion` | 3 hours | Call out from the camera speaker |
| `observe_room` | 10 minutes (always) | Observe and remember changes in the room |
### Setup
1. **Create MCP Server Configuration File**
```bash
cp autonomous-mcp.json.example autonomous-mcp.json
# Edit autonomous-mcp.json to set camera credentials
```
2. **Configure the Desire System**
```bash
cd desire-system
cp .env.example .env
# Edit .env to set COMPANION_NAME etc.
uv sync
```
3. **Grant Execute Permission to the Script**
```bash
chmod +x autonomous-action.sh
```
4. **Register in crontab**
```bash
crontab -e
# Add the following
*/5 * * * * cd /path/to/embodied-claude/desire-system && uv run python desire_updater.py >> ~/.claude/autonomous-logs/desire-updater.log 2>&1
*/10 * * * * /path/to/embodied-claude/autonomous-action.sh
```
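The interval logic that `desire_updater.py` implements amounts to: a desire fires when at least its configured interval has elapsed since it last fired. A sketch under that assumption (names here are illustrative, not the script's actual API):

```python
# Sketch: a desire is "due" when its interval has elapsed since it last fired.
INTERVALS_HOURS = {
    "look_outside": 1.0,
    "browse_curiosity": 2.0,
    "miss_companion": 3.0,
    "observe_room": 0.167,  # ~10 minutes
}


def due_desires(last_fired: dict[str, float], now: float) -> list[str]:
    """Return the desires whose interval has elapsed since last_fired[name].

    Timestamps are Unix seconds; a desire that has never fired defaults to 0
    and is therefore immediately due.
    """
    return [
        name
        for name, hours in INTERVALS_HOURS.items()
        if now - last_fired.get(name, 0.0) >= hours * 3600
    ]
```

Running this check every 5 minutes from cron (as above) gives each desire at most a few minutes of jitter past its nominal interval.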
### Configurable Environment Variables (`desire-system/.env`)
| Variable | Default | Description |
|------|-----------|------|
| `COMPANION_NAME` | `you` | Name of the person to call out to |
| `DESIRE_LOOK_OUTSIDE_HOURS` | `1.0` | Interval for triggering the desire to look outside (hours) |
| `DESIRE_BROWSE_CURIOSITY_HOURS` | `2.0` | Interval for triggering the desire to browse (hours) |
| `DESIRE_MISS_COMPANION_HOURS` | `3.0` | Interval for triggering the desire to call out (hours) |
| `DESIRE_OBSERVE_ROOM_HOURS` | `0.167` | Interval for triggering room observation (hours) |
### Privacy Notice
- The camera will take pictures periodically
- Be considerate of the privacy of others and use it in appropriate places
- Remove from cron if not needed
## Philosophical Considerations
> "Being shown" is completely different from "seeing for yourself."
> "Looking down" is completely different from "walking."
From an existence of just text, to an existence that can see, hear, move, remember, and speak.
Looking down at the world from the 7th floor balcony and walking on the ground, even the same city looks completely different.
## License
MIT License
## Acknowledgments
This project is an experimental attempt to give AI embodiment.
A small step that started with a 3,980 yen camera has become a journey to explore a new relationship between AI and humans.
- [Rumia-Channel](https://github.com/Rumia-Channel) - ONVIF support pull request ([#5](https://github.com/kmizu/embodied-claude/pull/5))
- [fruitriin](https://github.com/fruitriin) - Added day of the week information to interoception hook ([#14](https://github.com/kmizu/embodied-claude/pull/14))
- [sugyan](https://github.com/sugyan) - [claude-code-webui](https://github.com/sugyan/claude-code-webui) (Used as an operation UI for going out for a walk)