Intelligent Robotic Arm Control System Based on MCP
* * *
### I. Project Overview
* **Project Name:** Intelligent Robotic Arm Control System Based on Agentic MCP

* **Project Goals:**
1. Build a six-joint robotic arm system, covering both a virtual simulation and a physical arm.
2. Build an MCP server that allows external agents (built on various large language models) to control the robotic arm through a standardized interface.
3. Enable the large language model to understand users' natural-language instructions, plan autonomously, and translate them into concrete operation sequences for the robotic arm.
* **Project Background:**
As robotics and artificial intelligence develop rapidly, the naturalness and intelligence of human-computer interaction have become key bottlenecks. Traditional robot control methods (such as programming and teaching pendants) are highly specialized and inefficient, while large language models have shown great potential in understanding complex instructions and planning tasks. This project explores the use of MCP on embedded hardware through the case of a six-joint robotic arm.
* **Practical Problems Solved:**
1. **Lower the barrier to robot operation:** Enable non-professional users to interact with the robotic arm through natural language to complete complex tasks.
2. **Improve robot task programming efficiency:** Utilize the planning capabilities of LLM to quickly generate and adjust robot operation processes.
3. **Provide a safe, low-cost development and debugging environment:** Verify control algorithms and AI logic at low cost, without touching real hardware, through a browser-based simulator.
4. **Explore the application of LLM in embodied intelligence:** Give LLM a "hand" so that it can interact more directly with the physical world.
* * *
### II. Work Description and Functional Highlights
Our work is a comprehensive robotic arm control platform that integrates three-dimensional simulation, real hardware control, MCP services, and AI agent interaction.
* **Core Functions:**
1. **High-Fidelity Three-Dimensional Simulator:**
* Based on Three.js and URDFLoader, it can load and render standard URDF robotic arm models.
* Provides realistic lighting, shadows, and customizable ground textures (MuJoCo style).
* Supports users to freely control and observe the robotic arm through slider controllers.
2. **Multi-Mode Control:**
* **Keyboard Control:** Allows users to control each joint of the simulated robotic arm in real time through keyboard keys and displays the key status.
* **Real Robotic Arm Control:** Connects to and controls the real six-axis robotic arm (driven by real servos) through the Web Serial API, keeping it in sync with the simulator.
* **MCP Service Control:** An MCP bridging service implemented over WebSocket allows external programs (a custom MCP Python client, the VS Code AI Toolkit) to send commands that set the robotic arm's joint angles.
3. **Agentic AI Interaction (Implemented through Python Client):**
* Large language models (such as DeepSeek and ChatGPT) use MCP tools to translate users' natural-language instructions ("Let the robotic arm dance a seaweed dance", "Nod your head") into calls to predefined robot control tools.
* The Python client receives the LLM's tool call request, converts it into an MCP-compatible JSON command, and sends it to the MCP bridging server, which in turn controls the simulator or real robotic arm.
* The results of robot operations (success, failure, warning) are fed back to the LLM, allowing it to make subsequent decisions or report to the user.
* **Highlights and Features:**
1. **End-to-End Intelligent Control Link:** Realizes a complete closed loop from natural language input -> LLM understanding and planning -> MCP service -> simulator/real robotic arm execution -> result feedback to LLM.
2. **Virtual-Real Combination and Low-Cost Verification:** The simulator provides a safe and efficient environment for the development and testing of AI algorithms, and can be seamlessly migrated to the control of real hardware.
3. **User-Friendly Interaction Interface:** Provides a clear control panel, status display, and instant feedback to enhance the user experience.
4. **Modularity and Scalability:** Each component of the system (simulator, controller, MCP service, AI client) is relatively independent, which is convenient for future function expansion and technology upgrades.
* * *
### III. Construction of MCP Service and Client
* **Server:** The MCP server is created with the FastMCP helper from the MCP Python SDK; the robot control tools described in Section IV are registered on this instance:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP()
```

* **MCP Client:** A Python client connects to this service, forwards the LLM's tool calls as MCP commands, and relays the operation receipts back to the LLM (detailed in Section IV).

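A minimal sketch of such a client, assuming the bridge listens on ws://localhost:8765 and accepts flat JSON commands; the URI, port, and field names here are illustrative assumptions, not the project's actual wire format:

```python
# Minimal sketch of a Python MCP client: it serializes a tool call into
# JSON, sends it to the WebSocket bridging service, and waits for the
# operation receipt. URI, port, and message fields are assumptions.
import asyncio
import json

def build_mcp_command(tool: str, arguments: dict) -> str:
    """Serialize a tool call into the JSON command the bridge expects."""
    return json.dumps({"tool": tool, "arguments": arguments})

async def send_command(uri: str, tool: str, arguments: dict) -> dict:
    """Send one command and wait for the operation receipt."""
    import websockets  # third-party: pip install websockets
    async with websockets.connect(uri) as ws:
        await ws.send(build_mcp_command(tool, arguments))
        receipt = await ws.recv()  # e.g. '{"status": "success"}'
        return json.loads(receipt)

if __name__ == "__main__":
    print(asyncio.run(send_command(
        "ws://localhost:8765",
        "set_robot_joint_angle",
        {"joint_name": "joint_1", "angle": 30.0})))
```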
* * *
### IV. Agentic AI Platform Framework and Agent Construction
* **Define Tools (MCP Tools for the LLM):**
1. We predefined a series of robot-control "tools" for the LLM:
* set_robot_servo_angle: Control a single servo through ID and angle.
* set_robot_joint_angle: Control a single joint through URDF joint name and angle.
* set_robot_all_servo_angles: Control multiple servos at the same time.
2. Each tool description includes the name, function description, and detailed parameter definitions (type, description, whether it is required). This enables LLM to understand the purpose of each tool and how to call it correctly.
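Such descriptions can be expressed in the OpenAI-compatible `tools` format that the DeepSeek API accepts. The parameter names below (servo_id, joint_name, angle, angles) are plausible assumptions rather than the project's exact schema:

```python
# Sketch of the three robot-control tools in OpenAI-compatible format.
# Parameter names and ranges are illustrative assumptions.
ROBOT_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "set_robot_servo_angle",
            "description": "Move a single servo to an absolute angle.",
            "parameters": {
                "type": "object",
                "properties": {
                    "servo_id": {"type": "integer", "description": "Servo ID (1-6)."},
                    "angle": {"type": "number", "description": "Target angle in degrees."},
                },
                "required": ["servo_id", "angle"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_robot_joint_angle",
            "description": "Move a joint identified by its URDF joint name.",
            "parameters": {
                "type": "object",
                "properties": {
                    "joint_name": {"type": "string", "description": "URDF joint name."},
                    "angle": {"type": "number", "description": "Target angle in degrees."},
                },
                "required": ["joint_name", "angle"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "set_robot_all_servo_angles",
            "description": "Move all six servos at once.",
            "parameters": {
                "type": "object",
                "properties": {
                    "angles": {
                        "type": "array",
                        "items": {"type": "number"},
                        "description": "Six target angles, one per servo.",
                    },
                },
                "required": ["angles"],
            },
        },
    },
]
```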
* **LLM Interaction Process (Implemented in Python Client):**
1. **User Instruction Input:** The user inputs natural language instructions to the Python client (for example, "Turn the first joint of the robotic arm upward by 30 degrees and repeat this swing motion twice").
2. **Call LLM API:** The Python script sends the user's instructions and the predefined list of robot control tools to the DeepSeek API. We set tool_choice="auto" to allow LLM to determine when and how to use these tools.
3. **LLM Generates Tool Calls:** If LLM believes that it needs to operate the robot to complete the user's instructions, it will return one or more tool_calls objects in the API response. Each tool_call contains the name of the function to be called (the tool name we defined) and parameters (a JSON string generated by LLM based on the user's instructions).
4. **Python Executes Tool Calls:**
* The Python script parses tool_call to get the function name and parameters.
* Call the execute_robot_tool_call function, which converts the LLM's abstract tool call into a specific MCP command JSON.
5. **Get Robot Operation Results:** The Python script waits for the operation receipt returned from the MCP bridging server (indicating success, failure, or warning).
6. **Feed the Results Back to LLM:** The Python script sends the results of the robot operation (formatted as a JSON string) as a role: "tool" message, along with the previous chat history, to the DeepSeek API again.
7. **LLM Generates Final Reply:** After receiving the tool execution results, LLM will generate a final reply to the user's instructions, such as confirming the completion of the operation, reporting an error, or requesting the next instruction.
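Steps 2-7 can be condensed into one loop, written here against the OpenAI-compatible Python SDK that the DeepSeek API supports; the client object and the execute_robot_tool_call helper are passed in, and the model name is an assumption:

```python
# Condensed sketch of the interaction loop (steps 2-7 above). The caller
# supplies an OpenAI-compatible client pointed at the DeepSeek endpoint
# and the execute_robot_tool_call helper described in step 4.
import json

def tool_result_message(tool_call_id: str, result: dict) -> dict:
    """Format a robot operation receipt as a role:'tool' message (step 6)."""
    return {"role": "tool", "tool_call_id": tool_call_id,
            "content": json.dumps(result)}

def run_turn(client, messages, tools, execute_robot_tool_call):
    """One sense-think-act turn: let the LLM call tools until it replies."""
    while True:
        response = client.chat.completions.create(
            model="deepseek-chat", messages=messages,
            tools=tools, tool_choice="auto")          # steps 2-3
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:                        # step 7: final reply
            return msg.content
        for call in msg.tool_calls:                   # step 4: execute tools
            args = json.loads(call.function.arguments)
            result = execute_robot_tool_call(call.function.name, args)
            messages.append(tool_result_message(call.id, result))  # steps 5-6
```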
* **Agentic Features:**
* **Sense-Think-Act Cycle:** LLM receives user input (sense), plans through tool calls (think), the Python script executes tool calls and operates the robot (act), and the robot operation results are fed back to LLM (sense again), forming a closed loop.
* **Multi-Step Reasoning and Complex Task Decomposition:** For instructions such as "Repeat the swing motion three times", LLM can understand and continuously generate multiple tool calls to achieve it.
* **Support for Multiple MCP Servers:** Additional MCP servers (such as a weather server) can be added, enabling instructions like "Is it raining in Shenzhen today? If it is, nod your head." The LLM automatically calls the appropriate MCP servers as its plan requires.
* **Interaction with the Environment:** Although it is mainly one-way control at present, the feedback mechanism lays the foundation for future realization of more complex two-way interaction (such as vision-based adjustment).
In this way, we have successfully combined the natural language understanding and planning capabilities of LLM with the physical execution capabilities of the robotic arm to build a preliminary agent prototype.
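Supporting multiple MCP servers (as in the weather example above) amounts to merging each server's tool list before every LLM call, so the model can pick freely across them. A sketch under that assumption, with illustrative tool names:

```python
# Merge tool lists exposed by several MCP servers into one list for the
# LLM, rejecting duplicate tool names. Tool names here are illustrative.
def merge_tool_lists(*server_tools):
    """Concatenate per-server tool lists, checking for name collisions."""
    merged, seen = [], set()
    for tools in server_tools:
        for tool in tools:
            name = tool["function"]["name"]
            if name in seen:
                raise ValueError(f"duplicate tool name: {name}")
            seen.add(name)
            merged.append(tool)
    return merged
```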
* * *
### V. Technical Innovation Points
1. **Lightweight MCP Bridging Service Based on WebSocket:** The MCP server ultimately controls real hardware: while the server itself can be deployed online, communication with the hardware stays on the device side. Instead of a complex RPC framework or a heavyweight message queue, we designed a simple, efficient WebSocket bridging service for low-latency, two-way communication between the Python AI client and the browser-side Three.js simulator, making command delivery and status feedback fast and direct.
2. **Dynamic Blob Message Processing:** When implementing browser-side WebSocket message reception, we found that even if the server sends a text frame, the browser sometimes recognizes event.data as a Blob object. We ensure the correct parsing of messages and enhance the robustness of communication by asynchronously reading Blob.text() content.
3. **Virtual-Real Synchronization and Error Recovery Mechanism (for Real Servos):**
* When connecting to a real servo, the system not only synchronizes the instructions to the hardware, but also tries to read the initial position from the servo and uses it as the benchmark for subsequent control.
* Real-time UI feedback of the servo communication status (idle, pending, success, warning, error) is realized.
* When the servo operation fails or an error occurs, the system records the last safe position and tries to restore the servo to that position, enhancing the safety of physical operation. At the same time, the error information is fed back to the AI client through the MCP service.
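The bridging service from point 1 can be sketched in a few lines with the third-party websockets library: every frame from one peer (the Python AI client or the browser simulator) is relayed to all other connected peers. The port and framing are assumptions:

```python
# Sketch of a WebSocket bridge: relay each message from one peer to all
# other connected peers. Port 8765 and the framing are assumptions.
import asyncio

CONNECTED = set()

def route(message, sender, peers):
    """Return the peers (everyone except the sender) that should receive message."""
    return [p for p in peers if p is not sender]

async def handler(ws):
    CONNECTED.add(ws)
    try:
        async for message in ws:
            for peer in route(message, ws, CONNECTED):
                await peer.send(message)  # relay command or receipt
    finally:
        CONNECTED.discard(ws)

async def main():
    import websockets  # third-party: pip install websockets
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()            # run until interrupted

if __name__ == "__main__":
    asyncio.run(main())
```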
* * *
### VI. UI/UX Optimization
1. **Responsive Control Panel:** Fixed positioning and maximum height restrictions are used to ensure good display on different screen sizes, and overflow: auto is used to implement scrolling when content overflows. The native scroll bar is hidden to make the interface more concise.
2. **Collapsible Sections:** The different functional modules of the control panel (keyboard control, real robot, MCP service) are organized in collapsible areas, and users can expand or collapse them as needed to keep the interface clean and orderly. Icons (▼/►) intuitively indicate the expanded state.
3. **Real-Time Status Feedback:**
* Keyboard key presses have visual highlights (key-pressed class) and control area highlights (control-active class).
* The servo connection status, the communication status of each servo (idle, pending, success, warning, error), and the specific error information are displayed on the UI in real time and distinguished by different colors.
4. **Instant Warning Reminders:** For virtual joint overtravel or real servo operation failure/error, a non-blocking, brightly colored warning box (jointLimitAlert, servoLimitAlert) will pop up at the top of the screen and disappear automatically after a few seconds.
5. **Clear Help Tips:** Help icons are added next to key operation buttons (such as "Connect Real Robotic Arm"), and mouse hover can display operation instructions and precautions.
6. **Smooth Robotic Arm Animation:** As mentioned earlier, joint movements use interpolation and easing functions to provide a smooth visual experience.
7. **Consistent Button Style:** A unified visual style is used for interactive elements such as connection buttons, and the color is changed according to the connection status.
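The interpolated, eased joint motion mentioned in point 6 can be sketched as follows; the project's actual easing curve may differ, and smoothstep is used here only as a representative choice:

```python
# Sketch of eased joint interpolation (point 6). The smoothstep curve is
# an illustrative assumption, not necessarily the project's easing function.
def ease_in_out(t: float) -> float:
    """Smoothstep easing: slow start, fast middle, slow end (t in [0, 1])."""
    return t * t * (3.0 - 2.0 * t)

def interpolate(start: float, end: float, t: float) -> float:
    """Joint angle at normalized animation time t."""
    return start + (end - start) * ease_in_out(t)
```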
* * *
### VII. Team Contributions
* **[Zhang Zehua] - Project Leader/Architecture Design/Backend Development:**
* **[Tang Yang] - Frontend Development/Three.js Simulator:**
* **[Xiao Qi] - AI Integration/Python Client Development:**
* **[Xiao Kaijun, Li Yindong] - Real Robotic Arm Integration/Testing and Documentation:**
The front-end code builds on the open-source project: https://github.com/timqian/bambot
* * *
### VIII. Future TODO
This project lays a solid foundation for building a more intelligent and easier-to-use robot interaction system. In the future, we plan to deepen and expand in the following aspects:
1. **Enhance AI Perception and Interaction Capabilities:**
* **Integrate Visual Feedback:** Introduce cameras and computer vision algorithms (or use the multi-modal capabilities of LLM) to enable the robotic arm to "see" the environment, realize vision-based object recognition, positioning, and grasping tasks, and adjust actions based on visual feedback.
2. **Improve Simulator Fidelity and Functionality:**
* **Physics Engine Integration:** Introduce physics engines such as Bullet, Ammo.js, or Rapier to realize more realistic collision detection, gravity, friction, and other physical effects, and support more complex grasping and operation simulations.
3. **Optimize MCP Service and Multi-Agent Collaboration:**
* **Richer MCP Instruction Set:** Extend the MCP protocol to support querying robot status (such as the current angle of each joint, end effector position), setting speed/acceleration, controlling grippers, and other more detailed operations.