# LeiAI Agent - Intelligent Travel Assistant

**LeiAI Agent** is an intelligent travel assistant built on Spring Boot 3 and Spring AI. It uses RAG (Retrieval-Augmented Generation), Tool Calling, and MCP (Model Context Protocol) to provide personalized travel planning, information retrieval, and conversational services.
## 📚 Table of Contents
- [Project Overview](#project-overview)
- [System Architecture](#system-architecture)
- [Core Functional Modules](#core-functional-modules)
- [Technology Stack](#technology-stack)
- [Project Structure](#project-structure)
- [Quick Start](#quick-start)
- [API Documentation](#api-documentation)
- [Project Highlights](#project-highlights)
- [Latest Optimizations](#latest-optimizations)
- [Future Plans](#future-plans)
- [License](#license)
## 🌟 Project Overview
LeiAI Agent is an intelligent assistant for the tourism domain. It combines a large language model (such as Alibaba Cloud's Tongyi Qianwen) with retrieval-augmented knowledge, tool calling, and multi-turn dialogue management to support travel planning and consultation. Beyond answering travel-related questions, the system can call tools (web search, PDF generation, file operations, and others) on its own initiative to complete multi-step tasks.
### Project Background
Large language models (LLMs) have opened new possibilities for building intelligent dialogue systems. Relying on an LLM alone, however, leaves gaps: its knowledge goes stale, it cannot use tools, and it has no built-in context management. This project addresses these pain points with an agent system that can think, plan, and act autonomously, providing comprehensive services for the tourism domain.
### Project Value
- **Enhance User Experience**: Simplify the travel planning process through natural language interaction.
- **Reduce Information Acquisition Costs**: Integrate multi-source data to provide one-stop travel information services.
- **Personalized Recommendations**: Provide customized travel plans based on user preferences.
- **Automated Execution**: Autonomously call tools to complete complex tasks, such as generating travel plan PDFs, booking inquiries, etc.
## 🏗️ System Architecture
LeiAI Agent adopts a modular, layered architecture design to ensure the system's scalability and maintainability.
```mermaid
graph TD
Client[Client] --> API[API Layer]
API --> Service[Service Layer]
Service --> Agent[Agent Layer]
Agent --> LLM[Large Language Model]
Agent --> ToolCalling[Tool Calling]
Agent --> RAG[Knowledge Retrieval Enhancement]
LLM --> DashScope[Alibaba Cloud Tongyi Qianwen]
ToolCalling --> WebSearch[Web Search]
ToolCalling --> PDFGen[PDF Generation]
ToolCalling --> FileOps[File Operations]
ToolCalling --> WebScraping[Web Scraping]
ToolCalling --> Terminal[Terminal Operations]
RAG --> Embedding[Vector Embedding]
RAG --> VectorDB[Vector Database]
RAG --> Retriever[Retriever]
Service --> Memory[Session Memory]
Memory --> Redis[Redis Cache]
VectorDB --> PostgreSQL[PostgreSQL+pgvector]
```
### Architecture Description
1. **API Layer**: Responsible for handling HTTP requests and providing RESTful API interfaces.
2. **Service Layer**: Implements business logic and coordinates the work of various components.
3. **Agent Layer**: The core AI agent layer, implementing the think-act loop.
4. **Large Language Model**: Provides natural language understanding and generation capabilities.
5. **Tool Calling**: Various tools that extend the capabilities of the intelligent agent.
6. **Knowledge Retrieval Enhancement**: Enhances the quality of model responses through vector databases.
7. **Session Memory**: Manages multi-turn dialogue context.
## 🧩 Core Functional Modules
### 1. Agent System
The agent is the core of the system, adopting the ReAct (Reasoning and Acting) pattern to solve complex problems through a "think-act" loop.
- **BaseAgent**: Abstract base agent class, providing state management and execution process control.
- **ReActAgent**: Abstract class implementing the think-act loop.
- **ToolCallAgent**: Agent implementation capable of calling external tools.
- **LiManus**: All-in-one agent that integrates every capability above and can plan and execute tasks autonomously.
The core advantage of the agent system lies in its autonomy and scalability, enabling it to dynamically select appropriate tools and execution paths based on user needs.
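The think-act loop described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's actual code: the class names `BaseAgent`, `ReActAgent`, and `AgentState` come from this README, but the enum values, the `maxSteps` limit, the method names (`run`, `step`, `think`, `act`, `finish`), and the toy `CountingAgent` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal think-act loop sketch; everything beyond the class names is an assumption. */
final class ReActLoopSketch {

    /** Lifecycle states; the real AgentState may define different values. */
    enum AgentState { IDLE, RUNNING, FINISHED, ERROR }

    /** Base class: owns the state and a step-limited execution loop. */
    abstract static class BaseAgent {
        private AgentState state = AgentState.IDLE;
        private final int maxSteps; // hypothetical safety limit against endless loops

        BaseAgent(int maxSteps) { this.maxSteps = maxSteps; }

        AgentState getState() { return state; }

        /** Subclasses call this when the task is complete. */
        protected void finish() { state = AgentState.FINISHED; }

        /** Runs step() until the agent finishes or the step budget is used up. */
        List<String> run(String userPrompt) {
            if (state != AgentState.IDLE) {
                throw new IllegalStateException("Agent is not idle: " + state);
            }
            state = AgentState.RUNNING;
            List<String> results = new ArrayList<>();
            try {
                for (int i = 1; i <= maxSteps && state == AgentState.RUNNING; i++) {
                    results.add("Step " + i + ": " + step(userPrompt));
                }
                if (state == AgentState.RUNNING) {
                    results.add("Stopped: reached max steps (" + maxSteps + ")");
                    state = AgentState.FINISHED;
                }
            } catch (RuntimeException e) {
                state = AgentState.ERROR;
                results.add("Error: " + e.getMessage());
            }
            return results;
        }

        protected abstract String step(String userPrompt);
    }

    /** One step = think (decide whether to act), then act. */
    abstract static class ReActAgent extends BaseAgent {
        ReActAgent(int maxSteps) { super(maxSteps); }

        protected abstract boolean think(String userPrompt);

        protected abstract String act();

        @Override
        protected String step(String userPrompt) {
            if (!think(userPrompt)) {
                finish();
                return "no action needed";
            }
            return act();
        }
    }

    /** Toy agent that acts twice, then decides it is done. */
    static final class CountingAgent extends ReActAgent {
        private int actions = 0;

        CountingAgent() { super(5); }

        @Override protected boolean think(String userPrompt) { return actions < 2; }

        @Override protected String act() { return "action " + (++actions); }
    }

    public static void main(String[] args) {
        new CountingAgent().run("Plan a trip").forEach(System.out::println);
    }
}
```

In a real agent, `think` would ask the LLM whether a tool call is needed and `act` would execute it; the step limit keeps a confused model from looping indefinitely.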
### 2. Tool Calling System (Tools)
The tool calling system extends the agent's capability boundary, enabling it to interact with the external world.
- **WebSearchTool**: Web search tool for obtaining real-time information.
- **PDFGenerationTool**: PDF generation tool for creating travel plan documents.
- **FileOperationTool**: File operation tool for managing local files.
- **WebScrapingTool**: Web scraping tool for extracting web page content.
- **TerminalOperationTool**: Terminal operation tool for executing system commands.
- **ResourceDownloadTool**: Resource download tool for obtaining remote resources.
- **TerminateTool**: Termination tool for ending the current task execution.
The tool system adopts a unified registration and calling mechanism, making it easy to extend new tools.
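A unified registration and calling mechanism of this kind might look like the sketch below. It uses plain Java only and is an assumption about the general shape, not the project's implementation: the `Tool` interface, `ToolRegistry`, and `EchoSearchTool` are hypothetical names, and the real tools are presumably wired through Spring AI's own tool-calling support.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of a name-based tool registry; all types here are illustrative. */
final class ToolRegistrySketch {

    /** Hypothetical tool contract: a unique name, a description for the model, an execute method. */
    interface Tool {
        String name();
        String description();
        String execute(String input);
    }

    /** Keeps tools by name so the agent can look them up from a model-chosen name. */
    static final class ToolRegistry {
        private final Map<String, Tool> tools = new LinkedHashMap<>();

        void register(Tool tool) {
            if (tools.putIfAbsent(tool.name(), tool) != null) {
                throw new IllegalArgumentException("Duplicate tool: " + tool.name());
            }
        }

        String call(String name, String input) {
            Tool tool = tools.get(name);
            if (tool == null) {
                throw new IllegalArgumentException("Unknown tool: " + name);
            }
            return tool.execute(input);
        }

        Set<String> names() { return tools.keySet(); }
    }

    /** Stand-in tool that only echoes its input; a real search tool would call a search API. */
    static final class EchoSearchTool implements Tool {
        @Override public String name() { return "webSearch"; }
        @Override public String description() { return "Searches the web for a query"; }
        @Override public String execute(String input) { return "results for: " + input; }
    }

    public static void main(String[] args) {
        ToolRegistry registry = new ToolRegistry();
        registry.register(new EchoSearchTool());
        System.out.println(registry.names());
        System.out.println(registry.call("webSearch", "Kyoto in autumn"));
    }
}
```

Adding a new tool then means writing one class and registering it; the agent loop itself does not change.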
### 3. RAG Knowledge Enhancement System (RAG)
The RAG system enhances model responses by retrieving relevant knowledge, making it particularly suitable for handling professional domain questions.
- **Vector Storage**: Uses PostgreSQL+pgvector to store document vectors.
- **Document Processing**: Supports processing and chunking of documents in various formats.
- **Retriever**: Retrieves relevant document fragments based on semantic similarity.
- **Embedding Model**: Converts text into vector representations.
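To make "retrieves relevant document fragments based on semantic similarity" concrete, here is a small in-memory sketch that ranks chunks by cosine similarity to a query vector. In the actual system this ranking presumably happens inside PostgreSQL via pgvector; the `Chunk` record, the `topK` helper, and the two-dimensional toy embeddings are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.List;

/** In-memory illustration of similarity-based retrieval; not the pgvector-backed implementation. */
final class RetrieverSketch {

    /** A document fragment with its embedding vector (hypothetical type). */
    record Chunk(String text, double[] embedding) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the k chunks most similar to the query, best match first. */
    static List<Chunk> topK(double[] query, List<Chunk> chunks, int k) {
        return chunks.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(query, c.embedding())).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
                new Chunk("Beach guide", new double[] {1.0, 0.0}),
                new Chunk("Museum hours", new double[] {0.0, 1.0}),
                new Chunk("Island ferries", new double[] {0.9, 0.1}));
        // Toy query vector; real embeddings come from the embedding model.
        topK(new double[] {1.0, 0.0}, chunks, 2).forEach(c -> System.out.println(c.text()));
    }
}
```

A linear scan like this is fine for a handful of chunks; a vector index is what keeps the same query fast on large collections.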
### 4. Dialogue Management System
- **Session State Management**: Tracks and maintains dialogue state.
- **Context Memory**: Uses Redis to store short-term memory.
- **Multi-Turn Dialogue Support**: Maintains a coherent dialogue experience.
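Short-term context memory is often implemented as a per-session sliding window over recent messages. The sketch below shows that idea with an in-memory map standing in for Redis; the `Message` record, `WindowedMemory` class, and window size are assumptions, and the project's real storage layout is not described in this README.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sliding-window session memory; an in-memory stand-in for a Redis-backed store. */
final class SessionMemorySketch {

    /** One dialogue turn (hypothetical type). */
    record Message(String role, String content) {}

    static final class WindowedMemory {
        private final int maxMessages;
        private final Map<String, Deque<Message>> sessions = new ConcurrentHashMap<>();

        WindowedMemory(int maxMessages) {
            if (maxMessages <= 0) {
                throw new IllegalArgumentException("maxMessages must be positive");
            }
            this.maxMessages = maxMessages;
        }

        /** Appends a message and evicts the oldest ones beyond the window. */
        void add(String sessionId, Message message) {
            Deque<Message> history = sessions.computeIfAbsent(sessionId, id -> new ArrayDeque<>());
            synchronized (history) {
                history.addLast(message);
                while (history.size() > maxMessages) {
                    history.removeFirst();
                }
            }
        }

        /** Returns an immutable snapshot of the session's recent messages, oldest first. */
        List<Message> history(String sessionId) {
            Deque<Message> history = sessions.get(sessionId);
            if (history == null) {
                return List.of();
            }
            synchronized (history) {
                return List.copyOf(history);
            }
        }
    }

    public static void main(String[] args) {
        WindowedMemory memory = new WindowedMemory(2);
        memory.add("s1", new Message("user", "I want to visit Chengdu"));
        memory.add("s1", new Message("assistant", "When are you travelling?"));
        memory.add("s1", new Message("user", "Next month"));
        memory.history("s1").forEach(m -> System.out.println(m.role() + ": " + m.content()));
    }
}
```

Bounding the window keeps prompts within the model's context limit while preserving the most recent turns.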
### 5. API Interface System
- **AgentController**: Provides agent interaction interfaces.
- **AppController**: Provides application core function interfaces.
- **RagController**: Provides knowledge retrieval related interfaces.
## 🔧 Technology Stack
### Backend Technology
- **Core Framework**: Spring Boot 3.5.0
- **AI Framework**: Spring AI 1.0.0
- **Large Language Model**: Alibaba Cloud Tongyi Qianwen (Qwen-Plus)
- **Vector Database**: PostgreSQL + pgvector
- **Cache System**: Redis
- **ORM Framework**: MyBatis-Plus 3.5.12
- **API Documentation**: Knife4j 4.5.0 (based on OpenAPI 3)
- **Tool Library**: Hutool 5.8.37, Lombok 1.18.38
- **PDF Processing**: iText 9.1.0
- **Web Scraping**: Jsoup 1.19.1
- **Exception Handling**: Global unified exception handling mechanism
- **Response Encapsulation**: Unified API response format
### Frontend Technology
The frontend adopts a modern web technology stack to provide an intuitive user interface (Note: This project mainly focuses on backend implementation, and the frontend is an optional component).
- **Framework**: Vue.js/React (can be selected according to requirements)
- **UI Components**: Element-Plus/Ant Design
- **HTTP Client**: Axios
- **State Management**: Pinia/Redux
## 📂 Project Structure
The project adopts a clear layered architecture, with each module having clear responsibilities, making it easy to maintain and extend:
```
src/main/java/com/lilei/leiaiagent/
├── agent/                                  # Agent core implementation
│   ├── BaseAgent.java                      # Base agent abstract class
│   ├── ReActAgent.java                     # ReAct pattern agent
│   ├── ToolCallAgent.java                  # Tool calling agent
│   ├── LiManus.java                        # Comprehensive agent implementation
│   └── model/                              # Agent model
│       └── AgentState.java                 # Agent state
├── config/                                 # Configuration class
│   ├── SwaggerConfig.java                  # Swagger documentation configuration
│   ├── WebMvcConfig.java                   # Web MVC configuration
│   └── ResponseAdvice.java                 # Unified response handling
├── constant/                               # Constant definition
│   ├── ApiConstants.java                   # API related constants
│   └── ErrorCode.java                      # Error code constants
├── controller/                             # API controller
│   ├── AgentController.java                # Agent controller
│   └── ...
├── domain/                                 # Domain model
│   ├── entity/                             # Entity class
│   │   ├── ChatSession.java                # Session entity
│   │   └── ChatMessage.java                # Message entity
│   └── dto/                                # Data transfer object
│       ├── ChatSessionDTO.java             # Session DTO
│       └── ChatMessageDTO.java             # Message DTO
├── exception/                              # Exception handling
│   ├── GlobalExceptionHandler.java         # Global exception handler
│   └── custom/                             # Custom exception
│       ├── BusinessException.java          # Business exception
│       ├── ResourceNotFoundException.java  # Resource not found exception
│       └── UnauthorizedException.java      # Unauthorized exception
├── llm/                                    # LLM integration
├── pojo/                                   # Plain Java object
│   └── vo/                                 # View object
│       ├── AgentRequestVO.java             # Agent request
│       ├── AgentResponseVO.java            # Agent response
│       └── ApiResponse.java                # Unified API response
├── prompt/                                 # Prompt template
├── rag/                                    # RAG implementation
├── service/                                # Service layer
│   ├── api/                                # Service interface
│   │   └── AgentService.java               # Agent service interface
│   └── impl/                               # Service implementation
│       └── AgentServiceImpl.java           # Agent service implementation
├── tools/                                  # Tool implementation
├── utils/                                  # Utility class
└── LeiAiAgentApplication.java              # Application entry
```
## 🚀 Quick Start
### Environment Requirements
- JDK 21+
- Maven 3.8+
- PostgreSQL 14+ (with pgvector extension installed)
- Redis 6+
### Installation Steps
1. **Clone Repository**
```bash
git clone https://github.com/lilei/lei-ai-agent.git
cd lei-ai-agent
```
2. **Configure Environment Variables**
Create a `.env` file and configure the following environment variables:
```properties
AI_DASHSCOPE_API_KEY=your_dashscope_api_key
SEARCH_API_KEY=your_search_api_key
```
3. **Compile Project**
```bash
mvn clean package
```
4. **Run Application**
```bash
java -jar target/lei-ai-agent-0.0.1-SNAPSHOT.jar
```
5. **Access Swagger Documentation**
Open your browser and visit `http://localhost:8082/api/swagger-ui.html`
### Configuration Description
The main configuration files are `src/main/resources/application.yml` and `src/main/resources/application-dev.yml`. They contain the following key settings:
- Database connection information
- Redis configuration
- AI model configuration
- RAG system parameters
- Tool calling parameters
- Log configuration
## 📖 API Documentation
The project integrates Knife4j, providing a beautiful and easy-to-use API documentation interface. The main APIs are divided into the following categories:
### 1. Agent API
- `POST /api/agent/execute`: Execute agent task (synchronous mode)
- `POST /api/agent/execute/advanced`: Execute agent task (advanced synchronous mode)
- `GET /api/agent/stream`: Execute agent task (streaming mode)
- `POST /api/agent/stream/advanced`: Execute agent task (advanced streaming mode)
- `GET /api/agent/status`: Get agent status
- `POST /api/agent/reset`: Reset agent status
- `POST /api/agent/stream/close/{sessionId}`: Close streaming connection
### 2. Application API
- `GET /api/ai/chat`: Chat interface
### 3. RAG API
- `POST /api/rag/query`: Knowledge base query interface
## 💡 Project Highlights
### 1. Innovative Agent Architecture
When designing the agent architecture, I paid special attention to scalability and flexibility. By abstracting the base agent class (BaseAgent) and implementing the ReAct pattern (ReActAgent), I built an intelligent agent system capable of autonomous thinking and action. This architecture is not only suitable for the tourism domain but can also be easily extended to other vertical domains.
The agent's state management mechanism (AgentState) keeps task execution reliable, while the step-based execution loop breaks complex tasks into small, manageable steps. This design lets the system handle long chains of reasoning and action.
### 2. Flexible Design of Tool Calling System
The tool calling system adopts a unified registration and calling mechanism, making it very easy to add new tools. Each tool implements a standard interface and is uniformly managed through ToolRegistration. This design allows the agent to dynamically select appropriate tools based on task requirements, achieving true tool-augmented AI.
I also implemented an error recovery mechanism for tool calls: when a call fails, the system retries or selects an alternative, which improves robustness.
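A retry-then-fallback policy for tool calls can be sketched as follows. This is an assumption about what such a recovery mechanism might look like, not the project's code: `callWithRecovery`, the attempt limit, and the choice to treat every `RuntimeException` as retryable are all hypothetical.

```java
import java.util.function.Supplier;

/** Retry a primary tool call a fixed number of times, then fall back to an alternative. */
final class ToolRetrySketch {

    static <T> T callWithRecovery(Supplier<T> primary, Supplier<T> fallback, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return primary.get();
            } catch (RuntimeException e) {
                // A real implementation would log the failure and possibly wait before retrying.
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        // All attempts failed: use the alternative tool or a degraded answer.
        return fallback.get();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = ToolRetrySketch.<String>callWithRecovery(
                () -> {
                    // Simulated flaky tool: fails on the first call, succeeds on the second.
                    if (++calls[0] < 2) {
                        throw new IllegalStateException("search service unavailable");
                    }
                    return "live search results";
                },
                () -> "cached results",
                3);
        System.out.println(result);
    }
}
```

Exceptions thrown by the fallback are deliberately not caught, so a total failure still surfaces to the caller.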
### 3. High-Performance RAG Implementation
In the RAG system design, I used PostgreSQL+pgvector as the vector database. Compared to purely in-memory vector stores, this solution supports persistent storage and behaves more stably on large datasets.
To improve retrieval efficiency, I implemented an index mechanism based on HNSW (Hierarchical Navigable Small World), which greatly improves the speed of vector retrieval. At the same time, by optimizing the chunking strategy (chunk-size and chunk-overlap parameters), I balanced retrieval accuracy and system performance.
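The effect of the chunk-size and chunk-overlap parameters can be seen in a simple character-based splitter. This is a sketch under the assumption of fixed-size character windows; the project's actual chunking strategy (for example token-based or sentence-aware splitting) is not specified here, and `chunk` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Fixed-size character chunking with overlap between neighbouring chunks. */
final class ChunkingSketch {

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // With size 4 and overlap 1, each chunk repeats the last character of the previous one.
        System.out.println(chunk("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

Larger chunks carry more context per retrieved fragment but make matches less precise; overlap reduces the chance that a relevant sentence is split across two chunks, at the cost of storing more vectors.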
### 4. Streaming Response Mechanism
To enhance the user experience, I designed a streaming response mechanism based on SSE (Server-Sent Events), allowing users to see the agent's thinking and action process in real-time without waiting for the entire task to complete. This design is particularly suitable for long-running complex tasks, providing a better interactive experience.
The streaming response mechanism also includes complete error handling and resource cleanup logic, ensuring system stability and resource utilization efficiency.
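For readers unfamiliar with SSE, the wire format is plain text: an optional `event:` line, one `data:` line per line of payload, and a blank line that ends the event. The sketch below formats such an event by hand to show what a client receives; in a Spring application the framework normally produces this framing, and the `format` helper and the `step` event name are illustrative assumptions.

```java
/** Formats a single Server-Sent Event in the text/event-stream wire format. */
final class SseFormatSketch {

    static String format(String event, String data) {
        StringBuilder sb = new StringBuilder();
        if (event != null && !event.isEmpty()) {
            sb.append("event: ").append(event).append('\n');
        }
        // Multi-line payloads need one "data:" line per line of text.
        for (String line : data.split("\\R", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line terminates the event.
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(format("step", "Thinking: which tool fits this request?"));
        System.out.print(format("step", "Acting: calling webSearch"));
    }
}
```

Because each event is flushed as soon as a step completes, the client can render the agent's progress step by step.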
## 🔄 Latest Optimizations
Recently, a series of optimizations and improvements have been made to the project, mainly including:
### 1. Code Architecture Optimization
- **Separation of Interface and Implementation**: Separating service layer interfaces from implementations improves code testability and maintainability.
- **Domain Model Layering**: Introducing entity (Entity) and data transfer object (DTO) layering makes data flow clearer.
- **Constant Extraction**: Extracting constants such as API paths and error codes into dedicated constant classes avoids hardcoding.
### 2. Exception Handling Mechanism
- **Global Exception Handling**: Implementing a unified global exception handler standardizes exception response formats.
- **Custom Exception System**: Custom exception classes cover business errors, resource-not-found cases, and unauthorized access.
- **Error Code Standardization**: Establishing a unified error code system facilitates problem localization and troubleshooting.
### 3. API Response Standardization
- **Unified Response Format**: All APIs return a unified response format, including status code, message, data, and other fields.
- **Response Interceptor**: Implementing a ResponseAdvice interceptor automatically wraps controller return values.
- **Status Code Specification**: Response specifications are defined for the success, error, and warning states.
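A unified response wrapper with status code, message, and data fields could be modeled as below. The class name `ApiResponse` appears in the project structure, but the field names, the factory methods, and the numeric codes (0 for success, 404 in the example) are assumptions for illustration only.

```java
/** Sketch of a unified API response envelope; codes and method names are hypothetical. */
final class ApiResponseSketch {

    record ApiResponse<T>(int code, String message, T data) {

        /** Wraps a successful result; 0 is an assumed success code. */
        static <T> ApiResponse<T> success(T data) {
            return new ApiResponse<>(0, "success", data);
        }

        /** Wraps a failure; data is left null. */
        static <T> ApiResponse<T> error(int code, String message) {
            return new ApiResponse<>(code, message, null);
        }

        boolean isSuccess() {
            return code == 0;
        }
    }

    public static void main(String[] args) {
        ApiResponse<String> ok = ApiResponse.success("3-day Hangzhou itinerary");
        ApiResponse<String> failed = ApiResponse.error(404, "Session not found");
        System.out.println(ok);
        System.out.println(failed);
    }
}
```

With an envelope like this, a response advice component can wrap plain controller return values in `success(...)`, and the global exception handler can map exceptions to `error(...)`.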
### 4. Configuration Optimization
- **Swagger Documentation Enhancement**: The API documentation configuration now includes more detailed descriptions and examples.
- **Cross-Origin Support**: Global CORS configuration supports separate frontend and backend development.
- **Log Configuration**: Logging now supports both file and console output, which simplifies troubleshooting.
### 5. Agent Service Improvements
- **Session Management**: Session management now supports state tracking and history recording.
- **Streaming Connection Management**: Improving the lifecycle management of SSE connections avoids resource leaks.
- **Parameter Validation**: Adding request parameter validation improves system robustness.
## 🔮 Future Plans
1. **Multi-Model Support**: Integrate more large language models, such as OpenAI GPT, Claude, etc.
2. **Multi-Modal Capabilities**: Add image understanding and generation capabilities.
3. **Knowledge Base Expansion**: Enrich the tourism domain knowledge base to improve the quality of professional Q&A.
4. **Tool Ecosystem Expansion**: Develop more tourism-related tools, such as hotel booking, flight inquiries, etc.
5. **Frontend Interface Optimization**: Develop a friendlier user interface to improve the user experience.
6. **Performance Optimization**: Further optimize system performance to support higher concurrency.
7. **Multi-Language Support**: Add multi-language support to serve international users.
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.