# Beginner's Guide: Building AI Agents from Prompt to Context Engineering
In the past two to three years, I have run Agent-development training for senior developers at several companies; over the last month, I have also been designing Agent training for new graduates. It wasn't until this week, after completing a showcase project that combined several AI capabilities, that I finally worked out how to systematically construct a learning path for Agents tailored to developers at different stages. The process also made me keenly aware of the "curse of knowledge": what I take for granted may be the biggest obstacle for beginners.
We can simply divide the learning process into four parts:
- Structured Prompt Engineering — How to design efficient and reusable prompts in an engineering manner.
- Context Engineering and Knowledge Retrieval — Retrieving, generating, and compressing contextual information to produce high-quality knowledge backgrounds.
- Systematic Design of Tool Functions — Designing and implementing tools and interfaces for Agent invocation.
- Agent Planning and Multi-Agent — Constructing task planning and execution paths to achieve closed-loop automation.
Before we begin, we should briefly define AI Agents. Considering that there are numerous definitions of Agents, we can refer to the real-world examples provided by Anthropic in "Building effective agents":
> Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks; others use the term for more prescriptive implementations that follow predefined workflows.
Thus, a simple task centered around a single prompt can be viewed as an AI Agent, and so can a complex system involving multiple tools and multiple steps.
## Structured Prompt Engineering
Although Context Engineering is a very popular term, knowing how to write effective Prompts remains a key focus for us to get started. There is already a wealth of content related to prompts online, but from my experience, we can focus on three main areas:
- Structuring the input and output of prompts
- Chain and modular design for complex problems
- Prompt routing and task distribution
With some necessary AI frameworks or tools, we can accomplish our tasks quite well.
### Structuring the Input and Output of Prompts
In current Agent development, even though models can help draft prompts, tuning them remains a central part of the work. We want the model's output to come back as JSON, XML, or something that maps cleanly onto Java classes so it can be integrated with other code.
> Prompts are the art and science of designing inputs to guide AI models to generate **specific outputs**. By carefully designing and wording the inputs, we can effectively influence and control the model's response direction and results, enabling AI to generate outputs that meet expectations.
We can directly look at the Structured Output Converter in the Spring AI documentation as an example:

The highlighted parts of that diagram represent two core components:
**Formatted Input Instructions**
Generally, we need structured prompt templates to dynamically generate prompts, using structured text to design inputs:
- Dynamic Prompt Templates (PromptTemplate). Using classic template engines to dynamically combine context, such as Jinja2 in LangChain or StringTemplate in Spring AI. This approach allows for injecting context, user input, system state, etc., at runtime, enabling flexible prompt construction.
- Structured Text Structure. To ensure the reliability and parseability of AI outputs, prompts need to be designed structurally, including role positioning (Role), task description (Task), constraints (Constraints), output format, etc.
- Example-Driven. By providing example inputs (Few-shots) and expected outputs, we can significantly improve the stability and consistency of model outputs. For instance, when implementing QA, different scenario examples are provided.
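As a minimal sketch of the three points above (the template text, field names, and example data are illustrative, not taken from any particular framework), role, task, constraints, output format, and few-shot examples can each become a reusable slot in a single Jinja2 template that is filled in at runtime:

```python
from jinja2 import Template  # pip install jinja2

# A hypothetical QA prompt template: role, task, constraints,
# output format, and few-shot examples are separate, reusable slots.
QA_TEMPLATE = Template("""\
# Role
{{ role }}

# Task
{{ task }}

# Constraints
{% for c in constraints %}- {{ c }}
{% endfor %}
# Output format
Return a JSON object with the fields: answer (string), confidence (0-1).

# Examples
{% for ex in examples %}Q: {{ ex.q }}
A: {{ ex.a }}
{% endfor %}
# Question
{{ question }}
""")

prompt = QA_TEMPLATE.render(
    role="You are a support assistant for an internal ticketing system.",
    task="Answer the user's question using only the provided context.",
    constraints=["Do not invent ticket IDs.", "Answer in English."],
    examples=[{"q": "How do I reopen a ticket?",
               "a": '{"answer": "Use the Reopen button...", "confidence": 0.9}'}],
    question="Can I change a ticket's priority after it is closed?",
)
print(prompt)
```

The same structure works with LangChain's PromptTemplate or Spring AI's StringTemplate; the key point is that each slot can be swapped or versioned independently.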
**Transforming Model Output Results**
This involves using appropriate output formats for different scenarios and implementing corresponding parsing and **exception handling**.
- Domain-Specific Output Formats. Based on the scenario, we adopt JSON, XML, YAML, or Markdown to present information in a consumable way. JSON, for example, is easy to serialize and transmit, but partially generated JSON is hard to render during streaming, which hurts the user experience and robustness; YAML handles streaming more gracefully and costs fewer tokens to transmit.
- Parsing Implementation. Extracting code blocks from plain text, followed by deserialization and object mapping. Using Schema validation (JSON Schema, XSD) to ensure that the model output field types and structures conform to agreements.
- Exception Handling. Due to the uncertainty in model generation, outputs may have missing fields, type errors, or not conform to the agreed format. For example, when fields are missing, default values or fallback strategies can be used, which may trigger the model to retry generating specific fields.
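A minimal, framework-free sketch of the parsing and fallback side (the field names, defaults, and retry policy are assumptions for illustration):

```python
import json
import re

REQUIRED_FIELDS = {"answer": "", "confidence": 0.0}  # hypothetical schema with defaults

def parse_model_output(text: str) -> dict:
    """Extract a JSON code block from model output, deserialize it,
    and fall back to defaults for missing fields."""
    # Prefer a fenced ```json block; otherwise try the raw text.
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    raw = match.group(1) if match else text
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # In a real system this is where we would ask the model to retry
        # with a "fix the JSON" instruction instead of silently defaulting.
        return dict(REQUIRED_FIELDS)
    # Fill missing fields with defaults rather than failing the whole call.
    return {key: data.get(key, default) for key, default in REQUIRED_FIELDS.items()}

print(parse_model_output('Here you go:\n```json\n{"answer": "Yes"}\n```'))
# -> {'answer': 'Yes', 'confidence': 0.0}
```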
When the scenario justifies it, we can also fine-tune or train the model on existing data to strengthen its structured-output abilities.
### Prompt Routing and Task Distribution
In complex AI systems, especially in scenarios involving multiple Agents or modules, a single prompt often cannot complete all tasks. Therefore, we need prompt routing:

> Prompt Routing is an engineering pattern that, in multi-task, multi-Agent, or complex AI processes, splits tasks, analyzes inputs, and intelligently dispatches them to the most suitable model or sub-task prompt.
The core idea is: by analyzing inputs and context, dynamically decide the information processing path, which prompt to use, or which tool or sub-Agent to call, thus achieving non-linear, conditional task execution. Taking a typical QA scenario as an example:
- Non-system-related questions → Directly inform the user that this type of question is not supported.
- Basic knowledge questions → Call document retrieval and QA models.
- Complex analytical questions → Call data analysis tools and then generate summaries.
- ……
Through prompt routing, the system can intelligently select the most suitable processing method based on the question type while maintaining modularity and scalability. In some AI frameworks, such as RouterChain in LangChain, similar capabilities are supported, along with methods like [Routing by semantic similarity](https://python.langchain.com/docs/how_to/routing/#routing-by-semantic-similarity).
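A minimal routing sketch for the QA scenario above (the classification prompt, route names, and the `call_llm` helper are placeholders for whatever model client you use):

```python
# Hypothetical helper: sends a prompt to your model of choice and returns text.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your model client here

ROUTES = {
    "unsupported": lambda q: "Sorry, this type of question is not supported.",
    "basic":       lambda q: call_llm(f"Answer from the docs context: {q}"),
    "analytical":  lambda q: call_llm(f"Run the analysis tools, then summarize: {q}"),
}

def route(question: str) -> str:
    # Step 1: a small classification prompt decides the path.
    label = call_llm(
        "Classify the question as one of: unsupported, basic, analytical.\n"
        f"Question: {question}\nLabel:"
    ).strip().lower()
    # Step 2: dispatch to the matching handler, defaulting to 'basic'.
    handler = ROUTES.get(label, ROUTES["basic"])
    return handler(question)
```

Adding a new question type then only means adding a new entry to the routing table and, if needed, a new label to the classification prompt.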
### Chain and Modular Design for Complex Problems
With prompt routing in place, complex problems can be systematically decomposed through **Prompt Chaining**. Prompt chaining splits a large task into multiple sub-tasks, each handled by a different prompt or model call, and finally integrates the results. The approach is particularly suited to tasks with a fixed process, where individual steps can be skipped when they are not needed.

This enables better modular design:
- Each sub-task focuses on handling a specific phase of the task.
- Sub-tasks can be rewritten as needed, adding or replacing prompts.
- Subsequent prompts can be dynamically adjusted based on the output of the previous phase.
For example, a product manager's idea can be decomposed into a prompt chain for common software requirements:

1. Idea Collection: Gather product ideas and initial requirements.
2. Requirement Logic Sorting: Clarify requirement logic and functional priorities.
3. Preliminary Requirement Scheduling: Formulate an initial requirements document or task list.
4. Final Requirement Confirmation: Confirm final requirements and generate formal documentation.
Each stage can be handled by different prompts or sub-Agents. For instance, idea collection can leverage an AI Agent with search capabilities, while requirement logic sorting can be accomplished using tools like Dify or Copilot 365. Ultimately, each stage executes in a chained process while maintaining the flexibility of modular design, allowing for adjustments or replacements of sub-tasks as needed.
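A sketch of that four-stage chain, where each stage is a separate prompt and the output of one becomes the input of the next (the stage prompts and `call_llm` are illustrative placeholders):

```python
def call_llm(prompt: str) -> str:  # placeholder for your model client
    raise NotImplementedError

STAGES = [
    ("idea_collection",      "Collect and restate the product ideas in: {input}"),
    ("requirement_sorting",  "Clarify requirement logic and priorities for: {input}"),
    ("preliminary_schedule", "Draft an initial requirements document from: {input}"),
    ("final_confirmation",   "Produce the final, formal requirements doc from: {input}"),
]

def run_chain(idea: str) -> str:
    output = idea
    for name, template in STAGES:
        # Each sub-task has its own prompt; stages can be replaced or reordered.
        output = call_llm(template.format(input=output))
        print(f"[{name}] done")
    return output
```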
## Context Engineering and Knowledge Retrieval
Generally, we have NoCode and ProCode approaches to support context-aware Agent development.
- NoCode Solutions (suitable for rapid validation): Use low-code platforms (such as Dify, N8N, Coze, etc.) and pre-configured RAG pipelines to quickly configure retrieval strategies through UI.
- ProCode Solutions (suitable for customized needs): Use frameworks (LangChain, Spring AI) to customize retrieval processes and optimization strategies, e.g. multi-stage HyDE + hybrid retrieval + re-ranking pipelines.

Context itself is also part of the prompt. Before building any automation, we typically copy content from documents into AI chat tools by hand. As we go deeper, however, we need to start thinking about how to automate this and approach the problem from an engineering perspective. Before we begin, we should define context engineering; here we can quote Anthropic's definition from "Effective context engineering for AI agents" (it, too, involves both science and art):
> Context engineering is the art and science of carefully selecting and placing the most relevant content from an ever-changing universe of information into a limited context window.
### Context Window
In simple terms: focus on selecting the most critical information within a limited context window to enable the model to understand and reason more efficiently. Below are six common context engineering techniques summarized by [Drew Breunig](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html) as illustrated by [Langchain](https://github.com/langchain-ai/how_to_fix_your_context):

Here, I will summarize it as: engineering RAG and the context window. The content of a complete context window (i.e., the full prompt) should typically include:
- System Prompt Section:
  - Input Instruction Context: Telling the model "who you are" and "what you want to do," including system prompts, user inputs, and role definitions.
  - Formatted Output Context: Specifying a structured pattern for the model's output, such as requiring a JSON return, to ensure the output is usable downstream.
- Function Call Section:
  - Tool-Related Context: This grants the model the ability to interact with the external world. It includes definitions of available tools or functions and the responses returned after calling them.
- Dynamic Context Section:
  - Time and Memory Context: Short-term and long-term memory.
  - External Knowledge Context: Facts retrieved from external knowledge bases (documents, databases), typically via Retrieval-Augmented Generation (RAG), giving the model a factual basis and reducing hallucinations.
  - Global State/Scratchpad: Temporary storage for the model when handling complex tasks, akin to its "working memory."
In addition to the fixed system prompt section, the **acquisition of external knowledge** and **memory** will greatly influence the entire window, making the design and optimization of these two aspects the top priority in context engineering.
### Knowledge Retrieval-Augmented Generation

> RAG (Retrieval-Augmented Generation) is one of the core technologies for building Agents. It enhances the generative capabilities of large language models by retrieving relevant information from external knowledge bases. In complex scenarios like codebase Q&A, simple vector retrieval is often not precise enough, requiring a combination of multiple retrieval strategies to improve accuracy.
In simple terms, it enriches the context through search. Depending on the complexity of implementation and scenario requirements, we can categorize retrieval strategies into the following types:
- **Keyword Search**. The most basic retrieval method, suitable for exact-match scenarios. For instance, when searching for specific function names, class names, or variable names in a codebase, keyword search is often more effective than semantic search. Common implementations include:
  - **Full-text Search**: Search engines like Elasticsearch or Solr, using algorithms such as BM25 and TF-IDF.
  - **Regular Expression Matching**: Tools like ripgrep and grep; Cursor, for example, uses a hybrid of ripgrep + vector retrieval.
- **Semantic Search**. Understanding the semantic meaning of queries through vector embeddings rather than literal matches, which is particularly important for natural language queries:
  - Using pre-trained embedding models (like OpenAI text-embedding-3-large or Jina embeddings v3) to convert text into vectors.
  - Calculating the similarity between queries and documents in vector space (usually cosine similarity or dot product).
- **Graph-based Search**. Graph retrieval focuses not only on "content similarity" but also on relationships and contextual dependencies.
  - In code scenarios: constructing call graphs and dependency graphs, using the AST (Abstract Syntax Tree) to extract methods, classes, constructors, etc.
  - Examples include Microsoft's [GraphRAG](https://github.com/microsoft/graphrag), Aider's repomap, and infrastructures like Joern and CodeQL.
Before retrieval, to improve the quality of the retrieved results, we need to introduce **Query Rewriting**, which gradually transforms the user's vague intent into precise queries that the database can execute efficiently. Rewriting the user's original query increases its relevance to documents in the knowledge base, addressing the "impedance mismatch" between natural language queries and the stored data chunks.
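A minimal query-rewriting sketch (the rewrite prompt is an assumption; in practice you would tune it per domain, and may generate several rewrites in parallel):

```python
def call_llm(prompt: str) -> str:  # placeholder for your model client
    raise NotImplementedError

def rewrite_query(user_query: str, n: int = 3) -> list[str]:
    """Turn a vague user question into several precise, retrieval-friendly queries."""
    prompt = (
        f"Rewrite the question below into {n} short search queries that match "
        "how the knowledge base is written. One query per line, no numbering.\n"
        f"Question: {user_query}"
    )
    lines = call_llm(prompt).splitlines()
    return [line.strip() for line in lines if line.strip()][:n]

# Each rewritten query is then sent to keyword/semantic/graph retrieval,
# and the result sets are merged before generation.
```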
#### RAG Example in Code Scenarios
Typically, various retrieval strategies can be combined to enhance retrieval effectiveness. Below is an example of a [Codebase RAG implementation](https://blog.lancedb.com/rag-codebase-1/) provided by the vector database LanceDB:

In addition to using TreeSitter for knowledge generation during the indexing phase, the [retrieval phase](https://blog.lancedb.com/building-rag-on-codebases-part-2/) also employs:
- HyDE (Hypothetical Document Embedding): First, the model generates a "hypothetical" document or code snippet based on the query, and then uses this generated content for vector search, making it easier to find semantically related code.
- BM25 (Keyword Search): A traditional keyword search algorithm adept at finding code containing precise terms or API names, which can also be combined with vector search.
- Hybrid Search: Combining BM25 and semantic search, allowing for both precise keyword matching and semantic understanding of code, achieving better results by adjusting the weights of both.
- Re-ranking: After obtaining preliminary results from vector search, re-ranking the results using cross-attention mechanisms to improve the relevance and accuracy of the final answers.
Of course, the earlier indexing phase in this example also generates **meta-feature data**: for each element or code snippet, it first produces a textual description of the code (extracted by fine-tuned LLMs), then embeds that description into vectors so that the code's meta-features are captured.
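A simplified sketch of the hybrid step: keyword (BM25-style) and vector scores are normalized and merged with a weight, and the top results are handed to a re-ranker (the scoring inputs here are stand-ins for your search engine and vector store):

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores so keyword and vector results are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_search(bm25_scores: dict[str, float],
                  vector_scores: dict[str, float],
                  alpha: float = 0.5,
                  top_k: int = 10) -> list[str]:
    """alpha weights semantic similarity against keyword match."""
    bm25 = normalize(bm25_scores)
    vec = normalize(vector_scores)
    merged = {doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * bm25.get(doc, 0.0)
              for doc in set(bm25) | set(vec)}
    candidates = sorted(merged, key=merged.get, reverse=True)[:top_k]
    # A cross-encoder re-ranker would rescore `candidates` against the query here.
    return candidates
```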
### Engineering the Context Window

Two years ago, [GitHub Copilot](https://code.visualstudio.com/docs/copilot/chat/prompt-crafting) built a completion context system that remains one of the most instructive in the industry (if not the best):
- Continuous Signal Monitoring. The Copilot plugin continuously monitors a series of signals from the IDE to dynamically adjust the priority of the context. For example, inserting or deleting characters, changing the currently edited file and language, cursor movement, scrolling position changes, and opening or closing files.
- Prioritization of Context Sources. In the final prompt sent to the model, information is sorted and filtered by priority:
  - Highest Priority: Code around the cursor position, including content before and after the cursor, which is the most direct context.
  - High Priority: The rest of the currently edited file.
  - Medium Priority: Other files or tabs opened in the IDE (i.e., "neighboring files").
  - Auxiliary Context: Other information is also considered, including file paths, repository URLs, import statements in the code, and code retrieved by RAG.
- Prompt Assembly under Context Length Constraints. Each information fragment is "scored" based on the above priorities, and then an optimal prompt is assembled.
This provides us with a very good reference:
- Freshness Priority. Recently edited or accessed content receives higher priority, while outdated content gradually loses weight.
- Signal Fusion and Dynamic Scoring. Fusing multiple editing signals (like cursor movement, file switching, import changes, etc.) to dynamically adjust context weights.
- Sliding Window and Incremental Updates. Using a sliding window mechanism to only incrementally update changed parts, avoiding full reconstruction.
- Budget Awareness and Automatic Truncation. Real-time estimation of token usage, automatically trimming or summarizing low-priority content as it approaches limits.
Of course, this is a very complex design, and it is only worth adopting in sufficiently high-value systems. Combined with the now-popular Cursor Rules/Specs, a persistent memory system that stores key information across sessions can provide long-term background information for subsequent queries.
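A toy version of priority-based assembly under a token budget (the priority weights and the word-count token estimate are deliberate simplifications, not Copilot's actual scoring):

```python
from dataclasses import dataclass

@dataclass
class ContextFragment:
    text: str
    priority: float   # e.g. cursor context > current file > neighboring files
    recency: float    # 0..1, recently edited/accessed content scores higher

def assemble_prompt(fragments: list[ContextFragment], token_budget: int) -> str:
    """Score fragments, then greedily pack the highest-scoring ones into the budget."""
    def estimate_tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    ranked = sorted(fragments, key=lambda f: f.priority * 0.7 + f.recency * 0.3,
                    reverse=True)
    picked, used = [], 0
    for frag in ranked:
        cost = estimate_tokens(frag.text)
        if used + cost > token_budget:
            continue  # a real system might summarize instead of dropping
        picked.append(frag.text)
        used += cost
    return "\n\n".join(picked)
```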
### Agentic Retrieval

> Agentic refers to a characteristic that enables AI systems to possess autonomous perception, dynamic decision-making, and goal-oriented execution capabilities, allowing them to actively optimize context, generate retrieval strategies, and continuously self-iterate during task processes.
In the AI Coding domain, we can observe the processes of systems like Cursor and Claude Code, which essentially execute RAG through Agents. Compared to ordinary RAG, it is easier for them to obtain rich context, ensuring that context is not lost throughout the process. We can see examples of some relatively mature AI applications:
- Cursor prefers `file + ripgrep` to retrieve code directly, and when the results are insufficient, it falls back to vector search or Git history for related retrieval.
- Google DeepResearch follows a similar process to complete a piece of research: identify the mainstream tools for context engineering, gain a preliminary understanding of what the tools do and how they differ, and then decide the next step of digging into tool details.
In simple terms, for complex retrieval, we can construct it as an Agent, allowing the Agent to determine which retrieval tools and strategies to use, and when context is insufficient, continue to call tools with new parameters to obtain sufficient context.
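A minimal agentic-retrieval loop along those lines: the model chooses a retrieval tool and its parameters, inspects the result, and keeps going until the context looks sufficient (the tool names, stop condition, and `call_llm_json` helper are illustrative assumptions):

```python
def call_llm_json(prompt: str) -> dict:  # placeholder: a model call that returns JSON
    raise NotImplementedError

# Hypothetical retrieval tools the agent can pick from.
TOOLS = {
    "grep":          lambda q: f"ripgrep results for {q!r}",
    "vector_search": lambda q: f"semantic matches for {q!r}",
    "git_history":   lambda q: f"commits mentioning {q!r}",
}

def agentic_retrieve(question: str, max_steps: int = 5) -> list[str]:
    gathered: list[str] = []
    for _ in range(max_steps):
        decision = call_llm_json(
            "You are gathering context to answer a question.\n"
            f"Question: {question}\n"
            f"Context so far: {gathered}\n"
            f"Available tools: {list(TOOLS)}\n"
            'Reply as JSON: {"done": bool, "tool": str, "query": str}'
        )
        if decision.get("done"):
            break
        tool = TOOLS.get(decision.get("tool", ""), TOOLS["grep"])
        gathered.append(tool(decision.get("query", question)))
    return gathered
```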
#### DeepResearch Example
Below is an example of the [Open DeepResearch](https://github.com/langchain-ai/open_deep_research) process built by Langchain AI:

The Deep Research Agent demonstrates a more systematic Agentic retrieval method:
1. Splitting tasks into planning phases (Manager Agent) and execution phases (Execution Agent).
- The Manager Agent is responsible for task understanding, sub-task decomposition, and retrieval strategy design.
- The Execution Agent is responsible for actual searches, web or document scraping, and content parsing.
2. During the retrieval process, the Agent maintains the status of the topic structure, covered sub-issues, and information gaps to determine the next exploration direction.
3. User review (HITL mode) can be inserted at critical stages to enhance control and accuracy.
4. Finally, the Agent integrates the collected fragmented information into a structured report, complete with source citations.
Observing their interactions and thought processes can better help us understand this process. Based on this, we can also see that Agentic Context Engineering allows LLMs to autonomously generate, organize, and iterate context, achieving intelligent and scalable context management, thereby optimizing the retrieval and reasoning efficiency of complex tasks.

That is, the Agent optimizes how it retrieves based on historical conversations and experience, making itself better suited to the scenario.
## Engineering Design of Agent Tool Systems
In the process of building Agents, the design of the Tool System is the aspect that most reflects engineering thinking. It determines what the Agent can do, how well it can do it, and whether it can efficiently collaborate with the external world. Tools can be any API, such as data queries (like database access), real-world operations (like sending emails, booking meetings), or interfaces that collaborate with other services. As mentioned earlier, RAG under Agentic is also a type of tool, with LlamaIndex providing such explicit encapsulation:
- FunctionTool: Easily wraps any Python function into a tool available for the Agent.
- QueryEngineTool: Converts any data query engine (e.g., a vector index) into a tool, enabling the Agent to query and reason on it.
This data-centric approach simplifies our understanding of tools.
### Semantic Tools: Function Interfaces Designed for Agents
**Tools** are essentially a type of semantically understandable function interface. They not only contain logical execution capabilities but also carry metadata that helps the model understand:
- Name: The unique identifier of the tool, usually the function name, e.g., `getWeather`.
- Description: A natural language description of the tool's function, purpose, and applicable scenarios. This is crucial, as the model primarily relies on this description to determine when and how to use the tool.
- Parameters: An object defining the input parameters of the tool, including each parameter's name, data type (e.g., string, number), description, and whether it is a required parameter.
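In practice these three pieces of metadata are usually expressed as a JSON-Schema-like structure. Here is a sketch for the `getWeather` example; it follows the common function-calling shape, but the exact field names vary by vendor and are an assumption here:

```python
get_weather_tool = {
    "name": "getWeather",
    "description": (
        "Get the current weather for a city. Use this whenever the user asks "
        "about temperature, rain, or general weather conditions."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"],
                     "description": "Temperature unit, defaults to celsius"},
        },
        "required": ["city"],
    },
}

# The model only ever sees this metadata; the actual implementation is bound
# separately, e.g. {"getWeather": get_weather_impl}, and invoked when the model
# emits a call with a matching name and arguments.
```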
In terms of execution mechanisms, two common paradigms are:
- ReAct Framework (Reasoning + Acting): The core of the ReAct paradigm is to have the LLM interleave generating "thoughts" (reasoning trajectories) and "actions" (tool calls), forming an explicit think-act-observe loop.
- Direct Function Calling: This is a more structured approach. The LLM determines that the user's query can be best answered by calling one or more predefined functions during single-step reasoning. It then outputs a structured JSON object, clearly indicating the function name and its parameters that need to be called.
We need to decide which method to use based on the model's support and the designed interactions and intents.
### Tool Design Principles
Generally, when building Coding Agents, we follow these principles:
- Semantic Clarity: The names, descriptions, and parameter names of tools must be extremely clear, descriptive, and unambiguous to the LLM. The tool's description field is treated as a form of AI-oriented "micro-prompt" that is carefully crafted.
- Stateless **Objective** Functions: Only encapsulate complex technical logic or domain knowledge, avoiding strategic or subjective decision-making.
- Atomicity and Single Responsibility: Each tool should only be responsible for one clearly defined function, i.e., executing one atomic operation. If an Agent acts as a tool, it should also follow similar principles and only accomplish one task.
- Least Privilege: Each tool should only be granted the minimum permissions and capabilities necessary to complete its clearly defined tasks.
#### Workflow-Based Tool Orchestration: Task-Chained Design
This also applies to AI Agents in non-programming domains. Based on the above principles, we can decompose "plan my trip to Beijing next week" into a set of discrete, single-responsibility tools.
- search_flights(origin: str, destination: str, outbound_date: str, return_date: str): Search for flight information.
- search_hotels(location: str, check_in_date: str, check_out_date: str, adults: int): Search for hotel information.
- get_local_events(query: str, date: str): Get information about local events or attractions for a specific date.
- book_cruise(passenger_name: str, itinerary_id: str): Book a cruise itinerary.
- lookup_vacation_packages(query: str): Query vacation packages.
The key features of this orchestration method are: strong predictability, clear logic, and ease of modeling as visual processes (like DAG) within the platform. It is particularly suitable for Agents with stable processes and task dependencies (such as travel, customer service, data pipeline scenarios).
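A sketch of how these atomic tools chain into a fixed itinerary workflow (the function bodies are stubs; the point is the explicit, predictable ordering):

```python
def search_flights(origin: str, destination: str, outbound_date: str, return_date: str) -> dict:
    ...  # stub: call the flight API

def search_hotels(location: str, check_in_date: str, check_out_date: str, adults: int) -> dict:
    ...  # stub: call the hotel API

def get_local_events(query: str, date: str) -> dict:
    ...  # stub: call the events API

def plan_trip(origin: str, destination: str, start: str, end: str) -> dict:
    """Fixed, DAG-like orchestration: each step's inputs come from the task or a prior step."""
    flights = search_flights(origin, destination, start, end)
    hotels = search_hotels(destination, start, end, adults=1)
    events = get_local_events(f"things to do in {destination}", start)
    return {"flights": flights, "hotels": hotels, "events": events}
```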
#### Classification-Based Tool Invocation: Dynamic Intent Decision-Making
For example, the Copilot orchestrator described in [Understanding GitHub Copilot’s Internal Architecture](https://iitsnl.com/blog/understanding-github-copilots-internal-architecture/) decides to call one or more internal tools to complete tasks based on the analysis results of the "intent classifier":
- File Operations: Including `read_file` (read file), `edit_file` (edit file), and `create_file` (create new file), enabling Copilot to interact directly with the user's codebase.
- Code Execution: Through the `run_in_terminal` tool, Copilot can execute commands in the user's terminal, such as running tests or building scripts.
- Search and Analysis: This is one of the most critical toolsets, including traditional `grep_search` (text search), `list_code_usages` (list code references), and the most powerful one, `semantic_search`.
The key features of this model are: high flexibility, strong scalability, but it relies on a good classification system and semantic matching capabilities. It is more suitable for dynamic scenarios, such as code generation, debugging, and documentation Q&A.
### Using the MCP Protocol to Build a Composable Tool Network
As the number of tools and Agents continues to grow, we need a mechanism to standardize descriptions, dynamically register, and cross-communicate tools among Agents. The MCP (Model Context Protocol) is a universal protocol layer designed for this purpose. Through MCP, AI Agents no longer rely on hard-coded interfaces or specific systems but can call tools, access data, or collaborate with other Agents in a unified format. The core values of MCP are standardization, dynamism, and composability:
- Standardization: A unified tool invocation format allows different Agents to share toolsets.
- Dynamism: Supports runtime registration and access to tools, enabling Agents to select the most suitable tools based on task requirements.
- Composability: Different Agents and tools can be combined like building blocks to achieve complex task decomposition and collaborative execution.
Combining the previously designed atomic tool functions, MCP can integrate these tools into a reusable, collaborative tool network, allowing Agents to be more flexible and efficient in solving complex problems.
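A minimal MCP server sketch using the official Python SDK's FastMCP helper (assuming the `mcp` package is installed; the server name, tool, and logic are illustrative):

```python
from mcp.server.fastmcp import FastMCP  # pip install "mcp[cli]"

mcp = FastMCP("travel-tools")

@mcp.tool()
def search_hotels(location: str, check_in_date: str, check_out_date: str) -> str:
    """Search for hotel information in a city for the given dates."""
    return f"(stub) hotels in {location} from {check_in_date} to {check_out_date}"

if __name__ == "__main__":
    # Any MCP-compatible client (Claude Code, IDE plugins, other Agents)
    # can now discover and call `search_hotels` over the protocol.
    mcp.run()
```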
#### Other Tool Networks
We can also observe the emergence of GitHub Copilot Extensions or Claude Code Plugins, indicating that even with protocols like MCP and A2A, the AI Agent ecosystem may not be as unified as we expect. For instance, the project https://github.com/wshobson/agents documents (as of 2025.10.14):
> A comprehensive system for production environments, consisting of 84 dedicated AI Agents, 15 multi-Agent workflow orchestrators, and 44 development tools, organized into 62 focused and single-responsibility plugins for Claude Code.
## Agent Planning and Beyond Single-Agent Systems
> An Agent is a software system that uses AI to achieve goals and perform tasks on behalf of users. It exhibits reasoning, planning, and memory capabilities, and possesses a degree of autonomy, enabling it to learn, adapt, and make decisions independently. - Google Cloud
Agents are goal-oriented and typically require **perception** - **planning** - **action** to achieve their objectives, along with memory, while complex AI Agent systems may also include **collaboration** and **self-improvement** capabilities. In the previous content, we have introduced several basic capabilities:
- Through **structured prompts and prompt chains**, Agents possess a planning and decision-making thought structure.
- Through **context engineering**, Agents gain the ability to "perceive the world," capturing information from external knowledge and environments.
- Through the engineering design of the **tool system**, Agents gain the ability to interact with the external world and execute tasks.
Based on this, the further development direction of Agents lies in:
- **Collaboration** — Multiple Agents work together through A2A (Agent-to-Agent) communication protocols or task allocation mechanisms, achieving role division and information sharing.
- **Self-improvement** — Agents accumulate experience through memory systems and reflection mechanisms, optimizing their prompts and planning strategies, thus possessing continuous learning and self-evolution capabilities.
As this is a rapidly evolving field, the practices below should be read as a snapshot of current approaches rather than a settled recipe.
### Modular System Prompts: The Thinking Blueprint of Agents
The first step in building an effective Agent is to define its "thinking blueprint" — the system prompt. A well-designed system prompt not only defines what the Agent should do but also clarifies what it should not do. In the Coding Agent domain, a system prompt for an Agent is often extremely complex. For example, the system prompt for [Cursor](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Cursor%20Prompts/Agent%20Prompt%202025-09-03.txt) includes detailed specifications regarding roles, tool invocation, safety boundaries, and task planning.
Combining tools like Cursor, Claude Code, Augment, and Junie, we can summarize a series of modular design practices:
- **Structural Layering and Modularization**: Organize prompts with clear hierarchies (roles/communication/tools/safety/tasks) to avoid "one-size-fits-all" text, facilitating maintenance and dynamic loading.
- **Tool Prioritization and Parallelization**: Prioritize specialized tools and parallelize when possible, significantly reducing latency and costs (e.g., parallel calls to `read_file` to read multiple files, using `search_replace` for editing instead of sed).
- **Safety Boundaries and Permission Models**: Default to sandboxing with minimal permissions, requiring explicit authorization for dangerous operations (e.g., `required_permissions: ["network"|"git_write"|"all"]`), prohibiting high-risk actions like force-pushing to `main/master`.
- **Minimal Sufficient Task Management**: Use TODO management for multi-step complex tasks (mark the first item as in_progress as soon as the list is created, and mark items completed immediately when they finish), while simple, direct tasks are executed immediately.
- **Context Uniqueness and Safe Modifications**: Code edits require uniquely identifiable contexts (e.g., `old_string` must be unique in the file, with 3–5 lines before and after), and multiple modifications should be executed in separate steps to avoid errors.
- **Communication Norms and User Experience**: Hide internal tool names, using natural language to "say-do-summarize," keeping it concise and scannable; use backticks to denote file/function names, providing minimal usable examples when necessary.
This evolution from monolithic prompts to modular, hierarchical, and dynamic designs is akin to the transition from monolithic applications to microservices architecture, providing structural support for advanced reasoning, system scalability, and maintainability of Agents.
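A toy illustration of that modular assembly: each section lives in its own module and the final system prompt is composed at startup or even per request (the section contents are placeholders, not any tool's real prompt):

```python
PROMPT_MODULES = {
    "role":          "You are a coding agent working inside the user's repository.",
    "communication": "Hide internal tool names; summarize what you did in plain language.",
    "tools":         "Prefer specialized tools; read multiple files in parallel when possible.",
    "safety":        "Never force-push to main/master. Ask before destructive operations.",
    "tasks":         "For multi-step work, maintain a TODO list; mark items in_progress/completed.",
}

def build_system_prompt(enabled: list[str]) -> str:
    """Compose only the modules this session needs, in a fixed, readable order."""
    return "\n\n".join(f"# {name}\n{PROMPT_MODULES[name]}"
                       for name in ["role", "communication", "tools", "safety", "tasks"]
                       if name in enabled)

print(build_system_prompt(["role", "tools", "safety"]))
```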
### From Retrieval to Planning: Using Prompts to Decompose Goals for Agents
Simply telling an Agent "make a plan" is far from sufficient; we must guide its decomposition process through a set of clear principles, just as we would establish specifications for software modules. The intelligence ceiling of a monolithic Agent often depends on its "planning capability"—whether it can decompose vague goals into clear, executable sub-tasks.
This involves two core strategies:
- **Pre-decomposition**: This strategy, also known as static planning, decomposes the entire complex task into a sequence of sub-tasks or plans before task execution begins.
- **Interleaved Decomposition**: This strategy, also known as dynamic planning, does not formulate a complete plan at the start of the task but dynamically decides the next sub-task during execution.
For example, the architecture of BabyAGI embodies this "task-driven" planning: it consists of three core Agents—task_creation_agent (task generation), execution_agent (task execution), and prioritization_agent (task prioritization), forming a continuously looping task update and execution system.
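A compressed sketch of that task-driven loop (the three agent functions are placeholders for prompt-backed model calls, not BabyAGI's actual code):

```python
from collections import deque

def execution_agent(objective: str, task: str) -> str:
    raise NotImplementedError  # prompt the model to carry out one task

def task_creation_agent(objective: str, last_result: str, pending: list[str]) -> list[str]:
    raise NotImplementedError  # prompt the model to propose follow-up tasks

def prioritization_agent(objective: str, pending: list[str]) -> list[str]:
    raise NotImplementedError  # prompt the model to reorder the pending tasks

def run(objective: str, first_task: str, max_iterations: int = 10) -> None:
    tasks = deque([first_task])
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        result = execution_agent(objective, task)
        new_tasks = task_creation_agent(objective, result, list(tasks))
        tasks = deque(prioritization_agent(objective, list(tasks) + new_tasks))
```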
In modern systems (like Augment, Claude Code), planning logic is often embedded in system prompts in the form of todo_spec, featuring the following characteristics:
- **Atomicity and Action Orientation**: Each to-do item should be an independent, indivisible "atomic" task.
- **Meaningful Abstraction Levels**: To-do items should not be trivial operational actions (like "read file a.txt" or "fix linter errors") but should represent higher-level, meaningful, and non-trivial tasks.
- **Appropriate Scope**: Specifications tend to favor "fewer, larger to-do items" rather than a lengthy list of minor steps.
- **Implementation-Centric**: If the user's request is to implement a certain feature, the to-do list generated by the Agent itself is the final plan.
Through this structured planning, Agents can transform "user needs" into "system plans," laying the semantic interface for multi-Agent collaboration.
### Multi-Agent Collaboration System: From Individuals to Organizations

The capabilities of a monolithic Agent are limited, and multi-Agent systems (Multi-Agent System, MAS) are the engineering direction for scaling intelligent systems further. Just as microservices achieve high cohesion and low coupling by decomposing monolithic applications, multi-Agent systems achieve horizontal scaling of intelligence by splitting Agent responsibilities. By letting multiple Agents collaborate toward more complex goals, they can work together like a "team" in software development.
Common collaboration topologies (refer to [LangGraph](https://langchain-ai.github.io/langgraph/concepts/multi_agent/), AutoGen, etc.):
- Supervisor-Expert Model (Hierarchical Structure): A "Supervisor Agent" or "Coordinator Agent" is responsible for receiving high-level user goals, decomposing them into a series of sub-tasks, and then assigning them to the corresponding "Expert Agents" based on the nature of each sub-task.
- Parallel Model (Collective Intelligence): Also known as "concurrent mode" or "swarm mode." Multiple Agents independently execute the same task or different parts of the task simultaneously, then aggregate their outputs.
- Sequential Model (Pipeline): Agents work in a predefined order, like on an assembly line. The output of the previous Agent becomes the input for the next Agent.
- Network Model (Conversational/Dynamic Mode): Agents can freely communicate in a many-to-many network without a fixed hierarchical structure. The next acting Agent is usually dynamically determined based on the flow of conversation.
The choice of multi-Agent topology directly reflects the underlying structure of the problem to be solved. The architecture is not arbitrarily chosen but attempts to create a "cognitive model" that mirrors the dependency graph of the problem. Of course, it inevitably encounters various issues similar to the complexities found in microservices architecture.
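A bare-bones supervisor-expert sketch: the supervisor decomposes the goal and routes each sub-task to an expert by name (the expert set, their stubbed behavior, and the decomposition call are all placeholders):

```python
def call_llm_json(prompt: str) -> list[dict]:  # placeholder model call returning JSON
    raise NotImplementedError

EXPERTS = {
    "research": lambda task: f"(stub) research notes for {task!r}",
    "coding":   lambda task: f"(stub) patch for {task!r}",
    "review":   lambda task: f"(stub) review comments for {task!r}",
}

def supervisor(goal: str) -> list[str]:
    # The supervisor decomposes the goal into (expert, task) assignments.
    plan = call_llm_json(
        f"Decompose the goal into sub-tasks for the experts {list(EXPERTS)}.\n"
        f'Goal: {goal}\nReply as JSON: [{{"expert": str, "task": str}}, ...]'
    )
    results = []
    for step in plan:
        expert = EXPERTS.get(step["expert"], EXPERTS["research"])
        results.append(expert(step["task"]))
    return results
```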
#### A2A Protocol: Building Agent Networks to Accelerate Intelligent Capability Sharing
A2A is designed specifically for Agent-to-Agent communication, complementing other standards like the Model Context Protocol (MCP) that handle Agent-to-Tool communication. It plays the role of a public internet protocol, allowing different Agent systems to connect and interoperate.
However, we do not necessarily need to introduce A2A architecture; for example, the mechanism we implemented in AutoDev exposes A2A protocol Agents as MCP tools for Agent use, achieving collaboration between Agents without adding system complexity.
#### Self-Improvement: Reflection, Memory, and Evaluation Loops
The true power of an evolving Agent comes from the close integration of reflection loops and persistent memory systems.
- Reflection Mechanism: The Agent reviews its outputs, identifies errors, and generates improvement suggestions.
- Memory Storage: Persisting task experiences and contexts (like `AGENTS.md`, Knowledge Graph) provides long-term references for subsequent tasks.
For memory, there should be a mechanism for weighted retrieval based on recency, relevance, and importance, as well as a reflective memory management system that can autonomously decide what to remember, what to forget, and how to organize information.
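A small sketch of weighted memory retrieval along those three axes (the weights and the word-overlap relevance stub are assumptions; a real system would use embedding similarity for relevance):

```python
import math
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                       # 0..1, assigned when the memory is written
    last_access: float = field(default_factory=time.time)

def recency(mem: Memory, half_life_hours: float = 24.0) -> float:
    age_hours = (time.time() - mem.last_access) / 3600
    return math.exp(-age_hours / half_life_hours)  # decays toward 0 over time

def relevance(mem: Memory, query: str) -> float:
    # Stub: real systems compare embeddings; here we just check word overlap.
    overlap = set(mem.text.lower().split()) & set(query.lower().split())
    return min(len(overlap) / 5, 1.0)

def retrieve(memories: list[Memory], query: str, top_k: int = 3) -> list[Memory]:
    def score(m: Memory) -> float:
        return 0.4 * recency(m) + 0.4 * relevance(m, query) + 0.2 * m.importance
    return sorted(memories, key=score, reverse=True)[:top_k]
```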
> The ultimate goal of an advanced Agent architecture is to create a self-reinforcing flywheel: actions generate experiences, reflections distill experiences into knowledge, and memory stores knowledge to improve future actions. This transforms the Agent from a static program into a dynamic learning entity.
## Conclusion
> The status of the system prompt in Agent systems far exceeds that of a simple instruction set; it is, in fact, the core "operating system" of the Agent, requiring a high level of architectural design for prompts and context engineering.
Utilizing markup languages like Markdown or XML to construct structured instruction modules can significantly enhance LLM's understanding and adherence to complex rules. By employing clear role activations, detailed behavioral specifications, and "just-in-time" data loading techniques in context engineering, developers can shape a stable and predictable "cognitive environment" for Agents, guiding their behavior onto the desired track. Excellent context engineering is the foundation for achieving reliability in Agent behavior.
Related Resources:
- [Structured Output Converter (Spring AI Reference)](https://docs.spring.io/spring-ai/reference/api/structured-output-converter.html)
- [Agentic Design Patterns](https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0)
- [Agentic Context Engineering](https://www.arxiv.org/pdf/2510.04618)
- [A Survey on Large Language Model based Autonomous Agents](https://arxiv.org/pdf/2308.11432)
- [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- [Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG](https://arxiv.org/pdf/2501.09136)
- [How to build reliable AI workflows with agentic primitives and context engineering](https://github.blog/ai-and-ml/github-copilot/how-to-build-reliable-ai-workflows-with-agentic-primitives-and-context-engineering/)