# Dual-Layer MemCube Agent Backend
A robust, modular AI backend implementing the **Dual-Layer Memory Architecture**. It optimizes for both low-latency conversation (Hot Memory) and massive historical recall (Cold Memory/Vector DB).
Built with **FastAPI**, **ChromaDB**, and designed to be fully customizable—easily switch between **OpenAI**, **Ollama**, **vLLM**, or any OpenAI-compatible API.
## 🚀 Key Features
* **Dual-Layer Memory System**:
  * **L1 (Hot Memory)**: In-memory LRU cache for recent context. Includes an "importance" lock to prevent critical information from being evicted.
  * **L2 (Cold Memory)**: Persistent vector database (ChromaDB) for long-term storage of overflowed memories.
* **Waterfall Retrieval**: Intelligent context lookup that queries L1 first and falls back to the vector DB only when necessary, reducing latency and cost.
* **Customizable AI Provider**: Seamless support for:
  * **OpenAI** (GPT-3.5/4)
  * **Local LLMs** (Ollama, vLLM, LocalAI)
  * **Mock Mode** (zero-cost testing)
* **Production Ready**: Built on FastAPI with asynchronous request handling, modular architecture, and structured logging.
## 📊 Benchmark Results
We compared MemCube against a traditional chat-history system (keeping the last 30 messages in context). Both used the same AI API for a fair comparison.
### Performance Summary
| Metric | MemCube | Traditional | Advantage |
|--------|---------|-------------|-----------|
| **Avg Latency** | 7,758 ms | 12,272 ms | **37% faster** ⚡ |
| **Min Latency** | 5,368 ms | 6,141 ms | 13% faster |
| **Max Latency** | 16,218 ms | 19,746 ms | 18% faster |
| **Recall Rate** | 100% | 100% | Equal ✓ |
| **API Calls** | 30 | 30 | Equal |
### Visual Comparison
*(Comparison charts can be generated locally with `generate_charts.py`; see "Running the Benchmark" below.)*
### Why MemCube is Faster
1. **Deferred Embedding** - the memory is saved immediately; its embedding is generated in the background
2. **Smart Retrieval** - if enough recent context exists in L1, the expensive vector search is skipped
3. **No Query Embedding** - for recent conversations, no query embedding needs to be generated
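The deferred-embedding idea in point 1 can be sketched with a background worker: the write path stores the raw text and returns at once, while a separate thread fills in the embedding later. This is an illustrative sketch, not the project's actual implementation; `DeferredEmbedder` and `fake_embed` are made-up names.

```python
import queue
import threading

def fake_embed(text: str) -> list[float]:
    # Stand-in for a real (slow) embedding call, e.g. an API request.
    return [float(len(text))]

class DeferredEmbedder:
    """Illustrative sketch: store text immediately, embed in a background worker."""

    def __init__(self):
        self.memories = {}          # id -> {"content": ..., "embedding": ...}
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def save(self, mem_id: str, content: str) -> None:
        # Returns as soon as the text is stored; no embedding on the hot path.
        self.memories[mem_id] = {"content": content, "embedding": None}
        self.pending.put(mem_id)

    def _run(self):
        while True:
            mem_id = self.pending.get()
            if mem_id is None:
                break               # shutdown sentinel
            mem = self.memories[mem_id]
            mem["embedding"] = fake_embed(mem["content"])
            self.pending.task_done()

    def close(self):
        self.pending.put(None)
        self.worker.join()
```

The caller's latency is bounded by a dictionary insert and a queue put, which is why the write path stays fast even when the embedding backend is slow.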
### Additional Advantages Over Traditional Systems
| Feature | MemCube | Traditional |
|---------|---------|-------------|
| **Scalability** | Handles 1000+ conversations | Context window limited |
| **Cross-Session Memory** | ✅ Persistent | ❌ Session-only |
| **Old Memory Retrieval** | ✅ Vector search | ❌ Lost after window |
| **Critical Info Protection** | ✅ Importance lock | ❌ FIFO eviction |
### Running the Benchmark
```bash
# Start the backend
python -m app.main
# In another terminal, run benchmark
python benchmark.py
# Generate charts (optional)
python generate_charts.py
```
## 🛠️ Installation
1. **Clone the repository** (if applicable) or navigate to the project folder:
```bash
cd memcube_backend
```
2. **Install Dependencies**:
Using a virtual environment (venv/conda) is recommended.
```bash
pip install -r requirements.txt
```
## ⚙️ Configuration
Copy the example configuration file:
```bash
cp .env.example .env
```
### Option A: Using OpenAI
Edit `.env` to use official OpenAI API:
```ini
AI_PROVIDER_TYPE=openai
AI_API_KEY=sk-proj-...
AI_CHAT_MODEL=gpt-3.5-turbo
AI_EMBEDDING_MODEL=text-embedding-3-small
```
### Option B: Using Local LLM (Ollama)
Edit `.env` to point to your local instance. **No API key required.**
```ini
AI_PROVIDER_TYPE=custom
AI_BASE_URL=http://localhost:11434/v1
AI_API_KEY=ollama
AI_CHAT_MODEL=llama3
AI_EMBEDDING_MODEL=nomic-embed-text
```
### Option C: Mock Mode (Testing)
Perfect for testing logic without running an LLM.
```ini
AI_PROVIDER_TYPE=mock
```
## 🏃‍♂️ Usage
Start the server:
```bash
python -m app.main
```
The server will start at `http://localhost:8000`.
### API Endpoints
Interactive documentation is available at **[http://localhost:8000/docs](http://localhost:8000/docs)**.
#### 1. Chat with Memory
`POST /api/v1/chat`
```json
{
  "message": "My secret code is 42",
  "importance": "high"
}
```
*The agent will respond using context from L1 or L2 memory. High-importance messages are protected from being forgotten (evicted from L1).*
#### 2. Add Memory Manually
`POST /api/v1/memory`
```json
{
  "content": "The project deadline is next Friday.",
  "importance": "normal",
  "source": "slack_integration"
}
```
## 🔌 MCP Server Support (Claude Desktop Integration)
MemCube implements the **Model Context Protocol (MCP)**, allowing it to be used as a native memory tool by intelligent agents like **Claude Desktop** or **Cursor**.
### Features
* **Persistent Memory**: Claude can save important facts about you that persist across sessions.
* **Semantic Search**: Claude can retrieve relevant past context based on your current query.
### Setup for Claude Desktop
1. Ensure the local backend is running (`python -m app.main`)
2. Edit your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on Mac, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
  "mcpServers": {
    "memcube": {
      "command": "python",
      "args": ["/absolute/path/to/memcube_backend/mcp_server.py"]
    }
  }
}
```
3. Restart Claude Desktop. You will see a 🛠️ icon indicating that the MemCube tools (`save_memory`, `retrieve_memory`) are available.
## 📂 Project Structure
```text
memcube_backend/
├── app/
│   ├── api/              # API Routes
│   ├── core/             # Config & Settings
│   ├── llm/              # AI Provider Factory (OpenAI/Custom/Mock)
│   ├── models/           # Pydantic Data Models
│   ├── services/         # Business Logic
│   │   ├── manager.py    # Memory Manager (Orchestrator)
│   │   ├── memory_l1.py  # Hot Memory Logic
│   │   └── memory_l2.py  # Cold Memory Logic
│   └── main.py           # App Entry Point
├── .env                  # Environment Variables
└── requirements.txt      # Dependencies
```
## 🧠 Architecture Details
### The "Spillover" Mechanism
When L1 (Hot Memory) reaches capacity, it evicts the **Least Recently Used (LRU)** item to L2 (Cold Memory).
* **Exception**: If an item is marked `importance="high"`, it is moved to the front of the cache and **not evicted**, ensuring critical instructions (like "My name is Alice" or "Act as a python expert") are always fast to access.
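The lock-aware eviction described above can be sketched with an `OrderedDict`-based LRU cache. This is a minimal illustration, not the actual `memory_l1.py` API; `HotMemory` and the `spill` callback are made-up names.

```python
from collections import OrderedDict

class HotMemory:
    """Illustrative LRU cache with an importance lock (not the real memory_l1 API)."""

    def __init__(self, capacity: int, spill):
        self.capacity = capacity    # max items held in L1
        self.spill = spill          # callback persisting an evicted item to L2
        self.items = OrderedDict()  # key -> (content, importance)

    def add(self, key, content, importance="normal"):
        self.items[key] = (content, importance)
        self.items.move_to_end(key)              # mark as most recently used
        while len(self.items) > self.capacity:
            if not self._evict_one():
                break  # everything is locked; tolerate temporary overflow

    def _evict_one(self) -> bool:
        # Scan from least to most recently used, skipping locked items.
        for key, (content, importance) in self.items.items():
            if importance != "high":
                del self.items[key]
                self.spill(key, content)         # spill over to cold storage
                return True
        return False
```

The key property is that `importance="high"` entries are simply never candidates for eviction, so eviction pressure always falls on the least recently used *normal* item.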
### The "Waterfall" Retrieval
When retrieving context for a user query:
1. **Search L1**: Computes cosine similarity with all hot memories.
2. **Early Exit**: If a match is found with similarity > `0.85`, it returns immediately.
3. **Fallback**: If no good match is found in L1, it searches the Vector Database (L2).
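The three steps above can be sketched as follows. The 0.85 threshold comes from the description; the function names and the `(content, embedding)` shape of hot memories are illustrative assumptions, and the L2 lookup is passed in as a callback.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # early-exit cutoff from step 2

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def waterfall_retrieve(query_vec, hot_memories, search_l2):
    """hot_memories: list of (content, embedding); search_l2: vector-DB fallback."""
    best, best_score = None, 0.0
    for content, emb in hot_memories:          # step 1: scan L1
        score = cosine(query_vec, emb)
        if score > best_score:
            best, best_score = content, score
    if best_score > SIMILARITY_THRESHOLD:      # step 2: early exit on a strong match
        return best
    return search_l2(query_vec)                # step 3: fall back to L2
```

Because L1 is small and already in memory, step 1 is cheap; the expensive vector-DB query in step 3 only runs when no hot memory clears the threshold.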