# AuditLuma - Advanced Code Audit AI System 🔍
AuditLuma is an intelligent code audit system built on an innovative **hierarchical RAG architecture**. It combines multiple AI agents with a Haystack-AI orchestrator, txtai knowledge retrieval, R2R context enhancement, and Self-RAG validation to provide comprehensive, accurate security analysis for codebases.
## 🌟 Architecture Highlights
- 🏗️ **Hierarchical RAG Architecture** - Four-layer intelligent architecture: Haystack orchestration + txtai retrieval + R2R enhancement + Self-RAG validation
- 🚀 **Haystack-AI Orchestrator** - Intelligent task decomposition and result integration, supporting fallback to traditional orchestrators
- 🔍 **Intelligent Knowledge Retrieval** - txtai-driven semantic retrieval and contextual understanding
- 🎯 **Precise Validation** - Self-RAG multi-model cross-validation, effectively reducing false positives
- 🔄 **Adaptive Architecture** - Automatically selects the optimal architectural mode based on project scale
## ✨ Core Features
### 🏗️ Hierarchical RAG Architecture
- **Haystack Orchestration Layer** - Intelligent task decomposition, parallel execution, and result integration
- **txtai Knowledge Retrieval Layer** - Semantic retrieval and contextual understanding
- **R2R Context Enhancement Layer** - Dynamic context expansion and correlation analysis
- **Self-RAG Validation Layer** - Multi-model cross-validation and false positive filtering
### 🚀 Intelligent Orchestration System
- **Haystack-AI Orchestrator** - AI-based intelligent task orchestration (recommended)
- **Traditional Orchestrator** - Rule-driven stable orchestration solution
- **Automatic Fallback Mechanism** - Automatic switch when AI orchestrator is unavailable
- **Dynamic Architecture Selection** - Automatically selects the optimal architecture based on project scale
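The automatic fallback described above can be sketched as follows. The class and function names (`HaystackAIOrchestrator`, `TraditionalOrchestrator`, `select_orchestrator`) are illustrative placeholders, not AuditLuma's actual API:

```python
# Hypothetical sketch of the automatic fallback mechanism: prefer the
# Haystack-AI orchestrator, fall back to the rule-driven one on failure.
class TraditionalOrchestrator:
    name = "traditional"

class HaystackAIOrchestrator:
    name = "ai"
    def __init__(self, available: bool = True):
        if not available:
            raise RuntimeError("Haystack-AI backend unavailable")

def select_orchestrator(preferred: str = "ai", ai_available: bool = True):
    """Return the preferred orchestrator, falling back to the traditional one."""
    if preferred == "ai":
        try:
            return HaystackAIOrchestrator(available=ai_available)
        except RuntimeError:
            pass  # automatic fallback to the stable orchestrator
    return TraditionalOrchestrator()
```

The key design point is that fallback is transparent to the caller: the rest of the pipeline only sees an orchestrator object, regardless of which one was chosen.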
### 🔍 Advanced Analytical Capabilities
- 🛡️ **Comprehensive Security Analysis** - Thoroughly detect vulnerabilities and provide effective remediation suggestions
- 🌐 **Cross-File Security Analysis** - Identify cross-file vulnerabilities that traditional single-file analysis cannot detect
- 📊 **Global Context Construction** - Construct code call graphs, data flow diagrams, and dependencies
- 🎯 **Taint Analysis** - Trace the propagation path of user inputs within the code
- 🔄 **MCP (Multi-Agent Cooperation Protocol)** - Enhance coordination and collaboration between agents
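As an illustration of the taint-analysis idea (a minimal sketch of the concept, not AuditLuma's implementation), tracing user input reduces to a fixed-point propagation over data-flow edges:

```python
# Minimal illustrative taint propagation: starting from user-controlled
# sources, follow assignment/data-flow edges and flag any reached sinks.
def propagate_taint(edges, sources, sinks):
    """edges: list of (src_var, dst_var) data-flow pairs."""
    tainted = set(sources)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for src, dst in edges:
            if src in tainted and dst not in tainted:
                tainted.add(dst)
                changed = True
    return tainted & set(sinks)  # sinks reachable by tainted data

# Example: a request parameter flows into a SQL query string
flows = [("request.args", "user_id"), ("user_id", "query"), ("config", "timeout")]
print(propagate_taint(flows, sources={"request.args"}, sinks={"query", "timeout"}))
# → {'query'}
```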
### 🌐 Enterprise-Level Support
- **Multi-LLM Vendor Support** - Supports multiple vendors including OpenAI, DeepSeek, MoonShot, and Tongyi Qianwen.
- **Automatic Vendor Detection** - Automatically identifies and configures the correct vendor API based on the model name.
- **Asynchronous Parallel Processing** - Utilizes asynchronous concurrency techniques to enhance performance and accelerate analysis speed.
- **Visualization Features** - Generates dependency graphs and detailed security reports.
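The asynchronous parallel processing mentioned above can be sketched with `asyncio`; here `analyze_file` is a stand-in for a real per-file analysis call, and the semaphore bound plays the role of a `max_batch_size`-style setting:

```python
import asyncio

# Illustrative sketch: analyze many files concurrently while bounding
# parallelism with a semaphore.
async def analyze_file(path: str) -> str:
    await asyncio.sleep(0)  # stand-in for an LLM/API call
    return f"report for {path}"

async def analyze_project(paths, max_workers: int = 8):
    sem = asyncio.Semaphore(max_workers)
    async def bounded(p):
        async with sem:
            return await analyze_file(p)
    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(p) for p in paths))

reports = asyncio.run(analyze_project(["a.py", "b.py", "c.py"]))
```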
## 📋 Table of Contents
- [Quick Start](#-quick-start)
- [Hierarchical RAG Architecture](#-hierarchical-rag-architecture)
- [Documentation](#-documentation)
- [Installation](#-installation)
- [Usage](#-usage)
- [Configuration](#-configuration)
- [Supported Languages](#-supported-languages)
- [Architecture](#-architecture)
- [Report Formats](#-report-formats)
- [Contributing](#-contributing)
- [License](#-license)
## 🚀 Quick Start
```bash
# 1. Clone the project
git clone https://github.com/Vistaminc/AuditLuma.git
cd AuditLuma

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run an analysis with the hierarchical RAG architecture (recommended)
python main.py --architecture hierarchical --haystack-orchestrator ai -d ./your-project

# 4. View architecture information
python main.py --show-architecture-info
```
## 🏗️ Hierarchical RAG Architecture
AuditLuma 2.0 introduces an innovative four-layer RAG architecture, significantly enhancing analysis accuracy and efficiency:
```
┌─────────────────────────────────────────────────────────────┐
│ Hierarchical RAG Architecture │
├─────────────────────────────────────────────────────────────┤
│ Layer 1: Haystack Orchestration Layer │
│ ├─ Haystack-AI Orchestrator (Recommended) - Intelligent task decomposition and result integration │
│ └─ Traditional Orchestrator - Rule-driven stable solution │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: txtai Knowledge Retrieval Layer │
│ ├─ Semantic search and similarity matching │
│ └─ Context understanding and knowledge graph construction │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: R2R Context Enhancement Layer │
│ ├─ Dynamic context expansion │
│ └─ Correlation analysis and dependency tracking │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Self-RAG Validation Layer │
│ ├─ Multi-model cross-validation │
│ └─ False positive filtering and confidence assessment │
└─────────────────────────────────────────────────────────────┘
```
### Architectural Advantages
- **🎯 Improved Accuracy** - Four-layer verification mechanism significantly reduces false positives
- **⚡ Performance Optimization** - Intelligent caching and parallel processing enhance analysis speed
- **🔄 Adaptive** - Automatically selects the optimal configuration based on project scale
- **🛡️ Reliability** - Multiple fallback mechanisms ensure stable system operation
## 📚 Documentation
### 🚀 Getting Started Guide
- [Installation Guide](./docs/installation-guide.md) - Detailed installation steps and environment configuration
- [User Guide](./docs/user-guide.md) - A complete tutorial from beginner to advanced usage
- [Quick Reference](./docs/quick-reference.md) - A quick reference manual for commonly used commands and configurations
### 🏗️ Core Documentation
- [Hierarchical RAG Architecture Guide](./docs/hierarchical-rag-guide.md) - Detailed explanation and usage guide for the hierarchical RAG architecture
- [Configuration Reference](./docs/configuration-reference.md) - Complete configuration options and parameter descriptions
- [Best Practices](./docs/best-practices.md) - Usage recommendations, performance optimization, and security configuration
### 🔧 Technical Documentation
- [Architecture Design](./docs/architecture-design.md) - System architecture and design philosophy
- [Troubleshooting Guide](./docs/troubleshooting.md) - Common issues, error diagnosis, and solutions
- [Project Structure](./项目结构.md) - Detailed project directory structure and module descriptions
### 📖 Online Resources
- [AuditLuma Documentation](https://iwt6omodfh0.feishu.cn/drive/folder/OwWqf7EYblaqTNdaDbtcnQcHnTt) - Complete online documentation and tutorials
## 🚀 Installation
Clone the repository and install the dependencies:
```bash
git clone https://github.com/Vistaminc/AuditLuma.git
cd AuditLuma
pip install -r requirements.txt
```
### Optional Dependencies
**FAISS Vector Search Library**
By default, AuditLuma uses a simple built-in vector storage implementation. If you need to handle large codebases, it is recommended to install FAISS for improved performance:
```bash
# CPU Version
pip install faiss-cpu
# GPU Version (Supports CUDA)
pip install faiss-gpu
```
After installing FAISS, the system will automatically detect and use it for vector storage and retrieval, significantly improving performance when analyzing large projects.
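The optional-dependency pattern described above (use FAISS when present, otherwise a built-in store) can be sketched like this. The NumPy brute-force fallback is a simplification for illustration, not AuditLuma's actual built-in store:

```python
import numpy as np

# Prefer FAISS for vector search when it is installed; otherwise fall
# back to a brute-force NumPy search (fine for small codebases).
try:
    import faiss  # pip install faiss-cpu / faiss-gpu
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False

def nearest(vectors: np.ndarray, query: np.ndarray, k: int = 1):
    """Return indices of the k nearest vectors (L2 distance) to query."""
    if HAVE_FAISS:
        index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 index
        index.add(vectors.astype("float32"))
        _, idx = index.search(query.astype("float32").reshape(1, -1), k)
        return idx[0].tolist()
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k].tolist()

vecs = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(nearest(vecs, np.array([0.9, 1.1]), k=1))  # → [1]
```

Either code path returns the same neighbors; only the speed differs, which is why the switch can be fully automatic.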
## 🛠 Usage
### Basic Usage
```bash
# Use the hierarchical RAG architecture (recommended)
python main.py --architecture hierarchical -d ./your-project -o ./reports

# Use the Haystack-AI orchestrator (default, recommended)
python main.py --architecture hierarchical --haystack-orchestrator ai -d ./your-project

# Use the traditional orchestrator
python main.py --architecture hierarchical --haystack-orchestrator traditional -d ./your-project

# Automatic architecture selection (based on project size)
python main.py --architecture auto -d ./your-project

# Traditional RAG architecture (backward compatible)
python main.py --architecture traditional -d ./your-project
```
### Advanced Usage
```bash
# Enable performance comparison mode
python main.py --architecture hierarchical --enable-performance-comparison -d ./your-project

# View architecture information and configuration
python main.py --show-architecture-info

# Configuration migration (upgrade from a traditional configuration to hierarchical RAG)
python main.py --config-migrate

# AI-enhanced cross-file analysis
python main.py --architecture hierarchical --enhanced-analysis -d ./your-project
```
### Command Line Parameters
#### Basic Parameters
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `-d, --directory` | Target project directory | `./goalfile` |
| `-o, --output` | Report output directory | `./reports` |
| `-w, --workers` | Number of parallel worker threads | max_batch_size in configuration |
| `-f, --format` | Report format (html/pdf/json) | report_format in configuration |
#### Architecture Selection Parameters
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `--architecture` | RAG architecture mode (traditional/hierarchical/auto) | `auto` |
| `--haystack-orchestrator` | Haystack orchestrator type (traditional/ai) | `ai` |
| `--force-traditional` | Force the use of traditional RAG architecture | - |
| `--force-hierarchical` | Force the use of hierarchical RAG architecture | - |
| `--enable-performance-comparison` | Enable performance comparison mode | - |
| `--auto-switch-threshold` | File count threshold for automatic architecture switching | `100` |
#### Hierarchical RAG Specific Parameters
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `--enable-txtai` | Enable txtai knowledge retrieval layer | - |
| `--enable-r2r` | Enable R2R context enhancement layer | - |
| `--enable-self-rag-validation` | Enable Self-RAG validation layer | - |
| `--disable-caching` | Disable hierarchical caching system | - |
| `--disable-monitoring` | Disable performance monitoring | - |
#### Traditional Function Parameters
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `--no-mcp` | Disable Multi-Agent Collaboration Protocol | Enabled by default |
| `--no-self-rag` | Disable Self-RAG Retrieval | Enabled by default |
| `--no-deps` | Skip Dependency Analysis | Not skipped by default |
| `--no-remediation` | Skip Generating Remediation Suggestions | Not skipped by default |
| `--no-cross-file` | Disable Cross-File Vulnerability Detection | Enabled by default |
| `--enhanced-analysis` | Enable AI-Enhanced Cross-File Analysis | Disabled by default |
#### Other Parameters
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `--verbose` | Enable detailed logging | Disabled by default |
| `--dry-run` | Dry run mode (does not perform actual analysis) | - |
| `--config-migrate` | Migrate configuration to hierarchical RAG format | - |
| `--show-architecture-info` | Display current architecture information and exit | - |
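A hedged sketch of how the core options above could be wired up with `argparse`; it mirrors the documented flags and defaults but is not AuditLuma's actual parser:

```python
import argparse

# Illustrative parser covering a subset of the documented options.
parser = argparse.ArgumentParser(prog="main.py")
parser.add_argument("-d", "--directory", default="./goalfile",
                    help="Target project directory")
parser.add_argument("-o", "--output", default="./reports",
                    help="Report output directory")
parser.add_argument("--architecture", default="auto",
                    choices=["traditional", "hierarchical", "auto"])
parser.add_argument("--haystack-orchestrator", default="ai",
                    choices=["traditional", "ai"])
parser.add_argument("--auto-switch-threshold", type=int, default=100)
parser.add_argument("--verbose", action="store_true")

args = parser.parse_args(["--architecture", "hierarchical", "-d", "./your-project"])
```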
## ⚙️ Configuration
Configure the system by editing the `config/config.yaml` file. AuditLuma 2.0 supports hierarchical RAG (Retrieval-Augmented Generation) architecture configuration.
### Hierarchical RAG Configuration
```yaml
# Hierarchical RAG architecture model configuration
hierarchical_rag_models:
  # Whether to enable the hierarchical RAG architecture
  enabled: true

  # Haystack orchestration layer configuration
  haystack:
    # Orchestrator type: traditional or ai (Haystack-AI, recommended)
    orchestrator_type: "ai"  # Defaults to the Haystack-AI orchestrator
    # Default model (supports the model@provider format)
    default_model: "qwen3:32b@ollama"
    # Task-specific model configuration
    task_models:
      security_scan: "gpt-4@openai"                # Security scans use a stronger model
      syntax_check: "deepseek-chat@deepseek"       # Syntax checking
      logic_analysis: "qwen-turbo@qwen"            # Logic analysis
      dependency_analysis: "gpt-3.5-turbo@openai"  # Dependency analysis

  # txtai knowledge retrieval layer model configuration
  txtai:
    retrieval_model: "gpt-3.5-turbo@openai"           # Knowledge retrieval model
    embedding_model: "text-embedding-ada-002@openai"  # Embedding model

  # R2R context enhancement layer model configuration
  r2r:
    context_model: "gpt-3.5-turbo@openai"      # Context analysis model
    enhancement_model: "gpt-3.5-turbo@openai"  # Enhancement model

  # Self-RAG validation layer model configuration
  self_rag_validation:
    validation_model: "gpt-3.5-turbo@openai"  # Primary validation model
    cross_validation_models:                  # Models used for cross-validation
      - "gpt-4@openai"
      - "deepseek-chat@deepseek"
      - "gpt-3.5-turbo@openai"
```
### Model Specification Format
AuditLuma supports the use of a unified model specification format `model@provider` to specify the model and provider:
```
deepseek-chat@deepseek # Specifies the deepseek-chat model from the DeepSeek provider
gpt-4-turbo@openai # Specifies the gpt-4-turbo model from the OpenAI provider
qwen-turbo@qwen # Specifies the qwen-turbo model from the Qwen provider
```
If the provider is not specified (without using the @ symbol), the system will automatically infer the provider based on the model name.
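Parsing this format is straightforward; a minimal sketch (the function name is illustrative, not AuditLuma's API):

```python
# Split a "model@provider" spec; when no provider is given, return None
# so the caller can fall back to inferring it from the model name.
def parse_model_spec(spec: str):
    if "@" in spec:
        model, provider = spec.rsplit("@", 1)  # rsplit tolerates '@' in model names
        return model, provider
    return spec, None

print(parse_model_spec("deepseek-chat@deepseek"))  # → ('deepseek-chat', 'deepseek')
print(parse_model_spec("gpt-4-turbo"))             # → ('gpt-4-turbo', None)
```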
### Architecture Selection Configuration
```yaml
# Global settings
global:
  # Default architecture mode: traditional, hierarchical, auto
  default_architecture: "hierarchical"
  # Auto-switch threshold (number of files)
  auto_switch_threshold: 100
  # Enable performance comparison
  enable_performance_comparison: false
```
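Under these settings, `auto` mode can be read as: count the project's source files and pick the hierarchical architecture once the count exceeds `auto_switch_threshold`. A minimal sketch of that decision (the function name is illustrative):

```python
# Illustrative auto-selection: small projects use the traditional
# architecture, larger ones the hierarchical RAG architecture.
def choose_architecture(file_count: int, threshold: int = 100) -> str:
    return "hierarchical" if file_count > threshold else "traditional"

print(choose_architecture(42))   # → traditional
print(choose_architecture(350))  # → hierarchical
```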
### Multi-Vendor Support
AuditLuma supports multiple LLM (Large Language Model) vendors and can automatically detect the vendor based on the model name:
| Model Prefix | Vendor |
|--------------|--------|
| `gpt-` | OpenAI |
| `deepseek-` | DeepSeek |
| `qwen-` | Tongyi Qianwen |
| `glm-` or `chatglm` | Zhipu AI |
| `baichuan` | Baichuan |
| `ollama-` | ollama |
- Note: The OpenAI vendor setting is compatible with any relay platform that exposes an OpenAI-format API.
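The prefix table above amounts to a simple prefix lookup; a sketch (the mapping, function name, and OpenAI-compatible fallback for unknown prefixes are illustrative assumptions):

```python
# Infer the vendor from a model-name prefix, following the table above.
PREFIX_TO_VENDOR = {
    "gpt-": "openai",
    "deepseek-": "deepseek",
    "qwen-": "qwen",
    "glm-": "zhipu",
    "chatglm": "zhipu",
    "baichuan": "baichuan",
    "ollama-": "ollama",
}

def detect_vendor(model: str, default: str = "openai") -> str:
    for prefix, vendor in PREFIX_TO_VENDOR.items():
        if model.startswith(prefix):
            return vendor
    return default  # assumed fallback: treat unknown models as OpenAI-compatible

print(detect_vendor("deepseek-chat"))  # → deepseek
print(detect_vendor("qwen-turbo"))     # → qwen
```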
## 💻 Supported Languages
AuditLuma supports analyzing the following programming languages:
### Main Languages
- Python (.py)
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Java (.java)
- C# (.cs)
- C++ (.cpp, .cc, .hpp)
- C (.c, .h)
- Go (.go)
- Ruby (.rb)
- PHP (.php)
- Lua (.lua)
### Other Supported Languages
- Rust (.rs)
- Swift (.swift)
- Kotlin (.kt)
- Scala (.scala)
- Dart (.dart)
- Bash (.sh, .bash)
- PowerShell (.ps1, .psm1)
### Markup and Configuration Languages
- HTML (.html, .htm)
- CSS (.css)
- JSON (.json)
- XML (.xml)
- YAML (.yml, .yaml)
- SQL (.sql)
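File-type detection for the lists above reduces to an extension map; a minimal, abbreviated sketch (illustrative, not AuditLuma's actual detector):

```python
from pathlib import Path

# Abbreviated extension-to-language map based on the lists above.
EXT_TO_LANG = {
    ".py": "Python", ".js": "JavaScript", ".jsx": "JavaScript",
    ".ts": "TypeScript", ".tsx": "TypeScript", ".java": "Java",
    ".cs": "C#", ".cpp": "C++", ".cc": "C++", ".hpp": "C++",
    ".c": "C", ".h": "C", ".go": "Go", ".rb": "Ruby",
    ".php": "PHP", ".lua": "Lua", ".rs": "Rust", ".sql": "SQL",
}

def detect_language(path: str):
    """Return the language for a file path, or None if unsupported."""
    return EXT_TO_LANG.get(Path(path).suffix.lower())

print(detect_language("app/models/user.rb"))  # → Ruby
```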
## 🏛 Architecture
AuditLuma uses a multi-agent architecture, consisting of the following components:

1. **Agent Orchestrator** - Coordinates all agents in the workflow
2. **Code Analysis Agent** - Analyzes code structure and extracts dependencies
3. **Security Analysis Agent** - Identifies security vulnerabilities
4. **Fix Suggestion Agent** - Generates targeted vulnerability remediation plans
5. **Visualization Component** - Produces intuitive reports and dependency graphs
## 📊 Report Formats
AuditLuma supports the following report formats:
- 📋 **HTML Report** - Contains vulnerability details, statistical charts, and interactive visualizations
- 📄 **PDF Report** - A format suitable for printing and sharing
- 🔄 **JSON Report** - A machine-readable format suitable for further processing and integration
## 💬 Contributing
We welcome code and suggestions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Create a Pull Request
## 📞 Contact
- QQ: 1047736593
## 🤝 Partners
- [Marshmallow Cybersecurity Circle](https://vip.bdziyi.com/?ref=711)
## Support and Appreciation
If you find AuditLuma helpful, you are welcome to support us in the following ways:
- Your sponsorship helps us continuously improve and enhance AuditLuma!
<div style="display: flex; justify-content: space-between; max-width: 600px; margin: 0 auto;">
<div style="flex: 1; margin-right: 20px;">
<img src="https://github.com/Vistaminc/Miniluma/blob/main/ui/web/static/img/zanshang/wechat.jpg"/>
</div>
<div style="flex: 1;">
<img src="https://github.com/Vistaminc/Miniluma/blob/main/ui/web/static/img/zanshang/zfb.jpg"/>
</div>
</div>
## 📜 License
MIT
---
<div align="center">
<sub>Built with ❤️ by AuditLuma Team</sub>
</div>