Content

# Tool List ## Project Introduction A powerful tool for exporting Zhihu collections, supporting batch export of Zhihu collections (public and private) to Markdown format files. Supports custom output paths, cross-platform compatibility, image download, and Obsidian compatible format. Supports using large model API to generate article summaries with one click. **MCP Server is also provided**, which can be directly called by AI Agents (such as Claude Code) to provide the ability to save Zhihu collections for large models. ## Main Features - 📚 **Batch Processing**: Supports processing multiple collections simultaneously - 📂 **Custom Output**: Supports users to specify output directories - 🖥️ **Cross-Platform Compatibility**: Supports Windows, Linux, macOS and other systems - ✨ **Intelligent Deduplication**: Automatically checks for existing files to avoid duplicate downloads - 🖼️ **Image Download**: Automatically downloads and saves images in articles - 📝 **Obsidian Compatible**: Output Markdown format is fully compatible with Obsidian - 📈 **Real-Time Logs**: Supports real-time log refresh, immediately displaying processing progress - 🔍 **Automatic Collection Acquisition**: Automatically acquires user collection lists through independent scripts - 🛠️ **Robust Error Handling**: Single article failure does not affect overall processing flow - 🔧 **Enhanced Debugging Support**: Automatically generates debugging files for easy problem tracking - 🤖 **MCP Support**: Provides MCP Server, which can be called by AI Agents ## Two Operation Modes ### Method 1: Command Line Operation (Traditional Method) Suitable for directly running export tasks in the terminal. #### Installation ```bash pip install -r requirements.txt ``` #### Configuration File Settings First, create the main configuration file: ```bash # Copy configuration example file cp config_examples.json config.json ``` Edit the `config.json` file to configure your collection information: ```json { "zhihuUrls": [ { "name": "Collection Name", "url": "https://www.zhihu.com/collection/123456789" }, { "name": "Another Collection", "url": "https://www.zhihu.com/collection/987654321" } ], "outputPath": "/path/to/your/output/directory", "os": "linux" } ``` The zhihuUrls can be obtained from <img src="./readme/image.png"> #### Automatically Acquire Collections (Optional) If you want to automatically acquire all collections, you can run: ```bash python fetch_collections.py ``` This will automatically update the collection list in the `config.json` file. #### Run the Main Program ```bash python main.py ``` ### Method 2: MCP Server Operation (AI Agent Method) Suitable for integration and calling by Claude Code or other AI tools. #### Install MCP Dependencies ```bash # Install MCP package pip install mcp ``` #### Start MCP Server ```bash python mcp_server.py ``` The Server will communicate through stdio and can be used in Claude Code or other MCP clients. #### Available Tools | Tool Name | Description | |--------|------| | `list_collections` | List all Zhihu collections in the configuration file | | `export_collection` | Export the specified Zhihu collection to a Markdown file | | `get_collection_info` | Get basic information (number of articles, etc.) of the specified collection | | `search_collections` | Search for collections containing keywords in the configuration file | ##### list_collections ```python # Return formatted collection list ``` ##### export_collection Parameters: - `collection_url` (required): Collection URL - `collection_name` (optional): Collection name - `output_dir` (optional): Output directory ```python # Example export_collection( collection_url="https://www.zhihu.com/collection/123456789", collection_name="Python Study", output_dir="my_downloads" ) ``` ##### get_collection_info Parameters: - `collection_url` (required): Collection URL ##### search_collections Parameters: - `keyword` (required): Search keyword #### Use in Claude Code Add server configuration in the `CLAUDE.md` or user configuration file's MCP configuration section: ```json { "mcpServers": { "zhihu-collections": { "command": "python", "args": ["mcp_server.py"], "env": {}, "cwd": "Your project path" } } } ``` Then you can call using natural language: - "List my Zhihu collections" - "Export collection https://www.zhihu.com/collection/123456789" - "Search for collections containing Python" - "Get the number of articles in collection https://www.zhihu.com/collection/123456789" ## Configuration Options Explanation ### Basic Configuration - **zhihuUrls**: Collection list, each collection contains name and URL - **outputPath**: Custom output path (optional, leave blank for default path) - **os**: Operating system type (optional, leave blank for automatic detection) ### Supported Operating Systems and Path Formats - **Windows**: `"D:\\Documents\\ZhihuExports"` or `"D:/Documents/ZhihuExports"` - **Linux/Unix**: `"/usr/local/share/zhihu-exports"` - **macOS**: `"~/Documents/ZhihuExports"` - **Cygwin**: `"/cygdrive/d/Documents/ZhihuExports"` ### Private Collection Access For private collections, create a `cookies.json` file: ```json [ {"name": "cookie_name", "value": "cookie_value"}, {"name": "another_cookie", "value": "another_value"} ] ``` ## Output Results ### File Structure ``` Output Directory/ ├── Collection Name 1/ │ ├── Article 1.md │ ├── Article 2.md │ └── assets/ │ ├── image1.jpg │ └── image2.png ├── Collection Name 2/ ├── logs/ │ ├── debug_20240101_120000.log │ └── 20240101_120000.json └── debug/ ├── debug_answer_123456.html └── debug_post_789012.html ``` ### Default Output Location - **Default Path**: `downloads/` folder in the project directory - **Custom Path**: Specify `outputPath` in `config.json` - **Image Storage**: `assets/` folder in each collection directory - **Log Files**: `logs/` folder in the output directory - **Debugging Files**: `debug/` folder in the output directory (save unresolvable page HTML) ## Troubleshooting ### Common Issues 1. **Content Download Failure** - Check `downloads/debug/` directory for HTML files to analyze page structure - Check network connection and cookies for validity - Confirm article URL is accessible 2. **Empty Log File** - The program has fixed the real-time log refresh issue - If the problem persists, check output directory write permissions 3. **TypeError: cannot unpack non-iterable NoneType object** - This issue has been fixed in v2.1 - If encountered, update to the latest version 4. **Column article returns "The article link was 404" but the browser can open normally** - v2.1 added intelligent content detection and precise error analysis - Check debug directory HTML files for specific reasons - May need to update cookies or page structure changes ### Debugging Support - **Log Files**: `downloads/logs/debug_*.log` contains detailed processing information - **Debugging HTML**: `downloads/debug/debug_*.html` saves unresolvable pages - **Test Scripts**: `test/` directory contains various functional test scripts # BUG Feedback If you encounter any issues during use, please provide BUG information in the issue. To facilitate reproduction and resolution, please be sure to attach error prompts or related URLs. # Suggestions If you have any suggestions, welcome to initiate discussions in the issue. ## Update Log ### v2.3 Anti-Crawling Mechanism Bypass - 🔄 **API Backup Mechanism**: When HTML request returns 403, automatically call Zhihu API (`/api/v4/answers/{id}?include=content`) to obtain content - 📦 **Streaming Download**: Column articles use streaming download to solve network interruption problems for large articles (e.g., long articles with many images) - 🔁 **Retry Mechanism**: Column articles add 3 retries to improve download success rate - 🔐 **Headers Enhancement**: Update request Headers to simulate modern browsers and reduce anti-crawling interception probability ### v2.2 MCP Support - 🤖 **MCP Server**: Added `mcp_server.py`, supporting direct calls by AI Agents - 📋 **4 Tools**: Provide list_collections / export_collection / get_collection_info / search_collections tools - 🔌 **Standard Protocol**: Follow MCP protocol, supporting mainstream AI tool integration such as Claude Code ### v2.1 Enhanced Features - 🚀 **Real-Time Log System**: Support real-time log refresh and immediate display of processing progress - 🛠️ **Robust Error Handling**: Fixed TypeError and other critical errors, single article failure does not affect overall processing - 🔍 **Enhanced HTML Parsing**: Support multiple Zhihu page structures, improve content acquisition success rate - 🔧 **Debugging File Generation**: Automatically save unresolvable page HTML to `debug/` directory for analysis - 🔄 **Automatic Collection Acquisition**: Added `fetch_collections.py` independent script to automatically acquire collection lists - 📊 **Detailed Error Logs**: Record specific error information and stack tracking for easy problem location - 🎯 **Intelligent Content Detection**: When standard CSS selectors fail, automatically enable intelligent algorithms to detect article content - 🔍 **Precise Error Analysis**: Distinguish different error types such as 404, login requirements, permission issues, etc. ### v2.0 New Features - ✨ Batch processing of multiple collections - ⚙️ Configuration file system (config.json) - 📂 Custom output path support - 🖥️ Cross-platform path processing - ✨ Intelligent file deduplication function - 📈 Detailed processing log system - 🔄 Backward compatibility with old version configuration files ## Todo - Optimize crawling speed - Add more export format support - GUI interface development - Third-party large model API support

Zhihu-Collections-MCP

Content

MCP Config

Connection Info

You Might Also Like

everything-claude-code

markitdown

awesome-claude-skills

antigravity-awesome-skills

openfang

context-mode

Zhihu-Collections-MCP

Scan with WeChat to Share

Authentication Required

Content

MCP Config

Connection Info

You Might Also Like

everything-claude-code

markitdown

awesome-claude-skills

antigravity-awesome-skills

openfang

context-mode