# Paper Fetch Skill

> Fetch papers as agent-ready Markdown: DOI/URL/title in, structured full text out. CLI · MCP · Skill.

**Paper Fetch Skill** is an AI reading layer for papers. Give it a DOI, URL, or title, and it returns structured metadata, clean Markdown full text, and image resources that feed directly into Codex, Claude Code, or any MCP host. It does not bypass paywalls and works only where you already have access, upgrading AI from "only reading abstracts" to "reading full text".

If you find it helpful, please star ⭐ the project to support it!

## 🙁 Pain Points for AI Agents Reading Papers

1. You have permission to access the full text, but the AI does not, so it can only find abstracts.
2. PDFs often cannot be parsed correctly into text and images, and agents understand them less well than Markdown.
3. Article HTML carries a lot of irrelevant page content, which adds semantic noise and token cost for agents.
4. Agents cannot read the images embedded in article HTML.

## 😍 What This Project Does

✅ This project consolidates these issues into a tool layer:

1. When you have full-text access, the AI can read the full text too, not just abstracts.
2. Given the DOI, URL, or title of a known paper, it fetches a Markdown version that is easier for AI to understand and prepares clean data for later knowledge base construction.

✅ The project provides three main entry points:

1. `paper-fetch`: command-line tool, suitable for fast, large-scale manual paper fetching.
2. `paper-fetch-mcp`: stdio MCP server, suitable for connecting to Codex, Claude Code, and other hosts that support MCP.
3. `skills/paper-fetch-skill/`: static agent skill that tells agents when to call the paper fetching tool.

Core capabilities:

- Support DOI, URL, and title queries.
- Output structured paper metadata, main-text Markdown, citation information, and locally cached resources.
- Support 17 publisher/platform full-text providers: arXiv, Elsevier, Springer, Wiley, Science, PNAS, IEEE, Copernicus, AMS, MDPI, Royal Society Publishing, Annual Reviews, PLOS, Oxford Academic, ACS, IOP, and AIP.
- Return abstract-only or metadata-only results with warnings when full text cannot be obtained.

Project boundaries:

- Not a substitute for topic search, literature recommendation, or review generation; it can, however, fetch and verify the full text of candidate papers in those workflows to improve the quality of later analysis.
- No bypassing of paywalls or access authorization; availability depends on the provider, credentials, and local runtime environment.
- For Wiley, Science, PNAS, AMS, Annual Reviews, ACS, IOP, AIP, and MDPI, a unified browser path through CloakBrowser is used.

## Demo

After the skill is installed, the agent can recognize the scope in which `paper-fetch-skill` applies and confirm whether to save the full text and image resources before fetching.

![Agent recognizes paper-fetch-skill capability range](figures/agent-skill-overview.png)

The following examples are real open-access fetch results stored in `figures/`.

### Nature Example

- Paper: Towards end-to-end automation of AI research
- DOI: `10.1038/s41586-026-10265-5`
- Source: Springer/Nature HTML full text
- License: [`CC BY 4.0`](https://creativecommons.org/licenses/by/4.0)
- Markdown full text: [`towards-end-to-end-automation-of-ai-research.md`](figures/towards-end-to-end-automation-of-ai-research.md)

![Nature paper crawling result](figures/nature-oa-fetch-result.png)

### Science Advances Example

- Paper: Deforestation-induced runoff changes dominated by forest-climate feedbacks
- DOI: `10.1126/sciadv.adp3964`
- Source: Science Advances / Science provider
- Markdown full text: [`deforestation-induced-runoff-changes-dominated-by-forest-climate-feedbacks.md`](figures/deforestation-induced-runoff-changes-dominated-by-forest-climate-feedbacks.md)

![Science Advances paper crawling result](figures/science-fetch-result.png)

## Quick Installation

### Offline Installation (Recommended)

The offline release assets consist of 4 self-extracting Linux `.sh` installers (one per CPython ABI), 4 macOS tarballs (one per CPython ABI), and 1 Windows x86_64 installer. The macOS tarballs are built on `macos-latest` with the same build script, targeting the runner architecture and CPython 3.11, 3.12, 3.13, and 3.14; macOS CI verifies installation, the headful preset, and a CloakBrowser browser-launch smoke test.

```text
paper-fetch-skill-offline-linux-x86_64-cp311.sh
paper-fetch-skill-offline-linux-x86_64-cp312.sh
paper-fetch-skill-offline-linux-x86_64-cp313.sh
paper-fetch-skill-offline-linux-x86_64-cp314.sh
paper-fetch-skill-offline-macos-<arch>-cp311.tar.gz
paper-fetch-skill-offline-macos-<arch>-cp312.tar.gz
paper-fetch-skill-offline-macos-<arch>-cp313.tar.gz
paper-fetch-skill-offline-macos-<arch>-cp314.tar.gz
paper-fetch-skill-windows-x86_64-setup.exe
```

#### **I. Windows x86_64:**

**1. Download the installer**

Download

```text
paper-fetch-skill-windows-x86_64-setup.exe
```

**2. Double-click to install or run the installer locally**

```powershell
.\paper-fetch-skill-windows-x86_64-setup.exe
```

The installer installs to `%LOCALAPPDATA%\PaperFetchSkill` by default and does not require administrator privileges. It automatically installs the `paper-fetch` CLI tool, registers MCP, and installs the Skill. If user-level PATH / Skill / MCP integration or the smoke check fails on the local machine, the runtime remains in the installation directory, and detailed warnings are written to `%LOCALAPPDATA%\PaperFetchSkill\install-helper.log`.

**3. Verify installation**

Open a new PowerShell window and run:

```powershell
paper-fetch --help
```

If the output begins with `usage: cli.py [-h] -` (followed by more options), the installation succeeded.

**4. Enable Wiley / Science / PNAS / AMS / Annual Reviews / ACS / IOP / AIP / MDPI browser path**

The installer registers CloakBrowser's default headless environment and, by default, enables a regular Chrome browser User-Agent in `offline.env`, which reduces the chance of hitting a Cloudflare challenge on AGU/Wiley pages. In restricted environments, set `CLOAKBROWSER_BINARY_PATH` in `offline.env` to point to a pre-installed browser. If challenges still occur on a machine with a desktop display, set `CLOAKBROWSER_HEADLESS=false`, or use `--preset=headful` when installing the Linux / macOS offline bundle.

**5. Enable Elsevier access permission**

Elsevier's official XML/API and PDF fallback require a key; apply for one at <https://dev.elsevier.com/> and write it to `offline.env` in the installation directory:

```powershell
notepad "$env:LOCALAPPDATA\PaperFetchSkill\offline.env"
```

**6. Refresh agent skill**

After modifying the Codex / Claude Code skill, the MCP configuration, or `offline.env`, restart the corresponding host; MCP services that are already running do not automatically pick up new environment variables.

**7. Frequently Asked Questions**

See [`docs/deployment.md`](docs/deployment.md) for Windows installer and offline installation details.

#### **II. Linux**

**1. Download the installer**

Check the Python version:

```bash
python3 --version
```

Download the package matching the target machine's Python version from Releases:

```text
paper-fetch-skill-offline-linux-x86_64-cp311.sh
paper-fetch-skill-offline-linux-x86_64-cp312.sh
paper-fetch-skill-offline-linux-x86_64-cp313.sh
paper-fetch-skill-offline-linux-x86_64-cp314.sh
```

The Linux `.sh` is a self-extracting installer whose internal payload is a pre-installed runtime package, not a source code snapshot. It installs to `~/.local/share/paper-fetch-skill` by default; a fixed directory can be specified with `--install-dir <path>`. Ubuntu 24.04 has a default Python version of 3.12, and Ubuntu 26.04 has a default Python version of 3.14.

Run the installer directly:

```bash
chmod +x paper-fetch-skill-offline-linux-x86_64-cp312.sh
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headless --no-user-config
source ~/.local/share/paper-fetch-skill/activate-offline.sh
```

For desktop display environments, use:

```bash
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headful --no-user-config
```

To pin the installation to a custom directory:

```bash
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --install-dir "$HOME/tools/paper-fetch-skill" --preset=headless --no-user-config
source "$HOME/tools/paper-fetch-skill/activate-offline.sh"
```

Linux / macOS offline installation prefers to point `MATHML_TO_LATEX_NODE_BIN` at the Playwright Node bundled inside the package, which avoids depending on a `node` on the system PATH; the generated `activate-offline.sh` can be sourced from bash or zsh.

macOS offline release assets provide tarballs by CPython ABI; download and extract the tarball matching the target machine's architecture and Python version.

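
The asset naming scheme above lends itself to a small helper for picking the right download. The following is a minimal Python sketch, not part of the project: `offline_asset_name` and `asset_for_this_machine` are hypothetical helpers, and the mapping from `platform.machine()` values (`x86_64`, `arm64`, `AMD64`) onto the `<arch>` placeholder in the macOS tarball names is an assumption that should be checked against the actual Releases page.

```python
import platform
import sys

# CPython ABI tags that appear in the release asset names listed above.
SUPPORTED_ABIS = ("cp311", "cp312", "cp313", "cp314")


def offline_asset_name(system, machine, version):
    """Guess the offline release asset name for a platform (hypothetical helper)."""
    if system == "Windows":
        # Assumption: platform.machine() reports "AMD64" on 64-bit Windows.
        if machine not in ("AMD64", "x86_64"):
            raise ValueError(f"no Windows installer listed for {machine}")
        return "paper-fetch-skill-windows-x86_64-setup.exe"

    abi = f"cp{version[0]}{version[1]}"
    if abi not in SUPPORTED_ABIS:
        raise ValueError(f"no offline asset listed for {abi}")

    if system == "Linux":
        if machine != "x86_64":
            raise ValueError(f"no Linux installer listed for {machine}")
        return f"paper-fetch-skill-offline-linux-x86_64-{abi}.sh"
    if system == "Darwin":
        # Assumption: <arch> equals platform.machine(), e.g. "arm64".
        return f"paper-fetch-skill-offline-macos-{machine}-{abi}.tar.gz"
    raise ValueError(f"unsupported system: {system}")


def asset_for_this_machine():
    """Apply the helper to the interpreter that runs this script."""
    return offline_asset_name(
        platform.system(), platform.machine(), sys.version_info[:2]
    )
```

Run `asset_for_this_machine()` with the same `python3` that the installed runtime is meant to match, since the ABI tag is derived from the running interpreter.
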
macOS browser debugging usage:

```bash
tar -xzf paper-fetch-skill-offline-macos-arm64-cp312.tar.gz
cd paper-fetch-skill-offline-macos-arm64-cp312
./install-offline.sh --preset=headful --no-user-config
source ~/.local/share/paper-fetch-skill/activate-offline.sh
```

#### **III. Update and Uninstall**

**Update**

On Windows, download the new `paper-fetch-skill-windows-x86_64-setup.exe` and run it directly. The installer backs up `%LOCALAPPDATA%\PaperFetchSkill\offline.env`, cleans up the old installation payload, installs the new runtime, writes the user configuration back, and refreshes the managed runtime configuration, PATH, Skill, and MCP registration.

On Linux, download the new `.sh` matching the target machine's Python version and run it directly. The default installation directory is fixed at `~/.local/share/paper-fetch-skill`; the upgrade cleans up the old runtime payload, removes old source code/build residue, and retains `offline.env` in the installation directory. To reuse an external env file without modifying it, use `--reuse-env-file`:

```bash
# Upgrade in place
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headless --no-user-config
# Alternatively, upgrade while reusing an external env file
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headless --no-user-config --reuse-env-file /path/to/shared/offline.env
source ~/.local/share/paper-fetch-skill/activate-offline.sh
```

`--reuse-env-file` lets the shell / Skill / MCP point to the new runtime but does not modify the reused `offline.env`. Restart Codex / Claude Code after updating.

**Uninstall**

On Windows, uninstall `Paper Fetch Skill` from "Settings > Apps > Installed Apps", or run:

```powershell
& "$env:LOCALAPPDATA\PaperFetchSkill\unins000.exe"
```

Back up `offline.env` before uninstalling if you want to keep your API keys.

On Linux, run:

```bash
~/.local/share/paper-fetch-skill/install-offline.sh --uninstall
```

This command only cleans up the user-level PATH / Skill / MCP integration; it does not delete the fixed installation directory, `bin/`, `runtime/`, `offline.env`, or `downloads/`. Once you have confirmed the installation is no longer needed, run `~/.local/share/paper-fetch-skill/install-offline.sh --purge` to delete the installation directory.

### Online Installation (Not Recommended, for Development)

Run in the repository root:

```bash
./install.sh
```

By default, this creates a `.venv` in the repository, installs the Python packages, and prepares the CloakBrowser dependencies and formula backends. To install only the Python packages and basic configuration:

```bash
./install.sh --lite
```

See [`docs/providers.md`](docs/providers.md#arxiv) for arXiv path details.

To install into the current Python environment:

```bash
python3 -m pip install .
```

Available commands after installation:

```bash
paper-fetch --query "10.1186/1471-2105-11-421"
paper-fetch-mcp
```

### CLI Behavior Quick Reference

The `paper-fetch` parameters for the main output and for local artifacts are divided as follows:

- `--format markdown|json|both` specifies the serialization format of the main output, whether it goes to stdout, `--output`, or `--output-dir`; the default is `markdown`.
- `--query-file <path>` enables batch fetching, with one DOI, URL, or title per line; empty lines and comment lines starting with `#` are ignored. In batch mode the main output is not printed to stdout; instead, each main output is written to the output directory and a JSONL summary is generated.
- `--output <path>` writes the formatted result to the specified file; explicitly passing `--output -` prints it to the terminal.
- `--output-dir <dir>` is the directory where the main output, Markdown, PDF fallback source files, and local assets are saved; the CLI creates this directory automatically before fetching. If `--output` is not explicitly specified, the main output is written to `<doi>.md`, `<doi>.json`, or `<doi>.both.json`, and the body is not printed to the terminal.
- `--batch-concurrency <1..8>` controls batch concurrency, defaulting to `1`; `--batch-results <path>` overrides the default `<output-dir>/batch-results.jsonl`.
- `--artifact-mode markdown-assets|all|none` controls which intermediate artifacts are retained. The CLI defaults to `markdown-assets`, which saves Markdown and the assets selected by `--asset-profile` but does not retain the provider's original HTML/XML, fetch-envelope/cache JSON, or HTTP textual cache; if the body comes from the PDF fallback, the PDF source file is still saved for traceability.
- `--artifact-mode all` retains the old behavior: provider HTML/PDF, auxiliary artifacts, HTTP textual cache, and other debugging artifacts can be saved to disk.
- `--artifact-mode none` does not save provider artifacts or assets; files can still be written by an explicit `--output <path>`, by `--save-markdown`, and for the main output that `--output-dir` receives when `--output` is not specified. `--no-download` is retained for compatibility but is deprecated; it is equivalent to `--artifact-mode none`.
- `--asset-profile none|body|all` controls the scope of local asset downloads, with the CLI defaulting to `body`: `none` downloads no local assets but keeps remote image links that remain parsable in the Markdown, `body` saves body images, charts, and formula images, and `all` additionally saves supplementary materials.

See [`docs/cli.md`](docs/cli.md) for complete command combinations, the distinction between main output and artifacts, error output, and exit codes.

For example:

```bash
paper-fetch --query "https://www.nature.com/articles/s41559-026-03039-9" \
  --output-dir ./papers
```

This writes Markdown to `./papers/<doi>.md`, does not print the body to the terminal, and saves body images and other assets according to the default `--asset-profile body`; by default, the provider's original HTML/XML and JSON/cache sidecars are not saved. Pass `--artifact-mode all` explicitly for complete debugging artifacts. To force printing to the terminal, pass `--output -` explicitly.

For batch fetching, prepare a query file:

```text
# One DOI, URL, or title per line
10.1186/1471-2105-11-421
https://www.nature.com/articles/s41559-026-03039-9
```

Then run:

```bash
paper-fetch --query-file ./queries.txt \
  --output-dir ./papers \
  --batch-concurrency 4
```

This writes each Markdown file and its body assets to `./papers` and generates `./papers/batch-results.jsonl`. A single failure is recorded in the JSONL file, and processing continues with the remaining entries.

If you only want to control the file path of the formatted result, use `--output` explicitly:

```bash
paper-fetch --query "10.1186/1471-2105-11-421" \
  --format markdown \
  --output ./papers/article.md \
  --output-dir ./papers
```

An explicit `--output <path>` only controls the main output file path and does not automatically create the file's parent directory.

At the end of the installation script, you are prompted with the configuration entry for the official Elsevier API. Before fetching Elsevier full text, apply for a key at <https://dev.elsevier.com/> and set `ELSEVIER_API_KEY` in the configuration file.

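
The batch workflow described in this section (query file, `--output-dir`, `--batch-concurrency`, JSONL summary) can also be driven from a script. The sketch below is a minimal Python illustration that assumes `paper-fetch` is on PATH. `parse_queries`, `build_batch_command`, and `read_batch_results` are hypothetical helpers rather than project APIs, and ignoring surrounding whitespace is an assumption about how the CLI treats query lines. The fields of each JSONL record are not described here, so the reader returns each parsed record without interpreting it.

```python
import json
from pathlib import Path


def parse_queries(text):
    """Keep one query per line, skipping empty lines and `#` comment lines.

    Assumption: surrounding whitespace is not significant.
    """
    queries = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            queries.append(stripped)
    return queries


def build_batch_command(query_file, output_dir, concurrency=1):
    """Build the argument list for a batch run using only the flags shown above."""
    if not 1 <= concurrency <= 8:
        raise ValueError("--batch-concurrency accepts values from 1 to 8")
    return [
        "paper-fetch",
        "--query-file", str(query_file),
        "--output-dir", str(output_dir),
        "--batch-concurrency", str(concurrency),
    ]


def read_batch_results(path):
    """Load the JSONL summary: one JSON document per non-empty line."""
    records = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records
```

Passing the list from `build_batch_command("./queries.txt", "./papers", 4)` to `subprocess.run` would start the same run as the shell example; once it finishes, `read_batch_results("./papers/batch-results.jsonl")` loads the summary for inspection.
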
### Configuration File

Default configuration file location:

```text
~/.config/paper-fetch/.env
```

When you need an API key, a custom download directory, or a custom User-Agent, create a configuration file:

```bash
mkdir -p ~/.config/paper-fetch
cp .env.example ~/.config/paper-fetch/.env
```

Of these settings, the official Elsevier XML/API and PDF fallback require at least a key, which you apply for at <https://dev.elsevier.com/> and configure as:

```bash
ELSEVIER_API_KEY="..."
```

You can also specify the configuration file explicitly through an environment variable:

```bash
export PAPER_FETCH_ENV_FILE=/path/to/.env
```

See [`docs/providers.md`](docs/providers.md) for a complete list of environment variables.

### Integrating with Codex

Install the skill and register the MCP server:

```bash
./scripts/install-codex-skill.sh --register-mcp
```

Register with a configuration file:

```bash
./scripts/install-codex-skill.sh --register-mcp --env-file ~/.config/paper-fetch/.env
```

Install only to the current project:

```bash
./scripts/install-codex-skill.sh --project --register-mcp
```

After installation, restart Codex so that it rescans skills and MCP configurations.

### Integrating with Claude Code

```bash
./scripts/install-claude-skill.sh --register-mcp
```

Commonly used parameters include:

```bash
./scripts/install-claude-skill.sh --project --register-mcp
./scripts/install-claude-skill.sh --register-mcp --env-file ~/.config/paper-fetch/.env
```

### Manual MCP Registration

Any host supporting stdio MCP can directly run:

```bash
paper-fetch-mcp
```

Or:

```bash
python3 -m paper_fetch.mcp.server
```

Codex CLI can manually register the same stdio server:

```bash
codex mcp add paper-fetch -- python3 -X utf8 -m paper_fetch.mcp.server
```

### Common Fetching Parameters

The complete semantics of the MCP default mode, `artifact_mode`, `prefer_cache`, `no_download`, and `save_markdown` are documented in [`docs/providers.md`](docs/providers.md#mcp-download-and-markdown-save).

MCP `artifact_mode` defaults to `markdown-assets`; `strategy.asset_profile` supports `none`, `body`, and `all`, and when it is not set explicitly through MCP or the Python API, the provider determines it.

### Update

After updating the repository, reinstall the package and the agent integration:

```bash
python3 -m pip install .
./scripts/install-codex-skill.sh --register-mcp
```

For Claude Code users, run:

```bash
./scripts/install-claude-skill.sh --register-mcp
```

## Documentation

- [`docs/deployment.md`](docs/deployment.md): Installation, configuration, MCP registration, and updates.
- [`docs/providers.md`](docs/providers.md): Provider capabilities, environment variables, and runtime configurations.
- [`docs/README.md`](docs/README.md): Complete documentation navigation.
- [`docs/architecture/overview.md`](docs/architecture/overview.md): Architecture boundaries and maintainer perspectives.

## Disclaimer

This project retrieves research paper content through publicly accessible open access interfaces, publisher routes, and user-configured credentials.

- The retrieved literature is for personal academic research and learning only and must not be used for commercial purposes.
- Please comply with the copyright laws and regulations of your country/region and the intellectual property policies of your institution.
- This project does not bypass paywalls or access authorization; availability depends on the provider, credentials, and local runtime environment.
- This project does not store, distribute, or disseminate any literature content; it only assists users in locating, fetching, or converting paper content that they have the right to access.
- Literature samples in fixtures are used only for testing, and redistributing fixtures in any form is strictly prohibited.
- Users are responsible for their own literature retrieval and usage.

## Community

<https://linux.do/>
