# Paper Fetch Skill
> Fetch papers as agent-ready markdown — DOI/URL/title in, structured full text out. CLI · MCP · Skill.
**Paper Fetch Skill** is an AI reading layer for papers.
Give it a DOI, URL, or title, and it returns structured metadata, clean Markdown full text, and image resources that feed directly into Codex, Claude Code, or any MCP host.
It does not bypass paywalls; it works only where you already have access, taking AI from "only reading abstracts" to "reading full text".
If you find it helpful, please star ⭐ the repository to support it!
## 🙁 Pain Points for AI Agents Reading Papers
1. You have permission to access the full text, but the AI does not, so it can only find abstracts.
2. PDFs often cannot be parsed correctly into text and images, and agents understand them less well than Markdown.
3. Article HTML carries a lot of irrelevant page content, which adds semantic noise and token cost for agents.
4. Agents cannot read the images in article HTML.
## 😍 What This Project Does
✅ This project consolidates these issues into a single tool layer:
1. When you have full-text access, the AI gets the full text too, not just abstracts.
2. Given the DOI, URL, or title of a known paper, it fetches a Markdown version that is easier for AI to understand, providing clean data for later knowledge-base construction.
✅ The project provides three main entry points:
1. `paper-fetch`: command-line tool, suited to fast, large-scale manual fetching.
2. `paper-fetch-mcp`: stdio MCP server for connecting to Codex, Claude Code, and other hosts that support MCP.
3. `skills/paper-fetch-skill/`: static agent skill that tells agents when to call the paper-fetching tool.
Core capabilities:
- Supports DOI, URL, and title queries.
- Outputs structured paper metadata, main-text Markdown, citation information, and locally cached resources.
- Supports 17 publisher/platform full-text providers, including arXiv, Elsevier, Springer, Wiley, Science, PNAS, IEEE, Copernicus, AMS, MDPI, Royal Society Publishing, Annual Reviews, PLOS, Oxford Academic, ACS, IOP, and AIP.
- Returns abstract-only or metadata-only results with warnings when full text cannot be obtained.
Project boundaries:
- Not a substitute for topic search, literature recommendation, or review generation, but it can fetch and verify the full text of candidate papers in those workflows to improve later analysis.
- No bypassing of paywalls or access authorization; availability depends on the provider, credentials, and local runtime environment.
- Wiley, Science, PNAS, AMS, Annual Reviews, ACS, IOP, AIP, and MDPI all go through a unified CloakBrowser browser path.
## Demo
After the skill is installed, the agent can recognize when `paper-fetch-skill` applies and confirms, before fetching, whether to save the full text and image resources.

The following examples are real fetch results for openly available papers, stored in `figures/`.
### Nature Example
- Paper: Towards end-to-end automation of AI research
- DOI: `10.1038/s41586-026-10265-5`
- Source: Springer/Nature HTML full text
- License: [`CC BY 4.0`](https://creativecommons.org/licenses/by/4.0)
- Markdown full text: [`towards-end-to-end-automation-of-ai-research.md`](figures/towards-end-to-end-automation-of-ai-research.md)

### Science Advances Example
- Paper: Deforestation-induced runoff changes dominated by forest-climate feedbacks
- DOI: `10.1126/sciadv.adp3964`
- Source: Science Advances / Science provider
- Markdown full text: [`deforestation-induced-runoff-changes-dominated-by-forest-climate-feedbacks.md`](figures/deforestation-induced-runoff-changes-dominated-by-forest-climate-feedbacks.md)

## Quick Installation
### Offline Installation (Recommended)
The offline release assets comprise 4 Linux self-extracting `.sh` installers (one per CPython ABI), 4 macOS tarballs (one per CPython ABI), and 1 Windows x86_64 installer. The macOS tarballs are built on `macos-latest` with the same build script, target the runner's architecture and CPython 3.11, 3.12, 3.13, and 3.14, and are verified in macOS CI with installation, headful-preset, and CloakBrowser launch smoke tests.
```text
paper-fetch-skill-offline-linux-x86_64-cp311.sh
paper-fetch-skill-offline-linux-x86_64-cp312.sh
paper-fetch-skill-offline-linux-x86_64-cp313.sh
paper-fetch-skill-offline-linux-x86_64-cp314.sh
paper-fetch-skill-offline-macos-<arch>-cp311.tar.gz
paper-fetch-skill-offline-macos-<arch>-cp312.tar.gz
paper-fetch-skill-offline-macos-<arch>-cp313.tar.gz
paper-fetch-skill-offline-macos-<arch>-cp314.tar.gz
paper-fetch-skill-windows-x86_64-setup.exe
```
#### **I. Windows x86_64:**
**1. Download the installer**
Download
```text
paper-fetch-skill-windows-x86_64-setup.exe
```
**2. Double-click to install or run the installer locally**
```powershell
.\paper-fetch-skill-windows-x86_64-setup.exe
```
The installer installs to `%LOCALAPPDATA%\PaperFetchSkill` by default and does not require administrator privileges. It automatically installs the `paper-fetch` CLI tool, registers the MCP server, and installs the Skill. If the user-level PATH / Skill / MCP integration or the smoke check fails on the local machine, the runtime still remains in the installation directory, and detailed warnings are written to `%LOCALAPPDATA%\PaperFetchSkill\install-helper.log`.
**3. Verify installation**
Open a new PowerShell window and run:
```powershell
paper-fetch --help
```
If the output begins with `usage: cli.py [-h] -` (followed by more options), the installation succeeded.
**4. Enable Wiley / Science / PNAS / AMS / Annual Reviews / ACS / IOP / AIP / MDPI browser path**
The installer registers CloakBrowser's default headless environment and, by default, enables a regular Chrome browser User-Agent in `offline.env`, which reduces the chance of hitting a Cloudflare challenge on AGU/Wiley pages. In restricted environments, set `CLOAKBROWSER_BINARY_PATH` in `offline.env` to point to a pre-installed browser. If challenges persist on a machine with a desktop display, set `CLOAKBROWSER_HEADLESS=false`, or pass `--preset=headful` when installing the Linux / macOS offline bundle.
**5. Enable Elsevier access permission**
Elsevier's official XML/API route and PDF fallback require an API key. Apply for one at <https://dev.elsevier.com/> and write it to `offline.env` in the installation directory:
```powershell
notepad "$env:LOCALAPPDATA\PaperFetchSkill\offline.env"
```
**6. Refresh agent skill**
After modifying a Codex / Claude Code skill, the MCP configuration, or `offline.env`, restart the corresponding host; MCP services that are already running do not pick up new environment variables automatically.
**7. Frequently Asked Questions**
See [`docs/deployment.md`](docs/deployment.md) for Windows installer and offline installation details.
#### **II. Linux**
**1. Download the installer**
Check the Python version:
```bash
python3 --version
```
Download the package matching the target machine's Python version from Releases.
```text
paper-fetch-skill-offline-linux-x86_64-cp311.sh
paper-fetch-skill-offline-linux-x86_64-cp312.sh
paper-fetch-skill-offline-linux-x86_64-cp313.sh
paper-fetch-skill-offline-linux-x86_64-cp314.sh
```
The Linux `.sh` file is a self-extracting installer whose internal payload is a pre-installed runtime package, not a source code snapshot. It installs to `~/.local/share/paper-fetch-skill` by default; use `--install-dir <path>` to choose a fixed directory instead.
Ubuntu 24.04 ships Python 3.12 by default, and Ubuntu 26.04 ships Python 3.14.
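Matching the installer to the interpreter can be sketched in Python. This is only an illustration of the naming scheme listed above, not part of the project: the helper name `linux_installer_name` is hypothetical, and the supported tags are taken from the four Linux file names in the release list.

```python
import sys

# CPython ABI tags for which the release lists Linux installers.
SUPPORTED_TAGS = ("cp311", "cp312", "cp313", "cp314")


def linux_installer_name(version=None):
    """Return the offline installer file name for a CPython (major, minor) version.

    Hypothetical helper: defaults to the interpreter that runs this script.
    """
    major, minor = (version or sys.version_info)[:2]
    tag = f"cp{major}{minor}"
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"no offline installer listed for {tag}")
    return f"paper-fetch-skill-offline-linux-x86_64-{tag}.sh"


# Example: the file to download for a machine running Python 3.12.
print(linux_installer_name((3, 12)))
```

Run it with the target machine's `python3` and no argument to get the name for that interpreter.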
Run the installer directly:
```bash
chmod +x paper-fetch-skill-offline-linux-x86_64-cp312.sh
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headless --no-user-config
source ~/.local/share/paper-fetch-skill/activate-offline.sh
```
For desktop display environments, use:
```bash
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headful --no-user-config
```
To install to a fixed custom directory:
```bash
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --install-dir "$HOME/tools/paper-fetch-skill" --preset=headless --no-user-config
source "$HOME/tools/paper-fetch-skill/activate-offline.sh"
```
The Linux / macOS offline installation prefers to set `MATHML_TO_LATEX_NODE_BIN` to the Playwright Node bundled inside the package, which avoids depending on a `node` on the system PATH; the generated `activate-offline.sh` can be sourced from bash or zsh.
The macOS offline release assets are tarballs, one per CPython ABI; download and extract the tarball that matches the target machine's architecture and Python version. For browser debugging on macOS:
```bash
tar -xzf paper-fetch-skill-offline-macos-arm64-cp312.tar.gz
cd paper-fetch-skill-offline-macos-arm64-cp312
./install-offline.sh --preset=headful --no-user-config
source ~/.local/share/paper-fetch-skill/activate-offline.sh
```
#### **III. Update and Uninstall**
**Update**
On Windows, download the new `paper-fetch-skill-windows-x86_64-setup.exe` and run it directly. The installer backs up `%LOCALAPPDATA%\PaperFetchSkill\offline.env`, cleans up the old installation payload, installs the new runtime, and restores the user configuration, then refreshes the managed runtime configuration, PATH, Skill, and MCP registration.
On Linux, download the new `.sh` matching the target machine's Python version and run it directly. The default installation directory is always `~/.local/share/paper-fetch-skill`; the upgrade cleans up the old runtime payload, removes leftover source/build files, and keeps the `offline.env` in the installation directory. To reuse an external env file without modifying it, pass `--reuse-env-file`, as in the second command:
```bash
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headless --no-user-config
./paper-fetch-skill-offline-linux-x86_64-cp312.sh --preset=headless --no-user-config --reuse-env-file /path/to/shared/offline.env
source ~/.local/share/paper-fetch-skill/activate-offline.sh
```
`--reuse-env-file` points the shell / Skill / MCP at the new runtime but does not modify the reused `offline.env`. Restart Codex / Claude Code after updating.
**Uninstall**
On Windows, uninstall `Paper Fetch Skill` from “Settings > Apps > Installed Apps”, or run:
```powershell
& "$env:LOCALAPPDATA\PaperFetchSkill\unins000.exe"
```
Back up `offline.env` before uninstalling if you want to keep your API keys.
On Linux, run:
```bash
~/.local/share/paper-fetch-skill/install-offline.sh --uninstall
```
This command only removes the user-level PATH / Skill / MCP integration; it does not delete the fixed installation directory, `bin/`, `runtime/`, `offline.env`, or `downloads/`. Once you have confirmed the installation is no longer needed, run `~/.local/share/paper-fetch-skill/install-offline.sh --purge` to delete the installation directory.
### Online Installation (Not Recommended, for Development)
Run in the repository root:
```bash
./install.sh
```
By default, this creates a `.venv` in the repository, installs the Python packages, and prepares the CloakBrowser dependencies and formula backends.
To install only Python packages and basic configuration:
```bash
./install.sh --lite
```
See [`docs/providers.md`](docs/providers.md#arxiv) for arXiv path details.
To install into the current Python environment:
```bash
python3 -m pip install .
```
Available commands after installation:
```bash
paper-fetch --query "10.1186/1471-2105-11-421"
paper-fetch-mcp
```
### CLI Behavior Quick Reference
The `paper-fetch` flags that control the main output and local artifacts work as follows:
- `--format markdown|json|both` sets the serialization format of the main output, whether it goes to stdout, `--output`, or `--output-dir`; the default is `markdown`.
- `--query-file <path>` enables batch fetching with one DOI, URL, or title per line; empty lines and comment lines starting with `#` are ignored. In batch mode the main output is not printed to stdout; instead, each main output is written to the output directory and a JSONL summary is generated.
- `--output <path>` writes the formatted result to the specified file; `--output -` explicitly prints it to the terminal.
- `--output-dir <dir>` is the directory where the main output, Markdown, PDF fallback source files, and local assets are saved; the CLI creates it automatically before fetching. If `--output` is not given, the main output is written to `<doi>.md`, `<doi>.json`, or `<doi>.both.json`, and the body is not printed to the terminal.
- `--batch-concurrency <1..8>` controls batch concurrency and defaults to `1`; `--batch-results <path>` overrides the default `<output-dir>/batch-results.jsonl`.
- `--artifact-mode markdown-assets|all|none` controls which intermediate artifacts are kept. The CLI default, `markdown-assets`, saves the Markdown and the assets selected by `--asset-profile`, but does not keep the provider's original HTML/XML, fetch-envelope/cache JSON, or HTTP textual cache; if the body comes from the PDF fallback, the PDF source file is still saved for traceability.
- `--artifact-mode all` keeps the previous behavior: provider HTML/PDF, auxiliary artifacts, the HTTP textual cache, and other debugging artifacts may be saved to disk.
- `--artifact-mode none` saves no provider artifacts or assets. Files can still be written by an explicit `--output <path>`, by `--save-markdown`, and by the main output that `--output-dir` receives when `--output` is not given. `--no-download` is deprecated but kept for compatibility; it is equivalent to `--artifact-mode none`.
- `--asset-profile none|body|all` controls which content assets are downloaded locally; the CLI default is `body`. `none` downloads no local assets but keeps remote image links that Markdown can resolve, `body` saves body images, charts, and formula images, and `all` additionally saves supplementary materials.
See [`docs/cli.md`](docs/cli.md) for complete command combinations, main output and artifact distinctions, error output, and exit codes.
For example:
```bash
paper-fetch --query "https://www.nature.com/articles/s41559-026-03039-9" \
--output-dir ./papers
```
This writes Markdown to `./papers/<doi>.md` without printing the body to the terminal, and saves body images and other assets according to the default `--asset-profile body`; by default, the provider's original HTML/XML and JSON/cache sidecars are not saved. Use `--artifact-mode all` explicitly to keep complete debugging artifacts. To force printing to the terminal, pass `--output -` explicitly.
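The destination rules for the main output can be summarized in a short Python sketch. It restates the behavior described in this section and is not the CLI's actual code: `main_output_target` and `stem` are hypothetical names, `stem` stands in for the `<doi>` part of the file name (how the CLI derives it is not covered here), and printing to stdout when neither flag is given is an assumption based on the `--format` description.

```python
from pathlib import Path

# File suffix for each --format value, as listed in the quick reference.
SUFFIX_BY_FORMAT = {"markdown": ".md", "json": ".json", "both": ".both.json"}


def main_output_target(stem, fmt="markdown", output=None, output_dir=None):
    """Return "stdout" or the Path the main output would be written to.

    Hypothetical helper mirroring the documented rules; `stem` replaces <doi>.
    """
    if output == "-":
        return "stdout"  # --output - forces printing to the terminal
    if output is not None:
        return Path(output)  # an explicit --output sets the main output path
    if output_dir is not None:
        # No --output: the main output lands inside --output-dir
        return Path(output_dir) / f"{stem}{SUFFIX_BY_FORMAT[fmt]}"
    return "stdout"  # assumed: neither flag given, print to the terminal


print(main_output_target("example", output_dir="./papers"))
print(main_output_target("example", fmt="both", output_dir="./papers"))
print(main_output_target("example", output="-", output_dir="./papers"))
```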
For batch fetching, prepare a query file:
```text
# One DOI, URL, or title per line
10.1186/1471-2105-11-421
https://www.nature.com/articles/s41559-026-03039-9
```
Then run:
```bash
paper-fetch --query-file ./queries.txt \
--output-dir ./papers \
--batch-concurrency 4
```
This writes each paper's Markdown and body assets to `./papers` and generates `./papers/batch-results.jsonl`. An individual failure is recorded in the JSONL file, and processing continues with the remaining entries.
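If you generate query files from a script, the following Python sketch shows how the query-file rules and the batch command fit together. The helper names `read_queries` and `batch_command` are hypothetical; treating a line as a comment when its first non-blank character is `#` is an assumption about edge cases, and only flags listed in the quick reference are used.

```python
import shlex


def read_queries(text):
    """Return the queries in a query file's text, skipping blanks and # comments."""
    queries = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            queries.append(stripped)
    return queries


def batch_command(query_file, output_dir, concurrency=1):
    """Build the paper-fetch argument list for a batch run."""
    if not 1 <= concurrency <= 8:
        raise ValueError("--batch-concurrency accepts 1..8")
    return [
        "paper-fetch",
        "--query-file", query_file,
        "--output-dir", output_dir,
        "--batch-concurrency", str(concurrency),
    ]


sample = """# One DOI, URL, or title per line
10.1186/1471-2105-11-421

https://www.nature.com/articles/s41559-026-03039-9
"""
print(read_queries(sample))
print(shlex.join(batch_command("./queries.txt", "./papers", 4)))
```

The argument list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems when a query file path contains spaces.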
If you only want to control the file path of the formatted result, explicitly use `--output`:
```bash
paper-fetch --query "10.1186/1471-2105-11-421" \
--format markdown \
--output ./papers/article.md \
--output-dir ./papers
```
An explicit `--output <path>` only controls the main output file path and does not automatically create the file's parent directory.
At the end of the installation script, you are prompted to configure the official Elsevier API. Before fetching Elsevier full text, apply for a key at <https://dev.elsevier.com/> and set `ELSEVIER_API_KEY` in the configuration file.
### Configuration File
Default configuration file location:
```text
~/.config/paper-fetch/.env
```
When you need an API key, custom download directory, or User-Agent, you can create a configuration file:
```bash
mkdir -p ~/.config/paper-fetch
cp .env.example ~/.config/paper-fetch/.env
```
Of these, the official Elsevier XML/API route and PDF fallback require, at minimum, a key obtained from <https://dev.elsevier.com/>:
```bash
ELSEVIER_API_KEY="..."
```
You can also point to the configuration file explicitly through an environment variable:
```bash
export PAPER_FETCH_ENV_FILE=/path/to/.env
```
See [`docs/providers.md`](docs/providers.md) for a complete list of environment variables.
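To check which configuration file would apply and whether it contains a key, a small Python sketch like the following can help. It is not the project's loader: `resolve_env_file` and `parse_env_text` are hypothetical helpers, the parser handles only simple `KEY=value` lines, and the precedence (the `PAPER_FETCH_ENV_FILE` override first, then the default location) is inferred from the text above.

```python
import os
from pathlib import Path

# Default configuration file location documented above.
DEFAULT_ENV_FILE = Path.home() / ".config" / "paper-fetch" / ".env"


def resolve_env_file(environ=os.environ):
    """Pick the env file: PAPER_FETCH_ENV_FILE if set, else the default location."""
    override = environ.get("PAPER_FETCH_ENV_FILE")
    return Path(override) if override else DEFAULT_ENV_FILE


def parse_env_text(text):
    """Parse simple KEY=value lines; surrounding quotes on the value are stripped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without an assignment
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


env_file = resolve_env_file()
settings = parse_env_text(env_file.read_text()) if env_file.exists() else {}
print(env_file, "ELSEVIER_API_KEY set:", bool(settings.get("ELSEVIER_API_KEY")))
```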
### Integrating with Codex
Install the skill and register the MCP server:
```bash
./scripts/install-codex-skill.sh --register-mcp
```
Register with a configuration file:
```bash
./scripts/install-codex-skill.sh --register-mcp --env-file ~/.config/paper-fetch/.env
```
Install only to the current project:
```bash
./scripts/install-codex-skill.sh --project --register-mcp
```
After installation, restart Codex to let it rescan skills and MCP configurations.
### Integrating with Claude Code
```bash
./scripts/install-claude-skill.sh --register-mcp
```
Commonly used parameters include:
```bash
./scripts/install-claude-skill.sh --project --register-mcp
./scripts/install-claude-skill.sh --register-mcp --env-file ~/.config/paper-fetch/.env
```
### Manual MCP Registration
Any host supporting stdio MCP can directly run:
```bash
paper-fetch-mcp
```
Or:
```bash
python3 -m paper_fetch.mcp.server
```
Codex CLI can manually register the same stdio server:
```bash
codex mcp add paper-fetch -- python3 -X utf8 -m paper_fetch.mcp.server
```
### Common Fetching Parameters
The complete semantics of the MCP default mode, `artifact_mode`, `prefer_cache`, `no_download`, and `save_markdown` are documented in [`docs/providers.md`](docs/providers.md#mcp-download-and-markdown-save). The MCP `artifact_mode` defaults to `markdown-assets`; `strategy.asset_profile` supports `none`, `body`, and `all`, and when it is not set explicitly through MCP or the Python API, the provider determines it.
### Update
After updating the repository, reinstall the package and agent integration:
```bash
python3 -m pip install .
./scripts/install-codex-skill.sh --register-mcp
```
For Claude Code users, execute:
```bash
./scripts/install-claude-skill.sh --register-mcp
```
## Documentation
- [`docs/deployment.md`](docs/deployment.md): Installation, configuration, MCP registration, and updates.
- [`docs/providers.md`](docs/providers.md): Provider capabilities, environment variables, and runtime configurations.
- [`docs/README.md`](docs/README.md): Complete documentation navigation.
- [`docs/architecture/overview.md`](docs/architecture/overview.md): Architecture boundaries and maintainer perspectives.
## Disclaimer
This project retrieves research paper content through publicly accessible open access interfaces, publisher routes, and user-configured credentials.
- The retrieved literature is only for personal academic research and learning use and shall not be used for commercial purposes.
- Please comply with the copyright laws and regulations of your country/region and the intellectual property policies of your institution.
- This project does not bypass paywalls or access authorization; availability depends on the provider, credentials, and local running environment.
- This project does not store, distribute, or disseminate any literature content; it only assists users in locating, fetching, or converting paper content they have the right to access.
- Literature samples in fixtures are used only for testing; redistributing fixtures in any form is strictly prohibited.
- Users are responsible for their literature retrieval and usage.
## Community
<https://linux.do/>