arXiv provider
The arXiv provider mounts the arXiv API under /arxiv. Each paper is a directory keyed by its arXiv ID. You get the PDF, the LaTeX source archive, structured metadata, and version history, all readable with standard file tools and no SDK.
Mount point

    /omnifs/arxiv/

The provider requires no credentials. All reads go through the arXiv public API.
Path reference
Per-paper directory

    ls /omnifs/arxiv/papers/{id}
    paper.pdf  source.tar.gz  metadata.json  links.json  versions/

{id} is the arXiv paper identifier in any of its standard forms: 1706.03762, 2301.00001, or cs.LG/0510009 for older papers.

| Path | Description |
|---|---|
| paper.pdf | The compiled PDF, current version |
| source.tar.gz | LaTeX source archive, current version |
| metadata.json | Title, authors, abstract, categories, submission date, DOI, and version list |
| links.json | Related links: DOI, journal ref, and HTML abstract page |
| versions/ | Directory with one entry per revision (v1, v2, …) |
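
The identifier forms accepted for {id} can be told apart with a small pattern check before building a path. The sketch below is illustrative only: `arxiv_id_kind` is a hypothetical helper, not part of omnifs, and the patterns approximate the two arXiv schemes (four digits, a dot, and four or five digits for current IDs; archive, optional subject class, slash, and seven digits for older ones).

```shell
# Hypothetical helper: classify an arXiv identifier by scheme.
# The regular expressions are approximations, not an official grammar.
arxiv_id_kind() {
  if printf '%s\n' "$1" | grep -Eq '^[0-9]{4}\.[0-9]{4,5}$'; then
    echo new       # e.g. 1706.03762 or 2301.00001
  elif printf '%s\n' "$1" | grep -Eq '^[a-z-]+(\.[A-Z]{2})?/[0-9]{7}$'; then
    echo old       # e.g. cs.LG/0510009
  else
    echo unknown
  fi
}
```

A script might call `arxiv_id_kind "$id"` and skip anything reported as unknown before touching /omnifs/arxiv/papers/.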
Per-version directory

    ls /omnifs/arxiv/papers/{id}/versions/v{n}

Each version directory exposes the same paper.pdf, source.tar.gz, and metadata.json leaves as the top-level paper directory, pinned to that specific revision. This is useful for comparing a preprint against its published revision.
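
When comparing revisions it helps to know which one is the newest. Here is a minimal sketch, assuming the entries under versions/ are named v1, v2, and so on as described above; `latest_version` is a hypothetical helper name. It sorts numerically because a plain lexical sort would place v10 before v2.

```shell
# Hypothetical helper: print the highest revision name (e.g. v5)
# found under <paper-dir>/versions. Prints nothing if there are none.
latest_version() {
  ls "$1/versions" \
    | sed -n 's|^v\([0-9][0-9]*\)$|\1|p' \
    | sort -n \
    | tail -n 1 \
    | sed 's/^/v/'
}
```

For example, `latest_version /omnifs/arxiv/papers/1706.03762` would print the name of the last revision directory, which can then be compared against versions/v1.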
Category feeds

    ls /omnifs/arxiv/categories/{cat}/new
    ls /omnifs/arxiv/categories/{cat}/{YYYY}/{MM}/{DD}

{cat} is any arXiv category identifier: cs.LG, quant-ph, math.CO, and so on. The new directory lists papers from the most recent announcement batch. Date directories list papers announced on that day.
Each entry in a category listing is itself an ID directory with the per-paper structure described above.
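
The date-based layout maps directly from an ISO date. As a rough sketch, the hypothetical helper below splits a YYYY-MM-DD string into the three path components. The optional third argument for the mount root is an assumption made for illustration, and it defaults to /omnifs/arxiv.

```shell
# Hypothetical helper: build the feed directory for a category and an
# ISO date (YYYY-MM-DD). The input is not validated.
category_feed_dir() {
  feed_cat=$1
  feed_date=$2
  feed_root=${3:-/omnifs/arxiv}
  feed_year=${feed_date%%-*}      # text before the first dash
  feed_rest=${feed_date#*-}       # MM-DD
  feed_month=${feed_rest%%-*}
  feed_day=${feed_rest#*-}
  printf '%s/categories/%s/%s/%s/%s\n' \
    "$feed_root" "$feed_cat" "$feed_year" "$feed_month" "$feed_day"
}
```

With this, `ls "$(category_feed_dir cs.LG 2024-03-05)"` would list that day's announcements, assuming the mount is present and the date has a feed.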
Search

    ls /omnifs/arxiv/search/{query}

{query} is a URL-encoded search string passed to the arXiv search API. Results appear as subdirectories named by arXiv ID. Example: ls /omnifs/arxiv/search/transformer+attention.
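
Because the query is a single path component, spaces and characters such as / must be encoded before they reach the path. The following is a sketch of one way to do that in plain shell, assuming ASCII input. It writes spaces as +, matching the example above, and percent-encodes everything outside the unreserved set. `urlencode_query` is a hypothetical helper, and how omnifs treats characters beyond the documented + form is an assumption.

```shell
# Hypothetical helper: URL-encode an ASCII search string for use as one
# path component. Spaces become '+', other reserved characters %XX.
urlencode_query() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      ' ')             out=$out+ ;;
      *)               out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}
```

A call such as `ls "/omnifs/arxiv/search/$(urlencode_query 'transformer attention')"` then resolves to the same path as the example above.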
Configuration
The arXiv provider has no required configuration. The config block in the provider manifest is empty. No tokens, no credentials.
Examples
Read the title of “Attention Is All You Need” (arXiv 1706.03762):

    cat /omnifs/arxiv/papers/1706.03762/metadata.json | jq .title
    "Attention Is All You Need"

List all files for that paper:

    ls /omnifs/arxiv/papers/1706.03762
    paper.pdf  source.tar.gz  metadata.json  links.json  versions/

Pull the abstract into a variable:

    abstract=$(cat /omnifs/arxiv/papers/1706.03762/metadata.json | jq -r .abstract)

Print the ID and title of each paper in a category feed:

    for d in /omnifs/arxiv/categories/cs.LG/new/*/; do
      id=$(basename "$d")
      # Pass the ID to jq as a variable rather than splicing it into the program text
      cat "${d}metadata.json" | jq -r --arg id "$id" '$id + ": " + .title'
    done

Compare metadata between v1 and v3 of a paper:

    diff \
      <(cat /omnifs/arxiv/papers/2301.00001/versions/v1/metadata.json | jq .) \
      <(cat /omnifs/arxiv/papers/2301.00001/versions/v3/metadata.json | jq .)

Download the LaTeX source of a specific version:

    cp /omnifs/arxiv/papers/1706.03762/versions/v5/source.tar.gz ~/downloads/

Search and print titles:

    ls /omnifs/arxiv/search/attention+mechanism | while read -r id; do
      cat /omnifs/arxiv/papers/"$id"/metadata.json | jq -r .title
    done

The paper.pdf and source.tar.gz leaves can be large. The omnifs host caches fetched content in a capacity-bounded cache invalidated by upstream events, so repeated reads of the same paper version do not re-fetch from arXiv.
Older papers using the pre-2007 identifier format (cs.LG/0510009) work as-is in the path. The slash in those IDs is part of the arXiv standard; omnifs encodes it transparently.
The arXiv API enforces a rate limit of one request every three seconds. The host-level cache absorbs most repeated access patterns within a session.
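
This page does not say whether omnifs paces uncached reads itself, so a script that walks many papers for the first time may want to space them out on its own. Here is a minimal sketch under that assumption; `throttled_each` is a hypothetical helper that runs a command once per line of input and sleeps between iterations.

```shell
# Hypothetical helper: run "$@ <id>" for each ID read from stdin,
# sleeping <interval> seconds between iterations (not before the first).
# The command should not read stdin itself, or it will consume the IDs.
throttled_each() {
  interval=$1
  shift
  first=1
  while IFS= read -r id; do
    [ "$first" -eq 1 ] || sleep "$interval"
    first=0
    "$@" "$id"
  done
}
```

For instance, `ls /omnifs/arxiv/search/attention+mechanism | throttled_each 3 echo` would print one ID every three seconds. Swapping `echo` for a small function that reads metadata.json gives a paced version of the search example.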