API Documentation

Quickstart

Extract data from a document in under 30 seconds. All you need is an API key.

1. Get your API key

2. Make your first request

cURL

curl -X POST https://api.apapyr.com/v1/extract \
  -H "Authorization: Bearer sk_live_your_key" \
  -F "file=@invoice.pdf"

Python

import requests

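# Open the file in a context manager so the handle is closed after the upload
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "https://api.apapyr.com/v1/extract",
        headers={"Authorization": "Bearer sk_live_your_key"},
        files={"file": f},
        data={"document_type": "invoice"},
    )

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = response.json()
print(data["data"]["fields"]["total"]["value"])  # 1250.0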

JavaScript

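// Run as an ES module (Node 18+) so top-level await and the built-in fetch work
import { readFile } from "node:fs/promises";

// The built-in FormData expects a Blob, not a read stream
const form = new FormData();
form.append("file", new Blob([await readFile("invoice.pdf")]), "invoice.pdf");
form.append("document_type", "invoice");

const res = await fetch("https://api.apapyr.com/v1/extract", {
  method: "POST",
  headers: { "Authorization": "Bearer sk_live_your_key" },
  body: form
});

const data = await res.json();
console.log(data.data.fields.total.value); // 1250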

Authentication

All API requests require an API key passed in the Authorization header:

Header

Authorization: Bearer sk_live_your_api_key_here

API keys start with sk_live_. Keep them secret — anyone with your key can make requests on your behalf.
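
One way to keep the key out of source code is to read it from an environment variable and build the header in a single place. The following is a minimal sketch, not part of the API: the helper name auth_headers is hypothetical, and the APAPYR_API_KEY variable name is borrowed from the MCP setup section on the assumption that you want one convention for both.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for an aPapyr request.

    Falls back to the APAPYR_API_KEY environment variable (an assumed
    convention) when no key is passed explicitly.
    """
    key = api_key or os.environ.get("APAPYR_API_KEY", "")
    # The docs state that API keys start with "sk_live_".
    if not key.startswith("sk_live_"):
        raise ValueError("expected an API key starting with 'sk_live_'")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed directly as the headers argument of requests.post.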

Extract Document

POST /v1/extract

Upload a document and extract structured data from it.

Parameters

Parameter	Type	Required	Description
`file`	file	Yes	PDF, PNG, JPG, or WEBP. Max 20MB.
`document_type`	string	No	One of: auto, invoice, receipt, w2, bank_statement, contract. Default: auto
`webhook_url`	string	No	URL to POST results to when extraction completes.
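
The constraints in the table above can be checked locally before uploading, which avoids a round trip for requests that would be rejected anyway. This is a sketch under stated assumptions: check_upload is a hypothetical helper, "20MB" is read as 20 MiB, and .jpeg is accepted alongside .jpg although the table only lists JPG.

```python
from pathlib import Path

# Assumption: .jpeg is treated the same as .jpg.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}
ALLOWED_TYPES = {"auto", "invoice", "receipt", "w2", "bank_statement", "contract"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB


def check_upload(filename, size_bytes, document_type="auto"):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {filename}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the 20MB limit")
    if document_type not in ALLOWED_TYPES:
        problems.append(f"unknown document_type: {document_type}")
    return problems
```

The server remains the authority on what it accepts; a local check like this only catches the obvious cases early.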

Response

JSON

{
  "id": "ext_abc123",
  "status": "completed",
  "document_type": "invoice",
  "confidence": 0.97,
  "data": {
    "document_type": "invoice",
    "fields": {
      "vendor_name": { "value": "Acme Corp", "confidence": 0.99 },
      "invoice_number": { "value": "INV-4821", "confidence": 0.98 },
      "total": { "value": 1250.00, "confidence": 0.98 },
      "due_date": { "value": "2026-04-15", "confidence": 0.95 }
    },
    "line_items": [
      {
        "description": { "value": "Widget A", "confidence": 0.97 },
        "quantity": { "value": 50, "confidence": 0.99 },
        "unit_price": { "value": 25.00, "confidence": 0.98 },
        "amount": { "value": 1250.00, "confidence": 0.99 }
      }
    ]
  },
  "validation_warnings": [],
  "processing_time_ms": 2340,
  "cached": false
}
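
Because every field carries its own confidence score, a caller can route uncertain values to manual review. A minimal sketch of that idea follows; fields_below is a hypothetical helper and the 0.9 default threshold is an arbitrary choice, not a recommendation from the API.

```python
def fields_below(response, threshold=0.9):
    """Return {field_name: confidence} for fields under the threshold."""
    fields = response["data"]["fields"]
    return {
        name: field["confidence"]
        for name, field in fields.items()
        if field["confidence"] < threshold
    }


# A trimmed copy of the sample response shown above.
sample = {
    "data": {
        "fields": {
            "vendor_name": {"value": "Acme Corp", "confidence": 0.99},
            "total": {"value": 1250.00, "confidence": 0.98},
            "due_date": {"value": "2026-04-15", "confidence": 0.95},
        }
    }
}

print(fields_below(sample, threshold=0.96))  # {'due_date': 0.95}
```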

Get Extraction

GET /v1/extract/{id}

Retrieve the result of a previous extraction by its ID.

List Extractions

GET /v1/extractions?limit=20&offset=0

List your recent extractions with pagination.
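
The limit and offset parameters lend themselves to a simple paging loop. The response shape of this endpoint is not documented here, so the sketch below assumes that a caller-supplied fetch_page function performs the GET request and returns a plain list of extraction records; both that function and iter_extractions are hypothetical names.

```python
def iter_extractions(fetch_page, limit=20):
    """Yield extractions page by page until a short page is returned.

    fetch_page(limit, offset) is supplied by the caller. It is assumed to
    call GET /v1/extractions and return a list of extraction records.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += limit
```

Injecting fetch_page keeps the loop independent of any HTTP library and makes it easy to exercise with an in-memory list.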

Usage & Billing

GET /v1/usage

Check your current plan usage, remaining pages, and overage status.

JSON

{
  "plan": "pro",
  "limit": 10000,
  "used": 3420,
  "remaining": 6580,
  "overage": 0,
  "overage_price_per_page": "$0.03"
}
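
The usage response is enough to derive a percentage and an overage charge. The sketch below assumes that overage counts pages beyond the plan limit and that overage_price_per_page is always a dollar-prefixed string as in the sample; usage_summary is a hypothetical helper.

```python
from decimal import Decimal


def usage_summary(usage):
    """Derive percent used and overage cost from a /v1/usage response."""
    # Decimal avoids floating-point rounding when multiplying currency.
    price = Decimal(usage["overage_price_per_page"].lstrip("$"))
    return {
        "percent_used": round(100 * usage["used"] / usage["limit"], 1),
        "overage_cost": price * usage["overage"],  # assumes overage is in pages
    }
```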

Document Schemas

GET /v1/schemas

Lists all supported document types and the fields that will be extracted from each.

Document Types

Type	Key Fields
`invoice`	vendor, total, tax, due_date, line_items
`receipt`	merchant, total, tax, tip, payment_method, line_items
`w2`	employer, wages, federal_tax, state_tax
`bank_statement`	bank, balances, transactions
`contract`	parties, dates, value, obligations
`auto`	Automatically detects type and extracts all relevant fields

Webhooks

Pass a webhook_url parameter when creating an extraction. We'll POST the result to your URL when processing completes:

Webhook Payload

{
  "event": "extraction.completed",
  "extraction_id": "ext_abc123",
  "data": { /* same as extraction response */ }
}
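
On the receiving side, a webhook endpoint only needs to parse the JSON body and check the event name. The following sketch shows that step in isolation, without a web framework; parse_webhook is a hypothetical helper, and since extraction.completed is the only event documented here, anything else is ignored.

```python
import json


def parse_webhook(body):
    """Return (extraction_id, data) for a completed extraction, else None."""
    payload = json.loads(body)
    if payload.get("event") != "extraction.completed":
        return None  # other event names are not documented; ignore them
    return payload["extraction_id"], payload["data"]
```

In a real handler, body would be the raw request body received at your webhook_url.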

Error Handling

Code	Meaning
401	Missing or invalid API key
403	Account deactivated
404	Extraction not found
413	File too large (max 20MB)
422	Extraction failed (unsupported format or unreadable document)
429	Monthly page limit reached (free tier only)
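
The status codes above can be turned into a single exception type so calling code does not have to repeat the table. A minimal sketch, assuming you check the HTTP status yourself; ApapyrError and check_status are hypothetical names and do not come from the official SDK.

```python
ERROR_MEANINGS = {
    401: "Missing or invalid API key",
    403: "Account deactivated",
    404: "Extraction not found",
    413: "File too large (max 20MB)",
    422: "Extraction failed (unsupported format or unreadable document)",
    429: "Monthly page limit reached (free tier only)",
}


class ApapyrError(Exception):
    """Raised for any HTTP status of 400 or above."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def check_status(status_code):
    """Raise ApapyrError for error statuses; return None otherwise."""
    if status_code >= 400:
        message = ERROR_MEANINGS.get(status_code, "Unexpected error")
        raise ApapyrError(status_code, message)
```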

Rate Limits

Plan	Requests/min	Pages/month
Free	10	50
Starter	60	1,000
Pro	120	10,000
Business	300	100,000
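
A client can stay under its requests-per-minute limit by spacing calls evenly. The sketch below is one simple approach, not something the API requires: Throttle is a hypothetical class, the lowercase plan keys are an assumption (only "pro" appears elsewhere in these docs), and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time

# Requests per minute, taken from the table above. Lowercase keys are assumed.
REQUESTS_PER_MINUTE = {"free": 10, "starter": 60, "pro": 120, "business": 300}


class Throttle:
    """Space calls evenly so they stay under a plan's requests/min limit."""

    def __init__(self, plan, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / REQUESTS_PER_MINUTE[plan]
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request is allowed, then reserve its slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call wait() immediately before each request. On the Free plan this spaces requests 6 seconds apart.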

SDKs — Python & Node.js

Official SDKs handle authentication, file uploads, and response parsing for you.

Python

pip install apapyr

from apapyr import aPapyr

client = aPapyr("sk_live_your_key")
result = client.extract("invoice.pdf")

print(result.get_field("total"))           # 1250.00
print(result.get_field("vendor_name"))     # "Acme Corp"
print(result.confidence)                # 0.97
print(result.to_flat_dict())            # {"vendor_name": "Acme Corp", ...}

View on PyPI

Node.js

JavaScript

npm install apapyr

const { aPapyr } = require("apapyr");

// In CommonJS, await is only valid inside an async function
async function main() {
  const client = new aPapyr("sk_live_your_key");
  const result = await client.extract("invoice.pdf");

  console.log(result.getField("total"));       // 1250.00
  console.log(result.getField("vendor_name")); // "Acme Corp"
  console.log(result.confidence);              // 0.97
}

main().catch(console.error);

View on npm

AI Agents (MCP)

aPapyr ships with an MCP server so AI agents (Claude Code, Cursor, Windsurf) can extract documents natively.

Setup

Claude Code

claude mcp add apapyr -- npx apapyr-mcp-server

Set your API key as an environment variable:

Shell

export APAPYR_API_KEY=sk_live_your_key

For Cursor/VS Code, add to your MCP config:

JSON

{
  "mcpServers": {
    "apapyr": {
      "command": "npx",
      "args": ["apapyr-mcp-server"],
      "env": { "APAPYR_API_KEY": "sk_live_your_key" }
    }
  }
}

Then ask your AI agent: "Extract the data from invoice.pdf". The agent calls the MCP server and returns the extracted fields.

View on npm