
For AI agents

Canonical "how to use this API" reference for automated tools and AI agents. Use the Agent API (/agent/*): it is purpose-built for machine clients, with PAT auth, presigned-S3 upload, clean JSON output, and an MCP-compatible tool dispatcher.

Stable machine-readable endpoints

  • OpenAPI 3.0: /api-docs/openapi.json (also at /api-docs/.well-known/openapi.json)
  • LLM hint file: /api-docs/llms.txt
  • MCP tool list: GET /api/v1/agent/mcp/tools (runtime-discoverable)

Auth + base URL

Generate a PAT in the web UI (Account → API). Tokens look like dcp_<43-char base64url> and are shown once at creation.

bash
BASE="https://www.docuclipper.com/api/v1"
PAT="dcp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Every request must send this header:
#   Authorization: Bearer $PAT
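
Since tokens have a fixed shape (dcp_ followed by 43 base64url characters), a client can catch a truncated or mis-pasted token before making any request. The following is a minimal sketch of such a check; the helper names is_valid_pat_format and auth_header are hypothetical and not part of the API, and the check only validates the shape, not whether the server accepts the token.

```shell
# Hypothetical helper: returns 0 if $1 looks like "dcp_" followed by exactly
# 43 base64url characters (A-Z, a-z, 0-9, "-", "_"). Shape check only.
is_valid_pat_format() {
  printf '%s\n' "$1" | grep -Eq '^dcp_[A-Za-z0-9_-]{43}$'
}

# Hypothetical helper: prints the header line sent with every request.
auth_header() {
  printf 'Authorization: Bearer %s' "$1"
}

# Example use (commented out so the snippet has no side effects):
#   is_valid_pat_format "$PAT" || echo "PAT looks malformed" >&2
#   curl -s -H "$(auth_header "$PAT")" "$BASE/agent/mcp/tools"
```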

Canonical workflow (bank statement)

Production agents should use webhooks. Polling exists for tests and demos, but at scale it wastes compute on both sides and triggers our rate limits.

  1. One-time setup: subscribe to job events with POST /agent/webhooks and { url, events: ["bank_statement.extraction.completed"] }. Save the signing secret returned in the response; it is shown only once and you'll need it to verify each event.
  2. Get a presigned upload URL: POST /agent/documents/upload-url with { filename, mimetype }
  3. Upload bytes to S3: PUT the file to the returned url. Content-Type must equal the mimetype you sent.
  4. Create a job: POST /agent/jobs with { documents: [<id>] } (bank-mode + v2 are defaults).
  5. Receive the result: your webhook fires when the job hits Succeeded; verify the X-DocuClipper-Signature header, then GET /agent/jobs/<id>/data for the structured payload (or /transactions for the flat per-row view).

Polling fallback (only for dev/tests, never production): GET /agent/jobs/<id> until status === "Succeeded". Use exponential backoff starting at 2 seconds; sustained tight polling will be rate-limited.

cURL example (production: webhook-driven)

bash
# 1. ONE-TIME: subscribe to job-completion events
RESP=$(curl -s -X POST "$BASE/agent/webhooks" \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/hook","events":["bank_statement.extraction.completed"]}')
SECRET=$(echo "$RESP" | jq -r .secret)   # store this, shown only once

# 2. PER-JOB: get presigned upload URL
RESP=$(curl -s -X POST "$BASE/agent/documents/upload-url" \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"filename":"jan.pdf","mimetype":"application/pdf"}')
URL=$(echo "$RESP" | jq -r .url)
DOC_ID=$(echo "$RESP" | jq -r .document.id)

# 3. PUT the file directly to S3
curl -X PUT "$URL" -H "Content-Type: application/pdf" --data-binary @jan.pdf

# 4. Create the job; your webhook fires when it completes
JOB_ID=$(curl -s -X POST "$BASE/agent/jobs" \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d "{\"documents\":[$DOC_ID]}" | jq -r .jobId)

# 5. In your webhook handler: verify X-DocuClipper-Signature, then fetch:
curl -s "$BASE/agent/jobs/$JOB_ID/data" -H "Authorization: Bearer $PAT" | jq

cURL example (dev/test: polling)

bash
# Use polling only when you can't receive webhooks (local dev, scripts, demos).
# Steps 2-4 same as above, then:
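DELAY=2   # seconds; exponential backoff starting at 2s, as recommended above
while true; do
  STATUS=$(curl -s "$BASE/agent/jobs/$JOB_ID" -H "Authorization: Bearer $PAT" | jq -r .status)
  [[ "$STATUS" == "Succeeded" ]] && break
  [[ "$STATUS" == "Failed" || "$STATUS" == "OutOfCredits" ]] && exit 1
  sleep "$DELAY"
  DELAY=$(( DELAY * 2 > 60 ? 60 : DELAY * 2 ))   # double each time; the 60s cap is an arbitrary choice
done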
curl -s "$BASE/agent/jobs/$JOB_ID/data" -H "Authorization: Bearer $PAT" | jq

Job types

The Agent API is scoped to bank statements and invoices, the document types where DocuClipper's reconciliation pipeline adds the most value. Both use the same flow: create the job with the right jobType, then call GET /agent/jobs/<id>/data, which dispatches automatically on the job type.

  • Bank statements / check images: these are the defaults, so just send { documents: [<id>] }. Use GET /agent/jobs/<id>/transactions for the flat per-row view (filtered to real transactions; pass ?includeRaw=true for every OCR row), or GET /agent/jobs/<id>/data for the grouped documentId → account → bankMode.transactions[] shape.
  • Invoices: { documents: [<id>], jobType: "Invoice" }, then GET /agent/jobs/<id>/data returns the invoice payload keyed by documentId.
  • Other types (tax forms, receipts, generic OCR): not in the agent surface; use the legacy POST /protected/job endpoint with jobType: "Form" etc.
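
The difference between the two job types comes down to the request body and which read endpoint you call. As a rough sketch, the helpers below build the POST /agent/jobs body and the transactions URL; job_payload and transactions_url are hypothetical names, and the document id is assumed to be numeric, as in the cURL examples above.

```shell
BASE="${BASE:-https://www.docuclipper.com/api/v1}"

# Hypothetical helper: prints the POST /agent/jobs body for one document id.
# $1 = document id, $2 = optional jobType (omit for the bank-statement default).
job_payload() {
  if [ -n "${2:-}" ]; then
    printf '{"documents":[%s],"jobType":"%s"}' "$1" "$2"
  else
    printf '{"documents":[%s]}' "$1"
  fi
}

# Hypothetical helper: prints the URL of the flat per-row view.
# $1 = job id, $2 = "raw" to include every OCR row via ?includeRaw=true.
transactions_url() {
  if [ "${2:-}" = "raw" ]; then
    printf '%s/agent/jobs/%s/transactions?includeRaw=true' "$BASE" "$1"
  else
    printf '%s/agent/jobs/%s/transactions' "$BASE" "$1"
  fi
}

# Example calls (commented out; they need a real PAT, document id, and job id):
#   curl -s -X POST "$BASE/agent/jobs" -H "Authorization: Bearer $PAT" \
#     -H "Content-Type: application/json" -d "$(job_payload "$DOC_ID" Invoice)"
#   curl -s "$(transactions_url "$JOB_ID" raw)" -H "Authorization: Bearer $PAT"
```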

Webhooks (avoid polling)

Subscribe to push notifications instead of polling /agent/jobs/<id>. The signing secret is returned exactly once at creation; store it immediately. We only keep a SHA-256 hash; lost secrets must be rotated via POST /agent/webhooks/<id>/regenerate-secret.

bash
# 1. Discover available event types
curl -H "Authorization: Bearer $PAT" "$BASE/agent/webhooks/events"

# 2. Subscribe (capture the one-time secret)
RESP=$(curl -s -X POST "$BASE/agent/webhooks" \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/hook","events":["bank_statement.extraction.completed"]}')
SECRET=$(echo "$RESP" | jq -r .secret)   # store this securely, shown once

# 3. Send a test event to verify your endpoint
WEBHOOK_ID=$(echo "$RESP" | jq -r .id)
curl -X POST "$BASE/agent/webhooks/$WEBHOOK_ID/test" \
  -H "Authorization: Bearer $PAT"

# 4. Inspect delivery attempts
curl "$BASE/agent/webhooks/$WEBHOOK_ID/deliveries" \
  -H "Authorization: Bearer $PAT"

Verify each incoming event with HMAC-SHA256 over the raw request body using your stored secret. The signature is in the X-DocuClipper-Signature header.
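
A minimal sketch of that check using the openssl CLI follows. It assumes the header carries a lowercase hex digest with no prefix, which this page does not specify, so confirm the encoding against a test event before relying on it. The function names are hypothetical.

```shell
# Prints the lowercase hex HMAC-SHA256 of stdin, keyed with the secret in $1.
# Note: passing the secret as an argument exposes it in the process list.
hmac_sha256_hex() {
  openssl dgst -sha256 -hmac "$1" | sed 's/^.*= *//'
}

# Returns 0 when the signature in $2 matches the HMAC of stdin under secret $1.
# Assumption: the header value is a bare hex digest.
# The string comparison here is not constant-time; a production handler should
# use a constant-time compare from its own language's crypto library.
verify_signature() {
  expected=$(hmac_sha256_hex "$1")
  [ "$expected" = "$2" ]
}

# Example use (commented out): feed the raw body bytes on stdin, e.g. from a file.
#   verify_signature "$SECRET" "$SIGNATURE_HEADER" < body.bin || echo "reject" >&2
```

Feed the body from a file or pipe rather than a shell variable: command substitution strips trailing newlines, which would change the bytes being signed.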

MCP (Model Context Protocol)

Plug DocuClipper into Claude Desktop, Cursor, Continue, or any MCP-compatible client via docuclipper-mcp, a stdio transport shim that exposes the same tools your agent already understands. See the MCP integration page for the full overview, or the step-by-step setup guide.

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows), then restart Claude Desktop:

json
{
  "mcpServers": {
    "docuclipper": {
      "command": "npx",
      "args": ["-y", "docuclipper-mcp"],
      "env": {
        "DOCUCLIPPER_PAT": "dcp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}

Cursor

Add to .cursor/mcp.json in your project (or globally at ~/.cursor/mcp.json):

json
{
  "mcpServers": {
    "docuclipper": {
      "command": "npx",
      "args": ["-y", "docuclipper-mcp"],
      "env": {
        "DOCUCLIPPER_PAT": "dcp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}

Available tools

The shim discovers tools at runtime, so you always get the live set. As of v0.1.0:

  • convert_bank_statement: one-shot. Takes a base64 PDF, uploads it, extracts, polls until done, and returns transactions (json or csv). This is the path most agents want.
  • download_transactions: fetches transactions from any completed jobId (json or csv, max 10000 rows).
  • get_transactions: alias of download_transactions.
  • upload_url: async building block that returns a presigned S3 PUT URL.
  • convert_document: async; enqueues an extraction job from already-uploaded documentIds.
  • get_job_status: polls a job by id; returns status + transaction count.

For files larger than ~10MB use the async path: upload_url → PUT → convert_document → get_job_status → download_transactions.
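
A client can make that choice from the file size alone. The sketch below assumes "~10MB" means 10 × 1024 × 1024 bytes, which is a guess at the exact threshold; choose_path and MAX_ONE_SHOT_BYTES are hypothetical names.

```shell
# Assumption: treat "~10MB" as 10 * 1024 * 1024 bytes; the exact limit is not documented here.
MAX_ONE_SHOT_BYTES=$((10 * 1024 * 1024))

# Hypothetical helper: prints "async" if the file in $1 exceeds the threshold,
# otherwise "one-shot" (i.e. convert_bank_statement with a base64 body).
choose_path() {
  size=$(wc -c < "$1" | tr -d ' ')   # strip padding some wc implementations add
  if [ "$size" -gt "$MAX_ONE_SHOT_BYTES" ]; then
    echo "async"
  else
    echo "one-shot"
  fi
}
```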

Direct HTTP (custom clients / cURL)

If you can't spawn a subprocess (serverless functions, custom agent runtimes), the same tool registry is exposed as JSON over HTTP. This is the underlying transport the stdio shim talks to.

bash
# List all tools
curl -H "Authorization: Bearer $PAT" "$BASE/agent/mcp/tools"

# Execute a tool (params shape comes from inputSchema)
curl -X POST "$BASE/agent/mcp/tools/convert_bank_statement" \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"filename":"jan.pdf","mimetype":"application/pdf","fileBase64":"..."}'
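
The fileBase64 value above is elided. One way to build the full request body from a local file is sketched here; tool_payload is a hypothetical helper, and it assumes the filename and mimetype contain no characters that need JSON escaping.

```shell
# Hypothetical helper: prints the convert_bank_statement request body.
# $1 = path to the file, $2 = filename to report, $3 = mimetype.
# Assumption: $2 and $3 need no JSON escaping (no quotes or backslashes).
tool_payload() {
  b64=$(base64 < "$1" | tr -d '\n')   # some base64 tools wrap lines; join them
  printf '{"filename":"%s","mimetype":"%s","fileBase64":"%s"}' "$2" "$3" "$b64"
}

# Example call (commented out; needs a real PAT). The body is piped on stdin
# so a large payload does not hit the command-line length limit:
#   tool_payload jan.pdf jan.pdf application/pdf |
#     curl -X POST "$BASE/agent/mcp/tools/convert_bank_statement" \
#       -H "Authorization: Bearer $PAT" \
#       -H "Content-Type: application/json" \
#       --data-binary @-
```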

Fraud signals

For bank-statement fraud features, use GET /protected/document/<documentId>/fraudSignals after extraction completes. (Fraud endpoints are currently only on the legacy API.)

Legacy /protected/* API

Existing customers continue to use the JWT-authenticated /protected/* endpoints (multipart upload, POST /protected/job, POST /protected/job/<id>/export). These remain fully supported, but new integrations should use /agent/*.

Human docs