For AI agents
Canonical "how to use this API" reference for automated tools and AI agents. Use the Agent API (/agent/*); it is purpose-built for machine clients: PAT auth, presigned S3 upload, clean JSON output, and an MCP-compatible tool dispatcher.
Stable machine-readable endpoints
- OpenAPI 3.0: /api-docs/openapi.json (also at /api-docs/.well-known/openapi.json)
- LLM hint file: /api-docs/llms.txt
- MCP tool list: GET /api/v1/agent/mcp/tools (runtime-discoverable)
Auth + base URL
Generate a PAT in the web UI (Account → API). Tokens look like dcp_<43-char base64url> and are shown once at creation.
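Because tokens have a fixed shape, a client can catch copy-paste mistakes before sending any request. The following is a minimal sketch: the helper name looks_like_pat is hypothetical, and it checks only the format described above, not whether the server would accept the token.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks only the documented shape dcp_<43-char base64url>.
# It cannot tell whether the token is valid, expired, or revoked.
looks_like_pat() {
  local pattern='^dcp_[A-Za-z0-9_-]{43}$'
  [[ "$1" =~ $pattern ]]
}

# Usage sketch:
#   looks_like_pat "$PAT" || { echo "PAT does not match dcp_<43 chars>" >&2; exit 1; }
```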
BASE="https://www.docuclipper.com/api/v1"
PAT="dcp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Every request must carry this header:
#   Authorization: Bearer $PAT
Canonical workflow (bank statement)
Production agents should use webhooks. Polling exists for tests and demos, but at scale it wastes compute on both sides and triggers our rate limits.
- One-time setup: subscribe to job events with POST /agent/webhooks and { url, events: ["bank_statement.extraction.completed"] }. Save the signing secret returned in the response; it is shown only once and you'll need it to verify each event.
- Get a presigned upload URL: POST /agent/documents/upload-url with { filename, mimetype }.
- Upload bytes to S3: PUT the file to the returned url. Content-Type must equal the mimetype you sent.
- Create a job: POST /agent/jobs with { documents: [<id>] } (bank-mode + v2 are defaults).
- Receive the result: your webhook fires when the job hits Succeeded; verify the X-DocuClipper-Signature header, then GET /agent/jobs/<id>/data for the structured payload (or GET /agent/jobs/<id>/transactions for the flat per-row view).
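The upload step fails if the Content-Type on the PUT differs from the mimetype sent when requesting the upload URL, so it helps to derive the value once and reuse it. A minimal sketch follows; mimetype_for is a hypothetical helper, and its extension table is an assumption for illustration, not the list of types the API accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a filename to one MIME type, to be reused for BOTH
# the upload-url request body and the Content-Type header of the S3 PUT.
# ASSUMPTION: this extension table is illustrative only; consult the OpenAPI
# spec for the types the API actually accepts.
mimetype_for() {
  local name ext
  name="${1##*/}"                                              # strip directories
  ext="$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf)      echo "application/pdf" ;;
    png)      echo "image/png" ;;
    jpg|jpeg) echo "image/jpeg" ;;
    *)        echo "unsupported extension: $ext" >&2; return 1 ;;
  esac
}

# Usage sketch:
#   MIME=$(mimetype_for jan.pdf) || exit 1
#   # send "$MIME" as mimetype when requesting the upload URL, then:
#   curl -X PUT "$URL" -H "Content-Type: $MIME" --data-binary @jan.pdf
```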
Polling fallback (dev and tests only, never production): GET /agent/jobs/<id> until status === "Succeeded". Use exponential backoff starting at 2 seconds; sustained tight polling will be rate-limited.
cURL example (production: webhook-driven)
# 1. ONE-TIME: subscribe to job-completion events
RESP=$(curl -s -X POST "$BASE/agent/webhooks" \
-H "Authorization: Bearer $PAT" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com/hook","events":["bank_statement.extraction.completed"]}')
SECRET=$(echo "$RESP" | jq -r .secret) # store this, shown only once
# 2. PER-JOB: get presigned upload URL
RESP=$(curl -s -X POST "$BASE/agent/documents/upload-url" \
-H "Authorization: Bearer $PAT" \
-H "Content-Type: application/json" \
-d '{"filename":"jan.pdf","mimetype":"application/pdf"}')
URL=$(echo "$RESP" | jq -r .url)
DOC_ID=$(echo "$RESP" | jq -r .document.id)
# 3. PUT the file directly to S3
curl -X PUT "$URL" -H "Content-Type: application/pdf" --data-binary @jan.pdf
# 4. Create the job, your webhook fires when it completes
JOB_ID=$(curl -s -X POST "$BASE/agent/jobs" \
-H "Authorization: Bearer $PAT" \
-H "Content-Type: application/json" \
-d "{\"documents\":[$DOC_ID]}" | jq -r .jobId)
# 5. In your webhook handler: verify X-DocuClipper-Signature, then fetch:
curl -s "$BASE/agent/jobs/$JOB_ID/data" -H "Authorization: Bearer $PAT" | jq
cURL example (dev/test: polling)
# Use polling only when you can't receive webhooks (local dev, scripts, demos).
# Steps 2-4 same as above, then:
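DELAY=2  # seconds; doubled after each attempt (exponential backoff)
while true; do
  STATUS=$(curl -s "$BASE/agent/jobs/$JOB_ID" -H "Authorization: Bearer $PAT" | jq -r .status)
  [[ "$STATUS" == "Succeeded" ]] && break
  [[ "$STATUS" == "Failed" || "$STATUS" == "OutOfCredits" ]] && exit 1
  sleep "$DELAY"
  # The 60-second cap is an arbitrary choice here, not an API requirement.
  DELAY=$(( DELAY * 2 > 60 ? 60 : DELAY * 2 ))
done
curl -s "$BASE/agent/jobs/$JOB_ID/data" -H "Authorization: Bearer $PAT" | jq
Job types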
The Agent API is scoped to bank statements and invoices, the document types where DocuClipper's reconciliation pipeline adds the most value. Both use the same flow: create the job with the right jobType, then call GET /agent/jobs/<id>/data; the endpoint auto-dispatches on job type.
- Bank statements / check images: these are the defaults, so just send { documents: [<id>] }. Use GET /agent/jobs/<id>/transactions for the flat per-row view (filtered to real transactions; pass ?includeRaw=true for every OCR row), or GET /agent/jobs/<id>/data for the grouped documentId → account → bankMode.transactions[] shape.
- Invoices: send { documents: [<id>], jobType: "Invoice" }; GET /agent/jobs/<id>/data then returns the invoice payload keyed by documentId.
- Other types (tax forms, receipts, generic OCR): not in the agent surface; use the legacy POST /protected/job endpoint with jobType: "Form" etc.
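The two request bodies above differ only in the optional jobType field, which a small builder can make explicit. This is a sketch under stated assumptions: job_payload is a hypothetical helper, and it assumes document ids are numeric, as the unquoted $DOC_ID in the cURL example suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for POST /agent/jobs.
# ASSUMPTION: document ids are numeric (the cURL example interpolates
# $DOC_ID into the JSON without quotes).
# usage: job_payload <bank|invoice> <documentId>...
job_payload() {
  local kind="$1"; shift
  [[ $# -gt 0 ]] || { echo "need at least one document id" >&2; return 1; }
  local id ids=""
  for id in "$@"; do
    [[ "$id" =~ ^[0-9]+$ ]] || { echo "non-numeric id: $id" >&2; return 1; }
    ids+="${ids:+,}$id"          # comma-join the ids
  done
  case "$kind" in
    bank)    printf '{"documents":[%s]}\n' "$ids" ;;   # bank-mode defaults apply
    invoice) printf '{"documents":[%s],"jobType":"Invoice"}\n' "$ids" ;;
    *)       echo "unknown kind: $kind (other types use the legacy API)" >&2; return 1 ;;
  esac
}

# Usage sketch:
#   curl -s -X POST "$BASE/agent/jobs" -H "Authorization: Bearer $PAT" \
#     -H "Content-Type: application/json" -d "$(job_payload invoice "$DOC_ID")"
```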
Webhooks (avoid polling)
Subscribe to push notifications instead of polling /agent/jobs/<id>. The signing secret is returned exactly once at creation, so store it immediately. We keep only a SHA-256 hash, so a lost secret cannot be recovered and must be rotated via POST /agent/webhooks/<id>/regenerate-secret.
# 1. Discover available event types
curl -H "Authorization: Bearer $PAT" "$BASE/agent/webhooks/events"
# 2. Subscribe (capture the one-time secret)
RESP=$(curl -s -X POST "$BASE/agent/webhooks" \
-H "Authorization: Bearer $PAT" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com/hook","events":["bank_statement.extraction.completed"]}')
SECRET=$(echo "$RESP" | jq -r .secret) # store this securely, shown once
# 3. Send a test event to verify your endpoint
WEBHOOK_ID=$(echo "$RESP" | jq -r .id)
curl -X POST "$BASE/agent/webhooks/$WEBHOOK_ID/test" \
-H "Authorization: Bearer $PAT"
# 4. Inspect delivery attempts
curl "$BASE/agent/webhooks/$WEBHOOK_ID/deliveries" \
-H "Authorization: Bearer $PAT"
Verify each incoming event with HMAC-SHA256 over the raw request body using your stored secret. The signature is in the X-DocuClipper-Signature header.
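The verification step can be sketched in shell with openssl. Everything about the header encoding here is an assumption: this page does not say whether the signature is hex or base64, or whether it carries a prefix or timestamp, so the sketch assumes a bare lowercase hex digest. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION (not confirmed by this page): X-DocuClipper-Signature holds the
# lowercase hex HMAC-SHA256 digest of the raw body, with no prefix.
# Compare against a real test delivery and adjust if the format differs.

# Print the hex HMAC-SHA256 of a file's raw bytes under the given secret.
hmac_sha256_hex() {
  local secret="$1" body_file="$2"
  # Note: passing the secret via -hmac exposes it in the process list.
  openssl dgst -sha256 -hmac "$secret" < "$body_file" | awk '{print $NF}'
}

# Succeed only if the received signature equals the computed one.
verify_signature() {
  local secret="$1" body_file="$2" received="$3" expected
  expected="$(hmac_sha256_hex "$secret" "$body_file")"
  # Plain string comparison is not constant-time; a production handler
  # should use a constant-time compare from its own language runtime.
  [[ -n "$expected" && "$received" == "$expected" ]]
}
```

Write the request body to disk byte-for-byte before any JSON parsing, then call verify_signature "$SECRET" body.bin "$HEADER_VALUE"; re-serialized JSON will generally not hash to the same value.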
MCP (Model Context Protocol)
Plug DocuClipper into Claude Desktop, Cursor, Continue, or any MCP-compatible client via docuclipper-mcp, a stdio transport shim that exposes the same tools your agent already understands. See the MCP integration page for the full overview, or the step-by-step setup guide.
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows), then restart Claude Desktop:
{
"mcpServers": {
"docuclipper": {
"command": "npx",
"args": ["-y", "docuclipper-mcp"],
"env": {
"DOCUCLIPPER_PAT": "dcp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
}
}
}
Cursor
Add to .cursor/mcp.json in your project (or globally at ~/.cursor/mcp.json):
{
"mcpServers": {
"docuclipper": {
"command": "npx",
"args": ["-y", "docuclipper-mcp"],
"env": {
"DOCUCLIPPER_PAT": "dcp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
}
}
}
Available tools
The shim discovers tools at runtime, so you always get the live set. As of v0.1.0:
- convert_bank_statement: one-shot. Takes a base64 PDF, uploads, extracts, polls until done, and returns transactions (json or csv). The path most agents want.
- download_transactions: fetch transactions from any completed jobId (json or csv, max 10000 rows).
- get_transactions: alias of download_transactions.
- upload_url: async building block; returns a presigned S3 PUT URL.
- convert_document: async; enqueues an extraction job from already-uploaded documentIds.
- get_job_status: poll a job by id; returns status + transaction count.
For files larger than ~10MB use the async path: upload_url → PUT → convert_document → get_job_status → download_transactions.
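A client can make this size decision before encoding anything. The sketch below is illustrative only: choose_path and one_shot_body are hypothetical helpers, and "~10MB" is read as 10 * 1024 * 1024 bytes of the raw file, which is an assumption since the exact cutoff is not stated. The argument names (filename, mimetype, fileBase64) are the ones used in the convert_bank_statement call on this page.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the "~10MB" guideline is treated as 10 MiB of the raw file;
# the real cutoff may differ.
ONE_SHOT_MAX_BYTES=$((10 * 1024 * 1024))

# Print "one-shot" or "async" depending on the file size.
choose_path() {
  [[ -f "$1" ]] || { echo "no such file: $1" >&2; return 1; }
  local size
  size=$(wc -c < "$1" | tr -d ' ')
  if (( size > ONE_SHOT_MAX_BYTES )); then echo "async"; else echo "one-shot"; fi
}

# Print the convert_bank_statement arguments for a small PDF.
# ASSUMPTION: the filename contains no characters that need JSON escaping.
one_shot_body() {
  local file="$1"
  [[ "$(choose_path "$file")" == "one-shot" ]] \
    || { echo "use the async path for $file" >&2; return 1; }
  printf '{"filename":"%s","mimetype":"application/pdf","fileBase64":"%s"}\n' \
    "${file##*/}" "$(base64 < "$file" | tr -d '\n')"
}

# Usage sketch (stream the body on stdin to stay clear of argument-length limits):
#   one_shot_body jan.pdf | curl -X POST "$BASE/agent/mcp/tools/convert_bank_statement" \
#     -H "Authorization: Bearer $PAT" -H "Content-Type: application/json" --data-binary @-
```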
Direct HTTP (custom clients / cURL)
If you can't spawn a subprocess (serverless functions, custom agent runtimes), the same tool registry is available as plain JSON over HTTP. This is the underlying transport the stdio shim talks to.
# List all tools
curl -H "Authorization: Bearer $PAT" "$BASE/agent/mcp/tools"
# Execute a tool (params shape comes from inputSchema)
curl -X POST "$BASE/agent/mcp/tools/convert_bank_statement" \
-H "Authorization: Bearer $PAT" \
-H "Content-Type: application/json" \
-d '{"filename":"jan.pdf","mimetype":"application/pdf","fileBase64":"..."}'
Fraud signals
For bank-statement fraud features, use GET /protected/document/<documentId>/fraudSignals after extraction completes. (Fraud endpoints are currently only on the legacy API.)
Legacy /protected/* API
Existing customers can continue to use the JWT-authenticated /protected/* endpoints (multipart upload, POST /protected/job, POST /protected/job/<id>/export). These remain fully supported, but new integrations should use /agent/*.