API Recipes: Upload, Convert, and Fetch Results
End-to-end code for the most common DocuClipper API workflows: upload a PDF, run a bank or invoice job, poll for completion, and fetch the structured results.
This article gives you copy-paste recipes for the DocuClipper Agent API. It assumes you already have a Personal Access Token. If not, start with API Access (Personal Access Tokens).
All endpoints below live under https://www.docuclipper.com/api/v1/agent and require the Authorization: Bearer <PAT> header. The agent API supports two job types: ExtractData (bank and credit card statements, check images) and Invoice (sales invoices, bills, receipts). For other document types, use the web UI.
The four-step flow
Every conversion follows the same shape:
- Ask DocuClipper for a presigned upload URL.
- PUT your PDF to that URL.
- Create a job referencing the document.
- Poll the job until it succeeds, then fetch the data.
Recipe 1: Convert a bank statement to JSON
TOKEN="dc_pat_xxxxxxxx"
BASE="https://www.docuclipper.com/api/v1/agent"
PDF="/path/to/statement.pdf"
# 1. Get a presigned upload URL.
RESP=$(curl -sS -X POST "$BASE/documents/upload-url" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg name "$(basename "$PDF")" '{filename: $name, mimetype: "application/pdf"}')")
DOC_ID=$(echo "$RESP" | jq -r .documentId)
UPLOAD_URL=$(echo "$RESP" | jq -r .uploadUrl)
# 2. Upload the PDF directly to S3.
curl -sS -X PUT "$UPLOAD_URL" \
-H "Content-Type: application/pdf" \
--data-binary "@$PDF"
# 3. Create the job. ExtractData defaults to bank mode.
JOB=$(curl -sS -X POST "$BASE/jobs" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "{\"jobType\":\"ExtractData\",\"documents\":[$DOC_ID],\"jobName\":\"My statement\"}")
JOB_ID=$(echo "$JOB" | jq -r .jobId)
# 4. Poll for completion (status transitions: Created -> InProgress -> Succeeded / Failed).
while :; do
STATUS=$(curl -sS "$BASE/jobs/$JOB_ID" -H "Authorization: Bearer $TOKEN" | jq -r .status)
echo "status: $STATUS"
[ "$STATUS" = "Succeeded" ] && break
[ "$STATUS" = "Failed" ] && exit 1
sleep 5
done
# 5. Fetch the structured payload.
curl -sS "$BASE/jobs/$JOB_ID/data" -H "Authorization: Bearer $TOKEN" > result.json
The data endpoint returns transactions grouped by document and account, including the bank-mode reconciliation flags. If you only need a flat list of transactions, use the recipe below instead.
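The polling loop in step 4 keeps running if the job never reaches Succeeded or Failed (for example, if the status request itself starts returning an error body). A minimal sketch of a bounded variant follows. The function names fetch_status and wait_for_job, the default of 60 attempts, and the return codes are illustrative choices, not part of the DocuClipper API; TOKEN, BASE, and the .status field are taken from the recipe above.

```shell
# Hypothetical helper: prints the current status string for a job ID.
# Uses TOKEN and BASE as defined in recipe 1.
fetch_status() {
  curl -sS "$BASE/jobs/$1" -H "Authorization: Bearer $TOKEN" | jq -r .status
}

# wait_for_job JOB_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 on Succeeded, 1 on Failed, 2 if the attempt budget runs out.
wait_for_job() {
  job_id=$1
  max_attempts=${2:-60}   # illustrative default: 60 attempts
  interval=${3:-5}        # same 5-second pause as the recipe
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(fetch_status "$job_id")
    case "$status" in
      Succeeded) return 0 ;;
      Failed)    return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 2
}

# Usage (makes network calls):
#   wait_for_job "$JOB_ID" || exit 1
```

Any status other than Succeeded or Failed, including an empty or null value, counts as "not done yet", so a persistent error ends in return code 2 rather than an endless loop.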
Recipe 2: Get just the transactions (CSV or JSON)
# Flat JSON list (header / OCR-noise rows filtered out by default).
curl -sS "$BASE/jobs/$JOB_ID/transactions" \
-H "Authorization: Bearer $TOKEN"
# CSV download.
curl -sS "$BASE/jobs/$JOB_ID/transactions?format=csv" \
-H "Authorization: Bearer $TOKEN" -o transactions.csv
# Include raw rows (headers, footers, OCR noise) for debugging.
curl -sS "$BASE/jobs/$JOB_ID/transactions?includeRaw=true" \
-H "Authorization: Bearer $TOKEN"
The default limit is 1000 rows; the maximum is 10000. There is no offset or cursor today, so you cannot page through results: for a statement with more than 1000 rows, re-run the request with a larger limit. The agent endpoint is optimized for one-call retrieval per job.
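As a sketch of raising the limit, the helper below builds the transactions URL and clamps the value to the documented range. It assumes the limit is passed as a query parameter named limit, which the text above implies but does not spell out; transactions_url is a hypothetical name.

```shell
BASE="${BASE:-https://www.docuclipper.com/api/v1/agent}"

# transactions_url JOB_ID [LIMIT]
# Prints the transactions URL with LIMIT clamped to 1..10000.
# Assumption: the row limit is a query parameter named "limit".
transactions_url() {
  job_id=$1
  limit=${2:-1000}
  if [ "$limit" -gt 10000 ]; then limit=10000; fi
  if [ "$limit" -lt 1 ]; then limit=1; fi
  printf '%s/jobs/%s/transactions?limit=%s\n' "$BASE" "$job_id" "$limit"
}

# Usage (makes a network call):
#   curl -sS "$(transactions_url "$JOB_ID" 5000)" -H "Authorization: Bearer $TOKEN"
```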
Recipe 3: Convert an invoice or receipt
# Same upload step as recipe 1 (steps 1-2). Then create with jobType=Invoice.
curl -sS -X POST "$BASE/jobs" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "{\"jobType\":\"Invoice\",\"documents\":[$DOC_ID]}"
After the job succeeds, GET /jobs/$JOB_ID/data returns the InvoiceExport shape: vendor, invoice number, dates, totals, line items.
Recipe 4: Convert several PDFs in one job
Pass an array of document IDs. DocuClipper processes them as a single batch:
curl -sS -X POST "$BASE/jobs" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d "{\"jobType\":\"ExtractData\",\"documents\":[$DOC1,$DOC2,$DOC3]}"
The job's data payload returns one block per document, keyed by document ID. Use this when you have a multi-statement archive from one client and want a single result file.
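When the number of documents varies, the request body can be assembled from a list of IDs rather than hard-coded variables. The sketch below uses a hypothetical build_job_body helper and assumes document IDs are numeric, as the unquoted interpolation in the recipes suggests.

```shell
# build_job_body JOB_TYPE DOC_ID...
# Prints a JSON body for POST /jobs with all given document IDs.
# Assumption: document IDs are numeric, so they are emitted unquoted.
build_job_body() {
  job_type=$1
  shift
  # Join the remaining arguments with commas (IFS change is confined to the subshell).
  ids=$(IFS=,; printf '%s' "$*")
  printf '{"jobType":"%s","documents":[%s]}\n' "$job_type" "$ids"
}

# Usage (makes a network call):
#   curl -sS -X POST "$BASE/jobs" \
#     -H "Authorization: Bearer $TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(build_job_body ExtractData "$DOC1" "$DOC2" "$DOC3")"
```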
Recipe 5: Check who you are and your remaining quota
curl -sS "$BASE/whoami" -H "Authorization: Bearer $TOKEN"
Returns your contract ID, plan, scopes, and (if your plan uses agent billing) pagesUsed vs pagesFree. Hit this first to confirm the token is valid and you have headroom before queuing work.
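A headroom check before queuing work might look like the sketch below. It rests on an assumption the text does not confirm: that pagesFree is an allowance and pagesUsed counts against it, so the difference is what remains. The has_headroom helper and the top-level location of both fields in the response are also illustrative.

```shell
# has_headroom PAGES_USED PAGES_FREE PAGES_NEEDED
# Succeeds (exit 0) if PAGES_FREE - PAGES_USED is at least PAGES_NEEDED.
# Assumption: pagesFree is an allowance that pagesUsed counts against.
has_headroom() {
  [ $(( $2 - $1 )) -ge "$3" ]
}

# Usage (makes a network call; assumes both fields sit at the top level):
#   WHO=$(curl -sS "$BASE/whoami" -H "Authorization: Bearer $TOKEN")
#   USED=$(echo "$WHO" | jq -r .pagesUsed)
#   FREE=$(echo "$WHO" | jq -r .pagesFree)
#   has_headroom "$USED" "$FREE" 25 || echo "not enough pages left" >&2
```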
Common mistakes
- Fetching /data before the job is Succeeded returns 409. Always poll /jobs/:id first.
- Setting jobType to Form, Receipt, or anything else returns 400. The agent API supports ExtractData and Invoice only; use the web UI for tax forms and other document types.
- Re-using the presigned upload URL. Each URL is single-use. If the PUT fails, request a new URL.
- Forgetting Content-Type: application/pdf on the PUT. S3 stores the wrong MIME type and downstream OCR can break.
- Token in URL or logs. Use the Authorization header, never a query string. Avoid curl -v on requests with a bearer token (the verbose log prints the header).
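To surface the 400 and 409 cases in a script, capture the HTTP status code with curl's -w '%{http_code}' option and branch on it. The sketch below is a minimal illustration: explain_http_code is a hypothetical helper, and its messages cover only the causes listed above, although the API may return these codes for other reasons too.

```shell
# explain_http_code CODE
# Maps an HTTP status code to a hint based on the mistakes listed above.
explain_http_code() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "bad request: check that jobType is ExtractData or Invoice" ;;
    409) echo "job not finished: poll /jobs/:id until Succeeded" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Usage (makes a network call; the body goes to result.json, the code to CODE):
#   CODE=$(curl -sS -o result.json -w '%{http_code}' \
#     "$BASE/jobs/$JOB_ID/data" -H "Authorization: Bearer $TOKEN")
#   explain_http_code "$CODE"
```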
Webhooks instead of polling
For production, replace the polling loop with a webhook subscription so DocuClipper pushes you a job.succeeded event. See Webhooks Overview for the subscription endpoint and event shape.