API Recipes: Upload, Convert, and Fetch Results

This article gives you copy-paste recipes for the DocuClipper Agent API. It assumes you already have a Personal Access Token. If not, start with API Access (Personal Access Tokens).

All endpoints below live under https://www.docuclipper.com/api/v1/agent and require the Authorization: Bearer <PAT> header. The agent API supports two job types: ExtractData (bank and credit card statements, check images) and Invoice (sales invoices, bills, receipts). For other document types, use the web UI.

The four-step flow

Every conversion follows the same shape:

1. Ask DocuClipper for a presigned upload URL.
2. PUT your PDF to that URL.
3. Create a job referencing the document.
4. Poll the job until it succeeds, then fetch the data.

Recipe 1: Convert a bank statement to JSON

TOKEN="dc_pat_xxxxxxxx"
BASE="https://www.docuclipper.com/api/v1/agent"
PDF="/path/to/statement.pdf"

# 1. Get a presigned upload URL.
# jq builds the request body so filenames with quotes or backslashes stay valid JSON.
RESP=$(jq -cn --arg f "$(basename "$PDF")" '{filename: $f, mimetype: "application/pdf"}' \
  | curl -sS -X POST "$BASE/documents/upload-url" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d @-)
DOC_ID=$(echo "$RESP" | jq -r .documentId)
UPLOAD_URL=$(echo "$RESP" | jq -r .uploadUrl)

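# 2. Upload the PDF directly to S3. --fail turns an HTTP error into a non-zero exit.
curl -sS --fail -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary "@$PDF" || { echo "upload failed; request a new URL" >&2; exit 1; }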

# 3. Create the job. ExtractData defaults to bank mode.
JOB=$(curl -sS -X POST "$BASE/jobs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"jobType\":\"ExtractData\",\"documents\":[$DOC_ID],\"jobName\":\"My statement\"}")
JOB_ID=$(echo "$JOB" | jq -r .jobId)

# 4. Poll for completion (status transitions: Created -> InProgress -> Succeeded / Failed).
while :; do
  STATUS=$(curl -sS "$BASE/jobs/$JOB_ID" -H "Authorization: Bearer $TOKEN" | jq -r .status)
  echo "status: $STATUS"
  [ "$STATUS" = "Succeeded" ] && break
  [ "$STATUS" = "Failed" ] && exit 1
  sleep 5
done

# 4 (continued). Fetch the structured payload once the job has succeeded.
curl -sS "$BASE/jobs/$JOB_ID/data" -H "Authorization: Bearer $TOKEN" > result.json

The data endpoint returns transactions grouped by document and account, including the bank-mode reconciliation flags. If you only need a flat list of transactions, use the recipe below instead.
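The polling loop in Recipe 1 never gives up: if the status stays at InProgress, or the response has no status field at all, it spins forever. Here is a minimal sketch of a bounded version. The helper names poll_job and fetch_status, the try budget, and the return codes are illustrative choices, not part of the API; only the Succeeded and Failed status values come from the recipe above.

```shell
# poll_job FETCH_CMD [MAX_TRIES] [DELAY_SECONDS]
# FETCH_CMD is any command that prints the current job status on stdout.
# Returns 0 on Succeeded, 1 on Failed, 2 if the try budget runs out.
# (poll_job and its return codes are hypothetical helpers, not API features.)
poll_job() {
  fetch_cmd=$1
  max_tries=${2:-60}
  delay=${3:-5}
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    job_status=$($fetch_cmd)
    case "$job_status" in
      Succeeded) return 0 ;;
      Failed)    return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 2
}

# Status fetcher for the job created in Recipe 1 (reads BASE, TOKEN, JOB_ID).
fetch_status() {
  curl -sS "$BASE/jobs/$JOB_ID" -H "Authorization: Bearer $TOKEN" | jq -r .status
}
```

Call it as: poll_job fetch_status 120 5 || exit 1. Passing the fetcher as an argument keeps the loop independent of curl, so you can dry-run it with a stub command.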

Recipe 2: Get just the transactions (CSV or JSON)

# Flat JSON list (header / OCR-noise rows filtered out by default).
curl -sS "$BASE/jobs/$JOB_ID/transactions" \
  -H "Authorization: Bearer $TOKEN"

# CSV download.
curl -sS "$BASE/jobs/$JOB_ID/transactions?format=csv" \
  -H "Authorization: Bearer $TOKEN" -o transactions.csv

# Include raw rows (headers, footers, OCR noise) for debugging.
curl -sS "$BASE/jobs/$JOB_ID/transactions?includeRaw=true" \
  -H "Authorization: Bearer $TOKEN"

The default limit is 1000 rows and the maximum is 10000. There is no offset cursor today, so you cannot page through results: if a statement has more than 1000 rows, re-run the request with a larger limit. The agent endpoint is optimized for one-call retrieval per job.
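To raise the limit without exceeding the documented maximum, you can build the URL with a small helper. This is a sketch under two assumptions: that the limit is passed as a limit query parameter, and that it can be combined with format=csv. The helper name transactions_url is hypothetical.

```shell
# transactions_url BASE JOB_ID [LIMIT] [FORMAT]
# Prints the transactions URL with LIMIT capped at the documented max of 10000.
# ASSUMPTION: the limit travels as a `limit` query parameter and may be
# combined with `format`; neither is confirmed by the text above.
transactions_url() {
  base=$1
  job_id=$2
  limit=${3:-1000}
  format=${4:-}
  if [ "$limit" -gt 10000 ]; then
    limit=10000
  fi
  url="$base/jobs/$job_id/transactions?limit=$limit"
  if [ -n "$format" ]; then
    url="$url&format=$format"
  fi
  printf '%s\n' "$url"
}
```

For example: curl -sS "$(transactions_url "$BASE" "$JOB_ID" 5000 csv)" -H "Authorization: Bearer $TOKEN" -o transactions.csv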

Recipe 3: Convert an invoice or receipt

# Same upload step as recipe 1 (steps 1-2). Then create with jobType=Invoice.
curl -sS -X POST "$BASE/jobs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"jobType\":\"Invoice\",\"documents\":[$DOC_ID]}"

After the job succeeds, GET /jobs/$JOB_ID/data returns the InvoiceExport shape: vendor, invoice number, dates, totals, line items.

Recipe 4: Convert several PDFs in one job

Pass an array of document IDs. DocuClipper processes them as a single batch:

curl -sS -X POST "$BASE/jobs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"jobType\":\"ExtractData\",\"documents\":[$DOC1,$DOC2,$DOC3]}"

The job's data payload returns one block per document, keyed by document ID. Use this when you have a multi-statement archive from one client and want a single result file.
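When the number of documents varies, it is easier to build the request body from a list of IDs than to hand-write the array. The sketch below assumes document IDs are plain integers, which is how Recipe 1 interpolates them into the JSON. The helper name build_job_payload is hypothetical.

```shell
# build_job_payload JOB_TYPE DOC_ID...
# Prints a JSON body for POST /jobs with all IDs in the documents array.
# ASSUMPTION: document IDs are unsigned integers; anything else is rejected.
build_job_payload() {
  job_type=$1
  shift
  ids=""
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*)
        echo "not a numeric document ID: $id" >&2
        return 1
        ;;
    esac
    ids="${ids:+$ids,}$id"   # comma-join without a leading comma
  done
  if [ -z "$ids" ]; then
    echo "no document IDs given" >&2
    return 1
  fi
  printf '{"jobType":"%s","documents":[%s]}\n' "$job_type" "$ids"
}
```

For example: curl -sS -X POST "$BASE/jobs" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "$(build_job_payload ExtractData "$DOC1" "$DOC2" "$DOC3")"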

Recipe 5: Check who you are and your remaining quota

curl -sS "$BASE/whoami" -H "Authorization: Bearer $TOKEN"

Returns your contract ID, plan, scopes, and (if your plan uses agent billing) pagesUsed vs pagesFree. Hit this first to confirm the token is valid and you have headroom before queuing work.

Common mistakes

Fetching /data before the job is Succeeded returns 409. Always poll /jobs/:id first.
Setting jobType to Form, Receipt, or anything else returns 400. The agent API supports ExtractData and Invoice only; use the web UI for tax forms and other document types.
Re-using the presigned upload URL. Each URL is single-use. If the PUT fails, request a new URL.
Forgetting Content-Type: application/pdf on the PUT. S3 stores the wrong MIME type and downstream OCR can break.
Putting the token in a URL or in logs. Use the Authorization header, never a query string. Avoid curl -v on requests with a bearer token (the verbose log prints the header).
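Two of the mistakes above can be caught in a script before they cost you a debugging session. The following sketch validates the job type locally and turns the status codes named in this list into readable hints. The helper names check_job_type and explain_status are hypothetical, and a 400 can have causes other than the job type.

```shell
# check_job_type JOB_TYPE
# Succeeds only for the two job types the agent API documents.
check_job_type() {
  case "$1" in
    ExtractData|Invoice) return 0 ;;
    *)
      echo "unsupported jobType '$1' (the API would answer 400)" >&2
      return 1
      ;;
  esac
}

# explain_status HTTP_CODE
# Prints a one-line hint for the status codes mentioned in this article.
explain_status() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "bad request: one documented cause is a jobType other than ExtractData or Invoice" ;;
    409) echo "job not finished: poll /jobs/:id until Succeeded, then retry /data" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Usage sketch (needs network access, so it is left commented out):
# CODE=$(curl -sS -o result.json -w '%{http_code}' \
#   "$BASE/jobs/$JOB_ID/data" -H "Authorization: Bearer $TOKEN")
# explain_status "$CODE"
```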

Webhooks instead of polling

For production, replace the polling loop with a webhook subscription so DocuClipper pushes you a job.succeeded event. See Webhooks Overview for the subscription endpoint and event shape.