Custom Extraction

Extract Custom Fields from Any Document

Describe what you need in plain English (or draw fields on a sample), save a template, and run it on every similar document.

If your document is a bank or credit card statement, invoice, receipt, check image, brokerage statement, or W-2/1099 tax form, use the dedicated extractor for that document type instead. Those flows are tuned per document type and produce more accurate results than the custom-template flow.

Document type                    Where to go
Bank or credit card statement    Add Documents → Bank/CC Statements (PDF)
Bills and receipts you pay       Add Documents → Bills & receipts (you pay)
Sales invoices you send          Add Documents → Sales invoices (you send)
Check images                     Add Documents → Check Images
Brokerage statements             Add Documents → Brokerage Statements (PDF)
W-2, 1099, paystubs              Add Documents → Forms (W-2, 1099, Paystubs)

For everything else (rent rolls, trust statements, leases, ACORD forms, packing slips, custom client memos, anything we don't have a dedicated extractor for) use the custom-document flow described below.

Two ways to extract

The custom-document flow has two modes. Pick whichever matches your document.

AI prompt mode (recommended). Describe in plain English what you want and DocuClipper extracts it. Works on documents whose layout varies between samples (different vendors, different lengths, scanned or text PDFs).

Rectangle mode. Draw a box on a sample page. Every future document in the same fixed layout gets the same box applied. Best when your documents are produced by one system and the field always lands in the same spot (e.g. a monthly statement from one provider).

If you're unsure which to use, start with AI prompt mode.

Opening the flow

  1. Click Add Documents in the left sidebar.
  2. Choose Other Documents.
  3. Upload one sample PDF in the layout you want to extract from.

AI prompt mode

For each piece of data you want to extract:

  1. Click Add field.
  2. Choose AI prompt as the field type.
  3. Write what you want in plain English. Be specific.
  4. Pick Single value (one piece of data) or Table (a repeating block of rows with named columns).
  5. DocuClipper runs the prompt on the sample and shows the result. Edit the prompt until it's right.

Writing good prompts

  • Be specific about the field. "The invoice number" beats "the number at the top." "The total amount due, including tax" beats "the total."
  • Name the document context if the doc is unusual: "On this rent roll, the unit number column."
  • For tables, list the columns you want by name. "A table of line items with columns: description, quantity, unit price, total."
  • For values that may not exist on every doc, tell the model: "If there's no PO number, return an empty value."
  • Iterate. If the first prompt returns a wrong value, edit it and re-run. The preview is live.

When AI mode struggles

AI mode reads the document text. On scanned PDFs we run OCR first (Google Vision), so the model sees a transcription, not the image. If extraction is wrong on a scan, the issue is usually in the OCR layer, not the prompt. Two things help:

  • Re-export the PDF at higher resolution before uploading.
  • Phrase the prompt around content rather than position. "The amount labeled Total Due" works better than "the number in the top-right corner."

Rectangle mode

For each piece of data you want to extract:

  1. Click Add field.
  2. Draw a rectangle around the data on the page.
  3. Pick a field type:
    • Text — a single value (name, ID, free text)
    • Number — numeric value, OCR cleanup applied
    • Date — date value, OCR cleanup applied
    • Currency — money amount, OCR cleanup applied
    • Table — a repeating block of rows; define column names and types
  4. The preview on the right shows what's in the box. Adjust the rectangle if needed.

Rectangle mode applies the same coordinates to every page of every document you upload with this template. If your future documents won't have fields in the exact same position, use AI prompt mode instead.
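
To see why fixed coordinates are fragile, here is a minimal sketch of coordinate-based selection. It is an illustration only, not DocuClipper's implementation: the Word type, the page units, and the words_in_box function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word, in page units (hypothetical)
    y: float  # top edge of the word, in page units (hypothetical)


def words_in_box(words, left, top, right, bottom):
    """Return the text of every word whose top-left corner falls inside the box."""
    picked = [w.text for w in words if left <= w.x <= right and top <= w.y <= bottom]
    return " ".join(picked)


# The same box is reused for every document, as in rectangle mode.
BOX = (400, 50, 560, 70)

# Layout the template was drawn on: the total sits inside the box.
sample = [Word("Total:", 410, 55), Word("1,250.00", 470, 55)]
# A later document where the same line moved down the page.
shifted = [Word("Total:", 410, 95), Word("1,250.00", 470, 95)]

print(words_in_box(sample, *BOX))   # Total: 1,250.00
print(words_in_box(shifted, *BOX))  # prints an empty line
```

The second call finds nothing because the value is no longer under the box, which is the situation AI prompt mode is meant to handle.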

Saving a template

Once the preview looks right, click Save template. Give it a name you'll recognize later (e.g. "ClientCo Monthly Trust Statement"). DocuClipper remembers every field, type, and prompt.

Running on a batch

From Add Documents → Other Documents, pick your saved template under Use a saved template and upload one or many PDFs. Every document is processed against the same field definitions and the results land in a single table you can export to Excel or CSV.

Each template is meant to cover one document layout, so a single template should handle every document in that layout. If a subset of your documents looks different, save a second template for them.
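
One way to spot documents that may need a second template is to scan the exported CSV for rows with empty fields. The sketch below assumes one row per document and one column per single-value field; the column names (file_name, property_name, and so on) are hypothetical, so adjust them to match your actual export.

```python
import csv
import io

# Stand-in for the contents of an exported CSV file; column names are hypothetical.
EXPORT = """file_name,property_name,report_date,total_monthly_rent
a.pdf,Maple Court,2024-01-31,18450.00
b.pdf,Oak Terrace,2024-01-31,
"""


def rows_missing(csv_text, required):
    """Return (file_name, missing_columns) for each row with an empty required field."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = [c for c in required if not (row.get(c) or "").strip()]
        if missing:
            problems.append((row["file_name"], missing))
    return problems


print(rows_missing(EXPORT, ["property_name", "total_monthly_rent"]))
# [('b.pdf', ['total_monthly_rent'])]
```

Files that show up in this list repeatedly are candidates for their own template or a revised prompt.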

Updating a template

Open the template from Account → Templates, upload a new sample, and edit fields or prompts. The template ID stays the same so any saved automations keep working.

Worked examples

These are real templates customers run on DocuClipper. Use them as a starting point.

Rent rolls

Rent rolls vary widely between property-management systems (Yardi, AppFolio, Buildium, RentManager), so AI prompt mode is the only practical option.

Single-value fields:

  • Property name: "The property name shown at the top of the report."
  • Report date: "The 'as of' date for this rent roll. Return in YYYY-MM-DD."
  • Total monthly rent: "The grand total monthly rent across all units."

Table field:

  • Prompt: "A table of units with columns: unit number, tenant name, lease start, lease end, monthly rent, security deposit, status (occupied/vacant/notice). Include every unit listed, including vacant ones."
  • Page range: All pages.
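
After export, a quick consistency check is to compare the sum of the unit rents with the reported grand total. This is a sketch under assumptions: the row keys mirror the column names in the prompt above, and the sample values and the parse_money helper are invented for illustration.

```python
from decimal import Decimal


def parse_money(text):
    """Convert a string like '$1,250.00' to a Decimal; empty text counts as zero."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else Decimal("0")


def rent_total_matches(unit_rows, reported_total):
    """Return (matches, table_total) for the extracted unit table."""
    table_total = sum((parse_money(r["monthly rent"]) for r in unit_rows), Decimal("0"))
    return table_total == parse_money(reported_total), table_total


units = [
    {"unit number": "101", "status": "occupied", "monthly rent": "$1,250.00"},
    {"unit number": "102", "status": "vacant", "monthly rent": ""},
    {"unit number": "103", "status": "occupied", "monthly rent": "1,400.00"},
]
print(rent_total_matches(units, "$2,650.00"))  # (True, Decimal('2650.00'))
```

A mismatch usually means a unit row was dropped or a value was misread. Some reports list market rent for vacant units, so decide whether those rows belong in the sum for your report format.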

Trust account statements

For attorneys, real-estate brokers, and title companies. The layout is consistent within one provider, so rectangle mode works if all your statements come from one provider; use AI mode if you handle several.

Single-value fields:

  • Trust account number: AI prompt "The trust or escrow account number, usually shown near the top." Or rectangle around the account-number block.
  • Statement period: AI prompt "The statement period date range. Return as start_date,end_date in YYYY-MM-DD."
  • Beginning balance and ending balance: AI prompt "The beginning trust balance for this period." / "The ending trust balance."

Table field:

  • Prompt: "A table of disbursements and deposits with columns: date, matter or client name, check number, payee or payor, amount, running balance."
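
Because the table includes a running balance, the extracted rows can be checked against each other. The sketch below assumes deposits are positive and disbursements negative in the amount column, which may not match how your provider prints them; the function name and sample rows are hypothetical.

```python
from decimal import Decimal


def first_balance_break(beginning, rows):
    """Return the index of the first row whose running balance does not equal
    the previous balance plus the row amount, or None if every row agrees."""
    balance = Decimal(beginning)
    for i, row in enumerate(rows):
        balance += Decimal(row["amount"])
        if balance != Decimal(row["running balance"]):
            return i
    return None


rows = [
    {"date": "2024-03-01", "amount": "5000.00", "running balance": "15000.00"},
    {"date": "2024-03-04", "amount": "-1200.00", "running balance": "13800.00"},
    {"date": "2024-03-09", "amount": "-300.00", "running balance": "13600.00"},
]
print(first_balance_break("10000.00", rows))  # 2
```

Here the third row is flagged because 13800.00 minus 300.00 is 13500.00, not 13600.00, which points to a misread amount or balance on that line.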

Packing slips and delivery receipts

Single-value fields:

  • Order number, PO number, ship date, carrier, tracking number: write one AI prompt for each, naming the field directly.

Table field:

  • Prompt: "A line-item table with columns: SKU, description, ordered quantity, shipped quantity, backordered quantity. If no backorder column exists, leave it empty."

ACORD insurance forms (e.g. ACORD 25 Certificate of Liability)

ACORD forms have stable layouts standardized across the insurance industry, so rectangle mode is the right fit. Draw boxes around:

  • The producer block (insurance broker name and address)
  • The insured block
  • Each policy row (insurer, policy number, eff/exp dates, limits)

Save the template once per ACORD form number; ACORD 25, ACORD 27, and ACORD 28 each need their own template because the field positions differ.

Lease agreements

Lease language varies, so AI mode is best.

  • Tenant name(s): "The full legal name(s) of the tenant(s) on this lease, comma-separated if more than one."
  • Property address: "The full address of the leased property."
  • Lease term: "The lease start and end date as start,end in YYYY-MM-DD."
  • Base rent: "The base monthly rent amount, as a number."
  • Security deposit: "The security deposit amount, as a number. If none, return 0."
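
Asking for a fixed output format makes the exported values easy to process downstream. As a sketch, the lease term returned as start,end in YYYY-MM-DD can be split and validated with the Python standard library; parse_term is a hypothetical helper, and the month count is a rough figure based on calendar months touched.

```python
from datetime import date


def parse_term(value):
    """Split 'YYYY-MM-DD,YYYY-MM-DD' into (start, end) dates."""
    start_text, end_text = (part.strip() for part in value.split(","))
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)
    if end < start:
        raise ValueError(f"lease ends before it starts: {value!r}")
    return start, end


start, end = parse_term("2024-06-01,2025-05-31")
# Count calendar months from the start month through the end month, inclusive.
months = (end.year - start.year) * 12 + (end.month - start.month) + 1
print(start, end, months)  # 2024-06-01 2025-05-31 12
```

A value that does not parse raises ValueError, which is a useful signal that the prompt needs tightening for that document.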

Custom client memos and one-off PDFs

If you receive the same kind of summary memo each month from one client (project status, billing summary, etc.), AI prompts let you pull the half-dozen numbers you actually care about and skip the rest. Save one template per memo format.

Tips

  • Use AI prompts for any field whose position varies between documents.
  • Use rectangles for fields that are always at the same coordinates.
  • Tables in AI mode work best when you list the columns by name in the prompt.
  • For tables that span pages, set the table field's page range to All pages.
