Converting Forms (W-2, 1099, Paystubs, Tax Forms)

Extract structured data from W-2s, 1099s, paystubs, and other US tax forms. Covers supported form types, the fields DocuClipper extracts per form, accuracy expectations, and common gotchas.

DocuClipper extracts data from US tax and payroll forms (W-2s, 1099s, paystubs, and 1040-family returns) into structured rows you can review, edit, and export to Excel, CSV, or QuickBooks. This article covers exactly which forms are recognized, which fields come back per form, and what to expect on edge cases like scans and handwritten amounts.

Where to start a Forms job

  • Sidebar: click Add Documents, then Forms (W-2, 1099, Paystubs, Tax Forms) to open the upload page.
  • Inside a project: open the Forms tab of any project. If there are no forms yet, click Add Forms.
  • Direct link: /extract/form opens the upload page directly.

Step 1: Upload

  1. Drag and drop the PDFs onto the Upload Forms page, or click to pick files. You can batch-upload multiple forms in one job.
  2. Optionally set a Job name and Tags.
  3. If the PDF has extra pages (cover sheets, transmittal forms, etc.), use the built-in viewer to exclude or rotate them.
  4. Click Import.

Step 2: Review and edit

Each recognized form becomes one row of extracted data.

  • Use the Documents dropdown to switch between uploaded PDFs.
  • Click into any cell to edit the extracted value. Changes save automatically.
  • The left panel shows the source PDF; the right shows the extracted fields. Click a field to highlight where DocuClipper pulled it from.
  • Add more forms to the same job by clicking Add Forms and repeating Step 1.

Step 3: Download

  1. Select which forms to include (all by default).
  2. Pick an Output format: Excel, CSV, or QBO. Multi-form jobs export as a single spreadsheet with one row per form and one column per extracted field.
  3. Click Download.

For format-specific details see Supported output formats.

Which forms are supported

DocuClipper classifies an upload by reading the IRS OMB control number and the form's title line off the page. Both signals come from the actual print, so DocuClipper recognizes the form regardless of the year, payer, or payroll-software variant, as long as those two markers are visible.
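
As a rough illustration of two-signal classification, the sketch below checks a page's text for an OMB control number first and falls back to the title line. It is not DocuClipper's implementation: the FORM_MARKERS table, the regular expressions, and the classify_page function are hypothetical, and only the W-2 OMB number comes from this article (the 1099 numbers are assumptions to verify against the printed forms).

```python
import re

# Hypothetical marker table. Only the W-2 OMB number appears in this article;
# the 1099 numbers are assumptions and should be checked against the printed forms.
FORM_MARKERS = {
    "W-2": {"omb": "1545-0008", "title": r"wage\s+and\s+tax\s+statement"},
    "1099-NEC": {"omb": "1545-0116", "title": r"nonemployee\s+compensation"},
    "1099-MISC": {"omb": "1545-0115", "title": r"miscellaneous\s+information"},
}

# Matches text such as "OMB No. 1545-0008" and captures the control number.
OMB_PATTERN = re.compile(r"OMB\s+No\.?\s*(\d{4}-\d{4})", re.IGNORECASE)


def classify_page(text):
    """Return a form type from the OMB number, then the title line, else 'unknown'."""
    omb_match = OMB_PATTERN.search(text)
    if omb_match:
        for form, markers in FORM_MARKERS.items():
            if markers["omb"] == omb_match.group(1):
                return form
    # No usable OMB number: fall back to the title line.
    for form, markers in FORM_MARKERS.items():
        if re.search(markers["title"], text, re.IGNORECASE):
            return form
    return "unknown"
```

Because neither signal depends on the tax year or the payer, the same lookup would apply to any variant of a form that prints at least one of the two markers legibly.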

The classifier currently recognizes these US forms:

Wage and information returns

  • W-2 (Wage and Tax Statement, OMB 1545-0008)
  • 1099-NEC (Nonemployee Compensation)
  • 1099-MISC (Miscellaneous Information)
  • 1099-DIV (Dividends and Distributions)
  • 1099-INT (Interest Income)
  • 1099-B (Proceeds From Broker and Barter Exchange Transactions)
  • 1099-G (Certain Government Payments)
  • 1099-R (Distributions From Pensions, Annuities, Retirement Plans)
  • 1099-Q (Payments From Qualified Education Programs)
  • 1099-K (Payment Card and Third Party Network Transactions)
  • 1099-C (Cancellation of Debt)
  • 1099-S (Proceeds From Real Estate Transactions)
  • 1098 (Mortgage Interest Statement)
  • 1098-T (Tuition Statement)
  • 5498 (IRA Contribution Information)

Income tax returns

  • 1040 (US Individual Income Tax Return)
  • 1041 (Income Tax Return for Estates and Trusts)
  • 1065 (Return of Partnership Income)
  • 1120 (US Corporation Income Tax Return)
  • 1120-S (Income Tax Return for an S Corporation)
  • Schedule K-1 (Form 1065)
  • Schedule K-1 (Form 1120-S)

Paystubs

Paystubs do not have an OMB number or a single canonical layout, so DocuClipper handles them through the generic key-value extractor rather than a dedicated classifier. In practice that works well for paystubs from major payroll providers (Gusto, ADP, Paychex, Workday, QuickBooks Payroll) because their layouts label fields cleanly: gross pay, net pay, federal withholding, state withholding, Social Security, Medicare, year-to-date totals. Boutique or hand-built paystubs may need manual review of the extracted row.
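
To show why cleanly labeled layouts extract well, here is a minimal sketch of a label-then-amount lookup over paystub text. The label list comes from the paragraph above; the regular expression, the extract_paystub_fields name, and the assumption that each amount directly follows its label are illustrative only and do not describe DocuClipper's extractor.

```python
import re

# Labels named in this article; everything else here is an assumption.
PAYSTUB_LABELS = [
    "gross pay",
    "net pay",
    "federal withholding",
    "state withholding",
    "social security",
    "medicare",
]

# An optional dollar sign followed by an amount with two decimals, e.g. $2,500.00
AMOUNT = r"\$?\s*(-?[\d,]+\.\d{2})"


def extract_paystub_fields(text):
    """Return {label: amount string} for each label directly followed by an amount."""
    fields = {}
    for label in PAYSTUB_LABELS:
        pattern = re.escape(label) + r"\s*:?\s*" + AMOUNT
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            fields[label] = match.group(1).replace(",", "")
    return fields
```

A layout that separates the label from its amount (for example, a header row above a column of numbers) would not match this pattern, which is the kind of paystub that tends to need manual review.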

If your form is not in the list above (a state-level form, a country-specific form, or a niche tax document), use the custom-extraction flow to draw your own fields and save a reusable template.

Fields extracted per form

For the high-volume forms, DocuClipper has a canonical field list and validates that each one came back populated. Below is the per-form field set; numbered references match the box numbers as printed on the IRS form.

W-2 (Wage and Tax Statement)

  • Employer EIN (Box b)
  • Employer name and address (Box c)
  • Employee SSN (Box a)
  • Employee name and address (Boxes e and f)
  • Wages, tips, other compensation (Box 1)
  • Federal income tax withheld (Box 2)
  • Social Security wages (Box 3)
  • Social Security tax withheld (Box 4)
  • Medicare wages and tips (Box 5)
  • Medicare tax withheld (Box 6)
  • State wages, tips, etc. (Box 16)
  • State income tax (Box 17)

State boxes 16 and 17 are correctly left empty on W-2s from states that do not tax wage income (FL, TX, NV, WA, SD, WY, AK, NH, TN). An empty value in those cases is not an extraction miss.
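
If you post-process exported W-2 rows yourself, a populated-field check along these lines can separate genuine misses from expected blanks. This is a sketch under assumptions: the column names and the missing_w2_fields helper are hypothetical, and the state argument is whatever two-letter state you associate with the form.

```python
# States listed in this article as leaving Boxes 16 and 17 empty.
NO_WAGE_TAX_STATES = {"FL", "TX", "NV", "WA", "SD", "WY", "AK", "NH", "TN"}

# Hypothetical column names; match them to the headers in your own export.
REQUIRED_FIELDS = [
    "employer_ein",
    "employee_ssn",
    "box_1_wages",
    "box_2_federal_tax",
    "box_3_ss_wages",
    "box_4_ss_tax",
    "box_5_medicare_wages",
    "box_6_medicare_tax",
]
STATE_FIELDS = ["box_16_state_wages", "box_17_state_income_tax"]


def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or not str(value).strip()


def missing_w2_fields(row, state):
    """Return the fields that should be populated for this W-2 but are blank."""
    fields = list(REQUIRED_FIELDS)
    if state.upper() not in NO_WAGE_TAX_STATES:
        fields += STATE_FIELDS  # state boxes are only expected where wages are taxed
    return [name for name in fields if is_blank(row.get(name))]
```

An empty list means nothing looks missing; for a Texas W-2, blank state boxes are not reported, while for a California W-2 they are.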

1099-NEC (Nonemployee Compensation)

  • Payer name and address
  • Payer TIN
  • Recipient TIN
  • Recipient name and address
  • Nonemployee compensation (Box 1)
  • Federal income tax withheld (Box 4)
  • State tax withheld (Box 5)

1099-MISC (Miscellaneous Information)

  • Payer name and address, Payer TIN
  • Recipient name and address, Recipient TIN
  • Rents (Box 1)
  • Royalties (Box 2)
  • Other income (Box 3)
  • Federal income tax withheld (Box 4)

1099-DIV (Dividends and Distributions)

  • Payer name and address, Payer TIN
  • Recipient name and address, Recipient TIN
  • Total ordinary dividends (Box 1a)
  • Qualified dividends (Box 1b)
  • Total capital gain distributions (Box 2a)
  • Federal income tax withheld (Box 4)

1099-INT (Interest Income)

  • Payer name and address, Payer TIN
  • Recipient name and address, Recipient TIN
  • Interest income (Box 1)
  • Early withdrawal penalty (Box 2)
  • Federal income tax withheld (Box 4)

Other 1099s, 1098s, 1040, K-1s

For 1099-B/G/R/Q/K/C/S, 1098/1098-T, 5498, 1040, 1041, 1065, 1120/1120-S, and the K-1 schedules, DocuClipper recognizes the form type and extracts the labeled fields it finds via the generic key-value pass. There isn't a curated per-box checklist for these yet, so the exact column set you see in the export depends on which fields the source PDF labeled cleanly.

Accuracy expectations

  • Native (digital) PDFs straight from a payroll or tax-software provider extract with the highest accuracy. The text layer is exact; the only failure modes are unusual layouts.
  • Clean scans at 300 DPI or higher of an original printed form extract well. Most labeled boxes come back populated.
  • Phone photos and low-DPI scans are the largest source of misses. Fields affected by handwritten amounts, glare, or skew are the most likely to be dropped.
  • Forms missing the OMB number (employer-generated W-2 lookalikes, broker-customized 1099-Bs) still classify correctly when the title line scans cleanly. If both the OMB number and the title line are missing or garbled, DocuClipper falls back to a density-based classifier, which counts how many of each form's expected labels it found. The density classifier requires at least three matched labels and a clear lead over the other form types; otherwise the form is left as unknown.
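
The density fallback described in the last bullet can be pictured as a label count with two thresholds. The sketch below is an illustration, not DocuClipper's code: the label sets are abbreviated and hypothetical, the minimum of three matches comes from this article, and MIN_LEAD = 2 is an assumption because the article does not quantify "a clear lead".

```python
# Abbreviated, hypothetical label sets for two form types.
EXPECTED_LABELS = {
    "W-2": [
        "wages, tips, other compensation",
        "social security wages",
        "medicare wages and tips",
        "federal income tax withheld",
    ],
    "1099-INT": [
        "interest income",
        "early withdrawal penalty",
        "tax-exempt interest",
        "federal income tax withheld",
    ],
}

MIN_MATCHES = 3  # stated in this article
MIN_LEAD = 2     # assumption: the required margin is not documented


def classify_by_density(text):
    """Pick the form whose expected labels appear most often, or 'unknown'."""
    lowered = text.lower()
    counts = {
        form: sum(label in lowered for label in labels)
        for form, labels in EXPECTED_LABELS.items()
    }
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (best_form, best_count), (_, runner_up_count) = ranked[0], ranked[1]
    if best_count >= MIN_MATCHES and best_count - runner_up_count >= MIN_LEAD:
        return best_form
    return "unknown"
```

Labels shared across forms, such as "federal income tax withheld", raise every candidate's count equally, which is why a margin over the runner-up matters as much as the raw count.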

Multi-form PDFs

If a single PDF contains several forms (a stack of 1099s for the same payer, or a packaged tax return with W-2s plus 1099s plus a 1040), DocuClipper splits and classifies each form individually. Each one becomes its own row in the export. The split happens automatically on page boundaries.

The classifier is per-page, so a form that spans two pages (most 1040 schedules, some K-1s) is identified by whichever page contains the title and OMB markers, then merged with adjacent pages of the same form.
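
One way to picture the merge step is as grouping a list of per-page labels. The sketch below makes a simplifying assumption that the page carrying the title and OMB markers comes first and that unmarked pages continue the form before them; group_pages and its input shape are hypothetical.

```python
def group_pages(page_labels):
    """Group consecutive pages into forms.

    page_labels[i] is the form type detected on page i + 1, or None when the
    page carries no title or OMB markers (assumed to be a continuation page).
    """
    groups = []
    for page_number, label in enumerate(page_labels, start=1):
        if label is None and groups:
            # Unmarked page: attach it to the form that precedes it.
            groups[-1]["pages"].append(page_number)
        else:
            # Marked page (or an unmarked first page): start a new form.
            groups.append({"form": label or "unknown", "pages": [page_number]})
    return groups
```

Under this assumption, a stack of single-page 1099s yields one group per page, while a 1040 followed by an unmarked schedule page yields a single two-page group.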

Output formats

  • Excel (.xlsx) is the default. One sheet, one row per form, one column per field. Multi-form jobs are easy to filter and pivot in Excel.
  • CSV has the same shape as Excel without formatting.
  • QBO is supported for forms that map cleanly into a QuickBooks transaction shape (mainly 1099s as bills/expenses against the payer). For W-2s and 1040-family returns, prefer Excel/CSV; QuickBooks does not have a native import shape for those.
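
Because the CSV export has one row per form and one column per field, it can be summarized with a few lines of standard-library Python. The sketch below totals one amount column per form type; the column names form_type and federal_income_tax_withheld and the sample rows are made up for illustration, so substitute the headers from your own export.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def total_by_form_type(csv_file, amount_column):
    """Sum one amount column per form type, skipping blank cells."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(csv_file):
        raw = (row.get(amount_column) or "").replace(",", "").replace("$", "").strip()
        if raw:
            totals[row["form_type"]] += Decimal(raw)
    return dict(totals)


# Made-up sample standing in for an exported file.
sample = io.StringIO(
    "form_type,federal_income_tax_withheld\n"
    'W-2,"8,200.00"\n'
    "1099-INT,\n"
    "W-2,1500.50\n"
)
totals = total_by_form_type(sample, "federal_income_tax_withheld")
```

Blank cells are skipped rather than treated as zero, since an empty field in a mixed-form export can simply mean the form does not have that box.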

Common gotchas

  • Handwritten amounts. OCR for handwriting is hit-or-miss. If a customer or contractor handwrote box values on a printed form, expect to manually correct those cells. Typed values (laser-printed) extract reliably.
  • Black-and-white photocopies of carbon-copy forms lose the red-print boxes that delimit fields. The labels still extract, but boxes can bleed into each other; review the row before exporting.
  • Multi-employer or multi-state W-2s with multiple Box 16/17 rows: DocuClipper extracts the first state row. For multi-state W-2s, manually add the additional rows after extraction or split the PDF first.
  • 1099-NEC vs 1099-MISC vs 1099-R confusion can happen on poorly scanned forms where the OMB number is unreadable. The classifier is conservative and marks these as unknown rather than guessing. Re-scan at 300 DPI or higher to fix.
  • State and country-specific forms (state W-2 equivalents, Canadian T4s, UK P60s) are not in the classifier. Use custom extraction instead.
  • W-9 forms are not extracted. W-9 is a request form, not a return form, so DocuClipper does not include it in the typed-form taxonomy.

For issues, email support@docuclipper.com.