Understanding Bank Statement Fraud Detection


Authenticity score

Each document receives a score from 0 to 100 that summarizes the results of multiple checks and the confidence of each finding.

Score range   Interpretation
90–100        Document appears authentic
70–89         Minor concerns detected
50–69         Moderate risk detected
0–49          High risk of fraud detected

Important: The score is not a final decision. Always review the individual signals and consider the context.
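For integrations that route documents by band, the score table can be expressed as a simple lookup. This is a minimal Python sketch that assumes the score arrives as a number between 0 and 100 and that each band includes both endpoints shown. `interpret_score` is a hypothetical helper, not part of any documented API.

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 authenticity score to its interpretation band.

    Hypothetical helper; the band boundaries are copied from the score table.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Document appears authentic"
    if score >= 70:
        return "Minor concerns detected"
    if score >= 50:
        return "Moderate risk detected"
    return "High risk of fraud detected"
```

The band is only a starting point for triage; the individual signals still need review.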

What we analyze

PDF metadata analysis

  • Creation vs modification date gaps.
  • Producer applications that are unusual for bank statements (e.g., image editors).
  • Out-of-pattern PDF versions.
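The producer, creator, and version checks can be sketched as string tests on values already extracted from the file. This minimal Python version assumes the Producer and Creator strings and the first bytes of the file are available. The keyword list, accepted versions, and placeholder creators are illustrative guesses, not the rules the product uses, and `check_metadata` is a hypothetical name.

```python
import re
from typing import List, Optional

# Illustrative values only; the product's real rules are not documented here.
SUSPICIOUS_PRODUCER_KEYWORDS = ("photoshop", "gimp", "word", "preview")
COMMON_PDF_VERSIONS = {"1.4", "1.5", "1.6", "1.7", "2.0"}
PLACEHOLDER_CREATORS = {"", "unknown", "untitled"}


def check_metadata(producer: Optional[str], creator: Optional[str], header: bytes) -> List[str]:
    """Return signal codes raised by the Producer, Creator, and PDF header version."""
    signals = []
    # Case-insensitive substring match against general-purpose editors.
    if producer and any(k in producer.lower() for k in SUSPICIOUS_PRODUCER_KEYWORDS):
        signals.append("SUSPICIOUS_PDF_PRODUCER")
    # A missing Creator is treated the same as an empty or placeholder one.
    if creator is None or creator.strip().lower() in PLACEHOLDER_CREATORS:
        signals.append("SUSPICIOUS_CREATOR")
    # PDF files begin with a header such as b"%PDF-1.7".
    match = re.match(rb"%PDF-(\d\.\d)", header)
    if match is None or match.group(1).decode("ascii") not in COMMON_PDF_VERSIONS:
        signals.append("SUSPICIOUS_PDF_VERSION")
    return signals
```

From PDF 1.4 onward, the document catalog can declare a newer version than the file header, so a fuller check would read both.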

Structural analysis

  • Column alignment consistency across a page.
  • Field height consistency (sudden variations can indicate edits).
  • Font usage anomalies within a line or section.
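The column-alignment check can be sketched as a comparison of each row's x-coordinate against the column's median on the same page. This assumes text positions (in PDF points) have already been extracted. The 1.5-point tolerance and the `misaligned_rows` name are hypothetical.

```python
from statistics import median
from typing import List


def misaligned_rows(x_positions: List[float], tolerance: float = 1.5) -> List[int]:
    """Return indexes of rows whose x-coordinate strays from the column median.

    Hypothetical helper; the 1.5-point default tolerance is an illustrative guess.
    Pass left edges for left-aligned columns (e.g. dates) and right edges for
    right-aligned columns (e.g. amounts).
    """
    if len(x_positions) < 3:
        # Too few rows to establish what normal alignment looks like.
        return []
    # The median is robust: one shifted row barely moves the reference point.
    center = median(x_positions)
    return [i for i, x in enumerate(x_positions) if abs(x - center) > tolerance]
```

Using the median rather than the mean keeps a single edited row from pulling the reference position toward itself.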

Transaction analysis

  • Future-dated transactions.
  • Running balance inconsistencies.
  • Debit/credit math mismatches.
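The running-balance and debit/credit checks can be sketched as a pass down the statement that recomputes each balance, in the spirit of the UNRECONCILED_BALANCE signal in the glossary. This minimal version assumes amounts are parsed into `Decimal` (avoiding binary floating-point rounding) and that debits and credits are non-negative. `unreconciled_lines` is a hypothetical name.

```python
from decimal import Decimal
from typing import List, Tuple

Row = Tuple[Decimal, Decimal, Decimal]  # (debit, credit, printed running balance)


def unreconciled_lines(opening: Decimal, rows: List[Row]) -> List[int]:
    """Return indexes of rows whose printed balance differs from the recomputed one.

    Hypothetical helper; assumes debits and credits are non-negative Decimals.
    """
    breaks = []
    expected = opening
    for i, (debit, credit, printed) in enumerate(rows):
        expected = expected - debit + credit
        if expected != printed:
            breaks.append(i)
            # Resynchronize on the printed value so one bad line is reported once
            # instead of flagging every line after it.
            expected = printed
    return breaks
```

The resynchronization is a design choice: it points at each line where the chain breaks, which matches the review tip to spot-check around the first mismatch.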

Reconciliation check

  • Opening/closing balance reconciliation for the statement period.
  • Flags when totals do not match the reported balances.
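The period-level reconciliation reduces to one identity: opening balance plus net activity should equal the reported closing balance. A minimal sketch, assuming every page of the period is present and amounts are parsed as `Decimal`; `reconciliation_gap` is a hypothetical name.

```python
from decimal import Decimal
from typing import Iterable


def reconciliation_gap(opening: Decimal, closing: Decimal,
                       debits: Iterable[Decimal], credits: Iterable[Decimal]) -> Decimal:
    """Return (opening + net activity) - reported closing; zero means it reconciles.

    Hypothetical helper; assumes all transactions for the period are included.
    """
    net_activity = sum(credits, Decimal("0")) - sum(debits, Decimal("0"))
    return opening + net_activity - closing
```

A nonzero gap corresponds to RECONCILIATION_FAILED. Its sign shows whether the reported closing balance is higher (negative gap) or lower (positive gap) than the transactions support, which helps when checking for missing pages.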

Document fingerprinting

We compute a stable fingerprint from metadata, embedded fonts, and page dimensions. This enables:

  • Matching against known authentic templates.
  • Detecting duplicates or modified versions.
  • Building a library of trusted statement patterns over time.

Fingerprint inputs: metadata (producer, creator, version); font set and embedding status; page size and orientation.
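One way to keep such a fingerprint stable is to normalize the inputs (deduplicate and sort fonts, round page sizes) and hash a canonical serialization. The sketch below assumes SHA-256 over sorted JSON. The product's actual fingerprint method is not documented here, and `fingerprint` is a hypothetical helper.

```python
import hashlib
import json
from typing import List, Tuple


def fingerprint(producer: str, creator: str, pdf_version: str,
                fonts: List[Tuple[str, bool]],
                page_sizes: List[Tuple[float, float]]) -> str:
    """Hash normalized metadata, font set, and page dimensions into a hex digest.

    Hypothetical helper; SHA-256 over canonical JSON is an illustrative choice.
    """
    canonical = {
        "producer": producer.strip(),
        "creator": creator.strip(),
        "version": pdf_version,
        # (font name, is_embedded) pairs; sorting makes discovery order irrelevant.
        "fonts": sorted(set(fonts)),
        # Distinct page sizes rounded to whole points; (width, height) keeps orientation.
        "pages": sorted({(round(w), round(h)) for w, h in page_sizes}),
    }
    # sort_keys and fixed separators yield identical bytes for identical inputs.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Matching against known templates then becomes a set lookup. A fingerprint missing from the approved set would raise DOESNT_MATCH_KNOWN_FINGERPRINTS.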

How the score is calculated

Each signal has a confidence (0–1) and a severity weight. The score starts at 100 and is reduced by weighted, high-confidence findings.

Example signal            Relative weight   Why it matters
RECONCILIATION_FAILED     High              The numbers in the statement do not add up.
SUSPICIOUS_PDF_PRODUCER   Medium–High       Uncommon tools for bank statements can indicate editing.
MISALIGNED_COLUMN         Medium            Layout shifts can suggest copy/paste or overlay edits.
FONT_MISMATCH             Low–Medium        Mixed fonts within a line may indicate tampering.
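The description above does not spell out the exact formula. One plausible reading, sketched below in Python, subtracts weight multiplied by confidence for each finding above a confidence cut-off and clamps the result to the 0 to 100 range. The numeric weights, default weight, and 0.5 cut-off are invented to mirror the relative levels in the table; they are not the product's values.

```python
from dataclasses import dataclass
from typing import List

# Invented numbers mirroring the relative levels in the table; not the product's weights.
WEIGHTS = {
    "RECONCILIATION_FAILED": 40.0,    # High
    "SUSPICIOUS_PDF_PRODUCER": 25.0,  # Medium-High
    "MISALIGNED_COLUMN": 15.0,        # Medium
    "FONT_MISMATCH": 10.0,            # Low-Medium
}
DEFAULT_WEIGHT = 5.0   # hypothetical weight for signals not listed above
MIN_CONFIDENCE = 0.5   # hypothetical cut-off for a "high-confidence" finding


@dataclass
class Signal:
    code: str
    confidence: float  # 0-1


def authenticity_score(signals: List[Signal]) -> int:
    """Start at 100 and subtract weight x confidence for each confident finding."""
    score = 100.0
    for s in signals:
        if s.confidence < MIN_CONFIDENCE:
            continue  # low-confidence findings are still reported but do not lower the score
        score -= WEIGHTS.get(s.code, DEFAULT_WEIGHT) * s.confidence
    return max(0, min(100, round(score)))
```

With these illustrative weights, a fully confident reconciliation failure on its own brings the score to 60, which falls in the "Moderate risk detected" band.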

Best practices for review

  • Review the full signal list, not just the score.
  • Prioritize multiple high-confidence signals on the same page.
  • Consider context. Scans, print-to-PDF, or corporate PDF workflows can cause benign anomalies.
  • Compare against another statement from the same institution when possible.
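The second practice can be applied programmatically by grouping findings by page and surfacing pages with several distinct high-confidence signals first. This sketch assumes each finding may carry a page number, which is only available for some signals. The thresholds, the `Finding` type, and `pages_to_prioritize` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Finding:
    code: str
    confidence: float     # 0-1
    page: Optional[int]   # None when location data is unavailable


def pages_to_prioritize(findings: List[Finding], min_confidence: float = 0.8,
                        min_signals: int = 2) -> List[int]:
    """Return pages with at least `min_signals` distinct high-confidence signal codes.

    Hypothetical helper; both thresholds are illustrative defaults.
    """
    by_page: Dict[int, Set[str]] = defaultdict(set)
    for f in findings:
        if f.page is not None and f.confidence >= min_confidence:
            by_page[f.page].add(f.code)
    return sorted(p for p, codes in by_page.items() if len(codes) >= min_signals)
```

Counting distinct codes, rather than raw findings, keeps a page from being prioritized just because one signal repeats on many lines.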

Fraud signal glossary

RECONCILIATION_FAILED — transactions do not reconcile with reported balances

What we check: Opening balance + net activity = closing balance for the period.

Why it matters: Non-reconciling statements often indicate edits or missing lines.

Legitimate causes: Partial statements, excluded pages, or export filters.

Review tips: Verify period dates, page completeness, and subtotals for each page.

BALANCE_MISMATCH — running balances inconsistent with line items

What we check: Each line’s debit/credit correctly updates the running balance.

Why it matters: Edited amounts or missing transactions break the running balance chain.

Legitimate causes: Mid-cycle adjustments that are summarized elsewhere.

Review tips: Spot-check lines around the first mismatch; compare with bank exports if available.

UNRECONCILED_BALANCE — a per-line math inconsistency

What we check: Expected balance after each line equals the printed balance.

Why it matters: Pinpoints the exact transaction where the chain breaks.

Review tips: Recompute affected lines; verify the sign (debit vs credit) and any fees.

FUTURE_DATE — transaction date in the future

What we check: Any transaction dated after the current date.

Why it matters: Future-dated entries can signal manual editing.

Legitimate causes: Rarely, time zone differences on late-month postings.

Review tips: Cross-check with online banking; confirm posting vs transaction date semantics.
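A minimal version of this check compares each transaction date with the analysis date and allows an optional grace period for the time zone case. The one-day default and the `future_dated` name are assumptions.

```python
from datetime import date, timedelta
from typing import List


def future_dated(transaction_dates: List[date], as_of: date, grace_days: int = 1) -> List[int]:
    """Return indexes of transactions dated after `as_of` plus a grace period.

    Hypothetical helper; the one-day grace period is an assumption meant to
    absorb time zone differences.
    """
    cutoff = as_of + timedelta(days=grace_days)
    return [i for i, d in enumerate(transaction_dates) if d > cutoff]
```

Setting `grace_days=0` gives the strict version of the check.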

MISALIGNED_COLUMN — inconsistent horizontal alignment of the same column on a page

What we check: Position variance of fields like date or amount within the same page.

Why it matters: Copy/paste overlays or manual edits can shift text off the natural grid.

Legitimate causes: Low-quality scans or skew introduced during print/scan.

Review tips: Compare suspect lines to neighbors; look for different kerning or aliasing.

INCONSISTENT_FIELD_HEIGHT — unusual text box height vs neighbors

What we check: Height variance of text boxes in the same column beyond a typical range.

Why it matters: Overlays or pasted text often render at slightly different bounding boxes.

Legitimate causes: Mixed fonts or sub/superscript on certain lines.

Review tips: Inspect the specific box; zoom in to check baseline and character rendering.
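A sketch of the height-variance check, assuming box heights (in points) for one column on one page. Unlike horizontal position, box height scales with font size, so this version uses a relative threshold. The 15% figure and the `height_outliers` name are hypothetical.

```python
from statistics import median
from typing import List


def height_outliers(heights: List[float], max_relative_deviation: float = 0.15) -> List[int]:
    """Return indexes of boxes whose height differs from the column median by > 15%.

    Hypothetical helper; the default threshold is an illustrative guess.
    """
    if len(heights) < 3:
        return []  # not enough neighbors to define a typical height
    typical = median(heights)
    if typical <= 0:
        return []
    return [i for i, h in enumerate(heights)
            if abs(h - typical) / typical > max_relative_deviation]
```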

FONT_MISMATCH — multiple fonts within the same line or section

What we check: Sudden font family changes inside a single logical line.

Why it matters: Indicates manual edits or stitched content.

Legitimate causes: Bank templates that intentionally mix fonts for emphasis.

Review tips: Compare to a known-good statement from the same bank and month.
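Embedded font names usually need normalizing before comparison: subset fonts carry a six-letter uppercase tag followed by `+` (for example `ABCDEF+Helvetica`), and weights or styles are often appended after a hyphen or comma. The sketch below treats bold or italic variants of one family as the same font, so emphasis alone does not trigger the signal. The normalization rules are a rough heuristic, and `font_family` and `lines_with_mixed_fonts` are hypothetical names.

```python
import re
from typing import List

# Subset fonts are named with a six-uppercase-letter tag and "+", e.g. "ABCDEF+Helvetica".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def font_family(base_font: str) -> str:
    """Reduce an embedded font name to a rough family name (hypothetical heuristic)."""
    name = SUBSET_TAG.sub("", base_font)
    # Drop style suffixes such as "-Bold" or ",Italic".
    name = name.split("-", 1)[0].split(",", 1)[0]
    # Drop the "MT" suffix some vendors append, so "ArialMT" matches "Arial-BoldMT".
    return name[:-2] if name.endswith("MT") else name


def lines_with_mixed_fonts(lines: List[List[str]]) -> List[int]:
    """Return indexes of lines whose text spans use more than one font family."""
    return [i for i, fonts in enumerate(lines) if len({font_family(f) for f in fonts}) > 1]
```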

SUSPICIOUS_PDF_PRODUCER — uncommon creation tools for bank statements

What we check: PDF Producer string (e.g., Word, Photoshop, Preview).

Why it matters: Banks rarely generate statements via general-purpose editors.

Legitimate causes: Users who scanned or re-saved statements before upload.

Review tips: Ask for the original download from online banking.

MODIFIED_AFTER_CREATION — large gap between creation and modification time

What we check: Hours between PDF creation and last modification.

Why it matters: Late modifications can indicate edits post-issuance.

Legitimate causes: Batch stamping or archiving systems.

Review tips: Validate with a fresh download from the issuing bank.
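PDF CreationDate and ModDate values use the format `D:YYYYMMDDHHmmSSOHH'mm'`, where `O` is `+`, `-`, or `Z` and every field after the year may be omitted. A minimal sketch of parsing them and measuring the gap follows. It assumes dates without an offset are UTC (the PDF specification treats their relation to UTC as unknown), and the 24-hour threshold and function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm'; every field after the year is optional.
PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Z+\-])(\d{2})?'?(\d{2})?'?)?$"
)
MAX_GAP_HOURS = 24.0  # hypothetical threshold, not the product's value


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    m = PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = timezone.utc  # assumption: a missing offset (or "Z") is treated as UTC
    if sign in ("+", "-"):
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(offset if sign == "+" else -offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)


def modification_gap_hours(creation: str, modification: str) -> float:
    """Hours between CreationDate and ModDate (negative if ModDate is earlier)."""
    delta = parse_pdf_date(modification) - parse_pdf_date(creation)
    return delta.total_seconds() / 3600


def modified_after_creation(creation: str, modification: str) -> bool:
    """Hypothetical flag for MODIFIED_AFTER_CREATION using MAX_GAP_HOURS."""
    return modification_gap_hours(creation, modification) > MAX_GAP_HOURS
```

Comparing timezone-aware values means a file created at 11:00 +02:00 and modified at 09:00 UTC shows no gap at all.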

SUSPICIOUS_PDF_VERSION — unusual or very old PDF version

What we check: PDF version string outside common ranges for bank exports.

Why it matters: Some editing tools save with atypical versions.

Legitimate causes: Legacy back-office software.

Review tips: Compare with other statements from the same institution.

SUSPICIOUS_CREATOR — missing or unusual PDF Creator

What we check: Empty or placeholder Creator strings.

Why it matters: Missing or generic creator information weakens the document's provenance.

Legitimate causes: Some print-to-PDF drivers omit creator fields.

Review tips: Prefer native PDF exports over prints or screenshots.

DOESNT_MATCH_KNOWN_FINGERPRINTS — no match to approved templates

What we check: The document’s fingerprint (metadata, fonts, page size) against a library of known authentic patterns.

Why it matters: Unfamiliar patterns may indicate edits or a nonstandard source.

Legitimate causes: New bank template versions, regional formats, or first-time issuers.

Review tips: Request another statement from the same account and month for comparison.

Notes

  • Signals include a confidence score and location data (page number, bounding box) when available to speed investigation.
  • The presence of any single signal is not proof of fraud. Use multiple signals plus context to make decisions.
Updated on August 12, 2025