Authenticity score #
Each document receives a score from 0 to 100. It summarizes multiple checks and the confidence of the findings.
Score range | Interpretation |
---|---|
90–100 | Document appears authentic |
70–89 | Minor concerns detected |
50–69 | Moderate risk detected |
0–49 | High risk of fraud detected |
Important: The score is not a final decision. Always review the individual signals and consider the context.
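As a rough illustration, the bands in the table could be applied in code as follows. This is a minimal sketch: the function name `interpret_score` is hypothetical, and the treatment of fractional scores (for example 89.5 falling in the 70–89 band) is an assumption, not documented behavior.

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 authenticity score to the interpretation bands in the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Document appears authentic"
    if score >= 70:
        return "Minor concerns detected"
    if score >= 50:
        return "Moderate risk detected"
    return "High risk of fraud detected"
```

The returned label is only a starting point for review; the individual signals still need to be inspected.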
What we analyze #
PDF metadata analysis #
- Creation vs modification date gaps.
- Unusual producers (e.g., image editors) for bank statements.
- PDF versions outside the range typical for bank exports.
Structural analysis #
- Column alignment consistency across a page.
- Field height consistency (sudden variations can indicate edits).
- Font usage anomalies within a line or section.
Transaction analysis #
- Future-dated transactions.
- Running balance inconsistencies.
- Debit/credit math mismatches.
Reconciliation check #
- Opening/closing balance reconciliation for the statement period.
- Flags when totals do not match the reported balances.
Document fingerprinting #
We compute a stable fingerprint from metadata, embedded fonts, and page dimensions. This enables:
- Matching against known authentic templates.
- Detecting duplicates or modified versions.
- Building a library of trusted statement patterns over time.
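One way to picture a stable fingerprint is a hash over a canonical form of the inputs named above. The sketch below is an assumption about how this might be done, not the actual implementation: the function name, the choice of metadata fields (producer, creator, PDF version), the rounding of page dimensions, and the use of SHA-256 are all illustrative.

```python
import hashlib
import json


def compute_fingerprint(producer, creator, pdf_version, fonts, page_sizes):
    """Return a hex digest that ignores font order and sub-point size jitter."""
    canonical = {
        "producer": producer.strip().lower(),
        "creator": creator.strip().lower(),
        "pdf_version": pdf_version,
        # Sort and de-duplicate so font enumeration order does not matter.
        "fonts": sorted(set(fonts)),
        # Round page dimensions (in points) to whole numbers.
        "page_sizes": [[round(w), round(h)] for w, h in page_sizes],
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

An exact hash like this only supports exact template and duplicate matching. Recognizing modified versions of a known document would need a comparison of the individual components rather than of the digest alone.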
How the score is calculated #
Each signal has a confidence (0–1) and a severity weight. The score starts at 100 and is reduced by weighted, high-confidence findings.
Example signal | Weight (relative) | Why it matters |
---|---|---|
RECONCILIATION_FAILED | High | The numbers in the statement do not add up. |
SUSPICIOUS_PDF_PRODUCER | Medium–High | Uncommon tools for bank statements can indicate editing. |
MISALIGNED_COLUMN | Medium | Layout shifts can suggest copy/paste or overlay edits. |
FONT_MISMATCH | Low–Medium | Mixed fonts within a line may indicate tampering. |
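The deduction logic described in this section could look roughly like the sketch below. Only the relative weights are documented, so the numeric point values, the default weight for unlisted signals, the confidence cutoff, and the `weight × confidence` penalty formula are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical point values; the documentation gives only relative weights.
ASSUMED_WEIGHTS = {
    "RECONCILIATION_FAILED": 30.0,    # High
    "SUSPICIOUS_PDF_PRODUCER": 20.0,  # Medium-High
    "MISALIGNED_COLUMN": 12.0,        # Medium
    "FONT_MISMATCH": 8.0,             # Low-Medium
}
ASSUMED_DEFAULT_WEIGHT = 10.0
ASSUMED_MIN_CONFIDENCE = 0.5  # findings below this are ignored in this sketch


@dataclass
class Signal:
    code: str
    confidence: float  # 0 to 1


def authenticity_score(signals):
    """Start at 100 and subtract weight * confidence per qualifying signal."""
    score = 100.0
    for signal in signals:
        if signal.confidence < ASSUMED_MIN_CONFIDENCE:
            continue
        weight = ASSUMED_WEIGHTS.get(signal.code, ASSUMED_DEFAULT_WEIGHT)
        score -= weight * signal.confidence
    return max(0.0, min(100.0, score))
```

Under these assumed values, a single `RECONCILIATION_FAILED` finding at full confidence lowers the score from 100 to 70, while the same finding at confidence 0.2 leaves it unchanged.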
Best practices for review #
- Review the full signal list, not just the score.
- Prioritize multiple high-confidence signals on the same page.
- Consider context. Scans, print-to-PDF, or corporate PDF workflows can cause benign anomalies.
- Compare against another statement from the same institution when possible.
Fraud signal glossary #
RECONCILIATION_FAILED
— transactions do not reconcile with reported balances
What we check: Opening balance + net activity = closing balance for the period.
Why it matters: Non-reconciling statements often indicate edits or missing lines.
Legitimate causes: Partial statements, excluded pages, or export filters.
Review tips: Verify period dates, page completeness, and subtotals for each page.
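The reconciliation identity can be checked in a few lines. The following is a minimal sketch, assuming transaction amounts are already parsed as signed `Decimal` values (credits positive, debits negative); the function name and the optional `tolerance` parameter are hypothetical.

```python
from decimal import Decimal


def reconciles(opening, closing, amounts, tolerance=Decimal("0.00")):
    """Check that opening balance + net activity equals the closing balance."""
    net_activity = sum(amounts, Decimal("0"))
    return abs(opening + net_activity - closing) <= tolerance
```

`Decimal` is used instead of `float` so that currency arithmetic does not pick up binary rounding errors that would look like a mismatch.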
BALANCE_MISMATCH
— running balances inconsistent with line items
What we check: Each line’s debit/credit correctly updates the running balance.
Why it matters: Edited amounts or missing transactions break the running balance chain.
Legitimate causes: Mid-cycle adjustments that are summarized elsewhere.
Review tips: Spot-check lines around the first mismatch; compare with bank exports if available.
UNRECONCILED_BALANCE
— a per-line math inconsistency
What we check: Expected balance after each line equals the printed balance.
Why it matters: Pinpoints the exact transaction where the chain breaks.
Review tips: Recompute affected lines; verify the sign (debit vs credit) and any fees.
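The running balance chain described in the two entries above can be walked line by line to find where it first breaks. This sketch assumes each line has been parsed into a signed amount and the printed balance; the function name and input shape are illustrative, not part of the product.

```python
from decimal import Decimal


def first_balance_break(opening, lines):
    """Return the index of the first line whose printed balance is wrong.

    lines: list of (signed_amount, printed_balance) tuples in statement order.
    Returns None when every printed balance matches the recomputed one.
    """
    expected = opening
    for index, (amount, printed) in enumerate(lines):
        expected += amount
        if expected != printed:
            return index
    return None
```

Starting the manual review at the returned index matches the tip to spot-check lines around the first mismatch.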
FUTURE_DATE
— transaction date in the future
What we check: Any transaction dated after the current date.
Why it matters: Future-dated entries can signal manual editing.
Legitimate causes: Time zone differences on late-month postings are rare but possible.
Review tips: Cross-check with online banking; confirm posting vs transaction date semantics.
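A future-date check might be sketched as below. The `grace_days` parameter is a hypothetical allowance for the time zone edge cases mentioned above; nothing in this document states that such a tolerance exists.

```python
from datetime import date, timedelta


def future_dated(transaction_dates, today, grace_days=0):
    """Return the transaction dates that fall after today plus a grace period."""
    cutoff = today + timedelta(days=grace_days)
    return [d for d in transaction_dates if d > cutoff]
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible when a document is re-analyzed later.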
MISALIGNED_COLUMN
— inconsistent horizontal alignment of the same column on a page
What we check: Position variance of fields like date or amount within the same page.
Why it matters: Copy/paste overlays or manual edits can shift text off the natural grid.
Legitimate causes: Low-quality scans or skew introduced during print/scan.
Review tips: Compare suspect lines to neighbors; look for different kerning or aliasing.
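One simple way to measure position variance is to compare each row against the page median for that column. The sketch below assumes the x coordinates (in PDF points) of one column have already been extracted; the function name and the 1.5 point tolerance are illustrative assumptions.

```python
from statistics import median


def misaligned_rows(x_positions, tolerance=1.5):
    """Return indexes of rows whose x position strays from the page median."""
    if not x_positions:
        return []
    center = median(x_positions)
    return [i for i, x in enumerate(x_positions) if abs(x - center) > tolerance]
```

The median is used rather than the mean so that a single shifted row does not move the reference position it is being compared against.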
INCONSISTENT_FIELD_HEIGHT
— unusual text box height vs neighbors
What we check: Height variance of text boxes in the same column beyond a typical range.
Why it matters: Overlays or pasted text often render at slightly different bounding boxes.
Legitimate causes: Mixed fonts or sub/superscript on certain lines.
Review tips: Inspect the specific box; zoom in to check baseline and character rendering.
FONT_MISMATCH
— multiple fonts within the same line or section
What we check: Sudden font family changes inside a single logical line.
Why it matters: A font change mid-line can indicate manual edits or content stitched together from different sources.
Legitimate causes: Bank templates that intentionally mix fonts for emphasis.
Review tips: Compare to a known-good statement from the same bank and month.
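A font family comparison within one line could be sketched as follows. It assumes each line is available as `(text, font_name)` spans, which is a hypothetical input shape. The normalization strips the six-letter subset prefix that embedded PDF fonts often carry (for example `ABCDEF+Helvetica`) and any style suffix after a hyphen or comma, so that bold or italic variants of one family are not counted as a mismatch.

```python
import re

# Embedded subset fonts are named with a six-uppercase-letter tag and "+".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def font_family(font_name):
    """Reduce a PDF font name to a lowercase family name."""
    name = SUBSET_PREFIX.sub("", font_name)
    return re.split(r"[-,]", name, maxsplit=1)[0].lower()


def line_has_font_mismatch(spans):
    """spans: list of (text, font_name) for one logical line."""
    families = {font_family(font) for text, font in spans if text.strip()}
    return len(families) > 1
```

Whether style variants should count as a mismatch is a design choice; this sketch treats them as the same family to reduce noise from templates that bold certain fields.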
SUSPICIOUS_PDF_PRODUCER
— uncommon creation tools for bank statements
What we check: PDF Producer string (e.g., Word, Photoshop, Preview).
Why it matters: Banks rarely generate statements via general-purpose editors.
Legitimate causes: Users who scanned or re-saved statements before upload.
Review tips: Ask for the original download from online banking.
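A Producer check can be as simple as a case-insensitive substring match. The marker list below contains only the three examples named above and is an assumption about how matching might work; a real list would need to be curated and would likely be more specific.

```python
# Markers taken from the examples in this entry; illustrative only.
ASSUMED_EDITOR_MARKERS = ("photoshop", "word", "preview")


def suspicious_producer(producer):
    """Return the matched marker, or None if the Producer string looks ordinary."""
    if not producer:
        return None
    lowered = producer.lower()
    for marker in ASSUMED_EDITOR_MARKERS:
        if marker in lowered:
            return marker
    return None
```

Plain substring matching is deliberately crude and can over-match, which is one reason a hit should prompt a request for the original download rather than a rejection.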
MODIFIED_AFTER_CREATION
— large gap between creation and modification time
What we check: Hours between PDF creation and last modification.
Why it matters: Late modifications can indicate edits post-issuance.
Legitimate causes: Batch stamping or archiving systems.
Review tips: Validate with a fresh download from the issuing bank.
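The gap computation can be sketched as below, assuming the creation and modification timestamps have already been parsed into timezone-aware `datetime` objects. The 24-hour threshold and both function names are hypothetical; the actual cutoff is not documented here.

```python
from datetime import datetime, timezone


def modification_gap_hours(created, modified):
    """Hours elapsed between PDF creation and last modification."""
    return (modified - created).total_seconds() / 3600.0


def modified_after_creation(created, modified, max_gap_hours=24.0):
    """Flag documents modified more than max_gap_hours after creation."""
    return modification_gap_hours(created, modified) > max_gap_hours
```

For example, a file created at midnight UTC on 1 January and modified at noon on 3 January has a gap of 60 hours and would be flagged under the assumed threshold.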
SUSPICIOUS_PDF_VERSION
— unusual or very old PDF version
What we check: PDF version string outside common ranges for bank exports.
Why it matters: Some editing tools save with atypical versions.
Legitimate causes: Legacy back-office software.
Review tips: Compare with other statements from the same institution.
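PDF files begin with a header of the form `%PDF-1.7`, so the version can be read directly from the first bytes. In the sketch below, the set of "common" versions is an assumption made for illustration, and the function names are hypothetical.

```python
import re

HEADER = re.compile(rb"%PDF-(\d)\.(\d)")

# Assumed range for typical bank exports; not taken from the documentation.
ASSUMED_COMMON_VERSIONS = {(1, 4), (1, 5), (1, 6), (1, 7)}


def pdf_header_version(data: bytes):
    """Return (major, minor) from the file header, or None if absent."""
    match = HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def suspicious_pdf_version(data: bytes) -> bool:
    """Flag a missing header or a version outside the assumed common set."""
    version = pdf_header_version(data)
    return version is None or version not in ASSUMED_COMMON_VERSIONS
```

The header is not always the final word: a PDF's document catalog may carry a `Version` entry that overrides it, so a fuller check would read both.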
SUSPICIOUS_CREATOR
— missing or unusual PDF Creator
What we check: Empty or placeholder Creator strings.
Why it matters: A missing or placeholder Creator weakens the document's provenance metadata.
Legitimate causes: Some print-to-PDF drivers omit creator fields.
Review tips: Prefer native PDF exports over prints or screenshots.
DOESNT_MATCH_KNOWN_FINGERPRINTS
— no match to approved templates
What we check: The document’s fingerprint (metadata, fonts, page size) against a library of known authentic patterns.
Why it matters: Unfamiliar patterns may indicate edits or a nonstandard source.
Legitimate causes: New bank template versions, regional formats, or first-time issuers.
Review tips: Request another statement from the same account and month for comparison.
Notes #
- Signals include a confidence score and location data (page number, bounding box) when available to speed investigation.
- The presence of any single signal is not proof of fraud. Use multiple signals plus context to make decisions.