DocuClipper

Project I/O & automation

Configure input sources (e.g. watch a Google Drive or Dropbox folder) and output destinations (webhooks, Google Sheets, Drive folder) per project. When a job completes for a project, all enabled output configs run (e.g. POST to your webhook and create a new Google Sheet per job, or upload to a Drive folder).

Overview

  • Input configs: Watch a Google Drive folder. When new PDF or image files appear, they are ingested automatically, and a job (bank statement by default) is created and tagged to the project. Push notifications are recommended: when you create or update a gdrive_folder input with enabled: true, the API registers a Drive push channel, and Google POSTs to your webhook when the folder changes. Set DRIVE_WEBHOOK_BASE_URL (e.g. https://www.docuclipper.com) so the webhook URL is reachable. Channels expire after about 24 hours and are renewed by the GDriveWatchRenewal job (run it hourly or daily). Alternatively, poll via POST /tagLevel/gdrive-watch.
  • Output configs: When a job for that project completes, results are sent to each enabled output (webhook URL, GSheet, Drive folder).
  • Projects are TagLevels with type = "project". Base path: /api/v1/tagLevel.

Input configs

List, create, update, and delete input sources for a project. Types: gdrive_folder, dropbox_folder, email, api. Each config can set documentType, templateId, and type-specific config (e.g. { "folderId": "..." } for GDrive).

Method   Path
GET      /tagLevel/:projectId/input-configs
POST     /tagLevel/:projectId/input-configs
PUT      /tagLevel/:projectId/input-configs/:id
DELETE   /tagLevel/:projectId/input-configs/:id

POST body for create: { type, config?, documentType?, templateId?, enabled? }. For gdrive_folder, config must include folderId (Drive folder ID to watch). Use documentType to choose how each new file is processed:

  • bank_statement (default) – ExtractData with bank mode
  • invoice – Invoice job type
  • form – Form job type
  • template – ExtractData with a specific template (requires templateId)
  • generic – Generic extraction

New files (PDF/images) are ingested and a job is created with the chosen type, then tagged to the project.
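The create body and its validation rules can be sketched in Python as follows. This is a minimal illustration, not the server's actual validation: the helper name build_input_config is hypothetical, and only the field names, type names, and the two requirements stated above (folderId for gdrive_folder, templateId for template) come from this page.

```python
from typing import Any, Optional

# Values listed in this section of the docs.
INPUT_TYPES = {"gdrive_folder", "dropbox_folder", "email", "api"}
DOCUMENT_TYPES = {"bank_statement", "invoice", "form", "template", "generic"}


def build_input_config(
    input_type: str,
    config: Optional[dict] = None,
    document_type: str = "bank_statement",
    template_id: Optional[Any] = None,
    enabled: bool = True,
) -> dict:
    """Build a POST body for /tagLevel/:projectId/input-configs (hypothetical helper)."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unknown input type: {input_type}")
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown documentType: {document_type}")
    # gdrive_folder inputs need the Drive folder ID to watch.
    if input_type == "gdrive_folder" and not (config or {}).get("folderId"):
        raise ValueError("gdrive_folder requires config.folderId")
    # The template document type needs a templateId.
    if document_type == "template" and template_id is None:
        raise ValueError("documentType 'template' requires templateId")

    body: dict = {"type": input_type, "documentType": document_type, "enabled": enabled}
    if config is not None:
        body["config"] = config
    if template_id is not None:
        body["templateId"] = template_id
    return body
```

For example, build_input_config("gdrive_folder", config={"folderId": "YOUR_DRIVE_FOLDER_ID"}) yields a body that relies on the bank_statement default.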

Push vs polling

Push (default): When you create or update a gdrive_folder input with enabled: true and a valid folderId, the server registers a Drive push channel. Google sends POST requests to {baseUrl}/api/v1/webhooks/google-drive when the folder changes. Configure DRIVE_WEBHOOK_BASE_URL (HTTPS, reachable by Google). Run GDriveWatchRenewal (e.g. hourly) to renew channels before they expire (~24h).

Polling fallback: Call POST /api/v1/protected/tagLevel/gdrive-watch to process all enabled gdrive_folder configs once (e.g. from a cron every 5 minutes).

Method   Path
POST     /tagLevel/gdrive-watch

Returns 202 with { ok: true, message: "GDrive folder watch queued" }.
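A polling trigger suitable for a cron job might look like the sketch below. The Bearer-token Authorization header is an assumption (this page does not describe authentication), and the helper names are hypothetical. The path is the one given for the polling fallback, and the success check mirrors the documented 202 response.

```python
import urllib.request

# Path as given in the polling fallback description.
WATCH_PATH = "/api/v1/protected/tagLevel/gdrive-watch"


def build_gdrive_watch_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that queues one folder-watch run."""
    return urllib.request.Request(
        base_url.rstrip("/") + WATCH_PATH,
        data=b"",  # empty body; the endpoint processes all enabled configs
        method="POST",
        # Assumption: the API accepts a Bearer token; adjust to your auth scheme.
        headers={"Authorization": f"Bearer {token}"},
    )


def is_queued(status: int, payload: dict) -> bool:
    """True when the response matches the documented 202 / { ok: true } shape."""
    return status == 202 and payload.get("ok") is True
```

Passing the request to urllib.request.urlopen would send it; a cron entry that runs such a script every 5 minutes matches the schedule suggested above.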

Output configs

List, create, update, and delete output destinations. Types: webhook, gsheet, gdrive_folder, s3. Type-specific config:

  • webhook – config must include url and events (an array of event names from the webhook events list). A secret is auto-generated for signing.
  • gsheet – a new spreadsheet is created for each job. config may include folderId (an optional Drive folder to add the sheet to) and titlePrefix (e.g. the title becomes "My Export Job 123").
  • gdrive_folder – config must include folderId, and optionally formats: ["json","csv"].

Method   Path
GET      /tagLevel/:projectId/output-configs
GET      /tagLevel/:projectId/output-configs/events
POST     /tagLevel/:projectId/output-configs
PUT      /tagLevel/:projectId/output-configs/:id
DELETE   /tagLevel/:projectId/output-configs/:id

POST body for webhook: { type: "webhook", config: { url, events }, format?, enabled? }. Response includes config.secret for verifying signatures (same as contract webhooks).
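The per-type requirements for output configs can be checked client-side before calling the API. The following Python sketch assumes nothing beyond the field names on this page. The function name build_output_config is hypothetical, and the s3 type is passed through unchecked because its config is not described here.

```python
from typing import Optional

OUTPUT_TYPES = {"webhook", "gsheet", "gdrive_folder", "s3"}


def build_output_config(output_type: str, config: Optional[dict] = None,
                        enabled: bool = True) -> dict:
    """Build a POST body for /tagLevel/:projectId/output-configs (hypothetical helper)."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output type: {output_type}")
    cfg = dict(config or {})

    if output_type == "webhook":
        # Webhooks need a target URL and a non-empty list of event names.
        if not cfg.get("url"):
            raise ValueError("webhook requires config.url")
        events = cfg.get("events")
        if not isinstance(events, list) or not events:
            raise ValueError("webhook requires config.events (non-empty array)")
    elif output_type == "gdrive_folder":
        if not cfg.get("folderId"):
            raise ValueError("gdrive_folder requires config.folderId")
    # gsheet: folderId and titlePrefix are both optional, so nothing to enforce.
    # s3: config shape is not documented on this page, so it is not validated.

    return {"type": output_type, "config": cfg, "enabled": enabled}
```

The secret is not part of the request: it is generated by the server and returned in the response as config.secret.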

Drive folder picker (for output config)

To let users select or create a folder when configuring a gdrive_folder (or optional gsheet folder), use these endpoints. The project must have a Google Drive integration linked (TagLevelIntegration). List folders under an optional parent, or create a new folder; use the returned id as config.folderId in the output config.

Method   Path
GET      /tagLevel/:projectId/drive/folders
POST     /tagLevel/:projectId/drive/folders

GET: optional query parameter parentId (default: root). Returns { folders: [{ id, name, parents? }] }.

POST body: { name, parentId? }. Returns { id, name }; use id as config.folderId.
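A folder picker built on these endpoints mostly needs two things: the list URL for a given parent, and a way to pull an id out of the response. The Python sketch below covers both. The helper names and the base URL argument are hypothetical; the path, the parentId query parameter, and the response shape are taken from the description above.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def folders_url(base_url: str, project_id, parent_id: Optional[str] = None) -> str:
    """URL for listing Drive folders; omitting parent_id lists under the root."""
    url = f"{base_url.rstrip('/')}/tagLevel/{quote(str(project_id), safe='')}/drive/folders"
    if parent_id:
        url += "?" + urlencode({"parentId": parent_id})
    return url


def find_folder_id(response: dict, name: str) -> Optional[str]:
    """Return the id of the first folder with the given name, or None if absent."""
    for folder in response.get("folders", []):
        if folder.get("name") == name:
            return folder.get("id")
    return None
```

The id returned by find_folder_id (or by the POST that creates a folder) is the value to store as config.folderId in the output config.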

Linking jobs to a project

When a job completes, the project is resolved from the job's tag (TagLevelEntity with entityType "Job" and entityId = job id). If that tag level is a project, all enabled project output configs run (webhook, new GSheet, Drive upload). Contract-level webhooks always run as well.
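The resolution rule can be modelled in a few lines of Python. This is an in-memory illustration only: the dataclasses are simplified stand-ins for TagLevel, TagLevelEntity, and the output configs, their field names are assumptions, and the real lookup presumably happens in the database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TagLevel:
    id: int
    type: str  # "project" marks a project


@dataclass
class TagLevelEntity:
    tag_level_id: int
    entity_type: str  # "Job" for job tags
    entity_id: int


@dataclass
class OutputConfig:
    tag_level_id: int
    type: str  # e.g. "webhook", "gsheet", "gdrive_folder"
    enabled: bool


def outputs_for_job(job_id: int, tag_levels: List[TagLevel],
                    entities: List[TagLevelEntity],
                    output_configs: List[OutputConfig]) -> List[OutputConfig]:
    """Return the enabled output configs of every project the job is tagged to."""
    project_ids = {t.id for t in tag_levels if t.type == "project"}
    tagged_ids = {e.tag_level_id for e in entities
                  if e.entity_type == "Job" and e.entity_id == job_id}
    active = project_ids & tagged_ids  # tags that are projects
    return [c for c in output_configs if c.enabled and c.tag_level_id in active]
```

Contract-level webhooks are outside this sketch; per the text above, they run regardless of the job's tags.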

Example: GDrive → bank statement → webhook + GSheet

  1. Create a project (TagLevel with type project) and link a Google Drive integration to it.
  2. Add an input config: POST /tagLevel/:projectId/input-configs with { type: "gdrive_folder", config: { folderId: "YOUR_DRIVE_FOLDER_ID" }, enabled: true }.
  3. Add a webhook output: POST /tagLevel/:projectId/output-configs with { type: "webhook", config: { url: "https://webhook.site/your-id", events: ["bank_statement.extraction.completed", "document.extraction.completed"] }, enabled: true }.
  4. Add a GSheet output (new sheet per job, under a folder): POST /tagLevel/:projectId/output-configs with { type: "gsheet", config: { folderId: "XYZ_FOLDER_ID", titlePrefix: "Bank Export" }, enabled: true }. The project's linked Google integration must include the Google Sheets and Drive scopes.
  5. Trigger the watch: POST /api/v1/protected/tagLevel/gdrive-watch (e.g. from cron every 5 min). When you drop a bank statement PDF in the Drive folder, the next run will pick it up, create a job, and on completion send the webhook and create a new Google Sheet in the given folder.
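Steps 2 to 5 can be written down as an ordered list of requests, which is handy for scripting or for reviewing the setup before sending anything. The Python sketch below only builds that list and sends nothing. The function name and its parameters are hypothetical; the paths and bodies are copied from the steps above.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (method, path, JSON body or None)


def build_example_plan(project_id, drive_folder_id: str, webhook_url: str,
                       sheet_folder_id: str) -> List[Request]:
    """Return the requests for steps 2-5 in the order they should be sent."""
    base = f"/tagLevel/{project_id}"
    return [
        # Step 2: watch the Drive folder (documentType defaults to bank_statement).
        ("POST", f"{base}/input-configs", {
            "type": "gdrive_folder",
            "config": {"folderId": drive_folder_id},
            "enabled": True,
        }),
        # Step 3: webhook output for the two completion events.
        ("POST", f"{base}/output-configs", {
            "type": "webhook",
            "config": {
                "url": webhook_url,
                "events": ["bank_statement.extraction.completed",
                           "document.extraction.completed"],
            },
            "enabled": True,
        }),
        # Step 4: one new Google Sheet per job, placed in the given folder.
        ("POST", f"{base}/output-configs", {
            "type": "gsheet",
            "config": {"folderId": sheet_folder_id, "titlePrefix": "Bank Export"},
            "enabled": True,
        }),
        # Step 5: polling trigger, repeated on a schedule (e.g. cron every 5 min).
        ("POST", "/api/v1/protected/tagLevel/gdrive-watch", None),
    ]
```

Step 1 (creating the project and linking the Google Drive integration) is left out because its endpoints are not described on this page.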