Project I/O & automation
Configure input sources (e.g. watch a Google Drive or Dropbox folder) and output destinations (webhooks, Google Sheets, Drive folder) per project. When a job completes for a project, all enabled output configs run (e.g. POST to your webhook and create a new Google Sheet per job, or upload to a Drive folder).
Overview
- Input configs: Watch a Google Drive folder; when new PDF/image files appear they are auto-ingested, and a bank-statement job is created and tagged to the project. Use push notifications (recommended): when you create or update a gdrive_folder input with enabled: true, the API registers a Drive push channel; Google POSTs to your webhook when the folder changes. Set DRIVE_WEBHOOK_BASE_URL (e.g. https://www.docuclipper.com) so the webhook URL is reachable. Channels expire in ~24h and are renewed by the GDriveWatchRenewal job (run hourly/daily). Alternatively, use polling via POST /tagLevel/gdrive-watch.
- Output configs: When a job for that project completes, results are sent to each enabled output (webhook URL, GSheet, Drive folder).
- Projects are TagLevels with type = "project". Base path: /api/v1/tagLevel.
Input configs
List, create, update, and delete input sources for a project. Types: gdrive_folder, dropbox_folder, email, api. Each config can set documentType, templateId, and type-specific config (e.g. { "folderId": "..." } for GDrive).
| Method | Path |
|---|---|
| GET | /tagLevel/:projectId/input-configs |
| POST | /tagLevel/:projectId/input-configs |
| PUT | /tagLevel/:projectId/input-configs/:id |
| DELETE | /tagLevel/:projectId/input-configs/:id |
POST body for create: { type, config?, documentType?, templateId?, enabled? }. For gdrive_folder, config must include folderId (Drive folder ID to watch). Use documentType to choose how each new file is processed:
- bank_statement (default) – ExtractData with bank mode
- invoice – Invoice job type
- form – Form job type
- template – ExtractData with a specific template (requires templateId)
- generic – Generic extraction
New files (PDF/images) are ingested and a job is created with the chosen type, then tagged to the project.
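The create body and its two stated constraints (folderId for gdrive_folder, templateId for the template document type) can be checked client-side before sending. The following is a minimal sketch, not the API's own validation: the helper name build_input_config is hypothetical, and the type of templateId is assumed to be a string since the source does not specify it.

```python
from typing import Any, Dict, Optional

# Values copied from the type and documentType lists in this section.
INPUT_TYPES = {"gdrive_folder", "dropbox_folder", "email", "api"}
DOCUMENT_TYPES = {"bank_statement", "invoice", "form", "template", "generic"}


def build_input_config(
    type_: str,
    config: Optional[Dict[str, Any]] = None,
    document_type: str = "bank_statement",
    template_id: Optional[str] = None,  # assumption: templateId is a string
    enabled: bool = True,
) -> Dict[str, Any]:
    """Build a body for POST /tagLevel/:projectId/input-configs (hypothetical helper)."""
    if type_ not in INPUT_TYPES:
        raise ValueError(f"unknown input type: {type_}")
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown documentType: {document_type}")
    config = dict(config or {})
    # gdrive_folder inputs must name the Drive folder to watch.
    if type_ == "gdrive_folder" and not config.get("folderId"):
        raise ValueError("gdrive_folder requires config.folderId")
    # The template document type needs a template to extract with.
    if document_type == "template" and template_id is None:
        raise ValueError("documentType 'template' requires templateId")
    body: Dict[str, Any] = {
        "type": type_,
        "config": config,
        "documentType": document_type,
        "enabled": enabled,
    }
    if template_id is not None:
        body["templateId"] = template_id
    return body
```

Calling build_input_config("gdrive_folder", {"folderId": "YOUR_DRIVE_FOLDER_ID"}) yields a body that relies on the bank_statement default.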
Push vs polling
Push (default): When you create or update a gdrive_folder input with enabled: true and a valid folderId, the server registers a Drive push channel. Google sends POST requests to {baseUrl}/api/v1/webhooks/google-drive when the folder changes. Configure DRIVE_WEBHOOK_BASE_URL (HTTPS, reachable by Google). Run GDriveWatchRenewal (e.g. hourly) to renew channels before they expire (~24h).
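To illustrate why the renewal job should run well inside the ~24h channel lifetime, here is a minimal sketch of selecting channels that are close to expiring. It is not the GDriveWatchRenewal implementation; the channel record shape (id, expiration) and the two-hour margin are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def channels_due_for_renewal(
    channels: List[Dict[str, Any]],
    now: datetime,
    margin: timedelta = timedelta(hours=2),  # assumed safety margin
) -> List[str]:
    """Return ids of channels that expire within `margin` of `now` (or already expired)."""
    return [c["id"] for c in channels if c["expiration"] - now <= margin]
```

The margin should be longer than the interval between renewal runs; otherwise a channel can expire between two runs and notifications are missed until it is re-registered.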
Polling fallback: Call POST /api/v1/protected/tagLevel/gdrive-watch to process all enabled gdrive_folder configs once (e.g. from a cron every 5 minutes).
| Method | Path |
|---|---|
| POST | /tagLevel/gdrive-watch |
Returns 202 with { ok: true, message: "GDrive folder watch queued" }.
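A cron-driven poller could call the endpoint as in the sketch below, which uses only the Python standard library. The Bearer-token Authorization header is an assumption, since authentication is not described in this section; the function names are hypothetical.

```python
import json
import urllib.request


def build_gdrive_watch_request(base_url: str, token: str) -> urllib.request.Request:
    """Prepare the polling request; the auth header scheme is an assumption."""
    url = base_url.rstrip("/") + "/api/v1/protected/tagLevel/gdrive-watch"
    return urllib.request.Request(
        url,
        data=b"{}",  # empty JSON body
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def trigger_gdrive_watch(base_url: str, token: str) -> bool:
    """Send the request and report whether the watch was queued (202 with ok: true)."""
    request = build_gdrive_watch_request(base_url, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.loads(response.read().decode("utf-8"))
        return response.status == 202 and payload.get("ok") is True
```

Because the endpoint answers 202, a True result only means the watch was queued, not that new files were found or processed.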
Output configs
List, create, update, and delete output destinations. Types: webhook, gsheet, gdrive_folder, s3. For webhook, config must include url and events (array of event names from the webhook events list). A secret is auto-generated for signing. For gsheet, a new spreadsheet is created for each job; config may include folderId (optional Drive folder to add the sheet to) and titlePrefix (e.g. title becomes "My Export Job 123"). For gdrive_folder, config must include folderId (and optionally formats: ["json","csv"]).
| Method | Path |
|---|---|
| GET | /tagLevel/:projectId/output-configs |
| GET | /tagLevel/:projectId/output-configs/events |
| POST | /tagLevel/:projectId/output-configs |
| PUT | /tagLevel/:projectId/output-configs/:id |
| DELETE | /tagLevel/:projectId/output-configs/:id |
POST body for webhook: { type: "webhook", config: { url, events }, format?, enabled? }. Response includes config.secret for verifying signatures (same as contract webhooks).
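The per-type config requirements described above can be captured in a small client-side builder. This is a sketch under assumptions: build_output_config is a hypothetical helper, the s3 requirements are left empty because they are not documented here, and format is passed through unvalidated because its allowed values are not listed.

```python
from typing import Any, Dict, Optional

# Required config keys per output type, as described in this section.
REQUIRED_CONFIG_KEYS = {
    "webhook": ("url", "events"),
    "gsheet": (),                  # folderId and titlePrefix are optional
    "gdrive_folder": ("folderId",),
    "s3": (),                      # requirements not documented here
}


def build_output_config(
    type_: str,
    config: Optional[Dict[str, Any]] = None,
    enabled: bool = True,
    format_: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a body for POST /tagLevel/:projectId/output-configs (hypothetical helper)."""
    if type_ not in REQUIRED_CONFIG_KEYS:
        raise ValueError(f"unknown output type: {type_}")
    config = dict(config or {})
    # Missing or empty values (e.g. an empty events list) are treated as absent.
    missing = [key for key in REQUIRED_CONFIG_KEYS[type_] if not config.get(key)]
    if missing:
        raise ValueError(f"{type_} config is missing: {', '.join(missing)}")
    body: Dict[str, Any] = {"type": type_, "config": config, "enabled": enabled}
    if format_ is not None:
        body["format"] = format_
    return body
```

The secret is not part of the request: it is generated by the server and read from config.secret in the response.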
Drive folder picker (for output config)
To let users select or create a folder when configuring a gdrive_folder (or optional gsheet folder), use these endpoints. The project must have a Google Drive integration linked (TagLevelIntegration). List folders under an optional parent, or create a new folder; use the returned id as config.folderId in the output config.
| Method | Path |
|---|---|
| GET | /tagLevel/:projectId/drive/folders |
| POST | /tagLevel/:projectId/drive/folders |
GET: optional query parentId (default: root). Returns { folders: [{ id, name, parents? }] }. POST body: { name, parentId? }. Returns { id, name }; use the returned id as config.folderId.
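A folder picker typically lists first and creates only when no match exists. The sketch below shows that flow with the HTTP layer injected as a callable, so it makes no assumption about authentication or client library; the Transport signature and the function name find_or_create_folder are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, query_or_body) -> decoded JSON response.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def find_or_create_folder(
    call: Transport,
    project_id: str,
    name: str,
    parent_id: Optional[str] = None,
) -> str:
    """Return the id of the folder called `name`, creating it if it is not listed."""
    path = f"/tagLevel/{project_id}/drive/folders"
    # Omitting parentId lets the API fall back to its default (root).
    query = {"parentId": parent_id} if parent_id else {}
    listing = call("GET", path, query)
    for folder in listing.get("folders", []):
        if folder["name"] == name:
            return folder["id"]
    body: Dict[str, Any] = {"name": name}
    if parent_id:
        body["parentId"] = parent_id
    created = call("POST", path, body)
    return created["id"]
```

The returned id is what goes into config.folderId of a gdrive_folder or gsheet output config.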
Linking jobs to a project
When a job completes, the project is resolved from the job's tag (TagLevelEntity with entityType "Job" and entityId = job id). If that tag level is a project, all enabled project output configs run (webhook, new GSheet, Drive upload). Contract-level webhooks always run as well.
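The resolution rule can be modelled with plain dictionaries. This is an illustrative model only, not the server's code: the record shapes and the tagLevelId field are assumptions, while entityType, entityId, type, and enabled follow the names used in this section.

```python
from typing import Any, Dict, List


def outputs_to_run(
    job_id: int,
    tag_entities: List[Dict[str, Any]],
    tag_levels: Dict[str, Dict[str, Any]],
    output_configs: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return the enabled output configs of the project the job is tagged to, if any."""
    for entity in tag_entities:
        if entity["entityType"] != "Job" or entity["entityId"] != job_id:
            continue
        level = tag_levels.get(entity["tagLevelId"])  # assumed link field
        # Only tag levels of type "project" carry output configs.
        if level is not None and level["type"] == "project":
            return [
                cfg
                for cfg in output_configs
                if cfg["tagLevelId"] == level["id"] and cfg["enabled"]
            ]
    return []
```

An empty result means no project outputs run for the job; contract-level webhooks are outside this model and fire regardless.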
Example: GDrive → bank statement → webhook + GSheet
- Create a project (TagLevel with type project) and link a Google Drive integration to it.
- Add an input config: POST /tagLevel/:projectId/input-configs with { type: "gdrive_folder", config: { folderId: "YOUR_DRIVE_FOLDER_ID" }, enabled: true }.
- Add a webhook output: POST /tagLevel/:projectId/output-configs with { type: "webhook", config: { url: "https://webhook.site/your-id", events: ["bank_statement.extraction.completed", "document.extraction.completed"] }, enabled: true }.
- Add a GSheet output (new sheet per job, under a folder): POST /tagLevel/:projectId/output-configs with { type: "gsheet", config: { folderId: "XYZ_FOLDER_ID", titlePrefix: "Bank Export" }, enabled: true }. The project must have Google Sheets and Drive scope linked.
- Trigger the watch: POST /api/v1/protected/tagLevel/gdrive-watch (e.g. from cron every 5 min). When you drop a bank statement PDF in the Drive folder, the next run will pick it up, create a job, and on completion send the webhook and create a new Google Sheet in the given folder.
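The API calls in this example can be laid out as an ordered request plan, which is convenient for scripting or for reviewing the payloads before sending them. The sketch below only builds the plan and does not perform any HTTP calls; the function name and its parameters are hypothetical, and the paths are written exactly as they appear in this section.

```python
import json
from typing import Any, Dict, List, Tuple

PlannedRequest = Tuple[str, str, Dict[str, Any]]


def example_setup_requests(
    project_id: str,
    drive_folder_id: str,
    sheet_folder_id: str,
    webhook_url: str,
) -> List[PlannedRequest]:
    """Return (method, path, body) tuples for the walkthrough, in order."""
    base = f"/tagLevel/{project_id}"
    return [
        # Watch the Drive folder for new bank statements.
        ("POST", f"{base}/input-configs", {
            "type": "gdrive_folder",
            "config": {"folderId": drive_folder_id},
            "enabled": True,
        }),
        # Notify a webhook when extraction completes.
        ("POST", f"{base}/output-configs", {
            "type": "webhook",
            "config": {
                "url": webhook_url,
                "events": [
                    "bank_statement.extraction.completed",
                    "document.extraction.completed",
                ],
            },
            "enabled": True,
        }),
        # Create a new Google Sheet per job under the given folder.
        ("POST", f"{base}/output-configs", {
            "type": "gsheet",
            "config": {"folderId": sheet_folder_id, "titlePrefix": "Bank Export"},
            "enabled": True,
        }),
        # Polling trigger, repeated on a schedule (e.g. cron every 5 min).
        ("POST", "/api/v1/protected/tagLevel/gdrive-watch", {}),
    ]


if __name__ == "__main__":
    plan = example_setup_requests(
        "123", "YOUR_DRIVE_FOLDER_ID", "XYZ_FOLDER_ID", "https://webhook.site/your-id"
    )
    for method, path, body in plan:
        print(method, path, json.dumps(body))
```

Creating the project and linking the Google Drive integration are not included because their endpoints are not described in this section.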