Feature Overview

Document splitting handles cases where a multi-page file contains several different document types, or where multiple tickets are pasted on a single page. It provides two capabilities: file splitting (split) and multi-image cropping (crop).

File Splitting

Automatically splits a long document into single pages or sub-documents. Use cases:
  1. Medical insurance claims: a multi-page file in which pages 1-2 are insurance policies, pages 3-5 are invoices, and pages 6-10 are hospitalization records.
  2. Logistics import/export: page 1 is an export customs declaration, page 2 is an invoice, page 3 is a packing list, and page 4 is a sales contract.
📖 Detailed Documentation: See File Splitting Feature Detailed Documentation for more use cases, API parameters, and example code.

Multi-Image Cropping

Crops out each item when a single page contains multiple independent tickets. Use cases:
  1. Expense reimbursement with pasted invoices: an A4 sheet with train tickets, flight itineraries, and several taxi invoices laid flat on it.
📖 Detailed Documentation: See Multi-Image Cropping Feature Detailed Documentation for more use cases, API parameters, and example code.
Together, file splitting and multi-image cropping separate the different document categories within a file, which helps the subsequent extraction step pull information from each document type more accurately.

Feature Enable Parameters

File splitting and multi-image cropping functions are disabled by default.
In the upload interface, the following two parameters control whether to enable file splitting and cropping functions:
  • split_flag: whether to perform file splitting; default false
  • crop_flag: whether to perform multi-image cropping; default false
The two features can be enabled independently or together: file splitting only, multi-image cropping only, or both at once. Request example (file splitting only):
curl -X POST \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  -F "file=@/path/to/long.pdf" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/upload?workspace_id=<your-workspace-id>&split_flag=true"
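To illustrate combining the two flags, here is a minimal Python sketch that only builds the upload URL. The helper name build_upload_url is hypothetical and not part of the API. The sketch assumes, as the parameter list above suggests, that an omitted flag falls back to its default of false. Sending the request still requires an HTTP client plus the x-ti-app-id and x-ti-secret-code headers shown in the curl example.

```python
from urllib.parse import urlencode

# Upload endpoint taken from the curl example above.
UPLOAD_URL = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/upload"


def build_upload_url(workspace_id, split=False, crop=False):
    """Return the upload URL, adding split_flag / crop_flag only when enabled.

    Hypothetical helper: flags that are left out rely on the documented
    default of false.
    """
    query = {"workspace_id": workspace_id}
    if split:
        query["split_flag"] = "true"
    if crop:
        query["crop_flag"] = "true"
    return UPLOAD_URL + "?" + urlencode(query)
```

For example, build_upload_url("my-workspace", split=True, crop=True) yields a URL that enables both file splitting and multi-image cropping in a single upload.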

Query Splitting Results

Use file/fetch to query. The child_files field in the response describes sub-task information:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&batch_number=<your-batch-number>"
Key fields in the return structure include:
  • files[].child_files[]: Sub-task list (present when the file was split or cropped)
The files[].child_files[].task_type field indicates how each sub-file was produced:
  • task_type=0: the sub-file was generated by file splitting
  • task_type=3: the sub-file was generated by multi-image cropping
For sub-files generated by multi-image cropping, the field child_files[].from_parent_position_list gives the coordinates of the cropped region in the original image. See the Coordinate System Description for how the coordinates are represented.
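As a rough sketch of how a client might read these fields, the Python function below walks an already-decoded file/fetch response and labels each sub-file by its task_type. The function name describe_children and the "unknown" fallback label are illustrative assumptions. Only the two task_type values documented above are mapped.

```python
# Mapping limited to the two task_type values described in this document.
TASK_TYPE_ORIGIN = {
    0: "file splitting",
    3: "multi-image cropping",
}


def describe_children(response):
    """Return (parent_id, child_id, origin) tuples for every sub-file.

    `response` is the decoded JSON body of a file/fetch call. Parents
    without a child_files field are skipped.
    """
    rows = []
    for parent in response.get("result", {}).get("files", []):
        for child in parent.get("child_files") or []:
            # Any task_type not listed above is reported as "unknown".
            origin = TASK_TYPE_ORIGIN.get(child.get("task_type"), "unknown")
            rows.append((parent.get("id"), child.get("id"), origin))
    return rows
```

Treating a missing child_files field as an empty list keeps the loop safe for files that were uploaded with neither flag enabled.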

Return Example (Excerpt)

{
  "code": 200,
  "result": {
    "files": [
      {
        "id": "parent-001",
        "name": "multi-photo.pdf",
        "child_files": [
          {
            "id": "child-001",
            "task_id": "t-1",
            "task_type": 3,
            "name": "multi-photo.pdf#1",
            "format": "pdf",
            "category": "invoice",
            "from_parent_position_list": [12, 30, 420, 30, 420, 320, 12, 320]
          }
        ]
      }
    ]
  }
}
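
The from_parent_position_list in the example holds eight numbers. The sketch below assumes these are four (x, y) corner points stored as a flat list; that layout is an assumption to verify against the Coordinate System Description. The helper names corner_points and bounding_box are hypothetical.

```python
def corner_points(flat):
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) tuples.

    Assumes the list alternates x and y values; this layout is not
    confirmed by the text above.
    """
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


def bounding_box(flat):
    """Return (min_x, min_y, max_x, max_y) enclosing all corner points."""
    points = corner_points(flat)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

Under that assumption, the list from the example, [12, 30, 420, 30, 420, 320, 12, 320], groups into the points (12, 30), (420, 30), (420, 320), (12, 320), and its bounding box is (12, 30, 420, 320).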