Feature Overview

For a complex document that bundles several sub-documents of different types into one file, the file splitting function recognizes the content of each page, automatically splits the file into sub-documents, and classifies each one.

Use Cases

1. Medical Insurance Claims Scenario

A multi-page file contains:
  • Pages 1-2: Insurance policy information
  • Pages 3-5: Medical invoices
  • Pages 6-10: Hospitalization records
With file splitting, the file is separated into these three sub-documents by type, which simplifies subsequent classification and extraction.

2. Logistics Import/Export Scenario

One file contains:
  • Page 1: Export customs declaration
  • Page 2: Commercial invoice
  • Page 3: Packing list
  • Page 4: Sales contract
The file splitting function can separate this file into four sub-documents by document type.
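Conceptually, splitting turns a per-page document type into contiguous page ranges. The sketch below illustrates that idea on the medical claims example. It is not DocFlow's actual algorithm (which this section does not describe); the per-page labels are assumed to be already known, and group_pages and the label names are hypothetical.

```python
from itertools import groupby


def group_pages(page_labels):
    """Group consecutive pages sharing a label into (label, first_page, last_page).

    page_labels[i] is the (assumed, precomputed) document type of page i + 1.
    Page numbers in the output are 1-based and inclusive.
    """
    ranges = []
    page = 1
    for label, run in groupby(page_labels):
        count = len(list(run))
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


# Hypothetical labels for the medical insurance claims example above
labels = ["policy"] * 2 + ["invoice"] * 3 + ["hospital_record"] * 5
print(group_pages(labels))
# [('policy', 1, 2), ('invoice', 3, 5), ('hospital_record', 6, 10)]
```

Note that non-adjacent pages of the same type produce separate ranges here; how the real service handles that case is not stated in this section.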

API Parameter Configuration

Enable File Splitting Function

Set the split_flag=true query parameter on the upload endpoint to enable file splitting:
curl -X POST \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  -F "file=@/path/to/multi-page-document.pdf" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/upload?workspace_id=<your-workspace-id>&split_flag=true"

Parameter Description

| Parameter Name | Type    | Default Value | Description                       |
|----------------|---------|---------------|-----------------------------------|
| split_flag     | boolean | false         | Whether to enable file splitting  |
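Because split_flag travels as a query-string value, it must be serialized as text. In Python, str(True) yields "True", while the curl example sends lowercase "true"; whether the endpoint also accepts "True" is not documented here, so this sketch always sends the lowercase form. build_upload_url and the workspace ID "ws-123" are hypothetical.

```python
from urllib.parse import urlencode

UPLOAD_URL = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/upload"


def build_upload_url(workspace_id, split=False):
    """Build the upload URL, serializing split_flag as a lowercase boolean string."""
    params = {
        "workspace_id": workspace_id,
        # Avoid str(split), which would produce "True"/"False"
        "split_flag": "true" if split else "false",
    }
    return f"{UPLOAD_URL}?{urlencode(params)}"


print(build_upload_url("ws-123", split=True))
# https://docflow.textin.com/api/app-api/sip/platform/v2/file/upload?workspace_id=ws-123&split_flag=true
```

Since the default is false, sending split_flag=false should behave the same as omitting the parameter.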

Example Code

import requests
import json

def upload_with_split(file_path, workspace_id, app_id, secret_code):
    """
    Upload file and enable file splitting function
    """
    url = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/upload"
    
    headers = {
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code
    }
    
    params = {
        "workspace_id": workspace_id,
        "split_flag": "true"  # Enable file splitting function
    }
    
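    with open(file_path, 'rb') as file:
        files = {'file': file}
        # Set a timeout so a stalled connection does not block indefinitely
        response = requests.post(url, headers=headers, params=params, files=files, timeout=120)
    
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error page
    return response.json()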

def fetch_split_results(workspace_id, batch_number, app_id, secret_code):
    """
    Query file splitting results
    """
    url = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch"
    
    headers = {
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code
    }
    
    params = {
        "workspace_id": workspace_id,
        "batch_number": batch_number
    }
    
    response = requests.get(url, headers=headers, params=params, timeout=60)
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error page
    return response.json()

# Usage example
if __name__ == "__main__":
    # Configuration information
    WORKSPACE_ID = "your-workspace-id"
    APP_ID = "your-app-id"
    SECRET_CODE = "your-secret-code"
    FILE_PATH = "/path/to/multi-page-document.pdf"
    
    # Upload file and enable file splitting
    upload_result = upload_with_split(FILE_PATH, WORKSPACE_ID, APP_ID, SECRET_CODE)
    print("Upload result:", json.dumps(upload_result, indent=2, ensure_ascii=False))
    
    # Get batch number
    batch_number = upload_result.get("result", {}).get("batch_number")
    
    if batch_number:
        # Query file splitting results
        fetch_result = fetch_split_results(WORKSPACE_ID, batch_number, APP_ID, SECRET_CODE)
        print("File splitting result:", json.dumps(fetch_result, indent=2, ensure_ascii=False))
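The example above calls file/fetch once, immediately after upload. If splitting and classification complete asynchronously (an assumption; this section does not state the processing model), that first fetch may return before child_files is populated. The sketch below is a generic polling loop; poll and has_child_files are hypothetical helpers, and treating "every file has a non-empty child_files list" as completion is an assumption that should be replaced by a real status field if the API provides one.

```python
import time


def poll(fetch, is_ready, interval=2.0, timeout=60.0):
    """Call fetch() until is_ready(result) is true or timeout seconds elapse.

    Returns the first ready result; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("split results not ready before timeout")
        time.sleep(interval)


def has_child_files(fetch_result):
    """Assumed readiness check: every returned file has a non-empty child_files list."""
    files = fetch_result.get("result", {}).get("files", [])
    return bool(files) and all(f.get("child_files") for f in files)
```

In the usage example, this would wrap the fetch call as poll(lambda: fetch_split_results(WORKSPACE_ID, batch_number, APP_ID, SECRET_CODE), has_child_files). A file containing a single document type might never receive child files, so keep the timeout bounded.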

Response Description

File Splitting Result Structure

When file splitting is enabled, each file returned by the file/fetch endpoint includes a child_files field that describes the sub-documents produced by the split:
{
  "code": 200,
  "result": {
    "files": [
      {
        "id": "parent-file-001",
        "name": "multi-document.pdf",
        "format": "pdf",
        "child_files": [
          {
            "id": "child-001",
            "task_id": "task-001",
            "task_type": 0,
            "name": "multi-document.pdf#1",
            "format": "pdf",
            "category": "invoice"
          },
          {
            "id": "child-002",
            "task_id": "task-002",
            "task_type": 0,
            "name": "multi-document.pdf#2",
            "format": "pdf",
            "category": "contract"
          }
        ]
      }
    ]
  }
}
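A typical next step is to route each sub-document by its category, for example sending invoices to invoice extraction. The sketch below walks the response structure shown above and groups split sub-file names by category. sub_files_by_category is a hypothetical helper, and the sample is a trimmed copy of the example response.

```python
from collections import defaultdict


def sub_files_by_category(fetch_result):
    """Map each category to the names of split sub-files in a file/fetch response."""
    groups = defaultdict(list)
    for parent in fetch_result.get("result", {}).get("files", []):
        for child in parent.get("child_files", []):
            if child.get("task_type") == 0:  # 0 = generated by file splitting
                groups[child.get("category")].append(child["name"])
    return dict(groups)


# Trimmed copy of the example response above
sample = {
    "code": 200,
    "result": {"files": [{
        "id": "parent-file-001",
        "name": "multi-document.pdf",
        "child_files": [
            {"id": "child-001", "task_type": 0, "name": "multi-document.pdf#1", "category": "invoice"},
            {"id": "child-002", "task_type": 0, "name": "multi-document.pdf#2", "category": "contract"},
        ],
    }]},
}
print(sub_files_by_category(sample))
# {'invoice': ['multi-document.pdf#1'], 'contract': ['multi-document.pdf#2']}
```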

Key Fields

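| Field Name              | Type    | Description                                                              |
|-------------------------|---------|--------------------------------------------------------------------------|
| child_files             | array   | List of sub-files produced by splitting                                  |
| child_files[].id        | string  | Unique identifier of the sub-file                                        |
| child_files[].task_type | integer | Task type; 0 indicates the sub-file was generated by file splitting      |
| child_files[].category  | string  | Document category assigned to the sub-file (e.g. invoice, contract)      |
| child_files[].pages     | string  | Page information for the sub-file, including its page numbers in the original file |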
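When consuming these fields in application code, a small typed wrapper keeps the documented fields in one place. The sketch below covers only the fields listed in the table plus name from the example response. The table describes pages as a string but does not specify its format, and the example response omits it, so the sketch stores it unparsed and optional. ChildFile and from_dict are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildFile:
    """Typed view of one child_files entry (documented fields only)."""
    id: str
    task_type: int
    category: str
    name: str = ""
    pages: Optional[str] = None  # documented, but format unspecified; kept as raw text

    @property
    def from_split(self) -> bool:
        # task_type 0 marks a sub-file generated by file splitting
        return self.task_type == 0

    @classmethod
    def from_dict(cls, d: dict) -> "ChildFile":
        # Extra keys (task_id, format, ...) are ignored
        return cls(
            id=d["id"],
            task_type=d["task_type"],
            category=d["category"],
            name=d.get("name", ""),
            pages=d.get("pages"),
        )


child = ChildFile.from_dict({
    "id": "child-001", "task_id": "task-001", "task_type": 0,
    "name": "multi-document.pdf#1", "format": "pdf", "category": "invoice",
})
print(child.from_split, child.category, child.pages)
# True invoice None
```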