Amend File Category (File Split)

Overview

After a file split task completes, you may find that a child file was assigned the wrong category or that its page range needs adjusting. The amend category API lets you change the category and page numbers of the split child files.

The API applies to child files generated by a file split task (the parent task, task_type = 2). You must first obtain the parent task’s task_id.

Use Cases

Split Category Correction: After automatic splitting, some child files were assigned the wrong category and need manual correction
Page Range Adjustment: The page ranges of the split files need adjusting, for example by merging several child files or dividing the pages differently
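
As an illustration of the page range adjustment case, the following minimal sketch merges two child file entries into one before the list is submitted. The helper name merge_child_files and the sample data are hypothetical and not part of the DocFlow API; the sketch only rearranges the category and pages entries described on this page.

```python
def merge_child_files(split_tasks, first, second, category):
    """Return a new split_tasks list with entries `first` and `second` merged.

    The merged entry takes `category` and the sorted union of both page lists.
    All other entries are copied through unchanged.
    """
    merged_pages = sorted(
        set(split_tasks[first]["pages"]) | set(split_tasks[second]["pages"])
    )
    result = []
    for index, task in enumerate(split_tasks):
        if index == first:
            # The merged entry replaces the first of the two entries
            result.append({"category": category, "pages": merged_pages})
        elif index != second:
            # Untouched entries are still included in the result
            result.append({"category": task["category"], "pages": list(task["pages"])})
    return result


# Hypothetical current split: the receipt on page 2 actually belongs to the contract
current = [
    {"category": "Contract", "pages": [0, 1]},
    {"category": "Receipt", "pages": [2]},
    {"category": "Contract", "pages": [3, 4]},
]
merged = merge_child_files(current, 0, 1, "Contract")
print(merged)
```

The result still lists every remaining child file, which is what the API expects in split_tasks.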

API Endpoint

Endpoint: POST /api/app-api/sip/platform/v2/file/amend_category

Request Parameters

Parameter	Type	Required	Description
`workspace_id`	string	Yes	Workspace ID
`task_id`	string	Yes	Parent task ID (file split task ID)
`split_tasks`	array	Yes	List of all split child files; each element contains `category` and `pages`

split_tasks Parameter Description

Parameter	Type	Required	Description
`category`	string	Yes	Child task file category
`pages`	array	Yes	Page numbers of the original file that belong to this child file, numbered from 0

Parameter Description

task_id: Parent task ID (task_type = 2), can be obtained through the file/fetch API
category: New file category name. It must be a file category already configured in the DocFlow workspace. If a child file doesn’t need a category change, submit its original category unchanged
pages: Page number array listing the pages of the original file contained in this child file. Numbering starts from 0, so [0, 1] means the first and second pages. If a child file doesn’t need a page number change, submit its original page numbers unchanged

Important: The split_tasks array must contain all split child file information, even if some child files don’t need category or page number modifications. If only partial child file information is submitted, the unlisted child files will be deleted or cause processing exceptions.
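
One way to satisfy this requirement is to always start from the complete list of current child files and override only the entries that change. The sketch below assumes the child file information has already been reduced to a list of category and pages entries; build_split_tasks and category_overrides are hypothetical names used only for illustration.

```python
def build_split_tasks(child_files, category_overrides):
    """Build a complete split_tasks list, changing only the overridden entries.

    child_files: list of {"category": str, "pages": [int, ...]}, one per child file.
    category_overrides: {index in child_files: new category name}.
    """
    split_tasks = []
    for index, child in enumerate(child_files):
        split_tasks.append({
            # Use the new category if one is given, otherwise keep the original
            "category": category_overrides.get(index, child["category"]),
            "pages": list(child["pages"]),
        })
    return split_tasks


# Hypothetical current state of three child files
children = [
    {"category": "Other", "pages": [0, 1]},
    {"category": "Contract", "pages": [2, 3, 4]},
    {"category": "Receipt", "pages": [5, 6]},
]
payload_tasks = build_split_tasks(children, {0: "Electronic Invoice (Regular)"})
print(payload_tasks)
```

Only the first entry changes category, but all three entries are present in the list that is sent.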

Example Code

# Important: Must include all split child file information
# Assuming the original file is split into 3 child files, even if you only need to modify the first child file's category,
# you must include information for all 3 child files
curl -X POST \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "1234567890",
    "task_id": "1234567890",
    "split_tasks": [
      {
        "category": "Electronic Invoice (Regular)",
        "pages": [0, 1]
      },
      {
        "category": "Contract",
        "pages": [2, 3, 4]
      },
      {
        "category": "Receipt",
        "pages": [5, 6]
      }
    ]
  }' \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/amend_category"

Get Parent Task ID and Child File Information

Before modifying the file category, you need to obtain the parent task’s task_id and child file information. You can query it through the file/fetch API:

curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&file_id=<your-file-id>"

Response

After successfully modifying the file category, the API returns a success response:

{
  "code": 200,
  "msg": "success"
}
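
A caller will usually want to check the returned code before continuing. The following is a minimal sketch that assumes failure responses use the same JSON envelope with a code other than 200; this page only documents the success response, so that shape is an assumption. AmendCategoryError and check_amend_response are hypothetical names.

```python
class AmendCategoryError(Exception):
    """Raised when the amend_category response does not report success."""


def check_amend_response(body):
    """Return the parsed response body if code is 200, otherwise raise.

    Assumption: error responses carry "code" and "msg" like the success response.
    """
    code = body.get("code")
    if code != 200:
        raise AmendCategoryError(
            f"amend_category failed: code={code}, msg={body.get('msg')}"
        )
    return body


ok = check_amend_response({"code": 200, "msg": "success"})
print(ok["msg"])
```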

Complete Example

The following is a complete example showing how to query file split task information and then modify child file categories and page numbers:

Python

import requests
import json

def get_split_task_info(workspace_id, file_id, app_id, secret_code):
    """Get file split task information"""
    url = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch"
    headers = {
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code
    }
    params = {"workspace_id": workspace_id, "file_id": file_id}
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()  # Surface HTTP-level failures before parsing JSON
    return response.json()

def amend_split_category(workspace_id, task_id, split_tasks, app_id, secret_code):
    """Amend category and page numbers for file split tasks"""
    url = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/amend_category"
    headers = {
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code,
        "Content-Type": "application/json"
    }
    payload = {
        "workspace_id": workspace_id,
        "task_id": task_id,
        "split_tasks": split_tasks
    }
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # Surface HTTP-level failures before parsing JSON
    return response.json()

# Usage example
WORKSPACE_ID = "1234567890"
FILE_ID = "202412190001"
APP_ID = "<your-app-id>"
SECRET_CODE = "<your-secret-code>"

# 1. Get file split task information
file_info = get_split_task_info(WORKSPACE_ID, FILE_ID, APP_ID, SECRET_CODE)
files = file_info.get("result", {}).get("files", [])
if files:
    file_data = files[0]
    parent_task_id = file_data.get("task_id")
    task_type = file_data.get("task_type")
    child_files = file_data.get("child_files", [])
    
    # Confirm it's a file split parent task (task_type = 2)
    if task_type == 2:
        print(f"Parent task ID: {parent_task_id}")
        print("Current child file information:")
        
        # Build split_tasks parameter
        # Important: Must include all child file information, even if some don't need modification
        split_tasks = []
        for child in child_files:
            if child.get("task_type") == 0:  # Child files generated by file split
                # Get current page number information and normalize it
                # into a plain list of page numbers
                pages_info = child.get("pages", [])
                if isinstance(pages_info, dict):
                    # If pages is a dictionary, try to extract its pages array
                    pages_info = pages_info.get("pages", [])
                if isinstance(pages_info, list):
                    # Elements may be plain page numbers or objects with a "page" field
                    pages = [p.get("page") if isinstance(p, dict) else p for p in pages_info]
                else:
                    # Unknown shape: fall back to an empty page list
                    pages = []
                
                current_category = child.get("category")
                print(f"  - Category: {current_category}, Pages: {pages}")
                
                # Example: Only modify the first child file's category, keep others unchanged
                # Note: Must include all child files, even if they don't need modification
                if len(split_tasks) == 0:
                    # Modify the first child file's category
                    split_tasks.append({
                        "category": "Electronic Invoice (Regular)",  # New category
                        "pages": pages  # Keep original page numbers
                    })
                else:
                    # Other child files keep original category and page numbers
                    split_tasks.append({
                        "category": current_category,  # Keep original category
                        "pages": pages  # Keep original page numbers
                    })
        
        # 2. Amend file category and page numbers
        # Ensure all child file information is included
        if split_tasks:
            print(f"\nPreparing to submit information for {len(split_tasks)} child files")
            result = amend_split_category(WORKSPACE_ID, parent_task_id, split_tasks, APP_ID, SECRET_CODE)
            print(f"Amendment result: {json.dumps(result, indent=2, ensure_ascii=False)}")
    else:
        print(f"This task is not a file split parent task (task_type={task_type})")

Page Number Notes

Page numbers start from 0, meaning page 1 corresponds to page number 0, page 2 corresponds to page number 1, and so on
The pages array indicates the original file page numbers contained in this child file
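
Because people usually refer to pages by their 1-based number, a small conversion helper can prevent off-by-one mistakes. The sketch below is illustrative only; to_page_indices is a hypothetical helper, not part of the API.

```python
def to_page_indices(first_page, last_page):
    """Convert an inclusive 1-based page range to a zero-based pages array."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Page n in the document corresponds to page number n - 1 in the API
    return list(range(first_page - 1, last_page))


print(to_page_indices(1, 2))  # pages 1 and 2 of the original file
print(to_page_indices(3, 5))  # pages 3 to 5 of the original file
```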

Notes

Must Include All Child Files: The split_tasks array must contain all split child file information, even if some child files don’t need category or page number modifications. If only partial child file information is submitted, the unlisted child files will be deleted or cause processing exceptions
Task Type Restriction: Only file split parent tasks (task_type = 2) support using the split_tasks parameter
Category Must Exist: The specified category must already be configured in the DocFlow workspace, otherwise an error will be returned
Category Name Matching: Category names must exactly match the configuration (case-sensitive)
Page Range: Every page number in a pages array must fall within the valid range (0 to total pages - 1) and must not be repeated within the array
No Duplicate Pages: Each page number can appear in only one child file; overlapping page ranges are not allowed
Reprocessing After Modification: After modifying file categories and page numbers, the system will reprocess data according to the new categories and page ranges
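
Several of these rules can be checked locally before the request is sent. The following sketch is one possible pre-flight check, assuming the caller already knows the total page count of the original file and the set of category names configured in the workspace. validate_split_tasks and its parameters are hypothetical, and the server remains the authority on what is accepted.

```python
def validate_split_tasks(split_tasks, total_pages, allowed_categories):
    """Return a list of problems found; an empty list means none were detected.

    Checks category names (exact, case-sensitive match), the page range,
    and duplicate pages within or across child files.
    """
    problems = []
    seen = {}  # page number -> index of the child file that first used it
    for index, task in enumerate(split_tasks):
        if task["category"] not in allowed_categories:
            problems.append(f"child {index}: unknown category {task['category']!r}")
        for page in task["pages"]:
            if not 0 <= page < total_pages:
                problems.append(
                    f"child {index}: page {page} out of range 0..{total_pages - 1}"
                )
            elif page in seen:
                problems.append(
                    f"child {index}: page {page} already used by child {seen[page]}"
                )
            else:
                seen[page] = index
    return problems


# Hypothetical input with one overlapping page and one out-of-range page
issues = validate_split_tasks(
    [
        {"category": "Contract", "pages": [0, 1]},
        {"category": "Receipt", "pages": [1, 7]},
    ],
    total_pages=3,
    allowed_categories={"Contract", "Receipt"},
)
for issue in issues:
    print(issue)
```

The check does not verify that every page of the original file is assigned to some child file, since this page does not state whether that is required.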
