A file category is a core concept in DocFlow for organizing and defining document types. Each file category can be configured with fields, tables, samples, etc., which are used for document classification and intelligent extraction.
Core Concepts
Before using the file category APIs, review the following core concepts to understand how the system works:
Sample Files
Sample files are typical example documents of the file category. DocFlow uses these samples to:
- Train classification models: Help the system identify and distinguish different types of documents
- Optimize extraction performance: Improve field extraction accuracy by learning format and layout patterns from samples
- Establish recognition templates: Provide reference benchmarks for automatically recognizing similar documents
Requirements: Each file category requires at least 1 sample file, with a maximum of 10. We recommend uploading 3-5 representative samples for best results.
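The sample-count rule above can be checked on the client before uploading. The following is a minimal sketch; the function name `check_sample_count` and the warning text are illustrative and not part of DocFlow.

```python
MIN_SAMPLES = 1   # minimum number of sample files per category
MAX_SAMPLES = 10  # maximum number of sample files per category


def check_sample_count(sample_paths):
    """Raise ValueError outside the 1-10 range; return a hint outside 3-5, else None."""
    n = len(sample_paths)
    if not MIN_SAMPLES <= n <= MAX_SAMPLES:
        raise ValueError(
            f"expected {MIN_SAMPLES}-{MAX_SAMPLES} sample files, got {n}"
        )
    if not 3 <= n <= 5:
        # Allowed, but outside the recommended range.
        return f"{n} sample file(s) accepted; 3-5 representative samples are recommended"
    return None
```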
Regular Fields
Regular fields are key pieces of information that appear in the document outside of tables. Each field consists of a field name (key) and its corresponding value. A field's value may span multiple pages or rows.
Extraction Result Location: Field information is located at result.files[].data.fields[] in the extraction result, with each field containing:
- key: Field name (e.g., “Invoice Code”, “Issue Date”)
- value: Field value (extracted text content)
- position[]: Position coordinate information in the document
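As an illustration of reading result.files[].data.fields[], the sketch below flattens each file's regular fields into a key-to-value dictionary. The helper name `fields_to_dict` and the sample payload are hypothetical; only the path and the key/value/position names come from the description above, and the contents of position[] are left empty because their format is not specified here.

```python
def fields_to_dict(response):
    """Return one {key: value} dict per file in result.files[]."""
    per_file = []
    for file_entry in response.get("result", {}).get("files", []):
        fields = file_entry.get("data", {}).get("fields", [])
        per_file.append({field["key"]: field["value"] for field in fields})
    return per_file


# Hypothetical extraction result, trimmed to the parts discussed above.
sample = {
    "result": {
        "files": [
            {
                "data": {
                    "fields": [
                        {"key": "Invoice Code", "value": "3100231130", "position": []},
                        {"key": "Issue Date", "value": "2024-01-15", "position": []},
                    ]
                }
            }
        ]
    }
}

print(fields_to_dict(sample))
# [{'Invoice Code': '3100231130', 'Issue Date': '2024-01-15'}]
```

If a category can return the same key more than once, a dictionary keeps only the last value; collecting values into a list per key would avoid that.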
Typical use cases:
- Invoice category: Invoice code, invoice number, issue date, buyer name, total amount
- Contract category: Contract number, party A name, party B name, signing date, contract amount
- ID card category: Name, gender, ethnicity, date of birth, ID number
Purpose of configuring fields:
- Explicitly tell the system which information to extract from documents
- Guide AI models for precise extraction through field descriptions and prompts
- Define data formats and validation rules for fields
Table Fields
Table fields refer to structured data in table format. DocFlow can recognize table structures in documents and convert table content into structured data format. Tables consist of multiple rows and columns, and each table can be configured with multiple fields (columns).
Extraction Result Location: Table information is located at result.files[].data.items[][] in the extraction result, using a two-dimensional array structure:
- Outer array: Represents table rows
- Inner array: Represents cells within a row
- Each cell contains key (column name), value (cell value), and position (position coordinates)
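The two-dimensional structure can be turned into one dictionary per table row. This is a sketch under the structure described above; `table_rows` and the sample data are hypothetical, and position values are omitted because their format is not specified here.

```python
def table_rows(file_entry):
    """Convert data.items[][] of one file into a list of {column: value} rows."""
    rows = []
    for row in file_entry.get("data", {}).get("items", []):
        # Each inner array is one row; each cell carries its column name in "key".
        rows.append({cell["key"]: cell["value"] for cell in row})
    return rows


# Hypothetical entry from result.files[] with a two-row item table.
sample_file = {
    "data": {
        "items": [
            [
                {"key": "Product Name", "value": "Paper", "position": []},
                {"key": "Quantity", "value": "2", "position": []},
            ],
            [
                {"key": "Product Name", "value": "Toner", "position": []},
                {"key": "Quantity", "value": "1", "position": []},
            ],
        ]
    }
}

for row in table_rows(sample_file):
    print(row)
```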
Typical use cases:
- Invoice category: Item details table (goods/services name, specification, unit, quantity, unit price, amount)
- Reimbursement form: Expense details table (expense item, date, amount, remarks)
- Order category: Order details table (product name, quantity, unit price, subtotal)
Difference between table fields and regular fields:
- Regular fields: Non-table key-value pairs, returned in result.files[].data.fields[], typically single information points in the document
- Table fields: Structured table data, returned in result.files[].data.items[][], supporting extraction of multiple rows at once
- Use cases: Regular fields are suitable for fixed information in document headers and footers; table fields are suitable for detail lists with repetitive structured information
Getting Started
This section shows how to use the file category APIs to create, list, update, and delete file categories.
Create File Category
Create a new file category by uploading at least one sample file and configuring at least one field:
curl -X POST \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
-F "workspace_id=<your-workspace-id>" \
-F "name=Invoice" \
-F "category_prompt=VAT invoice with fields such as invoice code, invoice number, etc." \
-F "extract_model=llm" \
-F "sample_files=@/path/to/invoice_sample.pdf" \
-F 'fields=[{"name":"Invoice Code","description":"Invoice code description","prompt":"Please extract the invoice code"}]' \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/create"
Request Parameters:
- workspace_id (required): Workspace ID
- name (required): File category name, max length 50
- category_prompt (optional): Prompt for classification, max length 500
- extract_model (required): Extraction model, options: llm, vlm
- sample_files (required): Sample file list; at least 1 and at most 10 sample files per category
- fields (required): Field configuration list (JSON string); at least one field is required. Table fields can only be configured in the default table (table_id=-1)
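The constraints listed above can be checked before the request is sent. The sketch below validates the inputs and returns the text form values plus the list of files to attach as sample_files parts; sending the multipart request is left to whichever HTTP client you use. The function name `build_create_form` is hypothetical, and the sketch assumes the length limits count characters.

```python
import json

ALLOWED_MODELS = {"llm", "vlm"}


def build_create_form(workspace_id, name, extract_model, sample_paths,
                      fields, category_prompt=None):
    """Validate inputs and return (form_values, sample_paths) for a multipart POST."""
    if not name or len(name) > 50:
        raise ValueError("name is required and limited to 50 characters")
    if category_prompt is not None and len(category_prompt) > 500:
        raise ValueError("category_prompt is limited to 500 characters")
    if extract_model not in ALLOWED_MODELS:
        raise ValueError("extract_model must be 'llm' or 'vlm'")
    if not 1 <= len(sample_paths) <= 10:
        raise ValueError("between 1 and 10 sample files are required")
    if not fields:
        raise ValueError("at least one field is required")

    form = {
        "workspace_id": workspace_id,
        "name": name,
        "extract_model": extract_model,
        # The fields parameter is sent as a JSON string.
        "fields": json.dumps(fields, ensure_ascii=False),
    }
    if category_prompt is not None:
        form["category_prompt"] = category_prompt
    return form, list(sample_paths)


form, files = build_create_form(
    workspace_id="<your-workspace-id>",
    name="Invoice",
    extract_model="llm",
    sample_paths=["/path/to/invoice_sample.pdf"],
    fields=[{"name": "Invoice Code",
             "description": "Invoice code description",
             "prompt": "Please extract the invoice code"}],
)
```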
Response Example:
{
  "code": 200,
  "msg": "success",
  "result": {
    "category_id": "1234567890"
  }
}
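A small helper can pull the new category ID out of a parsed response of this shape. This is a sketch: `extract_category_id` is a hypothetical name, and treating any code other than 200 as a failure is an assumption, since only the success case is shown here.

```python
def extract_category_id(response):
    """Return result.category_id from a parsed create response."""
    # Assumption: a code other than 200 means the request did not succeed.
    if response.get("code") != 200:
        raise RuntimeError(
            f"create failed: code={response.get('code')} msg={response.get('msg')}"
        )
    return response["result"]["category_id"]


ok = {"code": 200, "msg": "success", "result": {"category_id": "1234567890"}}
print(extract_category_id(ok))  # 1234567890
```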
List File Categories
Get all file categories in a workspace:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/list?workspace_id=<your-workspace-id>&page=1&page_size=20&enabled=1"
Request Parameters:
- workspace_id (required): Workspace ID
- page (optional): Page number, default is 1
- page_size (optional): Items per page, default is 1000
- enabled (optional): Status filter, options: all (All), 1 (Enabled), 0 (Disabled), 2 (Draft), default is 1
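The query string for the list endpoint can be assembled with the standard library. In this sketch, `build_list_url` is a hypothetical helper; optional parameters that are not supplied are left out of the URL so that the documented defaults apply.

```python
from urllib.parse import urlencode

LIST_URL = "https://docflow.textin.com/api/app-api/sip/platform/v2/category/list"
VALID_ENABLED = {"all", "0", "1", "2"}


def build_list_url(workspace_id, page=None, page_size=None, enabled=None):
    """Build the list URL, including only the optional parameters that were given."""
    params = {"workspace_id": workspace_id}
    if page is not None:
        params["page"] = page
    if page_size is not None:
        params["page_size"] = page_size
    if enabled is not None:
        if str(enabled) not in VALID_ENABLED:
            raise ValueError("enabled must be one of: all, 0, 1, 2")
        params["enabled"] = enabled
    # urlencode percent-encodes values and keeps the insertion order of the dict.
    return f"{LIST_URL}?{urlencode(params)}"


print(build_list_url("ws-123", page=1, page_size=20, enabled=1))
```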
Response Example:
{
  "code": 200,
  "msg": "success",
  "result": {
    "total": 10,
    "page": 1,
    "page_size": 20,
    "categories": [
      {
        "id": "1234567890",
        "name": "Invoice",
        "category_prompt": "VAT invoice with fields such as invoice code, invoice number, etc.",
        "extract_model": "llm",
        "enabled": 1
      }
    ]
  }
}
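When a workspace holds more categories than fit on one page, the pages can be walked until all entries have been seen. The sketch below assumes that total is the number of categories across all pages; `iter_categories` and the `fetch_page` callable (which would perform the HTTP call and return the parsed JSON) are hypothetical.

```python
def iter_categories(fetch_page, page_size=20):
    """Yield every category by requesting successive pages.

    fetch_page(page, page_size) must return a parsed list response.
    """
    page = 1
    seen = 0
    while True:
        result = fetch_page(page, page_size)["result"]
        batch = result["categories"]
        yield from batch
        seen += len(batch)
        # Stop on an empty page or once the reported total has been reached.
        if not batch or seen >= result["total"]:
            break
        page += 1
```

Passing the fetch function in as an argument keeps the paging logic independent of the HTTP client and makes it easy to exercise with canned responses.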
Update File Category
Update information for a specified file category:
curl -X POST \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
-H "Content-Type: application/json" \
-d '{
"workspace_id": "<your-workspace-id>",
"category_id": "1234567890",
"name": "Updated Category Name",
"category_prompt": "Updated prompt",
"enabled": 1
}' \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/update"
Request Parameters:
- workspace_id (required): Workspace ID
- category_id (required): File category ID
- name (optional): File category name, max length 50
- category_prompt (optional): Prompt for classification, max length 500
- enabled (optional): Status, 0: Disabled, 1: Enabled, 2: Draft
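The JSON body for an update can be built so that it contains the two required IDs plus only the optional parameters you supply. This is a sketch; `build_update_body` is a hypothetical helper and the length checks assume the limits count characters.

```python
import json


def build_update_body(workspace_id, category_id, name=None,
                      category_prompt=None, enabled=None):
    """Return the update request body as a JSON string."""
    body = {"workspace_id": workspace_id, "category_id": category_id}
    if name is not None:
        if len(name) > 50:
            raise ValueError("name is limited to 50 characters")
        body["name"] = name
    if category_prompt is not None:
        if len(category_prompt) > 500:
            raise ValueError("category_prompt is limited to 500 characters")
        body["category_prompt"] = category_prompt
    if enabled is not None:
        if enabled not in (0, 1, 2):
            raise ValueError("enabled must be 0 (Disabled), 1 (Enabled), or 2 (Draft)")
        body["enabled"] = enabled
    return json.dumps(body, ensure_ascii=False)


# Example: disable a category without touching its name or prompt in the body.
print(build_update_body("<your-workspace-id>", "1234567890", enabled=0))
```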
Delete File Category
Delete one or more file categories (batch deletion is supported):
curl -X POST \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
-H "Content-Type: application/json" \
-d '{
"workspace_id": "<your-workspace-id>",
"category_ids": ["1234567890", "0987654321"]
}' \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/delete"
Request Parameters:
- workspace_id (required): Workspace ID
- category_ids (required): Array of file category IDs to delete
Deleting a file category will also delete all its fields, tables, and samples. Please proceed with caution.
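Because deletion also removes fields, tables, and samples, it can help to build the request body in one guarded place. In this sketch, `build_delete_body` is a hypothetical helper; rejecting an empty ID list and dropping duplicate IDs are client-side choices made for illustration, not documented server behavior.

```python
import json


def build_delete_body(workspace_id, category_ids):
    """Return the delete request body as a JSON string."""
    # Normalize IDs to strings and drop duplicates while keeping their order.
    ids = list(dict.fromkeys(str(cid) for cid in category_ids))
    if not ids:
        raise ValueError("category_ids must contain at least one ID")
    return json.dumps({"workspace_id": workspace_id, "category_ids": ids})


print(build_delete_body("<your-workspace-id>", ["1234567890", "0987654321"]))
```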
Next Steps