Sample files are example documents for file categories, used to train and optimize document classification and extraction models. Each file category requires at least one sample file and can hold at most ten. This guide explains how to manage sample files for file categories via the API.
Upload Sample
Upload sample file(s) for a specified file category:
curl -X POST \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
-F "workspace_id=<your-workspace-id>" \
-F "category_id=<category-id>" \
-F "file=@/path/to/sample.pdf" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/sample/upload"
Request Parameters:
workspace_id (required): Workspace ID
category_id (required): File category ID
file (required): Sample file
Response Example:
{
  "code": 200,
  "msg": "success",
  "result": {
    "sample_id": "sample_123"
  }
}
Sample file requirements:
Sample files should be typical, representative documents of the category
Uploading 3-5 sample files is recommended to improve classification accuracy
Supported file formats are the same as those supported for document upload
List Samples
Get the sample list for a specified file category:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/sample/list?workspace_id=<your-workspace-id>&category_id=<category-id>&page=1&page_size=20"
Request Parameters:
workspace_id (required): Workspace ID
category_id (required): File category ID
page (optional): Page number, default is 1
page_size (optional): Items per page, default is 20, maximum is 100
Response Example:
{
  "code": 200,
  "msg": "success",
  "result": {
    "total": 5,
    "page": 1,
    "page_size": 20,
    "samples": [
      {
        "sample_id": "sample_123",
        "file_name": "invoice_sample_01.pdf"
      },
      {
        "sample_id": "sample_456",
        "file_name": "invoice_sample_02.pdf"
      }
    ]
  }
}
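The list response is often used to look up a sample ID by file name before downloading or deleting it. The following is a minimal Python sketch of that lookup, using only the standard library. The helper names (build_list_url, samples_by_name, list_samples) are hypothetical, and treating any code other than 200 as a failure is an assumption based on the response example above.

```python
import json
import urllib.parse
import urllib.request

HOST = "https://docflow.textin.com"
LIST_PATH = "/api/app-api/sip/platform/v2/category/sample/list"


def build_list_url(workspace_id, category_id, page=1, page_size=20):
    """Build the list URL with the documented query parameters."""
    query = urllib.parse.urlencode({
        "workspace_id": workspace_id,
        "category_id": category_id,
        "page": page,
        "page_size": page_size,
    })
    return f"{HOST}{LIST_PATH}?{query}"


def samples_by_name(body):
    """Map file_name -> sample_id from a decoded list response."""
    # Assumption: a failed call is reported with a code other than 200.
    if body.get("code") != 200:
        raise RuntimeError(f"list failed: {body.get('msg')}")
    samples = body.get("result", {}).get("samples", [])
    return {s["file_name"]: s["sample_id"] for s in samples}


def list_samples(app_id, secret_code, workspace_id, category_id):
    """Call the list endpoint once and return the name -> ID mapping."""
    req = urllib.request.Request(
        build_list_url(workspace_id, category_id),
        headers={"x-ti-app-id": app_id, "x-ti-secret-code": secret_code},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return samples_by_name(json.load(resp))
```

Because a category holds at most ten samples, a single page at the default page_size of 20 should cover the whole list, so this sketch does not loop over pages.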
Download Sample
Download a specified sample file:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
-o "sample.pdf" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/sample/download?workspace_id=<your-workspace-id>&category_id=<category-id>&sample_id=<sample-id>"
Request Parameters:
workspace_id (required): Workspace ID
category_id (required): File category ID
sample_id (required): Sample ID
Response: Returns the file as a binary stream (application/octet-stream)
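Since the response is a binary stream, it can be written to disk in chunks rather than loaded into memory. The sketch below shows one way to do that in Python with the standard library. The function names build_download_url and download_to_file are hypothetical, and error handling is left out because the format of a failed download response is not described here.

```python
import shutil
import urllib.parse
import urllib.request

HOST = "https://docflow.textin.com"
DOWNLOAD_PATH = "/api/app-api/sip/platform/v2/category/sample/download"


def build_download_url(workspace_id, category_id, sample_id):
    """Build the download URL with the three required query parameters."""
    query = urllib.parse.urlencode({
        "workspace_id": workspace_id,
        "category_id": category_id,
        "sample_id": sample_id,
    })
    return f"{HOST}{DOWNLOAD_PATH}?{query}"


def download_to_file(url, headers, dest_path):
    """Stream the response body at `url` into `dest_path`; return bytes written."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest_path, "wb") as out:
        # copyfileobj copies in fixed-size chunks, so large files stay off the heap.
        shutil.copyfileobj(resp, out)
        return out.tell()
```

A call would pass the URL from build_download_url together with the x-ti-app-id and x-ti-secret-code headers, mirroring the curl command above.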
Delete Sample
Delete one or more sample files in a single request:
curl -X POST \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
-H "Content-Type: application/json" \
-d '{
"workspace_id": "<your-workspace-id>",
"category_id": "<category-id>",
"sample_ids": ["sample_123", "sample_456"]
}' \
"https://docflow.textin.com/api/app-api/sip/platform/v2/category/sample/delete"
Request Parameters:
workspace_id (required): Workspace ID
category_id (required): File category ID
sample_ids (required): Array of sample IDs to delete
Sample deletion is irreversible. Please proceed with caution. Each file category must have at least one sample file.
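Because deletion cannot be undone and a category must keep at least one sample, it can help to check a batch on the client before sending it. The following Python sketch is one possible guard, not part of the API: plan_deletion and build_delete_request are hypothetical helpers, and whether the server itself rejects a request that would empty a category is not stated here.

```python
import json
import urllib.request

DELETE_URL = (
    "https://docflow.textin.com"
    "/api/app-api/sip/platform/v2/category/sample/delete"
)


def plan_deletion(existing_ids, ids_to_delete):
    """Return the IDs that can be deleted, refusing to empty the category."""
    existing = set(existing_ids)
    # Drop duplicates (keeping order) and IDs that are not in the category.
    targets = [sid for sid in dict.fromkeys(ids_to_delete) if sid in existing]
    if targets and len(targets) >= len(existing):
        raise ValueError("deletion would leave the category without samples")
    return targets


def build_delete_request(app_id, secret_code, workspace_id, category_id, sample_ids):
    """Build the POST request with the documented JSON body (not yet sent)."""
    payload = json.dumps({
        "workspace_id": workspace_id,
        "category_id": category_id,
        "sample_ids": list(sample_ids),
    }).encode("utf-8")
    return urllib.request.Request(
        DELETE_URL,
        data=payload,
        method="POST",
        headers={
            "x-ti-app-id": app_id,
            "x-ti-secret-code": secret_code,
            "Content-Type": "application/json",
        },
    )
```

The request object can then be passed to urllib.request.urlopen once the planned list has been reviewed.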
Sample Management Best Practices
Sample Selection
Choosing appropriate sample files is crucial for improving classification and extraction accuracy:
Representativeness: Select typical, representative documents of the category
Diversity: Cover the different formats and layouts that may appear in the category
Quality: Ensure sample files are clear, complete, and undamaged
Quantity: Uploading 3-5 sample files is recommended
Sample Quantity Guidelines
Minimum: At least 1 sample per category (required when creating a category)
Recommended: 3-5 samples provide good classification results
Maximum: 10 samples per category
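These limits are easy to encode as a small pre-upload check. The Python sketch below is illustrative only: the function names and the returned labels are hypothetical, and only the numbers (1, 3-5, 10) come from the guidelines above.

```python
MIN_SAMPLES = 1
RECOMMENDED_MIN, RECOMMENDED_MAX = 3, 5
MAX_SAMPLES = 10


def assess_sample_count(count):
    """Classify a category's sample count against the documented limits."""
    if count < MIN_SAMPLES:
        return "too_few"
    if count > MAX_SAMPLES:
        return "too_many"
    if count < RECOMMENDED_MIN:
        return "below_recommended"
    if count <= RECOMMENDED_MAX:
        return "recommended"
    return "above_recommended"


def remaining_slots(count):
    """Number of additional samples the category can still accept."""
    return max(0, MAX_SAMPLES - count)
```

For example, a category with 2 samples is valid but below the recommended range and can still accept 8 more.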
Sample Update Strategy
When classification or extraction results are unsatisfactory, consider:
Adding new samples: Upload more typical samples
Replacing samples: Delete atypical samples and upload better ones
Sample diversification: Ensure samples cover the various possible document formats
Example: Batch Upload Samples
import os

import requests

ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
category_id = "<category-id>"

host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/category/sample/upload"

# Sample files directory
sample_dir = "/path/to/samples"
sample_files = [f for f in os.listdir(sample_dir) if f.endswith(".pdf")]

print(f"Preparing to upload {len(sample_files)} sample files")

for filename in sample_files:
    file_path = os.path.join(sample_dir, filename)

    # Open each file in binary mode and send it as a multipart form upload
    with open(file_path, "rb") as f:
        resp = requests.post(
            url=f"{host}{url}",
            data={"workspace_id": workspace_id, "category_id": category_id},
            files={"file": f},
            headers={
                "x-ti-app-id": ti_app_id,
                "x-ti-secret-code": ti_secret_code,
            },
            timeout=60,
        )

    # A code of 200 in the JSON body indicates success
    result = resp.json()
    if result.get("code") == 200:
        sample_id = result.get("result", {}).get("sample_id")
        print(f"✓ {filename} uploaded successfully, ID: {sample_id}")
    else:
        print(f"✗ {filename} upload failed: {result.get('msg')}")

print("Sample upload completed")
Next Steps