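This article describes how to query the processing status, failure cause, and duration of DocFlow tasks. The task status shows how far a file has progressed and how processing ended.
DocFlow processes files asynchronously: once uploaded, a file is placed in a processing queue. You can query a task in several ways to check its progress, failure cause, and duration.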
Task status reference
DocFlow reports task status in the recognition_status field. The possible values are:
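| Status value | Status name | Description |
|---|---|---|
| 0 | Pending recognition | File uploaded, waiting for processing to start |
| 1 | Recognition succeeded | Processing finished and recognition succeeded |
| 2 | Recognition failed | Processing failed; the failure cause is available |
| 3 | Classifying | File classification in progress |
| 4 | Extracting | Field extraction in progress |
| 5 | Preparing | Task is in its preparation stage |
| 6 | Splitting file | File splitting in progress |
| 7 | Cropping images | Multi-image cropping in progress |
| 10 | Classification complete | Final state in classification-only mode |
| 20 | Parsing | Document parsing in progress |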
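Client code is usually easier to read when these numeric codes have names. The following is a minimal sketch, not part of the DocFlow API: the enum member names and the helpers `is_final` and `describe` are hypothetical, and treating only 1, 2, and 10 as final states is an interpretation of the table above.

```python
from enum import IntEnum


class RecognitionStatus(IntEnum):
    """Values of recognition_status, copied from the status table."""
    PENDING = 0            # uploaded, waiting to be processed
    SUCCEEDED = 1
    FAILED = 2
    CLASSIFYING = 3
    EXTRACTING = 4
    PREPARING = 5
    SPLITTING_FILE = 6
    CROPPING_IMAGES = 7
    CLASSIFIED = 10        # final state in classification-only mode
    PARSING = 20


# Assumption: only these three values mean that processing has stopped.
FINAL_STATUSES = frozenset({
    RecognitionStatus.SUCCEEDED,
    RecognitionStatus.FAILED,
    RecognitionStatus.CLASSIFIED,
})


def is_final(status: int) -> bool:
    """Return True when a file no longer needs to be polled."""
    return status in FINAL_STATUSES


def describe(status: int) -> str:
    """Return a readable name, tolerating codes not listed in the table."""
    try:
        return RecognitionStatus(status).name
    except ValueError:
        return f"UNKNOWN({status})"
```

Because `IntEnum` members compare equal to plain integers, the raw value from a response can be passed to `is_final` directly.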
Query task status
Query by batch number
Use batch_number to query the status of every task in a batch:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&batch_number=<your-batch-number>"
Query by file ID
Use file_id to query the task status of a specific file:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&file_id=<your-file-id>"
Query by task ID
Use task_id to query the status of a specific task:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&task_id=<your-task-id>"
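Because processing is asynchronous, a single query often returns files that are still in progress, so a client typically repeats the query until every file has stopped changing. The sketch below shows one way to do that. The function `wait_for_files`, its parameters, and the default interval and timeout are hypothetical. It also assumes that 1, 2, and 10 are the only final statuses. The HTTP call is passed in as `fetch_files` so that the loop itself does not depend on any particular query method.

```python
import time
from typing import Callable, Dict, List

# Assumption: 1 (succeeded), 2 (failed) and 10 (classification complete)
# are the only statuses after which a file stops changing.
FINAL_STATUSES = {1, 2, 10}


def wait_for_files(
    fetch_files: Callable[[], List[Dict]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Poll fetch_files() until every returned file has a final status.

    fetch_files is any callable that runs one of the queries above and
    returns the list found under result.files in the response.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        files = fetch_files()
        # An empty list is treated as "not ready yet", not as "done".
        if files and all(
            f.get("recognition_status") in FINAL_STATUSES for f in files
        ):
            return files
        if time.monotonic() >= deadline:
            raise TimeoutError("files did not reach a final status in time")
        sleep(interval_s)
```

In practice `fetch_files` would wrap a request to the fetch endpoint with a batch_number, file_id, or task_id; a fixed interval keeps the sketch short, and a growing interval may suit long-running batches better.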
Query task duration
After a task finishes, the duration_ms field reports how long it took, in milliseconds:
import requests

ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
batch_number = "<your-batch-number>"

host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/file/fetch"

# Fetch every file in the batch.
resp = requests.get(
    url=f"{host}{url}",
    params={"workspace_id": workspace_id, "batch_number": batch_number},
    headers={"x-ti-app-id": ti_app_id, "x-ti-secret-code": ti_secret_code},
    timeout=60,
)
resp.raise_for_status()  # stop early on HTTP-level errors
data = resp.json()

for f in (data.get("result") or {}).get("files", []):
    print(f"File ID: {f['id']}")
    print(f"File name: {f.get('name')}")
    print(f"Task status: {f.get('recognition_status')}")
    # Print the duration when present; a value of 0 is still reported.
    if f.get("duration_ms") is not None:
        duration_seconds = f["duration_ms"] / 1000
        print(f"Task duration: {duration_seconds:.2f} s")
    print("---")
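For a batch, an aggregate view of duration_ms can be more useful than per-file output. The helper below is a small sketch with a hypothetical name, `summarize_durations`, and a hypothetical result layout. It takes the list found under result.files and skips files that do not report a duration.

```python
from typing import Dict, Iterable, Optional


def summarize_durations(files: Iterable[Dict]) -> Optional[Dict]:
    """Aggregate duration_ms over the files that report it.

    Returns None when no file has a duration, for example when the
    whole batch is still being processed.
    """
    timed = [f for f in files if f.get("duration_ms") is not None]
    if not timed:
        return None
    total_ms = sum(f["duration_ms"] for f in timed)
    slowest = max(timed, key=lambda f: f["duration_ms"])
    return {
        "count": len(timed),
        "total_s": total_ms / 1000,
        "average_s": total_ms / len(timed) / 1000,
        "slowest_id": slowest["id"],
    }
```

Calling it on `data["result"]["files"]` from the query above yields the number of timed files, the total and average duration in seconds, and the ID of the slowest file.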
Failure cause analysis
When the task status is 2 (recognition failed), the failure_causes field gives the specific cause:
import requests

ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
batch_number = "<your-batch-number>"

host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/file/fetch"

# Fetch every file in the batch.
resp = requests.get(
    url=f"{host}{url}",
    params={"workspace_id": workspace_id, "batch_number": batch_number},
    headers={"x-ti-app-id": ti_app_id, "x-ti-secret-code": ti_secret_code},
    timeout=60,
)
resp.raise_for_status()  # stop early on HTTP-level errors
data = resp.json()

for f in (data.get("result") or {}).get("files", []):
    # Report only the files whose recognition failed (status 2).
    if f.get("recognition_status") == 2:
        print(f"File ID: {f['id']}")
        print(f"File name: {f.get('name')}")
        print(f"Failure cause: {f.get('failure_causes')}")
        print("---")
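When many files fail, grouping them by cause makes recurring problems easier to spot. The following sketch uses a hypothetical helper, `group_failures`. The response example in this article shows failure_causes as a single string or null; accepting a list of strings as well is an assumption made here for robustness, not documented behavior. The "unknown" label for a missing cause is also an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_failures(files: Iterable[Dict]) -> Dict[str, List[str]]:
    """Map each failure cause to the IDs of the failed files reporting it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        if f.get("recognition_status") != 2:  # 2 = recognition failed
            continue
        causes = f.get("failure_causes") or ["unknown"]
        if isinstance(causes, str):
            causes = [causes]  # normalize a single string to a list
        for cause in causes:
            groups[str(cause)].append(f["id"])
    return dict(groups)
```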
Filter by status
Pass the recognition_status parameter to return only the tasks in a given status:
curl \
-H "x-ti-app-id: <your-app-id>" \
-H "x-ti-secret-code: <your-secret-code>" \
"https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&recognition_status=2"
Response example
An example response to a task status query:
{
  "code": 200,
  "msg": "Success",
  "result": {
    "files": [
      {
        "id": "1955840505753140508",
        "task_id": "1981692246135111680",
        "name": "enterprise_credit_report.pdf",
        "format": "pdf",
        "recognition_status": 1,
        "verification_status": 0,
        "category": "credit_report",
        "duration_ms": 15000,
        "failure_causes": null
      },
      {
        "id": "1955840505753140509",
        "task_id": "1981692246135111681",
        "name": "invoice.pdf",
        "format": "pdf",
        "recognition_status": 2,
        "verification_status": 0,
        "category": "invoice",
        "duration_ms": 8000,
        "failure_causes": "Unsupported file format"
      }
    ]
  }
}
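A response of this shape can be converted into typed records before further processing. The sketch below is illustrative only: the `FileTask` dataclass and `parse_fetch_response` are hypothetical names, only a subset of the fields is kept, and treating code 200 as the success indicator is an assumption based on the example above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FileTask:
    """A subset of the per-file fields shown in the response example."""
    id: str
    task_id: Optional[str]
    name: Optional[str]
    recognition_status: Optional[int]
    duration_ms: Optional[int]
    failure_causes: Any


def parse_fetch_response(payload: Dict[str, Any]) -> List[FileTask]:
    """Convert a decoded response body into FileTask records.

    Assumption: code == 200 signals success, as in the example response.
    """
    if payload.get("code") != 200:
        raise RuntimeError(
            f"query failed: {payload.get('code')} {payload.get('msg')}"
        )
    files = (payload.get("result") or {}).get("files") or []
    return [
        FileTask(
            id=f["id"],
            task_id=f.get("task_id"),
            name=f.get("name"),
            recognition_status=f.get("recognition_status"),
            duration_ms=f.get("duration_ms"),
            failure_causes=f.get("failure_causes"),
        )
        for f in files
    ]
```

With the `requests` examples in this article, the input would be the value returned by `resp.json()`.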