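This guide explains how to query the processing status, failure cause, and duration of a DocFlow task. The task status shows how far a file has progressed and what the outcome was.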
Task status reference
DocFlow reports task status in the recognition_status field. The possible values are:
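| Value | Status | Description |
|---|---|---|
| 0 | Pending recognition | File uploaded, waiting for processing to start |
| 1 | Recognition succeeded | Processing finished and recognition succeeded |
| 2 | Recognition failed | Processing failed; the failure cause is available |
| 3 | Classifying | File classification in progress |
| 4 | Extracting | Field extraction in progress |
| 5 | Preparing | Task is in the preparation stage |
| 6 | Splitting file | File splitting in progress |
| 7 | Cropping images | Multi-image cropping in progress |
| 10 | Classification complete | Final status in classification-only mode |
| 20 | Parsing | Document parsing in progress |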
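When these codes are handled in client code, named constants are easier to read than bare integers. The following is a minimal sketch, not part of any DocFlow SDK: the `RecognitionStatus` enum, its member names, and the `is_finished` helper are hypothetical, and treating 1, 2, and 10 as the only final states is an assumption drawn from the descriptions in the table.

```python
from enum import IntEnum


class RecognitionStatus(IntEnum):
    """Hypothetical names for the recognition_status codes in the table."""
    PENDING = 0
    SUCCEEDED = 1
    FAILED = 2
    CLASSIFYING = 3
    EXTRACTING = 4
    PREPARING = 5
    SPLITTING_FILE = 6
    CROPPING_IMAGES = 7
    CLASSIFIED = 10
    PARSING = 20


# Assumption: these are the states after which no further processing happens.
FINISHED = frozenset({
    RecognitionStatus.SUCCEEDED,
    RecognitionStatus.FAILED,
    RecognitionStatus.CLASSIFIED,
})


def is_finished(status: int) -> bool:
    """Return True when the status code is one of the assumed final states."""
    return status in FINISHED
```

Because `IntEnum` members compare equal to plain integers, `is_finished` accepts the raw value from the response directly, and an unknown code simply returns False.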
Querying task status
Query by batch number
Use batch_number to query the status of every task in a batch:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&batch_number=<your-batch-number>"
Query by file ID
Use file_id to query the task status of a specific file:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&file_id=<your-file-id>"
Query by task ID
Use task_id to query the status of a specific task:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&task_id=<your-task-id>"
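The three queries differ only in which identifier is appended to the same endpoint. The sketch below builds that URL with the standard library so it can be passed to any HTTP client. The `build_fetch_url` helper is hypothetical, and requiring exactly one identifier is a choice made for this sketch that mirrors the examples; whether the API accepts combinations is not covered here.

```python
from urllib.parse import urlencode

HOST = "https://docflow.textin.com"
FETCH_PATH = "/api/app-api/sip/platform/v2/file/fetch"


def build_fetch_url(workspace_id, *, batch_number=None, file_id=None, task_id=None):
    """Build the file/fetch URL for exactly one identifier (hypothetical helper)."""
    candidates = {"batch_number": batch_number, "file_id": file_id, "task_id": task_id}
    chosen = {key: value for key, value in candidates.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("pass exactly one of batch_number, file_id, task_id")
    # urlencode percent-escapes the values, so IDs are safe to embed as-is.
    params = {"workspace_id": workspace_id, **chosen}
    return f"{HOST}{FETCH_PATH}?{urlencode(params)}"


print(build_fetch_url("1001", task_id="1981692246135111680"))
```

The x-ti-app-id and x-ti-secret-code headers still have to be attached when the request is sent.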
Querying task duration
Once a task has finished, the duration_ms field reports how long it took, in milliseconds:
import requests

# Credentials and identifiers: replace the placeholders with your own values
ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
batch_number = "<your-batch-number>"
host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/file/fetch"

# Fetch every file in the batch
resp = requests.get(
    url=f"{host}{url}",
    params={"workspace_id": workspace_id, "batch_number": batch_number},
    headers={"x-ti-app-id": ti_app_id, "x-ti-secret-code": ti_secret_code},
    timeout=60,
)
data = resp.json()
for f in data.get("result", {}).get("files", []):
    print(f"File ID: {f['id']}")
    print(f"File name: {f.get('name')}")
    print(f"Task status: {f.get('recognition_status')}")

    # Report the duration; compare against None so a value of 0 ms is still printed
    duration_ms = f.get("duration_ms")
    if duration_ms is not None:
        duration_seconds = duration_ms / 1000
        print(f"Task duration: {duration_seconds:.2f} s")

    print("---")
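For a batch with many files, an aggregate view of duration_ms is often more useful than one line per file. The following sketch works on the `files` list from the response; `summarize_durations` is a hypothetical helper, and it assumes that files without a duration_ms value should simply be left out of the statistics.

```python
def summarize_durations(files):
    """Return count, total, mean and max duration in seconds, or None if no file reports one."""
    durations = [
        f["duration_ms"] / 1000
        for f in files
        if f.get("duration_ms") is not None
    ]
    if not durations:
        return None
    return {
        "count": len(durations),
        "total_s": sum(durations),
        "mean_s": sum(durations) / len(durations),
        "max_s": max(durations),
    }


# Made-up sample data in the same shape as the "files" list.
sample = [
    {"id": "a", "duration_ms": 15000},
    {"id": "b", "duration_ms": 8000},
    {"id": "c"},  # still processing, no duration yet
]
print(summarize_durations(sample))
```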
Analyzing failure causes
When a task's status is 2 (recognition failed), the failure_causes field gives the specific reason:
import requests

# Credentials and identifiers: replace the placeholders with your own values
ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
batch_number = "<your-batch-number>"
host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/file/fetch"

# Fetch every file in the batch
resp = requests.get(
    url=f"{host}{url}",
    params={"workspace_id": workspace_id, "batch_number": batch_number},
    headers={"x-ti-app-id": ti_app_id, "x-ti-secret-code": ti_secret_code},
    timeout=60,
)
data = resp.json()
for f in data.get("result", {}).get("files", []):
    # Status 2 means recognition failed
    if f.get("recognition_status") == 2:
        print(f"File ID: {f['id']}")
        print(f"File name: {f.get('name')}")
        print(f"Failure cause: {f.get('failure_causes')}")
        print("---")
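When several files in a batch fail, grouping them by cause makes recurring problems easier to spot. This is a minimal sketch over the `files` list from the response; `group_failures` is a hypothetical helper. It treats failure_causes as an opaque value converted to a string, and the "unknown" label for a missing cause is an assumption of this sketch.

```python
from collections import defaultdict

FAILED = 2  # recognition_status value for "recognition failed"


def group_failures(files):
    """Map each failure cause to the names of the failed files that report it."""
    grouped = defaultdict(list)
    for f in files:
        if f.get("recognition_status") == FAILED:
            # A failed file may still carry a null cause; label it explicitly.
            cause = f.get("failure_causes") or "unknown"
            grouped[str(cause)].append(f.get("name"))
    return dict(grouped)


# Made-up sample data in the same shape as the "files" list.
sample_files = [
    {"name": "a.pdf", "recognition_status": 2, "failure_causes": "unsupported format"},
    {"name": "b.pdf", "recognition_status": 1, "failure_causes": None},
    {"name": "c.pdf", "recognition_status": 2, "failure_causes": None},
]
for cause, names in group_failures(sample_files).items():
    print(f"{cause}: {', '.join(names)}")
```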
Filtering by status
Pass the recognition_status parameter to return only tasks in a given status:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&recognition_status=2"
Response example
Example response to a task status query:
{
  "code": 200,
  "msg": "成功",
  "result": {
    "files": [
      {
        "id": "1955840505753140508",
        "task_id": "1981692246135111680",
        "name": "企业信用报告.pdf",
        "format": "pdf",
        "recognition_status": 1,
        "verification_status": 0,
        "category": "credit_report",
        "duration_ms": 15000,
        "failure_causes": null
      },
      {
        "id": "1955840505753140509",
        "task_id": "1981692246135111681",
        "name": "发票.pdf",
        "format": "pdf",
        "recognition_status": 2,
        "verification_status": 0,
        "category": "invoice",
        "duration_ms": 8000,
        "failure_causes": "文件格式不支持"
      }
    ]
  }
}
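Because the response carries one recognition_status per file, a client can poll the endpoint until every file in a batch has finished. The sketch below is one possible approach, not an official client: `wait_for_batch` is a hypothetical helper, `fetch` stands for any zero-argument callable that returns the parsed JSON (for example a wrapper around the request shown earlier), and it assumes that code 200 indicates success and that statuses 1, 2, and 10 are final. The interval and attempt limit are arbitrary defaults.

```python
import time

FINAL_STATUSES = {1, 2, 10}  # assumption: succeeded, failed, classification complete


def wait_for_batch(fetch, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until every file is in a final status or attempts run out."""
    for attempt in range(max_attempts):
        payload = fetch()
        if payload.get("code") != 200:
            raise RuntimeError(
                f"unexpected response: {payload.get('code')} {payload.get('msg')}"
            )
        files = (payload.get("result") or {}).get("files") or []
        # An empty list is treated as "not ready yet" rather than as finished.
        if files and all(f.get("recognition_status") in FINAL_STATUSES for f in files):
            return files
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError("batch did not reach a final status in time")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.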