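This article explains how to query the processing status, failure causes, and duration of DocFlow tasks.
DocFlow processes files asynchronously: an uploaded file enters a processing queue, and you then query its task to track progress and read the result. Several query methods are available, and each returns the status, failure cause, and duration of the matching tasks.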

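Task statuses

DocFlow reports task status in the recognition_status field. The possible values are:

| Value | Status | Description |
|-------|--------|-------------|
| 0 | Pending recognition | File uploaded, waiting for processing to start |
| 1 | Recognition succeeded | Processing finished and recognition succeeded |
| 2 | Recognition failed | Processing failed; the failure cause is available |
| 3 | Classifying | File classification in progress |
| 4 | Extracting | Field extraction in progress |
| 5 | Preparing | Task is in the preparation stage |
| 6 | Splitting file | File splitting in progress |
| 7 | Cropping images | Multi-image cropping in progress |
| 10 | Classification complete | Completion status in classification-only mode |
| 20 | Parsing | Document parsing in progress |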
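
When client code branches on these numeric codes, named constants tend to be easier to read than bare integers. The sketch below copies the values from the table into an enum. The names RecognitionStatus, TERMINAL_STATUSES, is_terminal, and describe are hypothetical and not part of DocFlow, and treating 1, 2, and 10 as the only final states is an assumption drawn from the descriptions above.

```python
from enum import IntEnum


class RecognitionStatus(IntEnum):
    """Values of recognition_status, copied from the status table."""
    PENDING = 0
    SUCCEEDED = 1
    FAILED = 2
    CLASSIFYING = 3
    EXTRACTING = 4
    PREPARING = 5
    SPLITTING_FILE = 6
    CROPPING_IMAGES = 7
    CLASSIFIED = 10
    PARSING = 20


# Assumption: only these states are final; every other state is still in progress.
TERMINAL_STATUSES = frozenset({
    RecognitionStatus.SUCCEEDED,
    RecognitionStatus.FAILED,
    RecognitionStatus.CLASSIFIED,
})


def is_terminal(status: int) -> bool:
    """Return True if the status code marks a finished task."""
    return status in TERMINAL_STATUSES


def describe(status: int) -> str:
    """Return the enum name for a known code, or a fallback for an unknown one."""
    try:
        return RecognitionStatus(status).name
    except ValueError:
        return f"UNKNOWN({status})"
```

The fallback in describe keeps a client from crashing if the service ever returns a code that is not listed in the table.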

Querying task status

Query by batch number

Use batch_number to query the status of every task in a batch:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&batch_number=<your-batch-number>"

Query by file ID

Use file_id to query the task status of a specific file:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&file_id=<your-file-id>"

Query by task ID

Use task_id to query the status of a specific task:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&task_id=<your-task-id>"

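Because processing is asynchronous, a client usually repeats one of the queries above until the files stop changing state. The following is a minimal polling sketch, not an official client: wait_for_files and its parameters are hypothetical names, the caller supplies the function that performs the actual HTTP query, and treating statuses 1, 2, and 10 as final is an assumption based on the status descriptions.

```python
import time
from typing import Callable

# Assumption: 1 (succeeded), 2 (failed), and 10 (classification complete) are final.
TERMINAL = {1, 2, 10}


def wait_for_files(
    fetch_files: Callable[[], list],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call fetch_files() until every file has a final status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        files = fetch_files()
        # An empty list means nothing is visible yet, so keep waiting.
        if files and all(f.get("recognition_status") in TERMINAL for f in files):
            return files
        if time.monotonic() >= deadline:
            raise TimeoutError("files did not reach a final status in time")
        sleep(interval_s)
```

In practice fetch_files would wrap a request to the file/fetch endpoint and return the files list from the response. Passing the fetch function in keeps the loop independent of the HTTP library and makes it easy to exercise with canned data.
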
Querying task duration

After a task finishes, its duration is available in the duration_ms field (unit: milliseconds):
import requests

ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
batch_number = "<your-batch-number>"

host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/file/fetch"

resp = requests.get(
    url=f"{host}{url}",
    params={"workspace_id": workspace_id, "batch_number": batch_number},
    headers={"x-ti-app-id": ti_app_id, "x-ti-secret-code": ti_secret_code},
    timeout=60,
)
resp.raise_for_status()  # stop early on an HTTP error

data = resp.json()
# "result" may be missing or null, so fall back to an empty dict
for f in (data.get("result") or {}).get("files", []):
    print(f"File ID: {f['id']}")
    print(f"File name: {f.get('name')}")
    print(f"Task status: {f.get('recognition_status')}")

    # duration_ms is in milliseconds; compare with None so a value of 0 still prints
    if f.get("duration_ms") is not None:
        duration_seconds = f["duration_ms"] / 1000
        print(f"Task duration: {duration_seconds:.2f} s")

    print("---")
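
For a whole batch, aggregate figures are often more useful than per-file durations. The sketch below summarizes duration_ms over a list of file records shaped like the entries the loop above iterates over. summarize_durations is a hypothetical helper and the sample records are made up for illustration.

```python
from statistics import mean


def summarize_durations(files: list) -> dict:
    """Summarize duration_ms (milliseconds) over the files that report one."""
    durations = [f["duration_ms"] for f in files if f.get("duration_ms") is not None]
    if not durations:
        return {"count": 0}
    return {
        "count": len(durations),
        "total_s": sum(durations) / 1000,
        "mean_s": mean(durations) / 1000,
        "max_s": max(durations) / 1000,
    }


# Illustrative records only; real ones come from the file/fetch response.
sample = [
    {"id": "a", "duration_ms": 15000},
    {"id": "b", "duration_ms": 8000},
    {"id": "c", "duration_ms": None},  # no duration reported for this file
]
print(summarize_durations(sample))
# {'count': 2, 'total_s': 23.0, 'mean_s': 11.5, 'max_s': 15.0}
```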

Analyzing failure causes

When a task's status is 2 (recognition failed), the failure_causes field gives the specific reason:
import requests

ti_app_id = "<your-app-id>"
ti_secret_code = "<your-secret-code>"
workspace_id = "<your-workspace-id>"
batch_number = "<your-batch-number>"

host = "https://docflow.textin.com"
url = "/api/app-api/sip/platform/v2/file/fetch"

resp = requests.get(
    url=f"{host}{url}",
    params={"workspace_id": workspace_id, "batch_number": batch_number},
    headers={"x-ti-app-id": ti_app_id, "x-ti-secret-code": ti_secret_code},
    timeout=60,
)
resp.raise_for_status()  # stop early on an HTTP error

data = resp.json()
# "result" may be missing or null, so fall back to an empty dict
for f in (data.get("result") or {}).get("files", []):
    # 2 means recognition failed
    if f.get("recognition_status") == 2:
        print(f"File ID: {f['id']}")
        print(f"File name: {f.get('name')}")
        print(f"Failure cause: {f.get('failure_causes')}")
        print("---")
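
When many files fail, grouping them by cause makes recurring problems easier to spot. This is a minimal sketch under two assumptions: failure_causes is either a string or null, as in the response example on this page, and a missing cause is reported as "unknown". group_failures is a hypothetical helper name.

```python
from collections import defaultdict

FAILED = 2  # recognition_status value for a failed task


def group_failures(files: list) -> dict:
    """Map each failure cause to the names of the files that failed with it."""
    grouped = defaultdict(list)
    for f in files:
        if f.get("recognition_status") != FAILED:
            continue
        # Assumption: failure_causes is a string or null; null becomes "unknown".
        cause = str(f.get("failure_causes") or "unknown")
        # Fall back to the file ID when no name is present.
        grouped[cause].append(f.get("name") or f.get("id"))
    return dict(grouped)
```

The function works on the files list returned by the query, so it can be applied to the result of the request shown above without further changes.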

Filtering by status

Pass the recognition_status parameter to return only tasks in a given status:
curl \
  -H "x-ti-app-id: <your-app-id>" \
  -H "x-ti-secret-code: <your-secret-code>" \
  "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch?workspace_id=<your-workspace-id>&recognition_status=2"

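The same query string can be assembled in code rather than by hand. The sketch below builds the URL with the standard library; build_fetch_url is a hypothetical helper. The examples on this page show each filter used on its own, so combining batch_number with recognition_status in one request is an assumption that should be checked against the API reference.

```python
from urllib.parse import urlencode

FETCH_URL = "https://docflow.textin.com/api/app-api/sip/platform/v2/file/fetch"


def build_fetch_url(workspace_id: str, **filters) -> str:
    """Build a file/fetch URL, skipping any filter whose value is None."""
    params = {"workspace_id": workspace_id}
    params.update({k: v for k, v in filters.items() if v is not None})
    # urlencode percent-encodes values, so IDs with special characters stay valid
    return f"{FETCH_URL}?{urlencode(params)}"


# Placeholder IDs for illustration only.
print(build_fetch_url("ws-1", recognition_status=2))
print(build_fetch_url("ws-1", batch_number="b-42", recognition_status=2))
```
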
Response example

A sample response to a task status query:
{
  "code": 200,
  "msg": "成功",
  "result": {
    "files": [
      {
        "id": "1955840505753140508",
        "task_id": "1981692246135111680",
        "name": "企业信用报告.pdf",
        "format": "pdf",
        "recognition_status": 1,
        "verification_status": 0,
        "category": "credit_report",
        "duration_ms": 15000,
        "failure_causes": null
      },
      {
        "id": "1955840505753140509",
        "task_id": "1981692246135111681",
        "name": "发票.pdf",
        "format": "pdf",
        "recognition_status": 2,
        "verification_status": 0,
        "category": "invoice",
        "duration_ms": 8000,
        "failure_causes": "文件格式不支持"
      }
    ]
  }
}
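
A response of this shape can be turned into typed records before further processing. The sketch below is one possible approach: FileStatus and parse_files are hypothetical names, only a subset of the fields is modeled, and treating code 200 as the sole success value is an assumption based on the sample above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStatus:
    id: str
    task_id: str
    name: str
    recognition_status: int
    duration_ms: Optional[int]
    failure_causes: Optional[str]


def parse_files(payload: dict) -> list:
    """Convert a file/fetch response body into FileStatus records."""
    # Assumption: code 200 signals success, as in the sample response.
    if payload.get("code") != 200:
        raise RuntimeError(f"fetch failed: {payload.get('code')} {payload.get('msg')}")
    files = (payload.get("result") or {}).get("files") or []
    return [
        FileStatus(
            id=f["id"],
            task_id=f.get("task_id", ""),
            name=f.get("name", ""),
            recognition_status=f["recognition_status"],
            duration_ms=f.get("duration_ms"),
            failure_causes=f.get("failure_causes"),
        )
        for f in files
    ]


# A shortened body in the same shape as the sample response.
body = json.loads(
    '{"code": 200, "msg": "ok", "result": {"files": [{"id": "1955840505753140508",'
    ' "task_id": "1981692246135111680", "name": "report.pdf",'
    ' "recognition_status": 1, "duration_ms": 15000, "failure_causes": null}]}}'
)
records = parse_files(body)
print(records[0].recognition_status, records[0].duration_ms)
```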