Extract Specific Fields

For tasks that have completed extraction, you can use the Extract Specific Fields API to extract additional fields or re-extract individual existing fields. The response contains the complete extraction results for all fields of the task, not only the fields named in the request.

Features

Extract Additional Fields: Extract new fields for tasks that have already completed extraction
Re-extract Fields: Re-extract existing fields to correct or improve their results
Table Field Support: Extract specific fields from tables
Complete Results: Returns the extraction results of all fields, in the same structure as the response of /api/app-api/sip/platform/v2/file/fetch

Field Extraction Rules

The API applies a different extraction strategy depending on whether a field exists in the original classification configuration:

Additional Fields (Fields Not in Original Results)

For fields that do not exist in the original extraction results (additional fields), the system decides how to extract them based on the request:

If a prompt is provided in the request, it will be used to guide field extraction
If no prompt is provided in the request, the default extraction logic will be used

Use Case: Extracting new fields that are not defined in the classification configuration.

Configured Fields (Fields Already in Original Classification)

For fields that already exist in the original classification configuration, the system extracts them with the settings from the classification configuration:

Use the prompt from the classification configuration (if configured)
Apply post-processing rules from the classification configuration
Ignore the prompt parameter passed in the request

Use Case: Re-extracting existing fields with the same classification configuration rules, so that the new results stay consistent with the original extraction.
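The two rules above can be summarized as a small decision function. This is a client-side sketch for reasoning about a request, not the service's implementation: `prompt_source` is a hypothetical helper, `configured_keys` is assumed to be the set of field names in the task's classification configuration (which you would need to know yourself), and "Buyer Email" is a made-up additional field.

```python
def prompt_source(field: dict, configured_keys: set[str]) -> str:
    """Predict which instructions the service applies to one requested field.

    Hypothetical client-side illustration of the documented rules only;
    the service makes the real decision.
    """
    if field["key"] in configured_keys:
        # Configured field: the prompt in the request is ignored.
        return "classification configuration"
    if field.get("prompt"):
        # Additional field with a prompt: the prompt guides extraction.
        return "request prompt"
    # Additional field without a prompt.
    return "default extraction logic"


configured = {"Invoice Code", "Invoice Date"}
print(prompt_source({"key": "Invoice Date", "prompt": "Keep only the year part"}, configured))
print(prompt_source({"key": "Buyer Email", "prompt": "Lowercase the address"}, configured))
print(prompt_source({"key": "Buyer Email"}, configured))
```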

Usage Recommendations

Extract New Fields: Provide a prompt in the request, and the system will use that prompt for extraction
Re-extract Existing Fields: Simply specify the key, and the system will automatically use the rules from the classification configuration; no need to provide a prompt in the request
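A minimal sketch of a request-body builder that follows these recommendations: additional fields carry a prompt when you have one, and configured fields are sent by key only. `build_extract_payload` and its parameters are hypothetical names, and the sketch covers only the top-level `fields` array, not `tables`.

```python
def build_extract_payload(workspace_id, task_id, new_fields=None, reextract_keys=None):
    """Build a request body following the usage recommendations (hypothetical helper).

    new_fields: dict mapping an additional field name to its prompt (None for no prompt).
    reextract_keys: names of configured fields to re-extract; sent as key only,
                    because request prompts are ignored for configured fields.
    """
    fields = []
    for key, prompt in (new_fields or {}).items():
        entry = {"key": key}
        if prompt:
            entry["prompt"] = prompt
        fields.append(entry)
    for key in reextract_keys or []:
        fields.append({"key": key})

    payload = {"workspace_id": workspace_id, "task_id": task_id}
    if fields:
        payload["fields"] = fields
    return payload


payload = build_extract_payload(
    "1234567890",
    "202412190001",
    new_fields={"Buyer Email": "Lowercase the address"},  # hypothetical additional field
    reextract_keys=["Invoice Code"],
)
print(payload)
```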

API Endpoint

Endpoint: POST /api/app-api/sip/platform/v2/file/extract_fields

Request Parameters:

Parameter	Type	Required	Description
workspace_id	string	Yes	Workspace ID
task_id	string	Yes	Task ID
fields	array	No	List of fields to extract, each field contains `key` (field name) and `prompt` (field hint, optional)
tables	array	No	List of table fields to extract, each table contains `name` (table name) and `fields` (field list)
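Before sending a request, a client may want to catch obvious shape errors. The sketch below checks only what the parameter table describes (the two required IDs, a `key` on each field, and a `name` and `fields` on each table); `validate_extract_request` is a hypothetical helper, and the service may enforce rules that it does not know about.

```python
def validate_extract_request(body: dict) -> list[str]:
    """Return problems found in a request body (empty list if none).

    Hypothetical client-side check based only on the parameter table.
    """
    problems = []
    for name in ("workspace_id", "task_id"):
        if not isinstance(body.get(name), str) or not body.get(name):
            problems.append(f"{name} must be a non-empty string")

    def check_fields(fields, where):
        # Each entry needs a non-empty "key"; "prompt" is optional.
        if not isinstance(fields, list):
            problems.append(f"{where} must be an array")
            return
        for i, field in enumerate(fields):
            if not isinstance(field, dict) or not field.get("key"):
                problems.append(f"{where}[{i}] is missing 'key'")

    if "fields" in body:
        check_fields(body["fields"], "fields")
    if "tables" in body:
        if not isinstance(body["tables"], list):
            problems.append("tables must be an array")
        else:
            for t, table in enumerate(body["tables"]):
                if not isinstance(table, dict) or not table.get("name"):
                    problems.append(f"tables[{t}] is missing 'name'")
                    continue
                check_fields(table.get("fields"), f"tables[{t}].fields")
    return problems
```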

Field Structure

ExtractFieldReqVO:

{
  "key": "Invoice Date",        // Field name
  "prompt": "Keep only the year part"  // Field hint (optional)
}

Table Structure:

{
  "name": "Table1",        // Table name
  "fields": [              // Field list
    {
      "key": "Goods Name",
      "prompt": "Extract full product name"
    }
  ]
}
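The two structures above can be modeled with Python dataclasses so that request entries are built consistently. `ExtractField` and `TableSpec` are hypothetical client-side names (the service's own type for a field is `ExtractFieldReqVO`); `to_dict` omits `prompt` when it is unset, matching the request examples that send only a `key`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExtractField:
    """Hypothetical mirror of the field structure: `key` plus optional `prompt`."""
    key: str
    prompt: Optional[str] = None

    def to_dict(self) -> dict:
        # Leave "prompt" out entirely when it is not set.
        d = {"key": self.key}
        if self.prompt is not None:
            d["prompt"] = self.prompt
        return d


@dataclass
class TableSpec:
    """Hypothetical mirror of the table structure: table `name` and its `fields`."""
    name: str
    fields: list[ExtractField] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"name": self.name, "fields": [f.to_dict() for f in self.fields]}


table = TableSpec("Table1", [
    ExtractField("Goods Name", "Extract full product name"),
    ExtractField("Unit Price"),
])
print(table.to_dict())
```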

Example Code

import requests

def extract_specific_fields(workspace_id, task_id, app_id, secret_code):
    """Extract specific fields"""
    
    host = "https://docflow.textin.com"
    url = "/api/app-api/sip/platform/v2/file/extract_fields"
    
    # Request body
    payload = {
        "workspace_id": workspace_id,
        "task_id": task_id,
        "fields": [
            {
                "key": "Invoice Code",
                "prompt": "Extract complete invoice code"
            },
            {
                "key": "Invoice Date",
                "prompt": "Keep only the year part"
            }
        ],
        "tables": [
            {
                "name": "Table1",
                "fields": [
                    {
                        "key": "Goods Name",
                        "prompt": "Extract full product name"
                    },
                    {
                        "key": "Unit Price"
                    }
                ]
            }
        ]
    }
    
    resp = requests.post(
        f"{host}{url}",
        json=payload,
        headers={
            "x-ti-app-id": app_id,
            "x-ti-secret-code": secret_code,
            "Content-Type": "application/json"
        },
        timeout=60,
    )
    
    if resp.status_code != 200:
        print(f"Request failed: {resp.status_code}")
        print(f"Error message: {resp.text}")
        return None
    
    data = resp.json()
    
    if data.get("code") != 200:
        print(f"API returned error: {data.get('message')}")
        return None
    
    # Process returned results
    result = data.get("result", {})
    files = result.get("files", [])
    
    for file in files:
        print(f"File name: {file.get('name')}")
        print(f"Task ID: {file.get('task_id')}")
        
        # Extract field information
        file_data = file.get("data", {})
        fields = file_data.get("fields", [])
        
        if fields:
            print("\n=== Field Information ===")
            for field in fields:
                key = field.get("key", "")
                value = field.get("value", "")
                positions = field.get("position", [])
                
                print(f"Field: {key}")
                print(f"Value: {value}")
                
                # Display position information
                for i, pos in enumerate(positions):
                    page = pos.get("page", 0)
                    vertices = pos.get("vertices", [])
                    print(f"  Position {i+1} (Page {page+1}): {vertices}")
                print("-" * 30)
        
        # Extract table information
        tables = file_data.get("tables", [])
        if tables:
            print("\n=== Table Information ===")
            for table in tables:
                table_name = table.get("tableName", "")
                print(f"Table name: {table_name}")
                items = table.get("items", [])
                for row_idx, row in enumerate(items):
                    print(f"  Row {row_idx + 1}:")
                    for cell in row:
                        print(f"    {cell.get('key')}: {cell.get('value')}")
    
    return data

# Usage example
if __name__ == "__main__":
    workspace_id = "<your-workspace-id>"
    task_id = "<your-task-id>"
    app_id = "<your-app-id>"
    secret_code = "<your-secret-code>"
    
    result = extract_specific_fields(workspace_id, task_id, app_id, secret_code)
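The example prints each position's `vertices` as a flat list. If you need a bounding box instead, a helper like the one below works, assuming (as the sample response suggests) that the list holds (x, y) pairs for the corners of the region. `vertices_to_bbox` is a hypothetical name, and this page does not state the coordinate units.

```python
def vertices_to_bbox(vertices):
    """Convert a flat list [x1, y1, x2, y2, ...] to (left, top, right, bottom).

    Assumes the list holds (x, y) coordinate pairs for the region's corners.
    """
    if len(vertices) < 2 or len(vertices) % 2:
        raise ValueError("expected an even, non-empty list of coordinates")
    xs = vertices[0::2]  # every even index is an x coordinate
    ys = vertices[1::2]  # every odd index is a y coordinate
    return min(xs), min(ys), max(xs), max(ys)


print(vertices_to_bbox([100, 150, 200, 150, 200, 180, 100, 180]))
# (100, 150, 200, 180)
```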

Request Examples

Extract Basic Fields Only

{
  "workspace_id": "1234567890",
  "task_id": "202412190001",
  "fields": [
    {
      "key": "Invoice Code"
    },
    {
      "key": "Invoice Date",
      "prompt": "Keep only the year part"
    }
  ]
}

Extract Table Fields Only

{
  "workspace_id": "1234567890",
  "task_id": "202412190001",
  "tables": [
    {
      "name": "Table1",
      "fields": [
        {
          "key": "Goods Name"
        },
        {
          "key": "Unit Price"
        },
        {
          "key": "Quantity"
        }
      ]
    }
  ]
}

Extract Both Basic Fields and Table Fields

{
  "workspace_id": "1234567890",
  "task_id": "202412190001",
  "fields": [
    {
      "key": "Invoice Code"
    },
    {
      "key": "Invoice Date"
    }
  ],
  "tables": [
    {
      "name": "Table1",
      "fields": [
        {
          "key": "Goods Name",
          "prompt": "Extract full product name"
        },
        {
          "key": "Unit Price"
        }
      ]
    }
  ]
}

Return Data Example

{
  "code": 200,
  "message": "success",
  "result": {
    "total": 1,
    "page": 1,
    "page_size": 20,
    "files": [
      {
        "id": "202412190001",
        "task_id": "202412190001",
        "name": "invoice.pdf",
        "recognition_status": 1,
        "data": {
          "fields": [
            {
              "key": "Invoice Code",
              "value": "3100231130",
              "position": [
                {
                  "page": 0,
                  "vertices": [100, 150, 200, 150, 200, 180, 100, 180]
                }
              ]
            },
            {
              "key": "Invoice Date",
              "value": "2024",
              "position": [
                {
                  "page": 0,
                  "vertices": [400, 150, 500, 150, 500, 180, 400, 180]
                }
              ]
            }
          ],
          "tables": [
            {
              "tableName": "Table1",
              "tableType": "0",
              "items": [
                [
                  {
                    "key": "Goods Name",
                    "value": "Electronic Computer Microcomputer Host",
                    "position": [
                      {
                        "page": 0,
                        "vertices": [100, 300, 400, 300, 400, 330, 100, 330]
                      }
                    ]
                  },
                  {
                    "key": "Unit Price",
                    "value": "5000.00",
                    "position": [
                      {
                        "page": 0,
                        "vertices": [500, 300, 600, 300, 600, 330, 500, 330]
                      }
                    ]
                  }
                ]
              ]
            }
          ]
        }
      }
    ]
  }
}
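To work with a response like the one above in application code, it can help to flatten it into plain dictionaries. A sketch, with `summarize_results` as a hypothetical helper: it maps each file name to its basic field values and to the rows of each table, keyed by `tableName`. If the same key appears more than once, the last value wins, which is a simplification.

```python
def summarize_results(response: dict) -> dict:
    """Flatten an extract_fields response (hypothetical helper).

    Returns {file name: {"fields": {key: value}, "tables": {tableName: [row dicts]}}}.
    """
    summary = {}
    for file in response.get("result", {}).get("files", []):
        data = file.get("data", {})
        fields = {f.get("key"): f.get("value") for f in data.get("fields", [])}
        tables = {}
        for table in data.get("tables", []):
            # Each item is one row: a list of cells with "key" and "value".
            rows = [
                {cell.get("key"): cell.get("value") for cell in row}
                for row in table.get("items", [])
            ]
            tables[table.get("tableName")] = rows
        summary[file.get("name")] = {"fields": fields, "tables": tables}
    return summary


# Trimmed from the return data example above.
sample = {
    "code": 200,
    "result": {"files": [{
        "name": "invoice.pdf",
        "data": {
            "fields": [{"key": "Invoice Code", "value": "3100231130"}],
            "tables": [{"tableName": "Table1",
                        "items": [[{"key": "Unit Price", "value": "5000.00"}]]}],
        },
    }]},
}
print(summarize_results(sample))
```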

Notes

Task Status: This API is only applicable to tasks that have completed extraction (recognition_status is 1 or 2)
Field Name: For configured fields, the key must match the field name in the configured field template; for additional fields, any custom name can be used
Field Hint (prompt):
- For additional fields (not in original results), prompt will take effect and can be used to guide extraction logic
- For configured fields (already in original classification), prompt will be ignored, and the system will use rules from the classification configuration
Return Results: The API returns complete extraction results of all fields, including previously extracted fields and newly extracted fields
Table Name: The name in tables needs to match the actual table name in the document
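Two of these notes translate into simple client-side checks: whether a file's `recognition_status` allows field extraction (for example, a file object from the fetch response, which has the same structure), and whether the requested basic fields came back in the response. Both helpers are hypothetical. Whether a field with no value is returned empty or omitted is not specified here, so treat a missing key as a hint to inspect the response rather than a definite error.

```python
COMPLETED_STATUSES = {1, 2}  # recognition_status values listed in the Notes


def is_ready_for_field_extraction(file_info: dict) -> bool:
    """True if the file's recognition_status says extraction has completed."""
    return file_info.get("recognition_status") in COMPLETED_STATUSES


def missing_requested_keys(file_info: dict, requested_keys) -> list:
    """Requested basic-field keys that do not appear in one returned file."""
    returned = {f.get("key") for f in file_info.get("data", {}).get("fields", [])}
    return [k for k in requested_keys if k not in returned]


info = {
    "recognition_status": 1,
    "data": {"fields": [{"key": "Invoice Code", "value": "3100231130"}]},
}
print(is_ready_for_field_extraction(info))
print(missing_requested_keys(info, ["Invoice Code", "Invoice Date"]))
```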

Related Pages

Quick Start - Quick start guide for document extraction functionality
Basic Field Information - Basic field information structure and processing methods
Table Field Information - Table field information structure and processing methods
