Extract Document Field

The "Extract Document Field" endpoint extracts a specific field or piece of information from a document stored in Google Drive. You specify the document ID, the field name, and, optionally, instructions on how the answer should be phrased or formatted. The optional batch parameter controls whether the extraction runs on its own or is grouped with other extraction tasks.

Endpoint

POST /v1/drive/document-field

Parameters

  • document_id (required, string): The ID of the document you want to extract information from. This ID can be obtained from the Document Index or directly from the Google Drive URL.

    Example URL: https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view

    In this example, the document ID is 1X2Y3Z4A5B6C7D8E9F.

  • field_name (required, string): The specific field or piece of information you want to extract (e.g., "total revenue", "contract end date").

  • instructions (optional, string): Additional instructions on how the AI should answer or extract the information (e.g., "Provide the date in YYYY-MM-DD format").

  • batch (optional, integer): Specifies how this extraction task is grouped with others.

    • 0 runs this extraction in isolation, as its own LLM task.

    • 1 is the default and groups this extraction into a single LLM task with the other extractions in group 1.

    • 2 groups this extraction with the other extractions in group 2, and so on for higher numbers.
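As a minimal sketch of how a client might assemble the request body from the parameters above, the Python function below checks the two required fields and leaves the optional ones out when they are not supplied, so the server's default of batch 1 applies. The name `build_extract_field_payload` is hypothetical and not part of any published SDK, and rejecting negative or non-integer batch values is an assumption: this page only describes the values 0, 1, 2, and so on.

```python
from typing import Any, Dict, Optional


def build_extract_field_payload(
    document_id: str,
    field_name: str,
    instructions: Optional[str] = None,
    batch: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request body for POST /v1/drive/document-field (hypothetical helper)."""
    if not document_id:
        raise ValueError("document_id is required")
    if not field_name:
        raise ValueError("field_name is required")
    payload: Dict[str, Any] = {"document_id": document_id, "field_name": field_name}
    if instructions is not None:
        payload["instructions"] = instructions
    # Compare against None rather than truthiness so batch=0 is kept.
    if batch is not None:
        # Assumption: batch must be a non-negative int (bool is rejected explicitly,
        # since it is a subclass of int in Python).
        if isinstance(batch, bool) or not isinstance(batch, int) or batch < 0:
            raise ValueError("batch must be a non-negative integer")
        payload["batch"] = batch
    return payload
```

Omitting `batch` rather than sending `1` keeps the client aligned with whatever default the server applies.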

Note: Grouping many fields into a single extraction task makes the task more complex for the AI and can lower the quality of the results. For complex fields, consider running the extraction in isolation (batch 0) to get more accurate answers.
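The grouping rule can be modeled as follows. This is an illustrative client-side sketch under two assumptions: grouping applies to the fields requested for one document, and the service forms exactly one LLM task per non-zero batch number. The function `plan_llm_tasks` is hypothetical; the service's real scheduling is not described on this page. Fields that omit batch should be passed with 1, the documented default.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_llm_tasks(fields: List[Tuple[str, int]]) -> List[List[str]]:
    """Return the field names that would share each LLM task under the batch rules.

    Hypothetical model: batch 0 means one task per field; every other batch
    number collects all of its fields into one shared task.
    """
    tasks: List[List[str]] = []
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, batch in fields:
        if batch == 0:
            tasks.append([name])  # isolated extraction
        else:
            groups[batch].append(name)
    # Append grouped tasks in batch-number order so the result is deterministic.
    for batch in sorted(groups):
        tasks.append(groups[batch])
    return tasks
```

For the fields `[("total revenue", 0), ("contract end date", 1), ("governing law", 1), ("counterparty", 2)]`, this model yields three tasks: total revenue on its own, contract end date together with governing law, and counterparty alone in group 2.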

Note: The user must be logged in with an account that is authorized to access the specified document in Google Drive.

Example Request

POST /v1/drive/document-field
{
  "document_id": "1X2Y3Z4A5B6C7D8E9F",
  "field_name": "total revenue",
  "instructions": "Provide the amount in USD",
  "batch": 0
}
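To send the example request from Python, you can build it with the standard library's `urllib.request`. The base URL `https://api.example.com` is a placeholder, since this page does not name the API host, and authentication headers are left to the caller because the login mechanism is not specified here. The function name `make_extract_field_request` is hypothetical. The sketch only builds the request; the commented lines show how it might be sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Placeholder host: this page does not name the API's base URL.
BASE_URL = "https://api.example.com"


def make_extract_field_request(
    payload: Dict[str, Any],
    base_url: str = BASE_URL,
    extra_headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build, without sending, a POST to /v1/drive/document-field with a JSON body."""
    headers = {"Content-Type": "application/json"}
    # Authentication is not specified on this page; supply whatever your
    # deployment expects (for example a session cookie or token) here.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/drive/document-field",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = make_extract_field_request({
    "document_id": "1X2Y3Z4A5B6C7D8E9F",
    "field_name": "total revenue",
    "instructions": "Provide the amount in USD",
    "batch": 0,
})
# Sending it would look like this (requires a reachable host and valid credentials):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```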

Example Response

{
  "document_id": "1X2Y3Z4A5B6C7D8E9F",
  "field_name": "total revenue",
  "value": "5,000,000 USD",
  "instructions": "Provide the amount in USD",
  "batch": 0
}
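A caller might read the response like this. Because the response echoes `document_id` and `field_name`, the sketch checks them against what was requested before returning `value`. The helper `parse_usd_amount` assumes the "<number> USD" shape produced by the instructions in this example; other instructions will produce other formats, so treat it as illustrative only. Both function names are hypothetical.

```python
import json
from decimal import Decimal


def read_extracted_value(body: str, expected_document_id: str, expected_field: str) -> str:
    """Return "value" from a response body after checking the echoed request fields."""
    data = json.loads(body)
    if data.get("document_id") != expected_document_id or data.get("field_name") != expected_field:
        raise ValueError("response does not match the request")
    return data["value"]


def parse_usd_amount(value: str) -> Decimal:
    """Convert e.g. "5,000,000 USD" to Decimal("5000000").

    Assumes the "<digits with commas> USD" format from the example response.
    """
    number, _, currency = value.strip().rpartition(" ")
    if currency != "USD" or not number:
        raise ValueError(f"unexpected format: {value!r}")
    return Decimal(number.replace(",", ""))
```

Using `Decimal` rather than `float` avoids rounding surprises if the figures are later summed across documents.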

Usage Notes

  • Document ID Retrieval: Obtain the document ID from the Google Drive URL or through the Document Index endpoint.

  • Field Specification: Clearly define the field or information you want to extract to get accurate results.

  • Batch Parameter: Use the batch parameter to control how tasks are grouped. Grouping is efficient for simple fields, while running complex or critical fields in isolation can improve the accuracy of the AI's responses.

  • Authorization: Ensure that you are logged in with an account that has access to the document you are querying.
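For the document ID retrieval note, a small helper can pull the ID out of a Drive link. This sketch (the name `document_id_from_url` is hypothetical) handles the `/file/d/<id>/view` form shown in the parameter description and the `?id=<id>` query form that some older Drive links use; other URL shapes may need additional patterns.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Matches the "/d/<id>" path segment, as in https://drive.google.com/file/d/<id>/view
_PATH_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def document_id_from_url(url: str) -> Optional[str]:
    """Return the Drive document ID embedded in a URL, or None if none is found."""
    parsed = urlparse(url)
    match = _PATH_ID.search(parsed.path)
    if match:
        return match.group(1)
    # Fall back to an "id" query parameter, e.g. https://drive.google.com/open?id=<id>
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None
```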

Example Use Case: Targeted Data Extraction in Due Diligence

During due diligence, you may need to pull specific data points, such as financial figures or contract terms, from a large set of documents. The "Extract Document Field" endpoint lets you request each data point directly, across many documents and fields. Grouping related fields with the batch parameter reduces the number of separate extraction tasks, while complex or sensitive fields are better handled in isolation (batch 0) to get the highest-quality results.
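The due diligence workflow could be planned as one request body per document and field, as in the sketch below. The split into "simple" fields (batch 1, grouped) and "complex" fields (batch 0, each with its own instructions) is an illustrative policy, not something the endpoint requires; the function name `plan_due_diligence_requests` and the IDs `DOC_A` and `DOC_B` are hypothetical placeholders.

```python
from typing import Any, Dict, List


def plan_due_diligence_requests(
    document_ids: List[str],
    simple_fields: List[str],
    complex_fields: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Build one request body per (document, field) pair.

    Simple fields share batch 1 so they can be extracted together; each complex
    field is sent with batch 0 and its own instructions so it runs in isolation.
    """
    bodies: List[Dict[str, Any]] = []
    for doc_id in document_ids:
        for field in simple_fields:
            bodies.append({"document_id": doc_id, "field_name": field, "batch": 1})
        for field, instructions in complex_fields.items():
            bodies.append({
                "document_id": doc_id,
                "field_name": field,
                "instructions": instructions,
                "batch": 0,
            })
    return bodies


bodies = plan_due_diligence_requests(
    ["DOC_A", "DOC_B"],
    ["contract end date", "governing law"],
    {"total revenue": "Provide the amount in USD"},
)
```

Each body can then be sent as a separate POST to /v1/drive/document-field.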
