Extract Document Text

The "Extract Document Text" endpoint returns the full text content of a document stored in Google Drive. Use it when you need the entire content of a document for tasks such as text analysis, search indexing, or data extraction. The supported document formats are listed below.

Supported Document Formats

.docx (Microsoft Word)
.pptx (Microsoft PowerPoint)
.xlsx (Microsoft Excel)
.pdf (Portable Document Format)
.txt (Plain Text)
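
As a rough illustration, a client could check a file name against the formats listed above before calling the endpoint. The helper name `is_supported_format` is hypothetical and not part of the API, and this sketch assumes the file name carries its real extension.

```python
from pathlib import PurePosixPath

# Extensions taken from the "Supported Document Formats" list.
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".pdf", ".txt"}


def is_supported_format(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    Hypothetical client-side helper; the comparison is case-insensitive.
    """
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only avoids obviously unsupported requests; the endpoint itself remains the authority on what it can extract.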

Endpoint

POST /v1/drive/document-text

Parameters

document_id (required, string): The ID of the document you want to extract text from. This ID can be obtained from the Document Index or directly from the Google Drive URL.
Example URL: https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view
In this example, the document ID is 1X2Y3Z4A5B6C7D8E9F.
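
The ID can also be pulled out of the URL programmatically. The following is a minimal sketch that handles only the `/file/d/<ID>/` URL shape shown in the example; the function name `extract_document_id` and the assumed ID character set (letters, digits, `_`, `-`) are illustrative, not part of the API.

```python
import re
from typing import Optional

# Matches the ID segment in URLs of the form .../file/d/<ID>/...
# Assumption: IDs consist of letters, digits, underscores, and hyphens.
_DRIVE_FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def extract_document_id(url: str) -> Optional[str]:
    """Return the document ID from a Google Drive file URL, or None if absent."""
    match = _DRIVE_FILE_ID.search(url)
    return match.group(1) if match else None
```

Other Google Drive URL shapes are not covered by this pattern and would return `None`.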

Note: The user must be logged in with an account that is authorized to access the specified document in Google Drive.

Example Request: Extract Document Text

POST /v1/drive/document-text
{
  "document_id": "1X2Y3Z4A5B6C7D8E9F"
}
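
The same request could be sent from Python roughly as follows. This is a sketch under stated assumptions: the host `https://api.example.com` is a placeholder, and bearer-token authentication is assumed because the source only says the user must be logged in. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_extract_request(document_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the endpoint."""
    body = json.dumps({"document_id": document_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/drive/document-text",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the actual scheme is not documented here.
            "Authorization": f"Bearer {token}",
        },
    )


def extract_document_text(document_id: str, token: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_extract_request(document_id, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect before any network call is made.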

Example Response

{
  "document_id": "1X2Y3Z4A5B6C7D8E9F",
  "text": "This document provides an overview of the financial performance of the company for the fiscal year 2023..."
}
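
A client might validate the response before using it. The sketch below assumes the response body has exactly the two fields shown in the example, `document_id` and `text`; the helper name `parse_extract_response` is hypothetical, and error responses are not covered because their format is not described here.

```python
import json


def parse_extract_response(raw: str) -> str:
    """Return the extracted text from a raw JSON response body.

    Raises ValueError if either field from the example response is missing.
    """
    payload = json.loads(raw)
    missing = {"document_id", "text"} - payload.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return payload["text"]
```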

Usage Notes

Document ID Retrieval: Obtain the document ID from the Google Drive URL or through the Document Index endpoint.
Full Text Extraction: This endpoint is designed to retrieve the entire text content of the document, which can then be used for further processing or analysis.
Supported Formats: The endpoint supports text extraction from .docx, .pptx, .xlsx, .pdf, and .txt documents.
Authorization: Ensure that you are logged in with an account that has access to the document you are querying.

Example Use Case: Full-Text Analysis

Business processes such as due diligence, compliance checks, and content analysis often require the full text of a document. The "Extract Document Text" endpoint retrieves all textual content from a document stored in Google Drive, so you can analyze it in detail or pass it to other workflows.
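
As one simple illustration of downstream analysis, the extracted text could be scanned for terms of interest. The function `count_keywords` and the keyword list are hypothetical; this sketch does plain whole-word counting on lowercased text and is not a full analysis pipeline.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def count_keywords(text: str, keywords: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive whole-word occurrences of each keyword in the text."""
    # Split the text into lowercase word tokens (letters, digits, apostrophes).
    words = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return {keyword: words[keyword.lower()] for keyword in keywords}
```

For example, passing the `text` field of a response together with `["financial", "fiscal", "liability"]` returns a count per term, with zero for terms that do not appear.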

