Extract Document Text
The "Extract Document Text" endpoint extracts the full text content of a document stored in Google Drive. Use it when you need the entire content of a document for text analysis, search indexing, or data extraction. The endpoint supports the document formats listed below.
Supported Document Formats
.docx (Microsoft Word)
.pptx (Microsoft PowerPoint)
.xlsx (Microsoft Excel)
.pdf (Portable Document Format)
.txt (Plain Text)
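Before calling the endpoint, it can help to check on the client side that a file has one of the listed extensions. The sketch below is only an illustration: the helper name is_supported is hypothetical, and it assumes the file name carries a meaningful extension. The documentation does not say whether the endpoint itself decides by extension or by file type, so treat this as a pre-check rather than a guarantee.

```python
from pathlib import PurePath

# Extensions listed under "Supported Document Formats".
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".pdf", ".txt"}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    Hypothetical client-side helper; the comparison is case-insensitive.
    """
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported("Q3-report.DOCX") passes the check, while a file such as "diagram.png" does not.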
Endpoint
Parameters
document_id (required, string): The ID of the document you want to extract text from. This ID can be obtained from the Document Index or directly from the Google Drive URL.
Example URL:
https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view
In this example, the document ID is 1X2Y3Z4A5B6C7D8E9F.
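The ID can also be pulled out of a URL of this shape programmatically. The following is a minimal sketch, assuming the URL follows the /file/d/<id>/ pattern shown in the example; the function name extract_document_id is hypothetical, and other Google Drive URL shapes are not handled.

```python
import re
from typing import Optional

# Matches the ID segment in URLs shaped like the example:
# https://drive.google.com/file/d/<id>/view
_FILE_ID_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def extract_document_id(url: str) -> Optional[str]:
    """Return the document ID from a /file/d/<id>/ URL, or None if absent."""
    match = _FILE_ID_PATTERN.search(url)
    return match.group(1) if match else None
```

Called with the example URL, the function returns 1X2Y3Z4A5B6C7D8E9F, which is the value to pass as document_id. For a URL that does not contain a /file/d/ segment it returns None, so callers can fall back to the Document Index.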
Note: The user must be logged in with an account that is authorized to access the specified document in Google Drive.
Example Request: Extract Document Text
Example Response
Usage Notes
Document ID Retrieval: Obtain the document ID from the Google Drive URL or through the Document Index endpoint.
Full Text Extraction: This endpoint is designed to retrieve the entire text content of the document, which can then be used for further processing or analysis.
Supported Formats: The endpoint supports text extraction from .docx, .pptx, .xlsx, .pdf, and .txt documents.
Authorization: Ensure that you are logged in with an account that has access to the document you are querying.
Example Use Case: Full-Text Analysis
In business processes such as due diligence, compliance checks, or content analysis, you may need to work with the full text of a document. The "Extract Document Text" endpoint retrieves all textual content from a document stored in Google Drive, so you can analyze it in detail or pass it into other workflows.