# Extract Document Text

The "Extract Document Text" endpoint returns the full text content of a document stored in Google Drive. Use it when you need the entire content of a document for tasks such as text analysis, search indexing, or data extraction. The endpoint supports the document formats listed below.

### Supported Document Formats

* `.docx` (Microsoft Word)
* `.pptx` (Microsoft PowerPoint)
* `.xlsx` (Microsoft Excel)
* `.pdf` (Portable Document Format)
* `.txt` (Plain Text)

### Endpoint

```http
POST /v1/drive/document-text
```

### Parameters

* **document\_id** (required, string): The ID of the document to extract text from. Obtain this ID from the Document Index endpoint or directly from the document's Google Drive URL.

  Example URL: `https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view`

  In this example, the document ID is `1X2Y3Z4A5B6C7D8E9F`.

**Note:** The user must be logged in with an account that is authorized to access the specified document in Google Drive.
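To illustrate how the document ID relates to the URL, here is a minimal Python sketch that pulls the ID out of a Drive link. It assumes the URL follows the `/file/d/<id>/` pattern shown in the example above; the helper name `extract_document_id` is hypothetical and not part of this API.

```python
import re
from typing import Optional

# Assumption: the ID is the path segment after "/file/d/", as in the example URL.
_DRIVE_FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def extract_document_id(url: str) -> Optional[str]:
    """Return the document ID from a Drive file URL, or None if no ID is found."""
    match = _DRIVE_FILE_ID.search(url)
    return match.group(1) if match else None


print(extract_document_id("https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view"))
```

URLs in other shapes will return `None` with this pattern, so treat the result as optional before passing it as `document_id`.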

### Example Request: Extract Document Text

```http
POST /v1/drive/document-text

{
  "document_id": "1X2Y3Z4A5B6C7D8E9F"
}
```
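The request above could be assembled in Python roughly as follows. This is a sketch under stated assumptions: the base URL `https://api.example.com` is a placeholder because this page does not name the API host, the `Content-Type` header is assumed for a JSON body, and any authorization details are omitted because they are not described here. The function only builds the request; it does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder: the API host is not stated in this documentation.
BASE_URL = "https://api.example.com"


def build_extract_request(document_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the document-text endpoint."""
    body = json.dumps({"document_id": document_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/drive/document-text",
        data=body,
        # Assumption: the endpoint expects a JSON body with this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extract_request("1X2Y3Z4A5B6C7D8E9F")
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` once the real host and whatever authorization your deployment requires are in place.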

### Example Response

```json
{
  "document_id": "1X2Y3Z4A5B6C7D8E9F",
  "text": "This document provides an overview of the financial performance of the company for the fiscal year 2023..."
}
```
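A caller will usually want to check the response shape before using the text. The sketch below assumes the response body is a JSON object with exactly the `document_id` and `text` fields shown in the example; the helper name `parse_extract_response` is hypothetical, and error responses are not covered because this page does not describe them.

```python
import json


def parse_extract_response(raw: str) -> str:
    """Return the extracted text from a response body, validating its shape."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    # Both fields appear in the documented example response.
    missing = [key for key in ("document_id", "text") if key not in payload]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return payload["text"]


sample = '{"document_id": "1X2Y3Z4A5B6C7D8E9F", "text": "This document provides an overview..."}'
print(parse_extract_response(sample))
```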

### Usage Notes

* **Document ID Retrieval:** Obtain the document ID from the Google Drive URL or through the Document Index endpoint.
* **Full Text Extraction:** This endpoint returns the entire text content of the document, which you can then process or analyze further.
* **Supported Formats:** The endpoint supports text extraction from `.docx`, `.pptx`, `.xlsx`, `.pdf`, and `.txt` documents.
* **Authorization:** Ensure that you are logged in with an account that has access to the document you are querying.
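If you already know a file's name, a client-side check against the supported formats can avoid a request that is unlikely to succeed. This is only a sketch: the helper `is_supported` is hypothetical, and it assumes the file extension reflects the actual format. The endpoint itself takes a document ID, not a filename.

```python
from pathlib import PurePosixPath

# Extensions listed under "Supported Document Formats" on this page.
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".pdf", ".txt"}


def is_supported(filename: str) -> bool:
    """Return True if the filename's extension is in the supported list."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("annual-report.PDF"))
print(is_supported("logo.png"))
```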

### Example Use Case: Full-Text Analysis

Business processes such as due diligence, compliance checks, and content analysis often require the full text of a document. The "Extract Document Text" endpoint retrieves all textual content from a document stored in Google Drive, so you can analyze it in detail or pass it into other workflows.


