CorpDev.Ai Docs

Extract Document Text

The "Extract Document Text" endpoint returns the full text content of a document stored in Google Drive. Use it when you need to process or analyze a document's entire content, for example for text analysis, search indexing, or data extraction. The endpoint supports the document formats listed below.

Supported Document Formats

  • .docx (Microsoft Word)

  • .pptx (Microsoft PowerPoint)

  • .xlsx (Microsoft Excel)

  • .pdf (Portable Document Format)

  • .txt (Plain Text)
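Because only these five formats are supported, a client may want to skip other files before calling the endpoint. The sketch below checks a file name's extension against the list above. The helper name `is_supported` is hypothetical, and the check assumes you already know the file's name (for example, from your own records); this page does not describe how the API reports file types or what it returns for unsupported ones.

```python
from pathlib import PurePosixPath

# Extensions listed as supported on this page.
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".pdf", ".txt"}


def is_supported(file_name: str) -> bool:
    """Return True if the file name ends in a supported extension (case-insensitive).

    Hypothetical client-side pre-check; it does not replace any validation
    the API performs on its side.
    """
    suffix = PurePosixPath(file_name).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS


print(is_supported("FY2023 Board Deck.PPTX"))
print(is_supported("notes.md"))
```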

Endpoint

POST /v1/drive/document-text

Parameters

  • document_id (required, string): The ID of the document you want to extract text from. This ID can be obtained from the Document Index or directly from the Google Drive URL.

    Example URL: https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view

    In this example, the document ID is 1X2Y3Z4A5B6C7D8E9F.
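If you start from a Drive link like the one above, the ID can be pulled out of the `/file/d/<ID>/` segment. The following is a minimal sketch that handles only that URL shape; other Drive or Docs URL forms are not covered, and the function name `document_id_from_url` is hypothetical.

```python
import re
from typing import Optional

# Matches the ID segment in URLs of the form https://drive.google.com/file/d/<ID>/view
_DRIVE_FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def document_id_from_url(url: str) -> Optional[str]:
    """Extract the document ID from a Google Drive file URL, or return None if absent."""
    match = _DRIVE_FILE_ID.search(url)
    return match.group(1) if match else None


print(document_id_from_url("https://drive.google.com/file/d/1X2Y3Z4A5B6C7D8E9F/view"))
```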

Note: The user must be logged in with an account that is authorized to access the specified document in Google Drive.

Example Request: Extract Document Text

POST /v1/drive/document-text
{
  "document_id": "1X2Y3Z4A5B6C7D8E9F"
}
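The request above can be assembled with Python's standard library as follows. This sketch only builds the request and does not send it. The host (`https://api.example.com`) and the `Authorization: Bearer ...` header are placeholders, not values documented here: substitute the real API host and whatever credentials the Authentication page specifies. Sending the body as `application/json` is likewise an assumption based on the JSON example.

```python
import json
import urllib.request


def build_extract_text_request(base_url: str, document_id: str, headers: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/drive/document-text request.

    base_url and headers are caller-supplied assumptions: the API host and
    any authentication headers are not specified on this page.
    """
    body = json.dumps({"document_id": document_id}).encode("utf-8")
    request_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/drive/document-text",
        data=body,
        headers=request_headers,
        method="POST",
    )


req = build_extract_text_request(
    "https://api.example.com",                    # hypothetical host
    "1X2Y3Z4A5B6C7D8E9F",
    {"Authorization": "Bearer <YOUR_API_KEY>"},   # hypothetical auth header
)
# To send it: with urllib.request.urlopen(req) as resp: payload = json.load(resp)
```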

Example Response

{
  "document_id": "1X2Y3Z4A5B6C7D8E9F",
  "text": "This document provides an overview of the financial performance of the company for the fiscal year 2023..."
}
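A client can read the extracted text from the `text` field and confirm that `document_id` matches the document it asked for. This sketch assumes the response has exactly the shape shown above; error responses are not described on this page, so they are not handled here. The function name is hypothetical.

```python
import json


def parse_extract_text_response(raw: str, expected_id: str) -> str:
    """Return the "text" field, after checking the response is for the requested document."""
    payload = json.loads(raw)
    if payload.get("document_id") != expected_id:
        raise ValueError(
            f"response is for {payload.get('document_id')!r}, expected {expected_id!r}"
        )
    return payload["text"]


raw = '{"document_id": "1X2Y3Z4A5B6C7D8E9F", "text": "This document provides an overview ..."}'
text = parse_extract_text_response(raw, "1X2Y3Z4A5B6C7D8E9F")
print(text)
```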

Usage Notes

  • Document ID Retrieval: Obtain the document ID from the Google Drive URL or through the Document Index endpoint.

  • Full Text Extraction: This endpoint returns the entire text content of the document in the text field of the response, ready for further processing or analysis.

  • Supported Formats: The endpoint supports text extraction from .docx, .pptx, .xlsx, .pdf, and .txt documents.

  • Authorization: Make sure you are logged in with an account that has access to the document whose text you are extracting.

Example Use Case: Full-Text Analysis

In due diligence, compliance checks, content analysis, and similar business processes, you often need the full text of a document rather than a single field or a summary. The "Extract Document Text" endpoint retrieves all textual content from a document stored in Google Drive, so you can analyze it in detail or feed it into other workflows.
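As one illustration of full-text analysis, a reviewer might count how often key terms appear in the extracted text. The sketch below does case-insensitive, whole-word counting with regular expressions; the sample text is made up for illustration (extending the example response), and the term list and function name are hypothetical. Whole-word matching means `covenant` would not match `covenants`, so list each form you care about.

```python
import re
from collections import Counter
from typing import Iterable


def count_terms(text: str, terms: Iterable[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each term in the text."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


# Illustrative text only; in practice, use the "text" field from the response.
text = ("This document provides an overview of the financial performance "
        "of the company for the fiscal year 2023. Financial covenants are ...")
print(count_terms(text, ["financial", "covenants", "indemnification"]))
```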


Last updated 8 months ago