Metadata-Version: 2.4
Name: document-intelligence-mcp
Version: 0.1.0
Summary: MCP Server for local document analysis: PDF text extraction, table detection, DOCX parsing, language detection and keyword search — no cloud API required
Project-URL: Homepage, https://github.com/AiAgentKarl/document-intelligence-mcp
Project-URL: Repository, https://github.com/AiAgentKarl/document-intelligence-mcp
Author-email: AiAgentKarl <coach1916@gmail.com>
License: MIT
Keywords: ai-agent,document,document-intelligence,docx,mcp,pdf,text-extraction
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27.0
Requires-Dist: langdetect>=1.0.9
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: pdfplumber>=0.11.0
Requires-Dist: pymupdf>=1.24.0
Requires-Dist: python-docx>=1.1.0
Description-Content-Type: text/markdown

# document-intelligence-mcp

[![PyPI version](https://badge.fury.io/py/document-intelligence-mcp.svg)](https://badge.fury.io/py/document-intelligence-mcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

**Local document intelligence for AI agents** — extract text, detect tables, read metadata, analyze structure, search keywords, and detect language from PDF and DOCX files. No cloud API required, no API key needed.

## Features

- **10 MCP Tools** for PDF and DOCX processing
- **Local processing** — no data leaves your machine
- **No API key** required
- Supports **PDF** (via PyMuPDF + pdfplumber) and **Microsoft Word DOCX** (via python-docx)
- Language detection via langdetect (55 languages supported out of the box)

## Tools

| Tool | Description |
|------|-------------|
| `tool_extract_text_from_pdf` | Extract all text from a PDF, page by page |
| `tool_extract_tables_from_pdf` | Detect and extract tables from a PDF |
| `tool_get_pdf_metadata` | Read PDF metadata: title, author, dates, outline |
| `tool_analyze_document_structure` | Detect headings, font sizes, and section structure in a PDF |
| `tool_search_in_pdf` | Search a PDF for keywords and return the surrounding context |
| `tool_extract_text_from_docx` | Extract all text from a Word DOCX file |
| `tool_extract_tables_from_docx` | Extract all tables from a DOCX file |
| `tool_analyze_docx_structure` | Analyze headings, styles, and structure of a DOCX file |
| `tool_count_words_and_stats` | Word count, sentence count, reading time, top words |
| `tool_detect_document_language` | Detect the language of a PDF or DOCX file (55 languages) |

## Installation

```bash
pip install document-intelligence-mcp
```
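
To confirm that the package landed in the active environment, a quick check with the standard library's `importlib.metadata` can help. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "document-intelligence-mcp") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "document-intelligence-mcp is not installed")
```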

## Claude Desktop Configuration

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "document-intelligence": {
      "command": "document-intelligence-mcp"
    }
  }
}
```
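
If the config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The following is a hedged sketch of one way to do that with the standard library only; the helper name `add_server` is hypothetical, and the path to `claude_desktop_config.json` is something you supply, since its location depends on your operating system.

```python
import json
from pathlib import Path


def add_server(
    config_path: Path,
    name: str = "document-intelligence",
    command: str = "document-intelligence-mcp",
) -> dict:
    """Add or update one entry under "mcpServers", keeping all other keys."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    # Create the "mcpServers" mapping if it is missing, then set our entry.
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_server(Path("/path/to/claude_desktop_config.json"))` writes the same entry shown above and leaves any existing servers untouched. Back up the file first, and restart Claude Desktop afterwards so the change is picked up.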

## Usage Examples

**Extract text from a PDF:**
```
Extract the text from /path/to/report.pdf
```

**Find tables in a PDF:**
```
Find all tables in /path/to/financial_report.pdf
```

**Search for a keyword:**
```
Search for "revenue" in /path/to/annual_report.pdf
```

**Get document stats:**
```
Count the words and estimate reading time for /path/to/document.docx
```

**Detect language:**
```
What language is /path/to/document.pdf written in?
```
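
For a sense of what the statistics listed for `tool_count_words_and_stats` involve, here is a rough standard-library sketch that works on text that has already been extracted. It is not the package's implementation: the function name `text_stats`, the 200 words-per-minute reading speed, and the punctuation-based sentence splitting are all assumptions made for illustration.

```python
import re
from collections import Counter


def text_stats(text: str, words_per_minute: int = 200, top_n: int = 5) -> dict:
    """Compute simple statistics for already-extracted text."""
    # Words are runs of letters, optionally joined by an apostrophe or hyphen.
    words = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    # Treat runs of ., ! or ? as sentence boundaries (a rough heuristic
    # that will miscount abbreviations and decimal numbers).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Assumed reading speed; adjust words_per_minute to taste.
        "reading_time_minutes": round(len(words) / words_per_minute, 2),
        "top_words": Counter(words).most_common(top_n),
    }


if __name__ == "__main__":
    print(text_stats("The cat sat. The cat ran! Did the dog run?"))
```

A real tool would likely also filter stop words before ranking the top words, which this sketch leaves out for brevity.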

## Requirements

- Python 3.10+
- PyMuPDF >= 1.24.0
- pdfplumber >= 0.11.0
- python-docx >= 1.1.0
- langdetect >= 1.0.9
- mcp[cli] >= 1.0.0
- httpx >= 0.27.0

## License

MIT License — free to use, modify, and distribute.

---

Built by [AiAgentKarl](https://github.com/AiAgentKarl) | Part of the AI Agent Economy toolkit
