ParseFlow v1.1.0 - Office Documents Support
Release Date: 2025-12-03
We're excited to announce ParseFlow v1.1.0, a major feature release that adds Word (.docx) and Excel (.xlsx/.xls) document parsing support!
What's New
Word Document Support
ParseFlow now supports Word (.docx) documents with comprehensive parsing capabilities:
- ✅ Text Extraction - Extract plain text from Word documents
- ✅ HTML Conversion - Convert Word documents to HTML
- ✅ Metadata Retrieval - Get document properties and file information
- ✅ Text Search - Search for keywords with context snippets
- ✅ MCP Tools - 2 new tools for AI assistants (extract_word, search_word)
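To illustrate what "keywords with context snippets" can mean in practice, here is a minimal sketch that runs over already-extracted text. The helper name `findWithContext`, its `radius` parameter, and the result shape are assumptions made for this example; they are not the parseflow-core API, and the library's actual search may behave differently.

```javascript
// Hypothetical helper, not part of parseflow-core: find every case-insensitive
// occurrence of `keyword` in `text` and return it with `radius` characters of
// surrounding context. Assumes lowercasing does not change the string length,
// which holds for ASCII text.
function findWithContext(text, keyword, radius = 20) {
  const matches = [];
  if (!keyword) return matches; // an empty keyword would match at every position
  const haystack = text.toLowerCase();
  const needle = keyword.toLowerCase();
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    const start = Math.max(0, index - radius);
    const end = Math.min(text.length, index + needle.length + radius);
    matches.push({ index, snippet: text.slice(start, end) });
    index = haystack.indexOf(needle, index + needle.length);
  }
  return matches;
}

const sample = 'The Q3 budget was approved. Next year, the budget grows by 5%.';
const hits = findWithContext(sample, 'budget', 10);
console.log(hits);
```

Each hit carries the character offset of the match plus a short window of text around it, which is usually enough for an assistant to quote the relevant passage without loading the whole document.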
Example:
import { WordParser } from 'parseflow-core';
const parser = new WordParser();
const result = await parser.extractText('report.docx');
console.log(result.text);
Excel Spreadsheet Support
Full support for Excel (.xlsx/.xls) spreadsheets:
- ✅ Multi-Sheet Extraction - Extract data from all sheets or specific ones
- ✅ Multiple Formats - JSON, CSV, or plain text output
- ✅ Cell Search - Find values across all sheets with cell coordinates
- ✅ Range Extraction - Extract specific cell ranges (e.g., A1:C10)
- ✅ Workbook Metadata - Sheet names, counts, and properties
- ✅ MCP Tools - 2 new tools for AI assistants (extract_excel, search_excel)
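For readers unfamiliar with A1-style ranges, the sketch below shows one way a range such as A1:C10 could be resolved against a sheet held in memory as an array of rows, and how the result could be written out as CSV. All function names here (`columnToIndex`, `parseRange`, `extractRange`, `toCsv`) are hypothetical and exist only for illustration; parseflow-core may implement range extraction and CSV output quite differently.

```javascript
// Hypothetical helpers for illustration; not the parseflow-core implementation.

// Convert column letters to a zero-based index: A -> 0, Z -> 25, AA -> 26.
function columnToIndex(letters) {
  let n = 0;
  for (const ch of letters.toUpperCase()) {
    n = n * 26 + (ch.charCodeAt(0) - 64); // 'A' has char code 65
  }
  return n - 1;
}

// Parse an A1-style range such as "A1:C10" into zero-based, inclusive bounds.
function parseRange(range) {
  const m = /^([A-Za-z]+)([1-9][0-9]*):([A-Za-z]+)([1-9][0-9]*)$/.exec(range.trim());
  if (!m) throw new Error(`Invalid range: ${range}`);
  const c1 = columnToIndex(m[1]);
  const c2 = columnToIndex(m[3]);
  const r1 = Number(m[2]) - 1;
  const r2 = Number(m[4]) - 1;
  return {
    startRow: Math.min(r1, r2),
    endRow: Math.max(r1, r2),
    startCol: Math.min(c1, c2),
    endCol: Math.max(c1, c2),
  };
}

// Slice a sheet held as an array of row arrays; missing cells become "".
function extractRange(rows, range) {
  const { startRow, endRow, startCol, endCol } = parseRange(range);
  const out = [];
  for (let r = startRow; r <= endRow; r++) {
    const row = rows[r] || [];
    const cells = [];
    for (let c = startCol; c <= endCol; c++) {
      cells.push(row[c] === undefined || row[c] === null ? '' : row[c]);
    }
    out.push(cells);
  }
  return out;
}

// Serialize rows as CSV, quoting fields that contain commas, quotes, or newlines.
function toCsv(rows) {
  return rows
    .map((row) =>
      row
        .map((cell) => {
          const s = String(cell);
          return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
        })
        .join(',')
    )
    .join('\n');
}

const sheet = [
  ['Product', 'Region', 'Revenue'],
  ['Widget, large', 'EU', 1200],
  ['Gadget', 'US', 950],
];
console.log(toCsv(extractRange(sheet, 'A1:B2')));
```

Quoting only the fields that need it keeps the output compact while still round-tripping values that contain commas or quotes.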
Example:
import { ExcelParser } from 'parseflow-core';
const parser = new ExcelParser();
const data = await parser.extractData('spreadsheet.xlsx', {
  sheetName: 'Sales',
  format: 'json'
});
console.log(data);
MCP Server Updates
The MCP server now includes 9 tools (up from 5):
PDF Tools (Existing - 5 tools)
- extract_text - Extract text from PDF
- search_pdf - Search in PDF
- get_metadata - Get PDF metadata
- extract_images - Extract images
- get_toc - Get table of contents
Word Tools (New - 2 tools)
- extract_word - Extract text/HTML from Word documents
- search_word - Search in Word documents
Excel Tools (New - 2 tools)
- extract_excel - Extract data from Excel spreadsheets
- search_excel - Search in Excel cells
Usage in Claude Desktop:
"Please read the contents of report.docx"
→ Uses extract_word tool
"Find 'Product A' in sales.xlsx"
→ Uses search_excel tool
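Behind prompts like these, an MCP client sends the server a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds two such requests. The tool names come from these release notes, but the argument keys (`path`, `query`) are assumptions made for illustration; the authoritative argument names are whatever input schema the server reports for each tool via `tools/list`.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request in the shape used by the
// Model Context Protocol. The argument keys passed below (`path`, `query`)
// are hypothetical; consult the tool's input schema for the real ones.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

const readDoc = buildToolCall(1, 'extract_word', { path: 'report.docx' });
const findCell = buildToolCall(2, 'search_excel', { path: 'sales.xlsx', query: 'Product A' });

console.log(JSON.stringify(readDoc, null, 2));
console.log(JSON.stringify(findCell, null, 2));
```

In normal use the assistant's client constructs these messages for you; the shape is shown here only to make clear what "uses the extract_word tool" means on the wire.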
Package Updates
parseflow-core v1.1.0
- New: WordParser class for Word document parsing
- New: ExcelParser class for Excel spreadsheet parsing
- Dependencies: Added mammoth@^1.11.0 and xlsx@^0.18.5
- Updated: Package description now mentions all supported formats
parseflow-mcp-server v1.1.0
- New: 4 additional MCP tools (2 Word + 2 Excel)
- Updated: Server description now mentions Office documents
- Total: 9 tools serving AI assistants
Documentation
New Documentation
- OFFICE_EXAMPLES.md - Comprehensive guide with examples
  - Word parsing methods (4 approaches)
  - Excel parsing methods (8 approaches)
  - 5 real-world use cases
  - Performance tips and troubleshooting
Updated Documentation
- README.md - Completely rewritten
  - Feature overview for all formats
  - Quick start guides
  - MCP server configuration
  - Project structure
- CHANGELOG.md - v1.1.0 entry added
  - Detailed feature list
  - Breaking changes (none!)
  - Upgrade guide
Testing
All new features are thoroughly tested:
- ✅ Word Parser: 4/4 tests passing
  - Text extraction
  - Metadata retrieval
  - Text search
  - HTML conversion
- ✅ Excel Parser: 8/8 tests passing
  - Multi-sheet extraction
  - Format conversion (JSON/CSV/Text)
  - Cell search
  - Metadata retrieval
Test Files Included:
Wordๆต่ฏๆไปถ.docx(6 MB)Excelๆต่ฏๆไปถ.xlsx(19 KB)
Installation
npm
# Core library
npm install parseflow-core@1.1.0
# MCP Server (global)
npm install -g parseflow-mcp-server@1.1.0
pnpm
pnpm add parseflow-core@1.1.0
pnpm add -g parseflow-mcp-server@1.1.0
Supported Formats
| Format | Extension | Read | Search | Metadata | MCP Tools |
|---|---|---|---|---|---|
| PDF | .pdf | ✅ | ✅ | ✅ | 5 |
| Word | .docx | ✅ | ✅ | ✅ | 2 |
| Excel | .xlsx/.xls | ✅ | ✅ | ✅ | 2 |
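If your own code needs to decide which parser to use for a given file, one simple approach is to key off the extension and fail with a clear message for anything outside the table above. The following is only a sketch: the `FORMATS` map and `detectFormat` function are hypothetical names for this example and are not exported by parseflow-core.

```javascript
// Hypothetical lookup table mirroring the Supported Formats table; these
// names are illustrative and not part of parseflow-core.
const FORMATS = {
  '.pdf': 'pdf',
  '.docx': 'word',
  '.xlsx': 'excel',
  '.xls': 'excel',
};

// Map a filename to a format name, or throw a descriptive error.
function detectFormat(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot).toLowerCase();
  const format = FORMATS[ext];
  if (!format) {
    const supported = Object.keys(FORMATS).join(', ');
    throw new Error(
      `Unsupported file type "${ext || '(none)'}" for ${filename}. Supported: ${supported}`
    );
  }
  return format;
}

console.log(detectFormat('Report.DOCX'));
console.log(detectFormat('data.xls'));
```

Listing the supported extensions in the error message saves the caller a trip to the documentation when they pass, say, a .pptx file.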
Dependencies
New Dependencies
- mammoth@^1.11.0 - Word document parsing
- xlsx@^0.18.5 - Excel spreadsheet parsing
Existing Dependencies
- pdf-parse@^1.1.1 - PDF parsing
- pdf-lib@^1.17.1 - PDF manipulation
- @modelcontextprotocol/sdk@^1.0.4 - MCP SDK
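The caret (`^`) in these version ranges follows npm's semver rules: it allows updates that keep the left-most non-zero component fixed. That means `^1.11.0` accepts any 1.x release at or above 1.11.0, while `^0.18.5` accepts only 0.18.x releases at or above 0.18.5. The sketch below is a deliberately simplified check for plain `major.minor.patch` strings, written for illustration; for real dependency resolution, rely on npm or the `semver` package.

```javascript
// Simplified caret-range check for plain "major.minor.patch" versions.
// Prerelease tags and other range operators are deliberately not handled.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`Unsupported version string: ${v}`);
  return m.slice(1).map(Number);
}

// Compare two [major, minor, patch] triples: -1, 0, or 1.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

function satisfiesCaret(version, range) {
  const low = parseVersion(range.replace(/^\^/, ''));
  const [major, minor, patch] = low;
  // The exclusive upper bound bumps the left-most non-zero component.
  let high;
  if (major > 0) high = [major + 1, 0, 0];
  else if (minor > 0) high = [0, minor + 1, 0];
  else high = [0, 0, patch + 1];
  const v = parseVersion(version);
  return compare(v, low) >= 0 && compare(v, high) < 0;
}

console.log(satisfiesCaret('1.12.3', '^1.11.0')); // true: any 1.x at or above 1.11.0
console.log(satisfiesCaret('0.19.0', '^0.18.5')); // false: for 0.x the minor version is pinned
```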
Bug Fixes
- Fixed Excel metadata extraction reliability
- Added null checks for sheet names in Excel parser
- Improved error handling for malformed Office files
- Better error messages for unsupported file types
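If you want to catch obviously malformed or mislabeled files in your own code before they reach a parser, one common technique is to inspect the file signature. Files in the .docx and .xlsx formats are ZIP archives and start with the bytes `PK\x03\x04`, while legacy .xls files use the OLE2 compound file signature. The sketch below is an independent illustration of that check; the function names are hypothetical, and it does not describe how ParseFlow itself validates input.

```javascript
// Hypothetical pre-check, not part of parseflow-core: identify the container
// format from the first bytes of a file before handing it to a parser.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04": .docx and .xlsx are ZIP archives
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]; // legacy .xls

function startsWith(bytes, magic) {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Return 'zip', 'ole2', or 'unknown' for the given leading bytes.
function sniffContainer(bytes) {
  if (startsWith(bytes, ZIP_MAGIC)) return 'zip';
  if (startsWith(bytes, OLE2_MAGIC)) return 'ole2';
  return 'unknown';
}

// Throw a descriptive error when the signature matches neither container.
function assertLooksLikeOfficeFile(filename, bytes) {
  const kind = sniffContainer(bytes);
  if (kind === 'unknown') {
    throw new Error(
      `${filename} does not look like an Office document (unrecognized file signature)`
    );
  }
  return kind;
}

const zipHeader = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
console.log(sniffContainer(zipHeader)); // prints zip
```

A matching signature only shows that the container type is plausible; a truncated or corrupted archive can still fail later, so parser errors should be handled as well.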
Cleanup
- Removed 8 redundant documentation files (~35 KB)
- Simplified PROJECT_STATUS.md
- Improved project organization
- Updated .gitignore for test files
Upgrade Guide
From v1.0.x
No breaking changes! Simply update:
npm install parseflow-core@latest
npm install -g parseflow-mcp-server@latest
New Features
Import the new parsers:
import { WordParser, ExcelParser } from 'parseflow-core';
For MCP users, the new tools are automatically available after updating.
Examples
Extract Text from Word Document
import { WordParser } from 'parseflow-core';
const parser = new WordParser();
const result = await parser.extractText('document.docx');
console.log(result.text);
Extract Data from Excel
import { ExcelParser } from 'parseflow-core';
const parser = new ExcelParser();
const sheets = await parser.extractData('data.xlsx');
sheets.forEach(sheet => {
  console.log(`${sheet.sheetName}: ${sheet.rowCount} rows`);
});
Search Across Documents
import { WordParser, ExcelParser } from 'parseflow-core';
const wordParser = new WordParser();
const excelParser = new ExcelParser();
// Search in Word
const wordMatches = await wordParser.searchText('report.docx', 'budget');
// Search in Excel
const excelMatches = await excelParser.searchText('data.xlsx', 'revenue');
More examples in OFFICE_EXAMPLES.md!
Links
- npm Core: https://www.npmjs.com/package/parseflow-core
- npm MCP: https://www.npmjs.com/package/parseflow-mcp-server
- GitHub: https://github.com/Libres-coder/ParseFlow
- MCP Registry: https://registry.modelcontextprotocol.io/
- Issues: https://github.com/Libres-coder/ParseFlow/issues
- Documentation: https://github.com/Libres-coder/ParseFlow#readme
Acknowledgments
Special thanks to:
- mammoth - For excellent Word document parsing
- xlsx (SheetJS) - For comprehensive Excel support
- MCP Community - For feedback and support
Full Changelog
See CHANGELOG.md for complete details.
What's Next?
Looking ahead to v1.2.0:
- PowerPoint (pptx) support
- Encrypted document support
- OCR text recognition
- Performance optimizations
Feedback
We'd love to hear from you!
- Report bugs: GitHub Issues
- Request features: GitHub Discussions
Made with ❤️ by Libres-coder
Enjoy ParseFlow v1.1.0!