ParseFlow v1.1.0 - Office Documents Support

@Libres-coder Libres-coder released this 03 Dec 14:23
· 20 commits to main since this release

Release Date: 2025-12-03

We're excited to announce ParseFlow v1.1.0, a major feature release that adds Word (docx) and Excel (xlsx/xls) document parsing support! 🎉


🌟 What's New

📝 Word Document Support

ParseFlow now supports Word (.docx) documents with comprehensive parsing capabilities:

  • ✅ Text Extraction - Extract plain text from Word documents
  • ✅ HTML Conversion - Convert Word documents to HTML
  • ✅ Metadata Retrieval - Get document properties and file information
  • ✅ Text Search - Search for keywords with context snippets
  • ✅ MCP Tools - 2 new tools for AI assistants (extract_word, search_word)
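
To make "keywords with context snippets" concrete, here is a minimal sketch of how such a search could work on already-extracted text. It is not ParseFlow's implementation: the `SnippetMatch` type, the `findWithContext` function, and the `contextChars` parameter are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of one search hit; the real parseflow-core result type may differ.
interface SnippetMatch {
  index: number;   // character offset of the match in the text
  snippet: string; // the match plus surrounding context
}

// Find every case-insensitive occurrence of `keyword` in `text` and return it
// with up to `contextChars` characters of context on each side.
// Assumes lowercasing does not change string length (true for ASCII text).
function findWithContext(text: string, keyword: string, contextChars = 20): SnippetMatch[] {
  const matches: SnippetMatch[] = [];
  if (keyword.length === 0) return matches;

  const haystack = text.toLowerCase();
  const needle = keyword.toLowerCase();

  let cursor = 0;
  while (true) {
    const index = haystack.indexOf(needle, cursor);
    if (index === -1) break;
    const start = Math.max(0, index - contextChars);
    const end = Math.min(text.length, index + needle.length + contextChars);
    matches.push({ index, snippet: text.slice(start, end) });
    cursor = index + needle.length; // continue after this match
  }
  return matches;
}
```

Clamping the snippet bounds to the start and end of the text keeps matches near either edge from producing negative or out-of-range offsets.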

Example:

import { WordParser } from 'parseflow-core';

const parser = new WordParser();
const result = await parser.extractText('report.docx');
console.log(result.text);

📊 Excel Spreadsheet Support

Full support for Excel (.xlsx/.xls) spreadsheets:

  • ✅ Multi-Sheet Extraction - Extract data from all sheets or specific ones
  • ✅ Multiple Formats - JSON, CSV, or plain text output
  • ✅ Cell Search - Find values across all sheets with cell coordinates
  • ✅ Range Extraction - Extract specific cell ranges (e.g., A1:C10)
  • ✅ Workbook Metadata - Sheet names, counts, and properties
  • ✅ MCP Tools - 2 new tools for AI assistants (extract_excel, search_excel)
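
For readers unfamiliar with A1 notation, the sketch below shows the arithmetic behind a range such as A1:C10. It is an illustration only, not ParseFlow's code: `parseA1Range`, `columnToIndex`, and `sliceRange` are hypothetical helpers, and the xlsx dependency ships its own equivalent (`XLSX.utils.decode_range`).

```typescript
interface CellRef { row: number; col: number }      // both zero-based
interface CellRange { start: CellRef; end: CellRef } // inclusive on both ends

// Convert column letters to a zero-based index: A -> 0, Z -> 25, AA -> 26.
function columnToIndex(letters: string): number {
  const upper = letters.toUpperCase();
  let n = 0;
  for (let i = 0; i < upper.length; i++) {
    n = n * 26 + (upper.charCodeAt(i) - 64); // 'A' is char code 65
  }
  return n - 1;
}

// Parse "A1:C10" into zero-based, inclusive start and end coordinates.
function parseA1Range(range: string): CellRange {
  const m = /^([A-Za-z]+)([1-9][0-9]*):([A-Za-z]+)([1-9][0-9]*)$/.exec(range.trim());
  if (m === null) throw new Error(`Invalid A1 range: "${range}"`);
  const [, c1 = "", r1 = "", c2 = "", r2 = ""] = m;
  const a = { row: Number(r1) - 1, col: columnToIndex(c1) };
  const b = { row: Number(r2) - 1, col: columnToIndex(c2) };
  // Normalize so that start is the top-left corner and end the bottom-right.
  return {
    start: { row: Math.min(a.row, b.row), col: Math.min(a.col, b.col) },
    end: { row: Math.max(a.row, b.row), col: Math.max(a.col, b.col) },
  };
}

// Cut the given range out of a row-major grid of cell values.
function sliceRange<T>(grid: T[][], range: string): T[][] {
  const { start, end } = parseA1Range(range);
  return grid
    .slice(start.row, end.row + 1)
    .map((row) => row.slice(start.col, end.col + 1));
}
```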

Example:

import { ExcelParser } from 'parseflow-core';

const parser = new ExcelParser();
const data = await parser.extractData('spreadsheet.xlsx', {
  sheetName: 'Sales',
  format: 'json'
});
console.log(data);
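
The `format` option also accepts CSV. As a rough idea of what CSV output involves, the following sketch serializes rows with minimal quoting. It is an assumption-laden illustration rather than ParseFlow's serializer: `CellValue`, `csvField`, and `rowsToCsv` are hypothetical names, and the real output may differ in quoting and line endings.

```typescript
type CellValue = string | number | boolean | null;

// Quote a field only when it contains a comma, a double quote, or a line break;
// embedded double quotes are escaped by doubling them.
function csvField(value: CellValue): string {
  if (value === null) return ""; // empty cell
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Join fields with commas and rows with "\n" (RFC 4180 itself specifies CRLF).
function rowsToCsv(rows: CellValue[][]): string {
  return rows.map((row) => row.map(csvField).join(",")).join("\n");
}
```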

🛠️ MCP Server Updates

The MCP server now includes 9 tools (up from 5):

PDF Tools (Existing - 5 tools)

  • extract_text - Extract text from PDF
  • search_pdf - Search in PDF
  • get_metadata - Get PDF metadata
  • extract_images - Extract images
  • get_toc - Get table of contents

Word Tools (New - 2 tools)

  • extract_word - Extract text/HTML from Word documents
  • search_word - Search in Word documents

Excel Tools (New - 2 tools)

  • extract_excel - Extract data from Excel spreadsheets
  • search_excel - Search in Excel cells

Usage in Claude Desktop:

"Please read the contents of report.docx"
→ Uses extract_word tool

"Find 'Product A' in sales.xlsx"
→ Uses search_excel tool
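
Claude Desktop discovers MCP servers through the `mcpServers` section of its `claude_desktop_config.json` file. The sketch below builds such an entry and prints it as JSON. The executable name `parseflow-mcp-server` and the entry key `parseflow` are assumptions, not taken from these notes, so check the README's MCP server configuration section for the actual command.

```typescript
// Shape of one entry under "mcpServers" in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// ASSUMPTION: the globally installed package exposes an executable named
// "parseflow-mcp-server". Verify the real command in the project README.
const claudeDesktopConfig: { mcpServers: Record<string, McpServerEntry> } = {
  mcpServers: {
    parseflow: {
      command: "parseflow-mcp-server",
      args: [],
    },
  },
};

// Print the JSON fragment to merge into claude_desktop_config.json.
const claudeDesktopConfigJson = JSON.stringify(claudeDesktopConfig, null, 2);
console.log(claudeDesktopConfigJson);
```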

📦 Package Updates

parseflow-core v1.1.0

  • New: WordParser class for Word document parsing
  • New: ExcelParser class for Excel spreadsheet parsing
  • Dependencies: Added mammoth@^1.11.0 and xlsx@^0.18.5
  • Updated: Package description now mentions all supported formats

parseflow-mcp-server v1.1.0

  • New: 4 additional MCP tools (2 Word + 2 Excel)
  • Updated: Server description now mentions Office documents
  • Total: 9 tools serving AI assistants

📚 Documentation

New Documentation

  • OFFICE_EXAMPLES.md - Comprehensive guide with examples
    • Word parsing methods (4 approaches)
    • Excel parsing methods (8 approaches)
    • 5 real-world use cases
    • Performance tips and troubleshooting

Updated Documentation

  • README.md - Completely rewritten
    • Feature overview for all formats
    • Quick start guides
    • MCP server configuration
    • Project structure
  • CHANGELOG.md - v1.1.0 entry added
    • Detailed feature list
    • Breaking changes (none!)
    • Upgrade guide

🧪 Testing

All new features are thoroughly tested:

  • ✅ Word Parser: 4/4 tests passing
    • Text extraction
    • Metadata retrieval
    • Text search
    • HTML conversion
  • ✅ Excel Parser: 8/8 tests passing
    • Multi-sheet extraction
    • Format conversion (JSON/CSV/Text)
    • Cell search
    • Metadata retrieval

Test Files Included:

  • Wordๆต‹่ฏ•ๆ–‡ไปถ.docx (6 MB)
  • Excelๆต‹่ฏ•ๆ–‡ไปถ.xlsx (19 KB)

🚀 Installation

npm

# Core library
npm install parseflow-core@1.1.0

# MCP Server (global)
npm install -g parseflow-mcp-server@1.1.0

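pnpm

# Core library
pnpm add parseflow-core@1.1.0

# MCP Server (global)
pnpm add -g parseflow-mcp-server@1.1.0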

📊 Supported Formats

| Format | Extension  | Read | Search | Metadata | Tools |
|--------|------------|------|--------|----------|-------|
| PDF    | .pdf       | ✅   | ✅     | ✅       | 5     |
| Word   | .docx      | ✅   | ✅     | ✅       | 2     |
| Excel  | .xlsx/.xls | ✅   | ✅     | ✅       | 2     |

🔧 Dependencies

New Dependencies

  • mammoth@^1.11.0 - Word document parsing
  • xlsx@^0.18.5 - Excel spreadsheet parsing

Existing Dependencies

  • pdf-parse@^1.1.1 - PDF parsing
  • pdf-lib@^1.17.1 - PDF manipulation
  • @modelcontextprotocol/sdk@^1.0.4 - MCP SDK

🐛 Bug Fixes

  • Fixed Excel metadata extraction reliability
  • Added null checks for sheet names in Excel parser
  • Improved error handling for malformed Office files
  • Better error messages for unsupported file types
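
As an illustration of what a clearer "unsupported file type" error can look like, here is a minimal sketch that maps a file extension to one of the supported formats and names the accepted extensions when it fails. It is not ParseFlow's actual code: `DocFormat`, `FORMAT_BY_EXTENSION`, and `detectFormat` are hypothetical names, and the message wording is invented.

```typescript
type DocFormat = "pdf" | "word" | "excel";

// Extensions listed in the Supported Formats table of these notes.
const FORMAT_BY_EXTENSION: Record<string, DocFormat> = {
  ".pdf": "pdf",
  ".docx": "word",
  ".xlsx": "excel",
  ".xls": "excel",
};

// Pick the format from the file extension, or fail with a message that
// states what was received and what is supported.
function detectFormat(filePath: string): DocFormat {
  // Look only at the file name so that dots in directory names are ignored.
  const lastSeparator = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const fileName = filePath.slice(lastSeparator + 1);
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();

  const format = FORMAT_BY_EXTENSION[ext];
  if (format === undefined) {
    const supported = Object.keys(FORMAT_BY_EXTENSION).join(", ");
    throw new Error(
      `Unsupported file type "${ext || "(none)"}" for ${filePath}. Supported: ${supported}`
    );
  }
  return format;
}
```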

🧹 Cleanup

  • Removed 8 redundant documentation files (~35 KB)
  • Simplified PROJECT_STATUS.md
  • Improved project organization
  • Updated .gitignore for test files

🔄 Upgrade Guide

From v1.0.x

No breaking changes! Simply update:

npm install parseflow-core@latest
npm install -g parseflow-mcp-server@latest

New Features

Import the new parsers:

import { WordParser, ExcelParser } from 'parseflow-core';

For MCP users, the new tools are automatically available after updating.


📖 Examples

Extract Text from Word Document

import { WordParser } from 'parseflow-core';

const parser = new WordParser();
const result = await parser.extractText('document.docx');
console.log(result.text);

Extract Data from Excel

import { ExcelParser } from 'parseflow-core';

const parser = new ExcelParser();
const sheets = await parser.extractData('data.xlsx');

sheets.forEach(sheet => {
  console.log(`${sheet.sheetName}: ${sheet.rowCount} rows`);
});

Search Across Documents

import { WordParser, ExcelParser } from 'parseflow-core';

const wordParser = new WordParser();
const excelParser = new ExcelParser();

// Search in Word
const wordMatches = await wordParser.searchText('report.docx', 'budget');

// Search in Excel
const excelMatches = await excelParser.searchText('data.xlsx', 'revenue');
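
To show what "cell coordinates" in Excel search results mean, the sketch below searches a small in-memory workbook and reports each hit as an A1-style address. It is a simplified illustration, not ParseFlow's implementation: the `Workbook` and `CellHit` types and the `searchWorkbook` and `indexToColumn` helpers are hypothetical.

```typescript
// Hypothetical in-memory workbook: sheet name -> row-major grid of cell values.
type Workbook = Record<string, (string | number | null)[][]>;

interface CellHit {
  sheet: string;
  address: string; // A1-style address, e.g. "B7"
  value: string | number;
}

// Convert a zero-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA.
function indexToColumn(col: number): string {
  let letters = "";
  for (let n = col + 1; n > 0; n = Math.floor((n - 1) / 26)) {
    letters = String.fromCharCode(65 + ((n - 1) % 26)) + letters;
  }
  return letters;
}

// Case-insensitive substring search over every non-empty cell of every sheet.
function searchWorkbook(workbook: Workbook, query: string): CellHit[] {
  const needle = query.toLowerCase();
  const hits: CellHit[] = [];
  Object.keys(workbook).forEach((sheet) => {
    const rows = workbook[sheet] ?? [];
    rows.forEach((row, r) => {
      row.forEach((value, c) => {
        if (value !== null && String(value).toLowerCase().indexOf(needle) !== -1) {
          hits.push({ sheet, address: `${indexToColumn(c)}${r + 1}`, value });
        }
      });
    });
  });
  return hits;
}
```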

More examples in OFFICE_EXAMPLES.md!


🙏 Acknowledgments

Special thanks to:

  • mammoth - For excellent Word document parsing
  • xlsx (SheetJS) - For comprehensive Excel support
  • MCP Community - For feedback and support

📝 Full Changelog

See CHANGELOG.md for complete details.


🎯 What's Next?

Looking ahead to v1.2.0:

  • PowerPoint (pptx) support
  • Encrypted document support
  • OCR text recognition
  • Performance optimizations

💬 Feedback

We'd love to hear from you!


Made with ❤️ by Libres-coder

Enjoy ParseFlow v1.1.0! 🎉