jorben commented Jan 26, 2026

Summary

  • Refactors the Office splitter design document with separate handlers for each format (Word, PowerPoint, Excel)
  • Adds security improvements including path validation to prevent path traversal attacks
  • Documents support for Office Open XML formats (.docx, .pptx, .xlsx) and explicitly excludes legacy OLE formats
  • Introduces shared helper modules: RenderWindowPoolFactory, TempFileManager, PathValidator, ChunkedRenderer
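
As a rough illustration of the per-format split described above, the following sketch routes a file to a splitter by extension and rejects legacy OLE files. Only the splitter names come from this PR's commit messages; `selectSplitter`, `extensionOf`, and the lookup tables are hypothetical names introduced here, and the actual design document may dispatch differently.

```typescript
// Splitter names are taken from the PR; everything else here is an assumption.
type SplitterName = "WordSplitter" | "PPTSplitter" | "ExcelSplitter";

// Office Open XML extensions mapped to the handler that would process them.
const OOXML_SPLITTERS = new Map<string, SplitterName>([
  [".docx", "WordSplitter"],
  [".pptx", "PPTSplitter"],
  [".xlsx", "ExcelSplitter"],
]);

// Legacy OLE extensions that the design explicitly excludes.
const LEGACY_OLE_EXTENSIONS = new Set<string>([".doc", ".ppt", ".xls"]);

// Return the lowercased extension including the dot, or "" if there is none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot <= 0 ? "" : filename.slice(dot).toLowerCase();
}

// Pick a splitter for the file, failing loudly on unsupported formats.
function selectSplitter(filename: string): SplitterName {
  const ext = extensionOf(filename);
  if (LEGACY_OLE_EXTENSIONS.has(ext)) {
    throw new Error(`Legacy OLE format ${ext} is not supported; only OOXML is`);
  }
  const splitter = OOXML_SPLITTERS.get(ext);
  if (splitter === undefined) {
    throw new Error(`Unsupported file type: ${ext || "(no extension)"}`);
  }
  return splitter;
}
```

For example, `selectSplitter("Report.DOCX")` returns `"WordSplitter"`, while `selectSplitter("old.doc")` throws. Rejecting legacy extensions with their own error message, rather than falling through to the generic one, tells the user the file needs converting.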

Test plan

  • Review the design document for completeness and accuracy
  • Verify the architecture diagram reflects the intended implementation
  • Confirm security considerations are properly addressed

🤖 Generated with Claude Code

jorben and others added 4 commits January 26, 2026 13:12
…urity

Refactor the Office splitter design document with major architectural changes:

- Split single OfficeSplitter into WordSplitter, PPTSplitter, ExcelSplitter
- Remove legacy OLE format support (.doc, .ppt, .xls) - only OOXML supported
- Add PathValidator for security against path traversal attacks
- Add PageRangeParser for page/sheet range selection
- Add RenderWindowPoolFactory for shared rendering resources
- Add ChunkedRenderer for memory-optimized large document rendering
- Update architecture diagrams to reflect new modular design
- Add comprehensive test specifications for all components

This design provides better maintainability through separation of concerns
and improved security with explicit path validation.
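
A minimal sketch of the kind of check a path validator might perform, assuming POSIX-style paths and a single allowed base directory. The function names `normalizePosix` and `isInsideBase` are hypothetical, and this is not the PathValidator from the design document. It works on strings only, so it does not resolve symlinks; a real implementation would likely use the platform's path utilities as well.

```typescript
// Collapse ".", "..", and duplicate slashes in a POSIX-style path (string-only).
function normalizePosix(p: string): string {
  const absolute = p.startsWith("/");
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length > 0 && out[out.length - 1] !== "..") {
        out.pop(); // step up one directory
      } else if (!absolute) {
        out.push(".."); // relative paths may legitimately start with ".."
      }
      // For absolute paths, ".." at the root stays at the root.
    } else {
      out.push(seg);
    }
  }
  return (absolute ? "/" : "") + out.join("/");
}

// True only if `candidate`, resolved against `baseDir`, stays within `baseDir`.
function isInsideBase(baseDir: string, candidate: string): boolean {
  if (candidate.includes("\0")) return false; // reject embedded NUL bytes
  const base = normalizePosix(baseDir);
  const joined = candidate.startsWith("/") ? candidate : `${baseDir}/${candidate}`;
  const resolved = normalizePosix(joined);
  // Compare against "base/" so that "/data/uploads-evil" does not match "/data/uploads".
  const prefix = base.endsWith("/") ? base : base + "/";
  return resolved === base || resolved.startsWith(prefix);
}
```

Under these assumptions, `isInsideBase("/data/uploads", "report.docx")` is `true`, while `isInsideBase("/data/uploads", "../secrets.txt")` is `false`. So is `isInsideBase("/data/uploads", "/data/uploads-evil/x.docx")`, which a naive prefix check would accept.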

Co-Authored-By: Claude <noreply@anthropic.com>

Unify page range input format across all document types by removing
name-based Sheet selection. Excel now uses the same numeric format
as PDF, Word, and PowerPoint (e.g., "1-3,5" instead of "#1-2" or
"Sheet1,数据表").

Changes:
- Remove type and names fields from SheetRange interface
- Simplify parseSheetRange to use parseNumeric directly
- Update filterSheets to index-only filtering
- Update documentation and examples
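
The unified numeric format can be sketched as follows. This is an illustrative parser for strings such as "1-3,5" plus an index-only filter, not the project's `parseNumeric` or `filterSheets`. The names `parseNumericRange` and `filterByIndex`, the 1-based indexing, and the choice to throw on out-of-range input are all assumptions.

```typescript
// Parse a range string such as "1-3,5" into sorted, de-duplicated 1-based indices.
// `total` is the number of pages or sheets available.
function parseNumericRange(input: string, total: number): number[] {
  const picked = new Set<number>();
  for (const rawPart of input.split(",")) {
    const part = rawPart.trim();
    // Accept either a single number ("5") or an inclusive span ("1-3").
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid range segment: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > total) {
      throw new Error(`Range out of bounds: "${part}" (total ${total})`);
    }
    for (let i = start; i <= end; i++) picked.add(i);
  }
  return Array.from(picked).sort((a, b) => a - b);
}

// Index-only filtering: select items by position, never by name.
function filterByIndex<T>(items: T[], range: string): T[] {
  return parseNumericRange(range, items.length).map((i) => items[i - 1]);
}
```

With five sheets named `["Summary", "Data", "Notes", "Archive", "Raw"]`, `filterByIndex(sheets, "1-3,5")` selects Summary, Data, Notes, and Raw. A name-based input such as "Sheet1" fails the numeric pattern and is rejected, which matches the commit's removal of name-based selection.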

Co-Authored-By: Claude <noreply@anthropic.com>

Keep the comprehensive design with:
- Separate splitters (Word/PPT/Excel)
- Security features (PathValidator)
- Memory optimization (ChunkedRenderer)
- Window pool management

Discard master's simpler single OfficeSplitter approach.

Co-Authored-By: Claude <noreply@anthropic.com>
jorben merged commit 5f08015 into master Jan 26, 2026
2 checks passed
jorben deleted the feat/office-splitter-design branch January 27, 2026 06:36