Text Analysis & Academic Intelligence Microservice
Transform text content into actionable insights through comprehensive linguistic analysis, writing quality assessment, and academic integrity checking.
# Docker deployment (recommended)
docker-compose up -d
# Or raw deployment
./deploy.sh
# API available at: http://localhost:8002
# Documentation: http://localhost:8002/docs

- GET /health - Service health check
- POST /text - Text analysis (readability, quality, word frequency)
- POST /academic - Academic analysis (citations, DOI resolution, integrity)
- POST /files - File upload + analysis (PDF, DOCX, TXT, MD)

- POST /advanced/ngrams - N-gram extraction with optional filter terms
- POST /advanced/ner - Named entity recognition
- POST /advanced/search/keywords - Batch keyword search across multiple terms

- POST /files/infer-metadata - Infer year, company, industry, document type from content
- POST /text/infer-metadata - Metadata inference from raw text
- Page-level text extraction (via include_extracted_text=true on /files)
- Root endpoint: GET / - Service info and available endpoints

- For presentations: Use PresentationLens
- For recordings: Use RecordingLens
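
As a rough sketch of calling the text analysis endpoint from Python, the snippet below builds (but does not send) a JSON POST to /text using only the standard library. The `text` field in the payload and the helper name `build_text_request` are assumptions for illustration; the actual request schema is described at http://localhost:8002/docs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # default address given in this README


def build_text_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /text endpoint.

    The {"text": ...} payload shape is an assumption, not the documented schema.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it (requires the service to be running locally):
#     with urllib.request.urlopen(build_text_request("Some text")) as resp:
#         result = json.loads(resp.read())
```

Building the request separately from sending it keeps the example checkable without a running service.
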
- Text Analysis: Readability, writing quality, word frequency for any text content
- Academic Analysis: Citation verification, DOI resolution, AI detection, integrity checking
- Document Intelligence: Extract and analyze text from PDFs and Word documents
- Sustainability Research: Batch keyword analysis for TCFD, GRI, SDGs, SASB frameworks
- Corporate Report Analysis: Auto-detect metadata (year, company, industry) from annual reports
- Multi-Service Workflows: Integrate with specialized analysis services
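
To make the batch keyword idea concrete, here is a minimal local sketch that counts occurrences of several terms in a piece of text. It only illustrates the concept and is not the service's implementation; the function name `count_keywords` and the sample terms are hypothetical.

```python
import re


def count_keywords(text: str, keywords: list[str]) -> dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each keyword or phrase."""
    counts: dict[str, int] = {}
    for keyword in keywords:
        # \b anchors keep "risk" from matching inside "brisk".
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        counts[keyword] = len(pattern.findall(text))
    return counts


sample = "Climate risk disclosures address transition risk and physical risk. Brisk growth."
print(count_keywords(sample, ["risk", "climate", "scope 3"]))
# {'risk': 3, 'climate': 1, 'scope 3': 0}
```
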
DocumentLens powers the document-lens-desktop Electron application for researchers analyzing corporate sustainability reports. Features include:
- Smart metadata inference (company name, year, industry, document type)
- Framework keyword analysis (TCFD, GRI, SDGs, SASB)
- Batch processing with SQLite storage
- Offline operation via bundled Python backend
DocumentLens is part of a focused microservices architecture:
| Service | Purpose | Repository |
|---|---|---|
| DocumentLens | Text analysis & academic intelligence | This repo |
| PresentationLens | Presentation design & structure analysis | presentation-lens |
| RecordingLens | Student recordings (video/audio) analysis | recording-lens |
| CodeLens | Source code quality & analysis | code-lens |
| SubmissionLens | Student submission router & frontend | submission-lens |
graph LR
A[Student Submission] --> B[SubmissionLens Frontend]
B --> C{File Type Router}
C -->|Text/PDF/DOCX| D[DocumentLens]
C -->|PPTX| E[PresentationLens]
C -->|Video/Audio| F[RecordingLens]
C -->|Source Code| G[CodeLens]
E --> D
F --> D
G --> D
D --> H[Combined Feedback]
H --> B
B --> I[Student Dashboard]
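
The file type router in the diagram can be sketched as a simple extension lookup. This is an illustrative assumption about how such a router might work, not SubmissionLens code: the diagram names only the broad categories, so the specific video, audio, and source code extensions below, along with the `ROUTES` table and `route_submission` function, are hypothetical.

```python
from pathlib import Path

# Extension sets are illustrative assumptions; the diagram only names the
# broad categories (Text/PDF/DOCX, PPTX, Video/Audio, Source Code).
ROUTES = {
    "DocumentLens": {".txt", ".md", ".pdf", ".docx"},
    "PresentationLens": {".pptx"},
    "RecordingLens": {".mp4", ".mov", ".mp3", ".wav"},
    "CodeLens": {".py", ".js", ".java"},
}


def route_submission(filename: str) -> str:
    """Return the name of the service that should handle the given file."""
    suffix = Path(filename).suffix.lower()
    for service, extensions in ROUTES.items():
        if suffix in extensions:
            return service
    raise ValueError(f"No service registered for file type: {suffix or filename!r}")
```

Per the diagram, the other services pass their results on to DocumentLens, so a fuller router would chain that second hop after the first dispatch.
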
git clone https://github.com/michael-borck/document-lens.git
cd document-lens
docker-compose up -d # Single container deployment
git clone https://github.com/michael-borck/document-lens.git
cd document-lens
./deploy.sh # Handles venv, dependencies, and production server
# Install dev dependencies
uv sync --extra dev
# Run all tests
uv run pytest tests/ -v
# Run specific test file
uv run pytest tests/test_files.py -v
# Run only PDF tests
uv run pytest tests/ -m pdf -v
# Skip slow tests
uv run pytest tests/ -m "not slow" -v
# Run with coverage report
uv run pytest tests/

- tests/conftest.py - Shared fixtures and test client setup
- tests/test_health.py - Health/smoke tests
- tests/test_text_analysis.py - Text analysis endpoint tests
- tests/test_academic_analysis.py - Academic analysis endpoint tests
- tests/test_files.py - PDF file upload tests

Place test files (PDF, DOCX, etc.) in the test-data/ directory. The test suite automatically discovers and uses these files for parameterized tests.
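
A minimal sketch of how such discovery could work, assuming a flat or nested test-data/ directory and the file formats listed for POST /files. The helper name `discover_test_files` and the suffix set are hypothetical; the project's actual fixtures in tests/conftest.py may differ.

```python
from pathlib import Path

# Formats listed for POST /files; treating them as the discoverable set is an assumption.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def discover_test_files(directory: str = "test-data") -> list[Path]:
    """Return supported files under `directory`, sorted for stable ordering."""
    root = Path(directory)
    if not root.is_dir():
        return []  # no data directory means no parameterized cases
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

The returned list could then be passed to `pytest.mark.parametrize` so that each discovered file becomes its own test case.
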
- DEPLOYMENT.md - Deployment guide for Docker and raw installations
- DOCUMENTLENS_SETUP.md - Setup and usage instructions
- .env.example - Configuration template
- docs/ - Additional architecture and integration documentation

DocumentLens: Pure text intelligence at the heart of content analysis