A comprehensive, privacy-focused text analysis platform that provides deep insights into writing style, linguistic patterns, and content themes using advanced NLP techniques. Perfect for writers, researchers, and content analysts who want professional-grade text analysis without compromising data privacy.
- Enhanced Keyness Chart: Interactive D3.js visualization showing both positive and negative keyness
- Effect Size Visualization: Horizontal bar chart displaying words that are over/under-represented compared to a reference corpus
- Statistical Significance: Log-likelihood calculations with confidence metrics
- Color-coded Results: Blue bars for positive keyness, red bars for negative keyness
- Dynamic Cluster Visualization: Interactive scatter plots with Plotly.js
- Multiple Clustering Algorithms: K-means clustering with optimized cluster count
- Word Relationship Mapping: Visualize semantic relationships between terms
- Cluster Statistics: Detailed metrics for each semantic group
- Multi-dimensional Analysis: Overall sentiment scoring with positive/negative/neutral breakdown
- Visual Sentiment Display: Pie charts and donut charts showing sentiment distribution
- Confidence Scoring: Reliability metrics for sentiment predictions
- Comprehensive Metrics: Word count, sentence count, average sentence length
- Readability Analysis: Multiple readability scores and complexity metrics
- Vocabulary Analysis: Type-token ratio, lexical diversity measures
- Zero Data Retention: All text deleted immediately after analysis
- Local Processing: Core analysis runs locally without external API calls
- Optional AI Integration: Ollama integration for enhanced insights (fully local)
- Multi-format Export: JSON, PNG, PDF export capabilities
- Print-ready Charts: High-quality visualizations for reports
- Complete Analysis Package: Export all results with charts and statistics
- Responsive Design: Works seamlessly on desktop and mobile
- Interactive Visualizations: Hover tooltips, zoom, pan functionality
- Real-time Analysis: Fast processing with progress indicators
- Clean UI: Modern interface built with Tailwind CSS
- React 18 with TypeScript for type-safe development
- D3.js for advanced custom visualizations
- Plotly.js for interactive scientific charts
- Chart.js with react-chartjs-2 for standard charts
- Tailwind CSS for responsive styling
- React Dropzone for file upload handling
- Vite for fast development and building
- HTML2Canvas & jsPDF for export functionality
- FastAPI for high-performance async API
- Advanced NLP Stack:
- NLTK for natural language processing
- spaCy for advanced linguistic analysis
- scikit-learn for machine learning and clustering
- Gensim for topic modeling and word embeddings
- TextBlob for sentiment analysis
- Data Processing:
- NumPy & Pandas for numerical computing
- Matplotlib & Seaborn for data visualization
- Optional AI Integration:
- Ollama for local AI model integration
- HTTPX for async HTTP requests
- Docker & Docker Compose
- Make (for easy commands)
```bash
# Start everything with one command:
make up

# For first-time setup (includes AI model):
make setup
```

```bash
make up       # Build and start all services (main command)
make dev      # Start in development mode with logs
make setup    # Complete setup including Ollama model
make down     # Stop all services
make restart  # Restart all services
make logs     # View all service logs
make status   # Check service status
make health   # Check application health
make clean    # Clean up containers and volumes
```

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/api/health
ReflexAI is built as a modern, containerized microservices application with a clear separation between frontend presentation, backend processing, and AI enhancement layers. The architecture prioritizes privacy, performance, and scalability.
```mermaid
graph TB
    UI[Frontend React App] --> API[FastAPI Backend]
    API --> NLP[NLP Pipeline]
    API --> Files[File Handler]
    API --> AI[Ollama AI Service]
    Files --> Parser[Document Parser]
    NLP --> Keyness[Keyness Analyzer]
    NLP --> Semantic[Semantic Clusterer]
    NLP --> Sentiment[Sentiment Analyzer]
    NLP --> Stats[Text Processor]
```
The backend follows a modular service-oriented architecture with clearly separated concerns:
Purpose: Orchestrates the complete text analysis workflow.

Key Features:
- Unified processing pipeline for all analysis types
- Parallel processing of multiple analysis stages (see the sketch after the stages list below)
- Comprehensive error handling and logging
- Performance optimization with timing metrics
- Status tracking and progress reporting
Processing Stages:
- Text Preprocessing: Cleaning and normalization (`TextProcessor`)
- Statistical Analysis: Word count, readability metrics, vocabulary richness
- Keyness Analysis: Log-likelihood calculations for word significance
- Semantic Clustering: K-means clustering with TF-IDF vectorization
- Sentiment Analysis: Multi-dimensional sentiment scoring
- AI Insights: Optional Ollama-powered thematic analysis
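Since the stages after preprocessing are independent, they can run concurrently. A minimal orchestration sketch of that idea, using stand-in coroutines rather than the actual service classes:

```python
import asyncio

async def preprocess(text: str) -> str:
    return " ".join(text.split()).lower()        # stand-in for TextProcessor

async def run_keyness(text: str) -> dict:
    return {"keywords": []}                      # stand-in for the keyness analyzer

async def run_sentiment(text: str) -> dict:
    return {"overall": 0.0}                      # stand-in for the sentiment analyzer

async def run_statistics(text: str) -> dict:
    return {"word_count": len(text.split())}     # stand-in for the statistics stage

async def run_pipeline(text: str) -> dict:
    cleaned = await preprocess(text)             # preprocessing gates everything else
    keyness, sentiment, stats = await asyncio.gather(
        run_keyness(cleaned), run_sentiment(cleaned), run_statistics(cleaned)
    )
    return {"keyness": keyness, "sentiment": sentiment, "textStatistics": stats}

print(asyncio.run(run_pipeline("A short example text.")))
```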
Purpose: Advanced text preprocessing and linguistic analysis.

Key Features:
- spaCy Integration: Advanced NLP with named entity recognition
- NLTK Integration: Tokenization, POS tagging, stopword removal
- Fallback Processing: Graceful degradation when spaCy is unavailable (sketched below)
- Multilingual Support: Configurable language models
Processing Capabilities:
- Lemmatization for improved clustering accuracy
- Named Entity Recognition (NER) for content insights
- Advanced tokenization with linguistic analysis
- Stop word filtering and text normalization
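A minimal sketch of the fallback behavior described above, assuming the standard spaCy and NLTK APIs (the function name is illustrative):

```python
# Use spaCy when its model is installed, otherwise fall back to NLTK.
try:
    import spacy
    _nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError):   # library or model missing
    _nlp = None

def tokenize_and_lemmatize(text: str) -> list[str]:
    if _nlp is not None:
        # spaCy path: lemmas with stop word filtering
        return [t.lemma_ for t in _nlp(text) if not t.is_stop and t.is_alpha]
    # NLTK fallback: plain tokenization without lemmas
    from nltk.tokenize import word_tokenize
    return [w.lower() for w in word_tokenize(text) if w.isalpha()]
```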
Purpose: Statistical analysis of word significance and distinctiveness.
Algorithm: Log-likelihood ratio with effect size calculation.
Key Features:
- Positive Keyness: Words over-represented vs reference corpus
- Negative Keyness: Words under-represented (missing/rare)
- Effect Size Calculation: Standardized difference measures
- Confidence Scoring: Statistical reliability metrics
- Reference Corpus: Default English frequency baseline
Mathematical Foundation:
```python
from math import log, sqrt

# Log-likelihood keyness score for a single word (example frequencies)
observed_freq, expected_freq = 42, 10.5
score = 2 * observed_freq * log(observed_freq / expected_freq)
effect_size = score / sqrt(observed_freq)
```

Purpose: Groups semantically related content using machine learning.
Algorithm: K-means clustering with TF-IDF vectorization and PCA dimensionality reduction.
Key Features:
- Adaptive Clustering: Dynamic cluster count based on content
- TF-IDF Vectorization: Term frequency-inverse document frequency
- PCA Reduction: Dimensionality reduction for performance
- Cluster Labeling: Automatic theme detection and naming
Technical Implementation (a minimal sketch follows this list):
- Sentence-based clustering for semantic coherence
- Feature extraction with stop word filtering
- Fallback clustering for small datasets
- Coordinate generation for visualization
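A minimal sketch of the documented TF-IDF → K-means → PCA flow using scikit-learn; the sample sentences and cluster count are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["The sea was calm.", "Waves rolled gently.", "Taxes rose sharply."]

# Sentence-based features with stop word filtering
tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)

# Cluster, then project to 2-D for the scatter-plot coordinates
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf)
coords = PCA(n_components=2).fit_transform(tfidf.toarray())

for sent, label, (x, y) in zip(sentences, kmeans.labels_, coords):
    print(label, round(x, 2), round(y, 2), sent)
```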
Purpose: Multi-dimensional emotional tone analysis.
Algorithm: TextBlob polarity and subjectivity analysis (sketched below).
Key Features:
- Overall Sentiment: Compound polarity score (-1 to +1)
- Dimensional Breakdown: Positive, negative, neutral percentages
- Sentence-level Analysis: Granular sentiment tracking
- Confidence Metrics: Reliability scoring
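A minimal sketch of the TextBlob-based scoring, including the sentence-level breakdown; the example text is illustrative:

```python
from textblob import TextBlob

text = "The interface is delightful, but exports feel slow."
blob = TextBlob(text)

print("overall polarity:", blob.sentiment.polarity)     # -1.0 .. +1.0
print("subjectivity:", blob.sentiment.subjectivity)     #  0.0 .. 1.0

# Sentence-level breakdown for granular sentiment tracking
for sentence in blob.sentences:
    print(round(sentence.sentiment.polarity, 2), str(sentence))
```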
Purpose: Secure, optimized file upload and processing system.

Key Features:
- Multi-format Support: 9 document formats via `DocumentParser`
- Progressive Upload: Chunked streaming with progress tracking (sketched after the format list below)
- Memory Optimization: Memory mapping for large files (>1MB)
- Privacy Protection: Automatic file deletion after processing
- Security Controls: File size limits, format validation
- Background Cleanup: Automated old file removal
Supported Formats:
- `.txt` - Plain text
- `.docx` - Microsoft Word (2007+)
- `.pdf` - PDF documents
- `.rtf` - Rich Text Format
- `.odt` - OpenDocument Text
- `.html` / `.htm` - HTML documents
- `.md` / `.markdown` - Markdown files
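A minimal sketch of chunked streaming with delete-after-analysis, assuming FastAPI's `UploadFile`; the helper names are illustrative, not the actual file handler API:

```python
import os
import aiofiles
from fastapi import UploadFile

CHUNK_SIZE = 64 * 1024  # 64KB chunks, per the documented upload handling

async def save_upload(upload: UploadFile, dest: str) -> int:
    """Stream an upload to disk in chunks; caller deletes dest after analysis."""
    written = 0
    async with aiofiles.open(dest, "wb") as out:
        while chunk := await upload.read(CHUNK_SIZE):
            await out.write(chunk)
            written += len(chunk)    # feeds the progress indicator
    return written

def cleanup(dest: str) -> None:
    if os.path.exists(dest):
        os.remove(dest)              # privacy: delete immediately after analysis
```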
Purpose: Universal document format parsing and text extraction.
Architecture: Modular parser system with graceful fallbacks.
Key Features:
- Format Detection: MIME type and extension validation
- Library Management: Dynamic availability checking (sketched after the parser list below)
- Content Extraction: Text-only extraction with structure preservation
- Error Handling: Comprehensive error recovery
Parser Implementations:
- DOCX: `python-docx` with table extraction
- PDF: `PyPDF2` with page-wise processing
- RTF: `striprtf` for rich text conversion
- ODT: `odfpy` for OpenDocument parsing
- HTML: `BeautifulSoup` with tag stripping
- Markdown: `markdown` with HTML conversion
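A minimal sketch of the dynamic availability checking described above: each parser registers only if its library imports. The registry layout is an assumption, not the actual `DocumentParser` internals:

```python
from importlib import import_module

PARSERS = {}

def register(ext: str, module: str, attr: str | None = None) -> None:
    try:
        mod = import_module(module)
        PARSERS[ext] = getattr(mod, attr) if attr else mod
    except ImportError:
        pass  # format stays unsupported; /api/formats can report this

register(".docx", "docx", "Document")       # python-docx
register(".pdf", "PyPDF2", "PdfReader")     # PyPDF2
register(".rtf", "striprtf.striprtf", "rtf_to_text")

print(sorted(PARSERS))  # extensions with an available parser
```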
Purpose: Local AI model integration for enhanced insights.
Model: Llama 3.2 1B (lightweight, privacy-focused).
Key Features:
- Local Processing: No external API calls
- Optional Enhancement: Graceful degradation if unavailable
- Connection Management: Health checking and reconnection (sketched after the capabilities list below)
- Model Management: Automatic model pulling and validation
AI Capabilities:
- Thematic analysis and pattern recognition
- Writing style characterization
- Emotional tone interpretation
- Literary analysis insights
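A minimal sketch of a local Ollama call with graceful degradation, using Ollama's standard REST endpoints (`/api/tags`, `/api/generate`); the prompt and timeout are illustrative:

```python
import httpx

OLLAMA_URL = "http://localhost:11434"

async def ai_insights(text: str) -> dict | None:
    async with httpx.AsyncClient(timeout=60.0) as client:
        try:
            # Health check: fails fast if Ollama is down
            (await client.get(f"{OLLAMA_URL}/api/tags")).raise_for_status()
            r = await client.post(f"{OLLAMA_URL}/api/generate", json={
                "model": "llama3.2:1b",
                "prompt": "Identify the main themes in:\n" + text[:2000],
                "stream": False,   # single JSON response instead of a stream
            })
            r.raise_for_status()
            return {"themes": r.json().get("response", "")}
        except httpx.HTTPError:
            return None  # optional enhancement: analysis proceeds without AI
```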
The application uses Pydantic models for type safety and API documentation:
Primary response model containing all analysis data:
```python
from typing import Dict, List, Optional
from pydantic import BaseModel

class AnalysisResult(BaseModel):
    id: str                     # Unique analysis identifier
    timestamp: str              # ISO timestamp
    status: AnalysisStatus      # PENDING/PROCESSING/COMPLETED/FAILED
    keyness: Optional[KeynessResult]
    semanticClustering: Optional[SemanticClusteringResult]
    sentiment: Optional[SentimentResult]
    textStatistics: Optional[TextStatistics]
    aiInsights: Optional[Dict]  # Ollama-generated insights
    metadata: Optional[ProcessingMetadata]
    error_message: Optional[str]
```

Statistical word significance analysis:
```python
class KeynessResult(BaseModel):
    keywords: List[KeywordItem]   # Ranked keyword list
    total_keywords: int
    processing_time_ms: float
    reference_corpus: str         # "general_english"

class KeywordItem(BaseModel):
    word: str
    score: float          # Log-likelihood score
    frequency: int        # Observed frequency
    rank: int             # Significance ranking
    effect_size: float    # Standardized effect
    confidence: float     # Reliability metric
```

Machine learning-based content clustering:
```python
class SemanticClusteringResult(BaseModel):
    clusters: List[SemanticCluster]
    total_clusters: int
    processing_time_ms: float
    algorithm: str                # "kmeans_embedding"
    similarity_threshold: float

class SemanticCluster(BaseModel):
    id: int
    label: str                    # Auto-generated theme label
    words: List[str]              # Cluster terms
    size: int                     # Cluster size
    coherence_score: float        # Cluster quality metric
    word_coordinates: List[WordCoordinate]  # Visualization data
```

Multi-dimensional emotional analysis:
```python
class SentimentResult(BaseModel):
    overall: float       # Compound sentiment (-1 to +1)
    positive: float      # Positive percentage (0-100)
    negative: float      # Negative percentage (0-100)
    neutral: float       # Neutral percentage (0-100)
    compound: float      # Alternative overall score
    confidence: float    # Analysis reliability
    sentence_sentiments: List[Dict]  # Per-sentence breakdown
```

Comprehensive readability and complexity metrics:
```python
class TextStatistics(BaseModel):
    character_count: int
    word_count: int
    sentence_count: int
    paragraph_count: int
    avg_sentence_length: float
    avg_word_length: float
    unique_words: int
    vocabulary_richness: float   # Type-token ratio
    readability_score: float     # Flesch Reading Ease (0-100)
```
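For reference, `readability_score` uses the standard Flesch Reading Ease formula, 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). A self-contained sketch with a rough vowel-group syllable heuristic (not the production counter):

```python
import re

def syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, minimum one syllable
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syl / len(words))

print(round(flesch_reading_ease("The cat sat. It was happy."), 1))
```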
- `POST /api/upload`
  - Multi-format file upload with progress tracking
  - Immediate analysis and result caching
  - Background file deletion for privacy
  - Returns: `{success, message, analysisId, progress}`
- `GET /api/upload/progress/{session_id}`
  - Real-time upload progress monitoring
  - Returns: `{progress}` (0-100)
- `POST /api/analyze`
  - Direct text input analysis
  - Bypasses file handling
  - Returns: complete `AnalysisResult` (a client sketch follows this endpoint list)
- `GET /api/results/{analysis_id}`
  - Retrieve cached analysis results
  - Results automatically expire for privacy
- `GET /api/download/{analysis_id}`
  - Download results as JSON
  - Automatic cleanup after download
  - Streaming response for large datasets
- `GET /api/health`
  - Service status and health metrics
  - NLP model availability
  - Ollama integration status
  - File cleanup configuration
- `GET /api/files/stats`
  - Temporary directory statistics
  - File count, sizes, and ages
  - Storage usage monitoring
- `POST /api/files/cleanup`
  - Manual file cleanup trigger
  - Background cleanup initiation
- `GET /api/formats`
  - Supported file format listing
  - Parser availability status
- AI service management endpoints
- Model availability checking
- Model downloading and setup
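A minimal client sketch for the direct-text endpoint; the `{"text": ...}` request body shape is an assumption based on the endpoint description above:

```python
import httpx

def analyze(text: str) -> dict:
    # Request body shape is assumed; adjust to the actual /api/analyze schema
    r = httpx.post("http://localhost:8000/api/analyze", json={"text": text})
    r.raise_for_status()
    return r.json()  # an AnalysisResult payload

result = analyze("ReflexAI analyzes writing style without retaining data.")
print(result["status"], result.get("textStatistics"))
```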
The React frontend uses a component-based architecture with clear separation of concerns:
Purpose: Root component orchestrating the entire user experience.

Key Features:
- State Management: Analysis results, upload progress, error handling
- View Modes: Enhanced vs basic visualization modes
- Privacy Management: Privacy policy and consent handling
- Export Orchestration: Multi-format export coordination
State Management:
```tsx
const [isAnalyzing, setIsAnalyzing] = useState(false);
const [results, setResults] = useState<AnalysisResult | null>(null);
const [uploadProgress, setUploadProgress] = useState(0);
const [viewMode, setViewMode] = useState<'enhanced' | 'basic'>('enhanced');
```

Purpose: Drag-and-drop file upload interface with progress tracking.

Key Features:
- React Dropzone: Advanced file selection with validation
- Multi-format Support: 7 document formats accepted
- Progress Visualization: Real-time upload progress bar
- Error Handling: Format validation and size limit enforcement
- Accessibility: Screen reader support and keyboard navigation
Purpose: Interactive D3.js-powered keyness visualization.

Features:
- Horizontal bar chart with positive/negative keyness
- Interactive tooltips with detailed statistics
- Zoom and pan functionality
- Export capabilities (PNG, PDF)
- Color-coded significance levels
Purpose: Plotly.js-based cluster visualization.

Features:
- Scatter plot with cluster boundaries
- Interactive cluster exploration
- Word coordinate mapping
- Hover details and selection
- 3D visualization options
Purpose: Multi-chart sentiment visualization.

Features:
- Pie chart for sentiment distribution
- Gauge chart for overall sentiment
- Sentence-level sentiment timeline
- Color-coded emotional indicators
Purpose: Comprehensive readability and complexity metrics display.

Features:
- Statistical summary cards
- Readability score interpretation
- Vocabulary analysis charts
- Comparative baseline indicators
Purpose: API communication and data management.

Features:
- Axios Integration: HTTP client with error handling
- Type Safety: TypeScript interfaces for all API responses
- Progress Tracking: Upload progress monitoring
- Result Caching: Client-side result management
```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend
    participant A as API
    participant P as Pipeline
    participant S as Services
    U->>F: Upload file
    F->>A: POST /api/upload
    A->>P: Analyze text
    P->>S: Process (Keyness, Semantic, Sentiment)
    S-->>P: Analysis results
    P-->>A: Complete results
    A-->>F: Analysis ID + results
    F-->>U: Display visualizations
    U->>F: Export results
    F->>F: Generate exports
    F-->>U: Download files
    A->>A: Background cleanup
```
```yaml
backend:
  build: ./backend
  container_name: reflexai-backend
  ports: ["8000:8000"]
  environment:
    - DELETE_AFTER_ANALYSIS=true
    - OLLAMA_MODEL=llama3.2:1b
  networks: [app-network]

frontend:
  build: ./frontend
  container_name: reflexai-frontend
  ports: ["3000:3000"]
  environment:
    - VITE_API_URL=http://localhost:8000
  depends_on: [backend]
  networks: [app-network]

ollama:
  image: ollama/ollama
  container_name: reflexai-ollama
  ports: ["11434:11434"]
  volumes: [ollama_data:/root/.ollama]
  networks: [app-network]
```

- Base Image: `python:3.11-slim`
- Dependencies: System packages (gcc) + Python requirements
- NLP Models: Automatic NLTK and spaCy model downloading
- Optimization: Multi-stage build with caching optimization
- Base Image: `node:18-alpine`
- Build Process: Vite-powered build system
- Production: Nginx serving optimized static files
Development Commands:
- `make up`: Complete environment setup
- `make dev`: Development mode with live logs
- `make setup`: Full setup including AI model download
- `make health`: System health checking
- `make clean`: Complete cleanup
Monitoring Commands:
- `make status`: Service status overview
- `make logs`: Aggregated service logs
- `make ollama-status`: AI model status
- Automatic file deletion after analysis completion
- Configurable cleanup intervals (default: 30 minutes)
- Memory-only result caching with expiration
- No persistent storage of user content
- Core NLP processing runs locally
- No external API calls for analysis
- Optional AI enhancement via local Ollama
- No cloud dependencies for core functionality
- File size limits (50MB default)
- Format validation and sanitization
- Temporary file isolation
- Background cleanup processes
- Multi-layer file format validation
- Content sanitization for XSS prevention
- Request rate limiting
- CORS configuration for frontend-only access
```python
from pydantic import BaseSettings  # moved to pydantic_settings in Pydantic v2

class Settings(BaseSettings):
    PROJECT_NAME: str = "Text Analysis Tool"
    MAX_FILE_SIZE: int = 10485760           # 10MB default
    DELETE_AFTER_ANALYSIS: bool = True      # Privacy protection
    CLEANUP_INTERVAL_SECONDS: int = 1800    # 30 minutes
    MAX_FILE_AGE_SECONDS: int = 3600        # 1 hour
    OLLAMA_MODEL: str = "llama3.2:1b"       # Lightweight model
```

- FastAPI async/await throughout
- Non-blocking file I/O with `aiofiles`
- Concurrent analysis pipeline stages
- Background task management
- Memory mapping for large files (>1MB, sketched after this list)
- Streaming file processing
- Chunked upload handling (64KB chunks)
- Garbage collection optimization
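A minimal sketch of the memory-mapping strategy for large files, using the documented >1MB threshold; the function name is illustrative:

```python
import mmap
from pathlib import Path

MMAP_THRESHOLD = 1 * 1024 * 1024  # documented >1MB cutoff

def read_text(path: Path) -> str:
    if path.stat().st_size > MMAP_THRESHOLD:
        # Memory-map large files instead of loading them eagerly
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.read().decode("utf-8", errors="replace")
    return path.read_text(encoding="utf-8", errors="replace")
```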
- In-memory result caching
- Progressive upload progress tracking
- Model initialization caching
- Preprocessing result reuse
- Vite-powered development and production builds
- Tree-shaking for minimal bundle size
- Dynamic imports for code splitting
- Asset optimization and compression
- Canvas rendering for large datasets
- Debounced user interactions
- Lazy loading of visualization libraries
- Progressive enhancement
- Progressive upload indicators
- Optimistic UI updates
- Error boundary implementation
- Responsive design with mobile optimization
```bash
# System requirements
brew install docker docker-compose make               # macOS
# or
sudo apt-get install docker.io docker-compose make    # Ubuntu
```

```bash
# Clone and start
git clone <repository>
cd ReflexAI
make up      # Start all services
make dev     # Development with logs
make setup   # Include AI model
```

```bash
# Frontend only
cd frontend && npm install && npm run dev

# Backend only
cd backend && python -m venv venv
source venv/bin/activate && pip install -r requirements.txt
uvicorn main:app --reload

# Full stack with Docker
make up && make logs
```

- Unit Tests: Service-level testing with pytest
- Integration Tests: API endpoint testing
- NLP Pipeline Tests: Analysis accuracy validation
- Performance Tests: Large file handling
- Component Tests: React component testing
- Integration Tests: API communication testing
- Visualization Tests: Chart rendering validation
- E2E Tests: Full user workflow testing
- `/api/health`: Service status and configuration
- `/api/files/stats`: File system monitoring
- Docker health checks for all services
- Structured logging with correlation IDs
- Performance timing for all operations
- Error tracking with full stack traces
- User action tracking (privacy-compliant)
- FastAPI automatic API documentation (`/docs`)
- React Developer Tools integration
- Docker container inspection
- Real-time log streaming
```bash
make up    # Local development
make dev   # With live logs
```

```bash
# With production optimizations
docker-compose -f docker-compose.prod.yml up -d
```

- Horizontal scaling via Docker Swarm or Kubernetes
- Load balancing for multiple backend instances
- Shared storage for temporary files
- Database integration for result persistence (optional)
- Processing time per analysis stage
- File upload success rates
- Memory usage and optimization
- Error rates and types
- Analysis completion rates
- Feature usage statistics
- Performance benchmarks
- Error recovery rates
- Small files (<1MB): <2 seconds analysis
- Large files (10-50MB): <30 seconds analysis
- Concurrent users: 10+ simultaneous analyses
- Memory usage: <500MB per analysis