Transform email analysis with AI-powered classification, summarization, and emotion detection
Quick Start · Features · Architecture · Development
Built for researchers, data scientists, and NLP enthusiasts: leverage cutting-edge transformer models and advanced NLP techniques to unlock insights from the Enron email dataset with GPU-accelerated performance.
EnronClassifier is a sophisticated email analysis platform that combines the power of modern transformer models with an intuitive cross-platform interface. Whether you're conducting research, analyzing communication patterns, or exploring NLP techniques, our application provides comprehensive tools for email classification, summarization, and emotion analysis.
Our advanced classification system categorizes emails into 10 distinct categories using state-of-the-art transformer models:
| Category | Description | Examples |
|---|---|---|
| Strategic Planning | Long-term strategy, acquisitions, corporate planning | "Q4 merger discussion", "5-year growth plan" |
| Daily Operations | Routine tasks, operational procedures | "Daily status update", "Process changes" |
| Financial | Budget, accounting, expense reports | "Monthly P&L", "Budget approval" |
| Legal & Compliance | Legal matters, regulatory compliance | "Contract review", "Compliance audit" |
| Client & External | External communications, partnerships | "Client meeting", "Vendor negotiation" |
| HR & Personnel | Human resources, hiring matters | "New hire onboarding", "Performance review" |
| Meetings & Events | Scheduling, event planning | "Board meeting agenda", "Conference planning" |
| Urgent & Critical | Time-sensitive, emergency issues | "System outage", "Urgent approval needed" |
| Personal & Informal | Personal communications, informal chats | "Lunch plans", "Weekend discussion" |
| Technical & IT | Technical support, system issues | "Server maintenance", "Software bug report" |
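To see how these categories surface in practice, here is a minimal Python sketch that posts one email to the `/classify` endpoint documented in the API section below. The request and response field names (`text`, `category`, `confidence`) are assumptions for illustration, not the API's confirmed schema.

```python
# Minimal sketch: classify one email via the local Flask API.
# Assumes the API is running at http://localhost:5050/api (see config.js);
# the "text", "category" and "confidence" field names are illustrative.
import requests

API_URL = "http://localhost:5050/api"

email_text = (
    "Please review the attached contract amendments before the "
    "compliance audit scheduled for next Tuesday."
)

response = requests.post(f"{API_URL}/classify", json={"text": email_text}, timeout=60)
response.raise_for_status()

print(response.json())  # e.g. {"category": "Legal & Compliance", "confidence": 0.87}
```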
Before you start, make sure the following tools are installed:

| Tool | Version | Notes |
|---|---|---|
| Node.js & npm | v18+ | Download here |
| Docker | Latest | Get Docker |
| Rust | Latest | Required for Tauri builds |
| Python | 3.10+ | For local Flask development |
| Ollama | Latest | Required - Install Ollama |
**Linux, macOS, WSL**

```bash
# Complete setup in one command
./bin/enron_classifier.sh
```

That's it! The script handles everything.

**Docker-based setup**

```bash
# 1. Download dataset
./bin/download_enron.cmd

# 2. Generate database
./bin/generate_db.cmd

# 3. Start services
docker compose up --build

# 4. Launch desktop app
npm --prefix ./apps/enron_classifier run tauri dev
```
AI Response Generation Setup
For AI-powered email response generation, you need to install and configure Ollama:
```bash
# 1. Install Ollama from https://ollama.com

# 2. Pull a recommended model (choose one)
ollama pull llama3.2:3b   # Lightweight, fast responses
ollama pull llama3.1:8b   # Balanced performance
ollama pull codellama:7b  # For technical emails
ollama pull mistral       # Default model

# 3. Verify installation
ollama list
```

By default, the shell script `./bin/enron_classifier.sh` checks whether the default model (configured in the app) is installed using `ollama list`. If the model is missing, the script automatically pulls it using `ollama pull`.
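For readers who prefer code to prose, the snippet below is a rough Python approximation of that check-then-pull step; the actual implementation is the shell script itself, and `ensure_model` is a hypothetical helper, not part of the codebase.

```python
# Rough sketch of the check-then-pull behaviour described above.
# The real logic lives in ./bin/enron_classifier.sh; this is only an
# illustration using the Ollama CLI via subprocess.
import subprocess

DEFAULT_MODEL = "mistral"  # mirrors DEFAULT_MODEL in config.js

def ensure_model(model: str) -> None:
    """Pull the model with Ollama if `ollama list` does not show it."""
    installed = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    if model not in installed:
        subprocess.run(["ollama", "pull", model], check=True)

if __name__ == "__main__":
    ensure_model(DEFAULT_MODEL)
```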
You can change the default model by editing the following file:

```javascript
// apps/enron_classifier/src/config.js

// API configuration
export const API_URL = 'http://localhost:5050/api';

// Timeout configuration (in milliseconds)
export const API_TIMEOUT = 240000; // 4 minutes

// Model configuration
export const DEFAULT_MODEL = 'mistral';
export const DEFAULT_TEMPERATURE = 0.7;
```

The app uses the `DEFAULT_MODEL` defined in `config.js` for response generation. It does not automatically select other models even if they're installed.
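To try response generation end to end, the sketch below calls the `/respond` endpoint with one email body. The `text` request field and the response shape are assumptions; the backend replies with whatever model `DEFAULT_MODEL` is set to.

```python
# Sketch: request an AI-generated reply for one email via /respond.
# The "text" request field and the response shape are assumptions; the
# backend uses the DEFAULT_MODEL configured in config.js (mistral by default).
import requests

API_URL = "http://localhost:5050/api"

email = (
    "Hi team, the client asked whether we can move Thursday's review call "
    "to Friday morning. Can someone confirm availability?"
)

reply = requests.post(f"{API_URL}/respond", json={"text": email}, timeout=240)
reply.raise_for_status()
print(reply.json())
```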
```mermaid
graph TB
    subgraph "Frontend Layer"
        A[React + JavaScript UI]
        B[Tauri Desktop Wrapper]
        C[Web Application]
    end

    subgraph "API Layer"
        D[Flask REST API]
        E[Advanced NLP Pipeline]
        F[Transformer Models]
    end

    subgraph "Data Layer"
        G[SQLite Database]
        H[Enron Email Dataset]
    end

    subgraph "AI Services"
        I[Ollama Server]
        J[LLM Models]
    end

    A --> D
    B --> A
    C --> A
    D --> E
    E --> F
    D --> G
    G --> H
    D --> I
    I --> J

    style A fill:#61DAFB,stroke:#21325B,color:#000
    style D fill:#000000,stroke:#FFFFFF,color:#fff
    style F fill:#FFD21E,stroke:#FF6B35,color:#000
    style I fill:#00D4AA,stroke:#007A5E,color:#000
```
| Layer | Technologies | Purpose |
|---|---|---|
| Frontend | React 18, JavaScript, Tailwind CSS, Framer Motion | Modern, responsive user interface |
| Desktop | Tauri, Rust | Cross-platform desktop application |
| API | Flask 3.1, Python 3.10+ | RESTful backend services |
| ML/NLP | Transformers, BART, DistilBERT, NLTK, spaCy | Advanced language processing |
| Database | SQLite | Efficient email data storage |
| AI | Ollama, Llama models, Mistral | Intelligent response generation |
```
NLP_project/
├── apps/
│   ├── flask_api/                  # Backend API
│   │   ├── app/
│   │   │   ├── routes/             # API endpoints
│   │   │   ├── services/           # ML/NLP services
│   │   │   │   ├── enron_classifier.py
│   │   │   │   ├── emotion_enhancer.py
│   │   │   │   ├── summarizer.py
│   │   │   │   ├── responder.py
│   │   │   │   └── ner_engine.py
│   │   │   └── tests/
│   │   └── models/
│   ├── enron_classifier/           # Frontend React app
│   │   ├── src/
│   │   │   ├── components/
│   │   │   ├── contexts/
│   │   │   ├── hooks/
│   │   │   └── pages/
│   │   └── src-tauri/              # Tauri config
│   └── SQLite_db/                  # Database generation
├── bin/                            # Setup scripts
├── docker-compose.yml
└── README.md
```
| Command | Description | Platform |
|---|---|---|
| `./bin/enron_classifier.sh` | Complete setup | Unix |
| `./bin/enron_classifier.sh --api-only` | API development | Unix |
| `./bin/enron_classifier.sh --frontend-only` | Frontend development | Unix |
| `docker compose up --build` | Docker development | All |
| `npm run tauri dev` | Desktop app development | All |
| `npm run dev` | Web app development | All |
| Endpoint | Method | Purpose | Input |
|---|---|---|---|
| `/classify` | POST | Email classification | Email text |
| `/summarize` | POST | Text summarization | Email content |
| `/emotion-enhance` | POST | Emotion analysis | Email text |
| `/respond` | POST | AI response generation | Email context |
| `/users` | GET | List Enron users | None |
| `/users/<id>/emails` | GET | User's emails | User ID |
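As an example of how these endpoints compose, the sketch below lists users, fetches one user's emails, and summarizes the first one. The JSON field names (`id`, `body`) are assumptions about the response shape, not a documented schema.

```python
# Sketch of a typical workflow against the endpoints above.
# Field names such as "id" and "body" are assumed, not confirmed;
# adjust them to whatever the API actually returns.
import requests

API_URL = "http://localhost:5050/api"

users = requests.get(f"{API_URL}/users", timeout=30).json()
user_id = users[0]["id"]  # assumed field name

emails = requests.get(f"{API_URL}/users/{user_id}/emails", timeout=30).json()
email_body = emails[0]["body"]  # assumed field name

summary = requests.post(
    f"{API_URL}/summarize", json={"text": email_body}, timeout=240
).json()
print(summary)
```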
| Platform | Acceleration | Performance | Setup Command |
|---|---|---|---|
| Linux + NVIDIA | Full CUDA | ★★★★★ Excellent | `./bin/enron_classifier.sh` |
| Windows + NVIDIA | Full CUDA | ★★★★★ Excellent | Docker setup |
| macOS (Apple Silicon) | Limited MPS | ★★★★ Good | `./bin/enron_classifier.sh` |
| macOS (Intel) | CPU only | ★★★ Good | `./bin/enron_classifier.sh` |
| WSL | CPU/CUDA | ★★★★ Good | `./bin/enron_classifier.sh` |
- GPU Acceleration: Use NVIDIA GPUs for best performance
- Apple Silicon: Native setup recommended over Docker
- Memory: 8GB+ RAM recommended for large datasets
- Caching: Models are cached after first load
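These tips map onto the standard PyTorch device-selection pattern: prefer CUDA on NVIDIA hardware, fall back to MPS on Apple Silicon, otherwise use the CPU. The snippet below is a generic sketch of that pattern, not code taken from the project.

```python
# Generic PyTorch device selection matching the table above:
# CUDA on NVIDIA GPUs, MPS on Apple Silicon, CPU otherwise.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")   # Linux/Windows/WSL with NVIDIA GPUs
elif torch.backends.mps.is_available():
    device = torch.device("mps")    # Apple Silicon (limited acceleration)
else:
    device = torch.device("cpu")

print(f"Running transformer inference on: {device}")
```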
The classifier uses a fast, transformer-based zero-shot labeling system to assign initial labels to each email (based on BART embeddings or sentence embeddings + cosine similarity).
This means even the very first training round generates labels automatically, but for best accuracy you may want to retrain on more data.
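To make the zero-shot idea concrete, here is a minimal sketch of the embedding-plus-cosine-similarity variant: embed the email and each category name, then pick the closest category. The encoder model (`all-MiniLM-L6-v2`) and the bare category names are illustrative choices, not the project's exact configuration.

```python
# Illustration of embedding-based zero-shot labeling: embed the email and
# the category names, then assign the category with the highest cosine
# similarity. Encoder and label phrasing are assumptions for this sketch.
from sentence_transformers import SentenceTransformer, util

CATEGORIES = [
    "Strategic Planning", "Daily Operations", "Financial",
    "Legal & Compliance", "Client & External", "HR & Personnel",
    "Meetings & Events", "Urgent & Critical", "Personal & Informal",
    "Technical & IT",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed lightweight encoder

email = "The server went down overnight; we need an urgent fix before trading opens."
email_emb = encoder.encode(email, convert_to_tensor=True)
label_embs = encoder.encode(CATEGORIES, convert_to_tensor=True)

scores = util.cos_sim(email_emb, label_embs)[0]  # one similarity per category
print(CATEGORIES[int(scores.argmax())])          # highest-scoring label
```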
By default, the model is trained on a 100k email sample (~20 minutes on modern hardware).
For best results, we recommend training on the full dataset (~500k emails), which may take around 1.5–2 hours depending on your CPU or GPU.
You can retrain easily using the built-in API:
```bash
curl -i -X POST http://localhost:5050/api/classify/train \
  -H "Content-Type: application/json" \
  -d '{"enron_dir": "../SQLite_db/enron.db", "max_emails": 500000}'
```
or, for Docker users:
```bash
curl -i -X POST http://localhost:5050/api/classify/train \
  -H "Content-Type: application/json" \
  -d '{"enron_dir": "app/data/enron.db", "max_emails": 500000}'
```

The included test suite evaluates the Enron Email Classifier on a small subset of emails to ensure the training pipeline, model creation, and classification process work correctly. It:
- Loads 3,000 emails from the Enron dataset.
- Uses the zero-shot labeler (embedding-based or BART) to generate initial labels for these emails.
- Trains a fresh model from scratch on this small dataset using an ensemble classifier.
- Evaluates performance on a 600-email test set and generates:
  - A confusion matrix heatmap
  - A detailed classification report
  - A CSV file with per-email predictions and confidences
- Saves all results in the `test_results/` directory.
- Training on the full dataset (~500k emails) would take hours, even on a powerful machine.
- This test uses only 3,000 emails to keep the test fast and manageable, roughly 10–15 seconds to run.
- Performance on this small test set is expected to be low (accuracy ~10–20%) due to the limited data.
Run the test with:

```bash
python3 -m app.tests.classification_test
```

Note: The test always trains a new model from scratch on the 3,000-email sample. For production, train the model on a larger dataset (100k+ emails) using the training endpoint (recommended).
Common Issues & Solutions
```bash
# Make script executable
chmod +x ./bin/enron_classifier.sh

# Run with debug output
bash -x ./bin/enron_classifier.sh
```

```bash
# Reset Docker environment
docker compose down --volumes
docker compose up --build --force-recreate
```

```bash
# Check Ollama status
ollama list
curl http://localhost:11434/api/tags

# Restart Ollama service
ollama serve
```

```bash
# Clear and reinstall dependencies
npm cache clean --force
rm -rf node_modules package-lock.json
npm install
```

How to contribute:
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Contribution areas:
- ML model improvements
- UI/UX enhancements
- Documentation updates
- Test coverage expansion
- Bug fixes
| Role | Focus | Expertise |
|---|---|---|
| Lead Developer | Architecture & Integration | Full-stack, ML Pipeline |
| ML Engineer | NLP & AI Models | Transformers, GPU Optimization |
| Frontend Developer | UI/UX & React | JavaScript, Modern Web |
| Backend Developer | API & Services | Flask, Python, Databases |
| Data Scientist | Analytics & Insights | Statistics, Visualization |
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️. Special thanks to the open-source community and research institutions that make advanced NLP accessible to everyone.
Transform emails into insights with AI!
