Transform email analysis with AI-powered classification, summarization, and emotion detection
Quick Start · Features · Architecture · Development
Built for researchers, data scientists, and NLP enthusiasts: leverage cutting-edge transformer models and advanced NLP techniques to unlock insights from the Enron email dataset with GPU-accelerated performance.
EnronClassifier is a sophisticated email analysis platform that combines the power of modern transformer models with an intuitive cross-platform interface. Whether you're conducting research, analyzing communication patterns, or exploring NLP techniques, our application provides comprehensive tools for email classification, summarization, and emotion analysis.
Our advanced classification system categorizes emails into 10 distinct categories using state-of-the-art transformer models:
| Category | Description | Examples |
|---|---|---|
| Strategic Planning | Long-term strategy, acquisitions, corporate planning | "Q4 merger discussion", "5-year growth plan" |
| Daily Operations | Routine tasks, operational procedures | "Daily status update", "Process changes" |
| Financial | Budget, accounting, expense reports | "Monthly P&L", "Budget approval" |
| Legal & Compliance | Legal matters, regulatory compliance | "Contract review", "Compliance audit" |
| Client & External | External communications, partnerships | "Client meeting", "Vendor negotiation" |
| HR & Personnel | Human resources, hiring matters | "New hire onboarding", "Performance review" |
| Meetings & Events | Scheduling, event planning | "Board meeting agenda", "Conference planning" |
| Urgent & Critical | Time-sensitive, emergency issues | "System outage", "Urgent approval needed" |
| Personal & Informal | Personal communications, informal chats | "Lunch plans", "Weekend discussion" |
| Technical & IT | Technical support, system issues | "Server maintenance", "Software bug report" |
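To see how these categories surface in practice, here is a minimal Python sketch that posts one email to the `/classify` endpoint documented in the API section below. The request and response field names (`text`, `category`, `confidence`) are assumptions for illustration, not the API's confirmed schema.

```python
# Minimal sketch: classify one email via the local Flask API.
# Assumes the API is running at http://localhost:5050/api (see config.js);
# the "text", "category" and "confidence" field names are illustrative.
import requests

API_URL = "http://localhost:5050/api"

email_text = (
    "Please review the attached contract amendments before the "
    "compliance audit scheduled for next Tuesday."
)

response = requests.post(f"{API_URL}/classify", json={"text": email_text}, timeout=60)
response.raise_for_status()

print(response.json())  # e.g. {"category": "Legal & Compliance", "confidence": 0.87}
```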
Before you start, make sure the following tools are installed:

| Tool | Version | Notes |
|---|---|---|
| Node.js & npm | v18+ | Download here |
| Docker | Latest | Get Docker |
| Rust | Latest | Required for Tauri builds |
| Python | 3.10+ | For local Flask development |
| Ollama | Latest | Required - Install Ollama |
**Linux, macOS, WSL**

```bash
# Complete setup in one command
./bin/enron_classifier.sh
```

That's it! The script handles everything.

**Docker-based setup**

```bash
# 1. Download dataset
./bin/download_enron.cmd

# 2. Generate database
./bin/generate_db.cmd

# 3. Start services
docker compose up --build

# 4. Launch desktop app
npm --prefix ./apps/enron_classifier run tauri dev
```
AI Response Generation Setup
For AI-powered email response generation, you need to install and configure Ollama:
```bash
# 1. Install Ollama from https://ollama.com

# 2. Pull a recommended model (choose one)
ollama pull llama3.2:3b   # Lightweight, fast responses
ollama pull llama3.1:8b   # Balanced performance
ollama pull codellama:7b  # For technical emails
ollama pull mistral       # Default model

# 3. Verify installation
ollama list
```

By default, the shell script `./bin/enron_classifier.sh` checks whether the default model (configured in the app) is installed using `ollama list`. If the model is missing, the script automatically pulls it using `ollama pull`.
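For readers who prefer code to prose, the snippet below is a rough Python approximation of that check-then-pull step; the actual implementation is the shell script itself, and `ensure_model` is a hypothetical helper, not part of the codebase.

```python
# Rough sketch of the check-then-pull behaviour described above.
# The real logic lives in ./bin/enron_classifier.sh; this is only an
# illustration using the Ollama CLI via subprocess.
import subprocess

DEFAULT_MODEL = "mistral"  # mirrors DEFAULT_MODEL in config.js

def ensure_model(model: str) -> None:
    """Pull the model with Ollama if `ollama list` does not show it."""
    installed = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    if model not in installed:
        subprocess.run(["ollama", "pull", model], check=True)

if __name__ == "__main__":
    ensure_model(DEFAULT_MODEL)
```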
You can change the default model by editing the following file:

```javascript
// apps/enron_classifier/src/config.js

// API configuration
export const API_URL = 'http://localhost:5050/api';

// Timeout configuration (in milliseconds)
export const API_TIMEOUT = 240000; // 4 minutes

// Model configuration
export const DEFAULT_MODEL = 'mistral';
export const DEFAULT_TEMPERATURE = 0.7;
```

The app uses the `DEFAULT_MODEL` defined in `config.js` for response generation. It does not automatically select other models even if they're installed.
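To try response generation end to end, the sketch below calls the `/respond` endpoint with one email body. The `text` request field and the response shape are assumptions; the backend replies with whatever model `DEFAULT_MODEL` is set to.

```python
# Sketch: request an AI-generated reply for one email via /respond.
# The "text" request field and the response shape are assumptions; the
# backend uses the DEFAULT_MODEL configured in config.js (mistral by default).
import requests

API_URL = "http://localhost:5050/api"

email = (
    "Hi team, the client asked whether we can move Thursday's review call "
    "to Friday morning. Can someone confirm availability?"
)

reply = requests.post(f"{API_URL}/respond", json={"text": email}, timeout=240)
reply.raise_for_status()
print(reply.json())
```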
```mermaid
graph TB
    subgraph "Frontend Layer"
        A[React + JavaScript UI]
        B[Tauri Desktop Wrapper]
        C[Web Application]
    end

    subgraph "API Layer"
        D[Flask REST API]
        E[Advanced NLP Pipeline]
        F[Transformer Models]
    end

    subgraph "Data Layer"
        G[SQLite Database]
        H[Enron Email Dataset]
    end

    subgraph "AI Services"
        I[Ollama Server]
        J[LLM Models]
    end

    A --> D
    B --> A
    C --> A
    D --> E
    E --> F
    D --> G
    G --> H
    D --> I
    I --> J

    style A fill:#61DAFB,stroke:#21325B,color:#000
    style D fill:#000000,stroke:#FFFFFF,color:#fff
    style F fill:#FFD21E,stroke:#FF6B35,color:#000
    style I fill:#00D4AA,stroke:#007A5E,color:#000
```
| Layer | Technologies | Purpose |
|---|---|---|
| Frontend | React 18, JavaScript, Tailwind CSS, Framer Motion | Modern, responsive user interface |
| Desktop | Tauri, Rust | Cross-platform desktop application |
| API | Flask 3.1, Python 3.10+ | RESTful backend services |
| ML/NLP | Transformers, BART, DistilBERT, NLTK, spaCy | Advanced language processing |
| Database | SQLite | Efficient email data storage |
| AI | Ollama, Llama models, Mistral | Intelligent response generation |
```
NLP_project/
├── apps/
│   ├── flask_api/                  # Backend API
│   │   ├── app/
│   │   │   ├── routes/             # API endpoints
│   │   │   ├── services/           # ML/NLP services
│   │   │   │   ├── enron_classifier.py
│   │   │   │   ├── emotion_enhancer.py
│   │   │   │   ├── summarizer.py
│   │   │   │   ├── responder.py
│   │   │   │   └── ner_engine.py
│   │   │   └── tests/
│   │   └── models/
│   ├── enron_classifier/           # Frontend React app
│   │   ├── src/
│   │   │   ├── components/
│   │   │   ├── contexts/
│   │   │   ├── hooks/
│   │   │   └── pages/
│   │   └── src-tauri/              # Tauri config
│   └── SQLite_db/                  # Database generation
├── bin/                            # Setup scripts
├── docker-compose.yml
└── README.md
```
| Command | Description | Platform |
|---|---|---|
| `./bin/enron_classifier.sh` | Complete setup | Unix |
| `./bin/enron_classifier.sh --api-only` | API development | Unix |
| `./bin/enron_classifier.sh --frontend-only` | Frontend development | Unix |
| `docker compose up --build` | Docker development | All |
| `npm run tauri dev` | Desktop app development | All |
| `npm run dev` | Web app development | All |
| Endpoint | Method | Purpose | Input |
|---|---|---|---|
| `/classify` | POST | Email classification | Email text |
| `/summarize` | POST | Text summarization | Email content |
| `/emotion-enhance` | POST | Emotion analysis | Email text |
| `/respond` | POST | AI response generation | Email context |
| `/users` | GET | List Enron users | None |
| `/users/<id>/emails` | GET | User's emails | User ID |
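As an example of how these endpoints compose, the sketch below lists users, fetches one user's emails, and summarizes the first one. The JSON field names (`id`, `body`) are assumptions about the response shape, not a documented schema.

```python
# Sketch of a typical workflow against the endpoints above.
# Field names such as "id" and "body" are assumed, not confirmed;
# adjust them to whatever the API actually returns.
import requests

API_URL = "http://localhost:5050/api"

users = requests.get(f"{API_URL}/users", timeout=30).json()
user_id = users[0]["id"]  # assumed field name

emails = requests.get(f"{API_URL}/users/{user_id}/emails", timeout=30).json()
email_body = emails[0]["body"]  # assumed field name

summary = requests.post(
    f"{API_URL}/summarize", json={"text": email_body}, timeout=240
).json()
print(summary)
```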
| Platform | Acceleration | Performance | Setup Command |
|---|---|---|---|
| Linux + NVIDIA | Full CUDA | ★★★★★ Excellent | `./bin/enron_classifier.sh` |
| Windows + NVIDIA | Full CUDA | ★★★★★ Excellent | Docker setup |
| macOS (Apple Silicon) | Limited MPS | ★★★★ Good | `./bin/enron_classifier.sh` |
| macOS (Intel) | CPU only | ★★★ Good | `./bin/enron_classifier.sh` |
| WSL | CPU/CUDA | ★★★★ Good | `./bin/enron_classifier.sh` |
- GPU Acceleration: Use NVIDIA GPUs for best performance
- Apple Silicon: Native setup recommended over Docker
- Memory: 8GB+ RAM recommended for large datasets
- Caching: Models are cached after first load
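These tips map onto the standard PyTorch device-selection pattern: prefer CUDA on NVIDIA hardware, fall back to MPS on Apple Silicon, otherwise use the CPU. The snippet below is a generic sketch of that pattern, not code taken from the project.

```python
# Generic PyTorch device selection matching the table above:
# CUDA on NVIDIA GPUs, MPS on Apple Silicon, CPU otherwise.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")   # Linux/Windows/WSL with NVIDIA GPUs
elif torch.backends.mps.is_available():
    device = torch.device("mps")    # Apple Silicon (limited acceleration)
else:
    device = torch.device("cpu")

print(f"Running transformer inference on: {device}")
```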
The classifier uses a fast, transformer-based zero-shot labeling system to assign initial labels to each email (based on BART embeddings or sentence embeddings + cosine similarity).
This means even the very first training round generates labels automatically, but for best accuracy you may want to retrain on more data.
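To make the zero-shot idea concrete, here is a minimal sketch of the embedding-plus-cosine-similarity variant: embed the email and each category name, then pick the closest category. The encoder model (`all-MiniLM-L6-v2`) and the bare category names are illustrative choices, not the project's exact configuration.

```python
# Illustration of embedding-based zero-shot labeling: embed the email and
# the category names, then assign the category with the highest cosine
# similarity. Encoder and label phrasing are assumptions for this sketch.
from sentence_transformers import SentenceTransformer, util

CATEGORIES = [
    "Strategic Planning", "Daily Operations", "Financial",
    "Legal & Compliance", "Client & External", "HR & Personnel",
    "Meetings & Events", "Urgent & Critical", "Personal & Informal",
    "Technical & IT",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed lightweight encoder

email = "The server went down overnight; we need an urgent fix before trading opens."
email_emb = encoder.encode(email, convert_to_tensor=True)
label_embs = encoder.encode(CATEGORIES, convert_to_tensor=True)

scores = util.cos_sim(email_emb, label_embs)[0]  # one similarity per category
print(CATEGORIES[int(scores.argmax())])          # highest-scoring label
```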
By default, the model is trained on a 100k email sample (~20 minutes on modern hardware).
For best results, we recommend training on the full dataset (~500k emails), which may take around 1.5–2 hours depending on your CPU or GPU.
You can retrain easily using the built-in API:
```bash
curl -i -X POST http://localhost:5050/api/classify/train \
  -H "Content-Type: application/json" \
  -d '{"enron_dir": "../SQLite_db/enron.db", "max_emails": 500000}'
```
or, for Docker users:
```bash
curl -i -X POST http://localhost:5050/api/classify/train \
  -H "Content-Type: application/json" \
  -d '{"enron_dir": "app/data/enron.db", "max_emails": 500000}'
```

The included test suite evaluates the Enron Email Classifier on a small subset of emails to ensure the training pipeline, model creation, and classification process work correctly. It:
- Loads 3,000 emails from the Enron dataset.
- Uses the zero-shot labeler (embedding-based or BART) to generate initial labels for these emails.
- Trains a fresh model from scratch on this small dataset using an ensemble classifier.
- Evaluates performance on a 600-email test set and generates:
  - A confusion matrix heatmap
  - A detailed classification report
  - A CSV file with per-email predictions and confidences
- Saves all results in the `test_results/` directory.
- Training on the full dataset (~500k emails) would take hours, even on a powerful machine.
- This test uses only 3,000 emails to keep the test fast and manageable, roughly 10–15 seconds to run.
- Performance on this small test set is expected to be low (accuracy ~10–20%) due to the limited data.
Run the test with:

```bash
python3 -m app.tests.classification_test
```

Note: The test always trains a new model from scratch on the 3,000-email sample. For production, train the model on a larger dataset (100k+ emails) using the training endpoint (recommended).
Common Issues & Solutions
```bash
# Make script executable
chmod +x ./bin/enron_classifier.sh

# Run with debug output
bash -x ./bin/enron_classifier.sh
```

```bash
# Reset Docker environment
docker compose down --volumes
docker compose up --build --force-recreate
```

```bash
# Check Ollama status
ollama list
curl http://localhost:11434/api/tags

# Restart Ollama service
ollama serve
```

```bash
# Clear and reinstall dependencies
npm cache clean --force
rm -rf node_modules package-lock.json
npm install
```

How to contribute:
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Contribution areas:
- ML model improvements
- UI/UX enhancements
- Documentation updates
- Test coverage expansion
- Bug fixes
| Role | Focus | Expertise |
|---|---|---|
| Lead Developer | Architecture & Integration | Full-stack, ML Pipeline |
| ML Engineer | NLP & AI Models | Transformers, GPU Optimization |
| Frontend Developer | UI/UX & React | JavaScript, Modern Web |
| Backend Developer | API & Services | Flask, Python, Databases |
| Data Scientist | Analytics & Insights | Statistics, Visualization |
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️. Special thanks to the open-source community and research institutions that make advanced NLP accessible to everyone.
Transform emails into insights with AI!
