NoamFav/EnronBox

πŸ“¬ EnronClassifier

React Flask Tauri Docker HuggingFace JavaScript

EnronClassifier Logo

πŸš€ Advanced Email Intelligence Platform

Transform email analysis with AI-powered classification, summarization, and emotion detection


πŸš€ Quick Start
Β Β 
✨ Features
Β Β 
πŸ—οΈ Architecture
Β Β 
πŸ’» Development

🎯 Built for researchers, data scientists, and NLP enthusiasts β€” Leverage cutting-edge transformer models and advanced NLP techniques to unlock insights from the Enron email dataset with GPU-accelerated performance.


🌟 What is EnronClassifier?

EnronClassifier is a sophisticated email analysis platform that combines the power of modern transformer models with an intuitive cross-platform interface. Whether you're conducting research, analyzing communication patterns, or exploring NLP techniques, our application provides comprehensive tools for email classification, summarization, and emotion analysis.

EnronClassifier Interface

✨ Key Features

⚠️ Note: The initial label generation step can take time (e.g. ~10 minutes per 10k emails on CPU), but it runs automatically as part of the training process β€” no manual labeling required!

🧠 Intelligent Classification

  • 10 comprehensive categories for precise email sorting
  • Zero-shot classification using BART transformer
  • Semantic understanding with sentence transformers
  • Multi-label support for complex emails
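The embedding route can be sketched in miniature. The toy vectors below stand in for real sentence-transformer embeddings, and the function names are illustrative rather than the project's actual API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_label(email_vec: np.ndarray, label_vecs: dict) -> str:
    """Pick the label whose description embedding is closest to the email."""
    return max(label_vecs, key=lambda name: cosine_similarity(email_vec, label_vecs[name]))

# Toy 3-dimensional "embeddings" standing in for sentence-transformer output
labels = {
    "Financial": np.array([1.0, 0.1, 0.0]),
    "Technical & IT": np.array([0.0, 1.0, 0.2]),
}
email = np.array([0.9, 0.2, 0.1])  # leans toward the "Financial" direction
print(zero_shot_label(email, labels))  # Financial
```

No labeled training data is needed: only a short text description per category, which is what makes the first labeling pass fully automatic.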

πŸ“„ Smart Summarization

  • Extractive summarization using NLTK, spaCy, Sumy
  • Key phrase extraction for quick insights
  • Configurable length based on your needs
  • Context preservation for accurate summaries
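A frequency-based extractive summarizer, in the spirit of the NLTK/Sumy approach (a simplified sketch, not the app's exact pipeline):

```python
import re
from collections import Counter

def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Keep the highest-scoring sentences (by word frequency), in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])  # original order preserves context
    return " ".join(sentences[i] for i in keep)

email = "The budget meeting is tomorrow. Bring the budget report. I had lunch."
print(extractive_summary(email))
# The budget meeting is tomorrow. Bring the budget report.
```

The `max_sentences` parameter corresponds to the configurable length mentioned above; re-sorting the kept indices is what preserves reading order.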

🎭 Emotion Analysis

  • Sentiment detection with confidence scores
  • Tone analysis for communication insights
  • Enhancement suggestions for better phrasing
  • Emotional context understanding
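As a rough illustration of sentiment-with-confidence, here is a toy lexicon scorer; the real service uses transformer models, and the word lists below are made up for the example:

```python
# Made-up mini lexicons; a stand-in for the transformer sentiment model.
POSITIVE = {"great", "thanks", "appreciate", "good", "pleased"}
NEGATIVE = {"urgent", "problem", "failure", "angry", "unacceptable"}

def detect_sentiment(text: str) -> tuple[str, float]:
    """Return (label, confidence) from lexicon hit counts."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return "neutral", 0.5
    label = "positive" if pos >= neg else "negative"
    return label, max(pos, neg) / (pos + neg)

print(detect_sentiment("Thanks, the report looks great!"))  # ('positive', 1.0)
```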

πŸ€– AI Response Generation

  • Ollama integration for intelligent replies
  • Context-aware responses based on email content
  • Multiple LLM options (Llama, CodeLlama, Mistral)
  • Customizable tone and style
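Calling Ollama for a reply boils down to one POST against its local `/api/generate` endpoint. The prompt template and helper names below are illustrative, not the app's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_reply_request(email_body: str, model: str = "mistral",
                        tone: str = "professional") -> dict:
    """JSON payload for Ollama's /api/generate endpoint (non-streaming)."""
    prompt = f"Write a {tone} reply to the following email:\n\n{email_body}"
    return {"model": model, "prompt": prompt, "stream": False}

def generate_reply(email_body: str) -> str:
    payload = json.dumps(build_reply_request(email_body)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running `ollama serve`):
#   print(generate_reply("Can we move Friday's budget review to Monday?"))
```

The `tone` argument is where customizable style plugs in: it simply changes the instruction prepended to the email.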

⚑ Performance Optimized

  • GPU acceleration with full CUDA support
  • Apple Silicon (MPS) compatibility
  • Intelligent fallback to CPU when needed
  • Efficient caching for faster processing
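The fallback order can be expressed as a tiny pure function; in real PyTorch code the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """CUDA first, then Apple Silicon's MPS, then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# In real PyTorch code:
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
print(pick_device(False, True))  # mps
```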

πŸ’» Cross-Platform

  • Desktop app via Tauri framework
  • Web interface for browser-based usage
  • Docker support for easy deployment
  • Unix & Windows compatibility

🏷️ Classification System

Our advanced classification system categorizes emails into 10 distinct categories using state-of-the-art transformer models:

| 🎯 Category | πŸ“ Description | πŸ” Examples |
| --- | --- | --- |
| 🎯 Strategic Planning | Long-term strategy, acquisitions, corporate planning | "Q4 merger discussion", "5-year growth plan" |
| βš™οΈ Daily Operations | Routine tasks, operational procedures | "Daily status update", "Process changes" |
| πŸ’° Financial | Budget, accounting, expense reports | "Monthly P&L", "Budget approval" |
| βš–οΈ Legal & Compliance | Legal matters, regulatory compliance | "Contract review", "Compliance audit" |
| 🀝 Client & External | External communications, partnerships | "Client meeting", "Vendor negotiation" |
| πŸ‘₯ HR & Personnel | Human resources, hiring matters | "New hire onboarding", "Performance review" |
| πŸ—“οΈ Meetings & Events | Scheduling, event planning | "Board meeting agenda", "Conference planning" |
| 🚨 Urgent & Critical | Time-sensitive, emergency issues | "System outage", "Urgent approval needed" |
| πŸ’¬ Personal & Informal | Personal communications, informal chats | "Lunch plans", "Weekend discussion" |
| πŸ”§ Technical & IT | Technical support, system issues | "Server maintenance", "Software bug report" |

πŸš€ Quick Start Guide

πŸ“‹ Prerequisites

πŸ› οΈ Tool πŸ“Œ Version πŸ“ Notes
Node.js & npm v18+ Download here
Docker Latest Get Docker
Rust Latest Required for Tauri builds
Python 3.10+ For local Flask development
Ollama Latest Required - Install Ollama

🎯 One-Command Setup

🐧 Unix Systems

Linux, macOS, WSL

# Complete setup in one command
./bin/enron_classifier.sh

That's it! ✨ The script handles everything:

  • πŸ“₯ Downloads Enron dataset
  • πŸ—„οΈ Builds SQLite database
  • 🎨 Installs frontend dependencies
  • πŸ€– Sets up Ollama models
  • πŸš€ Starts Flask API
  • πŸ’» Launches desktop app

πŸͺŸ Windows

Docker-based setup

# 1. Download dataset
./bin/download_enron.cmd

# 2. Generate database
./bin/generate_db.cmd

# 3. Start services
docker compose up --build

# 4. Launch desktop app
npm --prefix ./apps/enron_classifier run tauri dev

πŸ€– Ollama Setup

AI Response Generation Setup

For AI-powered email response generation, you need to install and configure Ollama:

# 1. Install Ollama from https://ollama.com

# 2. Pull a recommended model (choose one)
ollama pull llama3.2:3b        # Lightweight, fast responses
ollama pull llama3.1:8b        # Balanced performance
ollama pull codellama:7b       # For technical emails
ollama pull mistral            # Default model

# 3. Verify installation
ollama list

By default, the shell script ./bin/enron_classifier.sh runs ollama list to check whether the default model (configured in the app) is installed; if the model is missing, the script pulls it automatically with ollama pull.
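Conceptually, the check the script performs looks like this (a Python sketch of the shell logic; needs_pull and ensure_model are illustrative names):

```python
import subprocess

def needs_pull(list_output: str, model: str) -> bool:
    """True when the model name does not appear in `ollama list` output."""
    return model not in list_output

def ensure_model(model: str = "mistral") -> None:
    # `ollama list` prints the installed models; pull only when absent
    listing = subprocess.run(["ollama", "list"],
                             capture_output=True, text=True).stdout
    if needs_pull(listing, model):
        subprocess.run(["ollama", "pull", model], check=True)
```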

You can change the default model by editing the file:

// apps/enron_classifier/src/config.js

// API configuration
export const API_URL = 'http://localhost:5050/api';

// Timeout configuration (in milliseconds)
export const API_TIMEOUT = 240000; // 4 minutes

// Model configuration
export const DEFAULT_MODEL = 'mistral';
export const DEFAULT_TEMPERATURE = 0.7;

⚠️ Important: The application uses only the configured default model defined in config.js for response generation. It does not automatically select other models even if they're installed.


πŸ—οΈ Architecture Overview

graph TB
    subgraph "Frontend Layer"
        A[React + JavaScript UI]
        B[Tauri Desktop Wrapper]
        C[Web Application]
    end

    subgraph "API Layer"
        D[Flask REST API]
        E[Advanced NLP Pipeline]
        F[Transformer Models]
    end

    subgraph "Data Layer"
        G[SQLite Database]
        H[Enron Email Dataset]
    end

    subgraph "AI Services"
        I[Ollama Server]
        J[LLM Models]
    end

    A --> D
    B --> A
    C --> A
    D --> E
    E --> F
    D --> G
    G --> H
    D --> I
    I --> J

    style A fill:#61DAFB,stroke:#21325B,color:#000
    style D fill:#000000,stroke:#FFFFFF,color:#fff
    style F fill:#FFD21E,stroke:#FF6B35,color:#000
    style I fill:#00D4AA,stroke:#007A5E,color:#000

πŸ”§ Technology Stack

| Layer | Technologies | Purpose |
| --- | --- | --- |
| 🎨 Frontend | React 18, JavaScript, Tailwind CSS, Framer Motion | Modern, responsive user interface |
| πŸ–₯️ Desktop | Tauri, Rust | Cross-platform desktop application |
| πŸ”— API | Flask 3.1, Python 3.10+ | RESTful backend services |
| 🧠 ML/NLP | Transformers, BART, DistilBERT, NLTK, spaCy | Advanced language processing |
| πŸ—„οΈ Database | SQLite | Efficient email data storage |
| πŸ€– AI | Ollama, Llama models, Mistral | Intelligent response generation |

πŸ› οΈ Development Setup

πŸ“ Project Structure

NLP_project/
β”œβ”€β”€ πŸ“± apps/
β”‚   β”œβ”€β”€ 🐍 flask_api/              # Backend API
β”‚   β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ›£οΈ routes/         # API endpoints
β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ”§ services/       # ML/NLP services
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ 🧠 enron_classifier.py
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ 🎭 emotion_enhancer.py
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ“„ summarizer.py
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ€– responder.py
β”‚   β”‚   β”‚   β”‚   └── 🏷️ ner_engine.py
β”‚   β”‚   β”‚   └── πŸ§ͺ tests/
β”‚   β”‚   └── πŸ“Š models/
β”‚   β”œβ”€β”€ βš›οΈ enron_classifier/       # Frontend React app
β”‚   β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”‚   β”œβ”€β”€ 🧩 components/
β”‚   β”‚   β”‚   β”œβ”€β”€ πŸ”„ contexts/
β”‚   β”‚   β”‚   β”œβ”€β”€ πŸͺ hooks/
β”‚   β”‚   β”‚   └── πŸ“± pages/
β”‚   β”‚   └── πŸ¦€ src-tauri/         # Tauri config
β”‚   └── πŸ—„οΈ SQLite_db/             # Database generation
β”œβ”€β”€ πŸ“œ bin/                       # Setup scripts
β”œβ”€β”€ 🐳 docker-compose.yml
└── πŸ“– README.md

πŸš€ Development Commands

| Command | Description | Platform |
| --- | --- | --- |
| `./bin/enron_classifier.sh` | 🎯 Complete setup | Unix |
| `./bin/enron_classifier.sh --api-only` | πŸ”§ API development | Unix |
| `./bin/enron_classifier.sh --frontend-only` | 🎨 Frontend development | Unix |
| `docker compose up --build` | 🐳 Docker development | All |
| `npm run tauri dev` | πŸ–₯️ Desktop app development | All |
| `npm run dev` | 🌐 Web app development | All |

πŸ”— API Endpoints

πŸ›£οΈ Endpoint πŸ“ Method 🎯 Purpose πŸ“Š Input
/classify POST Email classification Email text
/summarize POST Text summarization Email content
/emotion-enhance POST Emotion analysis Email text
/respond POST AI response generation Email context
/users GET List Enron users None
/users/<id>/emails GET User's emails User ID
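A minimal client for these endpoints might look as follows. The base URL matches config.js; the "text" payload key is an assumption for illustration, so check the route handlers for the exact schema:

```python
import json
import urllib.request

API_URL = "http://localhost:5050/api"  # same base URL as config.js

def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Assemble a JSON POST request for one of the API endpoints."""
    return urllib.request.Request(
        f"{API_URL}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def post(endpoint: str, payload: dict) -> dict:
    with urllib.request.urlopen(build_request(endpoint, payload)) as resp:
        return json.loads(resp.read())

# Example (requires the Flask API to be running; payload key is assumed):
#   post("/classify", {"text": "Please approve the Q4 budget by Friday."})
```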

⚑ Performance & Optimization

🎯 Hardware Acceleration

| πŸ–₯️ Platform | πŸš€ Acceleration | πŸ“ˆ Performance | πŸ› οΈ Setup Command |
| --- | --- | --- | --- |
| 🐧 Linux + NVIDIA | Full CUDA | ⭐⭐⭐⭐⭐ Excellent | `./bin/enron_classifier.sh` |
| πŸͺŸ Windows + NVIDIA | Full CUDA | ⭐⭐⭐⭐⭐ Excellent | Docker setup |
| 🍎 macOS (Apple Silicon) | Limited MPS | ⭐⭐⭐⭐ Good | `./bin/enron_classifier.sh` |
| 🍎 macOS (Intel) | CPU only | ⭐⭐⭐ Good | `./bin/enron_classifier.sh` |
| 🐧 WSL | CPU/CUDA | ⭐⭐⭐⭐ Good | `./bin/enron_classifier.sh` |

πŸ’‘ Performance Tips

  • πŸš€ GPU Acceleration: Use NVIDIA GPUs for best performance
  • 🍎 Apple Silicon: Native setup recommended over Docker
  • πŸ’Ύ Memory: 8GB+ RAM recommended for large datasets
  • πŸ”„ Caching: Models are cached after first load

πŸ“Š Model Training Note

The classifier uses a fast, transformer-based zero-shot labeling system to assign initial labels to each email (based on BART embeddings or sentence embeddings + cosine similarity).
This means even the very first training round generates labels automatically, but for best accuracy you may want to retrain on more data.

By default, the model is trained on a 100k email sample (~20 minutes on modern hardware).
For best results, we recommend training on the full dataset (~500k emails), which may take around 1.5–2 hours depending on your CPU or GPU.

You can retrain easily using the built-in API:

curl -i -X POST http://localhost:5050/api/classify/train \
  -H "Content-Type: application/json" \
  -d '{"enron_dir": "../SQLite_db/enron.db", "max_emails": 500000}'

or, for Docker users:

curl -i -X POST http://localhost:5050/api/classify/train \
  -H "Content-Type: application/json" \
  -d '{"enron_dir": "app/data/enron.db", "max_emails": 500000}'

πŸ§ͺ Running the Enron Email Classifier Test Suite

This test suite evaluates the Enron Email Classifier on a small subset of emails to ensure the training pipeline, model creation, and classification process work correctly.

πŸ“ What the Test Does:

  • Loads 3,000 emails from the Enron dataset.
  • Uses the zero-shot labeler (embedding-based or BART) to generate initial labels for these emails.
  • Trains a fresh model from scratch on this small dataset using an ensemble classifier.
  • Evaluates performance on a 600-email test set and generates:
    • A confusion matrix heatmap
    • A detailed classification report
    • A CSV file with per-email predictions and confidences.
  • Saves all results in the test_results/ directory.

⚑ Why Such a Small Dataset?

  • Training on the full dataset (~500k emails) would take hours, even on a powerful machine.
  • This test uses only 3,000 emails to keep the run fast and manageable (roughly 10–15 seconds).
  • Performance on this small test set is expected to be low (accuracy ~10–20%) due to the limited data.

πŸš€ How to Run It:

python3 -m app.tests.classification_test

Note: The test always trains a new model from scratch on the 3,000-email sample. For production, train the model on a larger dataset (100k+ emails) using the training endpoint (recommended).
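The evaluation artifacts the test produces rest on standard metrics; for reference, a confusion matrix and accuracy can be computed like this (a generic sketch, not the test's actual code):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(y_true, y_pred):
    """Fraction of emails whose predicted label matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On the diagonal sit the correctly classified emails; off-diagonal cells show which categories get confused with each other, which is what the heatmap visualizes.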

πŸ› οΈ Troubleshooting

πŸ”§ Common Issues & Solutions

🐧 Unix Setup Issues

# Make script executable
chmod +x ./bin/enron_classifier.sh

# Run with debug output
bash -x ./bin/enron_classifier.sh

🐳 Docker Issues

# Reset Docker environment
docker compose down --volumes
docker compose up --build --force-recreate

πŸ€– Ollama Connection Issues

# Check Ollama status
ollama list
curl http://localhost:11434/api/tags

# Restart Ollama service
ollama serve

πŸ“¦ Node.js Issues

# Clear and reinstall dependencies
npm cache clean --force
rm -rf node_modules package-lock.json
npm install

🀝 Contributing

🌟 We welcome contributions!

Contributions Welcome

How to contribute:

  1. 🍴 Fork the repository
  2. 🌿 Create your feature branch (git checkout -b feature/amazing-feature)
  3. ✨ Commit your changes (git commit -m 'Add amazing feature')
  4. πŸš€ Push to the branch (git push origin feature/amazing-feature)
  5. 🎯 Open a Pull Request

Contribution areas:

  • 🧠 ML model improvements
  • 🎨 UI/UX enhancements
  • πŸ“š Documentation updates
  • πŸ§ͺ Test coverage expansion
  • πŸ› Bug fixes

πŸ‘₯ Team

| Role | Focus | Expertise |
| --- | --- | --- |
| πŸš€ Lead Developer | Architecture & Integration | Full-stack, ML Pipeline |
| 🧠 ML Engineer | NLP & AI Models | Transformers, GPU Optimization |
| 🎨 Frontend Developer | UI/UX & React | JavaScript, Modern Web |
| 🐍 Backend Developer | API & Services | Flask, Python, Databases |
| πŸ“Š Data Scientist | Analytics & Insights | Statistics, Visualization |

πŸ“œ License

License: MIT

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

Built with ❀️ and powered by:

Hugging Face Ollama Enron Corpus

Special thanks to the open-source community and research institutions that make advanced NLP accessible to everyone.


πŸŽ‰ Ready to revolutionize email analysis?


πŸš€ Get Started Now

πŸ“§ Transform emails into insights with AI! πŸ“§
