65 changes: 65 additions & 0 deletions examples/nlp_and_llms/cpu-small-nlp/README.md
# Small Transformer Inference (CPU Baseline)

This template implements a **high-efficiency CPU inference** workflow for Natural Language Processing (NLP). It uses **DistilBERT**, a smaller, faster version of BERT, and demonstrates how to further optimize it using **Dynamic Quantization** to achieve production-grade performance without GPUs.

**Infrastructure:** [Saturn Cloud](https://saturncloud.io/)
**Resource:** Jupyter Notebook
**Hardware:** CPU
**Tech Stack:** PyTorch, Hugging Face Transformers, Scikit-Learn

---

## 📖 Overview

Deploying massive Large Language Models (LLMs) often requires expensive GPUs. However, for specific enterprise tasks like **Sentiment Analysis** or **Named Entity Recognition (NER)**, smaller "distilled" transformers running on standard CPUs are often sufficient, faster, and significantly cheaper.

This template provides a **CPU-optimized baseline** (a short usage sketch follows the list):
1. **Sentiment Analysis:** Using `distilbert-base-uncased`.
2. **Named Entity Recognition (NER):** Using `distilbert-base-cased`.
3. **Optimization:** Applies PyTorch **Dynamic Quantization** to boost inference speed by roughly 1.5–2x and reduce memory usage by ~40%.
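
A minimal sketch of what the two pipelines look like with the Hugging Face `pipeline` API (not the notebook's exact cells). Note that the base checkpoints named above ship without task-specific fine-tuning, so they load with freshly initialized heads and emit a warning; for real predictions you would typically swap in a fine-tuned variant such as `distilbert-base-uncased-finetuned-sst-2-english`:

```python
from transformers import pipeline

# Sentiment analysis (device=-1 forces CPU execution)
sentiment = pipeline("sentiment-analysis", model="distilbert-base-uncased", device=-1)
print(sentiment("The onboarding process was painless and fast."))

# NER with a cased checkpoint; casing helps the model spot proper nouns
ner = pipeline("ner", model="distilbert-base-cased", aggregation_strategy="simple", device=-1)
print(ner("Saturn Cloud runs Jupyter notebooks in New York."))
```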

---

## 🚀 Quick Start

### 1. Workflow

1. Open **`small_transformer_cpu.ipynb`** in the Jupyter interface.
2. **Run All Cells** (an outline of the benchmark logic is sketched after this list):
* **Install:** Sets up `transformers` and `torch` in the current environment.
* **Download:** Fetches the public DistilBERT model (no login required).
* **Benchmark (FP32):** Measures the baseline latency of the standard 32-bit floating point model.
* **Quantize (INT8):** Converts the model weights to 8-bit integers on the fly.
* **Compare:** Validates the speedup (typically **1.5x - 2.0x faster**).
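
The core of the benchmark can be outlined as follows. This is an illustrative sketch rather than the notebook's exact code; the run count, test sentence, and helper name are assumptions:

```python
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("A short example sentence for latency testing.", return_tensors="pt")

def mean_latency_ms(m, n_runs=50):
    # Average forward-pass latency in milliseconds over n_runs (inference only, no gradients).
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(n_runs):
            m(**inputs)
        return (time.perf_counter() - start) / n_runs * 1000

fp32_ms = mean_latency_ms(model)

# Dynamic quantization: nn.Linear weights are stored as INT8;
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
int8_ms = mean_latency_ms(quantized)

print(f"FP32: {fp32_ms:.1f} ms | INT8: {int8_ms:.1f} ms | speedup: {fp32_ms / int8_ms:.2f}x")
```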

---

## 🧠 Architecture: "Distill & Quantize"

We use a two-step optimization strategy to ensure the model runs efficiently on commodity hardware.

### 1. Distillation

We use **DistilBERT**, a student model trained via knowledge distillation to mimic the behavior of the larger BERT teacher.

* **40% fewer parameters** than BERT.
* **60% faster** inference.
* **97% retained accuracy** on standard benchmarks.

### 2. Dynamic Quantization

Standard models store weights as 32-bit floating point numbers (FP32). This template uses **Dynamic Quantization** to convert the linear layer weights to **8-bit integers (INT8)**.

* **Size Reduction:** The model file shrinks by ~40% (e.g., 255MB → 130MB).
* **Speedup:** CPUs can process 8-bit integer math significantly faster than 32-bit float math, resulting in lower latency per request.
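
One way to check the size claim (a sketch only; the file names are illustrative, and `model` / `quantized` refer to the FP32 and quantized models from the benchmark sketch above):

```python
import os
import torch

def state_dict_size_mb(m, path):
    # Serialize the weights and report the on-disk size in megabytes.
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

fp32_mb = state_dict_size_mb(model, "model_fp32.pt")
int8_mb = state_dict_size_mb(quantized, "model_int8.pt")
print(f"FP32: {fp32_mb:.0f} MB -> INT8: {int8_mb:.0f} MB")
```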

---

## 🏁 Conclusion

This template shows that you don't always need a GPU for NLP. For targeted tasks, a quantized DistilBERT on a modern CPU can handle hundreds of requests per second at minimal cost.

To scale this solution—for example, processing millions of documents or deploying this as a serverless API—consider moving this workload to a [Saturn Cloud](https://saturncloud.io/) CPU cluster.


34 changes: 34 additions & 0 deletions examples/nlp_and_llms/cpu-small-nlp/setup.sh
#!/bin/bash
set -e

GREEN='\033[0;32m'
NC='\033[0m'

echo -e "${GREEN}🚀 Starting Small Transformer Setup...${NC}"

# 1. Robust Python Detection
if command -v python3 &> /dev/null; then
PY_CMD="python3"
elif command -v python &> /dev/null; then
PY_CMD="python"
else
echo "❌ Error: Could not find 'python3' or 'python' in your PATH."
exit 1
fi

# 2. Create Virtual Environment
echo "📦 Creating Virtual Environment 'venv'..."
$PY_CMD -m venv venv

# 3. Install Dependencies
echo "⬇️ Installing libraries..."
. venv/bin/activate
pip install --upgrade pip
# Core stack: PyTorch (CPU), Transformers (Hugging Face), Scikit-Learn (Metrics)
pip install torch transformers scikit-learn numpy pandas
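# Note (assumption, not part of the original script): the default 'torch' wheel may bundle
# CUDA libraries; for a lighter CPU-only install you can instead run:
#   pip install torch --index-url https://download.pytorch.org/whl/cpu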

echo -e "${GREEN}✅ Environment Ready!${NC}"
echo "-------------------------------------------------------"
echo "To generate the notebook:"
echo " $PY_CMD generate_notebook.py"
echo "-------------------------------------------------------"