diff --git a/examples/nlp_and_llms/cpu-small-nlp/README.md b/examples/nlp_and_llms/cpu-small-nlp/README.md new file mode 100644 index 00000000..91421224 --- /dev/null +++ b/examples/nlp_and_llms/cpu-small-nlp/README.md @@ -0,0 +1,65 @@ +# Small Transformer Inference (CPU Baseline) + +This template implements a **high-efficiency CPU inference** workflow for Natural Language Processing (NLP). It uses **DistilBERT**, a smaller, faster version of BERT, and demonstrates how to further optimize it using **Dynamic Quantization** to achieve production-grade performance without GPUs. + +**Infrastructure:** [Saturn Cloud](https://saturncloud.io/) +**Resource:** Jupyter Notebook +**Hardware:** CPU +**Tech Stack:** PyTorch, Hugging Face Transformers, Scikit-Learn + +--- + +## ๐ Overview + +Deploying massive Large Language Models (LLMs) often requires expensive GPUs. However, for specific enterprise tasks like **Sentiment Analysis** or **Named Entity Recognition (NER)**, smaller "distilled" transformers running on standard CPUs are often sufficient, faster, and significantly cheaper. + +This template provides a **CPU-optimized baseline**: +1. **Sentiment Analysis:** Using `distilbert-base-uncased`. +2. **Named Entity Recognition (NER):** Using `distilbert-base-cased`. +3. **Optimization:** Applies PyTorch **Dynamic Quantization** to boost inference speed by ~2x and reduce memory usage by ~40%. + +--- + +## ๐ Quick Start + +### 1. Workflow + +1. Open **`small_transformer_cpu.ipynb`** in the Jupyter interface. +2. **Run All Cells**: +* **Install:** Sets up `transformers` and `torch` in the current environment. +* **Download:** Fetches the public DistilBERT model (no login required). +* **Benchmark (FP32):** Measures the baseline latency of the standard 32-bit floating point model. +* **Quantize (INT8):** Converts the model weights to 8-bit integers on the fly. +* **Compare:** Validates the speedup (typically **1.5x - 2.0x faster**). + +--- + +## ๐ง Architecture: "Distill & Quantize" + +We use a two-step optimization strategy to ensure the model runs efficiently on commodity hardware. + +### 1. Distillation + +We use **DistilBERT**, which acts as a student model trained to mimic the behavior of the larger BERT model. + +* **40% fewer parameters** than BERT. +* **60% faster** inference. +* **97% retained accuracy** on standard benchmarks. + +### 2. Dynamic Quantization + +Standard models store weights as 32-bit floating point numbers (FP32). This template uses **Dynamic Quantization** to convert the linear layer weights to **8-bit integers (INT8)**. + +* **Size Reduction:** The model file shrinks by ~40% (e.g., 255MB โ 130MB). +* **Speedup:** CPUs can process 8-bit integer math significantly faster than 32-bit float math, resulting in lower latency per request. + +--- + +## ๐ Conclusion + +This template proves that you don't always need a GPU for NLP. For targeted tasks, a quantized DistilBERT on a modern CPU can handle hundreds of requests per second with minimal cost. + +To scale this solutionโfor example, processing millions of documents or deploying this as a serverless APIโconsider moving this workload to a [Saturn Cloud](https://saturncloud.io/) CPU cluster. 
+ +``` + diff --git a/examples/nlp_and_llms/cpu-small-nlp/setup.sh b/examples/nlp_and_llms/cpu-small-nlp/setup.sh new file mode 100755 index 00000000..be732959 --- /dev/null +++ b/examples/nlp_and_llms/cpu-small-nlp/setup.sh @@ -0,0 +1,34 @@ +#!/bin/bash +set -e + +GREEN='\033[0;32m' +NC='\033[0m' + +echo -e "${GREEN}๐ Starting Small Transformer Setup...${NC}" + +# 1. Robust Python Detection +if command -v python3 &> /dev/null; then + PY_CMD="python3" +elif command -v python &> /dev/null; then + PY_CMD="python" +else + echo "โ Error: Could not find 'python3' or 'python' in your PATH." + exit 1 +fi + +# 2. Create Virtual Environment +echo "๐ฆ Creating Virtual Environment 'venv'..." +$PY_CMD -m venv venv + +# 3. Install Dependencies +echo "โฌ๏ธ Installing libraries..." +. venv/bin/activate +pip install --upgrade pip +# Core stack: PyTorch (CPU), Transformers (Hugging Face), Scikit-Learn (Metrics) +pip install torch transformers scikit-learn numpy pandas + +echo -e "${GREEN}โ Environment Ready!${NC}" +echo "-------------------------------------------------------" +echo "To generate the notebook:" +echo " $PY_CMD generate_notebook.py" +echo "-------------------------------------------------------" \ No newline at end of file diff --git a/examples/nlp_and_llms/cpu-small-nlp/small_transformer_cpu.ipynb b/examples/nlp_and_llms/cpu-small-nlp/small_transformer_cpu.ipynb new file mode 100644 index 00000000..2bdaa87e --- /dev/null +++ b/examples/nlp_and_llms/cpu-small-nlp/small_transformer_cpu.ipynb @@ -0,0 +1,449 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# โก Small Transformer Inference (CPU Baseline)\n", + "\n", + "This notebook demonstrates how to achieve **high-performance inference** on a CPU using DistilBERT and **Dynamic Quantization**.\n", + "\n", + "**Tasks:**\n", + "1. **Sentiment Analysis**: Classifying text as Positive/Negative.\n", + "2. **NER**: Extracting entities (Names, Locations) from text.\n", + "3. **Optimization**: Quantizing the model to `int8`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. Install Dependencies\n", + "%pip install torch transformers numpy pandas\n", + "\n", + "import torch\n", + "import time\n", + "import os\n", + "import pandas as pd\n", + "from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline\n", + "\n", + "# ๐ง CPU Optimization: Control Threads\n", + "# Setting this to the number of physical cores is usually best for latency.\n", + "torch.set_num_threads(os.cpu_count())\n", + "print(f\"โ Threads set to: {torch.get_num_threads()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Sentiment Analysis Baseline (FP32)\n", + "We load `distilbert-base-uncased-finetuned-sst-2-english`. It is a standard baseline for sentiment." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "โฌ๏ธ Downloading distilbert-base-uncased-finetuned-sst-2-english...\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e7d674d4cb61417784587c5d248c1735", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading weights: 0%| | 0/104 [00:00, ?it/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "3246f296053a49e6bdece9d6292408ab", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Writing model shards: 0%| | 0/1 [00:00, ?it/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "โ Loaded & Saved to './model_fp32' (Size: 255.4 MB)\n" + ] + } + ], + "source": [ + "MODEL_NAME = \"distilbert-base-uncased-finetuned-sst-2-english\"\n", + "\n", + "# Load Model & Tokenizer\n", + "print(f\"โฌ๏ธ Downloading {MODEL_NAME}...\")\n", + "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n", + "model_fp32 = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)\n", + "\n", + "# Save model locally so we can accurately measure its size\n", + "model_fp32.save_pretrained(\"./model_fp32\")\n", + "tokenizer.save_pretrained(\"./model_fp32\")\n", + "\n", + "# FIX: Check for either .bin (standard) or .safetensors (newer default)\n", + "if os.path.exists(\"./model_fp32/pytorch_model.bin\"):\n", + " weights_path = \"./model_fp32/pytorch_model.bin\"\n", + "elif os.path.exists(\"./model_fp32/model.safetensors\"):\n", + " weights_path = \"./model_fp32/model.safetensors\"\n", + "else:\n", + " raise FileNotFoundError(\"Could not find model weights file (.bin or .safetensors)\")\n", + "\n", + "file_size = os.path.getsize(weights_path) / 1024**2\n", + "print(f\"โ Loaded & Saved to './model_fp32' (Size: {file_size:.1f} MB)\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "โฑ๏ธ Standard (FP32) Latency: 48.62 ms\n" + ] + } + ], + "source": [ + "# Benchmark Function\n", + "def benchmark_model(model, text, steps=50):\n", + " inputs = tokenizer(text, return_tensors=\"pt\")\n", + " \n", + " # Warmup\n", + " for _ in range(5):\n", + " _ = model(**inputs)\n", + " \n", + " # Timing\n", + " start = time.time()\n", + " for _ in range(steps):\n", + " with torch.no_grad():\n", + " _ = model(**inputs)\n", + " end = time.time()\n", + " \n", + " avg_time = (end - start) / steps * 1000\n", + " return avg_time\n", + "\n", + "sample_text = \"Saturn Cloud makes scaling machine learning workloads incredibly easy and efficient.\"\n", + "time_fp32 = benchmark_model(model_fp32, sample_text)\n", + "print(f\"โฑ๏ธ Standard (FP32) Latency: {time_fp32:.2f} ms\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Dynamic Quantization (INT8)\n", + "We use `torch.quantization.quantize_dynamic` to convert the Linear layers to 8-bit integers. This requires **no retraining**." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_548476/1943256708.py:1: DeprecationWarning: torch.ao.quantization is deprecated and will be removed in 2.10. \n", + "For migrations of users: \n", + "1. 
Eager mode quantization (torch.ao.quantization.quantize, torch.ao.quantization.quantize_dynamic), please migrate to use torchao eager mode quantize_ API instead \n", + "2. FX graph mode quantization (torch.ao.quantization.quantize_fx.prepare_fx,torch.ao.quantization.quantize_fx.convert_fx, please migrate to use torchao pt2e quantization API instead (prepare_pt2e, convert_pt2e) \n", + "3. pt2e quantization has been migrated to torchao (https://github.com/pytorch/ao/tree/main/torchao/quantization/pt2e) \n", + "see https://github.com/pytorch/ao/issues/2259 for more details\n", + " model_int8 = torch.quantization.quantize_dynamic(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "๐ Quantized Model Size: 132.3 MB\n" + ] + } + ], + "source": [ + "model_int8 = torch.quantization.quantize_dynamic(\n", + " model_fp32, \n", + " {torch.nn.Linear}, # We only quantize the heavy Linear layers\n", + " dtype=torch.qint8\n", + ")\n", + "\n", + "# Verify size reduction\n", + "torch.save(model_int8.state_dict(), \"quantized_model.pt\")\n", + "size_int8 = os.path.getsize(\"quantized_model.pt\") / 1024**2\n", + "print(f\"๐ Quantized Model Size: {size_int8:.1f} MB\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "โฑ๏ธ Quantized (INT8) Latency: 33.94 ms\n", + "๐ Speedup: 1.43x faster\n" + ] + } + ], + "source": [ + "# Benchmark Quantized Model\n", + "time_int8 = benchmark_model(model_int8, sample_text)\n", + "print(f\"โฑ๏ธ Quantized (INT8) Latency: {time_int8:.2f} ms\")\n", + "\n", + "speedup = time_fp32 / time_int8\n", + "print(f\"๐ Speedup: {speedup:.2f}x faster\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. NER Task (Token Classification)\n", + "Switching tasks is as easy as changing the pipeline model. We use `dslim/bert-base-NER` (or a smaller DistilBERT variant if available) for Named Entity Recognition." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e5663f382557412097f12a4f57f7987e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0%| | 0.00/829 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Warning: You are sending unauthenticated requests to the HF Hub. 
Please set a HF_TOKEN to enable higher rate limits and faster downloads.\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "88a1ac3c9cf1417d87519716f92c7f8f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "model.safetensors: 0%| | 0.00/433M [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c69cb041d69e40fa9939dca257c715ef", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading weights: 0%| | 0/199 [00:00, ?it/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "BertForTokenClassification LOAD REPORT from: dslim/bert-base-NER\n", + "Key | Status | | \n", + "-------------------------+------------+--+-\n", + "bert.pooler.dense.weight | UNEXPECTED | | \n", + "bert.pooler.dense.bias | UNEXPECTED | | \n", + "\n", + "Notes:\n", + "- UNEXPECTED\t:can be ignored when loading from different task/architecture; not ok if you expect identical arch.\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "bcee4695a5d74155bf790a06e0436e62", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "tokenizer_config.json: 0%| | 0.00/59.0 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ca736329308444f988d4fae6c19d7639", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "vocab.txt: 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "84025cc2ff414f9bb75179e246bfa13f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "added_tokens.json: 0%| | 0.00/2.00 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6bbe850ba1344b728fdc2dd67fbd8a91", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "special_tokens_map.json: 0%| | 0.00/112 [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
| \n", + " | word | \n", + "entity_group | \n", + "score | \n", + "
|---|---|---|---|
| 0 | \n", + "Apple Inc | \n", + "ORG | \n", + "0.999508 | \n", + "
| 1 | \n", + "Abuja | \n", + "LOC | \n", + "0.998583 | \n", + "
| 2 | \n", + "Nigeria | \n", + "LOC | \n", + "0.999648 | \n", + "
| Step | \n","Training Loss | \n","
|---|---|
| 25 | \n","0.000000 | \n","
| 50 | \n","0.000000 | \n","
| 75 | \n","0.000000 | \n","
| 100 | \n","0.000000 | \n","
| 125 | \n","0.000000 | \n","
| 150 | \n","0.000000 | \n","
| 175 | \n","0.000000 | \n","
| 200 | \n","0.000000 | \n","
| 225 | \n","0.000000 | \n","
| 250 | \n","0.000000 | \n","
| 275 | \n","0.000000 | \n","
| 300 | \n","0.000000 | \n","
| 325 | \n","0.000000 | \n","
| 350 | \n","0.000000 | \n","
| 375 | \n","0.000000 | \n","
| 400 | \n","0.000000 | \n","
| 425 | \n","0.000000 | \n","
| 450 | \n","0.000000 | \n","
| 475 | \n","0.000000 | \n","
| 500 | \n","0.000000 | \n","
"]},"metadata":{}},{"output_type":"stream","name":"stdout","text":["โ
Training complete!\n"]}],"source":["import torch\n","from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer\n","\n","# Prepare data collator\n","data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)\n","\n","# Define training arguments\n","args = Seq2SeqTrainingArguments(\n"," output_dir=\"outputs-lora\",\n"," per_device_train_batch_size=2,\n"," per_device_eval_batch_size=2,\n"," learning_rate=2e-4,\n"," num_train_epochs=1,\n"," save_strategy=\"epoch\",\n"," logging_steps=25,\n"," predict_with_generate=True,\n"," fp16=torch.cuda.is_available(), # Use mixed precision if GPU supports it\n"," report_to=[], # disables online tracking (no API needed)\n",")\n","\n","# Initialise trainer\n","trainer = Seq2SeqTrainer(\n"," model=model,\n"," args=args,\n"," train_dataset=train_tok,\n"," eval_dataset=eval_tok,\n"," tokenizer=tokenizer,\n"," data_collator=data_collator,\n",")\n","\n","print(\"๐ Starting fine-tuningโฆ\")\n","trainer.train()\n","print(\"โ
Training complete!\")"]},{"cell_type":"markdown","id":"cb3261ba-fd89-42f8-8cbc-b9391b859ee6","metadata":{"id":"cb3261ba-fd89-42f8-8cbc-b9391b859ee6"},"source":["Let's test the fine-tuned model to verify that it can generate meaningful summaries. It performs a full inference pass using the model and tokenizer."]},{"cell_type":"code","execution_count":16,"id":"f86f32e1-49c1-426e-b013-3156cb6d6e4f","metadata":{"jp-MarkdownHeadingCollapsed":true,"colab":{"base_uri":"https://localhost:8080/"},"id":"f86f32e1-49c1-426e-b013-3156cb6d6e4f","executionInfo":{"status":"ok","timestamp":1761300634308,"user_tz":-60,"elapsed":233,"user":{"displayName":"Durojaye Olusegun","userId":"09188621512197003284"}},"outputId":"057ca513-731d-438a-a6d3-c41225bfa966"},"outputs":[{"output_type":"stream","name":"stdout","text":["\n","๐ง Fine-tuned Model Output:\n","\n","Bob and Alice discuss the museum's history.\n"]}],"source":["test_input = \"Write a brief summary: Alice and Bob discussed weekend plans. Bob suggested hiking, but Alice preferred visiting the museum.\"\n","\n","# Tokenise and move to model device\n","inputs = tokenizer(test_input, return_tensors=\"pt\", truncation=True, padding=True).to(model.device)\n","\n","# Generate output\n","outputs = model.generate(**inputs, max_new_tokens=80)\n","\n","# Decode and display\n","print(\"\\n๐ง Fine-tuned Model Output:\\n\")\n","print(tokenizer.decode(outputs[0], skip_special_tokens=True))\n"]},{"cell_type":"markdown","id":"3ee4b6cb-1684-49ca-9cc2-74609bf610bd","metadata":{"id":"3ee4b6cb-1684-49ca-9cc2-74609bf610bd"},"source":["This allows interactively test the fine-tuned model with your own custom input."]},{"cell_type":"code","execution_count":17,"id":"3bad36a0-89b4-484d-953c-7371d83cfff6","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"3bad36a0-89b4-484d-953c-7371d83cfff6","executionInfo":{"status":"ok","timestamp":1761300740710,"user_tz":-60,"elapsed":106374,"user":{"displayName":"Durojaye Olusegun","userId":"09188621512197003284"}},"outputId":"cee233ae-58d2-42ac-90a9-3e49430bc355"},"outputs":[{"output_type":"stream","name":"stdout","text":["๐ฌ Try your own prompt!\n","\n","Enter a text or paragraph you'd like the model to summarise: what is it doing \n","\n","๐งฉ Model Output:\n","\n","It is doing it doing it doing it\n"]}],"source":["print(\"๐ฌ Try your own prompt!\")\n","\n","user_prompt = input(\"\\nEnter a text or paragraph you'd like the model to summarise: \")\n","\n","# Tokenise user prompt\n","inputs = tokenizer(user_prompt, return_tensors=\"pt\", truncation=True, padding=True).to(model.device)\n","\n","# Generate output\n","outputs = model.generate(**inputs, max_new_tokens=80)\n","\n","# Decode and print\n","print(\"\\n๐งฉ Model Output:\\n\")\n","print(tokenizer.decode(outputs[0], skip_special_tokens=True))\n"]},{"cell_type":"markdown","id":"a0a3c84e-2d27-46ad-9356-95e2ef9a598b","metadata":{"id":"a0a3c84e-2d27-46ad-9356-95e2ef9a598b"},"source":["In this template, you fine-tuned **Googleโs FLAN-T5-Small** model using **LoRA (Low-Rank Adaptation)** with the **PEFT** library โ a modern, lightweight approach to large language model adaptation.\n","\n","Running this workflow on **Saturn Cloud** makes it both **scalable and cost-effective**. 
Saturn Cloudโs managed infrastructure allows you to:\n","\n","* Start with a **single NVIDIA GPU** for experimentation and scale up to multi-GPU clusters for larger models.\n","* Collaborate across teams easily through shared Jupyter environments.\n","* Integrate this fine-tuning workflow into production pipelines for enterprise-ready deployment.\n","\n","By using this template, you now have a complete, ready-to-run foundation for **adapter-based fine-tuning** in Saturn Cloud โ ideal for tasks like summarisation, translation, or instruction-following with minimal resource use.\n","\n","To continue exploring, check out:\n","\n","* [Saturn Cloud Documentation](https://saturncloud.io/docs/) โ for advanced configuration and GPU scaling.\n","* [Saturn Cloud Templates](https://saturncloud.io/resources/templates/) โ for more examples of ML, LLM, and data science workflows."]}],"metadata":{"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.13.7"},"colab":{"provenance":[],"gpuType":"T4"},"accelerator":"GPU","widgets":{"application/vnd.jupyter.widget-state+json":{"9db3a5ac0dd84249a2b236b96c58aad8":{"model_module":"@jupyter-widgets/controls","model_name":"HBoxModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HBoxModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HBoxView","box_style":"","children":["IPY_MODEL_2c821f95cbf94e6f972651544b51bacf","IPY_MODEL_69e89bf8eace41aa850498fd3fd61f99","IPY_MODEL_3aaca7366ecb47d8b4ac27b6301aa91b"],"layout":"IPY_MODEL_48ba285de8364e65a380add6e08e4d69"}},"2c821f95cbf94e6f972651544b51bacf":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_29dfb08a2a1d43b3878cb8a98b285b09","placeholder":"โ","style":"IPY_MODEL_4edaefbb46844f8ba1583f63c20f9ccf","value":"Map:โ100%"}},"69e89bf8eace41aa850498fd3fd61f99":{"model_module":"@jupyter-widgets/controls","model_name":"FloatProgressModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"FloatProgressModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"ProgressView","bar_style":"success","description":"","description_tooltip":null,"layout":"IPY_MODEL_168534f6a2f3457b8dfa29da5aa15d6a","max":200,"min":0,"orientation":"horizontal","style":"IPY_MODEL_3e54568d0ae94350a1a461a6b1cc3423","value":200}},"3aaca7366ecb47d8b4ac27b6301aa91b":{"model_module":"@jupyter-widgets/controls","model_name":"HTMLModel","model_module_version":"1.5.0","state":{"_dom_classes":[],"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"HTMLModel","_view_count":null,"_view_module":"@jupyter-widgets/controls","_view_module_version":"1.5.0","_view_name":"HTMLView","description":"","description_tooltip":null,"layout":"IPY_MODEL_bd8316fe2cc24289bf8d39a
b6f065e43","placeholder":"โ","style":"IPY_MODEL_d802453c7a484c89897a30b8ddde157b","value":"โ200/200โ[00:13<00:00,โ15.11โexamples/s]"}},"48ba285de8364e65a380add6e08e4d69":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"29dfb08a2a1d43b3878cb8a98b285b09":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"4edaefbb46844f8ba1583f63c20f9ccf":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}},"168534f6a2f3457b8dfa29da5aa15d6a":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"over
flow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"3e54568d0ae94350a1a461a6b1cc3423":{"model_module":"@jupyter-widgets/controls","model_name":"ProgressStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"ProgressStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","bar_color":null,"description_width":""}},"bd8316fe2cc24289bf8d39ab6f065e43":{"model_module":"@jupyter-widgets/base","model_name":"LayoutModel","model_module_version":"1.2.0","state":{"_model_module":"@jupyter-widgets/base","_model_module_version":"1.2.0","_model_name":"LayoutModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"LayoutView","align_content":null,"align_items":null,"align_self":null,"border":null,"bottom":null,"display":null,"flex":null,"flex_flow":null,"grid_area":null,"grid_auto_columns":null,"grid_auto_flow":null,"grid_auto_rows":null,"grid_column":null,"grid_gap":null,"grid_row":null,"grid_template_areas":null,"grid_template_columns":null,"grid_template_rows":null,"height":null,"justify_content":null,"justify_items":null,"left":null,"margin":null,"max_height":null,"max_width":null,"min_height":null,"min_width":null,"object_fit":null,"object_position":null,"order":null,"overflow":null,"overflow_x":null,"overflow_y":null,"padding":null,"right":null,"top":null,"visibility":null,"width":null}},"d802453c7a484c89897a30b8ddde157b":{"model_module":"@jupyter-widgets/controls","model_name":"DescriptionStyleModel","model_module_version":"1.5.0","state":{"_model_module":"@jupyter-widgets/controls","_model_module_version":"1.5.0","_model_name":"DescriptionStyleModel","_view_count":null,"_view_module":"@jupyter-widgets/base","_view_module_version":"1.2.0","_view_name":"StyleView","description_width":""}}}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
diff --git a/examples/nlp_and_llms/nvidia-nim-tgi/README.md b/examples/nlp_and_llms/nvidia-nim-tgi/README.md
new file mode 100644
index 00000000..1da88fb1
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-nim-tgi/README.md
@@ -0,0 +1,222 @@
+# ๐ NIM / TGI Server โ Drop-In API
+
+**Tech Stack:** NVIDIA NIM + TGI (Text Generation Inference)
+**Built for:** Saturn Cloud Custom Templates
+โก๏ธ [https://saturncloud.io/](https://saturncloud.io/)
+
+---
+
+## ๐ง Overview
+
+This template provides a **plug-and-play inference server** that supports **two interchangeable LLM backends**:
+
+| Backend | Description | Use Case |
+| -------------------- | ------------------------------------------------------------ | --------------------------------------------------------------- |
+| **NVIDIA NIM Cloud** | Fully hosted LLMs on NVIDIA's high-performance GPU cloud | High-accuracy, large models (Qwen 80B, Mistral, Nemotron, etc.) |
+| **Local TGI Server** | Lightweight local model running via HuggingFace Transformers | Fast prototyping, offline usage |
+
+The API exposes **the same unified interface** for both backends, so users can switch engines without changing frontend code.
+
+This is ideal for **Saturn Cloud Data Science workflows**, allowing teams to quickly integrate LLM inference inside their notebooks, pipelines, or applications.
+
+---
+
+# ๐ Project Structure
+
+```
+NIM-TGI-Server/
+โ
+โโโ server.py # Main FastAPI server (unified interface)
+โโโ backend_tgi.py # Local TGI backend (SmolLM)
+โโโ backend_nim.py # NVIDIA cloud backend
+โโโ cli.py # CLI tool (select backend from terminal)
+โโโ requirements.txt
+โโโ README.md # (this file)
+```
+
+---
+
+# โ๏ธ 1. Environment Setup
+
+## **Create and activate a virtual environment**
+
+### Linux / MacOS
+
+```bash
+python -m venv venv
+source venv/bin/activate
+```
+
+### Windows (PowerShell)
+
+```powershell
+python -m venv venv
+venv\Scripts\activate
+```
+
+---
+
+## **Install dependencies**
+
+```bash
+pip install -r requirements.txt
+```
+
+---
+
+# ๐ 2. Getting an NVIDIA NIM API Key
+
+To use the **NIM Cloud backend**, you need an **NVIDIA AI Foundation API Key**.
+
+### Steps:
+
+1. Visit:
+ ๐ [https://build.nvidia.com/explore/discover](https://build.nvidia.com/explore/discover)
+2. Sign in with NVIDIA account
+3. Open your "API Keys" panel
+4. Click **Create New API Key**
+5. Copy the key
+6. **Paste it into `backend_nim.py`** (or export it as `NVIDIA_API_KEY`), replacing the placeholder default in:
+
+```python
+API_KEY = os.getenv("NVIDIA_API_KEY", "nvapi-xxxxxxxxxxxxxxxxxxxx")
+```
+
+โ ๏ธ **Note:**
+This template currently embeds the key directly for simplicity, but in production you should store it in environment variables or a secret manager.
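+
+As an alternative, a minimal sketch of reading the key from an environment variable (`NVIDIA_API_KEY` is the variable name `backend_nim.py` checks) instead of hard-coding it:
+
+```python
+import os
+
+# Keep the key out of source control by exporting NVIDIA_API_KEY in your shell
+API_KEY = os.getenv("NVIDIA_API_KEY")
+if not API_KEY:
+    raise ValueError("NVIDIA_API_KEY is not set. Export it first!")
+```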
+
+---
+
+# ๐ง 3. Backend Models
+
+## **A. NVIDIA NIM Backend (Cloud)**
+
+* Model used: `qwen/qwen3-next-80b-a3b-instruct`
+* Endpoint: `https://integrate.api.nvidia.com/v1`
+* Requires API Key
+* Supports streaming + large prompts
+
+## **B. Local TGI Backend (Lightweight CPU/GPU)**
+
+* Model: `HuggingFaceTB/SmolLM-1.7B-Instruct`
+* Runs entirely inside Python (no Docker needed)
+* Great for local experimentation
+
+---
+
+# ๐ 4. Running the Server
+
+Start FastAPI server:
+
+```bash
+uvicorn server:app --reload
+```
+
+You'll see:
+
+```
+INFO: Uvicorn running on http://127.0.0.1:8000
+```
+
+---
+
+# ๐งช 5. Testing the Server
+
+## A. Test Local TGI Model
+
+**POST /chat/local**
+
+### Curl:
+
+```bash
+curl -X POST -F "prompt=Explain machine learning" http://localhost:8000/chat/local
+```
+
+### Expected Response:
+
+```json
+{
+ "backend": "tgi-local",
+ "response": "Machine learning is..."
+}
+```
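+
+If you prefer Python, a minimal sketch using `requests` (the form field name matches the `Form(...)` parameter in `server.py`):
+
+```python
+import requests
+
+# Send the prompt as form data, exactly like the curl example above
+r = requests.post("http://localhost:8000/chat/local", data={"prompt": "Explain machine learning"})
+print(r.json()["response"])
+```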
+
+---
+
+## B. Test NVIDIA NIM Model
+
+**POST /chat/nim**
+
+### Curl:
+
+```bash
+curl -X POST -F "prompt=Write a short poem" http://localhost:8000/chat/nim
+```
+
+### Streaming:
+
+```bash
+curl -N -X POST -F "prompt=Tell me a story" -F "stream=true" http://localhost:8000/chat/nim
+```
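+
+To consume the stream from Python, one possible sketch (again with `requests`, reading the response incrementally):
+
+```python
+import requests
+
+# stream=True keeps the connection open so chunks print as they arrive
+with requests.post(
+    "http://localhost:8000/chat/nim",
+    data={"prompt": "Tell me a story", "stream": "true"},
+    stream=True,
+) as r:
+    for chunk in r.iter_content(chunk_size=None, decode_unicode=True):
+        print(chunk, end="", flush=True)
+```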
+
+---
+
+# ๐ฅ๏ธ 6. Command-Line Interface (CLI)
+
+The template includes a **CLI wrapper**:
+
+### Local TGI:
+
+```bash
+python cli.py --backend local "Explain photosynthesis"
+```
+
+### NVIDIA NIM:
+
+```bash
+python cli.py --backend nim "Write 5 facts about Jupiter"
+```
+
+Streaming output works automatically.
+
+---
+
+# ๐ก 7. Using with Saturn Cloud
+
+This template is designed as a **plug-and-play server component** inside Saturn Cloud:
+
+* Run the server inside a Jupyter workspace
+* Use the API from notebooks or external apps
+* Swap between local inference (TGI) and cloud inference (NIM)
+* Ideal for ML research, RAG systems, agent development, and batch inference jobs
+
+Saturn Cloud provides scalable Jupyter environments with GPUs:
+๐ [https://saturncloud.io/](https://saturncloud.io/)
+
+---
+
+# โ๏ธ 8. Summary
+
+This template provides:
+
+### **โ A drop-in inference server**
+
+Supports both NVIDIA Cloud NIM and local TGI backends.
+
+### **โ Ready to use in Saturn Cloud**
+
+Works inside a GPU instance or CPU instance.
+
+### **โ Unified API**
+
+Same route structure for both engines.
+
+### **โ Full CLI + server support**
+
+### **โ Ideal foundation for:**
+
+* Chatbots
+* RAG pipelines
+* Model comparison apps
+* AI feature development
+* ML/DS experimentation
\ No newline at end of file
diff --git a/examples/nlp_and_llms/nvidia-nim-tgi/backend_nim.py b/examples/nlp_and_llms/nvidia-nim-tgi/backend_nim.py
new file mode 100644
index 00000000..8638bfc1
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-nim-tgi/backend_nim.py
@@ -0,0 +1,32 @@
+from openai import OpenAI
+import os
+
+# Paste your NVIDIA API key here (see README section 2) or, preferably, export it as NVIDIA_API_KEY
+API_KEY = os.getenv("NVIDIA_API_KEY", "nvapi-xxxxxxxxxxxxxxxxxxxx")
+
+if not API_KEY or "xxxx" in API_KEY:
+    raise ValueError("โ NVIDIA API key is not set. Export NVIDIA_API_KEY or paste your key into backend_nim.py.")
+
+client = OpenAI(
+ base_url="https://integrate.api.nvidia.com/v1",
+ api_key=API_KEY,
+)
+
+def nim_chat(prompt, model="qwen/qwen3-next-80b-a3b-instruct", stream=False):
+    completion = client.chat.completions.create(
+        model=model,
+        messages=[{"role": "user", "content": prompt}],
+        temperature=0.6,
+        top_p=0.7,
+        max_tokens=1024,
+        stream=stream
+    )
+
+    if stream:
+        # Return a generator of text chunks for StreamingResponse / CLI streaming
+        def token_stream():
+            for chunk in completion:
+                delta = chunk.choices[0].delta
+                if delta and delta.content:
+                    yield delta.content
+        return token_stream()
+
+    # Non-streaming call: return the full message text
+    return completion.choices[0].message.content
diff --git a/examples/nlp_and_llms/nvidia-nim-tgi/backend_tgi.py b/examples/nlp_and_llms/nvidia-nim-tgi/backend_tgi.py
new file mode 100644
index 00000000..8c8b8c0a
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-nim-tgi/backend_tgi.py
@@ -0,0 +1,24 @@
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+MODEL_ID = "HuggingFaceTB/SmolLM-1.7B-Instruct"
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModelForCausalLM.from_pretrained(
+ MODEL_ID,
+ device_map="auto",
+ torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
+)
+
+def tgi_chat(prompt, max_tokens=256, temperature=0.7):
+ formatted_prompt = f"User: {prompt}\nAssistant:"
+ inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
+
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=max_tokens,
+        do_sample=True,  # sampling so the temperature setting actually takes effect
+        temperature=temperature,
+        pad_token_id=tokenizer.eos_token_id,
+    )
+
+ text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ return text.split("Assistant:")[-1].strip()
diff --git a/examples/nlp_and_llms/nvidia-nim-tgi/cli.py b/examples/nlp_and_llms/nvidia-nim-tgi/cli.py
new file mode 100644
index 00000000..a54ab10c
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-nim-tgi/cli.py
@@ -0,0 +1,19 @@
+import argparse
+from backend_tgi import tgi_chat
+from backend_nim import nim_chat
+
+parser = argparse.ArgumentParser(description="NIM/TGI CLI")
+parser.add_argument("--backend", choices=["local", "nim"], required=True)
+parser.add_argument("prompt", type=str)
+
+args = parser.parse_args()
+
+if args.backend == "local":
+ print("\n๐ข Local TGI Response:")
+ print(tgi_chat(args.prompt))
+
+else:
+ print("\n๐ข NVIDIA NIM Response:")
+ for chunk in nim_chat(args.prompt, stream=True):
+ print(chunk, end="", flush=True)
+ print("\n")
diff --git a/examples/nlp_and_llms/nvidia-nim-tgi/requirements.txt b/examples/nlp_and_llms/nvidia-nim-tgi/requirements.txt
new file mode 100644
index 00000000..e4211a4c
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-nim-tgi/requirements.txt
@@ -0,0 +1,5 @@
+fastapi
+uvicorn
+transformers
+torch
+openai
\ No newline at end of file
diff --git a/examples/nlp_and_llms/nvidia-nim-tgi/server.py b/examples/nlp_and_llms/nvidia-nim-tgi/server.py
new file mode 100644
index 00000000..abb6fdb5
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-nim-tgi/server.py
@@ -0,0 +1,26 @@
+from fastapi import FastAPI, Form
+from fastapi.responses import StreamingResponse, JSONResponse
+from backend_tgi import tgi_chat
+from backend_nim import nim_chat
+
+app = FastAPI(title="NIM / TGI Drop-in API Server")
+
+@app.post("/chat/local")
+def chat_local(prompt: str = Form(...)):
+ response = tgi_chat(prompt)
+ return {"backend": "tgi-local", "response": response}
+
+
+@app.post("/chat/nim")
+def chat_nim(prompt: str = Form(...), stream: bool = Form(False)):
+ if stream:
+ generator = nim_chat(prompt, stream=True)
+ return StreamingResponse(generator, media_type="text/event-stream")
+
+ response = nim_chat(prompt, stream=False)
+ return {"backend": "nvidia-nim", "response": response}
+
+
+@app.get("/")
+def root():
+ return {"message": "NIM/TGI Server Running", "endpoints": ["/chat/local", "/chat/nim"]}
diff --git a/examples/nlp_and_llms/nvidia-rag-mini/README.md b/examples/nlp_and_llms/nvidia-rag-mini/README.md
new file mode 100644
index 00000000..aa22e6d3
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-mini/README.md
@@ -0,0 +1,222 @@
+# ๐ง RAG Mini Docs Q&A
+
+A lightweight **Retrieval-Augmented Generation (RAG)** system that lets you drop `.txt` files into a folder and ask natural-language questions about them.
+
+This template combines:
+
+* **SentenceTransformers** for document embeddings
+* **ChromaDB** for vector storage & retrieval
+* **๐ค Transformers (FLAN-T5)** for answer generation
+* **FastAPI** for serving an interactive Q&A API
+
+Designed for fast prototyping and educational use on **[Saturn Cloud](https://saturncloud.io/)**.
+
+---
+
+## ๐ 1. Get Started โ Understand the Folder Layout
+
+Before you start coding, review the project structure below.
+Each file serves a clear role; ensure you're working from the correct one.
+
+```
+nvidia-rag-mini/
+โโ data/ # Folder for your .txt documents
+โ โโ saturndoc.txt # Sample document included for testing
+โโ rag_machine.py # Core logic: embeddings, Chroma, QA engine
+โโ rag-api.py # REST API built with FastAPI
+โโ requirements.txt
+```
+
+๐ **Action:** Create or upload `.txt` files into the `data/` folder before running the template.
+A sample file named **`saturndoc.txt`** is already included, so you can use it immediately to test indexing and query responses.
+
+---
+
+## ๐งฉ 2. Set Up the Environment
+
+To run this project, you'll need Python 3.10 or newer.
+If you're using **Saturn Cloud**, create a new environment and install dependencies from `requirements.txt`.
+
+### โ๏ธ Step-by-step
+
+```bash
+# (optional) create a fresh virtual environment
+python -m venv rag-env
+source rag-env/bin/activate # or .\rag-env\Scripts\activate on Windows
+
+# install dependencies
+pip install -r requirements.txt
+```
+
+### ๐ฆ requirements.txt
+
+```text
+torch>=2.2.0
+transformers>=4.44.0
+sentence-transformers>=3.0.0
+chromadb>=0.5.0
+fastapi>=0.115.0
+uvicorn[standard]>=0.30.0
+pydantic>=2.7.0
+tqdm>=4.66.0
+```
+
+๐ **Action:** Run the install command inside your active environment before executing any Python file.
+
+---
+
+## โ๏ธ 3. Configure Models and Paths
+
+All configuration happens inside **`rag_machine.py`**.
+Defaults are already suitable for most cases:
+
+```python
+CHROMA_DIR = "rag_chroma_store" # Persistent database for embeddings
+DATA_DIR = Path("data") # Directory containing your .txt files
+EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+LLM_MODEL = "google/flan-t5-base"
+```
+
+๐ **Action:**
+If you want faster inference, you can change `LLM_MODEL` to `google/flan-t5-small`.
+If you have a GPU, keep `flan-t5-base` or try `flan-t5-large`.
+
+---
+
+## ๐ป 4. Run in CLI Mode โ Test the RAG Machine
+
+Use this mode for quick experimentation.
+The script loads models, indexes your `.txt` files, and opens an interactive prompt.
+
+```bash
+python rag_machine.py
+```
+
+You'll see output similar to:
+
+```
+๐ง Starting RAG Machine (Transformers + Chroma)...
+โป๏ธ Reindexing documents...
+๐ Indexing 5 documents...
+โ Indexed 5 documents successfully.
+๐ Current collection size: 5 documents
+โ Enter your question (or 'exit'):
+```
+
+๐ **Action:**
+Type a question like
+`What is this project about?`
+and the model will respond based on your documents.
+
+> You can use the included **`saturndoc.txt`** file for your first run; it's already in the `data/` folder and serves as a ready-made example for testing indexing and queries.
+
+---
+
+## ๐ 5. Run as an API โ Serve Questions via HTTP
+
+Now, let's turn your RAG engine into a service.
+Start the FastAPI server with Uvicorn:
+
+```bash
+uvicorn rag-api:app --reload
+```
+
+Once running, open your browser at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
+to explore the built-in Swagger interface.
+
+### ๐งญ Endpoints
+
+| Endpoint | Method | Description |
+| ---------------------- | ------ | -------------------------------------------------- |
+| `/query` | POST | Submit a question and get an answer |
+| `/reload` *(optional)* | POST | Reindex `.txt` files without restarting the server |
+
+### Example Query
+
+```bash
+curl -X POST "http://127.0.0.1:8000/query" \
+ -H "Content-Type: application/json" \
+ -d "{\"query\": \"What does the onboarding doc say?\"}"
+```
+
+Response:
+
+```json
+{
+ "result": "The onboarding doc explains the project setup and data structure."
+}
+```
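+
+The same query from Python, as a small sketch with `requests`:
+
+```python
+import requests
+
+# POST a JSON body matching the QueryRequest model in rag-api.py
+resp = requests.post(
+    "http://127.0.0.1:8000/query",
+    json={"query": "What does the onboarding doc say?"},
+)
+print(resp.json()["result"])
+```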
+
+๐ **Action:** Use `/query` to test, and `/reload` whenever you add new `.txt` files.
+
+---
+
+## ๐ 6. How It Works (Conceptually)
+
+1. **Document Loading** โ Reads all `.txt` files from `data/`.
+2. **Embedding Generation** โ Converts text into dense vectors using SentenceTransformers.
+3. **Vector Storage** โ Saves these embeddings persistently in **ChromaDB** (`rag_chroma_store/`).
+4. **Retrieval** โ Finds the most relevant text chunks for your query.
+5. **LLM Answering** โ Passes retrieved context + query into **FLAN-T5** to generate the final answer.
+
+๐ **Action:** Skim through `rag_machine.py` to see how each step is implemented; you can easily swap models or add chunking later.
+
+---
+
+## ๐ 7. Reindex vs Reuse
+
+* **`reindex=True`** โ Clears and rebuilds embeddings from scratch
+* **`reindex=False`** โ Loads existing persistent store (faster)
+
+```python
+index_documents(reindex=True) # rebuild everything
+index_documents(reindex=False) # reuse old vectors
+```
+
+๐ **Action:**
+Use reindexing only after you add or update text files in `data/`.
+The included **`saturndoc.txt`** is already indexed by default when you run the script for the first time โ so you can test immediately without adding new documents.
+
+---
+
+## ๐งฉ 8. Best Practices
+
+* Keep each text file focused on one topic for cleaner retrieval.
+* For long documents, consider manually splitting them into sections (see the chunking sketch after this list).
+* If using CPU only, choose smaller models for faster inference.
+* Delete the `rag_chroma_store/` folder to fully reset the database.
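+
+A simple, illustrative chunking helper (a sketch, not part of `rag_machine.py`), splitting on blank lines with a rough character budget:
+
+```python
+def split_into_chunks(text: str, max_chars: int = 1000):
+    """Split a document into roughly max_chars-sized chunks on paragraph boundaries."""
+    chunks, current = [], ""
+    for paragraph in text.split("\n\n"):
+        if len(current) + len(paragraph) > max_chars and current:
+            chunks.append(current.strip())
+            current = ""
+        current += paragraph + "\n\n"
+    if current.strip():
+        chunks.append(current.strip())
+    return chunks
+```
+
+Each chunk can then be passed to `collection.add(...)` with its own ID, just like the whole-file documents in `index_documents`.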
+
+---
+
+## ๐ฐ๏ธ 9. Deploying on Saturn Cloud
+
+You can easily host this on **Saturn Cloud**:
+
+1. Create a new Jupyter or VS Code resource.
+2. Upload this project folder.
+3. Install requirements:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+4. Run `python rag_machine.py` to test indexing.
+5. Launch the API:
+
+ ```bash
+ uvicorn rag-api:app --host 0.0.0.0 --port 8000
+ ```
+6. Expose port **8000** in your Saturn environment to access it externally.
+
+๐ Learn more about Saturn Cloud and GPU-accelerated workflows at **[https://saturncloud.io](https://saturncloud.io)**
+
+---
+
+## ๐ Credits
+
+Built with โค๏ธ using:
+
+* ๐ค **Transformers**
+* ๐ง **SentenceTransformers**
+* ๐พ **ChromaDB**
+* โก **FastAPI**
+* and hosted proudly on **[Saturn Cloud](https://saturncloud.io/)**
\ No newline at end of file
diff --git a/examples/nlp_and_llms/nvidia-rag-mini/data/saturndoc.txt b/examples/nlp_and_llms/nvidia-rag-mini/data/saturndoc.txt
new file mode 100644
index 00000000..f9375715
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-mini/data/saturndoc.txt
@@ -0,0 +1,5 @@
+Saturn Cloud provides a scalable cloud platform for data science and machine learning.
+It supports Jupyter environments, Dask clusters, and GPU-powered instances.
+Users can collaborate on notebooks, deploy APIs, and run scheduled jobs.
+You can also fine-tune large language models and deploy them with minimal effort.
+Saturn Cloud offers integrations with Hugging Face, AWS, and GitHub.
\ No newline at end of file
diff --git a/examples/nlp_and_llms/nvidia-rag-mini/rag-api.py b/examples/nlp_and_llms/nvidia-rag-mini/rag-api.py
new file mode 100644
index 00000000..aa072e79
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-mini/rag-api.py
@@ -0,0 +1,17 @@
+from fastapi import FastAPI
+from pydantic import BaseModel
+from rag_machine import query_docs, index_documents
+
+app = FastAPI(title="RAG Mini Docs Q&A")
+
+class QueryRequest(BaseModel):
+ query: str
+
+@app.on_event("startup")
+def startup_event():
+ index_documents(reindex=False)
+
+@app.post("/query")
+def query(req: QueryRequest):
+ answer = query_docs(req.query)
+ return {"result": answer}
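+
+
+# Optional /reload route mentioned in the README: a minimal sketch that
+# rebuilds the index without restarting the server.
+@app.post("/reload")
+def reload_docs():
+    index_documents(reindex=True)
+    return {"status": "reindexed"}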
diff --git a/examples/nlp_and_llms/nvidia-rag-mini/rag_machine.py b/examples/nlp_and_llms/nvidia-rag-mini/rag_machine.py
new file mode 100644
index 00000000..5721867a
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-mini/rag_machine.py
@@ -0,0 +1,114 @@
+# rag_machine.py
+from pathlib import Path
+import os
+import torch
+from sentence_transformers import SentenceTransformer
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import chromadb
+
+# --------------------------
+# ๐ง Configuration
+# --------------------------
+CHROMA_DIR = "rag_chroma_store"
+DATA_DIR = Path("data")
+EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+LLM_MODEL = "google/flan-t5-base"
+
+os.environ["TOKENIZERS_PARALLELISM"] = "false"
+
+DATA_DIR.mkdir(exist_ok=True)
+Path(CHROMA_DIR).mkdir(exist_ok=True)
+
+# --------------------------
+# โ๏ธ Initialize Components
+# --------------------------
+print("๐ Loading models...")
+embedder = SentenceTransformer(EMBED_MODEL)
+tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)
+llm = AutoModelForSeq2SeqLM.from_pretrained(LLM_MODEL)
+
+client = chromadb.PersistentClient(path=CHROMA_DIR)
+collection = client.get_or_create_collection("rag_docs")
+
+# --------------------------
+# ๐ Document Loader
+# --------------------------
+def load_all_documents(data_dir: Path):
+ docs = []
+ for file in data_dir.glob("*.txt"):
+ with open(file, "r", encoding="utf-8") as f:
+ text = f.read().strip()
+ if text:
+ docs.append({"file": file.name, "text": text})
+ print(f"๐ Loaded: {file.name}")
+ return docs
+
+# --------------------------
+# ๐ข Index Documents
+# --------------------------
+def index_documents(reindex: bool = False):
+ """Rebuild or load existing document embeddings."""
+ if reindex:
+ print("โป๏ธ Reindexing documents...")
+ try:
+ collection.reset()
+ print("๐งน Cleared existing collection.")
+ except AttributeError:
+ ids = collection.get()["ids"]
+ if ids:
+ collection.delete(ids=ids)
+ print("๐งน Deleted existing documents manually.")
+
+ docs = load_all_documents(DATA_DIR)
+ for i, d in enumerate(docs):
+ emb = embedder.encode(d["text"])
+ collection.add(
+ ids=[str(i)],
+ documents=[d["text"]],
+ embeddings=[emb.tolist()],
+ metadatas=[{"source": d["file"]}],
+ )
+        print("โ Documents reindexed and stored in Chroma.")
+ else:
+ print("๐ฆ Using existing Chroma store.")
+
+
+# --------------------------
+# ๐ Query System
+# --------------------------
+def query_docs(question: str, top_k: int = 3):
+ """Retrieve top-k relevant docs and generate an answer."""
+ print(f"\n๐ Question: {question}")
+
+ # Embed the query and search
+ q_emb = embedder.encode(question).tolist()
+ results = collection.query(query_embeddings=[q_emb], n_results=top_k)
+
+    if not results["documents"] or not results["documents"][0]:
+ return "No relevant documents found."
+
+ context = "\n".join(results["documents"][0])
+ prompt = f"Answer based on the following context:\n{context}\n\nQuestion: {question}"
+
+ inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
+ outputs = llm.generate(**inputs, max_length=512)
+ answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ return answer
+
+# --------------------------
+# ๐งช CLI Test Mode
+# --------------------------
+if __name__ == "__main__":
+ print("๐ง Starting RAG Machine (Transformers + Chroma)...")
+ index_documents(reindex=True)
+
+ while True:
+ q = input("\nโ Enter your question (or 'exit'): ").strip()
+ if q.lower() == "exit":
+ break
+ try:
+ ans = query_docs(q)
+ print(f"\n๐ฌ {ans}\n")
+ except Exception as e:
+ print(f"โ ๏ธ Error: {e}")
diff --git a/examples/nlp_and_llms/nvidia-rag-mini/requirements.txt b/examples/nlp_and_llms/nvidia-rag-mini/requirements.txt
new file mode 100644
index 00000000..624f58dd
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-mini/requirements.txt
@@ -0,0 +1,7 @@
+torch>=2.2.0
+transformers>=4.44.0
+sentence-transformers>=3.0.0
+chromadb>=0.5.0
+fastapi>=0.115.0
+uvicorn[standard]>=0.30.0
+pydantic>=2.7.0
diff --git a/examples/nlp_and_llms/nvidia-rag-serve-api/README.md b/examples/nlp_and_llms/nvidia-rag-serve-api/README.md
new file mode 100644
index 00000000..911fabd4
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-serve-api/README.md
@@ -0,0 +1,105 @@
+# ๐ Ray Serve LLM API โ Qwen 1.5B (vLLM)
+
+This template shows how to deploy a **Qwen2.5-1.5B-Instruct LLM** using:
+
+* **Ray Serve**
+* **vLLM**
+* **OpenAI-compatible API format**
+
+You get a local inference server running at:
+
+```
+http://127.0.0.1:8000/v1/chat/completions
+```
+
+This template is designed for **Saturn Cloud custom templates** so users can plug-and-play LLM inference environments with GPU acceleration.
+
+๐ **Back to Saturn Cloud โ [https://saturncloud.io](https://saturncloud.io)**
+
+---
+
+## ๐ Features
+
+* Fully OpenAI-compatible API endpoint
+* Deploys Qwen 1.5B using vLLM (fast inference)
+* Simple Ray Serve deployment
+* Example client request included
+* Clean and minimal code structure
+* Works inside Jupyter or full terminal environment
+
+---
+
+## ๐ฆ Requirements
+
+The notebook installs everything automatically:
+
+```
+torch
+transformers
+ray[serve, llm]
+fastapi
+uvicorn
+requests
+huggingface_hub
+```
+
+GPU recommended for optimal performance.
+
+---
+
+## ๐ Project Structure
+
+```
+ray-serve-llm/
+โ
+โโโ serve_llm.py # Ray Serve deployment definition
+โโโ start_server.py # Ray launcher (if using outside notebook)
+โโโ test_client.py # Example API client test
+โโโ ray_serve_llm.ipynb     # Full Jupyter notebook template
+```
+
+---
+
+## โถ๏ธ How It Works
+
+### 1. Write your Ray Serve deployment file
+
+Defines:
+
+* Model ID (`Qwen2.5-1.5B-Instruct`)
+* Engine config
+* Autoscaling
+* OpenAI-compatible app
+
+### 2. Start Ray and deploy the model
+
+Ray Serve loads the model via vLLM and exposes the API.
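+
+A minimal sketch of what `start_server.py` could look like (the filename comes from the project structure above; it mirrors the notebook cells):
+
+```python
+# start_server.py
+import time
+
+import ray
+from ray import serve
+
+from serve_llm import app  # the OpenAI-compatible app built in serve_llm.py
+
+ray.init(ignore_reinit_error=True)
+serve.run(app)  # deploys at http://127.0.0.1:8000/v1/chat/completions
+
+# Keep the process alive so Serve keeps handling requests when run as a script
+while True:
+    time.sleep(60)
+```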
+
+### 3. Send a test request
+
+JSON API format identical to OpenAI:
+
+```python
+payload = {
+ "model": "qwen-1.5b",
+ "messages": [{"role": "user", "content": "Explain API design."}]
+}
+```
+
+### 4. Extract the assistant text
+
+```python
+res = out.json()["choices"][0]["message"]["content"]
+```
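+
+Put together, a minimal sketch of `test_client.py` (the filename comes from the project structure above):
+
+```python
+# test_client.py
+import requests
+
+payload = {
+    "model": "qwen-1.5b",
+    "messages": [{"role": "user", "content": "Explain API design."}],
+}
+
+out = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload)
+print(out.json()["choices"][0]["message"]["content"])
+```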
+
+---
+
+## ๐ Conclusion
+
+This template provides a clean, reproducible Ray Serve LLM deployment that works both in Jupyter and full terminal mode.
+You can adapt it to larger models, scale it across nodes, or wrap it inside FastAPI.
+
+๐ **Back to Saturn Cloud โ [https://saturncloud.io](https://saturncloud.io)**
+
+---
+
diff --git a/examples/nlp_and_llms/nvidia-rag-serve-api/ray_serve_llm.ipynb b/examples/nlp_and_llms/nvidia-rag-serve-api/ray_serve_llm.ipynb
new file mode 100644
index 00000000..fe2de35a
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-rag-serve-api/ray_serve_llm.ipynb
@@ -0,0 +1,188 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d1a1950f",
+ "metadata": {},
+ "source": [
+ "# ๐ Ray Serve LLM API\n",
+ "\n",
+    "This template demonstrates how to deploy a model using **Ray Serve + vLLM** and expose it through an **OpenAI-compatible API**.\n",
+    "\n",
+    "This is a custom template for **Saturn Cloud**, so users can spin up plug-and-play LLM inference environments with GPU acceleration.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d06caaab",
+ "metadata": {},
+ "source": [
+ "## ๐ฆ Install required libraries\n",
+    "Install all the required libraries for the template."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "78b5ec11",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Install required libraries\n",
+ "!pip install torch transformers fastapi uvicorn \"ray[serve, llm]\" requests huggingface_hub\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec4c2ac1",
+ "metadata": {},
+ "source": [
+ "## ๐งฉ Create Ray Serve Deployment File\n",
+ "\n",
+    "This writes a file called **`serve_llm.py`** which:\n",
+ "\n",
+ "* Configures the model (Qwen2.5-1.5B-Instruct)\n",
+ "* Creates a Ray Serve LLMConfig\n",
+ "* Builds an OpenAI-compatible API using Ray's `build_openai_app`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bc3b43ec",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%writefile serve_llm.py\n",
+ "from ray.serve.llm import LLMConfig, build_openai_app\n",
+ "\n",
+ "MODEL_ID = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
+ "MODEL_ALIAS = \"qwen-1.5b\"\n",
+ "\n",
+ "engine_kwargs = dict(\n",
+ " tensor_parallel_size=1,\n",
+ " max_model_len=4096,\n",
+ ")\n",
+ "\n",
+ "deployment_config = dict(\n",
+ " autoscaling_config=dict(\n",
+ " min_replicas=1,\n",
+ " max_replicas=1,\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "llm_config = LLMConfig(\n",
+ " model_loading_config=dict(\n",
+ " model_id=MODEL_ALIAS,\n",
+ " model_source=MODEL_ID,\n",
+ " ),\n",
+ " engine_kwargs=engine_kwargs,\n",
+ " deployment_config=deployment_config,\n",
+ ")\n",
+ "\n",
+ "app = build_openai_app({\"llm_configs\": [llm_config]})"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f3464f5",
+ "metadata": {},
+ "source": [
+ "## โถ๏ธ Start Ray Serve and Deploy the Model\n",
+ "\n",
+ "This will:\n",
+ "\n",
+ "* Initialize Ray\n",
+ "* Start Ray Serve\n",
+ "* Deploy the Qwen model as an API at:\n",
+ " **[http://127.0.0.1:8000/v1/chat/completions](http://127.0.0.1:8000/v1/chat/completions)**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1e011e24",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import ray\n",
+ "from serve_llm import app\n",
+ "from ray import serve\n",
+ "\n",
+ "ray.init(ignore_reinit_error=True)\n",
+ "\n",
+ "serve.start(detached=False)\n",
+ "serve.run(app)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3700cc7d",
+ "metadata": {},
+ "source": [
+ "## ๐ฌ Test the API\n",
+ "\n",
+ "Sends a real chat request to your Ray Serve LLM deployment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "eb912c3a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import requests\n",
+ "\n",
+ "payload = {\n",
+ " \"model\": \"qwen-1.5b\",\n",
+ " \"messages\": [{\"role\": \"user\", \"content\": \"Explain API design.\"}]\n",
+ "}\n",
+ "\n",
+ "out = requests.post(\"http://127.0.0.1:8000/v1/chat/completions\", json=payload)\n",
+ "print(out.json())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "78c72539",
+ "metadata": {},
+ "source": [
+    "## โจ Extract Only the Model Output\n",
+ "\n",
+ "This grabs the generated text only (no metadata)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e440e110",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "res = out.json()[\"choices\"][0][\"message\"][\"content\"]\n",
+ "print(res)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "17f4ac64",
+ "metadata": {},
+ "source": [
+ "## ๐ **Conclusion**\n",
+ "\n",
+ "You now have a fully running **Ray Serve LLM API** using Qwen2.5-1.5B-Instruct, powered by **vLLM** and exposed through an **OpenAI-compatible endpoint**.\n",
+ "This template can be extended to larger models, added to pipelines, or used inside production-grade ML workloads within Saturn Cloud.\n",
+ "\n",
+ "๐ **Back to Saturn Cloud โ [https://saturncloud.io](https://saturncloud.io)**"
+ ]
+ }
+ ],
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/examples/nlp_and_llms/nvidia-vector-db/.env b/examples/nlp_and_llms/nvidia-vector-db/.env
new file mode 100644
index 00000000..1622d39c
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-vector-db/.env
@@ -0,0 +1,3 @@
+ZILLIZ_URI="https://in03-e969f44404493f8.serverless.aws-eu-central-1.cloud.zilliz.com"
+ZILLIZ_TOKEN="a71de8fc4a75f5cb758d0fcf2b92fb2ebc1f851d7e776247d440e887cd355d7b575649f63a514fb7b78fdeac6f3b416e2ef11150"
+PG_CONNECTION="postgresql://neondb_owner:npg_ymHkZNUVr2I7@ep-lingering-silence-ah4wmlqw-pooler.c-3.us-east-1.aws.neon.tech/neondb?sslmode=require&channel_binding=require"
\ No newline at end of file
diff --git a/examples/nlp_and_llms/nvidia-vector-db/README.md b/examples/nlp_and_llms/nvidia-vector-db/README.md
new file mode 100644
index 00000000..0d97129f
--- /dev/null
+++ b/examples/nlp_and_llms/nvidia-vector-db/README.md
@@ -0,0 +1,233 @@
+
+# ๐ **Vector DB Menu (FAISS โข Zilliz Milvus โข Neon PGVector)**
+
+> A unified FastAPI search service that lets you test and compare **FAISS (local)**, **Milvus (Zilliz Cloud free tier)**, and **PostgreSQL with PGVector (Neon free tier)** using a common API.
+
+๐ **Built for the Saturn Cloud AI Community**
+๐ [https://saturncloud.io/](https://saturncloud.io/)
+
+---
+
+## ๐ง Overview
+
+This project loads a public dataset (State of the Union speeches), embeds it with `sentence-transformers/all-MiniLM-L6-v2`, stores vectors in **three different databases**, and exposes a **FastAPI endpoint** to query them interchangeably.
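+
+The embedding step itself is small; a sketch of what each backend ends up storing (model name taken from the description above):
+
+```python
+from sentence_transformers import SentenceTransformer
+
+embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+
+# Each speech (or chunk of it) becomes a 384-dimensional vector
+vectors = embedder.encode(["The state of our union is strong..."])
+print(vectors.shape)  # (1, 384)
+```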
+
+### โ What's included:
+
+* FAISS (local in-memory vector search)
+* Milvus (via **Zilliz Cloud free tier**)
+* PostgreSQL + PGVector (via **Neon free tier**)
+* FastAPI for querying all 3 backends
+* CLI & Browser UI testing
+* Modular, deploy-ready architecture
+
+---
+
+## โ ๏ธ Free-Tier Credentials Notice
+
+This repo includes **working test credentials** for quick validation.
+However, because they are **free-tier**, they may:
+
+โ ๏ธ expire at any time
+โ ๏ธ be rate-limited
+โ ๏ธ be deleted automatically
+
+โ You are **strongly encouraged to create your own accounts** using the setup guide below.
+
+---
+
+---
+
+# ๐ ๏ธ **1. Project Setup**
+
+### Clone Repository
+
+```sh
+git clone https://github.com/your-repo/nvidia-vector-db.git
+cd nvidia-vector-db
+```
+
+---
+
+### Create and Activate Virtual Environment
+
+#### Windows (PowerShell)
+
+```sh
+python -m venv vectordb-env
+vectordb-env\Scripts\activate
+```
+
+#### macOS / Linux
+
+```sh
+python3 -m venv vectordb-env
+source vectordb-env/bin/activate
+```
+
+---
+
+### Install Dependencies
+
+```sh
+pip install -r requirements.txt
+```
+
+---
+
+# โ๏ธ **2. Create Neon (PostgreSQL + PGVector) Free Account**
+
+1. Visit: [https://neon.tech/](https://neon.tech/)
+2. Click **Sign Up** (free tier)
+3. Create a new project
+4. Go to **Dashboard โ Connection Details**
+5. Copy the connection string:
+
+ ```
+ postgresql://