diff --git a/examples/machine-learning_ClassicAI/cpu-hpo-optuna/README.md b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/README.md
new file mode 100644
index 00000000..390cdbc1
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/README.md
@@ -0,0 +1,140 @@
+# Hyperparameter Tuning & Serving (Optuna + Ray Tune)
+
+This template implements an end-to-end **Auto-ML workflow** on a **Python Server**. It automates the lifecycle of a machine learning model, combining the intelligent search of Optuna with the scalable execution of Ray Tune.
+
+**Infrastructure:** [Saturn Cloud](https://saturncloud.io/)
+**Resource:** Python Server
+**Hardware:** CPU
+**Tech Stack:** Optuna, Ray Tune, FastAPI, Scikit-Learn
+
+---
+
+## 📖 Overview
+
+Standard hyperparameter tuning scripts often stop at printing the best parameters. This template goes further by **operationalizing** the result. It solves two key problems:
+
+1. **Efficient Search:** It uses **Optuna's** Tree-structured Parzen Estimator (TPE) algorithm to intelligently select hyperparameters, wrapped in **Ray Tune** to run multiple trials in parallel.
+2. **Instant Deployment:** The workflow automatically retrains the model with the best parameters, saves the artifact, and serves it via a production-ready **FastAPI** server.
+
+---
+
+## 🚀 Quick Start
+
+### 1. Environment Setup
+Run the setup script to create a virtual environment and install all dependencies (Ray, Optuna, FastAPI, Uvicorn, Scikit-Learn).
+```bash
+# 1. Make executable
+chmod +x setup.sh
+
+# 2. Run setup
+bash setup.sh
+```
+
+### 2. Run the Tuning Job (Batch)
+
+Execute the tuning script. This will:
+
+* Launch 20 trials with Ray Tune, running up to 4 at a time.
+* Identify the best configuration (e.g., `n_estimators`, `max_depth`).
+* **Retrain** the model on the full dataset using those winning parameters.
+* **Save** the final model to `best_model.pkl`.
+
+```bash
+# Activate environment
+source venv/bin/activate
+
+# Start tuning
+python tune_hpo.py
+```
+
+### 3. Start the API Server
+
+Once `tune_hpo.py` finishes and generates `best_model.pkl`, start the server to accept real-time requests.
+
+```bash
+python app.py
+```
+
+---
+
+## 🧠 Architecture: "Tune & Serve"
+
+The workflow consists of two distinct stages designed to bridge the gap between experimentation and production.
+
+### Stage 1: Optimization (`tune_hpo.py`)
+
+* **Search Algorithm:** We use `OptunaSearch`, which leverages Bayesian optimization to learn from previous trials and find optimal parameters faster.
+* **Execution Engine:** Ray Tune manages the resources. It uses a `ConcurrencyLimiter` to run 4 trials simultaneously on the CPU, significantly reducing total wait time.
+
+### Stage 2: Inference (`app.py`)
+
+* **Loader:** On startup, the API loads the optimized `best_model.pkl` artifact.
+* **Endpoint:** Exposes a `/predict` route that accepts Iris flower features and returns the classified species (Setosa, Versicolor, or Virginica).
+
+---
+
+## 🧪 Testing
+
+You can test the API using the built-in Swagger UI or via the terminal.
+
+### Method 1: Web Interface
+
+Visit `http://localhost:8000/docs`. Click **POST /predict** -> **Try it out**.
+
+**Test Case A: Setosa (Small Petals)**
+Paste this JSON:
+
+```json
+{
+  "sepal_length": 5.1, "sepal_width": 3.5,
+  "petal_length": 1.4, "petal_width": 0.2
+}
+```
+
+*Expected Result:* `{"class_name": "setosa"}`
+
+**Test Case B: Virginica (Large Petals)**
+Paste this JSON:
+
+```json
+{
+  "sepal_length": 6.5, "sepal_width": 3.0,
+  "petal_length": 5.2, "petal_width": 2.0
+}
+```
+
+*Expected Result:* `{"class_name": "virginica"}`
+
+### Method 2: Terminal (CURL)
+
+Run this command in a new terminal window to test the Virginica prediction:
+
+```bash
+curl -X 'POST' \
+  'http://localhost:8000/predict' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "sepal_length": 6.5,
+  "sepal_width": 3.0,
+  "petal_length": 5.2,
+  "petal_width": 2.0
+}'
+```
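+### Method 3: Python (requests)
+
+If you prefer to script the check, here is a minimal sketch using the `requests` package (not installed by `setup.sh`, so run `pip install requests` first); it assumes the server is running locally on port 8000:
+
+```python
+import requests
+
+# Same payload as Test Case A (Setosa)
+payload = {
+    "sepal_length": 5.1, "sepal_width": 3.5,
+    "petal_length": 1.4, "petal_width": 0.2
+}
+
+resp = requests.post("http://localhost:8000/predict", json=payload)
+print(resp.json())  # expected: {"class_id": 0, "class_name": "setosa"}
+```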
+
+---
+
+## 🏁 Conclusion
+
+This template demonstrates a "Best of Both Worlds" approach: using Optuna for search intelligence and Ray Tune for scaling. By automating the retraining and serving steps, you create a pipeline where model improvements can be deployed rapidly.
+
+To scale the tuning phase (running hundreds of parallel trials across a distributed cluster of machines), consider deploying this workflow on [Saturn Cloud](https://saturncloud.io/).
\ No newline at end of file
diff --git a/examples/machine-learning_ClassicAI/cpu-hpo-optuna/app.py b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/app.py
new file mode 100644
index 00000000..86a8d2dc
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/app.py
@@ -0,0 +1,55 @@
+from fastapi import FastAPI
+from pydantic import BaseModel
+import joblib
+import numpy as np
+import os
+
+app = FastAPI(title="Auto-Tuned Iris API")
+
+# Define Input Schema
+class IrisData(BaseModel):
+    sepal_length: float
+    sepal_width: float
+    petal_length: float
+    petal_width: float
+
+# Global Model Variable
+model = None
+MODEL_PATH = "best_model.pkl"
+
+@app.on_event("startup")
+def load_model():
+    global model
+    if os.path.exists(MODEL_PATH):
+        model = joblib.load(MODEL_PATH)
+        print(f"✅ Loaded optimized model: {MODEL_PATH}")
+    else:
+        print(f"⚠️ Error: {MODEL_PATH} not found. Run 'python tune_hpo.py' first.")
+
+@app.post("/predict")
+def predict(data: IrisData):
+    if model is None:
+        return {"error": "Model not loaded"}
+
+    # Prepare features
+    features = np.array([[
+        data.sepal_length,
+        data.sepal_width,
+        data.petal_length,
+        data.petal_width
+    ]])
+
+    # Predict
+    prediction = int(model.predict(features)[0])
+
+    # Map to Class Name
+    classes = {0: "setosa", 1: "versicolor", 2: "virginica"}
+
+    return {
+        "class_id": prediction,
+        "class_name": classes.get(prediction, "unknown")
+    }
+
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
\ No newline at end of file
diff --git a/examples/machine-learning_ClassicAI/cpu-hpo-optuna/setup.sh b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/setup.sh
new file mode 100755
index 00000000..d5ccbad7
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/setup.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+set -e
+
+GREEN='\033[0;32m'
+BLUE='\033[0;34m'
+NC='\033[0m'
+
+echo -e "${GREEN}🚀 Starting Auto-ML Setup...${NC}"
+
+# 1. Robust Python Detection
+if command -v python3 &> /dev/null; then
+    PY_CMD="python3"
+elif command -v python &> /dev/null; then
+    PY_CMD="python"
+else
+    echo "❌ Error: Could not find 'python3' or 'python' in your PATH."
+    exit 1
+fi
+
+# 2. Create Virtual Environment
+echo -e "${BLUE}📦 Creating Virtual Environment 'venv'...${NC}"
+$PY_CMD -m venv venv
+
+# 3. 
Install Dependencies +echo -e "${BLUE}⬇️ Installing libraries...${NC}" +. venv/bin/activate +pip install --upgrade pip +# Core stack: Ray Tune (HPO), Optuna (Search), FastAPI (Serving), Scikit-Learn (Model) +pip install "ray[tune]" "optuna>=3.0.0" scikit-learn pandas numpy fastapi uvicorn joblib + +echo -e "${GREEN}✅ Environment Ready!${NC}" +echo "-------------------------------------------------------" +echo "1. Tune & Save Model: python tune_hpo.py" +echo "2. Serve Model: python app.py" +echo "-------------------------------------------------------" \ No newline at end of file diff --git a/examples/machine-learning_ClassicAI/cpu-hpo-optuna/tune_hpo.py b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/tune_hpo.py new file mode 100644 index 00000000..8284d833 --- /dev/null +++ b/examples/machine-learning_ClassicAI/cpu-hpo-optuna/tune_hpo.py @@ -0,0 +1,86 @@ +import time +import joblib +import ray +from ray import tune +from ray.tune.search import ConcurrencyLimiter +from ray.tune.search.optuna import OptunaSearch +from sklearn.datasets import load_iris +from sklearn.ensemble import RandomForestClassifier +from sklearn.model_selection import cross_val_score + +# 1. Define Objective (The "Black Box" function) +def objective(config): + data = load_iris() + X, y = data.data, data.target + + # Initialize model with current trial's hyperparameters + clf = RandomForestClassifier( + n_estimators=int(config["n_estimators"]), + max_depth=int(config["max_depth"]), + min_samples_split=float(config["min_samples_split"]), + random_state=42 + ) + + # Evaluate performance using Cross-Validation + scores = cross_val_score(clf, X, y, cv=3) + accuracy = scores.mean() + + # Report metric to Ray Tune + tune.report({"accuracy": accuracy}) + +def run_hpo(): + print("🧠 Initializing Ray...") + ray.init(configure_logging=False) + + # 2. Define Search Space + search_space = { + "n_estimators": tune.randint(10, 200), + "max_depth": tune.randint(2, 20), + "min_samples_split": tune.uniform(0.1, 1.0) + } + + # 3. Setup Optuna Search Algorithm + algo = OptunaSearch() + algo = ConcurrencyLimiter(algo, max_concurrent=4) + + print("🚀 Starting Tuning Job...") + tuner = tune.Tuner( + objective, + tune_config=tune.TuneConfig( + metric="accuracy", + mode="max", + search_alg=algo, + num_samples=20, + ), + param_space=search_space, + ) + + results = tuner.fit() + + # 4. Process Best Result + best_result = results.get_best_result("accuracy", "max") + best_config = best_result.config + + print("\n" + "="*50) + print(f"🏆 Best Accuracy: {best_result.metrics['accuracy']:.4f}") + print(f"🔧 Best Config: {best_config}") + print("="*50) + + # 5. Retrain & Save Best Model + print("💾 Retraining final model with best parameters...") + data = load_iris() + X, y = data.data, data.target + + final_model = RandomForestClassifier( + n_estimators=int(best_config["n_estimators"]), + max_depth=int(best_config["max_depth"]), + min_samples_split=float(best_config["min_samples_split"]), + random_state=42 + ) + final_model.fit(X, y) + + joblib.dump(final_model, "best_model.pkl") + print("✅ Model saved to 'best_model.pkl'") + +if __name__ == "__main__": + run_hpo() \ No newline at end of file diff --git a/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/README.md b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/README.md new file mode 100644 index 00000000..e652acbc --- /dev/null +++ b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/README.md @@ -0,0 +1,129 @@ +# scikit-learn GBM (Python Server) + +
+ +
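+![Baseline vs. Real Model Performance](asset/baseline_comparison.png)
+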
+ +This template implements a production-ready **Baseline Model Comparison** workflow using **scikit-learn** on a **Python Server**. It demonstrates two core server capabilities: +1. **Batch Processing**: A script to train a model and generate static reports (headless plotting). +2. **API Service**: A FastAPI server to serve real-time predictions from the trained model. + +**Infrastructure:** [Saturn Cloud](https://saturncloud.io/) +**Resource:** Python Server +**Hardware:** CPU +**Tech Stack:** scikit-learn, FastAPI, Pandas, Seaborn + +--- + +## 🚀 Quick Start + +### 1. Environment Setup +Run the setup script to automatically create the Python virtual environment (`venv`) and install all required dependencies (including FastAPI and scikit-learn). + +```bash +# 1. Make the script executable (if needed) +chmod +x setup.sh + +# 2. Run the setup +bash setup.sh + +``` + +### 2. Train the Model (Batch Job) + +Run the baseline script. This performs the following actions: + +* Loads the Iris dataset. +* Trains multiple models (Dummy, SVM, Logistic Regression, Decision Tree). +* **Saves the Model:** Exports the trained Logistic Regression model to `iris_model.pkl`. +* **Saves the Report:** Generates a performance plot (`baseline_comparison.png`) without requiring a display monitor. + +```bash +# Activate the environment +source venv/bin/activate + +# Run the training pipeline +python baseline.py + +``` + +### 3. Start the API Server + +Once the model is trained and saved, start the **FastAPI** server to begin accepting prediction requests. + +```bash +# Start the server (runs on port 8000) +python app.py + +``` + +--- + +## 🧠 Project Architecture + +### Files Included + +* **`setup.sh`**: Robust setup script that handles virtual environment creation and dependency installation. +* **`baseline.py`**: The "Batch" workload. It compares model performance against a baseline and saves artifacts (model + plots) to disk. +* **`app.py`**: The "Service" workload. A FastAPI application that loads `iris_model.pkl` and serves an HTTP endpoint for predictions. + +### Model Details + +* **Baseline Strategy**: Uses a "Dummy Classifier" to establish minimum acceptable accuracy. +* **Production Model**: A **Logistic Regression** classifier is selected for deployment due to its efficiency and interpretability. + +--- + +## 🧪 Testing & Validation + +You can interact with this template in two ways: + +### A. Check Batch Results + +After running `baseline.py`, verify that the artifacts were created: + +```bash +# Check for the plot and the saved model +ls -lh baseline_comparison.png iris_model.pkl + +``` + +### B. Test the API (Web Interface) + +While `python app.py` is running, open your browser to the auto-generated documentation: + +* **URL:** `http://localhost:8000/docs` +* **Action:** Click **POST /predict** -> **Try it out** -> **Execute**. + +### C. Test the API (Terminal) + +You can also send a raw HTTP request from a separate terminal window: + +```bash +curl -X 'POST' \ + 'http://localhost:8000/predict' \ + -H 'Content-Type: application/json' \ + -d '{ + "sepal_length": 5.1, + "sepal_width": 3.5, + "petal_length": 1.4, + "petal_width": 0.2 +}' + +``` + +**Expected Output:** + +```json +{"class_id":0,"class_name":"setosa"} + +``` + +--- + +## 🏁 Conclusion + +This template provides a robust foundation for deploying machine learning models on CPU-based Python Servers. 
By separating the **training pipeline** (`baseline.py`) from the **inference service** (`app.py`), it adheres to MLOps best practices, ensuring that model artifacts are versioned and reproducible.
+
+For scaling this workflow to larger datasets or deploying it to a managed cluster, consider moving this pipeline to [Saturn Cloud](https://saturncloud.io/). Use this structure as a starting point to deploy more complex models, such as Random Forests or Gradient Boosting Machines, while maintaining a clean and scalable deployment architecture.
+
diff --git a/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/app.py b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/app.py
new file mode 100644
index 00000000..318d2210
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/app.py
@@ -0,0 +1,58 @@
+from fastapi import FastAPI
+from pydantic import BaseModel
+import joblib
+import numpy as np
+import os
+
+# 1. Initialize API
+app = FastAPI(title="Iris Baseline API")
+
+# 2. Define Input Schema (Sepal/Petal dimensions)
+class IrisData(BaseModel):
+    sepal_length: float
+    sepal_width: float
+    petal_length: float
+    petal_width: float
+
+# 3. Load Model Global Variable
+model = None
+MODEL_PATH = "iris_model.pkl"
+
+@app.on_event("startup")
+def load_model():
+    global model
+    if os.path.exists(MODEL_PATH):
+        model = joblib.load(MODEL_PATH)
+        print(f"✅ Loaded model from {MODEL_PATH}")
+    else:
+        print(f"⚠️ Error: {MODEL_PATH} not found. Did you run baseline.py?")
+
+# 4. Prediction Endpoint
+@app.post("/predict")
+def predict(data: IrisData):
+    if model is None:
+        return {"error": "Model not trained yet."}
+
+    # Convert input JSON to model-ready array
+    features = np.array([[
+        data.sepal_length,
+        data.sepal_width,
+        data.petal_length,
+        data.petal_width
+    ]])
+
+    # Predict Class (0, 1, or 2)
+    prediction = int(model.predict(features)[0])
+
+    # Map to String Name
+    classes = {0: "setosa", 1: "versicolor", 2: "virginica"}
+    return {
+        "class_id": prediction,
+        "class_name": classes.get(prediction, "unknown")
+    }
+
+# 5. 
Run Server (If executed directly) +if __name__ == "__main__": + import uvicorn + # Host 0.0.0.0 is crucial for cloud servers to be accessible + uvicorn.run(app, host="0.0.0.0", port=8000) \ No newline at end of file diff --git a/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/asset/baseline_comparison.png b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/asset/baseline_comparison.png new file mode 100644 index 00000000..a33139ce Binary files /dev/null and b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/asset/baseline_comparison.png differ diff --git a/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/baseline.py b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/baseline.py new file mode 100644 index 00000000..ce9a1a7e --- /dev/null +++ b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/baseline.py @@ -0,0 +1,87 @@ +import numpy as np +import pandas as pd +import matplotlib +import joblib # <--- NEW: For saving the model +import os + +# --------------------------------------------------------- +# 🔧 Headless Server Config +matplotlib.use('Agg') +# --------------------------------------------------------- + +import matplotlib.pyplot as plt +import seaborn as sns +from sklearn.datasets import load_iris +from sklearn.model_selection import train_test_split +from sklearn.dummy import DummyClassifier +from sklearn.svm import SVC +from sklearn.linear_model import LogisticRegression +from sklearn.tree import DecisionTreeClassifier + +# Set style for plots +sns.set(style="whitegrid") + +def run_baseline_comparison(): + print("🔄 Loading Iris Dataset...") + data = load_iris() + X = data.data + y = data.target + + # Split Data + X_train, X_test, y_train, y_test = train_test_split( + X, y, test_size=0.3, random_state=42, stratify=y + ) + + results = [] + + print("\n--- 1. Baseline Models (Dummy Classifiers) ---") + strategies = ['stratified', 'most_frequent', 'prior', 'uniform'] + for strategy in strategies: + dummy = DummyClassifier(strategy=strategy, random_state=42) + dummy.fit(X_train, y_train) + score = dummy.score(X_test, y_test) + results.append({'Model': f"Dummy ({strategy})", 'Accuracy': score, 'Type': 'Baseline'}) + print(f" {strategy}: {score:.4f}") + + print("\n--- 2. Real Machine Learning Models ---") + + # SVM + svm = SVC(gamma='scale', random_state=42) + svm.fit(X_train, y_train) + res_svm = svm.score(X_test, y_test) + results.append({'Model': 'SVM', 'Accuracy': res_svm, 'Type': 'Real Model'}) + print(f" SVM: {res_svm:.4f}") + + # Logistic Regression (We will save this one) + log_reg = LogisticRegression(solver='lbfgs', max_iter=1000, random_state=42) + log_reg.fit(X_train, y_train) + res_log = log_reg.score(X_test, y_test) + results.append({'Model': 'Logistic Regression', 'Accuracy': res_log, 'Type': 'Real Model'}) + print(f" Logistic Regression: {res_log:.4f}") + + # Decision Tree + dt = DecisionTreeClassifier(random_state=42) + dt.fit(X_train, y_train) + res_dt = dt.score(X_test, y_test) + results.append({'Model': 'Decision Tree', 'Accuracy': res_dt, 'Type': 'Real Model'}) + print(f" Decision Tree: {res_dt:.4f}") + + # --- 3. Save Artifacts --- + print("\n💾 Saving Artifacts...") + + # Save the Plot + df_results = pd.DataFrame(results) + plt.figure(figsize=(10, 6)) + sns.barplot(x="Accuracy", y="Model", hue="Type", data=df_results, palette="viridis") + plt.title("Baseline vs. 
Real Model Performance")
+    plt.axvline(x=0.33, color='r', linestyle='--', label="Random Guess")
+    plt.legend()  # redraw legend so the axvline label actually appears
+    plt.tight_layout()
+    plt.savefig("baseline_comparison.png")
+    print("   ✅ Plot saved to 'baseline_comparison.png'")
+
+    # Save the Model (For the API)
+    joblib.dump(log_reg, "iris_model.pkl")
+    print("   ✅ Model saved to 'iris_model.pkl'")
+
+if __name__ == "__main__":
+    run_baseline_comparison()
\ No newline at end of file
diff --git a/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/setup.sh b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/setup.sh
new file mode 100755
index 00000000..4ef0b1cb
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-sklearn-gbm_/setup.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+set -e
+
+# Define colors
+GREEN='\033[0;32m'
+BLUE='\033[0;34m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+echo -e "${GREEN}🚀 Starting Python Server Setup...${NC}"
+
+# 1. Robust Python Detection
+# We check for 'python3' first, then 'python'
+if command -v python3 &> /dev/null; then
+    PY_CMD="python3"
+elif command -v python &> /dev/null; then
+    PY_CMD="python"
+else
+    echo -e "${RED}❌ Error: Could not find 'python3' or 'python' in your PATH.${NC}"
+    echo "Please install Python 3 before continuing."
+    exit 1
+fi
+
+echo -e "✅ Found Python executable: ${BLUE}$PY_CMD${NC}"
+
+# 2. Create Virtual Environment
+echo -e "${BLUE}📦 Creating Virtual Environment 'venv'...${NC}"
+$PY_CMD -m venv venv
+
+# 3. Install Dependencies
+echo -e "${BLUE}⬇️ Installing libraries...${NC}"
+source venv/bin/activate
+pip install --upgrade pip
+# Installing all required libraries for the Demo + API
+pip install scikit-learn pandas numpy matplotlib seaborn fastapi uvicorn joblib
+
+echo -e "${GREEN}✅ Environment Ready!${NC}"
+echo "-------------------------------------------------------"
+echo "To run the full pipeline:"
+echo "1. Train & Save Model:  python baseline.py"
+echo "2. Start API Server:    python app.py"
+echo "-------------------------------------------------------"
\ No newline at end of file
diff --git a/examples/machine-learning_ClassicAI/cpu-xgb-api/README.md b/examples/machine-learning_ClassicAI/cpu-xgb-api/README.md
new file mode 100644
index 00000000..6866992a
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-xgb-api/README.md
@@ -0,0 +1,112 @@
+# XGBoost Serving API
+
+This template implements a **maintenance-free** Model Serving workflow for a **Regression** problem. It uses the **California Housing dataset** (~20,000 samples) to train an XGBoost model that predicts house prices, deployed via a schema-agnostic FastAPI service.
+
+**Infrastructure:** [Saturn Cloud](https://saturncloud.io/)
+**Resource:** Jupyter Notebook
+**Hardware:** CPU
+**Tech Stack:** XGBoost (Regression), FastAPI, Pandas, Scikit-Learn
+
+---
+
+## 📖 Overview
+
+In traditional model serving, changing the model's features (e.g., adding "zip_code" or removing "age") often requires rewriting the API code. This template demonstrates a **"Model-First"** architecture where the API is generic and adapts to the model artifact automatically.
+
+This is deployed as a **Jupyter Notebook** resource on [Saturn Cloud](https://saturncloud.io/), allowing you to develop, train, and serve from a single environment.
+
+---
+
+## 🚀 Quick Start
+
+### 1. Workflow
+
+1. Open **`xgboost_serving.ipynb`** in the Jupyter interface.
+2. **Run All Cells**:
+   * **Install:** Installs dependencies (`xgboost`, `fastapi`, `uvicorn`) directly in the kernel.
+   * **Train:** Trains an `XGBRegressor` on the California Housing dataset (20,640 samples).
+   * **Generate:** Writes the `app.py` server file to disk.
+
+### 2. Launch the Server
+
+The notebook generates the API code for you. To run it, open a **Terminal** in Jupyter (File -> New -> Terminal) and execute:
+
+```bash
+uvicorn app:app --host 0.0.0.0 --port 8000
+```
+
+Alternatively, you can run the same command from the final code cell of the notebook.
+
+---
+
+## 🧠 Architecture: Schema-Agnostic Design
+
+This template uses a **"Model-First"** approach where the API code is decoupled from the specific features of the model. This allows the API to serve the regression model dynamically.
+
+* **Inputs:** A generic list of numerical values (representing the 8 housing features like `MedInc`, `HouseAge`, etc.).
+* **Outputs:** A continuous float value (Estimated House Value).
+* **Maintenance:** To update the model features (e.g., adding "Zip Code" or removing "Rooms"), you simply retrain and replace `model.json`. The Python API code remains untouched.
+
+### Dataset Details
+
+* **Source:** California Housing Dataset (1990 Census).
+* **Target:** Median House Value in units of **$100,000**.
+* **Features:** 8 numerical features including Median Income, House Age, Average Rooms, Latitude, and Longitude.
+
+---
+
+## 🧪 Testing
+
+You can test the API using the built-in Swagger UI or via the terminal.
+
+### Method 1: Web Interface
+
+Visit `http://localhost:8000/docs`. Click **POST /predict** -> **Try it out** and paste the JSON below.
+
+```json
+{
+  "features": [
+    8.32,
+    41.0,
+    6.98,
+    1.02,
+    322.0,
+    2.55,
+    37.88,
+    -122.23
+  ]
+}
+```
+
+### Method 2: Terminal (CURL)
+
+Run this command in a separate terminal window to send a sample request:
+
+```bash
+# Features: [MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Lat, Long]
+curl -X POST "http://localhost:8000/predict" \
+  -H "Content-Type: application/json" \
+  -d '{"features": [8.32, 41.0, 6.98, 1.02, 322.0, 2.55, 37.88, -122.23]}'
+```
+
+**Expected Output (approximately):**
+
+```json
+{"estimated_value": 4.526}
+```
+
+*Interpretation: The model predicts a median house value of roughly **$452,600**.*
+
+Note: you can submit any feature values and the API will return a prediction for them.
+
+---
+
+## 🏁 Conclusion
+
+For scaling this workflow, such as deploying this API to a Kubernetes cluster or scheduling the training job, consider moving this pipeline to [Saturn Cloud](https://saturncloud.io/).
\ No newline at end of file
diff --git a/examples/machine-learning_ClassicAI/cpu-xgb-api/xgboost_serving.ipynb b/examples/machine-learning_ClassicAI/cpu-xgb-api/xgboost_serving.ipynb
new file mode 100644
index 00000000..2cd74509
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/cpu-xgb-api/xgboost_serving.ipynb
@@ -0,0 +1,181 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 🚀 XGBoost Serving API (California Housing)\n",
+    "\n",
+    "This notebook demonstrates a **Regression** workflow using a larger dataset.\n",
+    "\n",
+    "1. **Dataset**: California Housing (20,640 samples, 8 features).\n",
+    "2. **Model**: XGBoost Regressor (predicting continuous house prices).\n",
+    "3. **Serve**: A schema-agnostic FastAPI service."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# 1. Install Dependencies\n",
+    "%pip install xgboost fastapi uvicorn scikit-learn pandas numpy\n",
+    "\n",
+    "print(\"✅ Dependencies installed. 
Restart kernel if needed.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import xgboost as xgb\n", + "import pandas as pd\n", + "import numpy as np\n", + "from sklearn.datasets import fetch_california_housing\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.metrics import mean_squared_error\n", + "\n", + "print(\"✅ Libraries imported.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Train on California Housing Data\n", + "We use `fetch_california_housing`. This dataset is much larger than Iris, so the model learns real patterns rather than memorizing data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. Load Data (20,640 samples)\n", + "housing = fetch_california_housing()\n", + "X = pd.DataFrame(housing.data, columns=housing.feature_names)\n", + "y = housing.target\n", + "\n", + "# 2. Split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", + "\n", + "# 3. Train (Regression)\n", + "# We use XGBRegressor instead of Classifier\n", + "model = xgb.XGBRegressor(objective='reg:squarederror')\n", + "model.fit(X_train, y_train)\n", + "\n", + "# 4. Evaluate (RMSE)\n", + "preds = model.predict(X_test)\n", + "rmse = np.sqrt(mean_squared_error(y_test, preds))\n", + "print(f\"🎯 Model RMSE: {rmse:.4f} (Lower is better)\")\n", + "print(f\"📊 Typical Price Range: 0.15 to 5.0 (Units of $100k)\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model.save_model(\"model.json\")\n", + "print(\"💾 Model saved to 'model.json'\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. The Maintenance-Free API\n", + "We use `%%writefile` to create `app.py`. \n", + "\n", + "Notice how we didn't have to manually define `MedInc`, `HouseAge`, etc. in the API. \n", + "The API simply accepts a list of floats `[feature_1, feature_2, ...]`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile app.py\n", + "import xgboost as xgb\n", + "import numpy as np\n", + "import os\n", + "from fastapi import FastAPI\n", + "from pydantic import BaseModel\n", + "from typing import List\n", + "\n", + "app = FastAPI(title=\"Housing Price API\")\n", + "model = xgb.XGBRegressor()\n", + "MODEL_FILE = \"model.json\"\n", + "\n", + "# Generic Input Schema\n", + "class Payload(BaseModel):\n", + " features: List[float]\n", + "\n", + "@app.on_event(\"startup\")\n", + "def load_model():\n", + " if os.path.exists(MODEL_FILE):\n", + " model.load_model(MODEL_FILE)\n", + " print(f\"✅ Loaded {MODEL_FILE}\")\n", + " else:\n", + " print(\"⚠️ Model file missing.\")\n", + "\n", + "@app.post(\"/predict\")\n", + "def predict(payload: Payload):\n", + " # Convert generic list -> numpy array\n", + " vector = np.array([payload.features])\n", + " \n", + " # Predict (Returns a float for regression)\n", + " prediction = float(model.predict(vector)[0])\n", + " return {\"estimated_value\": prediction}\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
Launch Server\n",
+    "To start the server, open a **Terminal** in Jupyter and run:\n",
+    "```bash\n",
+    "uvicorn app:app --host 0.0.0.0 --port 8000\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!uvicorn app:app --host 0.0.0.0 --port 8000"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "cpu-plotly-env",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/examples/machine-learning_ClassicAI/mlflow-server/README.md b/examples/machine-learning_ClassicAI/mlflow-server/README.md
new file mode 100644
index 00000000..66dadb90
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/mlflow-server/README.md
@@ -0,0 +1,13 @@
+# MLflow Tracking Server Template
+
+This template demonstrates how to use MLflow to track, log, and manage machine learning experiments in a single notebook. It trains a Random Forest model on the Diabetes dataset, logs parameters, metrics, and artifacts, and enables viewing and reloading runs locally or through a remote MLflow tracking server.
+
+You can deploy MLflow-tracked models via platforms like **Saturn Cloud**; refer to Saturn’s documentation for deployment guidance.
+
+---
+
+## References
+
+* [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
+* [Saturn Cloud Docs](https://saturncloud.io/docs/)
+* [Scikit-learn RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)
\ No newline at end of file
diff --git a/examples/machine-learning_ClassicAI/mlflow-server/mlflow-tracking.ipynb b/examples/machine-learning_ClassicAI/mlflow-server/mlflow-tracking.ipynb
new file mode 100644
index 00000000..8497df8f
--- /dev/null
+++ b/examples/machine-learning_ClassicAI/mlflow-server/mlflow-tracking.ipynb
@@ -0,0 +1,237 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "JwE9FH9oFOMb"
+   },
+   "source": [
+    "# MLflow Tracking Server\n",
+    "\n",
+    "**MLflow** is an open-source platform that simplifies the tracking, comparison, and deployment of machine learning experiments.\n",
+    "\n",
+    "In this sample template, you’ll use MLflow to **track training runs**, **log parameters and metrics**, and **store models** for future reuse — all within a single notebook.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "oV2djjnvFOMj"
+   },
+   "source": [
+    "Install **MLflow**, **Gradio**, and supporting libraries including **scikit‑learn**, **matplotlib**, **pandas**, and **PyTorch** (used below for a quick GPU check).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "b5yEqAXTFOMk"
+   },
+   "outputs": [],
+   "source": [
+    "!pip install -q mlflow scikit-learn matplotlib pandas gradio torch\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "IppJqmrgFOMm"
+   },
+   "source": [
+    "Import MLflow, perform a quick GPU check with PyTorch, and load helper libraries used throughout.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "F31VfPfuFOMn"
+   },
+   "outputs": [],
+   "source": [
+    "import mlflow, os, torch, pandas as pd, matplotlib.pyplot as plt, gradio as gr\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.datasets import load_diabetes\n",
+    "from sklearn.ensemble import RandomForestRegressor\n",
+    "\n",
+    "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
+    "print(f'✅ Using device: {device}')\n",
+    "if device == 'cpu':\n",
+    "    print('⚠️ Running on CPU — switch to GPU for faster performance if available.')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "fyS8Q66nFOMp"
+   },
+   "source": [
+    "By default, MLflow saves runs to the local **`mlruns/`** directory. You can switch to a **remote tracking server** later by setting a different tracking URI.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "Le6Ec_F_FOMq"
+   },
+   "outputs": [],
+   "source": [
+    "# Use a path relative to the working directory so this runs outside Colab too\n",
+    "mlflow.set_tracking_uri(f'file://{os.path.abspath(\"mlruns\")}')\n",
+    "mlflow.set_experiment('mlflow_tracking_demo')\n",
+    "print('🎯 Tracking URI:', mlflow.get_tracking_uri())\n"
+   ]
+  },
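+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For example, pointing MLflow at a remote tracking server only requires changing the URI (the URL below is a placeholder for your own server, not a real endpoint):\n",
+    "\n",
+    "```python\n",
+    "# Hypothetical remote server; all subsequent runs would be logged there\n",
+    "mlflow.set_tracking_uri('http://your-mlflow-server:5000')\n",
+    "mlflow.set_experiment('mlflow_tracking_demo')\n",
+    "```\n"
+   ]
+  },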
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "eTSpUqXLFOMs"
+   },
+   "source": [
+    "The helper below fetches experiment metadata, parameters, and metrics from your local `mlruns/` directory (or a remote server if configured).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "tNSyaQmHFOMu"
+   },
+   "outputs": [],
+   "source": [
+    "from mlflow.tracking import MlflowClient\n",
+    "\n",
+    "def show_mlflow_runs_table(experiment_name=\"mlflow_tracking_demo\"):\n",
+    "    \"\"\"Display all MLflow runs (similar to MLflow UI Table).\"\"\"\n",
+    "    client = MlflowClient()\n",
+    "    experiment = client.get_experiment_by_name(experiment_name)\n",
+    "\n",
+    "    if not experiment:\n",
+    "        return pd.DataFrame({\"Info\": [\"No experiment found. Run a training cell first.\"]})\n",
+    "    runs = client.search_runs([experiment.experiment_id])\n",
+    "    if not runs:\n",
+    "        return pd.DataFrame({\"Info\": [\"No runs logged yet.\"]})\n",
+    "\n",
+    "    rows = []\n",
+    "    for r in runs:\n",
+    "        row = {\n",
+    "            \"Run ID\": r.info.run_id,\n",
+    "            \"Status\": r.info.status,\n",
+    "            \"Start Time\": pd.to_datetime(r.info.start_time, unit=\"ms\"),\n",
+    "            \"End Time\": pd.to_datetime(r.info.end_time, unit=\"ms\"),\n",
+    "            \"Duration (s)\": round((r.info.end_time - r.info.start_time) / 1000, 2)\n",
+    "            if r.info.end_time else None,\n",
+    "        }\n",
+    "        row.update(r.data.params)\n",
+    "        row.update(r.data.metrics)\n",
+    "        rows.append(row)\n",
+    "\n",
+    "    df = pd.DataFrame(rows)\n",
+    "    main_cols = [\"Run ID\", \"Status\", \"Start Time\", \"End Time\", \"Duration (s)\"]\n",
+    "    other_cols = [c for c in df.columns if c not in main_cols]\n",
+    "    df = df[main_cols + other_cols]\n",
+    "    print(f\"✅ Showing {len(df)} runs from experiment '{experiment_name}'\")\n",
+    "    return df\n",
+    "\n",
+    "runs_df = show_mlflow_runs_table()\n",
+    "display(runs_df)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "oMeP7H8qFOMw"
+   },
+   "source": [
+    "Let's train a small **Random Forest** on the Diabetes dataset and log parameters, metrics, and the model artifact to MLflow.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "tqqgobq6FOMy"
+   },
+   "outputs": [],
+   "source": [
+    "from mlflow.models.signature import infer_signature\n",
+    "\n",
+    "with mlflow.start_run() as run:\n",
+    "    db = load_diabetes()\n",
+    "    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target, test_size=0.2, random_state=42)\n",
+    "\n",
+    "    model = RandomForestRegressor(n_estimators=100, max_depth=6, random_state=42)\n",
+    "    model.fit(X_train, y_train)\n",
+    "    preds = model.predict(X_test)\n",
+    "    signature = infer_signature(X_test, preds)\n",
+    "\n",
+    "    mlflow.log_params({'n_estimators': 100, 'max_depth': 6})\n",
+    "    mlflow.log_metric('mean_prediction', float(preds.mean()))\n",
+    "    mlflow.sklearn.log_model(model, 'model', signature=signature)\n",
+    "\n",
+    "    print(f'Run ID: {run.info.run_id}')\n",
+    "    print('✅ Training and logging complete!')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "KwHrJpK9FOMz"
+   },
+   "source": [
+    "Use the run ID to load the stored model for inference.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "iB6HPf3kFOM0"
+   },
+   "outputs": [],
+   "source": [
+    "run_id = run.info.run_id\n",
+    "loaded_model = mlflow.sklearn.load_model(f'runs:/{run_id}/model')\n",
+    "print('✅ Model loaded successfully!')\n",
+    "print('Sample predictions:', loaded_model.predict(X_test[:5]))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "7_ULM2cqFOM0"
+   },
+   "source": [
+    "So, you've configured MLflow tracking locally (it can be configured for a remote MLflow server too) and logged parameters, metrics, and model artifacts.\n",
+    "\n",
+    "Additionally, you can reload a trained model from a specific run using its run ID. A guide to deployment on Saturn Cloud can be found in the [Saturn documentation](https://saturncloud.io/docs)."
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}