GPU-accelerated Stable Diffusion image generation with Java, LangChain4j, SD4J and ONNX Runtime on Azure Container Apps

bbenz/gpuonazure

GPU-Accelerated Image Generation with LangChain4j

Java Spring Boot LangChain4j ONNX Runtime CUDA

A production-ready demonstration of GPU-accelerated AI inference using LangChain4j with ONNX Runtime and CUDA, deployed to Azure Container Apps with GPU support.

🚀 Features

  • GPU-Accelerated Inference: CUDA 12.2 with ONNX Runtime for high-performance AI
  • Stable Diffusion Image Generation: Powered by Oracle's SD4J (Stable Diffusion for Java) 🎉
    • Complete CLIP tokenizer, U-Net, VAE decoder, and scheduler implementation
    • Multiple scheduler algorithms (LMS, Euler Ancestral)
    • Optional NSFW safety checker
    • Generate high-quality cartoon-style images from text prompts
  • Text Embeddings: Semantic similarity with All-MiniLM-L6-v2 model via LangChain4j
  • Modern Stack: Java 21 virtual threads, Spring Boot 3.2.5, LangChain4j 0.34.0, SD4J
  • Cloud-Ready: Containerized with Docker, deployable to Azure Container Apps
  • Interactive UI: Web-based interface with real-time metrics
  • Production-Grade: Health checks, graceful shutdown, monitoring endpoints
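
The "Modern Stack" bullet leans on Java 21 virtual threads. As an illustrative sketch (not code from this repo), this is the pattern a blocking, I/O-heavy inference handler typically uses to serve many concurrent requests cheaply:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// Hedged sketch: each task gets its own virtual thread, so blocking calls
// (model inference, I/O) don't pin a scarce platform thread.
public class VirtualThreadSketch {
    static List<String> runTasks(int n) throws Exception {
        // ExecutorService is AutoCloseable since JDK 19+, so try-with-resources works.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = IntStream.range(0, n)
                .mapToObj(i -> executor.submit(() -> "task-" + i + " done"))
                .toList();
            var results = new java.util.ArrayList<String>();
            for (var f : futures) results.add(f.get());  // gather in submission order
            return results;
        }
    }

    public static void main(String[] args) throws Exception {
        runTasks(4).forEach(System.out::println);
    }
}
```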

🎨 Example Output

Image Generator Example

📖 Complete Setup Guide

👉 See SETUP.md for detailed step-by-step installation instructions, including:

  • Prerequisites and build tools installation
  • Model downloads and directory structure
  • ONNX Runtime Extensions build process
  • Local development and Azure deployment
  • Comprehensive troubleshooting guide

📋 Table of Contents

  • Architecture
  • Prerequisites
  • Quick Start
  • Local Development
  • Azure Deployment
  • API Documentation
  • Configuration
  • Troubleshooting
  • Performance
  • Security
  • License
  • Contributing
  • Roadmap

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Web UI                              │
│              (HTML + Tailwind CSS + JavaScript)             │
└──────────────────────┬──────────────────────────────────────┘
                       │ REST API
┌──────────────────────▼──────────────────────────────────────┐
│                   Spring Boot Controller                    │
│                   (ImageController.java)                    │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                LangChain4j GPU Service                      │
│              (LangChain4jGpuService.java)                   │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                   ONNX Runtime (GPU)                        │
│           • Stable Diffusion v1.5 (Image Gen)               │
│           • All-MiniLM-L6-v2 (Embeddings)                   │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                    NVIDIA CUDA 12.2                         │
│                  (GPU Acceleration Layer)                   │
└─────────────────────────────────────────────────────────────┘

🔧 Prerequisites

Local Development

  • Java 21 LTS (with --enable-preview flag)
  • Maven 3.9+
  • NVIDIA GPU with CUDA 12.2+ support
  • CUDA Toolkit 12.2
  • Docker (optional, for containerization)
  • Git

Azure Deployment

  • Azure CLI
  • Azure Subscription with GPU quota enabled
  • Docker for building images

🚀 Quick Start

📘 For detailed setup instructions, see SETUP.md

1. Clone Repository

git clone https://github.com/your-org/gpuonazure.git
cd gpuonazure

2. Install Build Tools

# Linux
sudo apt-get update && sudo apt-get install -y cmake build-essential

# macOS
brew install cmake

3. Download Models

./download-missing-models.sh

Downloads Stable Diffusion v1.5 models (~5.2 GB):

  • Text Encoder, U-Net, VAE Decoder

4. Build ONNX Runtime Extensions

./download-ortextensions.sh

Builds libortextensions.so (~3 MB, required for CLIP tokenizer).

5. Run Application

mvn spring-boot:run

6. Access Web UI

Open browser: http://localhost:8080

🎨 Generate your first image!

🔨 Local Development

📘 For troubleshooting and advanced configuration, see SETUP.md

Running with Maven

mvn spring-boot:run

Running with Docker

# Build image
docker build -t gpu-langchain4j-demo:latest .

# Run with GPU (if CUDA available)
docker run --gpus all -p 8080:8080 \
  -v $(pwd)/models:/app/models \
  gpu-langchain4j-demo:latest

Configuration

Edit src/main/resources/application.yml:

gpu:
  langchain4j:
    gpu:
      enabled: true  # Set to false for CPU-only mode
      device-id: 0
    model:
      dir: ./models

Health Check

curl http://localhost:8080/actuator/health

Expected Response:

{
  "status": "UP",
  "gpuAvailable": false,
  "modelsLoaded": true,
  "stableDiffusion": "Ready (SD4J)"
}

☁️ Azure Deployment

🎯 Quick Deploy (Automated)

Deploy to Azure Container Apps with GPU in one command:

./deploy-azure-aca.sh

This script will:

  • ✅ Create Azure Container Registry
  • ✅ Build and push Docker image (~5GB)
  • ✅ Create Container Apps Environment with GPU
  • ✅ Deploy application
  • ✅ Output application URL

Total time: 15-20 minutes

⚙️ Configuration

Customize deployment with environment variables:

export RESOURCE_GROUP="gpu-demo-rg"
export LOCATION="eastus"
export ACR_NAME="gpudemoregistry"
export GPU_PROFILE="NC8as_T4_v3"  # or NC24ads_A100_v4

./deploy-azure-aca.sh

GPU Options:

  • NC8as_T4_v3 - NVIDIA T4 (16GB), $0.526/hour
  • NC24ads_A100_v4 - NVIDIA A100 (80GB), $3.672/hour

📖 Complete Guide

For detailed instructions, see SETUP.md.

🧪 Test Deployment

# Get application URL
APP_URL=$(az containerapp show --name gpu-langchain4j-demo --resource-group gpu-demo-rg --query properties.configuration.ingress.fqdn -o tsv)

# Health check
curl https://$APP_URL/actuator/health

# Generate test image
curl -X POST https://$APP_URL/api/langchain4j/image \
  -H "Content-Type: application/json" \
  -d '{"prompt": "sunset over mountains", "style": "CLASSIC"}' \
  --output test-azure.png

🗑️ Cleanup

az group delete --name gpu-demo-rg --yes

📚 API Documentation

Base URL

http://localhost:8080/api/langchain4j

Endpoints

1. Generate Image

POST /image

Generate a cartoon-style image from a text prompt using Stable Diffusion.

Request:

{
  "prompt": "A friendly robot helping with Azure deployment",
  "style": "HAPPY"
}

Response: image/png (binary)

cURL Example:

curl -X POST http://localhost:8080/api/langchain4j/image \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A friendly robot at a computer","style":"CLASSIC"}' \
  --output generated-image.png
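
The same call can be made from Java with the JDK's built-in HttpClient. This is a hedged sketch rather than a client shipped with the repo: it builds and prints the request, and leaves the actual send commented out for when the app is running locally.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical Java client for the /image endpoint (assumes the app is
// listening on localhost:8080, as in the Quick Start).
public class ImageClientSketch {
    // Naive JSON assembly for illustration; a real client would use a JSON library.
    public static String body(String prompt, String style) {
        return "{\"prompt\":\"" + prompt + "\",\"style\":\"" + style + "\"}";
    }

    public static void main(String[] args) {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8080/api/langchain4j/image"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body("A friendly robot at a computer", "CLASSIC")))
            .build();
        System.out.println(request.method() + " " + request.uri());
        // With the app running, send and save the PNG response:
        // java.net.http.HttpClient.newHttpClient().send(request,
        //     java.net.http.HttpResponse.BodyHandlers.ofFile(
        //         java.nio.file.Path.of("generated-image.png")));
    }
}
```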

2. Compare Embeddings

POST /embeddings

Compare semantic similarity between two text snippets.

Request:

{
  "text1": "Azure Container Apps",
  "text2": "Cloud container platform"
}

Response:

{
  "text1": "Azure Container Apps",
  "text2": "Cloud container platform",
  "similarity": 0.8532,
  "processingTimeMs": 45
}

cURL Example:

curl -X POST http://localhost:8080/api/langchain4j/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "text1": "GPU acceleration",
    "text2": "CUDA processing"
  }'
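
The similarity score is, by convention for All-MiniLM-L6-v2, a cosine similarity over the two embedding vectors. Whether this app computes exactly this is an assumption about its internals, but the formula itself is standard:

```java
// Cosine similarity: dot(a, b) / (|a| * |b|), the usual metric for
// sentence-embedding models like All-MiniLM-L6-v2. Toy vectors below,
// not real model output.
public class CosineSimilarity {
    public static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] v1 = {1f, 0f, 1f};
        float[] v2 = {1f, 1f, 0f};
        System.out.printf("similarity: %.4f%n", cosine(v1, v2)); // 0.5000 for these toy vectors
    }
}
```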

3. Health Check

GET /health

Get system health and GPU status.

Response:

{
  "status": "UP",
  "gpuAvailable": true,
  "modelsLoaded": true,
  "timestamp": "2025-09-29T12:00:00Z"
}

4. List Models

GET /models

Get information about loaded models.

Response:

{
  "models": {
    "stableDiffusion": {
      "name": "Stable Diffusion v1.5",
      "path": "/app/models/stable-diffusion/model.onnx",
      "available": true,
      "sizeBytes": 3442332160
    },
    "allMiniLmL6V2": {
      "name": "All-MiniLM-L6-v2",
      "path": "/app/models/all-MiniLM-L6-v2/model.onnx",
      "available": true,
      "sizeBytes": 90000000
    }
  },
  "gpuAvailable": true,
  "modelsLoaded": true
}

⚙️ Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| GPU_LANGCHAIN4J_GPU_ENABLED | true | Enable GPU acceleration |
| GPU_LANGCHAIN4J_GPU_DEVICE_ID | 0 | CUDA device ID |
| GPU_LANGCHAIN4J_MODEL_DIR | ./models | Model directory path |
| JAVA_OPTS | -Xmx8g -XX:+UseZGC | JVM options |
| SPRING_PROFILES_ACTIVE | default | Spring profile |
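
Spring's relaxed binding maps each GPU_LANGCHAIN4J_* variable onto the matching application.yml key, with the environment taking precedence over the YAML default. A plain-Java illustration of that precedence (the real resolution is done by Spring, not a helper like this):

```java
// Hypothetical helper showing the precedence the table implies:
// environment variable wins, otherwise the application.yml default applies.
public class EnvDefaults {
    public static String resolve(String envValue, String yamlDefault) {
        return (envValue != null && !envValue.isBlank()) ? envValue : yamlDefault;
    }

    public static void main(String[] args) {
        String modelDir = resolve(System.getenv("GPU_LANGCHAIN4J_MODEL_DIR"), "./models");
        System.out.println("model dir: " + modelDir);
    }
}
```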

GPU Configuration

gpu:
  langchain4j:
    gpu:
      enabled: true
      device-id: 0
      inter-op-threads: 4
      intra-op-threads: 8

Model Configuration

gpu:
  langchain4j:
    model:
      dir: /app/models
      download:
        enabled: true
        azure-storage-account: mystorageaccount
        azure-storage-container: onnx-models

🐛 Troubleshooting

📘 For comprehensive troubleshooting, see SETUP.md

Common Issues

1. libortextensions.so not found

Error: Failed to load library ./libortextensions.so

Solution:

# Build the library (takes ~10 minutes)
./download-ortextensions.sh

2. Models Not Found

Error: Stable Diffusion model not found

Solution:

# Download models and organize structure
./download-missing-models.sh

# Verify structure
ls -R models/stable-diffusion/

3. Out of Memory

Error: OutOfMemoryError during generation

Solution:

# Increase JVM heap (requires 6-8GB for CPU mode)
export JAVA_OPTS="-Xmx8g -XX:+UseZGC"
mvn spring-boot:run

4. Slow Image Generation

CPU Mode Expected Times:

  • First generation: ~60 seconds (model loading)
  • Subsequent: ~30-45 seconds per image

Optimization:

# Reduce inference steps for faster generation
gpu:
  langchain4j:
    inference-steps: 20  # Default 40
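
Why fewer steps helps: the U-Net denoiser runs once per inference step, so generation time scales roughly linearly with the step count. A back-of-the-envelope sketch (the 30-second baseline is illustrative, not a measurement from this repo):

```java
// Rough linear model of diffusion latency vs. step count. Ignores fixed
// costs like text encoding and VAE decoding, so treat it as an estimate.
public class StepEstimate {
    public static double estimateSeconds(double secondsAt40Steps, int steps) {
        return secondsAt40Steps * steps / 40.0;
    }

    public static void main(String[] args) {
        System.out.printf("20 steps: ~%.0f s%n", estimateSeconds(30.0, 20)); // ~15 s
    }
}
```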

Logs

# Application logs
mvn spring-boot:run --debug

# Azure Container Apps logs
az containerapp logs show --name gpu-langchain4j-app \
  --resource-group gpu-langchain4j-rg --follow

📊 Performance

Benchmarks (T4 GPU)

| Operation | Average Time | Throughput |
|-----------|--------------|------------|
| Image Generation (512x512) | 2.3s | ~0.43 img/s |
| Text Embedding (single) | 15ms | ~66 req/s |
| Text Embedding (batch of 10) | 45ms | ~222 req/s |

Benchmarks (A100 GPU)

| Operation | Average Time | Throughput |
|-----------|--------------|------------|
| Image Generation (512x512) | 0.8s | ~1.25 img/s |
| Text Embedding (single) | 8ms | ~125 req/s |
| Text Embedding (batch of 10) | 25ms | ~400 req/s |
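
The throughput column in both tables is just the reciprocal of the average latency, scaled by batch size, which makes the figures easy to sanity-check:

```java
// Throughput = batch / average latency. Checks the T4 and A100 rows above.
public class Throughput {
    public static double perSecond(double avgSeconds, int batch) {
        return batch / avgSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("T4 image gen: %.2f img/s%n", perSecond(2.3, 1));          // ~0.43
        System.out.printf("A100 batch embeddings: %.0f req/s%n", perSecond(0.025, 10)); // ~400
    }
}
```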

🔒 Security

  • HTTPS: Enabled by default in Azure Container Apps
  • CORS: Configured for web UI access
  • Secrets: Use Azure Key Vault for sensitive configuration
  • Authentication: Add Azure AD authentication for production

📝 License

This project is licensed under the MIT License - see LICENSE file.

🤝 Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing)
  5. Open a Pull Request

📧 Support

🙏 Acknowledgments

🗺️ Roadmap

  • Add more ONNX models (BERT, GPT, etc.)
  • Implement model quantization for faster inference
  • Add batch processing endpoints
  • Support for multi-GPU deployment
  • Implement caching layer with Redis
  • Add Prometheus metrics export
  • Create Helm chart for Kubernetes deployment

Made with ❤️ using Java 21, Spring Boot, LangChain4j, and Azure
