PicoTron WGPU

A minimalistic 4D-parallelism distributed training framework for LLaMA-like models, implemented using WGPU (WebGPU) for cross-platform GPU acceleration.

🚀 Features

  • Cross-platform GPU acceleration using WGPU/WebGPU
  • 4D Parallelism support: Data, Tensor, Pipeline, and Context parallelism (see the sketch after this list)
  • WGSL shaders, translated by wgpu to each backend's native format (SPIR-V, MSL, HLSL) for maximum compatibility
  • Complete training pipeline with gradient computation and optimization
  • Text generation with character-level tokenization
  • Educational focus with comprehensive examples and documentation
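
The four parallelism degrees compose multiplicatively: each combination of data, tensor, pipeline, and context rank owns one device. A minimal sketch of that accounting, with hypothetical field names rather than this crate's actual config:

// Illustrative only: the degrees of 4D parallelism multiply into the world size.
struct ParallelismDegrees { dp: usize, tp: usize, pp: usize, cp: usize }

impl ParallelismDegrees {
    fn world_size(&self) -> usize {
        // One device per (data, tensor, pipeline, context) rank combination.
        self.dp * self.tp * self.pp * self.cp
    }
}

For example, dp = 2, tp = 2, pp = 2, cp = 1 needs a world size of 8 devices.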

๐Ÿ—๏ธ Architecture

Core Components

  • Model: PicoTron model with embedding, attention, and output layers
  • Training: Complete training pipeline with cross-entropy loss and SGD optimizer
  • GPU Operations: WGPU-based matrix multiplication, attention, and layer normalization
  • Parallelism: Framework for 4D parallelism (Data, Tensor, Pipeline, Context)
  • Tokenizer: Simple character-level tokenizer for text processing
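
For the tokenizer component, a character-level tokenizer fits in a few lines. The snippet below is an illustrative sketch, not this crate's SimpleTokenizer implementation:

// Hypothetical sketch of a character-level tokenizer.
struct CharTokenizer {
    vocab_size: u32,
}

impl CharTokenizer {
    // Map each character to a token id inside the vocabulary.
    fn encode(&self, text: &str) -> Vec<u32> {
        text.chars().map(|c| (c as u32) % self.vocab_size).collect()
    }

    // Map token ids back to characters, skipping invalid code points.
    fn decode(&self, tokens: &[u32]) -> String {
        tokens.iter().filter_map(|&t| char::from_u32(t)).collect()
    }
}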

GPU Backends

  • Metal (macOS/iOS)
  • Vulkan (Linux/Windows/Android)
  • DirectX 12 (Windows)
  • OpenGL (fallback)
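
Backend selection happens at runtime through wgpu's adapter request. The sketch below shows the generic wgpu pattern (wgpu 0.19-style API; PicoTronWgpuDevice's actual internals are assumed, not copied from this repo):

// Probe which backend/adapter wgpu picks on this machine.
async fn probe_adapter() -> Option<String> {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::all(), // Metal, Vulkan, DX12, GL as available
        ..Default::default()
    });
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            ..Default::default()
        })
        .await?;
    let info = adapter.get_info();
    Some(format!("{} via {:?}", info.name, info.backend))
}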

📦 Installation

Prerequisites

  • Rust 1.70+
  • WGPU-compatible GPU drivers
  • For macOS: Metal support
  • For Linux: Vulkan drivers
  • For Windows: DirectX 12 or Vulkan drivers

Build

git clone https://github.com/mlsquare/picotron-wgpu.git
cd picotron-wgpu
cargo build --release

🎯 Quick Start

Basic Example

cargo run --example basic_example

Inference Example

cargo run --example inference_example

Training Example

cargo run --example training_example

📚 Examples

1. Basic Initialization

use picotron_wgpu::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize WGPU device
    let device = PicoTronWgpuDevice::new().await?;
    
    // Create model configuration
    let config = PicoTronConfig::default();
    
    // Create model
    let model = PicoTronModel::new(config.model).await?;
    
    Ok(())
}

2. Text Generation

use picotron_wgpu::*;

#[tokio::main]
async fn main() -> Result<()> {
    let config = PicoTronConfig::default();
    let tokenizer = SimpleTokenizer::new(config.model.vocab_size);
    let model = PicoTronModel::new(config.model).await?;
    
    // Generate text
    let prompt = "hello";
    let input_tokens = tokenizer.encode(prompt);
    let generated_tokens = model.generate(&input_tokens, 20)?;
    let generated_text = tokenizer.decode(&generated_tokens);
    
    println!("Generated: {}", generated_text);
    Ok(())
}

3. Training

use picotron_wgpu::*;

#[tokio::main]
async fn main() -> Result<()> {
    let config = PicoTronConfig::default();
    let tokenizer = SimpleTokenizer::new(config.model.vocab_size);
    let mut model = PicoTronModel::new(config.model).await?;
    
    // Build a tiny next-token-prediction batch: the targets are the
    // input tokens shifted left by one position.
    let tokens = tokenizer.encode("hello world");
    let input_ids = tokens[..tokens.len() - 1].to_vec();
    let target_ids = tokens[1..].to_vec();
    
    // Training loop
    for epoch in 0..10 {
        let loss = model.compute_loss_and_gradients(&input_ids, &target_ids)?;
        model.update_parameters(0.01);
        println!("Epoch {}: Loss = {:.4}", epoch, loss);
    }
    
    Ok(())
}

🔧 Configuration

Model Configuration

let mut config = PicoTronConfig::default();
config.model.vocab_size = 100;
config.model.hidden_size = 128;
config.model.num_attention_heads = 4;
config.model.num_hidden_layers = 2;
config.model.max_position_embeddings = 50;

Training Parameters

  • Learning Rate: 0.01 (default)
  • Batch Size: Configurable
  • Epochs: 10 (example)
  • Optimizer: Simple SGD
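
The plain SGD step behind a call like update_parameters(0.01) is just w ← w − lr · ∂L/∂w for every weight. A CPU sketch of that rule (illustrative; the crate's actual parameter layout is assumed):

// One SGD step over a flat parameter buffer.
fn sgd_step(weights: &mut [f32], grads: &[f32], lr: f32) {
    for (w, g) in weights.iter_mut().zip(grads) {
        *w -= lr * g; // move against the gradient
    }
}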

🎮 GPU Operations

Matrix Multiplication

let result = gpu_ops.matmul(&buffer_a, &buffer_b, &buffer_c, m, n, k).await?;
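
A CPU reference is useful for validating the GPU kernel. The sketch below assumes the usual row-major convention A: m×k, B: k×n, C: m×n (an assumption, since the call above does not spell it out):

// Hypothetical CPU reference for C = A · B.
fn matmul_ref(a: &[f32], b: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for row in 0..m {
        for col in 0..n {
            let mut acc = 0.0;
            for i in 0..k {
                acc += a[row * k + i] * b[i * n + col];
            }
            c[row * n + col] = acc;
        }
    }
    c
}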

Attention

let result = gpu_ops.attention(&query, &key, &value, &output, 
                              batch_size, num_heads, seq_len, head_dim).await?;
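
The underlying computation is scaled dot-product attention, softmax(QKᵀ/√d)·V per head. A single-head CPU reference, illustrative rather than this crate's API:

// Hypothetical CPU reference for one attention head.
fn attention_ref(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>], head_dim: usize) -> Vec<Vec<f32>> {
    let scale = 1.0 / (head_dim as f32).sqrt();
    q.iter()
        .map(|qi| {
            // Scaled dot product of this query against every key.
            let scores: Vec<f32> = k.iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            // Numerically stable softmax over the scores.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            // Weighted sum of the value vectors.
            (0..head_dim)
                .map(|c| exps.iter().zip(v).map(|(w, vj)| w / sum * vj[c]).sum())
                .collect()
        })
        .collect()
}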

Layer Normalization

let result = gpu_ops.layer_norm(&input, &output, &gamma, &beta,
                                batch_size, seq_len, hidden_size, eps).await?;
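
Layer normalization standardizes each row over the hidden dimension, then applies the learned scale gamma and shift beta. A per-row CPU reference (illustrative only):

// Hypothetical CPU reference for layer norm over one hidden-size row.
fn layer_norm_ref(x: &[f32], gamma: &[f32], beta: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter()
        .zip(gamma.iter().zip(beta))
        .map(|(v, (g, b))| (v - mean) * inv_std * g + b)
        .collect()
}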

🧪 Testing

# Run all tests
cargo test

# Run specific test
cargo test test_name

# Run with logging
RUST_LOG=debug cargo test
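
When writing new tests for GPU ops, a common pattern is to compare against a CPU reference within a tolerance, since floating-point results differ slightly across backends. A small helper sketch (names are illustrative, not from this crate):

// Assert two float buffers agree within a tolerance.
fn assert_close(actual: &[f32], expected: &[f32], tol: f32) {
    assert_eq!(actual.len(), expected.len());
    for (i, (a, e)) in actual.iter().zip(expected).enumerate() {
        assert!((a - e).abs() <= tol, "mismatch at index {}: {} vs {}", i, a, e);
    }
}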

📊 Performance

Benchmarks

  • Apple M3: ~100 ms to generate 1,000 tokens
  • Training: ~50 ms per epoch on a small corpus
  • Memory: ~64 MB for the small example model (vocab size 100, hidden size 128)

Optimization Tips

  1. Use release builds: cargo build --release
  2. Enable GPU-specific optimizations
  3. Batch operations when possible
  4. Use appropriate workgroup sizes
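
For the workgroup-size tip: dispatch counts are typically derived by rounding the problem size up to a whole number of workgroups so no elements are missed. A generic wgpu-style sketch:

// Ceiling division: number of workgroups needed to cover `total` elements.
fn dispatch_count(total: u32, workgroup_size: u32) -> u32 {
    (total + workgroup_size - 1) / workgroup_size
}

For example, 1,000 elements with a workgroup size of 256 dispatches 4 workgroups; the shader then bounds-checks the extra invocations.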

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Hugging Face for the original PicoTron concept
  • WGPU team for the excellent cross-platform GPU abstraction
  • Rust community for the amazing ecosystem

📞 Support

For questions or bug reports, please open an issue on the GitHub repository.

PicoTron WGPU - Cross-platform GPU-accelerated neural network training in Rust 🦀⚡
