
andreyYaky/mingpt


mingpt

GPTs trained on the Shakespeare dataset. Includes: a small 10.8M-parameter GPT following Andrej Karpathy's video lecture, and a Universal Transformer with Adaptive Computation Time (ACT).

Implemented Techniques:

  • Multi-head Attention
  • KV Caching
  • SwiGLU (Swish Gated Linear Unit) FeedForward
  • RoPE: Rotary Positional Embeddings
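
The SwiGLU feed-forward can be sketched as a minimal PyTorch module (PyTorch and the layer sizes here are assumptions for illustration, not the repo's actual configuration):

```python
# Minimal SwiGLU feed-forward sketch (illustrative, not the repo's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: (Swish(x W1) * (x W3)) W2, where Swish(z) = z * sigmoid(z)
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

ff = SwiGLUFeedForward(dim=64, hidden_dim=256)
out = ff(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```

The gating (elementwise product of a Swish-activated projection with a second linear projection) is what distinguishes SwiGLU from a plain two-layer MLP.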

Libraries

  • einops
