    Repositories list

    • LightLLM

      Public
      LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
      Python
      2913.8k8034 Updated Jan 6, 2026
    • LightX2V

      Public
      Light Image Video Generation Inference Framework
      Python
      1251.7k1020 Updated Jan 6, 2026
    • LightTTS

      Public
      LightTTS is a lightweight TTS inference framework optimized for CosyVoice2 and CosyVoice3, enabling fast and scalable speech synthesis in Python, with support for stream and bistream modes.
      Python
      31910 Updated Jan 5, 2026
    • mtc-incremental-bpe

      Public
      Incremental BPE tokenization for all prefixes
      Rust
      0000 Updated Jan 5, 2026
    • Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
      Python
      421.1k280 Updated Jan 1, 2026
    • LightMem

      Public
      Python
      0300 Updated Dec 30, 2025
    • modeltc.github.io

      Public
      HTML
      0000 Updated Dec 29, 2025
    • SageAttention

      Public
      Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
      Cuda
      304200 Updated Dec 18, 2025
    • Python bindings for general-sam and some utilities
      Python
      0500 Updated Dec 15, 2025
    • mtc-token-healing

      Public
      Token healing implementation in Rust
      Rust
      0400 Updated Dec 15, 2025
    • verl

      Public
      verl: Volcano Engine Reinforcement Learning for LLMs
      Python
      3k100 Updated Dec 15, 2025
    • ComfyUI custom node for lightx2v
      Python
      77240 Updated Dec 11, 2025
    • slime

      Public
      slime is an LLM post-training framework for RL Scaling.
      Python
      395000 Updated Dec 8, 2025
    • A general suffix automaton implementation in Rust with Python bindings
      Rust
      0800 Updated Dec 1, 2025
    • SCSS
      1101 Updated Nov 26, 2025
    • LightKernel

      Public
      HTML
      0300 Updated Nov 26, 2025
    • Greedily tokenize strings with the longest tokens iteratively.
      Python
      0003 Updated Nov 24, 2025
    • [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
      Python
      66652400 Updated Nov 19, 2025
    • 0000 Updated Nov 12, 2025
    • Wan2.2-Lightning: Speed up wan2.2 model with distillation
      Python
      1.6k253210 Updated Nov 7, 2025
    • Python
      16000 Updated Nov 6, 2025
    • [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
      Cuda
      304000 Updated Nov 6, 2025
    • FlashVSR

      Public
      Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny conditional decoder.
      Python
      99100 Updated Nov 5, 2025
    • Python
      840130 Updated Nov 3, 2025
    • HBP

      Public
      [NeurIPS 2025] This is the official PyTorch implementation of "Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM".
      Python
      0400 Updated Sep 30, 2025
    • TFMQ-DM

      Public
      [CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
      Jupyter Notebook
      510800 Updated Sep 29, 2025
    • fa3

      Public
      Python
      1000 Updated Aug 7, 2025
    • Dockerfile
      2000 Updated Jul 24, 2025
    • HarmoniCa

      Public
      [ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration".
      Python
      14400 Updated Jul 10, 2025
    • OmniBal

      Public
      [ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance".
      Python
      32630 Updated Jun 16, 2025