Repositories list

• disco
  DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning models without sharing any private data.
  TypeScript · Updated Jan 3, 2026
• Benchmarking Optimizers for LLM Pretraining
  Python · Updated Dec 30, 2025
• Python · Updated Dec 17, 2025
• ML_course
  EPFL Machine Learning Course, Fall 2025
  Jupyter Notebook · Updated Dec 15, 2025
• Official implementation of "Gradient-Normalized Smoothness for Optimization with Approximate Hessians"
  Jupyter Notebook · Updated Nov 9, 2025
• nanoGPT-like codebase for LLM training
  Python · Updated Nov 7, 2025
• CoMiGS
  Python · Updated Sep 24, 2025
• TiMoE
  A time-aware language modeling framework
  Python · Updated Aug 31, 2025
• EPFL Course - Optimization for Machine Learning - CS-439
  Jupyter Notebook · Updated Jul 8, 2025
• Code for the paper "Enhancing Multilingual LLM Pretraining with Model-Based Data Selection"
  Python · Updated May 16, 2025
• Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
  Python · Updated Oct 30, 2024
• powersgd
  Practical low-rank gradient compression for distributed optimization (https://arxiv.org/abs/1905.13727); a toy sketch of the compression step follows this list.
  Python · Updated Oct 29, 2024
• CoBo
  Python · Updated Oct 22, 2024
• Explores on-device, self-supervised collaborative fine-tuning of large language models with limited local data, using Low-Rank Adaptation (LoRA). Introduces three trust-weighted gradient aggregation schemes: weight-similarity-based, prediction-similarity-based, and validation-performance-based; a toy aggregation sketch follows this list.
  Python · Updated Sep 2, 2024
• SGD with compressed gradients and error feedback (https://arxiv.org/abs/1901.09847); a minimal sketch follows this list.
  Jupyter Notebook · Updated Jul 25, 2024
• REQ
  Python · Updated Jun 10, 2024
• CoTFormer
  Python · Updated May 22, 2024
• Python · Updated May 22, 2024
• Python · Updated Apr 18, 2024
• Python · Updated Apr 16, 2024
• DoGE
  Codebase for the ICML submission "DOGE: Domain Reweighting with Generalization Estimation"; a toy reweighting sketch follows this list.
  Updated Feb 4, 2024
• Landmark Attention: Random-Access Infinite Context Length for Transformers
  Python · Updated Dec 20, 2023
• pam
  Python · Updated Dec 9, 2023
• Python · Updated Aug 18, 2023
• optML-pku
  Summer school materials
  Updated Aug 4, 2023
• Code for "Multi-Head Attention: Collaborate Instead of Concatenate"
  Python · Updated Jun 12, 2023
• Jupyter Notebook · Updated Jun 2, 2023
• Difficulty-guided text summarization
  Python · Updated May 22, 2023
• relaysgd
  Code for the paper "RelaySum for Decentralized Deep Learning on Heterogeneous Data"
  Jupyter Notebook · Updated Apr 21, 2023
• Tools for experimenting with and using run:ai; the aim is for these to be small, self-contained utilities used by multiple people.
  Python · Updated Mar 16, 2023
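
For powersgd, here is a minimal single-worker sketch of rank-r power-iteration gradient compression with error feedback, assuming the warm-started scheme described in arXiv:1905.13727. All names are illustrative rather than the repo's API, and the distributed all-reduce steps are only indicated in comments.

```python
# Toy sketch of rank-r power-iteration gradient compression (PowerSGD-style).
import numpy as np

def power_compress(grad: np.ndarray, q: np.ndarray):
    """Approximate grad (m x n) by p @ q_new.T using one power-iteration step."""
    p = grad @ q                 # (m, r); in the distributed setting, all-reduced
    p, _ = np.linalg.qr(p)       # orthonormalise columns
    q_new = grad.T @ p           # (n, r); also all-reduced across workers
    return p, q_new

rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 128))   # a weight-matrix gradient
q = rng.standard_normal((128, 4))        # rank-4 factor, warm-started across steps
p, q = power_compress(grad, q)
approx = p @ q.T                         # decompressed low-rank update
residual = grad - approx                 # error feedback: fold into the next gradient
```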
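For the collaborative LoRA fine-tuning repo, this sketches only the weight-similarity trust scheme among the three it describes: each peer's flattened LoRA update is weighted by its cosine similarity to our own. Function and variable names are hypothetical, not the repo's API.

```python
# Hypothetical weight-similarity trust aggregation for flattened LoRA updates.
import numpy as np

def trust_weighted_aggregate(own: np.ndarray, peers: list[np.ndarray]) -> np.ndarray:
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    trust = np.array([max(cos(own, p), 0.0) for p in peers])  # clip negative trust
    trust /= trust.sum() + 1e-12                              # convex combination
    return sum(t * p for t, p in zip(trust, peers))

rng = np.random.default_rng(0)
own = rng.standard_normal(64)                   # our flattened LoRA delta
peers = [own + 0.1 * rng.standard_normal(64),   # a like-minded peer
         rng.standard_normal(64)]               # an unrelated peer
agg = trust_weighted_aggregate(own, peers)      # the similar peer dominates
```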
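For the error-feedback repo, a minimal sketch of compressed SGD with error feedback in the spirit of arXiv:1901.09847, using a scaled-sign compressor on a toy quadratic; all names are illustrative.

```python
# Minimal sketch of SGD with a scaled-sign compressor and error feedback.
import numpy as np

def ef_sgd_step(w, grad, memory, lr=0.1):
    corrected = lr * grad + memory          # re-inject last step's compression error
    compressed = np.mean(np.abs(corrected)) * np.sign(corrected)  # scaled sign
    memory = corrected - compressed         # remember what the compressor dropped
    return w - compressed, memory

rng = np.random.default_rng(0)
w = rng.standard_normal(10)
memory = np.zeros_like(w)
for _ in range(200):
    grad = 2.0 * w                          # gradient of the toy objective ||w||^2
    w, memory = ef_sgd_step(w, grad, memory)
print(np.linalg.norm(w))                    # shrinks toward zero over the run
```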
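For DoGE, a loose, hypothetical sketch of domain reweighting by gradient alignment: domains whose gradients align with a target "generalization" gradient get upweighted multiplicatively. The paper's actual generalization estimator differs; every name here is illustrative.

```python
# Hypothetical multiplicative-weights domain reweighting by gradient alignment.
import numpy as np

def reweight_domains(weights, domain_grads, target_grad, step=0.5):
    align = np.array([g @ target_grad for g in domain_grads])  # alignment scores
    logits = np.log(weights) + step * align
    new = np.exp(logits - logits.max())                        # stable softmax
    return new / new.sum()

rng = np.random.default_rng(0)
target = rng.standard_normal(16)                 # proxy generalization gradient
grads = [target + 0.1 * rng.standard_normal(16), # aligned domain
         rng.standard_normal(16),                # unrelated domain
         -target]                                # harmful domain
w = np.full(3, 1.0 / 3.0)
for _ in range(5):
    w = reweight_domains(w, grads, target)       # mass shifts to the aligned domain
```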