    Repositories list

    • introspective-interp

      Public
      Repository for "Training Language Models To Explain Their Own Computations"
      Python
      31511 Updated Dec 22, 2025
    • docent

      Public
      Python
      77700 Updated Dec 18, 2025
    • .github

      Public
      11800 Updated Dec 12, 2025
    • observatory

      Public
      A toolkit for describing model features and intervening on those features to steer behavior.
      Python
      2022543 Updated Dec 12, 2025
    • τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
      Python
      136000 Updated Sep 16, 2025
    • Python
      42100 Updated Sep 3, 2025
    • Inspect: A framework for large language model evaluations
      Python
      365600 Updated Mar 25, 2025
    • inspect_evals

      Public archive
      Collection of evals for Inspect AI
      Python
      220100 Updated Jan 29, 2025