A lightweight, extensible template for running large-scale reinforcement learning (RL) experiments with structured configuration, sweep execution (local or Slurm), and automatic logging to MySQL/MariaDB. Notebooks are provided for result retrieval and visualization.
Algorithms implemented under agents/:
- Greedy Actor–Critic (`greedy_ac`) — Neumann et al., 2018
- MPO — Maximum a Posteriori Policy Optimization (`mpo`) — Abdolmaleki et al., 2018
- PPO — Proximal Policy Optimization (`ppo`) — Schulman et al., 2017 — see `main_ppo.py` for an entry point
- SAC — Soft Actor–Critic (`sac`) — Haarnoja et al., 2018
- TD3 — Twin Delayed DDPG (`td3`) — Fujimoto et al., 2018
- Tsallis AWAC (`tsallis_awac`) — Zhu et al., 2024
Policy heads/modules under models/policy_parameterizations/:
- `beta` — Beta-distribution policies (bounded support)
- `gaussian` — Clipped Gaussian policies
- `q_gaussian` — q-Gaussian policies
- `softmax` — Categorical (discrete-action) policies
- `squashed_gaussian` — Tanh-squashed Gaussian policies
- `student` — Student-t policies with learnable degrees of freedom (heavy-tailed)
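For intuition, a squashed Gaussian bounds actions by passing Gaussian samples through tanh and correcting the log-probability with the change-of-variables term. The snippet below is an illustrative sketch of that standard construction, not the repository's implementation:

```python
# Illustrative sketch of a tanh-squashed Gaussian (standard construction;
# not the repo's code). Actions land in (-1, 1).
import torch

mean, log_std = torch.zeros(2), torch.zeros(2)
dist = torch.distributions.Normal(mean, log_std.exp())
u = dist.rsample()                 # reparameterized pre-squash sample
a = torch.tanh(u)                  # bounded action
# log pi(a) = log N(u) - log|d tanh(u)/du| = log N(u) - log(1 - a^2)
log_prob = (dist.log_prob(u) - torch.log1p(-a.pow(2) + 1e-6)).sum()
```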
- Python 3.10 or newer.
- A running MySQL/MariaDB server you can connect to (local or remote).
```
pip install -r requirements.txt
```

Add your DB credentials to configs/db/credentials.yaml.
You can log results to a local database, to a Google Cloud VM, or to a Compute Canada (CC) database server by following these instructions.
Tip: Create a credentials-local.yaml and add it to .gitignore so you don’t commit secrets.
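As an illustration, such a file might look like the sketch below; the exact keys are defined by the repository's template, so treat these field names as assumptions and mirror configs/db/credentials.yaml:

```yaml
# Hypothetical credentials-local.yaml (field names are assumptions;
# mirror the keys used in configs/db/credentials.yaml).
host: 127.0.0.1
port: 3306
user: myuser
password: change-me
```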
Set db_prefix to your username in configs/config.yaml. This is required on CC, where database names must begin with your CC username.
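For example (illustrative excerpt; only the relevant key is shown):

```yaml
# configs/config.yaml (excerpt)
db_prefix: your_cc_username   # on CC, database names must begin with your CC username
```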
The schema used for logging is defined in configs/schema/default-schema.yaml.
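The excerpt below is purely hypothetical, to convey the idea; consult the actual file for the real entry names, since (as noted under Troubleshooting) both the datatypes and the order of entries matter:

```yaml
# Hypothetical schema excerpt (invented names/types; the real file is
# configs/schema/default-schema.yaml, where datatypes and order matter).
columns:
  run: INT            # primary key
  step: BIGINT
  episodic_return: FLOAT
```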
If you do not want to log to a database server, you can remove the ExperimentManager and Metric from main.py and use your own logger. This is the only place where the database is referenced.
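If you go that route, a stand-in as simple as the following can work. This is a minimal sketch (JsonlLogger is not part of the repo), assuming you have stripped the ExperimentManager/Metric usage from main.py:

```python
# Minimal file-based stand-in for the DB logger (illustrative only).
import json

class JsonlLogger:
    """Append one JSON record per logged metric to a local file."""

    def __init__(self, path: str):
        self.path = path

    def log(self, run: int, step: int, **metrics) -> None:
        record = {"run": run, "step": step, **metrics}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Usage inside the training loop:
logger = JsonlLogger("run_0_metrics.jsonl")
logger.log(run=0, step=1000, episodic_return=123.4)
```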
Run a single experiment by omitting the sweep flag:
```
python main.py -cn config_joblib_local run=0
```

Run a sweep of experiments as defined in configs/config_joblib_local.yaml. The args.run argument is the run ID for the first experiment; subsequent experiments are assigned args.run + sweep_id in the database:

```
python main.py -m -cn config_joblib_local run=0
```

If you’re running on a machine without Slurm, use the joblib configuration files.
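To make the run-ID mapping concrete, here is a hypothetical Hydra sweep block in the spirit of configs/config_joblib_local.yaml (the parameter path is invented): launched with run=0, its three jobs are stored as runs 0, 1, and 2.

```yaml
# Hypothetical sweep block (illustrative parameter path):
hydra:
  sweeper:
    params:
      agent.lr: 0.001,0.0003,0.0001   # 3 jobs -> runs 0, 1, 2 when run=0
```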
Use the submitit config for your cluster (e.g., config_submitit_cedar). Example sweep:
```
python main.py -m -cn config_submitit_cedar run=0
```

To schedule a single run on Slurm, remove the -m flag.
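Slurm resources are set on the launcher. The block below shows a few common knobs of Hydra's submitit launcher plugin as an illustration; the actual values for your cluster live in the corresponding config (e.g., config_submitit_cedar):

```yaml
# Common hydra-submitit-launcher settings (illustrative values):
hydra:
  launcher:
    timeout_min: 180   # wall-clock limit per job, in minutes
    cpus_per_task: 1
    mem_gb: 8
```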
The run field defined in the config files is the primary key in the database. Within a single database, this key must be unique or the experiment will not start.
You can ensure uniqueness by either:
- Passing a unique `run` value for each experiment, or
- Using a separate database for each sweep.
All run statistics are logged to your configured database. See the example notebooks under notebooks/ for how to retrieve and visualize results.
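As a rough sketch of what retrieval looks like (the table and column names here are assumptions; the notebooks show the supported workflow):

```python
# Rough sketch of pulling results into pandas (assumed table/column
# names; see notebooks/ for the supported retrieval workflow).
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/myuser_sweep0")
df = pd.read_sql("SELECT run, step, episodic_return FROM metrics WHERE run = 0", engine)
df.plot(x="step", y="episodic_return")
```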
```
agents/                            # Agent implementations and interfaces
configs/                           # Hydra configs (env, db, sweeps, etc.)
environments/                      # Environment helper
experiment/                        # Experiment manager and logger
models/                            # Policy / value networks
models/policy_parameterizations/   # Policy parameterization modules
notebooks/                         # Example analysis notebooks
utils/                             # Common utilities
main.py                            # Entry point for sweeps and single runs
main_ppo.py                        # Entry point for PPO
save_policy_evolution.py           # Entry point for saving policy snapshots
```
If your environment is a Gymnasium environment, you only need to add a config file under configs/env/. The environment will be initialized via the factory in environments/factory.py.
Note that this codebase has been tested only on the Gymnasium classic-control suite. It may or may not work with other environments that have multi-dimensional action spaces, though the necessary changes should be small.
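For example, a new environment config might look like the sketch below; the exact keys are dictated by environments/factory.py and the existing files under configs/env/, so treat these as assumptions:

```yaml
# configs/env/pendulum.yaml (hypothetical; match the keys expected by
# environments/factory.py and the existing configs under configs/env/).
name: Pendulum-v1
seed: 0
```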
If experiments fail to run or log properly, chances are the database is the culprit, and many of these errors are silent. Check the following:
- DB connection errors → Double-check configs/db/credentials.yaml (or your local override), network access/ports, and the db_prefix on CC; a quick connectivity check is sketched after this list.
- Duplicate run IDs → Use a fresh `run` value or a separate database per sweep.
- Schema errors → Ensure that whatever you are logging is defined properly in the schema file. The datatypes and the order of entries matter.
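For connection issues, a standalone check like the following helps isolate DB problems from experiment code (PyMySQL is used here only as an illustration; substitute whichever driver your requirements pin):

```python
# Standalone connectivity check (illustrative; fill in the values from
# your configs/db/credentials.yaml).
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=3306, user="myuser", password="change-me")
print("Connected to MySQL/MariaDB server:", conn.get_server_info())
conn.close()
```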
Contributions are welcome!
- Bugs & features: Please open an issue to report bugs or request features. For larger changes, start a discussion first to align on scope and design.
- Pull requests: Keep PRs focused and well-described. Where applicable, update docs/examples and include a minimal reproduction if you change experiment logic.
This codebase was used for the online RL experiments in our ICLR 2025 paper *q-Exponential Family for Policy Optimization*. If you find our work helpful in your research, you may cite it with the following BibTeX:
```bibtex
@inproceedings{ICLR2025_6507b115,
  author    = {Zhu, Lingwei and Shah, Haseeb and Wang, Han and Nagai, Yukie and White, Martha},
  booktitle = {International Conference on Representation Learning},
  editor    = {Y. Yue and A. Garg and N. Peng and F. Sha and R. Yu},
  pages     = {40717--40744},
  title     = {q-exponential family for policy optimization},
  url       = {https://proceedings.iclr.cc/paper_files/paper/2025/file/6507b115562bb0a305f1958ccc87355a-Paper-Conference.pdf},
  volume    = {2025},
  year      = {2025}
}
```