This repository contains the code to reproduce all results of the paper *Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization*.
Clone the repository and install the requirements from `requirements.txt`. We recommend using a virtual environment. An example installation:

```bash
git clone https://github.com/Sebastian-Griesbach/SEE.git
cd SEE
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```

This was tested with Python 3.12.3.
Specific results can be reproduced by executing `run_experiment.py`:

```bash
python see/run_experiment.py
```

Alternatively, run it directly from your IDE.
To recreate any of the experiments in the paper, open `run_experiment.py`, scroll to the bottom, and modify:

```python
if __name__ == "__main__":
    setup_and_run_experiment(
        environment_name="Pendulum-v1",
        reward_setting="adverse",
        method="SAC+SEE",
    )
```

You can choose between:
- `environment_name`:
  `"Pendulum-v1"`, `"LocalOptimumCar-v0"`, `"HalfCheetah-v5"`, `"Ant-v5"`, `"Hopper-v5"`, `"Swimmer-v5"`, `"FetchPickAndPlace-v4"`, `"LargePointMaze-v3"`
- `reward_setting`:
  `"dense"`, `"sparse"`, `"adverse"`
- `method`:
  `"SAC"`, `"TD3"`, `"SAC+SEE"`, `"TD3+SEE"`, `"SAC+SEE w/o maximum update"`, `"SAC+SEE w/o conditioning"`, `"SAC+SEE w/o mixing"`, `"TD3+SEE w/o maximum update"`, `"TD3+SEE w/o conditioning"`, `"TD3+SEE w/o mixing"`
All other settings are set automatically to match the experiments in the paper. The code is optimized for readability and is fully documented with docstrings.
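Since `setup_and_run_experiment` takes these three arguments, several configurations can also be run back to back. A minimal sketch, assuming the function can be imported as a module (the file lives at `see/run_experiment.py`, so the import path may need adjusting; the combinations below are illustrative):

```python
# Illustrative sweep over a few of the configurations listed above.
# setup_and_run_experiment and its parameters come from run_experiment.py;
# importing it like this is an assumption about the package layout.
from run_experiment import setup_and_run_experiment

for method in ["SAC", "SAC+SEE"]:
    for reward_setting in ["dense", "sparse", "adverse"]:
        setup_and_run_experiment(
            environment_name="Pendulum-v1",
            reward_setting=reward_setting,
            method=method,
        )
```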
wandb is used for logging, which requires an account. If you want to change this, there are three occurrences of wandb calls in `run_experiment.py` that you can replace with an alternative; wandb is not used outside of this file.
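As one option, the wandb calls could be swapped for a small file-based logger. A minimal sketch, assuming the call sites log flat dictionaries of metrics (the class name, file path, and example metrics below are hypothetical):

```python
import csv


class CSVLogger:
    """Hypothetical drop-in for metric logging; appends one CSV row per log call."""

    def __init__(self, path: str = "metrics.csv"):
        self.path = path
        self.fieldnames = None

    def log(self, metrics: dict) -> None:
        # Fix the column order on the first call, then append rows.
        first_call = self.fieldnames is None
        if first_call:
            self.fieldnames = list(metrics.keys())
        with open(self.path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=self.fieldnames)
            if first_call:
                writer.writeheader()
            writer.writerow(metrics)


logger = CSVLogger()
logger.log({"step": 0, "episode_return": -1500.0})
```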
Most environments are included in the `environments` folder. The Gymnasium-Robotics environments, however, are based on a fork of that repository which alters the reward functions of the relevant environments.
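For reference, the listed environments follow the standard Gymnasium interface. A minimal sketch of instantiating one of them, assuming the custom environments in the `environments` folder register their IDs with Gymnasium on import (the module name is inferred from the folder name):

```python
import gymnasium as gym

# Assumption: importing the environments package registers custom IDs
# such as "LocalOptimumCar-v0" with Gymnasium.
import environments  # noqa: F401

env = gym.make("Pendulum-v1")  # a standard Gymnasium environment from the list
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```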
SAC+SEE and TD3+SEE are implemented using the Athlete API. For the SAC and TD3 baselines, the Athlete implementations are used directly.
If you use this code, please cite:

```bibtex
@article{griesbach2025learning,
    title={Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization},
    author={Griesbach, Sebastian and D'Eramo, Carlo},
    journal={Reinforcement Learning Journal},
    volume={6},
    pages={1140--1157},
    year={2025}
}
```