The goal of this project is to explore an approach for learning a smooth reward function from a sparse reward function in the context of Reinforcement Learning. The learnt reward function should make agent training converge faster.
- Learn a smooth reward function given a sparse reward function.
- Use the learnt smooth reward function to train an agent to achieve the goal encoded by the sparse reward function.
- Compare the learnt smooth reward function against the sparse reward function (and ideally show that the learnt reward function makes agent learning converge faster).
- A reasonable reward function is learnt for the CartPole environment. The reward values assigned to states can be observed on the heatmap plot (a plotting sketch is given after this list).
- The learnt reward function does not seem to be sufficient for a policy to learn the task well.
- Policy learning with stage-based difficulty adjustment does not necessarily adjust the difficulty according to the agent's abilities. Sometimes the difficulty increases too quickly and the agent is unable to obtain enough positive reward. This usually prevents the agent from improving in the right direction and can leave it stuck in a bad solution.
- For some reason, the learnt reward function only produces negative values. This could be a cause of slow learning.
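
As a hedged sketch of how such a heatmap could be produced (assuming `reward_net` is a callable that maps a single 4-dimensional CartPole observation to a scalar reward; the name and the fixed-velocity slice are illustrative assumptions, not the project's exact plotting code):

```python
# Sketch: visualising a learnt reward function over CartPole states.
# Assumption: `reward_net` maps one observation
# [cart position, cart velocity, pole angle, pole angular velocity] to a scalar.
import numpy as np
import matplotlib.pyplot as plt


def plot_reward_heatmap(reward_net, resolution=50):
    positions = np.linspace(-2.4, 2.4, resolution)  # CartPole cart position limits
    angles = np.linspace(-0.21, 0.21, resolution)   # CartPole pole angle limits (rad)
    rewards = np.zeros((resolution, resolution))
    for i, angle in enumerate(angles):
        for j, pos in enumerate(positions):
            # Slice of the state space with both velocities fixed at zero.
            state = np.array([pos, 0.0, angle, 0.0], dtype=np.float32)
            rewards[i, j] = float(reward_net(state))
    plt.imshow(rewards, origin="lower", aspect="auto",
               extent=[positions[0], positions[-1], angles[0], angles[-1]])
    plt.xlabel("cart position")
    plt.ylabel("pole angle (rad)")
    plt.colorbar(label="learnt reward")
    plt.show()
```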
- Implementation: split the pipeline into components for better readability and more flexibility for future updates. The components should be applied sequentially: regular environment, transformation to a sparse-reward environment, learnt reward network combined via a weighted average, agent (see the wrapper sketch after this list).
- Replace stage-based difficulty adjustment with performance-based difficulty adjustment (see the scheduler sketch after this list) and test the resulting performance.
- Replace per-sample stochastic gradient descent with mini-batch gradient descent (see the training sketch after this list) and test the resulting performance.
- During reward function training, we could try alternating between training the agent and training the reward function: while one has its weights fixed, the other's weights are updated (see the interleaving sketch after this list). This may make learning the reward function more stable.
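
A minimal sketch of the proposed sequential pipeline, assuming a classic gym-style API where `step` returns `(obs, reward, done, info)`; the sparsification rule (reward revealed only at episode end) and the mixing weight `alpha` are illustrative assumptions, not the project's actual choices:

```python
import gym


class SparseRewardWrapper(gym.Wrapper):
    """Transforms the regular environment into a sparse-reward environment.
    Illustrative rule: the reward is only revealed when the episode ends."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        sparse_reward = reward if done else 0.0
        return obs, sparse_reward, done, info


class LearntRewardWrapper(gym.Wrapper):
    """Mixes the sparse reward with the learnt reward network's output
    via a weighted average, so the agent only sees the combined signal."""

    def __init__(self, env, reward_net, alpha=0.5):
        super().__init__(env)
        self.reward_net = reward_net  # callable: observation -> scalar reward
        self.alpha = alpha            # weight of the learnt reward in the average

    def step(self, action):
        obs, sparse_reward, done, info = self.env.step(action)
        mixed = self.alpha * float(self.reward_net(obs)) + (1.0 - self.alpha) * sparse_reward
        return obs, mixed, done, info


# The agent then interacts only with the outermost wrapper:
# env = LearntRewardWrapper(SparseRewardWrapper(gym.make("CartPole-v1")), reward_net)
```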
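A hedged sketch of what performance-based difficulty adjustment could look like; the success-rate thresholds, window size, and difficulty step are illustrative assumptions:

```python
from collections import deque


class DifficultyScheduler:
    """Raises or lowers the difficulty level based on the agent's recent
    success rate instead of advancing through fixed stages."""

    def __init__(self, window=20, raise_at=0.8, lower_at=0.3, step=0.1):
        self.successes = deque(maxlen=window)  # rolling record of episode outcomes
        self.raise_at = raise_at               # raise difficulty above this success rate
        self.lower_at = lower_at               # lower difficulty below this success rate
        self.step = step
        self.difficulty = 0.0                  # e.g. 0.0 = easiest, 1.0 = full task

    def update(self, episode_succeeded):
        self.successes.append(bool(episode_succeeded))
        if len(self.successes) == self.successes.maxlen:
            rate = sum(self.successes) / len(self.successes)
            if rate >= self.raise_at:
                self.difficulty = min(1.0, self.difficulty + self.step)
                self.successes.clear()  # re-measure performance at the new difficulty
            elif rate <= self.lower_at:
                self.difficulty = max(0.0, self.difficulty - self.step)
                self.successes.clear()
        return self.difficulty
```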
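A sketch of the mini-batch variant of the reward-network update, assuming a PyTorch model; the buffer layout (state, target reward pairs) and the MSE loss are illustrative assumptions:

```python
import random

import torch
import torch.nn as nn


def train_reward_net(reward_net, buffer, batch_size=64, epochs=10, lr=1e-3):
    """buffer: list of (state, target_reward) pairs collected from the sparse environment."""
    optimizer = torch.optim.SGD(reward_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        random.shuffle(buffer)
        for start in range(0, len(buffer), batch_size):
            batch = buffer[start:start + batch_size]
            states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s, _ in batch])
            targets = torch.tensor([[r] for _, r in batch], dtype=torch.float32)
            # One gradient step on the average loss over the mini-batch,
            # instead of one step per individual sample.
            loss = loss_fn(reward_net(states), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```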
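A sketch of the interleaving idea, assuming PyTorch modules for the policy and the reward network; `collect_episodes`, `update_agent`, and `update_reward_net` are hypothetical placeholders for the project's own data-collection and update routines:

```python
def set_trainable(module, trainable):
    """Freeze or unfreeze all parameters of a PyTorch module."""
    for param in module.parameters():
        param.requires_grad = trainable


def interleaved_training(env, policy, reward_net, collect_episodes,
                         update_agent, update_reward_net, num_rounds=50):
    for _ in range(num_rounds):
        # Phase 1: reward network frozen, agent (policy) updated.
        set_trainable(reward_net, False)
        set_trainable(policy, True)
        episodes = collect_episodes(env, policy)
        update_agent(policy, episodes)

        # Phase 2: agent frozen, reward network updated.
        set_trainable(policy, False)
        set_trainable(reward_net, True)
        update_reward_net(reward_net, episodes)
```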