diff --git a/tutorials/tutorial10_rllib_ec2.ipynb b/tutorials/tutorial10_rllib_ec2.ipynb new file mode 100644 index 00000000..fdad3420 --- /dev/null +++ b/tutorials/tutorial10_rllib_ec2.ipynb @@ -0,0 +1,151 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 10: Running RLlib experiments on EC2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This tutorial walks through how to run RLlib experiments on AWS EC2 instances. It assumes that the machine you are using has already been configured for AWS (i.e., `~/.aws/credentials` is properly set up). For more detailed documentation, see: https://github.com/ray-project/ray/blob/master/doc/source/autoscaling.rst" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Getting Dependencies\n", + "\n", + "\n", + "* First, make sure your version of Ray is tracking https://github.com/eugenevinitsky/ray. To do this, go to your ray directory, run `git remote -v`, and confirm that the remote you are tracking matches \"eugenevinitsky/ray\".\n", + "\n", + "\n", + "* Install the `rayutils` package from https://github.com/richardliaw/rayutils:\n", + "\n", + "`pip install -e git+https://github.com/richardliaw/rayutils.git#egg=rayutils`\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Modify Configuration\n", + "\n", + "This section explains the components of `/learning-traffic/scripts/ray_autoscale.yaml` that you'll want to customize. These descriptions are also listed in `ray_autoscale.yaml` itself. We'll go over some of the variables you should change, as well as some that might come in handy:\n", + "\n", + "* Modify `cluster_name`: A unique identifier for the head node and workers of this cluster. 
If you want to set up multiple clusters, each one needs its own `cluster_name`, so change it each time you run the script.\n", + "\n", + " \n", + "* Modify `file_mounts` (_change me!_): you will need to update these file mounts for your setup.\n", + " * `/tmp/path` indicates the path to the version of Flow you intend to use. It is specified in the format `#\"/tmp/path\": \"/.git/refs/heads/\"`\n", + " * `/tmp/ray_autoscaler_key` is the path to the Ray autoscaler key. For most users, this key is located in `~/.ssh`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set Up Clusters\n", + "\n", + "* To create the cluster, run: `ray create_or_update ray_autoscale.yaml -y`\n", + "\n", + "* To set up Ray on the cluster, run: `ray2 setup ray_autoscale.yaml`\n", + "\n", + "After both steps are complete, you can log in to the cluster with: `$(ray2 login_cmd ray_autoscale.yaml)`. You can also run scripts on the cluster from outside it with: `ray2 submit ray_autoscale.yaml [--background] [--shutdown] test.py`, where `test.py` is an example script." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run Experiments\n", + "\n", + "The cluster is now set up, and you are ready to run an experiment! From `/learning-traffic/scripts`, run:\n", + "\n", + "`./run_rllib.sh -f /path/to/learning-traffic/examples/rllib/figure_eight.py -s`\n", + "\n", + "--- \n", + "### Results and Caveats\n", + "The experiment is now running! 
By default, results are logged to `~/ray_results` on the cluster.\n", + "\n", + "The `run_rllib.sh` script accepts a few different flags:\n", + "* `-f` (required): the script to run on the cluster\n", + "* `-s`: shut down the cluster after the script finishes\n", + "* `-b`: run the script in the background (recommended for long experiments)\n", + "* `-n`: TODO: This is listed as an option in `run_rllib.sh`, but there's no `if` clause handling it, nor do I see an analog in https://github.com/richardliaw/rayutils/blob/master/rayutils/rayutils.py. Whoever wrote this, can you chime in?\n", + "\n", + "For background runs: the script is run inside `screen`, a Linux utility for managing terminal sessions, so your experiment runs in a separate \"screen\" session from the shell you interact with. To check on the progress of an experiment started with the `-b` flag, log in to the cluster and run `screen -r` to reattach that session. Once reattached, you should immediately see the stdout of your running experiment. To detach again, press `Ctrl-a` (which tells screen to treat the next key as a screen command rather than sending it to the shell), then press `d`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Close Clusters\n", + "\n", + "If you didn't run `./run_rllib.sh` with the `-s` option, you will need to shut down the cluster manually. To do this, log in to the cluster and run:\n", + "\n", + "`ray2 shutdown`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Troubleshooting\n", + "\n", + "\n", + "- NOTE: If pyarrow or Ray is causing problems, this is what I did. 
Basically, you have to completely remove Ray and reinstall it:\n", + " - `source activate [your_env]`\n", + " - `rm -rf ray`\n", + " - Repeat the following until `which ray` returns nothing:\n", + " - `which ray`\n", + " - `rm [the output of which ray]`\n", + " - This removes the installed `ray` binary. I'm not sure this step is necessary, but after it, it's as if Ray never existed on your system.\n", + " - Go to the directory where you want to install Ray and run: `git clone https://github.com/eugenevinitsky/ray.git`\n", + " - `cd ray/python`\n", + " - `python setup.py develop`\n", + " \n", + " \n", + " \n", + "* `pip install` for rayutils doesn't work\n", + " * The install command above runs pip in \"editable\" (`-e`) mode.\n", + " * NOTE: Richard Liaw's README suggests running `pip install git+https://github.com/richardliaw/rayutils.git`. I suspect there's something wrong with the repo structure, because rayutils is nowhere to be found after pip reports a successful installation. My version of the command installs in \"editable\" mode, and the `#egg=rayutils` suffix at the end of the command specifies the package name.\n", + " TODO: Could someone confirm that the README command doesn't work?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}