This repository contains the infrastructure and bootstrap code for the vLLM continuous integration pipeline using Buildkite.
Current CI Infrastructure Setup:
- AWS Buildkite Elastic CI Stack: Infrastructure code in terraform/aws
- TPU v5/v6e Nodes on GCP: Infrastructure code in terraform/gcp_old
- GKE Cluster on GCP (currently not in use): Infrastructure code in terraform/gcp
Buildkite bootstrap scripts & pipeline template files are located in the buildkite/ directory.
vLLM uses Buildkite for its CI workflow. Whenever a commit is pushed to the vLLM GitHub repository, a Buildkite webhook triggers an event that initiates a new build in the Buildkite pipeline with relevant details such as the GitHub branch and commit.
Build Process Overview:
- Bootstrap Step:
  - Executed via buildkite/bootstrap.sh.
  - Utilizes a CI Jinja2 template (buildkite/test-template-ci.j2) along with the list of tests from vLLM to render a Buildkite YAML configuration that defines all build/test steps and their configurations.
  - Uploads the rendered YAML to Buildkite to initiate the build.
  - Note: We are transitioning to a custom Buildkite pipeline generator to replace the Jinja2 template rendering soon.
- Job Queueing and Execution:
  - Each Buildkite step is associated with an agent queue.
  - After the upload, steps are pushed into their queue, where they wait to be picked up by a Buildkite agent.
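The bootstrap flow above can be sketched roughly as follows. This is a minimal illustration, not the contents of buildkite/bootstrap.sh or the Jinja2 template: the shape of the test list, the `render_pipeline` helper, and the queue names are all assumptions made for the example. It emits JSON rather than YAML to stay within the Python standard library; `buildkite-agent pipeline upload` accepts either format.

```python
import json

# Hypothetical test list; the real list lives in the vLLM repository
# and its exact schema is not shown in this README.
TESTS = [
    {"label": "Basic Correctness", "command": "pytest -v -s basic_correctness", "queue": "gpu_1_queue"},
    {"label": "Docs Build", "command": "make html"},
]


def render_pipeline(tests, default_queue="cpu_queue"):
    """Turn a list of test entries into a Buildkite pipeline definition."""
    steps = []
    for test in tests:
        steps.append({
            "label": test["label"],
            "command": test["command"],
            # Each step names an agent queue; agents polling that queue pick it up.
            "agents": {"queue": test.get("queue", default_queue)},
        })
    return {"steps": steps}


if __name__ == "__main__":
    # The printed document could be piped into `buildkite-agent pipeline upload`.
    print(json.dumps(render_pipeline(TESTS), indent=2))
```

Keeping the rendering in a plain function makes it easy to check the generated steps before anything is uploaded.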
We use the Buildkite Elastic CI Stack to set up our autoscaling Buildkite agent cluster on AWS.
Components of the stack for each Agent Queue:
- AWS CloudFormation Stack:
  - Contains an EC2 Auto Scaling Group and an AWS Lambda function.
- EC2 Auto Scaling Group:
  - Automatically scales the number of EC2 instances based on the workload from the Buildkite queue.
  - Each EC2 instance comes with a Buildkite agent that executes jobs.
- AWS Lambda Function:
  - Periodically polls Buildkite to assess capacity needs for the queue and adjusts the size of the Auto Scaling Group accordingly.
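The scaling decision made by the Lambda function can be pictured with a simplified sketch like the one below. It is not the actual Elastic CI Stack scaler logic: the `desired_instances` function, its parameters, and the rounding rule are assumptions chosen only to show how a job count might be mapped to an Auto Scaling Group size.

```python
import math


def desired_instances(pending_jobs, running_jobs, agents_per_instance=1,
                      min_size=0, max_size=10):
    """Return a target Auto Scaling Group size for one agent queue."""
    if agents_per_instance < 1:
        raise ValueError("agents_per_instance must be at least 1")
    # Count every job that needs an agent right now.
    total_jobs = pending_jobs + running_jobs
    # Round up so a partially filled instance is still provisioned.
    needed = math.ceil(total_jobs / agents_per_instance)
    # Clamp to the bounds configured for the group.
    return max(min_size, min(max_size, needed))
```

With these assumptions, 5 pending and 2 running jobs on instances that host 2 agents each give a target of 4 instances, and an idle queue scales down to `min_size`.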
- Create a feature branch on this repo, say named my-feature-branch. If you can't create a feature branch, ping @khluu to add you into the repo.
- Once the branch is created, you can start making changes and commit to the branch.
- After the changes are pushed to the branch, wait a few minutes, then create a new build on Buildkite with this environment variable VLLM_CI_BRANCH=my-feature-branch to test your changes against the vLLM codebase.
Please note that when creating a new build on Buildkite:
- Please do it on your own feature branch/fork branch on vLLM, preferably a branch that is up to date with main.
- If it's a fork branch, HEAD cannot be used as the commit when creating a build. You have to put in the hash of the latest commit on your branch. Also, format the branch name to include your fork prefix as <fork/username>:<branch name on fork>.
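The rules above can be collected into a small helper, sketched here under some assumptions. The `build_request` function and the `fork_prefix` argument are hypothetical, and the README does not say whether builds are created through the web UI or the API. The keys `branch`, `commit`, and `env` mirror the fields of Buildkite's create-build REST endpoint, but the same values can simply be typed into the "New Build" dialog.

```python
def build_request(branch, commit="HEAD", ci_branch=None, fork_prefix=None):
    """Assemble the fields for a new Buildkite build as a plain dict."""
    if fork_prefix is not None:
        # Fork builds need an explicit commit hash; HEAD is not accepted.
        if commit.upper() == "HEAD":
            raise ValueError("fork builds need a commit hash, not HEAD")
        # Prefix the branch name with the fork, as <fork prefix>:<branch>.
        branch = f"{fork_prefix}:{branch}"
    env = {}
    if ci_branch is not None:
        # Points the build at a feature branch of this CI repository.
        env["VLLM_CI_BRANCH"] = ci_branch
    return {"branch": branch, "commit": commit, "env": env}
```

For example, `build_request("my-fix", ci_branch="my-feature-branch")` describes a build of a vLLM branch named my-fix that picks up CI changes from my-feature-branch.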
The machines communicate with the Buildkite server via an agent installed on each machine. There are multiple ways to install agents, depending on how the machines are set up:
- Buildkite Elastic CI stack if you want the compute to be autoscaling EC2 instances.
- Buildkite K8s agent stack if you want machines to be managed/orchestrated in a Kubernetes cluster.
- Buildkite agent if you already have existing standalone machines.
For all of these approaches, you need the following info to set up the agents (please contact @khluu in the #sig-ci channel on vllm-dev.slack.com to get it):
- Buildkite agent token
- Buildkite queue name
- (optional) Buildkite cluster UUID
If you go with option 1 or 2, this info needs to be provided when you set up the stack. Option 3 does not require it when installing the agent. After installation, you need to manually:
- Add this info to the agent config (usually located in /etc/buildkite-agent/buildkite-agent.cfg)
- Restart the agent (usually systemctl stop buildkite-agent then systemctl start buildkite-agent works)
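As a rough illustration of the manual step, the sketch below produces the two lines that typically carry this info in buildkite-agent.cfg. The `agent_config_lines` helper is hypothetical and the values are placeholders. `token` and `tags` are standard agent settings, with the queue expressed as a `queue=` tag; where the optional cluster UUID goes is not covered here.

```python
def agent_config_lines(token, queue):
    """Return the lines to set in buildkite-agent.cfg for one queue."""
    if not token or not queue:
        raise ValueError("both token and queue are required")
    return [
        f'token="{token}"',
        # The agent joins a queue through its tags.
        f'tags="queue={queue}"',
    ]


if __name__ == "__main__":
    # Placeholder values; use the token and queue name you were given.
    print("\n".join(agent_config_lines("YOUR_AGENT_TOKEN", "your-queue-name")))
```

After merging these lines into the config file, restart the agent so the new settings take effect.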