[SPEC] Automated testnet deployment #664

@tbraun96

Overview

As we know, passing pipelines are never enough to determine whether we're actually stable enough for a merge into master. A passing pipeline is a good indicator that, in well-behaved environments, the code will run for at least a limited number of sessions, but we must use the live testnet to prove that we can run hundreds to thousands of sessions without failure. If we want long-term stability to keep improving, the practice of running on the testnet before merging code into master should be codified into our development process so that we don't undo our hard work.

Currently, we have no enforced policies in GitHub Actions that require a successful testnet run before merging into master. Testnet deployment is manual: @1xstj deploys the latest commit from the PR branch onto the testnet by running commands over SSH in a remote terminal.

Some PRs need a testnet run and others do not. To handle both cases, we can use environments and automatically selected deployment targets. One target will be the testnet deployment, and the other will be the null deployment. For the testnet deployment, the job passes when, e.g., 500 sessions pass; we can use websockets or polling to determine when to terminate. For the null deployment, the job passes instantly. So long as one of these passes (along with the normal PR checks), we can merge into master with confidence in the stability of the PR.

Task List

  • Create GitHub Actions-only SSH credentials for AWS testnet access, then store those credentials in GitHub secrets
  • Define criteria for when testnet deployment is required (e.g., based on files changed, a hash of Cargo.toml changes, etc.)
  • Create the testnet deployment environment with accompanying logic for connecting over SSH and then executing the deployment commands. Require manual approval before this environment begins.
  • Create the GitHub Actions file that detects which environment is required, then either deploys to the testnet or finishes instantly (i.e., the null deployment). We can call this the testnet-deployer.
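As a rough illustration of the "files changed" criterion from the task list, here is a minimal Python sketch. The pattern list and the function names (`CORE_PATTERNS`, `needs_testnet`, `select_target`) are hypothetical; the real criteria are still to be defined, and the list of changed files is assumed to come from the PR diff.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for "core logic"; placeholders until the real
# criteria are agreed on. Note that "*" in fnmatch also matches "/".
CORE_PATTERNS = ["*Cargo.toml", "*.rs"]


def needs_testnet(changed_files, patterns=CORE_PATTERNS):
    """Return True if any changed file matches a core-logic pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


def select_target(changed_files):
    """Pick the deployment target: 'testnet' or 'null'."""
    return "testnet" if needs_testnet(changed_files) else "null"
```

For example, `select_target(["README.md"])` yields `"null"`, while a PR touching any `Cargo.toml` or Rust source file yields `"testnet"` under these assumed patterns.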

The ideal workflow:

Case A: User makes a PR that does not affect core logic

The testnet-deployer runs, detects that this PR does not need a testnet, and instantly returns success.

Case B: User makes a PR that affects core logic

The testnet-deployer runs, detects that this PR needs a testnet, and submits a testnet deployment request. A manual approval must then be given in the GitHub interface. Next, the testnet-deployer continues by executing the relevant SSH commands, thus starting a testnet. Then, the testnet-deployer uses websockets (or polls) until it either notices stalling or reaches the 500-session target. Finally, the testnet-deployer returns either success or failure depending on that result.
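The polling variant of the wait step in Case B could look roughly like the sketch below. It assumes a hypothetical callable `get_session_count` that returns the number of sessions the testnet has completed so far (how that number is obtained is not specified here), and it treats "stalling" as no increase in the count for `stall_timeout` seconds. The default timeout and poll interval are placeholder values, not agreed numbers.

```python
import time


def wait_for_sessions(
    get_session_count,      # hypothetical: returns sessions completed so far
    target=500,             # session target from the spec
    stall_timeout=600.0,    # assumed: seconds without progress before failing
    poll_interval=10.0,     # assumed: seconds between polls
    clock=time.monotonic,   # injectable for testing
    sleep=time.sleep,       # injectable for testing
):
    """Poll until the target is reached (True) or progress stalls (False)."""
    last_count = get_session_count()
    last_progress = clock()
    while last_count < target:
        if clock() - last_progress > stall_timeout:
            return False  # no new sessions for too long: treat as a stall
        sleep(poll_interval)
        count = get_session_count()
        if count > last_count:
            last_count = count
            last_progress = clock()
    return True
```

The deployer job would then exit with a zero or non-zero status depending on the returned value, which is what makes the required check pass or fail.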

Further discussion

We may not want auto-detection when choosing a deployment target. If we prefer manual selection of the deployment target, the testnet-deployer can instead send simultaneous requests to both the testnet deployment target and the null deployment target and wait for approval. If we decide we need a testnet, we deny the null deployment and accept the testnet deployment; if we decide we do not need a testnet, we deny the testnet request and accept the null deployment. In either case, the testnet-deployer succeeds once either request returns with a success. The accepted testnet deployment does not succeed until the 500 sessions are reached, whereas the null deployment succeeds immediately.

Metadata

Assignees: No one assigned

Labels: optimization ⚙️ (tasks that are refactor, optimize, or are considered chores), p3 🔵 (issues should be resolved eventually), task ✔️

Type: No type

Projects: Status: Not Started 🕧

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests