-
Notifications
You must be signed in to change notification settings - Fork 9
Add design document for the operator architecture #158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
alicefr
wants to merge
1
commit into
trusted-execution-clusters:main
Choose a base branch
from
alicefr:design-architecture
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+118
−0
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| # Trusted Execution Cluster Operator Architecture | ||
|
|
||
| ## Overview | ||
|
|
||
| This document describes the architecture of the Trusted Execution Cluster Operator, a Kubernetes operator written in Rust that manages the lifecycle of confidential computing nodes in a cluster. The operator coordinates machine registration, attestation key management, secret provisioning, and Trustee integration to enable trusted execution environments with TPM-based attestation. | ||
|
|
||
| The operator follows the Kubernetes operator pattern and is built using the [kube-rs](https://github.com/kube-rs/kube) library for Kubernetes API interactions. | ||
|
|
||
| ## Key Components | ||
|
|
||
| The operator consists of several interconnected components: | ||
|
|
||
| 1. **Registration Server**: HTTP service that handles initial machine registration and Ignition configuration delivery. Deployed in the *registration server*. | ||
| 2. **AttestationKey Controller**: Manages attestation key registration and approval. Deployed in the *register-ak* pod. | ||
| 3. **Trustee Integration**: Updates Trustee deployment with secrets and attestation keys for node verification. Part of the *operator* pod. | ||
| 4. **Machine Controller**: Reconciles Machine custom resources representing individual nodes. Part of the *operator* pod | ||
| 5. **Secret Management**: Generates and manages LUKS. encryption keys and attestation key secrets. Part of the *operator* pod. | ||
| 6. **Attestation Server and KBS**: [Trustee](https://github.com/confidential-containers/trustee) deployment handle the attestation request, the reference values and secrets. | ||
| 7. **Reference Values calculation**: calculate the reference values provided by the approved images. | ||
|
|
||
| ## Architecture Components | ||
|
|
||
| ### Registration Server Deployment | ||
|
|
||
| The operator deploys a registration server in the cluster that serves as the entrypoint for new machines joining the trusted cluster. | ||
|
|
||
|
|
||
| #### Registration Server Responsibilities | ||
|
|
||
| 1. **Ignition Configuration Delivery**: Serves Ignition snippets containing LUKS/Clevis pin configuration when machines perform merge requests during first boot | ||
| 2. **UUID Generation**: Generates unique identifiers for each registering machine | ||
| 3. **Machine Object Creation**: Creates Machine custom resources in Kubernetes for each registered node | ||
|
|
||
| The registration flow is initiated when a node boots with an initial Ignition configuration containing a merge directive pointing to the registration service endpoint (e.g., `http://register-server:8000/ignitioni-clevis-pin-trustee`). | ||
|
|
||
| A machine can be also manually registered by the admin, but then the Ignition produced by the operator, needs to be configured manually in the upcoming machine. | ||
|
|
||
| ### Machine Custom Resource | ||
|
|
||
| The Machine CRD represents an individual node/machine in the cluster and serves as the central coordination point for the node's lifecycle. | ||
|
|
||
| #### Machine Spec | ||
|
|
||
| The Machine object includes: | ||
|
|
||
| - **ID**: Unique identifier (UUID) for the machine | ||
|
|
||
| The machine is unique and an internal representation of an existing node for the operator. | ||
|
|
||
| #### Machine Lifecycle | ||
|
|
||
| 1. **Creation**: Created by the registration server when a node performs its first Ignition merge request | ||
| 2. **Reconciliation**: The operator watches for new Machine objects and triggers provisioning workflows | ||
| 3. **Association**: Linked with AttestationKey objects for TOFU (Trust On First Use) validation | ||
|
|
||
| ### Secret Generation and Trustee Update Flow | ||
|
|
||
| When the operator detects a new Machine object, it automatically provisions secrets and updates the Trustee deployment to enable attestation-based secret retrieval. | ||
|
|
||
| #### Secret Provisioning Process | ||
|
|
||
| - Creates Kubernetes owner reference linking the secret to the Machine object | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could mention the endpoint here too like you do for AK reg further below. I think it makes it clearer what happens technically. |
||
| - Ensures proper garbage collection when Machine is deleted | ||
| - Generates a LUKS encryption key for the node | ||
| - Creates a Kubernetes Secret containing the LUKS key | ||
| - Secret is namespaced and linked to the specific Machine object | ||
| - Updates the Trustee deployment to include the new secret | ||
| - Adds the secret as a volume mount in the Trustee pod spec | ||
| - **Triggers Trustee pod restart** to reload the updated secret and volume configuration | ||
| - When a machine is deleted, the secret is removed from the Trustee deployement which triggers a further pod restart. | ||
|
|
||
|  | ||
|
|
||
| ### Attestation Key Registration | ||
|
|
||
| The operator implements a comprehensive attestation key (AK) registration system based on TOFU model. For detailed information about the attestation key provisioning process, see the [Attestation Key Provisioning Design Document](attestation_key_provisioning.md). | ||
|
|
||
| #### Registration Flow Integration | ||
|
|
||
| The AK registration is coordinated with Machine registration: | ||
|
|
||
| **During First Boot (handled by Ignition)** | ||
| 1. Ignition checks if `/var/tpm/ak.pub` exists | ||
| 2. If not present, generates a new AK in the TPM | ||
| 3. Contacts the operator's AK registration endpoint (e.g., `https://attestation-key-register:8001/register-ak`) | ||
| 4. Submits the AK public key in PEM format along with platform information | ||
|
|
||
| **Operator Processing** | ||
| 1. **AK Registration Service**: Receives and stores the AK public key | ||
| 2. **Machine Matching**: Associates the AK with the corresponding Machine object based on registration correlation. If no Machine machine exist, the AK isn't approved | ||
| 3. **AK Approval**: When a matching Machine exists, the AK is automatically approved | ||
| 4. **Secret Creation**: Creates a Kubernetes Secret containing the AK public key | ||
| 5. **Trustee Integration**: Updates Trustee deployment with the registered AK via `trustee::update_attestation_keys()` | ||
|
|
||
| **Trustee Pod Restart** | ||
| - The Trustee deployment is updated with the new AK secret | ||
| - Triggers a pod restart to load the new attestation key | ||
| - After restart, Trustee can verify attestation reports signed by the registered AK | ||
| - At machine deletion, the AK is also garbage collected and removed from the Trustee deployment. | ||
|
|
||
|  | ||
|
|
||
| ### Calculation of the reference values | ||
|
|
||
| The calculation of the reference values is covered in details in [reference-values](docs/design/reference-values.md). | ||
|
|
||
| The result of the operation is stored in the configmaps *image-pcrs*, and then mounted in the Trustee deployment as reference value repository. This configmap is unique and multiple approved images and reference values are included here. | ||
|
|
||
|
|
||
| ### Trustee deployment | ||
|
|
||
| The Trustee deployment is the core of the attestation and secret release. The deployment of Trustee is a single pod, it uses the builtin [attestation server configuration](https://github.com/confidential-containers/trustee/blob/main/kbs/docs/config.md#attestation-configuration) option. | ||
|
|
||
| Configured by the operator, it use the *confimaps*: | ||
| * attestation-policy, for controlling which values need to be evaluated for the attestation. | ||
| * trustee-datam which contains the reference values, the resource policy and the kbs configuration. | ||
|
|
||
| The secrets for the LUKS key and the trusted Attestation Key are added when the machine or the AK are registered. This causes the restart of the trustee pod since the deployment is patched with the new set of secrets. | ||
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.