-
Notifications
You must be signed in to change notification settings - Fork 44
feat: add GCP Infrastructure Manager Terraform modules [9.2] #3881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 9.2
Are you sure you want to change the base?
Conversation
### Summary of your changes Replaces deprecated GCP Deployment Manager with modern Infrastructure Manager (Terraform) for deploying Elastic Agent CSPM integration. Provides identical resources with improved tooling and user experience. #### New Directory: deploy/infrastructure-manager/gcp-elastic-agent/ Files Added: main.tf - Main infrastructure configuration (compute instance, network, service account, IAM bindings) variables.tf - Input variable definitions outputs.tf - Deployment outputs service_account.tf - Standalone service account deployment for agentless mode terraform.tfvars.example - Example configuration for main deployment service_account.tfvars.example - Example configuration for SA-only deployment README.md - Comprehensive deployment guide #### Resources Created Identical to Deployment Manager implementation: Compute instance (Ubuntu, n2-standard-4, 32GB disk) with Elastic Agent pre-installed Service account with roles/cloudasset.viewer and roles/browser VPC network with auto-created subnets IAM bindings (project or organization scope) Optional SSH firewall rule #### Compatibility The new deployment script `infrastructure-manager/deploy.sh` is compatible with kibana deployment command of the form: ```bash gcloud config set project elastic-security-test && \ FLEET_URL=https://a6f784d2fb4d48bea7724fbe41ef17d3.fleet.us-central1.gcp.qa.elastic.cloud:443 \ ENROLLMENT_TOKEN=<REDUCTED> \ STACK_VERSION=9.2.3 \ ./deploy.sh ``` ### Related Issues - Resolves: elastic#3132 (cherry picked from commit fdf76cc)
### Summary of your changes Adds a new method for deploying GCP service account credentials using GCP Infrastructure Manager (Terraform-based) as an alternative to the existing Deployment Manager approach in deploy/deployment-manager/. The key improvement is that service account keys are now stored securely in Secret Manager rather than being exposed in deployment outputs. The script creates a service account with cloudasset.viewer and browser roles, stores the JSON key in Secret Manager, and retrieves it locally to KEY_FILE.json for use in the Elastic Agent GCP integration. Supports both project-level and organization-level deployments via the ORG_ID environment variable. ### Screenshot/Data <!-- If this PR adds a new feature, please add an example screenshot or data (findings json for example). --> ### Related Issues <!-- - Related: https://github.com/elastic/security-team/issues/ - Fixes: https://github.com/elastic/security-team/issues/ --> ### Checklist - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added the necessary README/documentation (if appropriate) #### Introducing a new rule? - [ ] Generate rule metadata using [this script](https://github.com/elastic/cloudbeat/tree/main/security-policies/dev#generate-rules-metadata) - [ ] Add relevant unit tests - [ ] Generate relevant rule templates using [this script](https://github.com/elastic/cloudbeat/tree/main/security-policies/dev#generate-rule-templates), and open a PR in [elastic/packages/cloud_security_posture](https://github.com/elastic/integrations/tree/main/packages/cloud_security_posture) (cherry picked from commit e20e115)
…y script (elastic#3865) Refactors the error handling in the GCP Elastic Agent Infrastructure Manager deployment script to use a more idiomatic shell pattern. The change replaces the explicit exit code capture and conditional check with a direct if-not pattern for the gcloud command, making the script more readable and following shell scripting best practices. (cherry picked from commit 59c61ae)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds GCP Infrastructure Manager Terraform modules for deploying Elastic Agent on Google Cloud Platform. It provides two deployment approaches: a VM-based deployment with integrated service account authentication (gcp-elastic-agent), and a standalone service account credential generator (gcp-credentials-json). The implementation includes comprehensive automation scripts, modular Terraform configurations, and startup validation capabilities.
Changes:
- Added complete Terraform module for VM-based Elastic Agent deployment with service account, compute instance, and startup validation submodules
- Implemented credentials-json module for generating service account keys stored in Secret Manager
- Included automated deployment scripts with prerequisite setup, error handling, and user guidance
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| deploy/infrastructure-manager/gcp-elastic-agent/main.tf | Root module orchestrating service account, compute instance, and startup validation |
| deploy/infrastructure-manager/gcp-elastic-agent/variables.tf | Input variable definitions including fleet URL, tokens, and configuration options |
| deploy/infrastructure-manager/gcp-elastic-agent/outputs.tf | Module outputs exposing instance details and validation status |
| deploy/infrastructure-manager/gcp-elastic-agent/deploy.sh | Deployment automation script with GCP Infrastructure Manager integration |
| deploy/infrastructure-manager/gcp-elastic-agent/deploy_service_account.sh | Wrapper script delegating to credentials-json deployment |
| deploy/infrastructure-manager/gcp-elastic-agent/setup.sh | Prerequisites setup enabling APIs and configuring service accounts |
| deploy/infrastructure-manager/gcp-elastic-agent/README.md | Comprehensive documentation for deployment options and troubleshooting |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/main.tf | Compute instance with VPC network and Elastic Agent installation startup script |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/variables.tf | Compute instance module inputs for configuration |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/outputs.tf | Instance metadata outputs |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/main.tf | Service account creation with project/org-level IAM bindings |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/variables.tf | Service account module configuration inputs |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/outputs.tf | Service account email output |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/main.tf | Local-exec provisioner validating startup completion via guest attributes |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/variables.tf | Validation configuration inputs |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/outputs.tf | Validation status output |
| deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/validate_startup.sh | Bash script polling guest attributes to verify installation success |
| deploy/infrastructure-manager/gcp-credentials-json/main.tf | Service account key generation with Secret Manager storage |
| deploy/infrastructure-manager/gcp-credentials-json/variables.tf | Credentials module input variables with scope validation |
| deploy/infrastructure-manager/gcp-credentials-json/outputs.tf | Service account and secret outputs |
| deploy/infrastructure-manager/gcp-credentials-json/deploy.sh | Deployment script retrieving and saving credentials locally |
| deploy/infrastructure-manager/gcp-credentials-json/setup.sh | Prerequisites configuration for credentials deployment |
| deploy/infrastructure-manager/gcp-credentials-json/README.md | Documentation for credentials-based authentication setup |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| INSTANCE_NAME="$3" | ||
| TIMEOUT="$4" | ||
|
|
||
| MAX_ATTEMPTS=$((TIMEOUT / 10)) |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation script computes MAX_ATTEMPTS by dividing TIMEOUT by 10, but there's an edge case where if TIMEOUT is less than 10 seconds, MAX_ATTEMPTS would be 0, causing the while loop to never execute. Consider adding a minimum value check or ensuring at least 1 attempt is made.
| MAX_ATTEMPTS=$((TIMEOUT / 10)) | |
| MAX_ATTEMPTS=$((TIMEOUT / 10)) | |
| if [ "$MAX_ATTEMPTS" -lt 1 ]; then | |
| MAX_ATTEMPTS=1 | |
| fi |
| response=$(curl -s -H "Authorization: Bearer $TOKEN" \ | ||
| "https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \ | ||
| 2>/dev/null || echo '{}') | ||
| echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1 |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sed command to extract the JSON value from the API response is fragile and could fail if the JSON format changes or if there are multiple 'value' fields in the response. Consider using a proper JSON parser like 'jq' which is commonly available in Cloud Shell and would make the parsing more reliable and maintainable.
| echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1 | |
| if command -v jq >/dev/null 2>&1; then | |
| echo "$response" | jq -r '.queryValue.value // empty' | |
| else | |
| echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1 | |
| fi |
| ARTIFACT_URL="${var.elastic_artifact_server}/$ElasticAgentArtifact.tar.gz" | ||
| log "Downloading Elastic Agent from $ARTIFACT_URL" | ||
| if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The download command uses --connect-timeout 30 and --max-time 300, but if the download is slow (e.g., on a poor network connection), the max-time could be exceeded even though the download is progressing. This could cause false failures. Consider either increasing the timeout for larger artifacts or implementing a more sophisticated progress-based timeout that allows for slow but steady downloads.
| if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then | |
| # Use a connection timeout but no overall max-time to avoid failing on slow but progressing downloads | |
| if ! curl -f -L -O --connect-timeout 30 "$ARTIFACT_URL"; then |
| # Store the service account key in Secret Manager | ||
| resource "google_secret_manager_secret_version" "sa_key" { | ||
| secret = google_secret_manager_secret.sa_key.id | ||
| secret_data = google_service_account_key.elastic_agent_key.private_key | ||
| } |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The script creates a service account key and stores it in Secret Manager, but there's no validation that the key was successfully created before attempting to store it. If the key creation fails silently, the Secret Manager resource will be created with invalid data. Consider adding explicit validation or error handling between key creation and storage steps.
| INPUT_VALUES="${INPUT_VALUES},fleet_url=${FLEET_URL}" | ||
| INPUT_VALUES="${INPUT_VALUES},enrollment_token=${ENROLLMENT_TOKEN}" | ||
| INPUT_VALUES="${INPUT_VALUES},elastic_agent_version=${STACK_VERSION}" |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The script depends on environment variables like FLEET_URL, ENROLLMENT_TOKEN, and STACK_VERSION being set (lines 40-42), but there's no validation to ensure these required variables are non-empty before attempting deployment. If any of these are unset or empty, the deployment will fail with unclear error messages from gcloud. Consider adding validation checks at the beginning of the script to provide clear error messages for missing required variables.
| # Token from environment variable (sensitive) | ||
| TOKEN="$GCP_ACCESS_TOKEN" | ||
|
|
||
| # Arguments passed from Terraform | ||
| PROJECT_ID="$1" |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ACCESS_TOKEN is passed as an environment variable but used directly in the curl command with 'set +x' disabled. However, the token could still appear in error messages or in the process table. Consider using a more secure method such as reading from a file descriptor or using a credentials helper. Additionally, the comment says "(sensitive)" but the token will be visible in the terraform state and could be logged if the script fails.
| local response | ||
| response=$(curl -s -H "Authorization: Bearer $TOKEN" \ | ||
| "https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \ | ||
| 2>/dev/null || echo '{}') | ||
| echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1 |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling doesn't distinguish between different failure modes when calling 'curl'. A network error, authentication failure, or API error would all result in the same '{}' response. Consider checking the HTTP status code or curl exit code to provide more specific error messages and potentially retry on transient failures.
| local response | |
| response=$(curl -s -H "Authorization: Bearer $TOKEN" \ | |
| "https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \ | |
| 2>/dev/null || echo '{}') | |
| echo "$response" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1 | |
| local response http_code body | |
| # Call the GCE guest attributes API and capture both body and HTTP status code. | |
| # Using -w '%{http_code}' appends the status code to the end of the output. | |
| if ! response=$(curl -s -H "Authorization: Bearer $TOKEN" \ | |
| -w '%{http_code}' \ | |
| "https://compute.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances/${INSTANCE_NAME}/getGuestAttributes?queryPath=elastic-agent/$key" \ | |
| 2>/dev/null); then | |
| echo "Warning: Failed to call GCE guest attributes API for key '$key' (curl error)." >&2 | |
| # Return an empty string so callers treat this as an unknown value. | |
| echo "" | |
| return | |
| fi | |
| # Separate HTTP status code from the response body. | |
| http_code="${response: -3}" | |
| body="${response::-3}" | |
| case "$http_code" in | |
| 200) | |
| # Successful response; extract the guest attribute value. | |
| echo "$body" | sed -n 's/.*"value":[[:space:]]*"\([^"]*\)".*/\1/p' | head -1 | |
| ;; | |
| 401|403) | |
| echo "Warning: Authentication/authorization failure when fetching guest attribute '$key' (HTTP $http_code)." >&2 | |
| echo "" | |
| ;; | |
| 5??) | |
| echo "Warning: Server error from GCE guest attributes API when fetching '$key' (HTTP $http_code)." >&2 | |
| echo "" | |
| ;; | |
| *) | |
| echo "Warning: Unexpected HTTP status $http_code from GCE guest attributes API when fetching '$key'." >&2 | |
| echo "" | |
| ;; | |
| esac |
| # Remove trailing slash if present | ||
| ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%/}" |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ELASTIC_ARTIFACT_SERVER is stripped of trailing slashes if present, but if the user provides a URL with multiple trailing slashes (e.g., "https://example.com///"), only one will be removed. Consider using a more robust approach like parameter expansion pattern matching (${ELASTIC_ARTIFACT_SERVER%%/}) to remove all trailing slashes, or using bash string manipulation to handle this edge case properly.
| # Remove trailing slash if present | |
| ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%/}" | |
| # Remove all trailing slashes if present | |
| ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%%/}" |
| cd "$ElasticAgentArtifact" | ||
| # Install Elastic Agent | ||
| log "Installing Elastic Agent with command: ${local.install_command}" |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The startup script enables shell xtrace with set -x and then executes the Elastic Agent installation command including the sensitive --enrollment-token=${var.enrollment_token} argument. With xtrace enabled, the full command line (including the enrollment token) will be written to the instance’s console output and system logs, which are collected by GCE/Cloud Logging and accessible to anyone with log viewing permissions, effectively leaking the enrollment token. Remove or disable set -x around commands that include var.enrollment_token, or otherwise ensure that the enrollment token is never printed to stdout/stderr or system logs during startup.
| log "Installing Elastic Agent with command: ${local.install_command}" | |
| log "Installing Elastic Agent with command: ${local.install_command}" | |
| # Disable xtrace to avoid leaking sensitive enrollment token in logs | |
| set +x |
| # Download Elastic Agent | ||
| ElasticAgentArtifact=elastic-agent-${var.elastic_agent_version}-linux-x86_64 | ||
| ARTIFACT_URL="${var.elastic_artifact_server}/$ElasticAgentArtifact.tar.gz" | ||
| log "Downloading Elastic Agent from $ARTIFACT_URL" | ||
| if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then | ||
| report_failure "Failed to download Elastic Agent from $ARTIFACT_URL" | ||
| fi | ||
| log "Download successful" | ||
| # Verify download | ||
| if [ ! -f "$ElasticAgentArtifact.tar.gz" ]; then | ||
| report_failure "Downloaded file not found: $ElasticAgentArtifact.tar.gz" | ||
| fi | ||
| # Extract archive | ||
| log "Extracting $ElasticAgentArtifact.tar.gz" | ||
| if ! tar xzvf "$ElasticAgentArtifact.tar.gz"; then | ||
| report_failure "Failed to extract $ElasticAgentArtifact.tar.gz" | ||
| fi | ||
| # Verify extraction | ||
| if [ ! -d "$ElasticAgentArtifact" ]; then | ||
| report_failure "Extracted directory not found: $ElasticAgentArtifact" | ||
| fi | ||
| cd "$ElasticAgentArtifact" | ||
| # Install Elastic Agent | ||
| log "Installing Elastic Agent with command: ${local.install_command}" | ||
| if ! ${local.install_command} --url=${var.fleet_url} --enrollment-token=${var.enrollment_token}; then |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The startup script downloads and executes the Elastic Agent binary from var.elastic_artifact_server using curl and tar without any integrity verification (no checksum or signature check) before running ./elastic-agent install with root privileges. If the artifact server, DNS, or network path is compromised—or if elastic_artifact_server is pointed at a malicious host—an attacker can supply a tampered agent archive and gain code execution on the VM via the startup script. Add a cryptographic integrity check (e.g., vendor-published checksum or signature verification) for the downloaded tarball before extracting and executing it, and consider restricting elastic_artifact_server to trusted HTTPS endpoints only.
Adds GCP Infrastructure Manager Terraform modules for deploying Elastic Agent on GCP, providing two deployment options: elastic-agent with a service account for VM-based deployments, and credentials-json for credential-based authentication. The implementation includes modular Terraform configurations with compute instance, service account, and startup validation components, along with comprehensive deploy scripts that handle environment setup, terraform initialization, and deployment automation with proper error handling.