feat: add GCP Infrastructure Manager Terraform modules [8.19] #3882

amirbenun · 2026-01-22T08:03:45Z

Adds GCP Infrastructure Manager Terraform modules for deploying Elastic Agent on GCP, providing two deployment options: elastic-agent with a service account for VM-based deployments, and credentials-json for credential-based authentication. The implementation includes modular Terraform configurations with compute instance, service account, and startup validation components, along with comprehensive deploy scripts that handle environment setup, terraform initialization, and deployment automation with proper error handling.

### Summary of your changes Replaces deprecated GCP Deployment Manager with modern Infrastructure Manager (Terraform) for deploying Elastic Agent CSPM integration. Provides identical resources with improved tooling and user experience. #### New Directory: deploy/infrastructure-manager/gcp-elastic-agent/ Files Added: main.tf - Main infrastructure configuration (compute instance, network, service account, IAM bindings) variables.tf - Input variable definitions outputs.tf - Deployment outputs service_account.tf - Standalone service account deployment for agentless mode terraform.tfvars.example - Example configuration for main deployment service_account.tfvars.example - Example configuration for SA-only deployment README.md - Comprehensive deployment guide #### Resources Created Identical to Deployment Manager implementation: Compute instance (Ubuntu, n2-standard-4, 32GB disk) with Elastic Agent pre-installed Service account with roles/cloudasset.viewer and roles/browser VPC network with auto-created subnets IAM bindings (project or organization scope) Optional SSH firewall rule #### Compatibility The new deployment script `infrastructure-manager/deploy.sh` is compatible with kibana deployment command of the form: ```bash gcloud config set project elastic-security-test && \ FLEET_URL=https://a6f784d2fb4d48bea7724fbe41ef17d3.fleet.us-central1.gcp.qa.elastic.cloud:443 \ ENROLLMENT_TOKEN=<REDUCTED> \ STACK_VERSION=9.2.3 \ ./deploy.sh ``` ### Related Issues - Resolves: elastic#3132 (cherry picked from commit fdf76cc)

### Summary of your changes Adds a new method for deploying GCP service account credentials using GCP Infrastructure Manager (Terraform-based) as an alternative to the existing Deployment Manager approach in deploy/deployment-manager/. The key improvement is that service account keys are now stored securely in Secret Manager rather than being exposed in deployment outputs. The script creates a service account with cloudasset.viewer and browser roles, stores the JSON key in Secret Manager, and retrieves it locally to KEY_FILE.json for use in the Elastic Agent GCP integration. Supports both project-level and organization-level deployments via the ORG_ID environment variable. ### Screenshot/Data  ### Related Issues  ### Checklist - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have added the necessary README/documentation (if appropriate) #### Introducing a new rule? - [ ] Generate rule metadata using [this script](https://github.com/elastic/cloudbeat/tree/main/security-policies/dev#generate-rules-metadata) - [ ] Add relevant unit tests - [ ] Generate relevant rule templates using [this script](https://github.com/elastic/cloudbeat/tree/main/security-policies/dev#generate-rule-templates), and open a PR in [elastic/packages/cloud_security_posture](https://github.com/elastic/integrations/tree/main/packages/cloud_security_posture) (cherry picked from commit e20e115)

…y script (elastic#3865) Refactors the error handling in the GCP Elastic Agent Infrastructure Manager deployment script to use a more idiomatic shell pattern. The change replaces the explicit exit code capture and conditional check with a direct if-not pattern for the gcloud command, making the script more readable and following shell scripting best practices. (cherry picked from commit 59c61ae)

Copilot

Pull request overview

This PR adds GCP Infrastructure Manager Terraform modules for deploying Elastic Agent on GCP with two deployment options: gcp-elastic-agent for VM-based deployments with service account authentication, and gcp-credentials-json for credential-based authentication. The implementation provides comprehensive automation for setting up Elastic Agent in GCP environments with proper IAM permissions and validation mechanisms.

Changes:

Added modular Terraform configurations for deploying Elastic Agent on GCP compute instances with automated installation and validation
Implemented credentials-based deployment option that creates service accounts with JSON keys stored in Secret Manager
Included comprehensive deployment scripts with prerequisite setup, error handling, and detailed troubleshooting guidance

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 16 comments.

Show a summary per file

File	Description
deploy/infrastructure-manager/gcp-elastic-agent/main.tf	Root Terraform configuration orchestrating service account, compute instance, and startup validation modules
deploy/infrastructure-manager/gcp-elastic-agent/variables.tf	Variable definitions for deployment configuration including project, fleet settings, and validation options
deploy/infrastructure-manager/gcp-elastic-agent/outputs.tf	Output definitions exposing instance details and validation status
deploy/infrastructure-manager/gcp-elastic-agent/setup.sh	Prerequisites setup script for enabling APIs and configuring service account permissions
deploy/infrastructure-manager/gcp-elastic-agent/deploy.sh	Main deployment script handling Infrastructure Manager deployment with comprehensive error handling
deploy/infrastructure-manager/gcp-elastic-agent/deploy_service_account.sh	Convenience wrapper redirecting to credentials-json deployment
deploy/infrastructure-manager/gcp-elastic-agent/README.md	Comprehensive documentation for VM-based deployment with troubleshooting guides
deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/main.tf	Compute instance module with VPC network, startup script, and Elastic Agent installation logic
deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/variables.tf	Variable definitions for compute instance configuration
deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/outputs.tf	Compute instance outputs including instance details and network information
deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/main.tf	Service account creation and IAM role bindings for project or organization scope
deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/variables.tf	Service account module variable definitions
deploy/infrastructure-manager/gcp-elastic-agent/modules/service_account/outputs.tf	Service account email output
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/main.tf	Startup validation module using terraform_data and local-exec provisioner
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/variables.tf	Validation module variable definitions
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/outputs.tf	Validation status output
deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/validate_startup.sh	Bash script polling guest attributes to validate Elastic Agent installation
deploy/infrastructure-manager/gcp-credentials-json/main.tf	Terraform configuration for creating service account with JSON key in Secret Manager
deploy/infrastructure-manager/gcp-credentials-json/variables.tf	Variable definitions with scope validation for credentials deployment
deploy/infrastructure-manager/gcp-credentials-json/outputs.tf	Outputs for service account email and secret name
deploy/infrastructure-manager/gcp-credentials-json/setup.sh	Prerequisites setup for credentials-based deployment
deploy/infrastructure-manager/gcp-credentials-json/deploy.sh	Deployment script that retrieves and saves service account key locally
deploy/infrastructure-manager/gcp-credentials-json/README.md	Documentation for credential-based deployment approach

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-22T08:12:10Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/main.tf

+  # Determine install command based on version
+  install_command = startswith(var.elastic_agent_version, "9.") ? "sudo ./elastic-agent install --non-interactive --install-servers" : "sudo ./elastic-agent install --non-interactive"


The condition startswith(var.elastic_agent_version, "9.") checks for version 9.x to determine which install command to use. However, this approach is fragile - if version 10.x or higher is released with different requirements, this will incorrectly use the version 9 command. Consider using a more robust version comparison or documenting the assumption that versions 9+ use this command format.

Suggested change

# Determine install command based on version

install_command = startswith(var.elastic_agent_version, "9.") ? "sudo ./elastic-agent install --non-interactive --install-servers" : "sudo ./elastic-agent install --non-interactive"

# Extract major version number from elastic_agent_version (e.g., "9.1.0" -> 9)

major_version = tonumber(regex("^[0-9]+", var.elastic_agent_version))

# Determine install command based on major version.

# Assumption: Elastic Agent versions 9 and above use the "--install-servers" format.

install_command = major_version >= 9 ? "sudo ./elastic-agent install --non-interactive --install-servers" : "sudo ./elastic-agent install --non-interactive"

Copilot · 2026-01-22T08:12:10Z

deploy/infrastructure-manager/gcp-elastic-agent/README.md

+| `scope` | No | `projects` | `projects` or `organizations` |
+| `parent_id` | Yes | - | Project ID or Organization ID |
+| `startup_validation_enabled` | No | `true` | Enable validation of startup script completion |
+| `startup_timeout_seconds` | No | `600` | Maximum time to wait for startup (seconds) |


The default value for startup_timeout_seconds is documented as 600 seconds (10 minutes) in the README, but in variables.tf line 60, the actual default is 300 seconds (5 minutes). This inconsistency could confuse users about the actual timeout behavior.

Suggested change

| `startup_timeout_seconds` | No | `600` | Maximum time to wait for startup (seconds) |

| `startup_timeout_seconds` | No | `300` | Maximum time to wait for startup (seconds) |

Copilot · 2026-01-22T08:12:10Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/outputs.tf

@@ -0,0 +1,4 @@
+output "validated" {
+  description = "Whether validation was performed and succeeded"


The validation logic in the output is incorrect. It returns var.enabled which indicates whether validation was enabled, not whether it actually succeeded. If validation is enabled but fails, this will still return true, which is misleading. The output should either track the actual validation result or be renamed to clarify it only indicates if validation was attempted.

Suggested change

description = "Whether validation was performed and succeeded"

description = "Whether validation was enabled (validation was attempted)"

Copilot · 2026-01-22T08:12:11Z

deploy/infrastructure-manager/gcp-elastic-agent/setup.sh

+for role in "${REQUIRED_ROLES[@]}"; do
+    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
+        --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
+        --role="${role}" --condition=None --quiet >/dev/null


The error handling redirects output to /dev/null which suppresses potentially useful error messages. If a role binding fails, users won't see the specific error from gcloud. Consider removing the redirect or using 2>&1 to capture errors in a log file for troubleshooting.

Suggested change

--role="${role}" --condition=None --quiet >/dev/null

--role="${role}" --condition=None --quiet

Copilot · 2026-01-22T08:12:11Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/validate_startup.sh

@@ -0,0 +1,61 @@
+#!/bin/bash
+set -e


The validation script uses set +x which disables command echoing, but this may have been intended to be set -x (enable echoing) for debugging purposes. Given that line 6 references a sensitive token from environment, set +x is correct for security. However, consider adding a comment explaining why set +x is used to prevent accidental changes.

Suggested change

set -e

set -e

# Disable xtrace to avoid leaking sensitive tokens/headers in logs.

Copilot · 2026-01-22T08:12:13Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/startup_validation/main.tf

+  count = var.enabled ? 1 : 0
+
+  provisioner "local-exec" {
+    command = "bash ${path.module}/validate_startup.sh '${var.project_id}' '${var.zone}' '${var.instance_name}' '${var.timeout}'"


The command interpolation syntax uses double quotes around the entire command with single quotes inside for the Terraform variables. This could cause issues with proper escaping. Consider using proper shell quoting or heredoc syntax for better security and maintainability when passing variables that might contain special characters.

Suggested change

command = "bash ${path.module}/validate_startup.sh '${var.project_id}' '${var.zone}' '${var.instance_name}' '${var.timeout}'"

command = <<-EOT

bash "${path.module}/validate_startup.sh" \

"${var.project_id}" \

"${var.zone}" \

"${var.instance_name}" \

"${var.timeout}"

EOT

Copilot · 2026-01-22T08:12:13Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/outputs.tf

+}
+
+output "instance_self_link" {
+  description = "Self-link to the compute instance (check startup status in Cloud Console or Cloud Logging)"


The output description mentions checking startup status in Cloud Console or Cloud Logging, but the startup validation feature uses guest attributes, not just Cloud Logging. This description could be more accurate by mentioning guest attributes as the primary validation mechanism.

Suggested change

description = "Self-link to the compute instance (check startup status in Cloud Console or Cloud Logging)"

description = "Self-link to the compute instance (primarily used to check startup validation via guest attributes; you can also review startup status in Cloud Logging or Cloud Console)"

Copilot · 2026-01-22T08:12:14Z

deploy/infrastructure-manager/gcp-credentials-json/deploy.sh

+if ! gcloud secrets versions access latest --secret="${SECRET_NAME}" --project="${PROJECT_ID}" | base64 -d >"${KEY_FILE}"; then
+    echo -e "${RED}Error: Failed to retrieve key from Secret Manager.${RESET}"
+    exit 1
+fi
+


The script retrieves and decodes the service account key using base64 -d, but there's no validation that the decoded content is valid JSON before writing to the file. If the secret contains malformed data, this could result in an invalid KEY_FILE.json. Consider adding JSON validation after decoding to ensure the key is valid.

Suggested change

if ! gcloud secrets versions access latest --secret="${SECRET_NAME}" --project="${PROJECT_ID}" | base64 -d >"${KEY_FILE}"; then

echo -e "${RED}Error: Failed to retrieve key from Secret Manager.${RESET}"

exit 1

fi

TMP_KEY_FILE="$(mktemp)"

# Decode the secret into a temporary file first

if ! gcloud secrets versions access latest --secret="${SECRET_NAME}" --project="${PROJECT_ID}" | base64 -d >"${TMP_KEY_FILE}"; then

echo -e "${RED}Error: Failed to retrieve key from Secret Manager.${RESET}"

rm -f "${TMP_KEY_FILE}"

exit 1

fi

# Validate that the decoded content is valid JSON

if ! python3 -m json.tool "${TMP_KEY_FILE}" >/dev/null 2>&1; then

echo -e "${RED}Error: Retrieved key is not valid JSON. Please verify the secret contents.${RESET}"

rm -f "${TMP_KEY_FILE}"

exit 1

fi

# Move validated JSON to the final key file

mv "${TMP_KEY_FILE}" "${KEY_FILE}"

Copilot · 2026-01-22T08:12:14Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/main.tf

+    enable-guest-attributes = "TRUE"
+    startup-script          = <<-EOT
+    #!/bin/bash
+    set -x  # Enable debug output


Using set -x in the instance startup script causes the shell to echo every command and its arguments, including the elastic-agent installation command that embeds the sensitive enrollment_token. Anyone with access to the instance serial console output or logs can recover this token and enroll arbitrary agents or impersonate this deployment. Remove shell tracing around commands that use secrets (or disable set -x before invoking the install command) so the enrollment token is never written to logs.

Suggested change

set -x # Enable debug output

# Debug tracing (set -x) intentionally disabled to avoid leaking sensitive data in logs

Copilot · 2026-01-22T08:12:14Z

deploy/infrastructure-manager/gcp-elastic-agent/modules/compute_instance/main.tf

+    ElasticAgentArtifact=elastic-agent-${var.elastic_agent_version}-linux-x86_64
+    ARTIFACT_URL="${var.elastic_artifact_server}/$ElasticAgentArtifact.tar.gz"
+
+    log "Downloading Elastic Agent from $ARTIFACT_URL"
+    if ! curl -f -L -O --connect-timeout 30 --max-time 300 "$ARTIFACT_URL"; then
+      report_failure "Failed to download Elastic Agent from $ARTIFACT_URL"
+    fi
+    log "Download successful"
+
+    # Verify download
+    if [ ! -f "$ElasticAgentArtifact.tar.gz" ]; then
+      report_failure "Downloaded file not found: $ElasticAgentArtifact.tar.gz"
+    fi
+
+    # Extract archive
+    log "Extracting $ElasticAgentArtifact.tar.gz"
+    if ! tar xzvf "$ElasticAgentArtifact.tar.gz"; then
+      report_failure "Failed to extract $ElasticAgentArtifact.tar.gz"
+    fi
+
+    # Verify extraction
+    if [ ! -d "$ElasticAgentArtifact" ]; then
+      report_failure "Extracted directory not found: $ElasticAgentArtifact"
+    fi
+
+    cd "$ElasticAgentArtifact"
+
+    # Install Elastic Agent
+    log "Installing Elastic Agent with command: ${local.install_command}"
+    if ! ${local.install_command} --url=${var.fleet_url} --enrollment-token=${var.enrollment_token}; then
+      report_failure "Elastic Agent installation command failed"
+    fi


The startup script downloads the elastic-agent tarball from var.elastic_artifact_server using curl and then extracts and executes it as root without any checksum or signature verification. If the artifact host, DNS, or a custom elastic_artifact_server value is compromised or misconfigured, an attacker could serve a malicious binary that will be trusted and run with full privileges on the VM. Add integrity verification for the downloaded artifact (for example, validating against a pinned checksum or vendor-provided signature) before extraction and installation.

amirbenun added 3 commits January 22, 2026 10:02

Copilot AI review requested due to automatic review settings January 22, 2026 08:03

amirbenun requested a review from a team as a code owner January 22, 2026 08:03

amirbenun changed the title ~~feat: add GCP Infrastructure Manager Terraform modules~~ feat: add GCP Infrastructure Manager Terraform modules [8.19] Jan 22, 2026

Copilot started reviewing on behalf of amirbenun January 22, 2026 08:04 View session

mergify bot assigned amirbenun Jan 22, 2026

amirbenun linked an issue Jan 22, 2026 that may be closed by this pull request

Replace GCP deployment-manager scripts with infrastructure-manager #3132

Open

2 tasks

amirbenun mentioned this pull request Jan 22, 2026

Replace GCP deployment-manager scripts with infrastructure-manager #3132

Open

2 tasks

Copilot AI reviewed Jan 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add GCP Infrastructure Manager Terraform modules [8.19] #3882

feat: add GCP Infrastructure Manager Terraform modules [8.19] #3882

amirbenun commented Jan 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Copilot AI Jan 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		# Determine install command based on version
		install_command = startswith(var.elastic_agent_version, "9.") ? "sudo ./elastic-agent install --non-interactive --install-servers" : "sudo ./elastic-agent install --non-interactive"

-  # Determine install command based on version
-  install_command = startswith(var.elastic_agent_version, "9.") ? "sudo ./elastic-agent install --non-interactive --install-servers" : "sudo ./elastic-agent install --non-interactive"
+  # Extract major version number from elastic_agent_version (e.g., "9.1.0" -> 9)
+  major_version = tonumber(regex("^[0-9]+", var.elastic_agent_version))
+  # Determine install command based on major version.
+  # Assumption: Elastic Agent versions 9 and above use the "--install-servers" format.
+  install_command = major_version >= 9 ? "sudo ./elastic-agent install --non-interactive --install-servers" : "sudo ./elastic-agent install --non-interactive"

	\| `startup_timeout_seconds` \| No \| `600` \| Maximum time to wait for startup (seconds) \|
	\| `startup_timeout_seconds` \| No \| `300` \| Maximum time to wait for startup (seconds) \|

		@@ -0,0 +1,4 @@
		output "validated" {
		description = "Whether validation was performed and succeeded"

	description = "Whether validation was performed and succeeded"
	description = "Whether validation was enabled (validation was attempted)"

	--role="${role}" --condition=None --quiet >/dev/null
	--role="${role}" --condition=None --quiet

	set -e
	set -e
	# Disable xtrace to avoid leaking sensitive tokens/headers in logs.

-    command = "bash ${path.module}/validate_startup.sh '${var.project_id}' '${var.zone}' '${var.instance_name}' '${var.timeout}'"
+    command = <<-EOT
+      bash "${path.module}/validate_startup.sh" \
+        "${var.project_id}" \
+        "${var.zone}" \
+        "${var.instance_name}" \
+        "${var.timeout}"
+    EOT

	description = "Self-link to the compute instance (check startup status in Cloud Console or Cloud Logging)"
	description = "Self-link to the compute instance (primarily used to check startup validation via guest attributes; you can also review startup status in Cloud Logging or Cloud Console)"

-if ! gcloud secrets versions access latest --secret="${SECRET_NAME}" --project="${PROJECT_ID}" | base64 -d >"${KEY_FILE}"; then
-    echo -e "${RED}Error: Failed to retrieve key from Secret Manager.${RESET}"
-    exit 1
-fi
+TMP_KEY_FILE="$(mktemp)"
+# Decode the secret into a temporary file first
+if ! gcloud secrets versions access latest --secret="${SECRET_NAME}" --project="${PROJECT_ID}" | base64 -d >"${TMP_KEY_FILE}"; then
+    echo -e "${RED}Error: Failed to retrieve key from Secret Manager.${RESET}"
+    rm -f "${TMP_KEY_FILE}"
+    exit 1
+fi
+# Validate that the decoded content is valid JSON
+if ! python3 -m json.tool "${TMP_KEY_FILE}" >/dev/null 2>&1; then
+    echo -e "${RED}Error: Retrieved key is not valid JSON. Please verify the secret contents.${RESET}"
+    rm -f "${TMP_KEY_FILE}"
+    exit 1
+fi
+# Move validated JSON to the final key file
+mv "${TMP_KEY_FILE}" "${KEY_FILE}"

	set -x # Enable debug output
	# Debug tracing (set -x) intentionally disabled to avoid leaking sensitive data in logs

feat: add GCP Infrastructure Manager Terraform modules [8.19] #3882

Are you sure you want to change the base?

feat: add GCP Infrastructure Manager Terraform modules [8.19] #3882

Conversation

amirbenun commented Jan 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant