Skip to content

Conversation

@robindiddams
Copy link
Member

@robindiddams robindiddams commented Jan 16, 2026

Summary

Migrates the Gravity client from the legacy provision-based authentication flow to a new session-based authentication using self-signed ECDSA client certificates.

Changes

Authentication Flow

  • Before: Client called Provision() with JWT token to get server-issued cert/key/CA, then connected with those credentials + Bearer token metadata
  • After: Client generates a self-signed certificate from a pre-shared ECDSA private key (with CN=instanceID, O=orgID) and authenticates via mTLS. No Bearer token metadata needed for Gravity streams.

Config Changes (GravityConfig)

  • Removed: Cert, Key, AuthToken
  • Added: ECDSAPrivateKey, OrgID, Region, CloudProvider
  • Kept: CACert (for server certificate verification), InstanceID

Protocol Changes

  • Replaced GravityTunnel/GravityControl services with GravitySessionService
  • Replaced ControlMessage with SessionMessage
  • Replaced ConnectRequest/ConnectResponse with SessionHello/SessionHelloResponse

New Session Response Fields

The server now returns additional machine configuration via SessionHelloResponse:

  • machine_id - Server-assigned machine identifier
  • machine_ipv6 - IPv6 address for this machine
  • machine_subnet - IPv6 subnet for containers
  • machine_token - JWT for authenticating with catalyst/API

These are stored on the client and MachineToken/MachineID are passed to provider.Configuration.

Provider Configuration

Added to provider.Configuration:

  • MachineToken string - JWT for catalyst authentication
  • MachineID string - Server-assigned machine ID

Breaking Changes

  • GravityConfig no longer accepts Cert/Key/AuthToken
  • Must provide ECDSAPrivateKey, OrgID, InstanceID
  • Callers need to update to the new config structure

Testing

All existing tests pass. The self-signed certificate is generated at startup with 24-hour validity.

Summary by CodeRabbit

Release Notes

  • Refactor
    • Migrated communication protocol to a session-centric model with improved authentication and lifecycle management
    • Updated TLS certificate generation to use self-signed, instance-bound certificates
    • Restructured configuration schema with new fields for machine identity, regional information, and cloud provider metadata

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Walkthrough

The gRPC client architecture is refactored from a tunnel-centric model with separate GravityControl and GravityTunnel services to a unified session-centric model using GravitySessionService. Authentication shifts from connect/init to session-based Hello exchange, introducing ECDSA-based self-signed certificates and instance/region metadata.

Changes

Cohort / File(s) Summary
Proto Definition and Generation
Makefile, gravity/proto/gravity_session.proto
New gravity_session.proto file introduces GravitySessionService with bidirectional streaming RPCs (EstablishSession, StreamSessionPackets) and metadata endpoints. Defines session lifecycle messages (SessionHello, SessionHelloResponse, SessionCloseRequest) and unified SessionMessage wrapper. Makefile updated to include gravity_session.proto in protoc generation inputs.
Session-Based Client Refactoring
gravity/grpc_client.go
Major refactoring replacing GravityTunnel/GravityControl clients with GravitySessionService clients. Authentication flow transitions from ConnectMessage/ControlMessage to SessionHello/SessionMessage. Added fields: ecdsaPrivateKey, instanceID, region, cloudProvider. TLS setup changed to self-signed certificate generation with instance-bound configuration. Stream handling and message processing updated throughout (229 lines added, 210 removed).
Test Updates
gravity/grpc_interface_test.go
Three test methods updated: TestUnprovisionMethodWithoutStreams, TestPauseMethodWithoutStreams, TestResumeMethodWithoutStreams. Control stream type changed from pb.GravityTunnel_EstablishTunnelClient to pb.GravitySessionService_EstablishSessionClient.
Configuration Schema Updates
gravity/types.go, gravity/provider/provider.go
GravityConfig: Removed Cert, Key, IP6Address, AuthToken fields; added ECDSAPrivateKey *ecdsa.PrivateKey, Region, CloudProvider. Configuration: Added MachineToken, MachineID, MachineCertBundle, IP6Address fields for session-based authentication and machine identity.
🚥 Pre-merge checks | ✅ 1
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

tlsConfig.RootCAs = g.tlsConfig.RootCAs.Clone()
} else {
// No CA cert provided - skip server verification (dev mode)
tlsConfig.InsecureSkipVerify = true

Check failure

Code scanning / CodeQL

Disabled TLS certificate check High

InsecureSkipVerify should not be used in production code.

Copilot Autofix

AI 1 day ago

In general, the fix is to never disable server certificate verification in production code. Instead of setting InsecureSkipVerify = true when no RootCAs are configured, the client should either (a) load an appropriate system or custom CA pool and verify the server certificate, or (b) treat absence of CA configuration as a hard error and refuse to connect.

The best fix here, without changing existing functionality too much or adding project‑specific assumptions, is:

  • Remove the assignment tlsConfig.InsecureSkipVerify = true.
  • When g.tlsConfig.RootCAs is nil, attempt to load the host’s system certificate pool via x509.SystemCertPool() and use that as tlsConfig.RootCAs.
  • If loading the system pool fails or returns nil, log an error and return a failure instead of silently disabling verification.

This keeps the intent: “if no CA cert provided, still try to make TLS work,” but in a secure way by relying on trusted CAs instead of skipping verification. It also ensures that if no CAs are available at all, the function fails rather than creating an insecure connection. All needed imports (crypto/x509) are already present at the top of gravity/grpc_client.go, so no new dependencies are required. The only code changes are within the shown TLS configuration block around lines 363–368.

Suggested changeset 1
gravity/grpc_client.go

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/gravity/grpc_client.go b/gravity/grpc_client.go
--- a/gravity/grpc_client.go
+++ b/gravity/grpc_client.go
@@ -363,8 +363,13 @@
 	if g.tlsConfig.RootCAs != nil {
 		tlsConfig.RootCAs = g.tlsConfig.RootCAs.Clone()
 	} else {
-		// No CA cert provided - skip server verification (dev mode)
-		tlsConfig.InsecureSkipVerify = true
+		// No explicit CA cert provided - fall back to system certificate pool
+		systemPool, err := x509.SystemCertPool()
+		if err != nil || systemPool == nil {
+			g.logger.Error("failed to load system certificate pool: %v", err)
+			return fmt.Errorf("no CA certificates configured and unable to load system pool: %w", err)
+		}
+		tlsConfig.RootCAs = systemPool
 	}
 	creds := credentials.NewTLS(tlsConfig)
 	g.logger.Debug("TLS credentials created successfully")
EOF
@@ -363,8 +363,13 @@
if g.tlsConfig.RootCAs != nil {
tlsConfig.RootCAs = g.tlsConfig.RootCAs.Clone()
} else {
// No CA cert provided - skip server verification (dev mode)
tlsConfig.InsecureSkipVerify = true
// No explicit CA cert provided - fall back to system certificate pool
systemPool, err := x509.SystemCertPool()
if err != nil || systemPool == nil {
g.logger.Error("failed to load system certificate pool: %v", err)
return fmt.Errorf("no CA certificates configured and unable to load system pool: %w", err)
}
tlsConfig.RootCAs = systemPool
}
creds := credentials.NewTLS(tlsConfig)
g.logger.Debug("TLS credentials created successfully")
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@gravity/grpc_client.go`:
- Around line 226-236: The New function validates ECDSAPrivateKey and InstanceID
but never checks or uses OrgID, and createSelfSignedTLSConfig currently only
sets CommonName; update New (accepting GravityConfig) to validate that OrgID is
non-empty and return an error if missing, and modify createSelfSignedTLSConfig
to include OrgID as the certificate's Organization (O) field when building the
self‑signed cert (so mTLS peer verification that expects O=OrgID will succeed);
ensure the call site passes config.OrgID into createSelfSignedTLSConfig
alongside InstanceID, ECDSAPrivateKey, and CACert.
- Around line 362-368: The code assumes g.tlsConfig.RootCAs being nil when no CA
is provided, but a non-nil empty CertPool is created earlier so the dev-mode
branch never runs; update the logic where tlsConfig.RootCAs is set (the block
that currently does if g.tlsConfig.RootCAs != nil { tlsConfig.RootCAs =
g.tlsConfig.RootCAs.Clone() } else { tlsConfig.InsecureSkipVerify = true }) to
detect an empty pool instead of only nil (e.g., check
len(g.tlsConfig.RootCAs.Subjects()) == 0 or equivalent) and if empty set
tlsConfig.InsecureSkipVerify = true, otherwise clone the pool; apply the same
change to the other identical spot that sets RootCAs (the second occurrence
referenced in the review).
- Around line 760-807: In handleSessionHelloResponse(), the server-assigned IPv6
is stored in g.machineIPv6 but g.ip6Address remains unset so GetIPv6Address()
returns stale data; under the same mutex-protected section where g.machineIPv6
is assigned (before g.mu.Unlock()), also set g.ip6Address = response.MachineIpv6
so the public getter (GetIPv6Address) returns the server-assigned IPv6.
🧹 Nitpick comments (2)
gravity/provider/provider.go (1)

52-59: Treat machine auth fields as sensitive to avoid accidental logging.
Consider explicitly marking MachineToken and MachineCertBundle as sensitive to discourage logging or exposing them in error messages.

🔒 Optional doc tweak
-	// MachineToken is the JWT token for machine authentication with catalyst
+	// MachineToken is the JWT token for machine authentication with catalyst (sensitive; do not log)
 	MachineToken string
-	// MachineCertBundle is the server-assigned machine certificate bundle (cert, ca, key)
+	// MachineCertBundle is the server-assigned machine certificate bundle (cert, ca, key) (sensitive; do not log)
 	MachineCertBundle string
gravity/types.go (1)

33-37: Consider documenting/validating required config fields.
If ECDSAPrivateKey, InstanceID, Region, or CloudProvider are required, add a validation step in the client constructor (or mark these fields as required) to fail fast on nil/empty values.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93e2cef and 8e65b9d.

⛔ Files ignored due to path filters (2)
  • gravity/proto/gravity_session.pb.go is excluded by !**/*.pb.go
  • gravity/proto/gravity_session_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (6)
  • Makefile
  • gravity/grpc_client.go
  • gravity/grpc_interface_test.go
  • gravity/proto/gravity_session.proto
  • gravity/provider/provider.go
  • gravity/types.go
🧰 Additional context used
🧬 Code graph analysis (3)
gravity/grpc_interface_test.go (1)
gravity/proto/gravity_session_grpc.pb.go (1)
  • GravitySessionService_EstablishSessionClient (71-71)
gravity/types.go (2)
network/ipv6.go (1)
  • Region (41-41)
network/dns.go (1)
  • CloudProvider (17-17)
gravity/grpc_client.go (2)
gravity/proto/gravity_session_grpc.pb.go (3)
  • GravitySessionServiceClient (38-50)
  • GravitySessionService_StreamSessionPacketsClient (84-84)
  • GravitySessionService_EstablishSessionClient (71-71)
gravity/provider/provider.go (2)
  • Configuration (23-60)
  • Server (11-20)
🪛 Buf (1.63.0)
gravity/proto/gravity_session.proto

5-5: import "gravity.proto": file does not exist

(COMPILE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: Analyze (go)
🔇 Additional comments (9)
Makefile (1)

55-55: Protoc generation includes the new session proto — looks good.

gravity/grpc_interface_test.go (3)

39-39: LGTM — test now targets the session-based control stream type.


61-61: LGTM — consistent with the new session client interface.


83-83: LGTM — aligns with GravitySessionService stream usage.

gravity/grpc_client.go (5)

581-637: SessionHello wiring looks solid.

Line 581-637 cleanly packages instance/provider/region plus existing deployments into the session hello message; this aligns well with the new session handshake.


721-752: Session message dispatch looks consistent.

Line 721-752 cleanly routes the new session message variants (hello, routing, pause/resume, close). No issues spotted.


1534-1552: Async session message sending w/ circuit breaker is clean.

Line 1534-1552 preserves the retry + circuit breaker guardrails with the new session messages. Looks good.


2342-2374: Metadata RPCs correctly moved to session client.

Line 2342-2374 updates metadata RPCs to use the session client; the migration looks consistent.


387-388: No action needed. The module's go.mod declares Go 1.25.5, which is well above the 1.22 minimum required for range over integers. The code is fully compatible.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 226 to 236
// New creates a new gRPC-based Gravity server client
func New(config GravityConfig) (*GravityClient, error) {
tlsConfig, err := createTLSConfig(config.Cert, config.Key, config.CACert)
if config.ECDSAPrivateKey == nil {
return nil, fmt.Errorf("ECDSAPrivateKey is required")
}
if config.InstanceID == "" {
return nil, fmt.Errorf("InstanceID is required")
}

tlsConfig, err := createSelfSignedTLSConfig(config.ECDSAPrivateKey, config.InstanceID, config.CACert)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include OrgID in the self‑signed cert (and validate config).

Line 226-236 validates ECDSAPrivateKey/InstanceID, but OrgID is never validated or used; Line 2142-2149 sets only CommonName. If the server expects O=OrgID for mTLS verification (per the new auth flow), handshake will fail.

✅ Suggested fix
- tlsConfig, err := createSelfSignedTLSConfig(config.ECDSAPrivateKey, config.InstanceID, config.CACert)
+ if config.OrgID == "" {
+     return nil, fmt.Errorf("OrgID is required")
+ }
+ tlsConfig, err := createSelfSignedTLSConfig(config.ECDSAPrivateKey, config.InstanceID, config.OrgID, config.CACert)
-func createSelfSignedTLSConfig(privateKey *ecdsa.PrivateKey, instanceID, caCertPEM string) (*tls.Config, error) {
+func createSelfSignedTLSConfig(privateKey *ecdsa.PrivateKey, instanceID, orgID, caCertPEM string) (*tls.Config, error) {
     template := &x509.Certificate{
         SerialNumber: big.NewInt(time.Now().UnixNano()),
         Subject: pkix.Name{
             CommonName: instanceID,
+            Organization: []string{orgID},
         },
         NotBefore:             time.Now(),

Also applies to: 2142-2149

🤖 Prompt for AI Agents
In `@gravity/grpc_client.go` around lines 226 - 236, The New function validates
ECDSAPrivateKey and InstanceID but never checks or uses OrgID, and
createSelfSignedTLSConfig currently only sets CommonName; update New (accepting
GravityConfig) to validate that OrgID is non-empty and return an error if
missing, and modify createSelfSignedTLSConfig to include OrgID as the
certificate's Organization (O) field when building the self‑signed cert (so mTLS
peer verification that expects O=OrgID will succeed); ensure the call site
passes config.OrgID into createSelfSignedTLSConfig alongside InstanceID,
ECDSAPrivateKey, and CACert.

Comment on lines +362 to 368
// Only set RootCAs if we have them (for server cert verification)
if g.tlsConfig.RootCAs != nil {
tlsConfig.RootCAs = g.tlsConfig.RootCAs.Clone()
} else {
// No CA cert provided - skip server verification (dev mode)
tlsConfig.InsecureSkipVerify = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

CACert omitted → empty RootCAs breaks TLS / dev‑mode branch never runs.

Line 362-368 assumes RootCAs == nil when no CA is provided, but Line 2172-2177 always creates a non‑nil pool. With empty CACert, verification uses an empty pool and fails, while the dev-mode InsecureSkipVerify path is skipped.

✅ Suggested fix
- caCertPool := x509.NewCertPool()
- if caCertPEM != "" {
-     if !caCertPool.AppendCertsFromPEM([]byte(caCertPEM)) {
-         return nil, fmt.Errorf("failed to parse CA certificate")
-     }
- }
+ var caCertPool *x509.CertPool
+ if caCertPEM != "" {
+     caCertPool = x509.NewCertPool()
+     if !caCertPool.AppendCertsFromPEM([]byte(caCertPEM)) {
+         return nil, fmt.Errorf("failed to parse CA certificate")
+     }
+ }
- if g.tlsConfig.RootCAs != nil {
+ if g.tlsConfig.RootCAs != nil && len(g.tlsConfig.RootCAs.Subjects()) > 0 {
     tlsConfig.RootCAs = g.tlsConfig.RootCAs.Clone()
 } else {
     tlsConfig.InsecureSkipVerify = true
 }

Also applies to: 2172-2182

🤖 Prompt for AI Agents
In `@gravity/grpc_client.go` around lines 362 - 368, The code assumes
g.tlsConfig.RootCAs being nil when no CA is provided, but a non-nil empty
CertPool is created earlier so the dev-mode branch never runs; update the logic
where tlsConfig.RootCAs is set (the block that currently does if
g.tlsConfig.RootCAs != nil { tlsConfig.RootCAs = g.tlsConfig.RootCAs.Clone() }
else { tlsConfig.InsecureSkipVerify = true }) to detect an empty pool instead of
only nil (e.g., check len(g.tlsConfig.RootCAs.Subjects()) == 0 or equivalent)
and if empty set tlsConfig.InsecureSkipVerify = true, otherwise clone the pool;
apply the same change to the other identical spot that sets RootCAs (the second
occurrence referenced in the review).

Comment on lines +760 to +807
// Store session response fields
g.machineID = response.MachineId
g.machineIPv6 = response.MachineIpv6
g.machineSubnet = response.MachineSubnet
g.authorizationToken = response.MachineToken
g.connectionIDs = append(g.connectionIDs, response.MachineId)

g.logger.Debug("setting response fields...")
g.otlpURL = response.OtlpUrl
g.otlpToken = response.OtlpKey
g.apiURL = response.ApiUrl
g.hostEnvironment = response.Environment
g.hostMapping = response.HostMapping

// Store SSH public key if provided
if len(response.SshPublicKey) > 0 {
g.logger.Info("received SSH public key from Gravity (%d bytes)", len(response.SshPublicKey))
}

g.logger.Debug("storing host mappings...")
// Store host mappings directly (already protobuf)
g.hostMapping = response.HostMapping
g.logger.Debug("about to release mutex...")
g.mu.Unlock()
g.logger.Debug("mutex released")

// IMPORTANT: Send connection ID to channel FIRST before any blocking operations
g.logger.Debug("about to send connection ID %s to channel...", connectionID)
// Send machine ID to channel to unblock tunnel stream establishment
select {
case g.connectionIDChan <- connectionID:
g.logger.Debug("connection ID sent to channel successfully")
case g.connectionIDChan <- response.MachineId:
g.logger.Debug("machine ID sent to channel successfully")
default:
g.logger.Error("connection ID channel full - this should not happen!")
return // Early return if channel send fails
g.logger.Error("machine ID channel full - this should not happen!")
return
}

g.logger.Debug("configuring provider with gRPC server...")
// Configure provider with the gRPC server as provider.Server interface
// Configure provider with session info
if err := g.provider.Configure(provider.Configuration{
Server: g, // Critical: pass the gRPC server as provider.Server
Context: g.context,
Logger: g.logger,
APIURL: g.apiURL,
SSHPublicKey: response.SshPublicKey,
TelemetryURL: g.otlpURL,
TelemetryAPIKey: response.OtlpKey,
GravityURL: g.url,
AgentuityCACert: g.caCert,
HostMapping: g.hostMapping,
Environment: g.hostEnvironment,
SubnetRoutes: response.SubnetRoutes,
Hostname: response.Hostname,
OrgID: response.OrgId,
Server: g,
Context: g.context,
Logger: g.logger,
APIURL: g.apiURL,
SSHPublicKey: response.SshPublicKey,
TelemetryURL: g.otlpURL,
TelemetryAPIKey: response.OtlpKey,
GravityURL: g.url,
AgentuityCACert: g.caCert,
HostMapping: g.hostMapping,
Environment: g.hostEnvironment,
SubnetRoutes: response.SubnetRoutes,
Hostname: response.Hostname,
OrgID: response.OrgId,
MachineToken: response.MachineToken,
MachineID: response.MachineId,
IP6Address: response.MachineIpv6,
MachineCertBundle: response.MachineCertBundle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the specific lines mentioned in the review
head -n 810 gravity/grpc_client.go | tail -n 60

Repository: agentuity/go-common

Length of output: 2079


🏁 Script executed:

# Find the GetIPv6Address() method implementation
rg "func.*GetIPv6Address" gravity/grpc_client.go -A 5

Repository: agentuity/go-common

Length of output: 245


🏁 Script executed:

# Search for all references to machineIPv6 and ip6Address in the file
rg "machineIPv6|ip6Address" gravity/grpc_client.go -n

Repository: agentuity/go-common

Length of output: 319


🏁 Script executed:

# Find where ip6Address is initially set
rg "ip6Address\s*=" gravity/grpc_client.go -B 2 -A 2

Repository: agentuity/go-common

Length of output: 45


🏁 Script executed:

# Find all callers of GetIPv6Address()
rg "GetIPv6Address\(\)" --type go

Repository: agentuity/go-common

Length of output: 135


🏁 Script executed:

# Check NewGravityClient or constructor to see initial setup
rg "func New|func.*Client.*\{" gravity/grpc_client.go -A 15 | head -80

Repository: agentuity/go-common

Length of output: 3282


🏁 Script executed:

# Check if machineIPv6 is used anywhere else in the codebase
rg "machineIPv6" --type go

Repository: agentuity/go-common

Length of output: 168


🏁 Script executed:

# Look for where ip6Address might be initialized or set in New() function
rg "ip6Address" gravity/grpc_client.go -B 3 -A 3

Repository: agentuity/go-common

Length of output: 842


🏁 Script executed:

# Check the full New() function to see if ip6Address is set
sed -n '1,150p' gravity/grpc_client.go | tail -100

Repository: agentuity/go-common

Length of output: 2954


🏁 Script executed:

# Verify the struct definition to understand both fields
rg "type GravityClient struct" gravity/grpc_client.go -A 50 | head -70

Repository: agentuity/go-common

Length of output: 1439


Update ip6Address with server-assigned IPv6 in session response handler.

In handleSessionHelloResponse(), g.machineIPv6 is set from the session response, but GetIPv6Address() returns g.ip6Address (initialized to empty string and never updated). This means external callers receive stale data instead of the server-assigned IPv6 address. Update g.ip6Address alongside g.machineIPv6 to ensure the public getter returns the correct value.

Fix
	g.machineIPv6 = response.MachineIpv6
+	g.ip6Address = response.MachineIpv6
🤖 Prompt for AI Agents
In `@gravity/grpc_client.go` around lines 760 - 807, In
handleSessionHelloResponse(), the server-assigned IPv6 is stored in
g.machineIPv6 but g.ip6Address remains unset so GetIPv6Address() returns stale
data; under the same mutex-protected section where g.machineIPv6 is assigned
(before g.mu.Unlock()), also set g.ip6Address = response.MachineIpv6 so the
public getter (GetIPv6Address) returns the server-assigned IPv6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants