Skip to content

Conversation

@xueli181114
Copy link
Contributor

@xueli181114 xueli181114 commented Jan 8, 2026

In this PR I added

  • Put error to logger fields
  • Put stack_trace to logger fields
  • Put resource_type, resource_name, resource_result, observed generation to logger fields
  • Bumped broker lib and updated the code

Summary by CodeRabbit

  • New Features

    • End-to-end distributed tracing via OpenTelemetry for improved cross-service trace correlation.
  • Improvements

    • Richer contextual logging (resource and trace fields), more consistent error context during startup/shutdown, and reduced noisy debug output.
    • Selective stack-trace capture for unexpected errors to improve diagnostics without excess noise.
    • Improved trace propagation on outgoing requests and event handling.
  • Chores

    • Dependency updates for OpenTelemetry and broker.
  • Tests

    • Expanded tests for trace extraction, error-field behavior, and stack-trace capture.

✏️ Tip: You can customize this high-level summary in your review settings.

@openshift-ci openshift-ci bot requested review from crizzo71 and mbrudnoy January 8, 2026 12:07
@coderabbitai
Copy link

coderabbitai bot commented Jan 8, 2026

Walkthrough

Adds OpenTelemetry tracing and propagation across the adapter startup and runtime (tracer init/shutdown, trace extraction from CloudEvents, spans around Execute and outbound HTTP requests), extends logger context with OTEL trace ids and Kubernetes resource fields, replaces direct logger error methods with context-attached WithErrorField and selective stack-trace capture, updates many executor and client call sites to propagate trace/logger/context, changes broker.NewSubscriber to accept a logger, and updates go.mod to add OTEL modules and bump hyperfleet-broker.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Adapter
    participant OTEL
    participant HyperfleetClient
    participant Backend

    Client->>Adapter: Send CloudEvent
    Adapter->>OTEL: ExtractTraceContextFromCloudEvent(ctx)
    Adapter->>OTEL: Start execution span
    Adapter->>HyperfleetClient: doRequest(ctx with propagated trace headers)
    HyperfleetClient->>OTEL: Start HTTP child span
    HyperfleetClient->>Backend: HTTP request (with traceparent)
    Backend-->>HyperfleetClient: Response
    HyperfleetClient->>OTEL: End HTTP span
    Adapter->>OTEL: End execution span
    Adapter-->>Client: Acknowledge
Loading
sequenceDiagram
    participant Main
    participant ConfigLoader
    participant OTEL
    participant BrokerLib
    participant Subscriber
    participant Executor

    Main->>ConfigLoader: Load config
    Main->>OTEL: GetTraceSampleRatio()
    Main->>OTEL: InitTracer(service, version, ratio)
    Main->>OTEL: Defer tracer shutdown (timeout)
    Main->>BrokerLib: NewSubscriber(logger, subscriptionID)
    BrokerLib-->>Subscriber: return subscriber
    Main->>Executor: Start processing (logger + ctx)
    Executor->>OTEL: Create execution spans per event
    Main->>OTEL: Shutdown tracer on exit
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • crizzo71
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.32% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the PR: adding improved structured logging fields (error, stack_trace, resource metadata, trace context) for better filtering and correlation in the adapter logger.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @cmd/adapter/main.go:
- Line 293: The call to broker.NewSubscriber is using the wrong signature:
remove the logger argument and call NewSubscriber with only the subscriptionID;
update the invocation of broker.NewSubscriber(log, subscriptionID) to
broker.NewSubscriber(subscriptionID) so it matches the
NewSubscriber(subscriptionID string, configMap ...map[string]string) signature
and eliminate the unused log parameter from that call.

In @go.mod:
- Around line 10-17: Update the OpenTelemetry module versions in go.mod by
changing go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, and
go.opentelemetry.io/otel/trace from v1.36.0 to v1.38.0; after editing these
three module lines, run the module update (e.g., go get
go.opentelemetry.io/otel@v1.38.0) and then tidy the module (go mod tidy) to
ensure dependency graph and go.sum are updated.
🧹 Nitpick comments (5)
internal/hyperfleet_api/client.go (1)

322-327: Minor inconsistency: debug log uses original URL while span uses resolved URL.

Line 327 logs req.URL (the original, potentially relative URL), while the span at line 279 records resolvedURL (the absolute URL). For consistency in debugging and trace correlation, consider using resolvedURL in the debug log as well.

♻️ Suggested fix
-	c.log.Debugf(ctx, "HyperFleet API request: %s %s", req.Method, req.URL)
+	c.log.Debugf(ctx, "HyperFleet API request: %s %s", req.Method, resolvedURL)
internal/executor/resource_executor.go (1)

132-136: Redundant computation of manifestGen.

manifestGen is already computed at line 109 within the if existingResource != nil block. However, since that declaration is block-scoped, it's not accessible here. Consider moving the generation annotation retrieval to a single location before the conditional block to avoid the redundant call.

♻️ Suggested refactor
 	// Extract resource info
 	gvk := manifest.GroupVersionKind()
 	result.Kind = gvk.Kind
 	result.Namespace = manifest.GetNamespace()
 	result.ResourceName = manifest.GetName()
+	manifestGen := k8s_client.GetGenerationAnnotation(manifest)

 	// Add resource context fields for logging
 	ctx = logger.WithResourceType(ctx, result.Kind)
 	ctx = logger.WithResourceName(ctx, result.ResourceName)
+	if manifestGen > 0 {
+		ctx = logger.WithObservedGeneration(ctx, manifestGen)
+	}

 	re.log.Debugf(ctx, "Resource[%s] manifest built: namespace=%s", resource.Name, manifest.GetNamespace())

 	// Step 2: Check for existing resource using discovery
 	...

 	// Step 3: Determine and perform the appropriate operation
 	if existingResource != nil {
 		// Check if generation annotations match - skip update if unchanged
 		existingGen := k8s_client.GetGenerationAnnotation(existingResource)
-		manifestGen := k8s_client.GetGenerationAnnotation(manifest)

 		if existingGen == manifestGen {
 		...
-
-	// Add observed_generation to context if available
-	manifestGen := k8s_client.GetGenerationAnnotation(manifest)
-	if manifestGen > 0 {
-		ctx = logger.WithObservedGeneration(ctx, manifestGen)
-	}
internal/executor/executor.go (1)

88-94: Consider setting resource_type consistently.

When OwnedReference is nil (e.g., for Cluster events), WithResourceType is not called, so logs will lack the resource_type field. For consistency, consider always setting the resource type:

 	if eventData.OwnedReference != nil {
 		ctx = logger.WithResourceType(ctx, eventData.Kind)
 		ctx = logger.WithDynamicResourceID(ctx, eventData.Kind, eventData.ID)
 		ctx = logger.WithDynamicResourceID(ctx, eventData.OwnedReference.Kind, eventData.OwnedReference.ID)
 	} else {
+		ctx = logger.WithResourceType(ctx, eventData.Kind)
 		ctx = logger.WithDynamicResourceID(ctx, eventData.Kind, eventData.ID)
 	}
cmd/adapter/main.go (1)

207-211: Consider logging tracer shutdown errors.

The shutdown error is currently discarded. While tracer shutdown failures are typically non-critical, logging them could help diagnose issues in production.

♻️ Suggested improvement
 	defer func() {
 		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
 		defer shutdownCancel()
-		_ = tp.Shutdown(shutdownCtx)
+		if err := tp.Shutdown(shutdownCtx); err != nil {
+			log.Warnf(context.Background(), "Failed to shutdown tracer provider: %v", err)
+		}
 	}()
pkg/logger/context.go (1)

177-233: LGTM, but consider using these helpers in executor.go.

These helper functions provide clean abstractions for extracting OTel trace context. However, in internal/executor/executor.go (lines 63-69), the trace ID and span ID are extracted manually instead of using these helpers.

Consider refactoring executor.go to use WithOTelTraceContext:

-	// Extract trace_id and span_id from OTel span and add to logger context
-	spanCtx := trace.SpanContextFromContext(ctx)
-	if spanCtx.HasTraceID() {
-		ctx = logger.WithTraceID(ctx, spanCtx.TraceID().String())
-	}
-	if spanCtx.HasSpanID() {
-		ctx = logger.WithSpanID(ctx, spanCtx.SpanID().String())
-	}
+	// Extract trace_id and span_id from OTel span and add to logger context
+	ctx = logger.WithOTelTraceContext(ctx)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a5be140 and 78faffc.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
🧬 Code graph analysis (8)
internal/hyperfleet_api/client.go (1)
pkg/logger/context.go (1)
  • WithOTelTraceContext (216-233)
internal/executor/precondition_executor.go (1)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
pkg/logger/logger_test.go (1)
pkg/logger/context.go (1)
  • WithErrorField (135-140)
internal/executor/executor.go (2)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
pkg/logger/context.go (7)
  • WithTraceID (84-86)
  • WithSpanID (89-91)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
  • WithResourceType (104-106)
  • WithDynamicResourceID (78-81)
cmd/adapter/main.go (1)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/executor/utils.go (1)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (7)
  • WithResourceType (104-106)
  • WithResourceName (109-111)
  • WithObservedGeneration (124-126)
  • WithResourceResult (114-116)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
internal/executor/post_action_executor.go (1)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
🔇 Additional comments (21)
internal/executor/utils.go (2)

47-48: Clean adoption of the new context-based error logging pattern.

The error context enrichment with stack trace capture correctly skips 1 frame to point at the caller site.


141-148: Good approach to unify error context for both error and non-success HTTP scenarios.

The logic correctly distinguishes between an actual error and a non-success HTTP status, creating appropriate error context for both cases before logging.

pkg/logger/logger_test.go (1)

270-280: Test correctly updated to reflect the new context-based error logging API.

The test validates that WithErrorField enriches the context, which is then propagated through the logger chain. The approach aligns with the broader refactor from method-based to context-based error handling.

internal/hyperfleet_api/client.go (1)

271-284: OpenTelemetry span integration is well-structured.

The implementation correctly creates a child span for each HTTP request, sets appropriate attributes, and propagates trace context to the logger. Good use of defer span.End() to ensure spans are always closed.

pkg/logger/logger.go (1)

147-152: Good refactor to use typed keys for base logger fields.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) instead of string literals ensures consistency with the context key definitions and reduces the risk of typos or drift between logger fields and context fields.

internal/executor/precondition_executor.go (2)

41-42: Consistent adoption of context-enriched error logging.

The error and stack trace are now captured in structured log fields, improving log searchability and correlation while keeping the log message concise.


188-188: Appropriate log level adjustment for individual condition results.

Demoting individual condition match/no-match details to debug level reduces noise while the meaningful NOT_MET summary at line 52 remains at info level. Good balance between observability and verbosity.

internal/executor/post_action_executor.go (2)

40-41: Consistent error logging pattern for payload build failures.

The enriched context captures error and stack trace in structured fields while the ExecutionError struct continues to hold the detailed error information for programmatic access.


61-62: Consistent error logging for post-action failures.

The pattern matches the rest of the codebase, with error details in structured fields and the ExecutionError maintaining full context for error handling.

internal/executor/resource_executor.go (3)

76-80: LGTM!

Resource context fields are correctly added using the new logger helpers. The context enrichment with ResourceType and ResourceName early in the flow ensures all subsequent logs in this method include these fields.


163-172: LGTM!

Good use of structured logging context. The error path correctly captures stack trace and error details, while the success path cleanly logs the operation result. The skip=1 parameter correctly skips only the immediate caller frame.


337-380: LGTM!

The logging changes appropriately reduce verbosity by moving routine recreate/delete operations to Debug level while maintaining Warn/Error for exceptional conditions. The error logging correctly enriches context with stack traces.

internal/executor/executor.go (3)

56-69: LGTM!

OpenTelemetry integration is correctly implemented:

  • Span is properly started with component name and deferred end
  • Trace ID and Span ID are extracted from the span context and added to the logger context for correlation
  • The conditional checks (HasTraceID, HasSpanID) correctly handle cases where IDs might not be present

196-203: LGTM, but note on error ordering.

The combined error aggregation is useful for final status logging. Note that map iteration order in Go is non-deterministic, so the order of phase errors in the combined message may vary between runs. This is acceptable since individual errors are already logged with stack traces at their respective failure points.


233-239: LGTM!

Event metadata logging is appropriately streamlined. The event ID is already in the logging context (set at line 230), so removing it from the message avoids duplication while maintaining structured field availability.

cmd/adapter/main.go (2)

137-162: LGTM!

The initTracer function correctly initializes OpenTelemetry with:

  • Proper resource attributes (service name/version)
  • AlwaysSample() sampler for full trace coverage
  • W3C TraceContext propagation for distributed tracing
  • Clean return of the TracerProvider for lifecycle management

183-184: LGTM!

The error logging pattern is consistently applied throughout the file, enriching all error logs with structured error fields and stack traces for better debugging and correlation.

pkg/logger/context.go (4)

5-9: LGTM!

New imports are correctly added for the new functionality: fmt and runtime for stack trace capture, and go.opentelemetry.io/otel/trace for OpenTelemetry integration.


133-149: LGTM!

Both functions correctly implement nil/empty guards to avoid polluting logs with empty fields. The pattern of returning the unchanged context when there's nothing to add is consistent and efficient.


151-175: LGTM!

The stack trace capture implementation is correct:

  • maxFrames=32 provides sufficient depth for debugging
  • The skip+2 calculation correctly accounts for runtime.Callers and CaptureStackTrace itself
  • The frame format (file:line function) is clear and parseable
  • Returning nil for empty captures integrates cleanly with WithStackTraceField

108-116: LGTM!

The new functions follow the established pattern and provide clear semantic meaning. The result values ("SUCCESS"/"FAILED") are used consistently in the calling code.

@coderabbitai
Copy link

coderabbitai bot commented Jan 8, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #27

coderabbitai bot added a commit that referenced this pull request Jan 8, 2026
Docstrings generation was requested by @xueli181114.

* #26 (comment)

The following files were modified:

* `cmd/adapter/main.go`
* `internal/executor/utils.go`
* `pkg/logger/context.go`
* `pkg/logger/logger.go`
@xueli181114 xueli181114 changed the title HYPERFLEET-463 - Improved structured logging fields for enhanced ering and correlation of adapter logger HYPERFLEET-463 - Improved structured logging fields for enhanced filtering and correlation of adapter logger Jan 8, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @internal/hyperfleet_api/client.go:
- Around line 277-280: The span.SetAttributes call is using deprecated HTTP
attribute keys; update the attributes passed to span.SetAttributes to use
OpenTelemetry's current semantic names by replacing "http.method" with
"http.request.method" and "http.url" with "url.full", keeping the same values
(e.g., req.Method and resolvedURL) so the call becomes span.SetAttributes(...)
with attribute.String("http.request.method", req.Method) and
attribute.String("url.full", resolvedURL).
🧹 Nitpick comments (5)
internal/executor/utils.go (1)

140-149: Consider applying consistent error logging to PUT and PATCH failures.

The enriched error logging with stack trace is applied only to POST failures. PUT (lines 150-158) and PATCH (lines 159-167) operations also have body rendering but lack similar failure logging for debugging.

If the intent is to help debug body-related failures, consider applying the same pattern to PUT and PATCH, or document why POST is special.

pkg/logger/context.go (1)

189-233: Minor duplication between WithOTelTraceID and WithOTelTraceContext.

Both functions extract and validate the span context, then conditionally add trace_id. Consider extracting the common logic to reduce duplication.

♻️ Optional refactor to reduce duplication
+// addTraceIDIfValid adds trace_id to context if the span context has a valid trace ID
+func addTraceIDIfValid(ctx context.Context, spanCtx trace.SpanContext) context.Context {
+	if spanCtx.HasTraceID() {
+		ctx = WithLogField(ctx, string(TraceIDKey), spanCtx.TraceID().String())
+	}
+	return ctx
+}
+
 func WithOTelTraceID(ctx context.Context) context.Context {
 	spanCtx := trace.SpanContextFromContext(ctx)
 	if !spanCtx.IsValid() {
 		return ctx
 	}
-
-	// Add trace_id if valid
-	if spanCtx.HasTraceID() {
-		ctx = WithLogField(ctx, string(TraceIDKey), spanCtx.TraceID().String())
-	}
-
-	return ctx
+	return addTraceIDIfValid(ctx, spanCtx)
 }

 func WithOTelTraceContext(ctx context.Context) context.Context {
 	spanCtx := trace.SpanContextFromContext(ctx)
 	if !spanCtx.IsValid() {
 		return ctx
 	}
-
-	// Add trace_id if valid
-	if spanCtx.HasTraceID() {
-		ctx = WithLogField(ctx, string(TraceIDKey), spanCtx.TraceID().String())
-	}
-
+	ctx = addTraceIDIfValid(ctx, spanCtx)
 	// Add span_id if valid
 	if spanCtx.HasSpanID() {
 		ctx = WithLogField(ctx, string(SpanIDKey), spanCtx.SpanID().String())
 	}
-
 	return ctx
 }
pkg/logger/logger.go (1)

298-313: Consider consolidating with CaptureStackTrace in context.go.

This GetStackTrace function has a different output format (func() file:line) compared to CaptureStackTrace in context.go (file:line func). Having two stack trace capture functions with different formats could lead to inconsistent log output.

Consider deprecating one in favor of the other, or documenting the intended use case for each.

internal/executor/resource_executor.go (1)

132-136: Minor duplication: manifestGen is extracted twice.

k8s_client.GetGenerationAnnotation(manifest) is called here on line 133, but was already called on line 109. Consider reusing the value from line 109.

♻️ Reuse manifestGen from earlier extraction
 		existingGen := k8s_client.GetGenerationAnnotation(existingResource)
 		manifestGen := k8s_client.GetGenerationAnnotation(manifest)
+		// manifestGen is reused below for observed_generation logging

 		if existingGen == manifestGen {
 			// Generations match - no action needed
@@ -129,9 +130,8 @@
 	}

 	// Add observed_generation to context if available
-	manifestGen := k8s_client.GetGenerationAnnotation(manifest)
 	if manifestGen > 0 {
 		ctx = logger.WithObservedGeneration(ctx, manifestGen)
 	}

Note: This requires moving manifestGen declaration outside the if existingResource != nil block or accessing it differently.

cmd/adapter/main.go (1)

183-184: Consider extracting error logging with stack traces into a helper function.

The pattern of creating an error context with stack trace and logging is repeated ~13 times throughout the function:

errCtx := logger.WithStackTraceField(logger.WithErrorField(ctx, err), logger.CaptureStackTrace(1))
log.Errorf(errCtx, "message")

To reduce duplication and improve maintainability, consider adding a helper method to the logger package:

func (l *Logger) ErrorWithStackTrace(ctx context.Context, err error, msg string) {
    errCtx := WithStackTraceField(WithErrorField(ctx, err), CaptureStackTrace(2))
    l.Errorf(errCtx, msg)
}

This would simplify all error logging sites to:

log.ErrorWithStackTrace(ctx, err, "Failed to load adapter configuration")

Also applies to: 203-204, 217-218, 227-228, 241-242, 272-273, 281-282, 295-296, 305-306, 317-318, 336-337, 355-356, 362-363

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78faffc and 9239e08.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/validator_test.go
  • internal/config_loader/loader_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
🧬 Code graph analysis (6)
internal/hyperfleet_api/client.go (2)
pkg/logger/context.go (1)
  • WithOTelTraceContext (216-233)
internal/config_loader/types.go (1)
  • Header (172-175)
pkg/logger/logger_test.go (1)
pkg/logger/context.go (1)
  • WithErrorField (135-140)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (7)
  • WithResourceType (104-106)
  • WithResourceName (109-111)
  • WithObservedGeneration (124-126)
  • WithResourceResult (114-116)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
cmd/adapter/main.go (2)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/config_loader/types.go (1)
  • Metadata (67-71)
pkg/logger/logger.go (1)
pkg/logger/context.go (3)
  • ComponentKey (17-17)
  • VersionKey (18-18)
  • HostnameKey (19-19)
internal/executor/utils.go (1)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
🔇 Additional comments (14)
internal/executor/utils.go (1)

47-48: LGTM! Context-based error logging is correctly implemented.

The error context is properly enriched with both the error field and stack trace before logging. The CaptureStackTrace(1) correctly skips one frame to capture the relevant call site.

pkg/logger/context.go (2)

133-149: LGTM! Well-designed error and stack trace context helpers.

The nil/empty guards ensure safe usage without requiring callers to check. Storing err.Error() as a string (rather than the error interface) is appropriate for structured logging serialization.


156-175: LGTM! Stack trace capture is correctly implemented.

The skip+2 offset correctly accounts for runtime.Callers and CaptureStackTrace itself. The frame format includes file, line, and function name which provides good debugging context.

internal/hyperfleet_api/client.go (2)

271-284: LGTM! OpenTelemetry tracing correctly integrated.

The span is properly created with method and URL attributes, and defer span.End() ensures the span is closed even on early returns. The logger context is enriched with trace information for log correlation.


322-324: LGTM! Trace context propagation is correctly implemented.

Injecting the trace context via propagation.HeaderCarrier ensures the traceparent header is added to outgoing requests, enabling distributed tracing across services.

pkg/logger/logger_test.go (1)

270-280: LGTM! Test correctly updated for context-based error handling.

The test demonstrates the new pattern where errors are attached to context via WithErrorField rather than the removed Logger.WithError method. The chaining with log.With("extra", "info") verifies that logger chaining still works alongside context-based error fields.

pkg/logger/logger.go (1)

147-152: LGTM! Base logger fields use consistent typed keys.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) ensures the base logger fields align with the context key constants defined in context.go, improving consistency across the logging system.

internal/executor/precondition_executor.go (2)

41-42: LGTM! Error context enrichment correctly applied.

The error is wrapped with both error field and stack trace before logging, providing rich context for debugging precondition failures.


188-189: LGTM! Reduced verbosity for NOT_MET conditions is appropriate.

Changing from Infof with full details to Debugf with a concise message reduces log noise. The detailed condition information is still available in the result object and logged at Info level on line 52.

internal/executor/resource_executor.go (4)

76-80: LGTM! Resource context enrichment is well-placed.

Adding resource_type and resource_name to the context after manifest building ensures all subsequent logs for this resource include consistent identification fields.


163-172: LGTM! Error and success paths use consistent result tagging.

The error path correctly enriches context with resource_result=FAILED, error field, and stack trace. The success path uses resource_result=SUCCESS. This enables filtering logs by operation outcome.


337-350: LGTM! Recreate logging appropriately reduced to Debug level.

Moving detailed recreate operation logs (delete, wait, create) from Info to Debug level reduces noise while preserving the information for debugging when needed.


375-377: LGTM! Error context correctly enriched in deletion wait loop.

When an unexpected error occurs during deletion status check, the context is enriched with error and stack trace before logging, maintaining consistency with the PR's error handling pattern.

cmd/adapter/main.go (1)

293-293: Verify broker.NewSubscriber signature in hyperfleet-broker v1.0.1.

The code calls broker.NewSubscriber(log, subscriptionID) with the logger as the first parameter. Confirm this matches the actual v1.0.1 API by checking the broker library's source code or API documentation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @cmd/adapter/tracing.go:
- Around line 56-94: The semconv package used by initTracer is outdated for the
SDK version; update the semconv import to v1.31.0 or later (compatible with
OpenTelemetry SDK v1.36.0) so symbols like semconv.ServiceName and
semconv.ServiceVersion resolve correctly; modify the module dependency (go.mod)
to require the newer semconv version and run go mod tidy, then rebuild to ensure
no missing constants or API mismatches in the initTracer function and related
uses.
🧹 Nitpick comments (3)
pkg/logger/context.go (1)

189-233: Consider extracting shared logic between WithOTelTraceID and WithOTelTraceContext.

Both functions share the same pattern for checking span validity and adding trace_id. The duplication is minor but could be consolidated.

♻️ Optional refactor to reduce duplication
 func WithOTelTraceID(ctx context.Context) context.Context {
 	spanCtx := trace.SpanContextFromContext(ctx)
 	if !spanCtx.IsValid() {
 		return ctx
 	}
-
-	// Add trace_id if valid
-	if spanCtx.HasTraceID() {
-		ctx = WithLogField(ctx, string(TraceIDKey), spanCtx.TraceID().String())
-	}
-
-	return ctx
+	return addTraceID(ctx, spanCtx)
 }

 func WithOTelTraceContext(ctx context.Context) context.Context {
 	spanCtx := trace.SpanContextFromContext(ctx)
 	if !spanCtx.IsValid() {
 		return ctx
 	}
-
-	// Add trace_id if valid
-	if spanCtx.HasTraceID() {
-		ctx = WithLogField(ctx, string(TraceIDKey), spanCtx.TraceID().String())
-	}
-
-	// Add span_id if valid
-	if spanCtx.HasSpanID() {
-		ctx = WithLogField(ctx, string(SpanIDKey), spanCtx.SpanID().String())
-	}
-
-	return ctx
+	ctx = addTraceID(ctx, spanCtx)
+	if spanCtx.HasSpanID() {
+		ctx = WithLogField(ctx, string(SpanIDKey), spanCtx.SpanID().String())
+	}
+	return ctx
+}
+
+func addTraceID(ctx context.Context, spanCtx trace.SpanContext) context.Context {
+	if spanCtx.HasTraceID() {
+		return WithLogField(ctx, string(TraceIDKey), spanCtx.TraceID().String())
+	}
+	return ctx
 }
cmd/adapter/main.go (1)

286-298: Stack trace captured at subscription error handler, not at error origin.

The stack trace captured here reflects the error handler goroutine's call stack, not the original error's origin in the broker. This is a limitation—the stack trace shows where the error was logged, not where it occurred.

Consider documenting this behavior or investigating if the broker library can provide richer error context with its own stack traces.

internal/executor/resource_executor.go (1)

132-140: Consider adding observed_generation earlier in the flow.

The manifestGen is already computed at line 109 for comparison. Moving the context enrichment earlier would ensure the generation appears in all logs, including the operation decision log at line 139.

♻️ Optional: Move observed_generation context enrichment earlier
 	// Step 3: Determine and perform the appropriate operation
+	// Add observed_generation to context if available (for all logs in this resource)
+	manifestGen := k8s_client.GetGenerationAnnotation(manifest)
+	if manifestGen > 0 {
+		ctx = logger.WithObservedGeneration(ctx, manifestGen)
+	}
+
 	if existingResource != nil {
 		// Check if generation annotations match - skip update if unchanged
 		existingGen := k8s_client.GetGenerationAnnotation(existingResource)
-		manifestGen := k8s_client.GetGenerationAnnotation(manifest)
 		...
 	}
-
-	// Add observed_generation to context if available
-	manifestGen := k8s_client.GetGenerationAnnotation(manifest)
-	if manifestGen > 0 {
-		ctx = logger.WithObservedGeneration(ctx, manifestGen)
-	}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9239e08 and 78f885a.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (14)
  • cmd/adapter/main.go
  • cmd/adapter/tracing.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/executor/precondition_executor.go
  • pkg/logger/logger_test.go
  • go.mod
  • internal/executor/utils.go
  • internal/executor/post_action_executor.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
🧬 Code graph analysis (5)
cmd/adapter/main.go (2)
pkg/logger/context.go (3)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/config_loader/types.go (1)
  • Metadata (67-71)
internal/executor/executor.go (3)
pkg/logger/context.go (6)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
  • WithResourceType (104-106)
  • WithDynamicResourceID (78-81)
  • WithOTelTraceContext (216-233)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
cmd/adapter/tracing.go (1)
pkg/logger/logger.go (1)
  • Logger (14-40)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (7)
  • WithResourceType (104-106)
  • WithResourceName (109-111)
  • WithObservedGeneration (124-126)
  • WithResourceResult (114-116)
  • WithStackTraceField (144-149)
  • WithErrorField (135-140)
  • CaptureStackTrace (156-175)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
internal/hyperfleet_api/client.go (1)
pkg/logger/context.go (1)
  • WithOTelTraceContext (216-233)
🔇 Additional comments (20)
cmd/adapter/tracing.go (2)

17-25: LGTM - Well-documented tracing constants.

The constants are clearly defined with appropriate defaults. The 10% default sampling ratio is reasonable for moderate volume systems.


27-54: LGTM - Robust environment variable parsing with safe fallbacks.

The function properly handles missing values, parse errors, and out-of-range values with appropriate warning logs. The logging includes the effective value being used, which aids debugging.

pkg/logger/context.go (3)

15-43: LGTM - Well-organized context keys with clear categorization.

The keys are logically grouped (required fields, error fields, correlation fields, resource fields, adapter-specific fields) and use typed contextKey to prevent collisions.


133-149: LGTM - Nil-safe error and stack trace context setters.

Both functions correctly handle nil/empty input by returning the context unchanged, preventing unnecessary context allocations.


156-175: LGTM - Stack trace capture implementation.

The skip parameter handling is correct (skip+2 to account for runtime.Callers and CaptureStackTrace itself). The 32-frame limit is reasonable for most debugging scenarios.

internal/hyperfleet_api/client.go (2)

270-285: LGTM - Clean OpenTelemetry span creation for HTTP requests.

The span is properly created with meaningful name, attributes are set correctly using semantic conventions, and defer span.End() ensures the span is always closed. The logger context is correctly enriched with trace information.


322-324: Verify trace context injection uses the correct context.

The trace context is injected using reqCtx (the timeout-wrapped context), which is correct as it contains the span created earlier. This ensures the traceparent header reflects the current span.

pkg/logger/logger.go (1)

146-151: LGTM - Consistent use of typed context keys.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) ensures consistency with the keys defined in context.go. This prevents typos and makes refactoring safer.

internal/executor/executor.go (5)

56-71: LGTM - Proper tracing initialization and error handling.

The span is correctly started and deferred. Parse errors appropriately capture stack traces and error context before logging.


73-83: LGTM - Dynamic resource ID logging based on ownership.

The logic correctly distinguishes between owned resources (logs both owner and resource IDs) and standalone resources (logs only resource ID). Using WithDynamicResourceID allows flexible field naming based on resource type.


185-193: Non-deterministic error ordering in combined error message.

Map iteration order in Go is not guaranteed, so the combined error message may have different ordering across runs. This is acceptable for logging purposes but worth noting.


210-225: LGTM - Clean helper for traced execution setup.

The helper properly encapsulates span creation and trace context enrichment. Using AdapterConfig.Metadata.Name as the tracer name provides good service identification.


239-245: LGTM - Simplified event logging.

Removing event ID from the log message is appropriate since it's already in the context via WithEventID. This reduces log message verbosity while preserving correlation capability.

cmd/adapter/main.go (3)

168-182: LGTM - Proper OpenTelemetry lifecycle management.

The tracer is initialized after the logger is recreated with the adapter name, ensuring consistent service identification. The deferred shutdown with a 5-second timeout is reasonable for graceful termination.


240-255: LGTM - Enhanced error handling for missing environment variables.

Creating explicit error objects and enriching context with stack traces provides better debugging capability compared to simple string formatting.


264-268: All broker.NewSubscriber calls in this codebase are consistent.

Only one usage of broker.NewSubscriber exists in the adapter (cmd/adapter/main.go:264) and it correctly uses the signature accepting (log, subscriptionID). No inconsistent or outdated calls were found in the codebase.

internal/executor/resource_executor.go (4)

76-80: LGTM - Resource context enrichment for logging.

Adding resource type and name to the context early ensures all subsequent logs in this execution path include these fields for filtering and correlation.


153-172: LGTM - Clear success/failure logging with resource result context.

Using WithResourceResult("SUCCESS") and WithResourceResult("FAILED") provides structured filtering capability. The error path correctly includes stack trace and error field.


337-350: LGTM - Appropriate log level reduction for recreation flow.

Changing these logs from Info to Debug level reduces noise in production while preserving the information for debugging when needed.


361-382: LGTM - Robust deletion wait with proper error handling.

The polling loop correctly handles context cancellation, successful deletion (NotFound), and unexpected errors. Stack traces are captured for unexpected errors to aid debugging.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/executor/executor.go (1)

185-192: Non-deterministic error message ordering.

The iteration over result.Errors (a map) does not guarantee consistent ordering. While this doesn't affect functionality, it may cause log message variations between runs.

♻️ Optional: Sort phases for consistent error messages
 	} else {
 		// Combine all errors into a single error for logging
 		var errMsgs []string
+		// Sort phases for consistent ordering
+		phases := make([]ExecutionPhase, 0, len(result.Errors))
+		for phase := range result.Errors {
+			phases = append(phases, phase)
+		}
+		sort.Slice(phases, func(i, j int) bool {
+			return string(phases[i]) < string(phases[j])
+		})
+		for _, phase := range phases {
+			errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, result.Errors[phase]))
+		}
-		for phase, err := range result.Errors {
-			errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, err))
-		}
cmd/adapter/tracing.go (1)

56-99: Consider adding a comment explaining the intentional lack of trace exporter.

The codebase contains no OTEL exporter configurations, confirming the TracerProvider is intentionally used only for trace ID generation for log correlation and request propagation. While the function's docstring explains the purpose well, consider adding an explicit comment in the code (e.g., in initTracer) stating that no exporter is configured intentionally, to prevent future confusion about whether this is an oversight or by design.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78f885a and 3897794.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (14)
  • cmd/adapter/main.go
  • cmd/adapter/tracing.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/validator_test.go
  • internal/config_loader/loader_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/executor/precondition_executor.go
  • internal/executor/utils.go
  • pkg/logger/logger_test.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
  • go.mod
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
🧬 Code graph analysis (5)
internal/executor/post_action_executor.go (1)
pkg/logger/context.go (1)
  • WithErrorField (135-141)
cmd/adapter/main.go (2)
pkg/logger/context.go (1)
  • WithErrorField (135-141)
internal/config_loader/types.go (1)
  • Metadata (67-71)
internal/hyperfleet_api/client.go (2)
pkg/logger/context.go (1)
  • WithOTelTraceContext (191-208)
internal/config_loader/types.go (1)
  • Header (172-175)
internal/executor/executor.go (3)
pkg/logger/context.go (4)
  • WithErrorField (135-141)
  • WithResourceType (104-106)
  • WithDynamicResourceID (78-81)
  • WithOTelTraceContext (191-208)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
internal/executor/resource_executor.go (2)
pkg/logger/context.go (5)
  • WithResourceType (104-106)
  • WithResourceName (109-111)
  • WithObservedGeneration (124-126)
  • WithResourceResult (114-116)
  • WithErrorField (135-141)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
🔇 Additional comments (28)
internal/hyperfleet_api/client.go (2)

271-284: LGTM - OpenTelemetry tracing integration for HTTP requests.

The implementation correctly:

  • Creates a child span for each HTTP request with meaningful attributes
  • Uses the resolved URL for both tracing and the actual request
  • Enriches the logger context with trace IDs for log correlation

322-324: Correct trace context propagation.

Using reqCtx (the timeout-scoped context) for trace injection ensures the propagated span context matches the actual request lifecycle.

internal/executor/post_action_executor.go (2)

40-41: Good adoption of context-enriched error logging.

Using logger.WithErrorField(ctx, err) attaches both the error message and stack trace to the logging context, aligning with the structured logging improvements tracked in HYPERFLEET-304. Based on learnings, this refactor addresses how errors are handled and logged.


61-62: Consistent error context pattern for post-action failures.

The error context enrichment is correctly applied before logging, ensuring the error details and stack trace are captured in structured log fields rather than embedded in the message string.

pkg/logger/logger.go (1)

146-151: Good use of typed context keys for consistency.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) instead of literal strings ensures consistency with the key definitions in pkg/logger/context.go and prevents typos.

internal/executor/executor.go (4)

56-58: Good OpenTelemetry tracing integration.

Starting the traced execution at the beginning of Execute and deferring span.End() ensures the entire execution is traced. The helper method cleanly separates tracing setup concerns.


75-82: Dynamic resource ID logging for owned resources.

The logging context correctly differentiates between owned resources (adding both owner and resource IDs) and standalone resources. This enables better log filtering and correlation for hierarchical resources like NodePools.


210-225: Well-documented tracing helper.

The startTracedExecution method has clear documentation explaining its purpose and the trace context propagation to both logs and outgoing HTTP requests. Using the adapter's metadata name as the tracer name provides good observability context.


239-245: Simplified event logging - event ID preserved in context.

Removing the event ID from log messages is acceptable since logger.WithEventID(ctx, evt.ID()) at line 236 already adds it to the logging context. All subsequent logs will include the event ID as a structured field.

cmd/adapter/main.go (4)

264-268: Broker subscriber now receives logger for consistent logging.

Passing the logger to broker.NewSubscriber enables the broker library to use the same structured logging configuration as the adapter, ensuring consistent log format and context propagation.


285-293: Helpful comment about error context limitations.

The comment correctly notes that stack traces captured here reflect the error monitoring goroutine's location, not the error's origin in the broker library. This sets appropriate expectations for debugging.


336-338: Timeout error handling is correct.

Creating a dedicated error for the timeout case and logging it with context ensures the timeout scenario is properly captured in structured logs.


168-182: Good OpenTelemetry lifecycle management.

The tracer is correctly:

  1. Initialized after config is loaded (to use adapter name)
  2. Initialized before API client creation (so HTTP requests are traced)
  3. Shutdown with a dedicated timeout context on exit
internal/executor/resource_executor.go (5)

76-80: Good context enrichment for resource logging.

Adding resource_type and resource_name to the logging context early ensures all subsequent logs for this resource execution include these fields, enabling effective log filtering and correlation.


106-112: Observed generation added to context for log correlation.

Extracting the manifest generation once and adding it to the logging context ensures consistency and enables filtering logs by generation. The conditional check (if manifestGen > 0) correctly avoids adding meaningless zero values.


164-173: Consistent resource result context for success and failure.

Using WithResourceResult(ctx, "FAILED") and WithResourceResult(ctx, "SUCCESS") enables structured filtering of resource operation outcomes. The error is correctly attached via WithErrorField for failure cases.


338-351: Appropriate log level reduction for recreation operations.

Demoting recreation step logs from Infof to Debugf reduces log noise during normal operations while preserving the information for debugging purposes.


376-381: Error context correctly applied in deletion wait loop.

Using WithErrorField for unexpected errors during deletion status checks ensures the error details are captured in structured logs while maintaining the polling loop's functionality.

go.mod (2)

15-17: OpenTelemetry SDK version mismatch is not a concern. OpenTelemetry Go follows semantic versioning with forward compatibility guarantees within the 1.x series, so SDK v1.36.0 is compatible with otel v1.38.0. This configuration is expected and poses no runtime compatibility issues.


10-10: Broker library updated to v1.0.1.

The broker library bump correctly implements the new NewSubscriber signature that requires a logger parameter, as confirmed by the usage in cmd/adapter/main.go line 264.

cmd/adapter/tracing.go (2)

17-25: LGTM! Well-defined tracing constants.

The default 10% sampling ratio is a reasonable choice for balancing observability with overhead in production environments.


27-54: LGTM! Robust environment variable parsing.

The function handles all edge cases appropriately:

  • Missing value
  • Invalid format
  • Out-of-range values

The logging provides clear operational feedback.

pkg/logger/context.go (6)

5-9: LGTM! Appropriate imports for new functionality.

The added imports support stack trace capture (runtime, fmt) and OpenTelemetry integration (otel/trace).


15-43: LGTM! Well-organized context key constants.

The constants are logically grouped with clear comments, follow consistent naming conventions, and use the custom contextKey type to prevent collisions.


108-116: LGTM! Consistent resource context functions.

Both functions follow the established pattern for context setters and integrate well with the dynamic log fields mechanism.


133-141: LGTM! Proper error and stack trace capture.

The function correctly:

  • Handles nil errors with early return
  • Uses skip=1 to capture the caller's stack (not WithErrorField's own frame)
  • Chains error message and stack trace into context

152-176: LGTM! Correct stack trace implementation.

The function properly:

  • Uses runtime.Callers with correct skip offset (+2 for runtime.Callers and CaptureStackTrace itself)
  • Formats stack frames in a standard, readable format
  • Handles edge cases (no frames captured)

The skip parameter is well-documented and allows callers to control the starting point of the trace.


178-208: LGTM! Proper OpenTelemetry context extraction.

The function correctly:

  • Checks for valid span context before extraction
  • Individually validates trace ID and span ID availability
  • Provides clear documentation with usage examples

This enables proper distributed tracing correlation in logs.

Copy link
Contributor

@ciaranRoche ciaranRoche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed it, but are we extracting trace context from CloudEvents anywhere 🤔

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/executor/utils.go (1)

125-150: ExecuteAPICall logs POST non-success as error but returns nil — callers must validate response separately.

The POST handling logs non-2xx responses with log.Error() (line 148), but apiClient.Post() doesn't return an error for HTTP error statuses — only for network failures. This means the function returns (resp, url, nil) for failed API calls, requiring callers to always invoke ValidateAPIResponse() to catch non-success statuses.

Currently, both callers (PreconditionExecutor and PostActionExecutor) do validate the response immediately after the call. However, this implicit contract is fragile: future callers might omit the validation step and treat a 4xx/5xx response as success.

Consider having ExecuteAPICall itself return an error when !resp.IsSuccess() (similar to how it already wraps errors at lines 180–204), eliminating the need for a separate validation step. This makes the contract explicit and prevents misuse.

Also avoid logging response bodies in this function; let callers decide what to log based on their context.

internal/executor/executor.go (1)

73-83: resource_type is missing for non-owned events (breaks the stated logging objective).
Right now resource_type is only set when OwnedReference != nil. If downstream filtering relies on resource_type, events without an owned reference won’t have it. Also, your comment implies a lower-cased value (e.g., nodepool), but you’re passing eventData.Kind directly.

Proposed fix (set resource_type consistently + normalize casing)
 	if eventData.OwnedReference != nil {
-		ctx = logger.WithResourceType(ctx, eventData.Kind)
+		ctx = logger.WithResourceType(ctx, strings.ToLower(eventData.Kind))
 		ctx = logger.WithDynamicResourceID(ctx, eventData.Kind, eventData.ID)
 		ctx = logger.WithDynamicResourceID(ctx, eventData.OwnedReference.Kind, eventData.OwnedReference.ID)
 	} else {
+		ctx = logger.WithResourceType(ctx, strings.ToLower(eventData.Kind))
 		ctx = logger.WithDynamicResourceID(ctx, eventData.Kind, eventData.ID)
 	}
🤖 Fix all issues with AI agents
In @cmd/adapter/main.go:
- Around line 266-271: The call to broker.NewSubscriber uses the wrong argument
order: NewSubscriber expects subscriptionID first per v1.0.1; update the call
site where NewSubscriber is invoked (currently passing log and subscriptionID)
to pass subscriptionID as the first/only required parameter (i.e., call
NewSubscriber(subscriptionID)), remove the stray log argument, and keep existing
error handling around the returned subscriber and err unchanged.

In @pkg/logger/with_error_field_test.go:
- Around line 335-343: The test skip_parameter_works is flaky because it assumes
len(stack1) < len(stack0) but CaptureStackTrace has a maxFrames cap; instead
update the assertion to accept either a shorter stack1 or that stack1 equals
stack0 with the first frame removed: call CaptureStackTrace(0) and
CaptureStackTrace(1) and assert (1) len(stack1) < len(stack0) OR (2) when
lengths are equal, stack1 equals stack0[1:] (with proper bounds checks) so the
test reliably verifies the skip behavior; locate the test named
skip_parameter_works and the calls to CaptureStackTrace to apply this change.
🧹 Nitpick comments (14)
cmd/adapter/main.go (1)

168-184: Make tracer shutdown warning consistent with WithErrorField.

Right now the shutdown error is only in the formatted message (%v), so it won’t land in the structured error field.

Proposed diff
 	defer func() {
 		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
 		defer shutdownCancel()
 		if err := tp.Shutdown(shutdownCtx); err != nil {
-			log.Warnf(ctx, "Failed to shutdown TracerProvider: %v", err)
+			errCtx := logger.WithErrorField(ctx, err)
+			log.Warn(errCtx, "Failed to shutdown TracerProvider")
 		}
 	}()
pkg/logger/stack_trace.go (1)

18-99: Stack-trace blocklist is sensible, but double-check the “API server error” decision.

You currently skip stack traces for HyperFleet API server errors (IsServerError()), which are often the most actionable to have stacks for. If this is intentional to avoid storms, all good—just worth confirming.

internal/executor/resource_executor.go (1)

365-379: Add structured error field on deletion wait cancellation for consistency.

Proposed diff
 		case <-ctx.Done():
-			re.log.Warnf(ctx, "Context cancelled/timed out while waiting for deletion")
+			errCtx := logger.WithErrorField(ctx, ctx.Err())
+			re.log.Warn(errCtx, "Context cancelled/timed out while waiting for deletion")
 			return fmt.Errorf("context cancelled while waiting for resource deletion: %w", ctx.Err())
internal/executor/executor.go (7)

8-17: OTel imports/usage look consistent, but validate tracer naming strategy (high-cardinality risk).
Using otel.Tracer(componentName) where componentName comes from config may create many distinct tracer names and make telemetry harder to reason about; consider a stable instrumentation name (and put component/config name as span attributes instead).


114-125: Log the wrapped phase error (precondErr) to keep the error field consistent with result.Errors.
You build precondErr := fmt.Errorf("precondition evaluation failed: ...") but log precondOutcome.Error. Consider logging precondErr so the structured error matches what you store/return.


144-151: Same consistency issue: prefer logging the wrapped resErr over the raw err.
This makes the structured error field self-describing without needing surrounding message text.


169-175: Same consistency issue: prefer logging the wrapped postErr over the raw err.


185-193: Final combined error is nondeterministic (map iteration order) and may interact poorly with stack-trace capture heuristics.
Because result.Errors is a map, the joined string order will vary between runs, which hurts grep/diffing and can complicate alert de-duplication. Consider sorting phases before joining. Also consider whether combinedErr should be treated as “unexpected” by shouldCaptureStackTrace (it’s a synthetic string error).

Proposed fix (stable ordering)
+		// Combine all errors into a single error for logging (stable order)
 		var errMsgs []string
-		for phase, err := range result.Errors {
-			errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, err))
-		}
+		phases := make([]ExecutionPhase, 0, len(result.Errors))
+		for phase := range result.Errors {
+			phases = append(phases, phase)
+		}
+		sort.Slice(phases, func(i, j int) bool { return string(phases[i]) < string(phases[j]) })
+		for _, phase := range phases {
+			errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, result.Errors[phase]))
+		}
 		combinedErr := fmt.Errorf("execution failed: %s", strings.Join(errMsgs, "; "))
 		errCtx := logger.WithErrorField(ctx, combinedErr)
 		e.log.Errorf(errCtx, "Event execution finished: event_execution_status=failed")

(Requires adding import "sort".)


210-225: startTracedExecution(): good placement for trace/log correlation; consider span attributes instead of dynamic tracer names.
WithOTelTraceContext() right after Start() is correct. The main suggestion: use a stable tracer name and attach componentName (and maybe event kind/id) as attributes on the span.


233-247: CreateHandler(): message format change is fine since event_id remains in structured fields via ctx.
One minor gotcha: evt.Time() can be zero; if that’s possible in your event sources, consider a fallback or omit when zero to avoid 0001-... spam.

pkg/logger/context.go (4)

13-46: Key taxonomy is clear; confirm downstream log formatters/exporters preserve these field names unchanged.
Especially error and stack_trace (some pipelines remap/strip), and the dynamic *_id fields introduced via WithDynamicResourceID.


106-114: WithResourceType comment implies normalized values, but the helper doesn’t enforce it.
Either normalize inside WithResourceType (e.g., strings.ToLower(resourceType)) or ensure all call sites pass the canonical value; otherwise filtering becomes inconsistent.


116-134: K8s setters are straightforward; consider constraining k8s_result values to avoid free-form strings.
Optional: define constants (e.g., K8sResultSuccess, K8sResultFailed) to keep dashboards/alerts consistent.


151-168: WithErrorField(): good nil-guard + conditional stack traces; verify shouldCaptureStackTrace coverage for wrapped/joined errors.
This is a key performance/signal lever under high event load—ensure expected operational failures don’t suddenly start emitting stack traces because they’re wrapped differently upstream. Based on learnings, this aligns with the ongoing logger refactor work tracked in HYPERFLEET-304.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3897794 and d704281.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • cmd/adapter/main.go
  • cmd/adapter/tracing.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • internal/executor/precondition_executor.go
  • pkg/logger/logger_test.go
  • internal/executor/post_action_executor.go
  • cmd/adapter/tracing.go
  • pkg/logger/logger.go
  • internal/hyperfleet_api/client.go
  • go.mod
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
🧬 Code graph analysis (4)
internal/executor/executor.go (1)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • WithResourceType (107-109)
  • WithDynamicResourceID (81-84)
  • WithOTelTraceContext (191-208)
pkg/logger/with_error_field_test.go (5)
pkg/logger/context.go (2)
  • WithErrorField (156-168)
  • GetLogFields (215-228)
internal/config_loader/types.go (1)
  • Resource (229-234)
pkg/errors/k8s_error.go (3)
  • NewK8sResourceKeyNotFoundError (74-81)
  • NewK8sInvalidPathError (96-102)
  • NewK8sResourceDataError (128-136)
pkg/errors/api_error.go (1)
  • NewAPIError (123-134)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
internal/executor/utils.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/context.go (1)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
🔇 Additional comments (9)
cmd/adapter/main.go (1)

146-154: Consistent error-field logging via WithErrorField looks solid.

Nice cleanup for structured filtering/correlation (error message in fields rather than embedded in log strings).

Also applies to: 186-193, 195-203, 205-217, 241-257, 259-271, 275-281, 308-316, 329-341

internal/executor/utils.go (1)

44-50: Template render failure logging is improved (structured error field).

pkg/logger/stack_trace.go (1)

110-134: CaptureStackTrace implementation looks correct and straightforward.

internal/executor/resource_executor.go (2)

70-114: K8s log-context enrichment (kind/name/namespace/observed_generation) is a good improvement.

Also applies to: 140-143


156-175: Failure/success result contexts (k8s_result + error field) are consistent and useful for filtering.

internal/executor/executor.go (2)

55-59: Span lifecycle handling is correct (ctx enrichment + deferred End).
ctx, span := ...; defer span.End() is the right shape; the only follow-up is the tracer naming/attribute suggestion mentioned elsewhere.


61-70: Good: error is attached as a structured field (and wrapped error keeps context).
Logging via WithErrorField(ctx, parseErr) preserves the higher-level failure message in the error field.

pkg/logger/context.go (2)

3-8: Imports are coherent; OTEL dependency usage is minimal and isolated.
No issues with the added strings and otel/trace usage.


170-208: WithOTelTraceContext(): correct extraction pattern; safe no-op when span context is invalid.
One check: ensure you call this after span creation in each relevant flow (you did in Executor), and confirm your logger implementation always includes GetLogFields(ctx) so these IDs actually hit the output.

@xueli181114
Copy link
Contributor Author

trace context from CloudEvents anywhere

Right, that part is missing.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/hyperfleet_api/client.go (1)

328-328: Consider using resolvedURL for consistency.

The log message at line 328 uses req.URL (the original URL) rather than resolvedURL (the resolved URL with base URL prepended). For consistency with the span attributes and actual HTTP request, consider logging the resolved URL.

📝 Suggested change for consistency
-	c.log.Debugf(ctx, "HyperFleet API request: %s %s", req.Method, req.URL)
+	c.log.Debugf(ctx, "HyperFleet API request: %s %s", req.Method, resolvedURL)
internal/executor/executor.go (1)

186-194: Consider deterministic error ordering.

The map iteration order for result.Errors is not guaranteed in Go, so the combined error message may have different phase orderings across runs. For production debugging, this is typically fine, but if consistent log output is desired for comparison, consider sorting by phase.

♻️ Optional: deterministic error ordering
+	import "sort"
...
 	// Combine all errors into a single error for logging
 	var errMsgs []string
+	var phases []ExecutionPhase
 	for phase, err := range result.Errors {
+		phases = append(phases, phase)
+	}
+	sort.Slice(phases, func(i, j int) bool { return phases[i] < phases[j] })
+	for _, phase := range phases {
+		err := result.Errors[phase]
 		errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, err))
 	}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d704281 and 3a47a63.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
  • pkg/otel/propagation.go
  • pkg/otel/tracer.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/otel/tracer.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • go.mod
  • pkg/logger/with_error_field_test.go
  • internal/executor/precondition_executor.go
  • internal/executor/utils.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
🧬 Code graph analysis (9)
internal/hyperfleet_api/client.go (1)
pkg/logger/context.go (1)
  • WithOTelTraceContext (191-208)
cmd/adapter/main.go (2)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/otel/tracer.go (2)
  • GetTraceSampleRatio (35-55)
  • InitTracer (66-100)
pkg/logger/logger.go (1)
pkg/logger/context.go (3)
  • ComponentKey (15-15)
  • VersionKey (16-16)
  • HostnameKey (17-17)
internal/executor/post_action_executor.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/logger_test.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
internal/executor/executor.go (4)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • WithResourceType (107-109)
  • WithDynamicResourceID (81-84)
  • WithOTelTraceContext (191-208)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/logger/stack_trace.go (4)
pkg/errors/network_error.go (1)
  • IsNetworkError (24-75)
pkg/errors/api_error.go (1)
  • IsAPIError (144-150)
pkg/errors/k8s_error.go (3)
  • K8sResourceKeyNotFoundError (61-66)
  • K8sInvalidPathError (84-88)
  • K8sResourceDataError (105-112)
pkg/logger/context.go (2)
  • WithLogField (57-64)
  • StackTraceKey (21-21)
pkg/logger/context.go (1)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (6)
  • WithK8sKind (117-119)
  • WithK8sName (122-124)
  • WithK8sNamespace (127-129)
  • WithObservedGeneration (142-144)
  • WithK8sResult (132-134)
  • WithErrorField (156-168)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
🔇 Additional comments (23)
pkg/otel/propagation.go (1)

30-57: LGTM! Clean trace context extraction implementation.

The function correctly extracts W3C trace context from CloudEvent extensions with appropriate nil checks and early returns. The implementation aligns with the OpenTelemetry integration introduced in this PR and addresses the trace context propagation gap noted in the PR comments.

cmd/adapter/main.go (4)

169-185: LGTM! Proper OpenTelemetry initialization with graceful shutdown.

The tracer is initialized at the right point in the startup sequence (after config load, before client creation) and includes a proper shutdown handler with a reasonable 5-second timeout.


152-154: LGTM! Consistent error context enrichment throughout.

The WithErrorField pattern is applied consistently across all error handling paths, providing structured error logging with automatic stack trace capture for internal errors. This aligns well with the PR's objective to enhance structured logging fields.

Also applies to: 191-193, 201-203, 215-217, 245-248, 254-257, 269-271, 279-281, 295-296, 314-315, 333-334, 339-341


288-292: LGTM! Excellent documentation of error context limitations.

The comment clearly explains that stack traces captured in this error handler reflect the handler's location, not the error's origin in the broker library. This transparency is valuable for future debugging and sets appropriate expectations.


267-271: Broker library signature change is correctly implemented.

The broker.NewSubscriber call at line 267 correctly includes the logger parameter as the first argument, compatible with broker v1.0.1. Only one usage exists in the codebase and it is consistent.

pkg/logger/logger_test.go (1)

270-280: LGTM! Test updated to use context-based error handling.

The test has been correctly updated to use WithErrorField(ctx, err) instead of the removed WithError method, maintaining the same test coverage while aligning with the new error handling pattern.

internal/executor/post_action_executor.go (1)

40-41: LGTM! Consistent error context enrichment in executor.

The WithErrorField pattern is properly applied to both payload build failures and post-action execution failures, maintaining the existing error propagation logic while enhancing the logging context.

Also applies to: 61-62

internal/hyperfleet_api/client.go (1)

271-285: LGTM! Well-structured OpenTelemetry instrumentation for HTTP requests.

The implementation correctly:

  • Creates a child span with the same trace_id and a new span_id
  • Uses method-only span naming to avoid high cardinality
  • Includes URL as a span attribute for debugging
  • Enriches the logger context with OTEL trace information
  • Injects trace context into outgoing HTTP headers via W3C Trace Context format

The span is properly closed with defer and the trace context will propagate to downstream services.

Also applies to: 323-325

pkg/logger/logger.go (1)

146-151: LGTM!

Good refactor to use the defined context key constants instead of string literals. This ensures consistency between base logger fields and context-based fields throughout the codebase.

pkg/logger/stack_trace.go (3)

18-49: Well-designed extensible skip list.

The blocklist approach using a slice of checker functions is clean and extensible. The coverage of context errors, network errors, API errors, and K8s errors correctly identifies high-frequency operational errors that don't warrant stack traces.


51-65: Verify intent: ServerError included in skip list.

apiErr.IsServerError() is included, which means 5xx server errors won't get stack traces. This is likely intentional since server errors are typically transient upstream issues rather than local bugs. Just confirming this aligns with the team's debugging expectations.


110-134: LGTM!

The stack trace capture implementation is correct. The skip calculation (skip+2) properly accounts for runtime.Callers and CaptureStackTrace itself. The frame limit of 32 is reasonable, and the output format is clear for debugging.

internal/executor/resource_executor.go (4)

76-81: LGTM!

Good practice to enrich the logging context with K8s resource metadata (kind, name, namespace) early in the execution flow. This ensures all subsequent logs in executeResource include resource identity for easier correlation.


107-114: LGTM!

Clean refactor to extract manifest generation once and add it to the logging context early. The conditional manifestGen > 0 correctly handles the case where the annotation is missing or invalid.


155-175: LGTM!

The error handling pattern is well-designed: scoped contexts (errCtx, successCtx) are created for specific log statements without polluting the main execution context. This correctly uses WithErrorField and WithK8sResult to add structured fields for log filtering and correlation.


363-385: LGTM!

Appropriate log levels: Debug for expected polling states, Warn for context cancellation, and Error with structured error field for unexpected issues. The logging changes appropriately reduce verbosity while maintaining observability.

internal/executor/executor.go (3)

211-226: LGTM!

Well-documented method that correctly initializes the OTel span and enriches the logging context with trace information. Using e.config.AdapterConfig.Metadata.Name as the tracer component name provides good trace granularity per adapter.


65-66: Consistent error handling across phases.

The uniform pattern of errCtx := logger.WithErrorField(ctx, err) across all execution phases (ParamExtraction, Preconditions, Resources, PostActions) ensures consistent structured error logging with conditional stack traces.

Also applies to: 121-122, 150-151, 174-175


239-251: LGTM!

Good integration of W3C trace context extraction from CloudEvent extensions. The ExtractTraceContextFromCloudEvent call enables distributed tracing correlation when upstream services include traceparent/tracestate in CloudEvents. The event ID remains in context via WithEventID, so removing it from the log message reduces redundancy.

pkg/logger/context.go (4)

13-46: LGTM!

Well-organized context key constants grouped by purpose (required fields, error fields, correlation, resource, K8s, adapter-specific, dynamic). The naming convention is consistent and the comments clearly document each group's purpose.


116-134: LGTM!

Clean, consistent K8s field setters following the established pattern. These enable the resource metadata logging seen in resource_executor.go.


151-168: LGTM!

Well-documented WithErrorField implementation with proper nil check and conditional stack trace capture. The CaptureStackTrace(1) correctly skips WithErrorField itself to start the trace from the caller.


170-208: LGTM!

Excellent documentation explaining context immutability and parent/child span behavior. The implementation correctly validates the span context before extracting trace/span IDs, and the example in the doc comments clearly illustrates the expected log output format.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @cmd/adapter/main.go:
- Around line 267-272: The call to broker.NewSubscriber is using the wrong
signature: replace the current call that passes the logger with a call that
passes only the subscriptionID (i.e., call broker.NewSubscriber(subscriptionID))
and remove any logger argument; keep the existing error handling (err, errCtx,
logger.WithErrorField, log.Errorf and the returned fmt.Errorf) intact so that
upon failure you still annotate and log the error. If the subscriber needs
logging, consult broker docs and wire it via the broker config or a separate
setter rather than passing log into NewSubscriber.
🧹 Nitpick comments (4)
internal/hyperfleet_api/client.go (1)

274-282: Use a constant for the tracer name to improve maintainability.

The attribute names http.request.method and url.full correctly follow OpenTelemetry HTTP semantic conventions. However, the tracer name "http-client" is hardcoded at line 277. Consider defining a package-level constant for consistency and maintainability, as other instrumentation in the codebase (e.g., executor.go) uses dynamic tracer names.

📝 Suggested refactor: Use constant for tracer name
+// Tracer name for HTTP client instrumentation
+const tracerName = "http-client"
+
 // doRequest performs a single HTTP request without retry logic
 func (c *httpClient) doRequest(ctx context.Context, req *Request) (*Response, error) {
 	// Resolve URL (prepend base URL if relative)
 	resolvedURL := c.resolveURL(req.URL)
 
 	// Create child span for this HTTP request (new span_id, same trace_id as parent event)
 	// Use method-only span name to avoid high cardinality; URL is in attributes
 	spanName := fmt.Sprintf("HTTP %s", req.Method)
-	ctx, span := otel.Tracer("http-client").Start(ctx, spanName)
+	ctx, span := otel.Tracer(tracerName).Start(ctx, spanName)
 	span.SetAttributes(
 		attribute.String("http.request.method", req.Method),
 		attribute.String("url.full", resolvedURL),
 	)
 	defer span.End()
pkg/logger/with_error_field_test.go (1)

335-361: Consider simplifying the skip parameter test logic.

The test handles edge cases where maxFrames cap is reached, but the logic is complex. The comment explains the reasoning well, though a simpler assertion might be easier to maintain.

internal/executor/executor.go (1)

186-194: Map iteration order causes non-deterministic error message ordering.

Iterating over result.Errors map produces an unpredictable order of error messages in the combined string. This doesn't affect functionality but may complicate log analysis and testing.

♻️ Suggested fix for deterministic error ordering
+	import "sort"
+
 	// Combine all errors into a single error for logging
 	var errMsgs []string
+	var phases []ExecutionPhase
+	for phase := range result.Errors {
+		phases = append(phases, phase)
+	}
+	sort.Slice(phases, func(i, j int) bool {
+		return string(phases[i]) < string(phases[j])
+	})
-	for phase, err := range result.Errors {
+	for _, phase := range phases {
+		err := result.Errors[phase]
 		errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, err))
 	}
cmd/adapter/main.go (1)

339-342: Timeout error is constructed inline - consider using a constant or variable for the timeout value.

The timeout of 30 seconds is hardcoded in both the context creation (line 321) and the error message (line 339). Consider extracting this to a constant for consistency.

♻️ Suggested improvement
+const subscriberCloseTimeout = 30 * time.Second
+
 // Close subscriber gracefully with timeout
 log.Info(ctx, "Closing broker subscriber...")
-shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
+shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), subscriberCloseTimeout)
 defer shutdownCancel()

 // ... later in the select ...

 case <-shutdownCtx.Done():
-	err := fmt.Errorf("subscriber close timed out after 30 seconds")
+	err := fmt.Errorf("subscriber close timed out after %v", subscriberCloseTimeout)
 	errCtx := logger.WithErrorField(ctx, err)
 	log.Error(errCtx, "Subscriber close timed out")
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a47a63 and 0772444.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (17)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
  • pkg/otel/propagation.go
  • pkg/otel/tracer.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/otel/propagation.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • go.mod
  • cmd/adapter/main.go
🧬 Code graph analysis (9)
internal/executor/utils.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/otel/tracer.go (1)
pkg/logger/logger.go (1)
  • Logger (14-40)
internal/hyperfleet_api/client.go (2)
pkg/logger/context.go (1)
  • WithOTelTraceContext (191-208)
internal/config_loader/types.go (1)
  • Header (172-175)
pkg/logger/logger_test.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/stack_trace.go (4)
pkg/errors/network_error.go (1)
  • IsNetworkError (24-75)
pkg/errors/api_error.go (1)
  • IsAPIError (144-150)
pkg/errors/k8s_error.go (3)
  • K8sResourceKeyNotFoundError (61-66)
  • K8sInvalidPathError (84-88)
  • K8sResourceDataError (105-112)
pkg/logger/context.go (2)
  • WithLogField (57-64)
  • StackTraceKey (21-21)
pkg/logger/with_error_field_test.go (4)
pkg/logger/context.go (2)
  • WithErrorField (156-168)
  • GetLogFields (215-228)
pkg/errors/k8s_error.go (3)
  • NewK8sResourceKeyNotFoundError (74-81)
  • NewK8sInvalidPathError (96-102)
  • NewK8sResourceDataError (128-136)
pkg/errors/api_error.go (1)
  • NewAPIError (123-134)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
internal/executor/executor.go (4)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • WithResourceType (107-109)
  • WithDynamicResourceID (81-84)
  • WithOTelTraceContext (191-208)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/logger/logger.go (1)
pkg/logger/context.go (3)
  • ComponentKey (15-15)
  • VersionKey (16-16)
  • HostnameKey (17-17)
pkg/logger/context.go (1)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
🔇 Additional comments (31)
pkg/logger/context.go (5)

14-21: LGTM!

The error field constants are well-documented and align with the logging specification mentioned in the PR objectives.


28-37: LGTM!

The Kubernetes resource field constants follow consistent naming conventions and support the enhanced structured logging requirements in the PR objectives.


116-134: LGTM!

The Kubernetes context setter functions are well-implemented and follow the established pattern for adding structured fields to the logging context.


170-208: Excellent OpenTelemetry integration with comprehensive documentation!

The implementation correctly extracts trace and span IDs from the OpenTelemetry context with proper validation. The detailed docstring with examples effectively explains context immutability and the behavior across parent/child spans.


151-168: No issues found.

The implementation correctly handles nil errors and captures stack traces selectively. Both helper functions (shouldCaptureStackTrace and withStackTraceField) are properly defined in pkg/logger/stack_trace.go.

internal/executor/utils.go (2)

47-48: LGTM!

The refactored error handling properly uses WithErrorField to attach error details to the context, consistent with the new structured logging pattern introduced in this PR.


140-149: LGTM!

The error handling correctly distinguishes between network failures and HTTP errors, attaching comprehensive error details to the logging context. This aligns with the structured logging enhancements in the PR.

pkg/logger/logger_test.go (1)

270-280: LGTM!

The test correctly demonstrates the new API pattern where errors are attached to the context via WithErrorField(ctx, err) rather than the logger instance. The renamed subtest clearly reflects this change.

internal/hyperfleet_api/client.go (3)

271-272: LGTM!

Moving URL resolution before span creation allows the resolved URL to be captured in trace attributes, which is more useful for debugging than the template URL.


285-285: LGTM!

The logger context is correctly updated with the new span's trace context, ensuring that all logs generated during the request execution are properly correlated with the HTTP span.


323-325: Trace propagation correctly implemented for both HTTP requests and CloudEvents.

The code at lines 323-325 properly injects OpenTelemetry trace context into outgoing HTTP headers using the standard W3C Trace Context format. Additionally, trace context extraction from incoming CloudEvents is already implemented via ExtractTraceContextFromCloudEvent() in pkg/otel/propagation.go and is actively used in the executor to maintain trace continuity across event-driven systems.

Likely an incorrect or invalid review comment.

go.mod (1)

15-17: No action required — OpenTelemetry version configuration is safe.

The version difference between go.opentelemetry.io/otel/sdk (v1.36.0) and go.opentelemetry.io/otel / go.opentelemetry.io/otel/trace (v1.38.0) does not present a compatibility issue. OpenTelemetry follows semantic versioning where minor version releases (1.36 → 1.38) are backwards-compatible. The SDK at v1.36.0 is fully compatible with v1.38.0 of the core library and trace components.

pkg/logger/logger.go (1)

146-151: LGTM! Consistent key usage across the logger package.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) ensures the base logger fields align with the context keys defined in context.go, preventing key mismatches and improving maintainability.

pkg/logger/with_error_field_test.go (4)

20-80: Well-structured tests for WithErrorField core behavior.

Good coverage of nil handling, error message capture, stack trace behavior for unexpected errors, and preservation of existing context fields.


86-185: Thorough coverage of error classification for stack trace decisions.

The tests properly verify that context errors (Canceled, DeadlineExceeded, EOF) and K8s API errors (NotFound, Conflict, Forbidden, etc.) are correctly identified as expected errors that should skip stack trace capture.


187-276: Good coverage of custom error types and API errors.

Tests correctly verify that K8sResourceKeyNotFoundError, K8sInvalidPathError, K8sResourceDataError, and various HTTP status code API errors are treated as expected operational errors.


385-472: Solid integration tests verifying end-to-end behavior.

The integration tests effectively verify the complete flow from WithErrorField through stack trace capture decisions and field population in the context.

pkg/logger/stack_trace.go (4)

18-49: Well-designed extensible blocklist pattern.

The skipStackTraceCheckers slice allows easy addition of new error types without modifying the core logic. Good use of function predicates and clear categorization with comments.


51-65: Comprehensive API error classification.

The isExpectedAPIError function covers the common HTTP error scenarios (4xx and 5xx). Note that IsServerError() includes 5xx errors, which are expected operational errors in distributed systems where remote services may be temporarily unavailable.


67-79: Correct use of errors.As for custom error types.

The function properly uses errors.As to check for wrapped errors, ensuring the check works even when errors are wrapped with additional context.


110-133: Stack trace capture implementation looks correct.

The skip+2 offset correctly accounts for runtime.Callers and CaptureStackTrace itself. The maxFrames=32 limit prevents excessive memory allocation while capturing sufficient context for debugging.

pkg/otel/tracer.go (2)

28-55: Robust environment variable parsing with good defaults and validation.

The function handles missing values, parse errors, and out-of-range values gracefully with informative log messages. The default of 10% is reasonable for production systems.


66-100: No changes needed. The TracerProvider is intentionally configured without a SpanExporter, which is clearly documented in the function's docstring and inline comments. The design is explicitly for generating trace_id/span_id for log correlation and HTTP request propagation via W3C Trace Context headers, not for span export to a backend. The intent and rationale are already well-documented in the code.

internal/executor/executor.go (5)

57-59: Good integration of OpenTelemetry tracing into execution flow.

The span is properly created at the start and deferred for cleanup. Using startTracedExecution encapsulates the tracing setup and log context enrichment cleanly.


64-72: Consistent use of WithErrorField for error logging.

The error context pattern is applied correctly, separating the error field enrichment from the log message itself.


78-84: Clear dynamic resource ID logging for event correlation.

The logic correctly handles both owned resources (e.g., NodePool with Cluster owner) and standalone resources, providing appropriate log field enrichment for each case.


211-226: Well-documented startTracedExecution helper.

The method documentation clearly explains the purpose of the span creation and log context enrichment. Using the adapter config name as the tracer name allows identifying traces per adapter instance.


239-251: CloudEvent trace context extraction enables distributed tracing.

The integration with ExtractTraceContextFromCloudEvent allows trace context from upstream services (e.g., Sentinel) to be propagated. Per the PR comments, trace context propagation from CloudEvents is noted as incomplete - this extraction is a good start.

cmd/adapter/main.go (3)

169-185: Proper OpenTelemetry lifecycle management.

The tracer is initialized after config loading (to use adapter name) and shutdown is deferred with a 5-second timeout. The warning log for shutdown failure is appropriate as it shouldn't block application exit.


152-154: Consistent error context pattern applied throughout startup.

Using WithErrorField before logging errors provides structured error fields and conditional stack traces, improving log searchability and debugging.


288-305: Good documentation of error context limitation in the monitoring goroutine.

The comment at lines 288-292 correctly acknowledges that stack traces captured here reflect the goroutine's location, not the error's origin in the broker library. This is a pragmatic acknowledgment of the limitation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @cmd/adapter/main.go:
- Line 274: The call to broker.NewSubscriber is using the wrong argument order;
change the invocation that currently passes log first (subscriber, err :=
broker.NewSubscriber(log, subscriptionID)) to pass subscriptionID first, e.g.,
call broker.NewSubscriber(subscriptionID) or
broker.NewSubscriber(subscriptionID, configMap) as appropriate; update the
variable subscriber and error handling remain the same and remove the misplaced
log argument so it matches the NewSubscriber(subscriptionID string, configMap
...map[string]string) signature.
🧹 Nitpick comments (1)
internal/executor/resource_executor.go (1)

165-174: Consider defining constants for K8s result values.

The strings "SUCCESS" and "FAILED" are used here and potentially elsewhere. Defining constants (e.g., K8sResultSuccess, K8sResultFailed) would improve consistency and prevent typos.

♻️ Suggested constants (optional)

Add to pkg/logger/context.go or a constants file:

const (
    K8sResultSuccess = "SUCCESS"
    K8sResultFailed  = "FAILED"
)

Then use:

-errCtx := logger.WithK8sResult(ctx, "FAILED")
+errCtx := logger.WithK8sResult(ctx, logger.K8sResultFailed)
...
-successCtx := logger.WithK8sResult(ctx, "SUCCESS")
+successCtx := logger.WithK8sResult(ctx, logger.K8sResultSuccess)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0772444 and c90912e.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
  • pkg/otel/propagation.go
  • pkg/otel/propagation_test.go
  • pkg/otel/tracer.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/otel/tracer.go
  • pkg/logger/logger_test.go
  • internal/executor/post_action_executor.go
  • internal/hyperfleet_api/client.go
  • internal/executor/precondition_executor.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • pkg/logger/with_error_field_test.go
  • go.mod
  • cmd/adapter/main.go
🧬 Code graph analysis (8)
pkg/logger/stack_trace.go (4)
pkg/errors/network_error.go (1)
  • IsNetworkError (24-75)
pkg/errors/api_error.go (1)
  • IsAPIError (144-150)
pkg/errors/k8s_error.go (3)
  • K8sResourceKeyNotFoundError (61-66)
  • K8sInvalidPathError (84-88)
  • K8sResourceDataError (105-112)
pkg/logger/context.go (2)
  • WithLogField (57-64)
  • StackTraceKey (21-21)
pkg/otel/propagation_test.go (1)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
internal/executor/executor.go (4)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • WithResourceType (107-109)
  • WithDynamicResourceID (81-84)
  • WithOTelTraceContext (191-208)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (6)
  • WithK8sKind (117-119)
  • WithK8sName (122-124)
  • WithK8sNamespace (127-129)
  • WithObservedGeneration (142-144)
  • WithK8sResult (132-134)
  • WithErrorField (156-168)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
internal/executor/utils.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/logger.go (1)
pkg/logger/context.go (3)
  • ComponentKey (15-15)
  • VersionKey (16-16)
  • HostnameKey (17-17)
pkg/logger/context.go (1)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
cmd/adapter/main.go (1)
pkg/otel/tracer.go (2)
  • GetTraceSampleRatio (35-55)
  • InitTracer (66-100)
🔇 Additional comments (31)
internal/executor/utils.go (1)

47-48: LGTM! Clean structured error logging.

The change to use WithErrorField properly separates the error details into structured log fields, making filtering and correlation easier.

pkg/otel/propagation.go (1)

30-57: LGTM! Well-implemented trace context extraction.

The implementation correctly follows the W3C Trace Context specification with proper nil checks and graceful handling of missing extensions. The documentation with example CloudEvent format is helpful.

pkg/otel/propagation_test.go (1)

18-241: LGTM! Comprehensive test coverage.

The test suite thoroughly covers all edge cases for trace context extraction:

  • Nil/missing/empty inputs return unchanged context
  • Valid traceparent extracts trace ID, span ID, and sampling flag
  • Tracestate preservation when present
  • Graceful handling of invalid formats
  • Type safety checks for non-string values
  • Unsampled trace context handling

The table-driven test for invalid formats is particularly well-structured.

pkg/logger/stack_trace.go (5)

21-49: LGTM! Smart performance optimization.

The skip list comprehensively covers expected operational errors (context cancellation, network issues, HTTP status codes, K8s API errors). This prevents unnecessary stack trace capture during high-frequency error scenarios like timeouts or rate limiting, which is crucial for system performance.


52-65: LGTM! Comprehensive API error classification.

The function correctly identifies expected API errors that shouldn't trigger stack traces (4xx client errors, 5xx server errors, rate limiting, timeouts). This aligns well with the operational error classification strategy.


68-79: LGTM! Proper error type checking.

The function correctly uses errors.As to check for expected Kubernetes resource data errors. This is the idiomatic way to handle wrapped errors in Go.


85-99: LGTM! Clear decision logic.

The function provides a clean decision point for stack trace capture. By iterating through all checkers and defaulting to true for unmatched errors, it ensures that unexpected errors (which likely indicate bugs) get the detailed stack trace they need for investigation.


115-134: LGTM! Correct stack trace capture implementation.

The function correctly implements stack trace capture:

  • The skip+2 offset properly accounts for runtime.Callers and CaptureStackTrace itself
  • The maxFrames constant prevents unbounded memory growth
  • The format "file:line function" provides clear, actionable debugging information
  • Proper use of runtime.CallersFrames for iteration
go.mod (2)

10-10: Broker library version v1.0.1 and API changes are correctly implemented.

The broker.NewSubscriber call in cmd/adapter/main.go:274 has been properly updated to include the required logger parameter as the first argument. All call sites have been migrated successfully.


15-17: OpenTelemetry v1.38.0 versions are confirmed safe and legitimate.

All three OpenTelemetry packages at v1.38.0 exist on the Go module proxy (released 2025-08-29). The release notes show standard maintenance updates and new features with no known security vulnerabilities. Note that v1.39.0 is available as the latest patch; consider upgrading if you need recent fixes to the Prometheus exporter or cardinality limit features mentioned in v1.38.0's changelog.

pkg/logger/logger.go (1)

146-151: LGTM! Consistent use of context key constants.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) ensures consistency between the base logger fields and the context-based log fields defined in context.go. This centralizes key definitions and reduces the risk of typos.

internal/executor/resource_executor.go (4)

76-81: LGTM! Kubernetes context enrichment for improved log correlation.

Adding WithK8sKind, WithK8sName, and WithK8sNamespace to the context early ensures all subsequent logs include consistent resource identification fields, enabling effective filtering and correlation in log aggregation systems.


107-113: LGTM! Efficient generation extraction and conditional context enrichment.

Extracting manifestGen once for both comparison and logging avoids redundant calls. Conditionally adding observed_generation only when > 0 ensures the log field is meaningful.


339-351: LGTM! Appropriate log level reduction for recreation operations.

Changing recreation logs to Debugf reduces verbosity at Info level while keeping the details available for debugging when needed.


366-382: LGTM! Well-structured error handling in deletion wait loop.

The log levels are appropriate:

  • Warnf for context cancellation (expected but noteworthy)
  • Debugf for successful deletion confirmation
  • Errorf with WithErrorField for unexpected errors

The contextual error logging pattern is consistent with the PR's objective.

pkg/logger/with_error_field_test.go (4)

1-80: LGTM! Comprehensive unit tests for WithErrorField.

The tests cover essential scenarios: nil error handling, error message preservation, stack trace capture for unexpected errors, and preservation of existing context fields. Good foundation for the new error handling approach.


335-361: LGTM! Skip parameter test handles edge cases well.

The test correctly accounts for the maxFrames cap edge case where skip=1 might not reduce frame count but shifts the stack. The comparison logic validates that frames are properly shifted.


86-309: LGTM! Thorough coverage of stack trace capture decisions.

The table-driven tests comprehensively cover:

  • Context errors (context.Canceled, context.DeadlineExceeded)
  • K8s API errors (NotFound, Conflict, Forbidden, etc.)
  • K8s resource data errors (custom application errors)
  • API errors (various HTTP status codes)
  • Unexpected errors that should trigger stack trace capture

This ensures the selective stack trace mechanism works correctly for operational efficiency.


385-472: LGTM! Integration tests validate end-to-end behavior.

The integration tests verify the complete flow of WithErrorField including stack trace presence/absence based on error type. This catches any regressions in the integration between shouldCaptureStackTrace and WithErrorField.

internal/executor/executor.go (4)

57-59: LGTM! OpenTelemetry span lifecycle management.

Starting the span at the beginning of Execute and deferring span.End() ensures proper tracing coverage for the entire execution. The helper method startTracedExecution cleanly encapsulates the setup.


211-226: LGTM! Well-documented tracing helper.

The startTracedExecution method cleanly encapsulates:

  1. Span creation with component-specific tracer name
  2. Log context enrichment with trace/span IDs

The documentation clearly explains the method's purpose and the automatic propagation behavior.


239-251: Trace context extraction enables distributed tracing.

The ExtractTraceContextFromCloudEvent call before Execute ensures that if upstream services include traceparent/tracestate in CloudEvent extensions, the trace will be properly continued. However, per the PR comments, this may not be complete if the upstream (Sentinel) is not yet propagating trace context.

The PR comments indicate trace context propagation from CloudEvents is "missing" (not yet implemented in upstream). Verify with the team whether:

  1. The upstream (Sentinel) needs to be updated to include traceparent in CloudEvent extensions
  2. If this extraction logic is sufficient once Sentinel is updated

186-194: LGTM! Consolidated error logging for failed executions.

Combining all phase errors into a single log entry provides a complete picture of failures. Note that Go map iteration order is non-deterministic, so the order of phases in the error message may vary between runs, but this is acceptable for logging purposes.

cmd/adapter/main.go (4)

45-49: LGTM! OpenTelemetry shutdown timeout constant.

5 seconds is a reasonable timeout for graceful TracerProvider shutdown, allowing pending spans to be exported while not blocking shutdown indefinitely.


175-192: LGTM! OpenTelemetry initialization with graceful shutdown.

The initialization is correctly placed after configuration load (to get adapter name) and before component creation (so traces are available). The deferred shutdown with timeout ensures clean exit.


295-312: LGTM! Well-documented error context limitations.

The comment at lines 295-299 correctly notes that stack traces captured here reflect the goroutine's call stack, not the error's origin in the broker library. This is a known limitation and documenting it helps future maintainers understand the tracing behavior.


158-159: Consistent error context pattern across startup paths.

All error handling paths now use WithErrorField to attach structured error information to the log context. This enables consistent error field formatting and optional stack trace capture based on error type.

pkg/logger/context.go (4)

13-46: LGTM! Well-organized context key constants.

The constants are logically grouped:

  • Required fields (component, version, hostname)
  • Error fields (error, stack_trace)
  • Correlation fields (trace_id, span_id, event_id)
  • Resource fields (cluster_id, resource_type, resource_id)
  • K8s manifest fields (k8s_kind, k8s_name, k8s_namespace, k8s_result)
  • Adapter-specific fields

This organization improves readability and maintainability.


116-134: LGTM! Kubernetes context helper functions.

The WithK8sKind, WithK8sName, WithK8sNamespace, and WithK8sResult helpers provide a clean API for enriching log context with Kubernetes resource information, consistent with the existing pattern.


151-168: LGTM! Well-documented WithErrorField with selective stack trace capture.

The implementation correctly:

  1. Returns context unchanged for nil errors
  2. Adds error message to context
  3. Captures stack traces only for unexpected/internal errors (via shouldCaptureStackTrace)

The documentation clearly explains the performance rationale for selective stack trace capture.


170-208: Excellent documentation for WithOTelTraceContext.

The documentation thoroughly explains:

  1. The purpose (log correlation with distributed traces)
  2. Immutability semantics (safe to call multiple times)
  3. Example flow showing parent/child span behavior
  4. Expected log output format

This level of documentation helps developers understand the tracing integration correctly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @go.mod:
- Line 10: The project currently pins OpenTelemetry packages at v1.38.0; update
the otel-related module lines in go.mod for go.opentelemetry.io/otel,
go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace, and
go.opentelemetry.io/otel/metric to v1.39.0, then run `go get`/`go mod tidy` to
resolve deps; also ensure the project is compiled with a Go toolchain >=1.24 and
audit any Prometheus exporter configuration (invalid label/value handling) to
accommodate the Prometheus behavior change in v1.39.0.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c90912e and d522234.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
  • pkg/otel/propagation.go
  • pkg/otel/propagation_test.go
  • pkg/otel/tracer.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/otel/tracer.go
  • internal/executor/precondition_executor.go
  • pkg/logger/logger.go
  • pkg/logger/with_error_field_test.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
  • go.mod
🧬 Code graph analysis (9)
pkg/logger/stack_trace.go (4)
pkg/errors/network_error.go (1)
  • IsNetworkError (24-75)
pkg/errors/api_error.go (1)
  • IsAPIError (144-150)
pkg/errors/k8s_error.go (3)
  • K8sResourceKeyNotFoundError (61-66)
  • K8sInvalidPathError (84-88)
  • K8sResourceDataError (105-112)
pkg/logger/context.go (2)
  • WithLogField (57-64)
  • StackTraceKey (21-21)
internal/hyperfleet_api/client.go (2)
pkg/logger/context.go (1)
  • WithOTelTraceContext (191-208)
internal/config_loader/types.go (1)
  • Header (172-175)
cmd/adapter/main.go (2)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/otel/tracer.go (2)
  • GetTraceSampleRatio (35-55)
  • InitTracer (66-100)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (6)
  • WithK8sKind (117-119)
  • WithK8sName (122-124)
  • WithK8sNamespace (127-129)
  • WithObservedGeneration (142-144)
  • WithK8sResult (132-134)
  • WithErrorField (156-168)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
internal/executor/executor.go (4)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • WithResourceType (107-109)
  • WithDynamicResourceID (81-84)
  • WithOTelTraceContext (191-208)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/otel/propagation_test.go (2)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/errors/error.go (1)
  • New (107-123)
internal/executor/utils.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/context.go (1)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
internal/executor/post_action_executor.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
🔇 Additional comments (33)
pkg/otel/propagation_test.go (2)

13-16: LGTM: Test setup properly initializes global propagator.

The init() function correctly sets up the global propagator for tests, ensuring consistent behavior across all test cases.


18-241: LGTM: Comprehensive test coverage.

The test suite thoroughly covers all scenarios including edge cases:

  • Nil/missing/empty traceparent handling
  • Valid trace context extraction with correct trace ID, span ID, and sampling flags
  • Tracestate preservation
  • Invalid format handling (malformed, wrong version, invalid IDs)
  • Type safety (non-string traceparent)
  • Unsampled traces

This provides excellent confidence in the implementation's correctness and robustness.

internal/executor/utils.go (2)

47-48: LGTM: Consistent migration to structured error logging.

The change from inline error logging to WithErrorField improves structured logging by moving error details into log fields. This enables better filtering and correlation while maintaining all error information.


141-148: LGTM: Proper error handling for both network and HTTP failures.

The logic correctly distinguishes between network errors (err != nil) and HTTP-level failures (non-success status code), creating an appropriate error context for structured logging. The error details are preserved in the log fields via WithErrorField.

pkg/logger/logger_test.go (1)

270-280: LGTM: Test updated to match new error field API.

The test correctly validates that WithErrorField integrates properly with logger chaining, aligning with the migration from inline error logging to context-based error fields.

pkg/otel/propagation.go (1)

30-57: Implementation is correct; global propagator is properly initialized.

The trace context extraction logic correctly follows the W3C Trace Context specification. The nil checks and early returns are appropriate, and the use of otel.GetTextMapPropagator() is the standard approach.

The global OTEL propagator is initialized in pkg/otel/tracer.go within the InitTracer() function (line 97: otel.SetTextMapPropagator(propagation.TraceContext{})), which is called at application startup in cmd/adapter/main.go (line 179). No action needed.

internal/executor/post_action_executor.go (2)

40-41: LGTM! Error context enrichment is correctly applied.

The error context wrapping with WithErrorField follows the established pattern and enhances structured logging for build payload failures.


61-62: LGTM! Error context enrichment is correctly applied.

The error context wrapping with WithErrorField follows the established pattern and enhances structured logging for post-action failures.

internal/hyperfleet_api/client.go (2)

271-285: LGTM! OpenTelemetry tracing properly integrated.

The implementation correctly:

  • Resolves the URL once and reuses it for both tracing and the HTTP request
  • Creates a span with a method-only name to avoid high cardinality issues
  • Uses semantic conventions for HTTP attributes (http.request.method, url.full)
  • Updates logger context with the new span_id via WithOTelTraceContext
  • Manages span lifecycle with defer span.End()

323-325: LGTM! Trace context propagation correctly implemented.

The W3C Trace Context propagation via the traceparent header enables distributed tracing across service boundaries. This allows downstream services to correlate requests within the same trace.

pkg/logger/stack_trace.go (2)

21-49: LGTM! Comprehensive error categorization for selective stack trace capture.

The skip list appropriately excludes high-frequency operational errors (context cancellation, network errors, expected API errors) to avoid performance overhead during error storms while capturing traces for unexpected errors.


115-134: LGTM! Stack trace capture implementation is sound.

The implementation:

  • Uses a reasonable frame limit (32) to prevent excessive memory usage
  • Correctly skips internal frames (+2 for runtime.Callers and CaptureStackTrace itself)
  • Returns nil for empty stacks (graceful handling)
  • Formats frames consistently as "file:line function"
internal/executor/executor.go (6)

57-59: LGTM! OpenTelemetry span lifecycle properly managed.

The span creation at the start of execution with deferred cleanup ensures proper trace propagation throughout the execution phases.


65-66: LGTM! Consistent error context enrichment across execution phases.

The error wrapping pattern with WithErrorField is consistently applied across all execution phases (parameter extraction, preconditions, resources, post actions), enabling structured error logging with selective stack trace capture.

Also applies to: 121-122, 150-151, 174-175


76-84: LGTM! Dynamic resource ID logging enhances traceability.

The conditional logic correctly handles both owned resources (e.g., NodePool owned by Cluster) and standalone resources, setting appropriate context fields for filtering and correlation.


186-193: LGTM! Combined error logging improves failure diagnostics.

Aggregating all phase errors into a single error message provides better context for debugging multi-phase failures.


211-226: LGTM! Trace context setup is well-structured.

The startTracedExecution method properly creates an OTEL span using the adapter component name and enriches the logger context with trace IDs for correlation.


239-242: Inconsistency: CloudEvent trace context extraction is implemented but PR comments suggest it's missing.

The code clearly implements trace context extraction from CloudEvents at lines 239-242 using ExtractTraceContextFromCloudEvent. However, the PR comments summary states: "One comment thread notes that trace context propagation from CloudEvents is missing... confirmed that this part is not yet implemented: 'Right, that part is missing.'"

This implementation extracts traceparent and tracestate from CloudEvent extensions and propagates the trace context, which appears to address the missing functionality mentioned in the comments.

Could you clarify whether:

  1. The comment thread refers to a different aspect of trace context propagation?
  2. The implementation was added after those comments were made?
  3. There's additional trace context work still pending beyond what's implemented here?
cmd/adapter/main.go (5)

45-49: LGTM! Reasonable timeout for graceful shutdown.

The 5-second timeout for TracerProvider shutdown provides sufficient time for pending spans to be exported while preventing indefinite hanging during shutdown.


175-192: LGTM! OpenTelemetry initialization follows best practices.

The implementation correctly:

  • Retrieves configurable sample ratio from environment
  • Initializes the tracer with service name and version
  • Defers shutdown with timeout to prevent resource leaks
  • Logs initialization failures with error context

158-159: LGTM! Comprehensive error context enrichment throughout startup and shutdown paths.

The consistent application of WithErrorField across all error scenarios (config loading, client creation, environment validation, subscription errors, shutdown errors) enables structured logging with selective stack trace capture.

Also applies to: 198-199, 208-209, 222-223, 252-255, 261-264, 276-277, 286-287, 302-303, 321-322, 340-341, 346-348


274-274: Note: broker.NewSubscriber API change.

The signature of broker.NewSubscriber has been updated to accept a logger as the first parameter. This is a breaking change to the broker library's public API and affects all consumers of this library.

Ensure that:

  1. This API change is documented in the broker library's changelog
  2. Other consumers of the broker library are updated accordingly
  3. The version bump of the broker library follows semantic versioning (major version bump for breaking changes)

295-299: Good documentation of stack trace limitation.

The comment correctly explains that stack traces captured in this error handler goroutine reflect the handler's call stack, not the error's origin in the broker library. This is important context for developers debugging subscription errors.

internal/executor/resource_executor.go (6)

76-80: LGTM! Early K8s context enrichment.

The K8s resource context fields (kind, name, namespace) are added early, ensuring all subsequent logs in this execution path include these fields for better filtering and correlation. The comment helpfully distinguishes these from event-level resource_type/resource_id fields.


107-113: LGTM! Single-pass generation extraction with early context enrichment.

Extracting the manifest generation once and adding it to the context early ensures consistent logging of observed_generation across all downstream operations without redundant extraction calls.


141-142: LGTM! Operation decision logging.

The log message clearly indicates the operation decision with operation and reason. K8s identity fields (kind, name, namespace) are now provided via context rather than duplicated in the message, following the structured logging pattern.


165-174: LGTM! Context-enriched error and success logging.

The error path correctly uses WithK8sResult("FAILED") and WithErrorField to capture error details and selective stack traces, while the success path uses WithK8sResult("SUCCESS"). Both maintain clear log messages with operation details, and the context chaining preserves immutability.


339-351: LGTM! Appropriate log levels for recreation steps.

The recreation flow steps (delete, wait, create) are logged at Debug level, which is appropriate for operational details that aren't user-critical. The higher-level operation decision and final result remain at Info level for visibility.


366-382: LGTM! Balanced logging in deletion wait loop.

The deletion wait logic appropriately uses:

  • Warn for context cancellation (unexpected condition)
  • Debug for deletion confirmation and polling progress (reduces noise)
  • Error with WithErrorField for unexpected errors (captures context)

This provides good troubleshooting visibility without excessive log volume.

pkg/logger/context.go (4)

116-134: LGTM! Consistent K8s context helpers.

The WithK8s* helper functions provide a clean API for enriching context with Kubernetes resource metadata. The consistent implementation pattern and clear documentation make these easy to use correctly across the codebase.


151-168: Verify selective stack trace capture logic in helper functions.

The WithErrorField implementation correctly handles nil errors and uses conditional stack trace capture. To ensure this approach avoids performance overhead under high load, manually verify:

  1. That shouldCaptureStackTrace() properly distinguishes between expected operational errors (network issues, auth failures, not found) and unexpected/internal errors.
  2. That withStackTraceField() and CaptureStackTrace(1) correctly capture and associate stack traces without introducing overhead.

170-208: This concern is incorrect. CloudEvents trace context propagation is already implemented in the codebase. The ExtractTraceContextFromCloudEvent function in pkg/otel/propagation.go extracts W3C Trace Context (traceparent and tracestate) from CloudEvent extensions, and is actively used in internal/executor/executor.go to enable distributed tracing across the request flow.

Likely an incorrect or invalid review comment.


7-46: No dependency verification needed—OpenTelemetry is already a project dependency.

The go.opentelemetry.io/otel/trace package is not a new dependency; it's already present in go.mod at v1.38.0 and is widely used throughout the codebase (in pkg/otel/, internal/executor/, and internal/hyperfleet_api/). The import in pkg/logger/context.go is consistent with the project's existing OTEL integration, and the trace package is properly used (via trace.SpanContextFromContext). Go 1.25.0 is compatible with OTEL v1.38.0.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @cmd/adapter/main.go:
- Around line 185-192: The deferred tracer shutdown logs use the outer ctx which
may be cancelled; change the error logging to use shutdownCtx instead. In the
defer that creates shutdownCtx/shutdownCancel and calls
tp.Shutdown(shutdownCtx), pass shutdownCtx into logger.WithErrorField and into
log.Warnf (replace usages of ctx with shutdownCtx) so the error context for
tp.Shutdown comes from the local shutdownCtx; keep the rest of the logic
(shutdown timeout and cancel) unchanged and reference tp.Shutdown, shutdownCtx,
shutdownCancel, logger.WithErrorField, and log.Warnf when updating the code.

In @internal/executor/executor.go:
- Around line 186-194: The combined error message built from the map
result.Errors is non-deterministic because Go map iteration order is random; fix
by collecting the map keys (phase) into a slice, sort them (add "sort" to
imports), then iterate the sorted keys to build errMsgs and combinedErr so the
log produced by logger.WithErrorField(ctx, combinedErr) and e.log.Errorf(...) is
deterministic and stable across runs.
🧹 Nitpick comments (3)
pkg/otel/propagation_test.go (1)

13-16: Global propagator setup in init() may affect test isolation.

Setting the global propagator in init() affects all tests in the package. This is acceptable for this test file, but be aware that if other test files in the otel package rely on a different propagator configuration, they could be affected.

Consider using t.Cleanup for better isolation
func TestExtractTraceContextFromCloudEvent(t *testing.T) {
	// Save and restore the original propagator
	originalPropagator := otel.GetTextMapPropagator()
	otel.SetTextMapPropagator(propagation.TraceContext{})
	t.Cleanup(func() {
		otel.SetTextMapPropagator(originalPropagator)
	})
	// ... rest of tests
}
internal/executor/executor.go (2)

211-226: Consider setting span status on execution errors.

The span is created and ended, but span status isn't set on errors. OTEL best practice is to call span.SetStatus(codes.Error, msg) when the operation fails, enabling better observability in tracing backends.

Suggested enhancement

Return the span from startTracedExecution as you do, but consider adding span status updates in the Execute method before returning failed results:

import "go.opentelemetry.io/otel/codes"

// In Execute, before returning a failed result:
if result.Status == StatusFailed {
    span.SetStatus(codes.Error, "execution failed")
}

248-248: Execution result intentionally discarded.

The result is discarded because error handling is performed internally within Execute. Since CreateHandler always returns nil (ACK strategy per comment on lines 232-233), this is consistent with the design. However, consider whether metrics or span attributes should capture the execution status for observability.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d522234 and 53e7e06.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
  • pkg/otel/propagation.go
  • pkg/otel/propagation_test.go
  • pkg/otel/tracer.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/validator_test.go
  • internal/config_loader/loader_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/hyperfleet_api/client.go
  • internal/executor/resource_executor.go
  • go.mod
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • cmd/adapter/main.go
  • internal/executor/utils.go
  • pkg/logger/with_error_field_test.go
🧬 Code graph analysis (9)
internal/executor/precondition_executor.go (3)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
internal/criteria/types.go (1)
  • Operator (10-10)
internal/config_loader/constants.go (1)
  • FieldValue (72-72)
pkg/logger/stack_trace.go (4)
pkg/errors/network_error.go (1)
  • IsNetworkError (24-75)
pkg/errors/api_error.go (1)
  • IsAPIError (144-150)
pkg/errors/k8s_error.go (3)
  • K8sResourceKeyNotFoundError (61-66)
  • K8sInvalidPathError (84-88)
  • K8sResourceDataError (105-112)
pkg/logger/context.go (2)
  • WithLogField (57-64)
  • StackTraceKey (21-21)
pkg/otel/propagation_test.go (1)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/logger/logger_test.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/otel/tracer.go (1)
pkg/logger/logger.go (1)
  • Logger (14-40)
pkg/logger/logger.go (1)
pkg/logger/context.go (3)
  • ComponentKey (15-15)
  • VersionKey (16-16)
  • HostnameKey (17-17)
internal/executor/post_action_executor.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
internal/executor/utils.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/with_error_field_test.go (4)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • GetLogFields (215-228)
  • WithEventID (97-99)
  • WithClusterID (102-104)
pkg/errors/k8s_error.go (3)
  • NewK8sResourceKeyNotFoundError (74-81)
  • NewK8sInvalidPathError (96-102)
  • NewK8sResourceDataError (128-136)
pkg/errors/api_error.go (1)
  • NewAPIError (123-134)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
🔇 Additional comments (34)
internal/executor/utils.go (2)

47-48: LGTM!

Correct use of WithErrorField to embed the error in the context for structured logging, keeping the log message generic.


141-148: Approve with note on method coverage.

The error handling logic correctly determines the appropriate error to log based on whether the original err is present or if the response indicates a non-success status. However, note that PUT, PATCH, and DELETE methods (lines 150-169) don't have similar failure logging. If this asymmetry is intentional (e.g., POST is more critical for debugging), consider adding a comment to clarify; otherwise, you may want to apply consistent logging for all methods that send a body.

internal/executor/precondition_executor.go (2)

41-42: LGTM!

Correct application of WithErrorField pattern for structured error logging. The error details are now embedded in the context rather than interpolated into the log message.


188-188: Good reduction of log noise.

Demoting non-matched condition logs from Infof to Debugf appropriately reduces log verbosity at info level while maintaining traceability at debug level.

internal/executor/post_action_executor.go (2)

40-41: LGTM!

Consistent use of WithErrorField for structured error logging when payload building fails.


61-62: LGTM!

Consistent use of WithErrorField for post-action failure logging, maintaining action name identification in the message while embedding error details in context.

pkg/logger/logger_test.go (1)

270-280: LGTM!

Test correctly updated to exercise the WithErrorField context-based error embedding pattern. The test validates that chaining with .With() continues to work after embedding an error in the context via WithErrorField.

pkg/otel/propagation.go (1)

30-57: Clean implementation of W3C trace context extraction.

The function correctly:

  • Handles nil event and nil extensions safely
  • Requires traceparent (per W3C spec) before attempting extraction
  • Treats tracestate as optional
  • Uses the global OpenTelemetry propagator for standard-compliant extraction

The function is properly integrated into the event handler at internal/executor/executor.go:242, where it's called early in the handler flow (after logging context setup but before event execution) to inject the distributed trace context. This addresses the PR comment about missing trace context propagation from CloudEvents, and comprehensive test coverage validates the implementation.

pkg/logger/logger.go (1)

146-151: LGTM - Consistent use of context keys for base logger fields.

Using the shared constants (ComponentKey, VersionKey, HostnameKey) from context.go ensures field name consistency across the logging system.

pkg/otel/propagation_test.go (1)

18-241: Comprehensive test coverage for trace context extraction.

The test suite covers all critical scenarios including nil events, missing extensions, invalid formats, and proper extraction of trace context fields. The table-driven approach for invalid traceparent formats (lines 143-176) is particularly well-structured.

pkg/logger/stack_trace.go (3)

21-49: Well-designed extensible skip list for stack trace capture.

The blocklist approach using a slice of checker functions is clean and maintainable. Adding new error types to skip is straightforward.


51-65: LGTM - Comprehensive API error classification.

Including IsServerError() in the skip list aligns with the PR objective to reduce stack trace noise for expected operational errors. Server errors from external APIs are expected in distributed systems.


115-134: LGTM - Clean stack trace capture implementation.

The skip+2 offset correctly accounts for runtime.Callers and CaptureStackTrace itself, ensuring the caller's frame is the first captured. The 32-frame limit is reasonable for debugging purposes.

pkg/otel/tracer.go (2)

28-55: LGTM - Robust environment variable parsing with proper validation and logging.

The function correctly handles all edge cases: empty value, parse errors, and out-of-range values (< 0 or > 1). Logging at appropriate levels (Info for defaults, Warn for invalid values) provides good operational visibility.


66-100: Verify: No trace exporter configured - traces will only generate IDs, not be exported.

The TracerProvider is created without any exporter (e.g., OTLP, Jaeger, stdout). This means trace IDs and span IDs will be generated for log correlation and propagation, but traces won't be sent to any backend for visualization.

If this is intentional (traces used only for correlation IDs and not for distributed tracing visualization), consider adding a comment explaining this design decision. Otherwise, an exporter would need to be added:

Example: Adding an OTLP exporter (if needed)
import (
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

// In InitTracer, before creating the TracerProvider:
exporter, err := otlptracegrpc.New(ctx)
if err != nil {
	return nil, fmt.Errorf("failed to create trace exporter: %w", err)
}

tp := sdktrace.NewTracerProvider(
	sdktrace.WithResource(res),
	sdktrace.WithSampler(sampler),
	sdktrace.WithBatcher(exporter), // Add exporter
)
pkg/logger/with_error_field_test.go (4)

20-80: LGTM - Thorough unit tests for WithErrorField.

The tests cover essential scenarios: nil error handling, error message propagation, stack trace capture for unexpected errors, and preservation of existing context fields. Good coverage of the core functionality.


86-309: Comprehensive test coverage for stack trace capture decisions.

The table-driven tests cover all error categories defined in skipStackTraceCheckers: context errors, K8s API errors, K8s resource data errors, HyperFleet API errors, and unexpected errors. Notably, wrapped errors are also tested (line 104), validating that errors.Is works correctly through the error chain.


335-361: Complex skip parameter test handles edge cases correctly.

The test logic accounts for the maxFrames cap scenario where skip=1 might not reduce frame count due to the limit. The verification that stack1[i] == stack0[i+1] confirms the skip behavior is working correctly even at the boundary.


385-472: Well-designed integration tests validating end-to-end behavior.

These tests verify the complete flow from WithErrorField through shouldCaptureStackTrace to the final context fields. Testing representative errors from each category (unexpected, K8s, context, API) ensures the integration works correctly.

internal/executor/executor.go (4)

57-59: LGTM on span lifecycle management.

The span is correctly started and deferred for cleanup. The trace context is properly propagated to the logger context via startTracedExecution.


65-66: Consistent error context handling.

Good use of WithErrorField to attach error details to the context before logging, aligning with the structured logging improvements.


78-84: Resource context logging is well-structured.

The dynamic resource ID logging correctly handles both owned resources (e.g., NodePool with cluster owner) and standalone resources (e.g., Cluster). The comments clearly explain the behavior.


239-251: Trace context extraction from CloudEvents enables distributed tracing.

This addresses the PR objective for trace context propagation. The ExtractTraceContextFromCloudEvent call correctly extracts W3C trace context from CloudEvent extensions before starting local spans.

cmd/adapter/main.go (5)

45-49: Well-defined timeout constant for tracer shutdown.

The 5-second timeout is reasonable for flushing pending spans to the collector.


175-183: OTel initialization with sample ratio configuration.

Good integration of environment-based trace sampling configuration. The error handling follows the established pattern with WithErrorField.


295-299: Helpful documentation about error context limitations.

The comment clearly explains that stack traces captured here reflect the goroutine's location rather than the broker library's error origin. This sets appropriate expectations for debugging.


250-265: Consistent error handling for missing environment variables.

Both required environment variables (BROKER_SUBSCRIPTION_ID and BROKER_TOPIC) are validated with proper error context before proceeding.


274-274: Broker API signature change requires logger parameter.

This aligns with the broker library version v1.0.1 specified in go.mod. The logger is now propagated to the subscriber for consistent logging.

pkg/logger/context.go (6)

14-21: Well-organized context key constants.

The grouping of keys by purpose (required fields, error fields, correlation, resource, K8s manifest, adapter-specific) improves readability and maintainability.

Also applies to: 33-37


116-134: K8s resource context setters follow established patterns.

The new WithK8sKind, WithK8sName, WithK8sNamespace, and WithK8sResult functions consistently use WithLogField like other setters in this file.


170-208: Excellent documentation for WithOTelTraceContext.

The detailed comments explaining context immutability, the example flow, and the expected log output are valuable for maintainers. The implementation correctly:

  • Validates span context before extracting
  • Checks HasTraceID() and HasSpanID() separately
  • Uses standard OTEL APIs

214-227: Defensive copy in GetLogFields prevents mutation.

Returning a copy of the LogFields map (lines 221-224) prevents callers from accidentally mutating the context's internal state. This is good defensive programming.


151-168: WithErrorField implementation with selective stack trace capture.

Good design choices:

  • Nil-safe with early return
  • Stores error message as string (not the error object) for serialization safety
  • Conditional stack trace capture reduces overhead for expected operational errors

The helper functions are properly implemented:

  • shouldCaptureStackTrace maintains a comprehensive blocklist of expected operational errors (context cancellation, network errors, K8s API errors, etc.) and returns true only for unexpected errors
  • CaptureStackTrace correctly captures the call stack with proper skip frame handling and formats frames clearly
  • The skip=1 parameter passed from WithErrorField correctly accounts for the additional call level

163-165: Stack trace offset is correct.

With CaptureStackTrace(1), the implementation adds 2 to the skip parameter: runtime.Callers(skip+2, pcs). This means:

  • Frame 0: runtime.Callers
  • Frame 1: CaptureStackTrace
  • Frame 2: WithErrorField
  • Frame 3+: caller of WithErrorField

The offset of 1 correctly skips WithErrorField and produces a stack trace starting from the caller of WithErrorField, as intended.

@xueli181114
Copy link
Contributor Author

/retest

…ering and correlation of adapter logger

In this PR I added
- Put error to logger fields
- Put stack_trace to logger fields
- Put resource_type, resource_name, resource_result, observed generation to logger fields
- Bumped broker lib and updated the code
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @pkg/otel/tracer.go:
- Around line 1-16: The semconv import in package otel currently uses
go.opentelemetry.io/otel/semconv/v1.37.0 which conflicts with the SDK v1.38.0;
either update the import to go.opentelemetry.io/otel/semconv/v1.38.0 (change the
import alias semconv to the v1.38.0 path and run go mod tidy) or explicitly add
semconv v1.37.0 to go.mod so the versions match; update the import line in
pkg/otel/tracer.go (semconv) and run dependency resolution to eliminate the
schema URL conflict.
🧹 Nitpick comments (1)
internal/executor/executor.go (1)

186-194: Non-deterministic error message ordering due to map iteration.

Go's map iteration order is randomized, so the combined error message will have phases in different orders across runs. This could make log comparison and debugging slightly harder.

Consider sorting the phase keys before building the error message:

♻️ Suggested fix for deterministic ordering
+import "sort"
+
 // Combine all errors into a single error for logging
-var errMsgs []string
-for phase, err := range result.Errors {
-	errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, err))
-}
+phases := make([]ExecutionPhase, 0, len(result.Errors))
+for phase := range result.Errors {
+	phases = append(phases, phase)
+}
+sort.Slice(phases, func(i, j int) bool { return phases[i] < phases[j] })
+
+var errMsgs []string
+for _, phase := range phases {
+	errMsgs = append(errMsgs, fmt.Sprintf("%s: %v", phase, result.Errors[phase]))
+}
 combinedErr := fmt.Errorf("execution failed: %s", strings.Join(errMsgs, "; "))
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53e7e06 and 72c33c1.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • cmd/adapter/main.go
  • go.mod
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
  • internal/executor/executor.go
  • internal/executor/post_action_executor.go
  • internal/executor/precondition_executor.go
  • internal/executor/resource_executor.go
  • internal/executor/utils.go
  • internal/hyperfleet_api/client.go
  • pkg/logger/context.go
  • pkg/logger/logger.go
  • pkg/logger/logger_test.go
  • pkg/logger/stack_trace.go
  • pkg/logger/with_error_field_test.go
  • pkg/otel/propagation.go
  • pkg/otel/propagation_test.go
  • pkg/otel/tracer.go
💤 Files with no reviewable changes (2)
  • internal/config_loader/loader_test.go
  • internal/config_loader/validator_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/logger/with_error_field_test.go
  • internal/executor/precondition_executor.go
  • internal/executor/post_action_executor.go
  • pkg/otel/propagation.go
  • internal/executor/utils.go
  • go.mod
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.
📚 Learning: 2025-12-04T14:06:51.656Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 13
File: internal/executor/post_action_executor.go:198-205
Timestamp: 2025-12-04T14:06:51.656Z
Learning: Logger refactor is tracked in HYPERFLEET-304 for the hyperfleet-adapter repository, which will address how CEL evaluation failures and similar errors are handled and logged.

Applied to files:

  • internal/executor/executor.go
  • cmd/adapter/main.go
📚 Learning: 2025-12-11T02:06:33.975Z
Learnt from: xueli181114
Repo: openshift-hyperfleet/hyperfleet-adapter PR: 17
File: internal/config_loader/validator_schema.go:126-133
Timestamp: 2025-12-11T02:06:33.975Z
Learning: In the hyperfleet-adapter repository (Go), environment variable validation for required parameters (those with `env.*` source, `Required=true`, and no default) occurs at config load time in `internal/config_loader/validator_schema.go`. This is intentional design - environment variables must be set before deployment, and the validation ensures fail-fast behavior with clear error messages.

Applied to files:

  • internal/hyperfleet_api/client.go
🧬 Code graph analysis (9)
pkg/otel/tracer.go (1)
pkg/logger/logger.go (1)
  • Logger (14-40)
internal/executor/executor.go (4)
pkg/logger/context.go (4)
  • WithErrorField (156-168)
  • WithResourceType (107-109)
  • WithDynamicResourceID (81-84)
  • WithOTelTraceContext (191-208)
internal/executor/types.go (1)
  • Executor (69-75)
internal/config_loader/types.go (2)
  • AdapterConfig (59-64)
  • Metadata (67-71)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/logger/logger_test.go (1)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/logger/stack_trace.go (4)
pkg/errors/network_error.go (1)
  • IsNetworkError (24-75)
pkg/errors/api_error.go (1)
  • IsAPIError (144-150)
pkg/errors/k8s_error.go (3)
  • K8sResourceKeyNotFoundError (61-66)
  • K8sInvalidPathError (84-88)
  • K8sResourceDataError (105-112)
pkg/logger/context.go (2)
  • WithLogField (57-64)
  • StackTraceKey (21-21)
cmd/adapter/main.go (5)
pkg/logger/context.go (1)
  • WithErrorField (156-168)
pkg/otel/tracer.go (2)
  • GetTraceSampleRatio (35-55)
  • InitTracer (66-100)
internal/config_loader/types.go (1)
  • Metadata (67-71)
internal/hyperfleet_api/client.go (1)
  • WithTimeout (70-74)
pkg/errors/error.go (1)
  • Errors (77-95)
pkg/logger/logger.go (1)
pkg/logger/context.go (3)
  • ComponentKey (15-15)
  • VersionKey (16-16)
  • HostnameKey (17-17)
pkg/otel/propagation_test.go (2)
pkg/otel/propagation.go (1)
  • ExtractTraceContextFromCloudEvent (30-57)
pkg/errors/error.go (1)
  • New (107-123)
internal/executor/resource_executor.go (3)
pkg/logger/context.go (6)
  • WithK8sKind (117-119)
  • WithK8sName (122-124)
  • WithK8sNamespace (127-129)
  • WithObservedGeneration (142-144)
  • WithK8sResult (132-134)
  • WithErrorField (156-168)
internal/k8s_client/client.go (1)
  • GetGenerationAnnotation (381-398)
internal/executor/types.go (2)
  • NewExecutorError (375-382)
  • PhaseResources (25-25)
pkg/logger/context.go (1)
pkg/logger/stack_trace.go (1)
  • CaptureStackTrace (115-134)
🔇 Additional comments (32)
pkg/logger/context.go (2)

151-168: LGTM! Well-designed error field handling with selective stack trace capture.

The implementation correctly:

  • Guards against nil errors
  • Uses CaptureStackTrace(1) to skip itself and start from the caller
  • Only captures stack traces for unexpected errors to avoid performance overhead

191-208: LGTM! Proper OpenTelemetry span context extraction.

The function correctly validates span context before extraction and checks for trace/span ID validity individually. The detailed documentation explaining context immutability and parent-child span behavior is helpful.

pkg/logger/logger_test.go (1)

270-280: LGTM! Test correctly updated for the new context-based error handling API.

The test properly demonstrates the new pattern: embed error into context via WithErrorField, then use that context with the logger chain.

internal/hyperfleet_api/client.go (2)

274-282: LGTM! Well-structured HTTP tracing with proper cardinality management.

Good practices observed:

  • Method-only span name avoids high cardinality from URLs
  • URL captured as url.full attribute for debugging
  • Span properly ended via defer
  • Logger context updated with new span ID for correlation

The attribute names http.request.method and url.full align with OpenTelemetry HTTP semantic conventions.


323-326: Trace context injection looks correct.

Using reqCtx (which has the span context) for header injection ensures the correct trace/span IDs are propagated via W3C traceparent header.

pkg/logger/stack_trace.go (3)

51-65: Verify: Server errors (5xx) excluded from stack trace capture.

isExpectedAPIError includes IsServerError() (5xx responses), meaning stack traces won't be captured for these. While this reduces overhead, server errors can indicate upstream bugs worth investigating.

If 5xx errors are genuinely expected in your environment (e.g., transient upstream issues), this is fine. Otherwise, consider removing apiErr.IsServerError() to capture stack traces for unexpected server failures.


21-49: LGTM! Extensible skip-list pattern for stack trace control.

The slice-based checker design allows easy extension for new error types. Good coverage of:

  • Context cancellation/deadline errors
  • Network/transient errors
  • K8s API errors (NotFound, Conflict, etc.)
  • Custom K8s resource data errors

115-133: LGTM! Stack trace capture implementation.

Correct skip calculation (skip+2) to exclude runtime.Callers and CaptureStackTrace itself. The 32-frame limit is reasonable for most debugging scenarios.

pkg/logger/logger.go (1)

146-151: LGTM! Consistent use of key constants for base logger fields.

Using string(ComponentKey), string(VersionKey), and string(HostnameKey) ensures consistency with the context key definitions in context.go. This prevents key mismatches between base logger fields and context-injected fields.

pkg/otel/propagation_test.go (3)

13-16: Consider test isolation for global propagator setup.

The init() function sets the global propagator, which affects all tests in this package. This is generally fine for this test file, but if other tests assume a different propagator configuration, consider using t.Cleanup or a test-local setup instead.

For now, this works correctly since propagation_test.go owns this behavior.


143-176: LGTM! Comprehensive table-driven tests for invalid traceparent formats.

Good coverage of edge cases including:

  • Malformed format
  • Wrong version
  • Short trace/span IDs
  • Missing parts
  • All-zeros trace/span IDs (which are invalid per W3C spec)

18-241: LGTM! Well-structured test suite with thorough W3C Trace Context coverage.

The tests comprehensively validate ExtractTraceContextFromCloudEvent behavior across all relevant scenarios including sampled/unsampled traces and tracestate preservation.

pkg/otel/tracer.go (2)

66-99: LGTM! Well-documented tracer initialization with proper sampling strategy.

The implementation correctly:

  • Uses ParentBased(TraceIDRatioBased) to respect parent sampling decisions while applying probabilistic sampling for root spans
  • Avoids resource.Default() merge to prevent semconv schema URL conflicts (good catch!)
  • Returns the provider for proper shutdown handling by caller

Note: No exporter is configured, meaning traces aren't exported to a backend. Per the docstring, this is intentional - the tracer generates IDs for log correlation and HTTP propagation only. If backend export is needed later, add an exporter (e.g., OTLP).


35-55: LGTM! Robust environment variable parsing with clear logging.

Good validation:

  • Handles empty env var with informative log
  • Catches parse errors with warning
  • Validates range bounds (0.0-1.0)
  • Logs the configured ratio as both decimal and percentage
internal/executor/executor.go (6)

8-8: LGTM! Import additions are appropriate for the OTel integration.

The new imports support the tracing functionality: strings for error message joining, pkgotel for trace context extraction from CloudEvents, and the OTel packages for span creation.

Also applies to: 15-17


56-59: LGTM! Proper span lifecycle management.

The span is created at the start of execution and deferred span.End() ensures it's always closed, even on early returns or panics.


64-66: LGTM! Error context enrichment before logging.

Using WithErrorField to attach the error to the context before logging ensures consistent structured error fields (and conditional stack trace capture per the helper's implementation).


76-84: LGTM! Dynamic resource context logging.

The logging now correctly distinguishes between owned resources (e.g., NodePool with its owning Cluster) and standalone resources, adding appropriate resource_type and dynamic *_id fields.


211-226: LGTM! Well-documented trace initialization helper.

The function clearly documents its responsibilities (span creation, logger context enrichment) and the caller's obligation to call span.End(). Using the adapter config name as the tracer name provides good service attribution.


239-251: LGTM! Trace context extraction from CloudEvents.

This enables distributed tracing by propagating W3C trace context from upstream services. The extraction happens early in the handler, ensuring all subsequent operations (including the Execute span) are correlated.

Note: Per the PR comments, trace context extraction is now implemented here, addressing the previously noted gap.

internal/executor/resource_executor.go (5)

76-81: LGTM! K8s resource context for structured log filtering.

Adding k8s_kind, k8s_name, and k8s_namespace to the logging context enables filtering logs by specific Kubernetes resources, separate from the event's resource_type/resource_id.


107-113: LGTM! Generation extracted once and added to context early.

Extracting manifestGen once avoids redundant calls and ensures observed_generation appears in all subsequent log entries for this resource operation. The conditional check (> 0) correctly avoids logging when the annotation is absent.


141-142: LGTM! Consistent operation logging with result context.

The logging now includes K8sResult (SUCCESS/FAILED) as a structured field, making it easy to filter operations by outcome. The error context is properly enriched before logging failures.

Also applies to: 155-155, 165-174


339-339: LGTM! Debug-level logging for recreation steps.

Using Debugf for intermediate recreation steps (delete, wait, create) is appropriate—these are implementation details that shouldn't clutter info-level logs during normal operation.

Also applies to: 345-345, 351-351


366-366: LGTM! Improved deletion wait logging.

The context cancellation warning and deletion confirmation are at appropriate levels (warn/debug). Error checking during polling now uses WithErrorField for structured error capture.

Also applies to: 373-373, 377-378, 382-382

cmd/adapter/main.go (7)

45-49: LGTM! Reasonable shutdown timeout.

5 seconds is appropriate for tracer provider shutdown—long enough to flush pending spans but not so long as to delay application exit.


175-192: LGTM! Proper OTel initialization and shutdown.

The tracer is initialized with the adapter name for service attribution and a configurable sample ratio. The deferred shutdown correctly uses a fresh context (context.Background()) with timeout, since the original ctx may already be cancelled during shutdown.


158-161: LGTM! Consistent error context enrichment across startup failures.

All startup errors (config loading, API client, K8s client, executor) now use WithErrorField before logging, ensuring structured error fields and conditional stack traces.

Also applies to: 198-200, 208-210, 222-224


252-256: LGTM! Improved missing env var error handling.

Creating the error first, then enriching context before logging, ensures the error message is included in structured fields rather than just the log message.

Also applies to: 261-265


295-303: Good documentation of error context limitations.

The comment clearly explains that the stack trace captured here reflects the monitoring goroutine, not the error's origin in the broker library. This sets appropriate expectations for debugging.


340-341: LGTM! Graceful shutdown error handling.

Subscriber close errors and timeout conditions are properly logged with enriched context. The timeout error is explicitly constructed to provide a clear message.

Also applies to: 346-348


274-274: Verify broker library API compatibility

The code calls broker.NewSubscriber(log, subscriptionID) with log as the first parameter. Available documentation for the broker library shows the expected signature as NewSubscriber(subscriptionID string, configMap ...map[string]string), with subscriptionID as the first parameter, not log. Confirm that the version specified in go.mod (v1.0.1) matches this implementation, or verify if openshift-hyperfleet/hyperfleet-broker has a different API.

Comment on lines +1 to +16
// Package otel provides OpenTelemetry tracing utilities for the hyperfleet-adapter.
package otel

import (
"context"
"fmt"
"os"
"strconv"

"github.com/openshift-hyperfleet/hyperfleet-adapter/pkg/logger"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.37.0"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check OTEL SDK and semconv versions in go.mod for compatibility
grep -E "go.opentelemetry.io/otel" go.mod | head -20

Repository: openshift-hyperfleet/hyperfleet-adapter

Length of output: 247


🏁 Script executed:

#!/bin/bash
# Check for semconv in go.mod specifically
grep -i "semconv" go.mod
# Also get a broader view of go.mod around OTEL dependencies
cat go.mod | grep -A 2 -B 2 "otel"

Repository: openshift-hyperfleet/hyperfleet-adapter

Length of output: 737


🏁 Script executed:

#!/bin/bash
# Search for semconv in go.mod (entire file)
echo "=== go.mod semconv entries ==="
cat go.mod | grep -i semconv
# Check go.sum for semconv
echo "=== go.sum semconv entries ==="
grep -i "semconv" go.sum | head -10

Repository: openshift-hyperfleet/hyperfleet-adapter

Length of output: 144


🏁 Script executed:

#!/bin/bash
# Check if semconv is actually used in tracer.go
echo "=== Searching for semconv usage in tracer.go ==="
grep -n "semconv" pkg/otel/tracer.go
# Also check what's imported but possibly unused
echo "=== Full tracer.go file ==="
cat -n pkg/otel/tracer.go

Repository: openshift-hyperfleet/hyperfleet-adapter

Length of output: 5032


🌐 Web query:

OTEL SDK v1.38.0 semconv version bundled

💡 Result:

Which language SDK do you mean (Java, Python, Go, JS, .NET, etc.)?


🌐 Web query:

go.opentelemetry.io/otel sdk v1.38.0 semconv version dependency

💡 Result:

The OpenTelemetry Go SDK v1.38.0 uses the semantic conventions package version v1.38.0 (go.opentelemetry.io/otel/semconv/v1.38.0). [1][2]

Sources:

  • semconv v1.38.0 package docs. [1]
  • opentelemetry-go v1.38.0 release. [2]

Update semconv import to v1.38.0 or explicitly declare it in go.mod.

The code imports semconv/v1.37.0, but OTEL SDK v1.38.0 (declared in go.mod) bundles semconv/v1.38.0. This version mismatch causes schema URL conflicts, which the code works around by avoiding resource.Default() merging. Additionally, semconv is not declared in go.mod despite being directly imported at line 15. Either update the import to v1.38.0 to match the bundled version or explicitly add semconv v1.37.0 as a dependency.

🤖 Prompt for AI Agents
In @pkg/otel/tracer.go around lines 1 - 16, The semconv import in package otel
currently uses go.opentelemetry.io/otel/semconv/v1.37.0 which conflicts with the
SDK v1.38.0; either update the import to
go.opentelemetry.io/otel/semconv/v1.38.0 (change the import alias semconv to the
v1.38.0 path and run go mod tidy) or explicitly add semconv v1.37.0 to go.mod so
the versions match; update the import line in pkg/otel/tracer.go (semconv) and
run dependency resolution to eliminate the schema URL conflict.

@yasun1
Copy link
Contributor

yasun1 commented Jan 12, 2026

/test ci/prow/unit

@openshift-ci
Copy link

openshift-ci bot commented Jan 12, 2026

@yasun1: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test images-images
/test lint
/test presubmits-images
/test presubmits-integration
/test unit

Use /test all to run all jobs.

Details

In response to this:

/test ci/prow/unit

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@yasun1
Copy link
Contributor

yasun1 commented Jan 12, 2026

/test unit

@yasun1
Copy link
Contributor

yasun1 commented Jan 12, 2026

/test images-images lint presubmits-images

@yasun1
Copy link
Contributor

yasun1 commented Jan 12, 2026

/lgtm

@openshift-ci
Copy link

openshift-ci bot commented Jan 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yasun1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit d30e70e into openshift-hyperfleet:main Jan 12, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants