
Conversation


@MaxAnderson95 MaxAnderson95 commented Dec 28, 2025

Disclaimer: This is an entirely vibe-coded solution and proof of concept. I'm not necessarily asking for this exact implementation to be merged, but rather presenting this as a proof-of-concept/feature request to get feedback from the team. I have a specific use case that I needed to solve, and I wanted to share my approach to spark discussion about how multiline log collection could be supported in the Dash0 Operator.

If the outcome of this PR is that it's just closed without merging and the feature request is added for consideration in the future, that's perfectly fine with me!

Problem Statement

When collecting logs from Java/Spring Boot applications (or any application that produces multi-line log entries like stack traces), each line of a stack trace is currently collected as a separate log entry in Dash0. This makes it difficult to:

  1. Correlate stack trace lines with the error that caused them
  2. Search for and analyze exceptions as complete units
  3. Get a clear picture of what went wrong when debugging issues

Example: Before (Current Behavior)

With the current operator, a Java exception with a stack trace appears as multiple separate log entries:

[screenshot: stack-trace-before]

Each line of the stack trace appears as its own log record, making it hard to see the full context of the error.

Proposed Solution

I've implemented support for configuring multiline log collection in the Dash0Monitoring CRD. This allows users to specify a regex pattern that identifies the start (or end) of a new log entry, enabling the OpenTelemetry collector to combine continuation lines into a single log record.

How It Works

The implementation uses the OpenTelemetry recombine operator rather than the multiline config on the filelog receiver itself, which does not play well with container log parsing because the container log format carries its own timestamp. The recombine operator runs after the container parser has extracted the log body, so it matches against the actual application log content rather than the raw container log format.

Key implementation details:

  1. New CRD fields: Added multiline.lineStartPattern and multiline.lineEndPattern to spec.logCollection
  2. Recombine operator: Uses OTel's recombine operator with is_first_entry or is_last_entry expressions
  3. Namespace grouping: Since OTel's filelog receiver only supports one multiline pattern per receiver, namespaces with the same multiline config are grouped together, generating multiple filelog receivers if needed
  4. Validation webhook: Ensures only one of lineStartPattern or lineEndPattern is specified and validates regex syntax
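The namespace grouping in point 3 can be sketched as a simple bucketing step: namespaces that share a multiline configuration map to one filelog receiver. This is an illustrative Go sketch with hypothetical type and function names, not the operator's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// MultilineConfig mirrors the proposed CRD fields. The struct and function
// names here are illustrative assumptions, not the PR's actual types.
type MultilineConfig struct {
	LineStartPattern string
	LineEndPattern   string
}

// groupNamespacesByMultiline buckets namespaces that share the same multiline
// configuration, so that one filelog receiver can be generated per group.
func groupNamespacesByMultiline(configs map[string]MultilineConfig) map[MultilineConfig][]string {
	groups := make(map[MultilineConfig][]string)
	for namespace, cfg := range configs {
		groups[cfg] = append(groups[cfg], namespace)
	}
	// Sort each group so the generated collector config is deterministic.
	for _, namespaces := range groups {
		sort.Strings(namespaces)
	}
	return groups
}

func main() {
	groups := groupNamespacesByMultiline(map[string]MultilineConfig{
		"team-a": {LineStartPattern: `^\d{4}-`},
		"team-b": {LineStartPattern: `^\d{4}-`},
		"team-c": {LineEndPattern: `;$`},
	})
	// Two distinct configs, so two filelog receivers would be generated.
	fmt.Println(len(groups))
}
```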

Example: After (With Multiline Support)

With multiline log collection enabled, the entire stack trace is collected as a single log entry:

[screenshot: stack-trace-after]

The full stack trace is now visible as a single log record, making debugging much easier.

API Changes

New CRD Schema

The Dash0Monitoring CRD's spec.logCollection now supports an optional multiline configuration:

apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring
  namespace: my-java-app
spec:
  logCollection:
    enabled: true
    multiline:
      # Use lineStartPattern to identify the beginning of a new log entry
      # All lines NOT matching this pattern will be combined with the previous entry
      lineStartPattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (ERROR|WARN|INFO|DEBUG|TRACE)'

      # Alternatively, use lineEndPattern to identify the end of a log entry
      # (only one of lineStartPattern or lineEndPattern can be specified)
      # lineEndPattern: 'some-end-pattern'

Validation Rules

  • Only one of lineStartPattern or lineEndPattern can be specified (not both)
  • At least one must be specified when multiline is present
  • The pattern must be a valid Go regex
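A minimal sketch of how the webhook might enforce these rules, assuming a hypothetical helper function (the real checks live in internal/webhooks/monitoring_validation_webhook.go):

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// validateMultiline enforces the rules above: exactly one of the two
// patterns must be set, and the chosen pattern must compile as a Go regex.
// This function is an illustrative sketch, not the PR's actual webhook code.
func validateMultiline(lineStartPattern, lineEndPattern string) error {
	// Rejects both the "both set" and the "both empty" cases.
	if (lineStartPattern == "") == (lineEndPattern == "") {
		return errors.New("exactly one of lineStartPattern or lineEndPattern must be specified")
	}
	pattern := lineStartPattern
	if pattern == "" {
		pattern = lineEndPattern
	}
	if _, err := regexp.Compile(pattern); err != nil {
		return fmt.Errorf("invalid regex %q: %w", pattern, err)
	}
	return nil
}

func main() {
	// A valid lineStartPattern passes; setting neither pattern fails.
	fmt.Println(validateMultiline(`^\d{4}-\d{2}-\d{2}`, ""))
	fmt.Println(validateMultiline("", ""))
}
```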

Generated OpenTelemetry Collector Configuration

When multiline is configured, the operator generates the following configuration in the filelog receiver:

receivers:
  filelog:
    include:
      - /var/log/pods/my-java-app_*/*/*.log
    storage: file_storage/filelogreceiver_offsets
    include_file_path: true
    include_file_name: false
    include_file_record_number: true
    operators:
      - id: container-parser
        max_log_size: 102400
        type: container
      - id: recombine-multiline
        type: recombine
        combine_field: body
        source_identifier: attributes["log.file.path"]
        is_first_entry: body matches '^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3} (ERROR|WARN|INFO|DEBUG|TRACE)'
        combine_with: "\n"
        max_log_size: 1048576

Note: The regex pattern is automatically escaped for OTTL expressions (backslashes are doubled).
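That escaping step can be as simple as doubling every backslash. A sketch of what escapeOttlRegex (named in the file list below) might look like; the body is an assumption, not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeOttlRegex doubles backslashes so that a Go regex pattern survives
// string-literal parsing in the generated OTTL expression.
// Illustrative sketch only; the name is taken from the PR's file list.
func escapeOttlRegex(pattern string) string {
	return strings.ReplaceAll(pattern, `\`, `\\`)
}

func main() {
	fmt.Println(escapeOttlRegex(`^\d{4}-\d{2}-\d{2}`))
}
```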

My Specific Use Case

Application Log Format

I have a Java Spring Boot application that logs in the following format:

2025-12-28 00:44:00.940 ERROR 1 --- [scheduling-1] o.g.e.s.LogGeneratorService.logWithStackTrace:117 : Processing request failed
java.lang.RuntimeException: Connection timeout - ground control link unavailable
 at org.example.service.LogGeneratorService.establishConnection(LogGeneratorService.java:179)
 at org.example.service.LogGeneratorService.simulateNetworkOperation(LogGeneratorService.java:175)
 at org.example.service.LogGeneratorService.generateRandomException(LogGeneratorService.java:145)
 at org.example.service.LogGeneratorService.logWithStackTrace(LogGeneratorService.java:113)
 at java.base/java.lang.Thread.run(Thread.java:829)

My Regex Pattern

To properly collect these logs with stack traces as single entries, I use:

^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (ERROR|WARN|INFO|DEBUG|TRACE)

This pattern matches the timestamp and log level at the start of each new log entry. Lines that don't match (like java.lang.RuntimeException:... or at org.example...) are combined with the previous log entry.
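The match behavior can be checked against the sample log lines with Go's standard regexp package; only the timestamped first line starts a new entry:

```go
package main

import (
	"fmt"
	"regexp"
)

// lineStart is the lineStartPattern from the section above.
var lineStart = regexp.MustCompile(`^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (ERROR|WARN|INFO|DEBUG|TRACE)`)

func main() {
	lines := []string{
		"2025-12-28 00:44:00.940 ERROR 1 --- [scheduling-1] ... : Processing request failed",
		"java.lang.RuntimeException: Connection timeout - ground control link unavailable",
		" at org.example.service.LogGeneratorService.establishConnection(LogGeneratorService.java:179)",
	}
	for _, line := range lines {
		// true: the line starts a new log entry
		// false: recombine appends the line to the previous entry
		fmt.Println(lineStart.MatchString(line), line)
	}
}
```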

Dash0Monitoring Configuration

apiVersion: operator.dash0.com/v1beta1
kind: Dash0Monitoring
metadata:
  name: dash0-monitoring
  namespace: example-app
spec:
  logCollection:
    enabled: true
    multiline:
      lineStartPattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (ERROR|WARN|INFO|DEBUG|TRACE)'

Files Changed

  • api/operator/common/common_monitoring_types.go - Added MultilineConfig struct
  • api/operator/common/zz_generated.deepcopy.go - Generated deepcopy methods
  • config/crd/bases/operator.dash0.com_dash0monitorings.yaml - Updated CRD schema
  • helm-chart/dash0-operator/templates/operator/deployment-and-webhooks.yaml - Helm chart CRD
  • internal/collectors/otelcolresources/collector_config_maps.go - Added template function and types
  • internal/collectors/otelcolresources/collector_config_maps_test.go - Fixed tests
  • internal/collectors/otelcolresources/daemonset.config.yaml.template - Added recombine operator
  • internal/collectors/otelcolresources/desired_state.go - Namespace grouping by multiline config
  • internal/webhooks/monitoring_validation_webhook.go - Added validation

Questions for the Team

  1. First off: is there any interest in supporting this use case in the Dash0 operator, or would it be considered "too advanced", in which case I should just use a custom OTel collector and forward to Dash0 directly? With a custom OTel collector, we might lose the niceties of the Dash0 operator, such as automatic attributes and tying logs to resource objects and traces.
  2. Is this the right approach for supporting multiline logs, or would you prefer a different implementation?
  3. Are there any edge cases or considerations I might have missed?

I'm happy to iterate on this implementation based on your feedback, or hand it off to someone on the team who is better at Go than I am. As I said, this was entirely vibe-coded with Claude Opus 4.5, so having a real programmer take it forward would probably be good. Looking forward to hearing your thoughts!

Add support for configuring multiline log collection in the Dash0Monitoring
CRD, enabling proper collection of Java stack traces and other multi-line
log entries as single log records.

Changes:
- Add MultilineConfig struct with lineStartPattern and lineEndPattern fields
  to the LogCollection configuration
- Implement recombine operator in the filelog receiver to merge continuation
  lines with the previous log entry
- Add escapeOttlRegex template function to properly escape regex patterns
  for OTTL expressions
- Group namespaces by multiline configuration to generate separate filelog
  receivers (OTel filelog only supports one multiline pattern per receiver)
- Add validation webhook to ensure only one of lineStartPattern or
  lineEndPattern is specified and that patterns are valid regex
- Update Helm chart CRD templates with new multiline fields
@MaxAnderson95 MaxAnderson95 marked this pull request as ready for review December 28, 2025 01:12
@MaxAnderson95 MaxAnderson95 requested a review from a team as a code owner December 28, 2025 01:12
@basti1302 (Member)

Hi @MaxAnderson95,

thanks for contributing this. I agree that this would be a useful feature.

In general, the much preferred approach would always be to configure structured logging in the workload, which usually leads to stack traces being attached to log entries as fields, making recombine unnecessary. But the reality is that there are situations where this is not possible (e.g. running third-party components in your cluster).

Side note: I think the more appropriate level of granularity for this configuration setting would be per-workload instead of per-namespace, but that is currently not possible, as per-workload log collection settings are not supported at all right now.

I must admit I have not yet taken a detailed look at the code changes. Just to manage expectations, it might be a while until I get around to doing that.

For now I would like to leave this open and come back to it once I have the bandwidth.

@MaxAnderson95 (Author)

Hi @basti1302 sounds great! Take your time! Thanks!

