Conversation

Contributor

@brightsparc brightsparc commented Jan 4, 2026

Overview

This PR exposes HTTP request headers and gRPC request metadata to Python processors, enabling them to access request context and add it as attributes to telemetry data. This feature provides the same functionality as the OpenTelemetry Collector's include_metadata capability.

Equivalent to OpenTelemetry Collector configuration:

receivers:
  otlp:
    protocols:
      http:
        include_metadata: true # Enables metadata propagation for HTTP transport
      grpc:
        include_metadata: true # Enables metadata propagation for gRPC transport

When ROTEL_OTLP_HTTP_INCLUDE_METADATA=true or ROTEL_OTLP_GRPC_INCLUDE_METADATA=true is set, specified headers/metadata are extracted by the receiver and made available to Python processors via the message_metadata property on ResourceSpans, ResourceMetrics, and ResourceLogs.

Changes

  • Added message_metadata: Optional[Dict[str, str]] field to Python SDK structs (ResourceSpans, ResourceMetrics, ResourceLogs)
  • Updated PythonProcessable trait to accept headers: Option<HashMap<String, String>>
  • Modified pipeline to extract headers from both HttpMetadata and GrpcMetadata and pass to Python processors
  • Added GrpcMetadata variant to MessageMetadataInner enum, sharing the same HashMap<String, String> structure as HTTP
  • Added context_processor.py example demonstrating header/metadata extraction and attribute addition
  • Added Dockerfile.context-processor for easy deployment with HTTP/gRPC metadata support

Configuration

HTTP Metadata Extraction

# Enable HTTP metadata extraction
ROTEL_OTLP_HTTP_INCLUDE_METADATA=true

# Specify which headers to extract (comma-separated)
ROTEL_OTLP_HTTP_HEADERS_TO_INCLUDE=my-custom-header,another-header

# Run with pyo3 feature enabled
cargo run --features pyo3 -- start \
  --otlp-http-endpoint 0.0.0.0:4318 \
  --otlp-http-include-metadata \
  --otlp-http-headers-to-include my-custom-header,another-header \
  --otlp-with-trace-processor ./rotel_python_processor_sdk/processors/context_processor.py

gRPC Metadata Extraction

# Enable gRPC metadata extraction
ROTEL_OTLP_GRPC_INCLUDE_METADATA=true

# Specify which headers to extract (comma-separated)
ROTEL_OTLP_GRPC_HEADERS_TO_INCLUDE=my-custom-header,another-header

# Run with pyo3 feature enabled
cargo run --features pyo3 -- start \
  --otlp-grpc-endpoint 0.0.0.0:4317 \
  --otlp-grpc-include-metadata \
  --otlp-grpc-headers-to-include my-custom-header,another-header \
  --otlp-with-trace-processor ./rotel_python_processor_sdk/processors/context_processor.py

Docker Deployment

The Dockerfile.context-processor builds the Rust binary inside Docker and includes the context processor with HTTP/gRPC metadata support. This Dockerfile is designed to work seamlessly with SigNoz, a popular OpenTelemetry-native observability platform.

Prerequisites: Ensure SigNoz is installed and running. For installation instructions, see the SigNoz Docker installation guide. The Docker image connects to SigNoz services via the signoz-net Docker network.

# Build the context processor Docker image
# This will compile the Rust code inside Docker (no local cross-compilation needed)
docker build -f Dockerfile.context-processor \
  --platform linux/amd64 \
  -t rotel-context-processor:latest .

# Run with the OTLP exporter forwarding to the SigNoz OTel Collector on the signoz-net network
docker run --rm -it \
  --network signoz-net \
  -p 5418:5418 \
  -e ROTEL_OTLP_HTTP_ENDPOINT=0.0.0.0:5418 \
  -e ROTEL_OTLP_HTTP_INCLUDE_METADATA=true \
  -e ROTEL_OTLP_HTTP_HEADERS_TO_INCLUDE=my-custom-header,another-header \
  rotel-context-processor:latest \
  start \
  --otlp-with-trace-processor /processors/context_processor.py \
  --exporter otlp \
  --otlp-exporter-endpoint signoz-otel-collector:4317

# Alternatively, use ClickHouse exporter for direct testing
docker run --rm -it \
  --network signoz-net \
  -p 5418:5418 \
  -e ROTEL_OTLP_HTTP_ENDPOINT=0.0.0.0:5418 \
  -e ROTEL_OTLP_HTTP_INCLUDE_METADATA=true \
  -e ROTEL_OTLP_HTTP_HEADERS_TO_INCLUDE=my-custom-header,another-header \
  -e ROTEL_CLICKHOUSE_EXPORTER_ENDPOINT=http://signoz-clickhouse:8123 \
  -e ROTEL_CLICKHOUSE_EXPORTER_DATABASE=otel \
  rotel-context-processor:latest \
  start \
  --otlp-with-trace-processor /processors/context_processor.py \
  --exporter clickhouse

The Docker image uses a flexible ENTRYPOINT (/rotel) that allows you to pass any command arguments. Configuration can be provided via:

  • Environment variables: All rotel options can be set via ROTEL_* environment variables (e.g., ROTEL_OTLP_HTTP_ENDPOINT, ROTEL_CLICKHOUSE_EXPORTER_ENDPOINT)
  • Command-line arguments: Pass arguments directly to the start command

Recommended configuration:

  • Receiver listening on port 5418 (HTTP) to avoid conflicts with SigNoz's OTel collector (which uses ports 4317/4318)
  • HTTP header extraction enabled via ROTEL_OTLP_HTTP_INCLUDE_METADATA=true
  • Headers to extract specified via ROTEL_OTLP_HTTP_HEADERS_TO_INCLUDE (comma-separated, e.g., my-custom-header,another-header)
  • Context processor at /processors/context_processor.py that extracts headers from context and adds them as span attributes

Integration with SigNoz: When running alongside SigNoz (see SigNoz Docker installation), the container connects to the signoz-net network and forwards processed telemetry to SigNoz's OTel collector. Processed traces with header attributes will appear in the SigNoz UI at http://localhost:8080.

Docker Compose example:

rotel:
  platform: linux/amd64
  build:
    context: ../rotel
    dockerfile: Dockerfile.context-processor
  environment:
    - ROTEL_OTLP_HTTP_ENDPOINT=0.0.0.0:5418
    - ROTEL_OTLP_HTTP_INCLUDE_METADATA=true
    - ROTEL_OTLP_HTTP_HEADERS_TO_INCLUDE=my-custom-header,another-header
    - ROTEL_CLICKHOUSE_EXPORTER_ENDPOINT=http://signoz-clickhouse:8123
    - ROTEL_CLICKHOUSE_EXPORTER_DATABASE=otel
  command:
    - start
    - --otlp-with-trace-processor
    - /processors/context_processor.py
    - --exporter
    - clickhouse
  networks:
    - signoz-net

Note: Python 3.14+ requires PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 environment variable (PyO3 0.24.2 supports up to Python 3.13). The Dockerfile uses Python 3.13 by default.

Usage

Accessing Metadata in Python Processors

Python processors can now access HTTP headers and gRPC metadata via the message_metadata property, which is a simple dictionary. The same interface works for both HTTP and gRPC:

from rotel_sdk.open_telemetry.common.v1 import AnyValue, KeyValue
from rotel_sdk.open_telemetry.trace.v1 import ResourceSpans

def process_spans(resource_spans: ResourceSpans) -> ResourceSpans:
    # Access HTTP headers or gRPC metadata (simple dict[str, str])
    # Both use the same interface - a dictionary with lowercase keys
    if resource_spans.message_metadata:
        # Get a specific header/metadata key (stored with lowercase keys)
        header_value = resource_spans.message_metadata.get("my-custom-header")

        if header_value:
            # Add as span attribute following OTel semantic convention
            attr = KeyValue(
                key="http.request.header.my-custom-header",
                value=AnyValue(header_value)
            )

            # Add to all spans
            for scope_spans in resource_spans.scope_spans:
                for span in scope_spans.spans:
                    span.attributes.append(attr)

    return resource_spans

Example: Adding Headers to Resource Attributes

# Imports assumed to mirror the trace example above
from rotel_sdk.open_telemetry.common.v1 import AnyValue, KeyValue
from rotel_sdk.open_telemetry.metrics.v1 import ResourceMetrics

def process_metrics(resource_metrics: ResourceMetrics) -> ResourceMetrics:
    if resource_metrics.message_metadata and resource_metrics.resource:
        # Headers are stored with lowercase keys
        custom_header = resource_metrics.message_metadata.get("my-custom-header")
        if custom_header:
            attr = KeyValue(
                key="http.request.header.my-custom-header",
                value=AnyValue(custom_header)
            )
            resource_metrics.resource.attributes.append(attr)

    return resource_metrics

Alignment with OpenTelemetry Collector

This feature provides equivalent functionality to OpenTelemetry Collector's include_metadata configuration:

OpenTelemetry Collector:

receivers:
  otlp:
    protocols:
      http:
        include_metadata: true # Enables metadata propagation for HTTP transport
      grpc:
        include_metadata: true # Enables metadata propagation for gRPC transport

Rotel (equivalent):

# HTTP metadata extraction
--otlp-http-include-metadata
--otlp-http-headers-to-include my-custom-header,another-header

# gRPC metadata extraction
--otlp-grpc-include-metadata
--otlp-grpc-headers-to-include my-custom-header,another-header

Behavior alignment:

  • HTTP headers and gRPC metadata are extracted by the receiver and stored in message metadata
  • Processors access headers/metadata via context (in Rotel: message_metadata property)
  • Headers/metadata are stored with lowercase keys for case-insensitive lookup
  • Headers/metadata are passed as a simple dict[str, str] to Python processors
  • Both HTTP and gRPC use the same unified interface, making processors protocol-agnostic
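The lowercase-key convention above can be illustrated with a small, self-contained sketch (plain Python, no Rotel dependency; normalize_metadata is a hypothetical helper, not part of the SDK):

```python
def normalize_metadata(headers: dict[str, str]) -> dict[str, str]:
    """Store keys lowercased, mirroring how the receiver normalizes
    HTTP headers and gRPC metadata for case-insensitive lookup."""
    return {key.lower(): value for key, value in headers.items()}


# A client may send headers with arbitrary casing...
raw = {"My-Custom-Header": "test-value", "Another-Header": "other"}
meta = normalize_metadata(raw)

# ...but processors always look them up with lowercase keys.
assert meta.get("my-custom-header") == "test-value"
assert meta.get("another-header") == "other"
```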

Implementation

Headers/metadata flow from receiver → pipeline → Python processor:

  1. HTTP: Receiver extracts headers and creates HttpMetadata
  2. gRPC: Receiver extracts metadata keys and creates GrpcMetadata
  3. Both variants share the same underlying HashMap<String, String> structure
  4. Pipeline extracts headers/metadata as HashMap<String, String> and passes to processors
  5. Python objects receive headers/metadata via message_metadata property (dict[str, str])
  6. Processors access headers/metadata directly from the dictionary using the same interface
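Steps 4-6 of the flow above can be sketched in plain Python (a simplified stand-in for the pipeline; FakeResourceSpans and run_processor are hypothetical names, not the real types):

```python
from typing import Callable, Optional


class FakeResourceSpans:
    """Minimal stand-in for the SDK's ResourceSpans."""
    def __init__(self) -> None:
        self.message_metadata: Optional[dict[str, str]] = None


def run_processor(
    processor: Callable[[FakeResourceSpans], FakeResourceSpans],
    resource_spans: FakeResourceSpans,
    headers: Optional[dict[str, str]],
) -> FakeResourceSpans:
    # Steps 4-5: the pipeline attaches the extracted headers/metadata
    # to the message before invoking the Python processor.
    resource_spans.message_metadata = dict(headers) if headers else None
    # Step 6: the processor reads them as a plain dict.
    return processor(resource_spans)


def my_processor(rs: FakeResourceSpans) -> FakeResourceSpans:
    if rs.message_metadata:
        value = rs.message_metadata.get("my-custom-header")
        # A real processor would add `value` as a span attribute here.
    return rs


run_processor(my_processor, FakeResourceSpans(), {"my-custom-header": "test-value"})
```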

Breaking Changes

None. This is a purely additive feature. Existing processors continue to work without modification.

Testing

  • Unit tests verify HTTP and gRPC metadata extraction for traces, metrics, and logs
  • End-to-end verified: headers/metadata flow from HTTP/gRPC receiver → pipeline → Python processor → span attributes → ClickHouse/OTLP exporter
  • Comprehensive test coverage for case-insensitive metadata key matching
  • Docker image tested with the context processor using both the ClickHouse exporter (direct) and the OTLP exporter (forwarding to the SigNoz OTel Collector)
  • Verified in ClickHouse: custom headers successfully extracted and stored as http.request.header.* span attributes

Testing with generate-otlp

The built-in generate-otlp tool can be used to generate test telemetry data with custom headers for testing metadata extraction:

# Generate and send traces with test headers directly to Rotel
cargo run --bin generate-otlp -- traces \
  --http-endpoint localhost:5418 \
  --include-headers

# Or generate a trace file for testing
cargo run --bin generate-otlp -- traces --file trace.pb

# Then send it with curl (add your custom headers)
curl -X POST http://localhost:5418/v1/traces \
  -H "Content-Type: application/x-protobuf" \
  -H "my-custom-header: test-value" \
  --data-binary @trace.pb

The --include-headers flag adds example test headers (my-custom-header, another-header) that can be extracted by the context processor.
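The curl invocation above can also be expressed with Python's standard library, which is convenient for scripting test payloads (a sketch; build_trace_request is a hypothetical helper, and the placeholder body stands in for the serialized trace.pb bytes):

```python
import urllib.request


def build_trace_request(endpoint: str, body: bytes,
                        extra_headers: dict[str, str]) -> urllib.request.Request:
    """Build an OTLP/HTTP trace export request carrying custom headers."""
    headers = {"Content-Type": "application/x-protobuf", **extra_headers}
    return urllib.request.Request(
        f"http://{endpoint}/v1/traces",
        data=body,
        headers=headers,
        method="POST",
    )


# Placeholder body; in practice, read the bytes produced by generate-otlp.
req = build_trace_request("localhost:5418", b"\x00",
                          {"my-custom-header": "test-value"})
# Sending would be: urllib.request.urlopen(req), against a running Rotel instance.
```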

Querying Results in ClickHouse

After processing, headers are added as span attributes following the http.request.header.* naming convention. You can query them in ClickHouse:

-- Query for traces with a specific header (e.g., my-custom-header)
SELECT
  TraceId,
  SpanId,
  SpanName,
  SpanAttributes['http.request.header.my-custom-header'] AS custom_header
FROM otel.otel_traces
WHERE Timestamp > now() - INTERVAL 1 HOUR
  AND has(SpanAttributes, 'http.request.header.my-custom-header')
ORDER BY Timestamp DESC
LIMIT 10;

-- Count traces with a specific header
SELECT count(*)
FROM otel.otel_traces
WHERE Timestamp > now() - INTERVAL 1 HOUR
  AND has(SpanAttributes, 'http.request.header.my-custom-header');

The processor adds headers as span attributes using the key format http.request.header.<header-name> (e.g., http.request.header.my-custom-header).
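A tiny helper makes that key construction explicit (plain Python; header_attr_key is a hypothetical name, not part of the SDK):

```python
def header_attr_key(header_name: str) -> str:
    """Build the span attribute key for an extracted HTTP header,
    following the http.request.header.<header-name> convention."""
    return f"http.request.header.{header_name.lower()}"


assert header_attr_key("My-Custom-Header") == "http.request.header.my-custom-header"
```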

Known Limitations

  • Python 3.14+: Requires PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 environment variable (PyO3 0.24.2 supports up to Python 3.13). The Dockerfile uses Python 3.13 by default.

@rjenkins rjenkins left a comment

@brightsparc - Thanks for the PR, it's good work; we've been keen on adding request metadata for a while! Good job weaving these changes into the Message metadata structures; that area isn't well documented.

There are a few changes I'd like to make, in particular around how we store this request context metadata and how it's passed into the processors. I could describe the changes, or we could spec them out and go back and forth, but I think it's faster if I check out this PR and push a commit so we can discuss. I've got most of the changes done and will complete them tomorrow. We should then be able to merge this shortly after.

Brief summary of what I'm changing...

  • Going to move grpc/http request metadata out into a separate Option field.
  • RequestContext will be an enum with grpc and http variants to begin with; in the future we plan to add additional types here, such as kafka, carrying the Kafka receiver's message headers and other message metadata like the timestamp
  • On the processor API side we'll mirror the RequestContext type.

The primary goal is to keep the metadata field in the Message for internal use and message acknowledgement. We've also had some requests to add message acknowledgement for the gRPC/HTTP OTLP receivers, such that Rotel only returns a 200 after it either spools to disk locally or durably exports, for example to Kafka. Splitting the request metadata into a separate field will allow us to do both.

Will follow up tomorrow after I push a commit.

self.schema_url = schema_url;
Ok(())
}
#[getter]
Contributor commented:

The pyo3 handling of objects and collections gets pretty complex. Not sure if this was intentional, but this will essentially make the data immutable from the Python side, since the underlying object is cloned. For the sake of clarity, as an example: if you call this function and then try to modify the dict on the Python side with metadata['new_key'] = "some value" or del metadata['key'], the changes will not be reflected back in Rust after the processor completes. For the most part the SDK is implemented so that you can idiomatically work with objects or collections in Python and have the changes reflected back on the Rust side.

For this use case, though, that's not necessarily a problem if we choose to say this request metadata should be immutable. Note, however, that we do have the setter below, which allows a total overwrite of the map.

Contributor Author replied:

I think we probably do want this to be immutable in this case. My thinking is that the metadata properties could be added as resource or span attributes in the processor but probably shouldn't themselves be changed; happy to take your lead on this, though.

@rjenkins rjenkins left a comment

@brightsparc, I've pushed the changes to move the request metadata into a separate struct member in the Message payload. I've also added some unit tests for the context_processor. You can run the processor SDK tests by cd'ing into the rotel_python_processor_sdk directory:

cd rotel_python_processor_sdk
cargo nextest run

As a follow-up, I think we should add some configuration options to the context processor so you can specify the key name to use, as well as actions such as upsert, insert, or convert, similar to how the attributes processor behaves here https://github.com/streamfold/rotel/blob/main/rotel_python_processor_sdk/processors/attributes_processor.py and here https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/attributesprocessor/README.md.

We can likely merge the two features so the configuration from the attributes processor can use from_context. But we can address that in the future.

Take a look and let me know if this works well for your use case; if so, I'll take a final 👀 and we can get this merged.

@brightsparc brightsparc left a comment

LGTM!

@brightsparc (Contributor Author)

> We can likely merge the two features so the configuration from the attributes processor can use from_context. But we can address that in the future.
>
> Take a look and let me know if this works well for your use case; if so, I'll take a final 👀 and we can get this merged.

Sounds good, let me know if you want me to address the merge conflict; otherwise I'm happy for you to land this. Thanks

@rjenkins rjenkins commented Jan 9, 2026

> We can likely merge the two features so the configuration from the attributes processor can use from_context. But we can address that in the future.
>
> Take a look and let me know if this works well for your use case; if so, I'll take a final 👀 and we can get this merged.

> Sounds good, let me know if you want me to address the merge conflict; otherwise I'm happy for you to land this. Thanks

@brightsparc Thank you; yes, if you can fix the merge conflict in agent.rs, I'll approve and we can merge this PR.

@brightsparc brightsparc requested a review from rjenkins January 9, 2026 17:22
@rjenkins rjenkins merged commit 87a53a3 into streamfold:main Jan 9, 2026
4 checks passed
@brightsparc brightsparc deleted the header-context branch January 9, 2026 20:47
@brightsparc (Contributor Author)

Will there be a new release and Docker image that include this request context mapping soon?

@brightsparc (Contributor Author)

Hey @rjenkins, I added a follow-up PR for a general-purpose processor config. Let me know if this is what you had in mind.
