
[processor/cumulativetodelta] Silent misconfiguration when max_staleness lacks duration suffix #44901

@saviogl

Component(s)

processor/cumulativetodelta

What happened?

Description

When configuring max_staleness without a duration suffix (e.g., max_staleness: 300 instead of max_staleness: 300s), the processor silently accepts the configuration but interprets the value as nanoseconds instead of the intended unit. This causes the processor's state tracking to expire almost immediately, breaking delta calculation without any error or warning.

The processor appears to work correctly—metrics flow through, AggregationTemporality is changed to Delta—but the actual delta values remain identical to the cumulative values because state is lost between scrapes.

Steps to Reproduce

  1. Configure the cumulativetodelta processor with a bare integer for max_staleness:

     processors:
       cumulativetodelta:
         include:
           metric_types:
             - "histogram"
         initial_value: "keep"
         max_staleness: 300    # Missing 's' suffix - interpreted as 300 nanoseconds!
  2. Send histogram metrics through the processor
  3. Observe that while AggregationTemporality shows Delta, the Count, Sum, and BucketCounts values remain cumulative (identical between scrapes when no new data arrives, instead of showing 0)

Expected Result

Either:

  • Option A (Preferred): The collector should fail at startup with a clear validation error indicating that duration values require a unit suffix
  • Option B: The collector should log a warning when a duration value seems unreasonably small (e.g., < 1 second for max_staleness)

Actual Result

The configuration is silently accepted. The value 300 is interpreted as 300 nanoseconds (not 300 seconds), causing:

  • State entries to expire in ~300ns (essentially immediately)
  • Delta calculation to fail silently because previous values are never retained
  • Metrics to pass through with AggregationTemporality: Delta but with incorrect (cumulative) values

Root Cause Analysis

The issue stems from how Go's type system interacts with the configuration parsing:

  1. Config struct definition (config.go):

    type Config struct {
        MaxStaleness time.Duration `mapstructure:"max_staleness"`
        // ...
    }
  2. confmap decoder configuration (from opentelemetry-collector/confmap/confmap.go):

    dc := &mapstructure.DecoderConfig{
        WeaklyTypedInput: false,
        DecodeHook: composehook.ComposeDecodeHookFunc(
            // ...
            mapstructure.StringToTimeDurationHookFunc(),  // Only handles strings
            // ...
        ),
    }
  3. The parsing chain:

    • YAML parses max_staleness: 300 as an integer (not a string)
    • WeaklyTypedInput: false prevents automatic int→string conversion
    • StringToTimeDurationHookFunc() only activates for string inputs, so it's bypassed
    • Go's time.Duration is type Duration int64 (nanoseconds), allowing direct integer assignment
    • Result: 300 becomes 300 nanoseconds
  4. Why the fix works: max_staleness: 300s is parsed by YAML as a string, triggering StringToTimeDurationHookFunc() which correctly calls time.ParseDuration("300s") → 300 seconds

This is a known class of problem documented in go-yaml/yaml#200.

Proposed Solution

Add validation in the processor's Validate() method to catch unreasonably small duration values:

func (c *Config) Validate() error {
    // Existing validation...
    
    // Validate max_staleness is reasonable (if set)
    if c.MaxStaleness > 0 && c.MaxStaleness < time.Second {
        return fmt.Errorf(
            "max_staleness value %v appears to be in nanoseconds; "+
            "duration values require a unit suffix (e.g., '300s', '5m')",
            c.MaxStaleness,
        )
    }
    
    return nil
}

Alternatively, this could be addressed at the confmap level in the core collector by adding a decode hook that rejects raw integers for time.Duration fields, which would benefit all components.

Collector version

v0.131.0 (otel/opentelemetry-collector-contrib:0.131.0)

Environment information

OS: macOS Darwin 25.1.0 (also reproduced in Docker linux/arm64)
Collector: otel/opentelemetry-collector-contrib:0.131.0

OpenTelemetry Collector configuration

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "worker"
          metrics_path: "/metrics"
          scrape_interval: "30s"
          static_configs:
            - targets: ["host.docker.internal:9394"]

processors:
  batch:
    send_batch_size: 8192
    timeout: 10s
  cumulativetodelta:
    include:
      metric_types:
        - "histogram"
    initial_value: "keep"
    max_staleness: 300    # BUG: Missing 's' suffix

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [cumulativetodelta, batch]
      exporters: [debug]

Log output

Debug output showing the issue (note AggregationTemporality: Delta but values remain cumulative):

Metric #0
Descriptor:
     -> Name: sidekiq_job_runtime_seconds
     -> Unit: seconds
     -> DataType: Histogram
     -> AggregationTemporality: Delta
HistogramDataPoints #0
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC   # <-- Epoch zero indicates state was lost
Timestamp: 2025-12-11 07:15:46.773 +0000 UTC
Count: 7           # <-- Should be 0 if no new jobs, but shows cumulative total
Sum: 43.749000     # <-- Should be 0.0 if no new jobs, but shows cumulative total

After fix (max_staleness: 300s):

Metric #0
Descriptor:
     -> Name: sidekiq_job_runtime_seconds
     -> DataType: Histogram
     -> AggregationTemporality: Delta
HistogramDataPoints #0
StartTimestamp: 2025-12-11 07:19:16.773 +0000 UTC   # <-- Proper timestamp
Timestamp: 2025-12-11 07:19:46.773 +0000 UTC
Count: 0           # <-- Correct delta (no new jobs)
Sum: 0.000000      # <-- Correct delta

Additional context

This issue is particularly problematic because:

  1. No startup error - The collector starts successfully
  2. No runtime error - Metrics flow through without issues
  3. Subtle misbehavior - The output looks correct (shows Delta temporality) but values are wrong
  4. Hard to debug - Requires deep understanding of the processor internals to diagnose

The fix is trivial once identified (300 → 300s), but discovering the root cause required tracing through the confmap parsing logic, mapstructure configuration, and understanding Go's time.Duration type alias.

A validation check would save users significant debugging time and surface the misconfiguration immediately at startup.
