@MaxGabriel MaxGabriel commented May 17, 2025

Edit: I realized I should probably add a TTL/max retention period for exemplars.

Prometheus has a neat feature called exemplars:

> An exemplar is a specific trace representative of measurement taken in a given time interval. While metrics excel at giving you an aggregated view of your system, traces give you a fine grained view of a single request; exemplars are a way to link the two.

Exemplars are only available on histograms and counters. On each bucket of a histogram, you can store a single list of key-value pairs (typically trace_id and span_id). The Prometheus server scrapes those, and they can be displayed in tools like Grafana. That way, when you're investigating slow requests (say, those above 10 seconds), you can find examples to dig into in a tracing tool like Honeycomb.
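As a rough illustration of the per-bucket storage described above, here is a minimal sketch. All types and names (`Exemplar`, `Bucket`, `emptyBuckets`, `observeWith`) are hypothetical and are not the prometheus-client API; counts are kept non-cumulative for simplicity, and bounds are assumed to be sorted in ascending order.

```haskell
-- Hypothetical types for illustration; not the prometheus-client API.
type Labels = [(String, String)]

data Exemplar = Exemplar
  { exemplarLabels :: Labels  -- e.g. trace_id and span_id
  , exemplarValue  :: Double  -- the observation that produced the exemplar
  } deriving (Eq, Show)

data Bucket = Bucket
  { upperBound :: Double
  , count      :: Int             -- non-cumulative count for this bucket
  , exemplar   :: Maybe Exemplar  -- at most one exemplar per bucket
  } deriving (Eq, Show)

-- Build empty buckets from ascending upper bounds, plus a final +Inf bucket.
emptyBuckets :: [Double] -> [Bucket]
emptyBuckets bounds = [Bucket b 0 Nothing | b <- bounds ++ [1 / 0]]

-- Record an observation in the first bucket whose upper bound covers it,
-- overwriting whatever exemplar that bucket held before.
observeWith :: Double -> Labels -> [Bucket] -> [Bucket]
observeWith _ _ [] = []
observeWith v ls (b : bs)
  | v <= upperBound b =
      b { count = count b + 1, exemplar = Just (Exemplar ls v) } : bs
  | otherwise = b : observeWith v ls bs
```

For example, `observeWith 12.5 [("trace_id", "abc")] (emptyBuckets [1, 10])` would place the observation and its labels in the +Inf bucket and leave the other two buckets untouched.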

Exemplars are considered experimental and must be enabled on the Prometheus server. That said, this feature has existed for four years, and tools like Grafana already support it.

This PR would be a breaking change because it adds an argument to the Sample constructor. That said, I expect that most library users do not deal with samples directly.

This PR only adds exemplars to histograms.

--

This PR does add a couple of hlint warnings:

prometheus-client/src/Prometheus/Metric/Histogram.hs:110:1-68: Warning: Eta reduce
Found:
  insert value bucketCounts
    = insertWithExemplar value [] bucketCounts
Perhaps:
  insert value = insertWithExemplar value []

prometheus-client/src/Prometheus/Metric/Histogram.hs:135:19-74: Suggestion: Use zipWith
Found:
  map toSample (zip upperBoundAndCount exemplarLabelPairs)
Perhaps:
  zipWith (curry toSample) upperBoundAndCount exemplarLabelPairs

But I didn't really like the suggested changes and I see there are plenty of other hlint warnings, so maybe lmk if it's fine to ignore those.
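For reference, both suggestions are behavior-preserving rewrites, so ignoring them does not change results. A small sketch, using stand-in definitions (`insertWith` and `toSample` here are hypothetical and only mimic the argument shapes of the real functions in Histogram.hs):

```haskell
-- Stand-in with the same argument shape as insertWithExemplar.
insertWith :: Double -> [String] -> [Double] -> [Double]
insertWith v _ acc = v : acc

insertLong, insertShort :: Double -> [Double] -> [Double]
insertLong value acc = insertWith value [] acc  -- as written in the PR
insertShort value = insertWith value []         -- eta-reduced, as hlint suggests

-- Stand-in for toSample: takes a pair of (bucket, exemplar labels).
toSample :: ((Double, Int), [(String, String)]) -> String
toSample ((bound, n), labels) = show bound ++ " " ++ show n ++ " " ++ show labels

asWritten, asSuggested :: [(Double, Int)] -> [[(String, String)]] -> [String]
asWritten xs ls = map toSample (zip xs ls)  -- as written in the PR
asSuggested = zipWith (curry toSample)      -- as hlint suggests
```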

I still need to do some cleanup, but this is now working end-to-end: exemplars can be queried from the Prometheus server and viewed in Grafana.

, exponentialBuckets
, linearBuckets
, getHistogram
, observeWithExemplar
Contributor Author

Could add something like observeDurationWithExemplar too, to fully match the existing functions.

Owner

I think that would probably make sense.

--
-- This feature is experimental and must be [enabled on the Prometheus server](https://prometheus.io/docs/prometheus/latest/feature_flags/).
--
-- > withLabel incomingHttpRequestSeconds "Signup_POST" (`observeWithExemplar` 1.23 [("trace_id", "12345"), ("span_id", "67890")])
Contributor Author

In a previous commit I had two tiny helpers, traceLabel and traceAndSpanLabels, that would construct these label pairs. I could add them back if you'd like.

Contributor Author

Note to self: there's a syntax error here: (`observeWithExemplar` 1.23 [("trace_id", "12345"), ("span_id", "67890")])

Owner

If these aren't standardized (e.g. it looks like GCP also wants project_id), I dunno that I'd create helpers. It's not that many more characters to type it out explicitly.

@MaxGabriel MaxGabriel closed this May 18, 2025
@MaxGabriel MaxGabriel reopened this May 19, 2025
Owner

@fimad fimad left a comment

> prometheus-client/src/Prometheus/Metric/Histogram.hs:110:1-68: Warning: Eta reduce
> Found:
>   insert value bucketCounts
>     = insertWithExemplar value [] bucketCounts
> Perhaps:
>   insert value = insertWithExemplar value []
>
> prometheus-client/src/Prometheus/Metric/Histogram.hs:135:19-74: Suggestion: Use zipWith
> Found:
>   map toSample (zip upperBoundAndCount exemplarLabelPairs)
> Perhaps:
>   zipWith (curry toSample) upperBoundAndCount exemplarLabelPairs
>
> But I didn't really like the suggested changes and I see there are plenty of other hlint warnings, so maybe lmk if it's fine to ignore those.

Yeah, I don't really like those suggestions either, I think those are fine to ignore.


-- These formats are nearly identical; the only practical difference
-- is that OpenMetrics supports exemplars and requires this content-type header:
-- @application/openmetrics-text; version=1.0.0; charset=utf-8@
data ExportFormat = PrometheusText | OpenMetricsOneZeroZero
Owner

It looks like this is already defined in the main package, why redefine it here as well?
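To make the content-type difference concrete, here is a minimal sketch of mapping each format to its header value. The `contentType` function name is hypothetical, and the Prometheus text value shown is the one from the Prometheus exposition format documentation; whether the library sends exactly these strings is an assumption.

```haskell
data ExportFormat = PrometheusText | OpenMetricsOneZeroZero
  deriving (Eq, Show)

-- Hypothetical helper: pick the Content-Type header for a scrape response.
contentType :: ExportFormat -> String
contentType PrometheusText =
  "text/plain; version=0.0.4; charset=utf-8"
contentType OpenMetricsOneZeroZero =
  "application/openmetrics-text; version=1.0.0; charset=utf-8"
```

Defining the type once in the main package and importing it wherever the header is chosen would avoid keeping two definitions in sync.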


@MaxGabriel
Contributor Author

Thanks for the comments. I'm working on this off and on. The main issue I ran into is that because we tail-sample our traces, only some trace IDs make it into our tracing provider (Honeycomb). I think the best solution is to add the minimum retention period support that other clients have, so that exemplars aren't constantly overwritten, and then tell the sampler to keep the traces that you do create exemplars for.

(Didn't explain this well, but mostly leaving this as a note for where I'm at)

@MaxGabriel
Contributor Author

I ran into a stumbling block and asked my coworker about it:

I have been working on adding a feature called Exemplars to prometheus-client. Basically when you record time series data in a histogram, it lets you also record a specific data point like “this specific request took 15s and this is its trace_id”. Then Grafana will create a dot on the graph for that exemplar data point and you can click it to see the full trace in Grafana. Basically a way to link together metrics and traces.

There is a feature I want to add where you have a minimum retention period on an exemplar, so you e.g. create at most one exemplar per bucket every 30 seconds or so. As part of this, I want to return from a function whether the exemplar was recorded or not.
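A minimal sketch of that retention check as a pure function, assuming timestamps are plain seconds. The names `Seconds`, `Stored`, and `offer` are hypothetical and do not come from the PR or from prometheus-client.

```haskell
-- Hypothetical names for illustration only.
type Seconds = Double

data Stored = Stored
  { storedAt     :: Seconds             -- when the exemplar was recorded
  , storedLabels :: [(String, String)]  -- e.g. trace_id and span_id
  } deriving (Eq, Show)

-- Offer a new exemplar to a bucket's slot. Returns the updated slot and
-- whether the new exemplar was recorded. An existing exemplar is only
-- replaced once it has been held for at least the minimum retention period.
offer
  :: Seconds             -- minimum retention period
  -> Seconds             -- current time
  -> [(String, String)]  -- labels of the candidate exemplar
  -> Maybe Stored        -- current slot contents
  -> (Maybe Stored, Bool)
offer _ now ls Nothing = (Just (Stored now ls), True)
offer minRetention now ls (Just old)
  | now - storedAt old >= minRetention = (Just (Stored now ls), True)
  | otherwise = (Just old, False)
```

The `Bool` in the result is the "was it recorded" signal; a caller could use it to tell the trace sampler which traces to keep.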

The thing I’m running into is there is this class:

```haskell
class Monad m => MonadMonitor m where
  doIO :: IO () -> m ()
  default doIO :: (MonadTrans t, MonadMonitor n, m ~ t n) => IO () -> m ()
```

https://hackage.haskell.org/package/prometheus-client-1.1.1/docs/Prometheus.html#t:MonadMonitor

for doing IO, but it doesn’t let you return a value, so I can’t actually tell a caller whether the exemplar was recorded. So the obvious fix is to modify `doIO` to let it return a value (at that point I guess it’s just MonadIO?).

Then the issue is:

https://hackage.haskell.org/package/prometheus-client-1.1.1/docs/src/Prometheus.MonadMonitor.html#MonitorT

this uses WriterT to basically build up a list of IO actions. My understanding is that this lets you record metrics in pure code, and only once you get back to IO does it send them all off. But I’m not sure the WriterT approach can work if `MonadMonitor` has to return values?

Any ideas on this?

They didn't think this was solvable:

> Not really possible to have pure metrics and return values here
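To illustrate why, here is a minimal sketch of a writer-like monad that queues IO actions. `Queued` is a hand-rolled stand-in for the WriterT-based MonitorT, not the library's actual definition; `doIO` mirrors the class method's shape, and `runQueued` and `demo` are hypothetical names.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Stand-in for MonitorT: a pure result plus a queue of deferred IO actions.
newtype Queued a = Queued (a, [IO ()])

instance Functor Queued where
  fmap f (Queued (a, w)) = Queued (f a, w)

instance Applicative Queued where
  pure a = Queued (a, [])
  Queued (f, w1) <*> Queued (a, w2) = Queued (f a, w1 ++ w2)

instance Monad Queued where
  Queued (a, w) >>= k = let Queued (b, w') = k a in Queued (b, w ++ w')

-- Fine: the action is only queued, and () can be returned immediately.
doIO :: IO () -> Queued ()
doIO act = Queued ((), [act])

-- A value-returning variant has nothing to put in the result position,
-- because the IO action that would produce the value has not run yet:
--   doIOReturning :: IO a -> Queued a
--   doIOReturning act = Queued (<no value available yet>, [act >> pure ()])

-- Run all queued actions in order once we are back in IO.
runQueued :: Queued a -> IO a
runQueued (Queued (a, acts)) = sequence_ acts >> pure a

-- Shows that queued effects only happen when the queue is run.
demo :: IO Int
demo = do
  ref <- newIORef (0 :: Int)
  let pureCode = doIO (modifyIORef ref (+ 1)) >> doIO (modifyIORef ref (+ 1))
  before <- readIORef ref  -- nothing has run yet
  runQueued pureCode
  after <- readIORef ref   -- both increments have now run
  pure (after - before)
```

In this sketch, whether an exemplar was recorded would only be known when `runQueued` executes the action, which is after the code that wanted the answer has already finished building the queue.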
