-
Notifications
You must be signed in to change notification settings - Fork 55
Add exemplars #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add exemplars #74
Conversation
| , exponentialBuckets | ||
| , linearBuckets | ||
| , getHistogram | ||
| , observeWithExemplar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add something like observeDurationWithExemplar too, to fully match the existing functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would probably make sense.
| -- | ||
| -- This feature is experimental and must be [enabled on the Prometheus server](https://prometheus.io/docs/prometheus/latest/feature_flags/). | ||
| -- | ||
| -- > withLabel incomingHttpRequestSeconds "Signup_POST" (`observeWithExemplar` 1.23 [("trace_id", "12345"), ("span_id", "67890")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a previous commit I had a tiny helper called traceLabel and traceAndSpanLabels that would construct this. I could add that back if you'd like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting to myself there's a syntax error here: (observeWithExemplar 1.23 [("trace_id", "12345"), ("span_id", "67890")])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these aren't standardized (e.g. looks like gcp also wants project_id) I dunno that I'd create helpers. It is not that much more characters to type it out explicitly.
fimad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prometheus-client/src/Prometheus/Metric/Histogram.hs:110:1-68: Warning: Eta reduce Found: insert value bucketCounts = insertWithExemplar value [] bucketCounts Perhaps: insert value = insertWithExemplar value [] prometheus-client/src/Prometheus/Metric/Histogram.hs:135:19-74: Suggestion: Use >zipWith Found: map toSample (zip upperBoundAndCount exemplarLabelPairs) Perhaps: zipWith (curry toSample) upperBoundAndCount exemplarLabelPairsBut I didn't really like the suggested changes and I see there are plenty of other hlint warnings, so maybe lmk if it's fine to ignore those.
Yeah, I don't really like those suggestions either, I think those are fine to ignore.
| , exponentialBuckets | ||
| , linearBuckets | ||
| , getHistogram | ||
| , observeWithExemplar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would probably make sense.
| -- These formats are nearly identical; the only practical difference | ||
| -- is that OpenMetrics supports exemplars and requires this content-type header: | ||
| -- @application/openmetrics-text; version=1.0.0; charset=utf-8@ | ||
| data ExportFormat = PrometheusText | OpenMetricsOneZeroZero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is already defined in the main package, why redefine it here as well?
| -- | ||
| -- This feature is experimental and must be [enabled on the Prometheus server](https://prometheus.io/docs/prometheus/latest/feature_flags/). | ||
| -- | ||
| -- > withLabel incomingHttpRequestSeconds "Signup_POST" (`observeWithExemplar` 1.23 [("trace_id", "12345"), ("span_id", "67890")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these aren't standardized (e.g. looks like gcp also wants project_id) I dunno that I'd create helpers. It is not that much more characters to type it out explicitly.
|
Thanks for the comments. I'm doing some work on this off-and-on—main issue I ran into was that because we tail sample our traces, only some trace IDs make it into our tracing provider (honeycomb). I think the best solution is to add minimum retention period support that other clients have, so that exemplars aren't being constantly overwritten. Then, tell the sampler to keep those traces that you do make exemplars for (Didn't explain this well, but mostly leaving this as a note for where I'm at) |
|
I ran into a stumbling block and asked my coworker about it: I have been working on adding a feature called Exemplars to prometheus-client. Basically when you record time series data in a histogram, it lets you also record a specific data point like “this specific request took 15s and this is it’s trace_id”. Then grafana will create a dot on the graph for that exemplar data point and you can click it to see the full trace in grafana. Basically a way to link together metrics and traces. There is a feature I want to add where you have a minimum retention period on an exemplar, so you e.g. only create an exemplar per bucket up to every 30 seconds or something. As part of this, I want to return from a function whether the exemplar was recorded or not. The thing I’m running into is there is this class: |
Edit: Realized I should add a TTL/max retention period for exemplars I think.
Prometheus has a neat feature called exemplars:
Exemplars are only available on histograms and counters. On each bucket of a histogram, you can store a single list of key-value pairs (typically
trace_idandspan_id). The prometheus server will scrape those and they can be displayed in tools like Grafana. That way, when you're investigating requests above 10 seconds or whatever, you can find examples to dig into in a tracing tool like Honeycomb.Exemplars considered experimental and must be enabled on the prometheus server. That said, this feature has existed for four years with support built into tools like Grafana already.
This PR would be a breaking change because it adds an argument to the
Sampleconstructor. That said, I expect that most library users do not deal with samples directly.This PR only adds exemplars to histograms.
--
This PR does add a couple of hlint warnings:
But I didn't really like the suggested changes and I see there are plenty of other hlint warnings, so maybe lmk if it's fine to ignore those.
I still need to do some cleanup, but this is now working end-to-end and exemplars can be queried from the prometheus server and viewed in grafana