Skip to content
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
dist/
dist-newstyle/
.cabal-sandbox/
cabal.sandbox.config
cabal.config
.stack-work
*/*.yaml.lock
.devcontainer
.devcontainer
stack.yaml.lock
91 changes: 91 additions & 0 deletions prometheus-client/docs/memory-use.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# Investigating Memory Use in `prometheus-client`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an odd file to check into main. I think this would be more appropriate as the PR description.


We're running into problems with unexpectedly high memory use in `prometheus-client`, so I've been investigating improving the behavior.

Specifically, we're notificing a significant increase in the amount of time and memory allocated while calling `exportMetricsToText`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Specifically, we're notificing a significant increase in the amount of time and memory allocated while calling `exportMetricsToText`.
Specifically, we're noticing a significant increase in the amount of time and memory allocated while calling `exportMetricsToText`.

Every time we call the endpoint, more and more memory is used to render the response.
There are two reasonable possibilities for this:

1. The `exportMetricsToText` is producing a significantly larger `Text` value, which naturally requires significantly more memory.
2. The metrics themselves are somehow holding on to excessive memory or thunks.

Diagnosing this in our application is difficult - our profiling build is currently broken.
So I'm currently just looking at the code and correcting known smells.

## `LabelPairs`

The `LabelPairs` type was a `[(Text, Text)]`.
Lists and tuples are both known sources of potential laziness issues.
As a first pass, I replaced the type with a `newtype` so I could control the API for accessing and adding entries.
Then I replaced the tuple with a `data LabelPair = LabelPair { labelKey :: !Text, labelValue :: !Text }`.
This should prevent thunk accumulation for labels, and the concrete type may enable further memory improvement from GHC.

The fields on `Sample` are made strict, as well.
This should prevent thunk accumulation on the `Text` and `ByteString`, but a lazy `LabelPairs` is still possible.

Additionally, the `labelPairs` function now uses bang patterns on each `Text` in the tuple.
Since this is the whole of the interface for constructing a `LabelPair`, this should prevent any thunk accumulation on labels.

## `MetricImpl`

A `Metric` had a field `construct :: IO (s, IO [SampleGroup])`.
To avoid tuple, we introduce `MetricImpl s` which uses bang patterns on the fields.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To avoid tuple, we introduce `MetricImpl s` which uses bang patterns on the fields.
To avoid the tuple, we introduce `MetricImpl s` which uses bang patterns on the fields.


In practice, the `MetricImpl s` is almost always instantiated to a reference type, and evaluating a reference doesn't do much to help.
I did find the clarity in names helpful.

## `VectorState`

A `Prometheus.Metric.Vector` previously was defined as:

```haskell
type VectorState l m = (Metric m, Map.Map l (m, IO [SampleGroup]))
```

This `VectorState` was being stored in an `IORef`.
While the operation was evaluating the `VectorState` to WHNF, this only evaluated the tuple constructor, leaving the `Metric` and `Map.Map` unevaluated.
The `Map` is from `Data.Map.Strict`, which means that the values are evaluated to WHNF.
Methods from that module evaluate the structure of the `Map`, but polymorphic methods (ie `Functor`, `Traversable`) *do not*.

We can reduce the possibility of memory leaks here by replacing this with a record, bang patterns, and omitting the tuple for the `MetricImpl` type.

## `Counter` to `Builder`

`Counter` is a very simple metric.
Since the operations used for modifying the `IORef` all evaluate the value to WHNF, and the value is a `Double`, we are not retaining memory here.

However, there is a performance inefficiency: the use

```haskell
let sample = Sample (metricName info) mempty (BS.fromString $ show value)
```

`show :: Double -> String` is going to inefficiently allocate a `[Char]`, which will then be packed using `BS.fromString`.
Later, in consumption of this, we will convert that `ByteString` into a `Builder`.
A more efficient approach would use [`doubleDec`](https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/Data-ByteString-Builder.html#v:doubleDec) to directly convert to a `Builder`, avoiding allocating the intermediate `String`.

Since the only actual use of the `Sample`'s payload value is in building the report, we can change `Sample` to contain a `Builder` and encode things directly.
This will improve efficiency by avoiding allocating intermediate `String`s.

## `Histogram`

While investigating `Histogram`, I found a few potential issues:

1. `cumulativeSum` has numerous problems:
* The function holds onto the entire `Map` converted-into-a-`[(Double, Int)]` in order to `zip` it with itself.
* `scanl1` is lazy, similar to `foldl`. On lists, this will result in thunk accumulation.
2. `showFFloat` is used, requiring a `Double -> String -> Text -> Builder` conversion path.
3. The entire computation is done inside of a `STM.atomically`.
This means that, should anything write to the `TVar`, all of the computation will be retried.
This is *probably* bad - we want to capture the state of the metrics *now*, and then return the `SampleGroup`, rather than allowing the computation to be aborted and retried multiple times.

The first two problems are based on the sizeof the histogram, so the number of buckets.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The first two problems are based on the sizeof the histogram, so the number of buckets.
The first two problems are based on the size of the histogram, so the number of buckets.

Every additional bucket causes another float to be rendered into a string, and another list cons cell to be held on to.
Since number of buckets is likely small, this is probably not a big deal.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Since number of buckets is likely small, this is probably not a big deal.
Since the number of buckets is likely small, this is probably not a big deal.


However, might as well fix it up where I can!
`scanl1'` does not exist in `base`, but we can avoid retaining the input list in memory by preserving the tuple structure.
A bang pattern on the accumulator can help.

`formatFloat` is used to produce a `Text` label value for the `LabelPair`.
This suggests we can use `Data.Text.Lazy.Builder` to avoid the intermediate `String`.
3 changes: 2 additions & 1 deletion prometheus-client/prometheus-client.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ library
build-depends:
atomic-primops >=0.4
, base >=4.7 && <5
, bytestring >=0.9
, bytestring >=0.11.2
, clock
, containers
, deepseq
Expand Down Expand Up @@ -68,6 +68,7 @@ test-suite spec
default-language: Haskell2010
hs-source-dirs: src, tests
main-is: Spec.hs
build-tool-depends: hspec-discover:hspec-discover
build-depends:
QuickCheck
, atomic-primops
Expand Down
24 changes: 13 additions & 11 deletions prometheus-client/src/Prometheus.hs
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ module Prometheus (
-- set the value of a gauge as well as add and subtract arbitrary values.
--
-- >>> myGauge <- register $ gauge (Info "my_gauge" "An example gauge")
-- >>> setGauge myGauge 100
-- >>> setGauge myGauge 100
-- >>> addGauge myGauge 50
-- >>> subGauge myGauge 25
-- >>> getGauge myGauge
Expand Down Expand Up @@ -100,7 +100,7 @@ module Prometheus (
-- [(1 % 2,0.0),(9 % 10,0.0),(99 % 100,0.0)]

, Summary
, Quantile
, Quantile(..)
, summary
, defaultQuantiles
, getSummary
Expand All @@ -113,7 +113,7 @@ module Prometheus (
-- summaries in that they can be meaningfully aggregated across processes.
--
-- >>> myHistogram <- register $ histogram (Info "my_histogram" "") defaultBuckets
-- >>> observe myHistogram 0
-- >>> observe myHistogram 0
-- >>> getHistogram myHistogram
-- fromList [(5.0e-3,1),(1.0e-2,0),(2.5e-2,0),(5.0e-2,0),(0.1,0),(0.25,0),(0.5,0),(1.0,0),(2.5,0),(5.0,0),(10.0,0)]
, Histogram
Expand All @@ -130,11 +130,11 @@ module Prometheus (
-- partitioned across a set of dimensions.
--
-- >>> myVector <- register $ vector ("method", "code") $ counter (Info "http_requests" "")
-- >>> withLabel myVector ("GET", "200") incCounter
-- >>> withLabel myVector ("GET", "200") incCounter
-- >>> withLabel myVector ("GET", "404") incCounter
-- >>> withLabel myVector ("POST", "200") incCounter
-- >>> getVectorWith myVector getCounter
-- >>> withLabel myVector ("GET", "200") incCounter
-- >>> withLabel myVector ("GET", "200") incCounter
-- >>> withLabel myVector ("GET", "404") incCounter
-- >>> withLabel myVector ("POST", "200") incCounter
-- >>> getVectorWith myVector getCounter
-- [(("GET","200"),2.0),(("GET","404"),1.0),(("POST","200"),1.0)]
-- >>> exportMetricsAsText >>= Data.ByteString.Lazy.putStr
-- # HELP http_requests
Expand Down Expand Up @@ -199,11 +199,11 @@ module Prometheus (
-- >>> :m +Data.ByteString.UTF8
-- >>> newtype CPUTime = MkCPUTime ()
-- >>> let info = Info "cpu_time" "The current CPU time"
-- >>> let toValue = Data.ByteString.UTF8.fromString . show
-- >>> let toSample = Sample "cpu_time" [] . toValue
-- >>> let toValue = Data.ByteString.Builder.stringUtf8 . show
-- >>> let toSample = Sample "cpu_time" mempty . toValue
-- >>> let toSampleGroup = (:[]) . SampleGroup info GaugeType . (:[]) . toSample
-- >>> let collectCPUTime = fmap toSampleGroup getCPUTime
-- >>> let cpuTimeMetric = Metric (return (MkCPUTime (), collectCPUTime))
-- >>> let cpuTimeMetric = Metric (return $ MetricImpl (MkCPUTime ()) collectCPUTime)
-- >>> register cpuTimeMetric
-- >>> exportMetricsAsText >>= Data.ByteString.Lazy.putStr
-- # HELP cpu_time The current CPU time
Expand Down Expand Up @@ -244,6 +244,7 @@ module Prometheus (

, Info (..)
, Metric (..)
, MetricImpl (..)
, Sample (..)
, SampleGroup (..)
, SampleType (..)
Expand All @@ -266,5 +267,6 @@ import Prometheus.Registry
-- $setup
-- >>> :module +Prometheus
-- >>> :module +Control.Monad
-- >>> :module +Data.ByteString.Builder
-- >>> :set -XOverloadedStrings
-- >>> unregisterAll
9 changes: 5 additions & 4 deletions prometheus-client/src/Prometheus/Export/Text.hs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ module Prometheus.Export.Text (
exportMetricsAsText
) where

import Prometheus.Label (unLabelPairs, LabelPair(..))
import Prometheus.Info
import Prometheus.Metric
import Prometheus.Registry
Expand Down Expand Up @@ -65,18 +66,18 @@ exportSamples samples =
exportSample :: Sample -> Build.Builder
exportSample (Sample name labels value) =
Build.byteString (T.encodeUtf8 name)
<> (case labels of
<> (case unLabelPairs labels of
[] -> mempty
l:ls ->
Build.charUtf8 '{'
<> exportLabel l
<> mconcat [ Build.charUtf8 ',' <> exportLabel l' | l' <- ls ]
<> Build.charUtf8 '}')
<> Build.charUtf8 ' '
<> Build.byteString value
<> value

exportLabel :: (Text, Text) -> Build.Builder
exportLabel (key, value) =
exportLabel :: LabelPair -> Build.Builder
exportLabel LabelPair { labelKey = key, labelValue = value } =
Build.byteString (T.encodeUtf8 key)
<> Build.charUtf8 '='
<> Build.stringUtf8 (show value)
56 changes: 33 additions & 23 deletions prometheus-client/src/Prometheus/Label.hs
Original file line number Diff line number Diff line change
@@ -1,9 +1,14 @@
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE GADTs #-}

module Prometheus.Label (
Label (..)
, LabelPairs
, LabelPairs(..)
, LabelPair(..)
, Label0
, Label1
, Label2
Expand All @@ -20,7 +25,12 @@ import Data.Text

-- | A list of tuples where the first value is the label and the second is the
-- value of that label.
type LabelPairs = [(Text, Text)]
newtype LabelPairs = LabelPairs { unLabelPairs :: [LabelPair] }
deriving stock Show
deriving newtype (Semigroup, Monoid)

data LabelPair = LabelPair { labelKey :: !Text, labelValue :: !Text }
deriving stock Show

-- | Label describes a class of types that can be used to as the label of
-- a vector.
Expand All @@ -30,61 +40,61 @@ class Ord l => Label l where
type Label0 = ()

instance Label () where
labelPairs () () = []
labelPairs () () = LabelPairs mempty

type Label1 = Text

instance Label Text where
labelPairs key value = [(key, value)]
labelPairs !key !value = LabelPairs [LabelPair key value]

type Label2 = (Text, Text)

instance (a ~ Text, b ~ a) => Label (a, b) where
labelPairs (k1, k2) (v1, v2) = [(k1, v1), (k2, v2)]
labelPairs (!k1, !k2) (!v1, !v2) = LabelPairs [LabelPair k1 v1, LabelPair k2 v2]

type Label3 = (Text, Text, Text)

instance (a ~ Text, b ~ a, c ~ a) => Label (a, b, c) where
labelPairs (k1, k2, k3) (v1, v2, v3) = [(k1, v1), (k2, v2), (k3, v3)]
labelPairs (!k1, !k2, !k3) (!v1, !v2, !v3) = LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3]

type Label4 = (Text, Text, Text, Text)

instance (a ~ Text, b ~ a, c ~ a, d ~ a) => Label (a, b, c, d) where
labelPairs (k1, k2, k3, k4) (v1, v2, v3, v4) =
[(k1, v1), (k2, v2), (k3, v3), (k4, v4)]
labelPairs (!k1, !k2, !k3, !k4) (!v1, !v2, !v3, !v4) =
LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3, LabelPair k4 v4]

type Label5 = (Text, Text, Text, Text, Text)

instance (a ~ Text, b ~ a, c ~ a, d ~ a, e ~ a) => Label (a, b, c, d, e) where
labelPairs (k1, k2, k3, k4, k5) (v1, v2, v3, v4, v5) =
[(k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5)]
labelPairs (!k1, !k2, !k3, !k4, !k5) (!v1, !v2, !v3, !v4, !v5) =
LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3, LabelPair k4 v4, LabelPair k5 v5]

type Label6 = (Text, Text, Text, Text, Text, Text)

instance (a ~ Text, b ~ a, c ~ a, d ~ a, e ~ a, f ~ a) => Label (a, b, c, d, e, f) where
labelPairs (k1, k2, k3, k4, k5, k6) (v1, v2, v3, v4, v5, v6) =
[(k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6)]
labelPairs (!k1, !k2, !k3, !k4, !k5, !k6) (!v1, !v2, !v3, !v4, !v5, !v6) =
LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3, LabelPair k4 v4, LabelPair k5 v5, LabelPair k6 v6]

type Label7 = (Text, Text, Text, Text, Text, Text, Text)

instance (a ~ Text, b ~ a, c ~ a, d ~ a, e ~ a, f ~ a, g ~ a) => Label (a, b, c, d, e, f, g) where
labelPairs (k1, k2, k3, k4, k5, k6, k7) (v1, v2, v3, v4, v5, v6, v7) =
[(k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6),
(k7, v7)]
labelPairs (!k1, !k2, !k3, !k4, !k5, !k6, !k7) (!v1, !v2, !v3, !v4, !v5, !v6, !v7) =
LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3, LabelPair k4 v4, LabelPair k5 v5, LabelPair k6 v6,
LabelPair k7 v7]

type Label8 = (Text, Text, Text, Text, Text, Text, Text, Text)

instance (a ~ Text, b ~ a, c ~ a, d ~ a, e ~ a, f ~ a, g ~ a, h ~ a) => Label (a, b, c, d, e, f, g, h) where
labelPairs (k1, k2, k3, k4, k5, k6, k7, k8)
(v1, v2, v3, v4, v5, v6, v7, v8) =
[(k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6),
(k7, v7), (k8, v8)]
labelPairs (!k1, !k2, !k3, !k4, !k5, !k6, !k7, !k8)
(!v1, !v2, !v3, !v4, !v5, !v6, !v7, !v8) =
LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3, LabelPair k4 v4, LabelPair k5 v5, LabelPair k6 v6,
LabelPair k7 v7, LabelPair k8 v8]

type Label9 = (Text, Text, Text, Text, Text, Text, Text, Text,
Text)

instance (a ~ Text, b ~ a, c ~ a, d ~ a, e ~ a, f ~ a, g ~ a, h ~ a, i ~ a) => Label (a, b, c, d, e, f, g, h, i) where
labelPairs (k1, k2, k3, k4, k5, k6, k7, k8, k9)
(v1, v2, v3, v4, v5, v6, v7, v8, v9) =
[(k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6),
(k7, v7), (k8, v8), (k9, v9)]
labelPairs (!k1, !k2, !k3, !k4, !k5, !k6, !k7, !k8, !k9)
(!v1, !v2, !v3, !v4, !v5, !v6, !v7, !v8, !v9) =
LabelPairs [LabelPair k1 v1, LabelPair k2 v2, LabelPair k3 v3, LabelPair k4 v4, LabelPair k5 v5, LabelPair k6 v6,
LabelPair k7 v7, LabelPair k8 v8, LabelPair k9 v9]
19 changes: 13 additions & 6 deletions prometheus-client/src/Prometheus/Metric.hs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

module Prometheus.Metric (
Metric (..)
, MetricImpl (..)
, Sample (..)
, SampleGroup (..)
, SampleType (..)
Expand All @@ -12,6 +13,7 @@ import Prometheus.Label

import Control.DeepSeq
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as Builder
import Data.Text (Text)


Expand All @@ -34,13 +36,13 @@ instance Show SampleType where
-- | A single value recorded at a moment in time. The sample type contains the
-- name of the sample, a list of labels and their values, and the value encoded
-- as a ByteString.
data Sample = Sample Text LabelPairs BS.ByteString
deriving (Show)
data Sample = Sample !Text !LabelPairs !Builder.Builder
-- deriving (Show)

-- | A Sample group is a list of samples that is tagged with meta data
-- including the name, help string, and type of the sample.
data SampleGroup = SampleGroup Info SampleType [Sample]
deriving (Show)
data SampleGroup = SampleGroup !Info !SampleType [Sample]
-- deriving (Show)

-- | A metric represents a single value that is being monitored. It is comprised
-- of a handle value and a collect method. The handle value is typically a new
Expand All @@ -58,8 +60,13 @@ newtype Metric s =
-- metric. A counter would return state pointing to the mutable
-- reference.
-- 2. An 'IO' action that samples the metric and returns 'SampleGroup's.
-- This is the data that will be stored by Prometheus.
construct :: IO (s, IO [SampleGroup])
-- This is the data that will be stored by Prometheus.
construct :: IO (MetricImpl s)
}

data MetricImpl s = MetricImpl
{ metricImplState :: !s
, metricImplCollect :: IO [SampleGroup]
}

instance NFData a => NFData (Metric a) where
Expand Down
Loading