
CPU throttling on PromProxy node with 1300+ agents #112

@mhussain584

Description


Hi Paul,

Hope you are doing well.

Thanks again for your help with the 1.23.1 release last time, which fixed my issue.

Now I've run into another issue: I'm routing metrics for 1300+ agents (including roughly 500 dummy agents) through the proxy, and the prometheus-proxy service is experiencing CPU throttling.

Current node resources:

8 cores
32 GB memory

I've upgraded both the agent and the proxy to 2.1.0, but the throttling is still evident.
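
For reference, this is roughly how I'm quantifying the throttling from inside the container (a minimal sketch, assuming the container runs under cgroup v2; under cgroup v1 the same counters live at /sys/fs/cgroup/cpu/cpu.stat instead):

# throttle_check.py - read the cgroup CPU counters the throttling numbers come from
from pathlib import Path

stats = {}
for line in Path("/sys/fs/cgroup/cpu.stat").read_text().splitlines():
    key, value = line.split()
    stats[key] = int(value)

periods = stats.get("nr_periods", 0)
throttled = stats.get("nr_throttled", 0)
pct = 100 * throttled / periods if periods else 0.0
print(f"throttled in {throttled} of {periods} enforcement periods ({pct:.1f}%)")
print(f"total time throttled: {stats.get('throttled_usec', 0) / 1e6:.1f}s")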

PromProxy logs

16:33:30.299 INFO  [CallLogging.kt:48] - 200 OK: GET - /blmxx04_hssidsmac_virtual_09_10_147_233_111_hs_metrics - ip-100-65-70-119.ec2.internal [DefaultDispatcher-worker-31]
16:33:30.302 INFO  [CallLogging.kt:48] - 200 OK: GET - /blmxx06_hssidsmac_virtual_09_10_147_233_117_hs_metrics - ip-100-65-70-119.ec2.internal [DefaultDispatcher-worker-36]
16:33:30.305 INFO  [CallLogging.kt:48] - 503 Service Unavailable: GET - /blmxx03_hssidsmac_virtual_15_10_147_233_105_hs_metrics - ip-100-65-70-119.ec2.internal [DefaultDispatcher-worker-62]
16:33:30.305 INFO  [ProxyHttpConfig.kt:103] -  Throwable caught: ClosedByteChannelException [DefaultDispatcher-worker-7]
io.ktor.utils.io.ClosedByteChannelException: Broken pipe
  at io.ktor.utils.io.CloseToken$wrapCause$1.invoke(CloseToken.kt:16)
  at io.ktor.utils.io.CloseToken$wrapCause$1.invoke(CloseToken.kt:16)
  at io.ktor.utils.io.CloseToken.wrapCause(CloseToken.kt:21)
  at io.ktor.utils.io.CloseToken.wrapCause$default(CloseToken.kt:16)
  at io.ktor.utils.io.ByteChannel.getClosedCause(ByteChannel.kt:61)
  at io.ktor.utils.io.ByteReadChannelOperationsKt.rethrowCloseCauseIfNeeded(ByteReadChannelOperations.kt:543)
  at io.ktor.utils.io.ByteChannel.flush(ByteChannel.kt:94)
  at io.ktor.utils.io.CloseHookByteWriteChannel.flush(CloseHookByteWriteChannel.kt)
  at io.ktor.utils.io.ByteReadChannelOperationsKt.copyTo(ByteReadChannelOperations.kt:206)
  at io.ktor.utils.io.ByteReadChannelOperationsKt$copyTo$2.invokeSuspend(ByteReadChannelOperations.kt)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
  at kotlinx.coroutines.DispatchedTaskKt.resume(DispatchedTask.kt:233)
  at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:175)
  at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:147)
  at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:470)
  at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl$kotlinx_coroutines_core(CancellableContinuationImpl.kt:504)
  at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl$kotlinx_coroutines_core$default(CancellableContinuationImpl.kt:493)
  at kotlinx.coroutines.CancellableContinuationImpl.resumeWith(CancellableContinuationImpl.kt:359)
  at io.ktor.utils.io.ByteChannel$Slot$Task$DefaultImpls.resume(ByteChannel.kt:239)
  at io.ktor.utils.io.ByteChannel$Slot$Read.resume(ByteChannel.kt:242)
  at io.ktor.utils.io.ByteChannel.closeSlot(ByteChannel.kt:177)
  at io.ktor.utils.io.ByteChannel.flushAndClose(ByteChannel.kt:133)
  at io.ktor.utils.io.ByteWriteChannelOperationsKt$writer$job$1.invokeSuspend(ByteWriteChannelOperations.kt:184)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
  at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:829)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:717)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:704)
Caused by: io.ktor.utils.io.ClosedWriteChannelException: Broken pipe
  at io.ktor.utils.io.ByteChannel$writeBuffer$1.invoke(ByteChannel.kt:54)
  at io.ktor.utils.io.ByteChannel$writeBuffer$1.invoke(ByteChannel.kt:54)
  at io.ktor.utils.io.CloseToken.wrapCause(CloseToken.kt:21)
  at io.ktor.utils.io.CloseToken.throwOrNull(CloseToken.kt:26)
  at io.ktor.utils.io.ByteChannel.getWriteBuffer(ByteChannel.kt:54)
  at io.ktor.utils.io.ByteWriteChannelOperationsKt.writeByte(ByteWriteChannelOperations.kt:19)
  at io.ktor.http.cio.internals.CharsKt.writeIntHex(Chars.kt:110)
  at io.ktor.http.cio.ChunkedTransferEncodingKt.writeChunk(ChunkedTransferEncoding.kt:167)
  at io.ktor.http.cio.ChunkedTransferEncodingKt.access$writeChunk(ChunkedTransferEncoding.kt:1)
  at io.ktor.http.cio.ChunkedTransferEncodingKt.encodeChunked(ChunkedTransferEncoding.kt:138)
  at io.ktor.http.cio.ChunkedTransferEncodingKt$encodeChunked$2.invokeSuspend(ChunkedTransferEncoding.kt)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
  at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
  at kotlinx.coroutines.EventLoop.processUnconfinedEvent(EventLoop.common.kt:65)
  at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:243)
  at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:147)
  at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:470)
  at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl$kotlinx_coroutines_core(CancellableContinuationImpl.kt:504)
  at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl$kotlinx_coroutines_core$default(CancellableContinuationImpl.kt:493)
  at kotlinx.coroutines.CancellableContinuationImpl.resumeWith(CancellableContinuationImpl.kt:359)
  at io.ktor.utils.io.ByteChannel$Slot$Task$DefaultImpls.resume(ByteChannel.kt:236)
  at io.ktor.utils.io.ByteChannel$Slot$Read.resume(ByteChannel.kt:242)
  at io.ktor.utils.io.ByteChannel.flushWriteBuffer(ByteChannel.kt:389)
  at io.ktor.utils.io.ByteChannel.flush(ByteChannel.kt:96)
  at io.ktor.utils.io.ByteChannel.flushAndClose(ByteChannel.kt:128)
  ... 7 common frames omitted
Caused by: io.ktor.utils.io.ClosedWriteChannelException: Broken pipe
  at io.ktor.utils.io.ByteChannel$writeBuffer$1.invoke(ByteChannel.kt:54)
  at io.ktor.utils.io.ByteChannel$writeBuffer$1.invoke(ByteChannel.kt:54)
  at io.ktor.utils.io.CloseToken.wrapCause(CloseToken.kt:21)
  at io.ktor.utils.io.CloseToken.throwOrNull(CloseToken.kt:26)
  at io.ktor.utils.io.ByteChannel.getWriteBuffer(ByteChannel.kt:54)
  at io.ktor.utils.io.ByteReadChannelOperationsKt.copyTo(ByteReadChannelOperations.kt:175)
  at io.ktor.utils.io.ByteReadChannelOperationsKt$copyTo$1.invokeSuspend(ByteReadChannelOperations.kt)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
  at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:100)
  at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:113)
  at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:89)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:586)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:820)
  ... 2 common frames omitted
Caused by: java.io.IOException: Broken pipe
  at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
  at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
  at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:132)
  at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:97)
  at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:53)
  at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:532)
  at io.ktor.network.sockets.CIOWriterKt$attachForWritingDirectImpl$1.invokeSuspend$lambda$1(CIOWriter.kt:42)
  at io.ktor.utils.io.core.ByteReadPacketExtensions_jvmKt.read(ByteReadPacketExtensions.jvm.kt:30)
  at io.ktor.network.sockets.CIOWriterKt$attachForWritingDirectImpl$1.invokeSuspend(CIOWriter.kt:78)
  ... 8 common frames omitted
16:33:30.306 INFO  [CallLogging.kt:48] - 200 OK: GET - /blmxx03_hssidsmac_virtual_04_10_147_234_71_hs_metrics - ip-100-65-70-119.ec2.internal [DefaultDispatcher-worker-7]
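
My read of the Broken pipe traces is that Prometheus is hanging up when a scrape exceeds its scrape_timeout while the proxy is still writing the chunked response, which would also explain the interleaved 503s. I haven't confirmed this is the whole story, but I can provoke what looks like the same server-side exception with an artificially short client timeout (a sketch, assuming the proxy serves scrapes on the default HTTP port 8080; the path is one of the real ones from the log above):

# broken_pipe_repro.py - give up on a scrape long before the proxy finishes writing
import requests

try:
    requests.get(
        "http://promproxy-us-east-1.aws-prod.example.com:8080/blmxx03_hssidsmac_virtual_15_10_147_233_105_hs_metrics",
        timeout=0.05,  # deliberately far below any realistic scrape_timeout
    )
except requests.exceptions.Timeout:
    # the client closes the socket here; the proxy side should then log
    # ClosedByteChannelException: Broken pipe if it was still mid-write
    print("client gave up before the response finished")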

Container resource usage

(screenshot omitted)

Prom-agent.conf (Just a snippet)

proxy {
  admin.debugEnabled = true

  admin.enabled: true
  metrics.enabled: true

  http.requestLoggingEnabled: true
}

agent {

  proxy.hostname: "promproxy-us-east-1.aws-prod.example.com:50051"
  admin.enabled: true
  metrics.enabled: true

  pathConfigs: [
    {
      name: "Agent System metrics"
      path: blmxx_agent_sys_metrics
      url: "http://10.147.92.199:9100/metrics"
    },
    {
      name: "Agent metrics"
      path: blmxx_agent_metrics
      labels: "{\"cloud\": \"blmxx\"}"
      url: "http://localhost:8083/metrics"
    },
    {
      name: "blmxx01.iosserver-5xxxxxxxx38.app_metrics"
      path: "blmxx01_iosserver_app_metrics_10_147_234_98_9091"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\"}"
      url: "http://10.147.234.98:9091/metrics"
    },
    {
      name: "blmxx01.iosserver-5xxxxxxxx38.sys_metrics"
      path: "blmxx01_iosserver_sys_metrics_10_147_234_98_9100"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\"}"
      url: "http://10.147.234.98:9100/metrics"
    },
    {
      name: "blmxx01.iosserver-5xxxxxxxx38VIRTUAL_01"
      path: "blmxx01_iosserver_virtual_01_10_147_234_98_hs_metrics"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\", \"cradle\": \"01\"}"
      url: "http://10.147.234.98/VIRTUAL/01/hs/metrics"
    },
    {
      name: "blmxx01.iosserver-5xxxxxxxx38.VIRTUAL_02"
      path: "blmxx01_iosserver_virtual_02_10_147_234_98_hs_metrics"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\", \"cradle\": \"02\"}"
      url: "http://10.147.234.98/VIRTUAL/02/hs/metrics"
    },
    {
      name: "blmcm01.iosserver-5xxxxxxxx38.VIRTUAL_03"
      path: "blmxx01_iosserver_virtual_03_10_147_234_98_hs_metrics"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\", \"cradle\": \"03\"}"
      url: "http://10.147.234.98/VIRTUAL/03/hs/metrics"
    },
    {
      name: "blmxx01.iosserver-5xxxxxxxx38.VIRTUAL_04"
      path: "blmxx01_iosserver_virtual_04_10_147_234_98_hs_metrics"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\", \"cradle\": \"04\"}"
      url: "http://10.147.234.98/VIRTUAL/04/hs/metrics"
    },
    {
      name: "blmxx01.iosserver-5xxxxxxxx38.VIRTUAL_05"
      path: "blmxx01_iosserver_virtual_05_10_147_234_98_hs_metrics"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\", \"cradle\": \"05\"}"
      url: "http://10.147.234.98/VIRTUAL/05/hs/metrics"
    },
    {
      name: "blmxx0.iosserver-5xxxxxxxx38.VIRTUAL_06"
      path: "blmxx01_iosserver_virtual_06_10_147_234_98_hs_metrics"
      labels: "{\"cloud\": \"blmxx01\", \"service\": \"iosserver\", \"ip\": \"10.147.234.98\", \"cradle\": \"06\"}"
      url: "http://10.147.234.98/VIRTUAL/06/hs/metrics"
    },
    # ... (remaining pathConfigs truncated)
  ]
}
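
For scale context, the full file has one block like the above for every cradle on every host, so the 1300+ paths are generated rather than hand-written. A simplified sketch of the generator (names, host id, IP, and cradle count here are illustrative, not the real inventory):

# gen_pathconfigs.py - emit the repetitive VIRTUAL pathConfig entries as HOCON
import json

def path_config(cloud: str, service: str, host_id: str, ip: str, cradle: int) -> str:
    # labels is a JSON object embedded as an escaped string, matching the config above
    labels = json.dumps({"cloud": cloud, "service": service, "ip": ip,
                         "cradle": f"{cradle:02d}"})
    ip_us = ip.replace(".", "_")
    return (
        "    {\n"
        f'      name: "{cloud}.{service}-{host_id}.VIRTUAL_{cradle:02d}"\n'
        f'      path: "{cloud}_{service}_virtual_{cradle:02d}_{ip_us}_hs_metrics"\n'
        f"      labels: {json.dumps(labels)}\n"
        f'      url: "http://{ip}/VIRTUAL/{cradle:02d}/hs/metrics"\n'
        "    },"
    )

for cradle in range(1, 7):
    print(path_config("blmxx01", "iosserver", "5xxxxxxxx38", "10.147.234.98", cradle))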
Your help on this would be appreciated.

Thanks
