Description
After running SigNoz for approximately 12 hours under a moderate metrics ingestion load (Kubernetes cluster monitoring), the signoz_metrics database, and specifically the time_series_v4_1week table, accumulates over 3,000 active parts, and the count keeps growing. Despite:
Increasing ClickHouse's part limit (max_parts_in_total)
Manually triggering merges via OPTIMIZE TABLE ... FINAL
…the part count does not decrease, leading to:
Rising disk I/O pressure
Slower metric queries
Risk of hitting system limits (e.g., too many open files)
This suggests that either automatic merging is ineffective, or the ingestion pattern creates too many small parts to be merged efficiently.
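To distinguish between those two causes, it may help to check whether background merges are running at all and how small the parts actually are. A diagnostic sketch (database and table names taken from this report; run against the ClickHouse cluster):

```sql
-- Are any merges currently in flight for the affected table?
SELECT database, table, elapsed, progress, num_parts, result_part_name
FROM system.merges
WHERE database = 'signoz_metrics' AND table = 'time_series_v4_1week';

-- Part-size distribution: many tiny parts indicate small, frequent inserts
SELECT
    partition,
    count() AS parts,
    formatReadableSize(avg(bytes_on_disk)) AS avg_part_size
FROM system.parts
WHERE database = 'signoz_metrics'
  AND table = 'time_series_v4_1week'
  AND active
GROUP BY partition
ORDER BY parts DESC;
```

If system.merges is consistently empty while the part count grows, merges are not being scheduled; if merges are running but the average part size stays in the tens of KiB, inserts are simply outpacing them.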
How to reproduce
Deploy SigNoz v2.x (Helm chart) on a Kubernetes cluster with default settings.
Enable metrics collection from a ~100-node Kubernetes cluster (via Prometheus + kube-state-metrics).
Let the system run for 12+ hours under steady load (~50k–100k samples/sec).
Query ClickHouse system tables:
SELECT table, count() AS parts
FROM system.parts
WHERE database = 'signoz_metrics' AND active = 1
GROUP BY table
ORDER BY parts DESC;
→ Observe that time_series_v4_1week has >3,000 active parts.
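A further check that can confirm merges falling behind inserts is the ratio of part-creation to merge events (this assumes system.part_log is enabled on the cluster, which it is not by default):

```sql
-- Compare insert vs. merge activity over the last hour; if NewPart events
-- far outnumber MergeParts events, part creation is outpacing merging
SELECT event_type, count() AS events
FROM system.part_log
WHERE database = 'signoz_metrics'
  AND table = 'time_series_v4_1week'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY event_type;
```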
Error log:
{"date_time":"1765261347.566535","thread_name":"TCPServerConnection ([#228])","thread_id":"1062","level":"Error","query_id":"","logger_name":"TCPHandler","message":"Code: 252. DB::Exception: Too many parts (3001 with average size of 37.52 KiB) in table 'signoz_metrics.time_series_v4_1week (b551387e-903a-4682-9588-702f998fc386)'. Merges are processing significantly slower than inserts: while pushing to view signoz_metrics.time_series_v4_1week_mv (c80068d3-13ca-4315-8328-803ad28cd320): while pushing to view signoz_metrics.time_series_v4_1day_mv (95c52533-4529-4119-94c8-580fbc64c7c8): while pushing to view signoz_metrics.time_series_v4_6hrs_mv (93a5f98b-638e-47d2-bc92-5e75910f2354). (TOO_MANY_PARTS), Stack trace (when copying this message, always include the lines below):\n\n0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000f87489b\n1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009d9940c\n2. DB::Exception::Exception<unsigned long&, ReadableSize, String>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type, std::type_identity::type, std::type_identity::type>, unsigned long&, ReadableSize&&, String&&) @ 0x00000000148fe9bc\n3. DB::MergeTreeData::delayInsertOrThrowIfNeeded(Poco::Event*, std::shared_ptr<DB::Context const> const&, bool) const @ 0x00000000148fe0f1\n4. DB::runStep(std::function<void ()>, DB::ThreadStatus*, std::atomic) @ 0x00000000152e381f\n5. DB::ExceptionKeepingTransform::work() @ 0x00000000152e2fd0\n6. DB::ExecutionThreadContext::executeTask() @ 0x00000000150551e9\n7. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic) @ 0x0000000015048c98\n8. DB::PipelineExecutor::executeStep(std::atomic) @ 0x0000000015048072\n9. DB::PushingPipelineExecutor::start() @ 0x000000001505da5d\n10. DB::TCPHandler::processInsertQuery(DB::QueryState&) @ 0x0000000014fa6790\n11. DB::TCPHandler::runImpl() @ 0x0000000014f97608\n12. DB::TCPHandler::run() @ 0x0000000014fb6239\n13. 
Poco::Net::TCPServerConnection::start() @ 0x00000000186d9707\n14. Poco::Net::TCPServerDispatcher::run() @ 0x00000000186d9b59\n15. Poco::PooledThread::run() @ 0x00000000186a4e3b\n16. Poco::ThreadImpl::runnableEntry(void) @ 0x00000000186a331d\n17. ? @ 0x00007fe65fc74ac3\n18. ? @ 0x00007fe65fd06850\n","source_file":"src/Server/TCPHandler.cpp; auto DB::TCPHandler::runImpl()::(anonymous class)::operator()() const","source_line":"477"
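The exception fires at 3,001 parts, which matches ClickHouse's default parts_to_throw_insert threshold of 3,000 in recent versions. As a temporary mitigation, the per-table thresholds can be raised to let merges catch up; the values below are illustrative assumptions, and this does not fix the underlying small-insert pattern:

```sql
-- Temporary mitigation sketch: raise the delay/throw thresholds on the
-- affected table (values are illustrative and need tuning per cluster)
ALTER TABLE signoz_metrics.time_series_v4_1week
MODIFY SETTING parts_to_delay_insert = 3000, parts_to_throw_insert = 5000;
```

A more durable fix would target the insert side (larger, less frequent batches from the exporter, or ClickHouse async_insert) so that new parts are created at a rate merges can sustain.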
Environment:
SigNoz backend: 4 nodes × 16 vCPU / 32 GB RAM
ClickHouse cluster: 3 nodes (1 shard, 1 replica), each on dedicated VMs
Storage: Local SSDs
