@geekysatbir
Contributor

Description

This PR adds a "Troubleshooting and Best Practices" section to the Prometheus receiver README. The section covers common operational problems (missing metrics, high CPU usage, memory pressure) and gives guidance for running the receiver in production.

What's Included

Common Issues and Solutions

  • Metrics Not Appearing: Debugging steps and solutions for connectivity, configuration, and filtering issues
  • High CPU Usage: Optimization strategies including scrape interval tuning and metric filtering
  • Memory Issues: Techniques for managing memory in high-volume environments
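
As a rough illustration of the scrape interval tuning and metric filtering listed above, a receiver configuration could look like the sketch below. The job name, target, interval, and regex are placeholder assumptions and are not taken from the README changes in this PR.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app            # hypothetical job name
          scrape_interval: 60s             # a longer interval means fewer scrapes to process
          static_configs:
            - targets: ["app.example.internal:9100"]   # placeholder target
          metric_relabel_configs:
            # Drop series by metric name after the scrape,
            # before they enter the Collector pipeline.
            - source_labels: [__name__]
              regex: "go_gc_.*"            # example pattern only
              action: drop
```

Dropping unneeded series at scrape time reduces both the CPU spent converting them and the memory held for them downstream.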

Best Practices for Production

  • Multi-Replica Deployments: Manual sharding strategies and TargetAllocator configuration
  • Performance Optimization: Scrape interval recommendations, filtering strategies, and resource management
  • Production Configuration Examples: Ready-to-use configurations with best practices applied
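
The two multi-replica approaches named above can be sketched as follows. Option A uses standard Prometheus `hashmod` relabeling so that each replica keeps only its share of targets; Option B points the receiver at a TargetAllocator. The replica count, shard index, endpoint, and collector ID are illustrative assumptions.

```yaml
# Option A: manual sharding across 2 replicas (this replica is shard 0).
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: sharded-app            # hypothetical job name
          static_configs:
            - targets: ["app-a.example.internal:9100", "app-b.example.internal:9100"]
          relabel_configs:
            # Hash each target address into one of 2 buckets.
            - source_labels: [__address__]
              modulus: 2
              target_label: __tmp_hash
              action: hashmod
            # Keep only bucket 0 on this replica; the other replica uses regex "1".
            - source_labels: [__tmp_hash]
              regex: "0"
              action: keep
---
# Option B: let a TargetAllocator distribute targets.
receivers:
  prometheus:
    target_allocator:
      endpoint: http://my-targetallocator-service   # placeholder service URL
      interval: 30s                                  # how often to fetch assigned targets
      collector_id: collector-0                      # must be unique per replica
```

With Option A, every replica needs its own `regex` value, so the shard index has to be templated per replica by whatever deploys the Collector.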

Monitoring and Security

  • Monitoring the Receiver: Key metrics to track and alerting recommendations
  • Security Considerations: TLS configuration, network security, and secret management guidance
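
For the TLS and secret management points above, a minimal sketch of a scrape job using client certificates and a token read from a file is shown below. All file paths, the job name, and the target are assumptions; the fields themselves are standard Prometheus scrape configuration.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: secure-app             # hypothetical job name
          scheme: https
          tls_config:
            ca_file: /etc/otel/certs/ca.crt        # CA used to verify the target
            cert_file: /etc/otel/certs/client.crt  # client certificate for mutual TLS
            key_file: /etc/otel/certs/client.key
          authorization:
            type: Bearer
            # Read the token from a mounted file instead of inlining it in the config.
            credentials_file: /var/run/secrets/scrape-token
          static_configs:
            - targets: ["app.example.internal:8443"]   # placeholder target
```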

Debugging Tips

  • Logging configuration
  • Using the Prometheus API server for inspection
  • Configuration testing strategies
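
The logging and API server tips above could be combined roughly as in this sketch. The listen address and the scrape job are assumptions chosen for illustration.

```yaml
receivers:
  prometheus:
    # Expose the receiver's Prometheus-compatible API for inspection.
    api_server:
      enabled: true
      server_config:
        endpoint: "localhost:9090"         # placeholder listen address
    config:
      scrape_configs:
        - job_name: otel-collector
          static_configs:
            - targets: ["localhost:8888"]  # Collector's own metrics endpoint

service:
  telemetry:
    logs:
      level: debug                         # verbose Collector logs while debugging
```

With this in place, target state should be queryable at paths such as `/api/v1/targets` on the configured endpoint. For configuration testing, the Collector's `validate` subcommand (for example `otelcol validate --config=config.yaml`; the binary name depends on the distribution) checks a file without starting the pipelines.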

Motivation

The Prometheus receiver README currently lacks guidance for troubleshooting common issues and implementing production best practices. This documentation gap makes it challenging for users to:

  • Diagnose and resolve operational issues quickly
  • Optimize receiver performance in production environments
  • Implement secure and scalable configurations

This section fills that gap with guidance drawn from operating the receiver in production.

Testing

  • Documentation reviewed for accuracy and completeness
  • All code examples validated for syntax correctness
  • Markdown formatting verified

Related Issues

This addresses the need for better operational documentation mentioned in various community discussions and support channels.

Checklist

  • Code follows the style guidelines of this project
  • Documentation has been updated accordingly
  • Changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works (N/A - documentation only)
  • New and existing unit tests pass locally with my changes (N/A - documentation only)

Added a comprehensive configuration example demonstrating how to use the
Prometheus receiver in a production pipeline with processors (batch, resource)
and exporters (otlp, prometheusremotewrite). This example helps users understand
how to integrate the receiver into a complete observability setup.
…dant config

- Changed scrape_interval from 15s to 5s so batch processor is effective
- Added YAML comment explaining metric_relabel_configs filtering
- Removed redundant service.name from resource processor (already set by receiver)
- Removed external_labels from prometheusremotewrite to keep config simple
- Updated description to reflect changes
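
For orientation, a pipeline with the shape these two commits describe might look like the sketch below. It is not the example added to the README; the endpoints, the filter regex, and the resource attribute are assumptions.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:8888"]
          metric_relabel_configs:
            # Drop metrics that are not needed downstream (example pattern only).
            - source_labels: [__name__]
              regex: "go_gc_.*"
              action: drop

processors:
  resource:
    attributes:
      - key: deployment.environment        # illustrative attribute
        value: production
        action: upsert
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  otlp:
    endpoint: otel-gateway.example.com:4317                # placeholder
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write  # placeholder

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource, batch]
      exporters: [otlp, prometheusremotewrite]
```

Placing `batch` last in the processor list means batches are formed after attributes are set, just before export.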
…tices guide

This commit adds a new 'Troubleshooting and Best Practices' section to the
Prometheus receiver README, providing users with:

- Common issues and solutions (metrics not appearing, high CPU usage, memory issues)
- Best practices for production deployments (multi-replica setups, performance optimization)
- Monitoring and security considerations
- Debugging tips and techniques

The guide addresses real-world operational challenges and provides actionable
solutions based on production experience. This will help users:
- Quickly diagnose and resolve common issues
- Optimize receiver performance in high-volume environments
- Implement secure and scalable configurations
- Monitor receiver health effectively

This documentation fills a gap in the current README and will significantly
improve the user experience for teams deploying the Prometheus receiver in
production environments.
@geekysatbir geekysatbir requested review from a team, ArthurSens and dashpole as code owners December 13, 2025 03:22
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Dec 13, 2025
