Skip to content

Conversation

@ridwanmsharif
Copy link
Collaborator

chore: rebase feature/shutdownscrape on upstream/main

This change does the following:
 - Rebase the feature branch on top of main to get rid of some vulns and
   some dependency issues that popped up when upgrading the cloud run
   sidecar
 - Fix tests by adding more robust and clear instructions on when to
   notify downstream managers of changes to the discovered targets.

fionaliao and others added 30 commits September 10, 2024 10:16
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

Co-authored-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
This reverts commit e1c2c51.

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Julien <roidelapluie@o11y.eu>
Mantine UI: Fix 404 on /discovered-alertmanagers
Bring back documentation link in the form of an action button
Signed-off-by: Julien <roidelapluie@o11y.eu>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Mantine UI: Use actual lookback delta in explain
fix(utf8): propagate validationScheme config to scraping options
Signed-off-by: Julien <roidelapluie@o11y.eu>
…-division

promql: correctly handle unary negation of native histograms and add tests for multiplication and division of native histograms by negative scalars
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Signed-off-by: Julien <roidelapluie@o11y.eu>
Explain: Use param scalars in aggregations description
Follow-up to prometheus/prometheus#14893

Signed-off-by: Julius Volz <julius.volz@gmail.com>
test: pass enable_npm to setup_environment
Co-authored-by: Julien <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Fix HTML rendering for aggregator Explain view
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
 Conflicts:
	scrape/scrape_test.go
          Pick both changes.
juliusv and others added 22 commits October 2, 2024 15:19
to avoid port collisions

Signed-off-by: Jorge Alberto Díaz Orozco <diazorozcoj@gmail.com>
…Port

Add a mutex and used ports list to the tests random port generator to avoid port collisions
Adds eval_info command to PromQL testing framework
Signed-off-by: Julien <roidelapluie@o11y.eu>
…entelemetry-io-449d04d146

Bump the go-opentelemetry-io group with 9 updates
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.195.0 to 0.199.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.195.0...v0.199.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…entation/examples/remote_storage/github.com/prometheus/common-0.60.0

Bump github.com/prometheus/common from 0.57.0 to 0.60.0 in /documentation/examples/remote_storage
Signed-off-by: Julien <roidelapluie@o11y.eu>
…e.golang.org/api-0.199.0

Bump google.golang.org/api from 0.195.0 to 0.199.0
Now we check that a rule execution has taken place.

This also reduces the time to run the rules tests from 45s to 25s.

Signed-off-by: Julien <roidelapluie@o11y.eu>
Notify web UI when starting up and shutting down
…14965)

* enhance: wip ct parse optimizations

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further work on optimization

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further improvements and remove unused code

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve optimizations and fix some CT parse errors

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: check for LsetHash along with name

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: cleanup and documentation

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* enhance: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: rename to resetCTParseValues

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: post-merge fixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: add all possible reserved suffixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* test: separate CT values for each metric

Signed-off-by: Manik Rana <manikrana54@gmail.com>

---------

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
Some of the changes are a bit unreadable because the previous files were not
saved with the project's linter / auto-formatter settings applied. But it's
basically:

* For icons that are not Mantine-native components, use the rem() function
  for computing their size, so they scale correctly with the root font size.
  See https://mantine.dev/styles/rem/.
* Try a different icon for the notifications tray, since the bell icon was
  already used for Prometheus alerts. Other candidates from
  https://tabler.io/icons would be IconExclamationCircle or
  IconDeviceDesktopExclamation or IconMessageCircleExclamation.
* The server startup alert looked a bit cramped, introduced a Stack to add
  spacing between the text and the progress bar.
* Added a bit of spacing between notification text and date. Things looked
  cramped. To make things look ok with that, I also top-aligned the
  notification text and icon.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
…b.com/linode/linodego-1.41.0

Bump github.com/linode/linodego from 1.40.0 to 1.41.0
Style cleanups, mostly for web notifications and startup alert
To experiment with Prometheus pull methond on semi-short lived serverless environments
we (Google Cloud) want to attempt init and shutdown scrapes to have some basic data from
short lived workloads. This has little sense for Prometheus scrape manager,
but it might be useful for external importers like OpenTelemetry and distributions,
which can be deployed in various configurations (e.g. sidecar with just single target).

It's up to us to merge it or close. I would be fine maintaining on upstream, but we
might as well keep it in our fork to experiment on this--and perhaps merge once official
contrib Otel decides to use those options.

NOTE: Also added high level scrape manager test. I think it was kind of bad we never had test integrating
all scrape manager pieces. Can add that in separate PR as well.

Alternatives attempted:

* Manager Option for scrape on shutdown. This was not trivial due to 3 different context we pass. We would need to disconnect them from parent context (sometimes) for scrapeAndReport. Intrusive and prone to mistaken cancelations.
* ForceScrape method. This is not trivia as the scrape would need to be additionally locked. Plus semantics on what to do after (continue interval?) is not clear. We can scope this problem down to stopAndScrape semantics.

Signed-off-by: bwplotka <bwplotka@gmail.com>
This change adds an offset to the test scraper so a scrape is not
initiated when the context is cancelled. Without this change, there
was a race between the context cancellation and the scrape jitter wait
and this change fixes that.

Can verify using:
go test -run ^TestScrapeLoopStopBeforeRun$ github.com/prometheus/prometheus/scrape -race -count 1000

Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com>
This config option is quite useful on serverless environments where
we are sensitive to the start up latencies of the scraper. The
serverless instance might only last for a few seconds and may not be
able to afford the minimum 5s reload interval before the scrape pools
are created.

Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com>
This change adds an option that will allow users to skip the initial
wait before sending target sets to the scrape manager. This is
particularly useful in environments where the startup latency is
required to be low just as in serverless deployments.

Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com>
@ridwanmsharif ridwanmsharif changed the base branch from feature/shutdownscrape to main October 5, 2024 06:51
@ridwanmsharif ridwanmsharif changed the base branch from main to feature/shutdownscrape October 5, 2024 06:52
@ridwanmsharif ridwanmsharif marked this pull request as draft October 5, 2024 06:55
@ridwanmsharif
Copy link
Collaborator Author

I feel like maybe I don't know what I'm doing????

I need to update the https://github.com/GoogleCloudPlatform/run-gmp-sidecar/ repo with the latest OTel version components and it relies on this branch. OTel also relies on some common prometheus libraries and in the sidecar repo, we replace prometheus libraries with equivalent one from this branch.

But things diverged and lots of things broke when I upgraded. I tried in this PR to rebase feature/shutdownscrape on upstream/main and it does work. The last few commits are all that really matter in this change, but since its a rebase, the PR looks awful. Is there a better way?

Should I just push to another branch and sunset feature/shutdownscrape in favor of that branch?

The last commit does make some changes that need to be reviewed though - was needed to make tests work but I need an adult to sanity check what is happening in there in the discovery and scrape managers.

Cherrypicked prometheus/prometheus#12650 since I
was running into prometheus/prometheus#12649
and open-telemetry/opentelemetry-collector-contrib#24881
when upgrading the OTel collector.

Original author:    Goutham Veeramachaneni <gouthamve@gmail.com>

Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com>
This change does the following:
 - Rebase the feature branch on top of main to get rid of some vulns and
   some dependency issues that popped up when upgrading the cloud run
   sidecar
 - Fix tests by adding more robust and clear instructions on when to
   notify downstream managers of changes to the discovered targets.

Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com>
@ridwanmsharif ridwanmsharif force-pushed the ridwanmsharif/rebase-main branch from 2afed06 to c9a4070 Compare October 8, 2024 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.