forked from apache/cloudberry
-
Notifications
You must be signed in to change notification settings - Fork 0
Add yagp_hooks_collector #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Just create the overal project structure with a Makefile to generate protobufs, compile it into a shared library extension and install it
as a fundation to build on
- borrow GlowByte code to generate plan text and SessionInfo - borrow code from our in-house pg_stat_statements to generate query id and plan id - refactor code to follow common name conventions and identations
- do some minor refactoring to follow common naming convention - add additional message right after ExecutorStart hook
1) Query instrumentation 2) /proc/self/* stats
It allows finer granularity than executor hooks. Also removed some code duplication and data duplicaton
1. Initialize query instrumentation to NULL so that it can be properly checked later (temporary solution, need to find a proper fix) 2. Don't collect spillinfo on query end. Reason: a) it will always be zero and b) it could crash if we failed to enlarge a spillfile. Seems like we need some cummulative statistics for spillinfo. Need to check what explain analyze use.
1. Sync with protobuf changes to collect segment info 2. Remove noisy logging 3. Fix some missing node types in pg_stat_statements
Reason: when query info hook is called with status 'DONE' planstate is already deallocated by ExecutorEnd
1) Give higher gRPC timeouts to query dispatcher as losing messages there is more critical 2) If we've failed to send a message via gRPC we notify a background thread about it and refuse sending any new message until this thread re-establishes the lost connection
Don't collect system queries with empty query text and ccnt == 0
Rethrowing them might break other extensions and even query execution pipeline itself
We inherited this issue from PostgreSQL. PostgreSQL uses glibc to sort strings. In version glibc=2.28, collations broke down badly (in general, there are no guarantees when updating glibc). Changing collations breaks indexes. Similarly, a cluster with different collations also behaves unpredictably. What and when something has changed in glibc can be found on https://github.com/ardentperf/glibc-unicode-sorting Also there is special postgresql-wiki https://wiki.postgresql.org/wiki/Locale_data_changes And you tube video https://www.youtube.com/watch?v=0E6O-V8Jato In short, the issue can be seen through the use of bash: ( echo "1-1"; echo "11" ) | LC_COLLATE=en_US.UTF-8 sort gives the different results in ubunru 18.04 and 22.04. There is no way to solve the problem other than by not changing the symbol order. We freeze symbol order and use it instead of glibc. Here the solution https://github.com/postgredients/mdb-locales. In this PR I have added PostgreSQL patch that replaces all glibc locale-related calls with a calls to an external libary. It activates using new configure parameter --with-mdblocales, which is off by default. Using custom locales needs libmdblocales1 package and mdb-locales package with symbol table. Build needs libmdblocales-dev package with headers.
* MDB admin patch & tests This patch introcudes new pseudo-pre-defined role "mdb_admin". Introduces 2 new function: extern bool mdb_admin_allow_bypass_owner_checks(Oid userId, Oid ownerId); extern void check_mdb_admin_is_member_of_role(Oid member, Oid role); To check mdb admin belongship and role-to-role ownership transfer correctness. Our mdb_admin ACL model is the following: * Any roles user or/and roles can be granted with mdb_admin * mdb_admin memeber can tranfser ownershup of relations, namespaces and functions to other roles, if target role in neither: superuser, pg_read_server_files, pg_write_server_files nor pg_execute_server_program. This patch allows mdb admin to tranfers ownership on non-superuser objects * f
This commit introduces new mdb internal role mdb_superuser. Role is capaple of: GRANT/REVOKE any set of priviledges to/from any object in database. Has power of pg_database_owner in any database, including: DROP any object in database (except system catalog and stuff) Role is NOT capaple of: Create database, role, extension or alter other roles with such priviledges. Transfer ownership to /pass has_priv of roles: PG_READ_ALL_DATA PG_WRITE_ALL_DATA PG_EXECUTE_SERVER_PROGRAM PG_READ_SERVER_FILES PG_WRITE_SERVER_FILES PG_DATABASE_OWNER Fix configure.ac USE_MDBLOCALES option handling Apply autoreconf stuff Set missing ok parameter ito true while acquiring mdb_superuser oid In regress tests, nobody creates mdb_superuser role, so missing ok is fine Allow mdb_superuser to have power of pg_database_owner Allow mdb_superuser to alter objects and grant ACl to objects, owner by pg_database_owner. Also, when acl check, allow mdb_supersuer use pg_database_owner role power to pass check
Copy of [1] from gpdb to collect workfile stats in yagp-hooks-collector. [1] open-gpdb/gpdb@8813a55
Copy of [1] from gpdb to create a global QueryState for unique hashing for yagp-hooks-collector. [1] open-gpdb/gpdb@476b540
Update usage in yagp_hooks_collector of - heap_create_with_catalog() - standard_ExecutorRun() - standard_ProcessUtility() - InstrAlloc() - CreateTemplateTupleDesc() - ExplainInitState() -> NewExplainState() - gpmon_gettmid() -> gp_gettmid() - Gp_session_role -> Gp_role - strerror(errno) -> "%m" - Include utils/varlena.h for SplitIdentifierString() in gpdbwrappers.cpp.
Remove unnecessary copies of the core jumbling functions from yagp_hooks_collector/stat_statements_parser. In commit [1] query jumbling moved to core, thus there is no need to keep a copy of jumbling functions in yagp_hooks_collector. [1] 5fd9dfa
In yagp_hooks_collector we need control over place where function is executed, and Cloudberry supports only set-returning functions to execute on COORDINATOR so change the type of the functions. We can see the error below without this change: ERROR: EXECUTE ON COORDINATOR is only supported for set-returning functions.
In gpdb create table was executed for each partition. Now one single create table is executed. Thus only one create table query goes through executor. Change the test accordingly.
Change makefile, test and add it to CI of yagp_hooks_collector. Add option --with-yagp-hooks-collector. Similarly to [1]. [1] open-gpdb/gpdb@7be8893
Correct CI for yagp_hooks_collector to use correct env script.
Correct defines for token ids copied from gram.y.
Full copy of [1] for yagp_hooks_collector. [1] open-gpdb/gpdb@845278f#diff-fa2654417413bbb37d47ecf1644dc5af90c76c77f2a90e05c27107967b5f6fd8
Similarly to [1] add missing executor query info hooks. [1] open-gpdb/gpdb@87fc05d
Copy of [1] with additinal changed needed for Clouberry are described below: The testing C functions have changed to set-returning ones if comparing with [1] because we need a control over the place where function is executed - either on master or segments, and in Cloudberry these functions must return set of values so they were changed to return SETOF. [1] open-gpdb/gpdb@989ca06
Copy of [1] - send() may return -1 in case of an error, do not add -1 to total_bytes sent. [1] open-gpdb/gpdb@e1f6c08
The extension generates normalized query text and plan using jumbling functions. Those functions may fail when translating to wide character if the current locale cannot handle the character set. Fix changes functions that generate normalized query text/plan to noexcept versions so we can check if error occured and continute execution. The test checks that even when those functions fail, the plan is still executed. This test is partially taken from src/test/regress/gp_locale.sql.
Cloudberry builds treat compiler warnings as errors. For consistency, this behavior has been enabled in yagp_hooks_collector. This commit also fixes the warnings in yagp_hooks_collector.
leborchuk
approved these changes
Jan 22, 2026
leborchuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add
yagp_hooks_collectorextension to Cloudberry, imported fromopen-gpdb/yagp_hooks_collectorwith full commit history. (Don't worry about # of commits and lines added, it has only 15-20 short-mid commits described below).Structure
open-gpdb/yagp_hooks_collectorMost changes are ports from
open-gpdb/gpdb/gpcontrib/yagp_hooks_collector, adapted for Cloudberry (see 7c6f24d for details).DONT SQUASH OR REBASE this pr, it's only for the sake of getting feedback, then, I am going to merge yagp-hooks-collector branch, so the history stays.
open-gpdb/gpdb/gpcontrib/yagp_hooks_collector:2680062 [yagp_hooks_collector] Fix warnings and error them
Cloudberry builds treat compiler warnings as errors. For consistency,
this behavior has been enabled in yagp_hooks_collector.
This commit also fixes the warnings in yagp_hooks_collector.
5f0eb82 [yagp_hooks_collector] Make gen of norm plan/query noexcept
The extension generates normalized query text and plan using jumbling functions.
Those functions may fail when translating to wide character if the current locale
cannot handle the character set.
Fix changes functions that generate normalized query text/plan to noexcept versions
so we can check if error occured and continute execution.
The test checks that even when those functions fail, the plan is still executed.
This test is partially taken from src/test/regress/gp_locale.sql.
open-gpdb/gpdb/gpcontrib/yagp_hooks_collector:90803d1 [yagp_hooks_collector] Refactor
Similarly to [1].
[1] open-gpdb/gpdb@bdfabca#diff-99e92ab48c7310c149ea2bd3414a20a1edf31f26823253906aae1981e088775aR39-R79
38dd155 [yagp_hooks_collector] Correct add of bytes sent
Copy of [1] - send() may return -1 in case of an error, do not add -1 to total_bytes sent.
[1] open-gpdb/gpdb@e1f6c08
00dff37 [yagp_hooks_collector] Add test for UDS sending
Copy of [1] with additinal changed needed for Clouberry are described below:
The testing C functions have changed to set-returning ones if comparing with
[1] because we need a control over the place where function is executed -
either on master or segments, and in Cloudberry these functions must return
set of values so they were changed to return SETOF.
[1] open-gpdb/gpdb@989ca06
7ab24ed [yagp_hooks_collector] Add submit & done hooks
Similarly to [1] add missing executor query info hooks.
[1] open-gpdb/gpdb@87fc05d
f3015f7 [yagp_hooks_collector] Add consistent filtering
Full copy of [1] for yagp_hooks_collector.
[1] open-gpdb/gpdb@845278f#diff-fa2654417413bbb37d47ecf1644dc5af90c76c77f2a90e05c27107967b5f6fd8
7cff09c [yagp_hooks_collector] Correct tokens from gram.y
Correct defines for token ids copied from gram.y.
b0b2e7b [yagp_hooks_collector] Add comments for func args
gpcontrib-yagp-hooks-collectorto CI, change Makefile to compile the extension with cloudberry along with configure option to enable it--with-yagp_hooks_collector:9be5f14 [yagp_hooks_collector] Change greenplum_path.sh to cloudberry-env.sh
Correct CI for yagp_hooks_collector to use correct env script.
8f0bf5f [yagp_hooks_collector] Add CI test and with option
Change makefile, test and add it to CI of yagp_hooks_collector.
Add option --with-yagp-hooks-collector. Similarly to [1].
[1] open-gpdb/gpdb@7be8893
2afca84 [yagp_hooks_collector] Change test out for part tbl
In gpdb create table was executed for each partition. Now one single create table is executed. Thus only one create table query goes through executor. Change the test accordingly.
b7f9d2c [yagp_hooks_collector] Change test functions to SRF
In yagp_hooks_collector we need control over place where function is executed, and Cloudberry supports only set-returning functions to execute on COORDINATOR so change the type of the functions. We can see the error below without this change: ERROR: EXECUTE ON COORDINATOR is only supported for set-returning functions.
7b0654e [yagp_hooks_collector] Del redundant funcs
Remove unnecessary copies of the core jumbling functions from yagp_hooks_collector/stat_statements_parser.
In commit [1] query jumbling moved to core, thus there is no need to keep a copy of jumbling functions in yagp_hooks_collector.
[1] 5fd9dfa
7c6f24d [yagp_hooks_collector] Use updated names and func's interfaces
Update usage in yagp_hooks_collector of
af1092e [yagp_hooks_collector] Port YagpQueryState from gpdb
Copy of [1] from gpdb to create a global QueryState for unique hashing for yagp-hooks-collector.
[1] open-gpdb/gpdb@476b540
b362823 [yagp_hooks_collector] Port workfile stats from gpdb
Copy of [1] from gpdb to collect workfile stats in yagp-hooks-collector.
[1] open-gpdb/gpdb@8813a55
8fb4a8e Add 'gpcontrib/yagp_hooks_collector/' from commit 'ca620e9f75f5bc717620d529ef6753c7666c2d99'
git-subtree-dir: gpcontrib/yagp_hooks_collector
git-subtree-mainline: e7eae0d
git-subtree-split: ca620e9