Reproduce and fix foreign key versioning bug #13
Closed
Describe your changes:
This PR fixes a bug where foreign key constraints caused unnecessary version updates during ingestion. The root cause was that the `_table_constraints_handler` function, which patches table constraints, did not include `referredColumns` in the unique key it generates for foreign keys. As a result, foreign keys with identical `constraintType` and `columns` but different `referredColumns` (e.g., `department.id` vs `public.department.id`) were treated as the same constraint, which caused spurious reordering and version changes.
My changes introduce a new helper function, `_get_constraint_key`, that incorporates `referredColumns` into the unique key for foreign keys. `_table_constraints_handler` now uses this key, so foreign key constraints are matched correctly and no longer trigger unnecessary version bumps.
I tested these changes by:
`referredColumns`.
`ingestion/tests/unit/metadata/ingestion/models/test_table_constraints.py` to cover foreign key matching with and without `referredColumns` variations.
Type of change:
Checklist:
Fixes #17987: Fix foreign key versioning bug
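For reviewers, the key-generation idea described under "Describe your changes" can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Constraint` dataclass, the `"FOREIGN_KEY"` type string, and the `get_constraint_key` and `reorder_like_source` functions are hypothetical stand-ins, and the real `_get_constraint_key` and `_table_constraints_handler` may differ in signature and behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Constraint:
    """Hypothetical stand-in for a table constraint model (assumption)."""
    constraintType: str
    columns: List[str]
    referredColumns: Optional[List[str]] = None


def get_constraint_key(constraint: Constraint) -> Tuple:
    """Build a hashable key; foreign keys also include referredColumns."""
    key: Tuple = (constraint.constraintType, tuple(constraint.columns))
    if constraint.constraintType == "FOREIGN_KEY":  # assumed type name
        key += (tuple(constraint.referredColumns or []),)
    return key


def reorder_like_source(
    source: List[Constraint], destination: List[Constraint]
) -> List[Constraint]:
    """Order destination constraints to follow the source order, matched by key.

    Constraints with no match in the source keep their relative order
    and are placed at the end (sorted() is stable).
    """
    position = {get_constraint_key(c): i for i, c in enumerate(source)}
    return sorted(
        destination,
        key=lambda c: position.get(get_constraint_key(c), len(source)),
    )


# Two foreign keys on the same column that differ only in referredColumns.
fk_short = Constraint("FOREIGN_KEY", ["department_id"], ["department.id"])
fk_full = Constraint("FOREIGN_KEY", ["department_id"], ["public.department.id"])
pk = Constraint("PRIMARY_KEY", ["id"])

# With referredColumns in the key, the two foreign keys no longer collide.
assert get_constraint_key(fk_short) != get_constraint_key(fk_full)
```

Without `referredColumns` in the key, `fk_short` and `fk_full` would map to the same entry, so one could be matched against the other and reordered, which is the kind of spurious diff this PR describes.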