-
Notifications
You must be signed in to change notification settings - Fork 16
fix: failure to handle $eq operator if the type of the value is array #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe changes rework metadata filter comparison operators in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/integration_tests/test_collection_get.py (1)
268-268: Optional: Remove extraneousfprefixes from strings without placeholders.Several assertion messages use f-strings without any interpolation. While harmless, removing the
fprefix improves clarity.🔎 Suggested cleanup
- assert len(result["ids"]) == 0, f"Expected no matches, got {result['ids']}" + assert len(result["ids"]) == 0, "Expected no matches" - assert "id2" in matched_ids, f"Expected id2 in results" + assert "id2" in matched_ids, "Expected id2 in results" - assert "id4" in matched_ids, f"Expected id4 in results" + assert "id4" in matched_ids, "Expected id4 in results" - assert "id1" not in matched_ids, f"id1 should be excluded (has ml)" + assert "id1" not in matched_ids, "id1 should be excluded (has ml)" - assert "id3" not in matched_ids, f"id3 should be excluded (is ml)" + assert "id3" not in matched_ids, "id3 should be excluded (is ml)" - assert "id8" not in matched_ids, f"id8 should be excluded (has ml)" + assert "id8" not in matched_ids, "id8 should be excluded (has ml)" - print(f" Scalar fields work correctly") + print(" Scalar fields work correctly")Based on static analysis hints.
Also applies to: 280-284, 331-331
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/pyseekdb/client/filters.pytests/integration_tests/test_collection_get.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-22T12:44:52.456Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:52.456Z
Learning: In integration tests under tests/integration_tests, it is acceptable to use the internal db_client._server._execute() to run raw SQL (e.g., CREATE TABLE) when needing custom table setup with specific vector index configurations. Prefer using public/test harness APIs for common setup, and clearly document and limit the use of internal calls to ensure test reliability. Ensure proper cleanup and isolation for each test, and avoid leaking internal APIs into production or non-test code.
Applied to files:
tests/integration_tests/test_collection_get.py
📚 Learning: 2025-12-22T12:45:51.412Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_hybrid_search_builder_integration.py:127-132
Timestamp: 2025-12-22T12:45:51.412Z
Learning: In integration test Python files under tests/integration_tests, it is acceptable to use string interpolation to construct SQL INSERT statements for test data when the data is controlled and internal to the test. This pattern should only be used for trusted, test-only data and not with untrusted input; ensure the injection risk is mitigated by keeping inputs deterministic, non-user-supplied, and isolated to the test environment. If possible, prefer parameterized queries in non-test code, but in these tests, interpolate values selectively when you can guarantee safety.
Applied to files:
tests/integration_tests/test_collection_get.py
🪛 Ruff (0.14.10)
tests/integration_tests/test_collection_get.py
268-268: f-string without any placeholders
Remove extraneous f prefix
(F541)
280-280: f-string without any placeholders
Remove extraneous f prefix
(F541)
281-281: f-string without any placeholders
Remove extraneous f prefix
(F541)
282-282: f-string without any placeholders
Remove extraneous f prefix
(F541)
283-283: f-string without any placeholders
Remove extraneous f prefix
(F541)
284-284: f-string without any placeholders
Remove extraneous f prefix
(F541)
331-331: f-string without any placeholders
Remove extraneous f prefix
(F541)
369-369: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-test (oceanbase)
- GitHub Check: integration-test (server)
- GitHub Check: integration-test (embedded)
🔇 Additional comments (4)
tests/integration_tests/test_collection_get.py (1)
212-370: Excellent comprehensive regression test for array field support!This test thoroughly validates the new JSON_OVERLAPS-based implementation for
$eqand$neoperators with:
- ✅ Array fields with membership checks (id1, id8)
- ✅ Scalar fields for backward compatibility (id3, id4)
- ✅ Edge cases: empty array (id6), null (id7), missing field (id5)
- ✅ Consistency validation:
$eqmatches$inwith single value- ✅ Combined logical operators (
$andwith$eqand$ne)The test structure is clear with descriptive assertions and helpful debug output. Proper cleanup ensures test isolation.
src/pyseekdb/client/filters.py (3)
172-178: Direct equality correctly aligned with $eq operator.The refactored direct equality (e.g.,
{"tags": "ml"}) now uses the sameJSON_OVERLAPSapproach as the explicit$eqoperator. This ensures consistent behavior whether users specify the operator or not.
137-140: Range operators intentionally exclude arrays (expected behavior).After handling
$eqand$newithJSON_OVERLAPS, the remaining comparison operators ($lt,$gt,$lte,$gte) continue usingJSON_EXTRACTfor direct scalar comparisons. This is correct since range comparisons are semantically undefined for array values.Note: Users attempting range queries on array fields will get undefined or error results. Consider documenting this limitation if not already covered.
123-136: Well-designed fix using JSON_OVERLAPS for array membership checks.The refactored
$eqand$neoperators correctly useJSON_OVERLAPSto support both scalar and array metadata values:
- Wrapping
op_valuein a single-element array[op_value]enables membership semanticsJSON_OVERLAPS(["ml", "ai"], ["ml"])returns true (array contains "ml")JSON_OVERLAPS("ml", ["ml"])returns true (scalar equals "ml", treated as single-element array)This aligns with the PR objective to fix issue #93 where
{"tags": {"$eq": "ml"}}should match documents with"tags": ["ml", "ai"].However, verify that
JSON_OVERLAPSsemantics match expectations in your target database (OceanBase/MySQL):#!/bin/bash # Verify JSON_OVERLAPS behavior with scalars, arrays, and NULL in the target database # Test 1: Scalar compared with single-element array echo "Test 1: JSON_OVERLAPS with scalar vs array" echo "SELECT JSON_OVERLAPS('\"ml\"', '[\"ml\"]') AS scalar_match;" # Test 2: Array compared with single-element array (overlap) echo "Test 2: JSON_OVERLAPS with array overlap" echo "SELECT JSON_OVERLAPS('[\"ml\", \"ai\"]', '[\"ml\"]') AS array_overlap;" # Test 3: Array compared with single-element array (no overlap) echo "Test 3: JSON_OVERLAPS with array disjoint" echo "SELECT JSON_OVERLAPS('[\"java\"]', '[\"ml\"]') AS array_disjoint;" # Test 4: NULL handling echo "Test 4: JSON_OVERLAPS with NULL" echo "SELECT JSON_OVERLAPS(NULL, '[\"ml\"]') AS null_handling;" # Test 5: Empty array echo "Test 5: JSON_OVERLAPS with empty array" echo "SELECT JSON_OVERLAPS('[]', '[\"ml\"]') AS empty_array;" echo "" echo "Run these queries against your OceanBase/MySQL instance to confirm behavior."
Summary
fix: failure to handle $eq operator if the type of the value is array
close #93
Solution Description
Use JSON_EXTRACT to extract the value of the specific key and compare it with the input value like #87
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.