fix: failure to handle $eq operator if the type of the value is array #102

longbingljw · 2025-12-31T07:27:32Z

Summary

fix: failure to handle $eq operator if the type of the value is array
close #93

Solution Description

Use JSON_EXTRACT to extract the value of the specified key, then compare it with the input value using JSON_OVERLAPS, as #87 does for $in/$nin.
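The following is a minimal sketch of the SQL fragment this approach produces for `$eq`/`$ne`. It assumes a JSON column named `metadata` and DB-API `%s` placeholders. `build_eq_clause` is a hypothetical name and is not the function in `filters.py`:

```python
import json
import re

# Hypothetical guard: only plain identifiers are accepted as metadata keys here.
_KEY_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_eq_clause(field, value, negate=False, column="metadata"):
    """Return (sql, params) for {field: {"$eq": value}} or {field: {"$ne": value}}.

    Hypothetical helper; the column name and placeholder style are assumptions.
    """
    if not _KEY_RE.match(field):
        raise ValueError(f"unsupported metadata key: {field!r}")
    # Wrap the operand in a one-element JSON array: JSON_OVERLAPS then tests
    # membership when the stored value is an array and equality when it is a scalar.
    operand = json.dumps([value])
    sql = f"JSON_OVERLAPS(JSON_EXTRACT({column}, %s), %s)"
    if negate:
        sql = f"NOT {sql}"
    return sql, [f"$.{field}", operand]
```

One behavior is worth checking. When the key is missing, `JSON_EXTRACT` yields SQL `NULL`. Under MySQL semantics (OceanBase not verified), both the plain form and the `NOT ...` form then evaluate to `NULL`, so the row is filtered out for `$eq` and `$ne` alike. The PR text does not say whether `$ne` should match documents that lack the field. If it should, the negated form needs an explicit `IS NULL` branch.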

Summary by CodeRabbit

Bug Fixes
- Enhanced metadata filter operators ($eq, $ne, $in, $nin) to properly handle scalar and array field comparisons across all query types.
Tests
- Added comprehensive regression test suite covering $eq and $ne operators with array fields, including edge cases and operator consistency validation.


coderabbitai · 2025-12-31T07:27:41Z

📝 Walkthrough

The changes rework the metadata filter comparison operators in filters.py so that the JSON_EXTRACT-ed field is wrapped in a JSON_OVERLAPS comparison instead of being compared directly. This enables proper handling of $eq/$ne when metadata field values are arrays or scalars. A comprehensive regression test validates the new behavior across multiple scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metadata filter operator rework (`src/pyseekdb/client/filters.py`) | Introduces special handling for `$eq`, `$ne`, `$in`, and `$nin` operators to use JSON_OVERLAPS comparisons on extracted fields, replacing direct JSON_EXTRACT equality checks. Values are JSON-encoded and wrapped as arrays for consistent comparison semantics. Affects both operator-based and direct field equality paths. |
| Regression test for array field comparisons (`tests/integration_tests/test_collection_get.py`) | Adds `test_eq_ne_operators_with_array_fields` method covering `$eq`/`$ne` operators on array and scalar metadata fields, including edge cases (empty arrays, null, missing fields), direct equality without operators, and consistency validation across operator variants. |
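For `$in`/`$nin`, the operand is already a list, so the same JSON_OVERLAPS pattern needs no extra wrapping. The minimal sketch below works under the same assumptions: a `metadata` JSON column, `%s` placeholders, and a hypothetical `build_in_clause` helper. It also shows why `$eq` with `x` and `$in` with `[x]` can reduce to identical SQL, which is the consistency property the regression test validates:

```python
import json


def build_in_clause(field, values, negate=False, column="metadata"):
    """Return (sql, params) for {field: {"$in": values}} or {field: {"$nin": values}}.

    Hypothetical helper; the column name and placeholder style are assumptions.
    """
    if not isinstance(values, (list, tuple)):
        raise ValueError("$in/$nin expect a list of values")
    # The list is JSON-encoded as-is: any shared element between the stored
    # value (scalar or array) and the candidate list counts as a match.
    operand = json.dumps(list(values))
    sql = f"JSON_OVERLAPS(JSON_EXTRACT({column}, %s), %s)"
    if negate:
        sql = f"NOT {sql}"
    return sql, [f"$.{field}", operand]
```

With a single-element list, the parameters are the same as for a wrapped `$eq` operand (`'["ml"]'`). This is why the two operators should agree.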

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: metadata $in/$nin operators not matching JSON array fields #87: Both PRs modify src/pyseekdb/client/filters.py to use JSON_OVERLAPS for metadata filter SQL generation. This PR extends the approach from $in/$nin to $eq/$ne and to direct field equality. The range operators keep using JSON_EXTRACT.

Suggested reviewers

hnwyllmm

Poem

🐰 A rabbit hops through metadata trees,
Where arrays now dance in JSON breeze,
$eq and friends find harmony,
With OVERLAPS magic, wild and free,
Bug #93 solved with glee! ✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: fixing the $eq operator to handle array-type values in metadata filters. |
| Linked Issues check | ✅ Passed | The changes implement JSON_OVERLAPS for $eq/$ne operators and array comparisons, directly addressing issue #93's requirement to handle $eq when stored metadata is an array. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the $eq/$ne operator handling for array fields, with a comprehensive regression test added to prevent future breakage. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/integration_tests/test_collection_get.py (1)

268-268: Optional: Remove extraneous f prefixes from strings without placeholders.

Several assertion messages use f-strings without any interpolation. While harmless, removing the f prefix improves clarity.

🔎 Suggested cleanup

```diff
-            assert len(result["ids"]) == 0, f"Expected no matches, got {result['ids']}"
+            assert len(result["ids"]) == 0, "Expected no matches"
-            assert "id2" in matched_ids, f"Expected id2 in results"
+            assert "id2" in matched_ids, "Expected id2 in results"
-            assert "id4" in matched_ids, f"Expected id4 in results"
+            assert "id4" in matched_ids, "Expected id4 in results"
-            assert "id1" not in matched_ids, f"id1 should be excluded (has ml)"
+            assert "id1" not in matched_ids, "id1 should be excluded (has ml)"
-            assert "id3" not in matched_ids, f"id3 should be excluded (is ml)"
+            assert "id3" not in matched_ids, "id3 should be excluded (is ml)"
-            assert "id8" not in matched_ids, f"id8 should be excluded (has ml)"
+            assert "id8" not in matched_ids, "id8 should be excluded (has ml)"
-            print(f"   Scalar fields work correctly")
+            print("   Scalar fields work correctly")
```

Based on static analysis hints.

Also applies to: 280-284, 331-331

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c24053 and 20d0254.

📒 Files selected for processing (2)

src/pyseekdb/client/filters.py
tests/integration_tests/test_collection_get.py

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-12-22T12:44:52.456Z

Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:52.456Z
Learning: In integration tests under tests/integration_tests, it is acceptable to use the internal db_client._server._execute() to run raw SQL (e.g., CREATE TABLE) when needing custom table setup with specific vector index configurations. Prefer using public/test harness APIs for common setup, and clearly document and limit the use of internal calls to ensure test reliability. Ensure proper cleanup and isolation for each test, and avoid leaking internal APIs into production or non-test code.

Applied to files:

tests/integration_tests/test_collection_get.py

📚 Learning: 2025-12-22T12:45:51.412Z

Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_hybrid_search_builder_integration.py:127-132
Timestamp: 2025-12-22T12:45:51.412Z
Learning: In integration test Python files under tests/integration_tests, it is acceptable to use string interpolation to construct SQL INSERT statements for test data when the data is controlled and internal to the test. This pattern should only be used for trusted, test-only data and not with untrusted input; ensure the injection risk is mitigated by keeping inputs deterministic, non-user-supplied, and isolated to the test environment. If possible, prefer parameterized queries in non-test code, but in these tests, interpolate values selectively when you can guarantee safety.

Applied to files:

tests/integration_tests/test_collection_get.py

🪛 Ruff (0.14.10)

tests/integration_tests/test_collection_get.py

- 268-268: f-string without any placeholders. Remove extraneous f prefix (F541)
- 280-280: f-string without any placeholders. Remove extraneous f prefix (F541)
- 281-281: f-string without any placeholders. Remove extraneous f prefix (F541)
- 282-282: f-string without any placeholders. Remove extraneous f prefix (F541)
- 283-283: f-string without any placeholders. Remove extraneous f prefix (F541)
- 284-284: f-string without any placeholders. Remove extraneous f prefix (F541)
- 331-331: f-string without any placeholders. Remove extraneous f prefix (F541)
- 369-369: Do not catch blind exception: Exception (BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: integration-test (oceanbase)
GitHub Check: integration-test (server)
GitHub Check: integration-test (embedded)

🔇 Additional comments (4)

tests/integration_tests/test_collection_get.py (1)

212-370: Excellent comprehensive regression test for array field support!

This test thoroughly validates the new JSON_OVERLAPS-based implementation for $eq and $ne operators with:

✅ Array fields with membership checks (id1, id8)

✅ Scalar fields for backward compatibility (id3, id4)

✅ Edge cases: empty array (id6), null (id7), missing field (id5)

✅ Consistency validation: $eq matches $in with single value

✅ Combined logical operators ($and with $eq and $ne)

The test structure is clear with descriptive assertions and helpful debug output. Proper cleanup ensures test isolation.

src/pyseekdb/client/filters.py (3)

172-178: Direct equality correctly aligned with $eq operator.

The refactored direct equality (e.g., {"tags": "ml"}) now uses the same JSON_OVERLAPS approach as the explicit $eq operator. This ensures consistent behavior whether users specify the operator or not.
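One way to keep the two paths aligned is to rewrite the shorthand into the operator form before generating any SQL. The minimal sketch below does this. `normalize_condition` is hypothetical. The rule it uses to tell operator dicts apart from literal dict values (every key starts with `$`) is an assumption and is not taken from `filters.py`:

```python
def normalize_condition(condition):
    """Map a direct value ({"tags": "ml"}) to the explicit {"$eq": ...} form.

    Hypothetical helper: a non-empty dict whose keys all start with "$" is
    treated as an operator dict; anything else (scalars, lists, other dicts)
    is treated as a literal to compare with $eq.
    """
    if isinstance(condition, dict) and condition and all(
        isinstance(key, str) and key.startswith("$") for key in condition
    ):
        return condition
    return {"$eq": condition}
```

For example, `{"tags": "ml"}` then goes through the same `$eq` branch as `{"tags": {"$eq": "ml"}}`.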

137-140: Range operators intentionally exclude arrays (expected behavior).

After handling $eq and $ne with JSON_OVERLAPS, the remaining comparison operators ($lt, $gt, $lte, $gte) continue using JSON_EXTRACT for direct scalar comparisons. This is correct since range comparisons are semantically undefined for array values.

Note: Users attempting range queries on array fields will get undefined or error results. Consider documenting this limitation if not already covered.
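If the limitation should be explicit rather than undefined, one option is to exclude array-valued fields in the generated SQL. The minimal sketch below assumes that the MySQL-compatible `JSON_TYPE` (which returns `'ARRAY'` for arrays) is available in OceanBase. `build_range_clause` and the `exclude_arrays` flag are hypothetical and not part of `filters.py`. How a JSON value compares with a bound scalar parameter is database-specific:

```python
_RANGE_OPS = {"$lt": "<", "$lte": "<=", "$gt": ">", "$gte": ">="}


def build_range_clause(field, op, value, exclude_arrays=True, column="metadata"):
    """Return (sql, params) for a range operator on a scalar metadata field.

    Hypothetical helper. With exclude_arrays=True, array-valued fields are
    filtered out explicitly instead of relying on how the database compares
    a JSON array with a scalar.
    """
    if op not in _RANGE_OPS:
        raise ValueError(f"not a range operator: {op}")
    path = f"$.{field}"
    extracted = f"JSON_EXTRACT({column}, %s)"
    sql = f"{extracted} {_RANGE_OPS[op]} %s"
    params = [path, value]
    if exclude_arrays:
        # A missing key makes JSON_TYPE return NULL, which also drops the row.
        # Parentheses keep the pair intact when combined with OR elsewhere.
        sql = f"(JSON_TYPE({extracted}) <> 'ARRAY' AND {sql})"
        params = [path, *params]
    return sql, params
```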

123-136: Well-designed fix using JSON_OVERLAPS for array membership checks.

The refactored $eq and $ne operators correctly use JSON_OVERLAPS to support both scalar and array metadata values:

- Wrapping `op_value` in a single-element array `[op_value]` enables membership semantics.
- `JSON_OVERLAPS(["ml", "ai"], ["ml"])` returns true (array contains "ml").
- `JSON_OVERLAPS("ml", ["ml"])` returns true (scalar equals "ml", treated as single-element array).

This aligns with the PR objective to fix issue #93 where {"tags": {"$eq": "ml"}} should match documents with "tags": ["ml", "ai"].

However, verify that JSON_OVERLAPS semantics match expectations in your target database (OceanBase/MySQL):
```bash
#!/bin/bash
# Verify JSON_OVERLAPS behavior with scalars, arrays, and NULL in the target database

# Test 1: Scalar compared with single-element array
echo "Test 1: JSON_OVERLAPS with scalar vs array"
echo "SELECT JSON_OVERLAPS('\"ml\"', '[\"ml\"]') AS scalar_match;"

# Test 2: Array compared with single-element array (overlap)
echo "Test 2: JSON_OVERLAPS with array overlap"
echo "SELECT JSON_OVERLAPS('[\"ml\", \"ai\"]', '[\"ml\"]') AS array_overlap;"

# Test 3: Array compared with single-element array (no overlap)
echo "Test 3: JSON_OVERLAPS with array disjoint"
echo "SELECT JSON_OVERLAPS('[\"java\"]', '[\"ml\"]') AS array_disjoint;"

# Test 4: NULL handling
echo "Test 4: JSON_OVERLAPS with NULL"
echo "SELECT JSON_OVERLAPS(NULL, '[\"ml\"]') AS null_handling;"

# Test 5: Empty array
echo "Test 5: JSON_OVERLAPS with empty array"
echo "SELECT JSON_OVERLAPS('[]', '[\"ml\"]') AS empty_array;"

echo ""
echo "Run these queries against your OceanBase/MySQL instance to confirm behavior."
```
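A small pure-Python model of the documented MySQL `JSON_OVERLAPS` semantics can show what to expect before running the queries. It approximates scalars and arrays only. OceanBase behavior is assumed here, not verified. `json_overlaps` is a hypothetical helper, and Python `None` stands in for SQL `NULL`:

```python
def _json_equal(a, b):
    # JSON true/false are distinct from 1/0, unlike Python's bool/int.
    if isinstance(a, bool) or isinstance(b, bool):
        return type(a) is type(b) and a == b
    return a == b


def json_overlaps(doc1, doc2):
    """Pure-Python model of JSON_OVERLAPS for scalars and arrays.

    None stands for SQL NULL (e.g. a missing key) and propagates.
    JSON objects and JSON null are not modeled.
    """
    if doc1 is None or doc2 is None:
        return None
    # A scalar is compared as if it were a one-element array; elements are
    # compared as whole values, without recursing into nested arrays.
    left = doc1 if isinstance(doc1, list) else [doc1]
    right = doc2 if isinstance(doc2, list) else [doc2]
    return any(_json_equal(x, y) for x in left for y in right)
```

Under this model, the five queries above would yield 1, 1, 0, NULL, and 0. The model also points to an edge case worth a test. Because elements are not compared recursively, `json_overlaps(["ml", "ai"], [["ml", "ai"]])` is `False`. So if the implementation also wraps list operands, `$eq` with an array value would not match an identical stored array.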

fix: failure to handle $eq operator if the type of the value is array

20d0254

coderabbitai bot reviewed Dec 31, 2025
