Skip to content

Conversation

@longbingljw
Copy link
Member

@longbingljw longbingljw commented Dec 31, 2025

Summary

fix: failure to handle $eq operator if the type of the value is array
close #93

Solution Description

Use JSON_EXTRACT to extract the value of the specific key and compare it with the input value like #87

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced metadata filter operators ($eq, $ne, $in, $nin) to properly handle scalar and array field comparisons across all query types.
  • Tests

    • Added comprehensive regression test suite covering $eq and $ne operators with array fields, including edge cases and operator consistency validation.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 31, 2025

📝 Walkthrough

Walkthrough

The changes rework metadata filter comparison operators in filters.py to use JSON_OVERLAPS instead of JSON_EXTRACT, enabling proper handling of $eq/$ne operators when metadata field values are arrays or scalars. A comprehensive regression test validates the new behavior across multiple scenarios.

Changes

Cohort / File(s) Summary
Metadata filter operator rework
src/pyseekdb/client/filters.py
Introduces special handling for $eq, $ne, $in, and $nin operators to use JSON_OVERLAPS comparisons on extracted fields, replacing direct JSON_EXTRACT equality checks. Values are JSON-encoded and wrapped as arrays for consistent comparison semantics. Affects both operator-based and direct field equality paths.
Regression test for array field comparisons
tests/integration_tests/test_collection_get.py
Adds test_eq_ne_operators_with_array_fields method covering $eq/$ne operators on array and scalar metadata fields, including edge cases (empty arrays, null, missing fields), direct equality without operators, and consistency validation across operator variants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • hnwyllmm

Poem

🐰 A rabbit hops through metadata trees,
Where arrays now dance in JSON breeze,
$eq and friends find harmony,
With OVERLAPS magic, wild and free,
Bug #93 solved with glee! ✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: fixing the $eq operator to handle array-type values in metadata filters.
Linked Issues check ✅ Passed The changes implement JSON_OVERLAPS for $eq/$ne operators and array comparisons, directly addressing issue #93's requirement to handle $eq when stored metadata is an array.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the $eq/$ne operator handling for array fields, with a comprehensive regression test added to prevent future breakage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/integration_tests/test_collection_get.py (1)

268-268: Optional: Remove extraneous f prefixes from strings without placeholders.

Several assertion messages use f-strings without any interpolation. While harmless, removing the f prefix improves clarity.

🔎 Suggested cleanup
-            assert len(result["ids"]) == 0, f"Expected no matches, got {result['ids']}"
+            assert len(result["ids"]) == 0, "Expected no matches"
-            assert "id2" in matched_ids, f"Expected id2 in results"
+            assert "id2" in matched_ids, "Expected id2 in results"
-            assert "id4" in matched_ids, f"Expected id4 in results"
+            assert "id4" in matched_ids, "Expected id4 in results"
-            assert "id1" not in matched_ids, f"id1 should be excluded (has ml)"
+            assert "id1" not in matched_ids, "id1 should be excluded (has ml)"
-            assert "id3" not in matched_ids, f"id3 should be excluded (is ml)"
+            assert "id3" not in matched_ids, "id3 should be excluded (is ml)"
-            assert "id8" not in matched_ids, f"id8 should be excluded (has ml)"
+            assert "id8" not in matched_ids, "id8 should be excluded (has ml)"
-            print(f"   Scalar fields work correctly")
+            print("   Scalar fields work correctly")

Based on static analysis hints.

Also applies to: 280-284, 331-331

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c24053 and 20d0254.

📒 Files selected for processing (2)
  • src/pyseekdb/client/filters.py
  • tests/integration_tests/test_collection_get.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-22T12:44:52.456Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_dml.py:36-36
Timestamp: 2025-12-22T12:44:52.456Z
Learning: In integration tests under tests/integration_tests, it is acceptable to use the internal db_client._server._execute() to run raw SQL (e.g., CREATE TABLE) when needing custom table setup with specific vector index configurations. Prefer using public/test harness APIs for common setup, and clearly document and limit the use of internal calls to ensure test reliability. Ensure proper cleanup and isolation for each test, and avoid leaking internal APIs into production or non-test code.

Applied to files:

  • tests/integration_tests/test_collection_get.py
📚 Learning: 2025-12-22T12:45:51.412Z
Learnt from: longbingljw
Repo: oceanbase/pyseekdb PR: 91
File: tests/integration_tests/test_collection_hybrid_search_builder_integration.py:127-132
Timestamp: 2025-12-22T12:45:51.412Z
Learning: In integration test Python files under tests/integration_tests, it is acceptable to use string interpolation to construct SQL INSERT statements for test data when the data is controlled and internal to the test. This pattern should only be used for trusted, test-only data and not with untrusted input; ensure the injection risk is mitigated by keeping inputs deterministic, non-user-supplied, and isolated to the test environment. If possible, prefer parameterized queries in non-test code, but in these tests, interpolate values selectively when you can guarantee safety.

Applied to files:

  • tests/integration_tests/test_collection_get.py
🪛 Ruff (0.14.10)
tests/integration_tests/test_collection_get.py

268-268: f-string without any placeholders

Remove extraneous f prefix

(F541)


280-280: f-string without any placeholders

Remove extraneous f prefix

(F541)


281-281: f-string without any placeholders

Remove extraneous f prefix

(F541)


282-282: f-string without any placeholders

Remove extraneous f prefix

(F541)


283-283: f-string without any placeholders

Remove extraneous f prefix

(F541)


284-284: f-string without any placeholders

Remove extraneous f prefix

(F541)


331-331: f-string without any placeholders

Remove extraneous f prefix

(F541)


369-369: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration-test (oceanbase)
  • GitHub Check: integration-test (server)
  • GitHub Check: integration-test (embedded)
🔇 Additional comments (4)
tests/integration_tests/test_collection_get.py (1)

212-370: Excellent comprehensive regression test for array field support!

This test thoroughly validates the new JSON_OVERLAPS-based implementation for $eq and $ne operators with:

  • ✅ Array fields with membership checks (id1, id8)
  • ✅ Scalar fields for backward compatibility (id3, id4)
  • ✅ Edge cases: empty array (id6), null (id7), missing field (id5)
  • ✅ Consistency validation: $eq matches $in with single value
  • ✅ Combined logical operators ($and with $eq and $ne)

The test structure is clear with descriptive assertions and helpful debug output. Proper cleanup ensures test isolation.

src/pyseekdb/client/filters.py (3)

172-178: Direct equality correctly aligned with $eq operator.

The refactored direct equality (e.g., {"tags": "ml"}) now uses the same JSON_OVERLAPS approach as the explicit $eq operator. This ensures consistent behavior whether users specify the operator or not.


137-140: Range operators intentionally exclude arrays (expected behavior).

After handling $eq and $ne with JSON_OVERLAPS, the remaining comparison operators ($lt, $gt, $lte, $gte) continue using JSON_EXTRACT for direct scalar comparisons. This is correct since range comparisons are semantically undefined for array values.

Note: Users attempting range queries on array fields will get undefined or error results. Consider documenting this limitation if not already covered.


123-136: Well-designed fix using JSON_OVERLAPS for array membership checks.

The refactored $eq and $ne operators correctly use JSON_OVERLAPS to support both scalar and array metadata values:

  • Wrapping op_value in a single-element array [op_value] enables membership semantics
  • JSON_OVERLAPS(["ml", "ai"], ["ml"]) returns true (array contains "ml")
  • JSON_OVERLAPS("ml", ["ml"]) returns true (scalar equals "ml", treated as single-element array)

This aligns with the PR objective to fix issue #93 where {"tags": {"$eq": "ml"}} should match documents with "tags": ["ml", "ai"].

However, verify that JSON_OVERLAPS semantics match expectations in your target database (OceanBase/MySQL):

#!/bin/bash
# Verify JSON_OVERLAPS behavior with scalars, arrays, and NULL in the target database

# Test 1: Scalar compared with single-element array
echo "Test 1: JSON_OVERLAPS with scalar vs array"
echo "SELECT JSON_OVERLAPS('\"ml\"', '[\"ml\"]') AS scalar_match;"

# Test 2: Array compared with single-element array (overlap)
echo "Test 2: JSON_OVERLAPS with array overlap"
echo "SELECT JSON_OVERLAPS('[\"ml\", \"ai\"]', '[\"ml\"]') AS array_overlap;"

# Test 3: Array compared with single-element array (no overlap)
echo "Test 3: JSON_OVERLAPS with array disjoint"
echo "SELECT JSON_OVERLAPS('[\"java\"]', '[\"ml\"]') AS array_disjoint;"

# Test 4: NULL handling
echo "Test 4: JSON_OVERLAPS with NULL"
echo "SELECT JSON_OVERLAPS(NULL, '[\"ml\"]') AS null_handling;"

# Test 5: Empty array
echo "Test 5: JSON_OVERLAPS with empty array"
echo "SELECT JSON_OVERLAPS('[]', '[\"ml\"]') AS empty_array;"

echo ""
echo "Run these queries against your OceanBase/MySQL instance to confirm behavior."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: failed to handle $eq operator if the type of the value is array

1 participant