Skip to content

Conversation

@Emmanuel-Tyty
Copy link

@Emmanuel-Tyty Emmanuel-Tyty commented Oct 23, 2025

Issue: #915

  • Extend TextPattern matching to recursively search through metadata arrays and maps
  • Add serde(rename_all = lowercase) to MetadatumPattern for proper TOML deserialization
  • Add comprehensive tests with 'testing regex' pattern
  • Update metadata_regex_filter example with tested configuration and real output
  • Verified on preprod testnet with successful transaction filtering

This enables filtering transactions based on metadata content patterns, making
the existing metadata value field actually usable for real-world Cardano metadata
structures which are typically nested in arrays and maps.

Summary by CodeRabbit

  • New Features

    • Regex-based metadata matching in Select filters — supports text, nested arrays/maps, byte/UTF‑8 subjects, and case-insensitive patterns; regexes can be specified in configuration.
  • Documentation

    • Added a Metadata Filtering guide with examples, common regex patterns, and a runnable example + preprod configuration.
  • Tests

    • Added unit tests validating regex-based metadata matching, UTF-8 handling, and related edge cases.

- Add Regex variant to TextPattern enum for pattern matching
- Implement custom PartialEq for TextPattern to handle Regex comparison
- Add regex serialization/deserialization support in serde_ext
- Add comprehensive tests for regex metadata matching
- Update Select filter documentation with regex examples
- Add metadata_regex_filter example demonstrating CIP-20 message filtering
- Support case-insensitive and complex regex patterns for metadata values

This feature enables filtering Cardano transactions by metadata content
using regular expressions, useful for:
- Monitoring specific message patterns on-chain
- Filtering application-specific messages
- Building notification systems based on message content
- CIP-20 transaction message filtering
- Extend TextPattern matching to recursively search through metadata arrays and maps
- Add serde(rename_all = lowercase) to MetadatumPattern for proper TOML deserialization
- Add comprehensive tests with 'testing regex' pattern
- Update metadata_regex_filter example with tested configuration and real output
- Verified on preprod testnet with successful transaction filtering

This enables filtering transactions based on metadata content patterns, making
the existing metadata value field actually usable for real-world Cardano metadata
structures which are typically nested in arrays and maps.
@coderabbitai
Copy link

coderabbitai bot commented Oct 23, 2025

Walkthrough

Adds regex-based text pattern matching for metadata, serde (de)serialization helpers for Regex, recursive matching across Metadatum arrays/maps, tests, documentation updates, and a runnable example demonstrating a Select filter matching metadata label 674 with a text regex.

Changes

Cohort / File(s) Summary
Docs & Example
docs/v2/filters/select.mdx, examples/metadata_regex_filter/README.md, examples/metadata_regex_filter/daemon.toml
Adds "Metadata Filtering" docs and a runnable example showing a Select filter matching metadata label 674 using a text regex (e.g., "Hello World"); includes usage and run instructions.
Text pattern & matching logic
src/filters/select/eval/mod.rs
Replaces TextPattern::Exact(String) with TextPattern::Regex(regex::Regex); adds serde support and manual PartialEq; updates PatternOf<&str>, PatternOf<&[u8]>, and PatternOf<&Metadatum> to use regex matching and to recurse into Metadatum::Array and Metadatum::Map.
Metadatum pattern & tests
src/filters/select/eval/metadata.rs
Adds #[serde(rename_all = "lowercase")] to MetadatumPattern and new tests validating regex-based TextPattern matching across Metadatum variants.
Regex serde helpers
src/filters/select/eval/serde_ext.rs
Adds pub mod regex_pattern with serialize and deserialize helpers to (de)serialize regex::Regex to/from strings for serde.

Sequence Diagram(s)

sequenceDiagram
    participant Config as TOML / Config
    participant Serde as serde_ext::regex_pattern
    participant Pattern as TextPattern::Regex
    participant Matcher as Pattern Matcher
    participant Meta as Metadatum (Text/Array/Map)/Bytes

    Config->>Serde: parse regex string
    Serde-->>Pattern: Regex instance
    Pattern->>Matcher: is_match(subject)
    alt subject is &str or &[u8]
        Matcher->>Matcher: decode UTF‑8 (if bytes) and run regex.is_match
    else subject is Metadatum::Text
        Matcher->>Matcher: run regex.is_match on text
    else subject is Metadatum::Array or Metadatum::Map
        Matcher->>Matcher: recurse into items/keys/values
        opt short-circuit on first match
        end
    end
    Matcher-->>Pattern: Match / NoMatch / Uncertain
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect serde regex (de)serialization error mapping in serde_ext::regex_pattern.
  • Verify PartialEq compares regex patterns by their source (including flags/options).
  • Review recursive traversal of Metadatum::Array/Map for correctness and short-circuit behavior.
  • Confirm UTF-8 decoding behavior and handling of invalid UTF-8 in PatternOf<&[u8]>.

Possibly related issues

  • Regex filter on metadata value #915 — Matches scope: adds serde rename on MetadatumPattern, regex-based TextPattern, recursive metadata traversal, and related tests; likely directly related.

Poem

🐰 I hopped through metadata, thread by thread,
I sniffed the regex blossoms, soft and spread,
Labels, maps, arrays I gently combed,
Found "Hello World" tucked and homed,
A carrot-filtered cheer — joy ahead! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "feat: Add recursive regex pattern matching for metadata filtering" directly and accurately reflects the primary change in this pull request. The changeset implements regex-based pattern matching with recursive traversal through nested metadata structures (arrays and maps), which is the central feature. The title is specific and clear, mentioning both the key technical capability (recursive regex pattern matching) and the domain (metadata filtering), making it immediately understandable to someone reviewing the commit history. The title is neither vague nor misleading, and it appropriately summarizes the main point from the developer's perspective.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96c7634 and 6b65812.

📒 Files selected for processing (2)
  • src/filters/select/eval/metadata.rs (2 hunks)
  • src/filters/select/eval/serde_ext.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/filters/select/eval/metadata.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/filters/select/eval/serde_ext.rs (1)
src/filters/select/eval/bytes.rs (2)
  • serialize (49-55)
  • deserialize (77-82)
🔇 Additional comments (1)
src/filters/select/eval/serde_ext.rs (1)

110-131: LGTM! Clean and correct serde implementation for Regex.

The module provides a straightforward serde integration for Regex that correctly:

  • Serializes using regex.as_str() to preserve the full pattern string including embedded flags
  • Deserializes with proper error handling via serde::de::Error::custom for invalid patterns
  • Follows standard Rust and serde conventions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
examples/metadata_regex_filter/daemon.toml (1)

21-26: Config looks good; optional regex hardening

Regex matches substrings by default. If you intended whole‑string or case‑insensitive matches, consider:

  • Whole string: regex = "^(?i)testing regex$"
  • Case‑insensitive substring: regex = "(?i)testing regex"
src/filters/select/eval/metadata.rs (1)

125-148: Strengthen test to validate actual matching (including recursion)

Current test only checks field presence. Add assertions against real Metadatum structures (Text, Array, Map) to exercise recursive matching.

Example sketch (can be adapted to existing test helpers):

// Build nested map/array: {"msg": ["Hello brave world"]}
let nested = Metadatum {
    metadatum: Some(M::Map(Map {
        pairs: vec![Pair {
            key: Some(Metadatum{ metadatum: Some(M::Text("msg".into())) }),
            value: Some(Metadatum{ metadatum: Some(M::Array(Array {
                items: vec![Metadatum{ metadatum: Some(M::Text("Hello brave world".into())) }]
            }))}),
        }]
    }))
};
let aux = AuxData { metadata: vec![ Metadata { label: 674, value: Some(nested) } ] };

let re = Regex::new("(?i)hello.*world").unwrap();
let pat = MetadataPattern { label: Some(674), value: Some(MetadatumPattern::Text(TextPattern::Regex(re))) };

assert_eq!(pat.is_match(&aux), MatchOutcome::Positive);

I can wire this with the exact pallas types used in the repo’s test helpers if you want.

examples/metadata_regex_filter/README.md (1)

35-41: Docs align with behavior

Recursive search and substring semantics are correctly described. Consider adding a brief note that TOML requires escaping backslashes in regexes (you already do in examples below).

src/filters/select/eval/serde_ext.rs (1)

110-128: Regex serde helper is correct and minimal

Good choice to serialize by pattern string and map parse errors to serde. Recommend adding a round‑trip unit test for a few patterns (anchors, flags, escapes).

src/filters/select/eval/mod.rs (2)

224-233: Avoid allocation when matching bytes as UTF‑8

Use from_utf8 to prevent a temporary String and reduce copies.

 impl PatternOf<&[u8]> for TextPattern {
     fn is_match(&self, subject: &[u8]) -> MatchOutcome {
-        let subject = match String::from_utf8(subject.to_vec()) {
-            Ok(subject) => subject,
-            Err(_) => return MatchOutcome::Uncertain,
-        };
-
-        self.is_match(subject.as_str())
+        match std::str::from_utf8(subject) {
+            Ok(s) => self.is_match(s),
+            Err(_) => MatchOutcome::Uncertain,
+        }
     }
 }

235-247: Short‑circuit map scanning to reduce work on large metadata

We can skip scanning values if any key already matches, preserving current Positive/Uncertain folding.

 match subject.metadatum.as_ref() {
     Some(M::Text(text)) => self.is_match(text.as_str()),
     Some(M::Array(array)) => self.is_any_match(array.items.iter()),
     Some(M::Map(map)) => {
-        let key_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.key.as_ref()));
-        let value_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.value.as_ref()));
-        key_matches + value_matches
+        let key_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.key.as_ref()));
+        if key_matches == MatchOutcome::Positive {
+            return MatchOutcome::Positive;
+        }
+        let value_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.value.as_ref()));
+        key_matches + value_matches
     }
     _ => MatchOutcome::Negative,
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61dc55e and 6d2906b.

📒 Files selected for processing (6)
  • docs/v2/filters/select.mdx (1 hunks)
  • examples/metadata_regex_filter/README.md (1 hunks)
  • examples/metadata_regex_filter/daemon.toml (1 hunks)
  • src/filters/select/eval/metadata.rs (2 hunks)
  • src/filters/select/eval/mod.rs (2 hunks)
  • src/filters/select/eval/serde_ext.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/filters/select/eval/serde_ext.rs (1)
src/filters/select/eval/bytes.rs (2)
  • serialize (49-55)
  • deserialize (77-82)
src/filters/select/eval/mod.rs (1)
src/filters/select/eval/metadata.rs (3)
  • is_match (12-17)
  • is_match (53-59)
  • is_match (63-65)
🔇 Additional comments (4)
docs/v2/filters/select.mdx (2)

115-121: Regex example is clear and aligns with implementation

Case‑insensitive example is correct for the regex crate’s syntax. Nothing to change.


127-138: Escaping guidance is correct

Double backslashes are required in TOML strings. Examples are accurate.

If we want, I can add a short “TOML escaping tips” note to reduce confusion.

src/filters/select/eval/metadata.rs (1)

4-9: Serde rename improves UX and is backward-consistent with docs

Lowercase variant names match the TOML examples (text, int). Good addition.

src/filters/select/eval/mod.rs (1)

197-204: New Regex variant with serde integration: solid addition

Variant + serde(with) are correctly wired; PartialEq via as_str() is appropriate.

@Emmanuel-Tyty Emmanuel-Tyty marked this pull request as draft October 23, 2025 19:46
@Emmanuel-Tyty Emmanuel-Tyty marked this pull request as ready for review October 25, 2025 12:59
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/filters/select/eval/metadata.rs (1)

11-16: Avoid panics: replace todo!() on Int with a safe outcome

Deserializing an “int” pattern then evaluating will panic. Prefer a non-panicking default.

 impl PatternOf<&Metadatum> for MetadatumPattern {
     fn is_match(&self, subject: &Metadatum) -> MatchOutcome {
         match self {
             MetadatumPattern::Text(x) => x.is_match(subject),
-            MetadatumPattern::Int(_) => todo!(),
+            // Until implemented, avoid crashing the pipeline
+            MetadatumPattern::Int(_) => MatchOutcome::Uncertain,
         }
     }
 }
🧹 Nitpick comments (3)
docs/v2/filters/select.mdx (1)

103-111: Example LGTM; small clarity tweak suggested

Config is valid TOML and matches the serde layout (metadata.value.text.regex). Consider clarifying that recursion searches map keys and values, and only text values are matched.

-Match transactions with metadata containing regex pattern (recursively searches arrays and maps)
+Match transactions with metadata containing a regex pattern (recursively searches arrays and maps — including map keys and values — and matches only text metadatum)
src/filters/select/eval/metadata.rs (1)

125-146: Strengthen tests: assert recursive matching on arrays/maps, not the raw regex

Current test only checks Regex::is_match on strings. Add cases that:

  • Build Metadatum::Array and Metadatum::Map nesting Text values matching and not matching.
  • Verify TextPattern::is_match(&Metadatum) returns Positive/Negative as expected.
  • Optionally, an end-to-end MetadataPattern over AuxData.

I can draft test scaffolding targeting nested array/map metadatum if helpful.

src/filters/select/eval/mod.rs (1)

233-245: Consider matching UTF‑8 bytes and add tests for recursive metadata matching

  • Extend to match Bytes (attempt UTF‑8 decode) to cover metadata where text is stored as bytes.
  • Add tests for:
    • nested arrays/maps (deep recursion),
    • map key vs value matching,
    • negative/uncertain paths (non‑UTF‑8 bytes should return Uncertain via &[u8] impl).

I can add a focused test module building Metadatum::Array/Map with mixed types to assert Positive/Negative/Uncertain outcomes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d2906b and 1403f17.

📒 Files selected for processing (5)
  • docs/v2/filters/select.mdx (1 hunks)
  • examples/metadata_regex_filter/README.md (1 hunks)
  • examples/metadata_regex_filter/daemon.toml (1 hunks)
  • src/filters/select/eval/metadata.rs (2 hunks)
  • src/filters/select/eval/mod.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/metadata_regex_filter/README.md
  • examples/metadata_regex_filter/daemon.toml
🧰 Additional context used
🧬 Code graph analysis (2)
src/filters/select/eval/mod.rs (1)
src/filters/select/eval/metadata.rs (3)
  • is_match (12-17)
  • is_match (53-59)
  • is_match (63-65)
src/filters/select/eval/metadata.rs (1)
src/filters/select/eval/mod.rs (16)
  • is_match (122-122)
  • is_match (142-148)
  • is_match (152-154)
  • is_match (158-160)
  • is_match (164-166)
  • is_match (170-172)
  • is_match (187-194)
  • is_match (213-217)
  • is_match (221-228)
  • is_match (232-245)
  • is_match (273-275)
  • is_match (299-316)
  • is_match (337-359)
  • is_match (371-377)
  • is_match (398-422)
  • is_match (535-547)
🔇 Additional comments (4)
src/filters/select/eval/mod.rs (3)

204-210: Equality by pattern string is fine

Comparing Regex by as_str() is adequate and deterministic.

Add a tiny unit test asserting PartialEq symmetry for identical/different patterns.


213-216: LGTM: str-matching via Regex

Straightforward delegation; no issues.


197-203: ****

TextPattern has only ever contained a Regex variant in this codebase. No TextPattern::Exact usage, tests, or config examples exist. The review appears to conflate TextPattern with NumericPattern (CoinPattern), which does retain an Exact variant. The current Regex-only design is intentional and introduces no backward compatibility issue.

Likely an incorrect or invalid review comment.

src/filters/select/eval/metadata.rs (1)

4-6: Verify backward compatibility and address incomplete Int variant

Unable to confirm whether rename_all = "lowercase" is newly added or whether existing configs use capitalized "Text"/"Int" variant names. However, two issues are confirmed:

  1. Int variant contains unguarded todo!() (line 15) — will panic if deserialized or matched
  2. No deserialization tests — tests construct MetadatumPattern in code; missing JSON/YAML round-trip validation

If backward compatibility is required, add serde aliases as suggested. Additionally, either implement Int pattern matching or gate it behind a feature/compilation flag. Add tests that deserialize MetadatumPattern from JSON to prevent regressions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/filters/select/eval/mod.rs (4)

204-210: Equality compares only the pattern text; builder flags are ignored.

If regex flags are ever set via a builder (outside inline (?i) etc.), as_str() equality may give false positives. Add a brief note to document this invariant.

Apply this small doc addition:

 impl PartialEq for TextPattern {
+    /// Equality is defined by the literal pattern string only.
+    /// Note: flags must be encoded in the pattern (e.g., `(?i)`) for equality to reflect them.
     fn eq(&self, other: &Self) -> bool {
         match (self, other) {
             (TextPattern::Regex(a), TextPattern::Regex(b)) => a.as_str() == b.as_str(),
         }
     }
 }

221-229: Avoid allocation when decoding UTF‑8 bytes.

Use std::str::from_utf8 to eliminate the to_vec() copy.

Apply:

 impl PatternOf<&[u8]> for TextPattern {
     fn is_match(&self, subject: &[u8]) -> MatchOutcome {
-        let subject = match String::from_utf8(subject.to_vec()) {
-            Ok(subject) => subject,
-            Err(_) => return MatchOutcome::Uncertain,
-        };
-
-        self.is_match(subject.as_str())
+        match std::str::from_utf8(subject) {
+            Ok(s) => self.is_match(s),
+            Err(_) => MatchOutcome::Uncertain,
+        }
     }
 }

233-241: Also match ByteString metadata (UTF‑8) and propagate Uncertain on invalid bytes.

This keeps recursion consistent with your TextPattern bytes impl and makes nested maps/arrays containing bytes searchable.

Apply:

 match subject.metadatum.as_ref() {
-    Some(M::Text(text)) => self.is_match(text.as_str()),
+    Some(M::Text(text)) => self.is_match(text.as_str()),
+    Some(M::Bytes(bytes)) => self.is_match(bytes.as_slice()),
     Some(M::Array(array)) => self.is_any_match(array.items.iter()),
     Some(M::Map(map)) => {
         let key_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.key.as_ref()));
         let value_matches = self.is_any_match(map.pairs.iter().filter_map(|p| p.value.as_ref()));
         key_matches + value_matches
     }
     _ => MatchOutcome::Negative,
 }

Note: confirm the exact variant name (Bytes) in pallas::...::metadatum::Metadatum. Adjust if it differs.


672-692: UTF‑8 bytes tests are solid; consider adding recursive cases.

Add tests for:

  • Matching within nested arrays and maps (deep recursion).
  • ByteString metadata: positive, negative, and invalid UTF‑8 → Uncertain (after adding the Bytes arm).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1403f17 and 292b62b.

📒 Files selected for processing (3)
  • docs/v2/filters/select.mdx (1 hunks)
  • src/filters/select/eval/metadata.rs (2 hunks)
  • src/filters/select/eval/mod.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/filters/select/eval/metadata.rs
  • docs/v2/filters/select.mdx
🧰 Additional context used
🧬 Code graph analysis (1)
src/filters/select/eval/mod.rs (2)
src/filters/select/eval/metadata.rs (3)
  • is_match (12-17)
  • is_match (53-59)
  • is_match (63-65)
src/filters/select/eval/serde_ext.rs (1)
  • is_match (87-89)
🔇 Additional comments (3)
src/filters/select/eval/mod.rs (3)

197-203: Serde integration for Regex looks good.

Enum shape and #[serde(with = "serde_ext::regex_pattern")] are appropriate for config I/O. No issues.


215-216: String matching path is correct.

Delegating to Regex::is_match is the right behavior.


660-671: Equality tests are precise and valuable.

Covers same‑pattern equality and different‑pattern inequality.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/filters/select/eval/metadata.rs (1)

148-184: Add test coverage for recursive array and map matching.

This test validates matching against Text, Int, and Bytes metadatum variants, but the PR adds recursive matching for Array and Map variants (see src/filters/select/eval/mod.rs lines 237-241). Consider adding test cases to verify:

  • Regex matching within nested arrays
  • Regex matching within map keys and values
  • Multi-level nested structures (arrays containing maps, etc.)

Example test structure:

// Test array matching
let array_metadatum = Metadatum {
    metadatum: metadatum::Metadatum::Array(metadatum::Array {
        items: vec![
            Metadatum {
                metadatum: metadatum::Metadatum::Text("Hello World".to_string()).into(),
            }
        ],
    }).into(),
};
assert_eq!(text_pattern.is_match(&array_metadatum), MatchOutcome::Positive);

// Test map matching
let map_metadatum = Metadatum {
    metadatum: metadatum::Metadatum::Map(metadatum::Map {
        pairs: vec![
            metadatum::Pair {
                key: Some(Metadatum {
                    metadatum: metadatum::Metadatum::Text("key".to_string()).into(),
                }),
                value: Some(Metadatum {
                    metadatum: metadatum::Metadatum::Text("Hello World".to_string()).into(),
                }),
            }
        ],
    }).into(),
};
assert_eq!(text_pattern.is_match(&map_metadatum), MatchOutcome::Positive);
src/filters/select/eval/mod.rs (1)

231-244: Verify that a clarifying comment is added to document the OR semantics for map key/value matching.

The verification confirms the review comment is accurate: the code implements OR logic across independent key and value checks. Both fold_any_of and the Add trait use short-circuit OR semantics—any Positive match returns Positive immediately.

The map matching logic (lines 238-242) independently checks all keys and all values, then combines the results with the + operator, which implements OR. This means a map matches if any key OR any value matches, not if a key-value pair contains the pattern together.

While the implementation is correct and intentional, there are no comments explaining this non-obvious behavior. The original review's suggestion to add a clarifying comment remains valid and should be addressed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 292b62b and 96c7634.

📒 Files selected for processing (2)
  • src/filters/select/eval/metadata.rs (2 hunks)
  • src/filters/select/eval/mod.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/filters/select/eval/metadata.rs (1)
src/filters/select/eval/mod.rs (16)
  • is_match (122-122)
  • is_match (142-148)
  • is_match (152-154)
  • is_match (158-160)
  • is_match (164-166)
  • is_match (170-172)
  • is_match (187-194)
  • is_match (213-217)
  • is_match (221-228)
  • is_match (232-245)
  • is_match (273-275)
  • is_match (299-316)
  • is_match (337-359)
  • is_match (371-377)
  • is_match (398-422)
  • is_match (535-547)
src/filters/select/eval/mod.rs (1)
src/filters/select/eval/metadata.rs (3)
  • is_match (12-17)
  • is_match (53-59)
  • is_match (63-65)
🔇 Additional comments (5)
src/filters/select/eval/metadata.rs (2)

4-4: LGTM: Serde attribute enables proper TOML deserialization.

The rename_all = "lowercase" attribute is consistent with other patterns in the codebase and aligns with the PR objective to enable proper TOML deserialization.


125-146: LGTM: Test validates basic regex pattern construction.

The test correctly validates that TextPattern::Regex can be constructed within MetadatumPattern::Text and that basic regex matching works as expected.

src/filters/select/eval/mod.rs (3)

204-210: LGTM: PartialEq implementation correctly compares regex patterns.

The implementation compares Regex instances by their string representation using as_str(), which is a reasonable approach since Regex doesn't implement PartialEq natively. This allows two regex patterns with the same expression to be considered equal.


660-671: LGTM: Test validates PartialEq implementation.

The test correctly verifies that TextPattern instances with identical regex patterns are equal and different patterns are not equal.


673-694: LGTM: Comprehensive UTF-8 byte matching test.

The test thoroughly validates the UTF-8 handling:

  • Valid UTF-8 bytes that match the pattern
  • Valid UTF-8 bytes that don't match
  • Invalid UTF-8 sequences returning Uncertain

This ensures robust error handling for byte-based pattern matching.

Comment on lines +197 to +202
#[derive(Serialize, Deserialize, Clone, Debug)]
#[serde(rename_all = "lowercase")]
pub enum TextPattern {
Exact(String),
// TODO: Regex
#[serde(with = "serde_ext::regex_pattern")]
Regex(regex::Regex),
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the removal of Exact string matching variant.

The PR changes TextPattern from supporting an Exact(String) variant to only Regex(regex::Regex). This is a breaking API change that removes the ability to perform exact (non-regex) string matching.

If exact matching is still needed, users would need to escape regex metacharacters or use anchored patterns like ^exact_string$. Please confirm this removal is intentional and consider:

  1. Documenting this breaking change in the PR description
  2. Providing migration guidance for existing configurations
  3. Evaluating if a simple exact-match fast path is needed before falling back to regex

I'll help verify this review comment by investigating the codebase. Let me start by exploring the TextPattern usage and the scope of this breaking change.


🏁 Script executed:

#!/bin/bash
# Search for all TextPattern usages to understand the impact
echo "=== TextPattern definitions and usages ==="
rg -n "TextPattern" --type rust -C 2 | head -100

echo -e "\n=== Searching for TextPattern::Exact references ==="
rg -n "Exact\(" --type rust -C 1

echo -e "\n=== Checking for any exact matching references in tests/configs ==="
rg -in "exact" src/filters/select/ --type rust -C 1 | head -50

Length of output: 6473


Let me investigate the git history and check for any documentation about this change:


🏁 Script executed:

#!/bin/bash
# Check git history for TextPattern changes
echo "=== Git history for TextPattern changes ==="
git log -p --all -S "TextPattern::Exact" -- src/filters/select/ 2>/dev/null | head -80 || echo "No TextPattern::Exact in git history"

echo -e "\n=== Recent commits affecting TextPattern ==="
git log --oneline -n 20 -- src/filters/select/eval/mod.rs 2>/dev/null || echo "Git unavailable"

echo -e "\n=== Check for configuration examples or documentation ==="
find . -type f \( -name "*.md" -o -name "*.json" -o -name "*.yaml" -o -name "*.toml" \) | xargs grep -l "TextPattern\|exact.*match" 2>/dev/null | head -20

echo -e "\n=== Search for serde tests that might reference old format ==="
rg -n "serde\|deserialize" src/filters/select/eval/mod.rs --type rust -A 5 -B 5 | grep -A 5 -B 5 "text_pattern\|Text" | head -60

Length of output: 740


🏁 Script executed:

#!/bin/bash
# Check git history for TextPattern changes
echo "=== Git history for TextPattern changes ==="
git log -p --all -S "TextPattern::Exact" -- src/filters/select/ 2>/dev/null | head -80 || echo "No TextPattern::Exact in git history"

echo -e "\n=== Recent commits affecting TextPattern ==="
git log --oneline -n 20 -- src/filters/select/eval/mod.rs 2>/dev/null || echo "Git unavailable"

echo -e "\n=== Check for PR description or documentation ==="
find . -type f \( -name "*.md" -o -name "CHANGELOG*" \) -exec grep -l "TextPattern\|exact.*match\|breaking" {} \; 2>/dev/null | head -10

Length of output: 695


Let me check the CHANGELOG and examine the specific commits that introduced these regex changes:


🏁 Script executed:

#!/bin/bash
# Check CHANGELOG for breaking changes related to TextPattern
echo "=== CHANGELOG.md content (grep for relevant sections) ==="
grep -A 10 -B 5 -i "textpattern\|exact.*match\|breaking\|select.*filter" CHANGELOG.md 2>/dev/null | head -100

echo -e "\n=== Examine commit that added regex support ==="
git show 37725d8:src/filters/select/eval/mod.rs 2>/dev/null | grep -A 20 "enum TextPattern" | head -30

echo -e "\n=== Check parent commit to see previous TextPattern definition ==="
git show 37725d8^:src/filters/select/eval/mod.rs 2>/dev/null | grep -A 20 "enum TextPattern" | head -30

Length of output: 4208


Let me check which specific commit removed the Exact variant and look for any migration guidance:


🏁 Script executed:

#!/bin/bash
# Find which commit removed TextPattern::Exact
echo "=== Commits modifying TextPattern enum ==="
git log --oneline -p 37725d8..HEAD -- src/filters/select/eval/mod.rs 2>/dev/null | grep -A 30 "enum TextPattern" | head -100

echo -e "\n=== Check if Exact variant appears in any recent commit ==="
git log -p 37725d8..HEAD -- src/filters/select/eval/mod.rs 2>/dev/null | grep -B 5 -A 5 "Exact(String)" | head -80

echo -e "\n=== Look for migration notes or comments in recent changes ==="
git log --oneline -p 1403f17..HEAD -- src/filters/select/ 2>/dev/null | grep -i "migration\|deprecat\|breaking\|exact" | head -40

Length of output: 1655


Confirm breaking API change: TextPattern::Exact variant removed.

Verification confirms the review comment is correct. The TextPattern::Exact(String) variant was removed in recent commits, eliminating exact string matching capability. This is a breaking change affecting:

  • API compatibility: Code or configurations using TextPattern::Exact will no longer compile/load
  • Functionality: Exact matching is no longer available; users must now use regex patterns (e.g., ^exact_string$)
  • Performance: Regex matching is slower than direct string equality

The commit history shows the Exact variant existed in earlier commits and was cleanly removed without deprecation warnings or migration guidance. The PR should either:

  1. Document this breaking change explicitly in the PR description and CHANGELOG
  2. Provide clear migration guidance for users with existing exact-match configurations
  3. Reconsider if an exact-match fast path should be preserved before falling back to regex
🤖 Prompt for AI Agents
In src/filters/select/eval/mod.rs around lines 197 to 202, the TextPattern enum
had its Exact(String) variant removed, breaking API compatibility and removing
fast exact-match behavior; restore or mitigate this by reintroducing a
TextPattern::Exact(String) variant (with serde support) alongside Regex, update
serialization attributes to handle both variants, implement matching logic to
prefer exact equality for Exact to preserve performance, and add migration notes
in the PR/CHANGELOG and examples showing how to convert existing Exact uses to
either the restored Exact variant or equivalent Regex (^...$) if you
intentionally keep the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant