Skip to content

Conversation

@rlskoeser
Copy link
Member

@rlskoeser rlskoeser commented Feb 7, 2025

Experimental combined date converter that aggregates existing parsers into a single combined parser that can handle known formats supported by our various parsers.

  • moved all grammar files into a single location so we can use relative imports (simplifies this work and sets up future shared parser functionality such as missing digits)
  • aggregate parser/transformer that combines and hands off logic to existing parsers
  • explicitly do not support serialization (must use a specific converter)

Summary by CodeRabbit

  • New Features

    • Added an omnibus date converter that parses EDTF, Hebrew, and Islamic/Hijri formats in one interface (serialization/formatting unsupported).
  • Documentation

    • Added docs for the combined converter and expanded the converters overview.
  • Improvements

    • Improved Hebrew and Islamic parsing (better year/day handling) and consolidated shared grammars to reduce ambiguities.
  • Tests

    • Added tests for combined parsing, error cases, and non-serializable behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@rlskoeser rlskoeser marked this pull request as draft February 7, 2025 16:20
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 7, 2025

Walkthrough

Adds a shared GRAMMAR_FILE_PATH constant, centralizes grammar file resolution, implements year/day token handlers, introduces a combined Lark grammar and parser plus an OmnibusDateConverter that merges EDTF, Hebrew, and Islamic transformers, and adds tests and docs for the combined parser.

Changes

Cohort / File(s) Summary
Core exports & constants
src/undate/converters/__init__.py, src/undate/converters/base.py
Add and export GRAMMAR_FILE_PATH (Path); update __all__ to include GRAMMAR_FILE_PATH and expose BaseDateConverter.
Combined parser & converter
src/undate/converters/combined.py, src/undate/converters/grammars/combined.lark
New omnibus grammar and module: build combined Lark parser, create combined_transformer via merge_transformers, add CombinedDateTransformer, and add OmnibusDateConverter (parse supported; serialization unsupported; handles empty/unrecognized input errors).
Grammar files
src/undate/converters/grammars/combined.lark, src/undate/converters/grammars/hebrew.lark
Add combined.lark that composes/renames EDTF/Hebrew/Islamic rules; minor whitespace/comment fix in hebrew.lark.
Parser path refactor
src/undate/converters/calendars/hebrew/parser.py, src/undate/converters/calendars/islamic/parser.py, src/undate/converters/edtf/parser.py
Replace per-file pathlib grammar resolution with GRAMMAR_FILE_PATH / "<name>.lark" and import GRAMMAR_FILE_PATH.
Transformer method additions
src/undate/converters/calendars/hebrew/transformer.py, src/undate/converters/calendars/islamic/transformer.py, src/undate/converters/edtf/transformer.py
Add year(self, items) in EDTF/Hebrew/Islamic transformers; add day(self, items) in Islamic transformer. Methods aggregate token parts into Tree("year"/"day", ...).
Tests
tests/test_converters/test_combined_parser.py
Add tests exercising parser+transformer and OmnibusDateConverter for EDTF, Hebrew, Islamic inputs; validate parse successes, error paths, and serialization prohibition.
Docs
docs/undate/converters.rst
Add Sphinx automodule/autoclass entries documenting undate.converters and undate.converters.combined.OmnibusDateConverter.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Omnibus as OmnibusDateConverter
    participant Parser as Lark Parser (combined.lark)
    participant CombTF as CombinedDateTransformer
    participant TFs as Merged TFs (EDTF / Hebrew / Islamic)
    participant Result as Undate / UndateInterval

    Client->>Omnibus: parse(value)
    Omnibus->>Parser: parse(value)
    alt parsed tree
        Parser->>CombTF: return parse tree
        CombTF->>TFs: dispatch/transform(tree)
        TFs->>TFs: calendar-specific transform
        TFs->>CombTF: transformed Undate/UndateInterval
        CombTF->>Omnibus: return first result
        Omnibus->>Client: return Undate/UndateInterval
    else empty or unrecognized
        Parser-->>Omnibus: raise UnexpectedCharacters / no match
        Omnibus-->>Client: raise ValueError
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect combined.lark imports/renames and override rules for ambiguity avoidance.
  • Verify merge_transformers usage for rule-name collisions and ordering.
  • Review OmnibusDateConverter error handling (empty input, UnexpectedCharacters) and explicit no-serialization behavior.
  • Confirm GRAMMAR_FILE_PATH points to correct directory and all parser modules use it consistently.

Possibly related PRs

Suggested reviewers

  • robcast
  • jdamerow
  • ColeDCrawford

Poem

🐇 I found three grammars, stitched them in a row,
EDTF, Hebrew, Hijri — now together they flow.
I hop through parse trees, merging tokens with care,
One omnibus parser — all dates now share.
Hop, parse, return — cheerful rabbit flair!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and concisely describes the primary change: introduction of an experimental combined/omnibus date parser that aggregates multiple existing parsers.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/combined-parser

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0691b12 and ce5baaa.

📒 Files selected for processing (2)
  • docs/undate/converters.rst (2 hunks)
  • src/undate/converters/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/undate/converters/init.py
  • docs/undate/converters.rst

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Feb 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.96%. Comparing base (09d90f3) to head (0691b12).
⚠️ Report is 6 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #112      +/-   ##
===========================================
+ Coverage    98.93%   98.96%   +0.03%     
===========================================
  Files           38       40       +2     
  Lines         2057     2121      +64     
===========================================
+ Hits          2035     2099      +64     
  Misses          22       22              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rlskoeser rlskoeser changed the title Preliminary combined date parser Experimental combined / omnibus date parser Dec 18, 2025
@rlskoeser rlskoeser marked this pull request as ready for review December 18, 2025 21:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a33e43b and 7a99c5c.

📒 Files selected for processing (13)
  • docs/undate/converters.rst (1 hunks)
  • src/undate/converters/__init__.py (1 hunks)
  • src/undate/converters/base.py (1 hunks)
  • src/undate/converters/calendars/hebrew/parser.py (1 hunks)
  • src/undate/converters/calendars/hebrew/transformer.py (1 hunks)
  • src/undate/converters/calendars/hijri/parser.py (1 hunks)
  • src/undate/converters/calendars/hijri/transformer.py (1 hunks)
  • src/undate/converters/combined.py (1 hunks)
  • src/undate/converters/edtf/parser.py (1 hunks)
  • src/undate/converters/edtf/transformer.py (1 hunks)
  • src/undate/converters/grammars/combined.lark (1 hunks)
  • src/undate/converters/grammars/hebrew.lark (1 hunks)
  • tests/test_converters/test_combined_parser.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_converters/test_combined_parser.py (3)
src/undate/undate.py (2)
  • Undate (35-487)
  • UndateInterval (490-579)
src/undate/converters/combined.py (1)
  • parse (65-81)
src/undate/converters/base.py (1)
  • parse (66-74)
src/undate/converters/combined.py (5)
src/undate/undate.py (2)
  • Undate (35-487)
  • UndateInterval (490-579)
src/undate/converters/base.py (3)
  • BaseDateConverter (59-135)
  • parse (66-74)
  • to_string (76-85)
src/undate/converters/edtf/transformer.py (1)
  • EDTFTransformer (6-78)
src/undate/converters/calendars/hebrew/transformer.py (1)
  • HebrewDateTransformer (12-40)
src/undate/converters/calendars/hijri/transformer.py (1)
  • HijriDateTransformer (12-49)
src/undate/converters/edtf/transformer.py (3)
src/undate/converters/calendars/hebrew/transformer.py (1)
  • year (29-32)
src/undate/converters/calendars/hijri/transformer.py (1)
  • year (31-35)
src/undate/undate.py (1)
  • year (380-388)
src/undate/converters/calendars/hebrew/transformer.py (3)
src/undate/converters/calendars/hijri/transformer.py (1)
  • year (31-35)
src/undate/converters/edtf/transformer.py (1)
  • year (69-72)
src/undate/undate.py (1)
  • year (380-388)
🔇 Additional comments (21)
src/undate/converters/edtf/transformer.py (1)

69-72: LGTM!

The new year method correctly mirrors the pattern used in HebrewDateTransformer.year and HijriDateTransformer.year (per the relevant snippets), enabling consistent year handling across the combined transformer pipeline. The implementation also aligns with the existing year_unspecified method in this same class.

src/undate/converters/base.py (1)

47-57: LGTM!

Clean centralization of grammar file path resolution. Using pathlib.Path(__file__).parent ensures cross-platform compatibility, and having a single GRAMMAR_FILE_PATH constant eliminates duplication across the various parser modules.

docs/undate/converters.rst (1)

15-18: LGTM!

The autoclass directive for OmnibusDateConverter is correctly formatted and will properly document the new combined parser.

src/undate/converters/calendars/hijri/parser.py (1)

3-5: LGTM!

Clean refactor to use the centralized GRAMMAR_FILE_PATH, eliminating local path construction and aligning with the pattern applied to other parsers in this PR.

src/undate/converters/grammars/hebrew.lark (1)

14-30: LGTM!

Formatting improvements only—the comment whitespace fix and one-alternative-per-line layout improve readability without changing grammar behavior.

src/undate/converters/edtf/parser.py (1)

3-5: LGTM! Centralized grammar path approach.

The refactoring to use GRAMMAR_FILE_PATH from a shared constant promotes consistency across all parsers and simplifies future grammar path management.

src/undate/converters/calendars/hebrew/parser.py (1)

3-5: LGTM! Consistent with the centralized grammar path refactoring.

The Hebrew parser now uses the same centralized GRAMMAR_FILE_PATH approach as the EDTF parser, maintaining consistency across the codebase.

src/undate/converters/calendars/hebrew/transformer.py (1)

29-32: LGTM! Consistent year aggregation pattern.

The new year() method correctly aggregates year parts into a single string and returns the expected Tree structure. This implementation aligns with the patterns used in EDTFTransformer.year() and HijriDateTransformer.year(), maintaining consistency across calendar transformers.

src/undate/converters/calendars/hijri/transformer.py (1)

31-41: LGTM! Year and day aggregation methods added.

Both methods correctly aggregate parts into strings and return proper Tree structures. The implementations are consistent with the Hebrew transformer pattern.

The inline comments about anonymous tokens in the combined parser are helpful for future maintainers, even though the root cause isn't fully explained.

src/undate/converters/__init__.py (1)

1-3: LGTM! Public API exposure of centralized grammar path.

The export of GRAMMAR_FILE_PATH enables consistent access to the centralized grammar directory across the converter ecosystem.

tests/test_converters/test_combined_parser.py (3)

9-22: Excellent test coverage across calendar systems.

The test cases comprehensively cover EDTF (years with unknowns, ranges), Hebrew (month+year), and Hijri (with and without day) date formats, validating the omnibus parser's multi-calendar support.


32-34: Good approach to work around equality semantics.

Using repr() comparison is appropriate here since the comment explains that "the same unknown date is not considered strictly equal." This ensures the test validates the parsed structure rather than relying on potentially loose equality implementations.


43-45: Good validation of non-serialization constraint.

This test confirms that the omnibus converter correctly raises ValueError when serialization is attempted, aligning with the documented limitation.

src/undate/converters/grammars/combined.lark (3)

1-4: LGTM! Clean grammar orchestration.

The combined grammar correctly imports and coordinates the three calendar systems, with proper whitespace handling and a clear start rule.


12-24: Good use of namespaced imports.

The relative imports with explicit namespace prefixes (edtf__, hebrew__, hijri__) properly isolate each grammar's rules while enabling composition. The inline comments helpfully reference the Lark documentation for this pattern.


27-32: Thoughtful ambiguity resolution for year-only dates.

The overrides to exclude year-only forms from Hebrew and Hijri grammars are well-documented and correctly address the ambiguity problem—without calendar context, a year-only date would be unclear. Defaulting to EDTF/Gregorian for such cases is reasonable and explicitly noted in the combined.py module documentation.

src/undate/converters/combined.py (5)

20-23: LGTM! Minimal delegation transformer.

The CombinedDateTransformer.start() method correctly returns the parsed children, allowing the merged sub-transformers (EDTF, Hebrew, Hijri) to handle the actual transformation logic.


31-36: LGTM! Proper transformer composition.

The merge_transformers() call correctly combines the three calendar-specific transformers with appropriate namespace prefixes matching the grammar imports in combined.lark.


40-42: LGTM! Parser initialization with strict mode.

Using Lark.open() with the combined grammar and strict=True enables better error detection and validates the grammar structure.


65-81: Solid parse implementation with appropriate error handling.

The method correctly:

  • Validates non-empty input (lines 70-71)
  • Delegates to the parser and transformer (lines 75-77)
  • Converts UnexpectedCharacters to ValueError with a user-friendly message (lines 78-81)

The indexing on line 77 ([0]) is safe because the CombinedDateTransformer.start() method returns the non-empty children list from a successful parse.


83-85: LGTM! Clear non-serialization boundary.

The to_string() method appropriately raises ValueError as documented, making the limitation explicit and preventing silent failures.

Comment on lines 7 to 10

..note: base converter is pretty developer-facing documentation;
.. calendar/converter usage would be helpful to add
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

RST directive syntax is malformed.

The note directive should use .. note:: (two dots, space, directive name, double colon). The current syntax will not render as a note block in Sphinx documentation.

🔎 Proposed fix
-..note: base converter is pretty developer-facing documentation; 
-.. calendar/converter usage would be helpful to add
+.. note::
+
+   Base converter is pretty developer-facing documentation;
+   calendar/converter usage would be helpful to add.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
..note: base converter is pretty developer-facing documentation;
.. calendar/converter usage would be helpful to add
.. note::
Base converter is pretty developer-facing documentation;
calendar/converter usage would be helpful to add.
🤖 Prompt for AI Agents
In docs/undate/converters.rst around lines 7 to 10, the RST note directive is
malformed ("..note:") and won't render; change it to the correct Sphinx syntax
(".. note::") and ensure a blank line after the directive before the note
content, and also add a short example section showing calendar/converter usage
(header, brief usage snippet or steps) to make the converter guidance more
user-facing.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/undate/converters/calendars/islamic/transformer.py (1)

31-35: Consider clarifying or investigating the anonymous token behavior.

The comment indicates uncertainty about why an anonymous token appears in the combined parser. While the implementation handles this correctly, it would be valuable to either:

  • Document the root cause if understood, or
  • Add a TODO to investigate this behavior further

This helps future maintainers understand whether this is expected or a quirk that might need addressing.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a99c5c and e655936.

📒 Files selected for processing (10)
  • docs/undate/converters.rst (1 hunks)
  • src/undate/converters/base.py (2 hunks)
  • src/undate/converters/calendars/hebrew/transformer.py (1 hunks)
  • src/undate/converters/calendars/islamic/parser.py (1 hunks)
  • src/undate/converters/calendars/islamic/transformer.py (1 hunks)
  • src/undate/converters/combined.py (1 hunks)
  • src/undate/converters/edtf/transformer.py (1 hunks)
  • src/undate/converters/grammars/combined.lark (1 hunks)
  • src/undate/converters/grammars/hebrew.lark (1 hunks)
  • tests/test_converters/test_combined_parser.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/undate/converters/base.py
  • docs/undate/converters.rst
  • src/undate/converters/edtf/transformer.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_converters/test_combined_parser.py (3)
src/undate/undate.py (1)
  • Undate (51-678)
src/undate/converters/combined.py (1)
  • parse (65-81)
src/undate/converters/base.py (1)
  • parse (72-80)
src/undate/converters/calendars/islamic/transformer.py (1)
src/undate/undate.py (2)
  • year (484-492)
  • day (508-517)
src/undate/converters/calendars/hebrew/transformer.py (1)
src/undate/undate.py (1)
  • year (484-492)
🔇 Additional comments (16)
src/undate/converters/calendars/hebrew/transformer.py (2)

29-30: LGTM! Good extensibility design.

The use of self.calendar to pass the calendar parameter enables easy subclassing (e.g., for Seleucid calendar) and aligns well with the multi-calendar architecture of this PR.


32-35: LGTM! Consistent with EDTF transformer pattern.

The implementation correctly concatenates items and returns a Tree structure, following the same pattern used in the EDTF transformer.

src/undate/converters/calendars/islamic/transformer.py (1)

37-41: LGTM! Consistent implementation.

The day() method correctly follows the same pattern as the year() method. The anonymous token concern applies here as well (see comment on Line 31-35).

src/undate/converters/grammars/hebrew.lark (2)

14-14: LGTM! Minor typo fix.

Corrected double space in comment.


18-30: LGTM! Improved readability.

The reformatting of month alternatives across multiple lines enhances readability without changing semantics.

src/undate/converters/calendars/islamic/parser.py (1)

3-5: LGTM! Good centralization of grammar path resolution.

Replacing the local pathlib-based path construction with the shared GRAMMAR_FILE_PATH constant improves maintainability and aligns with the centralized grammar management approach across all parsers in this PR.

tests/test_converters/test_combined_parser.py (4)

9-22: LGTM! Comprehensive test coverage.

The test cases appropriately cover multiple scenarios:

  • EDTF formats (years, unknown digits, intervals)
  • Hebrew calendar dates
  • Islamic/Hijri calendar dates
  • Various date component combinations (year-only, month-year, full dates)

26-34: LGTM! Appropriate use of repr() comparison.

The test correctly validates the transformer behavior, and the use of repr() comparison is well-justified by the comment explaining the non-strict equality of identical calendar representations.


38-40: LGTM! Good end-to-end integration test.

This test validates the full converter path through Undate.parse(), complementing the direct transformer tests.


43-45: LGTM! Appropriate validation of serialization restriction.

The test correctly verifies that the omnibus converter raises ValueError when attempting serialization, consistent with the documented limitation.

src/undate/converters/grammars/combined.lark (2)

4-24: LGTM! Well-structured grammar composition.

The use of relative imports with namespace renaming (edtf__, hebrew__, islamic__) correctly handles grammar composition. The comment referencing the Lark GitHub issue provides helpful context for the design decision.


27-32: LGTM! Smart disambiguation strategy.

The overrides correctly prevent year-only ambiguity by requiring month+year or day+month+year for Hebrew and Islamic dates. The comment acknowledging potential future support for calendar-labeled years shows good forward thinking.

src/undate/converters/combined.py (4)

20-23: LGTM! Appropriate pass-through design.

The CombinedDateTransformer.start() method correctly delegates to the merged transformers by returning children unchanged, allowing the merged transformers to handle the actual transformation logic.


26-42: LGTM! Correct transformer merging and parser setup.

The merged transformer correctly combines the three calendar transformers with appropriate namespace prefixes matching the grammar. The parser setup with rel_to=__file__ and strict=True is appropriate. The comment about year-only interpretation provides helpful context for users.


62-81: LGTM! Robust parsing implementation.

The parse() method correctly:

  • Validates non-empty input
  • Handles the parse-transform workflow
  • Extracts the first transformed item from the returned list
  • Converts Lark's UnexpectedCharacters to a more user-friendly ValueError with a clear error message

83-85: LGTM! Correctly implements serialization restriction.

The to_string() method appropriately raises ValueError with a clear message, consistent with the documented limitation that the omnibus converter does not support serialization (since it would be ambiguous which format to serialize to).

@rlskoeser rlskoeser merged commit 96cb8d6 into develop Dec 19, 2025
4 checks passed
@rlskoeser rlskoeser deleted the feature/combined-parser branch December 19, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants