-
Notifications
You must be signed in to change notification settings - Fork 186
fix: Improve OCR parsing for 5-week months and remove safe fallbacks #1692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: Improve OCR parsing for 5-week months and remove safe fallbacks #1692
Conversation
|
Warning Rate limit exceeded@janstenpickle has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 24 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (4)
WalkthroughBank-holiday adjustment logic and rules were added to the South Kesteven council scraper; calendar parsing extended to support 5–8 week cycles; bin-type lookup updated for 5-week months; extensive unit tests for bank-holiday behavior were added; test conftest fixture/hooks were removed. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Council as SouthKestevenCouncil
participant Parser as parse_regular_calendar_text
participant BinLookup as get_bin_type_from_calendar
participant Adjust as adjust_for_bank_holidays
participant Rules as get_bank_holiday_rules
Client->>Council: get_next_collection_dates(postcode)
Council->>Parser: parse calendar (weeks 1–8)
Parser-->>Council: calendar structure
Council->>BinLookup: map dates → bin types
BinLookup-->>Council: dated collections
Note over Council,Adjust: NEW — apply bank-holiday adjustments
Council->>Adjust: adjust_for_bank_holidays(collection_dates)
Adjust->>Rules: get_bank_holiday_rules()
Rules-->>Adjust: rules (specific_dates, no_adjustment, default_shift)
Adjust-->>Council: adjusted collection_dates
Council-->>Client: return adjusted collection dates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
9307294 to
8c44604
Compare
…rsing - Add comprehensive bank holiday handling with specific calendar rules - Christmas/New Year period adjustments (2025/26) with custom shifts - Good Friday collections remain unchanged (no adjustment) - Default 1-day later shift for other bank holidays - Improve OCR parsing for 5-week months (Week 5 follows Week 2 pattern) - Remove safe fallback logic, now raises ValueError for missing data - Move tests to top-level test directory for better organization - Add comprehensive test coverage (26 tests total) BREAKING CHANGE: get_bin_type_from_calendar now raises ValueError instead of fallback
8c44604 to
a37f5d8
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1692 +/- ##
===========================================
- Coverage 86.79% 54.03% -32.77%
===========================================
Files 9 10 +1
Lines 1136 1908 +772
===========================================
+ Hits 986 1031 +45
- Misses 150 877 +727 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
uk_bin_collection/tests/test_south_kesteven_district_council.py (3)
292-292: Consider addingstrict=Truetozip()for safer iteration.When iterating over multiple sequences with
zip(), addingstrict=Trueensures they have the same length and raises an error if they don't, preventing silent bugs.Apply this diff:
- for i, (original, expected) in enumerate(zip(test_dates, expected_adjustments)): + for i, (original, expected) in enumerate(zip(test_dates, expected_adjustments, strict=True)):
303-303: Consider addingstrict=Truetozip()for safer iteration.Apply this diff:
- for i, (original, expected) in enumerate(zip(test_dates, expected_adjustments)): + for i, (original, expected) in enumerate(zip(test_dates, expected_adjustments, strict=True)):
324-324: Consider addingstrict=Truetozip()for safer iteration.Apply this diff:
- for i, (original, expected) in enumerate(zip(test_dates, expected_adjustments)): + for i, (original, expected) in enumerate(zip(test_dates, expected_adjustments, strict=True)):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
uk_bin_collection/tests/test_south_kesteven_district_council.py(1 hunks)uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py(5 hunks)uk_bin_collection/uk_bin_collection/councils/tests/conftest.py(0 hunks)
💤 Files with no reviewable changes (1)
- uk_bin_collection/uk_bin_collection/councils/tests/conftest.py
🧰 Additional context used
🧬 Code graph analysis (2)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (2)
is_holiday(149-161)Region(29-33)
uk_bin_collection/tests/test_south_kesteven_district_council.py (1)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py (3)
get_bank_holiday_rules(475-494)adjust_for_bank_holidays(496-533)get_next_collection_dates(446-473)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py
1204-1204: Abstract raise to an inner function
(TRY301)
1204-1204: Avoid specifying long messages outside the exception class
(TRY003)
uk_bin_collection/tests/test_south_kesteven_district_council.py
292-292: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
303-303: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
324-324: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (12)
uk_bin_collection/tests/test_south_kesteven_district_council.py (6)
251-282: LGTM!The bank holiday rules configuration test is comprehensive and validates all expected rule structures and values correctly.
306-314: LGTM!Good Friday no-adjustment test is correct and validates the expected behavior.
327-338: LGTM!The default shift test correctly uses mocking to validate the fallback behavior for unspecified bank holidays.
340-347: LGTM!The error handling test correctly validates that invalid dates are returned unchanged, ensuring graceful degradation.
349-363: Verify test assertions and remove uncertainty comment.The test comment suggests the assertions might need adjustment, and the test only validates the length of the returned list without checking the actual adjusted dates. Consider either:
- Adding assertions to verify the actual adjusted dates (e.g., that Dec 25 is adjusted to Dec 24)
- Removing the test if it duplicates coverage from other tests
365-387: LGTM!The integration test correctly validates that bank holiday adjustments work in conjunction with calendar parsing, ensuring Christmas Day is properly adjusted.
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py (6)
24-24: LGTM!Explicit imports of
is_holidayandRegionimprove code clarity and make the dependencies explicit.
469-471: LGTM!Bank holiday adjustments are correctly integrated into the date generation flow, ensuring all collection dates are adjusted appropriately.
475-494: LGTM with a note about annual maintenance.The bank holiday rules are well-structured with clear reasoning. Note that the specific dates are hardcoded for 2025/2026 and will require annual updates to reflect future years' schedules.
Consider adding a comment or documentation noting that these rules need annual review and updates.
496-533: LGTM!The
adjust_for_bank_holidaysmethod is well-structured with proper error handling and a logical cascade of rule checks. The graceful handling of unparseable dates ensures robustness.
1122-1134: LGTM!The extended calendar structure correctly implements the Week 5 → Week 2 pattern as specified in the PR objectives. The cycling pattern for weeks 6-8 provides good future-proofing for edge cases.
1187-1208: LGTM - Error handling improved as intended.The removal of safe fallback returns in favor of raising exceptions aligns with the PR objectives and improves error detection. The Week 5 → Week 2 mapping logic is correct.
The static analysis hints (TRY301, TRY003) are minor style suggestions and don't affect functionality. These can be optionally addressed if you want to follow stricter style guidelines.
- Add fallback collection day based on postcode pattern when HTTP requests fail - Add fallback green bin info when external requests are unavailable - PE6 postcodes default to Monday collection day - NG postcodes default to Tuesday collection day - Generic fallback to Wednesday for other postcodes - Green bin defaults to Tuesday Week 2 pattern - Prevents test failures in CI environments with network restrictions
- Add strict=True to zip() calls in bank holiday tests for safer iteration - Add documentation note about annual maintenance of bank holiday rules - Addresses CodeRabbit review suggestions for improved code safety - All bank holiday tests continue to pass with stricter validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py (1)
23-24: Remove redundant explicit import.Line 24 explicitly imports
is_holidayandRegionfromcommon, but line 23 already imports everything from the same module withfrom uk_bin_collection.uk_bin_collection.common import *. The explicit import is redundant.Apply this diff to remove the redundancy:
from uk_bin_collection.uk_bin_collection.common import * -from uk_bin_collection.uk_bin_collection.common import is_holiday, Region
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py (1)
uk_bin_collection/uk_bin_collection/common.py (2)
is_holiday(149-161)Region(29-33)
🪛 Ruff (0.14.2)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py
1204-1204: Abstract raise to an inner function
(TRY301)
1204-1204: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (6)
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py (6)
469-473: LGTM!The bank holiday adjustment is correctly integrated before returning collection dates. This ensures all generated dates are properly adjusted for holidays.
475-494: Verify bank holiday shift rules are correct for the council.The bank holiday rules structure is well-organized, but some specific date shifts should be verified:
- Line 480:
'22/12/2025'(Monday) shifted by -2 days with reason "Christmas (moved to Saturday)" - this would move collections to the previous Saturday, which seems unusual.- Line 484:
'26/12/2025'(Boxing Day, Friday) shifted by +1 day to Saturday - typically collections don't shift to weekends.Please confirm these shift patterns match the council's published holiday collection schedule.
496-533: LGTM!The bank holiday adjustment logic is well-structured with:
- Specific date rules for known holidays
- No-adjustment list for collections that proceed normally
- Fallback to the
is_holidaylibrary for automatic detection- Proper error handling that preserves unparsable dates
1122-1134: LGTM!The calendar structure correctly implements the 3-week cycling pattern (Black → Silver → Purple) and properly extends to 8 weeks. Week 5 follows Week 2's pattern (Silver bin) as specified in the PR objectives, ensuring correct handling of 5-week months.
1187-1208: LGTM!The 5-week month handling is correctly implemented. When
week_of_monthexceeds 4, it properly falls back to Week 5's entry if present, or Week 2's pattern otherwise, matching the PR objectives. The change to raise exceptions instead of returning safe fallbacks also aligns with the stated goals.
1267-1275: Verify green bin collection logic for 5-week months.The bi-weekly green bin logic correctly identifies Week 2 and Week 4 collections (line 1271). However, consider 5-week months where Week 5 follows Week 2's pattern:
- If
green_bin_info["week"] = 2, the condition checks for weeks 2 and 4.- In a 5-week month, Week 5 follows Week 2's pattern for regular bins (Silver), but this logic won't match Week 5 for green bins.
Should green bins also be collected in Week 5 when it follows the Week 2 pattern? If so, the condition needs adjustment:
# Check if this matches the green bin collection pattern # (bi-weekly: base week and base week + 2, or week 5 if it follows week 2) is_green_week = (week_of_month == green_bin_info["week"] or week_of_month == green_bin_info["week"] + 2 or (week_of_month == 5 and green_bin_info["week"] == 2)) if is_green_week: bin_data.append({ "type": "Green bin (Garden waste)", "collectionDate": date })Please verify whether the council collects green bins in Week 5 of 5-week months.
uk_bin_collection/uk_bin_collection/councils/SouthKestevenDistrictCouncil.py
Show resolved
Hide resolved
- Remove fallback logic for collection_day and green_bin_info lookups - Raise ValueError when postcode lookup fails instead of using defaults - Update tests to expect exceptions instead of fallback behavior - Aligns with PR objective of removing safe fallbacks that mask errors - All 32 tests passing Fixes robbrad#1692
…Calendar - Add uses_ocr flag to SouthKestevenDistrictCouncil in input.json - Simplify OCR image discovery to check self.ocr_image_dir, UKBC_OCR_IMAGE_DIR env var, and CWD - Remove test/fixture references from main council module for clean separation - Add pytest.mark.skipif to South Kesteven tests when OCR deps unavailable - Update parse_calendar_images to prefer local images over downloads - Fix fallback logic to only apply to specific postcode patterns - All 32 South Kesteven tests passing Resolves robbrad#1668
f5b832a to
d3de4b2
Compare
|
@robbrad RE #1668 I've moved the specific tests for South Kesteven into the top level test directory and added an "OCR" type. Obviously becuase this code has to do much more heavy lifting seeing as the council can't seem to do it their side, there's a lot of tests specific to this implementation, is that OK with you? An alternate approach might be to host logic as an API that this can use, rather than add more dependency issues to HA. WDYT? |
Thanks @janstenpickle Are these unit tests or integration tests? If unit - then standard tests should be added to the unit tests - not specific to the council (I dont see a live pull of info or a test pdf/image - so im guessing these are unit) If integration then there should be new helpers added to the existing integration tests framework and a new type added to the feature file - eg non-selenium / selenium / ocr |
Summary by CodeRabbit
New Features
Tests