
Conversation

@m26dvd
Contributor

@m26dvd m26dvd commented Dec 10, 2025

BREAKING CHANGES
fix: #1500 - Renfrewshire Council - This scraper has been redeveloped and now requires only the UPRN value of your address. It also no longer requires Selenium

Fixes
fix: #1771 - South Lanarkshire Council
fix: #1777 - Newark and Sherwood District Council
fix: #1760 - Folkestone and Hythe District Council

Summary by CodeRabbit

  • New Features

    • Expanded UPRN-based lookup support for Renfrewshire, Newark and Sherwood, and additional councils.
  • Bug Fixes

    • Improved collection data retrieval for Folkestone and Hythe, Renfrewshire, and South Lanarkshire councils.
  • Documentation

    • Updated usage guides and command-line parameters for multiple councils; some now require UPRN instead of postcode. Added Isle of Anglesey council documentation.


@codecov

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.81%. Comparing base (83cce53) to head (4b293ef).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1772   +/-   ##
=======================================
  Coverage   86.81%   86.81%           
=======================================
  Files           9        9           
  Lines        1138     1138           
=======================================
  Hits          988      988           
  Misses        150      150           


fix: robbrad#1771 -  South Lanarkshire Council
@coderabbitai
Contributor

coderabbitai bot commented Dec 10, 2025

📝 Walkthrough

Walkthrough

This PR addresses issues across four council scrapers by replacing legacy approaches with modern data retrieval methods: removing Selenium web driver configuration from Folkestone and Hythe, introducing UPRN-based API fetching for Newark and Sherwood, replacing Selenium with direct HTTP and ICS parsing for Renfrewshire, adding error guards for South Lanarkshire, and updating documentation for affected councils.

Changes

Cohort / File(s) Summary
Council Scraper: Folkestone and Hythe
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
Replaced HTML section-based parsing with GET requests to two API endpoints; now iterates over service-card articles to extract bin types and collection dates; simplified data extraction logic.
Council Scraper: Renfrewshire
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py
Eliminated Selenium-based automation and Cloudflare workarounds; introduced direct HTTP approach with UPRN-based view URL construction; replaced HTML parsing with ICS file retrieval and icalevents parsing; extracts events within 365 days and maps to bin types.
Council Scraper: Newark and Sherwood
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py
Added UPRN validation via check_uprn(); constructs remote URI with pid parameter; performs HTTP GET on constructed URL instead of parsing provided page content; downstream date extraction logic preserved.
Council Scraper: South Lanarkshire
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
Introduced guard clause to skip rows where schedule_type is "Area"; prevents index out of range error when area names lack spaces; minor date formatting adjustments.
Test Configuration
uk_bin_collection/tests/input.json
Removed web_driver field from Folkestone and Hythe entry; migrated Renfrewshire from address-based fields (house_number, paon, postcode) to UPRN-based (uprn: "123118848"); updated URLs and wiki notes.
Documentation
wiki/Councils.md
Added Isle of Anglesey entry; updated Cumberland, Epsom and Ewell, Merton, Renfrewshire, and North Hertfordshire entries with new URLs and parameter configurations (shifted from postcode/web_driver to UPRN-based parameters where applicable).
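The Newark and Sherwood change summarized above (UPRN validation, then a GET against a constructed URL with a pid parameter) can be sketched roughly as follows. The endpoint URL and the inline validation are illustrative placeholders, not the council's real API or the codebase's check_uprn helper:

```python
# Hypothetical sketch of the UPRN-based fetch pattern described in the
# walkthrough; the real endpoint and parameter handling may differ.
import requests


def fetch_collections(uprn: str, timeout: int = 10) -> str:
    # The real scraper validates the UPRN via check_uprn() from common.py;
    # a simple digit check stands in for it here.
    if not uprn.isdigit():
        raise ValueError(f"Invalid UPRN: {uprn}")
    # Construct the remote URI with the pid parameter, as the summary
    # describes (the base URL below is a placeholder).
    url = f"https://example.newark-sherwooddc.gov.uk/bins?pid={uprn}"
    # Perform the HTTP GET on the constructed URL instead of parsing
    # caller-provided page content.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Downstream date extraction would then run on the returned page content, as before.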

Sequence Diagrams

sequenceDiagram
    participant Client
    participant CouncilSite as Council Website
    participant ICSServer as ICS Server
    participant Parser as ICS Parser

    rect rgb(220, 245, 220)
    Note over Client,Parser: New Renfrewshire Flow
    Client->>CouncilSite: GET view URL<br/>(with UPRN)
    CouncilSite-->>Client: HTML page with ICS link
    Client->>Client: Locate ICS link on page
    Client->>ICSServer: GET ICS file
    ICSServer-->>Client: ICS calendar data
    Client->>Parser: Parse ICS events<br/>(next 365 days)
    Parser-->>Client: Bin type + collection date pairs
    Client->>Client: Sort by collectionDate
    end

    rect rgb(245, 220, 220)
    Note over Client: Old Renfrewshire Flow (Selenium + Cloudflare bypass)
    Client->>CouncilSite: Initialize browser via Selenium
    CouncilSite-->>Client: Cloudflare challenge
    Client->>Client: Cloudflare bypass logic
    Client->>CouncilSite: Dynamic interaction + wait
    CouncilSite-->>Client: Rendered HTML
    Client->>Client: Parse HTML sections
    end
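The new Renfrewshire flow in the diagram can be sketched as below. The page URL, the ICS link selector, and the exact field names are assumptions for illustration; only the overall shape (page fetch with the UPRN, ICS lookup via icalevents, a 365-day window, sort by collectionDate) follows the walkthrough:

```python
from datetime import datetime, timedelta


def sort_bins(bins: list[dict]) -> list[dict]:
    # Final step of the flow: sort entries by collectionDate (dd/mm/yyyy).
    return sorted(
        bins, key=lambda b: datetime.strptime(b["collectionDate"], "%d/%m/%Y")
    )


def get_bins(uprn: str) -> list[dict]:
    # Third-party imports are kept local so sort_bins stays importable alone.
    import requests
    from bs4 import BeautifulSoup
    from icalevents.icalevents import events

    # Step 1: GET the view URL built from the UPRN (placeholder URL, not the
    # council's real endpoint).
    page = requests.get(
        f"https://example.renfrewshire.gov.uk/bins?uprn={uprn}", timeout=10
    )
    page.raise_for_status()

    # Step 2: locate the ICS link on the page (the selector is a guess at
    # the real markup).
    soup = BeautifulSoup(page.text, "html.parser")
    ics_url = soup.find("a", href=lambda h: h and h.endswith(".ics"))["href"]

    # Step 3: parse ICS events within the next 365 days.
    now = datetime.now()
    found = events(ics_url, start=now, end=now + timedelta(days=365))

    # Step 4: map events to bin type + collection date pairs, then sort.
    return sort_bins(
        [
            {"type": ev.summary, "collectionDate": ev.start.strftime("%d/%m/%Y")}
            for ev in found
        ]
    )
```

Compared with the old flow, there is no browser, no Cloudflare bypass, and no rendered-HTML parsing: two plain HTTP GETs and a calendar parse.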

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐰 Four councils fixed with nimble code,
No more Selenium's heavy load!
ICS files hop and skip with glee,
UPRNs fetch with API spree,
Bin schedules hopping ever free! 🗑️✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 25.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'fix: Council Fix Pack - January 2026' is vague and generic; 'Fix Pack' doesn't convey specific information about the changeset. Consider a more specific title that highlights the primary change, such as 'fix: Update council scrapers for Renfrewshire, South Lanarkshire, Newark and Sherwood, and Folkestone and Hythe'.
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): The code changes directly address all four linked issues: Renfrewshire scraper reworked to remove Selenium (#1500), South Lanarkshire parsing logic fixed (#1771), Newark and Sherwood scraper updated (#1777), and Folkestone and Hythe scraper rewritten (#1760).
  • Out of Scope Changes check (✅ Passed): All changes are scoped to fixing the four mentioned councils and updating related documentation; no out-of-scope modifications are present.


m26dvd and others added 3 commits December 15, 2025 12:01
@m26dvd m26dvd marked this pull request as ready for review December 26, 2025 23:52
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
wiki/Councils.md (1)

85-86: Remove duplicate Cumberland entry from contents list and clarify the two council sections.

The contents list at lines 85–86 contains a duplicate - [Cumberland](#cumberland) entry. Additionally, the file has two separate sections both titled "### Cumberland" (one for CumberlandAllerdaleCouncil using postcode/house number parameters, another for CumberlandCouncil using UPRN). Since Allerdale merged into Cumberland, clarify which entry users should use—either remove the legacy CumberlandAllerdaleCouncil entry, add a deprecation note, or use distinct section headings to differentiate them.

🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)

39-55: Consider error handling for ICS parsing.

The events() function from icalevents could fail if the ICS file is malformed or the URL is unreachable. While the current implementation will naturally propagate exceptions, consider whether more specific error messages would improve debugging.

Potential improvement
-        # Parse ICS calendar
-        upcoming_events = events(ics_url, start=now, end=future)
+        # Parse ICS calendar
+        try:
+            upcoming_events = events(ics_url, start=now, end=future)
+        except Exception as e:
+            raise ValueError(
+                f"Failed to parse ICS calendar for UPRN {user_uprn}: {str(e)}"
+            ) from e

This would provide clearer error messages if ICS parsing fails, though the existing behavior of propagating the original exception is also acceptable.


57-59: Use date_format constant for consistency.

The hardcoded date format string "%d/%m/%Y" should ideally reference the date_format constant to ensure consistency across the codebase.

🔎 Suggested improvement
         bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
         )

This ensures that if date_format is ever changed, the sorting logic remains consistent.

uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py (1)

4-4: Static analysis flagged star imports.

Static analysis tools flag datetime and date_format as potentially undefined because they're imported via star import from common. While this is a codebase-wide pattern, explicit imports would improve clarity and help static analysis tools.

Optional refactor to explicit imports
+from datetime import datetime
 from uk_bin_collection.uk_bin_collection.common import *

Note: This assumes date_format is intentionally provided by the common module's star export.

Also applies to: 38-38, 42-42

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83cce53 and f14840a.

📒 Files selected for processing (6)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
  • uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py
  • uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py
  • uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
  • wiki/Councils.md
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py

20-20: check_uprn may be undefined, or defined from star imports

(F405)


25-25: requests may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py

22-22: check_uprn may be undefined, or defined from star imports

(F405)


28-28: Probable use of requests call without timeout

(S113)


35-37: Avoid specifying long messages outside the exception class

(TRY003)


53-53: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py

96-96: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py

38-38: datetime may be undefined, or defined from star imports

(F405)


42-42: date_format may be undefined, or defined from star imports

(F405)

🔇 Additional comments (11)
wiki/Councils.md (6)

160-160: ✓ Isle of Anglesey entry is well integrated.

The new council entry is properly alphabetized in the contents list and follows the documentation format. The note clarifying that either UPRN or postcode/house number can be passed, and that the URL is not used, is helpful.

Also applies to: 2148-2158


1646-1654: ✓ Epsom and Ewell parameters and Selenium requirement properly documented.

The updated command, parameter list, and explicit note about Selenium requirement are clear and consistent with other Selenium-dependent councils in the documentation.


3141-3147: ✓ Renfrewshire scraper modernization properly documented.

The updated URL and parameter set (UPRN-only, no Selenium) align with the stated PR objective. The removal of -p and -w parameters and the updated note clearly communicate the breaking change. Users transitioning from the old approach will need to adjust their setup.


1753-1762: ✓ Folkestone and Hythe entry is properly documented.

The UPRN-based approach without Selenium is clearly presented and consistent with other modernized council entries.


2526-2532: ✓ Merton and North Hertfordshire entries updated consistently.

Both councils are now documented with the UPRN-based approach (-s -u parameters) and updated URLs. The changes align with the PR's modernization strategy and are properly documented.

Also applies to: 2809-2815


1-2: Verify that input.json has been updated to match these documentation changes.

This file is marked as auto-generated from uk_bin_collection/tests/input.json. The marked changes should be reflected in input.json to ensure they persist after regeneration. If these changes were made directly to Councils.md without updating the source file, they will be lost during the next regeneration.

Confirm that all council entry updates (Isle of Anglesey, Cumberland, Epsom and Ewell, Merton, North Hertfordshire, Renfrewshire) have corresponding updates in input.json.

uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (2)

74-75: LGTM! Guard prevents index error on "Area" rows.

This addition directly addresses issue #1771 by skipping rows where schedule_type is "Area", which likely lack the expected table cell structure that would cause the "list index out of range" error at line 77 when accessing .split(" ")[1].
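A minimal sketch of that guard, with hypothetical row and field names (the real scraper operates on parsed HTML table rows, not dicts):

```python
# Sketch of the guard described above: skip "Area" rows before splitting
# the cell text, so area names without spaces cannot raise IndexError.
def parse_rows(rows: list[dict]) -> list[str]:
    results = []
    for row in rows:
        if row.get("schedule_type") == "Area":
            # Guard clause: these rows lack the "<day> <date>" cell the
            # split below expects, so skip them entirely.
            continue
        # Without the guard, a single-word area name like "Hamilton"
        # would make [1] raise "list index out of range".
        results.append(row["cell"].split(" ")[1])
    return results
```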


95-97: LGTM! Improved readability.

The reformatting of the date formatting statement improves code readability with no functional changes.

uk_bin_collection/tests/input.json (1)

1973-1980: LGTM! Test data updated for uprn-based flow.

The test configuration correctly reflects the migration from Selenium-based scraping to the uprn-based ICS retrieval approach:

  • Replaced house_number, paon, and postcode with uprn
  • Removed web_driver requirement
  • Updated wiki_note to reference FindMyAddress for UPRN lookup
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)

1-5: LGTM! Imports updated for ICS-based approach.

The imports correctly reflect the migration from Selenium to a direct HTTP + ICS parsing approach:

  • requests for HTTP calls
  • icalevents for ICS calendar parsing
  • datetime utilities for date handling

21-23: LGTM! UPRN parameter handling is correct.

The code properly extracts and validates the UPRN parameter using the check_uprn helper function from the common module.

Comment on lines +26 to +28
session = requests.session()
response = session.get(URI1)
response = session.get(URI2)
Contributor

@coderabbitai coderabbitai bot Dec 26, 2025


⚠️ Potential issue | 🟠 Major

Add HTTP error handling and clarify the purpose of URI1.

The response from the first GET request to URI1 is immediately overwritten by the second request to URI2. If URI1 is required to establish session state (e.g., cookies), this should be documented. If it's unnecessary, remove it to reduce latency.

Additionally, both GET requests lack error handling—network failures or non-2xx responses will raise unhandled exceptions and break the integration.

🔎 Proposed fix with error handling
 # Make the GET request
 session = requests.session()
-response = session.get(URI1)
-response = session.get(URI2)
+session.get(URI1)  # Initialize session state if needed
+response = session.get(URI2)
+response.raise_for_status()  # Raise HTTPError for bad status codes
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-session = requests.session()
-response = session.get(URI1)
-response = session.get(URI2)
+session = requests.session()
+session.get(URI1)  # Initialize session state if needed
+response = session.get(URI2)
+response.raise_for_status()  # Raise HTTPError for bad status codes
🤖 Prompt for AI Agents
In
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
around lines 26 to 28, the first session.get(URI1) call is immediately
overwritten by session.get(URI2) (losing the first response) and neither request
has any HTTP/network error handling; decide whether URI1 is required for session
state (if so, add a comment explaining its purpose and preserve its response or
at least its effect on session cookies) or remove the unnecessary call to reduce
latency, and add robust error handling for both requests by wrapping each
session.get in a try/except for requests.exceptions.RequestException, using a
timeout, and calling response.raise_for_status() (or handling non-2xx responses)
so network errors and HTTP errors are caught and handled gracefully instead of
crashing.

Collaborator


I'm guessing the first response is discarded because that's what the website requires to function?

Contributor


@dp247 Yes, you're correct! The first request to URI1 is setting up the session state (likely cookies or other session data) that the second request to URI2 requires to function properly. The response from the first request isn't used—only its side effect on the session object matters.

This is a common pattern when dealing with web applications where an API endpoint requires prior session initialization from visiting the main page first.

However, I'd still recommend:

  1. Adding a comment to document this behavior for future maintainers:
# Make the GET request
session = requests.session()
session.get(URI1)  # Initialize session state (cookies) required by URI2
response = session.get(URI2)
  1. Adding error handling to make the integration more robust against network failures or HTTP errors:
session = requests.session()
session.get(URI1)  # Initialize session state (cookies) required by URI2
response = session.get(URI2)
response.raise_for_status()  # Raise HTTPError for bad status codes

The raise_for_status() call will help catch issues early if the API returns error responses, making debugging easier if the integration breaks again in the future.

Comment on lines +33 to +44
for collection in collections:
bin_type = collection.find("h3")
bin_type = bin_type.text
next_collection = collection.find("p", {"class": "service-next"})
date_str = next_collection.text.split(":")[1].split("(")[0].strip()
dt = datetime.strptime(date_str, "%A, %d %B %Y")

dict_data = {
"type": bin_type.strip(),
"collectionDate": dt.strftime(date_format),
}
bindata["bins"].append(dict_data)
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add defensive error handling for HTML parsing and date extraction.

The extraction loop assumes the HTML structure is always present and correctly formatted. If the structure changes or elements are missing, unhandled exceptions will occur:

  • Lines 34-35: collection.find("h3") can return None, causing AttributeError when accessing .text
  • Lines 36-37: collection.find("p", ...) can return None, and the string split operations assume specific format
  • Line 38: datetime.strptime() can raise ValueError if the date format differs

Since this fix addresses issue #1760 where the integration became unavailable, robust error handling is essential to prevent similar failures.

🔎 Proposed fix with defensive checks
 collections = soup.find_all("article", {"class": "service-card"})
 for collection in collections:
-    bin_type = collection.find("h3")
-    bin_type = bin_type.text
-    next_collection = collection.find("p", {"class": "service-next"})
-    date_str = next_collection.text.split(":")[1].split("(")[0].strip()
-    dt = datetime.strptime(date_str, "%A, %d %B %Y")
-
-    dict_data = {
-        "type": bin_type.strip(),
-        "collectionDate": dt.strftime(date_format),
-    }
-    bindata["bins"].append(dict_data)
+    bin_type_elem = collection.find("h3")
+    if not bin_type_elem:
+        continue
+    
+    next_collection = collection.find("p", {"class": "service-next"})
+    if not next_collection or ":" not in next_collection.text:
+        continue
+    
+    try:
+        date_str = next_collection.text.split(":")[1].split("(")[0].strip()
+        dt = datetime.strptime(date_str, "%A, %d %B %Y")
+    except (ValueError, IndexError):
+        continue
+
+    dict_data = {
+        "type": bin_type_elem.text.strip(),
+        "collectionDate": dt.strftime(date_format),
+    }
+    bindata["bins"].append(dict_data)
🧰 Tools
🪛 Ruff (0.14.10)

38-38: datetime may be undefined, or defined from star imports

(F405)


42-42: date_format may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
around lines 33 to 44, the loop assumes HTML nodes and date format always exist;
add defensive parsing: null-check the result of collection.find("h3") and
collection.find("p", {"class":"service-next"}) before accessing .text, validate
the split result has the expected parts (guard against missing ":" or "("),
parse the date inside a try/except catching ValueError, and if any step fails
log a warning/debug with context (collection index or raw HTML) and skip
appending that bin entry instead of letting an exception bubble up; only build
and append dict_data when type text and parsed date are confirmed valid.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't hurt to add in the suggested error handling

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dp247, here's the complete code with defensive error handling to prevent failures when the HTML structure changes:

collections = soup.find_all("article", {"class": "service-card"})
for collection in collections:
    bin_type_elem = collection.find("h3")
    if not bin_type_elem:
        continue
    
    next_collection = collection.find("p", {"class": "service-next"})
    if not next_collection or ":" not in next_collection.text:
        continue
    
    try:
        date_str = next_collection.text.split(":")[1].split("(")[0].strip()
        dt = datetime.strptime(date_str, "%A, %d %B %Y")
    except (ValueError, IndexError):
        continue

    dict_data = {
        "type": bin_type_elem.text.strip(),
        "collectionDate": dt.strftime(date_format),
    }
    bindata["bins"].append(dict_data)

This approach:

  • Checks if h3 and p elements exist before accessing their text
  • Validates the expected string format (presence of :)
  • Wraps date parsing in try/except to handle format mismatches
  • Silently skips malformed entries rather than crashing the entire collection
