-
Notifications
You must be signed in to change notification settings - Fork 186
fix: Council Fix Pack - January 2026 #1772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
fix: robbrad#1500 - Renfrewshire Council
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1772 +/- ##
=======================================
Coverage 86.81% 86.81%
=======================================
Files 9 9
Lines 1138 1138
=======================================
Hits 988 988
Misses 150 150 ☔ View full report in Codecov by Sentry. |
fix: robbrad#1771 - South Lanarkshire Council
📝 WalkthroughWalkthroughThis PR addresses issues across four council scrapers by replacing legacy approaches with modern data retrieval methods: removing Selenium web driver configuration from Folkestone and Hythe, introducing UPRN-based API fetching for Newark and Sherwood, replacing Selenium with direct HTTP and ICS parsing for Renfrewshire, adding error guards for South Lanarkshire, and updating documentation for affected councils. Changes
Sequence DiagramssequenceDiagram
participant Client
participant CouncilSite as Council Website
participant ICSServer as ICS Server
participant Parser as ICS Parser
rect rgb(220, 245, 220)
Note over Client,Parser: New Renfrewshire Flow
Client->>CouncilSite: GET view URL<br/>(with UPRN)
CouncilSite-->>Client: HTML page with ICS link
Client->>Client: Locate ICS link on page
Client->>ICSServer: GET ICS file
ICSServer-->>Client: ICS calendar data
Client->>Parser: Parse ICS events<br/>(next 365 days)
Parser-->>Client: Bin type + collection date pairs
Client->>Client: Sort by collectionDate
end
rect rgb(245, 220, 220)
Note over Client: Old Renfrewshire Flow (Selenium + Cloudflare bypass)
Client->>CouncilSite: Initialize browser via Selenium
CouncilSite-->>Client: Cloudflare challenge
Client->>Client: Cloudflare bypass logic
Client->>CouncilSite: Dynamic interaction + wait
CouncilSite-->>Client: Rendered HTML
Client->>Client: Parse HTML sections
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
fix: robbrad#1777 - fix: Newark and Sherwood District Council
fix: robbrad#1760 - Folkstone and Hythe District Council
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
wiki/Councils.md (1)
85-86: Remove duplicate Cumberland entry from contents list and clarify the two council sections.The contents list at lines 85–86 contains a duplicate
- [Cumberland](#cumberland)entry. Additionally, the file has two separate sections both titled "### Cumberland" (one for CumberlandAllerdaleCouncil using postcode/house number parameters, another for CumberlandCouncil using UPRN). Since Allerdale merged into Cumberland, clarify which entry users should use—either remove the legacy CumberlandAllerdaleCouncil entry, add a deprecation note, or use distinct section headings to differentiate them.
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)
39-55: Consider error handling for ICS parsing.The
events()function fromicaleventscould fail if the ICS file is malformed or the URL is unreachable. While the current implementation will naturally propagate exceptions, consider whether more specific error messages would improve debugging.Potential improvement
- # Parse ICS calendar - upcoming_events = events(ics_url, start=now, end=future) + # Parse ICS calendar + try: + upcoming_events = events(ics_url, start=now, end=future) + except Exception as e: + raise ValueError( + f"Failed to parse ICS calendar for UPRN {user_uprn}: {str(e)}" + ) from eThis would provide clearer error messages if ICS parsing fails, though the existing behavior of propagating the original exception is also acceptable.
57-59: Usedate_formatconstant for consistency.The hardcoded date format string
"%d/%m/%Y"should ideally reference thedate_formatconstant to ensure consistency across the codebase.🔎 Suggested improvement
bindata["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") + key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) )This ensures that if
date_formatis ever changed, the sorting logic remains consistent.uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py (1)
4-4: Static analysis flagged star imports.Static analysis tools flag
datetimeanddate_formatas potentially undefined because they're imported via star import fromcommon. While this is a codebase-wide pattern, explicit imports would improve clarity and help static analysis tools.Optional refactor to explicit imports
+from datetime import datetime from uk_bin_collection.uk_bin_collection.common import *Note: This assumes
date_formatis intentionally provided by thecommonmodule's star export.Also applies to: 38-38, 42-42
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.pyuk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.pyuk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.pyuk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.pywiki/Councils.md
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py
20-20: check_uprn may be undefined, or defined from star imports
(F405)
25-25: requests may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py
22-22: check_uprn may be undefined, or defined from star imports
(F405)
28-28: Probable use of requests call without timeout
(S113)
35-37: Avoid specifying long messages outside the exception class
(TRY003)
53-53: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
96-96: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
38-38: datetime may be undefined, or defined from star imports
(F405)
42-42: date_format may be undefined, or defined from star imports
(F405)
🔇 Additional comments (11)
wiki/Councils.md (6)
160-160: ✓ Isle of Anglesey entry is well integrated.The new council entry is properly alphabetized in the contents list and follows the documentation format. The note clarifying that either UPRN or postcode/house number can be passed, and that the URL is not used, is helpful.
Also applies to: 2148-2158
1646-1654: ✓ Epsom and Ewell parameters and Selenium requirement properly documented.The updated command, parameter list, and explicit note about Selenium requirement are clear and consistent with other Selenium-dependent councils in the documentation.
3141-3147: ✓ Renfrewshire scraper modernization properly documented.The updated URL and parameter set (UPRN-only, no Selenium) align with the stated PR objective. The removal of
-pand-wparameters and the updated note clearly communicate the breaking change. Users transitioning from the old approach will need to adjust their setup.
1753-1762: ✓ Folkestone and Hythe entry is properly documented.The UPRN-based approach without Selenium is clearly presented and consistent with other modernized council entries.
2526-2532: ✓ Merton and North Hertfordshire entries updated consistently.Both councils are now documented with the UPRN-based approach (
-s -uparameters) and updated URLs. The changes align with the PR's modernization strategy and are properly documented.Also applies to: 2809-2815
1-2: Verify that input.json has been updated to match these documentation changes.This file is marked as auto-generated from
uk_bin_collection/tests/input.json. The marked changes should be reflected in input.json to ensure they persist after regeneration. If these changes were made directly to Councils.md without updating the source file, they will be lost during the next regeneration.Confirm that all council entry updates (Isle of Anglesey, Cumberland, Epsom and Ewell, Merton, North Hertfordshire, Renfrewshire) have corresponding updates in input.json.
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (2)
74-75: LGTM! Guard prevents index error on "Area" rows.This addition directly addresses issue #1771 by skipping rows where
schedule_typeis "Area", which likely lack the expected table cell structure that would cause the "list index out of range" error at line 77 when accessing.split(" ")[1].
95-97: LGTM! Improved readability.The reformatting of the date formatting statement improves code readability with no functional changes.
uk_bin_collection/tests/input.json (1)
1973-1980: LGTM! Test data updated for uprn-based flow.The test configuration correctly reflects the migration from Selenium-based scraping to the uprn-based ICS retrieval approach:
- Replaced
house_number,paon, andpostcodewithuprn- Removed
web_driverrequirement- Updated
wiki_noteto reference FindMyAddress for UPRN lookupuk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)
1-5: LGTM! Imports updated for ICS-based approach.The imports correctly reflect the migration from Selenium to a direct HTTP + ICS parsing approach:
requestsfor HTTP callsicaleventsfor ICS calendar parsingdatetimeutilities for date handling
21-23: LGTM! UPRN parameter handling is correct.The code properly extracts and validates the UPRN parameter using the
check_uprnhelper function from the common module.
| session = requests.session() | ||
| response = session.get(URI1) | ||
| response = session.get(URI2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add HTTP error handling and clarify the purpose of URI1.
The response from the first GET request to URI1 is immediately overwritten by the second request to URI2. If URI1 is required to establish session state (e.g., cookies), this should be documented. If it's unnecessary, remove it to reduce latency.
Additionally, both GET requests lack error handling—network failures or non-2xx responses will raise unhandled exceptions and break the integration.
🔎 Proposed fix with error handling
# Make the GET request
session = requests.session()
-response = session.get(URI1)
-response = session.get(URI2)
+session.get(URI1) # Initialize session state if needed
+response = session.get(URI2)
+response.raise_for_status() # Raise HTTPError for bad status codes📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| session = requests.session() | |
| response = session.get(URI1) | |
| response = session.get(URI2) | |
| session = requests.session() | |
| session.get(URI1) # Initialize session state if needed | |
| response = session.get(URI2) | |
| response.raise_for_status() # Raise HTTPError for bad status codes |
🤖 Prompt for AI Agents
In
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
around lines 26 to 28, the first session.get(URI1) call is immediately
overwritten by session.get(URI2) (losing the first response) and neither request
has any HTTP/network error handling; decide whether URI1 is required for session
state (if so, add a comment explaining its purpose and preserve its response or
at least its effect on session cookies) or remove the unnecessary call to reduce
latency, and add robust error handling for both requests by wrapping each
session.get in a try/except for requests.exceptions.RequestException, using a
timeout, and calling response.raise_for_status() (or handling non-2xx responses)
so network errors and HTTP errors are caught and handled gracefully instead of
crashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that response is reset as this is what the website requires to function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dp247 Yes, you're correct! The first request to URI1 is setting up the session state (likely cookies or other session data) that the second request to URI2 requires to function properly. The response from the first request isn't used—only its side effect on the session object matters.
This is a common pattern when dealing with web applications where an API endpoint requires prior session initialization from visiting the main page first.
However, I'd still recommend:
- Adding a comment to document this behavior for future maintainers:
# Make the GET request
session = requests.session()
session.get(URI1) # Initialize session state (cookies) required by URI2
response = session.get(URI2)- Adding error handling to make the integration more robust against network failures or HTTP errors:
session = requests.session()
session.get(URI1) # Initialize session state (cookies) required by URI2
response = session.get(URI2)
response.raise_for_status() # Raise HTTPError for bad status codesThe raise_for_status() call will help catch issues early if the API returns error responses, making debugging easier if the integration breaks again in the future.
| for collection in collections: | ||
| bin_type = collection.find("h3") | ||
| bin_type = bin_type.text | ||
| next_collection = collection.find("p", {"class": "service-next"}) | ||
| date_str = next_collection.text.split(":")[1].split("(")[0].strip() | ||
| dt = datetime.strptime(date_str, "%A, %d %B %Y") | ||
|
|
||
| dict_data = { | ||
| "type": bin_type.strip(), | ||
| "collectionDate": dt.strftime(date_format), | ||
| } | ||
| bindata["bins"].append(dict_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add defensive error handling for HTML parsing and date extraction.
The extraction loop assumes the HTML structure is always present and correctly formatted. If the structure changes or elements are missing, unhandled exceptions will occur:
- Lines 34-35:
collection.find("h3")can returnNone, causingAttributeErrorwhen accessing.text - Lines 36-37:
collection.find("p", ...)can returnNone, and the string split operations assume specific format - Line 38:
datetime.strptime()can raiseValueErrorif the date format differs
Since this fix addresses issue #1760 where the integration became unavailable, robust error handling is essential to prevent similar failures.
🔎 Proposed fix with defensive checks
collections = soup.find_all("article", {"class": "service-card"})
for collection in collections:
- bin_type = collection.find("h3")
- bin_type = bin_type.text
- next_collection = collection.find("p", {"class": "service-next"})
- date_str = next_collection.text.split(":")[1].split("(")[0].strip()
- dt = datetime.strptime(date_str, "%A, %d %B %Y")
-
- dict_data = {
- "type": bin_type.strip(),
- "collectionDate": dt.strftime(date_format),
- }
- bindata["bins"].append(dict_data)
+ bin_type_elem = collection.find("h3")
+ if not bin_type_elem:
+ continue
+
+ next_collection = collection.find("p", {"class": "service-next"})
+ if not next_collection or ":" not in next_collection.text:
+ continue
+
+ try:
+ date_str = next_collection.text.split(":")[1].split("(")[0].strip()
+ dt = datetime.strptime(date_str, "%A, %d %B %Y")
+ except (ValueError, IndexError):
+ continue
+
+ dict_data = {
+ "type": bin_type_elem.text.strip(),
+ "collectionDate": dt.strftime(date_format),
+ }
+ bindata["bins"].append(dict_data)🧰 Tools
🪛 Ruff (0.14.10)
38-38: datetime may be undefined, or defined from star imports
(F405)
42-42: date_format may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
around lines 33 to 44, the loop assumes HTML nodes and date format always exist;
add defensive parsing: null-check the result of collection.find("h3") and
collection.find("p", {"class":"service-next"}) before accessing .text, validate
the split result has the expected parts (guard against missing ":" or "("),
parse the date inside a try/except catching ValueError, and if any step fails
log a warning/debug with context (collection index or raw HTML) and skip
appending that bin entry instead of letting an exception bubble up; only build
and append dict_data when type text and parsed date are confirmed valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't hurt to add in the suggested error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dp247, here's the complete code with defensive error handling to prevent failures when the HTML structure changes:
collections = soup.find_all("article", {"class": "service-card"})
for collection in collections:
bin_type_elem = collection.find("h3")
if not bin_type_elem:
continue
next_collection = collection.find("p", {"class": "service-next"})
if not next_collection or ":" not in next_collection.text:
continue
try:
date_str = next_collection.text.split(":")[1].split("(")[0].strip()
dt = datetime.strptime(date_str, "%A, %d %B %Y")
except (ValueError, IndexError):
continue
dict_data = {
"type": bin_type_elem.text.strip(),
"collectionDate": dt.strftime(date_format),
}
bindata["bins"].append(dict_data)This approach:
- Checks if
h3andpelements exist before accessing their text - Validates the expected string format (presence of
:) - Wraps date parsing in try/except to handle format mismatches
- Silently skips malformed entries rather than crashing the entire collection
BREAKING CHANGES
fix: #1500 - Renfrewshire Council - This scraper has been re-developed and now uses the UPRN value of your address only. It also no longer requires Selenium
Fixes
fix: #1771 - South Lanarkshire Council
fix: #1777 - Newark and Sherwood District Council
fix: #1760 - Folkstone and Hythe District Council
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.