chore: added crawler #1

johnburbridge · 2025-03-17T19:40:27Z

No description provided.

…mory and persistent cache
- Modified clear_expired() to track unique URLs using a set
- Changed SQL query to fetch URLs instead of just count
- Updated test to verify cache state without relying on has() method
- Ensures consistent behavior across Python versions
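The clear_expired() change described above can be illustrated with a minimal sketch. It is not the code from this pull request: the class name `SketchCache`, the table layout, the `set()` helper, and the `now` parameter are all hypothetical, and it assumes the cache keeps one layer in a dict and one in SQLite. The point it shows is that selecting the expired URLs (rather than a count) lets both layers feed one set, so a URL present in both is counted once.

```python
import sqlite3
import time


class SketchCache:
    """Hypothetical two-layer cache: a dict in memory plus a SQLite table."""

    def __init__(self, db_path=":memory:"):
        self.memory = {}  # url -> (value, expires_at)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(url TEXT PRIMARY KEY, value TEXT, expires_at REAL)"
        )

    def set(self, url, value, ttl, now=None):
        """Store a value in both layers with an absolute expiry time."""
        now = time.time() if now is None else now
        expires_at = now + ttl
        self.memory[url] = (value, expires_at)
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (url, value, expires_at),
        )
        self.db.commit()

    def clear_expired(self, now=None):
        """Drop expired entries from both layers; return the unique URL count."""
        now = time.time() if now is None else now
        expired = set()

        # In-memory layer: iterate over a copy so entries can be deleted.
        for url, (_, expires_at) in list(self.memory.items()):
            if expires_at <= now:
                del self.memory[url]
                expired.add(url)

        # Persistent layer: fetch the URLs themselves, not COUNT(*),
        # so duplicates across the two layers collapse in the set.
        rows = self.db.execute(
            "SELECT url FROM cache WHERE expires_at <= ?", (now,)
        ).fetchall()
        expired.update(url for (url,) in rows)
        self.db.execute("DELETE FROM cache WHERE expires_at <= ?", (now,))
        self.db.commit()

        return len(expired)
```

A test in this style can inspect `cache.memory` and the table rows directly after calling `clear_expired()`, which avoids depending on a `has()` method.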

- Add RobotsParser class for parsing robots.txt files
- Add SitemapParser class for parsing sitemap.xml files
- Update Crawler to respect robots.txt and use sitemaps
- Add command line options for robots.txt and sitemaps
- Add unit tests for both parsers
- Add lxml dependency for XML parsing
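As a rough illustration of what the two parsers need to do, the sketch below checks a URL against robots.txt rules and extracts page URLs from a sitemap. It does not reproduce the RobotsParser or SitemapParser classes from this pull request: the function names are hypothetical, and it uses the standard library (`urllib.robotparser` and `xml.etree.ElementTree`) in place of lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def sitemap_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(robots, "example-bot", "https://example.com/private/x"))

    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "</urlset>"
    )
    print(sitemap_urls(sitemap))
```

A crawler could call a check like `allowed_by_robots()` before queuing each URL and seed its queue from `sitemap_urls()`. How the Crawler in this repository wires the two together is not shown in the source.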

- Add max_subsitemaps parameter to limit number of subsitemaps processed
- Add overall_timeout parameter to control maximum processing time
- Implement concurrent processing of subsitemaps using asyncio
- Update command line options to control sitemap processing
- Update tests to work with enhanced sitemap parser
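One way the limits above could combine with asyncio is sketched below. Only the parameter names `max_subsitemaps` and `overall_timeout` come from the commit message. The function `process_subsitemaps`, the injected `fetch` coroutine, and the default values are assumptions made for illustration, not the pull request's implementation.

```python
import asyncio


async def process_subsitemaps(
    subsitemap_urls, fetch, max_subsitemaps=5, overall_timeout=10.0
):
    """Fetch up to max_subsitemaps concurrently within overall_timeout seconds.

    fetch is a hypothetical coroutine function: url -> list of page URLs.
    Subsitemaps that fail or miss the deadline are skipped.
    """
    selected = subsitemap_urls[:max_subsitemaps]
    if not selected:
        return []

    tasks = [asyncio.create_task(fetch(url)) for url in selected]

    # Wait for all tasks, but stop waiting once the overall deadline passes.
    done, pending = await asyncio.wait(tasks, timeout=overall_timeout)

    # Cancel whatever is still running and let the cancellations settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    # Collect results in input order from tasks that finished cleanly.
    urls = []
    for task in tasks:
        if task in done and not task.cancelled() and task.exception() is None:
            urls.extend(task.result())
    return urls
```

Passing `fetch` in as an argument keeps the sketch independent of any HTTP client and makes it easy to test with a fake coroutine. Using `asyncio.wait` with a timeout keeps the results that finished in time, whereas wrapping everything in a single `asyncio.wait_for` would discard them when the deadline is hit.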

Update repository references from johnburbridge/scraper to spiralhous…

johnburbridge added 7 commits March 17, 2025 12:39

chore: added crawler

fbcc3b7

chore: added missing crawler

633fb54

test: fix timing issue with python 3.11

f83a537

text: added local site for integration testing

0fc117e

johnburbridge merged commit 6e4036a into main Mar 18, 2025
2 checks passed

johnburbridge added a commit that referenced this pull request Mar 21, 2025

Merge pull request #1 from spiralhouse/refactor-docs

4d97a95

Update repository references from johnburbridge/scraper to spiralhous…

2 participants

chore: added crawler #1

chore: added crawler #1

Uh oh!

Conversation

johnburbridge commented Mar 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants