@milahu commented Dec 28, 2025

  • setup: add setup.py, move files to python module
  • config: don't read config from config.py
  • selenium_driverless: make it work on linux by default
  • download images - replaces #26 (Add option to download images)
  • download comments - fixes #3 ([Feature request] Comments?)
    • handle removed comments
    • clickable URLs in comments
  • add option to write the original "preloads" data to json files
    • rework or remove JSON_DATA_DIR
  • output filepath format: let me set the format of all file paths with string.Template like path/to/${blog_handle}/${post_slug}.md
  • post datetime format: should be ISO date format by default, with option to configure date format
  • add a page footer: Version ${iso_datetime_of_last_change_on_this_page} archived from <a href="${original_post_url}">${original_blog_domain}</a> (this should not use the scrape time to get reproducible output)
  • repost count ("restack count"). related: like_count_element = soup.select_one("a.post-ufi-button .label")
  • make optional: HTML output
  • rework HTML_TEMPLATE and ASSETS_DIR for HTML output
  • faster input of email + password
  • preserve original post html - related: content_element = soup.select_one("div.available-content")
  • remove post footer: ${blog_name} is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. Subscribed
  • fix off-by-one error in for url in tqdm(self.post_urls, total=total):
  • remove NUM_POSTS_TO_SCRAPE
  • convert like_count from str to int
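
A minimal sketch of how the output filepath format and post date format items above could fit together, using only the standard library. The names PATH_FORMAT, render_path and format_post_date are hypothetical and do not come from the existing code, and "ISO date format" is read here as the date-only form YYYY-MM-DD, which is an assumption.

```python
from datetime import datetime
from string import Template
from typing import Optional

# Hypothetical default, copied from the example path in the list above.
PATH_FORMAT = "path/to/${blog_handle}/${post_slug}.md"


def render_path(path_format: str, **fields: str) -> str:
    """Fill a string.Template path format.

    Template.substitute raises KeyError if the format uses a field
    that was not passed in, so typos in the format fail loudly.
    """
    return Template(path_format).substitute(**fields)


def format_post_date(dt: datetime, date_format: Optional[str] = None) -> str:
    """Return the ISO date by default, or a strftime-formatted date.

    Assumption: the default is the date-only ISO form (YYYY-MM-DD).
    """
    if date_format is None:
        return dt.date().isoformat()
    return dt.strftime(date_format)


if __name__ == "__main__":
    print(render_path(PATH_FORMAT, blog_handle="example", post_slug="hello-world"))
    print(format_post_date(datetime(2025, 12, 28)))
    print(format_post_date(datetime(2025, 12, 28), "%d.%m.%Y"))
```

Using substitute rather than safe_substitute is a deliberate choice in this sketch: an unknown placeholder stops the run instead of writing files to a path that still contains a literal ${...}.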
