Description
The documentation on how to scrape datasets shows that collect-mail accepts either --url or --file when scraping IETF mailing lists, but only --file when scraping W3C, 3GPP, IEEE, and other mailing lists.
From my (admittedly limited) poking around in the code, it seems like mailman.collect_archive_from_url could be rewritten fairly simply, using the code already in the documentation (linked above), so that the --url option works for all of the different mailing list types. I imagine this would be useful for people who come to this package without necessarily wanting to download hundreds of mailing lists in one go.
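Roughly the kind of dispatch I have in mind, as a minimal sketch: choose a scraper from the URL's host instead of assuming every URL is an IETF archive. Everything here is hypothetical. collect_from_url and the scrape_* functions are placeholders (they just return tagged strings), not bigbang's actual API, and the host names are my assumptions about where each archive lives.

```python
from typing import Callable, Dict
from urllib.parse import urlparse


# Hypothetical per-source scrapers: stand-ins for whatever bigbang already
# uses for each archive type. They are NOT bigbang's real functions.
def scrape_ietf(url: str) -> str:
    return f"ietf:{url}"


def scrape_w3c(url: str) -> str:
    return f"w3c:{url}"


def scrape_listserv(url: str) -> str:
    return f"listserv:{url}"


# Assumed mapping from archive host to scraper (host names are illustrative).
SCRAPERS: Dict[str, Callable[[str], str]] = {
    "mailarchive.ietf.org": scrape_ietf,
    "lists.w3.org": scrape_w3c,
    "list.etsi.org": scrape_listserv,      # assumed 3GPP LISTSERV host
    "listserv.ieee.org": scrape_listserv,  # assumed IEEE LISTSERV host
}


def collect_from_url(url: str) -> str:
    """Hypothetical: pick a scraper based on the URL's host and run it."""
    # .hostname is lowercased and has any port stripped; None if absent.
    host = urlparse(url).hostname or ""
    scraper = SCRAPERS.get(host)
    if scraper is None:
        raise ValueError(f"no scraper registered for host {host!r}")
    return scraper(url)
```

Adding a new mailing list type would then just mean registering one more host in the table, and --url would no longer be IETF-only.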
(Please forgive me if there is an existing issue about this or if I've wildly misunderstood the code in mailman.py; I'm just getting acquainted with the package! 😅)