Description
The feed fetcher should run as a separate background process and refresh each feed at a set interval. We should also support PubSubHubbub for push updates, but that can come later. When updates are found, they should be stored in MongoDB.
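As a rough sketch of the polling side, one scheduler pass could look like the following. The feed record shape (`url`, `intervalMs`, `lastFetchedAt`) and the `fetchFeed` callback are assumptions made for illustration; nothing here is decided yet.

```javascript
// Hypothetical feed record: { url, intervalMs, lastFetchedAt }.
// Timestamps are milliseconds since the epoch; null means "never fetched".

// Return the feeds whose refresh interval has elapsed at time `now`.
function dueFeeds(feeds, now) {
  return feeds.filter(
    (feed) => feed.lastFetchedAt === null || now - feed.lastFetchedAt >= feed.intervalMs
  );
}

// One scheduler pass: fetch every due feed and record the attempt time.
// `fetchFeed` is a caller-supplied async function (hypothetical). Using
// Promise.allSettled means one failing feed does not stop the others.
async function refreshDue(feeds, fetchFeed, now = Date.now()) {
  const due = dueFeeds(feeds, now);
  const results = await Promise.allSettled(due.map((feed) => fetchFeed(feed)));
  due.forEach((feed) => {
    feed.lastFetchedAt = now;
  });
  return results;
}
```

The background process could call `refreshDue` from a `setInterval` timer; per-feed intervals then only determine which feeds are picked up on each pass.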
Some interesting modules to look at:
- https://github.com/danmactough/node-feedparser
- https://github.com/fent/node-feedsub
- https://github.com/andris9/pubsubhubbub
- https://github.com/technoweenie/nubnub
- https://github.com/superfeedr/superfeedr-node
Note that we need to be able to support a wide variety of feed types, so the more robust the modules we use, the better.
TODO
- Support for handling item updates
- Handle feed items without proper guids
- Thoroughly test against lots of real-world feeds (some of them collected here) and submit patches to node-feedparser if necessary. We also need to handle most of the odd situations other developers have run into in the real world, many of which are listed here.
- PubSubHubbub and RSS Cloud support, prioritized in that order. Those feeds will not be polled (or at least not as frequently?) but will instead rely on push for updates.
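For the first two TODO items, one possible approach is to derive a stable key for each item (falling back to the link, then to a hash of title and date when there is no guid) and to keep a content fingerprint so that edits to an already-stored item can be detected. This is only a sketch: the item field names (`guid`, `link`, `title`, `date`, `description`) are assumed, and a real parser's output may differ.

```javascript
const { createHash } = require('node:crypto');

function sha1(text) {
  return createHash('sha1').update(text, 'utf8').digest('hex');
}

// Stable identity for an item: prefer the guid, then the link, and as a
// last resort a hash of title + date. Scoped by feed URL so that identical
// ids in different feeds do not collide.
function itemKey(feedUrl, item) {
  const id = item.guid || item.link || sha1(`${item.title || ''}\n${item.date || ''}`);
  return sha1(`${feedUrl}\n${id}`);
}

// Fingerprint of the fields whose change should count as an "update".
function itemFingerprint(item) {
  return sha1(JSON.stringify([item.title || '', item.link || '', item.description || '']));
}

// Classify parsed items against what has been stored so far.
// `seen` is a Map of itemKey -> fingerprint (a stand-in for the database).
function classify(feedUrl, items, seen) {
  const added = [];
  const updated = [];
  for (const item of items) {
    const key = itemKey(feedUrl, item);
    const fingerprint = itemFingerprint(item);
    if (!seen.has(key)) added.push(item);
    else if (seen.get(key) !== fingerprint) updated.push(item);
    seen.set(key, fingerprint);
  }
  return { added, updated };
}
```

One known weakness of this fallback: an item with no guid whose link changes will be treated as a new item rather than an update, so testing against real-world feeds would need to show whether that is acceptable.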
Ideally, the feed fetcher/parser itself would be a reusable module that lets you subscribe to a feed and then emits events when new items are found. We'd then have an application-specific fetcher that uses that module to receive posts and store them in the database. The API (i.e. not the fetcher) will also need access to the module that loads the actual feeds, so that it can add a feed when someone subscribes to one we're not already crawling. Separating those pieces is therefore important.
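To make the proposed split concrete, here is a minimal sketch of what such a reusable module might look like, built on Node's `EventEmitter`. The class name `FeedWatcher`, the injected `loadFeed` function, and the assumption that every item carries an `id` are all hypothetical; the module deliberately knows nothing about MongoDB or the API.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical reusable module. `loadFeed(url)` is injected by the caller
// and must resolve to an array of items that each have an `id`.
class FeedWatcher extends EventEmitter {
  constructor(loadFeed) {
    super();
    this.loadFeed = loadFeed;
    this.seen = new Map(); // feed url -> Set of item ids already emitted
  }

  subscribe(url) {
    if (!this.seen.has(url)) this.seen.set(url, new Set());
  }

  // Emit 'item' for each entry not seen before; returns how many were new.
  // Kept separate from `check` so pushed content can be fed in directly.
  ingest(url, items) {
    const known = this.seen.get(url);
    if (!known) throw new Error(`not subscribed: ${url}`);
    let fresh = 0;
    for (const item of items) {
      if (known.has(item.id)) continue;
      known.add(item.id);
      this.emit('item', url, item);
      fresh += 1;
    }
    return fresh;
  }

  // Polling path: load the feed, then run it through `ingest`.
  async check(url) {
    return this.ingest(url, await this.loadFeed(url));
  }
}
```

The application-specific fetcher would then register something like `watcher.on('item', (url, item) => { /* store in the database */ })`, while the API could call `subscribe` and `check` itself when a user adds a feed that is not yet being crawled.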