
Feed fetching #4

@devongovett

The feed fetcher should run as a separate background process and refresh each feed at a configurable interval. We should also support PubSubHubbub for push updates, but that can come later. When updates are found, they should be stored in MongoDB.
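
As a rough sketch of the polling side, assuming each feed record carries its own refresh interval and last-fetch timestamp, the background process could pick out the feeds that are due on each tick. The `FeedRecord` shape and the `dueFeeds` and `tick` helpers are hypothetical names used for illustration only; the actual fetching and the MongoDB write are left to an injected `fetchFeed` function.

```typescript
// Hypothetical shape of a feed record; the field names are assumptions.
interface FeedRecord {
  url: string;
  intervalMs: number;    // how often this feed should be refreshed
  lastFetchedAt: number; // epoch milliseconds; 0 means never fetched
}

// Return the feeds whose refresh interval has elapsed at time `now`.
function dueFeeds(feeds: FeedRecord[], now: number): FeedRecord[] {
  return feeds.filter((f) => now - f.lastFetchedAt >= f.intervalMs);
}

// One scheduler tick: fetch every due feed and record the attempt time.
// `fetchFeed` is injected so the real fetching/parsing stays swappable.
async function tick(
  feeds: FeedRecord[],
  fetchFeed: (url: string) => Promise<void>,
  now: number = Date.now(),
): Promise<string[]> {
  const due = dueFeeds(feeds, now);
  for (const feed of due) {
    try {
      await fetchFeed(feed.url);
    } catch {
      // A failing feed should not stop the others; it is retried
      // once its interval has elapsed again.
    }
    feed.lastFetchedAt = now;
  }
  return due.map((f) => f.url);
}
```

The separate process would call `tick` from a timer loop (for example `setInterval`), with `fetchFeed` doing the parsing and storage.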

Some interesting modules to look at:

  1. https://github.com/danmactough/node-feedparser
  2. https://github.com/fent/node-feedsub
  3. https://github.com/andris9/pubsubhubbub
  4. https://github.com/technoweenie/nubnub
  5. https://github.com/superfeedr/superfeedr-node

Note that we need to be able to support a wide variety of feed types, so the more robust the modules we use, the better.

TODO

  • Support for handling item updates
  • Handle feed items without proper guids
  • Thoroughly test against many real-world feeds, some of them collected here, and submit patches to node-feedparser if necessary. We also need to handle most of the weird situations other developers have run into in the real world, many of which are listed here.
  • PubSubHubbub and RSS Cloud support, prioritized in that order. Feeds that support push will not be polled (or not as frequently?) and will instead get updates via push.
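
The first two TODO items could be approached along these lines. This is a minimal sketch, not what node-feedparser does: it assumes an item's identity falls back from guid to link to a hash of title and date, and that an update is detected by comparing a fingerprint of the title and content. `FeedItem`, `itemKey`, `itemFingerprint`, and `classify` are hypothetical names, and the hash is a simple non-cryptographic FNV-1a variant over UTF-16 code units, chosen only to keep the example dependency-free.

```typescript
// Hypothetical minimal item shape; real parsers expose more fields.
interface FeedItem {
  guid?: string;
  link?: string;
  title?: string;
  pubDate?: string;
  content?: string;
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Stable identity: prefer guid, then link, then a hash of title + date.
function itemKey(item: FeedItem): string {
  const guid = item.guid?.trim();
  if (guid) return `guid:${guid}`;
  const link = item.link?.trim();
  if (link) return `link:${link}`;
  return `hash:${fnv1a(`${item.title ?? ""}\n${item.pubDate ?? ""}`)}`;
}

// Fingerprint of the parts that may change when an item is edited.
function itemFingerprint(item: FeedItem): string {
  return fnv1a(`${item.title ?? ""}\n${item.content ?? ""}`);
}

type Change = "new" | "updated" | "unchanged";

// `seen` maps item key -> last fingerprint. In the real fetcher this
// lookup would presumably go to the database instead of a Map.
function classify(item: FeedItem, seen: Map<string, string>): Change {
  const key = itemKey(item);
  const fingerprint = itemFingerprint(item);
  const previous = seen.get(key);
  seen.set(key, fingerprint);
  if (previous === undefined) return "new";
  return previous === fingerprint ? "unchanged" : "updated";
}
```

One known weakness of this fallback: for an item with neither guid nor link, editing the title changes its key, so the edit shows up as a new item rather than an update.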

Ideally, the feed fetcher/parser itself would be a reusable module that allows subscription to a feed, and then provides events when new items are found. Then we'd have an application specific fetcher that would use that module to receive and store posts in the database. The API (i.e. not the fetcher) will also need to access the module that loads actual feeds in order to add them when someone subscribes to a new feed that we're not already crawling, so separating those things out is important.
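
To make that separation concrete, here is a hedged sketch of what the reusable module's surface might look like. The class and method names (`FeedSubscriptions`, `subscribe`, `onItem`, `ingest`) and the `ParsedItem` shape are assumptions for illustration, and plain callbacks stand in for whatever event mechanism is eventually chosen.

```typescript
// Hypothetical item shape after parsing and identity normalization.
interface ParsedItem {
  id: string; // guid, link, or some fallback identity
  title: string;
}

type ItemListener = (item: ParsedItem, feedUrl: string) => void;

class FeedSubscriptions {
  private listeners: ItemListener[] = [];
  // feed URL -> ids of items already seen for that feed
  private seen = new Map<string, Set<string>>();

  // Start tracking a feed; returns false if it was already tracked.
  subscribe(feedUrl: string): boolean {
    if (this.seen.has(feedUrl)) return false;
    this.seen.set(feedUrl, new Set<string>());
    return true;
  }

  // Register a callback fired once per newly discovered item.
  onItem(listener: ItemListener): void {
    this.listeners.push(listener);
  }

  // Hand in freshly parsed items; only unseen ones are emitted.
  // Returns how many items were new.
  ingest(feedUrl: string, items: ParsedItem[]): number {
    const ids = this.seen.get(feedUrl);
    if (!ids) throw new Error(`not subscribed: ${feedUrl}`);
    let emitted = 0;
    for (const item of items) {
      if (ids.has(item.id)) continue;
      ids.add(item.id);
      for (const listener of this.listeners) listener(item, feedUrl);
      emitted++;
    }
    return emitted;
  }
}
```

Under this split, the application-specific fetcher would register an `onItem` listener that writes posts to the database, while the API would call `subscribe` when a user adds a feed that is not yet being crawled.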
