Design Documentation
Reader stores its data in MongoDB using the Mongoose ODM. It consists of several different models, all found in the src/models folder. Each model is loaded by src/db.js, which also handles connecting to the database and provides several other shared database functions.
The models are:
- `User` - holds data about a user account. Passwords are hashed using bcrypt.
- `Feed` - represents an RSS feed. A single `Feed` instance is shared by all users that subscribe to it.
- `Post` - represents a single post within a `Feed`, also shared by all users.
- `Tag` - represents a user's tag/label/folder that can be applied to both feeds and posts. Also used by the system to store state information, such as read status.
- `Preference` - used as a key-value store to hold preferences related to a feed or post for a user.
Users are marked as subscribing to a feed by adding the `user/-/state/com.google/reading-list` tag to the feed. When a user subscribes to a feed that no other users have subscribed to before, a new `Feed` is created and fetched in a background job by Kue (see below). If all users unsubscribe from a given feed, it and all its posts are removed from the database to save space.
Users can also have their own titles associated with each feed, as they can on Google Reader. These are stored in a lookup table by user id in the Feed record.
The API server hosts the HTTP endpoints that clients connect to. It uses Express as a framework for this, along with several other modules. The code for the API server lives in the src/api folder. Each of the files is actually a separate Express app, but they are merged into a single app in src/api.js, which is the main file that is run in order to start the server. Authentication and user account handling are in src/api/user.js and src/session.js. Session data is stored in Redis. The other endpoints are grouped by type and should be fairly self-explanatory: mostly fetching and updating the database in various ways. If you're not familiar with MongoDB, that's fine; review the Mongoose docs and you'll get the hang of it pretty quickly.
The fetcher is what goes out and crawls feeds periodically, storing new updates in the database. It runs as a separate process from the HTTP API server, and multiple instances of the fetcher can be run simultaneously in order to better distribute load across the CPU. The fetcher synchronizes its work using the Kue job queueing module and Redis.
When a feed is first subscribed to, a job is added to the queue and processed by an available worker process. When the worker is done, it queues another job some interval in the future. This interval is currently 10 minutes, but it should probably vary depending on the "velocity" of the feed's posting activity. It may also change once we have implemented PubSubHubbub and perhaps RSSCloud, which are push systems that would remove the need to poll the feed.
In order to process the job, the fetcher loads the feed using the feedparser module, and merges the feed metadata and posts into the existing data.
- "stream" - a collection of posts found in either a tag or feed.
Reader uses Promises as an abstraction for asynchronous operations rather than normal callbacks. The codebase originally started out using plain callbacks and the async module to slightly tame them, but it got unwieldy, deeply nested, and hard to read. Rather than accepting a callback, a Promise-based function returns an object representing the value that the function will eventually produce once its asynchronous operation completes. This allows promises to do all sorts of interesting things, since there is an actual value representing the state of the program. If you're interested in learning more about promises, you should read this article by James Coglan.
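The difference can be seen in a small sketch (the function names here are made up for illustration, and the standard `Promise` constructor stands in for whatever Promises/A-compatible library the codebase uses):

```javascript
// Callback style: the caller passes in a function to receive the result.
function getUserCallback(id, callback) {
  process.nextTick(function() {
    callback(null, { id: id, name: 'example' });
  });
}

// Promise style: the function returns an object representing the eventual
// result, which the caller can chain and compose.
function getUser(id) {
  return new Promise(function(resolve) {
    process.nextTick(function() {
      resolve({ id: id, name: 'example' });
    });
  });
}

getUser(1)
  .then(function(user) { return user.name; })
  .then(function(name) { console.log(name); }); // prints "example"
```

Because the promise is a value, it can be stored, passed around, and combined with other promises, which is what makes the flattened, non-nested style possible.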
Testing is performed using the QUnit testing framework and qunit-cli to wrap it for terminal output. The tests are meant to be comprehensive and cover as many edge case conditions as possible. Please run the tests before submitting patches, and add more tests if you're adding new features.
Eventually, the tests should be able to run against other backends, including the Google Reader API while it is still running (until July). In order to make this happen, at least a subset of the tests will have to be less specific to this implementation, and rely only on HTTP output. However, it is also good to test the internals of the implementation, so those tests should remain.