A robust, configurable alternative to Readability.js for extracting main content from web pages. It runs in Node.js (with jsdom) and in browser environments, handles multi-column layouts, removes noise, and produces Markdown-ready text output.

- Extracts main content intelligently from articles, sections, and divs.
- Handles multi-column layouts and merges content in reading order.
- Removes noise: `script`, `style`, `noscript`, `iframe`, `.ads`, `.advertisement`.
- Returns API-compatible objects with:
  - `title`: Document title
  - `textContent`: Cleaned main text
  - `content`: HTML of the main container(s)
  - `length`: Character count of `textContent`
  - `excerpt`: First N characters of text
- Markdown-ready text formatting (headings, paragraphs, lists, code, figures).
- Fully configurable: ignored classes, thresholds, tag boosts, excerpt length.
- Compatible with Node.js (via `jsdom`) and browser/Chrome extensions.

For Node.js, run `npm install jsdom`. Include the Readability.js file in your project.

    // Node.js usage: pass an HTML string (requires jsdom)
    import Readability from './Readability.js';
    import fs from 'fs';

    const html = fs.readFileSync('example.html', 'utf-8');
    const reader = new Readability(html, {
      ignoreClasses: /nav|sidebar|ads/i,
      minTextLength: 60,
      columnMinText: 40,
      columnThreshold: 0.25,
      tagBoosts: { article: 1.7, main: 1.5 },
      excerptLength: 300
    });

    const article = reader.parse();
    console.log("Title:", article.title);
    console.log("Excerpt:", article.excerpt);
    console.log("Text content:\n", article.textContent);

    // In a Chrome extension content script or browser console
    const reader = new Readability(document, {
      ignoreClasses: /nav|sidebar|ads/i,
      minTextLength: 60,
      columnMinText: 40,
      columnThreshold: 0.25,
      tagBoosts: { article: 1.7, main: 1.5 },
      excerptLength: 300
    });

    const article = reader.parse();
    console.log("Title:", article.title);
    console.log("Excerpt:", article.excerpt);
    console.log("Text content:\n", article.textContent);

| Option | Type | Default | Description |
|---|---|---|---|
| `ignoreClasses` | RegExp | `/aside\|nav\|footer\|header\|sidebar\|ads\|advertisement/i` | Regex to ignore unwanted elements |
| `minTextLength` | Number | `50` | Minimum text length for main content candidates |
| `columnMinText` | Number | `30` | Minimum text length for column children |
| `columnThreshold` | Number | `0.3` | Fraction of max column score to include |
| `tagBoosts` | Object | `{ article: 1.5, section: 1.2, main: 1.3 }` | Boost multipliers for tags |
| `excerptLength` | Number | `200` | Number of characters in returned excerpt |

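
To make the interplay of these options more concrete, here is a minimal sketch of how defaults, tag boosts, and the column threshold could fit together. It is not the library's actual implementation: `DEFAULTS` mirrors the option defaults listed above, while `resolveOptions`, `scoreCandidate`, and `selectColumns` are hypothetical helpers, and the scoring rule (text length times boost) is an assumption made purely for illustration.

```javascript
// Defaults copied from the options table; everything else here is illustrative.
const DEFAULTS = {
  ignoreClasses: /aside|nav|footer|header|sidebar|ads|advertisement/i,
  minTextLength: 50,
  columnMinText: 30,
  columnThreshold: 0.3,
  tagBoosts: { article: 1.5, section: 1.2, main: 1.3 },
  excerptLength: 200,
};

// Hypothetical helper: user options override defaults; tagBoosts merge per tag.
function resolveOptions(options = {}) {
  return {
    ...DEFAULTS,
    ...options,
    tagBoosts: { ...DEFAULTS.tagBoosts, ...(options.tagBoosts || {}) },
  };
}

// Assumed scoring rule: text length multiplied by the tag's boost (1 if none).
// Candidates with an ignored class or too little text score 0.
function scoreCandidate({ tag, className = "", text }, opts) {
  if (opts.ignoreClasses.test(className)) return 0;
  if (text.length < opts.minTextLength) return 0;
  return text.length * (opts.tagBoosts[tag] ?? 1);
}

// Assumed column rule: keep columns with enough text whose score is at least
// columnThreshold times the best score, preserving input (reading) order.
function selectColumns(columns, opts) {
  const scored = columns
    .filter((c) => c.text.length >= opts.columnMinText)
    .map((c) => ({ ...c, score: c.text.length }));
  const max = Math.max(0, ...scored.map((c) => c.score));
  return scored.filter((c) => c.score >= max * opts.columnThreshold);
}

const opts = resolveOptions({ minTextLength: 60, tagBoosts: { article: 1.7 } });
const score = scoreCandidate({ tag: "article", text: "x".repeat(100) }, opts);
const kept = selectColumns(
  [{ text: "a".repeat(200) }, { text: "b".repeat(20) }, { text: "c".repeat(80) }],
  opts
);
console.log(score, kept.length);
```

Merging `tagBoosts` per tag, rather than replacing the whole object, means that overriding `article` would leave the default `section` and `main` boosts in place. Whether the library behaves this way is not stated in this document.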
The `parse()` method returns an object:

    {
      title: "Page Title",
      textContent: "Cleaned main text of the article...",
      content: "<article>HTML content...</article>",
      length: 1234,
      excerpt: "First 200 characters of main text..."
    }

If extraction fails, `parse()` returns `null`.
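
Because `parse()` can return `null`, callers may want to guard before reading any fields. The sketch below assumes only the result shape shown above; `describeArticle` is a hypothetical helper, and the sample object stands in for a real `parse()` result.

```javascript
// Hypothetical helper: turn a parse() result (or null) into a one-line summary.
function describeArticle(article) {
  if (article === null) {
    return "No main content found";
  }
  // length is documented as the character count of textContent.
  return `${article.title} (${article.length} chars): ${article.excerpt}`;
}

// Stand-in for a real result; the fields follow the documented output shape.
const sample = {
  title: "Page Title",
  textContent: "Cleaned main text",
  content: "<article>Cleaned main text</article>",
  length: 17,
  excerpt: "Cleaned main text",
};

console.log(describeArticle(sample));
console.log(describeArticle(null));
```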

- `<h1>` to `<h6>` → Markdown headings
- `<p>` → Paragraphs
- `<li>` → List items (`-`)
- `<pre>` / `<code>` → Code blocks
- `<figure>` → Markdown image, if an `<img>` and an optional `<figcaption>` exist

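
The tag-to-Markdown mapping can be sketched as a small function over simplified nodes. This is an illustration under stated assumptions, not the library's code: `toMarkdown` and the `{ tag, text, src, caption }` node shape are hypothetical stand-ins for real DOM elements, and the `![caption](src)` image form used for `<figure>` is an assumption.

```javascript
// Built at runtime so the example does not embed a literal code fence.
const FENCE = "`".repeat(3);

// Hypothetical node shape: { tag, text } plus { src, caption } for figures.
function toMarkdown(node) {
  const heading = /^h([1-6])$/.exec(node.tag);
  if (heading) {
    // <h1>..<h6> become 1..6 leading '#' characters.
    return `${"#".repeat(Number(heading[1]))} ${node.text}`;
  }
  switch (node.tag) {
    case "p":
      return node.text;
    case "li":
      return `- ${node.text}`;
    case "pre":
    case "code":
      return `${FENCE}\n${node.text}\n${FENCE}`;
    case "figure":
      // Assumed: a figure without an image produces no output.
      return node.src ? `![${node.caption || ""}](${node.src})` : "";
    default:
      return "";
  }
}

const blocks = [
  { tag: "h2", text: "Install" },
  { tag: "p", text: "Run the installer." },
  { tag: "li", text: "Step one" },
  { tag: "figure", src: "diagram.png", caption: "Overview" },
];

// Drop empty results and separate blocks with a blank line.
const markdown = blocks.map(toMarkdown).filter(Boolean).join("\n\n");
console.log(markdown);
```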
| Feature / Aspect | Readability.js | Readability-Alternative 1.0 |
|---|---|---|
| Environment Support | Browser only | Browser + Node.js (via jsdom) |
| Noise Removal | Basic (scripts/styles) | Enhanced (scripts, styles, ads, iframes, custom ignored classes) |
| Multi-column Detection | No | Yes, intelligently merges columns in reading order |
| Markdown-ready Text Output | No | Yes, handles headings, lists, code blocks, figures |
| Configurable Tag Boosts | No | Yes, supports boosting article, section, main tags |
| Configurable Thresholds | No | Yes, minimum text length, column thresholds, excerpt length |
| Excerpt Generation | No | Yes, configurable excerpt from main content |
| API Output | `{title, textContent}` | `{title, textContent, content, length, excerpt}` |
| Resiliency / Edge Cases | Moderate | High: guards against empty nodes, malformed HTML, missing elements |
| Installation | Built-in in browser | Node.js: requires jsdom, browser: works directly |
| Customization | Limited | High: regex for ignored classes, tag boosts, thresholds, excerpt length |
- Node.js Support — Extract content server-side or in scripts.
- Multi-column & Merge-aware — Preserves reading order.
- Robust Noise Removal — Removes scripts, ads, and custom unwanted elements.
- Markdown-friendly Output — Ready for export or processing.
- Configurable & Extensible — Fine-tune thresholds, boosts, and ignored elements.
- Production-Ready — Handles empty or malformed DOMs gracefully.

- Node.js version requires `jsdom`.
- Browser version works with any `Document` or `HTMLElement` (e.g., `document`).
- Both versions share the same config structure and API, enabling consistent behavior across environments.
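
One way a single API could accept either a `Document` or an `HTMLElement` is to normalize the input to a root element first. The sketch below is an assumption about how that might work, not the library's documented behavior; `resolveRoot` is a hypothetical helper, and the plain objects stand in for real DOM nodes so the example runs without a DOM.

```javascript
// Standard DOM nodeType values (Node.ELEMENT_NODE and Node.DOCUMENT_NODE).
const ELEMENT_NODE = 1;
const DOCUMENT_NODE = 9;

// Hypothetical helper: return the element to scan for a Document or an element.
function resolveRoot(input) {
  if (input && input.nodeType === DOCUMENT_NODE) {
    // Prefer <body>; fall back to the root element for body-less documents.
    return input.body || input.documentElement || null;
  }
  if (input && input.nodeType === ELEMENT_NODE) {
    return input;
  }
  // Anything else (including HTML strings) is out of scope for this sketch.
  return null;
}

// Plain-object stand-ins for real DOM nodes.
const fakeBody = { nodeType: ELEMENT_NODE, tagName: "BODY" };
const fakeDocument = { nodeType: DOCUMENT_NODE, body: fakeBody };

console.log(resolveRoot(fakeDocument) === fakeBody);
console.log(resolveRoot(fakeBody) === fakeBody);
console.log(resolveRoot("<p>text</p>"));
```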
MIT License — free to use and modify.