@nonara commented Nov 14, 2025

No description provided.

claude and others added 17 commits November 13, 2025 21:27
…#74)

Moved nodeHtmlParserConfig from config.ts to utilities.ts to break
the circular dependency where config.ts imported from utilities.ts
and utilities.ts imported nodeHtmlParserConfig from config.ts.

Resolves #74

Fixed transformer.js to remove performance functions regardless of
the CI environment variable. The previous guard was:
  if (process.env.CI || !cfg.removePerf) return node;
which skipped removal whenever CI was set (e.g., CI=true).

The guard now checks only the config flag:
  if (!cfg.removePerf) return node;

Also ensured ts-patch is properly installed so the transformer
actually runs during compilation.

Resolves #58
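The guard change can be sketched in isolation as follows. This is a minimal illustration, not the code in transformer.js: the `TransformerConfig` shape, the `Env` type, and both function names are assumptions made for the example; only `removePerf` and the `CI` variable come from the commit message.

```typescript
// Hypothetical config shape: only `removePerf` is named in the commit message.
interface TransformerConfig {
  removePerf: boolean;
}

// The environment is passed in explicitly so the sketch needs no Node typings.
type Env = Record<string, string | undefined>;

// Previous guard: returns true ("leave the node alone") whenever CI is set,
// even if removePerf is enabled.
function skipsRemovalBefore(cfg: TransformerConfig, env: Env): boolean {
  return Boolean(env["CI"]) || !cfg.removePerf;
}

// Fixed guard: only the config flag decides.
function skipsRemovalAfter(cfg: TransformerConfig): boolean {
  return !cfg.removePerf;
}

const cfg: TransformerConfig = { removePerf: true };
const onCi: Env = { CI: "true" };

const before = skipsRemovalBefore(cfg, onCi); // true: perf calls survived CI builds
const after = skipsRemovalAfter(cfg);         // false: perf calls are removed
```

With `removePerf` enabled, the old guard still short-circuited on CI, which is the behavior the commit removes.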

HTML tag names are case-insensitive by spec, but the library was failing to
process tags with mixed or upper case (e.g., <Br>, <DIV>, <Strong>). This caused
translation to stop prematurely, resulting in data loss.

Root cause: With lowerCaseTagName: false, the HTML parser preserved the
original case but did not recognize mixed-case void elements like <Br> as
self-closing tags. Content after such a tag was therefore incorrectly parsed
as children of that tag.

Solution:
1. Set lowerCaseTagName: true in nodeHtmlParserConfig to normalize all tags
2. Updated visitor.ts to handle tags case-insensitively using toUpperCase()
3. Added comprehensive tests for various mixed-case tag scenarios

All translator lookups and element matching now work regardless of the
original HTML tag casing, preventing data loss when processing HTML with
inconsistent capitalization.

Resolves #63
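A case-insensitive lookup of the kind described above might look like the sketch below. It is an illustration only: `VOID_ELEMENTS`, `isVoidElement`, `translators`, and `findTranslator` are hypothetical names, and the two sample translators are placeholders rather than the library's actual output rules. The void element list follows the HTML Living Standard.

```typescript
// Void elements per the HTML Living Standard, stored upper-case.
const VOID_ELEMENTS: ReadonlySet<string> = new Set([
  "AREA", "BASE", "BR", "COL", "EMBED", "HR", "IMG",
  "INPUT", "LINK", "META", "SOURCE", "TRACK", "WBR",
]);

// Normalize before the lookup so <br>, <Br>, and <BR> are treated alike.
function isVoidElement(tagName: string): boolean {
  return VOID_ELEMENTS.has(tagName.toUpperCase());
}

type Translator = (content: string) => string;

// Hypothetical translator table keyed by upper-case tag name.
const translators = new Map<string, Translator>([
  ["STRONG", (content) => `**${content}**`],
  ["EM", (content) => `_${content}_`],
]);

// Lookup that ignores the casing used in the source HTML.
function findTranslator(tagName: string): Translator | undefined {
  return translators.get(tagName.toUpperCase());
}

const mixedCaseVoid = isVoidElement("Br");       // true
const mixedCaseHit = findTranslator("Strong");   // the STRONG translator
const unknownTag = findTranslator("custom-tag"); // undefined
```

Normalizing at the lookup site keeps the table itself single-cased, so adding a translator never requires listing casing variants.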

Addresses #69 and #66 by documenting the expected behavior:

- Explains paragraph spacing is standard markdown (blank lines between paragraphs)
- Documents line breaks vs paragraphs behavior
- Provides clear examples of maxConsecutiveNewlines option usage
- Shows how to control consecutive newlines for different use cases

Both issues are by-design behavior, not bugs. The maxConsecutiveNewlines
option (default: 3) already provides the control users need.
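To make the effect of a newline cap concrete, here is a small standalone sketch. It is not the library's implementation: `limitConsecutiveNewlines` is a hypothetical helper that only mimics what an option like maxConsecutiveNewlines is described as controlling.

```typescript
// Hypothetical helper: collapse any run of more than `max` newlines down to `max`.
function limitConsecutiveNewlines(text: string, max: number): string {
  if (!Number.isInteger(max) || max < 1) {
    throw new RangeError("max must be a positive integer");
  }
  // Match runs of at least max + 1 newlines.
  const longRun = new RegExp(`\\n{${max + 1},}`, "g");
  return text.replace(longRun, "\n".repeat(max));
}

const input = "First paragraph\n\n\n\n\nSecond paragraph";

// With the documented default of 3, five newlines become three.
const withDefault = limitConsecutiveNewlines(input, 3);
// "First paragraph\n\n\nSecond paragraph"

// A cap of 2 leaves exactly one blank line between paragraphs.
const tighter = limitConsecutiveNewlines(input, 2);
// "First paragraph\n\nSecond paragraph"
```

Runs at or below the cap are left untouched, so ordinary paragraph spacing (one blank line) is unaffected by either setting.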

…fixes #34)

- Modified text node processing in visitor.ts to preserve trailing
  whitespace when followed by inline formatting elements
- Newlines before <b>, <strong>, <em>, <i>, <code>, <del> etc. are
  now converted to spaces instead of being removed
- Leading spaces are trimmed only if they were originally newlines
- Trailing spaces in text nodes are preserved for proper inline spacing
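The whitespace rules listed above can be approximated with the sketch below. It is a simplified reading of the bullet points, not the code in visitor.ts: `INLINE_TAGS` and `normalizeTextNode` are hypothetical names, and the inline tag set contains only the elements named in the commit message.

```typescript
// Inline formatting elements named in the commit message, stored upper-case.
const INLINE_TAGS: ReadonlySet<string> = new Set([
  "B", "STRONG", "EM", "I", "CODE", "DEL",
]);

// Hypothetical helper: normalize one text node given the tag that follows it.
function normalizeTextNode(text: string, nextSiblingTag?: string): string {
  // Remember whether the original leading whitespace contained a newline.
  const leading = /^\s*/.exec(text)?.[0] ?? "";

  // Collapse every whitespace run (newlines included) to a single space.
  let out = text.replace(/\s+/g, " ");

  // Trim the leading space only if it was originally a newline.
  if (leading.includes("\n")) {
    out = out.replace(/^ /, "");
  }

  // Keep the trailing space when an inline element follows; otherwise trim it.
  const beforeInline =
    nextSiblingTag !== undefined && INLINE_TAGS.has(nextSiblingTag.toUpperCase());
  if (!beforeInline) {
    out = out.replace(/ $/, "");
  }
  return out;
}

const beforeStrong = normalizeTextNode("Hello\n", "strong"); // "Hello "
const atEndOfBlock = normalizeTextNode("Hello\n");           // "Hello"
const leadingNewline = normalizeTextNode("\n  world");       // "world"
const leadingSpace = normalizeTextNode(" world");            // " world"
```

In the first case the newline before `<strong>` survives as a single space, so `Hello\n<strong>world</strong>` would not collapse into `Hello**world**`.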

Fixes #52, #24

- Set surroundingNewlines to false for block elements inside code blocks
- Add blockquote translator to defaultCodeBlockTranslators
- Explicitly set preserveWhitespace: true on CODE translator
- Ensures whitespace fidelity and clean newlines in code blocks

@nonara merged commit 5c6d2c6 into master Nov 14, 2025
7 checks passed

4 participants