Missing UTF-8 decode that passes through BOM but replaces surrogates #17

@domenic

I'm trying to use utf8.js in jsdom in preference to native TextDecoder()/TextEncoder() because of the correctness issues you've fixed in this library.

However, I don't think it has everything I need.

As far as I can tell:

  • utf8fromString vs. utf8fromStringLoose should always be equivalent because input is always a scalar value string.
  • utf8toString: decode, pass through BOM, throw on surrogates
  • utf8toStringLoose: decode, skip BOM, replace surrogates

So I believe I'm missing a "decode, pass through BOM, replace surrogates" operation. Would you be up for providing it?
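To make the requested semantics concrete, here is a minimal sketch using the native `TextDecoder`, which is the decoder I'm trying to move away from, so this only illustrates the behavior and is not a proposed implementation. The name `decodeKeepBOMReplace` is hypothetical and not part of utf8.js.

```javascript
// Hypothetical helper; NOT part of utf8.js. It only illustrates the
// "decode, pass through BOM, replace surrogates" combination.
function decodeKeepBOMReplace(bytes) {
  const decoder = new TextDecoder("utf-8", {
    ignoreBOM: true, // keep a leading U+FEFF in the output instead of stripping it
    fatal: false,    // replace invalid sequences with U+FFFD instead of throwing
  });
  return decoder.decode(bytes);
}

// BOM, "a", a UTF-8-encoded surrogate (U+D800, invalid in UTF-8), "b"
const input = Uint8Array.of(0xef, 0xbb, 0xbf, 0x61, 0xed, 0xa0, 0x80, 0x62);
const output = decodeKeepBOMReplace(input);

console.log(output.charCodeAt(0) === 0xfeff); // is the BOM passed through?
console.log(output.includes("\uFFFD"));       // were the surrogate bytes replaced?
```

In terms of the existing functions, this would be the BOM handling of utf8toString combined with the surrogate handling of utf8toStringLoose.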

(I also find the "loose" vs. non-"loose" terminology quite confusing, especially since it seems to mean different things on encode vs. decode. Clear documentation in the README would help here.)
