Missing UTF-8 decode that passes through BOM but replaces surrogates

I'm trying to use utf8.js in jsdom in preference to native `TextDecoder()`/`TextEncoder()` because of the correctness issues you've fixed in this library.

However, I don't think it has everything I need.

- Spec [UTF-8 decode](https://encoding.spec.whatwg.org/#utf-8-decode) -> skip BOM, replace surrogates
- Spec [UTF-8 decode without BOM](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom) -> pass through BOM, replace surrogates
- Spec [UTF-8 decode without BOM or fail](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail) -> pass through BOM, throw on surrogates
- Spec [UTF-8 encode](https://encoding.spec.whatwg.org/#utf-8-encode) -> there are no variants, because it assumes the input is always a scalar value string

Whereas:

- `utf8fromString` vs. `utf8fromStringLoose`: these should always be equivalent, because the input is always a scalar value string.
- `utf8toString`: decode, pass through BOM, throw on surrogates
- `utf8toStringLoose`: decode, skip BOM, replace surrogates

So I believe I'm missing a "decode, pass through BOM, replace surrogates" operation. Would you be up for providing it?
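To make the three spec behaviors concrete, here is a rough sketch of them using the platform `TextDecoder` options (`ignoreBOM`, `fatal`). The function names are hypothetical and are not part of utf8.js. This only illustrates the semantics I'm after, not a suggested implementation, since the point of using this library is to avoid relying on the native decoder.

```javascript
// Hypothetical helper names, for illustration only (not utf8.js API).

// Spec "UTF-8 decode": skip a leading BOM, replace invalid sequences with U+FFFD.
function utf8Decode(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// Spec "UTF-8 decode without BOM": keep a leading BOM as U+FEFF,
// replace invalid sequences with U+FFFD. This is the operation I'm missing.
function utf8DecodeWithoutBOM(bytes) {
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

// Spec "UTF-8 decode without BOM or fail": keep a leading BOM,
// throw a TypeError on invalid sequences.
function utf8DecodeWithoutBOMOrFail(bytes) {
  return new TextDecoder("utf-8", { ignoreBOM: true, fatal: true }).decode(bytes);
}

// BOM, then "a", then an invalid byte (0xFF).
const sample = new Uint8Array([0xef, 0xbb, 0xbf, 0x61, 0xff]);
```

With `sample`, the first function should drop the BOM and replace the bad byte, the second should keep the BOM and replace the bad byte, and the third should throw.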

(I also find the "loose" vs. non-"loose" terminology quite confusing, especially since it seems to mean different things for encode vs. decode. Clear documentation of each function's behavior in the README would help here.)

Missing UTF-8 decode that passes through BOM but replaces surrogates #17
