XMLHttpRequest `.response` encoding error when `.responseType` is "arraybuffer" and synchronous

Consider the following code ([StackBlitz](https://stackblitz.com/edit/node-r2qjjkj2?file=index.js)) where `XHR_URL` links to [example.txt](https://gist.github.com/jeremy-code/19f2a087bcd74686cc68fd8657b43ec2) with content `Héllo 🌍`:

```js
import XMLHttpRequest from 'xmlhttprequest-ssl';

const XHR_URL =
  'https://gist.githubusercontent.com/jeremy-code/19f2a087bcd74686cc68fd8657b43ec2/raw/77bd7608adf37e79dcc6f760a0aa0f9ee8f28d32/example.txt';

const getXhrResponse = () => new TextDecoder().decode(xhr.response);

const xhr = new XMLHttpRequest();
xhr.responseType = 'arraybuffer';

xhr.open('GET', XHR_URL, /* async */ false);
xhr.send(null);
console.log(`Synchronous:  ${getXhrResponse()}`);

xhr.onload = () => {
  console.log(`Asynchronous: ${getXhrResponse()}`);
};
xhr.open('GET', XHR_URL, /* async */ true);
xhr.send(null);
```

This outputs:

```console
❯ node index.js
Synchronous:  H�llo <
Asynchronous: Héllo 🌍
```

While I can't verify the browser implementation of the synchronous call since all major browsers (WebKit, Firefox, Chrome) disable synchronous XHR with `.responseType` "arraybuffer", on WebKit, Firefox, Chrome it does output `Asynchronous: Héllo 🌍`.

I believe the issue is within this function's implementation

```js
var stringToBuffer = function(str) {
  const ab = new ArrayBuffer(str.length)
  const buf = Buffer.from(ab);
  for (let k = 0; k < str.length; k++)
    buf[k] = Number(str.charCodeAt(k));
  return buf;
}
```

which I believe results in two errors:

1. `.charCodeAt()` outputs the UTF-16 code unit, which I don't think is intended since it's being stored in a Buffer (which is internally a `Uint8Array`)
2. Iterating through string indices causes unexpected behavior with charCodeAt, see [String.prototype.charCodeAt() > Looping with codePointAt()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt#looping_with_codepointat)

```js
const INPUT = "Héllo 🌍";

var stringToBuffer1 = function (str) {
  const ab = new ArrayBuffer(str.length);
  const buf = Buffer.from(ab);
  for (let k = 0; k < str.length; k++) buf[k] = Number(str.charCodeAt(k));
  return buf;
};

var stringToBuffer2 = function (str) {
  const encoder = new TextEncoder();
  const buf = Buffer.from(encoder.encode(str));
  return buf;
};

console.log("stringToBuffer1:", stringToBuffer1(INPUT));
console.log("stringToBuffer2:", stringToBuffer2(INPUT));
console.log("stringToBuffer1.toString():", stringToBuffer1(INPUT).toString());
console.log("stringToBuffer2.toString():", stringToBuffer2(INPUT).toString());
```

```console
❯ node test.js
stringToBuffer1: <Buffer 48 e9 6c 6c 6f 20 3c 0d>
stringToBuffer2: <Buffer 48 c3 a9 6c 6c 6f 20 f0 9f 8c 8d>
stringToBuffer1.toString(): H�llo <
stringToBuffer1.toString(): Héllo 🌍
```

I believe this code should work as expected. [TextEncoder](https://nodejs.org/docs/latest-v8.x/api/util.html#util_class_util_textencoder) comes with `node:util` by default since version v8.3.0, which I believe should be fine since the `package.json` engines claims `>=12.0.0`.

```js
var stringToBuffer = function(str) {
  const encoder = new TextEncoder();
  const buf = Buffer.from(encoder.encode(str));
  return buf;
}
```

and it passes all tests with the exception of exactly one test in `test-sync-response.js`:

```js
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
      // xhr.response is an ArrayBuffer
      var binaryStr = Buffer.from(xhr.response).toString('binary');
      var f32 = stringToFloat32Array(binaryStr);
      log('/binary2', f32);
      var answer = new Float32Array([1, 5, 6, 7]);
      assert.equal(isEqual(f32, answer), true);
      ...
```

The issue with that test is a bit harder to fix. If you read the buffer of the response for `[ 1, 5, 6, 7 ]`, yyou get `00 00 ef bf bd 3f 00 00 ef bf bd 40 00 00 ef bf bd 40 00 00 ef bf bd 40` or "\u0000\u0000�?\u0000\u0000�@\u0000\u0000�@\u0000\u0000�@" which comes from being interpreted as utf8 in the server `data: responseData.toString('utf8')}`. The actual spec claims:

> If charset is null, [prescan](https://html.spec.whatwg.org/multipage/parsing.html#prescan-a-byte-stream-to-determine-its-encoding) the first 1024 bytes of xhr’s [received bytes](https://xhr.spec.whatwg.org/#received-bytes) and if that does not terminate unsuccessfully then let charset be the return value.

which is a bit complicated. 

Let me know what you think.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

XMLHttpRequest `.response` encoding error when `.responseType` is "arraybuffer" and synchronous #29

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

XMLHttpRequest .response encoding error when .responseType is "arraybuffer" and synchronous #29

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

XMLHttpRequest `.response` encoding error when `.responseType` is "arraybuffer" and synchronous #29