Skip to content

XMLHttpRequest .response encoding error when .responseType is "arraybuffer" and synchronous #29

@jeremy-code

Description

@jeremy-code

Consider the following code (StackBlitz) where XHR_URL links to example.txt with content Héllo 🌍:

import XMLHttpRequest from 'xmlhttprequest-ssl';

const XHR_URL =
  'https://gist.githubusercontent.com/jeremy-code/19f2a087bcd74686cc68fd8657b43ec2/raw/77bd7608adf37e79dcc6f760a0aa0f9ee8f28d32/example.txt';

const getXhrResponse = () => new TextDecoder().decode(xhr.response);

const xhr = new XMLHttpRequest();
xhr.responseType = 'arraybuffer';

xhr.open('GET', XHR_URL, /* async */ false);
xhr.send(null);
console.log(`Synchronous:  ${getXhrResponse()}`);

xhr.onload = () => {
  console.log(`Asynchronous: ${getXhrResponse()}`);
};
xhr.open('GET', XHR_URL, /* async */ true);
xhr.send(null);

This outputs:

node index.js
Synchronous:  H�llo <
Asynchronous: Héllo 🌍

While I can't verify the browser implementation of the synchronous call since all major browsers (WebKit, Firefox, Chrome) disable synchronous XHR with .responseType "arraybuffer", on WebKit, Firefox, Chrome it does output Asynchronous: Héllo 🌍.

I believe the issue is within this function's implementation

var stringToBuffer = function(str) {
  const ab = new ArrayBuffer(str.length)
  const buf = Buffer.from(ab);
  for (let k = 0; k < str.length; k++)
    buf[k] = Number(str.charCodeAt(k));
  return buf;
}

which I believe results in two errors:

  1. .charCodeAt() outputs the UTF-16 code unit, which I don't think is intended since it's being stored in a Buffer (which is internally a Uint8Array)
  2. Iterating through string indices causes unexpected behavior with charCodeAt, see String.prototype.charCodeAt() > Looping with codePointAt()
const INPUT = "Héllo 🌍";

var stringToBuffer1 = function (str) {
  const ab = new ArrayBuffer(str.length);
  const buf = Buffer.from(ab);
  for (let k = 0; k < str.length; k++) buf[k] = Number(str.charCodeAt(k));
  return buf;
};

var stringToBuffer2 = function (str) {
  const encoder = new TextEncoder();
  const buf = Buffer.from(encoder.encode(str));
  return buf;
};

console.log("stringToBuffer1:", stringToBuffer1(INPUT));
console.log("stringToBuffer2:", stringToBuffer2(INPUT));
console.log("stringToBuffer1.toString():", stringToBuffer1(INPUT).toString());
console.log("stringToBuffer2.toString():", stringToBuffer2(INPUT).toString());
node test.js
stringToBuffer1: <Buffer 48 e9 6c 6c 6f 20 3c 0d>
stringToBuffer2: <Buffer 48 c3 a9 6c 6c 6f 20 f0 9f 8c 8d>
stringToBuffer1.toString(): H�llo <
stringToBuffer1.toString(): Héllo 🌍

I believe this code should work as expected. TextEncoder comes with node:util by default since version v8.3.0, which I believe should be fine since the package.json engines claims >=12.0.0.

var stringToBuffer = function(str) {
  const encoder = new TextEncoder();
  const buf = Buffer.from(encoder.encode(str));
  return buf;
}

and it passes all tests with the exception of exactly one test in test-sync-response.js:

  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
      // xhr.response is an ArrayBuffer
      var binaryStr = Buffer.from(xhr.response).toString('binary');
      var f32 = stringToFloat32Array(binaryStr);
      log('/binary2', f32);
      var answer = new Float32Array([1, 5, 6, 7]);
      assert.equal(isEqual(f32, answer), true);
      ...

The issue with that test is a bit harder to fix. If you read the buffer of the response for [ 1, 5, 6, 7 ], yyou get 00 00 ef bf bd 3f 00 00 ef bf bd 40 00 00 ef bf bd 40 00 00 ef bf bd 40 or "\u0000\u0000�?\u0000\u0000�@\u0000\u0000�@\u0000\u0000�@" which comes from being interpreted as utf8 in the server data: responseData.toString('utf8')}. The actual spec claims:

If charset is null, prescan the first 1024 bytes of xhr’s received bytes and if that does not terminate unsuccessfully then let charset be the return value.

which is a bit complicated.

Let me know what you think.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions