-
Notifications
You must be signed in to change notification settings - Fork 26
Description
Consider the following code (StackBlitz) where XHR_URL links to example.txt with content Héllo 🌍:
import XMLHttpRequest from 'xmlhttprequest-ssl';
const XHR_URL =
'https://gist.githubusercontent.com/jeremy-code/19f2a087bcd74686cc68fd8657b43ec2/raw/77bd7608adf37e79dcc6f760a0aa0f9ee8f28d32/example.txt';
const getXhrResponse = () => new TextDecoder().decode(xhr.response);
const xhr = new XMLHttpRequest();
xhr.responseType = 'arraybuffer';
xhr.open('GET', XHR_URL, /* async */ false);
xhr.send(null);
console.log(`Synchronous: ${getXhrResponse()}`);
xhr.onload = () => {
console.log(`Asynchronous: ${getXhrResponse()}`);
};
xhr.open('GET', XHR_URL, /* async */ true);
xhr.send(null);This outputs:
❯ node index.js
Synchronous: H�llo <
Asynchronous: Héllo 🌍While I can't verify the browser implementation of the synchronous call since all major browsers (WebKit, Firefox, Chrome) disable synchronous XHR with .responseType "arraybuffer", on WebKit, Firefox, Chrome it does output Asynchronous: Héllo 🌍.
I believe the issue is within this function's implementation
var stringToBuffer = function(str) {
const ab = new ArrayBuffer(str.length)
const buf = Buffer.from(ab);
for (let k = 0; k < str.length; k++)
buf[k] = Number(str.charCodeAt(k));
return buf;
}which I believe results in two errors:
.charCodeAt()outputs the UTF-16 code unit, which I don't think is intended since it's being stored in a Buffer (which is internally aUint8Array)- Iterating through string indices causes unexpected behavior with charCodeAt, see String.prototype.charCodeAt() > Looping with codePointAt()
const INPUT = "Héllo 🌍";
var stringToBuffer1 = function (str) {
const ab = new ArrayBuffer(str.length);
const buf = Buffer.from(ab);
for (let k = 0; k < str.length; k++) buf[k] = Number(str.charCodeAt(k));
return buf;
};
var stringToBuffer2 = function (str) {
const encoder = new TextEncoder();
const buf = Buffer.from(encoder.encode(str));
return buf;
};
console.log("stringToBuffer1:", stringToBuffer1(INPUT));
console.log("stringToBuffer2:", stringToBuffer2(INPUT));
console.log("stringToBuffer1.toString():", stringToBuffer1(INPUT).toString());
console.log("stringToBuffer2.toString():", stringToBuffer2(INPUT).toString());❯ node test.js
stringToBuffer1: <Buffer 48 e9 6c 6c 6f 20 3c 0d>
stringToBuffer2: <Buffer 48 c3 a9 6c 6c 6f 20 f0 9f 8c 8d>
stringToBuffer1.toString(): H�llo <
stringToBuffer1.toString(): Héllo 🌍I believe this code should work as expected. TextEncoder comes with node:util by default since version v8.3.0, which I believe should be fine since the package.json engines claims >=12.0.0.
var stringToBuffer = function(str) {
const encoder = new TextEncoder();
const buf = Buffer.from(encoder.encode(str));
return buf;
}and it passes all tests with the exception of exactly one test in test-sync-response.js:
xhr.onreadystatechange = function () {
if (xhr.readyState === 4) {
// xhr.response is an ArrayBuffer
var binaryStr = Buffer.from(xhr.response).toString('binary');
var f32 = stringToFloat32Array(binaryStr);
log('/binary2', f32);
var answer = new Float32Array([1, 5, 6, 7]);
assert.equal(isEqual(f32, answer), true);
...The issue with that test is a bit harder to fix. If you read the buffer of the response for [ 1, 5, 6, 7 ], yyou get 00 00 ef bf bd 3f 00 00 ef bf bd 40 00 00 ef bf bd 40 00 00 ef bf bd 40 or "\u0000\u0000�?\u0000\u0000�@\u0000\u0000�@\u0000\u0000�@" which comes from being interpreted as utf8 in the server data: responseData.toString('utf8')}. The actual spec claims:
If charset is null, prescan the first 1024 bytes of xhr’s received bytes and if that does not terminate unsuccessfully then let charset be the return value.
which is a bit complicated.
Let me know what you think.