-
-
Notifications
You must be signed in to change notification settings - Fork 34.4k
http: trim whitespace from header field name #5844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The http-parser tolerates extraneous whitespace in header field names on incoming messages. This whitespace should be trimmed when those header fields are passed on to IncomingMessage.
| // always joined. | ||
| IncomingMessage.prototype._addHeaderLine = function(field, value, dest) { | ||
| field = field.toLowerCase(); | ||
| field = field.toLowerCase().trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the trim() should be moved to the default: case to avoid unnecessary perf hits for the common header names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for trimming in the default case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we can't. Consider, Content-Type for instance. If I move the trim() into the default: clause then send the following request:
POST / HTTP/1.1
Content-Type: text/plain
Content-Type : text/foo
The headers that end up reported on the server side are:
{
'content-type': ['text/plain', 'text/foo']
}However, if I send the second content type without the whitespace, it comes to:
POST / HTTP/1.1
Content-Type: text/plain
Content-Type: text/foo
{ 'content-type': 'text/plain' }One thing I could do is a quick charAt check on the field name is see if it's \s or \t and trim only if it is.
Modifying the http-parser to do the trimming turns out to be quite a bit more involved. I'll keep investigating that change. @indutny may have some ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasnell why is it hard to do it in http-parser? It seems to me that it is just a matter of adding two extra states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser state model is fairly complicated. I get a bit hesitant when it
comes to messing with it. It's the right approach, for sure tho. I just
wanted to make sure that someone more familiar with that bit of code
weighed in on it.
On Mar 25, 2016 6:24 AM, "Fedor Indutny" notifications@github.com wrote:
In lib/_http_incoming.js
#5844 (comment):@@ -121,7 +121,7 @@ IncomingMessage.prototype._addHeaderLines = function(headers, n) {
// and drop the second. Extended header fields (those beginning with 'x-') are
// always joined.
IncomingMessage.prototype._addHeaderLine = function(field, value, dest) {
- field = field.toLowerCase();
- field = field.toLowerCase().trim();
@jasnell https://github.com/jasnell why is it hard to do it in
http-parser? It seems to me that it is just a matter of adding two extra
states.—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/pull/5844/files/2f45d2eda4f16435bf844ebf4f28e84bf159e2cb#r57443293
|
I wonder if it would be possible to have the http parser do the trimming to avoid having the perf hit at all? |
|
Generally looks good, but I'm not sure about something.
Why? |
|
@mscdex ... it potentially can be moved into http-parser, yes, I'll investigate further. @benjamingr ... because technically whitespace between the field name and |
|
I can see a scenario where this would be a breaking change for someone but it does sound convoluted. Makes sense, and I see how it could be causing bugs - +1 with the trim only when needed nit. |
|
@indutny ... do you have any thoughts on how we can modify http-parser to strip the header field name whitespace without it being too complicated of a change? |
|
Agreed with @mscdex that trimming should probably happen in the parser, should be much more efficient. If not, please trim before toLowerCase (so that toLowerCase may operate on less characters). |
|
@jasnell I'm slightly curious about reasoning behind nodejs/http-parser@48a4364f . Looks like it should be reverted. |
|
@jasnell sorry, I meant part of it that allows whitespace in the middle of header field. |
Skip whitespace from the left and the right sides of the header field. See: nodejs/node#5844
|
Alternative fix for http-parser: https://github.com/nodejs/http-parser/pull/295/files . On our benchmarks it seems that parser becomes 4% less efficient in a presence of leading/trailing whitespace in the field. |
|
Without the whitespace performance drop is about 10%, so I guess we should benchmark this PR too and see which one will perform better. |
|
Or maybe just skip whitespace in |
|
It's also going to be a breaking change for some. Trimming leading
|
|
Isn't this against RFC7230 which says whitespace is optional (e.g. not forbidden): |
|
@silverwind ... this deals with whitespace before the colon, not after. RFC7230 forbids whitespace before the colon. leading and trailing whitespace in the header field value is ok. @indutny ... btw, I did see your other proposed approach but haven't had time to review it. I'll be returning to this later this week. |
|
@jasnell yeah, just noticed my error. Still I'd be cautious about this perf hit. I'll re-label as major because it's definitely a breaking one. |
|
Has anyone actually done perf testing on this yet? |
|
@Fishrock123 .. not yet. I'll be returning to this likely tomorrow. That said, the change is likely better done within the http-parser as @indutny has suggested so I will likely close this PR in favor of that other change. |
|
Closing this in favor of making the change in http_parser. That said, the proposed change in http_parser had some issues of it's own. Will still need to work those out but it's not likely to land before v6 |
|
@jasnell what's the rationale behind this? As I mentioned it seems that the performance hit in http-parser is quite high. Is JS variant even slower than that? |
Pull Request check-list
make -j8 test(UNIX) orvcbuild test nosign(Windows) pass withthis change (including linting)?
test (or a benchmark) included?
existing APIs, or introduces new ones)?
Affected core subsystem(s)
http
Description of change
The http-parser tolerates extraneous whitespace in header field
names on incoming messages. This whitespace should be trimmed
when those header fields are passed on to IncomingMessage.
/cc @indutny @nodejs/http