Skip to content

Conversation

@jasnell
Copy link
Member

@jasnell jasnell commented Mar 22, 2016

Pull Request check-list

  • Does make -j8 test (UNIX) or vcbuild test nosign (Windows) pass with
    this change (including linting)?
  • Is the commit message formatted according to [CONTRIBUTING.md][0]?
  • If this change fixes a bug (or a performance problem), is a regression
    test (or a benchmark) included?
  • Is a documentation update included (if this change modifies
    existing APIs, or introduces new ones)?

Affected core subsystem(s)

http

Description of change

The http-parser tolerates extraneous whitespace in header field
names on incoming messages. This whitespace should be trimmed
when those header fields are passed on to IncomingMessage.

/cc @indutny @nodejs/http

The http-parser tolerates extraneous whitespace in header field
names on incoming messages. This whitespace should be trimmed
when those header fields are passed on to IncomingMessage.
@jasnell jasnell added http Issues or PRs related to the http subsystem. semver-minor PRs that contain new features and should be released in the next minor version. labels Mar 22, 2016
// always joined.
IncomingMessage.prototype._addHeaderLine = function(field, value, dest) {
field = field.toLowerCase();
field = field.toLowerCase().trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trim() should be moved to the default: case to avoid unnecessary perf hits for the common header names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for trimming in the default case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't. Consider, Content-Type for instance. If I move the trim() into the default: clause then send the following request:

POST / HTTP/1.1
Content-Type: text/plain
Content-Type    : text/foo

The headers that end up reported on the server side are:

{
  'content-type': ['text/plain', 'text/foo']
}

However, if I send the second content type without the whitespace, it comes to:

POST / HTTP/1.1
Content-Type: text/plain
Content-Type: text/foo
{ 'content-type': 'text/plain' }

One thing I could do is a quick charAt check on the field name is see if it's \s or \t and trim only if it is.

Modifying the http-parser to do the trimming turns out to be quite a bit more involved. I'll keep investigating that change. @indutny may have some ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell why is it hard to do it in http-parser? It seems to me that it is just a matter of adding two extra states.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser state model is fairly complicated. I get a bit hesitant when it
comes to messing with it. It's the right approach, for sure tho. I just
wanted to make sure that someone more familiar with that bit of code
weighed in on it.
On Mar 25, 2016 6:24 AM, "Fedor Indutny" notifications@github.com wrote:

In lib/_http_incoming.js
#5844 (comment):

@@ -121,7 +121,7 @@ IncomingMessage.prototype._addHeaderLines = function(headers, n) {
// and drop the second. Extended header fields (those beginning with 'x-') are
// always joined.
IncomingMessage.prototype._addHeaderLine = function(field, value, dest) {

  • field = field.toLowerCase();
  • field = field.toLowerCase().trim();

@jasnell https://github.com/jasnell why is it hard to do it in
http-parser? It seems to me that it is just a matter of adding two extra
states.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/nodejs/node/pull/5844/files/2f45d2eda4f16435bf844ebf4f28e84bf159e2cb#r57443293

@mscdex
Copy link
Contributor

mscdex commented Mar 22, 2016

I wonder if it would be possible to have the http parser do the trimming to avoid having the perf hit at all?

@benjamingr
Copy link
Member

Generally looks good, but I'm not sure about something.

This whitespace should be trimmed when those header fields are passed on to IncomingMessage

Why?

@jasnell
Copy link
Member Author

jasnell commented Mar 22, 2016

@mscdex ... it potentially can be moved into http-parser, yes, I'll investigate further.

@benjamingr ... because technically whitespace between the field name and : is not permitted by HTTP but we tolerate it when parsing. The http-parser itself, for instance, treats 'Content-Length: 0' as equivalent to 'Content-Length : 0' but will pass those on as 'Content-Length' and 'Content-Length '.

@benjamingr
Copy link
Member

I can see a scenario where this would be a breaking change for someone but it does sound convoluted. Makes sense, and I see how it could be causing bugs - +1 with the trim only when needed nit.

@jasnell jasnell modified the milestone: 6.0.0 Mar 22, 2016
@jasnell
Copy link
Member Author

jasnell commented Mar 23, 2016

@indutny ... do you have any thoughts on how we can modify http-parser to strip the header field name whitespace without it being too complicated of a change?

@ronkorving
Copy link
Contributor

Agreed with @mscdex that trimming should probably happen in the parser, should be much more efficient. If not, please trim before toLowerCase (so that toLowerCase may operate on less characters).

@indutny
Copy link
Member

indutny commented Mar 25, 2016

@jasnell I'm slightly curious about reasoning behind nodejs/http-parser@48a4364f . Looks like it should be reverted.

@indutny
Copy link
Member

indutny commented Mar 25, 2016

@jasnell sorry, I meant part of it that allows whitespace in the middle of header field.

indutny added a commit to indutny/http-parser that referenced this pull request Mar 25, 2016
Skip whitespace from the left and the right sides of the header field.

See: nodejs/node#5844
@indutny
Copy link
Member

indutny commented Mar 25, 2016

Alternative fix for http-parser: https://github.com/nodejs/http-parser/pull/295/files . On our benchmarks it seems that parser becomes 4% less efficient in a presence of leading/trailing whitespace in the field.

@indutny
Copy link
Member

indutny commented Mar 25, 2016

Without the whitespace performance drop is about 10%, so I guess we should benchmark this PR too and see which one will perform better.

@indutny
Copy link
Member

indutny commented Mar 25, 2016

Or maybe just skip whitespace in src/node_http_parser.cc? It can really contain only 0x20 character, and .trim() accounts for all kinds of unicode whitespace.

@jasnell
Copy link
Member Author

jasnell commented Mar 25, 2016

It's also going to be a breaking change for some. Trimming leading
whitespace could be problematic due to header line folding concerns.
Trimming whitespace in the middle will break against some known buggy
servers (IIS). The primary concern here tho is the trailing whitespace.
On Mar 25, 2016 7:32 AM, "Fedor Indutny" notifications@github.com wrote:

Without the whitespace performance drop is about 10%, so I guess we should
benchmark this PR too and see which one will perform better.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#5844 (comment)

@silverwind
Copy link
Contributor

Isn't this against RFC7230 which says whitespace is optional (e.g. not forbidden):

   Each header field consists of a case-insensitive field name followed
   by a colon (":"), optional leading whitespace, the field value, and
   optional trailing whitespace.

@jasnell
Copy link
Member Author

jasnell commented Apr 4, 2016

@silverwind ... this deals with whitespace before the colon, not after. RFC7230 forbids whitespace before the colon. leading and trailing whitespace in the header field value is ok.

@indutny ... btw, I did see your other proposed approach but haven't had time to review it. I'll be returning to this later this week.

@silverwind
Copy link
Contributor

@jasnell yeah, just noticed my error. Still I'd be cautious about this perf hit. I'll re-label as major because it's definitely a breaking one.

@silverwind silverwind added semver-major PRs that contain breaking changes and should be released in the next major version. and removed semver-minor PRs that contain new features and should be released in the next minor version. labels Apr 4, 2016
@Fishrock123
Copy link
Contributor

Has anyone actually done perf testing on this yet?

@jasnell
Copy link
Member Author

jasnell commented Apr 7, 2016

@Fishrock123 .. not yet. I'll be returning to this likely tomorrow. That said, the change is likely better done within the http-parser as @indutny has suggested so I will likely close this PR in favor of that other change.

@jasnell
Copy link
Member Author

jasnell commented Apr 21, 2016

Closing this in favor of making the change in http_parser. That said, the proposed change in http_parser had some issues of it's own. Will still need to work those out but it's not likely to land before v6

@jasnell jasnell closed this Apr 21, 2016
@indutny
Copy link
Member

indutny commented Apr 21, 2016

@jasnell what's the rationale behind this? As I mentioned it seems that the performance hit in http-parser is quite high. Is JS variant even slower than that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

http Issues or PRs related to the http subsystem. semver-major PRs that contain breaking changes and should be released in the next major version.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants