
HTTP Parser

http-parser is not actively maintained. New projects and projects looking to migrate should consider llhttp.

This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in high-performance HTTP applications. It does not make any syscalls nor allocations, it does not buffer data, and it can be interrupted at any time. Depending on your architecture, it only requires about 40 bytes of data per message stream (in a web server that is per connection).

Features:

  • No dependencies
  • Handles persistent streams (keep-alive)
  • Decodes chunked encoding
  • Upgrade support
  • Defends against buffer overflow attacks

The parser extracts the following information from HTTP messages:

  • Header fields and values
  • Content-Length
  • Request method
  • Response status code
  • Transfer-Encoding
  • HTTP version
  • Request URL
  • Message body

Usage

One http_parser object is used per TCP connection. Initialize the struct using http_parser_init() and set the callbacks. That might look something like this for a request parser:

http_parser_settings settings;
memset(&settings, 0, sizeof(settings));  /* unset callbacks must be NULL */
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */

http_parser *parser = malloc(sizeof(http_parser));
http_parser_init(parser, HTTP_REQUEST);
parser->data = my_socket;

When data is received on the socket, execute the parser and check for errors.

size_t len = 80*1024, nparsed;
char buf[len];
ssize_t recved;

recved = recv(fd, buf, len, 0);

if (recved < 0) {
  /* Handle error. */
}

/* Start up / continue the parser.
 * Note we pass recved==0 to signal that EOF has been received.
 */
nparsed = http_parser_execute(parser, &settings, buf, recved);

if (parser->upgrade) {
  /* handle new protocol */
} else if (nparsed != (size_t)recved) {
  /* Handle error. Usually just close the connection. */
}

http_parser needs to know where the end of the stream is. For example, sometimes servers send responses without Content-Length and expect the client to consume input (for the body) until EOF. To tell http_parser about EOF, give 0 as the fourth parameter to http_parser_execute(). Callbacks and errors can still be encountered during an EOF, so one must still be prepared to receive them.
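
For example, a minimal EOF sketch (assuming `parser`, `settings`, `fd`, and `buf` are set up as in the usage example above):

```c
ssize_t recved = recv(fd, buf, sizeof(buf), 0);

if (recved == 0) {
  /* Peer closed the connection: signal EOF with a zero length.
   * Callbacks (e.g. on_message_complete) and errors can still fire. */
  http_parser_execute(parser, &settings, NULL, 0);
  if (HTTP_PARSER_ERRNO(parser) != HPE_OK) {
    /* Handle error. */
  }
}
```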

Scalar valued message information such as status_code, method, and the HTTP version are stored in the parser structure. This data is only temporarily stored in http_parser and gets reset on each new message. If this information is needed later, copy it out of the structure during the headers_complete callback.
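
For example (a sketch; `my_data_t` and its fields are illustrative, not part of the http-parser API):

```c
int my_headers_complete_callback(http_parser *parser) {
  my_data_t *my_data = (my_data_t *)parser->data;

  /* These parser fields are reset on the next message, so copy them now. */
  my_data->status_code = parser->status_code;  /* responses only */
  my_data->method      = parser->method;       /* requests only */
  my_data->http_major  = parser->http_major;
  my_data->http_minor  = parser->http_minor;
  return 0;
}
```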

The parser decodes the transfer-encoding for both requests and responses transparently. That is, a chunked encoding is decoded before being sent to the on_body callback.

The Special Problem of Upgrade

http_parser supports upgrading the connection to a different protocol. An increasingly common example of this is the WebSocket protocol which sends a request like

    GET /demo HTTP/1.1
    Upgrade: WebSocket
    Connection: Upgrade
    Host: example.com
    Origin: http://example.com
    WebSocket-Protocol: sample

followed by non-HTTP data.

(See RFC6455 for more information on the WebSocket protocol.)

To support this, the parser treats such a request as a normal HTTP message without a body, issuing both on_headers_complete and on_message_complete callbacks. However, http_parser_execute() will stop parsing at the end of the headers and return.

The user is expected to check if parser->upgrade has been set to 1 after http_parser_execute() returns. Non-HTTP data begins in the supplied buffer at the offset given by the return value of http_parser_execute().
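
A sketch of that check, continuing the usage example above:

```c
size_t nparsed = http_parser_execute(parser, &settings, buf, recved);

if (parser->upgrade) {
  /* Everything from buf + nparsed up to buf + recved already belongs to
   * the new protocol; hand it to the upgrade handler untouched. */
  const char *upgrade_data = buf + nparsed;
  size_t upgrade_len = recved - nparsed;
  /* ... */
}
```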

Callbacks

During the http_parser_execute() call, the callbacks set in http_parser_settings will be executed. The parser maintains state and never looks behind, so buffering the data is not necessary. If you need to save certain data for later usage, you can do that from the callbacks.

There are two types of callbacks:

  • notification typedef int (*http_cb) (http_parser*); Callbacks: on_message_begin, on_headers_complete, on_message_complete.
  • data typedef int (*http_data_cb) (http_parser*, const char *at, size_t length); Callbacks: (requests only) on_url, (common) on_header_field, on_header_value, on_body.

Callbacks must return 0 on success. Returning a non-zero value indicates error to the parser, making it exit immediately.

For cases where it is necessary to pass local information to/from a callback, the http_parser object's data field can be used. An example of such a case is using threads to handle a socket connection, parse a request, and then give a response over that socket. By instantiating a thread-local struct containing relevant data (e.g. the accepted socket, memory allocated for callbacks to write into, etc.), a parser's callbacks are able to communicate data between the scope of the thread and the scope of the callback in a thread-safe manner. This allows http_parser to be used in multi-threaded contexts.

Example:

typedef struct {
  socket_t sock;
  void* buffer;
  int buf_len;
} custom_data_t;


int my_url_callback(http_parser* parser, const char *at, size_t length) {
  /* Access the thread-local custom_data_t struct through parser->data.
   * Use it to save parsed data into the thread-local buffer for later
   * use, or to communicate over the socket. */
  custom_data_t *my_data = (custom_data_t *)parser->data;
  ...
  return 0;
}

...

void http_parser_thread(socket_t sock) {
  size_t nparsed = 0;
  /* allocate memory for user data */
  custom_data_t *my_data = malloc(sizeof(custom_data_t));

  /* some information for use by callbacks;
   * achieves thread -> callback information flow */
  my_data->sock = sock;

  /* instantiate a thread-local parser */
  http_parser *parser = malloc(sizeof(http_parser));
  http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
  /* this custom data reference is accessible through the reference to the
   * parser supplied to callback functions */
  parser->data = my_data;

  http_parser_settings settings; /* set up callbacks */
  memset(&settings, 0, sizeof(settings));
  settings.on_url = my_url_callback;

  /* execute parser (buf/recved obtained from the socket as shown above) */
  nparsed = http_parser_execute(parser, &settings, buf, recved);

  ...
  /* parsed information copied from callback;
   * can now act on data copied into thread-local memory by callbacks.
   * achieves callback -> thread information flow */
  my_data->buffer;
  ...
}

In case you parse an HTTP message in chunks (i.e. read() the request line from the socket, parse, read() half the headers, parse, etc.) your data callbacks may be called more than once. http_parser guarantees that the data pointer is only valid for the lifetime of the callback. You can also read() into a heap-allocated buffer to avoid copying memory around if this fits your application.
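
Since the data pointer is only valid during the callback, fragments must be copied out. A minimal sketch of a growable buffer for that purpose (`strbuf_t` and `strbuf_append` are illustrative helpers, not part of the http-parser API); an on_url callback would simply call `strbuf_append` with its `at`/`length` arguments and return the result:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative growable buffer for accumulating data-callback fragments. */
typedef struct { char *buf; size_t len; } strbuf_t;

/* Append one callback fragment; keeps the buffer NUL-terminated. */
static int strbuf_append(strbuf_t *s, const char *at, size_t length) {
  char *p = realloc(s->buf, s->len + length + 1);
  if (p == NULL) return -1;   /* returning non-zero aborts the parser */
  memcpy(p + s->len, at, length);
  s->len += length;
  p[s->len] = '\0';
  s->buf = p;
  return 0;
}
```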

Reading headers may be a tricky task if you read/parse headers partially. Basically, you need to remember whether the last header callback was a field or a value and apply the following logic:

(on_header_field and on_header_value shortened to on_h_*)
 ------------------------ ------------ --------------------------------------------
| State (prev. callback) | Callback   | Description/action                         |
 ------------------------ ------------ --------------------------------------------
| nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
|                        |            | into it                                    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_field | New header started.                        |
|                        |            | Copy current name,value buffers to headers |
|                        |            | list and allocate new buffer for new name  |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_field | Previous name continues. Reallocate name   |
|                        |            | buffer and append callback data to it      |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_value | Value for current header started. Allocate |
|                        |            | new buffer and copy callback data to it    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_value | Value continues. Reallocate value buffer   |
|                        |            | and append callback data to it             |
 ------------------------ ------------ --------------------------------------------
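
The table above can be sketched as a small standalone state machine. This is an illustrative sketch, not http-parser API: fixed-size buffers are used and bounds checks are elided for brevity; a real application would push each completed (field, value) pair onto its headers list where noted.

```c
#include <string.h>

/* Tracks whether the previous header callback was a field or a value. */
typedef enum { H_NONE = 0, H_FIELD, H_VALUE } last_cb_t;

typedef struct {
  last_cb_t last;
  char field[256]; size_t field_len;
  char value[256]; size_t value_len;
} header_state_t;

/* Called from on_header_field: start a new header or continue the name. */
static void on_field(header_state_t *hs, const char *at, size_t len) {
  if (hs->last == H_VALUE) {
    /* New header started: save the completed (field, value) pair here,
     * then start fresh buffers. */
    hs->field_len = hs->value_len = 0;
  }
  memcpy(hs->field + hs->field_len, at, len);  /* append fragment */
  hs->field_len += len;
  hs->field[hs->field_len] = '\0';
  hs->last = H_FIELD;
}

/* Called from on_header_value: start or continue the value. */
static void on_value(header_state_t *hs, const char *at, size_t len) {
  if (hs->last == H_FIELD)
    hs->value_len = 0;   /* value for the current header started */
  memcpy(hs->value + hs->value_len, at, len);
  hs->value_len += len;
  hs->value[hs->value_len] = '\0';
  hs->last = H_VALUE;
}
```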

Parsing URLs

A simplistic zero-copy URL parser is provided as http_parser_parse_url(). Users of this library may wish to use it to parse URLs constructed from consecutive on_url callbacks.
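
A sketch of its use (assuming http-parser v2; `http_parser_url_init()` appeared in later 2.x releases, older versions can memset the struct to zero instead):

```c
struct http_parser_url u;
http_parser_url_init(&u);

const char *url = "/demo?foo=bar";  /* e.g. accumulated from on_url calls */
if (http_parser_parse_url(url, strlen(url), 0 /* is_connect */, &u) == 0 &&
    (u.field_set & (1 << UF_PATH))) {
  /* Zero-copy: results are (offset, length) pairs into the original buffer. */
  const char *path = url + u.field_data[UF_PATH].off;
  size_t path_len = u.field_data[UF_PATH].len;
  /* ... */
}
```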


http-parser's Issues

Forcing a pause at the end of HTTP headers

I've implemented a simple C++ wrapper for the http-parser library and ran into a problem when using the parser pause feature.

Basically, I have a Request object that wraps a http_parser instance. This request object has a feed() method which invokes http_parser_execute(). The request object also has a headers_complete() method which reports the end of HTTP headers. To make this test reliable (mainly for proxying purposes), I force http_parser_execute() to return by pausing the parser, calling http_parser_pause() from the on_headers_complete() callback. All this seemingly works, except that it doesn't consume the last byte of a request without a body. To finish processing the request, I have to call http_parser_execute() again with the last byte of data.

Now, calling feed()/http_parser_execute() one extra time is not that much of a problem, except that it prevents clients from accurately finding the exact position of the end of HTTP headers. In a proxying scenario (HTTP proxy, CGI/SCGI/FastCGI or even WebSockets), where you want to forward the body byte-for-byte, clients end up prefixing the HTTP request body with an extra byte.

This was reported to me on the httpxx issue tracker, but I believe it's an issue in http-parser. Basically, the on_headers_complete() callback is called upon examining the last byte of the header data, and pausing in that callback prevents the library from marking the last byte (the one that triggered the callback) as consumed.

Visit the httpxx issue #5 for a detailed discussion. If it's unclear, I might be able to concoct an SSCCE that illustrates the problem.

Note: this issue seems related to pull request 89.

Multiple calls of data callbacks

http://github.com/ry/http-parser/blob/37a0ff8928fb0d83cec0d0d8909c5a4abcd221af/http_parser.h#L38-40

/*
* http_data_cb does not return data chunks. It will be call arbitrarally
* many times for each string. E.G. you might get 10 callbacks for "on_path"
* each providing just a few characters more data.
*/

Maybe I did not understand the ragel docs about %actions.
Is this still true? Could you please write an example of copying headers to a separate array of struct { char *name; char *value; }?

The http_data_cb functions need to return 0 on success

Hello,

I've spent quite a while last week trying to figure out why one of my callback functions worked only on some requests and not on others. This was due to me returning the length parameter as an int, and not zero like http-parser expects. (I assume zero means "no error" in this context.)

Could you please add this to the documentation? I'm sure it would save someone else some time.

Thanks!

Uninitialized method field being accessed during response parsing

I'm getting this in valgrind:

==7845== Conditional jump or move depends on uninitialised value(s)
==7845== at 0x520FD17: http_parser_execute (http_parser.c:1319)

It looks like commit SHA: 0264a0a added a check for upgrading CONNECT requests, but "method" is only initialized when processing a request, so I think this is a bug while response parsing.

Version release

A query more than an issue: When is the next planned version release? v1.0 was back on May 11, 2011, and there have been a ton of changes since then.

Using url_callback and http_parser_parse_url()

Hi guys,
Firstly, I want to say, really good work. I have downloaded this library because I am coding a packet parser for network communication, as part of a plugin for MS Visio. The Joyent http-parser looks great for my needs, but I have a problem with parsing a URL as char* from my packets.
Can you give me some usable example of how to use url_callback and http_parser_parse_url() together, please? I really need it for my bachelor thesis.
Thank you very much..

Overflow checks on content-length

Nice library.

How about some sanity checks on content-length to prevent overflow?

eg.

case s_chunk_size:
...
STRICT_CHECK(parser->content_length & 0x0800000000000000);
parser->content_length *= 16;
parser->content_length += c;
break;

and in the header parser

case h_content_length:
...
STRICT_CHECK(parser->content_length >= 0x0CCCCCCCCCCCCCCC);
parser->content_length *= 10;
parser->content_length += ch - '0';
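
The idea behind the suggested guard can be shown as a small standalone helper (an illustrative sketch, not the library's code): reject a digit whenever multiplying the accumulator by the radix would wrap around.

```c
#include <stdint.h>

/* Sketch of the suggested sanity check: accumulate one decimal digit of a
 * Content-Length value, failing instead of wrapping on overflow. */
static int append_content_length_digit(uint64_t *len, char ch) {
  if (ch < '0' || ch > '9') return -1;          /* not a digit */
  if (*len > (UINT64_MAX - 9) / 10) return -1;  /* *len * 10 + 9 would wrap */
  *len = *len * 10 + (uint64_t)(ch - '0');
  return 0;
}
```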

auto-lower() chars in header names

Hi!

HTTP headers are case-insensitive. Since in http_parser_execute() we already analyze each incoming character, wonder if we could force lowering the case of characters seen as part of header names?

TIA,
--Vladimir

data chunked callbacks

There does not seem to be a native way to know the begin/end of a single data chunk with its corresponding length. We are adding two callbacks to denote the begin/end of every data chunk. Do not hesitate to mail me for further information.

parsing CONNECT requests

When parsing CONNECT requests, host names can have numbers in them also. Here is a simple patch against a recent version of the file that I think fixes it. Basically, when in the s_req_schema state you need to check that c is between '0' and '9' as well, and break if you find it, to handle hosts with numbers (s3.amazon.com) in them.

I am not sure what your protocol is for submitting patches, but here is mine...

I could not get it to work, email me if you want the patch but I think it is a straightforward one liner.

Return value from on_body ignored.

All callbacks check the return value and set an error on non-zero, except for on_body:

if (settings->on_body) settings->on_body(parser, p, to_read);

Oversight? Seems like it should work similarly to the other callbacks and was perhaps missed because it bypasses the CALLBACK macros.

parse_url_char asserts on form feed/tab

While maybe not a proper url, I don't think it should assert when a form feed (\f) or tab (\t) is present in the path.

Issue seems to be parse_url_char does an assert(!isspace(ch)) but is called from switch statements that only handle a subset of isspace.

Seems like the assert should be removed/changed or the Makefile should build with NDEBUG by default

Add more url callbacks

Currently the library calls on_url on every chunk and I have to concat chunks to get the whole url; that's fine.
But what about adding more callbacks like on_host, on_port, on_query_string, etc.? What do you think?

add reason (aka status) string to response parser

Currently the parser ignores this data, the code has the comment:

/* the human readable status. e.g. "NOT FOUND"
 * we are not humans so just ignore this */

All of HTTP is human readable so this is a fallacy. It would be useful to get the reason string from the parser in order to construct a data structure that represents a copy of the parsed data. For example, to modify a response before sending it to a client like a proxy might do.

Can't build using gyp

root@node /mnt/workspace/node/deps/http_parser $ gyp -f make --depth=`pwd` http_parser.gyp
Traceback (most recent call last):
  File "/usr/local/bin/gyp", line 18, in <module>
    sys.exit(gyp.main(sys.argv[1:]))
  File "/usr/local/lib/python2.7/site-packages/gyp/__init__.py", line 465, in main
    options.circular_check)
  File "/usr/local/lib/python2.7/site-packages/gyp/__init__.py", line 101, in Load
    depth, generator_input_info, check, circular_check)
  File "/usr/local/lib/python2.7/site-packages/gyp/input.py", line 2265, in Load
    depth, check)
  File "/usr/local/lib/python2.7/site-packages/gyp/input.py", line 389, in LoadTargetBuildFile
    build_file_path)
  File "/usr/local/lib/python2.7/site-packages/gyp/input.py", line 1012, in ProcessVariablesAndConditionsInDict
    build_file)
  File "/usr/local/lib/python2.7/site-packages/gyp/input.py", line 1027, in ProcessVariablesAndConditionsInList
    ProcessVariablesAndConditionsInDict(item, is_late, variables, build_file)
  File "/usr/local/lib/python2.7/site-packages/gyp/input.py", line 941, in ProcessVariablesAndConditionsInDict
    expanded = ExpandVariables(value, is_late, variables, build_file)
  File "/usr/local/lib/python2.7/site-packages/gyp/input.py", line 711, in ExpandVariables
    ' in ' + build_file
KeyError: 'Undefined variable library in http_parser.gyp while trying to load http_parser.gyp'

Bug in handling CONNECT requests

There appears to be a bug in handling http CONNECT requests. In a CONNECT request, the uri that follows is of the form hostname:port, but the parser assumes that if the first character is not '/', the uri contains a 'full' uri, including schema, which is not the case for such requests. As a result, parsing fails for such requests.

IPV6 address will cause parsing to fail

When parsing the request uri a check is made for : to see where the port number starts. This will fail if the address is in IPV6 "numeric" format (no FQDN).

This was the first one I spotted, maybe there are more?

  case s_req_host:
  {
    c = LOWER(ch);
    if (c >= 'a' && c <= 'z') break;
    if ((ch >= '0' && ch <= '9') || ch == '.' || ch == '-') break;
    switch (ch) {
      case ':':
        state = s_req_port;
        break;
      case '/':
        MARK(path);
        state = s_req_path;
        break;
      case ' ':
        /* The request line looks like:
         *   "GET http://foo.bar.com HTTP/1.1"
         * That is, there is no path.
         */
        CALLBACK(url);
        state = s_req_http_start;
        break;
      default:
        goto error;
    }
    break;
  }

Feature request: add url parsing utility

Following the recent API change (removal of path, query_string), I would like to upvote clifffrey's suggestion in https://github.com/ry/http-parser/pull/54#issuecomment-1591625 to have a utility function(s) to parse the URL given in the url callback.

As mentioned in that thread, it is well and possible to use a third library, but it's a hassle. For my small needs, it feels kind of awkward to need two libraries in order to be able to parse HTTP including URLs.

Parser error when parsing request line with a non standard HTTP method

I'm trying to serve PURGE request from a HTTP server implemented using Node.JS. PURGE requests are used by Squid and Varnish to purge objects from their cache (cf. http://wiki.squid-cache.org/SquidFaq/OperatingSquid#How_can_I_purge_an_object_from_my_cache.3F and https://www.varnish-cache.org/docs/trunk/tutorial/purging.html).

But Node.JS's HTTP server does not let my code serve non-standard HTTP methods. Instead, when receiving a request line using a non-standard method, the server closes the connection after a parser error. So it seems that http_parser does not support non-standard methods.

According to the HTTP 1.1 specification (rfc2616, section 5.1), the method of a request line can be any valid token. So it seems natural to expect an HTTP parser to accept any method in the request line. It is up to the parser user to properly handle unknown methods.

problem parse url in GET

GET //user/homepage.aspx?cas=1warn=false&ticket=Leibniz-78546881-60ee-4798-90f3-a8c9a185728d-2010-11-22_17:07:48.959&error=0|登录 HTTP/1.1
Host: xx.xxx.com
X-Real-IP: x.x.x.x
...
The parser stops at the '|' of the URI.

http_parser_init does not clear status_code

We use http_parser with HTTP_BOTH flag and reuse the same parser instance for many incoming messages.
Note that messages are carried over UDP so it cannot be determined if the to-be-parsed message is a request or a response.
Also note that complete message is carried in one UDP datagram.

Apparently http parser does not clear its state properly after finished messages (our messages have no Content-Length and no body):

http_parser_init(&parser, HTTP_BOTH);

loop over messages:
    size_t parsed = http_parser_execute(&parser, &settings, data, data_length);
    http_parser_execute(&parser, &settings, NULL, 0); // to indicate end-of-stream

Therefore we resolved to initialise the parser every time before parsing a message:

loop over messages:
    http_parser_init(&parser, HTTP_BOTH);
    size_t parsed = http_parser_execute(&parser, &settings, data, data_length);

But http_parser_init does not clear all state variables of the http parser.
For example status_code remains set from the last message.

If we are using http_parser in an unsupported way, please adjust the documentation.
If this is a bug, then please add the following line at http_parser.c:1859

parser->status_code = 0;

or, maybe better, bzero the whole parser structure at the beginning of http_parser_init (perhaps excluding the data attribute).

Bump HTTP_PARSER_VERSION_MAJOR to 2

There have been significant changes to the ABI, which require the major version to be bumped to 2, to prevent the source from conflicting with v1.0 packages.

Overflow detection on chunk trailers

In reading over the code, I noticed that chunk trailers were excluded in the test PARSING_HEADER. This has the effect of allowing chunk header parsing to continue indefinitely if an attacker crafts a trailer such as:

...
"\r\n0\r\n"  // end of chunk
"overflow: " // header start
"xxxxxxxxxx......" 1 gigabyte of data

The on_header_value callback will get called as each buffer that is fed to the parser is completed, but the callback consumer is probably not expecting the header value to grow unbounded.

Is there any reason the parser cannot or should not check for header size overflow when parsing chunk trailers? Testing I've done indicates that just removing the check does the trick (see trailer_overflow_patch)

Correct way to interrupt parser at EOM?

My app has a need to get out of the parser at the end of a message. From the docs, it would appear that the correct way is to return a non-zero value from the "message_complete" callback. This should make the parser return the number of bytes processed. This works most of the time; however, in the case of s_body_identity, the byte count is off by one.


case s_body_identity:
        to_read = MIN(pe - p, (int64_t)parser->content_length);
        if (to_read > 0) {
          if (settings->on_body) settings->on_body(parser, p, to_read);
          p += to_read - 1;
          parser->content_length -= to_read;
          if (parser->content_length == 0) {
            CALLBACK2(message_complete);
            state = NEW_MESSAGE();
          }
        }
        break;

In this case, the buffer pointer was advanced by 'to_read - 1' to position it correctly for the loop counter increase. A side effect is that the returned byte count is wrong.

So the question: is the above behavior a bug, or is there a better way to break out of the parser?

Regards,
Sean

on_message_complete is not called

This happens for a site as simple as http://tired.com/ (retrieved with curl -i http://tired.com/).
The breakage was introduced some time after commit 2498961, and it seems to be related to content_length parsing.


HTTP/1.1 200 OK
Date: Sun, 19 Feb 2012 12:34:46 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny16 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0
Last-Modified: Mon, 29 Sep 2003 08:34:25 GMT
ETag: "3d7130-b8-3c873c3fc0640"
Accept-Ranges: bytes
Content-Length: 184
Vary: Accept-Encoding
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>Are you tired?</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
<PRE>
<CENTER>



Are you tired?

Tell <a href="mailto:[email protected]">us</a> why.
</CENTER>
</PRE>
</BODY>

Simple Response handling

Dear Developers,

Nowadays I've faced the fact that some webservers occasionally send responses without a status line and header fields. I've found it in RFC 1945 under section 6 and tried to implement it in http-parser. Please check it (and correct it if necessary) and merge it into your code if possible.

http_parser.c
http_parser.h

Thanks in advance,
/tomika

PS.: Sorry but I cannot find a way to attach code.

minor documentation bug for should_keep_alive


/* If http_should_keep_alive() in the on_headers_complete or
 * on_message_complete callback returns true, then this will be should be
 * the last message on the connection.
 * If you are the server, respond with the "Connection: close" header.
 * If you are the client, close the connection.
 */

Perhaps the intent was to say: if this function returns false, then this should be the last message.
This was sufficiently confusing to make me poke into the source and see. The intent is evidently that it returns false only
on HTTP/1.0 and below by default, or on HTTP/1.1 with Connection: close.

Overflow detection on chunk extensions

There's a TODO in the code:

/* just ignore this shit. TODO check for overflow */

Allowing unbounded chunk extensions is hardly the end of the world, but it doesn't hurt to address it. The easiest way I found was to treat the state for the chunk headers as if they were regular headers, and have the parser bail if the total amount of data in a chunk header is larger than HTTP_MAX_HEADER_SIZE.

See chunk_ext_overflow

URLs with user:password@host:port syntax don't parse

I'm playing with replacing a proxy at work with node, but cannot get some requests through to the http server's request handler because http-parser inappropriately rejects them.

The tests below reproduce the issues. The second one fails because the parser thinks it's looking at a port rather than a password in the URI. I can't explain why the first one fails.

Please excuse the weird paths in the diff.

diff --git a/deps/http_parser/test.c b/deps/http_parser/test.c
index 7d95b0e..c42e756 100644
--- a/deps/http_parser/test.c
+++ b/deps/http_parser/test.c
@@ -70,8 +70,46 @@ static int num_messages;
 
 /* * R E Q U E S T S * */
 const struct message requests[] =
+{ {.name= "env https_proxy=node curl -i https://secure.google.com/"
+  ,.type= HTTP_REQUEST
+  ,.raw= "CONNECT secure.google.com:443 HTTP/1.1\r\n"
+         "User-Agent: curl/7.21.4 OpenSSL/1.0.0a zlib/1.2.3 libidn/1.19\r\n"
+         "Proxy-Connection: Keep-Alive\r\n"
+         "\r\n"
+  ,.should_keep_alive= TRUE
+  ,.message_complete_on_eof= FALSE
+  ,.http_major= 1
+  ,.http_minor= 1
+  ,.method= HTTP_CONNECT
+  ,.request_url= "secure.google.com:443"
+  ,.num_headers= 2
+  ,.headers=
+    { { "User-Agent", "curl/7.21.4 OpenSSL/1.0.0a zlib/1.2.3 libidn/1.19" }
+    , { "Proxy-Connection", "Keep-Alive" }
+    }
+  ,.body= ""
+  }
+
+, {.name= "env ftp_proxy=node ftp ftp://user:pass@host/test"
+  ,.type= HTTP_REQUEST
+  ,.raw= "GET ftp://user:pass@host/test HTTP/1.0\r\n"
+         "User-Agent: OpenBSD ftp\r\n"
+         "\r\n"
+  ,.should_keep_alive= FALSE
+  ,.message_complete_on_eof= FALSE
+  ,.http_major= 1
+  ,.http_minor= 0
+  ,.method= HTTP_GET
+  ,.request_url= "ftp://user:pass@host/test"
+  ,.num_headers= 1
+  ,.headers=
+    { { "User-Agent", "OpenBSD ftp" }
+    }
+  ,.body= ""
+  }
 
 #define CURL_GET 0
-{ {.name= "curl get"
+, {.name= "curl get"
 ,.type= HTTP_REQUEST
 ,.raw= "GET /test HTTP/1.1\r\n"
 "User-Agent: curl/7.18.0 (i486-pc-linux-gnu) libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.1\r\n"
test.c:1572:7: error: unknown conversion type character 'z' in format

$ make
gcc -I. -DHTTP_PARSER_STRICT=1 -DHTTP_PARSER_DEBUG=1  -Wall -Wextra -Werror -O0
-g  -c http_parser.c -o http_parser_g.o
gcc -I. -DHTTP_PARSER_STRICT=1 -DHTTP_PARSER_DEBUG=1  -Wall -Wextra -Werror -O0
-g  -c test.c -o test_g.o
cc1.exe: warnings being treated as errors
test.c: In function 'test_no_overflow_long_body':
test.c:1572:7: error: unknown conversion type character 'z' in format
test.c:1572:7: error: too many arguments for format
test.c:1592:11: error: unknown conversion type character 'z' in format
test.c:1592:11: error: too many arguments for format
make: *** [test_g.o] Error 1




$ uname -a
MINGW32_NT-6.1 DAURN-PHENOM 1.0.17(0.48/3/2) 2011-04-24 23:39 i686 Msys



$ gcc -v
Using built-in specs.
COLLECT_GCC=C:\MinGW\bin\gcc.exe
COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/4.5.2/lto-wrapper.exe
Target: mingw32
Configured with: ../gcc-4.5.2/configure --enable-languages=c,c++,ada,fortran,obj
c,obj-c++ --disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgo
mp --disable-win32-registry --enable-libstdcxx-debug --enable-version-specific-r
untime-libs --disable-werror --build=mingw32 --prefix=/mingw
Thread model: win32
gcc version 4.5.2 (GCC)

Problem parsing URL

I am not sure if this is an error parsing URLs or if Firefox and Chrome are too lenient in their interpretation of the standards, but when the parser encounters the following URL from taobao.com

http://a.tbcdn.cn/p/fp/2010c/??fp-header-min.css,fp-base-min.css,fp-channel-min.css,fp-product-min.css,fp-mall-min.css,fp-category-min.css,fp-sub-min.css,fp-gdp4p-min.css,fp-css3-min.css,fp-misc-min.css?t=20101022.css

Firefox and Chrome both interpret the query string to begin with a '?' (i.e., "?fp-header..."), whereas the parser seems to skip over both '?' chars and returns the rest of the query string starting with "fp-header..."

I can create a patch if you want, but I worked around it in one of my callbacks instead.

--Sam

error parsing response

We encountered a bug while parsing a response from http://www.ufc.com.

The problem appears to be the actual server response, which is the following:

HTTP/1.1 200 OK
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
en-US Content-Type: text/xml // this is the problem
Content-Type: text/xml
Content-Length: 18293
Date: Fri, 23 Jul 2010 18:45:38 GMT
Connection: keep-alive

. ..10 .loop . . ..

The parser does not expect a blank space in state "s_header_field". We got it working just by adding a condition to ignore this space:

if (ch == ' ') { break; }

We don't know whether this fix causes other problems (at least none as far as we can tell).

I guess the browsers' parsers ignore this space, because the page works in them. Try it.

http-parser does not follow RFC 2616 when parsing headers; Prototype Ajax requests (Palm webOS) do not work

I wrote a small HTTP server with Node, just to test Node.js. While testing, it worked fine with my browser. When I tried to connect with my Palm Pre (emulator) via a Prototype JavaScript Ajax request, it just returned "success" with no responseText. I tracked down the problem with Wireshark and telnet and found that Prototype sends the following line inside the header:

X-$PrototypeBI-Version: 1.6.0.3

This line causes Node to end the current request without any notification. After some more testing I found that the "$" is the problem.

With help from #node.js I looked up RFC 2616, page 16, which says that "$" should be allowed as a token character. A look at the source of http_parser.c shows that not all characters that should be allowed actually are (line 111). I fixed it for my local installation by replacing the "0" with "$" at character 36.

It would be nice if this could be fixed for all the characters the RFC allows.

Public API for starting in the headers-state

Multipart bodies are almost like an HTTP request/response; they just don't have a request/status line:

------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="text1"

hallo welt test123

------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="text2"


------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="upload"; filename="hello.pl"
Content-Type: application/octet-stream

#!/usr/bin/perl

use strict;
use warnings;

print "Hello World :)\n"

------------0xKhTmLbOuNdArY--

It would be nice if it were possible to parse these headers with http-parser (the body would just appear as a streaming/EOF-detected body).

Assertion on s_headers_done

Line 934 assigns state = s_headers_done;

s_headers_done is not handled in the state machine's switch, so it falls through to the default case:

     assert(0 && "unhandled state");

To reproduce the error, use the following bytes (taken from Wireshark; ignore the xx bytes):

0020 xx xx xx xx xx xx xx xx 48 54 54 50 2f 31 2e 30
0030 20 32 30 30 20 4f 4b 0a 4c 61 73 74 2d 4d 6f 64
0040 69 66 69 65 64 3a 20 54 68 75 2c 20 33 31 20 4a
0050 61 6e 20 31 39 39 38 20 31 30 3a 31 32 3a 31 31
0060 20 43 53 54 0a 43 6f 6e 74 65 6e 74 2d 54 79 70
0070 65 3a 20 69 6d 61 67 65 2f 67 69 66 0a 43 6f 6e
0080 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 31 33 35
0090 30 31 0a 0a 47 49 46 38 39 61 d6 01 3c 00 87 00
00a0 00 00 00 00 00 00 33 00 00 66 00 00 99 00 00 cc
00b0 00 00 ff 33 00 00 33 00 33 33 00 66 33 00 99 33
00c0 00 cc 33 00 ff 66 00 00 66 00 33 66 00 66 66 00
00d0 99 66 00 cc 66 00 ff 99 00 00 99 00 33 99 00 66
00e0 99 00 99 99 00 cc 99 00 ff cc 00 00 cc 00 33 cc
00f0 00 66 cc 00 99 cc 00 cc cc 00 ff ff 00 00 ff 00
0100 33 ff 00 66 ff 00 99 ff 00 cc ff 00 ff 00 33 00
0110 00 33 33 00 33 66 00 33 99 00 33 cc 00 33 ff 33
0120 33 00 33 33 33 33 33 66 33 33 99 33 33 cc 33 33
0130 ff 66 33 00 66 33 33 66 33 66 66 33 99 66 33 cc
0140 66 33 ff 99 33 00 99 33 33 99 33 66 99 33 99 99
0150 33 cc 99 33 ff cc 33 00 cc 33 33 cc 33 66 cc 33
0160 99 cc 33 cc cc 33 ff ff 33 00 ff 33 33 ff 33 66
0170 ff 33 99 ff 33 cc ff 33 ff 00 66 00 00 66 33 00
0180 66 66 00 66 99 00 66 cc 00 66 ff 33 66 00 33 66
0190 33 33 66 66 33 66 99 33 66 cc 33 66 ff 66 66 00
01a0 66 66 33 66 66 66 66 66 99 66 66 cc 66 66 ff 99
01b0 66 00 99 66 33 99 66 66 99 66 99 99 66 cc 99 66
01c0 ff cc 66 00 cc 66 33 cc 66 66 cc 66 99 cc 66 cc
01d0 cc 66 ff ff 66 00 ff 66 33 ff 66 66 ff 66 99 ff
01e0 66 cc ff 66 ff 00 99 00 00 99 33 00 99 66 00 99
01f0 99 00 99 cc 00 99 ff 33 99 00 33 99 33 33 99 66
0200 33 99 99 33 99 cc 33 99 ff 66 99 00 66 99 33 66
0210 99 66 66 99 99 66 99 cc 66 99 ff 99 99 00 99 99
0220 33 99 99 66 99 99 99 99 99 cc 99 99 ff cc 99 00
0230 cc 99 33 cc 99 66 cc 99 99 cc 99 cc cc 99 ff ff
0240 99 00 ff 99 33 ff 99 66 ff 99 99 ff 99 cc ff 99
0250 ff 00 cc 00 00 cc 33 00 cc 66 00 cc 99 00 cc cc
0260 00 cc ff 33 cc 00 33 cc 33 33 cc 66 33 cc 99 33
0270 cc cc 33 cc ff 66 cc 00 66 cc 33 66 cc 66 66 cc
0280 99 66 cc cc 66 cc ff 99 cc 00 99 cc 33 99 cc 66
0290 99 cc 99 99 cc cc 99 cc ff cc cc 00 cc cc 33 cc
02a0 cc 66 cc cc 99 cc cc cc cc cc ff ff cc 00 ff cc
02b0 33 ff cc 66 ff cc 99 ff cc cc ff cc ff 00 ff 00
02c0 00 ff 33 00 ff 66 00 ff 99 00 ff cc 00 ff ff 33
02d0 ff 00 33 ff 33 33 ff 66 33 ff 99 33 ff cc 33 ff
02e0 ff 66 ff 00 66 ff 33 66 ff 66 66 ff 99 66 ff cc
02f0 66 ff ff 99 ff 00 99 ff 33 99 ff 66 99 ff 99 99
0300 ff cc 99 ff ff cc ff 00 cc ff 33 cc ff 66 cc ff
0310 99 cc ff cc cc ff ff ff ff 00 ff ff 33 ff ff 66
0320 ff ff 99 ff ff cc ff ff ff 00 00 00 00 00 00 00
0330 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0340 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0350 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0360 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0370 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0390 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
03a0 00 00 00 00 00 15 51 4d 0d 05 41 15 c9 b8 c0 0c
03b0 04 a0 0f 00 84 e4 13 00 50 00 00 00 b0 00 00 00
03c0 00 58 07 f0 00 0c fe ff ff 9f 5b 5b b7 32 52 3a
03d0 e7 68 b5 02 84 18 03 84 30 a6 c4 00 00 00 00 40
03e0 00 00 00 08 00 00 00 00 40 00 1c 7e 00 00 3c 02
03f0 00 00 b8 5f 00 00 10 ec c3 22 27 ad f6 e2 ac 37
0400 ef fe 83 a1 38 92 a5 79 a2 a9 ba b2 ad fb c2 b1
0410 3c d3 b5 7d e3 b9 be f3 bd ff 03 83 c2 21 b1 68
0420 3c 22 93 ca 25 b3 e9 7c 42 a3 d2 29 b5 6a bd 62
0430 b3 da 2d b7 eb fd 82 c3 e2 31 b9 6c 3e a3 d3 ea
0440 35 bb ed 7e c3 e3 f2 39 bd 6e bf e3 f3 fa 3d bf
0450 ef ff 03 06 0a 0e 12 16 1a 1e 22 26 2a 2e 26 0e
0460 08 08 04 04 3c 4e 0a 0c 0c 30 62 66 8a 59 3e 5a
0470 3a 56 56 5e a2 58 6a 96 9a 9a 5c 7a 8e 78 76 52
0480 56 72 88 6e 78 c6 9e d6 da ce e2 ce 86 82 58 4a
0490 82 fa 76 5e 10 e0 ca 7e 92 da 22 67 e6 1a 4f 0e
04a0 44 ee 7e 38 4a e2 82 4a a4 ea 3e 06 a8 5e b0 42
04b0 0e 10 24 87 23 52 bb 66 43 46 02 4c 83 77 e8 8a
04c0 e6 b6 9e 47 06 00 a4 7f 5b 70 9e 3f 12 ac 8b f7
04d0 07 76 c7 93 24 6f 5e ba 67 06 f8 91 e0 34 90 1e
04e0 c3 74 0e 25 21 2c c0 6a a0 80 88 fe 2e ee 91 96
04f0 0f 58 36 ee 7b 1c f7 69 30 c6 0d 52 c3 86 e8 e8
0500 4d 92 64 80 02 be 93 8f 0e 62 8c b9 27 e0 33 57
0510 04 29 19 38 20 71 db 4e 47 1e 05 54 10 50 f2 67
0520 4a 86 38 61 5a d3 48 14 a9 cc a6 75 b2 09 e4 38
0530 c0 80 51 03 56 ad 4a 14 28 4d 5e c9 57 12 48 32
0540 94 77 d5 ea cf b1 48 3f 4d 5a 6a cf 29 db 38 1b
0550 39 76 9a a7 d2 ea 3a 92 5c 87 7a ac 40 b0 a0 d5
0560 03 63 09 1e bc 0a 72 27 25 a3 02 0c d0 6a ab 58
0570 0d be 47 00 e0 55 02 bc b2 00 c8 bd 1e 7d 59 16
0580 70 80 9f d0 67 01 fc 1a e8 64 80 e0 e6 7d a6 bf
0590 ca eb f8 b8 d9 e4 c5 ae cf a0 ad 09 29 2e ca d6
05a0 12 f6 46 1a 6b f2 80 4e 0a a1 43 7b 4e 4d da f6
05b0 84 d4 aa 29 4d 7d ad bc cc c4 74 50 05 d2 63 7a
05c0 3b 6c ce ab 0d 35 5b 20 90 33 ec 75 d2 16 0b 50
05d0 04 eb 8a f8 f2 f2 5e 42 77 16 ea 31

Bug parsing content-length header value

We encountered a bug while parsing http://www.farmalive.com.ar/. We fixed it by simply ignoring the blank/space after the Content-Length value. The parser's state is "h_content_length". The source code should be:

case h_content_length:
  /* ---- begin add ---- */
  if (ch == ' ') break;  /* ignore space */
  /* ---- end add ---- */
  if (ch < '0' || ch > '9') goto error;
  parser->content_length = ch - '0';
  break;


provide an `error_code` member

I can end parsing a message by returning a non-zero value from a callback. So if parsing failed, I would like to see why it failed by looking at an error code.

Please provide something like an `error_code` member in the http_parser struct.
