seanmonstar / httparse
A push parser for the HTTP 1.x protocol in Rust.
Home Page: https://docs.rs/httparse
License: Apache License 2.0
Hello, the end of a chunked body should be terminated by two sets of \r\n, but parse_chunk_size considers one set to be enough, which makes it impossible to know when message transmission in the TCP stream is complete.
For example:
Correct:
let buf = b"0\r\n\r\n";
assert_eq!(httparse::parse_chunk_size(buf), Ok(httparse::Status::Complete((3, 0))));
Unexpected:
let buf = b"0\r\n";
assert_eq!(httparse::parse_chunk_size(buf), Ok(httparse::Status::Partial));
let buf = b"0\r\n\r";
assert_eq!(httparse::parse_chunk_size(buf), Ok(httparse::Status::Partial));
Hello, I'm trying to use this crate to implement a parser, but I'm having difficulties implementing the response-reading loop.
Here's my crate's code: https://github.com/MOZGIII/http-proxy-client-async
And I'm having issues with this section in particular:
https://github.com/MOZGIII/http-proxy-client-async/blob/d5d29ec06c5cd912e17ec358ee77860c7e8b4f61/src/http.rs#L39-L54
pub async fn receive_response<'buf, ARW>(stream: &mut ARW) -> io::Result<Vec<u8>>
where
ARW: AsyncRead + AsyncWrite + Unpin,
{
let mut response_headers = [httparse::EMPTY_HEADER; 16];
let mut buf = [0u8; 1024];
let mut response = httparse::Response::new(&mut response_headers);
let (consumed, total) = loop {
let total = stream.read(&mut buf).await?;
let result = response.parse(&buf[..total]);
match result {
Err(err) => return Err(io::Error::new(io::ErrorKind::InvalidData, err)),
Ok(httparse::Status::Complete(consumed)) => break (consumed, total),
Ok(httparse::Status::Partial) => continue,
};
};
let leftovers = Vec::from(&buf[consumed..total]);
Ok(leftovers)
}
The issue is with borrowing buf:
error[E0502]: cannot borrow `buf` as mutable because it is also borrowed as immutable
--> src/http.rs:44:33
|
44 | let total = stream.read(&mut buf).await?;
| ^^^^^^^^ mutable borrow occurs here
45 | let result = response.parse(&buf[..total]);
| -------- --- immutable borrow occurs here
| |
| immutable borrow later used here
Since this crate has a kind of unique API, how would you recommend solving this issue?
While RFC 7230 deprecated multiline headers (search for obs-fold in the RFC), they're still something you sometimes encounter. I noticed this while using the multipart_mime library, which in turn uses httparse to handle headers in a MIME message. Here's a failing test case:
req! {
test_multiline_header,
b"GET / HTTP/1.1\r\nX-Received: by 10.84.217.214 with SMTP id whatever;\r\n Wed, 21 Jun 2017 09:04:21 -0700 (PDT)",
|req| {
assert_eq!(req.headers.len(), 1);
}
}
I'm not sure how to actually fix the issue, but I figured I'd at least report the bug.
If the HTTP message begins with whitespace, the resulting error is "invalid HTTP version". This felt misleading, or at least less helpful than it could be, since the version (HTTP/1.1) in the start-line was fine. It would help if the error could ideally call out leading whitespace (as it may be subtle to spot), or at least say something like "invalid start-line" or "doesn't match HTTP format"; I think that could save people some time in tracking down this particular issue.
I am trying to use reqwest to parse a response from a server I don't control. reqwest uses hyper, which uses httparse for parsing HTTP/1.x headers. Anyway, this server has a weird bug where it consistently returns a single corrupted header line in an otherwise completely valid response (the header contains unescaped non-token characters). Specifically, for some reason it tries to send the DOCTYPE as a header. The bug is unlikely to be fixed (this is old software), but it isn't really a problem because the page displays fine in all major browsers.
It seems that all major browsers simply ignore invalid header lines. However, httparse returns an error that aborts the entire parsing process. IMO this is a problem and should be fixed.
Here's a screenshot from Chrome that shows the invalid header being ignored:
In fact, Chrome's behavior is commented as: "skip malformed header".
Although technically changing this could be breaking, in this case, I can't imagine that any code would rely on response parsing to fail in this particular case.
Here are the relevant lines:
Lines 594 to 595 in 6f696f5
Lines 613 to 614 in 6f696f5
Line 672 in 6f696f5
Lines 613 to 614 in 6f696f5
I think all of these would be fixed by consuming b until the next newline, then continue 'headers. I would open a PR, but I just want to check that you agree that this change should be made.
Hello Team,
This doesn't qualify as an issue but is more of an example request. I am trying to build a Rust client, based on no-std, for making an HTTP API call. I am new to this.
Please can you give me an example for the following API calls with the httparse crate?
Call to get an OAuth token in the response, passing username and password:
curl -d "username=hello&password=world" -H "Content-Type: application/x-www-form-urlencoded" -X POST http://localhost:8080/login
Call to get data using the token:
curl --get http://localhost:8080/user/api/data --header "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJyZW5hdWx0IiwibmJmIjoxNjMxNzEzMTcyLCJleHAiOjE2MzE3OTk1NzIsInVzZXJJZCI6IjEiLCJhdXRob3JpdGllcyI6IlVTRVIiLCJ1c2VybmFtZSI6InJlbmF1bHQifQ.Ktsg_084LPg8KSZnKqdloRjdHBQzEeuBGAka8CHcrUIA6kubvMGBq03qWYKhUP-_FrBZKOd5eb2DuUt24K0TLcaM-meGNtUvSDU-0wVZIxEwgSTHbVZ2QRf9eNuSkcW7s1QHg29hzxZ2_f2KHNZWqVjSs4JxqExXYPkxLhidYT8d_22oLWcDMtnfdUZ6fhmsRZ-jV0h-sB_zV0z3dBY9ZNL_KduYhdCGzXpGPjfpieYJAieDqc2P1Gy2N1gk88eCKvsYAs011egSBmhRGy-fJuU_Y4rlvdxa5I6pjec_vvWBMVMxGtyxLjHCFn8VQW59DnUA5hEOVRzgv7l_IiZrvA"
Thanks for the help.
Hello, first, thanks for making this tool.
I wanted to point out your benchmark is a bit unfair as you compare httparse sse4 against picohttpparser without sse4. The reason picohttpparser doesn't have sse4 is because your dependency 'pico-sys' does not compile picohttpparser with sse4 enabled.
Your benchmark showed a ~60% improvement in performance for 'bench_pico' once sse4 was enabled in the underlying crate.
I forked the underlying crate 'pico-sys' and made a few modifications if you want to verify my results:
AFAICT the HTTP spec does not limit the chunk size. Yet, parse_chunk_size tries to parse it into a u64. If a value is provided that doesn't fit in a u64, the multiplication operator will overflow in release mode or panic in debug mode. Instead, the function should return an error. This might be a good use for checked_mul.
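The fix direction seems straightforward. Here is a minimal sketch of overflow-safe hex accumulation with checked_mul (illustrative only, not httparse's actual internals, which also handle chunk extensions and CRLF framing):

```rust
/// Fold hex digits into a u64, returning an error instead of overflowing.
/// Illustrative sketch only; the real parser has more states to handle.
fn parse_hex_size(digits: &[u8]) -> Result<u64, &'static str> {
    let mut size: u64 = 0;
    for &b in digits {
        let d = match b {
            b'0'..=b'9' => u64::from(b - b'0'),
            b'a'..=b'f' => u64::from(b - b'a' + 10),
            b'A'..=b'F' => u64::from(b - b'A' + 10),
            _ => return Err("invalid chunk size digit"),
        };
        // checked_mul / checked_add turn the silent wrap (or debug panic)
        // into a recoverable parse error.
        size = size
            .checked_mul(16)
            .and_then(|s| s.checked_add(d))
            .ok_or("chunk size overflows u64")?;
    }
    Ok(size)
}
```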
Hello!
I discovered recently that many user agents send raw UTF-8 bytes in HTTP paths, and proxies forward them without issues. See actix/actix-web#3102
However, it looks like this library fails to parse such HTTP requests, preventing web servers from handling them.
Would it be possible to handle such requests in this library?
This code does not uphold Rust safety invariants: either debug_assert!() should be assert!() or the function must be marked unsafe fn:
Lines 38 to 43 in 6f696f5
Also, it's weird to see a custom function for this - slice[..len] looks like it should be sufficient, but I'm probably missing something.
I'm trying to parse multipart/form-data form submissions. I can extract and pass on the header lines of each part into a parser. But which parser? parse_headers() is private and it's not obvious how to use it. Advice?
It might be inconsistent to have a space in header names, but some legacy systems do. Is it possible to support them in httparse?
For my purposes I need to completely ignore headers; I am only interested in the request method and path.
Is there a way to ignore the TooManyHeaders error and get all other request data?
We're using actix-web library in our project and it uses this library for HTTP-parsing.
We are in a situation where we need to accept requests with non-RFC2396 characters (like the caret ^) in query parameters.
Urls are considered human interface and humans shouldn't be expected to handle urlencoding in these situations.
We checked other parser implementations from Nginx (https://github.com/nginx/nginx/blob/master/src/http/ngx_http_parse.c#L13) and Node.js (https://github.com/nodejs/http-parser/blob/master/http_parser.c#L187) and they seem to be more liberal, allowing more characters than the httparse implementation.
Should we find out some workaround or would same kind of implementation be in the scope of httparse?
Hi,
First, thanks for this great library!
Is it possible to parse around the edge of a ring buffer boundary? E.g by providing two buffers as input to the parser? Currently I'm shifting all remaining bytes down after every successful parse but it would be nice to not have to do that. Any advice?
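Absent two-buffer support, the shifting can at least be done in place; a small sketch of the compaction step the question describes (plain std, no httparse involved):

```rust
// Move the unparsed bytes at buf[consumed..filled] down to the front of the
// buffer, returning the new fill level. copy_within handles overlapping
// ranges safely, so no second buffer is needed.
fn compact(buf: &mut [u8], consumed: usize, filled: usize) -> usize {
    buf.copy_within(consumed..filled, 0);
    filled - consumed
}
```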
code:
#![feature(plugin)]
#![plugin(afl_coverage_plugin)]
extern crate afl_coverage;
extern crate httparse;
use std::io::{self, Read};
fn main() {
let mut input = String::new();
let result = io::stdin().read_to_string(&mut input);
if result.is_ok() {
/*
{
let mut headers = [httparse::EMPTY_HEADER; 16];
let mut req = httparse::Request::new(&mut headers);
req.parse(input.as_bytes());
}
*/
{
let mut headers = [httparse::EMPTY_HEADER; 16];
let mut res = httparse::Response::new(&mut headers);
res.parse(input.as_bytes());
}
}
input: (this is encoded in base64, decode it before feeding it in)
SFRUUC8xLjESMjAw
error:
root@vultr:~/afl-staging-area2# cargo run < outputs/crashes/id:000002,sig:04,src:000001,op:havoc,rep:2
Running `target/debug/afl-staging-area2`
thread '<main>' panicked at 'arithmetic operation overflowed', /root/httparse/src/lib.rs:34
An unknown error occurred
To learn more, run the command again with --verbose.
This bug was found using https://github.com/kmcallister/afl.rs
RFC7230#Field parsing states that parsers should remove leading and trailing whitespace from header field values. Currently httparse only removes leading whitespace; removing trailing whitespace would make httparse easier to use for other crates.
The following two requests should be parsed the same:
GET / HTTP/1.1\r\nHost: foo.com\r\nUser-Agent: foobarsoft\r\n\r\n
GET / HTTP/1.1\r\nHost: foo.com\r\nUser-Agent: foobarsoft \t \t \r\n\r\n
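Until httparse trims it, callers can strip the trailing optional whitespace (RFC 7230 "OWS": SP and HTAB) themselves; a minimal sketch:

```rust
// Trim trailing SP / HTAB (RFC 7230 "OWS") from a header value slice.
fn trim_trailing_ows(value: &[u8]) -> &[u8] {
    let mut end = value.len();
    while end > 0 && (value[end - 1] == b' ' || value[end - 1] == b'\t') {
        end -= 1;
    }
    &value[..end]
}
```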
I was trying to use hyper and reqwest on a project yesterday to communicate with a server internal to my workplace. When trying to create a client using either crate, it returns an HTTP(Status) error, stating "Invalid Status provided".
Here's the strace of the executable, changed only to remove private server details from the request and response headers:
sendto(3, "GET /{OMITTED} HTTP/1.1\r\nHost: {OMITTED}\r\nAccept: */*\r\nUser-Agent: reqwest/0.1.0\r\n\r\n", 103, 0, NULL, 0) = 103
read(3, "HTTP/1.1 200\r\nServer: nginx/1.4.6 (Ubuntu)\r\nDate: Fri, 02 Dec 2016 21:18:20 GMT\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nContent-Language: en\r\n\r\n1fba\r\n<html>\n<head>
{Output Truncated}
write(1, "Failed because: Invalid Status provided\n", 40Failed because: Invalid Status provided
) = 40
+++ exited with 0 +++
After discussing this in the Rust IRC, we suspect that httparse might be failing on the fact that the custom webserver is returning "HTTP/1.1 200\r\n", with no SP or Reason Phrase following the status code before the CRLF.
CCing @joshtriplett on this, as well, since he helped me find the issue
Sorry, I edited my issue, mixed up two problems at once.
curl -v "https://videoroll.net/vpaut_option_get.php?pl_id=6577"
Note the space after Access-Control-Allow-Credentials.
< HTTP/1.1 200 OK
< Server: nginx/1.16.0
< Date: Tue, 16 Mar 2021 09:03:39 GMT
< Content-Type: text/json;charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials : true
< Expires: Tue, 23 Mar 2021 09:03:39 GMT
< Cache-Control: max-age=604800
What does the spec say about this?
No whitespace is allowed between the header field-name and colon. In the past, differences in the handling of such whitespace have led to security vulnerabilities in request routing and response handling. A server MUST reject any received request message that contains whitespace between a header field-name and colon with a response code of 400 (Bad Request). A proxy MUST remove any such whitespace from a response message before forwarding the message downstream.
So first, I want to point out that the security vulnerabilities can't really happen through httparse (or at the very least, through Hyper), given Hyper does not keep around the literal text of the HTTP request, and thus those pesky spaces cannot be passed to anyone downstream AFAIK.
Second, and this is what matters to me (wearing my work hat), is that the spec itself implies that a proxy has to be able to parse those anyway to remove the headers, thus it needs to successfully be able to parse such a response. httparse (and thus Hyper) does not let any of that pass through, which is a problem for people implementing proxies.
Making a patch that makes those spaces not fail the entire parse is pretty trivial, but I wonder if we want a relaxed_response_parsing option of some sort. Such an option exists in Squid.
In the browser space, Firefox (and AFAIK Chrome too) happily ignores spaces between the header name and the colon, and will also ignore spaces in header names themselves (ignoring the whole "name: value" pair and just skipping to the next header in the response).
curl -v https://crlog-crcn.adobe.com/crcn/PvbPreference -X POST
Note the famous Updated Preferences: [] header in the response.
> POST /crcn/PvbPreference HTTP/1.1
> Host: crlog-crcn.adobe.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200
< Updated Preferences: []
< Content-Length: 2
< Date: Tue, 16 Mar 2021 08:43:38 GMT
<
We can discuss whether or not to let that through at a later date.
I would like to build a function
fn read_headers<'a,'b,R: BufRead>(
stream: &mut R,
buf: &'b mut Vec<u8>,
headers: &'a mut [Header<'b>]
) -> Result<Request<'a,'b>,E>
that reads as much from stream into buf as necessary to get a Complete return from Request::parse. This turns out to not be trivial and requires lots of extra allocations and work.
Here's what I came up with:
fn read_headers<'a,'b,R: BufRead>(clnt: &mut R, buf: &'b mut Vec<u8>, headers: &'a mut [Header<'b>]) -> Result<Request<'a,'b>,String> {
fn extend_and_parse<R: BufRead>(clnt: &mut R, headers: &mut [Header]) -> Result<Vec<u8>,String> {
let mut buf=Vec::<u8>::new();
let len=headers.len();
loop {
let buf_orig_len=buf.len();
let additional_len={
let additional=try!(clnt.fill_buf().map_err(|e|e.to_string()));
buf.extend_from_slice(additional);
additional.len()
};
let mut headers=Vec::with_capacity(len);
headers.resize(len,httparse::EMPTY_HEADER);
let mut req=Request::new(&mut headers);
match req.parse(&buf) {
Ok(httparse::Status::Complete(n)) => {
clnt.consume(n-buf_orig_len);
break
},
Ok(httparse::Status::Partial) => {
clnt.consume(additional_len);
}
Err(e) => return Err(format!("HTTP parse error {:?}",e)),
};
}
Ok(buf)
}
let result=extend_and_parse(clnt,headers);
result.map(move|nb|{
::core::mem::replace(buf,nb);
let mut req=Request::new(headers);
req.parse(buf);
req
})
}
The main issues are having to allocate a new array of temporary headers for every iteration, and having to parse the successful result twice.
I think this is partially Rust's fault, but also partially httparse's, for having a not-so-great API. For example, the lifetime of everything is fixed upon Request creation, so a parse failure doesn't release the borrow of the input buffer.
Context: hyper passes an uninitialized array of httparse::Header to Request::new.
This is undefined behavior, and I discovered it while working on replacing mem::uninitialized with MaybeUninit.
I acknowledge that changing the headers field will be a breaking change, because it is public, so is there a chance that a Request type copy could be implemented, but with headers: &mut [MaybeUninit<Header>]?
EDIT: I just realized that the author of httparse is also the author of hyper.
These were added in #40, but understanding exactly what is happening is difficult. It'd be best to document what in the world is happening :)
Beginner here; I have not understood lifetimes completely. I am trying to return a parsed Request from a function like this:
fn read_and_parse(mut stream: TcpStream) -> Option<Request> {
..
return req;
}
I get this error -
error[E0107]: wrong number of lifetime parameters: expected 2, found 0
--> src/main.rs:64:61
|
64 | fn read_and_parse(mut stream: TcpStream) -> Option<&'static Request> {
| ^^^^^^^ expected 2 lifetime parameters
error: aborting due to previous error
How do I solve this?
Miri is currently choking on the use of is_x86_feature_detected!() here: https://github.com/seanmonstar/httparse/blob/master/src/simd/mod.rs#L74
Miri is probably not going to be supporting SIMD anytime soon anyway, so it'd be nice if we could use #[cfg(miri)] to turn off feature detection entirely and just use the naive algorithms.
This could potentially happen automatically but this crate may need attention in other places in order to work in Miri.
It seems that commit alexcrichton/libc@1791046 in the libc dependency broke the bench_pico benchmark. The type alias size_t now stands for usize, not for u32/u64 (architecture dependent) as it did before.
In the "Usage" part of the README file:
assert!(req.parse(buf)?.is_partial());
throws an error:
cannot use the ? operator in a function that returns ()
The correct usage is as in the documentation:
req.parse(buf).unwrap().is_partial()
I am trying to run:
cargo +nightly clippy
on my project and I get this error:
error[E0658]: macro is_x86_feature_detected! is unstable (see issue #0)
--> /Users/marco/.cargo/registry/src/github.com-1ecc6299db9ec823/httparse-1.3.2/src/simd/mod.rs:72:48
|
72 | if cfg!(target_arch = "x86_64") && is_x86_feature_detected!("avx2") {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: add #![feature(stdsimd)] to the crate attributes to enable
error[E0658]: macro is_x86_feature_detected! is unstable (see issue #0)
--> /Users/marco/.cargo/registry/src/github.com-1ecc6299db9ec823/httparse-1.3.2/src/simd/mod.rs:75:23
|
75 | } else if is_x86_feature_detected!("sse4.2") {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: add #![feature(stdsimd)] to the crate attributes to enable
error: aborting due to 2 previous errors
I have tried to add httparse in my dependencies with:
#![feature(stdsimd)]
extern crate httparse;
but it stays the same.
Is it possible to get some help with this?
Thanks,
Marc-Antoine
Lines 284 to 285 in 6f696f5
It suggests version could contain HTTP/1.1, but version is a u8. I believe it should say something like "For HTTP/1.1 this will contain 1".
tldr: Please change the documentation demo line from
let mut headers = [httparse::EMPTY_HEADER; 16];
let mut req = httparse::Request::new(&mut headers);
to
let mut headers = [httparse::EMPTY_HEADER; 64];
let mut req = httparse::Request::new(&mut headers);
This is a really silly bug that bit us.
Our server uses httparse. Since we weren't quite sure how many headers we needed to pre-allocate, we used the demo code's suggested 16 headers. Unfortunately, with work going on around the new Sec-* headers, we suddenly saw a spike of clients trying to connect with 17 headers. It took us a bit to figure out what was going on and why these connections were mysteriously failing.
Boosting the count might help others not suddenly hit this problem.
It would be nice if one could (or at least had an option to) get the body of the request while parsing. The body of an HTTP request is a byte sequence anyway, so it could just be stored in the Request struct as &[u8], and further manipulation could be left to the user.
I've got a pcap file from tcpdump.
I tried to use httparse to process those packets. However, I found it will return Status::Complete even if the packet is not complete.
I think this is caused by not processing the body.
As a Rust novice, I translated the relevant code from Go's http library and it works fine now. How do I submit this code? Or can someone help me submit it?
Here is my hyper trace
TRACE hyper::client::pool > checkout waiting for idle connection: "http://10.200.14.75:8000"
TRACE hyper::client::connect::http > Http::connect; scheme=http, host=10.200.14.75, port=Some(8000)
DEBUG hyper::client::connect::http > connecting to 10.200.14.75:8000
DEBUG hyper::client::connect::http > connected to Some(V4(10.200.14.75:8000))
TRACE hyper::client::conn > client handshake HTTP/1
TRACE hyper::client > handshake complete, spawning background dispatcher task
TRACE hyper::proto::h1::conn > flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Busy }
TRACE hyper::client::pool > checkout dropped for "http://10.200.14.75:8000"
TRACE hyper::proto::h1::role > Client::encode method=POST, body=Some(Known(6))
DEBUG hyper::proto::h1::io > flushed 152 bytes
TRACE hyper::proto::h1::conn > flushed({role=client}): State { reading: Init, writing: KeepAlive, keep_alive: Busy }
TRACE hyper::proto::h1::conn > Conn::read_head
DEBUG hyper::proto::h1::io > read 172 bytes
TRACE hyper::proto::h1::role > Response.parse([Header; 100], [u8; 172])
TRACE hyper::proto::h1::conn > State::close_read()
DEBUG hyper::proto::h1::conn > parse error (invalid HTTP status-code parsed) with 172 bytes
DEBUG hyper::proto::h1::dispatch > read_head error: invalid HTTP status-code parsed
Error invalid HTTP status-code parsed
TRACE hyper::proto::h1::conn > State::close()
TRACE hyper::proto::h1::conn > flushed({role=client}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
TRACE hyper::proto::h1::conn > shut down IO complete
The same call works fine with the curl command:
curl -d 'abcdef1234abcdef=27062019112303|' http://10.200.14.75:8000 -v
POST / HTTP/1.1
Host: 10.200.14.75:8000
User-Agent: curl/7.47.0
Accept: */*
Content-Length: 32
Content-Type: application/x-www-form-urlencoded
Any idea what is wrong here? My Rust code is:
extern crate hyper;
extern crate pretty_env_logger;
use std::io::{self, Write};
use hyper::{Client, Request, Body};
use hyper::rt::{self, Future, Stream};
fn main() {
pretty_env_logger::init();
let url = "http://10.200.14.75:8000".to_string();
let url = url.parse::<hyper::Uri>().unwrap();
if url.scheme_part().map(|s| s.as_ref()) != Some("http") {
println!("This example only works with 'http' URLs.");
return;
}
rt::run(fetch_url(url));
}
fn fetch_url(url: hyper::Uri) -> impl Future<Item=(), Error=()> {
let mut request = Request::builder();
let req = request
.method("POST")
.uri("http://10.200.14.75:8000")
.header("User-Agent", "my-awesome-agent/1.0")
.header("Content-Type", "application/x-www-form-urlencoded").body(Body::from("Hallo!"))
.expect("request builder");
let client = Client::builder()
.keep_alive(true)
.build_http();
client
.request(req)
.and_then(|res| {
println!("Response: {}", res.status());
println!("Headers: {:#?}", res.headers());
res.into_body().for_each(|chunk| {
io::stdout().write_all(&chunk)
.map_err(|e| panic!("example expects stdout is open, error={}", e))
})
})
.map(|_| {
println!("\n\nDone.");
})
.map_err(|err| {
eprintln!("Error {}", err);
})
}
This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic
on IRC to discuss.
You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.
TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.
The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.
Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with
it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.
To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:
## License
Licensed under either of
* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)
at your option.
### Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):
// Copyright 2016 httparse Developers
//
// Licensed under the Apache License, Version 2.0, <LICENSE-APACHE or
// http://apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or
// http://opensource.org/licenses/MIT>, at your option. This file may not be
// copied, modified, or distributed except according to those terms.
It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.
Be sure to add the relevant LICENSE-{MIT,APACHE}
files. You can copy these
from the Rust repo for a plain-text
version.
And don't forget to update the license
metadata in your Cargo.toml
to:
license = "MIT OR Apache-2.0"
I'll be going through projects which agree to be relicensed and have approval
by the necessary contributors and doing this changes, so feel free to leave
the heavy lifting to me!
To agree to relicensing, comment with:
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.
Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.
Hi,
It is not an issue, but more of a question. There is no function to easily search/find a header in the header array. Is it because the need was not there, or is it because I missed something that could help me do such a job?
Same thing with the method. It returns an Option<&str>, but it could be useful to know quickly whether it is a GET, a POST...
For now, I wrote functions on my side to help me, but I'm wondering if they could be useful in the crate itself.
Thanks
In the header name map, index 34 is false. So when parsing headers, the program will return Error::HeaderName when meeting double quotes (ASCII 34). However, double quotes in headers can be parsed correctly in Chrome. Is this intentional or a bug?
Test Example:
#[test]
fn test_double_quotes() {
    let bytes = b"HTTP/1.1 200 OK\r\nServer: nginx/1.14.2\r\nDate: Mon, 25 Jan 2021 06:20:06 GMT\r\nContent-Type: image/png\r\nContent-Length: 24623\r\nConnection: keep-alive\r\n\"Access-Control-Allow-Origin: *\"\r\nAccept-Ranges: bytes\r\nAccess-Control-Allow-Origin: *\r\nCache-Control: 2592000\r\n\r\n";
    // mem::uninitialized() is undefined behavior; EMPTY_HEADER works fine.
    let mut headers = [httparse::EMPTY_HEADER; 10];
let mut res = Response::new(&mut headers);
let parsed_res = res.parse(bytes);
println!("parsed res= {:?}", parsed_res);
}
I am building an HTTP server on top of tokio that needs to perform minimal memory allocations per client TCP connection, regardless of how many HTTP requests are received through the connection. Therefore I have chosen to use a fixed-size circular buffer to store the raw request data read from the wire, and now I am trying to use httparse to parse the request information. The problem I have run into is that the Request::parse function takes in a single &[u8], but because I'm using a circular buffer I have two slices: one for the bytes in the remainder of the buffer, and one (optionally) for the bytes which wrapped around to the front of the buffer. This two-buffer approach works very well with vectored IO reads, but not so well so far with httparse.
At first I was hoping that httparse's Request type would be persistent, so I could call parse in turn for both of the slices. But that appears to not be how the API works: it expects the one slice to have all the data, and when you call it a second time the same data should still be present, only with more added to the end.
Consequently, the only way I can find to use httparse today is to perform a copy of the data from the circular buffer into a secondary contiguous buffer. But the cost of such copying is potentially significant and I'd prefer to avoid it where possible. How feasible would it be to add some sort of parse_vectored function to httparse which takes a &[std::io::IoSlice]?
According to RFC7230, header field values seem to be able to include horizontal tabs.
While updating the rust-httparse package in Debian, I noticed a failure building the tests with --no-default-features, due to the use of std::u64::MAX; replacing it with core::u64::MAX fixed the issue.
Noticed this in RFC-7230:
A user agent that receives an obs-fold in a response message that is
not within a message/http container MUST replace each received
obs-fold with one or more SP octets prior to interpreting the field
value.
I think currently httparse will just reject a line fold. This paragraph seems to suggest it should reinterpret it with spaces. Is this interpretation correct?
I am running into this problem (and an invalid SOH character) with an internal server.
Thanks.
Our app parses a request but doesn't need to do anything with it until the body is also received. In order to allow the underlying buffer to be used for reading the body, and also to avoid parsing the request twice, the app converts the various slices (method, path, headers) into integer indexes within the underlying buffer and then drops the Request in order to unborrow the buffer. Later on, the slices can be reestablished using the indexes.
As of httparse 1.8, our app started failing due to slice locations potentially existing outside of the buffer, likely due to "GET" and "POST" now being returned as static strings.
In hindsight I suppose we were abusing the API. httparse never guaranteed the slices would always point within the buffer. It was just an easy assumption to make since httparse is known to do in-place parsing without copying. I'm not sure if anything should be changed in httparse and we will look at reworking our code. Posting this in case anyone else ran into the same issue.
I'm not sure if this is the right place to ask, but I'm looking through the documentation and I don't see how to use this library to get the body of a request or a response. Neither of the two structs has a field for the body, and the parse method just gives back an index; is that the byte index at which the body begins?
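For what it's worth: yes, on Status::Complete(n) the index n is the number of bytes the head consumed, so &buf[n..] is where the body starts (however much of it has been read so far). The split can be sketched without httparse, assuming the head ends at the first blank line:

```rust
/// Split a raw HTTP/1.x message into (head, body) at the blank line.
/// The position of the split plays the same role as the index httparse
/// returns in `Status::Complete(n)`: everything from that offset on
/// belongs to the body.
fn split_head_body(raw: &[u8]) -> Option<(&[u8], &[u8])> {
    raw.windows(4)
        .position(|w| w == b"\r\n\r\n")
        // Include the terminating CRLFCRLF in the head, as httparse does.
        .map(|pos| (&raw[..pos + 4], &raw[pos + 4..]))
}
```

How much body to then read is governed by Content-Length or chunked encoding, which the caller has to handle from the parsed headers.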
The specific error I get is "error[E0425]: cannot find function _tzcnt_u64 in this scope". I assume this is because it's a 32-bit platform, but it could be something else.
Simplest reproduction:
extern crate httparse;
use httparse::{EMPTY_HEADER, parse_headers};

fn main() {
    let mut buf = *b"Foo: Bar\r\n\r\n";
    let mut headers = [EMPTY_HEADER];
    let headers_len = {
        let (_, headers) = parse_headers(&mut buf, &mut headers).unwrap().unwrap();
        headers.len()
    };
    assert_eq!(headers_len, 1);
    buf[0] = b'B';
    // Prints "Boo"
    println!("{:?}", headers[0].name);
}
As you can see, parse_headers() allows borrows of buf to escape in headers, creating a double borrow where the original buffer can be mutated while views into it still exist.
Discovered by accident: I was working on some infinite-loop bugs in multipart when I did a double-take at this function and thought, "Wait a minute, how the hell did this work to begin with?" The r.consume() at line 80 shouldn't be allowed, but the borrow is escaping.
This crate could implement the parsing mechanism for the http crate, which everyone already uses. It would be really cool if this lib could serialize and deserialize http::Request and http::Response.
When parsing a response, the number of headers is limited. However, if an application knows upfront which headers it cares about, all other headers could be ignored and simply omitted from the result. It would be great if the user could provide a list of "expected" (or "allowed") headers, and only those headers would be recorded in the result.
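Until something like that exists, callers can filter after parsing. The sketch below uses plain (name, value) tuples as a stand-in for httparse::Header so it stays self-contained; the function and its shape are hypothetical, not httparse API:

```rust
/// Keep only the headers whose names appear in `allowed`.
/// Header names are compared ASCII case-insensitively, as HTTP requires.
fn filter_headers<'a>(
    headers: &[(&'a str, &'a [u8])],
    allowed: &[&str],
) -> Vec<(&'a str, &'a [u8])> {
    headers
        .iter()
        .filter(|(name, _)| allowed.iter().any(|a| a.eq_ignore_ascii_case(name)))
        .copied()
        .collect()
}
```

This doesn't save the parsing work or the header-array slots the way an in-parser allow-list would, which is presumably the point of the feature request.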
Currently, it appears to return Status::Partial even though a zero-sized buf generally means the end of a stream. It should probably return Status::Complete((0, &[])) instead.
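In the meantime, the caller can detect end-of-stream itself: a blocking read returning 0 means EOF, so the I/O loop can stop feeding the parser rather than waiting for the parser to report completion. A sketch of that loop shape, assuming a blocking std::io::Read source (names are mine):

```rust
use std::io::Read;

/// Read until EOF, collecting everything that arrives. A caller driving
/// httparse would treat the `n == 0` case the same way: stop looping and
/// decide completion from what has been buffered, instead of handing the
/// parser an empty slice and expecting Complete back.
fn read_until_eof<R: Read>(mut src: R) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = [0u8; 1024];
    loop {
        let n = src.read(&mut chunk)?;
        if n == 0 {
            // Zero bytes from a blocking reader means end of stream.
            break;
        }
        out.extend_from_slice(&chunk[..n]);
    }
    Ok(out)
}
```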
It would be nice to be able to access the error descriptions as &'static str via the now-private Error::description_str, especially because std::error::Error::description is deprecated.
Typical offenders:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
or
HTTP/1.1 404 Not Found
I'm not sure if this is worth fixing, but it will almost certainly break some things (it broke our tests, at least): in 1.4, parsing a request would extract headers even if the string was terminated with a single newline; that behavior changed in 1.5, which now requires two newlines. A reproducer is below:
#[test]
fn test_httparse() {
    let works_1_4_1 = b"GET /?Param2=value2&Param1=value1 HTTP/1.1\nHost:example.com\n";
    let works_1_5 = b"GET /?Param2=value2&Param1=value1 HTTP/1.1\nHost:example.com\n\n";
    let mut headers = [httparse::EMPTY_HEADER; 64];
    let n_headers =
        |req: httparse::Request| req.headers.iter().filter(|h| !h.name.is_empty()).count();
    {
        // test for 1.4.1
        let mut req = httparse::Request::new(&mut headers);
        let _ = req.parse(works_1_4_1).unwrap();
        assert_eq!(n_headers(req), 1, "failed on 1.4.1 test");
    }
    {
        // test for 1.5
        let mut req = httparse::Request::new(&mut headers);
        let _ = req.parse(works_1_5).unwrap();
        assert_eq!(n_headers(req), 1);
    }
}