I’ve been debugging a rather strange, timing-sensitive bug in a project that showed up after upgrading Jetty from 9.2.13 -> 9.4.8 (it still reproduces on 9.4.11), and have stumbled upon some suspicious behavior within Jetty’s request handling/HTTP parsing layers. I haven’t been able to isolate a standalone repro case yet, but while I work on doing so, I was wondering if any of this sounds familiar or suspicious to anyone?
Summary of conditions to repro:
- High volume of GET requests (200-400 per second)
- A fairly small payload per request (around 500 bytes including all HTTP headers)
- Somewhat sporadically (usually after 5-10 minutes, although it can take up to 30), we see an HTTP 400 error returned to the client, due to “unknown version” (i.e., invalid HTTP version spec in the request). However, there is nothing unusual about the request that fails (compared to all the thousands that succeed). I’ve confirmed via Wireshark that the exact request in question is perfectly valid.
- With similar probability, we sometimes see a duplicated response from an earlier request sent as the response to the subsequent request (so far, always on the same connection, but we haven’t ruled out that it might be happening across multiple connections). The end result is that all HTTP responses are shifted down by one, causing mayhem on the client side. According to Wireshark, the HTTP stream goes something like this:
I’ve narrowed the investigation down to what appears to be a race condition between filling on-heap buffers from an org.eclipse.jetty.io.ByteBufferPool with the HTTP request bytes and parsing the request out of those same buffers. In particular, when the 400 “unknown version” error hits, org.eclipse.jetty.http.HttpParser.parseNext(buffer) is called with a buffer that appears to be in fill mode (position at end-of-content, limit at max capacity); yet by the time the code has progressed to where the BAD_REQUEST_400 is thrown, the buffer has flipped back to flush mode (with position and limit both at the end-of-content). _methodString and _uri are both parsed to be gibberish (based on content that is read past the end of the buffer’s limit), and _version remains null.
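To make the two buffer states concrete, here is a minimal sketch in plain java.nio (not Jetty code) of what I mean by fill mode and flush mode, and of why reading from a heap buffer that is still in fill mode could plausibly yield gibberish. The class name BufferModes, the helpers describe and readFromPosition, and the request strings are all made up for illustration; this only models the symptom and says nothing about where the actual race is.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferModes {
    /** Formats the three indices that distinguish fill mode from flush mode. */
    public static String describe(ByteBuffer b) {
        return "position=" + b.position() + " limit=" + b.limit() + " capacity=" + b.capacity();
    }

    /**
     * Decodes `length` bytes of the backing array starting at the buffer's
     * current position. Hypothetical helper: it stands in for a reader that
     * starts at position without the buffer having been flipped first.
     */
    public static String readFromPosition(ByteBuffer b, int length) {
        return new String(b.array(), b.arrayOffset() + b.position(), length,
                StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // On-heap buffer, standing in for one handed out by a buffer pool.
        ByteBuffer buf = ByteBuffer.allocate(64);

        // An earlier, longer request leaves its bytes in the backing array.
        buf.put("GET /old-resource HTTP/1.1\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        buf.clear(); // resets position/limit only; the old bytes are not erased

        // Fill mode: after writing, position sits at end-of-content, limit at capacity.
        buf.put("GET /a HTTP/1.1\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println("fill mode:  " + describe(buf));

        // Reading from position in this state skips the real request entirely
        // and returns leftovers from the earlier one.
        System.out.println("stale tail: " + readFromPosition(buf, 7));

        // Flush mode: flip() moves limit to end-of-content and position to 0.
        buf.flip();
        System.out.println("flush mode: " + describe(buf));

        // Fully consumed flush-mode buffer: position == limit == end-of-content.
        buf.position(buf.limit());
        System.out.println("consumed:   " + describe(buf));
    }
}
```

In this sketch the last state (position and limit both at end-of-content) is what a flush-mode buffer looks like once everything in it has been consumed, which matches what I see at the point the 400 is thrown.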
Can anyone help confirm that the ByteBuffer behavior inside of HttpParser is unexpected? And does this, by any chance, look like anything else you all may have encountered?
Daniel Potter // Senior Software Engineer
PROS® // Powering Modern Commerce with Dynamic Pricing Science