Harden chunked transfer-encoding decoder#5
Conversation
There was a problem hiding this comment.
Build & Tests
CI passes on both macOS and Ubuntu. Compiled the C code locally and ran 21 test cases against the hardened decoder: normal chunks, zero-length end-of-stream, invalid hex, negative sizes, chunk extensions (;), the 16 MiB cap (exact boundary and one-over), huge hex overflow, empty lines, incomplete data, leading spaces, plus signs, and zero-chunk-with-extension. 20/21 pass — the one "failure" is finding #1 below.
Findings
1. strtol base-16 accepts 0x prefix (cosmetic)
strtol(buf, &endptr, 16) silently accepts 0x/0X per the C standard. Input "0x5\r\nhello\r\n" parses as chunk-size 5 and is accepted because endptr lands on \r, passing the check. Per RFC 7230 §4.1, chunk-size = 1*HEXDIG — the 0x prefix is not valid ABNF. In practice no server sends this, so it's cosmetic, but worth noting. A strict fix would be to verify endptr - buf contains only hex digits. Not a blocker.
2. isxdigit cast is correct (src/chunked.h:36)
The (unsigned char) cast on buf[0] correctly prevents UB when char is signed and the high bit is set. Good.
3. endptr validation is sound (src/chunked.h:43)
endptr == buf || (endptr != (char *)crlf && *endptr != ';') correctly rejects: nothing consumed, unexpected stop characters, and trailing whitespace (e.g. "5 \r\n" — endptr stops at the space, which is neither crlf nor ;). Leading whitespace is already caught by the isxdigit check.
4. Overflow check is sound (src/chunked.h:58)
The 16 MiB cap catches large values before the INT_MAX overflow check matters on 64-bit. strtol overflow (LONG_MAX with errno = ERANGE) is caught by the cap since LONG_MAX > CHUNKED_MAX_CHUNK_SIZE on all platforms. The code doesn't check errno == ERANGE, but the cap makes it unnecessary.
5. Carp-side cond change is correct (http-client.carp:149-170)
The old fallthrough assumed rc was always 0 in the else branch. The new code explicitly matches rc == 0 for "need more data" and treats any negative value as terminal (set-done!, Nothing, break). Correct for both -1 (end-of-stream) and -2 (parse error).
6. 16 MiB cap is reasonable
Per-chunk, not per-response — streaming large responses still works. Matches common HTTP implementation defaults.
Verdict: merge
Solid hardening. The validation is thorough and correctly ordered (reject garbage early → validate parsed value → check overflow). The Carp-side change correctly generalizes error handling for the new return code. The 0x prefix acceptance is the only protocol nit and is harmless in practice.
Summary
Hardens
chunked_decode_oneinsrc/chunked.hagainst malicious or malformed chunked responses:strtolis checked viaendptrto ensure it stopped at an expected delimiter (\ror;for chunk extensions per RFC 7230)header_len + chunk_size + 2calculation againstINT_MAXoverflow before castinglongtointCHUNKED_MAX_CHUNK_SIZE) to prevent unbounded allocations from a malicious serverThe C function now returns
-2for parse errors (previously it only returned 1/0/-1). The Carp side (poll-chunked) is updated to handle any negative return code as end-of-stream, preventing an infinite retry loop on malformed input.Note: the existing test suite does not compile on the CI platform due to a pre-existing
TM.tm_zonetype mismatch unrelated to this change. The C hardening was verified with a standalone test exercising normal chunks, end-of-stream, incomplete data, invalid hex, garbage after hex, chunk extensions, overflow, and the 16 MiB cap.Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.