Skip to content

Harden chunked transfer-encoding decoder#5

Merged
hellerve merged 1 commit into
mainfrom
claude/harden-chunked-decoder
Jun 2, 2026
Merged

Harden chunked transfer-encoding decoder#5
hellerve merged 1 commit into
mainfrom
claude/harden-chunked-decoder

Conversation

@carpentry-agent

Copy link
Copy Markdown

Summary

Hardens chunked_decode_one in src/chunked.h against malicious or malformed chunked responses:

  • Hex digit validation: the chunk-size line must start with a valid hex digit, and strtol is checked via endptr to ensure it stopped at an expected delimiter (\r or ; for chunk extensions per RFC 7230)
  • Integer overflow protection: guards the header_len + chunk_size + 2 calculation against INT_MAX overflow before casting long to int
  • Max chunk size cap: rejects chunks larger than 16 MiB (CHUNKED_MAX_CHUNK_SIZE) to prevent unbounded allocations from a malicious server

The C function now returns -2 for parse errors (previously it only returned 1/0/-1). The Carp side (poll-chunked) is updated to handle any negative return code as end-of-stream, preventing an infinite retry loop on malformed input.

Note: the existing test suite does not compile on the CI platform due to a pre-existing TM.tm_zone type mismatch unrelated to this change. The C hardening was verified with a standalone test exercising normal chunks, end-of-stream, incomplete data, invalid hex, garbage after hex, chunk extensions, overflow, and the 16 MiB cap.


Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

@carpentry-reviewer carpentry-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

CI passes on both macOS and Ubuntu. Compiled the C code locally and ran 21 test cases against the hardened decoder: normal chunks, zero-length end-of-stream, invalid hex, negative sizes, chunk extensions (;), the 16 MiB cap (exact boundary and one-over), huge hex overflow, empty lines, incomplete data, leading spaces, plus signs, and zero-chunk-with-extension. 20/21 pass — the one "failure" is finding #1 below.

Findings

1. strtol base-16 accepts 0x prefix (cosmetic)

strtol(buf, &endptr, 16) silently accepts 0x/0X per the C standard. Input "0x5\r\nhello\r\n" parses as chunk-size 5 and is accepted because endptr lands on \r, passing the check. Per RFC 7230 §4.1, chunk-size = 1*HEXDIG — the 0x prefix is not valid ABNF. In practice no server sends this, so it's cosmetic, but worth noting. A strict fix would be to verify endptr - buf contains only hex digits. Not a blocker.

2. isxdigit cast is correct (src/chunked.h:36)

The (unsigned char) cast on buf[0] correctly prevents UB when char is signed and the high bit is set. Good.

3. endptr validation is sound (src/chunked.h:43)

endptr == buf || (endptr != (char *)crlf && *endptr != ';') correctly rejects: nothing consumed, unexpected stop characters, and trailing whitespace (e.g. "5 \r\n" — endptr stops at the space, which is neither crlf nor ;). Leading whitespace is already caught by the isxdigit check.

4. Overflow check is sound (src/chunked.h:58)

The 16 MiB cap catches large values before the INT_MAX overflow check matters on 64-bit. strtol overflow (LONG_MAX with errno = ERANGE) is caught by the cap since LONG_MAX > CHUNKED_MAX_CHUNK_SIZE on all platforms. The code doesn't check errno == ERANGE, but the cap makes it unnecessary.

5. Carp-side cond change is correct (http-client.carp:149-170)

The old fallthrough assumed rc was always 0 in the else branch. The new code explicitly matches rc == 0 for "need more data" and treats any negative value as terminal (set-done!, Nothing, break). Correct for both -1 (end-of-stream) and -2 (parse error).

6. 16 MiB cap is reasonable

Per-chunk, not per-response — streaming large responses still works. Matches common HTTP implementation defaults.

Verdict: merge

Solid hardening. The validation is thorough and correctly ordered (reject garbage early → validate parsed value → check overflow). The Carp-side change correctly generalizes error handling for the new return code. The 0x prefix acceptance is the only protocol nit and is harmless in practice.

@hellerve hellerve merged commit a27d9a4 into main Jun 2, 2026
2 checks passed
@hellerve hellerve deleted the claude/harden-chunked-decoder branch June 2, 2026 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant