Conversation
Signed-off-by: Janos SUTO <sj@acts.hu>
There was a problem hiding this comment.
Pull request overview
This PR addresses Issue #403 by making digest/header parsing resilient to unusually large email headers (eg from Exchange), preventing valid EML files from being rejected with hdr_len:0.
Changes:
- Add a new
max_header_sizeconfig option (default 131072) and document it in the sample config. - Update
make_digests()to use a heap-allocated buffer sized bymax_header_size, and propagate digest calculation failures to callers. - Update several call sites to handle/report digest calculation errors.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/digest.c | Switches digest read buffer to configurable heap allocation and adds error returns/logging. |
| src/cfg.h | Adds max_header_size to the config struct. |
| src/cfg.c | Adds parsing/default for max_header_size. |
| etc/example.conf | Documents max_header_size and provides a default value. |
| src/import.c | Handles make_digests() failure before rejecting on hdr_len. |
| src/piler.c | Aborts processing when digest calculation fails. |
| src/pilerexport.c | Reports digest calculation errors during export; changes default index list. |
| src/test.c | Adds basic error reporting for digest calculation failures. |
Comments suppressed due to low confidence (1)
src/pilerexport.c:407
- If make_digests() fails, the code still performs the digest/bodydigest comparisons and reports a generic “verification FAILED”. This conflates “could not compute digests” with “digests mismatch”. Consider short-circuiting on the error (eg mark verification_status and print a distinct message) to avoid misleading results.
// I want to export no matter what, so an error won't block
if(make_digests(sdata, cfg)) {
printf("digest calculation error: %s\n", filename);
}
if(strcmp(digest, sdata->digest) == 0 && strcmp(bodydigest, sdata->bodydigest) == 0){
printf("exported: %10llu\r", n); fflush(stdout);
}
else {
printf("verification FAILED. %s\n", filename);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { | ||
| syslog(LOG_PRIORITY, "ERROR: calloc() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; | ||
| } | ||
|
|
||
| while((n = read(fd, buf, sizeof(buf))) > 0){ | ||
| while((n = read(fd, buf, cfg->max_header_size)) > 0){ | ||
| EVP_DigestUpdate(ctx2, buf, n); |
There was a problem hiding this comment.
search_header_end() iterates until it hits a NUL byte, but buf is allocated as exactly max_header_size and read() can fill it completely. When n == cfg->max_header_size, there may be no terminating NUL, so search_header_end() can read past the buffer. Ensure the buffer is NUL-terminated for the first chunk (eg allocate max_header_size+1 and set buf[n]=0, or switch search_header_end() to a length-bounded scan).
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { | ||
| syslog(LOG_PRIORITY, "ERROR: calloc() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; | ||
| } | ||
|
|
||
| while((n = read(fd, buf, sizeof(buf))) > 0){ | ||
| while((n = read(fd, buf, cfg->max_header_size)) > 0){ |
There was a problem hiding this comment.
cfg->max_header_size is parsed as an int and used as a size for calloc/read. A zero/negative value can lead to unexpected behavior (calloc(…,0), huge size_t conversion, etc.). Validate it before use (eg enforce a sane minimum/maximum and return an error if out of range).
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { | ||
| syslog(LOG_PRIORITY, "ERROR: calloc() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; | ||
| } |
There was a problem hiding this comment.
If calloc() fails after open() succeeds, the function returns without closing fd, leaking a file descriptor. Close(fd) on this error path (and consider a common cleanup label to handle all early returns).
| if(fd == -1) { | ||
| syslog(LOG_PRIORITY, "ERROR: open() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; | ||
| } | ||
|
|
||
| memset(buf, 0, sizeof(buf)); | ||
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { | ||
| syslog(LOG_PRIORITY, "ERROR: calloc() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; |
There was a problem hiding this comment.
The syslog messages for open()/calloc() failures don’t include the filename or strerror(errno), which makes diagnosing import failures harder. Log sdata->filename and the OS error string/value.
| int verbosity = 0; | ||
| int max_matches = 1000; | ||
| char *index_list = "main1,dailydelta1,delta1"; | ||
| char *index_list = "piler1"; |
There was a problem hiding this comment.
Changing the default Sphinx index list to "piler1" can break setups using the non-RT indices (main1,dailydelta1,delta1). Consider selecting the default after reading config based on cfg->rtindex (eg cfg->sphxdb when rtindex=1, otherwise the legacy list), similar to src/stats.c:123-126 where RT vs non-RT index names are handled.
| char *index_list = "piler1"; | |
| char *index_list = "main1,dailydelta1,delta1"; |
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { | ||
| syslog(LOG_PRIORITY, "ERROR: calloc() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; | ||
| } | ||
|
|
||
| while((n = read(fd, buf, sizeof(buf))) > 0){ | ||
| while((n = read(fd, buf, cfg->max_header_size)) > 0){ |
There was a problem hiding this comment.
This change is meant to fix hdr_len=0 for unusually large headers, but there’s no unit test covering a header larger than the previous fixed BIGBUFSIZE. Add a digest unit test case (eg in unit_tests/check_digest.c) with an EML having a very large header and cfg->max_header_size set accordingly, asserting hdr_len>0 and stable digests.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Added sanity check for max_header_size value.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { |
There was a problem hiding this comment.
This change introduces a new behavior/config knob (max_header_size) intended to fix hdr_len=0 for very large headers, but there’s no regression test covering a message with headers larger than the previous fixed buffer. Since unit_tests/check_digest.c already covers make_digests(), please add a large-header fixture and test case validating hdr_len>0 and expected digests when max_header_size is increased.
| unsigned char *buf = calloc(1, cfg->max_header_size); | ||
| if(!buf) { | ||
| syslog(LOG_PRIORITY, "ERROR: calloc() %s:%d", __func__, __LINE__); |
There was a problem hiding this comment.
search_header_end() and the strcasestr() call inside it require a NUL-terminated buffer, but buf is allocated at exactly cfg->max_header_size and read() may fill it completely (n == max_header_size), leaving no terminator and causing out-of-bounds reads/undefined behavior. Allocate one extra byte and ensure buf[n] = '\0' before calling search_header_end (or rewrite search_header_end to iterate by length only).
| while((n = read(fd, buf, cfg->max_header_size)) > 0){ | ||
| EVP_DigestUpdate(ctx2, buf, n); |
There was a problem hiding this comment.
If read() fails (returns -1), the loop exits and the function currently proceeds to finalize and return success using partial data. Detect read errors (eg, check errno / n < 0 after the loop) and return an error after logging, while still freeing buf and closing fd.
| if(fd == -1) { | ||
| syslog(LOG_PRIORITY, "ERROR: open() %s:%d", __func__, __LINE__); | ||
| EVP_MD_CTX_free(ctx); | ||
| EVP_MD_CTX_free(ctx2); | ||
| return 1; |
There was a problem hiding this comment.
The open() failure log doesn’t include the filename or strerror(errno), which makes troubleshooting difficult. Include sdata->filename and strerror(errno) (and ideally errno) in this syslog message.
| // Some basic sanity checks | ||
| if(cfg.max_header_size < 1000) { | ||
| __fatal("invalid max_header_size value"); | ||
| } |
There was a problem hiding this comment.
max_header_size directly controls a per-message allocation in make_digests(); validating only a minimum value allows configurations that can cause excessive memory use or integer edge cases. Consider adding an upper bound (eg <= max_message_size and/or a reasonable hard cap) and validating it’s positive before use.
Fixes #403