feat: add checksum verification library (SHA-256, SHA-1, MD5)#324
feat: add checksum verification library (SHA-256, SHA-1, MD5)#324mvanhorn wants to merge 2 commits intoSurgeDM:mainfrom
Conversation
Add VerifyChecksum() to compute file hashes and compare against expected values, and ParseDigestHeader() to extract checksums from HTTP Digest response headers (RFC 3230). Supports hex and base64 encoded hashes. This is the core library for download integrity verification. Wiring into the download lifecycle and CLI flags will follow in a separate PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| func TestParseDigestHeader_SHA256Base64(t *testing.T) { | ||
| // sha256 of empty string in base64 | ||
| algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") | ||
| assert.Equal(t, "sha256", algo) | ||
| assert.NotEmpty(t, hash) | ||
| } |
There was a problem hiding this comment.
Weak assertion — exact hash value should be verified
assert.NotEmpty(t, hash) passes even if the base64 decode produced an entirely wrong result, as long as it returned something. The SHA-256 of an empty string is well-known; assert the exact expected hex value to ensure the decoding round-trip is correct:
| func TestParseDigestHeader_SHA256Base64(t *testing.T) { | |
| // sha256 of empty string in base64 | |
| algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") | |
| assert.Equal(t, "sha256", algo) | |
| assert.NotEmpty(t, hash) | |
| } | |
| func TestParseDigestHeader_SHA256Base64(t *testing.T) { | |
| // sha256 of empty string in base64; asserts the full decode round-trip is correct | |
| algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") | |
| assert.Equal(t, "sha256", algo) | |
| assert.Equal(t, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", hash) | |
| } |
Rule Used: What: All code changes must include tests for edge... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 52-57
Comment:
**Weak assertion — exact hash value should be verified**
`assert.NotEmpty(t, hash)` passes even if the base64 decode produced an entirely wrong result, as long as it returned *something*. The SHA-256 of an empty string is well-known; assert the exact expected hex value to ensure the decoding round-trip is correct:
```suggestion
func TestParseDigestHeader_SHA256Base64(t *testing.T) {
// sha256 of empty string in base64; asserts the full decode round-trip is correct
algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
assert.Equal(t, "sha256", algo)
assert.Equal(t, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", hash)
}
```
**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))
How can I resolve this? If you propose a fix, please make it concise.| func TestVerifyChecksum_SHA256(t *testing.T) { | ||
| // Create a temp file with known content | ||
| dir := t.TempDir() | ||
| path := filepath.Join(dir, "test.bin") | ||
| content := []byte("hello surge") | ||
| require.NoError(t, os.WriteFile(path, content, 0o644)) | ||
|
|
||
| // Compute expected hash | ||
| h := sha256.Sum256(content) | ||
| expected := hex.EncodeToString(h[:]) | ||
|
|
||
| result, err := VerifyChecksum(path, "sha256", expected) | ||
| require.NoError(t, err) | ||
| assert.True(t, result.Match) | ||
| assert.Equal(t, expected, result.Actual) | ||
| } | ||
|
|
||
| func TestVerifyChecksum_Mismatch(t *testing.T) { | ||
| dir := t.TempDir() | ||
| path := filepath.Join(dir, "test.bin") | ||
| require.NoError(t, os.WriteFile(path, []byte("hello"), 0o644)) | ||
|
|
||
| result, err := VerifyChecksum(path, "sha256", "0000000000000000000000000000000000000000000000000000000000000000") | ||
| require.NoError(t, err) | ||
| assert.False(t, result.Match) | ||
| } | ||
|
|
||
| func TestVerifyChecksum_UnsupportedAlgorithm(t *testing.T) { | ||
| _, err := VerifyChecksum("/tmp/test", "sha512", "abc") | ||
| assert.Error(t, err) | ||
| assert.Contains(t, err.Error(), "unsupported") | ||
| } | ||
|
|
||
| func TestVerifyChecksum_EmptyArgs(t *testing.T) { | ||
| _, err := VerifyChecksum("", "sha256", "abc") | ||
| assert.Error(t, err) | ||
| } |
There was a problem hiding this comment.
Missing happy-path tests for MD5, SHA-1, and alias forms
VerifyChecksum supports md5, sha1/sha-1, and sha256/sha-256, but only sha256 has a happy-path test. The MD5 and SHA-1 code paths — and the hyphenated alias forms (sha-1, sha-256) accepted in the algorithm switch — are never exercised. If a future refactor breaks those branches the test suite would not catch it.
Add at least one happy-path test per supported algorithm and exercise the alias forms, for example:
func TestVerifyChecksum_MD5(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.bin")
content := []byte("hello surge")
require.NoError(t, os.WriteFile(path, content, 0o644))
h := md5.Sum(content)
expected := hex.EncodeToString(h[:])
result, err := VerifyChecksum(path, "md5", expected)
require.NoError(t, err)
assert.True(t, result.Match)
}Rule Used: What: All code changes must include tests for edge... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 14-50
Comment:
**Missing happy-path tests for MD5, SHA-1, and alias forms**
`VerifyChecksum` supports `md5`, `sha1`/`sha-1`, and `sha256`/`sha-256`, but only `sha256` has a happy-path test. The MD5 and SHA-1 code paths — and the hyphenated alias forms (`sha-1`, `sha-256`) accepted in the algorithm switch — are never exercised. If a future refactor breaks those branches the test suite would not catch it.
Add at least one happy-path test per supported algorithm and exercise the alias forms, for example:
```go
func TestVerifyChecksum_MD5(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.bin")
content := []byte("hello surge")
require.NoError(t, os.WriteFile(path, content, 0o644))
h := md5.Sum(content)
expected := hex.EncodeToString(h[:])
result, err := VerifyChecksum(path, "md5", expected)
require.NoError(t, err)
assert.True(t, result.Match)
}
```
**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))
How can I resolve this? If you propose a fix, please make it concise.| case "sha-1": | ||
| algo = "sha1" | ||
| case "md5": | ||
| // already correct |
There was a problem hiding this comment.
"What" comment — explain why, not what
// already correct restates that the branch body is empty, which is obvious from reading it. A useful comment would explain why md5 needs no alias transformation while the other two algorithms do — e.g., because the canonical key used by VerifyChecksum is already "md5", whereas sha-1 and sha-256 must be stripped of their hyphens to match:
| // already correct | |
| case "md5": | |
| // "md5" is already the canonical key expected by VerifyChecksum; no alias transform needed unlike "sha-1"/"sha-256" |
Rule Used: What: Comments must explain why code exists, not... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 84
Comment:
**"What" comment — explain why, not what**
`// already correct` restates that the branch body is empty, which is obvious from reading it. A useful comment would explain *why* `md5` needs no alias transformation while the other two algorithms do — e.g., because the canonical key used by `VerifyChecksum` is already `"md5"`, whereas `sha-1` and `sha-256` must be stripped of their hyphens to match:
```suggestion
case "md5":
// "md5" is already the canonical key expected by VerifyChecksum; no alias transform needed unlike "sha-1"/"sha-256"
```
**Rule Used:** What: Comments must explain *why* code exists, not... ([source](https://app.greptile.com/review/custom-context?memory=4e45ef13-32c2-4753-8060-a44ef767144d))
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| // VerifyChecksum computes the hash of a file and compares it to the expected value. | ||
| // algorithm should be one of: md5, sha1, sha256. | ||
| // expected should be a hex-encoded hash string. | ||
| func VerifyChecksum(filepath string, algorithm string, expected string) (*ChecksumResult, error) { |
There was a problem hiding this comment.
Parameter
filepath shadows the path/filepath stdlib package
Using filepath as a parameter name shadows the path/filepath standard library package. While that package isn't imported here today, it's a common Go pitfall — if it's ever added, the parameter silently wins. Rename to filePath or path per Go convention:
| func VerifyChecksum(filepath string, algorithm string, expected string) (*ChecksumResult, error) { | |
| func VerifyChecksum(filePath string, algorithm string, expected string) (*ChecksumResult, error) { |
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 27
Comment:
**Parameter `filepath` shadows the `path/filepath` stdlib package**
Using `filepath` as a parameter name shadows the `path/filepath` standard library package. While that package isn't imported here today, it's a common Go pitfall — if it's ever added, the parameter silently wins. Rename to `filePath` or `path` per Go convention:
```suggestion
func VerifyChecksum(filePath string, algorithm string, expected string) (*ChecksumResult, error) {
```
How can I resolve this? If you propose a fix, please make it concise.| return &ChecksumResult{ | ||
| Algorithm: algorithm, | ||
| Expected: expected, | ||
| Actual: actual, | ||
| Match: actual == expected, | ||
| }, nil |
There was a problem hiding this comment.
ChecksumResult.Algorithm stores unnormalized algorithm name
VerifyChecksum stores algorithm directly in the result without normalizing hyphenated aliases. When called with "sha-256" or "sha-1", result.Algorithm will hold "sha-256" / "sha-1". But ParseDigestHeader always normalizes to "sha256" / "sha1" before returning (lines 79-82).
Since these two functions are explicitly designed to be used together in the download lifecycle, any downstream code that compares or switches on result.Algorithm will silently receive different strings depending on which path was taken — VerifyChecksum(path, "sha-256", ...) produces Algorithm: "sha-256", while ParseDigestHeader("sha-256=...") → VerifyChecksum(path, "sha256", ...) produces Algorithm: "sha256".
Add normalization after the hash-function switch to keep result.Algorithm consistent with what ParseDigestHeader returns:
// Normalize hyphenated aliases so result.Algorithm is always consistent
// with what ParseDigestHeader returns ("sha1", "sha256", "md5").
switch algorithm {
case "sha-1":
algorithm = "sha1"
case "sha-256":
algorithm = "sha256"
}Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 58-63
Comment:
**`ChecksumResult.Algorithm` stores unnormalized algorithm name**
`VerifyChecksum` stores `algorithm` directly in the result without normalizing hyphenated aliases. When called with `"sha-256"` or `"sha-1"`, `result.Algorithm` will hold `"sha-256"` / `"sha-1"`. But `ParseDigestHeader` always normalizes to `"sha256"` / `"sha1"` before returning (lines 79-82).
Since these two functions are explicitly designed to be used together in the download lifecycle, any downstream code that compares or switches on `result.Algorithm` will silently receive different strings depending on which path was taken — `VerifyChecksum(path, "sha-256", ...)` produces `Algorithm: "sha-256"`, while `ParseDigestHeader("sha-256=...")` → `VerifyChecksum(path, "sha256", ...)` produces `Algorithm: "sha256"`.
Add normalization after the hash-function switch to keep `result.Algorithm` consistent with what `ParseDigestHeader` returns:
```go
// Normalize hyphenated aliases so result.Algorithm is always consistent
// with what ParseDigestHeader returns ("sha1", "sha256", "md5").
switch algorithm {
case "sha-1":
algorithm = "sha1"
case "sha-256":
algorithm = "sha256"
}
```
How can I resolve this? If you propose a fix, please make it concise.|
Fixed the hex fallback - it now validates hash length before accepting, preventing wrong-length hashes from silently passing through. Also added support for unpadded base64 (RawStdEncoding/RawURLEncoding) since some servers omit padding in Digest headers. Pushed in 0ec5370. |
Summary
Add a checksum verification library that computes file hashes and compares them against expected values. Also parses HTTP Digest response headers (RFC 3230) for server-provided checksums.
Why this matters
aria2 has
--checksum=sha-256=HASH. wget verifies checksums. When downloading Linux ISOs or software releases, users expect integrity verification. Surge currently downloads files with no integrity check.Changes
VerifyChecksum(filepath, algorithm, expected)supporting MD5, SHA-1, SHA-256ParseDigestHeader(header)to extract checksums from RFC 3230 HTTP Digest headers (base64 and hex)internal/processing/checksum.go--checksumflag will follow in a separate PR.Testing
This contribution was developed with AI assistance (Codex + Claude Code).
Greptile Summary
This PR adds a new
internal/processing/checksumlibrary with two functions:VerifyChecksum(computes a file hash and compares it to an expected hex value) andParseDigestHeader(parses RFC 3230 HTTP Digest response headers and returns the algorithm and hex-encoded hash). The two functions are designed to be wired together in an upcoming download-lifecycle PR.Key findings:
VerifyChecksumstores the raw (lowercased but non-normalized) algorithm string inChecksumResult.Algorithm(e.g.,"sha-256"), whileParseDigestHeadernormalizes to"sha256". Since these functions are designed to be used in the same pipeline, any downstream switch or comparison onresult.Algorithmwill silently receive different strings depending on which path was taken.hex.DecodeString && len(value) == expectedHexLencondition already checked at lines 100-103, making them unreachable — they should be removed.ParseDigestHeader→VerifyChecksumend-to-end pipeline; this gap is what allowed the algorithm-normalization inconsistency to go undetected.Confidence Score: 4/5
Safe to merge after fixing the algorithm normalization inconsistency — it will cause silent mismatches in the upcoming wiring PR.
One clear P1 logic defect remains:
ChecksumResult.Algorithmreturns unnormalized hyphenated aliases (e.g., "sha-256") whileParseDigestHeaderreturns normalized forms ("sha256"), creating an inconsistency that will surface as a bug when the two functions are wired into the download lifecycle. All other findings are P2 (dead code, missing integration test).internal/processing/checksum.golines 58-63 (Algorithm normalization) andinternal/processing/checksum_test.go(integration test gap)Important Files Changed
ChecksumResult.Algorithmstores unnormalized hyphenated aliases inconsistently withParseDigestHeader's normalized output; unreachable hex-fallback block at lines 121-124ParseDigestHeaderandVerifyChecksum;TestParseDigestHeader_SHA256Base64usesassert.NotEmptyinstead of asserting the exact known hex valueSequence Diagram
sequenceDiagram participant C as Caller participant PDH as ParseDigestHeader participant VC as VerifyChecksum participant FS as os.File Note over C,PDH: Path A — algorithm from HTTP Digest header C->>PDH: ParseDigestHeader("sha-256=base64...") PDH-->>C: algo="sha256", hexHash="e3b0..." C->>VC: VerifyChecksum(path, "sha256", hexHash) VC->>FS: os.Open(path) FS-->>VC: file bytes VC->>VC: io.Copy → sha256.Sum VC-->>C: ChecksumResult{Algorithm:"sha256", Match:true} Note over C,VC: Path B — algorithm supplied directly by user C->>VC: VerifyChecksum(path, "sha-256", expected) VC->>FS: os.Open(path) FS-->>VC: file bytes VC->>VC: io.Copy → sha256.Sum VC-->>C: ChecksumResult{Algorithm:"sha-256", Match:bool} Note over C: ⚠️ Algorithm field differs: "sha256" (Path A) vs "sha-256" (Path B)Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "fix(checksum): validate hash length in h..." | Re-trigger Greptile
Context used: