Skip to content

feat: add checksum verification library (SHA-256, SHA-1, MD5)#324

Open
mvanhorn wants to merge 2 commits intoSurgeDM:mainfrom
mvanhorn:feat/checksum-verification
Open

feat: add checksum verification library (SHA-256, SHA-1, MD5)#324
mvanhorn wants to merge 2 commits intoSurgeDM:mainfrom
mvanhorn:feat/checksum-verification

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

@mvanhorn mvanhorn commented Apr 5, 2026

Summary

Add a checksum verification library that computes file hashes and compares them against expected values. Also parses HTTP Digest response headers (RFC 3230) for server-provided checksums.

Why this matters

aria2 has --checksum=sha-256=HASH. wget verifies checksums. When downloading Linux ISOs or software releases, users expect integrity verification. Surge currently downloads files with no integrity check.

Changes

  • Add VerifyChecksum(filepath, algorithm, expected) supporting MD5, SHA-1, SHA-256
  • Add ParseDigestHeader(header) to extract checksums from RFC 3230 HTTP Digest headers (base64 and hex)
  • Both functions are in internal/processing/checksum.go
  • This is the core library. Wiring into the download lifecycle and CLI --checksum flag will follow in a separate PR.

Testing

Tests

go test ./... -count=1   # All 18 packages pass

This contribution was developed with AI assistance (Codex + Claude Code).

Greptile Summary

This PR adds a new internal/processing/checksum library with two functions: VerifyChecksum (computes a file hash and compares it to an expected hex value) and ParseDigestHeader (parses RFC 3230 HTTP Digest response headers and returns the algorithm and hex-encoded hash). The two functions are designed to be wired together in an upcoming download-lifecycle PR.

Key findings:

  • Algorithm field inconsistency (P1): VerifyChecksum stores the raw (lowercased but non-normalized) algorithm string in ChecksumResult.Algorithm (e.g., "sha-256"), while ParseDigestHeader normalizes to "sha256". Since these functions are designed to be used in the same pipeline, any downstream switch or comparison on result.Algorithm will silently receive different strings depending on which path was taken.
  • Dead hex-fallback block: Lines 121-124 repeat the exact same hex.DecodeString && len(value) == expectedHexLen condition already checked at lines 100-103, making them unreachable — they should be removed.
  • Missing integration test: No test exercises the ParseDigestHeaderVerifyChecksum end-to-end pipeline; this gap is what allowed the algorithm-normalization inconsistency to go undetected.

Confidence Score: 4/5

Safe to merge after fixing the algorithm normalization inconsistency — it will cause silent mismatches in the upcoming wiring PR.

One clear P1 logic defect remains: ChecksumResult.Algorithm returns unnormalized hyphenated aliases (e.g., "sha-256") while ParseDigestHeader returns normalized forms ("sha256"), creating an inconsistency that will surface as a bug when the two functions are wired into the download lifecycle. All other findings are P2 (dead code, missing integration test).

internal/processing/checksum.go lines 58-63 (Algorithm normalization) and internal/processing/checksum_test.go (integration test gap)

Important Files Changed

Filename Overview
internal/processing/checksum.go New checksum library; ChecksumResult.Algorithm stores unnormalized hyphenated aliases inconsistently with ParseDigestHeader's normalized output; unreachable hex-fallback block at lines 121-124
internal/processing/checksum_test.go Basic tests present; missing integration test between ParseDigestHeader and VerifyChecksum; TestParseDigestHeader_SHA256Base64 uses assert.NotEmpty instead of asserting the exact known hex value

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant PDH as ParseDigestHeader
    participant VC as VerifyChecksum
    participant FS as os.File

    Note over C,PDH: Path A — algorithm from HTTP Digest header
    C->>PDH: ParseDigestHeader("sha-256=base64...")
    PDH-->>C: algo="sha256", hexHash="e3b0..."
    C->>VC: VerifyChecksum(path, "sha256", hexHash)
    VC->>FS: os.Open(path)
    FS-->>VC: file bytes
    VC->>VC: io.Copy → sha256.Sum
    VC-->>C: ChecksumResult{Algorithm:"sha256", Match:true}

    Note over C,VC: Path B — algorithm supplied directly by user
    C->>VC: VerifyChecksum(path, "sha-256", expected)
    VC->>FS: os.Open(path)
    FS-->>VC: file bytes
    VC->>VC: io.Copy → sha256.Sum
    VC-->>C: ChecksumResult{Algorithm:"sha-256", Match:bool}

    Note over C: ⚠️ Algorithm field differs: "sha256" (Path A) vs "sha-256" (Path B)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 58-63

Comment:
**`ChecksumResult.Algorithm` stores unnormalized algorithm name**

`VerifyChecksum` stores `algorithm` directly in the result without normalizing hyphenated aliases. When called with `"sha-256"` or `"sha-1"`, `result.Algorithm` will hold `"sha-256"` / `"sha-1"`. But `ParseDigestHeader` always normalizes to `"sha256"` / `"sha1"` before returning (lines 79-82).

Since these two functions are explicitly designed to be used together in the download lifecycle, any downstream code that compares or switches on `result.Algorithm` will silently receive different strings depending on which path was taken — `VerifyChecksum(path, "sha-256", ...)` produces `Algorithm: "sha-256"`, while `ParseDigestHeader("sha-256=...")``VerifyChecksum(path, "sha256", ...)` produces `Algorithm: "sha256"`.

Add normalization after the hash-function switch to keep `result.Algorithm` consistent with what `ParseDigestHeader` returns:

```go
// Normalize hyphenated aliases so result.Algorithm is always consistent
// with what ParseDigestHeader returns ("sha1", "sha256", "md5").
switch algorithm {
case "sha-1":
    algorithm = "sha1"
case "sha-256":
    algorithm = "sha256"
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 52-75

Comment:
**Missing integration test for `ParseDigestHeader``VerifyChecksum` pipeline**

The PR description explicitly states these two functions are designed to be used together in the download lifecycle, but no test exercises the end-to-end flow: parse a `Digest` header, then feed the result directly into `VerifyChecksum`.

This gap is precisely what allowed the algorithm-normalization inconsistency to go undetected — a round-trip integration test would have immediately surfaced that `ParseDigestHeader` returns `"sha256"` while calling `VerifyChecksum` with `"sha-256"` stores `"sha-256"` in the result.

Consider adding:

```go
func TestParseDigestHeader_ThenVerifyChecksum(t *testing.T) {
    dir := t.TempDir()
    path := filepath.Join(dir, "test.bin")
    content := []byte("hello surge")
    require.NoError(t, os.WriteFile(path, content, 0o644))

    raw := sha256.Sum256(content)
    b64 := base64.StdEncoding.EncodeToString(raw[:])
    header := "sha-256=" + b64

    algo, hexHash := ParseDigestHeader(header)
    require.NotEmpty(t, algo)
    require.NotEmpty(t, hexHash)

    result, err := VerifyChecksum(path, algo, hexHash)
    require.NoError(t, err)
    assert.True(t, result.Match)
    assert.Equal(t, "sha256", result.Algorithm)
}
```

**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "fix(checksum): validate hash length in h..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - What: All code changes must include tests for edge... (source)

Add VerifyChecksum() to compute file hashes and compare against expected
values, and ParseDigestHeader() to extract checksums from HTTP Digest
response headers (RFC 3230). Supports hex and base64 encoded hashes.

This is the core library for download integrity verification. Wiring
into the download lifecycle and CLI flags will follow in a separate PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +52 to +57
func TestParseDigestHeader_SHA256Base64(t *testing.T) {
// sha256 of empty string in base64
algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
assert.Equal(t, "sha256", algo)
assert.NotEmpty(t, hash)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Weak assertion — exact hash value should be verified

assert.NotEmpty(t, hash) passes even if the base64 decode produced an entirely wrong result, as long as it returned something. The SHA-256 of an empty string is well-known; assert the exact expected hex value to ensure the decoding round-trip is correct:

Suggested change
func TestParseDigestHeader_SHA256Base64(t *testing.T) {
// sha256 of empty string in base64
algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
assert.Equal(t, "sha256", algo)
assert.NotEmpty(t, hash)
}
func TestParseDigestHeader_SHA256Base64(t *testing.T) {
// sha256 of empty string in base64; asserts the full decode round-trip is correct
algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
assert.Equal(t, "sha256", algo)
assert.Equal(t, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", hash)
}

Rule Used: What: All code changes must include tests for edge... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 52-57

Comment:
**Weak assertion — exact hash value should be verified**

`assert.NotEmpty(t, hash)` passes even if the base64 decode produced an entirely wrong result, as long as it returned *something*. The SHA-256 of an empty string is well-known; assert the exact expected hex value to ensure the decoding round-trip is correct:

```suggestion
func TestParseDigestHeader_SHA256Base64(t *testing.T) {
	// sha256 of empty string in base64; asserts the full decode round-trip is correct
	algo, hash := ParseDigestHeader("sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
	assert.Equal(t, "sha256", algo)
	assert.Equal(t, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", hash)
}
```

**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +14 to +50
func TestVerifyChecksum_SHA256(t *testing.T) {
// Create a temp file with known content
dir := t.TempDir()
path := filepath.Join(dir, "test.bin")
content := []byte("hello surge")
require.NoError(t, os.WriteFile(path, content, 0o644))

// Compute expected hash
h := sha256.Sum256(content)
expected := hex.EncodeToString(h[:])

result, err := VerifyChecksum(path, "sha256", expected)
require.NoError(t, err)
assert.True(t, result.Match)
assert.Equal(t, expected, result.Actual)
}

func TestVerifyChecksum_Mismatch(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.bin")
require.NoError(t, os.WriteFile(path, []byte("hello"), 0o644))

result, err := VerifyChecksum(path, "sha256", "0000000000000000000000000000000000000000000000000000000000000000")
require.NoError(t, err)
assert.False(t, result.Match)
}

func TestVerifyChecksum_UnsupportedAlgorithm(t *testing.T) {
_, err := VerifyChecksum("/tmp/test", "sha512", "abc")
assert.Error(t, err)
assert.Contains(t, err.Error(), "unsupported")
}

func TestVerifyChecksum_EmptyArgs(t *testing.T) {
_, err := VerifyChecksum("", "sha256", "abc")
assert.Error(t, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing happy-path tests for MD5, SHA-1, and alias forms

VerifyChecksum supports md5, sha1/sha-1, and sha256/sha-256, but only sha256 has a happy-path test. The MD5 and SHA-1 code paths — and the hyphenated alias forms (sha-1, sha-256) accepted in the algorithm switch — are never exercised. If a future refactor breaks those branches the test suite would not catch it.

Add at least one happy-path test per supported algorithm and exercise the alias forms, for example:

func TestVerifyChecksum_MD5(t *testing.T) {
    dir := t.TempDir()
    path := filepath.Join(dir, "test.bin")
    content := []byte("hello surge")
    require.NoError(t, os.WriteFile(path, content, 0o644))

    h := md5.Sum(content)
    expected := hex.EncodeToString(h[:])

    result, err := VerifyChecksum(path, "md5", expected)
    require.NoError(t, err)
    assert.True(t, result.Match)
}

Rule Used: What: All code changes must include tests for edge... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 14-50

Comment:
**Missing happy-path tests for MD5, SHA-1, and alias forms**

`VerifyChecksum` supports `md5`, `sha1`/`sha-1`, and `sha256`/`sha-256`, but only `sha256` has a happy-path test. The MD5 and SHA-1 code paths — and the hyphenated alias forms (`sha-1`, `sha-256`) accepted in the algorithm switch — are never exercised. If a future refactor breaks those branches the test suite would not catch it.

Add at least one happy-path test per supported algorithm and exercise the alias forms, for example:

```go
func TestVerifyChecksum_MD5(t *testing.T) {
    dir := t.TempDir()
    path := filepath.Join(dir, "test.bin")
    content := []byte("hello surge")
    require.NoError(t, os.WriteFile(path, content, 0o644))

    h := md5.Sum(content)
    expected := hex.EncodeToString(h[:])

    result, err := VerifyChecksum(path, "md5", expected)
    require.NoError(t, err)
    assert.True(t, result.Match)
}
```

**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))

How can I resolve this? If you propose a fix, please make it concise.

case "sha-1":
algo = "sha1"
case "md5":
// already correct
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 "What" comment — explain why, not what

// already correct restates that the branch body is empty, which is obvious from reading it. A useful comment would explain why md5 needs no alias transformation while the other two algorithms do — e.g., because the canonical key used by VerifyChecksum is already "md5", whereas sha-1 and sha-256 must be stripped of their hyphens to match:

Suggested change
// already correct
case "md5":
// "md5" is already the canonical key expected by VerifyChecksum; no alias transform needed unlike "sha-1"/"sha-256"

Rule Used: What: Comments must explain why code exists, not... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 84

Comment:
**"What" comment — explain why, not what**

`// already correct` restates that the branch body is empty, which is obvious from reading it. A useful comment would explain *why* `md5` needs no alias transformation while the other two algorithms do — e.g., because the canonical key used by `VerifyChecksum` is already `"md5"`, whereas `sha-1` and `sha-256` must be stripped of their hyphens to match:

```suggestion
	case "md5":
		// "md5" is already the canonical key expected by VerifyChecksum; no alias transform needed unlike "sha-1"/"sha-256"
```

**Rule Used:** What: Comments must explain *why* code exists, not... ([source](https://app.greptile.com/review/custom-context?memory=4e45ef13-32c2-4753-8060-a44ef767144d))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

// VerifyChecksum computes the hash of a file and compares it to the expected value.
// algorithm should be one of: md5, sha1, sha256.
// expected should be a hex-encoded hash string.
func VerifyChecksum(filepath string, algorithm string, expected string) (*ChecksumResult, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Parameter filepath shadows the path/filepath stdlib package

Using filepath as a parameter name shadows the path/filepath standard library package. While that package isn't imported here today, it's a common Go pitfall — if it's ever added, the parameter silently wins. Rename to filePath or path per Go convention:

Suggested change
func VerifyChecksum(filepath string, algorithm string, expected string) (*ChecksumResult, error) {
func VerifyChecksum(filePath string, algorithm string, expected string) (*ChecksumResult, error) {
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 27

Comment:
**Parameter `filepath` shadows the `path/filepath` stdlib package**

Using `filepath` as a parameter name shadows the `path/filepath` standard library package. While that package isn't imported here today, it's a common Go pitfall — if it's ever added, the parameter silently wins. Rename to `filePath` or `path` per Go convention:

```suggestion
func VerifyChecksum(filePath string, algorithm string, expected string) (*ChecksumResult, error) {
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +58 to +63
return &ChecksumResult{
Algorithm: algorithm,
Expected: expected,
Actual: actual,
Match: actual == expected,
}, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 ChecksumResult.Algorithm stores unnormalized algorithm name

VerifyChecksum stores algorithm directly in the result without normalizing hyphenated aliases. When called with "sha-256" or "sha-1", result.Algorithm will hold "sha-256" / "sha-1". But ParseDigestHeader always normalizes to "sha256" / "sha1" before returning (lines 79-82).

Since these two functions are explicitly designed to be used together in the download lifecycle, any downstream code that compares or switches on result.Algorithm will silently receive different strings depending on which path was taken — VerifyChecksum(path, "sha-256", ...) produces Algorithm: "sha-256", while ParseDigestHeader("sha-256=...")VerifyChecksum(path, "sha256", ...) produces Algorithm: "sha256".

Add normalization after the hash-function switch to keep result.Algorithm consistent with what ParseDigestHeader returns:

// Normalize hyphenated aliases so result.Algorithm is always consistent
// with what ParseDigestHeader returns ("sha1", "sha256", "md5").
switch algorithm {
case "sha-1":
    algorithm = "sha1"
case "sha-256":
    algorithm = "sha256"
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 58-63

Comment:
**`ChecksumResult.Algorithm` stores unnormalized algorithm name**

`VerifyChecksum` stores `algorithm` directly in the result without normalizing hyphenated aliases. When called with `"sha-256"` or `"sha-1"`, `result.Algorithm` will hold `"sha-256"` / `"sha-1"`. But `ParseDigestHeader` always normalizes to `"sha256"` / `"sha1"` before returning (lines 79-82).

Since these two functions are explicitly designed to be used together in the download lifecycle, any downstream code that compares or switches on `result.Algorithm` will silently receive different strings depending on which path was taken — `VerifyChecksum(path, "sha-256", ...)` produces `Algorithm: "sha-256"`, while `ParseDigestHeader("sha-256=...")``VerifyChecksum(path, "sha256", ...)` produces `Algorithm: "sha256"`.

Add normalization after the hash-function switch to keep `result.Algorithm` consistent with what `ParseDigestHeader` returns:

```go
// Normalize hyphenated aliases so result.Algorithm is always consistent
// with what ParseDigestHeader returns ("sha1", "sha256", "md5").
switch algorithm {
case "sha-1":
    algorithm = "sha1"
case "sha-256":
    algorithm = "sha256"
}
```

How can I resolve this? If you propose a fix, please make it concise.

@mvanhorn
Copy link
Copy Markdown
Contributor Author

mvanhorn commented Apr 6, 2026

Fixed the hex fallback - it now validates hash length before accepting, preventing wrong-length hashes from silently passing through. Also added support for unpadded base64 (RawStdEncoding/RawURLEncoding) since some servers omit padding in Digest headers. Pushed in 0ec5370.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant