
Conversation

@julienrbrt
Member

ref: #2906

@github-actions
Contributor

github-actions bot commented Jan 5, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 5, 2026, 4:25 PM

@claude
Contributor

claude bot commented Jan 5, 2026

Claude finished @julienrbrt's task in 2m 29s


Code Review Complete

I've completed a comprehensive review of the async epoch fetching feature. This is a well-architected addition that introduces background prefetching of DA epoch data to improve performance at epoch boundaries. Below are my findings organized by category.


🟢 Strengths

Architecture & Design

  • Clean separation of concerns: The AsyncEpochFetcher is well-isolated with clear responsibilities
  • Proper use of interfaces: Integration through existing Client interface is clean
  • Fallback mechanism: Graceful degradation when cache misses occur (falls back to sync fetch)
  • Zero-dependency integration: Follows the project's pattern of modular, loosely-coupled components

Concurrency

  • Well-structured lifecycle management: Clean Start()/Stop() pattern with WaitGroup
  • Proper context cancellation: Background goroutine respects context cancellation
  • Appropriate use of atomics: atomic.Uint64 for height tracking and atomic.Pointer for state
  • Mutex usage: RWMutex correctly used for cache reads vs writes

Testing

  • Comprehensive unit tests: Good coverage of edge cases (cache miss, future heights, serialization)
  • Mocking strategy: Clean mock implementations for testing
  • Table-driven tests: Good use of table-driven tests in forced_inclusion_retriever_test.go

🟡 Issues to Address

1. Critical: Goroutine Leak Risk in Results Iterator

Location: async_epoch_fetcher.go:367

results, err := f.cache.Query(f.ctx, query)
if err != nil {
    f.logger.Warn().Err(err).Msg("failed to query cache for cleanup")
    return
}
defer results.Close()

for result := range results.Next() {  // ⚠️ Potential leak if context cancelled

Issue: If f.ctx is cancelled while iterating, the for range results.Next() may not complete, and the deferred results.Close() won't be called immediately. The iterator channel may be blocked.

Recommendation: Add explicit context check or use errgroup with cancellation:

ch := results.Next()
loop:
for {
    select {
    case <-f.ctx.Done():
        return
    case result, ok := <-ch:
        if !ok {
            break loop // a bare break would only exit the select
        }
        _ = result // process result
    }
}

2. Race Condition: Context Created in Constructor

Location: async_epoch_fetcher.go:62

ctx, cancel := context.WithCancel(context.Background())

Issue: The context is created from background in the constructor, independent of any parent context. If the parent service is shut down before Start() is called, or if there's a startup failure, this goroutine may not respect the parent's lifecycle.

Recommendation: Accept a parent context.Context in NewAsyncEpochFetcher() or Start() to properly integrate with parent lifecycle:

func (f *AsyncEpochFetcher) Start(ctx context.Context) {
    f.ctx, f.cancel = context.WithCancel(ctx)
    // ...
}

3. Missing AsyncFetcher Cleanup on Sequencer Errors

Location: pkg/sequencers/single/sequencer.go:95-109, 206-218

Issue: AsyncEpochFetcher.Start() is called, but if subsequent initialization fails (e.g., checkpoint save fails), the background goroutine is never stopped.

Recommendation: Add cleanup on error paths:

s.asyncFetcher = block.NewAsyncEpochFetcher(...)
s.asyncFetcher.Start()

s.fiRetriever = block.NewForcedInclusionRetriever(...)
if err := someOperation(); err != nil {
    s.asyncFetcher.Stop()  // ← Add this
    return nil, err
}

Better yet, add a cleanup method or defer pattern.

4. Buffer Overflow Risk in Serialization

Location: async_epoch_fetcher.go:398-429

Issue: The serialization calculates size upfront but doesn't validate buffer bounds during writes. If size calculation is wrong (e.g., integer overflow), buffer overflow could occur.

Recommendation: Use encoding/binary package which has built-in bounds checking, or add explicit checks:

if offset+8 > len(buf) {
    return nil, errors.New("buffer overflow during serialization")
}

Alternatively, use a bytes.Buffer which grows dynamically and is safer.

5. Deserialization: Missing Input Validation

Location: async_epoch_fetcher.go:435-478

Issue: While there's a minimum length check (len(data) < 28), there's no validation on:

  • Maximum reasonable transaction count (could cause OOM if corrupted data has txCount = 0xFFFFFFFF)
  • Maximum transaction size
  • Total deserialized size

Recommendation: Add safety limits:

const (
    maxTxCount = 100_000  // Reasonable upper bound
    maxTxSize  = 2 * 1024 * 1024  // 2MB per tx
)

if txCount > maxTxCount {
    return nil, fmt.Errorf("tx count %d exceeds maximum %d", txCount, maxTxCount)
}

6. Potential Memory Growth: No Cache Size Limit

Location: async_epoch_fetcher.go:349-393

Issue: While old epochs are cleaned up, there's no maximum cache size limit. If epochs are very large or the prefetch window is increased, memory could grow unbounded during the grace period.

Recommendation: Add a maximum cache size check and evict oldest entries if exceeded:

type AsyncEpochFetcher struct {
    // ...
    maxCacheSize int64  // bytes
    currentCacheSize atomic.Int64
}

7. Error Handling: Silent Failure in Prefetch

Location: async_epoch_fetcher.go:239-244, 276-283

Issue: When prefetch fails (e.g., StatusHeightFromFuture or processing errors), errors are logged but the cache entry is not created. The next call will hit a cache miss and fall back to sync fetch, but there's no exponential backoff or retry strategy for transient errors.

Recommendation: Consider implementing:

  • Exponential backoff for StatusHeightFromFuture (instead of retrying every poll interval)
  • Distinguish between permanent errors (don't retry) vs transient (retry with backoff)
  • Metrics to track prefetch success/failure rates

8. Code Duplication: processForcedInclusionBlobs

Location: async_epoch_fetcher.go:316-347 and forced_inclusion_retriever.go:170-201

Issue: This exact method is duplicated in both files. Violates DRY principle and increases maintenance burden.

Recommendation: Extract to a shared helper function or make one call the other.

9. Documentation: Missing AsyncFetcher Nil Check Justification

Location: pkg/sequencers/single/sequencer.go:206

if c.asyncFetcher == nil {

Issue: It's unclear when asyncFetcher could be nil at this point in the code flow. The constructor always creates it (lines 96, 124), so this seems defensive but undocumented.

Recommendation: Either:

  1. Document why this can be nil (race condition? specific initialization path?)
  2. Remove the check if it's genuinely impossible
  3. Make it a panic if it's a programming error

10. Performance: Serialization Uses Manual Byte Operations

Location: async_epoch_fetcher.go:480-514

Issue: Manual byte shifting for encoding is error-prone and harder to maintain than standard library approaches.

Recommendation: Use encoding/binary for consistency and safety:

import "encoding/binary"

func writeUint64(buf []byte, val uint64) {
    binary.LittleEndian.PutUint64(buf, val)
}

func readUint64(buf []byte) uint64 {
    return binary.LittleEndian.Uint64(buf)
}

This is exactly what the standard library is for: it is tested, safe, and more readable.


🔵 Suggestions for Improvement

11. Consider Using Protocol Buffers for Serialization

Current: Custom binary format

Suggestion: Since the project already uses protobuf (see types/pb/), consider using it for cache serialization:

  • Self-describing format (easier to evolve)
  • Built-in backwards compatibility
  • Less error-prone than manual binary encoding
  • Better tooling support

12. Add Metrics for Observability

Missing: No metrics for async fetcher performance

Suggestion: Add metrics for:

  • Cache hit/miss rate
  • Prefetch success/failure
  • Average prefetch latency
  • Cache memory usage
  • Cleanup operations

Example:

metrics.CacheHits.Inc()
metrics.PrefetchLatency.Observe(duration.Seconds())

13. Configuration: Hard-coded Poll Interval and Prefetch Window

Location: Multiple call sites with 3*time.Second, 1*time.Second, 1 or 2 epochs

Suggestion: Make these configurable via genesis or node config:

type AsyncEpochConfig struct {
    PrefetchWindow uint64        `json:"prefetch_window"`
    PollInterval   time.Duration `json:"poll_interval"`
}

14. Testing: Add Integration Tests

Missing: No integration test showing full flow with real cache, concurrent access, and lifecycle

Suggestion: Add test that:

  • Starts fetcher
  • Simulates concurrent GetCachedEpoch calls
  • Advances DA height
  • Verifies prefetching happens correctly
  • Stops fetcher and verifies clean shutdown

15. Logging: Too Many Debug Logs in Hot Path

Location: Multiple debug logs per operation

Issue: In production with high block rate, debug logs in GetCachedEpoch (line 159) and prefetchEpochs (line 227) could impact performance.

Suggestion:

  • Use sampling for hot-path debug logs
  • Consider using trace-level for very frequent operations
  • Add a "quiet mode" configuration

🟢 Security Assessment

✅ No critical security issues found

  • Input validation: Generally good, but deserialization needs bounds (see #5)
  • Resource limits: Needs cache size limits (see #6)
  • Credential handling: No credentials exposed in logs ✓
  • Context propagation: Proper use of context.Context ✓
  • Cancellation: Respects cancellation signals ✓

🟢 Performance Assessment

Overall: Well-designed for performance

Strengths:

  • Background prefetching reduces latency at epoch boundaries
  • Read locks for cache access minimize contention
  • In-memory cache avoids disk I/O

Concerns:


📊 Test Coverage Assessment

Coverage: Good unit test coverage for core functionality

Well-tested:

  • ✅ Cache hit/miss scenarios
  • ✅ Serialization/deserialization
  • ✅ Epoch boundary conditions
  • ✅ Error cases (no namespace, future heights)
  • ✅ Multi-height epochs

Needs more tests:

  • ⚠️ Concurrent access patterns (multiple goroutines calling GetCachedEpoch)
  • ⚠️ Lifecycle edge cases (stop before start, multiple stops)
  • ⚠️ Memory cleanup validation
  • ⚠️ Context cancellation during iteration
  • ⚠️ Integration with sequencer lifecycle

📝 Documentation Assessment

Code-level documentation: Good

  • Functions have clear doc comments ✓
  • Complex logic is explained ✓

Missing:

  • Architecture decision record (ADR) for why async prefetch vs alternatives
  • Performance characteristics (expected cache hit rate, memory usage)
  • Operational guide (monitoring, tuning parameters)
  • Migration guide (behavior change for existing deployments)

Summary & Recommendations

Priority Fixes (Should Address Before Merge)

  1. Fix goroutine leak risk in cleanupOldEpochs (#1)
  2. Fix context lifecycle in constructor (#2)
  3. Add cleanup on error paths in sequencer (#3)
  4. Add deserialization bounds checks (#5)
  5. Deduplicate processForcedInclusionBlobs (#8)

Recommended Improvements (Consider for Follow-up)

  1. Add cache size limits (#6)
  2. Implement retry/backoff strategy (#7)
  3. Use encoding/binary for safer serialization (#10)
  4. Add observability metrics (#12)
  5. Make configuration tunable (#13)
  6. Add integration tests (#14)

Overall Assessment

This is a high-quality implementation that follows Go best practices and integrates well with the existing architecture. The async prefetching design is sound and should deliver the expected performance improvements. With the priority fixes addressed, this PR will be production-ready.

Recommendation: Approve with changes requested for the five priority fixes above.


@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements an asynchronous epoch fetching mechanism to optimize the retrieval of forced inclusion transactions from the Data Availability (DA) layer. By introducing an AsyncEpochFetcher that prefetches and caches upcoming epoch data in the background, the system can significantly reduce the latency associated with fetching this data, particularly at epoch transitions. This enhancement improves the overall performance and responsiveness of sequencers and the block syncing process by providing immediate access to frequently needed DA epoch information.

Highlights

  • Introduced AsyncEpochFetcher: A new component for background prefetching and caching of Data Availability (DA) epoch data, designed to reduce latency at epoch boundaries.
  • Integrated with ForcedInclusionRetriever: The existing ForcedInclusionRetriever now utilizes the AsyncEpochFetcher to first check for cached epoch data before performing a synchronous fetch.
  • Deployment across Sequencers and Syncer: The AsyncEpochFetcher is instantiated and started in the createSequencer functions for EVM, gRPC, and TestApp, as well as within the Syncer component, ensuring widespread performance benefits.
  • Serialization and Deserialization: The AsyncEpochFetcher includes custom serialization and deserialization logic for ForcedInclusionEvent to efficiently store and retrieve cached epoch data.
  • Comprehensive Testing: New unit tests have been added for the AsyncEpochFetcher to ensure its correct functionality, including caching, background operations, and graceful shutdown.



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces an AsyncEpochFetcher component designed to prefetch Data Availability (DA) epoch data in the background, thereby improving performance for sequencers and the block syncer. The new fetcher includes an in-memory cache for storing prefetched epochs, manages background fetching, tracks the current DA height, and cleans up old cached data. The ForcedInclusionRetriever has been updated to leverage this AsyncEpochFetcher, prioritizing cached data before making direct DA client calls.

The AsyncEpochFetcher is integrated into the evm, grpc, and testapp applications, the block syncer, and the single sequencer, with corresponding unit tests added and existing tests updated.

Review comments indicate that the mu mutex in AsyncEpochFetcher is redundant, because dsync.MutexWrap already provides thread safety, leading to double-locking; they also suggest replacing the fragile fmt.Sscanf used for parsing datastore keys with key.Name() and strconv.ParseUint for robustness.


// In-memory cache for prefetched epoch data
cache ds.Batching
mu sync.RWMutex

medium

The cache field is already wrapped with dsync.MutexWrap, which makes it thread-safe. The additional mu field is redundant and leads to double-locking. Please remove this mutex and all its usages (f.mu.Lock/Unlock/RLock/RUnlock) throughout the file.

Comment on lines +374 to +378
var epochEnd uint64
_, err := fmt.Sscanf(key.String(), "/epoch/%d", &epochEnd)
if err != nil {
continue
}

medium

Using fmt.Sscanf to parse the key string is fragile. A more robust and idiomatic way to handle datastore keys is to use key.Name() to get the last component of the key and then parse it using strconv.ParseUint. You will need to import the strconv package.

Suggested change — before:

var epochEnd uint64
_, err := fmt.Sscanf(key.String(), "/epoch/%d", &epochEnd)
if err != nil {
    continue
}

After:

epochEnd, err := strconv.ParseUint(key.Name(), 10, 64)
if err != nil {
    continue
}

@codecov

codecov bot commented Jan 5, 2026

Codecov Report

❌ Patch coverage is 71.26100% with 98 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.28%. Comparing base (66b491b) to head (229da68).

Files with missing lines | Patch % | Lines
block/internal/da/async_epoch_fetcher.go | 78.49% | 43 missing and 17 partials ⚠️
block/internal/da/forced_inclusion_retriever.go | 18.75% | 11 missing and 2 partials ⚠️
pkg/sequencers/single/sequencer.go | 63.63% | 12 missing ⚠️
block/internal/syncing/syncer.go | 0.00% | 10 missing ⚠️
block/public.go | 0.00% | 3 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2952      +/-   ##
==========================================
+ Coverage   58.73%   59.28%   +0.54%     
==========================================
  Files          90       91       +1     
  Lines        8720     9055     +335     
==========================================
+ Hits         5122     5368     +246     
- Misses       3011     3082      +71     
- Partials      587      605      +18     
Flag | Coverage Δ
combined | 59.28% <71.26%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown.
