
Conversation

@julienrbrt
Member

ref: #2906

@github-actions
Contributor

github-actions bot commented Jan 5, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 5, 2026, 4:25 PM

@claude
Contributor

claude bot commented Jan 5, 2026

Claude finished @julienrbrt's task in 2m 29s


Code Review Complete

I've completed a comprehensive review of the async epoch fetching feature. This is a well-architected addition that introduces background prefetching of DA epoch data to improve performance at epoch boundaries. Below are my findings organized by category.


🟢 Strengths

Architecture & Design

  • Clean separation of concerns: The AsyncEpochFetcher is well-isolated with clear responsibilities
  • Proper use of interfaces: Integration through existing Client interface is clean
  • Fallback mechanism: Graceful degradation when cache misses occur (falls back to sync fetch)
  • Zero-dependency integration: Follows the project's pattern of modular, loosely-coupled components

Concurrency

  • Well-structured lifecycle management: Clean Start()/Stop() pattern with WaitGroup
  • Proper context cancellation: Background goroutine respects context cancellation
  • Appropriate use of atomics: atomic.Uint64 for height tracking and atomic.Pointer for state
  • Mutex usage: RWMutex correctly used for cache reads vs writes

Testing

  • Comprehensive unit tests: Good coverage of edge cases (cache miss, future heights, serialization)
  • Mocking strategy: Clean mock implementations for testing
  • Table-driven tests: Good use of table-driven tests in forced_inclusion_retriever_test.go

🟡 Issues to Address

1. Critical: Goroutine Leak Risk in Results Iterator

Location: async_epoch_fetcher.go:367

results, err := f.cache.Query(f.ctx, query)
if err != nil {
    f.logger.Warn().Err(err).Msg("failed to query cache for cleanup")
    return
}
defer results.Close()

for result := range results.Next() {  // ⚠️ Potential leak if context cancelled

Issue: If f.ctx is cancelled while iterating, the for range results.Next() may not complete, and the deferred results.Close() won't be called immediately. The iterator channel may be blocked.

Recommendation: Add explicit context check or use errgroup with cancellation:

ch := results.Next()
loop:
for {
    select {
    case <-f.ctx.Done():
        return
    case result, ok := <-ch:
        if !ok {
            break loop // a bare break would only exit the select
        }
        _ = result // process result
    }
}

2. Race Condition: Context Created in Constructor

Location: async_epoch_fetcher.go:62

ctx, cancel := context.WithCancel(context.Background())

Issue: The context is created from background in the constructor, independent of any parent context. If the parent service is shut down before Start() is called, or if there's a startup failure, this goroutine may not respect the parent's lifecycle.

Recommendation: Accept a parent context.Context in NewAsyncEpochFetcher() or Start() to properly integrate with parent lifecycle:

func (f *AsyncEpochFetcher) Start(ctx context.Context) {
    f.ctx, f.cancel = context.WithCancel(ctx)
    // ...
}

3. Missing AsyncFetcher Cleanup on Sequencer Errors

Location: pkg/sequencers/single/sequencer.go:95-109, 206-218

Issue: AsyncEpochFetcher.Start() is called, but if subsequent initialization fails (e.g., checkpoint save fails), the background goroutine is never stopped.

Recommendation: Add cleanup on error paths:

s.asyncFetcher = block.NewAsyncEpochFetcher(...)
s.asyncFetcher.Start()

s.fiRetriever = block.NewForcedInclusionRetriever(...)
if err := someOperation(); err != nil {
    s.asyncFetcher.Stop()  // ← Add this
    return nil, err
}

Better yet, add a cleanup method or defer pattern.

4. Buffer Overflow Risk in Serialization

Location: async_epoch_fetcher.go:398-429

Issue: The serialization calculates size upfront but doesn't validate buffer bounds during writes. If size calculation is wrong (e.g., integer overflow), buffer overflow could occur.

Recommendation: Use encoding/binary package which has built-in bounds checking, or add explicit checks:

if offset+8 > len(buf) {
    return nil, errors.New("buffer overflow during serialization")
}

Alternatively, use a bytes.Buffer which grows dynamically and is safer.

5. Deserialization: Missing Input Validation

Location: async_epoch_fetcher.go:435-478

Issue: While there's a minimum length check (len(data) < 28), there's no validation on:

  • Maximum reasonable transaction count (could cause OOM if corrupted data has txCount = 0xFFFFFFFF)
  • Maximum transaction size
  • Total deserialized size

Recommendation: Add safety limits:

const (
    maxTxCount = 100_000  // Reasonable upper bound
    maxTxSize  = 2 * 1024 * 1024  // 2MB per tx
)

if txCount > maxTxCount {
    return nil, fmt.Errorf("tx count %d exceeds maximum %d", txCount, maxTxCount)
}

6. Potential Memory Growth: No Cache Size Limit

Location: async_epoch_fetcher.go:349-393

Issue: While old epochs are cleaned up, there's no maximum cache size limit. If epochs are very large or the prefetch window is increased, memory could grow unbounded during the grace period.

Recommendation: Add a maximum cache size check and evict oldest entries if exceeded:

type AsyncEpochFetcher struct {
    // ...
    maxCacheSize int64  // bytes
    currentCacheSize atomic.Int64
}

7. Error Handling: Silent Failure in Prefetch

Location: async_epoch_fetcher.go:239-244, 276-283

Issue: When prefetch fails (e.g., StatusHeightFromFuture or processing errors), errors are logged but the cache entry is not created. The next call will hit a cache miss and fall back to sync fetch, but there's no exponential backoff or retry strategy for transient errors.

Recommendation: Consider implementing:

  • Exponential backoff for StatusHeightFromFuture (instead of retrying every poll interval)
  • Distinguish between permanent errors (don't retry) vs transient (retry with backoff)
  • Metrics to track prefetch success/failure rates

8. Code Duplication: processForcedInclusionBlobs

Location: async_epoch_fetcher.go:316-347 and forced_inclusion_retriever.go:170-201

Issue: This exact method is duplicated in both files. Violates DRY principle and increases maintenance burden.

Recommendation: Extract to a shared helper function or make one call the other.

9. Documentation: Missing AsyncFetcher Nil Check Justification

Location: pkg/sequencers/single/sequencer.go:206

if c.asyncFetcher == nil {

Issue: It's unclear when asyncFetcher could be nil at this point in the code flow. The constructor always creates it (lines 96, 124), so this seems defensive but undocumented.

Recommendation: Either:

  1. Document why this can be nil (race condition? specific initialization path?)
  2. Remove the check if it's genuinely impossible
  3. Make it a panic if it's a programming error

10. Performance: Serialization Uses Manual Byte Operations

Location: async_epoch_fetcher.go:480-514

Issue: Manual byte shifting for encoding is error-prone and harder to maintain than standard library approaches.

Recommendation: Use encoding/binary for consistency and safety:

import "encoding/binary"

func writeUint64(buf []byte, val uint64) {
    binary.LittleEndian.PutUint64(buf, val)
}

func readUint64(buf []byte) uint64 {
    return binary.LittleEndian.Uint64(buf)
}

This is exactly what the standard library is for: it is tested, safe, and more readable.


🔵 Suggestions for Improvement

11. Consider Using Protocol Buffers for Serialization

Current: Custom binary format

Suggestion: Since the project already uses protobuf (see types/pb/), consider using it for cache serialization:

  • Self-describing format (easier to evolve)
  • Built-in backwards compatibility
  • Less error-prone than manual binary encoding
  • Better tooling support

12. Add Metrics for Observability

Missing: No metrics for async fetcher performance

Suggestion: Add metrics for:

  • Cache hit/miss rate
  • Prefetch success/failure
  • Average prefetch latency
  • Cache memory usage
  • Cleanup operations

Example:

metrics.CacheHits.Inc()
metrics.PrefetchLatency.Observe(duration.Seconds())

13. Configuration: Hard-coded Poll Interval and Prefetch Window

Location: Multiple call sites with 3*time.Second, 1*time.Second, 1 or 2 epochs

Suggestion: Make these configurable via genesis or node config:

type AsyncEpochConfig struct {
    PrefetchWindow uint64        `json:"prefetch_window"`
    PollInterval   time.Duration `json:"poll_interval"`
}

14. Testing: Add Integration Tests

Missing: No integration test showing full flow with real cache, concurrent access, and lifecycle

Suggestion: Add test that:

  • Starts fetcher
  • Simulates concurrent GetCachedEpoch calls
  • Advances DA height
  • Verifies prefetching happens correctly
  • Stops fetcher and verifies clean shutdown

15. Logging: Too Many Debug Logs in Hot Path

Location: Multiple debug logs per operation

Issue: In production with high block rate, debug logs in GetCachedEpoch (line 159) and prefetchEpochs (line 227) could impact performance.

Suggestion:

  • Use sampling for hot-path debug logs
  • Consider using trace-level for very frequent operations
  • Add a "quiet mode" configuration

🟢 Security Assessment

✅ No critical security issues found

  • Input validation: Generally good, but deserialization needs bounds (see #5)
  • Resource limits: Needs cache size limits (see #6)
  • Credential handling: No credentials exposed in logs ✓
  • Context propagation: Proper use of context.Context ✓
  • Cancellation: Respects cancellation signals ✓

🟢 Performance Assessment

Overall: Well-designed for performance

Strengths:

  • Background prefetching reduces latency at epoch boundaries
  • Read locks for cache access minimize contention
  • In-memory cache avoids disk I/O

Concerns:


📊 Test Coverage Assessment

Coverage: Good unit test coverage for core functionality

Well-tested:

  • ✅ Cache hit/miss scenarios
  • ✅ Serialization/deserialization
  • ✅ Epoch boundary conditions
  • ✅ Error cases (no namespace, future heights)
  • ✅ Multi-height epochs

Needs more tests:

  • ⚠️ Concurrent access patterns (multiple goroutines calling GetCachedEpoch)
  • ⚠️ Lifecycle edge cases (stop before start, multiple stops)
  • ⚠️ Memory cleanup validation
  • ⚠️ Context cancellation during iteration
  • ⚠️ Integration with sequencer lifecycle

📝 Documentation Assessment

Code-level documentation: Good

  • Functions have clear doc comments ✓
  • Complex logic is explained ✓

Missing:

  • Architecture decision record (ADR) for why async prefetch vs alternatives
  • Performance characteristics (expected cache hit rate, memory usage)
  • Operational guide (monitoring, tuning parameters)
  • Migration guide (behavior change for existing deployments)

Summary & Recommendations

Priority Fixes (Should Address Before Merge)

  1. Fix goroutine leak risk in cleanupOldEpochs (#1)
  2. Fix context lifecycle in constructor (#2)
  3. Add cleanup on error paths in sequencer (#3)
  4. Add deserialization bounds checks (#5)
  5. Deduplicate processForcedInclusionBlobs (#8)

Recommended Improvements (Consider for Follow-up)

  1. Add cache size limits (#6)
  2. Implement retry/backoff strategy (#7)
  3. Use encoding/binary for safer serialization (#10)
  4. Add observability metrics (#12)
  5. Make configuration tunable (#13)
  6. Add integration tests (#14)

Overall Assessment

This is a high-quality implementation that follows Go best practices and integrates well with the existing architecture. The async prefetching design is sound and should deliver the expected performance improvements. With the priority fixes addressed, this PR will be production-ready.

Recommendation: Approve with changes requested for the five priority fixes above.


@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements an asynchronous epoch fetching mechanism to optimize the retrieval of forced inclusion transactions from the Data Availability (DA) layer. By introducing an AsyncEpochFetcher that prefetches and caches upcoming epoch data in the background, the system can significantly reduce the latency associated with fetching this data, particularly at epoch transitions. This enhancement improves the overall performance and responsiveness of sequencers and the block syncing process by providing immediate access to frequently needed DA epoch information.

Highlights

  • Introduced AsyncEpochFetcher: A new component for background prefetching and caching of Data Availability (DA) epoch data, designed to reduce latency at epoch boundaries.
  • Integrated with ForcedInclusionRetriever: The existing ForcedInclusionRetriever now utilizes the AsyncEpochFetcher to first check for cached epoch data before performing a synchronous fetch.
  • Deployment across Sequencers and Syncer: The AsyncEpochFetcher is instantiated and started in the createSequencer functions for EVM, gRPC, and TestApp, as well as within the Syncer component, ensuring widespread performance benefits.
  • Serialization and Deserialization: The AsyncEpochFetcher includes custom serialization and deserialization logic for ForcedInclusionEvent to efficiently store and retrieve cached epoch data.
  • Comprehensive Testing: New unit tests have been added for the AsyncEpochFetcher to ensure its correct functionality, including caching, background operations, and graceful shutdown.



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces an AsyncEpochFetcher component designed to prefetch Data Availability (DA) epoch data in the background, thereby improving performance for sequencers and the block syncer. The new fetcher includes an in-memory cache for storing prefetched epochs, manages background fetching, tracks the current DA height, and cleans up old cached data. The ForcedInclusionRetriever has been updated to leverage this AsyncEpochFetcher, prioritizing cached data before making direct DA client calls.

The AsyncEpochFetcher is integrated into the evm, grpc, and testapp applications, the block syncer, and the single sequencer, with corresponding unit tests added and existing tests updated.

Review comments indicate that the mu mutex in AsyncEpochFetcher is redundant, because dsync.MutexWrap already provides thread safety, leading to double-locking; they also suggest replacing the fragile fmt.Sscanf used for parsing datastore keys with key.Name() and strconv.ParseUint for robustness.


// In-memory cache for prefetched epoch data
cache ds.Batching
mu sync.RWMutex

medium

The cache field is already wrapped with dsync.MutexWrap, which makes it thread-safe. The additional mu field is redundant and leads to double-locking. Please remove this mutex and all its usages (f.mu.Lock/Unlock/RLock/RUnlock) throughout the file.

Comment on lines +374 to +378
var epochEnd uint64
_, err := fmt.Sscanf(key.String(), "/epoch/%d", &epochEnd)
if err != nil {
continue
}

medium

Using fmt.Sscanf to parse the key string is fragile. A more robust and idiomatic way to handle datastore keys is to use key.Name() to get the last component of the key and then parse it using strconv.ParseUint. You will need to import the strconv package.

Suggested change — before:

var epochEnd uint64
_, err := fmt.Sscanf(key.String(), "/epoch/%d", &epochEnd)
if err != nil {
    continue
}

After:

epochEnd, err := strconv.ParseUint(key.Name(), 10, 64)
if err != nil {
    continue
}

@codecov

codecov bot commented Jan 5, 2026

Codecov Report

❌ Patch coverage is 71.26100% with 98 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.28%. Comparing base (66b491b) to head (229da68).

Files with missing lines | Patch % | Lines
block/internal/da/async_epoch_fetcher.go | 78.49% | 43 missing and 17 partials ⚠️
block/internal/da/forced_inclusion_retriever.go | 18.75% | 11 missing and 2 partials ⚠️
pkg/sequencers/single/sequencer.go | 63.63% | 12 missing ⚠️
block/internal/syncing/syncer.go | 0.00% | 10 missing ⚠️
block/public.go | 0.00% | 3 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2952      +/-   ##
==========================================
+ Coverage   58.73%   59.28%   +0.54%     
==========================================
  Files          90       91       +1     
  Lines        8720     9055     +335     
==========================================
+ Hits         5122     5368     +246     
- Misses       3011     3082      +71     
- Partials      587      605      +18     
Flag | Coverage Δ
combined | 59.28% <71.26%> (+0.54%) ⬆️

Flags with carried forward coverage won't be shown.
