feat: High availability via RAFT #2954

alpe · 2026-01-06T14:40:08Z

Replaces #2836

* main:
  - fix: remove duplicate error logging in light node shutdown (#2841)
  - chore: fix incorrect function name in comment (#2840)
  - chore: remove sequencer go.mod (#2837)

* main:
  - build(deps): Bump the go_modules group across 2 directories with 3 updates (#2846)
  - build(deps): Bump github.com/dvsekhvalnov/jose2go from 1.7.0 to 1.8.0 in /test/e2e (#2851)
  - build(deps): Bump github.com/consensys/gnark-crypto from 0.18.0 to 0.18.1 in /test/e2e (#2844)
  - build(deps): Bump github.com/cometbft/cometbft from 0.38.17 to 0.38.19 in /test/e2e (#2843)
  - build(deps): Bump github.com/dvsekhvalnov/jose2go from 1.6.0 to 1.7.0 in /test/e2e (#2845)

(cherry picked from commit c44cd77e665f6d5d463295c6ed61c59a56d88db3)

* main:
  - chore: reduce log noise (#2864)
  - fix: sync service for non zero height starts with empty store (#2834)
  - build(deps): Bump golang.org/x/crypto from 0.43.0 to 0.45.0 in /execution/evm (#2861)
  - chore: minor improvement for docs (#2862)

* main:
  - chore: bump da (#2866)
  - chore: bump core (#2865)

* main:
  - chore: fix some comments (#2874)
  - chore: bump node in evm-single (#2875)
  - refactor(syncer,cache): use compare and swap loop and add comments (#2873)
  - refactor: use state da height as well (#2872)
  - refactor: retrieve highest da height in cache (#2870)
  - chore: change from event count to start and end height (#2871)

* main:
  - chore: remove extra github action yml file (#2882)
  - fix(execution/evm): verify payload status (#2863)
  - feat: fetch included da height from store (#2880)
  - chore: better output on errors (#2879)
  - refactor!: create da client and split cache interface (#2878)
  - chore!: rename `evm-single` and `grpc-single` (#2839)
  - build(deps): Bump golang.org/x/crypto from 0.42.0 to 0.45.0 in /tools/da-debug in the go_modules group across 1 directory (#2876)
  - chore: parallel cache de/serialization (#2868)
  - chore: bump blob size (#2877)

* main:
  - build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  - refactor(block): centralize timeout in client (#2903)
  - build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  - chore: bump default timeout (#2902)
  - fix: revert default db (#2897)
  - refactor: remove obsolete // +build tag (#2899)
  - fix:da visualiser namespace (#2895)
  - refactor: omit unnecessary reassignment (#2892)
  - build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  - chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  - fix: ensure consistent network ID usage in P2P subscriber (#2884)
  - build(deps): Bump golangci/golangci-lint-action from 9.0.0 to 9.1.0 (#2885)
  - build(deps): Bump actions/checkout from 5 to 6 (#2886)

* main: (34 commits)
  - feat: make reaper poll duration configurable (#2951)
  - chore!: move sequencers to pkg (#2931)
  - feat: Ensure Header integrity on DA (#2948)
  - feat(testda): add header support with GetHeaderByHeight method (#2946)
  - chore: improve code comments clarity (#2947)
  - chore(sequencers): optimize store check (#2945)
  - fix: make evm_execution more robust (#2942)
  - fix(sequencers/single): deterministic queue (#2938)
  - fix(block): fix init logic sequencer for da epoch fetching (#2926)
  - feat: use DA timestamp (#2939)
  - chore: improve code comments clarity (#2943)
  - build(deps): bump libp2p (#2937)
  - build(deps): Bump actions/cache from 4 to 5 (#2934)
  - build(deps): Bump actions/download-artifact from 6 to 7 (#2933)
  - build(deps): Bump actions/upload-artifact from 5 to 6 (#2932)
  - feat: DA Client remove interface part 3, replace types with new code (#2910)
  - DA Client remove interface: Part 2.5, create e2e test to validate that a blob is posted in DA layer. (#2920)
  - fix(syncing): skip forced txs checks for p2p blocks (#2922)
  - build(deps): Bump the all-go group across 5 directories with 5 updates (#2919)
  - chore: loosen syncer state check (#2927)
  - ...

github-actions · 2026-01-06T14:40:23Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Jan 7, 2026, 3:59 PM

claude · 2026-01-06T14:40:26Z

Claude finished @alpe's task in 3m 56s

Comprehensive Code Review: High Availability via RAFT

I've completed a thorough review of the RAFT high availability implementation across all 75 changed files. This is a significant architectural enhancement that introduces distributed consensus for leader election and state replication. Below are my findings organized by priority and category.

Critical Issues 🔴

1. Bootstrap-Only Mode Restriction (`pkg/raft/node.go:111-114`)

Severity: High - Prevents new nodes from joining existing clusters

if !n.config.Bootstrap {
    return fmt.Errorf("raft cluster requires bootstrap mode")
}

Issue: The Start() method fails fast for any node where Bootstrap=false, making it impossible for new nodes to join an existing cluster. The comment on line 112 ("at this stage only bootstrap mode is supported") indicates the restriction is deliberate for now, but it still blocks any node that is not part of the initial bootstrap.

Impact:

Nodes cannot join running clusters dynamically
No horizontal scaling capability
Forces all nodes to bootstrap together

Recommendation: Fix this →

2. Naked Return Swallows Errors (`node/full.go:113-119`)

Severity: High - Gemini correctly identified these

// Lines 113-115
if leaderElection, err = newSingleRoleElector(leaderFactory); err != nil {
    return  // ❌ Returns (nil, nil) instead of (nil, err)
}

// Lines 117-119  
if leaderElection, err = newSingleRoleElector(followerFactory); err != nil {
    return  // ❌ Same issue
}

Impact: Initialization errors are silently ignored, leading to nil pointer panics later.

Recommendation: Fix this →

3. Race Condition in Leader Election (`pkg/raft/election.go:90-111`)

Severity: Medium-High - Potential double-signing risk

The election logic checks leadership status and syncing state in two separate calls:

// Line 97: First check
if d.node.leaderID() != d.node.NodeID() {
    continue
}
// Line 101: Second check (may have changed!)
if !raftSynced || !runnable.IsSynced(d.node.GetState()) {
    // Transfer leadership
}

Issue: Between lines 97 and 101, leadership could be lost to another node, but the current node still proceeds to check sync status. The code mitigates this with a second leaderID() check, but the TOCTOU (time-of-check-time-of-use) gap remains.

Impact: While the code handles this correctly by checking again, the comment on line 104 acknowledges the risk: "Better stop than double sign on old state."

Recommendation: Consider using a read lock or snapshot of the leadership state to eliminate the race window entirely.
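
One way to read the snapshot suggestion is to collect every input the loop needs once and then decide with a pure function. The sketch below is only an illustration: `view`, `action`, and `decide` are hypothetical names, not types from the PR. A snapshot makes the local decision internally consistent, but leadership can still change right after it is taken, so it narrows the window rather than replacing fencing at the Raft layer.

```go
package main

import "fmt"

// view is one snapshot of the inputs the election loop needs (hypothetical type).
type view struct {
	LeaderID    string
	RaftSynced  bool
	StateSynced bool
}

// action is the outcome of one election-loop iteration (hypothetical type).
type action int

const (
	stayFollower action = iota
	transferLeadership
	startLeading
)

// decide is a pure function of the snapshot, so both checks see the same state.
func decide(self string, v view) action {
	switch {
	case v.LeaderID != self:
		return stayFollower
	case !v.RaftSynced || !v.StateSynced:
		// Leader on stale state: hand over rather than risk signing on old data.
		return transferLeadership
	default:
		return startLeading
	}
}

func main() {
	v := view{LeaderID: "node1", RaftSynced: true, StateSynced: false}
	fmt.Println(decide("node1", v) == transferLeadership) // true
}
```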

4. Incomplete State Validation (`pkg/raft/types.go:13-21`)

Severity: Medium - Documented limitation

// assertValid checks basic constraints but does not ensure that no gaps exist or chain continuity
func assertValid(s *RaftBlockState, next *RaftBlockState) error {
    if s.Height > next.Height {
        return fmt.Errorf("invalid height: %d > %d", s.Height, next.Height)
    }
    // Missing: height continuity check (next.Height should be s.Height or s.Height+1)
    // Missing: block hash chain validation
    return nil
}

Issue: The comment explicitly states this doesn't check for gaps or chain continuity. An attacker or bug could submit height=100 followed by height=200 and it would pass validation.

Recommendation: Fix this →
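
A minimal sketch of the height-continuity check described above, assuming the rule stated in the excerpt's comment (the next height either repeats the current one or advances it by exactly one). `blockState` and `assertContinuous` are hypothetical stand-ins for `RaftBlockState` and `assertValid`; hash-chain validation is left out because the excerpt does not show the relevant fields.

```go
package main

import "fmt"

// blockState is a hypothetical stand-in for RaftBlockState; only Height is modeled.
type blockState struct {
	Height uint64
}

// assertContinuous rejects regressions and gaps: next must repeat the
// current height or advance it by exactly one.
func assertContinuous(cur, next blockState) error {
	if cur.Height > next.Height {
		return fmt.Errorf("invalid height: %d > %d", cur.Height, next.Height)
	}
	if next.Height-cur.Height > 1 {
		return fmt.Errorf("height gap: %d -> %d", cur.Height, next.Height)
	}
	return nil
}

func main() {
	fmt.Println(assertContinuous(blockState{Height: 100}, blockState{Height: 101})) // <nil>
	fmt.Println(assertContinuous(blockState{Height: 100}, blockState{Height: 200})) // height gap: 100 -> 200
}
```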

Architecture & Design 🏗️

Strengths ✅

Clean Interface Segregation: The Runnable interface (pkg/raft/election.go:17-23) properly separates concerns with Run() and IsSynced() methods.
Proper Raft Integration: Uses HashiCorp's battle-tested Raft implementation with appropriate store backends (BoltDB for logs/stable, FileSnapshot for snapshots).
State Machine Design: The FSM implementation (pkg/raft/node.go:41-46) uses atomic.Pointer for lock-free reads, which is efficient for high-frequency state queries.
Graceful Leadership Transitions: The DynamicLeaderElection properly waits for in-flight Raft messages before switching roles (line 96), preventing state inconsistencies.
Non-blocking Apply: The FSM's apply callback (lines 292-298) uses a non-blocking send with a warning, preventing slow consumers from blocking consensus.

Concerns ⚠️

Configuration Complexity: RAFT adds 7 new configuration parameters. The relationship between HeartbeatTimeout and LeaderLeaseTimeout (set to HeartbeatTimeout/2 on line 58) should be documented.
P2P Peer Management: In aggregator mode, P2P peers are cleared (node/full.go:98). Document why this is necessary for RAFT clusters.
Error Handling in Apply: The FSM's Apply() returns errors for validation failures, but these are swallowed by Raft. Consider logging at ERROR level or using metrics for visibility.

Error Handling & Resilience 🛡️

Good Practices ✅

Proper Context Cancellation: All goroutines respect context cancellation (e.g., election.go:153-154, raft_retriever.go:92-93).
Graceful Shutdown: The failoverState.Run() method properly handles shutdown with timeouts (3s) and error accumulation.
Error Wrapping: Consistent use of fmt.Errorf(...%w...) for error context throughout.

Issues ⚠️

Silent Channel Drops (pkg/raft/node.go:292-298)

select {
case f.applyCh <- RaftApplyMsg{...}:
default:
    f.logger.Warn().Msg("apply channel full, dropping message")
}

Issue: Dropped messages mean followers might miss state updates. The comment says "Followers can sync from DA or other peers," but this should be verified with metrics.

Recommendation: Add a metric counter for dropped messages and alert if it's consistently non-zero.
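
The counter could look roughly like the sketch below. It keeps the non-blocking send from the excerpt and adds an atomic drop counter; `notifier`, `applyMsg`, and `offer` are hypothetical names, and wiring the counter into the project's metrics system is not shown.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// applyMsg is a hypothetical stand-in for RaftApplyMsg.
type applyMsg struct{ Height uint64 }

// notifier forwards messages without blocking and counts what it drops.
type notifier struct {
	ch      chan applyMsg
	dropped atomic.Uint64
}

// offer attempts a non-blocking send and reports whether the message was delivered.
func (n *notifier) offer(m applyMsg) bool {
	select {
	case n.ch <- m:
		return true
	default:
		n.dropped.Add(1) // export this value as a metric and alert on sustained growth
		return false
	}
}

func main() {
	n := &notifier{ch: make(chan applyMsg, 1)}
	for h := uint64(1); h <= 3; h++ {
		n.offer(applyMsg{Height: h})
	}
	fmt.Println("dropped:", n.dropped.Load()) // dropped: 2
}
```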

Incomplete Error Checks (block/internal/syncing/raft_retriever.go:95-97)

if err := r.consumeRaftBlock(ctx, msg.State); err != nil {
    r.logger.Error().Err(err).Uint64("height", msg.State.Height).Msg("failed to apply raft block")
    // No retry or recovery mechanism
}

Impact: Failed raft blocks are logged but not retried. Depending on the error type, this could leave the node in an inconsistent state.

Recommendation: Categorize errors (transient vs permanent) and implement retry logic for transient failures.
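
As a rough illustration of that split, the sketch below retries only errors that wrap a sentinel marking them as transient and returns everything else immediately. The sentinels, `retryTransient`, and `retryNow` are all hypothetical; which errors from `consumeRaftBlock` are actually transient is a decision for the PR author.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinels used to classify failures.
var (
	errTransient = errors.New("transient failure")
	errPermanent = errors.New("permanent failure")
)

// retryTransient calls fn up to attempts times with doubling backoff.
// Errors that do not wrap errTransient are returned without retrying.
func retryTransient(ctx context.Context, attempts int, backoff time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if err == nil || !errors.Is(err, errTransient) {
			return err
		}
		if i == attempts-1 {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// retryNow wraps retryTransient with a background context and a 1ms initial backoff.
func retryNow(attempts int, fn func() error) error {
	return retryTransient(context.Background(), attempts, time.Millisecond, fn)
}

func main() {
	calls := 0
	err := retryNow(5, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("store busy: %w", errTransient)
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```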

Concurrency & Resource Management ⚡

Strengths ✅

Lock-Free Reads: atomic.Pointer[RaftBlockState] allows lock-free state queries from multiple goroutines.
Proper WaitGroup Usage: All background goroutines properly tracked (election.go:57,62,74, raft_retriever.go:65,82).
Mutex Protection: Raft retriever correctly protects cancel field with mutex (raft_retriever.go:56,76).

Concerns ⚠️

Single-Slot Error Channel (pkg/raft/election.go:58)

errCh := make(chan error, 1)

Good: Buffer size of 1 prevents blocking. However, multiple workers could theoretically race to send errors, and only the first would succeed (lines 77-79).

Ticker Cleanup (election.go:83,136-137)

ticker := time.NewTicker(300 * time.Millisecond)
defer ticker.Stop()
// Later...
ticker.Stop()
ticker.C = nil  // Why set to nil?

Question: Why set ticker.C = nil after stopping? This is unusual. The defer ticker.Stop() already ensures cleanup.
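
One plausible reason, offered as a guess rather than the author's stated intent: in Go, a receive from a nil channel never becomes ready, so assigning nil removes that case from a `select`. Since `Ticker.Stop` does not close the channel, setting it to nil is a common way to make sure the tick branch cannot fire again. The sketch below demonstrates the language behavior with a plain channel; `consume` and `filled` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// consume counts received ticks until nothing arrives for 20ms.
// With disableAfterFirst set, it nils its local channel after one tick,
// which permanently disables the tick case in the select.
func consume(ticks <-chan struct{}, disableAfterFirst bool) int {
	n := 0
	for {
		select {
		case <-ticks:
			n++
			if disableAfterFirst {
				ticks = nil // a nil channel is never ready to receive
			}
		case <-time.After(20 * time.Millisecond):
			return n
		}
	}
}

// filled returns a buffered channel preloaded with k ticks.
func filled(k int) chan struct{} {
	ch := make(chan struct{}, k)
	for i := 0; i < k; i++ {
		ch <- struct{}{}
	}
	return ch
}

func main() {
	fmt.Println(consume(filled(3), false), consume(filled(3), true)) // 3 1
}
```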

Security 🔒

Good Practices ✅

Input Validation: Peer addresses are validated in splitPeerAddr() (lines 354-373).
Deduplication: Duplicate servers are detected (deduplicateServers()) before bootstrap.
Proposer Verification: Raft blocks verify the proposer address matches genesis (raft_retriever.go:119-122).

Concerns ⚠️

No TLS for Raft Transport: The TCP transport is created without TLS (node.go:87):

transport, err := raft.NewTCPTransport(cfg.RaftAddr, addr, 3, 10*time.Second, os.Stderr)

Risk: Raft communication (including block data) is sent over plaintext TCP. In production, this should use mutual TLS.

Recommendation: Add Raft TLS configuration options for production deployments.

No Authentication for Raft RPC (pkg/rpc/server/http.go:136-152)

The /raft/node endpoint has no authentication:

mux.HandleFunc("/raft/node", func(w http.ResponseWriter, r *http.Request) {
    // No auth check
    rsp := struct {
        IsLeader bool   `json:"is_leader"`
        NodeID   string `json:"node_id"`
    }{...}
})

Risk: Information disclosure. Anyone can query leadership status.

Impact: Low for read-only endpoint, but should align with RPC server's general auth policy.
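
If the endpoint should be gated, a small middleware is one option. The sketch below assumes a static bearer token purely for illustration; the RPC server's actual auth policy is not shown in this thread, and `requireToken` and `statusFor` are hypothetical helpers.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireToken wraps next and rejects requests whose Authorization header
// does not carry the expected bearer token (an assumed scheme).
func requireToken(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends a GET to the protected handler and returns the status code.
func statusFor(header string) int {
	h := requireToken("s3cret", http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, `{"is_leader":true,"node_id":"node1"}`)
	}))
	req := httptest.NewRequest(http.MethodGet, "/raft/node", nil)
	if header != "" {
		req.Header.Set("Authorization", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(""), statusFor("Bearer s3cret")) // 401 200
}
```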

Testing & Coverage 🧪

Test Quality ✅

Comprehensive Election Tests (pkg/raft/election_test.go): Table-driven tests cover follower start, leader loss, and worker errors.
E2E Failover Test (test/e2e/failover_e2e_test.go): Realistic 3-node cluster with leader kill and transaction verification across failover.
Unit Tests for Utilities: splitPeerAddr() and deduplicateServers() have thorough edge case coverage.

Coverage Gaps ⚠️

Codecov Report: 40.04% patch coverage with 560 lines missing coverage

Specific files with low coverage:

pkg/raft/node.go: 12.50% (168 missing lines) 🔴
pkg/raft/node_mock.go: 45.08% (95 missing lines)
block/internal/syncing/raft_retriever.go: 0.00% (63 missing lines) 🔴
node/full.go: 32.81% (43 missing lines)
block/internal/executing/executor.go: 5.55% (34 missing lines for RAFT code)

Critical Missing Tests:

Raft Node Lifecycle: Start/Stop, Bootstrap with peers, Leadership transfer
FSM Snapshot/Restore: Zero coverage for snapshot persistence
Raft Retriever: Complete lack of unit tests
Failover State Transitions: Only E2E tests, no unit tests

Recommendation: Add tests →

Performance Considerations ⚡

Positive Aspects ✅

Lock-Free State Reads: Frequent IsLeader() and GetState() calls use atomic operations.
Bounded Buffers: Apply channel has buffer of 1 (raft_retriever.go:62) to prevent blocking Raft.
Efficient Serialization: Uses protobuf for Raft state (node.go:213).

Potential Issues ⚠️

Synchronous Raft Apply (executor.go:474)

if err := e.raftNode.Broadcast(e.ctx, raftState); err != nil {
    return fmt.Errorf("failed to propose block to raft: %w", err)
}

Impact: Block production is blocked waiting for Raft quorum (SendTimeout = configured value). With 3 nodes and typical Raft latencies (10-50ms), this adds measurable latency to block production.

Trade-off: This is necessary for strong consistency, but should be monitored.

Large Raft Log Entries: Each block state includes full header and data bytes (executor.go:466-470). For blocks with many transactions, this could be megabytes per Raft log entry.

Recommendation: Consider storing blocks separately and only replicating block hashes via Raft, similar to how some blockchain implementations handle large blocks.

Code Quality & Maintainability 📝

Strengths ✅

Clear Naming: Interfaces like Runnable, sourceNode, types like RaftBlockState are self-documenting.
Structured Logging: Consistent use of zerolog with relevant fields (height, node_id, etc.).
Zero External Dependencies in Mock: Mock is manually written, not generated, for full control.

Issues ⚠️

Inconsistent Field Naming (pkg/rpc/server/http.go:143)

Gemini identified this:

NodeId string `json:"node_id"`  // Should be NodeID per Go conventions

Recommendation: Rename to NodeID for consistency with Go's naming conventions for initialisms.

Magic Numbers:
- election.go:83: 300 * time.Millisecond for ticker interval
- failover.go:179: 3*time.Second for shutdown timeout
- node.go:87: 3 and 10*time.Second for TCP transport parameters

Recommendation: Extract as named constants with comments explaining the rationale.
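
For instance, the values listed above could be grouped as below. The constant names are hypothetical, and the comments describe only what the review states (plus the positions of `3` and `10*time.Second` in `raft.NewTCPTransport`, which are its pool size and timeout arguments).

```go
package main

import (
	"fmt"
	"time"
)

const (
	// leaderCheckInterval is the election loop's ticker interval (election.go:83).
	leaderCheckInterval = 300 * time.Millisecond
	// shutdownTimeout bounds component shutdown during failover (failover.go:179).
	shutdownTimeout = 3 * time.Second
	// raftTransportMaxPool is the connection pool size for the TCP transport (node.go:87).
	raftTransportMaxPool = 3
	// raftTransportTimeout is the I/O timeout for the TCP transport (node.go:87).
	raftTransportTimeout = 10 * time.Second
)

func main() {
	fmt.Println(leaderCheckInterval, shutdownTimeout, raftTransportMaxPool, raftTransportTimeout)
	// 300ms 3s 3 10s
}
```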

Debug Log in Production Code (syncing/syncer.go:184)

Gemini correctly noted this and suggested logging at Debug level:

s.logger.Debug().Uint64("header_height", state.LastSubmittedDAHeaderHeight)...

The "+++" prefix suggests this was debugging code that should be removed or logged at Debug level.

Configuration & Documentation 📚

Configuration Changes

New Raft Config Section:

type RaftConfig struct {
    Enable           bool
    NodeID           string
    RaftAddr         string
    RaftDir          string
    Bootstrap        bool
    Peers            string  // CSV format: "node1@addr1,node2@addr2"
    SnapCount        uint64
    SendTimeout      time.Duration
    HeartbeatTimeout time.Duration
}

Validation: Good validation in config.go:279-303 ensures required fields and positive timeouts.

Documentation Gaps

No ADR: The PR description says "Replaces feat: High availability via RAFT #2836" but doesn't link to an ADR (Architecture Decision Record). Given the scope, an ADR explaining RAFT selection, trade-offs, and operational considerations would be valuable.
Peer Format: The CSV format "node1@addr1,node2@addr2" is only documented in code comments. This should be in user-facing docs.
Bootstrap Process: How to bootstrap a cluster, add nodes, and handle failures needs operational documentation.
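
To make the peer format concrete for user-facing docs, here is a small parser for the `node1@addr1,node2@addr2` form. It is a sketch, not the PR's `splitPeerAddr` or `deduplicateServers`: `peer` and `parsePeers` are hypothetical, and rejecting duplicates with an error (rather than silently dropping them) is a choice made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// peer is a hypothetical parsed form of one "id@addr" entry.
type peer struct {
	ID   string
	Addr string
}

// parsePeers splits a CSV list of "id@addr" entries, rejecting malformed
// entries and duplicate IDs or addresses.
func parsePeers(csv string) ([]peer, error) {
	var peers []peer
	seenID := map[string]bool{}
	seenAddr := map[string]bool{}
	for _, raw := range strings.Split(csv, ",") {
		entry := strings.TrimSpace(raw)
		if entry == "" {
			continue // tolerate trailing commas and empty input
		}
		id, addr, ok := strings.Cut(entry, "@")
		if !ok || id == "" || addr == "" {
			return nil, fmt.Errorf("invalid peer %q: want id@addr", entry)
		}
		if seenID[id] || seenAddr[addr] {
			return nil, fmt.Errorf("duplicate peer %q", entry)
		}
		seenID[id], seenAddr[addr] = true, true
		peers = append(peers, peer{ID: id, Addr: addr})
	}
	return peers, nil
}

func main() {
	peers, err := parsePeers("node1@127.0.0.1:5001, node2@127.0.0.1:5002")
	fmt.Println(len(peers), err) // 2 <nil>
}
```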

Summary & Recommendations

This is a well-architected implementation of RAFT consensus for high availability. The core design is sound:

Clean interface boundaries
Proper use of HashiCorp Raft
Graceful failover mechanics
Good E2E testing

However, several critical issues must be addressed before merge:

Must Fix 🔴

Bootstrap-only restriction preventing node joins
Naked returns swallowing errors in full.go
State validation allowing height gaps

Should Fix 🟡

Add unit test coverage for core Raft logic (currently 12.5%)
Consider TLS for Raft transport in production
Document RAFT operational procedures
Fix naming convention inconsistencies
Add metrics for dropped Raft messages

Consider for Future 🟢

Optimize Raft log size for large blocks
Implement retry logic for failed Raft block applications
Add authentication to /raft/node endpoint
Extract magic numbers to named constants

Test Coverage: The 40% patch coverage is concerning for such a critical feature. I recommend adding unit tests for the core Raft node lifecycle before merging.

Overall Assessment: Strong foundation with excellent E2E testing, but needs critical bug fixes and improved unit test coverage for production readiness.

gemini-code-assist · 2026-01-06T14:42:38Z

Summary of Changes

Hello @alpe, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Rollkit network's resilience and fault tolerance by integrating the Raft consensus algorithm. The changes enable a robust failover mechanism, allowing nodes to seamlessly transition between leader and follower roles. This ensures continuous operation and consistent state across the network, even if individual nodes experience failures. The update touches various core components, from dependency management and block processing to RPC services and node lifecycle management, all aimed at achieving high availability.

Highlights

Raft Consensus Integration: Introduced Raft consensus for high availability, enabling dynamic leader election and state replication. This involves new dependencies on github.com/hashicorp/raft and github.com/hashicorp/raft-boltdb.
Node Architecture Refactoring: The node's core logic has been refactored to support a failover mechanism, allowing nodes to dynamically switch between aggregator (leader) and sync (follower) modes based on Raft's leadership status. This includes changes to how P2P clients are initialized and passed to node startup commands.
Block Processing with Raft: The block executor now checks Raft leadership before producing new blocks and proposes the block state to the Raft cluster after execution, ensuring state consistency across the cluster.
Cache and State Management Enhancements: Added methods to retrieve and persist the last submitted DA header and data heights from the cache. Logic was introduced to skip in-flight (temporary) headers and data during cache retrieval to prevent processing incomplete blocks.
New RPC Endpoints and Configuration: Added new RPC endpoints for querying Raft node status (e.g., /raft/node) and introduced new configuration flags for Raft-specific settings like node_id, raft_addr, bootstrap, and peers.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces high availability via RAFT consensus, which is a significant architectural change. The implementation is extensive, touching many parts of the system from node startup to block processing and syncing. Overall, the changes are well-structured, introducing clear interfaces for RAFT integration and refactoring the node components to support dynamic role switching between leader and follower. I've identified a few critical issues related to error handling and startup logic that need to be addressed, along with some medium-severity issues for improving code clarity and maintainability. Great work on this complex feature.

I am having trouble creating individual review comments. Click here to see my feedback.

node/full.go (113-115)

The naked return here will not propagate the error from newSingleRoleElector. This will cause the function to return (nil, nil), masking the underlying error. The error should be returned to the caller.

		if leaderElection, err = newSingleRoleElector(leaderFactory); err != nil {
			return nil, err
		}

node/full.go (117-119)

Similar to the previous case, the naked return here will swallow the error from newSingleRoleElector. The error should be propagated up to the caller.

		if leaderElection, err = newSingleRoleElector(followerFactory); err != nil {
			return nil, err
		}

pkg/raft/node.go (111-113)

This check prevents a node from starting if Bootstrap is false, which is problematic for nodes joining an existing cluster. A new node attempting to join will fail to start. The bootstrap logic should only execute if n.config.Bootstrap is true, and the function should return nil otherwise, allowing non-bootstrap nodes to start and join a cluster.
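
The control flow suggested here might look like the sketch below. `config`, `node`, and the injected `bootstrap` function are hypothetical stand-ins for the PR's types. Returning nil is not sufficient on its own to make joining work: with HashiCorp Raft a new server still has to be added to the configuration by the current leader, and that part is not shown.

```go
package main

import "fmt"

// config mirrors only the Bootstrap flag of the Raft settings (hypothetical).
type config struct{ Bootstrap bool }

// node is a hypothetical stand-in; bootstrap abstracts the cluster bootstrap call.
type node struct {
	cfg       config
	bootstrap func() error
}

// Start bootstraps only when asked to. Otherwise it returns nil so the node
// can come up and wait to be added to an existing cluster.
func (n *node) Start() error {
	if !n.cfg.Bootstrap {
		return nil
	}
	if err := n.bootstrap(); err != nil {
		return fmt.Errorf("bootstrap raft cluster: %w", err)
	}
	return nil
}

func main() {
	calls := 0
	joiner := &node{cfg: config{Bootstrap: false}, bootstrap: func() error { calls++; return nil }}
	fmt.Println(joiner.Start(), calls) // <nil> 0
}
```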

block/internal/cache/pending_headers.go (69-71)

The method name GetLastSubmittedDataHeight is misleading as it's part of the PendingHeaders struct. For clarity and consistency, it should be renamed to GetLastSubmittedHeaderHeight.

This change will also require updating the call site in block/internal/cache/manager.go.

func (ph *PendingHeaders) GetLastSubmittedHeaderHeight() uint64 {
	return ph.base.getLastSubmittedHeight()
}

block/internal/executing/executor.go (570-572)

The explicit type conversion types.Tx(tx) is redundant since types.Tx is an alias for []byte, and tx is already of type []byte. The change to a direct assignment is good, but it seems this loop could be replaced with a single, more efficient append call.

	data.Txs = append(data.Txs, batchData.Transactions...)

block/internal/syncing/syncer.go (184)

This log message seems to be for debugging purposes, indicated by the +++ prefix. It should be logged at the Debug level instead of Info to avoid cluttering the logs in a production environment.

			s.logger.Debug().Uint64("header_height", state.LastSubmittedDAHeaderHeight).Uint64("data_height", state.LastSubmittedDADataHeight).Msg("received raft block state")

pkg/rpc/server/http.go (143-146)

To adhere to Go's naming conventions for initialisms, the struct field NodeId should be renamed to NodeID.

				NodeID   string `json:"node_id"`
			}{
				IsLeader: raftNode.IsLeader(),
				NodeID:   raftNode.NodeID(),

codecov · 2026-01-07T14:35:59Z

Codecov Report

❌ Patch coverage is 39.85122% with 566 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.71%. Comparing base (f14c6a7) to head (3f7f431).

Files with missing lines	Patch %	Lines
pkg/raft/node.go	12.12%	174 Missing ⚠️
pkg/raft/node_mock.go	45.40%	74 Missing and 21 partials ⚠️
block/internal/syncing/raft_retriever.go	0.00%	63 Missing ⚠️
node/full.go	32.81%	36 Missing and 7 partials ⚠️
block/internal/syncing/syncer.go	25.45%	39 Missing and 2 partials ⚠️
node/failover.go	74.45%	22 Missing and 13 partials ⚠️
block/internal/executing/executor.go	5.55%	30 Missing and 4 partials ⚠️
pkg/raft/election.go	80.00%	12 Missing and 5 partials ⚠️
pkg/rpc/server/http.go	6.66%	13 Missing and 1 partial ⚠️
block/components.go	27.27%	7 Missing and 1 partial ⚠️
... and 13 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2954      +/-   ##
==========================================
- Coverage   58.74%   56.71%   -2.03%     
==========================================
  Files          90       97       +7     
  Lines        8722     9487     +765     
==========================================
+ Hits         5124     5381     +257     
- Misses       3011     3482     +471     
- Partials      587      624      +37

Flag	Coverage Δ
combined	`56.71% <39.85%> (-2.03%)`	⬇️


alpe added 28 commits November 12, 2025 15:16

Add high availability via raft

d4c7e7d

Prevent duplicate peers

820b9a3

Merge branch 'main' into alex/adr023_ha-failover_raft7

35cf0c8

* main: fix: remove duplicate error logging in light node shutdown (#2841) chore: fix incorrect function name in comment (#2840) chore: remove sequencer go.mod (#2837)

Go mod tidy

45a6e27

Minor updates

4562ded

Merge branch 'main' into alex/adr023_ha-failover_raft7

4d47754

Fix test

0804346

Leadership fix

1c2e3f1

(cherry picked from commit c44cd77e665f6d5d463295c6ed61c59a56d88db3)

Fix test

c320a7a

Remove sleep

72ec5fe

Increase buffer size for raft msgs

b34d7a0

Review feedback

406377e

Merge branch 'main' into alex/adr023_ha-failover_raft7

0ff25b8

* main: chore: bump da (#2866) chore: bump core (#2865)

Add replace to go.mod

406ad8b

Fix merge conflicts

abfa8d3

Linter fix

aebbdff

Minor test update

50d7caf

Remove store.Sync

5de9f0e

Merge

695324e

Add last submitted da heights to raft

2463919

Make test compile again

3e1cbca

Complete merge

640924b

github-actions bot assigned alpe Jan 6, 2026

gemini-code-assist bot reviewed Jan 6, 2026


alpe added 2 commits January 7, 2026 11:04

Merge branch 'main' into alex/adr023_ha-failover_raft8

11038cd

Fix integration test

095f882

alpe added 2 commits January 7, 2026 16:11

Review feedback and linting

99afdad

Convert raft state into proto type

3f7f431
