GoBlocks - Distributed Block Store

A distributed block storage system built in Go with configurable replication, implementing consistent hashing with virtual nodes (vnodes) for optimal data distribution. Inspired by Amazon Dynamo, Apache Cassandra, and Ceph.

Features

Storage & Replication

Fixed 4KB block storage for efficient and predictable data management
Consistent hashing with virtual nodes (vnodes) for balanced data distribution
Configurable replication factor (default: 3-way replication)
Leaderless architecture: any node can coordinate reads/writes
Synchronous replication for strong consistency guarantees

Service Discovery & Fault Tolerance

Zookeeper-based service discovery for dynamic cluster membership
Automatic detection of node joins and leaves
Real-time hash ring updates on topology changes

Ring Rebalancing & Migration

Automatic data migration when nodes join or leave the cluster
ZooKeeper-coordinated two-phase migration protocol
Zero-downtime rebalancing with graceful handoff
Stale block management with configurable grace period
Background garbage collection after migration completes
Deferred migration for rapid topology changes during active migration

Routing & Coordination

Smart request routing based on consistent hashing
Any node can act as coordinator for any request
Automatic forwarding to responsible nodes
Fault-tolerant reads with replica fallback

Architecture Overview

GoBlockStore/
├── config/           # Configuration management
├── storage/          # In-memory block storage layer with stale block tracking
├── replication/      # Core distributed systems logic
│   ├── node.go       # Zookeeper integration, service discovery & migration coordination
│   ├── ring.go       # Consistent hash ring with vnodes & migration state
│   ├── vnode.go      # Virtual node implementation
│   └── client.go     # Replication client for inter-node communication
├── server/           # HTTP API handlers and routing
└── k8s/              # Kubernetes deployment manifests
    ├── ns.yaml
    ├── zookeeper/    # ZooKeeper deployment
    ├── observability/ # Jaeger for distributed tracing
    └── store/        # StatefulSet, Services, ConfigMap

Observability & Monitoring

GoBlockStore includes comprehensive observability with OpenTelemetry, Jaeger for distributed tracing, and Prometheus + Grafana for metrics and dashboards.

Metrics Collected

Request Counters:

blocks_put_count - Total PUT operations
blocks_get_count - Total GET operations
blocks_delete_count - Total DELETE operations
blocks_internalput_count - Internal replication operations
blocks_internaldelete_count - Internal deletion operations
blocks_health_count - Health check requests
blocks_errors_count - Errors by operation type and error category

Latency Histograms:

blocks_put_duration - PUT operation latency (p50, p95, p99)
blocks_get_duration - GET operation latency
blocks_delete_duration - DELETE operation latency

Storage Gauges:

blocks_stored_count - Current number of blocks stored per node

Deploying Observability Stack

# Deploy Jaeger for distributed tracing
kubectl apply -f k8s/observability/jeager.yaml

# Deploy Prometheus for metrics
kubectl apply -f k8s/observability/prometheus.yaml

# Deploy Grafana for dashboards
kubectl apply -f k8s/observability/grafana.yaml

Accessing UIs

Jaeger (Distributed Tracing):

kubectl port-forward -n goblocks svc/jaeger-query 16686:16686
# Open http://localhost:16686

Prometheus (Metrics):

# Access via NodePort
minikube service prometheus -n goblocks
# Or port-forward
kubectl port-forward -n goblocks svc/prometheus 9090:9090
# Open http://localhost:9090

Grafana (Dashboards):

# Port-forward to Grafana
kubectl port-forward -n goblocks svc/grafana 3000:3000
# Open http://localhost:3000
# Default credentials: admin/admin

Distributed Tracing with Jaeger

What's Traced:

HTTP Handlers: All API endpoints (PUT/GET/DELETE)
Replication Flows: Track data replication across nodes
Context Propagation: Distributed traces across multiple nodes
Error Tracking: Failed operations with stack traces

Trace Attributes:

block.id - Block being operated on
replica.name - Replica handling the request
operation - Operation type
http.status_code - Response code
error.description - Error details

Using Jaeger UI:

Select service: goblocks
Click "Find Traces"
View distributed traces showing:
- Request flow across replicas
- Span timing and latency
- Replication coordinator → replica calls
- Success/failure status

Metrics & Dashboards with Prometheus + Grafana

Setting up Grafana:

Access Grafana UI (see above)
Add Prometheus data source:
- URL: http://prometheus.goblocks.svc.cluster.local:9090 (or http://prometheus:9090)
- Click "Save & Test"

Useful Prometheus Queries:

Request Rate (ops/sec):

rate(blocks_put_count[1m])
rate(blocks_get_count[1m])
rate(blocks_delete_count[1m])

Latency Percentiles:

# p99 latency
histogram_quantile(0.99, rate(blocks_put_duration_bucket[5m]))
histogram_quantile(0.99, rate(blocks_get_duration_bucket[5m]))

# p50 latency
histogram_quantile(0.50, rate(blocks_put_duration_bucket[5m]))

# Average latency
rate(blocks_put_duration_sum[5m]) / rate(blocks_put_duration_count[5m])

Error Rate:

rate(blocks_errors_count[5m])
# Errors by operation
sum by (operation) (rate(blocks_errors_count[5m]))

Requests by Node:

sum by (instance) (rate(blocks_put_count[5m]))

Storage Usage:

blocks_stored_count
# Total across cluster
sum(blocks_stored_count)

Suggested Grafana Panels:

Request Rate: Time series graph with PUT/GET/DELETE rates
Latency Heatmap: p50, p95, p99 latencies over time
Error Rate: Time series of errors/sec by operation
Storage Distribution: Bar gauge showing blocks per node
Throughput: Single stat showing total ops/sec
Node Health: State timeline showing up/down status

How It Works

1. Service Discovery

Each node registers itself with Zookeeper using ephemeral sequential nodes
Nodes watch for membership changes and maintain an up-to-date replica list
Logical node names are stored in Zookeeper data to handle ephemeral node naming

2. Consistent Hashing with VNodes

Each physical node is assigned 20 virtual nodes (vnodes) on the hash ring
Block placement is determined by hashing the block ID and finding the next vnodes on the ring
Vnodes ensure balanced distribution even with few physical nodes
Replication factor determines how many unique physical nodes store each block

3. Request Coordination

External clients can send requests to any node (coordinator pattern)
The coordinator checks the hash ring to find responsible nodes
If the coordinator is responsible, it serves the request locally
Otherwise, it forwards the request to the appropriate node(s)

4. Data Placement

Blocks are distributed across nodes based on consistent hashing
Each block is replicated to R nodes (where R = replication factor)
The hash ring ensures minimal data movement when nodes join/leave

5. Ring Rebalancing

When nodes join or leave the cluster, GoBlockStore automatically rebalances data using a coordinated migration protocol:

Migration Protocol

Phase 1: Detection

Nodes detect topology changes via ZooKeeper watches
First node to detect creates migration barrier in ZooKeeper
All existing nodes register as participants
New/departing nodes trigger ring recalculation

Phase 2: Data Transfer

Each node independently calculates which blocks need migration
Blocks are transferred to new owners (nodes that didn't have them before)
Blocks no longer owned by current node are marked as stale
Stale blocks remain readable during grace period (zero-downtime)

Phase 3: Coordination

Nodes mark themselves "ready" in ZooKeeper after completing transfers
Any node can act as coordinator to check if all participants are ready
When all ready, coordinator deletes the migration barrier

Phase 4: Atomic Ring Swap

All nodes watch for barrier deletion
Upon deletion, all nodes atomically swap to the new ring topology
During migration, routing uses OLD ring for stability
After swap, routing uses NEW ring

Phase 5: Garbage Collection

After configurable grace period (default: 5 minutes)
Stale blocks are garbage collected in background
Memory/disk footprint reduced

Handling Rapid Topology Changes

If nodes join/leave during active migration:

Topology changes are deferred using a PendingMigration flag
Ring watches are temporarily disabled during migration
When current migration completes, pending migration starts automatically
Multiple topology changes are batched into single migration cycle

This prevents migration storms and ensures cluster stability.

Deployment

Kubernetes Deployment (Recommended)

GoBlockStore is designed to run on Kubernetes using StatefulSets.

Quick Start

# 1. Start Minikube (or use existing cluster)
minikube start

# 2. Build image in Minikube's Docker
eval $(minikube docker-env)
docker build -t goblocks:latest .

# 3. Deploy namespace, ZooKeeper, BlockStore, and Observability
kubectl apply -f k8s/ns.yaml
kubectl apply -f k8s/zookeeper/deploy.yaml
kubectl apply -f k8s/zookeeper/service.yaml
kubectl apply -f k8s/store/configmap.yaml
kubectl apply -f k8s/store/service-headless.yaml
kubectl apply -f k8s/store/statefulset.yaml
kubectl apply -f k8s/store/service-external.yaml

# Observability stack
kubectl apply -f k8s/observability/jeager.yaml
kubectl apply -f k8s/observability/prometheus.yaml
kubectl apply -f k8s/observability/grafana.yaml

# 4. Wait for pods to be ready
kubectl get pods -n goblocks -w

# 5. Access the service
minikube service blockstore -n goblocks --url

Why StatefulSet?

Stable pod names: blockstore-0, blockstore-1, blockstore-2
Stable network identities: Each pod gets DNS entry via headless service
Ordered deployment: Pods created sequentially (important for ring formation)
ZooKeeper registration: Each pod registers with unique, predictable name

Testing Ring Rebalancing

# Get service URL
SERVICE_URL=$(minikube service blockstore -n goblocks --url)

# Write test data
for i in {1..20}; do
  dd if=/dev/urandom bs=4096 count=1 2>/dev/null | \
  curl -X PUT "$SERVICE_URL/block/test-$i" --data-binary @-
done

# Scale up to trigger rebalancing
kubectl scale statefulset blockstore --replicas=4 -n goblocks

# Watch migration logs
kubectl logs -f blockstore-0 -n goblocks | grep -E "Migration|transfer|barrier|swap"

# Verify data still accessible
for i in {1..20}; do
  curl "$SERVICE_URL/block/test-$i" > /dev/null && echo "✅ test-$i" || echo "❌ test-$i"
done

# Scale down
kubectl scale statefulset blockstore --replicas=3 -n goblocks

Key Kubernetes Concepts Used

StatefulSet: Stable identities for replicated blockstore nodes
Headless Service: Pod-to-pod DNS for ring communication
NodePort Service: External access for clients
ConfigMap: ZooKeeper connection configuration
Namespaces: Isolated deployment environment

Monitoring Migration

# Watch ZooKeeper migration state
kubectl exec -it deploy/zookeeper -n goblocks -- zkCli.sh
# Then: ls /migration/participants

# Check migration logs across all nodes
kubectl logs -l app=blockstore -n goblocks --tail=50 | grep Migration

# Monitor garbage collection
kubectl logs -l app=blockstore -n goblocks | grep "Garbage collection"

# Check for errors
kubectl logs -l app=blockstore -n goblocks | grep -i error

Troubleshooting

Migration stuck?

# Check if all participants are registered
kubectl exec -it deploy/zookeeper -n goblocks -- zkCli.sh
ls /migration/participants
get /migration/participants/blockstore-0

# Check for network issues
kubectl exec -it blockstore-0 -n goblocks -- sh
wget -O- http://blockstore-1.blockstore-headless.goblocks.svc.cluster.local:3000/health

Data inconsistency after migration?

# Check which nodes think they own a block
for pod in blockstore-{0,1,2,3}; do
  echo "=== $pod ==="
  kubectl exec $pod -n goblocks -- wget -qO- http://localhost:3000/block/test-1 && echo "HAS BLOCK" || echo "NO BLOCK"
done

Getting Started

Prerequisites

Go 1.21 or higher
Zookeeper 3.x (standalone or ensemble)

Setup

Start Zookeeper

# Using Docker
docker run -d -p 2181:2181 --name zookeeper zookeeper:3.8

Build the project

go build -o blockstore

Run multiple nodes

# Node 1
export ZKAddress="localhost:2181"
export ReplicaName="node1"
export ReplicaAddress="localhost"
export ReplicationFactor=3
./blockstore -port 3001

# Node 2
export ReplicaName="node2"
./blockstore -port 3002

# Node 3
export ReplicaName="node3"
./blockstore -port 3003

Environment Variables

Core Configuration:

ZKAddress: Zookeeper server address (default: localhost:2181)
ReplicaName: Unique name for this node (required)
ReplicaAddress: Address where this node is reachable (required)
ReplicationFactor: Number of replicas per block (default: 3)

Observability (OpenTelemetry):

OTEL_EXPORTER_OTLP_ENDPOINT: OTLP endpoint for traces (e.g., http://jaeger-collector:4317)
OTEL_EXPORTER_OTLP_PROTOCOL: Protocol to use (grpc or http)
OTEL_SERVICE_NAME: Service name shown in tracing UI (default: goblocks)
OTEL_RESOURCE_ATTRIBUTES: Additional resource attributes (e.g., service.version=1.0)

Note: Metrics are exposed via /metrics endpoint in Prometheus format. Traces are sent to OTLP endpoint (Jaeger).

HTTP API

Block Operations

PUT /block/{id}: Store a 4KB block (automatically routed to responsible nodes)
GET /block/{id}: Retrieve a block (fetched from responsible nodes)
DELETE /block/{id}: Delete a block
GET /health: Health check endpoint
GET /metrics: Prometheus metrics endpoint

Internal Endpoints (inter-node communication)

PUT /internal/block/{id}: Direct replication endpoint
DELETE /internal/block/{id}: Direct deletion endpoint

Example Usage

# Store a block
dd if=/dev/urandom of=block.bin bs=4096 count=1
curl -X PUT --data-binary @block.bin http://localhost:3001/block/myblock

# Retrieve a block
curl http://localhost:3002/block/myblock -o retrieved.bin

# Delete a block
curl -X DELETE http://localhost:3003/block/myblock

# View Prometheus metrics
curl http://localhost:3001/metrics

# View traces in Jaeger (Kubernetes)
kubectl port-forward -n goblocks svc/jaeger-query 16686:16686
# Open http://localhost:16686 and select service "goblocks"

# View metrics in Prometheus (Kubernetes)
kubectl port-forward -n goblocks svc/prometheus 9090:9090
# Open http://localhost:9090 and query: rate(blocks_put_count[1m])

Roadmap

✅ Phase 1: Core Rebalancing & Foundation (Completed)

Ring Rebalancing: Automatic data migration when nodes join/leave
ZooKeeper Coordination: Barrier-based migration protocol
Graceful Migration: Zero-downtime rebalancing with stale block management
Garbage Collection: Background cleanup after grace period
Kubernetes Deployment: StatefulSet-based deployment with stable identities

✅ Phase 2: Observability & Monitoring (Completed)

📋 Phase 3: Reliability & Consistency

Hinted Handoff: Temporary storage for failed writes with pull-based delivery
Read Repair: Detect and fix inconsistencies during reads
Anti-entropy (Merkle Trees): Background consistency checks
Configurable Consistency Levels: Quorum reads/writes (R+W > N)
Sloppy Quorums: Handle temporary failures gracefully

🏗️ Phase 4: Production Hardening

Persistent Storage: Replace in-memory storage with disk-backed solution (RocksDB/BadgerDB)
Write-Ahead Log (WAL): Durability for in-flight operations
Compression: Block-level compression for storage efficiency
Security: TLS for inter-node communication, Zookeeper ACLs
Admin API: Cluster status, ring visualization, rebalancing controls
Rate Limiting: Per-node and per-client rate limiting

🌍 Phase 5: Multi-Datacenter Replication

Log-Based Replication: Asynchronous replication across datacenters
Rack-Aware Placement: Topology-aware replica placement
Cross-DC Consistency: Vector clocks or last-write-wins (LWW)
Datacenter Failover: Automatic failover between datacenters
Geo-replication Lag Monitoring: Track replication delays

Project Motivation

This project is built as part of an intensive learning program covering:

Distributed Systems: Consensus, replication, partitioning, fault tolerance
Site Reliability Engineering: Observability, SLOs, error budgets, incident response
Database Internals: Storage engines, indexing, consistency models
Production Systems: Monitoring, alerting, capacity planning

Technical Highlights

Virtual Nodes (VNodes): Better load distribution and faster rebalancing
ZooKeeper Integration: Production-grade service discovery and migration coordination
Coordinator Pattern: Any-node request handling
Consistent Hashing: Minimal data movement on topology changes
Zero-Downtime Rebalancing: Graceful migration with stale block management
Barrier Synchronization: ZooKeeper-coordinated atomic ring swaps
Deferred Migration: Batches rapid topology changes to prevent migration storms
Kubernetes-Native: StatefulSet-based deployment with stable pod identities

Contributing

This is a learning project, but feedback and suggestions are welcome!

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
config		config
k8s		k8s
observability		observability
replication		replication
server		server
sre-lab		sre-lab
storage		storage
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Folders and files

Latest commit

History

Repository files navigation

GoBlocks - Distributed Block Store

Features

Storage & Replication

Service Discovery & Fault Tolerance

Ring Rebalancing & Migration

Routing & Coordination

Architecture Overview

Observability & Monitoring

Metrics Collected

Deploying Observability Stack

Accessing UIs

Distributed Tracing with Jaeger

Metrics & Dashboards with Prometheus + Grafana

How It Works

1. Service Discovery

2. Consistent Hashing with VNodes

3. Request Coordination

4. Data Placement

5. Ring Rebalancing

Migration Protocol

Handling Rapid Topology Changes

Deployment

Kubernetes Deployment (Recommended)

Quick Start

Why StatefulSet?

Testing Ring Rebalancing

Key Kubernetes Concepts Used

Monitoring Migration

Troubleshooting

Getting Started

Prerequisites

Setup

Environment Variables

HTTP API

Block Operations

Internal Endpoints (inter-node communication)

Example Usage

Roadmap

✅ Phase 1: Core Rebalancing & Foundation (Completed)

✅ Phase 2: Observability & Monitoring (Completed)

📋 Phase 3: Reliability & Consistency

🏗️ Phase 4: Production Hardening

🌍 Phase 5: Multi-Datacenter Replication

Project Motivation

Technical Highlights

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages