Distributed systems are about coordinating computation across machines that fail independently, communicate over unreliable networks, and disagree about time. This page covers the patterns; the C++ angle is which libraries and idioms make implementing them tractable.
- 1. Why Distributed Is Different
- 2. RPC
- 3. Message Queues and Streams
- 4. Consensus and Replication
- 5. Sharding and Partitioning
- 6. Consistent Hashing
- 7. Backpressure and Load Shedding
- 8. Idempotency, Retries, Deduplication
- 9. Service Discovery and Health
- 10. Time, Clocks, and Ordering
- 11. C++ Libraries
## 1. Why Distributed Is Different

Single-machine code can assume:
- Memory is shared and consistent.
- Time advances uniformly.
- Failure means the whole thing crashes — observable.
- Latency is deterministic and small.
Distributed code gets none of those guarantees:
- Network partitions are real. Two nodes that can't reach each other both think the other is dead.
- Partial failure — node A is up, B is down, C is slow. Most of your design budget goes here.
- No global clock. Node clocks drift, NTP corrects them in jumps. "Before" and "after" are subjective.
- Latency varies wildly — μs locally, ms within DC, 100s of ms across continents.
The CAP theorem formalizes the basic tradeoff: under network partition, you can have Consistency or Availability, not both. Choose deliberately.
## 2. RPC

A function call to a remote service. The classic API:

```cpp
auto result = client.greet("Alice"); // blocks until response
```

Implementations in C++:
- gRPC — Google's HTTP/2 + Protobuf RPC. The default for most new services.
- Cap'n Proto RPC — tied to Cap'n Proto serialization; promise pipelining.
- Apache Thrift — older Facebook system, multi-language, similar shape.
- REST/HTTP — not strictly RPC, but ubiquitous.
```proto
service Greeter {
  rpc Greet (GreetRequest) returns (GreetResponse);
}
```

Async vs sync: prefer async APIs in production. Blocking RPC ties up threads; an async client lets one thread handle thousands of in-flight requests. C++20 coroutines + async gRPC give a synchronous-looking style without the threading cost.
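A minimal synchronous client sketch against the Greeter service above; the header name, the server address, and the `name` field on GreetRequest are assumptions about the generated code:

```cpp
#include <chrono>
#include <memory>
#include <grpcpp/grpcpp.h>
#include "greeter.grpc.pb.h"  // assumed protoc output for the service above

int main() {
    auto channel = grpc::CreateChannel("localhost:50051",
                                       grpc::InsecureChannelCredentials());
    std::unique_ptr<Greeter::Stub> stub = Greeter::NewStub(channel);

    GreetRequest req;
    req.set_name("Alice");   // assumes GreetRequest has a `string name` field
    GreetResponse resp;

    grpc::ClientContext ctx;
    ctx.set_deadline(std::chrono::system_clock::now() +
                     std::chrono::milliseconds(200));  // always bound the wait

    grpc::Status status = stub->Greet(&ctx, req, &resp);
    if (!status.ok()) {
        // UNAVAILABLE / DEADLINE_EXCEEDED: retry with backoff, or fail fast.
        return 1;
    }
}
```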
The fallacies of distributed computing — memorize them:
- The network is reliable. (Not.)
- Latency is zero. (Not.)
- Bandwidth is infinite. (Not.)
- The network is secure. (Not.)
- Topology doesn't change. (It does.)
- There is one administrator. (There isn't.)
- Transport cost is zero. (It isn't.)
- The network is homogeneous. (It isn't.)
## 3. Message Queues and Streams

Async, decoupled communication. Producer pushes a message; the broker stores it; consumers pull.
| System | Style | Notes |
|---|---|---|
| Kafka | Append-only log, partitioned, replicated | High throughput; replay; ordering per partition |
| RabbitMQ | AMQP queues + exchanges | Flexible routing; lower throughput |
| NATS | Pub/sub, optional persistence | Very low latency; simple |
| Redis Streams | Append log within Redis | Simple ops; pairs with caching |
| AWS SQS / SNS | Managed queue / topic | Easy; vendor lock-in |
| MQTT | Pub/sub for IoT | Tiny clients; QoS levels |
C++ clients exist for all the above. See Apache Kafka, RabbitMQ.
When a queue: producer and consumer have different rates, work is async, you need durability / retry.
When NOT a queue: synchronous request/response (use RPC), small inter-thread comm (use a BoundedQueue), high-throughput same-host (use shared memory).
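As a concrete taste of a queue client, a minimal librdkafka producer sketch (the C API, callable from C++); the broker address and the `events` topic are placeholder assumptions:

```cpp
// Build with: g++ producer.cc -lrdkafka
#include <librdkafka/rdkafka.h>
#include <cstdio>
#include <cstring>

int main() {
    char errstr[512];
    rd_kafka_conf_t* conf = rd_kafka_conf_new();
    rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                      errstr, sizeof(errstr));

    // rd_kafka_new takes ownership of conf on success.
    rd_kafka_t* rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
    if (!rk) { std::fprintf(stderr, "%s\n", errstr); return 1; }

    const char* payload = "hello";
    rd_kafka_resp_err_t err = rd_kafka_producev(
        rk,
        RD_KAFKA_V_TOPIC("events"),                        // placeholder topic
        RD_KAFKA_V_KEY("user:42", 7),                      // key picks the partition
        RD_KAFKA_V_VALUE((void*)payload, std::strlen(payload)),
        RD_KAFKA_V_MSGFLAGS(RD_KAFKA_MSG_F_COPY),          // librdkafka copies payload
        RD_KAFKA_V_END);
    if (err) std::fprintf(stderr, "produce: %s\n", rd_kafka_err2str(err));

    rd_kafka_flush(rk, 5000);  // wait up to 5s for in-flight delivery
    rd_kafka_destroy(rk);
}
```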
## 4. Consensus and Replication

Multiple nodes need to agree on a value (e.g., who's the leader, what's the next entry in a log). Provably hard. Algorithms:
- Raft — the modern, understandable choice. Leader election + log replication. Used in etcd, Consul, CockroachDB.
- Paxos — older, more general, harder to understand. Multi-Paxos for log replication. Google's Chubby uses it.
- EPaxos / Mencius — leaderless variants.
- Byzantine fault tolerance (PBFT, HotStuff) — when nodes can be malicious, not just crashed. Slower; required for blockchains.
Don't implement Raft yourself. Use a battle-tested implementation:
- etcd-cpp-apiv3 — client for etcd.
- Apache Kudu's consensus — Raft in C++.
- BRaft — Baidu's production Raft library.
## 5. Sharding and Partitioning

When data doesn't fit on one node, partition it. Options:
- Hash sharding — `hash(key) % N`. Even distribution; resharding is painful. See consistent hashing below.
- Range sharding — partition by key range. Easy lookup, supports range queries. Hot spots possible.
- Lookup-table sharding — an explicit map from key → shard. Most flexible; requires a metadata service.
Cross-shard queries are expensive. Design schemas so most queries hit one shard.
Re-sharding (adding/removing nodes) is the hardest part. Strategies:
- Pre-sharding — create many more logical shards than physical nodes. Move whole shards between nodes without re-hashing keys (sketch after this list).
- Consistent hashing — see next section.
- Online migration — copy data while still serving; switch atomically.
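A minimal pre-sharding sketch, as referenced above; the shard count, node names, and the 50/50 split are hypothetical:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Keys hash into a fixed logical-shard space that never changes. Rebalancing
// reassigns whole logical shards to nodes; no key is ever re-hashed.
constexpr std::size_t kLogicalShards = 256;  // far more shards than nodes

std::size_t logicalShard(const std::string& key) {
    return std::hash<std::string>{}(key) % kLogicalShards;  // stable forever
}

int main() {
    std::vector<std::string> shardToNode(kLogicalShards, "node-A");

    // node-B comes online: hand it half the logical shards (data for those
    // shards is copied over, then the map entry flips).
    for (std::size_t s = kLogicalShards / 2; s < kLogicalShards; ++s)
        shardToNode[s] = "node-B";

    std::string owner = shardToNode[logicalShard("user:42")];
    (void)owner;  // route the request to `owner`
}
```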
## 6. Consistent Hashing

Place servers and keys on a virtual ring. A key is owned by the next server clockwise from its hash. Adding/removing a server only affects keys near it on the ring — O(K/N) keys move instead of O(K).
```cpp
#include <iostream>
#include <map>
#include <string>

class ConsistentHashRing {
    std::map<size_t, std::string> ring; // hash -> server; sorted keys form the ring
    int virtualNodes = 150;
public:
    void addServer(const std::string& server) {
        // Each server appears at many points on the ring ("virtual nodes").
        for (int i = 0; i < virtualNodes; ++i) {
            size_t h = std::hash<std::string>{}(server + std::to_string(i));
            ring[h] = server;
        }
    }
    std::string serverFor(const std::string& key) const {
        size_t h = std::hash<std::string>{}(key);
        auto it = ring.lower_bound(h);            // next server clockwise
        if (it == ring.end()) it = ring.begin();  // wrap around the ring
        return it->second;
    }
};

int main() {
    ConsistentHashRing ring;
    ring.addServer("server-A");
    ring.addServer("server-B");
    ring.addServer("server-C");
    for (const std::string& key : {"user:42", "user:99", "order:7", "order:1234"}) {
        std::cout << key << " -> " << ring.serverFor(key) << "\n";
    }
}
```

Virtual nodes (~100–500 per server) smooth out the distribution; without them, a ring with only a few servers gets lumpy.
## 7. Backpressure and Load Shedding

When demand exceeds capacity, you have two options: queue (latency grows) or shed (some requests fail fast). Queuing without bound is the path to outage.
Patterns:
- Bounded queues with rejection. When full, return 503 / fail. The queue depth is the latency budget.
- Token bucket / leaky bucket rate limiters. Smooth bursty traffic (sketch below).
- Circuit breaker. When a downstream is failing, fail fast for a cooling-off period instead of dogpiling.
- Adaptive concurrency limits (Netflix concurrency-limits). Track latency; reduce concurrency when latency rises.
- Priority shedding. Drop low-priority requests first. A search service might shed analytics traffic before user-facing.
See Error Handling Strategies §6 for circuit breakers and bulkheads.
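A minimal token-bucket sketch, as referenced in the list above (single-threaded; wrap `tryAcquire` in a mutex for shared use; the names and constants are illustrative):

```cpp
#include <algorithm>
#include <chrono>

class TokenBucket {
    double capacity;      // burst size
    double tokens;        // current fill
    double refillPerSec;  // sustained rate
    std::chrono::steady_clock::time_point last = std::chrono::steady_clock::now();
public:
    TokenBucket(double capacity, double refillPerSec)
        : capacity(capacity), tokens(capacity), refillPerSec(refillPerSec) {}

    bool tryAcquire(double cost = 1.0) {
        auto now = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed = now - last;
        last = now;
        tokens = std::min(capacity, tokens + elapsed.count() * refillPerSec);
        if (tokens < cost) return false;  // shed: caller fails fast (429/503)
        tokens -= cost;
        return true;
    }
};

int main() {
    TokenBucket limiter(100.0, 50.0);  // burst of 100; 50 requests/sec sustained
    if (limiter.tryAcquire()) { /* handle the request */ }
}
```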
## 8. Idempotency, Retries, Deduplication

In distributed systems, network failures mean you can't tell whether your request was processed:
- The request was lost.
- The request succeeded but the response was lost.
- The request is taking too long but is still in flight.
The retry mechanism cannot know which. Therefore: make every operation idempotent.
- Idempotency keys. Client generates a UUID per logical operation; server dedupes (sketch after this list). The first request with key X executes; retries with key X are no-ops returning the original result.
- Conditional writes. "Set X = new only if version = old." Datastores like DynamoDB, Spanner, etcd support this.
- Natural idempotency. "Set user's email to X" is idempotent; "increment count by 1" isn't (use "set count to N" with a compare-and-swap, or dedupe the increments with idempotency keys).
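A minimal in-memory sketch of the dedupe table from the first bullet; real servers persist it and expire old keys. `Result` and the class and method names are illustrative:

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

struct Result { std::string body; };  // stand-in for the operation's outcome

class IdempotencyStore {
    std::unordered_map<std::string, Result> done;  // idempotency key -> first result
    std::mutex m;
public:
    // Retried key? Return the original result instead of re-executing.
    std::optional<Result> lookup(const std::string& key) {
        std::lock_guard<std::mutex> lock(m);
        auto it = done.find(key);
        if (it == done.end()) return std::nullopt;
        return it->second;
    }
    void record(const std::string& key, Result r) {
        std::lock_guard<std::mutex> lock(m);
        done.emplace(key, std::move(r));
    }
    // Note: two concurrent first attempts with the same key can still race;
    // production stores use an atomic insert-if-absent (or a DB unique index).
};
```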
Retry policy:
- Exponential backoff with jitter. `delay = base * 2^attempt + random(0, base)`. Without jitter, all retrying clients converge on the same retry timing — thundering herd.
- Bounded retries. Cap the attempts. Failing fast is better than hammering a sick service.
- Don't retry obvious user errors (400 Bad Request).
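A sketch of that policy: exponential growth, full jitter over the base, and a cap (the constants are illustrative):

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <thread>

// delay = base * 2^attempt + random(0, base), clamped to a cap.
std::chrono::milliseconds backoffDelay(int attempt,
                                       std::chrono::milliseconds base,
                                       std::chrono::milliseconds cap) {
    static thread_local std::mt19937_64 rng{std::random_device{}()};
    std::uniform_int_distribution<long long> jitter(0, base.count());
    long long exponential = base.count() << std::min(attempt, 20);  // clamp the shift
    return std::min(std::chrono::milliseconds(exponential + jitter(rng)), cap);
}

int main() {
    for (int attempt = 0; attempt < 5; ++attempt) {
        // ... issue the request; on a retryable failure, back off:
        std::this_thread::sleep_for(backoffDelay(
            attempt, std::chrono::milliseconds(50), std::chrono::seconds(5)));
    }
}
```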
## 9. Service Discovery and Health

In a fleet, services come and go. How does client A find service B?
- DNS. Simplest. SRV records or A records pointing at a load balancer.
- Service registry. etcd, Consul, ZooKeeper. Services register themselves; clients query.
- Service mesh (Istio, Linkerd, Consul Connect). Sidecar proxy handles discovery, mTLS, retries, observability.
- Kubernetes services. kube-dns + endpoints; the platform's built-in mechanism.
Health checks:
- Liveness. "Is the process alive?" — used by orchestrator to restart.
- Readiness. "Should I receive traffic?" — used by load balancer to route.
- Startup. Slow-starting service; delay liveness checks until ready.
A common bug: liveness fails on a transient downstream issue, the orchestrator restarts the container, the same issue happens to the new container. Liveness should depend only on the process itself, not on dependencies.
## 10. Time, Clocks, and Ordering

Physical clocks lie:
- NTP can jump backwards.
- Different machines drift apart by milliseconds.
- A clock-based "happened before" can be wrong.
Logical clocks:
- Lamport timestamps — one counter per node. Provides a partial order: if A happened-before B, then `lamport(A) < lamport(B)`. A total order requires a tiebreak (e.g., node ID). A minimal sketch follows this list.
- Vector clocks — a per-node array of counters. Captures causality fully. Storage grows with cluster size.
- Hybrid Logical Clocks (HLC) — combine physical and logical time; bounded skew, monotonic across nodes. Used by CockroachDB and MongoDB.
- TrueTime (Spanner) — a bounded uncertainty interval `[earliest, latest]` backed by GPS and atomic clocks. Enables externally consistent transactions.
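The Lamport clock sketch referenced above (the class name and method split are illustrative):

```cpp
#include <algorithm>
#include <cstdint>

class LamportClock {
    uint64_t counter = 0;
public:
    uint64_t localEvent() { return ++counter; }  // tick for each local event
    uint64_t onSend()     { return ++counter; }  // stamp outgoing messages
    uint64_t onReceive(uint64_t msgTime) {       // merge the sender's knowledge
        counter = std::max(counter, msgTime) + 1;
        return counter;
    }
};

int main() {
    LamportClock a, b;
    uint64_t t = a.onSend();  // A sends at time 1
    b.onReceive(t);           // B jumps to time 2: the receive orders after the send
}
```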
For application code: never use wall-clock time for ordering or causality. Use CLOCK_MONOTONIC for elapsed time. For "this happened before that," use logical timestamps.
## 11. C++ Libraries

| Need | Library |
|---|---|
| RPC | gRPC, Cap'n Proto |
| Async I/O | Boost.Asio, libuv, io_uring wrappers |
| Service discovery | etcd-cpp, Consul HTTP API |
| Consensus | BRaft, Apache Kudu's consensus |
| Distributed cache | hiredis (Redis), libmemcached |
| Streaming | librdkafka (Kafka), nats.c |
| Service mesh | Envoy (itself written in C++; programmable via xDS) |
| Metrics / tracing | OpenTelemetry C++ |
- Designing Data-Intensive Applications, Martin Kleppmann — the single best book on distributed systems for engineers.
- Database Internals, Alex Petrov — for replication and consensus details.
- Release It!, Michael Nygard — fault tolerance patterns.
- The Raft paper, Ongaro & Ousterhout.
- Jepsen analyses — empirical testing of "consistent" databases.
- Error Handling Strategies, Concurrency Patterns, Pipeline and Dataflow, IPC and Shared Memory.