Comprehensive notes from "Distributed Systems for Fun and Profit" and related concepts
- Chapter 1: Basics
- Chapter 2: Up and Down the Level of Abstraction
- Chapter 3: Time and Order
- Impossibility Results Cheat Sheet
- Fallacies of Distributed Systems
- Quick Reference
- Theme of chapter: Why distributed systems exist, core challenges, and key goals (scalability, performance, availability, fault tolerance).
- Guiding Questions:
- Why do we need distributed systems instead of just one big machine?
- What are the key trade-offs when scaling systems?
- What design techniques (partitioning, replication) are foundational?
- Distributed systems = solving single-machine problems across multiple machines.
- Motivations: single-machine upgrades become impossible/too costly. Commodity hardware + fault-tolerant software is cheaper.
- Goals: Scalability, Performance (esp. latency), Availability (fault tolerance).
- Constraints: more nodes → more failures, more communication, more latency (speed of light).
- Design tools: Partitioning (divide data) + Replication (copy data).
- Distributed programming = solving storage + computation across multiple machines.
- Scalability = handle growth in size, geography, and admin overhead without breaking.
- Performance = throughput + latency (latency limited by speed of light + hardware).
- Availability = uptime / (uptime + downtime), improved via redundancy.
- Fault tolerance = design for expected faults.
- Abstractions/Models:
- System model (synchronous vs. asynchronous)
- Failure model (crash, partition, Byzantine)
- Consistency model (strong vs. eventual)
- Distributed systems exist because infinite single-node scaling isn't practical.
- Every system design is a balance between performance, availability, and consistency under physical constraints.
- Partitioning and replication are the "divide & conquer" techniques at the heart of distributed system design.
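The availability formula above (uptime / (uptime + downtime)) is easy to sanity-check. A minimal Python sketch (function and constant names are my own, not from the text) translating an availability level into the downtime per year it permits:

```python
# Availability = uptime / (uptime + downtime).
# Sketch: how much downtime per year each availability level allows.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year allowed at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

for label, a in [("two nines", 0.99), ("three nines", 0.999),
                 ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label} ({a}): {downtime_minutes_per_year(a):.1f} min/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why redundancy (replication) matters so much.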
      ┌───────────────┐
      │     Goals     │
      │  Scalability  │
      │  Availability │
      │  Performance  │
      └───────────────┘
              │
      ┌───────┴────────┐
Partitioning        Replication
(divide dataset)    (duplicate dataset)
- Why can't we just keep upgrading single machines forever?
- What are the three kinds of scalability discussed (size, geographic, administrative)?
- How do partitioning and replication differ, and what trade-offs do they introduce?
- Why is latency harder to solve with money than throughput?
- What role do abstractions (system/failure/consistency models) play?
- Real-world examples:
    - Amazon Dynamo → AP design, favors availability.
    - Google Spanner → CP design, favors consistency with TrueTime.
- Limitation/assumption: Network partitions and independent node failures are unavoidable → must pick trade-offs.
- Own Example:
- A chat app → replicate messages across servers for low latency, but must handle message order inconsistencies.
- Theme of chapter: Abstractions in distributed systems, impossibility results (FLP & CAP), and consistency models.
- Guiding Questions:
- Why are abstractions necessary in distributed systems?
- What are the key impossibility results (FLP, CAP) and what do they imply?
- What are strong vs. weak consistency models, and why do they matter?
- Abstractions make complex systems manageable, but they always ignore some reality.
- System models define assumptions about nodes, communication, and time.
- Consensus problem is central: all nodes must agree on one value.
- FLP impossibility: no deterministic consensus algorithm can guarantee termination under full asynchrony with even one crash failure.
- CAP theorem: a system can provide at most two of Consistency, Availability, and Partition tolerance.
- Consistency isn't binary → many models exist beyond "strong consistency."
System Model:
- Nodes run concurrently, local state only, independent failures.
- Communication links may delay/drop messages.
- Clocks unsynchronized → order is not global.
Consensus Problem (Agreement, Integrity, Termination, Validity).
FLP Impossibility (1985): No deterministic consensus algorithm under asynchronous model with crash failures. → Trade-off: can't guarantee both safety and liveness.
CAP Theorem (Brewer, 2000):
- Consistency: all nodes see same data.
- Availability: system continues serving.
- Partition tolerance: system continues despite message loss.
- Only two out of three at a time.
Consistency Models:
- Strong: Linearizable, Sequential.
- Weak: Causal, Eventual, Client-centric.
- "Consistency = contract between system and programmer."
- Abstractions hide complexity but introduce trade-offs: too much hiding = inefficiency, too much exposure = confusion.
- FLP shows the limits of what's possible in asynchronous distributed systems.
- CAP highlights real-world trade-offs: during partitions, must choose between availability and strong consistency.
- "Consistency" is not one thing but a spectrum of models, each suited to different applications.
Consensus → FLP (impossible under async + crash)
CAP → Pick 2 of {C, A, P}
CA: Consistency + Availability (no partitions)
CP: Consistency + Partition tolerance (lose some availability)
AP: Availability + Partition tolerance (weaker consistency)
- What does a system model define in distributed systems?
- Why can't consensus be guaranteed in asynchronous systems (FLP)?
- What does CAP theorem mean in practice for system designers?
- Difference between linearizable and sequential consistency?
- Why is "consistency" not a single well-defined property?
- Real-world examples:
- CA: Two-phase commit in traditional databases.
- CP: Paxos, Raft (majority quorum).
- AP: Dynamo, Cassandra (accept divergence + reconcile later).
- Limitation: Strong consistency = high latency + reduced availability under partitions.
- Own Example: Social media feed: Eventual consistency works (you don't need strict ordering), but banking transactions demand strong consistency.
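The "majority quorum" behind the CP examples (Paxos, Raft) can be sketched in a few lines of Python (function name is mine): any two majorities of n nodes must overlap in at least one node, which is what keeps decisions consistent across quorums.

```python
def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums must intersect."""
    return n // 2 + 1

# Two majority quorums always overlap, since q + q > n forces a shared node.
for n in (3, 5, 7):
    q = majority(n)
    overlap = 2 * q - n  # minimum nodes common to any two quorums
    print(f"n={n}: quorum={q}, min overlap={overlap}")
```

This is also why such systems lose availability under a partition: the minority side cannot form a quorum, so it must stop serving writes.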
- Theme of chapter: How distributed systems deal with time, ordering of events, and causality when there is no global clock.
- Guiding Questions:
- Why can't we rely on physical clocks in distributed systems?
- What are logical clocks, and how do they help?
- What's the difference between total order and causal order?
- How do vector clocks extend Lamport clocks?
- Physical clocks drift → synchronization impossible across all nodes.
- Instead, distributed systems use logical clocks to capture event ordering.
- Lamport clocks provide a way to order events consistently, but don't capture causality perfectly.
- Vector clocks capture causality more precisely but at higher overhead.
- Ordering is critical for consistency models and replication.
Problem with physical time:
- Clocks drift → hard to keep synchronized.
- Network delays make comparing timestamps unreliable.
Happens-Before Relation (→):
- If event A happens before B in the same process, A → B.
- If A is a message send and B is the receive, A → B.
- Otherwise, events are concurrent.
Lamport Logical Clocks:
- Each process maintains a counter.
- On each event, increment counter.
- On message send, include counter. On receive, set local counter = max(local, received) + 1.
- Provides a consistent total order, but doesn't capture concurrency explicitly.
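The update rules above can be written out directly; a minimal Python sketch (class and method names are my own):

```python
class LamportClock:
    """Minimal Lamport clock following the rules above."""

    def __init__(self):
        self.counter = 0

    def local_event(self) -> int:
        self.counter += 1              # each event increments the counter
        return self.counter

    def send(self) -> int:
        self.counter += 1              # sending is itself an event
        return self.counter            # this timestamp travels with the message

    def receive(self, msg_counter: int) -> int:
        # local counter = max(local, received) + 1
        self.counter = max(self.counter, msg_counter) + 1
        return self.counter

# A sends to B: B's clock jumps past A's timestamp.
a, b = LamportClock(), LamportClock()
t = a.send()     # a.counter is now 1
b.receive(t)     # b.counter becomes max(0, 1) + 1 = 2
```

Note that two independent events on different processes can still get equal or inverted counters, which is exactly the "doesn't capture concurrency" limitation.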
Vector Clocks:
- Each process maintains a vector of counters (one per process).
- Update rules:
- On event: increment own counter.
- On send: attach vector.
- On receive: take element-wise max.
- Captures causality: if V(A) < V(B), then A → B. If incomparable, events are concurrent.
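The same rules for vector clocks, plus the comparison that detects concurrency, as a minimal Python sketch (names are mine):

```python
class VectorClock:
    """One counter per process; index pid is this process's own slot."""

    def __init__(self, n_procs: int, pid: int):
        self.v = [0] * n_procs
        self.pid = pid

    def local_event(self):
        self.v[self.pid] += 1          # on event: increment own counter

    def send(self) -> list:
        self.local_event()
        return list(self.v)            # attach a copy of the vector

    def receive(self, other: list):
        # element-wise max, then count the receive as a local event
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.local_event()

def happened_before(va: list, vb: list) -> bool:
    """V(A) < V(B): every component <=, at least one strictly <."""
    return all(a <= b for a, b in zip(va, vb)) and va != vb

def concurrent(va: list, vb: list) -> bool:
    return not happened_before(va, vb) and not happened_before(vb, va)

# P0 sends to P1; meanwhile P2 acts independently, so it is concurrent.
p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
msg = p0.send()        # p0.v == [1, 0, 0]
p1.receive(msg)        # p1.v == [1, 1, 0]
p2.local_event()       # p2.v == [0, 0, 1]
```

Here `happened_before(msg, p1.v)` holds, while `p2.v` is incomparable with `p1.v`, i.e. concurrent: exactly the distinction Lamport clocks cannot express.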
Ordering Guarantees:
- Total order → all events ordered (may be artificial).
- Causal order → respects causality but allows concurrency.
- Physical time is unreliable in distributed systems → we shift focus from "when" to "what order."
- Lamport clocks give a total order but overapproximate causality.
- Vector clocks give a partial order that exactly matches causality but cost more (O(n) storage per timestamp).
- Choice depends on trade-off between precision and efficiency.
Event Ordering
├── Physical clocks → drift & delays
├── Logical clocks
│   ├── Lamport: total order (coarse)
│   └── Vector: causal order (precise)
└── Happens-before relation: defines causality
- Why can't we rely on physical clocks for event ordering in distributed systems?
- What is the "happens-before" relation?
- How do Lamport clocks assign order to events?
- What limitation do Lamport clocks have in terms of causality?
- How do vector clocks improve on Lamport clocks?
- When are two events considered concurrent in vector clocks?
- Real-world examples:
- Version control systems (Git): use DAGs to track causality between commits.
- Distributed databases: vector clocks used to detect conflicting updates (e.g., Dynamo).
- Limitation: Vector clocks scale poorly (require vector size = number of processes).
- Own Example: In a chat system, Lamport clocks could order all messages, but vector clocks can show which messages are replies and which are independent.
- Messages can be delayed, lost, or reordered.
- Nodes can crash or act maliciously.
- No global clock → can't distinguish "slow" from "failed." → This leads to fundamental trade-offs.
- Scope: Consensus in asynchronous systems.
- Statement: In a fully asynchronous system, no deterministic algorithm can guarantee consensus if even one node may crash.
- Trade-off: Safety (agreement) vs. Liveness (progress).
- Practical outcome: Paxos, Raft guarantee safety always, liveness eventually (under partial synchrony).
- Scope: Data systems under partitions.
- Statement: In the presence of a partition, a system can provide at most 2 of:
- Consistency: All nodes see the same data.
- Availability: Every request gets a response.
- Partition Tolerance: System continues despite message loss.
- Trade-off:
- CP: Strong consistency, less availability (e.g., Spanner, Zookeeper).
- AP: High availability, weaker consistency (e.g., Dynamo, Cassandra).
- CA: Only possible if no partitions exist (idealized).
- PACELC (extension of CAP): also describes the trade-off when there is no partition.
- Statement:
- If Partition (P) → trade-off between Availability (A) and Consistency (C).
- Else (E) → trade-off between Latency (L) and Consistency (C).
- Example:
- Dynamo: PA/EL (AP under partition, favors latency otherwise).
- Spanner: PC/EC (CP under partition, favors consistency otherwise).
- Scope: Consensus with malicious (Byzantine) faults.
- Statement: To reach agreement with Byzantine nodes, need ≥ 3f + 1 nodes to tolerate f faulty nodes.
- Trade-off: Requires much higher replication & complexity.
- Practical outcome: Basis for PBFT, Tendermint, Blockchain protocols.
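The n ≥ 3f + 1 bound is simple arithmetic; a quick Python sketch (helper names are mine, not from the source):

```python
def min_nodes(f: int) -> int:
    """Smallest cluster that tolerates f Byzantine (malicious) nodes."""
    return 3 * f + 1

def max_byzantine(n: int) -> int:
    """Largest f that a cluster of n nodes can tolerate."""
    return (n - 1) // 3

print(min_nodes(1), min_nodes(4))   # tolerating 1 traitor needs 4 nodes; 4 need 13
print(max_byzantine(3))             # 0: three nodes cannot survive any traitor
```

Compare with crash-only consensus, where a simple majority (n ≥ 2f + 1) suffices: tolerating lying nodes roughly triples, rather than doubles, the cluster size.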
Unreliable Networks
├── FLP → Safety vs. Liveness (Consensus)
├── CAP → Consistency vs. Availability (under Partition)
├── PACELC → Partition: CAP, Else: Latency vs. Consistency
└── Byzantine → Agreement with malicious nodes (needs > 2/3 honest)
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
- Network reliability: Applications are often written with little error handling for networking failures. During an outage, such applications may stall or wait indefinitely for a reply, permanently consuming memory or other resources; when the network recovers, they may fail to retry stalled operations or require a (manual) restart.
- Latency ignorance: Ignoring network latency, and the packet loss it can cause, leads application- and transport-layer developers to allow unbounded traffic, greatly increasing dropped packets and wasting bandwidth.
- Bandwidth limits: Senders that ignore bandwidth limits can create bottlenecks.
- Security complacency: Complacency about network security results in being blindsided by malicious users and programs that continually adapt to security measures.
- Topology changes: Changes in network topology affect both bandwidth and latency, and therefore cause similar problems.
- Multiple administrators: Multiple administrators (e.g., subnets run by rival companies) may institute conflicting policies that senders of network traffic must understand in order to complete their desired paths.
- Transport costs: The "hidden" costs of building and maintaining a network or subnet are non-negligible and must be noted in budgets to avoid vast shortfalls.
- Network homogeneity: Assuming a homogeneous network leads to the same problems that result from the first three fallacies.
- What does FLP impossibility say about consensus?
- During a partition, what does CAP force you to choose between?
- What does PACELC add on top of CAP?
- How many nodes are needed to tolerate f Byzantine faults?
- Why can't money fix latency the same way it fixes throughput?
- Partitioning: Divide data across nodes for scalability
- Replication: Copy data across nodes for availability
- Consensus: Agree on values despite failures
- Logical Clocks: Order events without global time
- Vector Clocks: Track causality precisely
- Consistency vs Availability (CAP)
- Latency vs Consistency (PACELC)
- Safety vs Liveness (FLP)
- Precision vs Efficiency (Lamport vs Vector clocks)
Source: "Distributed Systems for Fun and Profit" by Mixu, Wikipedia, and various distributed systems literature