EPaxos
Paxos issues (these also apply to Chubby and ZooKeeper)
- All requests go to the master
- Bad scalability
- High latency when geo-replicated (the master may be remote)
- Sensitive to load spikes and network delay
- Master down -> system down (for a while)

EPaxos goals
- Optimal commit latency in the wide area
- Optimal load balancing across all replicas
- Graceful performance degradation

So…
- Every replica needs to act as a proposer, or else latency will be high; this implies no leader
- Minimize the proposer's communication with remote replicas
- Quorum composition must be flexible, to avoid slow nodes

Key: ordering
- Old: the leader chooses the order, or each command goes into a pre-ordered slot
- EPaxos: ordering is dynamic and decentralized
- There is no need to enforce a consistent ordering for non-interfering commands

Non-interfering commands: 1 RTT
- Needs a fast-path quorum of $F + \lfloor\frac{F+1}{2}\rfloor$ nodes, where $F$ = number of tolerated node failures (with $N = 2F + 1$ replicas); e.g. $N = 5$, $F = 2$ gives a fast-path quorum of 3
- R1: PreAccept C1
- R5: PreAccept C2
- R2, R3, R4: OK => Commit

Interfering commands: 2 RTT, second-phase (Accept) quorum size $F+1$
- R5: PreAccept C4
- R3: OK C4
- R1: PreAccept C3
- R3: OK, but C3 should go after C4 (R3 has already seen C4)
- R1: receives inconsistent responses -> second phase
- R1: Accept C3 -> C4 (C3 depends on C4)
- R2, R3: OK => Commit

Protocol

Commit Protocol (unoptimized version, fast-path quorum = $2F$)
- Replica $L$ receives a request
- $L$ chooses its next available instance
- $L$ attaches attributes:
  - deps: list of all instances that interfere with the command
  - seq: number used to break dependency cycles, larger than the max seq of all instances in deps
- $L$ forwards the command and attributes to at least a fast-path quorum of nodes (PreAccept); each node adds the interfering instances it knows of to deps, raises seq accordingly, and replies
- If the quorum responds and all attributes are the same => commit
- Else update the attributes (union of deps, max of seq) => tell a simple majority ($F+1$) to Accept => commit

Execution Algorithm
- Find the strongly connected components of the dependency graph
- Sort them topologically and execute them in inverse topological order (dependencies first); within a component, execute commands in increasing seq order

Paper 4....
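The quorum sizes quoted in the Non-interfering, Interfering and Commit Protocol notes depend only on $F$, assuming $N = 2F + 1$ replicas. A minimal Python sketch that tabulates them; the function name `quorum_sizes` and its dictionary keys are hypothetical, chosen for illustration.

```python
def quorum_sizes(f: int) -> dict[str, int]:
    """Quorum sizes for an EPaxos deployment of N = 2F + 1 replicas (sketch)."""
    if f < 1:
        raise ValueError("F must be at least 1")
    return {
        "replicas": 2 * f + 1,
        # Optimized fast-path quorum F + floor((F + 1) / 2); the proposer counts toward it.
        "fast_path": f + (f + 1) // 2,
        # Fast-path quorum of the unoptimized Commit Protocol.
        "fast_path_unoptimized": 2 * f,
        # Second phase (Accept): a simple majority.
        "slow_path": f + 1,
    }


for f in (1, 2, 3):
    print(f, quorum_sizes(f))
```

For $F = 2$ (five replicas) this gives an optimized fast-path quorum of 3, an unoptimized one of 4, and a second-phase quorum of 3, matching the R1..R5 examples.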
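The PreAccept step of the Commit Protocol, and the fast-vs-second-phase decision in the Interfering example, can be traced with a small single-process simulation. This is a sketch under stated assumptions: `Command`, `Replica`, `interferes` and `propose` are hypothetical names; "interfere" is assumed to mean "same key, at least one write" for a toy key-value store; messages are plain method calls; the Accept and Commit rounds themselves are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    op: str   # hypothetical key-value operation: "get" or "put"
    key: str


def interferes(a: Command, b: Command) -> bool:
    # Assumption: commands interfere if they touch the same key and at least one writes.
    return a.key == b.key and "put" in (a.op, b.op)


@dataclass
class Replica:
    name: str
    # (replica name, instance number) -> (command, seq, deps)
    log: dict = field(default_factory=dict)
    next_instance: int = 0

    def local_attrs(self, cmd: Command) -> tuple[int, set]:
        # deps: every instance in this replica's log that interferes with cmd.
        deps = {iid for iid, (c, _, _) in self.log.items() if interferes(c, cmd)}
        # seq: strictly larger than the seq of every dependency.
        seq = 1 + max((self.log[iid][1] for iid in deps), default=0)
        return seq, deps

    def pre_accept(self, iid, cmd: Command, seq: int, deps: set) -> tuple[int, set]:
        # Acceptor side of PreAccept: merge the proposer's attrs with local knowledge,
        # record the instance, and reply with the (possibly updated) attrs.
        local_seq, local_deps = self.local_attrs(cmd)
        seq, deps = max(seq, local_seq), deps | local_deps
        self.log[iid] = (cmd, seq, deps)
        return seq, deps


def propose(leader: Replica, cmd: Command, others: list) -> tuple[str, int, set]:
    # others: the remaining members of the fast-path quorum (the proposer counts too).
    iid = (leader.name, leader.next_instance)  # next available instance
    leader.next_instance += 1
    seq, deps = leader.local_attrs(cmd)
    leader.log[iid] = (cmd, seq, deps)
    replies = [r.pre_accept(iid, cmd, seq, deps) for r in others]
    if all(reply == (seq, deps) for reply in replies):
        return "fast", seq, deps  # identical attrs: commit after 1 RTT
    # Inconsistent replies: union deps and take max seq; these attrs would then go
    # through an Accept round at a simple majority (F + 1) before commit.
    seq = max(s for s, _ in replies)
    deps = set().union(*(d for _, d in replies))
    leader.log[iid] = (cmd, seq, deps)
    return "slow", seq, deps


replicas = {n: Replica(n) for n in ("R1", "R2", "R3", "R4", "R5")}
# Interfering case: C4 and C3 both write key "x".
print(propose(replicas["R5"], Command("put", "x"), [replicas["R3"], replicas["R4"]]))
# -> ('fast', 1, set())
print(propose(replicas["R1"], Command("put", "x"), [replicas["R2"], replicas["R3"]]))
# -> ('slow', 2, {('R5', 0)})
```

R3 already holds C4, so its reply to C3 carries the extra dependency and a higher seq; R2's reply does not, the attributes disagree, and C3 falls back to the second phase with C3 depending on C4. Two commands on different keys would both take the fast path, as in the Non-interfering example.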
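The Execution Algorithm can be sketched with Tarjan's strongly-connected-components algorithm. With edges pointing from a command to its deps, Tarjan emits each component only after every component it depends on, i.e. in inverse topological order, so each component can be executed as soon as it is emitted, its commands in increasing seq order. `execution_order` is a hypothetical name; breaking seq ties by instance id is an assumption of this sketch, and every instance referenced in `deps` must have an entry in `seq`.

```python
def execution_order(deps: dict[str, set[str]], seq: dict[str, int]) -> list[str]:
    """Order committed instances for execution (sketch).

    deps[i]: instances that i depends on; seq[i]: the seq attribute of i.
    """
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    order: list[str] = []
    counter = 0

    def strongconnect(v: str) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in sorted(deps.get(v, ())):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v roots a component: pop it, then run it in increasing seq order
            # (ties broken by instance id, an assumption of this sketch).
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            order.extend(sorted(scc, key=lambda i: (seq[i], i)))

    for v in sorted(deps):
        if v not in index:
            strongconnect(v)
    return order


print(execution_order({"C3": {"C4"}, "C4": set()}, {"C3": 2, "C4": 1}))
# -> ['C4', 'C3']
```

The recursive form keeps the sketch short; for dependency graphs deeper than Python's default recursion limit, an iterative Tarjan would be the safer choice.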