Linearizability

RIFL: Reusable Infrastructure for Linearizability
- Goal: turn at-least-once semantics into exactly-once semantics
- Assumptions
  - The system is built on RPC
  - The underlying RPC layer provides at-least-once semantics
  - Operations are not aborted after a server crash (they are retried to completion)
- Architecture
  - RPC ID: assigned by clients as | client ID | sequence number |
  - Completion record durability: RPC ID + result are created atomically with the update
  - Retry rendezvous
    - A retry must be answered with the previously recorded result
    - A retry might arrive at a different server, so the completion record must be there too
    - Store and migrate the completion record with its associated object
  - Garbage collection: the server can remove a completion record when
    - the client has acknowledged the response, or
    - the client has crashed
- Lease
  - Each client holds a private lease; client ID = lease ID
  - Client crash => lease expires => on reboot the client gets a new client ID
- Design details
  - Server, upon receiving an RPC, checks for duplicates
    - NEW (normal case): execute the RPC
    - COMPLETED: return the old result
    - IN_PROGRESS: discard, or notify the client
    - STALE (a retry delayed until after the client's ACK): return an error
  - Completion record
    - Holds RPC ID, object ID, result
    - Created atomically with the update
    - Marked COMPLETED and persisted on disk before the reply is returned
  - Client receives the reply
    - Client marks this sequence number as completed
    - On the next RPC call, it piggybacks its minimum incomplete sequence number
    - The server garbage-collects records with smaller sequence numbers
- Lease server (ZooKeeper)
  - Scalability: don't store lease expiration times on disk, only the existence of the lease
  - Validation
    - The lease server implements a cluster clock
    - A client gets the clock value when renewing its lease and includes it in each RPC
    - A server checks with the lease server only when the lease is close to expiring
- Transactions
  - Use RIFL for prepare and requestAbort
  - Normal case
    - Client sends prepare to each object's server (participant)
    - Participants acquire object locks and return PREPARED or ABORT
    - Client sends the decision
  - Client crash
    - The first participant becomes the recovery coordinator
    - It sends requestAbort (with the same RPC ID as prepare) to every participant
    - A participant that had already prepared answers PREPARED; otherwise ABORT
    - The coordinator then sends the decision
  - Garbage collection: the client marks prepare as completed only after receiving the response to its decision

Occult
- Prevents slowdown cascades
- Observable causal consistency
  - Old approach: evolve the data store only through monotonically non-decreasing updates
  - Occult: let clients decide when it is safe to read
- Framework
  - Key-value store divided into shards
  - One master server per shard, multiple slaves
  - Clients write to the master, read from any replica
  - One shardstamp per shard; causal timestamp = vector of shardstamps
- Write
  - Client attaches its causal timestamp and sends the write to the master
  - Master increments its shardstamp, stores it with the new value, and returns it
  - Client updates its causal timestamp
- Read
  - Server returns the value and its causal timestamp
  - Client checks whether it is at least as recent as its local timestamp
  - If not => retry, or go to the master
  - Client updates its local timestamp
- Transactions
  - Guarantees
    - Observe a consistent snapshot
    - No write-write conflicts
    - Transactions in the same client session are totally ordered
  - Protocol to execute a transaction $T$...
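The server-side duplicate check from the RIFL design details above can be sketched in Python. This is a minimal sketch: the class names are mine, and plain in-memory dicts stand in for RIFL's durable completion records (which would be created atomically with the update and persisted before replying).

```python
from enum import Enum

class RpcState(Enum):
    NEW = 1          # never seen: execute
    IN_PROGRESS = 2  # currently executing: discard or notify the client
    COMPLETED = 3    # finished: return the recorded result
    STALE = 4        # client already ACKed: return an error

class Server:
    def __init__(self):
        self.completed = {}       # (client_id, seq) -> result (the completion record)
        self.in_progress = set()
        self.min_incomplete = {}  # per client: lowest seq not yet ACKed

    def classify(self, client_id, seq):
        if seq < self.min_incomplete.get(client_id, 0):
            return RpcState.STALE
        if (client_id, seq) in self.completed:
            return RpcState.COMPLETED
        if (client_id, seq) in self.in_progress:
            return RpcState.IN_PROGRESS
        return RpcState.NEW

    def handle(self, client_id, seq, op):
        state = self.classify(client_id, seq)
        if state is RpcState.STALE:
            return "ERROR_STALE"
        if state is RpcState.COMPLETED:
            return self.completed[(client_id, seq)]  # retry: previous result
        if state is RpcState.IN_PROGRESS:
            return "IN_PROGRESS"
        self.in_progress.add((client_id, seq))
        result = op()  # in RIFL the completion record is created atomically with this update
        self.completed[(client_id, seq)] = result
        self.in_progress.discard((client_id, seq))
        return result
```

A retry with the same RPC ID is answered from the completion record instead of re-executing, which is what makes at-least-once delivery look exactly-once.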
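The piggybacked minimum-incomplete sequence number acts as a low-water mark for garbage collection. A toy version of the client-side bookkeeping and the server-side cleanup (names are mine, not RIFL's):

```python
class ClientSeqTracker:
    """Client-side sequence bookkeeping; min_incomplete is piggybacked on each RPC."""
    def __init__(self):
        self.next_seq = 0
        self.min_incomplete = 0
        self.completed = set()

    def new_rpc(self):
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def mark_completed(self, seq):
        self.completed.add(seq)
        # advance the low-water mark past any contiguous completed prefix
        while self.min_incomplete in self.completed:
            self.completed.discard(self.min_incomplete)
            self.min_incomplete += 1

def garbage_collect(records, client_id, min_incomplete):
    # server side: drop completion records the client can no longer retry
    for rpc_id in [r for r in records if r[0] == client_id and r[1] < min_incomplete]:
        del records[rpc_id]
```

Completions can arrive out of order, so the mark only advances over a contiguous prefix; everything below it is safe to discard on both sides.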
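The lease-validation path can be illustrated with a toy cluster clock. This is only a sketch of the idea in the notes: the `EXPIRY_SOON` threshold, the clock-tick units, and the in-memory lease table are my assumptions, and the real system delegates leases to ZooKeeper.

```python
EXPIRY_SOON = 500  # assumed threshold (clock ticks) below which the server re-checks

class LeaseServer:
    """Toy stand-in for the ZooKeeper-based lease server. Only lease existence
    would be durable; expiration times live in memory."""
    def __init__(self):
        self.clock = 0    # cluster clock: advances on every lease operation
        self.leases = {}  # client_id -> expiration time in clock ticks

    def renew(self, client_id, ttl=1000):
        self.clock += 1
        self.leases[client_id] = self.clock + ttl
        return self.clock  # the client includes this value in later RPCs

    def check(self, client_id):
        self.clock += 1
        return self.leases.get(client_id, 0) > self.clock

class StorageServer:
    def __init__(self, lease_server, lease_ttl=1000):
        self.ls = lease_server
        self.ttl = lease_ttl
        self.last_seen_clock = 0  # server's possibly stale view of the cluster clock

    def lease_valid(self, client_id, client_clock):
        self.last_seen_clock = max(self.last_seen_clock, client_clock)
        remaining = (client_clock + self.ttl) - self.last_seen_clock
        if remaining > EXPIRY_SOON:
            return True  # common case: no lease-server round trip
        return self.ls.check(client_id)  # close to expiring: ask authoritatively
```

In the common case the server judges the lease from the clock value the client obtained at renewal, and only contacts the lease server when the lease looks close to expiring.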
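The transaction-recovery trick, reusing prepare's RPC ID for requestAbort, can be shown with a toy participant whose completion record doubles as the vote. Because both operations share an RPC ID, whichever one runs first "wins" and the other sees it as a duplicate (class and method names are mine):

```python
class Participant:
    def __init__(self):
        self.completion = {}  # rpc_id -> vote (the RIFL completion record)
        self.locks = set()

    def prepare(self, rpc_id, obj):
        if rpc_id in self.completion:      # duplicate: return the recorded vote
            return self.completion[rpc_id]
        self.locks.add(obj)
        self.completion[rpc_id] = "PREPARED"
        return "PREPARED"

    def request_abort(self, rpc_id):
        if rpc_id in self.completion:      # prepare already completed here
            return self.completion[rpc_id]  # -> PREPARED
        # Record ABORT under the same RPC ID, so a late prepare retry
        # is treated as a duplicate and cannot prepare afterwards.
        self.completion[rpc_id] = "ABORT"
        return "ABORT"
```

This is why the recovery coordinator can safely probe each participant with requestAbort: it learns PREPARED where prepare already ran, and permanently forecloses prepare everywhere else.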
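Occult's write and read paths above can be sketched with a small vector of shardstamps. A minimal sketch: the two-shard setup, the explicit `sync` step standing in for asynchronous replication, and all class names are simplifications of mine.

```python
SHARDS = 2  # assumed shard count for the sketch

def merge(a, b):
    # element-wise max of two causal timestamps
    return tuple(max(x, y) for x, y in zip(a, b))

class Master:
    def __init__(self, shard):
        self.shard = shard
        self.shardstamp = 0
        self.store = {}  # key -> (value, causal timestamp)

    def write(self, key, value, client_ts):
        self.shardstamp += 1
        ts = list(client_ts)
        ts[self.shard] = self.shardstamp
        self.store[key] = (value, tuple(ts))
        return tuple(ts)  # the client merges this into its own timestamp

class Slave:
    def __init__(self, shard):
        self.shard = shard
        self.shardstamp = 0
        self.store = {}

    def sync(self, master):
        # asynchronous replication, modeled here as an explicit step
        self.shardstamp = master.shardstamp
        self.store = dict(master.store)

class Client:
    def __init__(self):
        self.ts = (0,) * SHARDS  # causal timestamp: one shardstamp per shard

    def write(self, master, key, value):
        self.ts = merge(self.ts, master.write(key, value, self.ts))

    def read(self, server, key):
        if server.shardstamp < self.ts[server.shard]:
            return None  # stale replica: the caller retries or goes to the master
        value, ts = server.store[key]
        self.ts = merge(self.ts, ts)
        return value
```

The staleness check is entirely client-side, which is the point of observable causal consistency: replicas apply writes without coordination, and a client that notices a lagging shardstamp simply reads elsewhere instead of stalling the whole shard.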