Linearizability

RIFL: Reusable Infrastructure for Linearizability
- Goal: turn at-least-once semantics into exactly-once semantics
- Assumptions
  - The system is built on RPC
  - The underlying RPC layer provides at-least-once semantics
  - Operations are not aborted after a server crash (they are retried to completion)
- Architecture
  - RPC ID: assigned by clients as | client ID | sequence number |
  - Completion record durability: RPC ID + result are created atomically with the update
  - Retry rendezvous
    - A retry must be answered with the previously recorded result
    - A retry might arrive at a different server, so the completion record must be there too
    - Store and migrate the completion record with its associated object
  - Garbage collection: the server can remove a completion record when
    - the client has acknowledged the response, or
    - the client has crashed
- Lease
  - Each client holds a private lease; client ID = lease ID
  - Client crash => lease expires => on reboot the client gets a new client ID
- Design details
  - Server, upon receiving an RPC, checks for duplicates
    - NEW (normal case): execute the RPC
    - COMPLETED: return the old result
    - IN_PROGRESS: discard, or notify the client
    - STALE (a retry delayed until after the client's ACK): return an error
  - Completion record
    - Holds RPC ID, object ID, result
    - Created atomically with the update
    - Marked COMPLETED and persisted on disk before the reply is returned
  - Client receives the reply
    - Client marks this sequence number as completed
    - On the next RPC call, it piggybacks its minimum incomplete sequence number
    - The server garbage-collects records with smaller sequence numbers
- Lease server (ZooKeeper)
  - Scalability: don't store lease expiration times on disk, only the existence of the lease
  - Validation
    - The lease server implements a cluster clock
    - A client gets the clock value when renewing its lease and includes it in each RPC
    - A server checks with the lease server only when the lease is close to expiring
- Transactions
  - Use RIFL for prepare and requestAbort
  - Normal case
    - Client sends prepare to each object's server (participant)
    - Participants acquire object locks and return PREPARED or ABORT
    - Client sends the decision
  - Client crash
    - The first participant becomes the recovery coordinator
    - It sends requestAbort (with the same RPC ID as prepare) to every participant
    - A participant that had already prepared answers PREPARED; otherwise ABORT
    - The coordinator then sends the decision
  - Garbage collection: the client marks prepare as completed only after receiving the response to its decision

Occult
- Prevents slowdown cascades
- Observable causal consistency
  - Old approach: evolve the data store only through monotonically non-decreasing updates
  - Occult: let clients decide when it is safe to read
- Framework
  - Key-value store divided into shards
  - One master server per shard, multiple slaves
  - Clients write to the master, read from any replica
  - One shardstamp per shard; causal timestamp = vector of shardstamps
- Write
  - Client attaches its causal timestamp and sends the write to the master
  - Master increments its shardstamp, stores it with the new value, and returns it
  - Client updates its causal timestamp
- Read
  - Server returns the value and its causal timestamp
  - Client checks whether it is at least as recent as its local timestamp
  - If not => retry, or go to the master
  - Client updates its local timestamp
- Transactions
  - Guarantees
    - Observe a consistent snapshot
    - No write-write conflicts
    - Transactions in the same client session are totally ordered
  - Protocol to execute a transaction $T$...
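The server-side duplicate check from the RIFL design details above can be sketched in Python. This is a minimal sketch: the class names are mine, and plain in-memory dicts stand in for RIFL's durable completion records (which would be created atomically with the update and persisted before replying).

```python
from enum import Enum

class RpcState(Enum):
    NEW = 1          # never seen: execute
    IN_PROGRESS = 2  # currently executing: discard or notify the client
    COMPLETED = 3    # finished: return the recorded result
    STALE = 4        # client already ACKed: return an error

class Server:
    def __init__(self):
        self.completed = {}       # (client_id, seq) -> result (the completion record)
        self.in_progress = set()
        self.min_incomplete = {}  # per client: lowest seq not yet ACKed

    def classify(self, client_id, seq):
        if seq < self.min_incomplete.get(client_id, 0):
            return RpcState.STALE
        if (client_id, seq) in self.completed:
            return RpcState.COMPLETED
        if (client_id, seq) in self.in_progress:
            return RpcState.IN_PROGRESS
        return RpcState.NEW

    def handle(self, client_id, seq, op):
        state = self.classify(client_id, seq)
        if state is RpcState.STALE:
            return "ERROR_STALE"
        if state is RpcState.COMPLETED:
            return self.completed[(client_id, seq)]  # retry: previous result
        if state is RpcState.IN_PROGRESS:
            return "IN_PROGRESS"
        self.in_progress.add((client_id, seq))
        result = op()  # in RIFL the completion record is created atomically with this update
        self.completed[(client_id, seq)] = result
        self.in_progress.discard((client_id, seq))
        return result
```

A retry with the same RPC ID is answered from the completion record instead of re-executing, which is what makes at-least-once delivery look exactly-once.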
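The piggybacked minimum-incomplete sequence number acts as a low-water mark for garbage collection. A toy version of the client-side bookkeeping and the server-side cleanup (names are mine, not RIFL's):

```python
class ClientSeqTracker:
    """Client-side sequence bookkeeping; min_incomplete is piggybacked on each RPC."""
    def __init__(self):
        self.next_seq = 0
        self.min_incomplete = 0
        self.completed = set()

    def new_rpc(self):
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def mark_completed(self, seq):
        self.completed.add(seq)
        # advance the low-water mark past any contiguous completed prefix
        while self.min_incomplete in self.completed:
            self.completed.discard(self.min_incomplete)
            self.min_incomplete += 1

def garbage_collect(records, client_id, min_incomplete):
    # server side: drop completion records the client can no longer retry
    for rpc_id in [r for r in records if r[0] == client_id and r[1] < min_incomplete]:
        del records[rpc_id]
```

Completions can arrive out of order, so the mark only advances over a contiguous prefix; everything below it is safe to discard on both sides.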
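The lease-validation path can be illustrated with a toy cluster clock. This is only a sketch of the idea in the notes: the `EXPIRY_SOON` threshold, the clock-tick units, and the in-memory lease table are my assumptions, and the real system delegates leases to ZooKeeper.

```python
EXPIRY_SOON = 500  # assumed threshold (clock ticks) below which the server re-checks

class LeaseServer:
    """Toy stand-in for the ZooKeeper-based lease server. Only lease existence
    would be durable; expiration times live in memory."""
    def __init__(self):
        self.clock = 0    # cluster clock: advances on every lease operation
        self.leases = {}  # client_id -> expiration time in clock ticks

    def renew(self, client_id, ttl=1000):
        self.clock += 1
        self.leases[client_id] = self.clock + ttl
        return self.clock  # the client includes this value in later RPCs

    def check(self, client_id):
        self.clock += 1
        return self.leases.get(client_id, 0) > self.clock

class StorageServer:
    def __init__(self, lease_server, lease_ttl=1000):
        self.ls = lease_server
        self.ttl = lease_ttl
        self.last_seen_clock = 0  # server's possibly stale view of the cluster clock

    def lease_valid(self, client_id, client_clock):
        self.last_seen_clock = max(self.last_seen_clock, client_clock)
        remaining = (client_clock + self.ttl) - self.last_seen_clock
        if remaining > EXPIRY_SOON:
            return True  # common case: no lease-server round trip
        return self.ls.check(client_id)  # close to expiring: ask authoritatively
```

In the common case the server judges the lease from the clock value the client obtained at renewal, and only contacts the lease server when the lease looks close to expiring.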
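The transaction-recovery trick, reusing prepare's RPC ID for requestAbort, can be shown with a toy participant whose completion record doubles as the vote. Because both operations share an RPC ID, whichever one runs first "wins" and the other sees it as a duplicate (class and method names are mine):

```python
class Participant:
    def __init__(self):
        self.completion = {}  # rpc_id -> vote (the RIFL completion record)
        self.locks = set()

    def prepare(self, rpc_id, obj):
        if rpc_id in self.completion:      # duplicate: return the recorded vote
            return self.completion[rpc_id]
        self.locks.add(obj)
        self.completion[rpc_id] = "PREPARED"
        return "PREPARED"

    def request_abort(self, rpc_id):
        if rpc_id in self.completion:      # prepare already completed here
            return self.completion[rpc_id]  # -> PREPARED
        # Record ABORT under the same RPC ID, so a late prepare retry
        # is treated as a duplicate and cannot prepare afterwards.
        self.completion[rpc_id] = "ABORT"
        return "ABORT"
```

This is why the recovery coordinator can safely probe each participant with requestAbort: it learns PREPARED where prepare already ran, and permanently forecloses prepare everywhere else.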
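Occult's write and read paths above can be sketched with a small vector of shardstamps. A minimal sketch: the two-shard setup, the explicit `sync` step standing in for asynchronous replication, and all class names are simplifications of mine.

```python
SHARDS = 2  # assumed shard count for the sketch

def merge(a, b):
    # element-wise max of two causal timestamps
    return tuple(max(x, y) for x, y in zip(a, b))

class Master:
    def __init__(self, shard):
        self.shard = shard
        self.shardstamp = 0
        self.store = {}  # key -> (value, causal timestamp)

    def write(self, key, value, client_ts):
        self.shardstamp += 1
        ts = list(client_ts)
        ts[self.shard] = self.shardstamp
        self.store[key] = (value, tuple(ts))
        return tuple(ts)  # the client merges this into its own timestamp

class Slave:
    def __init__(self, shard):
        self.shard = shard
        self.shardstamp = 0
        self.store = {}

    def sync(self, master):
        # asynchronous replication, modeled here as an explicit step
        self.shardstamp = master.shardstamp
        self.store = dict(master.store)

class Client:
    def __init__(self):
        self.ts = (0,) * SHARDS  # causal timestamp: one shardstamp per shard

    def write(self, master, key, value):
        self.ts = merge(self.ts, master.write(key, value, self.ts))

    def read(self, server, key):
        if server.shardstamp < self.ts[server.shard]:
            return None  # stale replica: the caller retries or goes to the master
        value, ts = server.store[key]
        self.ts = merge(self.ts, ts)
        return value
```

The staleness check is entirely client-side, which is the point of observable causal consistency: replicas apply writes without coordination, and a client that notices a lagging shardstamp simply reads elsewhere instead of stalling the whole shard.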