CS739

HA-NFS

- Server reliability: dual-ported disks
- Disk reliability: mirroring files on different disks (same server)
- Network reliability: replication of network components (workload distributed over networks)
- Goal
  - Transparent failure and recovery
  - No penalty during normal operation
  - No client modification
- Architecture
  - Each node consists of
    - Two servers
      - Each server has two network interfaces and IP addresses
      - Use the secondary interface when impersonating or re-integrating
    - A number of SCSI buses
    - Each disk has one primary server
- Normal operation
  - Both servers exchange NFS RFS_NULL
  - If that fails, ping via network and SCSI
  - Take over if both fail
- Take-over
  - The other server restores the file system
  - Change the secondary MAC/IP to those of the failed server
- Re-integration
  - The failed server turns off its primary interface and sends a request to the backup server
  - The backup unmounts and resets its secondary interface
  - The failed server restores
- Network failure
  - When normal
    - The two servers in the same node use different networks as primary (load balancing)
    - Servers broadcast heartbeats to the network
  - When a network fails
    - The client daemon times out and reroutes to the alternative path
- In-class
  - Goal: replicated, distributed service
  - Why Hard?...
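
The failure-detection rule under "Normal operation" can be read as a small decision function. This is only a sketch of the notes as written: the function name and its three boolean inputs are hypothetical, and how each probe is actually performed is not shown.

```python
def should_take_over(null_rpc_ok, network_ping_ok, scsi_ping_ok):
    """Decide whether this server should take over for its partner.

    Hypothetical helper: each argument is the outcome of one liveness
    probe of the partner server, True meaning the partner answered.
    """
    # The RFS_NULL exchange is the normal heartbeat; if it succeeds,
    # nothing else needs to be checked.
    if null_rpc_ok:
        return False
    # Heartbeat failed: fall back to pinging over the network and over
    # the shared SCSI bus, and take over only if both pings fail.
    return not network_ping_ok and not scsi_ping_ok
```

Requiring both fallback probes to fail keeps a server from taking over when only one path to its partner is broken.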

LBFS + Lease

- Cross-file similarities
  - Files often contain a number of segments in common with other files or previous versions of the same file
  - Divide files into chunks and index the chunks
- Design
  - Only close-to-open consistency
  - Cache
  - Chunk
    - Fingerprint every overlapping 48-byte region
    - If the last 13 bits of a region's fingerprint equal a magic value, place a breakpoint
    - Enforce min/max chunk size of 2K/64K
  - Chunk database
    - Use the first 64 bits of SHA-1 as key
    - (file, offset, count) as value
    - Only as a hint
  - READ
    - Client GETHASH -> Server
    - Server responds with a vector of hashes -> Client
    - Client requests missing data
  - WRITE
- Implementation Notes
- Motivation
  - File system for low-bandwidth networks
- Existing solutions
  - Work on a local copy
    - Make a local copy
    - Copy to server when done
    - Manual, mistakes, conflicts
  - Work remotely
    - ssh to the remote machine
- Goal: minimize bandwidth
- Related technique: compression
- Workload assumptions
  - Make small changes to files, similar versions e....
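
The chunking rule in the notes (48-byte regions, 13-bit magic value, 2K/64K limits, 64-bit SHA-1 keys) can be sketched as follows. This is a minimal illustration, not the LBFS implementation: it swaps in a simple polynomial rolling hash where LBFS uses Rabin fingerprints, and `MAGIC`, `BASE`, `MOD`, and the function names are assumptions made for the example.

```python
import hashlib

WINDOW = 48              # bytes per fingerprinted region (from the notes)
MASK = (1 << 13) - 1     # look at the low 13 bits of each fingerprint
MAGIC = 0x78             # hypothetical magic value; any fixed 13-bit constant
MIN_CHUNK = 2 * 1024     # minimum chunk size (2K)
MAX_CHUNK = 64 * 1024    # maximum chunk size (64K)
BASE = 257               # rolling-hash parameters, chosen for illustration
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the oldest byte in the window


def chunk_boundaries(data):
    """Return the end offset of every chunk in data."""
    boundaries = []
    start = 0   # offset where the current chunk begins
    h = 0       # rolling hash of the last WINDOW bytes
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: drop the byte that just left it.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        size = i + 1 - start
        at_magic = i + 1 >= WINDOW and (h & MASK) == MAGIC
        # Break at a magic fingerprint once the chunk is large enough,
        # or unconditionally when the chunk reaches the maximum size.
        if (at_magic and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            boundaries.append(i + 1)
            start = i + 1
    if start < len(data):
        boundaries.append(len(data))   # final, possibly short, chunk
    return boundaries


def chunk_keys(data):
    """Return the chunk-database key (first 64 bits of SHA-1) per chunk."""
    keys = []
    start = 0
    for end in chunk_boundaries(data):
        keys.append(hashlib.sha1(data[start:end]).digest()[:8])
        start = end
    return keys
```

Because the hash depends only on the bytes inside the window, a small edit tends to move only the boundaries near it, so most chunk keys of the old and new versions still match and only the changed chunks need to cross the network.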

Logical Clocks, Global Snapshots

- Clocks
  - Partial ordering
    - Define “happened before” (->) without clocks
      - If a and b are events in the same process, and a comes before b, then a -> b
      - If a is the sending of a message and b is the receipt of the same message, then a -> b
      - If a -> b and b -> c, then a -> c
    - Concurrent: a -/> b and b -/> a
  - Logical clocks
    - For any events $a$, $b$: if $a \rightarrow b$ then $C(a) < C(b)$
    - C1....
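
One standard way to satisfy the clock condition is Lamport's counter scheme: increment on every local event, stamp outgoing messages, and on receipt jump past the received timestamp. The excerpt is cut off before it states these rules, so the sketch below follows the usual textbook formulation; the class and method names are illustrative.

```python
class LamportClock:
    """A per-process logical clock (illustrative names)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is itself an event; its timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Advance past both the local clock and the message's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Process p performs an event, then sends a message that q receives.
p, q = LamportClock(), LamportClock()
a = p.tick()         # local event on p
m = p.send()         # send event on p, so a -> m
q.tick()             # unrelated local event on q
b = q.receive(m)     # receive event on q, so m -> b
assert a < m < b     # the clock condition holds along the chain
```

The converse does not hold: $C(a) < C(b)$ alone does not imply $a \rightarrow b$, since concurrent events can receive any relative timestamps.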

NFS

- The Role of Distributed State
  - State: information retained in one place that describes something, or is determined by something, somewhere else in the system.
  - Pros
    - Performance (cache)
    - Coherency (sequence numbers to detect duplicates or out-of-order messages)
    - Reliability (recover from a cache if the central copy dies)
  - Cons
    - Consistency
      - Detect stale data on use (DNS cache)
      - Prevent inconsistency (direct to a single copy when updating)
      - Tolerate inconsistency
    - Crash sensitivity
      - A crash on one machine can crash the whole system
    - Time & space overheads
      - Mainly due to maintaining consistency
      - Space: same data on many machines
    - Complexity
- NFS
  - Idempotent
  - State almost exclusively on clients
  - Client state
    - File identifiers
    - File data (read)
    - File attributes (lookup)
    - Name translations (lookup, name -> file identifiers)
  - Pros
    - Handles server crashes with ease (clients notice only a delay)
    - Simplicity
  - Cons
    - Performance
      - Changes have to be written to disk before write returns
    - Consistency
      - The server cannot notify other clients if one client modifies its file....

Paxos

- paxos-simple-Copy.pdf (microsoft.com)
- Problem
  - Choose a single value among all proposed values
  - Roles: proposers, acceptors, learners
- Choosing a value
  - P1. An acceptor must accept the first proposal that it receives (because there may be only a single proposal)
  - => Acceptors must be allowed to accept more than one proposal (if not, there may be no majority)
  - However, all chosen proposals must have the same value => introduce proposal numbers
  - A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors....
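
The definition of "chosen" above can be checked mechanically. The sketch below only tests that definition against a snapshot of what each acceptor has accepted; it is not the Paxos protocol itself, and the function name and data layout (acceptor name mapped to a list of `(number, value)` proposals) are assumptions made for the example.

```python
from collections import Counter


def chosen_value(accepted, num_acceptors):
    """Return the value of a proposal accepted by a majority, else None.

    accepted maps each acceptor to the (number, value) proposals it has
    accepted so far; an acceptor may have accepted more than one.
    """
    votes = Counter()
    for proposals in accepted.values():
        for proposal in set(proposals):   # count each acceptor once per proposal
            votes[proposal] += 1
    for (number, value), count in votes.items():
        if count > num_acceptors // 2:    # strict majority of all acceptors
            return value
    return None


# Split vote: each acceptor accepted a different first proposal, so
# nothing is chosen. This is why accepting more than one must be allowed.
split = {"a1": [(1, "x")], "a2": [(2, "y")], "a3": [(3, "z")]}
assert chosen_value(split, 3) is None

# After a1 also accepts proposal 2, that proposal has a majority.
later = {"a1": [(1, "x"), (2, "y")], "a2": [(2, "y")], "a3": [(3, "z")]}
assert chosen_value(later, 3) == "y"
```

Votes are counted per proposal rather than per value, matching the wording that a single proposal must be accepted by a majority.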