HA-NFS
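Reliability is addressed at three levels:
  - Server reliability: dual-ported disks, so a second server can reach a failed server's disks.
  - Disk reliability: mirroring files on different disks attached to the same server.
  - Network reliability: replicated network components, with the workload distributed over the networks.

Goals
  - Transparent failure and recovery.
  - No performance penalty during normal operation.
  - No client modification.

Architecture
  - Each node consists of two servers.
  - Each server has two network interfaces, each with its own IP address. The secondary interface is used when impersonating a failed server or when re-integrating.
  - A number of SCSI buses connect the servers to the disks; each disk has one primary server.

Normal operation
  - The two servers exchange NFS RFS_NULL calls as heartbeats.
  - If a heartbeat fails, the server pings its partner via the network and via the SCSI bus, and takes over only if both pings fail.

Take-over
  - The surviving server restores the failed server's file systems.
  - It changes the MAC/IP address of its secondary interface to that of the failed server.

Re-integration
  - The failed server keeps its primary interface turned off and sends a request to the backup server.
  - The backup unmounts the taken-over disks and resets its secondary interface.
  - The failed server then restores its own file systems.

Network failure
  - Normal case: the two servers in a node use different networks as their primary network (load balancing), and both broadcast heartbeats on the networks.
  - When a network fails: the client daemon times out and reroutes to an alternative path.

In-class
  - Goal: a replicated, distributed service.
  - Why hard?...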
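The HA-NFS take-over rule from the normal-operation notes (after a missed RFS_NULL heartbeat, ping over both the network and the SCSI bus, and take over only if both fail) can be written as a small decision function. This is a minimal sketch, not the HA-NFS code: `Action`, `ProbeResult`, and `decide` are hypothetical names, and the probes are reduced to booleans. Requiring silence on every path is presumably what keeps a single broken link from making both servers serve the same disks.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    NONE = auto()       # partner is alive on at least one path
    TAKE_OVER = auto()  # partner is unreachable on every path


@dataclass
class ProbeResult:
    heartbeat_ok: bool             # periodic NFS null-call heartbeat to the partner
    network_ping_ok: bool = False  # fallback ping over the network
    scsi_ping_ok: bool = False     # fallback ping over the shared SCSI bus


def decide(probe: ProbeResult) -> Action:
    """Take over only when both fallback pings fail after a missed heartbeat."""
    if probe.heartbeat_ok:
        return Action.NONE
    if probe.network_ping_ok or probe.scsi_ping_ok:
        # The partner still answers on one path: one broken link alone
        # must not lead two servers to serve the same disks.
        return Action.NONE
    return Action.TAKE_OVER


print(decide(ProbeResult(heartbeat_ok=False, scsi_ping_ok=True)))  # Action.NONE
print(decide(ProbeResult(heartbeat_ok=False)))                     # Action.TAKE_OVER
```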
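The HA-NFS take-over and re-integration steps amount to moving disk ownership and a network address between the two servers of a node. The in-memory model below is a rough sketch under stated assumptions: `Server`, `take_over`, and `reintegrate` are hypothetical, "restoring a file system" is reduced to mounting the disk, only the IP half of the MAC/IP change is modeled, and the disk names and addresses are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    primary_ip: str
    secondary_home_ip: str
    secondary_ip: str = ""   # address currently bound to the secondary interface
    primary_up: bool = True
    mounted: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if not self.secondary_ip:
            self.secondary_ip = self.secondary_home_ip


def take_over(owner: dict[str, str], survivor: Server, failed: Server) -> None:
    """Survivor adopts the failed server's disks and network identity."""
    failed.primary_up = False
    failed.mounted.clear()
    for disk, primary in owner.items():
        if primary == failed.name:
            # HA-NFS restores the file system first; this model only mounts it.
            survivor.mounted.add(disk)
    # The secondary interface now answers to the failed server's address.
    survivor.secondary_ip = failed.primary_ip


def reintegrate(owner: dict[str, str], returning: Server, backup: Server) -> None:
    """Returning server reclaims its disks; its primary interface stays down until the end."""
    if returning.primary_up:
        raise RuntimeError("primary interface must stay off during re-integration")
    # The request would travel over the returning server's secondary interface (not modeled).
    released = {d for d in backup.mounted if owner[d] == returning.name}
    backup.mounted -= released                       # backup unmounts the disks ...
    backup.secondary_ip = backup.secondary_home_ip   # ... and resets its secondary interface
    returning.mounted |= released                    # returning server restores its disks
    returning.primary_up = True                      # only now does it answer on its own address


owner = {"sd0": "A", "sd1": "B"}  # hypothetical disk -> primary server map
a = Server("A", "10.0.0.1", "10.0.1.1", mounted={"sd0"})
b = Server("B", "10.0.0.2", "10.0.1.2", mounted={"sd1"})
take_over(owner, survivor=b, failed=a)
print(sorted(b.mounted), b.secondary_ip)  # ['sd0', 'sd1'] 10.0.0.1
reintegrate(owner, returning=a, backup=b)
print(sorted(a.mounted), sorted(b.mounted), b.secondary_ip)  # ['sd0'] ['sd1'] 10.0.1.2
```

Keeping the returning server's primary interface down until the backup has released the disks means that, in this model, the failed server's address is never served by both machines at once.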
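For the HA-NFS network-failure case, the client-side behaviour (the daemon times out and reroutes to an alternative path) can be modeled as trying each network path in order and moving on when one times out. This is a simplified sketch, not the actual client daemon: `call_with_reroute`, `fake_send`, and the route names `net0`/`net1` are hypothetical, and a real daemon would change routing state rather than retry a single call.

```python
from typing import Callable, Sequence, Tuple


def call_with_reroute(routes: Sequence[str], send: Callable[[str], str]) -> Tuple[str, str]:
    """Send a request over the first route that answers before timing out."""
    last_error = None
    for route in routes:
        try:
            return route, send(route)
        except TimeoutError as err:
            # This path is presumed down; fall back to the next network.
            last_error = err
    raise TimeoutError("no route to the server answered") from last_error


# Illustrative transport: "net0" has failed, "net1" still works.
DOWN = {"net0"}


def fake_send(route: str) -> str:
    if route in DOWN:
        raise TimeoutError(route)
    return "reply via " + route


print(call_with_reroute(["net0", "net1"], fake_send))  # ('net1', 'reply via net1')
```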