
CS537 12/9

February 24, 2022

Table of Contents

Distributed File Systems
Distributed Systems
- Different Than “local” System?
NFS
- Server Crashes: How to Handle?
- Protocol:

Distributed File Systems

Sun Network File System (NFS)
Server crash recovery
- design of network protocol

Distributed Systems

Client/Server
- One Server
Replicated Servers
- Many servers

Different Than “local” System?

machines crash
network loses packets
performance: latency, bandwidth
resource sharing policies

NFS

Basics
Protocol
from protocol to FS API
idempotency: key to failure handling
performance: caching

Server Crashes: How to Handle?

a server crash leads to unavailability
key idea: when there is a problem => retry
File Handle 3 parts:
- <volume#, inode #, generation #>
- volume: which fs?
- inode: which file?
- generation: updated on delete, so a handle to a deleted file is not mistaken for a new file that reuses the same inode
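
A minimal sketch of how the generation number could catch a stale handle. The table size, the function names, and the choice to bump the counter inside `delete_inode` are assumptions made for illustration, not the actual NFS server code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_INODES 8  /* hypothetical tiny inode table */

/* File handle as in the notes: <volume#, inode#, generation#>. */
struct file_handle {
    uint32_t volume;      /* which file system */
    uint32_t inode;       /* which file within that file system */
    uint32_t generation;  /* which incarnation of that inode */
};

/* Server-side record of the current generation of each inode. */
static uint32_t inode_generation[NUM_INODES];

/* Hand out a handle for the current incarnation of an inode. */
struct file_handle make_handle(uint32_t volume, uint32_t inode)
{
    assert(inode < NUM_INODES);
    struct file_handle fh = { volume, inode, inode_generation[inode] };
    return fh;
}

/* Deleting a file bumps the generation, which invalidates old handles. */
void delete_inode(uint32_t inode)
{
    assert(inode < NUM_INODES);
    inode_generation[inode]++;
}

/* A handle is valid only if its generation matches the server's. */
bool handle_is_valid(const struct file_handle *fh)
{
    return fh->inode < NUM_INODES &&
           fh->generation == inode_generation[fh->inode];
}
```

A client that still holds a handle from before `delete_inode(3)` fails the `handle_is_valid` check, while a handle made afterwards for the same inode passes.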

Protocol:

each request has all the info needed to complete the operation (“statelessness”)

NFS Protocol

read(file handle, offset, size)
- return error code, data
write(fh, data, offset, size)
- return error code
create(parent file handle, name)
lookup(parent fh, name)
- return file handle of name
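
To make the statelessness point concrete, here is a rough sketch of read and write handlers that work only from what the request carries. The single in-memory file, the plain `int` handle, and the names `rw_request`, `handle_read`, and `handle_write` are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_SIZE 64  /* hypothetical fixed-size file */
#define ONLY_FH    1   /* the one handle this toy server knows */

/* One in-memory "file" standing in for server-side storage. */
static char store[STORE_SIZE];

/* The request carries everything the server needs: no open-file table
 * and no per-client current offset is kept on the server. */
struct rw_request {
    int    fh;      /* simplified file handle (assumed: a plain int) */
    size_t offset;  /* absolute position in the file */
    size_t size;    /* number of bytes to transfer */
};

/* Check the handle and bounds using only fields of the request. */
static int request_ok(const struct rw_request *req)
{
    return req->fh == ONLY_FH && req->offset <= STORE_SIZE &&
           req->size <= STORE_SIZE - req->offset;
}

/* Returns 0 on success, -1 on error; data is copied into out. */
int handle_read(const struct rw_request *req, char *out)
{
    assert(req != NULL && out != NULL);
    if (!request_ok(req))
        return -1;
    memcpy(out, store + req->offset, req->size);
    return 0;
}

/* Returns 0 on success, -1 on error. */
int handle_write(const struct rw_request *req, const char *data)
{
    assert(req != NULL && data != NULL);
    if (!request_ok(req))
        return -1;
    memcpy(store + req->offset, data, req->size);
    return 0;
}
```

Because nothing about the client is remembered between calls, a server that crashes and restarts can answer the next request as if nothing had happened.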

Example

open file + read it

int fd = open("/a/b.txt", O_RDONLY);
read(fd, buffer, size);

assume: client has root directory file handle
open:

lookup(root fh, "a")
=> a's fh
lookup(a's fh, "b.txt")
=> b.txt fh
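return fd (client maps fd to b.txt fh and a current offset)

read:

read(b.txt fh, offset, size)
=> data (client advances its offset)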
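
The component-by-component lookup in this example can be sketched as follows. The directory table, the integer handles, and the names `nfs_lookup` and `resolve_path` are made up for illustration; a real client would send each lookup over the network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROOT_FH 1  /* assumed: client already holds the root handle */

/* Hypothetical server-side directory entries: (parent, name) -> child. */
struct dir_entry {
    int         parent_fh;
    const char *name;
    int         child_fh;
};

static const struct dir_entry entries[] = {
    { ROOT_FH, "a",     2 },
    { 2,       "b.txt", 3 },
};

/* Stand-in for the lookup call; returns the child's handle or -1. */
int nfs_lookup(int parent_fh, const char *name)
{
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
        if (entries[i].parent_fh == parent_fh &&
            strcmp(entries[i].name, name) == 0)
            return entries[i].child_fh;
    return -1;
}

/* Resolve an absolute path one component at a time, as open() would. */
int resolve_path(const char *path)
{
    char copy[256];
    assert(path != NULL && strlen(path) < sizeof copy);
    strcpy(copy, path);  /* strtok modifies its input, so work on a copy */

    int fh = ROOT_FH;
    for (char *part = strtok(copy, "/"); part != NULL && fh != -1;
         part = strtok(NULL, "/"))
        fh = nfs_lookup(fh, part);  /* one lookup per path component */
    return fh;
}
```

With this table, `resolve_path("/a/b.txt")` issues two lookups and ends with the handle for `b.txt`; a missing component yields -1.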

Crashes

Client req lost
Server reply lost
Server down
Uniform approach
- timeout, then retry (wait a little while for a reply before resending)
- Property: idempotency
  - doing an operation N times has the same effect as doing it once
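
A small sketch of the timeout-and-retry idea, with the network simulated in-process. The lost-reply counter, the `disk` buffer, and the function names are assumptions for illustration; a real client would wait on a timer between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DISK_SIZE 32

static char disk[DISK_SIZE];   /* server-side file contents */
static int  replies_to_drop;   /* simulated network: lose this many replies */
static int  writes_applied;    /* times the server executed the write */

/* The server applies the write, but the reply may be "lost" on the way
 * back. Returns true only if the client actually receives the reply. */
static bool send_write(size_t offset, const char *data, size_t size)
{
    memcpy(disk + offset, data, size);  /* server executes the request */
    writes_applied++;
    if (replies_to_drop > 0) {
        replies_to_drop--;
        return false;                   /* client sees a timeout */
    }
    return true;
}

/* Uniform failure handling: on timeout, send the same request again.
 * Returns the number of attempts used, or -1 if the client gave up. */
int write_with_retry(size_t offset, const char *data, size_t size,
                     int max_attempts)
{
    assert(data != NULL);
    assert(offset <= DISK_SIZE && size <= DISK_SIZE - offset);
    for (int attempt = 1; attempt <= max_attempts; attempt++)
        if (send_write(offset, data, size))
            return attempt;
    return -1;
}
```

Because the write names an absolute offset, the server executing it three times leaves the file in the same state as executing it once. An operation like "append" would not have this property.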

Cache

Problems
- Staleness
- Visibility
Example - Staleness
- T=1 => C1 reads A and places it in its cache
- T=2 => C2 writes A'
- T=3 => C1 reads A from its cache? (stale: the server now has A')
Example - Visibility
- C1 buffers its write of A' locally (nobody else knows about it)
Solution - Staleness
- Check with the server whether the data has changed before using the cached version
Solution - Visibility
- “flush on close”
  - all dirty data written to server on close
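
Both fixes can be sketched together in a toy model. The version counter is an assumption standing in for whatever the client compares against the server (for example a last-modified attribute), and all struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK 16

/* Server state: one block plus a version bumped on every write
 * (assumed stand-in for the file's last-modified attribute). */
struct server {
    char     data[BLOCK];
    unsigned version;
};

/* Per-client cache of that block. */
struct client {
    char     data[BLOCK];
    unsigned version;  /* server version the cached copy came from */
    bool     valid;    /* do we hold a cached copy at all? */
    bool     dirty;    /* buffered writes not yet sent to the server */
};

/* src must point to BLOCK bytes. */
void server_store(struct server *s, const char *src)
{
    memcpy(s->data, src, BLOCK);
    s->version++;
}

/* Staleness fix: validate the cached copy against the server before use. */
const char *client_read(struct client *c, const struct server *s)
{
    assert(c != NULL && s != NULL);
    if (!c->dirty && (!c->valid || c->version != s->version)) {
        memcpy(c->data, s->data, BLOCK);  /* refetch from the server */
        c->version = s->version;
        c->valid = true;
    }
    return c->data;
}

/* Writes are buffered locally; other clients cannot see them yet. */
void client_write(struct client *c, const char *src)
{
    memcpy(c->data, src, BLOCK);
    c->valid = true;
    c->dirty = true;
}

/* Visibility fix: flush on close pushes all dirty data to the server. */
void client_close(struct client *c, struct server *s)
{
    if (c->dirty) {
        server_store(s, c->data);
        c->version = s->version;
        c->dirty = false;
    }
}
```

Replaying the timeline above: C1 reads A, C2 writes A' and closes, and C1's next `client_read` notices the version change and refetches A' instead of returning the stale copy.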
