Distributed File Systems
- Sun Network File System (NFS)
- Server crash recovery
- design of network protocol
Distributed Systems
- Client/Server
- One Server
- Replicated Servers
- Many servers
Different Than “local” System?
- machines crash
- network loses packets
- performance: latency, bandwidth
- resource sharing policies
NFS
- Basics
- Protocol
- from protocol to FS API
- idempotency: key to failure handling
- performance: caching
Server Crashes: How to Handle?
- leads to unavailability
- key idea: when there is a problem => retry
- File handle has 3 parts: <volume #, inode #, generation #>
- volume: which fs?
- inode: which file?
- generation: updated on delete
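The three-part handle can be sketched as a struct; a minimal illustration (field names are mine, not the real NFS wire format), including the server-side check that catches a handle whose inode number was deleted and reused:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of an NFS-style file handle; the field names are
 * illustrative, not the actual NFS wire format. */
struct file_handle {
    uint32_t volume;     /* which file system (volume) on the server */
    uint32_t inode;      /* which file within that volume */
    uint32_t generation; /* bumped when the inode number is reused */
};

/* Server-side check: a handle is stale if the inode it names was
 * deleted and its number reused (generation no longer matches). */
bool handle_is_current(const struct file_handle *fh, uint32_t current_gen) {
    return fh->generation == current_gen;
}
```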
Protocol:
- each request has all the info needed to complete the operation (“statelessness”)
NFS Protocol
- read(file handle, offset, size) - returns error code, data
- write(fh, data, offset, size) - returns error code
- create(parent file handle, name) - returns file handle of new file
- lookup(parent fh, name) - returns file handle of name
Example
- open file + read it
- assume: client has root directory file handle
  int fd = open("/a/b.txt", O_RDONLY);
  read(fd, buffer, size);
- open:
  lookup(root fh, "a") => a's fh
  lookup(a's fh, "b.txt") => b.txt's fh
  return fd
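The path walk above can be sketched in code. This is a toy illustration (handles are plain ints, and lookup_stub stands in for the real LOOKUP RPC, hard-wired to this one example path), showing how the client resolves one component at a time starting from the root handle it already holds:

```c
#include <string.h>

typedef int fh_t;             /* toy file handle: just an integer id */
#define ROOT_FH 1
#define A_FH    2
#define BTXT_FH 3
#define BAD_FH  (-1)

/* Stand-in for the lookup(parent fh, name) RPC; it only knows the two
 * path components from the example. */
fh_t lookup_stub(fh_t dir_fh, const char *name) {
    if (dir_fh == ROOT_FH && strcmp(name, "a") == 0)     return A_FH;
    if (dir_fh == A_FH    && strcmp(name, "b.txt") == 0) return BTXT_FH;
    return BAD_FH;            /* no such name in that directory */
}

/* open("/a/b.txt"): walk the path one component at a time, starting
 * from the root file handle the client already has. */
fh_t open_by_lookup(void) {
    fh_t a_fh = lookup_stub(ROOT_FH, "a");
    if (a_fh == BAD_FH) return BAD_FH;
    return lookup_stub(a_fh, "b.txt");
}
```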
Crashes
- Client req lost
- Server reply lost
- Server down
- Uniform approach
- timeout, retry (wait for a little while)
- Property: idempotency
- doing it N times has the same effect as doing it once
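A minimal sketch of why this retry strategy is safe (invented names; memory stands in for the server's disk): because write carries an absolute offset rather than appending, a client that resends after a timeout leaves the file in the same state as if the request ran once.

```c
#include <string.h>

enum { FILE_SIZE = 64 };
static char server_file[FILE_SIZE];   /* stand-in for the file on the server */

/* Server side: write data at an absolute offset. Idempotent: replaying
 * the same request rewrites the same bytes with the same values. */
void server_write(const char *data, int offset, int size) {
    memcpy(server_file + offset, data, size);
}

/* Client side: on timeout, just resend. Duplicates are harmless because
 * each request carries all the info needed to complete the operation. */
void client_write_with_retry(const char *data, int offset, int size,
                             int attempts) {
    for (int i = 0; i < attempts; i++)
        server_write(data, offset, size);  /* resend as if reply was lost */
}
```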
Cache
- Problems
- Staleness
- Visibility
- Example - Staleness
- T=1 => C1 read A and place to cache
- T=2 => C2 write A'
- T=3 => C1 read A from cache?
- Example - Visibility
- C1 writes A’ to its local buffer (nobody else knows yet)
- Solution - Staleness
- check with the server whether the data has changed before using the cached version
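One way to sketch that check (names invented; server_get_mtime_stub stands in for an attribute-fetching RPC): the client records the file's modification time when it caches a block, and before reusing the block it compares that against the server's current value.

```c
#include <stdbool.h>

/* Client-side cache entry: the data plus the server mtime observed
 * when the block was fetched. */
struct cache_entry {
    long cached_mtime;   /* server mtime recorded at caching time */
    char data[512];
};

/* Stand-in for asking the server for the file's current attributes. */
long server_get_mtime_stub(void) { return 200; }

/* Fresh only if the file has not been modified since we cached it;
 * otherwise the cached copy is stale and must be refetched. */
bool cache_entry_fresh(const struct cache_entry *e) {
    return e->cached_mtime == server_get_mtime_stub();
}
```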
- Solution - Visibility
- “flush on close”
- all dirty data written to server on close
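The visibility fix can be sketched as follows (a toy model with invented names; a byte array stands in for the server copy): buffered writes only mark the open file dirty, and close pushes the dirty data to the server so the next client to open the file sees it.

```c
#include <stdbool.h>
#include <string.h>

enum { BUF_SIZE = 64 };
static char server_copy[BUF_SIZE];    /* stand-in for the file on the server */

struct open_file {
    char buffer[BUF_SIZE];            /* client-side write buffer */
    bool dirty;                       /* set when the buffer has unsent writes */
};

/* Buffered write: only the local cache changes; the server (and thus
 * other clients) cannot see it yet. */
void buffered_write(struct open_file *f, const char *data, int size) {
    memcpy(f->buffer, data, size);
    f->dirty = true;
}

/* "Flush on close": all dirty data is written to the server on close,
 * making the writes visible to later opens by other clients. */
void close_file(struct open_file *f) {
    if (f->dirty)
        memcpy(server_copy, f->buffer, BUF_SIZE);
    f->dirty = false;
}
```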