AFS

  • Vice (Server), Venus (Client)

Prototype

  • Stub directories
    • represent portions of the Vice name space located on other servers
    • If a file is not on that server, search the stub to find which server holds it
  • Files named by full pathname
    • no inode
  • Replication
    • read-only replicas for the topmost levels
  • Cache
    • Venus asks the server for a timestamp on every open (see the sketch after this list)
  • Performance
    • many stat-style calls -> bad performance
    • limited to ~20 users per server
      • one server process per client
  • Hard to move files between servers
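
A minimal sketch of the prototype's open path, with invented names rather than the real Vice RPC interface: files are named by full pathname, a stub directory redirects lookups for subtrees stored on other servers, and every open contacts a server to revalidate the cached copy by timestamp.

```python
# Hypothetical sketch of the AFS prototype open path; names are invented.

CACHE = {}  # full pathname -> (timestamp, data) on the client's local disk

class Server:
    def __init__(self, name, files, stubs):
        self.name = name
        self.files = files   # full pathname -> (timestamp, data)
        self.stubs = stubs   # path prefix   -> Server holding that subtree

    def lookup(self, path):
        """Return a timestamp, or the Server named by a stub directory."""
        if path in self.files:
            return self.files[path][0]
        for prefix, other in self.stubs.items():
            if path.startswith(prefix):
                return other
        raise FileNotFoundError(path)

    def fetch(self, path):
        return self.files[path]

def venus_open(server, path):
    """Every open talks to a server, even on a cache hit -- the scaling problem."""
    hit = server.lookup(path)
    if isinstance(hit, Server):           # stub: retry at the right server
        return venus_open(hit, path)
    if path in CACHE and CACHE[path][0] == hit:
        return CACHE[path][1]             # cached whole file is still current
    ts, data = server.fetch(path)         # whole-file transfer
    CACHE[path] = (ts, data)
    return data

s2 = Server("vice2", {"/afs/proj/plan.txt": (7, b"plan")}, {})
s1 = Server("vice1", {"/afs/home/a.txt": (3, b"hi")}, {"/afs/proj": s2})
print(venus_open(s1, "/afs/home/a.txt"))     # stored on vice1
print(venus_open(s1, "/afs/proj/plan.txt"))  # stub on vice1 points to vice2
```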

Benchmark

  • Many TestAuth (cache validation) and GetFileStat calls
  • CPU Bottleneck
    • Context switches
    • Pathname traversal on the server
  • Unbalanced server load

Revised Design

  • Cache Management
    • Cache dir contents & symbolic links
    • status cache
      • in memory (for stat)
    • data cache
      • on disk
    • Directory modifications are made directly on the server
    • Consistency
      • Old: client asks the server on each open whether the file has changed
      • New: client caches; server promises (callback) to notify it of changes
  • Name Resolution
    • reintroduce two-level name (fid, pathname)
    • Client converts pathname to fid
    • fid: (volume number, vnode number, uniquifier)
  • Low-Level Storage Representation
    • Server: use table[vnode number] = inode number
    • (Use vnode number as the index)
  • Overview
    • When a client opens a file (see the sketch after this list)
      • go through each path component
      • cache it and set up a callback (if not already cached)
    • Client selects the server by looking up the volume number in its volume-location cache
      • if not in the cache, contact any server
  • Semantics
    • Writes to a file are immediately visible to processes on the same machine, but not to processes on other machines
    • File changes are flushed to the server on close
    • Other file operations are visible immediately everywhere
    • Multiple workstations can operate on the same file concurrently, but programs must cooperate if they care about the result
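
A rough sketch of the revised naming path (identifiers are invented, not the actual Venus/Vice interface): the client walks the pathname itself using cached directory contents to obtain a fid of (volume, vnode, uniquifier), picks a server from its volume-location cache, and the server indexes straight from vnode number to inode number.

```python
# Sketch of the revised AFS naming path; all identifiers here are invented.
from collections import namedtuple

Fid = namedtuple("Fid", "volume vnode uniquifier")

# Client-side caches
DIR_CACHE = {}      # Fid of a directory -> {component name: Fid}
VOL_LOCATION = {}   # volume number -> server
ROOT = Fid(1, 1, 1)

def resolve(path):
    """Client converts a pathname to a fid by walking cached directories."""
    fid = ROOT
    for part in path.strip("/").split("/"):
        fid = DIR_CACHE[fid][part]   # a miss would trigger a directory fetch
    return fid

class Server:
    def __init__(self):
        self.vnode_index = {}   # (volume, vnode) -> inode number
        self.inodes = {}        # inode number -> data

    def fetch(self, fid):
        """Server never parses pathnames; the vnode number indexes straight to the inode."""
        inode = self.vnode_index[(fid.volume, fid.vnode)]
        return self.inodes[inode]

# Tiny demo: /home/a.txt lives in volume 1, vnode 5, stored in inode 99.
srv = Server()
srv.vnode_index[(1, 5)] = 99
srv.inodes[99] = b"contents"
DIR_CACHE[ROOT] = {"home": Fid(1, 2, 1)}
DIR_CACHE[Fid(1, 2, 1)] = {"a.txt": Fid(1, 5, 1)}
VOL_LOCATION[1] = srv

fid = resolve("/home/a.txt")
print(VOL_LOCATION[fid.volume].fetch(fid))   # b'contents'
```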

Disadvantages

  • no concurrent read/write across clients
  • no diskless operation
  • building a distributed database is hard
  • latency

Coda

Availability

  • Volume storage group (VSG)
  • Disconnected operation

Scalability

  • Callback-based cache coherence (server-side sketch after this list)
  • Whole-file caching
  • Place functionality on clients
  • Avoid system-wide rapid change
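
A simplified, hypothetical sketch of callback-based coherence from the server's side: a fetch registers a callback promise, and a store on close breaks the callbacks held by every other client, so clients do not need to revalidate on each open.

```python
# Simplified sketch of server-side callback bookkeeping; not the real Vice code.
from collections import defaultdict

class CallbackServer:
    def __init__(self):
        self.data = {}                       # fid -> file contents
        self.callbacks = defaultdict(set)    # fid -> clients holding a callback

    def fetch(self, client, fid):
        """Whole-file fetch; server promises to notify this client of changes."""
        self.callbacks[fid].add(client)
        return self.data[fid]

    def store(self, client, fid, contents):
        """Called when a client closes a dirty file; break everyone else's callback."""
        self.data[fid] = contents
        for other in self.callbacks[fid] - {client}:
            other.break_callback(fid)        # other clients mark their copy invalid
        self.callbacks[fid] = {client}

class Client:
    def __init__(self, name):
        self.name, self.valid = name, set()
    def break_callback(self, fid):
        self.valid.discard(fid)
        print(f"{self.name}: callback broken for {fid}")

srv = CallbackServer()
srv.data["fid-1"] = b"v1"
a, b = Client("A"), Client("B")
srv.fetch(a, "fid-1"); a.valid.add("fid-1")   # A caches the file and holds a callback
srv.fetch(b, "fid-1"); b.valid.add("fid-1")   # so does B
srv.store(a, "fid-1", b"v2")                  # A closes a dirty file; B's callback breaks
```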

First & Second Class Replication

  • First Class
    • Servers
    • persistent, secure, complete…
  • Second Class
    • Clients

Optimistic Vs Pessimistic

  • Pessimistic
    • Client acquires exclusive control
      • blocks reads/writes at other replicas
    • Client acquires shared control
      • allows reading at other replicas
  • Optimistic (Coda uses this)
    • Read/write everywhere
    • Deal with conflicts later (toy contrast after this list)
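
A toy contrast of the two styles, with invented interfaces rather than Coda's: pessimistic control makes a client win exclusive or shared control of the replica set before operating, while optimistic control lets any replica accept updates and merely tags each one (a storeid here) so conflicts can be detected later.

```python
# Toy contrast of replica-control styles; interfaces are invented, not Coda's.
import uuid

class PessimisticVolume:
    """Caller must win control of the replica set before operating on it."""
    def __init__(self):
        self.exclusive, self.shared = None, set()
    def acquire_shared(self, client):            # readers may share the volume
        if self.exclusive:
            raise RuntimeError("blocked: another client holds exclusive control")
        self.shared.add(client)
    def acquire_exclusive(self, client):         # a writer blocks everyone else
        if self.exclusive or self.shared:
            raise RuntimeError("blocked: replica is in use elsewhere")
        self.exclusive = client

class OptimisticReplica:
    """Any replica accepts updates; each update is tagged for later conflict checks."""
    def __init__(self, data=b"", storeid="s0"):
        self.data, self.storeid = data, storeid
    def write(self, data):
        self.data, self.storeid = data, uuid.uuid4().hex   # new storeid per update

vol = PessimisticVolume()
vol.acquire_shared("reader")          # reading at another replica is still allowed
try:
    vol.acquire_exclusive("writer")   # but a writer is blocked until readers release
except RuntimeError as e:
    print(e)

left, right = OptimisticReplica(b"v0"), OptimisticReplica(b"v0")
left.write(b"partition 1's update")
right.write(b"partition 2's update")
print(left.storeid != right.storeid)  # True: divergence is detectable when merging
```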

Implementation

  • States
    • Hoarding (Normal)
      • Hoard database + file usage history
      • Hierarchical cache management: a parent directory cannot be evicted before its children
    • Emulation (Disconnected)
    • Reintegration (Resume connection)
  • Hoarding
    • Hoard Walking
      • Runs every 10 minutes
      • Update name bindings (re-evaluate hoard entries marked '+', which indicate that future children should also receive high priority)
      • Restore equilibrium by fetching and evicting cached objects
    • On callback break
      • Files and symbolic links
        • purge the object
        • refetch on demand or during the next hoard walk
      • Directory
        • mark the cached copy as suspicious
  • Emulation
    • modified objects have infinite priority
    • Log all mutating operations to a replay log
      • optimization: repeated writes to the same file collapse into one store record
    • Store metadata in recoverable virtual memory (RVM)
  • Replay
    • Algorithm
      1. parse the log and lock all referenced objects
      2. validate and execute each record (for store records, only the metadata update is executed)
      3. transfer the data for store records (back-fetch)
      4. commit and release the locks
    • Conflict
      • during phase 2 of replay, check whether the storeid is still the same
      • if the server has a newer storeid, abort the replay (see the sketch after this list)
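
A simplified sketch of these phases; the record format and server interface are invented, but the flow follows the four steps above, with the storeid comparison as the conflict check.

```python
# Simplified replay sketch; record formats and the server interface are invented.

class ReplayConflict(Exception):
    pass

class ToyServer:
    """Just enough state to show the phases; real servers use RVM and real locks."""
    def __init__(self, files):
        self.files = files          # fid -> {"storeid": ..., "data": ...}
        self.locked, self.staged = set(), {}
    def lock(self, fid):   self.locked.add(fid)
    def unlock(self, fid): self.locked.discard(fid)
    def commit(self):
        for fid, new in self.staged.items():
            self.files[fid].update(new)
        self.staged.clear()
    def abort(self):
        self.staged.clear()

def replay(server, log):
    """log: list of store records {"fid", "old_storeid", "new_storeid", "data"}."""
    fids = {rec["fid"] for rec in log}
    for fid in fids:                              # Phase 1: parse + lock
        server.lock(fid)
    try:
        for rec in log:                           # Phase 2: validate, metadata only
            if server.files[rec["fid"]]["storeid"] != rec["old_storeid"]:
                raise ReplayConflict(rec["fid"])  # server has a newer storeid
            server.staged[rec["fid"]] = {"storeid": rec["new_storeid"]}
        for rec in log:                           # Phase 3: back-fetch file data
            server.staged[rec["fid"]]["data"] = rec["data"]
        server.commit()                           # Phase 4: commit ...
    except ReplayConflict:
        server.abort()                            # any conflict aborts the whole replay
        raise
    finally:
        for fid in fids:                          # ... and release the locks
            server.unlock(fid)

srv = ToyServer({"f1": {"storeid": "s0", "data": b"old"}})
replay(srv, [{"fid": "f1", "old_storeid": "s0", "new_storeid": "s1", "data": b"new"}])
print(srv.files["f1"])   # storeid s1, data b'new'
```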

Questions

AFS

  1. Initial Prototype. What were the primary goals of the Andrew File System? Why did the authors decide to implement a usable prototype first? What were the primary problems they found with their prototype and what are the general implications?
    • Goal
      • Scalability
      • administration
    • Why prototype?
      • need experience with issues
      • need system to evaluate
      • need workload
    • Prototype issue
      • Too many overhead messages (TestAuth, GetFileStat)
        • change protocol, reduce server interaction
      • CPU load too high on server
        • Pathname traversal on server
          • change protocol - more work to clients
        • Too many context switches
          • Change implementation (threads)
      • Load-imbalance across servers
  2. Whole File Caching. Why does AFS use whole file caching? Where are files cached? What are the pros and cons of this approach? For what workloads is this a good idea? When is it a bad idea?
    • Why?
      • Users tend to access whole files (from the usage study)
      • reads/writes are local
        • efficient (on the client)
        • no load on server
      • good semantics, handle failure easily, clear consistency model
      • small amount of sharing within a file
    • When bad?
      • Only access part of the file
      • lots of sharing across clients
      • large files
        • larger than disk space -> won’t work
      • streaming
  3. Client Caching. AFS clients perform caching to improve performance. For read requests, how does a client know that its cached copy is up to date? When are writes sent from the client to the server? What happens when the server receives a write? What happens when a client crashes and reboots? What are the pros and cons of the AFS approach versus the NFS approach?
    • open(A)
      • A cached locally?
        • yes: is a callback still held?
          • yes: use local copy
        • no: fetch from server
    • read()
      • read local copy
      • how to know up to date?
        • by definition, same contents for this open-to-close
    • write()
      • write local
    • close()
      • no dirty data: no server interaction
      • updates: send them to server
      • server issues callback-break calls to other clients
    • What must client do on reboot?
      • discard all cached files
    • Pros
      • Clear consistency model
      • Helps with scalability (less communication)
    • Cons
      • Server must keep state (callbacks)
  4. Consistency Semantics. Can you describe the consistency semantics of AFS? When a client reads from a file, what version will it see? If two clients write to a file, which one will end up being stored on the server?
    • When see changes to file?
      • only on next open
    • Open-close semantics
      • see same/one copy
    • Last-closer-wins
      • no intermixing of writes (toy illustration after this list)
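
A toy model (invented classes, not AFS code) of the open-to-close, last-closer-wins semantics from question 4: each open gets one consistent copy, writes stay local, and whichever client closes last replaces the whole file on the server with no intermixing.

```python
# Toy model of AFS open-to-close semantics; classes and names are invented.

class Server:
    def __init__(self, data):
        self.data = data

class Session:
    """One open-to-close session: a private whole-file copy on the client."""
    def __init__(self, server):
        self.server, self.copy = server, server.data     # snapshot at open
    def read(self):         return self.copy             # same copy for the whole session
    def write(self, data):  self.copy = data             # local only
    def close(self):        self.server.data = self.copy # whole file flushed to server

srv = Server(b"original")
a, b = Session(srv), Session(srv)      # two clients open concurrently
a.write(b"client A's version")
b.write(b"client B's version")
a.close()                              # server now holds A's file
b.close()                              # last closer wins: B's whole file replaces A's
print(srv.data)                        # b"client B's version" -- no intermixing
print(Session(srv).read())             # the change is visible only to opens after the close
```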

Coda

Motivation

  1. What were the goals of Coda? What assumptions did Coda make? How good of a job did the designers do of predicting technology trends? How do wireless networks change the picture?

    • Goals?
      • Enjoy benefits of shared FS
        • continue work when inaccessible
          • voluntary client disconnect (mobile)
          • involuntary (server crash)
      • Scalable
      • Transparency
    • Assumptions
      • High bandwidth connection
      • untrusted clients
      • conventional hardware
      • local hard disk
  2. Coda developed from AFS. Briefly, how did AFS work with regard to caching files? What type of data consistency does AFS provide? (Addressed for AFS…)

  3. Replication is often used to increase availability, but there are trade-offs that must be considered. Is it possible to simultaneously achieve perfect consistency and availability when suffering from network partitions? Why or why not? Which does Coda place more emphasis on?

    • Availability: Client able to access file when partitioned
  4. When a network is partitioned, replicas can be controlled with either pessimistic or optimistic replica control. What is pessimistic replica control? What are the pros and cons of it? Why don’t leases solve the problem?

    • Pessimistic:
      • disallow operations when partitioned
        • disallow all writes, but allow reads
        • give ownership to one partition
      • Cons:
        • some clients can’t do work
    • Optimistic:
      • Permit ops when partitioned, detect + fix problems when connected
      • Cons:
        • conflicting updates must be detected and resolved
  5. What is optimistic replica control? What are its pros and cons? Why was optimistic replica control chosen in Coda? Can you think of an environment where pessimistic replica control would be more appropriate?

  6. Coda performs replication on both the servers (VSG, volume storage group) and clients. What are the differences between these two types of replicas? What does Coda do if some, but not all, servers are available? With a different view of servers, how might you design a file system for disconnected operation?

    • Client vs Server Replicas
      • Client: untrusted, limited disk capacity
        • 2nd-class replicas
      • Server: 1st-class replicas
    • Client better for availability

Detailed Design and Implementation

  1. Clients are managed by a software layer called Venus. How does the state and behavior of Venus change as the client becomes disconnected or connected?
    • Hoarding
      • Normal state
      • Cache files in current use and files likely to be needed in the future
    • Emulation
      • Disconnected
      • Venus does the work the server usually does
    • Reintegration
      • Reconnect
      • Propagate updates to the server
  2. Consider the hoarding state first, in which Venus attempts to hoard useful data in anticipation of disconnection. The challenge for hoarding is that the amount of cache space on the clients is, of course, limited. During hoarding, what tensions must Venus balance in how it manages the client cache? How does Venus decide what is cached? (What information is given infinite priority in the local cache? Why?)
    • Hoarding: Collect useful data before disconnection
      • Challenge? Limited Disk Cache Space
      • Tensions
        • useful data if disconnected
        • performance (data being accessed now)
      • Combine explicit statements (hoard db) + implicit usage history into a dynamic priority
      • Cache the highest-priority objects
      • Infinite priority: ancestor directories (cannot be evicted before their children)
  3. Is Venus during the hoarding stage identical to AFS? Why might the performance of Coda Hoarding be worse than AFS?
    • Hoard walk: Periodic, keep in equilibrium
      • no uncached object has higher priority than any cached object
    • Callback breaks?
      • AFS: refetch on the next open
      • Coda: refetch on the next open or at the next hoard walk
  4. Imagine that Venus includes a command so a user can specify that disconnection is about to take place. How should Venus respond?
  5. During emulation, Venus on the client performs many of the actions normally handled by the servers. What types of tasks does this include? How does Venus record enough information to update the servers during reintegration? How does Venus save space? What happens when all space is consumed?
    • Emulation: Venus performs actions normally handled by servers
      • Create new file ids (pre-allocate during hoard)
      • Manage cache
        • infinite priority for dirty files
        • discard deleted files
      • Actions in Log
        • Which actions? only mutating operations
        • Data in cache, not part of log
        • Intermediate writes? the earlier store record is freed (see the sketch after this list)
      • Full:
        • can’t do modifying actions
  6. During reintegration, Venus propagates changes made during emulation to the servers and updates its cache to reflect current server state. What are the steps of reintegration? Under what circumstances will the replay fail? How is failure detected? What happens when the replay fails? Do you think Coda chose the right level of granularity for conflict resolution?
    • Obtain permanent FIDs
    • Ship log to AVSG
      • parse the log, lock the objects it names
      • validate ops, looking for conflicts
        • check storeid
      • back-fetching of all data
        • Why last? the transfer is skipped if a conflict aborts the replay
      • commit + release locks
    • Granularity of Failure?
      • any write/write conflict causes the whole replay to fail
      • directory conflicts are handled automatically
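
A sketch of the emulation-time replay log and the cancellation optimization mentioned above, with an invented record layout: only mutating operations are logged, file contents stay in the cache container rather than the log, and a later store to the same file frees the earlier store record.

```python
# Sketch of Venus's emulation log with the store-cancellation optimization.
# Record layout and helper names are invented for illustration.

class ReplayLog:
    def __init__(self):
        self.records = []           # mutating operations only; reads are never logged

    def append(self, op, fid, **extra):
        if op == "store":
            # Optimization: a later store to the same fid supersedes the earlier one,
            # so the earlier record (and its claim on log space) can be freed.
            self.records = [r for r in self.records
                            if not (r["op"] == "store" and r["fid"] == fid)]
        # File contents stay in the cache container; the log only names the fid.
        self.records.append({"op": op, "fid": fid, **extra})

log = ReplayLog()
log.append("mkdir", "fid-dir", name="notes")
log.append("store", "fid-7")       # first save of the file
log.append("store", "fid-7")       # second save cancels the first store record
print([r["op"] for r in log.records])   # ['mkdir', 'store'] -- one store survives
```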

Coda Today

  • Dropbox comparison
  • HW Environment:
    • Clients
      • disk space
      • fit all files
    • Network
      • more connected, assume connected
      • aware when disconnected
  • Workload?
    • local: no conflicts
    • sharing: explicit
  • Conflict Resolution?
    • Versions

Evaluation and Status

  1. About how long is reintegration expected to take? Why is the time for this step crucial? How are technology trends likely to impact this time? Is a design change needed?
  2. How did they determine the size of a needed local disk? How are technology trends likely to impact this? Is a design change needed?
  3. How likely is a conflict during reintegration? Will technology trends impact this? Is a design change needed?