AFS

  • Vice (Server), Venus (Client)

Prototype

  • Stub directories
    • represent portions of the Vice name space located on other servers
    • If a file is not on that server, search the stub to find which server holds it
  • Files named by full pathname
    • no inode
  • Replication
    • read-only replicas for the topmost levels
  • Cache
    • Venus asks the server for a timestamp on every open (see the sketch after this list)
  • Performance
    • many stat-style calls -> bad performance
    • limited to ~20 users per server
      • one server process per client
  • Hard to move files between servers
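
A minimal sketch of the prototype's open path, with invented names rather than the real Vice RPC interface: files are named by full pathname, a stub directory redirects lookups for subtrees stored on other servers, and every open contacts a server to revalidate the cached copy by timestamp.

```python
# Hypothetical sketch of the AFS prototype open path; names are invented.

CACHE = {}  # full pathname -> (timestamp, data) on the client's local disk

class Server:
    def __init__(self, name, files, stubs):
        self.name = name
        self.files = files   # full pathname -> (timestamp, data)
        self.stubs = stubs   # path prefix   -> Server holding that subtree

    def lookup(self, path):
        """Return a timestamp, or the Server named by a stub directory."""
        if path in self.files:
            return self.files[path][0]
        for prefix, other in self.stubs.items():
            if path.startswith(prefix):
                return other
        raise FileNotFoundError(path)

    def fetch(self, path):
        return self.files[path]

def venus_open(server, path):
    """Every open talks to a server, even on a cache hit -- the scaling problem."""
    hit = server.lookup(path)
    if isinstance(hit, Server):           # stub: retry at the right server
        return venus_open(hit, path)
    if path in CACHE and CACHE[path][0] == hit:
        return CACHE[path][1]             # cached whole file is still current
    ts, data = server.fetch(path)         # whole-file transfer
    CACHE[path] = (ts, data)
    return data

s2 = Server("vice2", {"/afs/proj/plan.txt": (7, b"plan")}, {})
s1 = Server("vice1", {"/afs/home/a.txt": (3, b"hi")}, {"/afs/proj": s2})
print(venus_open(s1, "/afs/home/a.txt"))     # stored on vice1
print(venus_open(s1, "/afs/proj/plan.txt"))  # stub on vice1 points to vice2
```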

Benchmark

  • Many TestAuth (cache validation) and GetFileStat calls
  • CPU Bottleneck
    • Context switches
    • Pathname traversal on the server
  • Unbalanced server load

Revised Design

  • Cache Management
    • Cache dir contents & symbolic links
    • status cache
      • in memory (for stat)
    • data cache
      • on disk
    • Directory modifications are made directly on the server
    • Consistency
      • Old: client asks the server on each open whether the file has changed
      • New: client caches; server promises (callback) to notify it of changes
  • Name Resolution
    • reintroduce two-level name (fid, pathname)
    • Client converts pathname to fid
    • fid: (volume number, vnode number, uniquifier)
  • Low-Level Storage Representation
    • Server: use table[vnode number] = inode number
    • (Use vnode number as the index)
  • Overview
    • When a client opens a file (see the sketch after this list)
      • go through each path component
      • cache it and set up a callback (if not already cached)
    • Client selects the server by looking up the volume number in its volume-location cache
      • if not in the cache, contact any server
  • Semantics
    • Writes to a file are immediately visible to processes on the same machine, but not to processes on other machines
    • File changes are flushed to the server on close
    • Other file operations are visible immediately everywhere
    • Multiple workstations can operate on the same file concurrently, but programs must cooperate if they care about the result
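
A rough sketch of the revised naming path (identifiers are invented, not the actual Venus/Vice interface): the client walks the pathname itself using cached directory contents to obtain a fid of (volume, vnode, uniquifier), picks a server from its volume-location cache, and the server indexes straight from vnode number to inode number.

```python
# Sketch of the revised AFS naming path; all identifiers here are invented.
from collections import namedtuple

Fid = namedtuple("Fid", "volume vnode uniquifier")

# Client-side caches
DIR_CACHE = {}      # Fid of a directory -> {component name: Fid}
VOL_LOCATION = {}   # volume number -> server
ROOT = Fid(1, 1, 1)

def resolve(path):
    """Client converts a pathname to a fid by walking cached directories."""
    fid = ROOT
    for part in path.strip("/").split("/"):
        fid = DIR_CACHE[fid][part]   # a miss would trigger a directory fetch
    return fid

class Server:
    def __init__(self):
        self.vnode_index = {}   # (volume, vnode) -> inode number
        self.inodes = {}        # inode number -> data

    def fetch(self, fid):
        """Server never parses pathnames; the vnode number indexes straight to the inode."""
        inode = self.vnode_index[(fid.volume, fid.vnode)]
        return self.inodes[inode]

# Tiny demo: /home/a.txt lives in volume 1, vnode 5, stored in inode 99.
srv = Server()
srv.vnode_index[(1, 5)] = 99
srv.inodes[99] = b"contents"
DIR_CACHE[ROOT] = {"home": Fid(1, 2, 1)}
DIR_CACHE[Fid(1, 2, 1)] = {"a.txt": Fid(1, 5, 1)}
VOL_LOCATION[1] = srv

fid = resolve("/home/a.txt")
print(VOL_LOCATION[fid.volume].fetch(fid))   # b'contents'
```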

Disadvantages

  • no concurrent read/write across clients
  • no diskless operation
  • building a distributed database is hard
  • latency

Coda

Availability

  • Volume storage group (VSG)
  • Disconnected operation

Scalability

  • Callback-based cache coherence (server-side sketch after this list)
  • Whole-file caching
  • Place functionality on clients
  • Avoid system-wide rapid change
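
A simplified, hypothetical sketch of callback-based coherence from the server's side: a fetch registers a callback promise, and a store on close breaks the callbacks held by every other client, so clients do not need to revalidate on each open.

```python
# Simplified sketch of server-side callback bookkeeping; not the real Vice code.
from collections import defaultdict

class CallbackServer:
    def __init__(self):
        self.data = {}                       # fid -> file contents
        self.callbacks = defaultdict(set)    # fid -> clients holding a callback

    def fetch(self, client, fid):
        """Whole-file fetch; server promises to notify this client of changes."""
        self.callbacks[fid].add(client)
        return self.data[fid]

    def store(self, client, fid, contents):
        """Called when a client closes a dirty file; break everyone else's callback."""
        self.data[fid] = contents
        for other in self.callbacks[fid] - {client}:
            other.break_callback(fid)        # other clients mark their copy invalid
        self.callbacks[fid] = {client}

class Client:
    def __init__(self, name):
        self.name, self.valid = name, set()
    def break_callback(self, fid):
        self.valid.discard(fid)
        print(f"{self.name}: callback broken for {fid}")

srv = CallbackServer()
srv.data["fid-1"] = b"v1"
a, b = Client("A"), Client("B")
srv.fetch(a, "fid-1"); a.valid.add("fid-1")   # A caches the file and holds a callback
srv.fetch(b, "fid-1"); b.valid.add("fid-1")   # so does B
srv.store(a, "fid-1", b"v2")                  # A closes a dirty file; B's callback breaks
```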

First & Second Class Replication

  • First Class
    • Servers
    • persistent, secure, complete…
  • Second Class
    • Clients

Optimistic Vs Pessimistic

  • Pessimistic
    • Client acquires exclusive control
      • blocks reads/writes at other replicas
    • Client acquires shared control
      • allows reading at other replicas
  • Optimistic (Coda uses this)
    • Read/write everywhere
    • Deal with conflicts later (toy contrast after this list)
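
A toy contrast of the two styles, with invented interfaces rather than Coda's: pessimistic control makes a client win exclusive or shared control of the replica set before operating, while optimistic control lets any replica accept updates and merely tags each one (a storeid here) so conflicts can be detected later.

```python
# Toy contrast of replica-control styles; interfaces are invented, not Coda's.
import uuid

class PessimisticVolume:
    """Caller must win control of the replica set before operating on it."""
    def __init__(self):
        self.exclusive, self.shared = None, set()
    def acquire_shared(self, client):            # readers may share the volume
        if self.exclusive:
            raise RuntimeError("blocked: another client holds exclusive control")
        self.shared.add(client)
    def acquire_exclusive(self, client):         # a writer blocks everyone else
        if self.exclusive or self.shared:
            raise RuntimeError("blocked: replica is in use elsewhere")
        self.exclusive = client

class OptimisticReplica:
    """Any replica accepts updates; each update is tagged for later conflict checks."""
    def __init__(self, data=b"", storeid="s0"):
        self.data, self.storeid = data, storeid
    def write(self, data):
        self.data, self.storeid = data, uuid.uuid4().hex   # new storeid per update

vol = PessimisticVolume()
vol.acquire_shared("reader")          # reading at another replica is still allowed
try:
    vol.acquire_exclusive("writer")   # but a writer is blocked until readers release
except RuntimeError as e:
    print(e)

left, right = OptimisticReplica(b"v0"), OptimisticReplica(b"v0")
left.write(b"partition 1's update")
right.write(b"partition 2's update")
print(left.storeid != right.storeid)  # True: divergence is detectable when merging
```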

Implementation

  • States
    • Hoarding (Normal)
      • Hoard database + file usage history
      • Hierarchical cache management: a parent directory cannot be evicted before its children
    • Emulation (Disconnected)
    • Reintegration (Resume connection)
  • Hoarding
    • Hoard Walking
      • Runs every 10 minutes
      • Update name bindings (re-evaluate hoard entries marked '+', which indicate that future children should also receive high priority)
      • Restore equilibrium by fetching and evicting cached objects
    • On callback break
      • Files and symbolic links
        • purge the object
        • refetch on demand or during the next hoard walk
      • Directory
        • mark the cached copy as suspicious
  • Emulation
    • modified objects have infinite priority
    • Log all mutating operations to a replay log
      • optimization: repeated writes to the same file collapse into one store record
    • Store metadata in recoverable virtual memory (RVM)
  • Replay
    • Algorithm
      1. parse the log and lock all referenced objects
      2. validate and execute each record (for store records, only the metadata update is executed)
      3. transfer the data for store records (back-fetch)
      4. commit and release the locks
    • Conflict
      • during phase 2 of replay, check whether the storeid is still the same
      • if the server has a newer storeid, abort the replay (see the sketch after this list)
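
A simplified sketch of these phases; the record format and server interface are invented, but the flow follows the four steps above, with the storeid comparison as the conflict check.

```python
# Simplified replay sketch; record formats and the server interface are invented.

class ReplayConflict(Exception):
    pass

class ToyServer:
    """Just enough state to show the phases; real servers use RVM and real locks."""
    def __init__(self, files):
        self.files = files          # fid -> {"storeid": ..., "data": ...}
        self.locked, self.staged = set(), {}
    def lock(self, fid):   self.locked.add(fid)
    def unlock(self, fid): self.locked.discard(fid)
    def commit(self):
        for fid, new in self.staged.items():
            self.files[fid].update(new)
        self.staged.clear()
    def abort(self):
        self.staged.clear()

def replay(server, log):
    """log: list of store records {"fid", "old_storeid", "new_storeid", "data"}."""
    fids = {rec["fid"] for rec in log}
    for fid in fids:                              # Phase 1: parse + lock
        server.lock(fid)
    try:
        for rec in log:                           # Phase 2: validate, metadata only
            if server.files[rec["fid"]]["storeid"] != rec["old_storeid"]:
                raise ReplayConflict(rec["fid"])  # server has a newer storeid
            server.staged[rec["fid"]] = {"storeid": rec["new_storeid"]}
        for rec in log:                           # Phase 3: back-fetch file data
            server.staged[rec["fid"]]["data"] = rec["data"]
        server.commit()                           # Phase 4: commit ...
    except ReplayConflict:
        server.abort()                            # any conflict aborts the whole replay
        raise
    finally:
        for fid in fids:                          # ... and release the locks
            server.unlock(fid)

srv = ToyServer({"f1": {"storeid": "s0", "data": b"old"}})
replay(srv, [{"fid": "f1", "old_storeid": "s0", "new_storeid": "s1", "data": b"new"}])
print(srv.files["f1"])   # storeid s1, data b'new'
```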

Questions

AFS

  1. Initial Prototype. What were the primary goals of the Andrew File System? Why did the authors decide to implement a usable prototype first? What were the primary problems they found with their prototype and what are the general implications?
    • Goal
      • Scalability
      • administration
    • Why prototype?
      • need experience with issues
      • need system to evaluate
      • need workload
    • Prototype issue
      • Too many overhead messages (TestAuth, GetFileStat)
        • change protocol, reduce server interaction
      • CPU load too high on server
        • Pathname traversal on server
          • change protocol - more work to clients
        • Too many context switches
          • Change implementation (threads)
      • Load-imbalance across servers
  2. Whole File Caching. Why does AFS use whole file caching? Where are files cached? What are the pros and cons of this approach? For what workloads is this a good idea? When is it a bad idea?
    • Why?
      • Users tend to access whole files (from the usage study)
      • reads/writes are local
        • efficient (on the client)
        • no load on server
      • good semantics, handle failure easily, clear consistency model
      • small amount of sharing within a file
    • When bad?
      • Only access part of the file
      • lots of sharing across clients
      • large files
        • larger than disk space -> won’t work
      • streaming
  3. Client Caching. AFS clients perform caching to improve performance. For read requests, how does a client know that its cached copy is up to date? When are writes sent from the client to the server? What happens when the server receives a write? What happens when a client crashes and reboots? What are the pros and cons of the AFS approach versus the NFS approach?
    • open(A)
      • A cached locally?
        • yes: is a callback still held?
          • yes: use local copy
        • no: fetch from server
    • read()
      • read local copy
      • how to know up to date?
        • by definition, same contents for this open-to-close
    • write()
      • write local
    • close()
      • no dirty data: no server interaction
      • updates: send them to server
      • server issues callback-break calls to other clients
    • What must client do on reboot?
      • discard all cached files
    • Pros
      • Clear consistency model
      • Helps with scalability (less communication)
    • Cons
      • Server must keep state (callbacks)
  4. Consistency Semantics. Can you describe the consistency semantics of AFS? When a client reads from a file, what version will it see? If two clients write to a file, which one will end up being stored on the server?
    • When see changes to file?
      • only on next open
    • Open-close semantics
      • see same/one copy
    • Last-closer-wins
      • no intermixing of writes (toy illustration after this list)
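
A toy model (invented classes, not AFS code) of the open-to-close, last-closer-wins semantics from question 4: each open gets one consistent copy, writes stay local, and whichever client closes last replaces the whole file on the server with no intermixing.

```python
# Toy model of AFS open-to-close semantics; classes and names are invented.

class Server:
    def __init__(self, data):
        self.data = data

class Session:
    """One open-to-close session: a private whole-file copy on the client."""
    def __init__(self, server):
        self.server, self.copy = server, server.data     # snapshot at open
    def read(self):         return self.copy             # same copy for the whole session
    def write(self, data):  self.copy = data             # local only
    def close(self):        self.server.data = self.copy # whole file flushed to server

srv = Server(b"original")
a, b = Session(srv), Session(srv)      # two clients open concurrently
a.write(b"client A's version")
b.write(b"client B's version")
a.close()                              # server now holds A's file
b.close()                              # last closer wins: B's whole file replaces A's
print(srv.data)                        # b"client B's version" -- no intermixing
print(Session(srv).read())             # the change is visible only to opens after the close
```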

Coda

Motivation

  1. What were the goals of Coda? What assumptions did Coda make? How good of a job did the designers do of predicting technology trends? How do wireless networks change the picture?

    • Goals?
      • Enjoy benefits of shared FS
        • continue work when inaccessible
          • voluntary client disconnect (mobile)
          • involuntary (server crash)
      • Scalable
      • Transparency
    • Assumptions
      • High bandwidth connection
      • untrusted clients
      • conventional hardware
      • local hard disk
  2. Coda developed from AFS. Briefly, how did AFS work with regard to caching files? What type of data consistency does AFS provide? (Addressed for AFS…)

  3. Replication is often used to increase availability, but there are trade-offs that must be considered. Is it possible to simultaneously achieve perfect consistency and availability when suffering from network partitions? Why or why not? Which does Coda place more emphasis on?

    • Availability: Client able to access file when partitioned
  4. When a network is partitioned, replicas can be controlled with either pessimistic or optimistic replica control. What is pessimistic replica control? What are the pros and cons of it? Why don’t leases solve the problem?

    • Pessimistic:
      • disallow operations when partitioned
        • disallow all writes, but allow reads
        • give ownership to one partition
      • Cons:
        • some clients can’t do work
    • Optimistic:
      • Permit ops when partitioned, detect + fix problems when connected
      • Cons:
        • conflicting updates must be detected and resolved
  5. What is optimistic replica control? What are its pros and cons? Why was optimistic replica control chosen in Coda? Can you think of an environment where pessimistic replica control would be more appropriate?

  6. Coda performs replication on both the servers (VSG, volume storage group) and clients. What are the differences between these two types of replicas? What does Coda do if some, but not all, servers are available? With a different view of servers, how might you design a file system for disconnected operation?

    • Client vs Server Replicas
      • Client: untrusted, limited disk capacity
        • 2nd-class replicas
      • Server: 1st-class replicas
    • Client better for availability

Detailed Design and Implementation

  1. Clients are managed by a software layer called Venus. How does the state and behavior of Venus change as the client becomes disconnected or connected?
    • Hoarding
      • Normal state
      • Cache files in current use and files likely to be needed in the future
    • Emulation
      • Disconnected
      • Venus does the work the server usually does
    • Reintegration
      • Reconnect
      • Propagate updates to the server
  2. Consider the hoarding state first, in which Venus attempts to hoard useful data in anticipation of disconnection. The challenge for hoarding is that the amount of cache space on the clients is, of course, limited. During hoarding, what tensions must Venus balance in how it manages the client cache? How does Venus decide what is cached? (What information is given infinite priority in the local cache? Why?)
    • Hoarding: Collect useful data before disconnection
      • Challenge? Limited Disk Cache Space
      • Tensions
        • useful data if disconnected
        • performance (data being accessed now)
      • Combine explicit statements (hoard db) + implicit usage history into a dynamic priority
      • Cache the highest-priority objects
      • Infinite priority: ancestor directories (cannot be evicted before their children)
  3. Is Venus during the hoarding stage identical to AFS? Why might the performance of Coda Hoarding be worse than AFS?
    • Hoard walk: Periodic, keep in equilibrium
      • no uncached object has higher priority than any cached object
    • Callback breaks?
      • AFS: refetch on the next open
      • Coda: refetch on the next open or at the next hoard walk
  4. Imagine that Venus includes a command so a user can specify that disconnection is about to take place. How should Venus respond?
  5. During emulation, Venus on the client performs many of the actions normally handled by the servers. What types of tasks does this include? How does Venus record enough information to update the servers during reintegration? How does Venus save space? What happens when all space is consumed?
    • Emulation: Venus performs actions normally handled by servers
      • Create new file ids (pre-allocate during hoard)
      • Manage cache
        • infinite priority for dirty files
        • discard deleted files
      • Actions in Log
        • Which actions? only mutating operations
        • Data in cache, not part of log
        • Intermediate writes? the earlier store record is freed (see the sketch after this list)
      • Full:
        • can’t do modifying actions
  6. During reintegration, Venus propagates changes made during emulation to the servers and updates its cache to reflect current server state. What are the steps of reintegration? Under what circumstances will the replay fail? How is failure detected? What happens when the replay fails? Do you think Coda chose the right level of granularity for conflict resolution?
    • Obtain permanent FIDs
    • Ship log to AVSG
      • parse the log, lock the objects it names
      • validate ops, looking for conflicts
        • check storeid
      • back-fetching of all data
        • Why last? the transfer is skipped if a conflict aborts the replay
      • commit + release locks
    • Granularity of Failure?
      • any write/write conflict causes the whole replay to fail
      • directory conflicts are handled automatically
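
A sketch of the emulation-time replay log and the cancellation optimization mentioned above, with an invented record layout: only mutating operations are logged, file contents stay in the cache container rather than the log, and a later store to the same file frees the earlier store record.

```python
# Sketch of Venus's emulation log with the store-cancellation optimization.
# Record layout and helper names are invented for illustration.

class ReplayLog:
    def __init__(self):
        self.records = []           # mutating operations only; reads are never logged

    def append(self, op, fid, **extra):
        if op == "store":
            # Optimization: a later store to the same fid supersedes the earlier one,
            # so the earlier record (and its claim on log space) can be freed.
            self.records = [r for r in self.records
                            if not (r["op"] == "store" and r["fid"] == fid)]
        # File contents stay in the cache container; the log only names the fid.
        self.records.append({"op": op, "fid": fid, **extra})

log = ReplayLog()
log.append("mkdir", "fid-dir", name="notes")
log.append("store", "fid-7")       # first save of the file
log.append("store", "fid-7")       # second save cancels the first store record
print([r["op"] for r in log.records])   # ['mkdir', 'store'] -- one store survives
```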

Coda Today

  • Dropbox comparison
  • HW Environment:
    • Clients
      • disk space
      • fit all files
    • Network
      • more connected, assume connected
      • aware when disconnected
  • Workload?
    • local: no conflicts
    • sharing: explicit
  • Conflict Resolution?
    • Versions

Evaluation and Status

  1. About how long is reintegration expected to take? Why is the time for this step crucial? How are technology trends likely to impact this time? Is a design change needed?
  2. How did they determine the size of a needed local disk? How are technology trends likely to impact this? Is a design change needed?
  3. How likely is a conflict during reintegration? Will technology trends impact this? Is a design change needed?