Brief

  • RAID
    • S/W (file system)
    • H/W (hardware): disks, RAIDs, SSDs
  • File System
    • API
    • Internals

RAID

  • Why?
    • Performance
    • Capacity
    • Reliability (Durability)
  • Failure model
    • Entire drive fails (fail-stop):
      • Either working or completely failed
      • Failure is easily detected (by the RAID controller)
  • RAID “Levels”
    • Level 0: no redundancy (striping)
      • Layout (3 disks, 1 block per chunk):
                Disk 0  Disk 1  Disk 2
          Block   0       1       2
                  3       4       5
      • No redundancy: can’t tolerate any drive failure
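The striping layout above follows a simple rule: consecutive blocks go round-robin across the disks. A minimal sketch, assuming a chunk size of one block as in the layout; `raid0_map` is a hypothetical helper name:

```python
def raid0_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk, offset on that disk); chunk size = 1 block (assumed)."""
    disk = logical_block % num_disks      # round-robin across disks
    offset = logical_block // num_disks   # row (stripe) number on that disk
    return disk, offset


# Rebuild the 3-disk layout above: disk -> blocks stored on it, in offset order.
layout = {d: [] for d in range(3)}
for block in range(6):
    disk, offset = raid0_map(block, 3)
    layout[disk].append(block)
print(layout)  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

Every disk holds only data, so losing any one disk loses every block mapped to it.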
  • Level 1: Mirroring
    • For each block, keep a copy on some other drive
      • Layout (4 disks, 2-way mirror):
          Disk 0  Disk 1  Disk 2  Disk 3
            0       0       1       1
            2       2       3       3
    • More advanced failure model:
      • Block could become corrupt
      • Solutions
        • Keep > 2 copies and take a majority vote
        • Store a checksum per block to detect corruption
    • Good:
      • Performance: reads can be served by either copy; 1 logical write => 2 physical writes, issued in parallel
      • Tolerates a drive failure
    • Bad:
      • Capacity: only 1/2 usable for a 2-way mirror
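A minimal sketch of the mirrored layout and the two corruption defenses listed above, assuming disks are grouped into mirror sets of `copies` consecutive disks (as in the 4-disk layout) and, for the checksum variant, that a CRC-32 per block is stored somewhere the controller can read. The CRC choice and all function names are assumptions for illustration:

```python
import zlib
from collections import Counter


def raid1_map(logical_block: int, num_disks: int, copies: int = 2) -> tuple[list[int], int]:
    """Return (disks holding the block, offset on each of those disks)."""
    num_groups = num_disks // copies            # each group of disks mirrors the same blocks
    group = logical_block % num_groups          # stripe across groups, as in the layout above
    offset = logical_block // num_groups
    disks = [group * copies + i for i in range(copies)]
    return disks, offset


def read_by_vote(copies: list[bytes]) -> bytes:
    """Return the value held by a strict majority of copies (needs > 2 copies to help)."""
    value, count = Counter(copies).most_common(1)[0]
    if 2 * count <= len(copies):
        raise IOError("no majority: cannot tell which copy is correct")
    return value


def read_by_checksum(copies: list[bytes], stored_crc: int) -> bytes:
    """Return the first copy whose CRC-32 matches the checksum stored for the block."""
    for data in copies:
        if zlib.crc32(data) == stored_crc:
            return data
    raise IOError("every copy failed its checksum")


print(raid1_map(3, 4))                          # ([2, 3], 1)
print(read_by_vote([b"blk", b"blk", b"bad"]))   # b'blk'
```

With only 2 copies a vote cannot break a tie, which is why the notes pair voting with "> 2 copies"; a checksum can pick the good copy even with 2.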
  • Level 4: Parity
    • Bit-level example (even parity: each row has an even number of 1’s)
    • Disk 0  Disk 1  Disk 2  Parity Disk
        0       1       0        1
        0       0       0        0
    • “Full-stripe write”
      • Data for Disk 0, 1, 2 => RAID controller, which computes parity
      • Write all 4 blocks (3 data + parity) in parallel
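Even parity per bit position is exactly bitwise XOR, so a full-stripe write computes parity once and writes everything in parallel, and any single lost block can be rebuilt by XOR-ing the survivors. A minimal sketch, assuming equal-size blocks held as Python `bytes`; the function names are hypothetical:

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks: each bit position ends with even parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_write(data_blocks: list[bytes]) -> list[bytes]:
    """Compute parity once; the data blocks plus parity can then be written in parallel."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: list[Optional[bytes]]) -> bytes:
    """Rebuild the one missing block (None) by XOR-ing every surviving block, parity included."""
    survivors = [blk for blk in stripe if blk is not None]
    return xor_blocks(survivors)


stripe = full_stripe_write([b"\x00", b"\x01", b"\x00"])        # first row of the bit example
print(stripe[-1])                                               # b'\x01'
print(reconstruct([stripe[0], None, stripe[2], stripe[3]]))    # b'\x01' (Disk 1 rebuilt)
```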
    • Random write: 1 block
      • Disk 0  Disk 1  Disk 2  Parity
          0       1       2     P0,1,2
          3       4       5     P3,4,5
          6       7       8     P6,7,8
      • How to write 4?
        • Approach #1 (Additive):
          • Read 3, 5
          • Compute new parity over 3, new 4, 5
          • Write 4, P3,4,5
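        • Approach #2 (Subtractive):
          • Read old data (old 4)
          • If it differs from the new data:
            • Read old parity
            • Compute the new parity: flip each parity bit whose data bit changed (P_new = P_old XOR D_old XOR D_new)
            • Write new data, new parity
          • 2 reads + computation (essentially free) + 2 writes
          • RAID 4: 1 small write => 4 I/Os, and every small write also hits the single parity disk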
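The two small-write approaches can be compared side by side, counting I/Os as in the notes. A minimal sketch, assuming one-byte blocks with made-up contents for blocks 3, 4, 5; the function names are hypothetical:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def additive_update(stripe: list[bytes], index: int, new_block: bytes) -> tuple[bytes, int]:
    """Approach #1: read every other data block and recompute parity from scratch."""
    others = [blk for i, blk in enumerate(stripe) if i != index]   # len(others) reads
    new_parity = xor_blocks(others + [new_block])
    return new_parity, len(others) + 2                              # + write data, write parity


def subtractive_update(old_block: bytes, old_parity: bytes, new_block: bytes) -> tuple[bytes, int]:
    """Approach #2: P_new = P_old XOR D_old XOR D_new (bits flip where the data changed)."""
    if old_block == new_block:
        return old_parity, 1                                        # only the read of the old data
    new_parity = xor_blocks([old_parity, old_block, new_block])
    return new_parity, 4                                            # 2 reads + 2 writes


stripe = [b"\x03", b"\x05", b"\x06"]           # hypothetical contents of blocks 3, 4, 5
parity = xor_blocks(stripe)                    # 0x03 ^ 0x05 ^ 0x06 = 0x00
print(additive_update(stripe, 1, b"\x07"))                # (b'\x02', 4)
print(subtractive_update(stripe[1], parity, b"\x07"))     # (b'\x02', 4)
```

Both give the same parity. With 3 data disks both cost 4 I/Os, but the additive approach's reads grow with the number of disks, while the subtractive approach stays at 4.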
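  • Level 5
    • Stagger (rotate) parity across all disks
        Disk 0  Disk 1  Disk 2  Disk 3
          0       1       2     P0,1,2
          3       4     P3,4,5    5
          6     P6,7,8    7       8
    • Reduces the write bottleneck: parity updates are spread across all disks instead of one parity disk
    • More parallel reads: every disk holds data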
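The rotation in the layout above can be written as an address-mapping rule. A minimal sketch, assuming exactly the pattern shown (parity on the last disk for stripe 0, moving one disk left per stripe, data filling the remaining disks in order); other rotation orders exist, and `raid5_map` is a hypothetical name:

```python
def raid5_map(logical_block: int, num_disks: int) -> tuple[int, int, int]:
    """Return (data disk, stripe, parity disk) for the rotating layout above."""
    data_per_stripe = num_disks - 1
    stripe = logical_block // data_per_stripe
    index = logical_block % data_per_stripe
    parity_disk = (num_disks - 1) - (stripe % num_disks)   # last disk, then shift left each stripe
    disk = index if index < parity_disk else index + 1     # data fills the other disks in order
    return disk, stripe, parity_disk


print(raid5_map(5, 4))                                 # (3, 1, 2)
print([raid5_map(s * 3, 4)[2] for s in range(4)])      # [3, 2, 1, 0]
```

Because consecutive stripes put parity on different disks, small writes to different stripes no longer all queue on one parity disk.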

Mirroring vs. RAID 5

                  Mirroring               RAID 5
Small writes      2 writes per write      4 I/Os per write
Sequential I/O    similar                 similar
Capacity          wastes 1/2 or more      much more efficient
Reliability       1 failure (for sure)    same
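The capacity and small-write rows reduce to short arithmetic. A minimal sketch, assuming a 2-way mirror and one parity block per RAID 5 stripe; the names are hypothetical:

```python
from fractions import Fraction


def usable_fraction(scheme: str, num_disks: int) -> Fraction:
    """Fraction of raw capacity that holds user data."""
    if scheme == "mirror":      # 2-way mirror: every block is stored twice
        return Fraction(1, 2)
    if scheme == "raid5":       # one block in every stripe of num_disks holds parity
        return Fraction(num_disks - 1, num_disks)
    raise ValueError(f"unknown scheme: {scheme}")


# I/Os for one small (single-block) write, from the table above.
SMALL_WRITE_IOS = {"mirror": 2, "raid5": 4}

for n in (4, 8):
    print(n, usable_fraction("mirror", n), usable_fraction("raid5", n))
# 4 1/2 3/4
# 8 1/2 7/8
```

The mirror's waste is fixed at 1/2, while RAID 5's overhead shrinks as disks are added, at the cost of twice the I/Os per small write.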

File System

  • 2 abstractions
    • File:
      • Array of bytes of some size; can be read or written (or grown, deleted …)
        • FS doesn’t care about file contents
      • Has a low-level name (inode number)
    • Directory:
      • List of files and directories
      • Maps “human-readable” names => low-level names
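The two abstractions can be sketched as a toy in-memory file system: files are byte arrays identified by an inode number, and directories are maps from human-readable names to inode numbers. This is a minimal sketch under stated assumptions; `Inode`, `ToyFS`, and "root is inode 0" are hypothetical and not how any particular file system stores things on disk:

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    number: int                                                # low-level name
    is_dir: bool
    data: bytearray = field(default_factory=bytearray)         # file: array of bytes
    entries: dict[str, int] = field(default_factory=dict)      # directory: name -> inode number


class ToyFS:
    ROOT = 0  # hypothetical: the root directory is inode 0

    def __init__(self) -> None:
        self.inodes = {self.ROOT: Inode(self.ROOT, is_dir=True)}
        self.next_number = 1

    def create(self, dir_number: int, name: str, is_dir: bool = False) -> int:
        """Allocate a new inode and map a human-readable name to it in a directory."""
        number = self.next_number
        self.next_number += 1
        self.inodes[number] = Inode(number, is_dir)
        self.inodes[dir_number].entries[name] = number
        return number

    def lookup(self, path: str) -> int:
        """Resolve a path one directory at a time, starting at the root."""
        number = self.ROOT
        for part in path.strip("/").split("/"):
            if part:
                number = self.inodes[number].entries[part]
        return number

    def write(self, number: int, offset: int, data: bytes) -> None:
        """Overwrite bytes at offset, growing the file with zero bytes if needed."""
        buf = self.inodes[number].data
        end = offset + len(data)
        if end > len(buf):
            buf.extend(bytes(end - len(buf)))
        buf[offset:end] = data

    def read(self, number: int, offset: int, size: int) -> bytes:
        """Return up to size bytes starting at offset; contents are opaque to the FS."""
        return bytes(self.inodes[number].data[offset:offset + size])


fs = ToyFS()
home = fs.create(ToyFS.ROOT, "home", is_dir=True)
notes = fs.create(home, "notes.txt")
fs.write(notes, 0, b"hello")
print(fs.lookup("/home/notes.txt"), fs.read(notes, 0, 5))  # 2 b'hello'
```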