• Wide applicability, scalability, high performance, high availability

Data Model

Rows

  • Every read or write of data under a single row key is atomic
  • Rows sorted in lexicographic order
  • tablet: row range

Columns

  • Grouped into columns family
  • Column key = family:qualifier
  • Access control on family level

Timestamp

  • Store multiple versions in one cell

Implementation

Master

  • assigning tablets to tablet servers
  • detect the addition and expiration of tablet servers
  • load balancing
  • garbage collection

Tablet location

  • Read chubby file -> get location of the root tablet
  • Root tablet: contains all Metadata tablet location
  • METADATA table
    • Row key: (tablet's table id, end row)
    • Data: location of data tablet, other metadata

Tablet Assignment

  • Master
  • Tablet servers create file and lock in chubby
  • Master detect tablet alive by asking tablet server
    • If timeout or server report lost lock
    • Master try lock server file
    • If lock success => chubby up, server down => delete server file
    • server commit suicide
    • If can’t reach chubby => master kill itself
  • On master boot
    • Master scan chubby and ask every server to know tablet <=> server mapping
  • Servers split table => commit to METADATA table => notify master

Tablet Serving

  • memtable => new updates
  • SSTable => old updates

Optimizations

  • Locality groups
    • One SSTable for each locality group (column family)
  • Compression
  • Cache
  • Bloom filter
    • reduce SSTable access by asking if data is in that SSTable
  • Commit-log
    • all servers append commit log to same physical file