Strong Consistency Guarantees

Strong consistency

Interface

  • query(objId, opts)
  • update(objId, newVal, opts)

Protocol

  • Reply: sent by tail
  • Query: processed at tail
  • Update: directed to the head, then chained through to the tail
    • compute at head, send state to tail
    • can be non-deterministic
  • Hist_objId = Hist_objId at the tail
  • Pending_objId = requests received but not processed by tail
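
The protocol above can be sketched as a toy in-process model (class and field names like `ChainNode` are illustrative, not from the paper; message passing is modeled as direct calls):

```python
class ChainNode:
    def __init__(self, successor=None):
        self.store = {}          # obj_id -> value
        self.pending = set()     # Pending_objId: received but not yet at tail
        self.successor = successor

    def is_tail(self):
        return self.successor is None

    def update(self, obj_id, new_val):
        # Head computes the new state once, then forwards the *state*
        # (not the operation) down the chain, so updates may be
        # non-deterministic without nodes diverging.
        self.store[obj_id] = new_val
        if self.is_tail():
            return ("reply", obj_id, new_val)   # reply sent by tail
        self.pending.add(obj_id)
        result = self.successor.update(obj_id, new_val)
        self.pending.discard(obj_id)            # no longer pending once tail processed it
        return result

    def query(self, obj_id):
        # Queries are processed only at the tail, so the observed
        # history is exactly the tail's history.
        assert self.is_tail()
        return self.store.get(obj_id)

# 3-node chain: head -> mid -> tail
tail = ChainNode()
mid = ChainNode(successor=tail)
head = ChainNode(successor=mid)

head.update("x", 42)
print(tail.query("x"))  # 42: query served at the tail
```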

Failure

  • master service
    • never fail ([[Paxos]])
    • reconfigure chain

Vs Primary/backup

Downtime

  • Chain
    • Head failure
      • Query uninterrupted
      • Update unavailable for 2 message delays
    • Middle failure
      • Query, Update uninterrupted
      • Ongoing updates delayed by 4 message delays
        • Not visible to clients
    • Tail failure
      • Query, Update unavailable for 2 message delays
  • Primary/Backup
    • Primary failure
      • Query, Update unavailable for 5 message delays
      • Master broadcasts to all backups
      • Backups respond with the number of updates processed
      • Master broadcasts the new primary
      • New primary forwards all missing updates
      • Master notifies clients
    • Backup failure
      • Query uninterrupted
      • Update delayed by 1 message delay
        • Master notifies primary that no ACK is needed
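
The 5-step primary fail-over can be sketched as follows (a toy model; the update ids and list-based logs are illustrative — the most caught-up backup is promoted and forwards what its peers are missing):

```python
def fail_over(backups):
    counts = [len(b) for b in backups]                # backups report updates processed
    new_primary = backups[counts.index(max(counts))]  # master picks the most caught-up
    for b in backups:
        for u in new_primary[len(b):]:                # forward all missing updates
            b.append(u)
    return new_primary                                # master then notifies clients

b1, b2, b3 = ["u1", "u2"], ["u1", "u2", "u3"], ["u1"]
p = fail_over([b1, b2, b3])
print(p is b2, b1, b3)  # True ['u1', 'u2', 'u3'] ['u1', 'u2', 'u3']
```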

In-class

Primary-Backup Replication

  • Solve
    • Ordering
    • Non-determinism
  • Leaves unsolved: failure handling
  • Write: sent to all backups
  • Read: only on primary
  • Reads wait for concurrent writes to complete
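
A minimal sketch of these rules, assuming synchronous in-process "backups" (names like `Primary` and `in_flight` are illustrative): the primary orders writes, which fixes non-determinism, fans them out to all backups, and a read must wait out concurrent writes.

```python
class Primary:
    def __init__(self, backups):
        self.store = {}
        self.backups = backups
        self.in_flight = 0   # concurrent writes a read must wait for

    def write(self, key, val):
        self.in_flight += 1
        self.store[key] = val          # primary picks the final value/order
        for b in self.backups:         # send to all backups
            b[key] = val
        self.in_flight -= 1            # all backups done (synchronous here)

    def read(self, key):
        # Reads are served only by the primary; a real system would
        # block here while self.in_flight > 0 instead of asserting.
        assert self.in_flight == 0, "read must wait for concurrent writes"
        return self.store.get(key)

backups = [{}, {}]
p = Primary(backups)
p.write("x", 1)
print(p.read("x"), backups[0]["x"])  # 1 1
```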

Chain Replication

Goal

  • high availability
  • high throughput
  • strong consistency
  • handle fail-stop failures
  • low mean-time-to-recovery
  • easy to add nodes

Non-goals

  • handling slow nodes, misbehaving nodes
  • Wide area network WAN

Basic Operations

  1. val = query(obj-id): idempotent
  2. update(obj-id, newVal): not idempotent
    • can handle more complex functions
    • What to do on message loss?
      • query the object to check whether the update was applied
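
The loss-handling idea above can be sketched like this: query() is idempotent and safe to retry blindly, but update() is not, so on a missing ACK the client first queries the object to see whether the update already took effect (the `send` function and failure model here are illustrative assumptions):

```python
def safe_update(send, query, obj_id, new_val, retries=3):
    for _ in range(retries):
        ok = send(obj_id, new_val)      # returns False when the ACK is lost
        if ok:
            return True
        if query(obj_id) == new_val:    # update applied; only the ACK was lost
            return True
    return False

store = {}
drops = iter([False, True])             # first ACK lost, update still applied

def send(obj_id, val):
    store[obj_id] = val                 # update reaches the chain
    return next(drops)                  # ...but the ACK may be dropped

print(safe_update(send, store.get, "x", 7))  # True, without blindly re-applying
```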

Properties

  • Tail: serves queries
  • Head: serves updates (sends state diff down the chain)
  • every node writes data
  • Strong consistency guarantees
  • Extra work for clients
    • track both head & tail
  • Retry if no ACK

Latency

           primary-backup                   chain
  write    max(latency of each backup)      sum of latencies along the chain
  read     queued behind concurrent writes  low

Failure

  • Who?
    • Separate master server
  • How?
    • Heartbeats/ping
  • Result
    • Re-arrange chain to account for failed server
    • Notify clients of new head/tail
  • Case
    • Head, mid, tail failure

Head Node Fails

  • Notify S (the head's successor) that it is the new head
  • Notify clients that S is the new head
  • Newest updates are lost if they reached only the head (still consistent: no client saw them)

Tail Node Fails

  • Notify S (the tail's predecessor) that it is the new tail
  • Notify client

Inner Node

  • S- is notified that S+ is its new successor
  • Naively, S- must resend every past update
  • Optimization
    • sent list: add every update when forwarded
    • remove from the list when ACKed
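
The sent-list optimization can be sketched as follows (a toy model with illustrative names): instead of resending every past update, S- tracks only updates forwarded but not yet ACKed, and replays just those to its new successor S+.

```python
class Node:
    def __init__(self):
        self.sent = []            # updates forwarded downstream, not yet ACKed

    def forward(self, update):
        self.sent.append(update)  # add on send

    def on_ack(self, update):
        self.sent.remove(update)  # remove once the tail has processed it

    def new_successor(self, deliver):
        for u in list(self.sent): # resend only what might be missing
            deliver(u)

s_minus = Node()
for u in ("u1", "u2", "u3"):
    s_minus.forward(u)
s_minus.on_ack("u1")              # u1 reached the tail

replayed = []
s_minus.new_successor(replayed.append)
print(replayed)  # ['u2', 'u3'] -- only un-ACKed updates are resent
```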

Steps for Protocol

  • Master notifies S+ and asks for the latest id in its history
  • S+ responds with id = x
  • Master informs S- of S+ and id = x
  • S- sends updates after x to S+
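
The four steps above can be sketched as follows (message passing modeled as direct calls; the `(id, value)` log format is an illustrative assumption):

```python
def heal_chain(master, s_minus, s_plus):
    x = s_plus.latest_id()                 # steps 1-2: master asks S+, gets id = x
    master.link(s_minus, s_plus)           # step 3: master informs S- of S+ and x
    for u in s_minus.updates_after(x):     # step 4: S- resends the missing suffix
        s_plus.apply(u)

class Replica:
    def __init__(self, log):
        self.log = list(log)               # ordered (id, value) updates
    def latest_id(self):
        return self.log[-1][0] if self.log else 0
    def updates_after(self, x):
        return [u for u in self.log if u[0] > x]
    def apply(self, u):
        self.log.append(u)

class Master:
    def link(self, pred, succ):
        pred.successor = succ              # re-arrange the chain

s_minus = Replica([(1, "a"), (2, "b"), (3, "c")])
s_plus = Replica([(1, "a")])               # S+ is missing updates 2 and 3
heal_chain(Master(), s_minus, s_plus)
print(s_plus.log)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```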

Add New Nodes

  • When can T+ be new tail?
    • Sent(T) + Hist(T+) = Hist(T)
  • What should T do when it receives queries?
    • deny requests (client retries)
    • or forward them to T+
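
The hand-off condition above can be checked with a one-liner (treating histories as sets of update ids; names are illustrative): T+ may become the tail once everything in T's history is either already at T+ or in flight to it.

```python
def can_become_tail(hist_t, hist_t_plus, sent_t):
    # Sent(T) + Hist(T+) must cover Hist(T)
    return set(hist_t) == set(hist_t_plus) | set(sent_t)

hist_t = ["u1", "u2", "u3"]
print(can_become_tail(hist_t, ["u1"], ["u2", "u3"]))   # True: T+ has caught up
print(can_become_tail(hist_t, ["u1"], ["u2"]))         # False: u3 still missing
```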

Variant: Weak Chain

  • Idea: Read from any node
  • Benefits?
    • better read throughput
    • scale with # node
    • good performance when read-heavy
  • Challenges?
    • weak consistency
  • To maintain strong consistency
    • check the sent list: have the object's updates been processed by the tail?
      • If not
        • deny request, client retry
        • re-route to tail
        • save old values and send (expensive)
        • wait
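
The "deny request" option above can be sketched like this (a toy model; `dirty` standing in for the per-object sent-list check is an illustrative simplification): a node answers a query only if the object has no update still short of the tail, otherwise the client must retry or be re-routed to the tail.

```python
class WeakChainNode:
    def __init__(self):
        self.store = {}
        self.dirty = set()        # obj ids with updates not yet at the tail

    def query(self, obj_id):
        if obj_id in self.dirty:
            return None           # deny: client retries or asks the tail
        return self.store.get(obj_id)

n = WeakChainNode()
n.store["x"] = 1
n.dirty.add("x")                  # an update to x is still in flight
print(n.query("x"))               # None: answering now could break consistency
n.dirty.discard("x")              # tail processed the update
print(n.query("x"))               # 1
```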

Best Suited for

  • low latency
  • all nodes operating well
  • homogeneous performance