Sustain $\lfloor \frac{n-1}{3} \rfloor$ failures
- $f$ faulty nodes may not respond at all
- or $f$ faulty nodes may respond with wrong answers
- cannot tell whether a responding node is faulty
- need $f + 1$ matching replies so correct nodes outnumber faulty ones
- Provide safety and liveness
- Safety:
- all non-faulty replicas agree on a total order for the execution of requests despite failures
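The counting argument above can be sketched numerically (helper names are mine, not from the paper):

```python
def max_faulty(n: int) -> int:
    # PBFT sustains floor((n-1)/3) Byzantine failures out of n replicas.
    return (n - 1) // 3

def reply_quorum(f: int) -> int:
    # f faulty replicas could all send the same wrong reply, so the client
    # needs f + 1 matching replies to be sure at least one is correct.
    return f + 1

# A 4-replica cluster tolerates 1 fault and needs 2 matching replies.
assert max_faulty(4) == 1
assert reply_quorum(max_faulty(4)) == 2
```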
System Model
- Network
- drop, delay, out of order, duplicate
- Faulty nodes behave arbitrarily (Byzantine)
- Independent node failures
- $D(m)$ = digest of m
- $\langle m \rangle _{\sigma_i}$ = message signed by node $i$
- Adversary
- can
- coordinate faulty nodes
- delay correct nodes
- cannot
- delay correct nodes indefinitely
- break the cryptographic primitives (e.g. no quantum computing)
Algorithm
Terms
- replicas move through a succession of views
- primary of a view is $p = v \mod |R|$
- $p$ primary replica
- $v$ view number
- $|R|$ total replicas
- $f$ max # of replicas that may fail
- Replicas = all servers
- Primary, Backup
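The primary rotation above is one line of arithmetic (function name is mine):

```python
def primary_of_view(v: int, num_replicas: int) -> int:
    # The primary of view v is replica v mod |R|; each view change
    # (v -> v + 1) rotates the role to the next replica.
    return v % num_replicas

# With |R| = 4, views 0..4 are led by replicas 0, 1, 2, 3, 0.
assert [primary_of_view(v, 4) for v in range(5)] == [0, 1, 2, 3, 0]
```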
Process
- Client sends request to primary
- Primary multicasts the request to all backups
- Replicas execute the request and reply to the client
- Client waits for $f + 1$ identical replies
Client
- Client $c$ sends $\langle REQUEST, o, t, c \rangle_{\sigma_c}$ to the primary
- $t$ timestamp
- $o$ operation to be executed
- Primary multicasts to backups
- Replicas reply $\langle REPLY,v,t,c,i,r \rangle_{\sigma_i}$
- $r$ result
- $i$ replica number
- $v$ view number
- Client waits for $f + 1$ replies with the same $t$ and $r$
- If timeout
- Client broadcasts the request to all replicas
- If already processed
- re-send the cached reply (replicas remember their last reply)
- If not processed & not primary
- relay to the primary
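The client's wait-for-matching-replies step can be sketched as below (tuple layout taken from the REPLY format above; signature checks elided, and replies are assumed to come from distinct replicas):

```python
from collections import Counter

def matched_result(replies, f):
    # Each reply is (v, t, c, i, r). Accept result r once f + 1 replies
    # agree on the same timestamp t and result r; otherwise keep waiting.
    counts = Counter((t, r) for (_v, t, _c, _i, r) in replies)
    for (_t, r), n in counts.items():
        if n >= f + 1:
            return r
    return None  # not enough matching replies yet

replies = [(0, 7, "c", 0, "ok"), (0, 7, "c", 1, "ok"), (0, 7, "c", 2, "bad")]
assert matched_result(replies, f=1) == "ok"
```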
Normal Case
Replica state
- message log
- current view
Three-phase atomic broadcast
Pre-prepare
- Multicast to all backups and write to log: $\langle \langle PRE-PREPARE, v, n, d \rangle_{\sigma_p}, m \rangle$
- $m$: the client's request, piggybacked (large requests may be transmitted separately)
- $d$: $m$'s digest
- $n$: seq number
- Backup accepts if
- Signature match
- in view $v$
- has not accepted a pre-prepare for the same $v$, $n$ with a different digest
- $h$ < seq num < $H$
- water marks prevent a faulty primary from exhausting the seq number space by choosing a huge one
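The backup's acceptance test can be sketched as follows (a simplification: `log` maps $(v, n)$ to the accepted digest, and signature verification is reduced to a boolean):

```python
def accept_pre_prepare(log, cur_view, h, H, v, n, d, sig_ok):
    # Accept iff: signature verifies, message is in the current view,
    # no different digest was already accepted for this (v, n), and
    # the sequence number lies between the water marks.
    if not sig_ok or v != cur_view:
        return False
    if (v, n) in log and log[(v, n)] != d:
        return False
    return h < n < H

log = {}
assert accept_pre_prepare(log, 0, 0, 100, v=0, n=5, d="d1", sig_ok=True)
log[(0, 5)] = "d1"
# Same (v, n) with a different digest is rejected.
assert not accept_pre_prepare(log, 0, 0, 100, v=0, n=5, d="d2", sig_ok=True)
```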
Prepare
- If the backup accepts, it multicasts and writes to log
- $\langle PREPARE, v, n, d, i \rangle_{\sigma_i}$
- Replica accepts if
- Signature match
- In view
- h < seq num < H
- prepared(m, v, n, i) = true if the following are all written to $i$’s log
- request $m$
- a pre-prepare in $v$ and seq num $n$
- $2f$ prepares from different backups
Commit
- when prepared(m, v, n, i) = true
- multicast $\langle COMMIT, v, n, D(m), i\rangle_{\sigma_i}$
- Replicas accept and insert into log (same conditions as before)
- execute once $2f + 1$ commits accepted (committed-local)
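The two quorum conditions above can be sketched as counts over a replica's log (a simplification keyed by $(v, n, d)$; the request body and signature checks are elided):

```python
def prepared(log, v, n, d, f):
    # True once the log holds a matching pre-prepare plus 2f PREPAREs
    # from distinct backups for the same view, seq num, and digest.
    has_pre = log["pre-prepare"].get((v, n)) == d
    preps = {i for (vv, nn, dd, i) in log["prepare"] if (vv, nn, dd) == (v, n, d)}
    return has_pre and len(preps) >= 2 * f

def committed_local(log, v, n, d, f):
    # Execute only after prepared holds and 2f + 1 COMMITs
    # (including this replica's own) are in the log.
    commits = {i for (vv, nn, dd, i) in log["commit"] if (vv, nn, dd) == (v, n, d)}
    return prepared(log, v, n, d, f) and len(commits) >= 2 * f + 1

log = {
    "pre-prepare": {(0, 1): "d"},
    "prepare": {(0, 1, "d", 1), (0, 1, "d", 2)},                 # 2f = 2 prepares
    "commit": {(0, 1, "d", 0), (0, 1, "d", 1), (0, 1, "d", 2)},  # 2f + 1 commits
}
assert prepared(log, 0, 1, "d", f=1)
assert committed_local(log, 0, 1, "d", f=1)
```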
Invariant
- if prepared(m, v, n, i) is true, then prepared(m', v, n, j) is false for any non-faulty replica $j$ and any $m'$ with $D(m') \neq D(m)$
- (no two different messages can be prepared with the same view and seq num)
- If a request is executed locally, at least $f+1$ non-faulty replicas have prepared it
Checkpoint
- multicast $\langle CHECKPOINT, n, d, i \rangle_{\sigma_i}$
- every $k$-th execution
- $d$ = checkpoint digest
- correct if $2f + 1$ same checkpoint
- update low, high water mark $H$, $h$
- $h$ = seq num of last stable checkpoint
- $H$ = $h$ + $k$ (big enough constant)
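Water-mark maintenance in one step (the default value of $k$ below is an assumption, not from the notes):

```python
def advance_water_marks(stable_seq: int, k: int = 100):
    # Once a checkpoint gains 2f + 1 matching CHECKPOINT messages it is
    # stable: h becomes its seq num and H = h + k bounds new seq numbers.
    h = stable_seq
    H = h + k
    return h, H

assert advance_water_marks(200) == (200, 300)
```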
View-Change
- Backup starts a timer when it receives a request it has not yet executed
- On timeout, multicast $\langle VIEW-CHANGE, v+1, n, C, P, i \rangle _{\sigma_i}$
- $n$: seq num of the last stable checkpoint
- $C$: $2f + 1$ valid checkpoint messages
- $P$: for each prepared-but-unexecuted request, its pre-prepare & $2f$ valid prepare msgs
- When the primary of view $v + 1$ receives $2f$ other valid view-changes ($2f+1$ in total)
- Multicast $\langle NEW-VIEW, v + 1, V, O\rangle_{\sigma_p}$
- $V$: all valid VIEW-CHANGE msgs
- $O$: new pre-prepares generated by new primary
- might be null
- Backups enter the new view on receiving a valid NEW-VIEW msg
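The new primary's trigger can be sketched as a quorum count (assumed message shape: `(target_view, sender)`):

```python
def ready_for_new_view(view_changes, v_new, my_id, f):
    # The primary of view v_new multicasts NEW-VIEW once it has 2f valid
    # VIEW-CHANGE messages from other replicas (2f + 1 counting its own).
    others = {i for (v, i) in view_changes if v == v_new and i != my_id}
    return len(others) >= 2 * f

msgs = {(1, 1), (1, 2), (1, 3)}
assert ready_for_new_view(msgs, v_new=1, my_id=1, f=1)
```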
Non-Determinism
- Primary selects the non-deterministic value
- Or let backups propose
In Class
Can’t trust single node
- Ask all nodes?
- Liveness
- Latency
- Primary can spoof other servers' responses
- Solution: Digital signature
Need a primary to order requests
What if primary fault?
- Primary lies to backups
- Sol: Signature
- Primary ignores requests
- Sol: timeout => client broadcast
Can faulty replicas prevent progress?
- No, each replica only waits for $2f$ matching messages from others ($2f+1$ counting its own), so $f$ faulty nodes cannot block progress
handle primary sending different ops to different backups
handle primary lying to clients (client waits for $f+1$ matching responses)
Reference: "Why is the commit phase in PBFT necessary?" (Computer Science Stack Exchange)