- Why build distributed systems?
- Performance: a single computer can't handle the load
- throughput, latency
- cost/performance
- commodity components
- elasticity
- incremental scalability
- Fault-tolerance
- availability
- reliability: don’t do something wrong
- data sharing
- Why study distributed systems?
- Important
- Practical
- Challenging / Interesting
- Why is it challenging?
- Faults: fail-stop, crash
- slow nodes, misbehaving nodes
- nodes disagree, who to trust?
- Lack of global state
- File A has different content
- how many jobs on node B?
- Which nodes are part of the system?
- Network delays, ordering, lost messages
- Obtaining high performance
- network comm. adds overhead
- extra work in fault handling
- Lowering costs
- figure out bottlenecks
- figure out the amount of redundancy needed for the desired reliability/availability
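
The redundancy question above can be made concrete with a rough calculation. The sketch below assumes that replicas fail independently and that the service is up as long as at least one replica is up. Both are simplifying assumptions that real systems often violate, for example under correlated failures. The function names `combined_availability` and `replicas_needed` are illustrative only.

```python
def combined_availability(node_availability: float, replicas: int) -> float:
    """Availability of a group that is up when at least one replica is up.

    Assumes independent failures: the group is down only if all replicas
    are down at once, which happens with probability (1 - a) ** n.
    """
    return 1.0 - (1.0 - node_availability) ** replicas


def replicas_needed(node_availability: float, target: float) -> int:
    """Smallest replica count whose combined availability meets the target."""
    if not (0.0 < node_availability < 1.0):
        raise ValueError("node_availability must be strictly between 0 and 1")
    if not (0.0 < target < 1.0):
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while combined_availability(node_availability, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Nodes that are each up 90% of the time, aiming for at least 99.5%.
    print(replicas_needed(0.9, 0.995))
```

Under these assumptions each extra replica multiplies the unavailability by (1 - a), so the first few replicas give most of the benefit. The cost of keeping replicas consistent is not modeled here.
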
What Types of Failures
- Halting failures
- Fail-stop
- Halting failure + notification
- Omission failures
- Network failure
- Network partition failure
- Timing failures
- Byzantine failures
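
One way to see the difference between a fail-stop failure and a plain halting failure is that, without a notification, other nodes can only suspect a halt by noticing silence. The sketch below is a minimal timeout-based heartbeat detector. The class name `HeartbeatDetector`, its methods, and the timeout value are hypothetical choices for illustration. Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HeartbeatDetector:
    """Suspects a node once no heartbeat has arrived within `timeout` seconds."""

    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the most recent time a heartbeat from `node` was received.
        self.last_seen[node] = now

    def suspected(self, now: float) -> Set[str]:
        # A node is suspected if its last heartbeat is older than the timeout.
        return {
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        }


if __name__ == "__main__":
    detector = HeartbeatDetector(timeout=3.0)
    detector.heartbeat("a", now=0.0)
    detector.heartbeat("b", now=0.0)
    detector.heartbeat("a", now=4.0)    # "a" keeps reporting, "b" goes silent
    print(detector.suspected(now=5.0))
    detector.heartbeat("b", now=5.5)    # "b" was only slow, not halted
    print(detector.suspected(now=6.0))
```

A detector like this only suspects a failure. From the observer's side, a halted node, an omission failure, a network partition, and a timing failure (a slow node) all look like missing heartbeats. Byzantine failures are outside what it can detect, since a misbehaving node may keep sending heartbeats.
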