CS739 Intro

Why build distributed systems?
- Performance of single computer can’t handle load
  - throughput, latency
  - cost/performance
  - commodity components
  - elasticity
  - incremental scalability
- Fault-tolerance
  - availability
  - reliability: don’t do something wrong
  - data sharing
Why study distributed systems?
- Important
- Practical
- Challenging / Interesting
Why is it challenging?
- Faults - Fail-stop, crash
  - slow nodes, misbehaving nodes
  - nodes disagree, who to trust?
- Interoperation: lack of global state
  - File A has different content
  - how many jobs on node B?
  - Which nodes are part of the system?
  - Node delays, message ordering, lost messages
- Obtaining high performance
  - network comm. adds overhead
  - extra work in fault handling
- Lowering costs
  - figure out bottlenecks
  - figure out amount of redundancy needed for desired reliability/availability
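
A rough sketch of the redundancy question above. It assumes replicas fail independently and that the service is up whenever at least one replica is up; real failures are often correlated, so treat the result as an optimistic lower bound. The names availability and replicas_needed are made up for illustration.

```python
def availability(per_node: float, replicas: int) -> float:
    """Probability that at least one of `replicas` nodes is up.

    Assumes each node is up with probability `per_node`,
    independently of the others (an idealization).
    """
    return 1.0 - (1.0 - per_node) ** replicas


def replicas_needed(per_node: float, target: float) -> int:
    """Smallest replica count whose combined availability meets `target`."""
    if not (0.0 < per_node < 1.0 and 0.0 < target < 1.0):
        raise ValueError("per_node and target must be strictly between 0 and 1")
    n = 1
    # Each extra replica multiplies the chance of total outage by (1 - per_node),
    # so the loop terminates for any target below 1.
    while availability(per_node, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Nodes that are each up 99% of the time, aiming for 99.9% overall.
    print(replicas_needed(0.99, 0.999))
```

Under these assumptions each added replica shrinks the chance of a total outage by a constant factor, but every replica also adds cost, which is why the amount of redundancy is a cost question and not only a reliability one.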

What types of failures?