Techniques

Goal

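  • Speedup: the same job runs faster on a larger system
  • Scaleup: a larger job runs in the same time on a larger system
    • Batch
      • Scaleup task presented as a single, N-times larger job
    • Transactional
      • N-times as many requests, each still a small independent job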
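
A minimal sketch of how the two goals are usually measured, as ratios of elapsed times. The function names and the timings are hypothetical, chosen only for illustration.

```python
def speedup(small_system_time: float, big_system_time: float) -> float:
    """Same job on both systems: small-system time / big-system time."""
    return small_system_time / big_system_time


def scaleup(small_job_small_system: float, big_job_big_system: float) -> float:
    """N-times larger job on an N-times larger system, compared with the
    original job on the original system."""
    return small_job_small_system / big_job_big_system


# Hypothetical timings in seconds (assumed, not measured).
# Same job, 4x the hardware: a ratio of 4.0 would be linear speedup.
four_way_speedup = speedup(100.0, 25.0)

# 4x the job on 4x the hardware: a ratio of 1.0 would be linear scaleup.
four_way_scaleup = scaleup(100.0, 100.0)

print(four_way_speedup, four_way_scaleup)
```

A speedup below N, or a scaleup below 1.0, points at one of the barriers listed next.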

Barriers

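  • Startup time: starting many processes can dominate a short job
  • Interference: processes slow each other down when they access shared resources
  • Skew: the slowest sub-job sets the elapsed time, so one oversized sub-job limits speedup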
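
A toy model of two of these barriers, startup time and skew. It assumes workers are started one after another at a fixed cost and then run fully in parallel; interference is not modeled. The function name and all numbers are hypothetical.

```python
from typing import Sequence


def elapsed_time(partition_work: Sequence[float],
                 startup_per_worker: float = 0.0) -> float:
    """Elapsed time of a parallel step under the toy model:
    serial startup cost for every worker, plus the slowest partition."""
    return startup_per_worker * len(partition_work) + max(partition_work)


serial_time = 100.0                      # whole job on one worker (assumed)
balanced = [25.0, 25.0, 25.0, 25.0]      # work split evenly over 4 workers
skewed = [70.0, 10.0, 10.0, 10.0]        # same total work, one hot partition

print(serial_time / elapsed_time(balanced))        # speedup, balanced split
print(serial_time / elapsed_time(skewed))          # speedup under skew
print(serial_time / elapsed_time(balanced, 1.0))   # speedup with startup cost
```

With the skewed split the elapsed time is 70 rather than 25, so four workers give well under 2x speedup even though the total work is unchanged.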

Different choices

  • Shared-memory
      • Interconnection network becomes a bottleneck as processors are added
  • Shared-disks
      • Writes need locking to coordinate access to the shared disks
  • Shared-nothing
      • Scales up to hundreds of processors

Data Partitioning

  • Place relation fragments at different network sites

Schemes

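  • Range partitioning
      • Good for range queries; clusters tuples with nearby key values
      • Risk of data skew (most of the data ends up in one partition)
  • Round-Robin Partitioning
      • Hard to access tuples associatively (by attribute value); every partition must be searched
  • Hash Partitioning
      • No clustering: tuples with nearby key values are scattered across partitions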
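
A small sketch of the three schemes as functions that map a tuple to a partition number. The function names, keys, and boundaries are hypothetical. Integer keys are used because Python's built-in hash() is stable for small integers but randomized per process for strings.

```python
from bisect import bisect_right
from typing import List, Sequence


def range_partition(key: int, boundaries: Sequence[int]) -> int:
    """Partition i holds keys in [boundaries[i-1], boundaries[i])."""
    return bisect_right(boundaries, key)


def round_robin_partition(position: int, n: int) -> int:
    """The i-th tuple inserted goes to partition i mod n; the key is ignored."""
    return position % n


def hash_partition(key: int, n: int) -> int:
    """Partition is chosen by hashing the partitioning attribute."""
    return hash(key) % n


def partitions_for_range(lo: int, hi: int, boundaries: Sequence[int]) -> List[int]:
    """Partitions a range query [lo, hi] must visit under range partitioning."""
    return list(range(range_partition(lo, boundaries),
                      range_partition(hi, boundaries) + 1))


keys = [3, 17, 42, 58, 61, 99]
boundaries = [20, 60]          # three partitions: <20, 20..59, >=60

by_range = [range_partition(k, boundaries) for k in keys]
by_round_robin = [round_robin_partition(i, 3) for i in range(len(keys))]
by_hash = [hash_partition(k, 3) for k in keys]

print(by_range, by_round_robin, by_hash)
print(partitions_for_range(30, 50, boundaries))
```

Only range partitioning can answer the range query from a subset of partitions; with round-robin or hash placement the same query has to visit all three.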

Dataflow graph

Different algorithms

  • Sort-merge join
  • Hash join
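
One common way to parallelize a hash join, sketched under assumptions: both relations are hash-partitioned on the join attribute, so matching tuples land in the same partition and each partition can be joined independently. Here the partitions are processed in a plain loop; in a shared-nothing system each would run on its own node. All names and the sample relations are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Row = Tuple


def split_by_hash(rows: Sequence[Row], key_index: int, n: int) -> List[List[Row]]:
    """Hash-partition rows on the join attribute into n buckets."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key_index]) % n].append(row)
    return parts


def local_hash_join(build: Sequence[Row], probe: Sequence[Row],
                    build_key: int, probe_key: int) -> List[Row]:
    """Classic hash join: build a table on one input, probe with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    joined = []
    for p in probe:
        for b in table.get(p[probe_key], []):
            joined.append(b + p)
    return joined


def partitioned_hash_join(r: Sequence[Row], s: Sequence[Row],
                          r_key: int, s_key: int, n: int) -> List[Row]:
    """Join partition i of r only with partition i of s."""
    result: List[Row] = []
    for r_part, s_part in zip(split_by_hash(r, r_key, n),
                              split_by_hash(s, s_key, n)):
        result.extend(local_hash_join(r_part, s_part, r_key, s_key))
    return result


departments = [(1, "db"), (2, "os")]
employees = [("ann", 1), ("bob", 2), ("cy", 1), ("dee", 3)]

joined = partitioned_hash_join(departments, employees, r_key=0, s_key=1, n=2)
print(sorted(joined))
```

The employee with department 3 has no match and is dropped, as expected for an inner join. A sort-merge join could be parallelized along similar lines by range-partitioning both inputs and sorting each partition locally.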

Problems

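  • Long-running jobs hold locks for a long time
    • Workaround: read dirty (uncommitted) data
    • Workaround: read an old version
  • Priority inversion problem
    • A low-priority client sends a request to a high-priority server, so its work runs at high priority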
  • Query Optimization
  • Find optimal physical database design
  • Run utilities (e.g. create indices) without taking database offline

Question

  • NUMA, multi-socket