Statement processing

  1. Parsing
    • Parse, check syntax error
  2. Optimization
    • Get statistics of tables and columns
    • type check
    • select access path => Access Specification Language
  3. Code Generation
  4. Execution

Research Storage System

  • Segment Scan
    • Linear scan
  • Index Scan
    • Scan b-tree leaf

Costs of single relation access paths

  • COST = PAGE FETCHES + W * (RSI CALLS)
    • PAGE FETCHES: IO
    • RSI CALL: Predicted number of tuples
      • F * relation cardinalities
      • F: Selectivity factor of the Boolean factors
        • e.g. WHERE colulmn = value, F = 1 / unique key count
          • (Assume there is an index on column, assume even distribution)
  • Calculate cost for interesting order (some order as GROUP BY ORDER BY ) and unordered
    • Use the cheaper one

Join

  • List all join order permutations
    • 剪枝: cartesian products should perform late (Join relations with join predicates between first)
  • Calculate cost (BFS), decide algorithm (nested loop vs sort-merge) and find cheapest

Cost algorithm

C-nested-loop-join(pathl,path2) = 
   C-outer(path1) + N * C-inner(path2)

C-merge(pathl,path2) = 
   C-outer(path1) + N * C-inner(path2)
  • C-outer/inner: cost of scanning
    • C-inner will be much less for merge-sort join
  • N be the cardinality of the outer relation tuples which satisfy the applicable predicates