Background

  • Large memory (~$\sqrt{|Relation blocks|}$)
  • Uniprocessor
  • Notation:
    • Relation: $R$ , $S$, $|R| \leq |S|$
    • Memory: $M$
    • Fudge factor $F$, (A hash table of $R$ occupies $|R| \times F$ blocks)

Algorithms

Sort-Merge Join

  • Produce sorted runs of $R$ & $S$
  • Do $n$-way merge, output join result while merging
  • (Choose $n$ to be as big as possible, ideally n == number of runs)

Hash Based

  • Hash smaller relation $R$ on the joining attr
  • Scan $S$ and find match
  • Need to partition if hash table of $R$ does not fit in memory

Simple hash join algorithm

  • Choose a range of hash value
  • Build hash table for $R$ if it hash into the range
  • Scan $S$ and output matched
  • Choose another range and repeat

GRACE

  • Partition hash values so that $R$ will be partition into equal size (each approx main memory size)
  • Scan $R$ and partition into equal size
  • Scan $S$ and partition (probably not equal size)
  • for each $i$, build hash table for $R_i$ , scan $S_i$ and output matches

Hybrid

  • Select $B$ (Approx to the number of steps for simple hashing), parition hash values so that $R$ will be splitted into $B$ equal size partition
  • Scan $R$, if belongs to first partition, build hash table, else write to disk (obviously buffer first)
  • Scan $S$, if belongs to first partition, probe hash table and output if match, else write to disk
  • for each $i$, build hash table for $R_i$ , scan $S_i$ and output matches

Partition Overflow

  • Hash table of a single partition too large to fit in memory
  • Partition overflow on disk
    • Scan big partition and partition again
  • Partition overflow in memory
    • Reassign some hash values to other partition

Memory management

Issue with all real memory

  • Memory manager cannot assign appropriate memory:
    • To low => slow
    • To high => new process can’t run

Hotset + virtual memory

  • Issue: LRU policy does not play well with join operations
    • To-be-scan partition might be swap out before the partition which were just completed
  • Solution: Mark page as throw immediately

Question

  • A run will be 2 * | M | blocks long