Background Large memory (~$\sqrt{|Relation blocks|}$) Uniprocessor Notation: Relation: $R$ , $S$, $|R| \leq |S|$ Memory: $M$ Fudge factor $F$, (A hash table of $R$ occupies $|R| \times F$ blocks) Algorithms Sort-Merge Join Produce sorted runs of $R$ & $S$ Do $n$-way merge, output join result while merging (Choose $n$ to be as big as possible, ideally n == number of runs) Hash Based Hash smaller relation $R$ on the joining attr Scan $S$ and find match Need to partition if hash table of $R$ does not fit in memory Simple hash join algorithm Choose a range of hash value Build hash table for $R$ if it hash into the range Scan $S$ and output matched Choose another range and repeat GRACE Partition hash values so that $R$ will be partition into equal size (each approx main memory size) Scan $R$ and partition into equal size Scan $S$ and partition (probably not equal size) for each $i$, build hash table for $R_i$ , scan $S_i$ and output matches Hybrid Select $B$ (Approx to the number of steps for simple hashing), parition hash values so that $R$ will be splitted into $B$ equal size partition Scan $R$, if belongs to first partition, build hash table, else write to disk (obviously buffer first) Scan $S$, if belongs to first partition, probe hash table and output if match, else write to disk for each $i$, build hash table for $R_i$ , scan $S_i$ and output matches Partition Overflow Hash table of a single partition too large to fit in memory Partition overflow on disk Scan big partition and partition again Partition overflow in memory Reassign some hash values to other partition Memory management Issue with all real memory Memory manager cannot assign appropriate memory: To low => slow To high => new process can’t run Hotset + virtual memory Issue: LRU policy does not play well with join operations To-be-scan partition might be swap out before the partition which were just completed Solution: Mark page as throw immediately Question A run will be 2 * | M | blocks long