Background #
- Large memory (~$\sqrt{|Relation blocks|}$)
- Uniprocessor
- Notation:
- Relation: $R$ , $S$, $|R| \leq |S|$
- Memory: $M$
- Fudge factor $F$, (A hash table of $R$ occupies $|R| \times F$ blocks)
Algorithms
Sort-Merge Join #
- Produce sorted runs of $R$ & $S$
- Do $n$-way merge, output join result while merging
- (Choose $n$ to be as big as possible, ideally n == number of runs)
Hash Based #
- Hash smaller relation $R$ on the joining attr
- Scan $S$ and find match
- Need to partition if hash table of $R$ does not fit in memory
Simple hash join algorithm #
- Choose a range of hash value
- Build hash table for $R$ if it hash into the range
- Scan $S$ and output matched
- Choose another range and repeat
GRACE #
- Partition hash values so that $R$ will be partition into equal size (each approx main memory size)
- Scan $R$ and partition into equal size
- Scan $S$ and partition (probably not equal size)
- for each $i$, build hash table for $R_i$ , scan $S_i$ and output matches
Hybrid #
- Select $B$ (Approx to the number of steps for simple hashing), parition hash values so that $R$ will be splitted into $B$ equal size partition
- Scan $R$, if belongs to first partition, build hash table, else write to disk (obviously buffer first)
- Scan $S$, if belongs to first partition, probe hash table and output if match, else write to disk
- for each $i$, build hash table for $R_i$ , scan $S_i$ and output matches
Partition Overflow #
- Hash table of a single partition too large to fit in memory
- Partition overflow on disk
- Scan big partition and partition again
- Partition overflow in memory
- Reassign some hash values to other partition
Memory management
Issue with all real memory #
- Memory manager cannot assign appropriate memory:
- To low => slow
- To high => new process can’t run
Hotset + virtual memory #
- Issue: LRU policy does not play well with join operations
- To-be-scan partition might be swap out before the partition which were just completed
- Solution: Mark page as throw immediately
Question #
- A run will be 2 * | M | blocks long
comments powered by