CStore

Values for each single column are stored contiguously CPU faster than IO => Use CPU to save disk bandwidth Compress data (e.g. 1 => Wisconsin, 2 => Texas) Densepack values (e.g. pack N values each K bits into N*K bits) C-Store physically stores a collection of columns, each sorted on some attribute can store overlapping projections => improves performance and redundancy Insert: Write to WS, batch move to RS Delete: marked in RS, purge later Update: insert + delete Read: historical mode query select a timestamp return the correct answer as of that timestamp Data Model Logically still tables Store projections Tuples in a projection are stored column-wise sorted on the same sort key Horizontally partitioned => each segment is associated with a key range When Query: Join multiple segments Key: Storage key RS: index in the column WS: stored as int, larger than the largest in RS SID: Segment ID Colocate join-index with EMP3(Sender) and partitioned in the same way RS Use multiple different encoding tricks Storage key is not stored (Index) WS No encoding Store storage key Update An insert is represented as a collection of new objects in WS All inserts corresponding to a single logical record have the same storage key Keys in the WS will be consistent with RS storage keys because we set the initial value of this counter to be one larger than the largest key in RS Snapshot Isolation Insertion vector contains for each record the epoch in which the record was inserted Deleted record vector contains the delete time for each tuple Tuple Mover Find segment with modify time at or before LWM (Low Water Mark lowest time user can execute queries) if deleted => discard else => move to RS Write to new RS’ segment Questions Type 3 encoding save space, not int 1:1 mapping between RS and WS?...

September 21, 2022

Access Path

Statement processing Parsing Parse, check syntax error Optimization Get statistics of tables and columns type check select access path => Access Specification Language Code Generation Execution Research Storage System Segment Scan Linear scan Index Scan Scan b-tree leaf Costs of single relation access paths COST = PAGE FETCHES + W * (RSI CALLS) PAGE FETCHES: IO RSI CALL: Predicted number of tuples F * relation cardinalities F: Selectivity factor of the Boolean factors e....

September 19, 2022

LeanStore

Related Work Anti-Caching Mark each tuple as cold or hot, move cold tuples to cold storage 8 byte overhead per tuple cannot move indexes to cold storage Siberia Classify each tuple as cold or hot Indexes only cover hot tuples Bloom filter to access cold tuples complex Swapping No overhead (hardware support) database lose control Hardware-assisted Access Tracking Modify OS kernel => still has control over page eviction Cannot handle index structures Building Blocks Pointer Swizzling Traditional buffer manager: always use page identifier lookup hash table to find page Swizzling Use virtual address directly if in memory Use page identifier for disk Efficient Page Replacement Traditional: LRU, Second chance New: Randomly select 10% of page, put into a queue, mark as cooling If access, back to hot If reach to queue front, evict Leanstore Use a global lock to protect cooling and in-flight I/O data structures FIFO Queue: Might be a FIFO linked list, hash table points to entry in queue I/O: acquire global lock -> acquire OS I/O frame lock -> release global lcok Traditional: Lock per page Reading page acquire a lock, so other thread will not swap it out New epoch-based: Global epoch regularly increase thread set local epoch to global before reading Check min(all local epoch) before evicting a page Comments HDD/SSD

September 17, 2022

Buffer Managment

Buffer Management Buffer Management Strategies (berkeley.edu) Algorithm Domain Separation Algorithms Categorize pages into different domains e.g. one domain for non-leaf b-tree one for leaf Limitation Static: Can’t dynamic adjust domain, (importance might be different for different queries) Split into 2 domains a new domain A page and all page in domain A might be more important than any page in domain B, yet it will replace a page in domain A Does not prevent interference among competing users Need separate mechanism to prevent thrashing “New” Algorithm Buffer pool is subdivided and allocated on a per-relation basis Each active relation assigned a resident set that is initially empty, resident sets of relations are linked in a priority list....

September 15, 2022

Church Numerals

supplemental.pdf (rice.edu)

September 15, 2022