Silo

Occ issues Tracking “anti-dependencies” global transaction IDha Silo epoch-based group commit only know order across epoch boundaries return results at epoch boundaries Design Global Epochs A designated thread periodically advances global epoch number $E$ Each worker $w$ maintains a local epoch number TID |epoch|id distinguish same epoach transaction |lock bit|lastest-version bit|abset bit| (a) larger than the TID of any record read or written by the transaction (b) larger than the worker’s most recently chosen TID (c) in the current global epoch....

October 13, 2022

OCC

Locking disadvantage Overhead Deadlock: need special-purpose locking protocols Need to lock root node while accessing HDD Locks need to hold until the end of transaction Locks may be necessary only in the worst case OCC Reads are not restricted returning a result is considered as a write Writes Read phase -> Validation phase -> Write phase Read Phase Maintain create/write/read set Only write to local copies (copy-on-write) Assign transaction numbers at end of the read Phase Write Phase Rename all objects in write set to original name Validation Phase Issue Long read phase need to check all previous write sets not enough memory => fail transaction Starving => lock entire database after too many retries Serial Validation Obtain lock during validation / write phases serialization is the weakest criterion for preserving consistency of a concurrent transaction system,

October 7, 2022

Isolation

Issue Highest isolation + lower isolation transactions running concurrently Lower isolation corrupt data ANSI SQL Isolation levels P1 Dirty Read T1 modify -> T2 read -> T1 rollback T2 read never committed data P2 Non-repeatable or Fuzzy Read T1 read -> T2 modify commit -> T1 read T1 read inconsistent P3 Phantom T1 search <condition> -> T2 insert new data -> T1 search again T1 search inconsistent Locking Share/Exclusive Locks Predicate lock lock on a set of data satisfying <condition> might include phantom data Well-formed request a lock before using Two phase lock request all locks before releasing any Duration long: hold until the transaction commits or aborts short: otherwise New SQL Isolation levels Snapshot Isolation The transaction successfully commits only if no other transaction T2 with a Commit-Timestamp in T 1‘s interval [Start-Timestamp, Commit - Timestsamp] wrote data that T 1 also wrote....

October 1, 2022

LockGranularity

Granularity of locks Concurrency vs Overhead => Different granularities coexist Hierarchical locks Exclusive/shared access assign same lock implicitly to descendants Intention mode lock all ancestors To lock lock all ancestors in intention mode and lock self requested root to leaf, released leaf to root Compatible two lock requests can be granted concurrently NL IS IX S SIX X NL YES YES YES YES YES YES IS YES YES YES YES YES NO IX YES YES YES NO NO NO S YES YES NO YES NO NO SIX YES YES NO NO NO NO X YES NO NO NO NO NO DAG Lock Requesting S or IS should request at least one parent in IS (or greater) Requesting IX, SIX or X should request all parents in IX (or greater) Dynamic DAG Scheduling and granting requests Granted group: From |IS| … |IS| Next request: S -> Next group Conversions IS IX S SIX X IS IS IX S SIX X IX IX IX SIX SIX X S S SIX S SIX X SIX SIX SIX SIX SIX X X X X X X X One request in a queue may request conversion Must wait for other requests in the same group to be compatible Other request my convert too => dead lock Consistency Degree 2: might read 2 different values for the same key(?...

September 27, 2022

Parallel Database Systems

Techniques Goal Speedup Scaleup Batch Scaleup task presented as a single larger job Transactional N-times as many requests Barriers Startup time Interference (Access shared resources) Skew: some job may take too long Different choices Shared-memory Network bottleneck Shared-disks Need to lock disk when writing Shared-nothing scaled up to hundreds of processors Data Partitioning Place relation fragments at different network sites Schemes Range partitioning range query, cluster data skew (place all data in same place) Round-Robin Partitioning hard to associatively access tuples Hash Partitioning no cluster Dataflow graph Different algorithm Sort-merge join Hash join Problems Lock for a long time Read dirty database Read old version Priority inversion problem Low pri client request to hi pri server Query Optimization Find optimal physical database design Run utilities (e....

September 24, 2022