Two-Pass Algorithms Based on Sorting
Duplicate elimination δ(R)
- Step 1: sort runs of size M, write them to disk
  - Cost: 2B(R)
- Step 2: merge M-1 runs, but include each tuple only once
  - Cost: B(R)
- Total cost: 3B(R)
- Assumption: B(R) <= M^2

Q: What can sorting help with? And how?
- Selection?
- Projection?
- Join?
- Duplicate elimination?
- Grouping?
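The two steps of sort-based δ(R) can be sketched as follows. This is a minimal illustration, not a real executor: it assumes memory M is counted in tuples rather than blocks, keeps the "runs on disk" as Python lists, and the helper names `make_sorted_runs` and `sort_based_distinct` are hypothetical.

```python
import heapq
from typing import Any, List, Sequence


def make_sorted_runs(tuples: Sequence[Any], m: int) -> List[List[Any]]:
    """Step 1: load M tuples at a time, sort them in memory, 'write' each sorted run."""
    return [sorted(tuples[i:i + m]) for i in range(0, len(tuples), m)]


def sort_based_distinct(tuples: Sequence[Any], m: int) -> List[Any]:
    """Two-pass duplicate elimination by sorting (M counted in tuples, an assumption)."""
    runs = make_sorted_runs(tuples, m)
    # Step 2 keeps one buffer per run plus one output buffer, so at most M-1 runs.
    if len(runs) > m - 1:
        raise ValueError("too many runs for a two-pass merge (roughly B(R) <= M^2)")
    out: List[Any] = []
    for t in heapq.merge(*runs):
        # The merged stream is sorted, so duplicates are adjacent: emit each value once.
        if not out or out[-1] != t:
            out.append(t)
    return out


print(sort_based_distinct([3, 1, 3, 2, 1, 5, 2], m=4))  # [1, 2, 3, 5]
```

With m=3 the same input produces three runs, more than M-1 = 2 merge buffers, so the sketch refuses rather than silently needing a third pass.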
Grouping: γ_{city, sum(price)}(R)
- Same as before: sort, then compute the sum(price) for each group
- As before: compute sum(price) during the merge phase
- Total cost: 3B(R)
- Assumption: B(R) <= M^2
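A sketch of γ_{city, sum(price)}(R) with the sum computed during the merge phase, as the slide describes. Assumptions: tuples are `(city, price)` pairs, M is counted in tuples, and the function name `sort_based_group_sum` and the sample rows are made up for illustration.

```python
import heapq
from operator import itemgetter
from typing import List, Tuple

Row = Tuple[str, int]  # hypothetical (city, price) tuple


def sort_based_group_sum(rows: List[Row], m: int) -> List[Row]:
    """Two-pass grouping: sort runs on city, then sum price while merging."""
    by_city = itemgetter(0)
    # Pass 1: sorted runs of M tuples each, keyed on the grouping attribute.
    runs = [sorted(rows[i:i + m], key=by_city) for i in range(0, len(rows), m)]
    if len(runs) > m - 1:
        raise ValueError("too many runs for a two-pass merge")
    result: List[Row] = []
    # Pass 2: the merge yields each city's tuples consecutively, so a running sum suffices.
    for city, price in heapq.merge(*runs, key=by_city):
        if result and result[-1][0] == city:
            result[-1] = (city, result[-1][1] + price)
        else:
            result.append((city, price))
    return result


rows = [("Urbana", 10), ("Champaign", 5), ("Urbana", 7), ("Chicago", 3), ("Champaign", 1)]
print(sort_based_group_sum(rows, m=3))
# [('Champaign', 6), ('Chicago', 3), ('Urbana', 17)]
```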
Binary operations: R ∩ S, R ∪ S, R − S
- Idea: sort R, sort S, then do the right thing
- A closer look:
  - Step 1: split R into runs of size M, then split S into runs of size M. Cost: 2B(R) + 2B(S)
  - Step 2: merge all x runs from R; merge all y runs from S; output tuples on a case-by-case basis (x + y <= M)
- Total cost: 3B(R) + 3B(S)
- Assumption: B(R) + B(S) <= M^2

Join R ⋈ S
- Start by sorting both R and S on the join attribute:
  - Cost: 4B(R) + 4B(S) (because the sorted relations must be written to disk)
- Read both relations in sorted order, match tuples:
  - Cost: B(R) + B(S)
- Difficulty: many tuples in R may match many in S
  - If at least one set of matching tuples fits in M, we are OK
  - Otherwise we need a nested loop, at higher cost
- Total cost: 5B(R) + 5B(S)
- Assumption: B(R) <= M^2, B(S) <= M^2

Q: Why is sorting-based "two" pass?
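The join slide's "read both relations in sorted order, match tuples" step, including the many-to-many case, can be sketched like this. To keep it short, Python's in-memory `sorted()` stands in for the external sort (the 4B(R) + 4B(S) part), and each group of equal keys is assumed to fit in memory, the "we are OK" case. `sort_merge_join` is a hypothetical name; tuples are joined by concatenation.

```python
from operator import itemgetter
from typing import Any, Callable, List, Tuple


def sort_merge_join(r: List[Tuple], s: List[Tuple],
                    r_key: Callable[[Tuple], Any],
                    s_key: Callable[[Tuple], Any]) -> List[Tuple]:
    """Join R and S by sorting both on the join attribute, then merging."""
    # Stand-in for the external sort phase.
    r_sorted = sorted(r, key=r_key)
    s_sorted = sorted(s, key=s_key)
    out: List[Tuple] = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        kr, ks = r_key(r_sorted[i]), s_key(s_sorted[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of tuples sharing this key on each side.
            i_end = i
            while i_end < len(r_sorted) and r_key(r_sorted[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_key(s_sorted[j_end]) == kr:
                j_end += 1
            # Many-to-many match: assumes one of the two groups fits in memory.
            for rt in r_sorted[i:i_end]:
                for st in s_sorted[j:j_end]:
                    out.append(rt + st)
            i, j = i_end, j_end
    return out


R = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
S = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(sort_merge_join(R, S, itemgetter(0), itemgetter(0)))
# [(2, 'b', 2, 'x'), (2, 'b', 2, 'y'), (2, 'c', 2, 'x'), (2, 'c', 2, 'y'), (4, 'd', 4, 'w')]
```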
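For the binary operations R ∩ S, R ∪ S, R − S on the earlier slide, "output tuples on a case-by-case basis" amounts to one merge over the two sorted streams. This sketch assumes set semantics (distinct output), uses `sorted()` in place of the merged runs of Step 1, and the name `sorted_set_op` and its `op` strings are hypothetical.

```python
from typing import Any, List, Sequence


def sorted_set_op(r: Sequence[Any], s: Sequence[Any], op: str) -> List[Any]:
    """Set-semantics R ∩ S, R ∪ S or R − S via one merge over sorted inputs."""
    if op not in ("intersection", "union", "difference"):
        raise ValueError(f"unknown op: {op}")
    r, s = sorted(r), sorted(s)  # stand-in for the merged sorted runs
    out: List[Any] = []

    def emit(t: Any) -> None:
        if not out or out[-1] != t:  # output is produced in sorted order, so this dedupes
            out.append(t)

    i = j = 0
    while i < len(r) or j < len(s):
        if j >= len(s) or (i < len(r) and r[i] < s[j]):
            # The smallest remaining value occurs only in R.
            if op in ("union", "difference"):
                emit(r[i])
            i += 1
        elif i >= len(r) or s[j] < r[i]:
            # The smallest remaining value occurs only in S.
            if op == "union":
                emit(s[j])
            j += 1
        else:
            # The value occurs in both: skip every copy on both sides.
            t = r[i]
            while i < len(r) and r[i] == t:
                i += 1
            while j < len(s) and s[j] == t:
                j += 1
            if op in ("union", "intersection"):
                emit(t)
    return out


print(sorted_set_op([3, 1, 2, 2, 5], [2, 4, 5, 5], "intersection"))  # [2, 5]
print(sorted_set_op([3, 1, 2, 2, 5], [2, 4, 5, 5], "union"))         # [1, 2, 3, 4, 5]
print(sorted_set_op([3, 1, 2, 2, 5], [2, 4, 5, 5], "difference"))    # [1, 3]
```

Bag semantics would change the bookkeeping (for example, keeping counts per value), which this sketch does not attempt.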
Two-Pass Algorithms Based on Hashing
[Figure: Disk → M main memory buffers → Disk]
- Does each bucket fit in main memory?

Q: What can hashing help with? And how?
- Selection?
- Projection?
- Set operations?
- Join?
- Duplicate elimination?
- Grouping?

Hash-Based Algorithms for δ
- Recall: δ(R) = duplicate elimination
- Step 1. Partition R into buckets
- Step 2. Apply δ to each bucket (each bucket may be read into main memory)
- Cost: 3B(R)
- Assumption: B(R) <= M^2
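A sketch of hash-based δ(R): duplicates always hash to the same bucket, so each bucket can be deduplicated on its own. Assumptions: M is counted in tuples, partitioning uses M-1 buckets (one buffer left for reading input), h is Python's `hash` modulo the bucket count, and `hash_based_distinct` is a hypothetical name. Output order depends on the hash, so the usage line sorts it.

```python
from typing import Hashable, List, Sequence


def hash_based_distinct(tuples: Sequence[Hashable], m: int) -> List[Hashable]:
    """Two-pass δ(R) by hashing: partition into buckets, then dedupe each bucket in memory."""
    k = m - 1  # assumption: one input buffer, M-1 bucket buffers
    buckets: List[List[Hashable]] = [[] for _ in range(k)]
    # Pass 1: equal tuples land in the same bucket, so buckets are independent.
    for t in tuples:
        buckets[hash(t) % k].append(t)
    out: List[Hashable] = []
    # Pass 2: read one bucket at a time into memory and apply δ to it.
    for bucket in buckets:
        if len(bucket) > m:  # conservative check: the whole bucket must fit
            raise ValueError("bucket does not fit in memory (need B(R) <= M^2)")
        out.extend(dict.fromkeys(bucket))  # order-preserving in-memory dedupe
    return out


print(sorted(hash_based_distinct([5, 3, 5, 1, 3, 9, 1], m=4)))  # [1, 3, 5, 9]
```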
Hash-Based Algorithms for γ
- Recall: γ(R) = grouping and aggregation
- Step 1. Partition R into buckets
- Step 2. Apply γ to each bucket (each bucket may be read into main memory)
- Cost: 3B(R)
- Assumption: B(R) <= M^2

Hash-Based Join R ⋈ S
- Simple version: main memory hash-based join
  - Scan S, build buckets in main memory
  - Then scan R and join
- Requirement: min(B(R), B(S)) <= M
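The hash-based γ and the simple main-memory hash join above can both be sketched in a few lines. As before, M is counted in tuples, partitioning uses M-1 buckets, and `hash_group_sum` and `simple_hash_join` are hypothetical names. The join builds its table on S, as the slide does, so it checks that S fits; a real system would build on whichever relation is smaller.

```python
from collections import defaultdict
from operator import itemgetter
from typing import Any, Callable, Dict, List, Tuple


def hash_group_sum(rows: List[Tuple[Any, int]], m: int) -> Dict[Any, int]:
    """γ_{key, sum(val)}(R): hash on the grouping key, then aggregate each bucket."""
    k = m - 1  # assumption: M-1 output buffers during partitioning
    buckets: List[List[Tuple[Any, int]]] = [[] for _ in range(k)]
    for key, val in rows:  # pass 1: partition on the grouping attribute
        buckets[hash(key) % k].append((key, val))
    result: Dict[Any, int] = {}
    for bucket in buckets:  # pass 2: aggregate one bucket at a time in memory
        sums: Dict[Any, int] = defaultdict(int)
        for key, val in bucket:
            sums[key] += val
        result.update(sums)  # a group never spans two buckets
    return result


def simple_hash_join(r: List[Tuple], s: List[Tuple],
                     r_key: Callable[[Tuple], Any], s_key: Callable[[Tuple], Any],
                     m: int) -> List[Tuple]:
    """One-pass main-memory hash join: build a table on S, then probe it with R."""
    if len(s) > m:
        raise ValueError("S must fit in memory (M tuples)")
    table: Dict[Any, List[Tuple]] = defaultdict(list)
    for st in s:  # scan S, build buckets in memory
        table[s_key(st)].append(st)
    # Scan R and join; .get avoids inserting empty entries into the defaultdict.
    return [rt + st for rt in r for st in table.get(r_key(rt), [])]


print(sorted(hash_group_sum([("a", 1), ("b", 2), ("a", 3)], m=3).items()))
# [('a', 4), ('b', 2)]
print(simple_hash_join([(1, "r1"), (2, "r2"), (2, "r3")], [(2, "s1"), (3, "s2")],
                       itemgetter(0), itemgetter(0), m=2))
# [(2, 'r2', 2, 's1'), (2, 'r3', 2, 's1')]
```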
Partitioned Hash Join R ⋈ S
- Step 1:
  - Hash S into M buckets
  - Send all buckets to disk
- Step 2:
  - Hash R into M buckets
  - Send all buckets to disk
- Step 3:
  - Join every pair of buckets
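A sketch of the three steps. Following the next slide, step 3 only needs to join bucket i of R with bucket i of S, since only those can share join values. Assumptions: M is counted in tuples, there are M-1 buckets (one buffer left for input), h is Python's `hash` modulo the bucket count, and `partition` and `partitioned_hash_join` are hypothetical names. A Python dict stands in for the in-memory table built with the second hash function h2; a real system would choose an h2 independent of h.

```python
from collections import defaultdict
from operator import itemgetter
from typing import Any, Callable, Dict, List, Tuple

Key = Callable[[Tuple], Any]


def partition(rel: List[Tuple], key: Key, k: int) -> List[List[Tuple]]:
    """Hash each tuple on its join attribute into one of k buckets (h = hash mod k)."""
    buckets: List[List[Tuple]] = [[] for _ in range(k)]
    for t in rel:
        buckets[hash(key(t)) % k].append(t)
    return buckets


def partitioned_hash_join(r: List[Tuple], s: List[Tuple],
                          r_key: Key, s_key: Key, m: int) -> List[Tuple]:
    k = m - 1  # assumption: M-1 buckets, leaving one buffer for input
    s_buckets = partition(s, s_key, k)  # Step 1: hash S, "send buckets to disk"
    r_buckets = partition(r, r_key, k)  # Step 2: hash R, "send buckets to disk"
    out: List[Tuple] = []
    # Step 3: R tuples in bucket i can only match S tuples in bucket i.
    for rb, sb in zip(r_buckets, s_buckets):
        if min(len(rb), len(sb)) > m:
            raise ValueError("neither bucket fits in memory (need min(B(R), B(S)) <= M^2)")
        # Build the in-memory table on the smaller bucket, probe with the other.
        table: Dict[Any, List[Tuple]] = defaultdict(list)
        if len(sb) <= len(rb):
            for st in sb:
                table[s_key(st)].append(st)
            out.extend(rt + st for rt in rb for st in table.get(r_key(rt), []))
        else:
            for rt in rb:
                table[r_key(rt)].append(rt)
            out.extend(rt + st for st in sb for rt in table.get(s_key(st), []))
    return out


R = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
S = [(2, "x"), (3, "y"), (5, "z")]
print(sorted(partitioned_hash_join(R, S, itemgetter(0), itemgetter(0), m=3)))
# [(2, 'b', 2, 'x'), (2, 'd', 2, 'x'), (3, 'c', 3, 'y')]
```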
Partitioned Hash-Join
- Partition both relations using hash fn h: R tuples in partition i will only match S tuples in partition i
- Read in a partition of R, hash it using h2 (≠ h). Scan the matching partition of S, search for matches
[Figure: partitions of R and S, joined partition by partition in main memory]

Partitioned Hash Join
- Assumption: at least one full bucket of the smaller relation must fit in memory: min(B(R), B(S)) <= M^2
Index-Based Algorithms
- In a clustered index, all tuples with the same value of the key are clustered on as few blocks as possible

Index-Based Selection
- Selection on equality: σ_{a=v}(R)
- Clustered index on a: cost B(R)/V(R,a)
- Unclustered index on a: cost T(R)/V(R,a)
- Example: B(R) = 2000, T(R) = 100,000, V(R,a) = 20; compute the cost of σ_{a=v}(R)
- Cost of table scan:
  - If R is clustered: B(R) = 2000 I/Os
  - If R is unclustered: T(R) = 100,000 I/Os
- Cost of index-based selection:
  - If index is clustered: B(R)/V(R,a) = 100
  - If index is unclustered: T(R)/V(R,a) = 5000
- Notice: when V(R,a) is small, an unclustered index is useless (here it costs 5000 I/Os, more than the 2000 I/Os of scanning a clustered R)

Index-Based Join
- R ⋈ S
- Assume S has an index on the join attribute
- Iterate over R; for each tuple, fetch the corresponding tuple(s) from S
- Assume R is clustered. Cost:
  - If index is clustered: B(R) + T(R)B(S)/V(S,a)
  - If index is unclustered: B(R) + T(R)T(S)/V(S,a)
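The index-based cost formulas above are plain arithmetic, so a tiny calculator reproduces the slide's selection example and evaluates the join formulas. The function and parameter names are hypothetical; the formulas are exactly the ones on the slides, with I/Os as the unit.

```python
def table_scan_cost(b_r: int, t_r: int, relation_clustered: bool) -> int:
    """I/Os to scan R: B(R) if R is clustered, else T(R)."""
    return b_r if relation_clustered else t_r


def index_selection_cost(b_r: int, t_r: int, v_r_a: int, index_clustered: bool) -> float:
    """I/Os for σ_{a=v}(R) via an index on a: B(R)/V(R,a) or T(R)/V(R,a)."""
    return b_r / v_r_a if index_clustered else t_r / v_r_a


def index_join_cost(b_r: int, t_r: int, b_s: int, t_s: int, v_s_a: int,
                    index_clustered: bool) -> float:
    """I/Os for R ⋈ S with R clustered and an index on S's join attribute a."""
    per_probe = b_s / v_s_a if index_clustered else t_s / v_s_a
    return b_r + t_r * per_probe


# The slide's example: B(R) = 2000, T(R) = 100,000, V(R, a) = 20
print(table_scan_cost(2000, 100_000, relation_clustered=True))         # 2000
print(table_scan_cost(2000, 100_000, relation_clustered=False))        # 100000
print(index_selection_cost(2000, 100_000, 20, index_clustered=True))   # 100.0
print(index_selection_cost(2000, 100_000, 20, index_clustered=False))  # 5000.0
```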
Average SQLite Score: 3.2
Average SQL Tuning Score: 3.55

Suggestions:
- more engaging lectures
- psql demo
- combine into 1 lecture
- hot topics in the DB field
- more integration, no standalone ST lectures
- [...] RDBMS
- topic on web crawling
- vote on topics beforehand
- [...] exam topics from ST lectures
- topic on massively scalable DBs
- easier topics
- have Kevin teach ST lectures
- lecture from industry
- more famous speakers
- more hands-on topics
- more variety
- move ST lectures up
- no ST lectures on Fridays
- remove SQLite
- topic on DBs behind Facebook, Twitter
- topic on hash tables
- topic on OODBMS
- topic on Oracle
- topic on speeding up SQL
- use previous projects
- [...] than SQL
This note was uploaded on 02/17/2012 for the course CS 411 taught by Professor Winslett during the Spring '07 term at University of Illinois at Urbana–Champaign.