Example
R(A,B,C)
is a relation.
T(R) = 10,000
T
(
σ
A<10
(R)
)
= T(R)
×
1
3
≈
3334
73 / 106
Estimate values in range
Example
R(A,B,C)
is a relation.
T(R) = 10,000
The
DBMS
statistics show that the values of the
A
attribute lie within the
range
[8,57]
, uniformly distributed.
Question: what would be a reasonable estimate of
sel
A<10
(R)
?
Answer
We see that
578+1
different values of
A
are possible
however only records with values
A=8
or
A=9
satisfy the filter
A<10
.
Therefore,
sel
A<10
(R)=
2
(57

8+1)
=
2
50
=0.04
And hence,
T
(
σ
A<10
(R)
)
= T(R)
×
sel
A<10
(R) = 400
74 / 106
Result size estimation: 3. Notequal selection:
σ
A
=
c
(R)
Notequal selection
:
σ
A
=
c
(R)
, for attribute
A
and constant
c
:
can usually use
T(
σ
A
=
c
(R)) = T(R)
for simplicity
more accurately, subtract a fraction
1
V(R,A)
Fact:
σ
A
=
c
(R)
∪
σ
A=c
(R) = R
⇔
σ
A
=
c
(R) = R 
σ
A=c
(R)
Therefore,
sel
A
=
c
(R)=
V(R,A)1
V(R,A)
T
(
σ
A
=
c
(R)
)
= T(R)
*
V(R,A)1
V(R,A)
75 / 106
Result size estimation:4. Selection conditions with NOT
Selection conditions with NOT
:
σ
¬
p
(R)
sel
¬
p
(R)= 1 
sel
p
(R)
⇒
T(
σ
¬
p
(R)) = T(R)  T(
σ
p
(R))
76 / 106
Result size estimation: 5. Selection using several
conditions with AND
Selection using several conditions with AND
:
σ
P
1
∧
P
2
(R)
treat selection as a cascade of several selections
Treat
σ
P
1
∧
P
2
(R)
as
σ
P
1
(
σ
P
2
(R)
)
the order does not matter, treating this as
σ
P
2
(
σ
P
1
(R)
)
gives the same
results.
Estimated size is original size multiplied by the selectivity factor
sel
P
1
∧
P
2
(R)=
sel
P
1
(R)
*
sel
P
2
(R)
⇒
T(
σ
P
1
∧
P
2
(R)) = T(R)
*
sel
P
1
(R)
*
sel
P
2
(R)
77 / 106
Result size estimation: 5.
σ
P
1
∧
P
2
(R)
Example
R(A,B,C)
is a relation.
T(R)
= 10,000.
V(R,A) = 50
Estimate the size of the result set
U =
σ
A=10
∧
B<20
(R)
Answer
sel
A=10
(R) =
1
50
sel
B<20
(R) =
1
3
T(U) =
sel
A=10
*
sel
B<20
*
T(R) =
1
50
*
1
3
*
10,000 = 66.67
78 / 106
Result size estimation: 6. Selection using several
conditions with OR
Selection using several conditions with OR
:
σ
P
1
∨
P
2
(R)
P
1
∨
P
2
=
¬
(
¬
P
1
∧ ¬
P
2
)
Treat
σ
P
1
∨
P
2
(R)
as
σ
¬
(
¬
P
1
∧¬
P
2
)
(R)
sel
P
1
∨
P
2
(R)= 1
(
1
sel
p
1
(R)
)
*
(
1
sel
P
2
(R)
)
⇒
T
(
σ
P
1
∨
P
2
(R)
)
= T(R)
*
sel
P
1
∨
P
2
(R)
Example
R(A,B,C)
is a relation.
T(R)
= 10,000.
V(R,A) = 50
Estimate the size of the result set
U =
σ
A=10
∨
B<20
(R)
Answer
sel
A=10
(R) =
1
50
sel
B<20
(R) =
1
3
T(U) = T(R)
*
(
1

(1

1
50
)(1

1
3
)
)
79 / 106
Result size estimation:
R
1
S
In our size estimations for join, we will look at natural join (
1
), but other
joins is managed similarly
equijoin as natural join
thetajoins as a cartesian product followed by a selection
Assume:
R(X,Y)
and
S(Y,Z)
, we join on
Y
:
R(X,Y)
1
S(Y,Z)
.
Question: Estimate the size of
(
R(X,Y)
1
S(Y,Z)
)
The challenge is we do not know how the set of values of
Y
in
R
relate to
the values of
Y
in
S
. There are some possibilities:
If
Y
attribute values in
R(X,Y)
and
S(Y,Z)
are disjoint  empty join:
T
(
R(X,Y)
1
S(Y,Z)
)
= 0
If
Y
attribute is a key in
S
and a foreign key of
R
, so each tuple in
R
joins
with exactly one tuple in
S
:
T
(
R(X,Y)
1
S(Y,Z)
)
= T(R)
If almost every tuple in
R
and
S
has the same
Y
attribute value  combine
all tuples of each relation:
T
(
R(X,Y)
1
S(Y,Z)
)
= T(R)
*
T(S)
Range of
T(R
1
S): 0
≤
T(R
1
S)
≤
T(R)
*
T(S)
80 / 106
Result size estimation:
R
1
S
: Simplifying Assumptions
For our calculations, we will make two assumptions:
1.
containment of value sets assumption
An attribute
Y
in a relation
R(
. . .
,Y)
always takes on a
prefix
of a fixed
list of values:
y
1
y
2
y
3
y
4
. . .
You've reached the end of your free preview.
Want to read all 106 pages?
 Fall '19
 Joseph Rosen
 Relational model, StarsIn, R 1c S