ics321-20091022-storage2

ics321-20091022-storage2 -...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: Storage
and
Indexing
(ii)
 Asst.
Prof.

Lipyeow
Lim
 InformaBon
&
Computer
Science
Department
 University
of
Hawaii
at
Manoa
 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 1
 ICS
321
Fall
2009
 Analysis
of
Heap
File
Storage
 OperaBon
 Worst
Case
 Analysis
 Scans
 B*(D
+
R*C)
 Point
 Query
 Range
 Query
 Insert
 Delete
 B*(D
+
R*C)
 B*(D
+
R*C)
 2*D
+
C
 2*
B
*
(D
+
 R*C)
 • 
Fetch
all
B
pages
from
disk
into
memory
 • 
Process
each
record
on
each
page
 • 
In
the
worst
case,
the
desired
record
is

the
 last
record
on
the
last
page
 • 
Since
file
is
unsorted,
the
desired
records
can
 be
anywhere
in
the
file,
so
we
have
to
scan
the
 enBre
file.
 • 
Insert
at
the
end
of
the
file.
 • 
Read
in
the
last
page
 • 
Add
record
 • 
Write
the
page
back
 • 
Search
for
the
record
to
be
deleted
 • 
Delete
the
record
 • 
Move
all
subsequent
records
&
pages
forward.
 2
 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 Analysis
of
Heap
File
Storage
(Disk
Only)
 OperaBon
 Worst
Case
 Analysis
 Scans
 B*D
 Point
 Query
 Range
 Query
 Insert
 Delete
 B*D
 B*D
 2*D
 2*B*D
 • 
Fetch
all
B
pages
from
disk
into
memory
 • 
Process
each
record
on
each
page
 • 
In
the
worst
case,
the
desired
record
is

the
 last
record
on
the
last
page
 • 
Since
file
is
unsorted,
the
desired
records
can
 be
anywhere
in
the
file,
so
we
have
to
scan
the
 enBre
file.
 • 
Insert
at
the
end
of
the
file.
 • 
Read
in
the
last
page
 • 
Add
record
 • 
Write
the
page
back
 • 
Search
for
the
record
to
be
deleted
 • 
Delete
the
record
 • 
Move
all
subsequent
records
&
pages
forward.
 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 3
 DeleBng
a
Record
 Memory File Record to be deleted 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 4
 Analysis
of
Sorted
File
Storage
 Op
 Scans
 Point
 Query
 Range
 Query
 Insert
 Worst
Case
 Analysis
 B*(D
+
R*C)
 D
log
B
+
C
log
R
 • 
Fetch
all
B
pages
from
disk
into
memory
 • 
Process
each
record
on
each
page
 • 
Binary
search
for
the
desired
page
 • 
Binary
search
for
the
desired
record
within
 the
page
 • 
Let
S
be
the
number
of
records
in
the
result


 • 
Binary
search
for
the
desired
page
and
record
 • 
Fetch
the
next
S
records
 • 
Binary
search
to
inserBon
point
 • 
In
worst
case,
page
has
no
extra
space,
so
 page
is
split
 • 
Move
all
subsequent
pages
back
 • 
Search
for
the
record
to
be
deleted
 • 
Delete
the
record
 • 
Move
all
subsequent
pages
forward
 5
 D
log
B
+
C
log
R
 +

S/R*D
+
S*C
 D
log
B
+
C
log
R
 +
2*B*(D
+
R*C)
 Delete
 D
log
B
+
C
log
R
 +
2*B*(D
+
R*C)

 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 Heap
vs
Sorted
File
 Op
 Heap
 Sorted
 B*D
 D
log
B
 D
log
B
+

 S/R*D
 Scans
 B*D
 Point
 B*D
 Query
 Range
 B*D
 Query
 Insert
 2*D
 D
log
B
+
 2*B*D
 Delete
 2*B*D
 D
log
B
+
 2*B*D
 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 6
 Indexes
 •  An
index on
a
file
speeds
up
selecBons
on
the
 search key fields for
the
index.
 –  –  •  An
index
contains
a
collecBon
of
data entries,
and
 supports
efficient
retrieval
of
all
data
entries
k*
 with
a
given
key
value
k.
 –  A
data
entry
is
usually
in
the
form
<key,
rid>
 –  Given
data
entry
k*,
we
can
find
record
with
key
k
in
 at
most
one
disk
I/O.

(Details
soon
…)
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 Any
subset
of
the
fields
of
a
relaBon
can
be
the
search
 key
for
an
index
on
the
relaBon.
 Search key is
not
the
same
as
key
(minimal
set
of
fields
 that
uniquely
idenBfy
a
record
in
a
relaBon).
 10/22/2009
 7
 B+
Tree
Indexes
 Non-leaf Pages Leaf Pages (Sorted by search key) index entry P0 K1 P1 K2 P2 K m Pm •  Leaf
pages
contain
data
entries,
and
are
chained
(prev
&
next)
 •  A
data
entry
typically
contain
a
key
value
and
a
rid.
 •  Non‐leaf
pages
have
index
entries;
only
used
to
direct
searches:
 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 8
 Example
B+
Tree
 Root Note how data entries in leaf level are sorted 17 Entries <= 17 Entries > 17 5 13 27 30 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* •  Find
28*?
29*?
All
>
15*
and
<
30*
 •  Insert/delete:

Find
data
entry
in
leaf,
then
 change
it.
Need
to
adjust
parent
someBmes.
 –  And
change
someBmes
bubbles
up
the
tree
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 10/22/2009
 9
 Analysis
of
Heap
File
with
B+Tree
Index

 Op
 Scans
 Worst
Case
 Analysis
 B*D
 • 
B+
tree
search
for
the
desired
index
page
 • 
Binary
search
for
the
desired
record
within
the
index
 page
 • Fetch
the
data
page
 • 
Let
S
be
the
number
of
records
in
the
result


 • 
B+
tree
search
for
the
desired
index
page
 • 
Fetch
the
next
S/R
index
leaf
pages
 • 
Fetch
the
data
pages
for
the
S
records
 • 
Insert
record
to
end
of
heap
file

 • 
B+
tree
search
to
find
index
page
for
the
inserted
 record
 • 
create
a
data
entry
for
the
inserted
record
in
the
 index
page.
In
worst
case,
index
page
has
no
extra
 space
and
page
split
cascades
up.
Write
index
pages
 • 
B+
tree
search
for
the
desired
index
page
and
record
 • 
Fetch
the
data
page
and
delete
the
record
 • 
In
the
worst
case,
data
page
is
empty
aker
deleBon
 and
needs
to
be
removed
from
heap
file
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 10
 Point
 D
logF
B
+
D
 Query
 Range
 D
logF
B
+

 Query
 S/R
*D
+
 S*D
 Insert
 2*D
+
3*D*
 logF
B
 Delete
 D
logF
B
+
 +
2*B*D

 10/22/2009
 Running
Comparison
 Op
 Scans
 Heap
 B*D
 Sorted
 B*D
 D
log
B
 Heap+Tree
 B*D
 D
logF
B
+
D
 Point
 B*D
 Query
 Range
 B*D
 Query
 D
log
B
+

 D
logF
B
+

 S/R*D
 S/R
*D
+
 S*D
 Insert
 2*D
 D
log
B
+
 2*D
+
3*D*
 2*B*D
 logF
B
 Delete
 2*B*D
 D
log
B
+
 D
logF
B
+
 2*B*D
 +
2*B*D

 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 11
 Value for age Hash‐Based
Indexes
 H1(age)
 Hash Index on Age Overflow page 56
 41
 20
 25
 20
 25
 30
 45
 File •  Index
is
a
collecBon
of
buckets
that
contain
data
entries •  Hashing func=on h:

h(r)
=
bucket
in
which
(data
entry
for)
 record
r
belongs.
h
looks
at
the
search key
fields
of
r. •  No “index entries” in this scheme. 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 12
 –  Bucket
=
primary
page
plus
zero
or
more
overflow
pages.

 Analysis
of
Heap
File
with
Hash
Index

 Op
 Scans
 Worst
Case
 Analysis
 B*D
 • 
Hash
search
for
the
desired
index
page
 • 
Linear
search
for
the
desired
record
within
the
index
 page
 • Fetch
the
data
page
 • 
Hash
index
does
not
support
range
queries
 • 
Fall
back
on
scanning
the
heap
file
 • 
Insert
record
to
end
of
heap
file

 • 
Hash
search
to
find
index
page
for
the
inserted
 record
 • 
Create
a
data
entry
for
the
inserted
record
in
the
 index
page.
 • 
Write
index
page
back
to
disk
 • 
Hash
search
for
the
desired
index
page
and
record
 • 
Fetch
the
data
page,
delete
the
record
 • 
In
the
worst
case,
pages
need
to
be
moved
forward
 • 
update
index
page
and
write
back
to
disk
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 13
 Point
 2*D
 Query
 Range
 B*D
 Query
 Insert
 4*D
 Delete
 3*D
+
 2*B*D

 10/22/2009
 Running
Comparison
 Op
 Scans
 Heap
 Sorted
 B*D
 B*D
 D
log
B
 Heap+Tree
 B*D
 D
logF
B
+
D
 Heap +Hash
 B*D
 2*D
 B*D
 Point
 B*D
 Query
 Range
 B*D
 Query
 Insert
 2*D
 D
log
B
+

 D
logF
B
+

 S/R*D
 S/R*D
+
S*D
 D
log
B
+
 2*D
+
3*D
logF
B
 4*D
 2*B*D
 3*D +2*B*D

 14
 Delete
 2*B*D
 D
log
B
+
 D
logF
B
+
 2*B*D
 +
2*B*D

 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 Index
ClassificaBons
 •  What
should
be
in
a
Data
Entry
k*
?
 –  PossibiliBes:
 •  The
data
record
itself
with
key
value
k
 •  <k,
rid
of
data
record
with
key
value
k>
 •  <k,
list
of
rids
of
data
records
with
key
value
k>
 –  Variable
size
data
entries
 •  Primary
vs
Secondary
 –  Applies
to
any
indexing
technique
 –  Primary
index
:
search
key
contains
primary
key
 –  Unique
Index
:
search
key
contains
candidate
key
 –  Clustered
index:
order
of
data
records
same
or
close
 to
order
of
data
entries
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 15
 •  Clustered
vs
unclustered
 10/22/2009
 Clustered
vs
Unclustered
Index
 •  Suppose
data
records
are
stored
in
a
Heap
file.
 –  –  
To
build
clustered
index,
first
sort
the
Heap
file
(with
 some
free
space
on
each
page
for
future
inserts).


 Overflow
pages
may
be
needed
for
inserts.

(Thus,
 order
of
data
recs
is
`close
to’,
but
not
idenBcal
to,
the
 sort
order.)
 Index entries direct search for data entries CLUSTERED UNCLUSTERED Data entries Data entries (Index File) (Data file) Data Records 10/22/2009
 Data Records 16
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 Clustered
File
 File Tree‐based
index
 •  An
index
where
the
data
entry
contains
the
data
record
itself
(cf.
 just
the
key
value,
RID
pair).
 •  No
heap/sorted
file
is
used,
the
index
IS
the
file
of
record
 •  Steps
to
build
a
clustered
file:
 –  Sort
data
records
 –  ParBBon
into
pages
 –  Build
the
tree
on
the
pages
 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 17
 Analysis
of
Clustered
Files
 Op
 Scans
 Point
 Query
 Range
 Query
 Insert
 Worst
Case
 Analysis
 B*D
 D
logF
B
 D
logF
B
+

 S/R
*D
 3*D
logF
B
 • 
B+
tree
search
for
the
desired
index
page
 • 
Binary
search
for
the
desired
record
within
the
index
 page
 • 
Let
S
be
the
number
of
records
in
the
result


 • 
B+
tree
search
for
the
desired
index
page
 • 
Fetch
the
next
S/R
index
leaf
pages
which
contains
 the
data
records
as
well
 • 
B+
tree
search
to
find
index
page
for
the
inserBon
 point
 • 
create
a
data
entry
for
the
inserted
record
in
the
 index
page.
In
worst
case,
index
page
has
no
extra
 space
and
page
split
cascades
up.
Write
index
pages
 • 
B+
tree
search
for
the
desired
index
page
and
record
 • 
Delete
the
record
 • 
In
the
worst
case,
the
index
page
is
underfilled
aker
 deleBon
and
needs
to
be
rebalanced
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 18
 Delete
 2*D
logF
B
 10/22/2009
 Running
Comparison
 Op
 Scans
 Heap
 B*D
 Sorted
 B*D
 D
log
B
 Heap+Tree
 B*D
 D
logF
B
+
D
 Heap +Hash
 B*D
 2*D
 Clustered
 File
 B*D
 D
logF
B
 D
logF
B
+

 S/R
*D
 3*D
logF
B
 Point
 B*D
 Query
 Range
 B*D
 Query
 Insert
 2*D
 D
log
B
+

 D
logF
B
+

 B*D
 S/R*D
 S/R*D
+
S*D
 D
log
B
+
 2*D
+
3*D
 2*B*D
 logF
B
 4*D
 Delete
 2*B*D
 D
log
B
+
 D
logF
B
+
 2*B*D
 +
2*B*D

 10/22/2009
 Lipyeow
Lim
‐‐
University
of
Hawaii
at
Manoa
 3*D 2*D
logF
B
 +2*B*D

 19
 ...
View Full Document

Ask a homework question - tutors are online