AD-A077 101   WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER   F/G 5/2
THE DESIGN AND IMPLEMENTATION OF A DATABASE MANAGEMENT SYSTEM USING--ETC(U)
JUN 79   A. J. BAROODY, D. J. DEWITT
UNCLASSIFIED   MRC-TSR-1970
MRC Technical Summary Report #1970

THE DESIGN AND IMPLEMENTATION OF A DATA BASE MANAGEMENT SYSTEM USING ABSTRACT DATA TYPES

A. James Baroody, Jr. and David J. DeWitt

Mathematics Research Center
University of Wisconsin-Madison
Madison, Wisconsin 53706

June 1979

Approved for public release
Distribution unlimited

Sponsored by

U. S. Army Research Office
P. O. Box 12211
Research Triangle Park
North Carolina 27709

National Science Foundation
Washington, D.C. 20550

ABSTRACT
The design, implementation, and performance analysis of a database management system implemented using abstract data types are presented. The use of abstract data types as an implementation tool is shown to have several significant advantages over current implementation techniques. First, by using a combination of abstract data types and generic procedures to structure the design of a database system, the resulting software will be more reliable. Second, by employing a programming language which supports specification and verification of abstract data types, we can guarantee data independence. Finally, we demonstrate that the application of abstract data types permits the elimination of run-time interpretation of the schema and subschema such as in IBM's IMS, Univac's DMS 1100, and INGRES. Instead, the data manipulation routines, which are shown to be examples of generic procedures, are implemented as parameterized calls to the procedures bound to the instances of the three abstract data types used to represent the logical structure of the database.

AMS (MOS) Subject Classification: 68A50

Key Words: database management systems, abstract data types, generic procedures, CODASYL, query execution

Work Unit No. (illegible) - Computer Science

This research was partially sponsored by the National Science Foundation under grant MCS78-01721 and the United States Army under contract No. DAAG29-75-C-0024.
[The remainder of this summary page is illegible in the scanned original. The legible fragments mention run-time interpretation, database management, programming languages, abstract data types as an implementation feature, and several consequences: guaranteed data independence, more reliable software, and efficiency compared to conventional implementations.]
The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the authors of this report.
1. INTRODUCTION

Data independence requires that user programs be isolated from the details concerning the underlying physical and logical structures used to implement the database. Data independence is achieved by presenting to the user an abstraction of the database in which details of its actual implementation are hidden. The development of tools to support abstraction is an active area in current research on programming methodology and programming languages. This report analyzes the use of abstract data types as an implementation tool for database management systems.
The data model defines a collection of logical structures which are available to the users of a database management system. In order to utilize these structures for applications, the structure of the database must be defined in terms of the structures in the data model. The definition of the database is termed the schema, or data model definition. The schema defines the types of entities which may exist within the database and also defines the relationships which may exist between these entities. The schema thus is a definition of the logical structure of a database and is shared by all users of the database. A subschema, or data submodel definition, is the definition of the subset of the database to be used by an application. The subschema is a restriction of the schema and defines an application-dependent window into the database.

The data manipulation routines handle all accesses to the database. A request from a user to access the database is in terms of a procedure call to a data manipulation routine. This procedure uses the schema and subschema to determine the necessary operations to be performed on the database. In the network data model the actual parameters to the data manipulation routines are records and sets. In our approach the record and set are considered to be generic types. The data manipulation routines utilize the information in the schema and subschema to describe the actual parameters in order to determine the function to be performed. Thus the data manipulation routines are examples of generic procedures.
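The idea of a data manipulation routine as a generic procedure can be illustrated with a minimal sketch. It is written in Python, which is not the language used in this report, and the record types, field names, and the `store` routine are hypothetical and chosen only for illustration. A single routine serves every record type, and the descriptor it finds in the schema determines what it does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordDescriptor:
    """Schema entry for one record type (hypothetical layout)."""
    name: str
    fields: tuple
    key: str


# Toy schema: record type name -> descriptor. All names are illustrative.
SCHEMA = {
    "EMPLOYEE": RecordDescriptor("EMPLOYEE", ("emp_no", "name"), "emp_no"),
    "DEPT": RecordDescriptor("DEPT", ("dept_no", "title"), "dept_no"),
}


def store(database, record_type, values):
    """Generic STORE: a single procedure serves every record type.

    The descriptor found in the schema determines which fields are
    required and which field is used as the key.
    """
    descriptor = SCHEMA[record_type]
    missing = [f for f in descriptor.fields if f not in values]
    if missing:
        raise ValueError(f"{record_type} is missing fields: {missing}")
    # Keep only the fields the schema declares for this record type.
    record = {f: values[f] for f in descriptor.fields}
    database.setdefault(record_type, {})[record[descriptor.key]] = record
    return record
```

Calling `store(db, "EMPLOYEE", {...})` and `store(db, "DEPT", {...})` runs the same code with different descriptors, which is the sense in which the routine is generic.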
Abstraction arises in the design of data structures from the need to recognize the similarity between objects and to concentrate on those properties that are shared by many objects while ignoring the differences between them. The use of abstraction in the development of data structures was examined by Hoare [Hoa72]. Hoare applied abstraction to sets of objects to create discriminated unions. A discriminated union is a set defined to be the union of two or more previously known sets. Since two sets may have members of different component types, a discriminated union provides a method to distinguish the type of the member by means of a tag associated with each member.
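A discriminated union can be sketched in a few lines. The following Python fragment is an illustration only: the component sets (`part_no`, `part_name`) and the helper names are assumptions, not taken from Hoare's notation or from this report. Each member carries a tag naming the component set it came from, and operations inspect the tag before touching the value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    tag: str       # names the component set the value came from
    value: object  # the value itself; its type depends on the tag


def union(**component_sets):
    """Build the discriminated union of the named component sets."""
    return [
        Member(tag, value)
        for tag, values in component_sets.items()
        for value in values
    ]


def describe(member):
    """Use the tag to decide how the value may safely be handled."""
    if member.tag == "part_no":
        return f"part number {member.value:05d}"
    if member.tag == "part_name":
        return f"part named {member.value.upper()}"
    raise ValueError(f"unknown tag: {member.tag}")
```

Without the tag, `describe` could not tell whether integer formatting or string handling applies to a given member.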
Minsky [Min76] utilizes abstract data types to extend and model the file concept used in traditional data processing applications. A file is viewed as an abstract data type which possesses as one of its attributes the record space of the file in secondary storage. It also contains an attribute, the global space, which contains information describing the characteristics and status of the file as a whole. Bound to these data attributes are procedures which manipulate the record space and global space.
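The file-as-abstract-data-type view can be sketched as a class with two hidden attributes and a small set of bound procedures. This Python fragment is a sketch under stated assumptions: the attribute and method names are hypothetical, and an in-memory list stands in for secondary storage.

```python
class File:
    """A file viewed as an abstract data type (illustrative sketch)."""

    def __init__(self, name, record_length):
        # Record space: stands in for the records held on secondary storage.
        self._record_space = []
        # Global space: characteristics and status of the file as a whole.
        self._global_space = {
            "name": name,
            "record_length": record_length,
            "record_count": 0,
            "is_open": True,
        }

    def append(self, record):
        """Add a fixed-length record and update the file's status."""
        if not self._global_space["is_open"]:
            raise IOError("file is closed")
        if len(record) != self._global_space["record_length"]:
            raise ValueError("record has the wrong length")
        self._record_space.append(record)
        self._global_space["record_count"] += 1

    def read(self, position):
        """Return the record stored at the given position."""
        return self._record_space[position]

    def close(self):
        """Mark the file closed; later appends are rejected."""
        self._global_space["is_open"] = False

    def status(self):
        """Return a copy of the global space so callers cannot alter it."""
        return dict(self._global_space)
```

Only the bound procedures touch the record space and global space, so the representation of either can change without affecting callers.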
One of the most significant works on abstraction and its relationship to modeling the information in a database is the work of Smith and Smith on relational databases [SS77a] [SS77b]. This work formalizes the use of foreign keys described by Codd [Cod70] as an aggregation. An aggregation is an abstraction of a relationship between objects. Aggregation allows details concerning the objects themselves to be ignored when the relationship is being analyzed. Generalization abstracts the properties of objects within the database. A generalization is an abstraction in which a set of similar objects is regarded as a collection of instances of a generic object. That is, the differences between individual entities are ignored and their common properties are identified and used to classify the individual objects as a single, named, generic object. By explicitly naming generic objects, it is possible to identify generic operators for the generic objects, to specify the attributes of generic objects, and to specify the relationships between generic objects. The properties of each generic type may be formalized by a set of invariant properties. These properties should be satisfied by all relations in the database and should remain invariant following operations on the database. The concerns of generalization and abstraction are also used by Smith and Smith to support different user views of the database.
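The role of an aggregation and its invariant can be illustrated with a small Python sketch. The entity names (`Employee`, `Project`, `Assignment`) and the particular invariant are hypothetical examples, not drawn from Smith and Smith. The relationship is treated as a single named object built from foreign keys, and every operation on the database is required to leave the invariant true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    emp_no: int
    name: str


@dataclass(frozen=True)
class Project:
    proj_no: int
    title: str


@dataclass(frozen=True)
class Assignment:
    """Aggregation: a relationship treated as one named object."""
    emp_no: int   # foreign key into the employees
    proj_no: int  # foreign key into the projects


def invariant_holds(employees, projects, assignments):
    """Every assignment must refer to an existing employee and project."""
    emp_keys = {e.emp_no for e in employees}
    proj_keys = {p.proj_no for p in projects}
    return all(
        a.emp_no in emp_keys and a.proj_no in proj_keys for a in assignments
    )


def assign(employees, projects, assignments, emp_no, proj_no):
    """Operation on the database that refuses to break the invariant."""
    candidate = assignments + [Assignment(emp_no, proj_no)]
    if not invariant_holds(employees, projects, candidate):
        raise ValueError("assignment refers to an unknown employee or project")
    return candidate
```

Code that works with `Assignment` objects needs only the two keys and can ignore the other attributes of the employee and the project.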
The objective of this paper is to study the use of programming languages which support abstract data types as a tool for the implementation of database management systems. In our approach abstract data types are used both as an implementation technique similar to the approach of Minsky and also as a database modeling technique incorporating the concepts of aggregation and generalization developed by Smith and Smith.

In this paper the design and implementation of a network model database management system will be examined. Date [Dat76] and Tsichritzis [Tsi75] [Tsi76] have independently developed programming languages which support the coexistence of the three major data models. These languages are based on a system similar to the network data model. Since the network model can be used to implement the relational model and the hierarchical model, solving the problems of implementation for the network data model will result in a solution which can be generalized to include all three data models.
Section 2 describes implementation techniques in current database systems. The generic procedure model is introduced in Section 3 as a technique for describing database management system implementation techniques. This model is used to describe the binding of the actual parameter descriptors to the data manipulation routines. In Section 4 this model is modified to represent the schema and subschema as shared abstract data types. Section 5 analyzes the performance enhancement provided by utilizing compile-time binding rather than run-time binding through interpretation.

There are several major advantages to representing the schema and subschema as shared abstract objects. First, by using abstract data types to structure the design of a database management system, the resulting software should be more reliable [KRS75] [Kos76] [Lin76] [Wor77]. By using a programming language which supports specification and verification of abstract data types, we can verify data independence at compile-time. As described by Brodie and Tsichritzis [BT77], if specification techniques are employed for defining the schema and subschema, then the user environment presented by the subschema can be guaranteed to remain constant even though the schema and subschema are modified as the database changes. In addition, it is shown in [Bar78] that abstract data types and programming language support for environment control allow the programming language to support data independence and subschema functions directly.

Another significant advantage of our approach will be an improvement in the performance of a database management system which is implemented using abstract data types. This is possible because application of abstract data types to a database management system permits the elimination of run-time interpretation of the schema and subschema used in systems such as IBM's IMS [McG77], UNIVAC's DMS1100 [SPE75a] [SPE75b] and INGRES [SWKH76]. Instead, data abstractions which represent the logical structure of the database are bound at load-time to the user's program. In addition, the data manipulation routines are implemented by parameterized calls to the procedures bound to the abstract data types which were used to represent the logical structure of the database. In this way we can avoid run-time interpretation of the schema and subschema without suffering any loss of data independence.
2. AN ANALYSIS OF CURRENT IMPLEMENTATION TECHNIQUES

One of the major design decisions in developing a database management system is the determination of when the data manipulation routines bind the data descriptor from the schema and subschema to their actual parameters. In general, the longer binding can be delayed, the easier data independence is to support.
Larson [Lar78] analyzes this binding and considers both when and how the schema information is bound to the data manipulation routines. The mappings, or transformations of the representations of information, which occur in the development and execution of a user program are shown in Figure 1. The solid lines represent mappings and the dashed lines represent algorithm-data interactions. The mapping M1 occurs during the development of an algorithm to solve the problem. M3 is the process of compiling a source program into executable form. M2 and M4 are very important in the database environment. M2 represents the process of mapping the real-world structures and relationships of the user's application onto the data structures and relationships supported by the data model. This mapping is performed by the schema and subschema which define the mapping of the user's view of the database in a form relevant to his application onto the data model. M4 represents the binding of the description of the database structure described in the schema and subschema to the data manipulation routines.

[Figure 1. Mappings of the View of Data]

Larson describes four techniques for how and when the mapping M4 occurs.
The first approach is the macro approach. The compiler translates the user's data manipulation routine calls directly into executable code by using information in the schema and subschema. Using techniques similar to macro expansion, the compiler replaces the user's data manipulation routine call with code to perform the desired operation. The resulting object module binds all information about access strategies declared in the schema and subschema to the data manipulation routine calls and the object module thus directly accesses the database at run-time.

This approach has the traditional advantages of compilation, that is, execution efficiency and the facility to utilize multiple programming languages and library routines. An apparent advantage in the database management system application is that no explicit space is required for the object schema and the object subschema. This advantage is perhaps illusory, since in the macro approach the information from the schema is still present but is now distributed in the code, rather than being isolated into an encoded representation. However, the generated code is optimized for space since it only contains procedures explicitly referenced in the schema.
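The macro approach can be illustrated by a toy expansion step, sketched here in Python. The schema layout, the `expand_find` helper, and the generated routine are hypothetical. The sketch shows only that the schema information ends up written into the generated code itself, so that a later change to the schema has no effect until the code is regenerated.

```python
# Toy schema; the layout and the access strategy names are illustrative.
SCHEMA = {"EMPLOYEE": {"key": "emp_no", "access": "hashed"}}


def expand_find(record_type):
    """Return source code for a FIND specialised to one record type."""
    entry = SCHEMA[record_type]
    return (
        f"def find_{record_type.lower()}(storage, key_value):\n"
        f"    # access strategy '{entry['access']}' on key '{entry['key']}'\n"
        f"    return storage[{record_type!r}].get(key_value)\n"
    )


# "Compile time": expand the call and turn the text into a callable.
source = expand_find("EMPLOYEE")
namespace = {}
exec(source, namespace)
find_employee = namespace["find_employee"]
```

After this step the schema is no longer consulted: `find_employee` carries the record type name and access strategy in its own text, which mirrors both the efficiency and the regeneration cost described above.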
These advantages are offset by several significant disadvantages. The first is that multiple users imply multiple object modules. Each of these object modules contains code to manage concurrent access to the database. This decentralization of concurrency control makes control of concurrent access more difficult than if a single resource manager were controlling access to the database.

Another disadvantage of this approach is its potential for loss of data independence. Any modification of the schema or subschema will require all object modules using the database to be regenerated. More importantly, after combining the schema and subschema with the user program it is difficult to guarantee that the user program is isolated from knowledge of the access methods declared in the schema. Coupled with these problems is the fact that few programming languages have the ability to define structures as complex as those that exist within a database management system.

Rather than binding all information from the schema and subschema to the user program at compile time, data independence suggests that this binding be delayed.
In the library approach binding of the schema and subschema to the data manipulation routines occurs partially at compile-time and partially at run-time. The user's data manipulation routine calls are translated into calls to a library of procedures. This library is then bound to the user program at load-time. This approach is similar to an I/O library in which language operations such as READ, GET, PUT and WRITE are translated into calls to routines in an I/O library. Parameters required by the library procedures, such as file name, record length, block length, device on which the file is located, etc., are supplied from the user program, from the job control language, or from file control blocks.

A mechanism which is similar to the library approach is utilized in Burroughs' DMSII [Bur75]. DMSII uses two phases of compilation. During the first phase the data manipulation routines, the schema, and the subschema are compiled to produce an object module which is a library, or collection, of procedures called the access methods. In the second phase, the user program is compiled. The schema and subschema are accessed during compilation to perform type checking on the entities declared in the schema and subschema. Binding of the user environment is completed at load-time by binding the access methods to the user program. By separating the access methods from the user program, it is possible to modify the access methods without affecting the user program.
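The two-phase structure of the library approach can be sketched as follows. This Python fragment is an illustration only and does not reproduce DMSII; the function names and the schema layout are assumptions. The first phase builds a library of access methods from the schema, the user program refers to those methods only by name, and the two are bound together afterwards.

```python
def build_access_methods(schema):
    """Phase 1: compile the schema into a library of access methods."""
    library = {}
    for record_type, entry in schema.items():
        key = entry["key"]

        # The default argument fixes the key field for this record type.
        def find(records, key_value, _key=key):
            return next((r for r in records if r[_key] == key_value), None)

        library[f"find_{record_type.lower()}"] = find
    return library


def user_program(access_methods, records):
    """Phase 2: the user program names a routine, not its implementation."""
    return access_methods["find_employee"](records, 2)


# "Load time": the library is bound to the user program when it is run.
library = build_access_methods({"EMPLOYEE": {"key": "emp_no"}})
```

Because `user_program` sees only the name `find_employee`, a rebuilt library with a different search strategy can be substituted without touching the user program.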
The most frequently-used implementation technique is to postpone binding the schema and subschema to the data manipulation routines until run-time. This is the interpretive approach and variations of it are used in IMS, DMS1100, INGRES, etc. The schema and subschema are encoded into an internal form, referred to as the object schema and the object subschema. Using the record types and set types referenced as actual parameters in a data manipulation routine call, the object schema and object subschema are accessed to retrieve the appropriate record type and set type descriptors. These descriptors are then interpreted to perform the data manipulation language command.
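The interpretive approach can be sketched as a routine that consults the encoded schema and subschema on every call. The dictionaries below are hypothetical stand-ins for the object schema and object subschema, written in Python for illustration only. The lookup inside `find` is the per-call cost of interpretation.

```python
# Hypothetical encoded forms of the schema and subschema.
OBJECT_SCHEMA = {"EMPLOYEE": {"key": "emp_no", "fields": ["emp_no", "name"]}}
OBJECT_SUBSCHEMA = {"EMPLOYEE": ["name"]}  # fields visible to the application


def find(storage, record_type, key_value):
    """Interpretive FIND: descriptors are fetched and interpreted per call."""
    descriptor = OBJECT_SCHEMA[record_type]    # run-time lookup
    visible = OBJECT_SUBSCHEMA[record_type]    # run-time lookup
    for record in storage.get(record_type, []):
        if record[descriptor["key"]] == key_value:
            # The subschema restricts what the application may see.
            return {field: record[field] for field in visible}
    return None
```

A change to `OBJECT_SUBSCHEMA` takes effect on the very next call with no recompilation, which is the data independence gained by binding this late.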
The interpretive approach has some significant disadvantages. Interpretation will require increased processing time to interpret the object schema and the object subschema. If the object schema and subschema are stored on secondary storage to facilitate their being shared by all programs accessing the database, then increased