a077101.pdf - F/G 5/2 WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER THE DESIGN AND IMPLEMENTATION OF A DATABASE MANAGEMENT SYSTEM U--ETC(U

F/G 5/2
WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER
THE DESIGN AND IMPLEMENTATION OF A DATABASE MANAGEMENT SYSTEM U--ETC(U)
DAAG29-75-C-0024
JUN 79, A J BAROODY, D J DEWITT
MRC-TSR-1970, AD-A077 101, UNCLASSIFIED

MRC Technical Summary Report #1970

THE DESIGN AND IMPLEMENTATION OF A DATA BASE MANAGEMENT SYSTEM USING ABSTRACT DATA TYPES

A. James Baroody, Jr. and David J. DeWitt

Mathematics Research Center
University of Wisconsin-Madison
Madison, Wisconsin 53706

Approved for public release; distribution unlimited.

Sponsored by
U. S. Army Research Office, P. O. Box 12211, Research Triangle Park, North Carolina 27709
National Science Foundation, Washington, D.C. 20550

Technical Summary Report #1970, June 1979

ABSTRACT

The design, implementation, and performance analysis of a database management system implemented using abstract data types are presented. The use of abstract data types as an implementation tool is shown to have several significant advantages over current implementation techniques. First, by using a combination of abstract data types and generic procedures to structure the design of a database system, the resulting software will be more reliable. Second, by employing a programming language which supports specification and verification of abstract data types, we can guarantee data independence.
Finally, we demonstrate that the application of abstract data types permits elimination of run-time interpretation of the schema and subschema used in systems such as IBM's IMS, Univac's DMS1100, and INGRES. Instead, the data manipulation routines, which are shown to be examples of generic procedures, are implemented as parameterized calls to the procedures bound to the instances of the three abstract data types used to represent the logical structure of the database.

AMS (MOS) Subject Classification: 68A50

Key Words: Database management systems, abstract data types, generic procedures, CODASYL, query execution

Work Unit No. [illegible] - Computer Science

This research was partially sponsored by the National Science Foundation under grant MCS78-01721 and the United States Army under contract No. DAAG29-75-C-0024.

[Descriptive summary: largely illegible in the scan. The legible fragments refer to abstract data types as a feature of modern programming languages, guaranteed data independence, and more reliable software compared to conventional implementation techniques.]

The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the authors of this report.
1. INTRODUCTION

Data independence requires that user programs be isolated from the details concerning the underlying physical and logical structures used to implement the database. Data independence is achieved by presenting to the user an abstraction of the database in which details of its actual implementation are hidden. The development of tools to support abstraction is an active area in current research on programming methodology and programming languages. This report analyzes the use of abstract data types as an implementation tool for database management systems.

The data model defines a collection of logical structures which are available to the users of a database management system. In order to utilize these structures for applications, the structure of the actual database must be defined in terms of the structures in the data model. The definition of the database is termed the schema, or data model definition. The schema defines the types of entities which may exist within the database and also defines the relationships which may exist between these entities. The schema thus is a definition of the logical structure of a database and is shared by all users of the database. A subschema, or data submodel definition, is the definition of the subset of the database to be used by an application. The subschema is a restriction of the schema and defines an application-dependent window into the database.

The data manipulation routines handle all accesses to the database. A request from a user to access the database is in terms of a procedure call to a data manipulation routine. This procedure uses the schema and subschema to determine the necessary operations to be performed on the database. In the network data model the actual parameters to the data manipulation routines are records and sets. In our approach the record and set are considered to be generic types.
The data manipulation routines utilize the information in the schema and subschema to describe the actual parameters in order to determine the function to be performed. Thus the data manipulation routines are examples of generic procedures.

Abstraction arises in the design of data structures from the need to recognize the similarity between objects and to concentrate on those properties that are shared by many objects while ignoring the differences between them. The use of abstraction in the development of data structures was examined by Hoare [Hoa72]. Hoare applied abstraction to sets of objects to create discriminated unions. A discriminated union is a set defined to be the union of two or more previously known sets. Since two sets may have members of different component types, a discriminated union provides a method to distinguish the type of the member by means of a tag associated with each member.

Minsky [Min76] utilizes abstract data types to extend and model the file concept used in traditional data processing applications. A file is viewed as an abstract data type which possesses as one of its attributes the record space of the file in secondary storage. It also contains an attribute, the global space, which contains information describing the characteristics and status of the file as a whole. Bound to these data attributes are procedures which manipulate the record space and global space.

One of the most significant works on abstraction and its relationship to modeling the information in a database is the work of Smith and Smith on relational databases [SS77a] [SS77b].
This work formalizes the use of foreign keys described by Codd [Cod70] as an aggregation. An aggregation is an abstraction of a relationship between objects. Aggregation allows details concerning the objects themselves to be ignored when the relationship is being analyzed. Generalization abstracts the properties of objects within the database. A generalization is an abstraction in which a set of similar objects is regarded as a collection of instances of a generic object. That is, the differences between individual entities are ignored and their common properties are identified and used to classify the individual objects as a single, named, generic object. By explicitly naming generic objects, it is possible to identify generic operators for the generic objects, to specify the attributes of generic objects, and to specify the relationships between generic objects. The properties of each generic type may be formalized by a set of invariant properties. These properties should be satisfied by all relations in the database and should remain invariant following operations on the database. The concepts of generalization and abstraction are also used by Smith and Smith to support different user views of the database.

The objective of this paper is to study the use of programming languages which support abstract data types as a tool for the implementation of database management systems. In our approach abstract data types are used both as an implementation technique, similar to the approach of Minsky, and also as a database modeling technique incorporating the concepts of aggregation and generalization developed by Smith and Smith. In this paper the design and implementation of a network model database management system will be examined. Date [Dat76] and Tsichritzis [Tsi75] [Tsi76] have independently developed programming languages which support the coexistence of the three major data models.
These languages are based on a system similar to the network data model. Since the network model can be used to implement the relational model and the hierarchical model, solving the problems of implementation for the network data model will result in a solution which can be generalized to include all three data models.

Section 2 describes implementation techniques in current database systems. The generic procedure model is introduced in Section 3 as a technique for describing database management system implementation techniques. This model is used to describe the binding of the actual parameter descriptors to the data manipulation routines. In Section 4 this model is modified to represent the schema and subschema as shared abstract data types. A later section analyzes the performance enhancement provided by utilizing compile-time binding rather than run-time binding through interpretation.

There are several major advantages to representing the schema and subschema as shared abstract objects. First, by using abstract data types to structure the design of a database management system, the resulting software should be more reliable [KRS75] [Kos76] [Lin76] [Wor77]. By using a programming language which supports specification and verification of abstract data types, we can verify data independence at compile-time. As described by Brodie and Tsichritzis [BT77], if specification techniques are employed for defining the schema and subschema, then the user environment presented by the subschema can be guaranteed to remain constant even though the schema and subschema are modified as the database changes. In addition, it is shown in [Bar78] that abstract data types and programming language support for environment control allow the programming language to support data independence and subschema functions directly.
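As an illustration of the idea that a subschema is a restriction of the schema and can be checked before any user program runs, the following is a minimal Python sketch. It is not the system described in the report: the class names `RecordType`, `Schema`, and `Subschema`, the `project` method, and the EMPLOYEE example are all hypothetical, and Python performs the check when the view is constructed rather than at compile-time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    """A record type declared in the schema: a name plus its field names."""
    name: str
    fields: tuple


class Schema:
    """Logical structure of the database, shared by all users (hypothetical)."""

    def __init__(self, record_types):
        self._types = {rt.name: rt for rt in record_types}

    def record_type(self, name):
        return self._types[name]


class Subschema:
    """An application-dependent window: a restriction of the schema."""

    def __init__(self, schema, visible):
        # visible maps a record type name to the fields the application may see.
        self._view = {}
        for name, fields in visible.items():
            declared = schema.record_type(name)  # KeyError if not in the schema
            unknown = set(fields) - set(declared.fields)
            if unknown:
                raise ValueError(f"{name}: fields not in schema: {sorted(unknown)}")
            self._view[name] = tuple(fields)

    def project(self, name, stored_record):
        """Return only the fields of a stored record that this view exposes."""
        return {f: stored_record[f] for f in self._view[name]}


# Usage: the payroll-free view never sees the salary field.
schema = Schema([RecordType("EMPLOYEE", ("id", "name", "salary"))])
directory_view = Subschema(schema, {"EMPLOYEE": ("id", "name")})
row = {"id": 7, "name": "Ada", "salary": 100}
visible_row = directory_view.project("EMPLOYEE", row)  # {'id': 7, 'name': 'Ada'}
```

Because the consistency check happens once, when the view is defined, a user program written against `directory_view` depends only on the fields the view names. Adding a field to the stored record type would not change what that program sees, which is the sense of data independence discussed above.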
Another significant advantage of our approach will be an improvement in the performance of a database management system which is implemented using abstract data types. This is possible because application of abstract data types to a database management system permits the elimination of run-time interpretation of the schema and subschema used in systems such as IBM's IMS [McG77], UNIVAC's DMS1100 [SPE75a] [SPE75b], and INGRES [SWKH76]. Instead, data abstractions which represent the logical structure of the database are bound at load-time to the user's program. In addition, the data manipulation routines are implemented by parameterized calls to the procedures bound to the abstract data types which were used to represent the logical structure of the database. In this way we can avoid run-time interpretation of the schema and subschema without suffering any loss of data independence.

2. AN ANALYSIS OF CURRENT IMPLEMENTATION TECHNIQUES

One of the major design decisions in developing a database management system is the determination of when the data manipulation routines bind the data descriptors from the schema and subschema to their actual parameters. In general, the longer binding can be delayed, the easier data independence is to support. Larson [Lar78] analyzes this binding and considers both when and how the schema information is bound to the data manipulation routines.

The mappings, or transformations of the representations of information, which occur in the development and execution of a user program are shown in Figure 1. The solid lines represent mappings and the dashed lines represent algorithm-data interactions. The mapping M1 occurs during the development of an algorithm to solve the problem. M2 is the process of compiling a source program into executable form. M3 and M4 are very important in the database environment.
M3 represents the process of mapping the real-world structures and relationships of the user's application onto the data structures and relationships supported by the data model. This mapping is performed by the schema and subschema, which define the mapping of the user's view of the database, in a form relevant to his application, onto the data model. M4 represents the binding of the description of the database structure described in the schema and subschema to the data manipulation routines.

[Figure 1: Mappings of the View of Data. Boxes: programmer's view of problem, source program, compiled program, programmer's view of data, program's view of data, physical view of data. Mappings: M1, M2, M3, M4.]

Larson describes four techniques for how and when the mapping M4 occurs. The first approach is the macro approach. The compiler translates the user's data manipulation routine calls directly into executable code by using information in the schema and subschema. Using techniques similar to macro expansion, the compiler replaces the user's data manipulation routine call with code to perform the desired operation. The resulting object module binds all information about access strategies declared in the schema and subschema to the data manipulation routine calls, and the object module thus directly accesses the database at run-time. This approach has the traditional advantages of compilation, that is, execution efficiency and the facility to utilize multiple programming languages and library routines. An apparent advantage in the database management system application is that no explicit space is required for the object schema and the object subschema.
This advantage is perhaps illusory, since in the macro approach the information from the schema is still present but is now distributed in the code, rather than being isolated into an encoded representation. However, the generated code is optimized for space since it only contains procedures explicitly referenced in the schema.

These advantages are offset by several significant disadvantages. The first is that multiple users imply multiple object modules. Each of these object modules contains code to manage concurrent access to the database. This decentralization of concurrency control makes control of concurrent access more difficult than if a single resource manager were controlling access to the database. Another disadvantage of this approach is its potential for loss of data independence. Any modification of the schema or subschema will require all object modules using the database to be regenerated. More importantly, after combining the schema and subschema with the user program it is difficult to guarantee that the user program is isolated from knowledge of the access methods declared in the schema and subschema. Coupled with these problems is the fact that few languages have the capability to define structures as complex as those that exist within a database management system.

Rather than binding all information from the schema and subschema to the user program at compile-time, data independence suggests that this binding be delayed. In the library approach binding of the schema and subschema to the data manipulation routines occurs partially at compile-time and partially at run-time. The user's data
manipulation routine calls are translated into calls to a library of procedures. This library is then bound to the user program at load-time. This approach is similar to an I/O library in which language operations such as READ, GET, PUT, and WRITE are translated into calls to routines in an I/O library. Parameters required by the library procedures, such as file name, record length, block length, device on which the file is located, etc., are supplied from the user program, from the job control language, or from file control blocks.

A mechanism which is similar to the library approach is utilized in Burroughs' DMSII [BUR75]. DMSII uses two phases of compilation. During the first phase the data manipulation routines, the schema, and the subschema are compiled to produce an object module which is a library, or collection, of procedures called the access methods. In the second phase, the user program is compiled. The schema and subschema are accessed during compilation to perform type checking on the entities declared in the schema and subschema. Binding of the user environment is completed at load-time by binding the access methods to the user program. By separating the access methods from the user program, it is possible to modify the access methods without affecting the user program.

The most frequently used implementation technique is to postpone binding the schema and subschema to the data manipulation routines until run-time. This is the interpretive approach, and variations of it are used in IMS, DMS1100, INGRES, etc. The schema and subschema are encoded into an internal form, referred to as the object schema and the object subschema.
Using the record types and set types referenced as actual parameters in a data manipulation routine call, the object schema and object subschema are accessed to retrieve the appropriate record type and set type descriptors. These descriptors are then interpreted to perform the data manipulation language command.

The interpretive approach has some significant disadvantages. Interpretation will require increased processing time to interpret the object schema and the object subschema. If the object schema and subschema are stored on secondary storage to facilitate their being shared by all programs accessing the database, then increased I/O ...
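The difference between interpreting descriptors on every call and binding them earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `OBJECT_SCHEMA`, `get_interpreted`, `bind_getter`, and the EMPLOYEE layout are hypothetical names invented for the example, and the stored record is modeled as a plain tuple rather than anything a real system would use.

```python
# Hypothetical object schema: record type name -> descriptor
# (each descriptor maps a field name to its position in the stored tuple).
OBJECT_SCHEMA = {
    "EMPLOYEE": {"id": 0, "name": 1, "salary": 2},
}


def get_interpreted(record_type, stored, field):
    """Interpretive approach: fetch and interpret the descriptor on every call."""
    descriptor = OBJECT_SCHEMA[record_type]  # lookup repeated for each access
    return stored[descriptor[field]]


def bind_getter(record_type, field):
    """Earlier binding: resolve the descriptor once, return a specialized accessor."""
    offset = OBJECT_SCHEMA[record_type][field]  # resolved at bind time only

    def getter(stored):
        return stored[offset]

    return getter


# Usage: both paths return the same value for the same stored record.
employee = (7, "Ada", 100)
name_via_interpreter = get_interpreted("EMPLOYEE", employee, "name")
get_name = bind_getter("EMPLOYEE", "name")
name_via_binding = get_name(employee)
```

The bound accessor skips the descriptor lookup on each call, but it also keeps using the offset that was current when it was bound. That is the trade-off noted earlier in this section: the longer binding is delayed, the easier data independence is to support, at the cost of repeated interpretation.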