Chapter 4 – Reference Book: Principles of Distributed...


Reference book: Principles of Distributed Database Systems
Chapters:
4. Distributed DBMS Architecture
5. Distributed Database Design
7.5 Layers of Query Processing
Preethi Vishwanath
Week 2: 5th September 2006 – 12th September 2006

ANSI/SPARC Architecture
– External view: that of the user, who might be a programmer; it is concerned with how users view the data.
– Conceptual view: that of the enterprise.
– Internal view: that of the system or machine; it deals with the physical definition and organization of data.
[Figure: Users, External View, Conceptual View, Internal View]

Possible Ways to Put Together Multiple Databases
Autonomy of local systems
– Refers to the distribution of control.
– Indicates the degree of independence of the individual databases.
Alternatives to autonomy
– Tight integration: a single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases.
– Semiautonomous systems: consist of DBMSs that can operate independently but have decided to participate in a federation.
– Total isolation: stand-alone DBMSs.
Distribution
– Deals with the physical distribution of data over multiple sites.
– Three alternative architectures are available:
  Client-server: communication duties are shared between the client machines and the servers.
  Peer-to-peer systems: there is no distinction between client machines and servers.
  Non-distributed systems.
Heterogeneity
– Occurs in various forms.
– Data models: data is represented with different modeling tools.
– Query languages: heterogeneity involves not only completely different data access paradigms in different data models, but also differences in languages, even when the individual systems use the same data model.
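A minimal Python sketch of the three ANSI/SPARC levels, assuming a toy in-memory store. The `Emp(Id, Name, Sal, Dept)` relation is borrowed from the fragmentation examples in these notes; the storage layout and the function names are hypothetical and only illustrate which level sees what.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emp:
    """Conceptual level: the enterprise-wide relation Emp(Id, Name, Sal, Dept)."""
    id: int
    name: str
    sal: int
    dept: str


# Internal level (hypothetical layout): records stored by physical slot number.
internal_storage = {
    0: (100, "A", 10_000, "D1"),
    1: (200, "B", 20_000, "D2"),
    2: (300, "C", 30_000, "D3"),
}


def conceptual_view(storage):
    """Map stored records (internal level) to Emp tuples (conceptual level)."""
    return [Emp(*record) for _, record in sorted(storage.items())]


def external_view_directory(storage):
    """External level: one user's view showing names and departments, hiding salaries."""
    return [(e.name, e.dept) for e in conceptual_view(storage)]


print(external_view_directory(internal_storage))
# [('A', 'D1'), ('B', 'D2'), ('C', 'D3')]
```

Because the external view is defined only in terms of the conceptual relation, a change to the storage layout would only touch `conceptual_view`; insulating users from such changes is the data independence the three-level split is meant to provide.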
Client-Server Architecture
– Distinguishes the functionality and divides these functions into two classes: server functions and client functions.
– The server does most of the data management work:
  query processing
  data management
  optimization
  transaction management, etc.
– The client performs:
  application
  user interface
  DBMS client module
Multiple client – multiple server
– Multiple servers accessed by multiple clients.
– Two alternative management strategies:
  1. Heavy client systems
     – Each client manages its own connection to the appropriate server.
     – Simplifies the server code.
     – Loads client machines with additional responsibilities.
  2. Light client systems
     – Each client knows of only its "home server", which then communicates with other servers as required.
     – Concentrates data management functionality at the servers.
Multiple client – single server
– A single server accessed by multiple clients.

Peer-to-Peer Distributed Systems
Schemas present:
– Individual internal schema definition at each site: the local internal schema.
– The enterprise view of the data is described by the global conceptual schema (GCS).
– The local organization of data at each site is described by the local conceptual schema.
– User applications and user access to the database are supported by external schemas.
Local conceptual schemas are mappings of the global schema onto each site. Databases are typically designed in a top-down fashion, and therefore all external view definitions are made globally.
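The heavy-client and light-client strategies differ mainly in who finds the server that holds the data. The sketch below is a hypothetical illustration: the `Server`, `HeavyClient` and `LightClient` classes, the server names and the relations are invented, and forwarding is simplified to a single hop from a home server to its peers.

```python
class Server:
    """A server holding some relations; peers are other servers it may forward to."""

    def __init__(self, name, relations, peers=None):
        self.name = name
        self.relations = relations          # relation name -> list of rows
        self.peers = list(peers or [])

    def query(self, relation):
        if relation in self.relations:
            return self.relations[relation]
        # Light-client strategy: the home server contacts other servers as required
        # (single hop only, a simplification for this sketch).
        for peer in self.peers:
            if relation in peer.relations:
                return peer.relations[relation]
        raise KeyError(relation)


class HeavyClient:
    """Heavy client: manages its own connections to every server it needs."""

    def __init__(self, servers):
        self.servers = servers

    def query(self, relation):
        for server in self.servers:
            if relation in server.relations:
                return server.query(relation)
        raise KeyError(relation)


class LightClient:
    """Light client: knows only its home server."""

    def __init__(self, home):
        self.home = home

    def query(self, relation):
        return self.home.query(relation)


s2 = Server("S2", {"PAY": [("D1", 10_000)]})
s1 = Server("S1", {"EMP": [(100, "A", "D1")]}, peers=[s2])

print(HeavyClient([s1, s2]).query("PAY"))  # [('D1', 10000)]
print(LightClient(s1).query("PAY"))        # [('D1', 10000)]
```

Both clients get the same answer; the difference is where the routing knowledge lives. The heavy client must know every server, while the light client's home server absorbs that work, matching the trade-off listed above (simpler server code versus data management concentrated at the servers).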
Major Components of a Peer-to-Peer System
– User processor
– Data processor

User processor
– User interface handler: responsible for interpreting user commands and formatting the result data.
– Semantic data controller: checks whether the user query can be processed.
– Global query optimizer and decomposer: determines an execution strategy and translates global queries into local ones.
– Distributed execution monitor: coordinates the distributed execution of the user request.
Data processor
– Local query optimizer: acts as the access path selector and is responsible for choosing the best access path.
– Local recovery manager: makes sure the local database remains consistent.
– Run-time support processor: the interface to the operating system; it contains the database buffer manager, which is responsible for maintaining the main memory buffers and managing data access.

MDBS Architecture
Models using a global conceptual schema
– The GCS is defined by integrating either the external schemas of the local autonomous databases or parts of their local conceptual schemas.
– Users of a local DBMS define their own views on the local database.
– If heterogeneity exists in the system, two implementation alternatives exist: unilingual and multilingual.
  Unilingual: requires users to utilize possibly different data models and languages when accessing a local database and the global database.
  Multilingual: the basic philosophy is to permit each user to access the global database through an external schema defined in the language of the user's local DBMS.
GCS in a multi-DBMS
– The mapping is from the local conceptual schemas to the global schema.
– Bottom-up design.
Models without a global conceptual schema
– Consist of two layers: the local system layer and the multidatabase layer.
– Local system layer: presents to the multidatabase layer the part of its local database that it is willing to share with users of other databases.
– System views are constructed above this layer.
– The responsibility for providing access to multiple databases is delegated to the mapping between the external schemas and the local conceptual schemas.
– Full-fledged DBMSs exist, each of which manages a different database.
GCS in a logically integrated distributed DBMS
– The mapping is from the global schema to the local conceptual schemas.
– Top-down procedure.

Global Directory Issues
The global directory is an extension of the normal directory that includes information about the location of the fragments as well as the makeup of the fragments.
– Relevant for a distributed DBMS or a multi-DBMS that uses a global conceptual schema.
– The directory is itself a database that contains metadata about the actual data stored in the database.
– Three issues:
  1. A directory may be either global to the entire database or local to each site.
  2. The directory may be maintained centrally at one site or in a distributed fashion over a number of sites. If the system is distributed, the directory is always distributed.
  3. Replication: there may be a single copy or multiple copies. Multiple copies provide more reliability.

Organization of Distributed Systems
Three orthogonal dimensions:
– Level of sharing
  No sharing: each application and its data execute at one site.
  Data sharing: all the programs are replicated at other sites, but not the data.
  Data-plus-program sharing: both data and programs can be shared.
– Behavior of access patterns
  Static: does not change over time; very easy to manage.
  Dynamic: most real-life applications are dynamic.
– Level of knowledge on access pattern behavior
  No information.
  Complete information: access patterns can reasonably be predicted, with no deviations from the predictions.
  Partial information: there are deviations from the predictions.

Top-Down Design
– Suitable for applications where the database needs to be built from scratch.
– The activity begins with requirements analysis.
– The requirements document is input to two parallel activities:
  View design: deals with defining the interfaces for end users.
  Conceptual design: the process by which the enterprise is examined. It can be further divided into two related activity groups:
    Entity analysis: concerned with determining the entities, their attributes, and the relationships between them.
    Functional analysis: concerned with determining the fundamental functions of the enterprise.
– The distributed design activity consists of two steps: fragmentation and allocation.

Bottom-Up Approach
– Suitable for applications where the database already exists.
– The starting point is the individual conceptual schemas.
– Exists primarily in the context of heterogeneous databases.

Fragmentation
Advantages
1. Permits a number of transactions to execute concurrently.
2. Results in the parallel execution of a single query, increasing the level of concurrency (referred to as intra-query concurrency).
3. Increased system throughput.
Disadvantages
1. Applications whose views are defined on more than one fragment may suffer performance degradation if the applications have conflicting requirements.
2. Simple tasks such as checking for dependencies may require chasing after data at a number of sites.
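Intra-query concurrency can be sketched as follows: the same selection runs on every fragment of a relation at once and the partial results are unioned. This is a minimal illustration under stated assumptions: threads stand in for sites, and a small dictionary plays the role of the global directory (fragment location and makeup), with Emp1 replicated to show how a second copy helps when a site is down. All fragment names, sites and rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical global directory: fragment -> relation, makeup, sites holding a copy.
GLOBAL_DIRECTORY = {
    "Emp1": {"relation": "Emp", "definition": "Sal <= 20K", "sites": ["site1", "site3"]},
    "Emp2": {"relation": "Emp", "definition": "Sal > 20K", "sites": ["site2"]},
}

# Hypothetical site storage: (site, fragment) -> rows (Id, Name, Sal, Dept).
SITE_DATA = {
    ("site1", "Emp1"): [(100, "A", 10_000, "D1"), (200, "B", 20_000, "D2")],
    ("site3", "Emp1"): [(100, "A", 10_000, "D1"), (200, "B", 20_000, "D2")],  # replica
    ("site2", "Emp2"): [(300, "C", 30_000, "D3")],
}


def locate(fragment, down_sites=frozenset()):
    """Pick the first available copy of a fragment listed in the directory."""
    for site in GLOBAL_DIRECTORY[fragment]["sites"]:
        if site not in down_sites:
            return site
    raise LookupError(f"no available copy of {fragment}")


def select_on_fragment(site, fragment, predicate):
    """Local selection executed at one site."""
    return [row for row in SITE_DATA[(site, fragment)] if predicate(row)]


def parallel_select(relation, predicate, down_sites=frozenset()):
    """Run one selection on every fragment of `relation` concurrently; union the results."""
    fragments = sorted(f for f, e in GLOBAL_DIRECTORY.items() if e["relation"] == relation)
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        futures = [pool.submit(select_on_fragment, locate(f, down_sites), f, predicate)
                   for f in fragments]
        result = []
        for fut in futures:  # gather in fragment order
            result.extend(fut.result())
    return result


rows = parallel_select("Emp", lambda r: r[3] in {"D1", "D3"}, down_sites={"site1"})
print(rows)  # [(100, 'A', 10000, 'D1'), (300, 'C', 30000, 'D3')]
```

With site1 marked as down, the directory lookup falls back to the replica of Emp1 at site3, so the query still covers both fragments.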
Horizontal and Vertical Fragmentation
Emp
Id | Name | Sal | Dept
100 | A | 10K | D1
200 | B | 20K | D2
300 | C | 30K | D3

Horizontal fragmentation (rows split on Sal > 20K):
Id | Name | Sal | Dept
100 | A | 10K | D1
200 | B | 20K | D2

Id | Name | Sal | Dept
300 | C | 30K | D3

Vertical fragmentation (columns split; primary key retained):
Id | Name
100 | A
200 | B
300 | C

Id | Sal | Dept
100 | 10K | D1
200 | 20K | D2
300 | 30K | D3

Correctness Rules of Fragmentation
– Completeness: if a relation instance $R$ is decomposed into fragments $R_1, R_2, \ldots, R_n$, each data item that can be found in $R$ can also be found in one or more of the $R_i$.
– Reconstruction: if a relation $R$ is decomposed into fragments $R_1, R_2, \ldots, R_n$, it should be possible to define a relational operator $\nabla$ such that $R = \nabla R_i, \ \forall R_i \in F_R$. Note that the operator differs for the different forms of fragmentation (union for horizontal fragmentation, join for vertical fragmentation).
– Disjointness: if a relation $R$ is horizontally decomposed into fragments $R_1, R_2, \ldots, R_n$ and data item $d_i$ is in $R_j$, then it is not in any other fragment $R_k$ ($k \neq j$).

Comparison of Replication Alternatives
                     | Full replication     | Partial replication | Partitioning
Query processing     | Easy                 | Same difficulty     | Same difficulty
Directory management | Easy or nonexistent  | Same difficulty     | Same difficulty
Concurrency control  | Moderate             | Difficult           | Easy
Reliability          | Very high            | High                | Low
Reality              | Possible application | Realistic           | Possible application

Derived Horizontal Fragmentation
– Defined on a member relation of a link according to a selection operation specified on its owner.
– The link between the owner and the member relations is defined as an equi-join.
– An equi-join can be implemented by means of semijoins.
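A semijoin keeps the tuples of one relation that have a match in another. The sketch below shows an equi-join computed as a semijoin followed by a join, and derived horizontal fragments obtained as semijoins of the member relation with the owner's primary fragments. The `Emp` and `PAY` rows follow the derived-fragmentation example in these notes (salaries written as integers), treating PAY as the owner and Emp as the member linked on `Dept`; the helper functions are hypothetical.

```python
# Member relation Emp(Id, Name, Dept) and owner relation PAY(Dept, Sal).
EMP = [
    {"Id": 100, "Name": "A", "Dept": "D1"},
    {"Id": 200, "Name": "B", "Dept": "D2"},
    {"Id": 300, "Name": "C", "Dept": "D3"},
]
PAY = [
    {"Dept": "D1", "Sal": 10_000},
    {"Dept": "D2", "Sal": 20_000},
    {"Dept": "D3", "Sal": 30_000},
]


def select(rel, pred):
    """sigma_F(rel): keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def semijoin(r, s, attr):
    """r semijoin s: tuples of r with a matching `attr` value in s (r's attributes only)."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


def equijoin(r, s, attr):
    """r join s on r.attr = s.attr, merging the matching tuples."""
    return [{**t, **u} for t in r for u in s if t[attr] == u[attr]]


# An equi-join via a semijoin: reduce r first, then join the reduced relation.
assert equijoin(EMP, PAY, "Dept") == equijoin(semijoin(EMP, PAY, "Dept"), PAY, "Dept")

# Derived horizontal fragmentation: Emp_i = Emp semijoin PAY_i, with PAY_i = sigma_Fi(PAY).
PAY1 = select(PAY, lambda t: t["Sal"] <= 20_000)
PAY2 = select(PAY, lambda t: t["Sal"] > 20_000)
EMP1 = semijoin(EMP, PAY1, "Dept")
EMP2 = semijoin(EMP, PAY2, "Dept")

print([t["Id"] for t in EMP1], [t["Id"] for t in EMP2])  # [100, 200] [300]
```

In a distributed setting the semijoin's appeal is that only the join-attribute values (here the `Dept` keys) need to be shipped to reduce the other relation before the full join.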
Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as
$R_i = R \ltimes S_i, \quad 1 \le i \le w$
where
– $S_i = \sigma_{F_i}(S)$,
– $w$ is the maximum number of fragments that will be defined on $R$, and
– $F_i$ is the formula that defines the primary horizontal fragment $S_i$.

Example: consider two tables, Emp and PAY, linked on Dept, with PAY as the owner and Emp as the member.
Emp
Id | Name | Dept
100 | A | D1
200 | B | D2
300 | C | D3

PAY
Dept | Sal
D1 | 10K
D2 | 20K
D3 | 30K

$PAY_1 = \sigma_{Sal \le 20K}(PAY)$
$PAY_2 = \sigma_{Sal > 20K}(PAY)$
$Emp_1 = Emp \ltimes PAY_1$
$Emp_2 = Emp \ltimes PAY_2$

Emp1
Id | Name | Dept
100 | A | D1
200 | B | D2

Emp2
Id | Name | Dept
300 | C | D3

Primary Horizontal Fragmentation
Primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema. Given relation $R$, its horizontal fragments are given by
$R_i = \sigma_{F_i}(R), \quad 1 \le i \le w$
where $F_i$ is the selection formula used to obtain fragment $R_i$.
The example mentioned in slide 20 can be represented using the above formula as
$Emp_1 = \sigma_{Sal \le 20K}(Emp)$
$Emp_2 = \sigma_{Sal > 20K}(Emp)$

Vertical Fragmentation
Grouping
– Starts by assigning each attribute to one fragment.
– At each step, joins some of the fragments until some criterion is satisfied.
– Results in overlapping fragments.
Splitting
– Starts with a relation and decides on a beneficial partitioning based on the access behavior of applications to the attributes.
– Fits more naturally within the top-down design.
– Generates non-overlapping fragments.

Hybrid Fragmentation
– In many cases, a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications.
– In such cases, a vertical fragmentation may be followed by a horizontal one, or vice versa.
– Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.
[Figure: hybrid fragmentation tree. R is split into R1 and R2, which are further split into R11, R12 and R21, R22, R23; R is reconstructed using union (∪) and join (⋈).]
– In the case of horizontal fragmentation, one has to stop when each fragment consists of only one tuple, whereas the termination point for vertical fragmentation is one attribute per fragment.
– The examples discussed in slides 20 and 26 can be converted into hybrid fragmentation.
...
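A hypothetical sketch of hybrid fragmentation on the `Emp(Id, Name, Sal, Dept)` table from the fragmentation examples: a vertical split that keeps the key `Id` in both parts, followed by horizontal splits, reusing the fragment names R1, R2, R11, R12, R21, R22, R23. The attribute grouping and the predicates are assumptions chosen for illustration. The assertions check the three correctness rules: completeness and disjointness of the horizontal pieces, and reconstruction (union for the horizontal step, join on `Id` for the vertical one).

```python
# Relations are modelled as sets of tuples, a simplification for this sketch.
EMP = {(100, "A", 10_000, "D1"), (200, "B", 20_000, "D2"), (300, "C", 30_000, "D3")}

# Vertical step (assumed grouping): R1(Id, Name, Sal) and R2(Id, Dept).
R1 = {(i, n, s) for (i, n, s, d) in EMP}
R2 = {(i, d) for (i, n, s, d) in EMP}

# Horizontal step on each vertical fragment (assumed predicates).
R11 = {t for t in R1 if t[2] <= 20_000}
R12 = {t for t in R1 if t[2] > 20_000}
R21 = {t for t in R2 if t[1] == "D1"}
R22 = {t for t in R2 if t[1] == "D2"}
R23 = {t for t in R2 if t[1] == "D3"}


def complete(whole, frags):
    """Completeness: every tuple of `whole` appears in at least one fragment."""
    return all(any(t in f for f in frags) for t in whole)


def disjoint(frags):
    """Disjointness (horizontal fragments): no tuple appears in two fragments."""
    return all(not (a & b) for i, a in enumerate(frags) for b in frags[i + 1:])


def reconstruct():
    """Undo the horizontal splits with union, then the vertical split with a join on Id."""
    r1 = R11 | R12
    r2 = R21 | R22 | R23
    return {(i, n, s, d) for (i, n, s) in r1 for (j, d) in r2 if i == j}


assert complete(R1, [R11, R12]) and disjoint([R11, R12])
assert complete(R2, [R21, R22, R23]) and disjoint([R21, R22, R23])
assert reconstruct() == EMP
print(sorted(reconstruct()))
```

Reconstruction runs in the reverse order of fragmentation: the last split applied (horizontal) is undone first with union, and the vertical split is undone last with a join, which is why the key `Id` must be retained in every vertical fragment.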