chapter12 - Reference Book Principles of Distributed...


Reference Book: Principles of Distributed Database Systems
Chapters:
– Chapter 12: Distributed DBMS Reliability
– Chapter 14: Distributed Object Database Management Systems
– Chapter 16: Current Issues
Preethi Vishwanath
Week 2: 12th September 2006 – 24th September 2006

Reliability concepts: definitions
– System: a mechanism that consists of a collection of components and interacts with its environment with a recognizable pattern of behavior.
– Subsystem: each component of a system is itself a system, commonly called a subsystem.
– Design: the way the components of a system are put together is called the design of the system.
– External state: the response that a system gives to an external stimulus.
– Specification: the behavior of the system in providing responses to all the possible stimuli from the environment needs to be laid out in an authoritative specification of its behavior.
– Failure: any deviation of a system from the behavior described in the specification.
– Erroneous states: some internal states could cause system failure; such internal states are called erroneous states.
– Fault: any error in the internal states of the components of a system or in the design of a system.
– Permanent fault (also called a hard fault): one that reflects an irreversible change in the behavior of the system.

Reliability
– Reliability R(t): the probability that the system under consideration does not experience any failures in a given time interval:
    $R(t) = \Pr\{0 \text{ failures in time } [0, t] \mid \text{no failures at } t = 0\}$
– Mean Time Between Failures (MTBF):
  – The expected time between subsequent failures in a system with repair.
  – Can be calculated either from empirical data or from the reliability function.
  – Is related to the failure rate:
    $\mathrm{MTBF} = \int_0^\infty R(t)\,dt$
– Availability A(t): the probability that the system is operational according to its specification at a given point in time t.
  – In the steady state, $A = \frac{\mu}{\lambda + \mu}$, where λ is the failure rate and µ is the repair rate.
– Mean Time To Repair (MTTR): the expected time to repair a failed system.
  – Is related to the repair rate.
– The steady-state availability of a system with exponential failure and repair rates can be specified as
    $A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$
  Since MTBF = 1/λ and MTTR = 1/µ in this case, this is the same quantity as µ/(λ + µ).

Reasons for failure
[Figure: reasons for failure from a study conducted at the Stanford Linear Accelerator, SESS switch data, and Tandem data; categories include environment, operations, hardware, software, maintenance, and unknown.]

Fault tolerance
– Refers to a system design approach which recognizes that faults will occur.

Fault prevention / fault intolerance
– Aims at ensuring that the implemented system will not contain any faults.
– Two aspects:
  – Fault avoidance: refers to the techniques used to make sure that faults are not introduced into the system. It involves detailed design methodologies such as design walkthroughs, design inspections, etc.
  – Fault removal: refers to the techniques that are employed to detect any faults that might have remained in the system despite the application of fault avoidance, and to remove these faults.

Fault detection
– Fault detection issues a warning when a failure occurs but does not provide any means of tolerating the failure.
– Latent failure: one that is detected some time after its occurrence.
– Mean time to detect: the average error latency time over a number of identical systems.
– Fail-stop modules: a fail-stop module constantly monitors itself and, when it detects a fault, shuts itself down automatically.
– Fail-fast: implemented in software by defensive programming, where each software module checks its own state during state transitions.
– Different ways of implementing process pairs:
  – Lock-step
  – Automatic checkpointing
  – State checkpointing
  – Delta checkpointing
  – Persistent process pairs

Failures in Distributed DBMS
Site (system) failures
– Always assumed to result in the loss of main memory contents.
– Total failure refers to the simultaneous failure of all sites in the distributed system.
– Partial failure indicates the failure of only some sites while the others remain operational.
Media failures
– Refer to the failures of the secondary storage devices that store the database.
– Duplexing of disk storage and maintaining archival copies of the database are common techniques for dealing with this sort of catastrophic problem.
Transaction failures
– Causes include incorrect input data and the detection of a present or potential deadlock.
– The usual approach in cases of transaction failure is to abort the transaction.
Communication failures
– Unique to the distributed case.
– The most common ones are errors in the messages, improperly ordered messages, lost messages, and line failures.
– The failure of the communication network to deliver messages and their confirmations within a specified period is termed a performance failure.

[Figure: interface between the local recovery manager and the buffer manager, showing the local recovery manager, the database buffer manager, the database buffers (volatile database), and the stable database.]

Recovery information
In-place update recovery information
– It is necessary to store information about database state changes in order to recover back to a consistent state.
– This information is recorded in the database log.
– REDO action: the log needs to include sufficient data to permit the redo, taking the old database state and recovering the new state.
– UNDO action: the log needs to include sufficient data to permit the undo, taking the new database state and recovering the old state.

Network partitioning
– Simple partitioning: the network is divided into only two components.
– Multiple partitioning: the network is divided into more than two components.

Centralized protocols
– Primary site: it makes sense to permit the operation of the partition that contains the primary site, since it manages the locks.
– Primary copy: more than one partition may be operational for different queries.

Voting-based protocols
– Transactions are executed if a majority of the sites vote to execute them.
– Quorum-based voting can be used as a replica control method, as well as a commit method to ensure transaction atomicity in the presence of network partitioning.
– In the case of non-replicated databases, this involves the integration of the voting principle with commit protocols.
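The majority and quorum-voting idea above can be sketched in a few lines of Python. This is a minimal sketch, assuming every site casts exactly one vote; the notes do not state the quorum conditions, so the rule used here (read quorum + write quorum > total votes, and 2 × write quorum > total votes) is the standard one from quorum consensus (Gifford's weighted voting), brought in as an assumption. The site names and quorum sizes are hypothetical example values.

```python
"""Sketch of majority / quorum voting when the network partitions.

Assumptions (not from the notes): every site casts exactly one vote, and
site names and quorum sizes are hypothetical example values.
"""


def majority(total_votes: int) -> int:
    # Smallest number of votes that is more than half of the total.
    return total_votes // 2 + 1


def valid_quorums(total_votes: int, read_q: int, write_q: int) -> bool:
    # r + w > V: every read quorum overlaps every write quorum.
    # 2w > V: two write quorums always overlap, so two partitions
    # can never both update the same data item.
    return read_q + write_q > total_votes and 2 * write_q > total_votes


def can_proceed(partition: set[str], quorum: int) -> bool:
    # A partition may execute the operation only if it holds enough votes.
    return len(partition) >= quorum


if __name__ == "__main__":
    sites = {"s1", "s2", "s3", "s4", "s5"}
    n = len(sites)
    r, w = 2, 4
    assert valid_quorums(n, r, w)

    # A simple partition: the network splits into two components.
    big, small = {"s1", "s2", "s3", "s4"}, {"s5"}
    for part in (big, small):
        print(sorted(part),
              "commit:", can_proceed(part, majority(n)),
              "write:", can_proceed(part, w),
              "read:", can_proceed(part, r))
```

Only the four-site partition can commit or write; the isolated site cannot even read, because its single vote is below the read quorum of 2.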
Out-of-place update recovery information
– Typical techniques:
  – Shadowing: every time an update is made, the old stable storage page, called the shadow page, is left intact, and a new page with the updated data item values is written into the stable database.
  – Differential files

Two-Phase Commit (2PC) Protocol
The two-phase commit protocol is a distributed algorithm that lets all sites in a distributed system agree to commit a transaction. The protocol results in either all nodes committing the transaction or all of them aborting it, even in the case of site failures and message losses.

Basic algorithm
Commit-request phase
1. The coordinator sends a query-to-commit message to all cohorts.
2. The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
3. Each cohort replies with an agreement message if the transaction succeeded, or an abort message if the transaction failed.
4. The coordinator waits until it has a message from each cohort.

Commit phase
– Success: if the coordinator received an agreement message from all cohorts during the commit-request phase:
  1. The coordinator writes a commit record into its log.
  2. The coordinator sends a commit message to all the cohorts.
  3. Each cohort completes the operation and releases all the locks and resources held during the transaction.
  4. Each cohort sends an acknowledgement to the coordinator.
  5. The coordinator completes the transaction when acknowledgements have been received.
– Failure: if any cohort sent an abort message during the commit-request phase:
  1. The coordinator sends a rollback message to all the cohorts.
  2. Each cohort undoes the transaction using the undo log and releases the resources and locks held during the transaction.
  3. Each cohort sends an acknowledgement to the coordinator.
  4. The coordinator completes the transaction when acknowledgements have been received.

Three-Phase Commit (3PC) Protocol
– Nonblocking when failures are restricted to site failures.
– A commit protocol that is synchronous within one state transition is nonblocking if and only if its state transition diagram satisfies both of the following:
  – There is no state that is “adjacent” to both a commit and an abort state.
  – There is no noncommittal state that is “adjacent” to a commit state.

Replication and Replica Control Protocols
– Having replicas of data items improves system availability.
– Advantages:
  – With careful design, it is possible to ensure that single points of failure are eliminated.
  – Overall system availability is maintained even when one or more sites fail.
– Disadvantages:
  – Whenever updates are introduced, the complexity of keeping replicas consistent arises; this is the topic of replication protocols.

Concepts
– Object
  – Represents a real entity in the system.
  – Represented as a pair (object identity, state).
  – Enables referential object sharing.
– State
  – Either an atomic value or a constructed value.
– Value (D is the set of atomic domains, A the set of attribute names, and I the set of object identifiers)
  – An element of D is a value, called an atomic value.
  – [a1:v1, …, an:vn], in which ai is an element of A and vi is either a value or an element of I, is called a tuple value.
  – {v1, …, vn}, in which vi is either a value or an element of I, is called a set value.
– Abstract data types
  – Template for all objects of that type.
  – Describes a type of data by providing a domain of data with the same structure, as well as the operations applicable to the objects of that domain.
  – This abstraction capability is commonly referred to as encapsulation.
– Composition (aggregation)
  – Restrictions on composite objects result in complex objects.
  – The composite object relationship between types can be represented by a composition graph.
– Collection
  – User-defined grouping of objects.
  – Similar to a class in that it groups objects.
– Class
  – Grouping of common objects.
  – Template for all common objects.
– Subtyping
  – Based on the specialization relationship among types.
– Inheritance
  – Declaring a type to be a subtype of another.

Object Distribution Design
– The main issue is to improve the performance of user queries and applications by reducing irrelevant data access.
– Path partitioning
  – A concept describing the clustering of all the objects forming a composite object into a partition.
  – Can be represented as a hierarchy of nodes forming a structural index.
  – The index contains the references to all the component objects of a composite object, eliminating the need to traverse the class composition hierarchy.
– Class partitioning algorithms
  – Affinity-based approach: affinity among instance variables and methods, and affinity among multiple methods, can be used for horizontal and vertical class partitioning.
  – Cost-driven approach
– Allocation
  – Local behavior, local object: the behavior, the object to which it is applied, and the arguments are all co-located. No special mechanism is needed to handle this case.
  – Local behavior, remote object: there are two ways to deal with this case:
    – Move the remote object to the site where the behavior is located.
    – Ship the behavior implementation to the site where the object is located.

Client-Server Architecture
[Figure: client-server object database.]

Cache consistency
– A problem in any data-shipping system that moves data to the clients.
– Cache consistency algorithms:
  – Avoidance-based synchronous algorithms
    – Clients retain read locks across transactions, but they relinquish write locks at the end of the transaction.
    – The clients send lock requests to the server and block until the server responds.
    – If a client requests a write lock on a page that is cached at other clients, the server calls back the page from those clients before granting the lock.
  – Avoidance-based asynchronous algorithms
    – Do not have the message blocking overhead present in synchronous algorithms.
    – Clients send lock escalation messages to the server and continue application processing.
  – Avoidance-based deferred algorithms
    – Clients batch their lock escalation requests and send them to the server at commit time.
    – The server blocks the updating client if other clients are reading the updated objects.
  – Detection-based synchronous algorithms
    – Clients contact the server whenever they access a page in their cache to ensure that the page is not stale or being written to by other clients.
  – Detection-based asynchronous algorithms
    – Clients send lock escalation requests to the server, but optimistically assume that their requests will be successful.
    – After a client transaction commits, the server propagates the updated pages to all the other clients that have also cached the affected pages.
  – Detection-based deferred algorithms
    – Can outperform callback locking algorithms, even while encountering a higher abort rate, if the client transaction state completely fits into the client cache and all application processing is strictly performed at the clients.

Object identifier management
– Object identifiers (OIDs) are system generated.
– Used to uniquely identify every object.
– Transient object identity can be implemented more efficiently.
– Two common solutions:
  – Physical identifier approach (POID)
    – Equates the OID with the physical address of the corresponding object.
    – Advantage: the object can be obtained directly from the OID.
    – Drawback: all the parent objects and indexes must be updated whenever an object is moved to a different page.
  – Logical identifier approach (LOID)
    – Consists of allocating a system-wide unique OID.
    – Since OIDs are invariant, there is no overhead due to object movement.

Object migration
– Three alternatives can be considered for the migration of classes (types):
  – The source code is moved and recompiled at the destination.
  – The compiled version of a class is migrated just like any other object.
  – The source code of the class definition is moved, but not its compiled operations, for which a lazy migration strategy is used.
– Objects can be in one of four states:
  – Ready: ready objects are not currently invoked, or have not received a message, but are ready to be invoked or to receive a message.
  – Active: active objects are currently involved in an activity in response to an invocation or a message.
  – Waiting: waiting objects have invoked another object and are waiting for a response.
  – Suspended: suspended objects are temporarily unavailable for invocation.
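The difference between physical and logical object identifiers described above comes down to one level of indirection. The sketch below models it with a hypothetical OID table; the (page, slot) address format and all class and function names are illustrative assumptions, not part of any particular object database.

```python
"""Sketch contrasting physical (POID) and logical (LOID) object identifiers.

Hypothetical model: an object's physical address is a (page, slot) tuple.
None of these names come from the notes; they only illustrate why moving
an object is cheap with logical OIDs.
"""

Address = tuple[int, int]  # (page number, slot within the page)


class LogicalOIDTable:
    """Indirection table: logical OID -> current physical address."""

    def __init__(self) -> None:
        self._table: dict[int, Address] = {}
        self._next_oid = 1

    def create(self, address: Address) -> int:
        # Allocate a system-wide unique OID that never changes.
        oid = self._next_oid
        self._next_oid += 1
        self._table[oid] = address
        return oid

    def locate(self, oid: int) -> Address:
        # One extra lookup compared with dereferencing a physical OID.
        return self._table[oid]

    def move(self, oid: int, new_address: Address) -> None:
        # Only the table entry changes; every reference holding `oid` stays valid.
        self._table[oid] = new_address


def references_to_fix_with_poid(old: Address, refs: dict[str, Address]) -> list[str]:
    # With physical OIDs, every parent object or index entry that stores the
    # old address must be rewritten when the object moves to another page.
    return [holder for holder, addr in refs.items() if addr == old]


if __name__ == "__main__":
    table = LogicalOIDTable()
    oid = table.create((3, 7))
    table.move(oid, (9, 1))
    print(oid, table.locate(oid))

    refs = {"parent_a": (3, 7), "index_x": (3, 7), "parent_b": (4, 2)}
    print(references_to_fix_with_poid((3, 7), refs))
```

With the logical table, the move touches one entry; with physical identifiers, both `parent_a` and `index_x` would have to be updated, which is the drawback the notes attribute to POIDs.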
Migration involves two steps:
– Shipping the object from the source to the destination, and
– Creating a proxy at the source, replacing the original object.

Object clustering
– Difficult for two reasons:
  – It is not orthogonal to the object identity implementation. Logical OIDs incur more overhead, but enable vertical partitioning of classes.
  – Clustering of complex objects along the composition relationship is more involved because of object sharing.
– Given a class graph, there are three basic storage models for object clustering:
  – The decomposition storage model partitions each object class into binary relations.
  – The normalized storage model stores each class as a separate relation.
  – The direct storage model enables multi-class clustering of complex objects based on the composition relationship.

Distributed garbage collection
– As programs modify objects and remove references, a persistent object may become unreachable from the persistent roots of the system when there is no more reference to it.
– Basic garbage collection algorithms can be categorized as reference counting or tracing-based.
  – Reference counting
    – Each object has an associated count of the references to it.
    – Each time a program creates an additional reference that points to an object, the object's count is incremented.
    – When a reference to an object is destroyed, the corresponding count is decremented.
  – Mark-and-sweep algorithms (tracing-based)
    – Two-phase algorithms.
    – The first phase, the mark phase, starts from the root and marks every reachable object.
    – Once all live objects are marked, the memory is examined and unmarked objects are reclaimed.
  – Copy-based algorithms (tracing-based)
    – Divide memory into two disjoint areas:
      – From-space: programs manipulate objects in this space.
      – To-space: left empty.
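The two phases of mark-and-sweep described above can be shown on a toy heap. This is a minimal single-site sketch, assuming the heap is modeled as a dictionary from object name to the names it references and that `roots` stands in for the persistent roots; a real distributed collector also has to handle references that cross sites, which is omitted here.

```python
"""Minimal mark-and-sweep sketch over an in-memory object graph.

Assumption: the heap is a dict from object name to the names it references,
and `roots` plays the role of the persistent roots. Cross-site references,
which a distributed collector must handle, are not modeled.
"""


def mark(heap: dict[str, list[str]], roots: list[str]) -> set[str]:
    # Phase 1: mark every object reachable from the roots (iterative DFS).
    marked: set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap.get(obj, []))
    return marked


def sweep(heap: dict[str, list[str]], marked: set[str]) -> list[str]:
    # Phase 2: examine the whole heap and reclaim every unmarked object.
    garbage = [obj for obj in heap if obj not in marked]
    for obj in garbage:
        del heap[obj]
    return garbage


if __name__ == "__main__":
    heap = {
        "root": ["a"],
        "a": ["b"],
        "b": [],
        "c": ["d"],  # c and d reference each other but are unreachable
        "d": ["c"],
    }
    reclaimed = sweep(heap, mark(heap, ["root"]))
    print(sorted(reclaimed))
    print(sorted(heap))
```

The `c`/`d` pair shows why tracing is used alongside reference counting: each keeps the other's count above zero, so reference counting alone would never reclaim them, while the mark phase simply never reaches them.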
Object Query Processing: important issues

Object query processor architectures
– Open OODB project: separation between the user query language parsing structures and the operator graph on which the optimizer operates.
– EPOQ project: an approach to query optimization extensibility, where the search space is divided into regions.
– TIGUKAT project: uses an object approach to query processing extensibility. It is an extensible uniform behavioral model characterized by purely behavioral semantics and a uniform approach to objects.

Query processing issues
– Search space and transformation rules
– Search algorithm
– Cost function: can be defined recursively based on the algebraic processing tree.
– Parameterization

Path expressions
– Rewriting and algebraic optimization
– Path indexes

Query execution
– Path indexes:
  1. Create an index on each class traversed.
  2. Define indexes on objects across their type inheritance.
  3. Access support relations: a data structure that stores selected path expressions.
– Set matching algorithms:
  1. Centralized algorithms
  2. Join execution algorithms

Data delivery alternatives
– Pull-only
  – Transfer of data from servers to clients is initiated by a client pull.
  – The arrival of new data items or updates to existing data items is carried out at a server without notification to clients, unless clients explicitly poll the server.
– Push-only
  – Transfer of data from servers to clients is initiated by a server push in the absence of any specific request from clients.
– Hybrid
  – Combines the client-pull and server-push mechanisms.

[Figure: architecture of a data warehouse, with query/analysis, reporting, and data mining queries over a target database, an integration step, and a metadata repository.]
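An access support relation, mentioned above under query execution, can be pictured as a precomputed table of path instances. The sketch below is a simplified view, assuming a hypothetical schema where `Employee.dept` references a `Department` and `Department.city` is a string; it keeps only the start and end of each path and does not show how the relation is maintained when objects are updated.

```python
"""Simplified sketch of an access support relation (ASR) for one path.

Hypothetical schema: Employee.dept -> Department, Department.city -> str.
The ASR materializes (city -> employee OIDs) for the path Employee.dept.city
so a query need not traverse the objects. Update maintenance is omitted.
"""
from collections import defaultdict

# Object store keyed by OID; each object is a dict of attribute values.
objects = {
    "e1": {"name": "Ann", "dept": "d1"},
    "e2": {"name": "Bo", "dept": "d2"},
    "e3": {"name": "Cy", "dept": "d1"},
    "d1": {"city": "Montreal"},
    "d2": {"city": "Edmonton"},
}
employees = ["e1", "e2", "e3"]


def traverse(oid: str, path: list[str]) -> str:
    # Navigational evaluation: follow each reference in turn.
    value = oid
    for attr in path:
        value = objects[value][attr]
    return value


def build_asr(start_oids: list[str], path: list[str]) -> dict[str, list[str]]:
    # Store the selected path instances once, keyed by the path's end value.
    asr: dict[str, list[str]] = defaultdict(list)
    for oid in start_oids:
        asr[traverse(oid, path)].append(oid)
    return dict(asr)


asr = build_asr(employees, ["dept", "city"])
# Query: employees whose department is located in Montreal.
print(asr.get("Montreal", []))
```

The query is answered by one dictionary lookup instead of traversing every employee and department object, which is the saving that path indexes aim for.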
Semi-structured data
– Examples include free and commercial databases of product information; interfaces to such sources are typically a collection of fill-out forms.
– Typically modeled as a labeled graph.
– Labeled graphs are self-describing and have no schema.
– The Object Exchange Model (OEM) is used to illustrate such a labeled graph. Each object has:
  – A label, which is the name of the object class.
  – A type, which is either atomic (integer, string, etc.) or set.
  – A value, which is either atomic or a set of objects.
  – An optional object identifier.

[Figure: a web server and a global data dictionary accessing several data sources, each through its own wrapper.]

Problems with the pull-based approach
– Users need to know a priori where and when to look for data.
– There is a mismatch between the asymmetric nature of some applications and the symmetric communications infrastructure of networks such as the Internet.
– Types of asymmetry:
  – Network asymmetry: the network bandwidth from client to server differs from that from server to client.
  – Distributed information systems: asymmetry due to the imbalance between the number of clients and the number of servers.
  – Data: the amount of data being transferred between client and server.
  – Data volatility.

Why push-based technologies?
– They are a response to some of the problems inherent in pull-based systems.

Algorithm: push-based approach
1. Order the data items from hottest to coldest.
2. Partition the data items into ranges of items, such that the items in each range have similar application access profiles. The number of ranges is denoted by num_ranges.
3. Choose the relative broadcast frequency for each range as an integer (rel_freq_i, where i is the range).
4. Divide each range into smaller elements, called chunks (C_ij is the j-th chunk of range i).
5. Determine the number of chunks into which range i is divided as num_chunks_i = max_chunks / rel_freq_i, where max_chunks is the least common multiple of rel_freq_i, ∀i.
6. Create the broadcast schedule by interleaving the chunks of each range using the following procedure:

    for i from 0 to max_chunks - 1 by 1 do
        for j from 1 to num_ranges by 1 do
            Broadcast chunk C_{j, (i mod num_chunks_j)}
        end-for
    end-for

Differences between pull-based and push-based systems affect:
– Cache replacement policies
– Prefetching mechanisms

An idealized algorithm for page replacement is one that determines the page with the smallest ratio between its probability of access and its frequency of broadcast. The PIX algorithm calculates this "cost" of replacing a page and replaces the least costly one. Since access probabilities are not known in practice, the LIX algorithm approximates PIX as follows:
1. When a page P_i is brought into the cache, it is inserted into its chain with $\mathit{Pr}_i = 0$ and $\mathit{LT}_i = \mathit{CurrentTime}$.
2. When P_i is accessed again, it is moved to the top of its own chain and the following calculations are made (HF is a constant weighting factor; the $\mathit{Pr}_i$ on the right is the previous estimate):
    $\mathit{Pr}_i = \frac{\mathit{HF}}{\mathit{CurrentTime} - \mathit{LT}_i} + (1 - \mathit{HF}) \cdot \mathit{Pr}_i, \qquad \mathit{LT}_i = \mathit{CurrentTime}$
3. If a page needs to be flushed out to open up space, a lix value is calculated for the pages at the bottom of each chain, and the page with the lowest lix value is flushed out. The lix value is calculated as
    $\mathit{lix}_i = \frac{\mathit{Pr}_i}{\mathit{rel\_freq}_i}$
   where rel_freq_i is the relative broadcast frequency of the range (disk) to which page P_i belongs.
...
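The broadcast-program steps and the lix-based replacement above fit together: the cache uses the same rel_freq values that shape the schedule. The sketch below is illustrative, not a reference implementation. It assumes each range is already ordered hottest to coldest and that every range's size is a multiple of its chunk count (real schedules would pad chunks); the value HF = 0.25 and all function, class, and parameter names are hypothetical choices for this example. It uses `math.lcm` with several arguments, which needs Python 3.9 or later.

```python
"""Sketch of broadcast-program generation plus LIX-style cache eviction.

Assumptions: ranges are lists of page IDs ordered hottest to coldest;
rel_freq values are positive integers; each range size is a multiple of its
chunk count. HF and all names here are illustrative, not a real library.
"""
from math import lcm


def broadcast_schedule(ranges: list[list[str]], rel_freq: list[int]) -> list[str]:
    max_chunks = lcm(*rel_freq)
    num_chunks = [max_chunks // f for f in rel_freq]
    chunks = []
    for items, n in zip(ranges, num_chunks):
        if len(items) % n:
            raise ValueError("range size must be a multiple of its chunk count")
        size = len(items) // n
        chunks.append([items[k * size:(k + 1) * size] for k in range(n)])
    schedule: list[str] = []
    for i in range(max_chunks):          # one minor cycle per value of i
        for j in range(len(ranges)):     # one chunk from every range
            schedule.extend(chunks[j][i % num_chunks[j]])
    return schedule


class LixCache:
    """One chain per broadcast range (disk); index 0 is the top of a chain."""

    def __init__(self, capacity: int, rel_freq: list[int], hf: float = 0.25):
        self.capacity = capacity
        self.rel_freq = rel_freq
        self.hf = hf  # weighting factor HF (assumed value)
        self.chains: list[list[str]] = [[] for _ in rel_freq]
        self.pr: dict[str, float] = {}
        self.lt: dict[str, float] = {}
        self.range_of: dict[str, int] = {}

    def access(self, page: str, rng: int, now: float) -> None:
        if page in self.pr:
            # Re-access: move to the top of its chain and update Pr and LT.
            chain = self.chains[self.range_of[page]]
            chain.remove(page)
            chain.insert(0, page)
            elapsed = max(now - self.lt[page], 1e-9)  # guard against zero
            self.pr[page] = self.hf / elapsed + (1 - self.hf) * self.pr[page]
            self.lt[page] = now
            return
        if len(self.pr) >= self.capacity:
            self._evict()
        self.chains[rng].insert(0, page)
        self.pr[page], self.lt[page], self.range_of[page] = 0.0, now, rng

    def _evict(self) -> str:
        # Compare only the bottom page of each non-empty chain by lix = Pr / rel_freq.
        candidates = [chain[-1] for chain in self.chains if chain]
        victim = min(candidates,
                     key=lambda p: self.pr[p] / self.rel_freq[self.range_of[p]])
        self.chains[self.range_of[victim]].remove(victim)
        for table in (self.pr, self.lt, self.range_of):
            del table[victim]
        return victim


if __name__ == "__main__":
    ranges = [["a"], ["b", "c"], ["d", "e", "f", "g"]]
    print(broadcast_schedule(ranges, [4, 2, 1]))

    cache = LixCache(capacity=2, rel_freq=[4, 2, 1])
    cache.access("a", 0, now=1.0)  # page on the fastest disk
    cache.access("d", 2, now=2.0)  # page on the slowest disk
    cache.access("d", 2, now=3.0)  # re-access raises d's Pr estimate
    cache.access("b", 1, now=4.0)  # cache full: evict the lowest lix value
    print(sorted(cache.pr))
```

In the usage example, page `a` is evicted even though it was cached first: it was never re-accessed and it is broadcast four times as often as `d`, so it is the cheapest page to get back, which is exactly what dividing by rel_freq rewards.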

This note was uploaded on 12/23/2009 for the course DBST 663 taught by Professor Tba during the Spring '09 term at MD University College.
