224 DESIGN THEORY FOR RELATIONAL DATABASES

3. For no X→A in F and proper subset Z of X is F − {X→A} ∪ {Z→A} equivalent to F.

Intuitively, (2) guarantees that no dependency in F is redundant, and (3) guarantees that no attribute on any left side is redundant. As each right side has only one attribute by (1), surely no attribute on the right is redundant.

Theorem 7.3: Every set of dependencies F is equivalent to a set F′ that is minimal.

Proof: By Lemma 7.4, assume no right side in F has more than one attribute. To satisfy condition (2), consider each dependency X→Y in F, in some order, and if F − {X→Y} is equivalent to F, then delete X→Y from F. Note that considering dependencies in different orders may result in the elimination of different sets of dependencies. For example, given the set F:

    A→B    A→C    B→A    C→A    B→C

we can eliminate both B→A and A→C, or we can eliminate B→C, but we cannot eliminate all three. Having satisfied (2), we proceed to satisfy (3) by considering each dependency remaining in F, and each attribute in its left side, in some order. If we can eliminate an attribute from a left side and still have an equivalent set of dependencies, we do so, until no more attributes can be eliminated from any left side. Again, the order in which attributes are eliminated may affect the result. For example, given

    AB→C    A→B    B→A

we can eliminate either A or B from AB→C, but we cannot eliminate them both. □

Example 7.6: Let us consider the dependency set F of Example 7.5. If we use the algorithm of Lemma 7.4 to split right sides we are left with:

    AB→C     BE→C
    C→A      CG→B
    BC→D     CG→D
    ACD→B    CE→A
    D→E      CE→G
    D→G

Clearly CE→A is redundant, since it is implied by C→A. CG→B is redundant, since CG→D, C→A, and ACD→B imply CG→B, as can be checked by computing (CG)+. Then no more dependencies are redundant.
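The redundancy checks in Example 7.6 reduce to attribute-closure computations: X→A is redundant exactly when A is in X+ computed with respect to F − {X→A}. A minimal Python sketch of this (the data representation and helper names are my own, not the text's):

```python
def closure(attrs, fds):
    """X+ : all attributes determined by `attrs` under `fds`.
    `fds` is a set of (frozenset_lhs, rhs_attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def is_redundant(fd, fds):
    """X -> A is redundant iff A is in X+ computed w.r.t. F - {X -> A}."""
    lhs, a = fd
    return a in closure(lhs, fds - {fd})

# The split dependency set of Example 7.6:
F = {(frozenset("AB"), "C"), (frozenset("C"), "A"), (frozenset("BC"), "D"),
     (frozenset("ACD"), "B"), (frozenset("D"), "E"), (frozenset("D"), "G"),
     (frozenset("BE"), "C"), (frozenset("CG"), "B"), (frozenset("CG"), "D"),
     (frozenset("CE"), "A"), (frozenset("CE"), "G")}

print(is_redundant((frozenset("CE"), "A"), F))  # True: implied by C -> A
print(is_redundant((frozenset("CG"), "B"), F))  # True: via CG -> D, C -> A, ACD -> B
print(is_redundant((frozenset("AB"), "C"), F))  # False
```

The same closure routine also computes (CG)+ directly, as the example suggests, to verify the second elimination.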
ACD→B can be replaced by CD→B, since C→A is given. Thus one minimal cover for F is shown in Fig. 7.2(a). Another minimal cover, constructed from F by eliminating CE→A, CG→D, and ACD→B, is shown in Fig. 7.2(b). Note that the two minimal covers have different numbers of dependencies. □

    AB→C     AB→C
    C→A      C→A
    BC→D     BC→D
    CD→B     D→E
    D→E      D→G
    D→G      BE→C
    BE→C     CG→B
    CG→D     CE→G
    CE→G
     (a)      (b)

    Fig. 7.2. Two minimal covers.

7.3 DECOMPOSITION OF RELATION SCHEMES

The decomposition of a relation scheme R = {A1, A2, ..., An} is its replacement by a collection ρ = {R1, R2, ..., Rk} of subsets of R such that

    R = R1 ∪ R2 ∪ ··· ∪ Rk

There is no requirement that the Ri's be disjoint. One of the motivations for performing a decomposition is that it may eliminate some of the problems mentioned in Section 7.1. In general, it is the responsibility of the person designing a database (the "database administrator") to decompose an initial set of relation schemes when warranted.

Example 7.7: Let us reconsider the SUPPLIERS relation scheme introduced in Example 3.1, but as a shorthand, let the attributes be S (SNAME), A (SADDRESS), I (ITEM), and P (PRICE). The functional dependencies we shall assume are S→A and SI→P. We mentioned in Section 7.1 that replacement of the relation scheme SAIP by the two schemes SA and SIP makes certain problems go away. For example, in SAIP we cannot store the address of a supplier unless the supplier provides at least one item. In SA, there does not have to be an item supplied to record an address for the supplier. □

One might question whether all is as rosy as it looks when we replace SAIP by SA and SIP in Example 7.7. For example, suppose we have a relation r as the current value of SAIP.
If the database uses SA and SIP instead of SAIP, we would naturally expect the current relations for these two relation schemes to be the projections of r onto SA and SIP, that is,

    r_SA = π_SA(r)   and   r_SIP = π_SIP(r)

How do we know that r_SA and r_SIP contain the same information as r? One way to tell is to check that r can be computed knowing only r_SA and r_SIP. We claim that the only way to recover r is by taking the natural join of r_SA and r_SIP.† The reason is that, as we shall prove in the next lemma, if we let s = r_SA ⋈ r_SIP, then π_SA(s) = r_SA and π_SIP(s) = r_SIP. If s ≠ r, then given r_SA and r_SIP there is no way to tell whether r or s was the original relation for scheme SAIP. That is, if the natural join doesn't recover the original relation, then there is no way whatsoever to recover it uniquely.

Lossless Joins

If R is a relation scheme decomposed into schemes R1, R2, ..., Rk, and D is a set of dependencies, we say the decomposition is a lossless join decomposition (with respect to D) if for every relation r for R satisfying D:

    r = π_R1(r) ⋈ π_R2(r) ⋈ ··· ⋈ π_Rk(r)

that is, r is the natural join of its projections onto the Ri's. From our remarks above, it is apparent that the lossless join property is a desirable condition for a decomposition to satisfy, so we shall study the subject of lossless joins in some detail. Some basic facts about project-join mappings follow in Lemma 7.5. First we introduce some notation. If ρ = (R1, R2, ..., Rk), then m_ρ is the mapping defined by m_ρ(r) = π_R1(r) ⋈ ··· ⋈ π_Rk(r). That is, m_ρ(r) is the join of the projections of r onto the relation schemes in ρ.
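The project-join mapping m_ρ is easy to sketch concretely. In the following illustrative Python (the representation and all names are my own, not the text's), a relation is a set of tuples, each tuple a frozenset of (attribute, value) pairs; the SAIP data is hypothetical:

```python
from functools import reduce

def project(r, attrs):
    """pi_attrs(r): keep only the components for the given attributes."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r1, r2):
    """r1 |x| r2: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def m_rho(r, rho):
    """Join of the projections of r onto the schemes in rho."""
    return reduce(natural_join, (project(r, Ri) for Ri in rho))

# A hypothetical relation r for SAIP, decomposed into SA and SIP:
r = {frozenset({("S", "s1"), ("A", "a1"), ("I", "i1"), ("P", 10)}),
     frozenset({("S", "s1"), ("A", "a1"), ("I", "i2"), ("P", 20)})}
print(m_rho(r, [{"S", "A"}, {"S", "I", "P"}]) == r)  # True: the join recovers r
```

Here the join recovers r exactly, as Theorem 7.5 below predicts for this decomposition, since S→A holds.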
Thus the lossless join condition with respect to a set of dependencies D can be expressed as: for all r satisfying D, r = m_ρ(r).

As another useful notational convention, if t is a tuple, we define t[X], where X is a set of attributes, to be the components of t for the attributes of X.‡ For example, we could express π_X(r) as { t[X] | t is in r }.

Lemma 7.5: Let R be a relation scheme, ρ = (R1, ..., Rk) a decomposition of R, r a relation for R, and ri = π_Ri(r). Then

a) r ⊆ m_ρ(r).
b) If s = m_ρ(r), then π_Ri(s) = ri.
c) m_ρ(m_ρ(r)) = m_ρ(r).

Proof: a) Let t be in r. Then for each i, ti = t[Ri] is in ri. By definition of the natural join, t is in m_ρ(r), since t agrees with ti on the attributes of Ri for all i.

b) As r ⊆ s by (a), it follows that π_Ri(r) ⊆ π_Ri(s). That is, ri ⊆ π_Ri(s). To show π_Ri(s) ⊆ ri, suppose for some particular i that ti is in π_Ri(s). Then there is some tuple t in s such that t[Ri] = ti. As t is in s, there is some uj in rj for each j such that t[Rj] = uj. Thus, in particular, t[Ri] is in ri. But t[Ri] = ti, so ti is in ri, and therefore π_Ri(s) ⊆ ri. We conclude that ri = π_Ri(s).

c) If s = m_ρ(r), then by (b), π_Ri(s) = ri. Thus m_ρ(s) = r1 ⋈ ··· ⋈ rk = m_ρ(r). □

† Recall Section 5.2 for a definition of the natural join.
‡ Recall that t is a mapping from attributes to values, so t[X] is that mapping restricted to domain X. In practice, we always pick some ordering for the attributes and show tuples, or restricted tuples such as t[X], as lists of values.

Let us observe that if for each i, ri is some relation for Ri, and s = r1 ⋈ ··· ⋈ rk, then π_Ri(s) is not necessarily equal to ri. The reason is that ri may contain "dangling" tuples that do not match with anything when we take the join. For example, if R1 = AB, R2 = BC, r1 = {a1b1}, and r2 = {b1c1, b2c2}, then s = {a1b1c1} and π_BC(s) = {b1c1} ≠ r2.
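The dangling-tuple example just given can be replayed in a few lines. This sketch (mine, not the text's; relations are sets of tuples over the fixed attribute orders AB and BC) shows the tuple b2c2 of r2 being lost by the join:

```python
def join_ab_bc(r1, r2):
    """AB |x| BC: join on the shared attribute B."""
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

r1 = {("a1", "b1")}                    # relation for R1 = AB
r2 = {("b1", "c1"), ("b2", "c2")}      # relation for R2 = BC; (b2, c2) dangles
s = join_ab_bc(r1, r2)
print(s)                               # {('a1', 'b1', 'c1')}
pi_bc = {(b, c) for (_, b, c) in s}
print(pi_bc == r2)                     # False: the dangling tuple is gone
```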
However, in general, π_Ri(s) ⊆ ri, and if the ri's are each the projection of some one relation r, then π_Ri(s) = ri.

The ability to store "dangling" tuples is an advantage of decomposition. As we mentioned previously, this advantage must be balanced against the need to compute more joins when we answer queries, if relation schemes are decomposed, than if they are not. When all things are considered, it is generally believed that decomposition is desirable when necessary to cure the problems, such as redundancy, described in Section 7.1, but not otherwise.

Testing Lossless Joins

It turns out to be fairly easy to tell whether a decomposition has a lossless join with respect to a set of functional dependencies.

Algorithm 7.2: Testing for a Lossless Join.

Input: A relation scheme R = A1···An, a set of functional dependencies F, and a decomposition ρ = (R1, ..., Rk).

Output: A decision whether ρ is a decomposition with a lossless join.

Method: We construct a table with n columns and k rows; column j corresponds to attribute Aj, and row i corresponds to relation scheme Ri. In row i and column j put the symbol aj if Aj is in Ri. If not, put the symbol bij there.

Repeatedly "consider" each of the dependencies X→Y in F, until no more changes can be made to the table. Each time we "consider" X→Y, we look for rows that agree in all the columns for the attributes of X. If we find two such rows, equate the symbols of those rows for the attributes of Y. When we equate two symbols, if one of them is aj, make the other be aj. If they are bij and blj, make them both bij or blj, arbitrarily.

If, after modifying the rows of the table as above, we discover that some row has become a1···an, then the join is lossless. If not, the join is lossy (not lossless). □

Example 7.8: Let us consider the decomposition of SAIP into SA and SIP as in Example 7.7.
The dependencies are S→A and SI→P, and the initial table is

    S    A    I    P
    a1   a2   b13  b14
    a1   b22  a3   a4

Since S→A, and the two rows agree on S, we may equate their symbols for A, making b22 become a2. The resulting table is

    S    A    I    P
    a1   a2   b13  b14
    a1   a2   a3   a4

Since one row has all a's, the join is lossless.

For a more complicated example, let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, and R5 = AE. Let the functional dependencies be:

    A→C     DE→C
    B→C     CE→A
    C→D

The initial table is shown in Fig. 7.3(a). We can apply A→C to equate b13, b23, and b53. Then we use B→C to equate these symbols with b33; the result is shown in Fig. 7.3(b), where b13 has been chosen as the representative symbol. Now use C→D to equate a4, b24, b34, and b54; the resulting symbol is a4. Then DE→C enables us to equate b13 with a3, and CE→A lets us equate b31, b41, and a1. The result is shown in Fig. 7.3(c). Since the middle row is all a's, the decomposition has a lossless join. □

It is interesting to note that one might assume Algorithm 7.2 could be simplified by only equating symbols if one was an a. The above example shows this is not the case; if we do not begin by equating b13, b23, b33, and b53, we can never get a row of all a's.

Theorem 7.4: Algorithm 7.2 correctly determines if a decomposition has a lossless join.

Proof: Suppose the final table produced by Algorithm 7.2 does not have a row of all a's. We may view this table as a relation r for scheme R; the rows are tuples, and the aj's and bij's are distinct symbols chosen from the domain of Aj. Relation r satisfies the dependencies F, since Algorithm 7.2 modifies the table whenever a violation of the dependencies is found. We claim that r ≠ m_ρ(r). Clearly r does not contain the tuple a1a2···an. But for each Ri, there is a tuple ti in r, namely the tuple that is row i, such that ti[Ri] consists of all a's. Thus the join of the π_Ri(r)'s contains the tuple with all a's, since that tuple agrees with ti for all i.
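Algorithm 7.2 is short to program. The following is my own Python rendering (the symbol encoding as tagged tuples and all names are assumptions, not from the text); it replays both tests of Example 7.8:

```python
def lossless_join(attrs, fds, rho):
    """attrs: list of attribute names; fds: list of (lhs_set, rhs_set);
    rho: list of schemes (sets of attributes). True iff the join is lossless."""
    col = {A: j for j, A in enumerate(attrs)}
    # Initial table: symbol ("a", j) where A_j is in R_i, else ("b", i, j).
    table = [[("a", j) if A in Ri else ("b", i, j)
              for j, A in enumerate(attrs)] for i, Ri in enumerate(rho)]
    changed = True
    while changed:                       # "consider" each FD until stable
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is not r2 and all(r1[col[A]] == r2[col[A]] for A in lhs):
                        for A in rhs:    # equate the symbols for Y
                            j = col[A]
                            x, y = r1[j], r2[j]
                            if x != y:
                                keep = x if x[0] == "a" else y  # prefer a_j
                                drop = y if keep is x else x
                                for row in table:
                                    for k, v in enumerate(row):
                                        if v == drop:
                                            row[k] = keep
                                changed = True
    return any(all(v[0] == "a" for v in row) for row in table)

print(lossless_join(list("SAIP"),
                    [({"S"}, {"A"}), ({"S", "I"}, {"P"})],
                    [set("SA"), set("SIP")]))               # True
print(lossless_join(list("ABCDE"),
                    [({"A"}, {"C"}), ({"B"}, {"C"}), ({"C"}, {"D"}),
                     ({"D", "E"}, {"C"}), ({"C", "E"}, {"A"})],
                    [set("AD"), set("AB"), set("BE"),
                     set("CDE"), set("AE")]))               # True
```

Note that equating two b symbols replaces every occurrence of one by the other, which is exactly the step the text shows cannot be skipped.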
    A    B    C    D    E
    a1   b12  b13  a4   b15
    a1   a2   b23  b24  b25
    b31  a2   b33  b34  a5
    b41  b42  a3   a4   a5
    a1   b52  b53  b54  a5
            (a)

    A    B    C    D    E
    a1   b12  b13  a4   b15
    a1   a2   b13  b24  b25
    b31  a2   b13  b34  a5
    b41  b42  a3   a4   a5
    a1   b52  b13  b54  a5
            (b)

    A    B    C    D    E
    a1   b12  a3   a4   b15
    a1   a2   a3   a4   b25
    a1   a2   a3   a4   a5
    a1   b42  a3   a4   a5
    a1   b52  a3   a4   a5
            (c)

    Fig. 7.3. Applying Algorithm 7.2.

We conclude that if the final table from Algorithm 7.2 does not have a row with all a's, then the decomposition ρ does not have a lossless join; we have found a relation r for R such that m_ρ(r) ≠ r.

Conversely, suppose the final table has a row with all a's. We can in general view the table as shorthand for the domain relational calculus expression

    { a1a2···an | (∃b11)···(∃bkn)(R(w1) ∧ ··· ∧ R(wk)) }    (7.1)

where wi is the ith row of the initial table. Formula (7.1) defines the mapping m_ρ, since m_ρ(r) contains an arbitrary tuple a1···an if and only if for each i, r contains a tuple with a's in the attributes of Ri and arbitrary values in the other attributes. Since we assume that any relation r for scheme R, to which (7.1) could be applied, satisfies the dependencies F, we can infer that (7.1) is equivalent to a set of similar formulas with some of the a's and/or b's identified. The modifications made to the table by Algorithm 7.2 are such that the table is always shorthand for some formula whose value on relation r is m_ρ(r) whenever r satisfies F, as can be proved by an easy induction on the number of symbols identified. Since the final table contains a row with all a's, the domain calculus expression for the final table is of the form

    { a1···an | R(a1···an) ∧ ··· }    (7.2)

Clearly the value of (7.2) applied to relation r for R is a subset of r.

               R1 ∩ R2    R1 − R2    R2 − R1
    row for R1  a ··· a    a ··· a    b ··· b
    row for R2  a ··· a    b ··· b    a ··· a

    Fig. 7.4. A general two row table.
However, if 1‘ satisfies F, then the value of (7.2) is mp(r), and by Lemma 7.5(a), r g mp(r). Thus whenever 1‘ satisfies F, (7.2) computes r, so 1‘ = mp(r). That is to say, the decomposition p has a lossless join with respect to F. El Algorithm 7.2 can be applied to decompositions into any number of relation schemes. However, for decompositions into two schemes we can give a simpler test, the subject of the next theorem. Theorem 7.5: If p = (R1,R2) is a decomposition of R, and F is a set of functional dependencies, then p has a lossless join with respect to F if and only if (R1 (7 R2)—>(R1 —R2) or (R1 (7 R2)—>(R2 —R1). Note that these dependencies need not be in the given set F; it is sufficient that they be in F +. Proof: The initial table used in an application of Algorithm 7.2 is shown in Fig. 7.4, although we have omitted the subscripts on a and b, which are easily determined and immaterial anyway. It is easy to show by induction on the number of symbols identified by Algorithm 7.2 that if the b in the column for attribute A is changed to an a, then A is in (R1 {‘1 R2)+. It is also easy to show by induction on the number of steps needed to prove (R1 {‘1 R2)—>Y by Armstrong’s axioms, that any b’s in the columns for attributes in Y are changed to a’s. Thus the row for R1 becomes all a’s if and only if R2 —R1 g (R1 {‘1 R2)+, that is (R1 ['1 R2)—>(R2 — R1), and similarly, the row for R2 becomes all a’s if and only if (R1 {‘1 R2)—>(R1 — R2). El Example 7.9: Suppose R = ABC and F = {A—rB Then the decomposition of R into AB and AC hasa lossless join, since AB ['1 AC 2 A, AB—AC = B,T and A—>B holds. However. if we decompose R into R1 2 AB and R2 2 BC, we discover that R1 {‘1 R2 = B, and B functionally determines neither R1 — R2 = A nor R2 ~— R1 = C. 
Thus the decomposition into AB and BC does not have a lossless join with respect to F = {A→B}, as can be seen by considering the relation r = {a1b1c1, a2b1c2} for R. Then π_AB(r) = {a1b1, a2b1}, π_BC(r) = {b1c1, b1c2}, and

    π_AB(r) ⋈ π_BC(r) = {a1b1c1, a1b1c2, a2b1c1, a2b1c2} □

† To make sense of equations like these, do not forget that A1A2···An stands for the set of attributes {A1, A2, ..., An}.

Decompositions that Preserve Dependencies

We have seen that it is desirable for a decomposition to have the lossless join property, because it guarantees that any relation can be recovered from its projections. Another important property of a decomposition of relation scheme R into ρ = (R1, ..., Rk) is that the set of dependencies F for R be implied by the projection of F onto the Ri's. Formally, the projection of F onto a set of attributes Z, denoted π_Z(F), is the set of dependencies X→Y in F+ such that XY ⊆ Z. (Note that X→Y need not be in F; it need only be in F+.) We say decomposition ρ preserves a set of dependencies F if the union of all the dependencies in π_Ri(F), for i = 1, 2, ..., k, logically implies all the dependencies in F.

The reason it is desirable that ρ preserve F is that the dependencies in F can be viewed as integrity constraints for the relation R. If the projected dependencies do not imply F, then should we represent R by ρ = (R1, ..., Rk), we could find that the current value of the Ri's represented a relation R that did not satisfy F, even if ρ had the lossless join property with respect to F. Alternatively, every update to one of the Ri's would require a join to check that the constraints were not violated.

Example 7.10: Let us reconsider the problem of Example 7.3, where we had attributes CITY, ST, and ZIP, which we here abbreviate C, S, and Z. We observed the dependencies CS→Z and Z→C.
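For small Z, the projection π_Z(F) defined above can be computed by brute force: for every subset X of Z, emit X → (X+ ∩ Z). This sketch (my own, and exponential in |Z|, much as the text warns for F+) applies the definition to the dependencies of Example 7.10:

```python
from itertools import chain, combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(fds, Z):
    """pi_Z(F): nontrivial X -> (X+ & Z) for every nonempty subset X of Z."""
    Z = set(Z)
    out = []
    subsets = chain.from_iterable(
        combinations(sorted(Z), n) for n in range(1, len(Z) + 1))
    for X in map(set, subsets):
        Y = closure(X, fds) & Z - X   # the nontrivial part of the right side
        if Y:
            out.append((X, Y))
    return out

F = [({"C", "S"}, {"Z"}), ({"Z"}, {"C"})]   # Example 7.10
print(project_fds(F, "SZ"))                 # []: only trivial FDs survive
print(project_fds(F, "CZ"))                 # only Z -> C survives
```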
The decomposition of the relation scheme CSZ into SZ and CZ has a lossless join, since (SZ ∩ CZ)→(CZ − SZ), that is, Z→C. However, the projection of F = {CS→Z, Z→C} onto SZ gives only the trivial dependencies that follow from reflexivity, while the projection onto CZ gives Z→C and the trivial dependencies. It can be checked that Z→C and trivial dependencies do not imply CS→Z, so the decomposition does not preserve dependencies.

For example, the join of the two relations in Fig. 7.5(a) and (b) is the relation of Fig. 7.5(c). Figure 7.5(a) satisfies the trivial dependencies, as any relation must. Figure 7.5(b) satisfies the trivial dependencies and the dependency Z→C. However, their join in Fig. 7.5(c) violates CS→Z. □

    Fig. 7.5. A join violating a functional dependency. [The figure shows
    (a) a relation for SZ with S = 545 Tech Sq., (b) a relation for CZ
    with C = Cambridge, Mass., satisfying Z→C, and (c) their natural join,
    which contains two tuples agreeing on C and S but differing in Z,
    violating CS→Z.]

We should note that a decomposition may have a lossless join with respect to a set of dependencies F, yet not preserve F. Example 7.10 gave one such instance. Also, the decomposition could preserve F yet not have a lossless join. For example, let F = {A→B, C→D}, R = ABCD, and ρ = (AB, CD).

Testing Preservation of Dependencies

In principle, it is easy to test whether a decomposition ρ = (R1, ..., Rk) preserves a set of dependencies F. Just compute F+ and project it onto all the Ri's. Take the union of the resulting sets of dependencies, and test whether this set covers F. However, in practice, just computing F+ is a formidable task, since the number of dependencies it contains will be exponential in the size of F. Therefore, it is fortunate that there is a way to test preservation without actually computing F+; this method takes time that is polynomial in the size of F.

Algorithm 7.3: Testing Preservation of Dependencies.

Input: A decomposition ρ = (R1, ..., Rk) and a set of functional dependencies F.
Output: A decision whether ρ preserves F.

Method: Define G to be the union of the π_Ri(F) for i = 1, 2, ..., k. Note that we do not compute G; we merely wish to see whether it covers F. To test whether G covers F, we must consider each X→Y in F and determine whether X+, computed with respect to G, contains Y. The trick we use to compute X+ without having G available is to consider repeatedly what the effect is of closing X with respect to the projections of F onto the various Ri's. That is, define an R-operation on a set of attributes Z, with respect to a set of dependencies F, to be the replacement of Z by Z ∪ ((Z ∩ R)+ ∩ R), the closure being taken with respect to F. This operation adjoins to Z those attributes A such that (Z ∩ R)→A is in π_R(F). Then we compute X+ with respect to G by starting with X, and repeatedly running through the list of Ri's, performing the Ri-operation for each i in turn. If at some pass, none of the Ri-operations makes any change in the current set of attributes, then we are done; the resulting set is X+ ...
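The method above can be sketched as follows (an illustrative Python rendering with my own names; note that, as in the text, each Ri-operation takes the closure with respect to F itself, never materializing G):

```python
def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, rho):
    """Algorithm 7.3: True iff rho preserves every X -> Y in fds."""
    for X, Y in fds:
        Z = set(X)
        while True:
            old = set(Z)
            for R in rho:                    # one R_i-operation per scheme
                Z |= closure(Z & R, fds) & R
            if Z == old:                     # a full pass with no change
                break
        if not set(Y) <= Z:                  # X -> Y is not implied by G
            return False
    return True

F = [({"C", "S"}, {"Z"}), ({"Z"}, {"C"})]    # Example 7.10: CSZ into SZ, CZ
print(preserves(F, [set("SZ"), set("CZ")]))  # False: CS -> Z is lost

F2 = [({"A"}, {"B"}), ({"C"}, {"D"})]        # preserved, though the join is lossy
print(preserves(F2, [set("AB"), set("CD")])) # True
```

The two calls replay the section's examples: the ZIP-code decomposition is lossless but not dependency-preserving, while (AB, CD) preserves F2 without having a lossless join.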