chap8_p5 - Probability and Statistics with Reliability,...



Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second edition
by K.S. Trivedi
Publisher: John Wiley & Sons

Chapter 8 (Part 5): Continuous Time Markov Chains, Reliability Modeling

Dept. of Electrical & Computer Engineering
Duke University
URL: www.ee.duke.edu/~kst
Copyright © 2003 by K.S. Trivedi

Outline of This Part of Chapter 8
• Software Reliability Growth Models
• Hardware Reliability Models
• A Safety Model
• A Security Model
• A Real-Time System Model

Software Reliability Growth Models
• Failure data are collected during testing.
• A reliability growth model is calibrated using the failure data; the calibrated model is then used for prediction.
• Many SRGMs exist, for example:
  – NHPP models
  – the Jelinski-Moranda model
• We revisit these models, which we studied in Chapter 5, now treating them as examples of CTMCs.

Poisson Process
• The Poisson process {N(t) | t ≥ 0} is a homogeneous CTMC of pure birth type, with state diagram
  0 →(λ) 1 →(λ) 2 →(λ) ⋯
• Since its failure intensity λ is time independent, it cannot capture reliability growth. Hence we resort to the NHPP.

Example: Software Reliability Growth Model (NHPP)
• Consider the nonhomogeneous Poisson process (NHPP) proposed by Goel and Okumoto as a model of software reliability growth during the testing phase. The Markov property is satisfied, so this is an example of a nonhomogeneous CTMC.
• Assume that the number of failures N(t) occurring in the time interval (0, t] has a time-dependent failure intensity λ(t).
• The expected number of software failures experienced by time t (equated to the number of faults found and fixed by time t) is
  $m(t) = E[N(t)] = \int_0^t \lambda(x)\,dx$
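As a quick numerical illustration of $m(t) = \int_0^t \lambda(x)\,dx$, the sketch below integrates a failure intensity with the composite trapezoidal rule. It assumes, for illustration only, the Goel-Okumoto intensity $\lambda(t) = ab\,e^{-bt}$, whose mean-value function is $a(1 - e^{-bt})$; the parameter values a and b are hypothetical.

```python
import math


def mean_value(intensity, t, steps=10_000):
    """m(t) = integral of intensity(x) over [0, t], by the composite trapezoidal rule."""
    h = t / steps
    total = 0.5 * (intensity(0.0) + intensity(t))
    for k in range(1, steps):
        total += intensity(k * h)
    return total * h


# Hypothetical Goel-Okumoto parameters:
# a = expected total number of faults, b = per-fault detection rate (per hour).
a, b = 100.0, 0.02


def go_intensity(x):
    """Goel-Okumoto failure intensity lambda(x) = a * b * exp(-b * x)."""
    return a * b * math.exp(-b * x)


for t in (10.0, 50.0, 200.0):
    numeric = mean_value(go_intensity, t)
    closed = a * (1.0 - math.exp(-b * t))
    print(f"t = {t:6.1f}   m(t) numeric = {numeric:9.5f}   closed form = {closed:9.5f}")
```

Since N(t) of an NHPP is Poisson distributed with mean m(t), the same m(t) also gives P(N(t) = k) = m(t)^k e^{-m(t)} / k!.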
• Differentiating m(t) gives the instantaneous failure intensity. For the Goel-Okumoto model, $m(t) = a(1 - e^{-bt})$, where a is the expected total number of faults and b is the fault detection rate per fault, so the intensity can be rewritten as
  $\lambda(t) = \frac{dm(t)}{dt} = ab\,e^{-bt} = b\,[a - m(t)]$
• This implies that the failure intensity is proportional to the expected number of undetected faults at time t.
• Many commonly used NHPP software reliability growth models are obtained by choosing different failure intensities λ(t), e.g., the Goel-Okumoto and Musa-Okumoto models.

Software Reliability Growth Models: Finite Failure NHPP Models
• Nature of the failure occurrence rate per fault, and the corresponding NHPP model:
  – Constant: Goel-Okumoto model
  – Increasing: S-shaped model; Generalized Goel-Okumoto model
  – Decreasing: Generalized Goel-Okumoto model
  – Increasing/Decreasing: Log-logistic model

Example: Jelinski-Moranda Model
• This model is based on the following assumptions:
  – The number of faults initially present in the software is fixed, say n.
  – At each failure occurrence, the underlying fault is removed immediately and no new faults are introduced.
  – The failure rate is state dependent and proportional to the number of remaining faults, that is, $\mu_i = i\mu$, i = 1, 2, ..., n.
• The model can be described by a pure death process:
  n →(nµ) n−1 →((n−1)µ) ⋯ →(µ) 0
• The constant of proportionality µ denotes the failure intensity contributed by each fault; that is, all remaining faults contribute the same amount to the failure intensity.
• The mean-value function is given by
  $m(t) = \sum_{i=0}^{n} (n-i)\,\pi_i(t) = n\,(1 - e^{-\mu t})$
• This can be seen as the expected reward rate at time t after assigning reward rate $r_i = n - i$ to state i.

Hardware Reliability Models
• Two-component Markov reliability model with repair
• Two-component Markov model with imperfect fault coverage
• WFS reliability model
Markov Reliability Model With Repair
• Consider the 2-component parallel system (no delay, perfect coverage), but disallow repair from the system down state.
• State 0 is now an absorbing state. The state diagram has transitions 2 → 1 at rate 2λ, 1 → 2 at rate µ (repair), and 1 → 0 at rate λ.
• This reliability model with repair cannot be modeled using a reliability block diagram or a fault tree; we need to resort to Markov chains. (This is a form of dependency, since to repair a component you need to know the status of the other component.)
• The Markov chain has an absorbing state. In the steady state, the system will be in state 0 with probability 1, so steady-state analysis yields a trivial answer; transient analysis is of interest. States 1 and 2 are transient states.
• Some authors erroneously claim that reliability models do not admit repair.
• In this model we have component repair from state 1; the system has not failed in this state.
• In a reliability model we do not allow repair from system failure states (such as state 0).
• Thus, there must be one or more absorbing states in a reliability model.
• Assume that the initial state of the Markov chain is 2, that is, π2(0) = 1 and πk(0) = 0 for k = 0, 1.
• The system of differential equations is then written, for each state, from
  rate of buildup = rate of flow in − rate of flow out:
  $\frac{d\pi_2(t)}{dt} = -2\lambda\pi_2(t) + \mu\pi_1(t)$
  $\frac{d\pi_1(t)}{dt} = 2\lambda\pi_2(t) - (\lambda + \mu)\pi_1(t)$
  $\frac{d\pi_0(t)}{dt} = \lambda\pi_1(t)$
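The three equations above are the row-vector system dπ(t)/dt = π(t)Q. A minimal numerical sketch, assuming NumPy and SciPy are available, solves it with the matrix exponential π(t) = π(0)e^{Qt}, using the rates λ = 0.0002 and µ = 1/5 that the SHARPE input for this model binds. It shows π0(t) growing toward 1, as expected for an absorbing state.

```python
import numpy as np
from scipy.linalg import expm

# Rates bound in the SHARPE input for this model.
lam, mu = 0.0002, 1 / 5

# Generator Q with states ordered (2, 1, 0); row i holds the rates out of state i.
Q = np.array([
    [-2 * lam, 2 * lam, 0.0],
    [mu, -(lam + mu), lam],
    [0.0, 0.0, 0.0],          # state 0 is absorbing: no outgoing transitions
])
p_init = np.array([1.0, 0.0, 0.0])   # pi_2(0) = 1


def transient(t):
    """Solution of d(pi)/dt = pi Q, i.e. pi(t) = pi(0) expm(Q t)."""
    return p_init @ expm(Q * t)


for t in (0.0, 1e3, 1e5, 1e6):
    p2, p1, p0 = transient(t)
    print(f"t = {t:>9.0f}  pi2 = {p2:.6f}  pi1 = {p1:.6f}  pi0 = {p0:.6f}  R(t) = {1 - p0:.6f}")
```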
• Taking Laplace transforms of the three differential equations for π2, π1 and π0, we get:
  $s\bar\pi_2(s) - 1 = -2\lambda\bar\pi_2(s) + \mu\bar\pi_1(s)$
  $s\bar\pi_1(s) = 2\lambda\bar\pi_2(s) - (\lambda + \mu)\bar\pi_1(s)$
  $s\bar\pi_0(s) = \lambda\bar\pi_1(s)$
  where $\bar\pi(s) = \int_0^\infty e^{-st}\pi(t)\,dt$.
• Solving for $\bar\pi_0(s)$, we get:
  $\bar\pi_0(s) = \frac{2\lambda^2}{s\,[s^2 + (3\lambda + \mu)s + 2\lambda^2]}$
• After inversion we obtain π0(t), the probability that no components are operating at time t ≥ 0. For this purpose, we carry out a partial fraction expansion. Inverting the transform, we get
  $R(t) = 1 - \pi_0(t) = \frac{2\lambda^2}{\alpha_1 - \alpha_2}\left(\frac{e^{-\alpha_2 t}}{\alpha_2} - \frac{e^{-\alpha_1 t}}{\alpha_1}\right)$
  where
  $\alpha_1, \alpha_2 = \frac{(3\lambda + \mu) \pm \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}}{2}$
  so that α1 + α2 = 3λ + µ and α1α2 = 2λ².
• Recalling that $MTTF = \int_0^\infty R(t)\,dt$, we get:
  $MTTF = \frac{2\lambda^2}{\alpha_1 - \alpha_2}\left(\frac{1}{\alpha_2^2} - \frac{1}{\alpha_1^2}\right) = \frac{2\lambda^2(\alpha_1 + \alpha_2)}{\alpha_1^2\alpha_2^2} = \frac{2\lambda^2(3\lambda + \mu)}{(2\lambda^2)^2} = \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2}$
• Note that the MTTF of the two-component parallel redundant system in the absence of a repair facility (i.e., µ = 0) would have been equal to the first term, 3/(2λ), in the above expression.
• Therefore, the effect of a repair facility is to increase the mean life by µ/(2λ²), that is, by a fraction
  $\frac{\mu/(2\lambda^2)}{3/(2\lambda)} = \frac{\mu}{3\lambda}$
  of the no-repair MTTF.

[Figure: Model made in SHARPE GUI]
[Figure: Parameters entered for the model]
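The transform-domain algebra above can be checked symbolically. This sketch, assuming SymPy is available, solves the three transformed equations for $\bar\pi_2, \bar\pi_1, \bar\pi_0$ and obtains the MTTF as the transform of R(t) evaluated at s = 0 (since $\int_0^\infty R(t)\,dt = \bar R(0)$), then plugs in the SHARPE bindings λ = 0.0002 and µ = 1/5.

```python
import sympy as sp

s, lam, mu = sp.symbols("s lambda mu", positive=True)
P2, P1, P0 = sp.symbols("P2 P1 P0")  # Laplace transforms of pi_2, pi_1, pi_0

# Transformed equations, using pi_2(0) = 1 and pi_1(0) = pi_0(0) = 0.
eqs = [
    sp.Eq(s * P2 - 1, -2 * lam * P2 + mu * P1),
    sp.Eq(s * P1, 2 * lam * P2 - (lam + mu) * P1),
    sp.Eq(s * P0, lam * P1),
]
sol = sp.solve(eqs, [P2, P1, P0], dict=True)[0]
print("pi0(s) =", sp.factor(sol[P0]))

# MTTF = integral_0^inf R(t) dt = (Laplace transform of R) evaluated at s = 0.
R_bar = sol[P2] + sol[P1]
mttf = sp.simplify(R_bar.subs(s, 0))
print("MTTF =", mttf)

values = {lam: sp.Rational(2, 10000), mu: sp.Rational(1, 5)}  # SHARPE bindings
print("MTTF at lambda = 0.0002, mu = 1/5:", mttf.subs(values))
```

Evaluating $\bar R(0)$ avoids the partial fraction expansion entirely when only the MTTF is needed.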
SHARPE input file generated by the GUI:

    format 8
    factor on
    * Model defined
    markov Rel_Rep(lambda, mu)
    2 1 2*lambda
    1 0 lambda
    1 2 mu
    end
    * Initial Probabilities defined:
    2 init_Rel_Rep_2
    1 init_Rel_Rep_1
    0 init_Rel_Rep_0
    end
    * Initial Probabilities assigned:
    bind
    init_Rel_Rep_2 0
    init_Rel_Rep_1 0
    init_Rel_Rep_0 0
    end
    echo ****************************************************************************
    echo ********* Outputs asked for the model: Rel_Rep **************
    * Initial Probability: ini1
    bind
    init_Rel_Rep_2 1
    init_Rel_Rep_1 0
    init_Rel_Rep_0 0
    end
    * Output asked
    bind lambda 0.0002
    bind mu 1/5
    var MTTAb mean(Rel_Rep, 0; lambda, mu)
    expr MTTAb
    bind lambda 0.0002
    bind mu 1/5
    func Reliability(t) 1-tvalue(t;Rel_Rep; lambda, mu)
    loop t,1,1000,10
    expr Reliability(t)
    end
    end

[Figure: Output generated by SHARPE GUI]
[Figure: Graph between reliability and time]

Markov Reliability Model With Imperfect Coverage
• Next consider a modification of the above example, proposed by Arnold as a model of the duplex processors of an electronic switching system.
• Assume that not all faults are recoverable, and let c be the coverage factor, the conditional probability that the system recovers given that a fault has occurred.
• The state diagram now has transitions 2 → 1 at rate 2λc, 2 → 0 at rate 2λ(1 − c), 1 → 0 at rate λ, and 1 → 2 at rate µ.
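The closed-form MTTF for this coverage model, given in the text that follows, can be cross-checked numerically. This sketch, assuming NumPy, solves the τ equations τQ_B = −π_B(0) over the transient states (2, 1), the method used later in this part for the WFS example, and evaluates the closed form alongside. The rates reuse the SHARPE values of the repair example (λ = 0.0002, µ = 1/5); the coverage values are hypothetical.

```python
import numpy as np


def coverage_mttf_tau(lam, mu, c):
    """MTTF from tau Q_B = -pi_B(0) over transient states ordered (2, 1)."""
    QB = np.array([
        [-2 * lam, 2 * lam * c],   # state 2 leaves at 2*lam: to 1 w.p. c, to 0 w.p. 1 - c
        [mu, -(lam + mu)],         # state 1: repair to 2 or failure to 0
    ])
    pB0 = np.array([1.0, 0.0])
    tau = np.linalg.solve(QB.T, -pB0)  # transpose turns the row-vector equation into A x = b
    return tau.sum()


def coverage_mttf_closed(lam, mu, c):
    """Closed-form MTTF of the imperfect-coverage duplex model."""
    return (lam * (1 + 2 * c) + mu) / (2 * lam * (lam + mu * (1 - c)))


lam, mu = 0.0002, 1 / 5
for c in (1.0, 0.999, 0.99, 0.95, 0.9):
    print(f"c = {c:<6} tau method = {coverage_mttf_tau(lam, mu, c):12.1f}"
          f"   closed form = {coverage_mttf_closed(lam, mu, c):12.1f}")
```

With these rates, lowering c from 1 to 0.99 cuts the MTTF by more than a factor of ten, which is the sensitivity to coverage the text points out.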
• Assume that the initial state is 2, so that π2(0) = 1 and π0(0) = π1(0) = 0.
• The system of differential equations is then:
  $\frac{d\pi_2(t)}{dt} = -2\lambda c\,\pi_2(t) - 2\lambda(1 - c)\,\pi_2(t) + \mu\pi_1(t)$
  $\frac{d\pi_1(t)}{dt} = 2\lambda c\,\pi_2(t) - (\lambda + \mu)\,\pi_1(t)$
  $\frac{d\pi_0(t)}{dt} = 2\lambda(1 - c)\,\pi_2(t) + \lambda\pi_1(t)$
• Using Laplace transforms as before, the system reduces to:
  $s\bar\pi_2(s) - 1 = -2\lambda\bar\pi_2(s) + \mu\bar\pi_1(s)$
  $s\bar\pi_1(s) = 2\lambda c\,\bar\pi_2(s) - (\lambda + \mu)\bar\pi_1(s)$
  $s\bar\pi_0(s) = \lambda\bar\pi_1(s) + 2\lambda(1 - c)\bar\pi_2(s)$
• After solving these equations we obtain R(t) = π2(t) + π1(t).
• From R(t), we can obtain the system MTTF:
  $MTTF = \frac{\lambda(1 + 2c) + \mu}{2\lambda[\lambda + \mu(1 - c)]}$
• It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.

[Figure: Model made in SHARPE GUI]
[Figure: Graph between R(t) and time]

Markov Reliability Model With Repair (WFS Example)
• WFS: Workstation File System. State (i, j) denotes i working workstations and j working file servers.
• Assume that the computer system does not recover if both workstations fail, or if the file server fails.
• States (0,1), (1,0) and (2,0) become absorbing states, while (2,1) and (1,1) are transient states.
• Note: we have made the simplification that, once the CTMC reaches a system failure state, we do not allow any more transitions.
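The WFS chain can also be solved numerically for R(t). The sketch below, assuming NumPy and SciPy, builds the generator from the transitions listed in the SHARPE input for this model, with the rates that file binds (λw = 0.0003, λf = 0.0001, µw = 1), and evaluates R(t) = π2,1(t) + π1,1(t) on the same time grid as the SHARPE loop (t from 1 to 1000 in steps of 100).

```python
import numpy as np
from scipy.linalg import expm

# Rates bound in the SHARPE input for this model.
lamW, lamF, muW = 0.0003, 0.0001, 1.0

states = ["2_1", "1_1", "0_1", "1_0", "2_0"]   # i_j: i workstations up, j file servers up
idx = {name: k for k, name in enumerate(states)}
transitions = [                # (from, to, rate), as listed in the SHARPE markov block
    ("2_1", "1_1", 2 * lamW),
    ("2_1", "2_0", lamF),
    ("1_1", "0_1", lamW),
    ("1_1", "1_0", lamF),
    ("1_1", "2_1", muW),
]
Q = np.zeros((len(states), len(states)))
for src, dst, rate in transitions:
    Q[idx[src], idx[dst]] += rate
np.fill_diagonal(Q, -Q.sum(axis=1))   # absorbing states keep an all-zero row

p_init = np.zeros(len(states))
p_init[idx["2_1"]] = 1.0


def reliability(t):
    """R(t) = pi_21(t) + pi_11(t), with pi(t) = pi(0) expm(Q t)."""
    p = p_init @ expm(Q * t)
    return p[idx["2_1"]] + p[idx["1_1"]]


for t in range(1, 1001, 100):
    print(f"t = {t:5d}   R(t) = {reliability(t):.6f}")
```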
• If we solve for π2,1(t) and π1,1(t), then R(t) = π2,1(t) + π1,1(t).
• For a Markov chain with absorbing states, let
  – A: the set of absorbing states
  – B = Ω − A: the set of remaining states
  – τi,j: the mean time spent in state (i,j) until absorption,
  $\tau_{i,j} = \int_0^\infty \pi_{i,j}(x)\,dx, \quad (i,j) \in B$
  Then
  $\tau Q_B = -\pi_B(0)$
• Q_B is derived from Q by restricting it to the states in B.
• The mean time to absorption (MTTA) is given by
  $MTTA = \sum_{(i,j) \in B} \tau_{i,j}$
• For the WFS model, with states ordered (2,1), (1,1):
  $Q_B = \begin{bmatrix} -(\lambda_f + 2\lambda_w) & 2\lambda_w \\ \mu_w & -(\mu_w + \lambda_f + \lambda_w) \end{bmatrix}$
• First solve
  $\frac{d\pi_{2,1}(t)}{dt} = -(2\lambda_w + \lambda_f)\,\pi_{2,1}(t) + \mu_w\,\pi_{1,1}(t)$
  $\frac{d\pi_{1,1}(t)}{dt} = -(\mu_w + \lambda_f + \lambda_w)\,\pi_{1,1}(t) + 2\lambda_w\,\pi_{2,1}(t)$
  Then R(t) = π2,1(t) + π1,1(t).
• Next solve
  $-(\lambda_f + 2\lambda_w)\,\tau_{2,1} + \mu_w\,\tau_{1,1} = -1$
  $2\lambda_w\,\tau_{2,1} - (\mu_w + \lambda_f + \lambda_w)\,\tau_{1,1} = 0$
  Then MTTF = τ2,1 + τ1,1.
• The mean time to failure is 19992 hours (for the input values, see Part 2 of Chapter 8).

[Figure: Model made in SHARPE GUI]
[Figure: Parameters assigned and output asked]
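A minimal sketch of the τ method for this model, assuming NumPy: it solves τQ_B = −π_B(0) with π_B(0) = (1, 0) and sums the τ's. The Part 2 input values are not listed in this part; the values used here (λw = 10⁻⁴ per hour, λf = 5·10⁻⁵ per hour, µw = 1 per hour) are an assumption, chosen because they reproduce the 19992-hour figure quoted above.

```python
import numpy as np


def wfs_tau(lam_w, lam_f, mu_w):
    """Solve tau Q_B = -pi_B(0) over B = {(2,1), (1,1)}, starting in (2,1)."""
    QB = np.array([
        [-(lam_f + 2 * lam_w), 2 * lam_w],
        [mu_w, -(mu_w + lam_f + lam_w)],
    ])
    pB0 = np.array([1.0, 0.0])
    return np.linalg.solve(QB.T, -pB0)  # returns (tau_21, tau_11)


# Assumed inputs (per hour); not listed in this part of the chapter.
tau_21, tau_11 = wfs_tau(lam_w=1e-4, lam_f=5e-5, mu_w=1.0)
mttf = tau_21 + tau_11
print(f"tau_21 = {tau_21:.1f} h, tau_11 = {tau_11:.3f} h, MTTF = {mttf:.0f} h")
```

Almost all of the time before absorption is spent in (2,1): with a fast repair rate, the chain returns from (1,1) to (2,1) quickly.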
SHARPE (textual) input file:

    format 8
    factor on
    * Model defined
    markov repair(lamW, lamF, muW)
    2_1 1_1 2*lamW
    2_1 2_0 lamF
    1_1 0_1 lamW
    1_1 1_0 lamF
    1_1 2_1 muW
    end
    * Initial Probabilities defined:
    2_1 init_repair_2_1
    1_1 init_repair_1_1
    0_1 init_repair_0_1
    2_0 init_repair_2_0
    1_0 init_repair_1_0
    end
    * Initial Probabilities assigned:
    bind
    init_repair_2_1 0
    init_repair_1_1 0
    init_repair_0_1 0
    init_repair_2_0 0
    init_repair_1_0 0
    end
    echo ****************************************************************************
    echo ********* Outputs asked for the model: repair **************
    * Initial Probability: config1
    bind
    init_repair_1_0 0
    init_repair_0_1 0
    init_repair_2_1 1
    init_repair_2_0 0
    init_repair_1_1 0
    end
    * Output asked
    bind lamW 0.0003
    bind lamF 0.0001
    bind muW 1
    var MTTAb mean(repair; lamW, lamF, muW)
    echo Mean time to absorption for repair
    expr MTTAb
    bind lamW 0.0003
    bind lamF 0.0001
    bind muW 1
    func Reliability(t) 1-tvalue(t;repair; lamW, lamF, muW)
    loop t,1,1000,100
    expr Reliability(t)
    end
    end

[Figure: Output generated by SHARPE GUI]
[Figure: Graph between R(t) and time]

Markov Reliability Model Without Repair: Case 1
• States (0,1), (1,0) and (2,0) become absorbing states.

[Figure: Model made in SHARPE GUI]
[Figure: Parameters assigned and output asked]
[Figure: Output generated by SHARPE GUI]
[Figure: Overlapped graph of R(t) with and without repair]
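Without repair (µw = 0) the WFS Q_B is upper triangular, so the τ equations can be solved by back-substitution and R(t) has a closed form: π2,1(t) = e^{−(λf+2λw)t}, and solving the π1,1 equation gives R(t) = 2e^{−(λf+λw)t} − e^{−(λf+2λw)t}, i.e., a file server in series with two parallel workstations. The sketch checks the back-substitution against MTTF = 2/(λf+λw) − 1/(λf+2λw). The inputs λw = 10⁻⁴ and λf = 5·10⁻⁵ per hour are an assumption; they reproduce the 9333-hour figure quoted for this case.

```python
import math


def mttf_no_repair(lam_w, lam_f):
    """Back-substitution for tau Q_B = -pi_B(0) with the triangular no-repair Q_B."""
    a = lam_f + 2 * lam_w            # total outflow rate from (2,1)
    d = lam_f + lam_w                # total outflow rate from (1,1)
    tau_21 = 1 / a                   # first equation: -a * tau_21 = -1
    tau_11 = 2 * lam_w * tau_21 / d  # second equation: 2*lam_w*tau_21 - d*tau_11 = 0
    return tau_21 + tau_11


def reliability_no_repair(t, lam_w, lam_f):
    """R(t) = pi_21(t) + pi_11(t) = 2 exp(-(lam_f+lam_w) t) - exp(-(lam_f+2 lam_w) t)."""
    return 2 * math.exp(-(lam_f + lam_w) * t) - math.exp(-(lam_f + 2 * lam_w) * t)


lam_w, lam_f = 1e-4, 5e-5   # assumed inputs (per hour)
closed = 2 / (lam_f + lam_w) - 1 / (lam_f + 2 * lam_w)
print(f"MTTF back-substitution = {mttf_no_repair(lam_w, lam_f):.1f} h, closed form = {closed:.1f} h")
print(f"R(1000 h) without repair = {reliability_no_repair(1000.0, lam_w, lam_f):.6f}")
```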
• For the model without repair, with states ordered (2,1), (1,1):
  $Q_B = \begin{bmatrix} -(\lambda_f + 2\lambda_w) & 2\lambda_w \\ 0 & -(\lambda_f + \lambda_w) \end{bmatrix}$
  R(t) = π2,1(t) + π1,1(t),   MTTF = τ2,1 + τ1,1
• The mean time to failure is 9333 hours (see Part 2 of Chapter 8).

NHCTMC Model of the Duplex System
• Consider a duplex system with two processors, each of which has a time-dependent failure rate $\lambda(t) = \lambda_0\alpha t^{\alpha - 1}$.
• The resulting model is a nonhomogeneous CTMC (NHCTMC) because it contains one or more time-dependent transition rates.
• The transient behavior of an NHCTMC satisfies the linear system of first-order differential equations
  $\frac{d\pi(t)}{dt} = \pi(t)\,Q(t)$
• The Q matrix becomes:
• Hence we can define an average failure rate:

3 Active Units and One Spare
• Consider a system with three active units and one spare. The active configuration is operated in TMR (Triple Modular Redundancy) mode. An active unit has failure rate λ, while a standby spare unit has failure rate µ.
• The Laplace–Stieltjes transform (LST) of the system lifetime X then becomes
  $L_X(s) = \frac{3\lambda + \mu}{s + 3\lambda + \mu}\left[\frac{2\lambda}{s + 2\lambda}\cdot\frac{3\lambda}{s + 3\lambda}\right]$
• The expression outside the square brackets is the LST of EXP(3λ + µ), while the expression within the brackets is the LST of HYPO(2λ, 3λ).
• Therefore, the system lifetime X has the stage-type distribution shown in the figure.

[Figure: Stage-type distribution of the system lifetime]
[Figure: Model made in SHARPE GUI]
[Figure: Parameters assigned and output asked]
[Figure: Output generated by SHARPE GUI]
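Reading the LST factors as exponential stages gives a simple way to get the system MTTF: the mean of EXP(3λ + µ) plus the mean of HYPO(2λ, 3λ). The sketch, assuming NumPy, compares that sum with the τ method applied to a three-stage pure-death chain. The stage ordering (3λ + µ, then 3λ, then 2λ) and the rate values are assumptions for illustration.

```python
import numpy as np


def mttf_stages(lam, mu):
    """Mean of EXP(3*lam + mu) plus the mean of HYPO(2*lam, 3*lam)."""
    return 1 / (3 * lam + mu) + 1 / (2 * lam) + 1 / (3 * lam)


def mttf_tau(lam, mu):
    """Same lifetime as a three-stage pure-death CTMC; solve tau Q_B = -pi_B(0)."""
    r1, r2, r3 = 3 * lam + mu, 3 * lam, 2 * lam   # assumed stage order
    QB = np.array([
        [-r1, r1, 0.0],
        [0.0, -r2, r2],
        [0.0, 0.0, -r3],
    ])
    tau = np.linalg.solve(QB.T, -np.array([1.0, 0.0, 0.0]))
    return tau.sum()


lam, mu = 1e-3, 1e-4   # hypothetical rates: active unit lam, standby spare mu
print(f"MTTF (stage means) = {mttf_stages(lam, mu):.2f}, MTTF (tau method) = {mttf_tau(lam, mu):.2f}")
```

Because the stages are visited in sequence, each τ is simply the mean holding time of its stage, which is why the two computations agree.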
[Figure: Graph between R(t) and time]

Operational Security
• Assume that at each newly visited node of the privilege graph, the attacker chooses one of the elementary attacks that can be issued from that node only (memoryless property), and assign to each arc a rate at which the attacker succeeds with the corresponding elementary attack. The privilege graph is then transformed into a CTMC.
• The matrix $\hat{Q}_R$, obtained from the generator matrix Q by restricting it to the transient states, is:
• From this it follows that the METF (Mean Effort To Failure) becomes:

Recovery Block Architecture
• Consider a recovery block (RB) architecture implemented on a dual-processor system that is able to tolerate one hardware fault and one software fault.
• Hardware faults can be tolerated because of the hot-standby hardware component, with a duplication of the RB software and a concurrent comparator for acceptance tests.
• The transition rates and their meanings are given in the table.
• The system of differential equations is given by:
• Thus the reliability of the system becomes:
• Similarly, the absorption probability to the safe failure state is:
• And the absorption probability to the unsafe failure state is:

[Figure: Model made in SHARPE GUI]
[Figure: Parameters assigned and output asked]
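For the recovery block chain, the absorption probabilities and the MTTF follow from the fundamental matrix N = (−Q_TT)⁻¹ of the transient states: the absorption probabilities are π_T(0)·N·Q_TA and the MTTF is π_T(0)·N·1. The sketch, assuming NumPy, uses the transitions and rate values bound in the SHARPE input for this model (2 → 1: lam21, 2 → SF: lam23, 2 → UF: lam24, 1 → SF: lam13, 1 → UF: lam14) and checks P(SF) against the two-path argument (absorb directly from state 2, or first move to state 1).

```python
import numpy as np

# Rates bound in the SHARPE input for the recovery block model.
lam21, lam13, lam14, lam24, lam23 = 0.00007, 0.00015, 0.00012, 0.00007, 0.0001

# Transient states T = (2, 1); absorbing states ordered (SF, UF).
Q_TT = np.array([
    [-(lam21 + lam23 + lam24), lam21],
    [0.0, -(lam13 + lam14)],
])
Q_TA = np.array([
    [lam23, lam24],   # 2 -> SF, 2 -> UF
    [lam13, lam14],   # 1 -> SF, 1 -> UF
])
p_T0 = np.array([1.0, 0.0])       # start in state 2

N = np.linalg.inv(-Q_TT)          # N[i, j]: expected time in j when starting from i
absorb = p_T0 @ N @ Q_TA          # (P(SF), P(UF))
mttf = p_T0 @ N @ np.ones(2)

# Two-path check: leave 2 directly to SF, or go 2 -> 1 and then to SF.
out2 = lam21 + lam23 + lam24
p_sf = lam23 / out2 + (lam21 / out2) * lam13 / (lam13 + lam14)
print(f"P(SF) = {absorb[0]:.6f} (path check {p_sf:.6f}), P(UF) = {absorb[1]:.6f}, MTTF = {mttf:.1f}")
```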
SHARPE input file:

    format 8
    factor on
    * Model defined
    markov Recovery_b_Archi(lam21, lam13, lam14, lam24, lam23)
    2 1 lam21
    2 UF lam24
    2 SF lam23
    1 SF lam13
    1 UF lam14
    end
    * Initial Probabilities defined:
    2 init_Recovery_b_Archi_2
    1 init_Recovery_b_Archi_1
    SF init_Recovery_b_Archi_SF
    UF init_Recovery_b_Archi_UF
    end
    * Initial Probabilities assigned:
    bind
    init_Recovery_b_Archi_2 0
    init_Recovery_b_Archi_1 0
    init_Recovery_b_Archi_SF 0
    init_Recovery_b_Archi_UF 0
    end
    echo ****************************************************************************
    echo ********* Outputs asked for the model: Recovery_b_Archi **************
    * Initial Probability: ini
    bind
    init_Recovery_b_Archi_UF 0
    init_Recovery_b_Archi_2 1
    init_Recovery_b_Archi_1 0
    init_Recovery_b_Archi_SF 0
    end
    * Output asked
    bind lam21 0.00007
    bind lam13 0.00015
    bind lam14 0.00012
    bind lam24 0.00007
    bind lam23 0.0001
    func Reliability(t) 1-tvalue(t;Recovery_b_Archi; lam21, lam13, lam14, lam24, lam23)
    loop t,1,1000,100
    expr Reliability(t)
    end
    bind lam21 0.00007
    bind lam13 0.00015
    bind lam14 0.00012
    bind lam24 0.00007
    bind lam23 0.0001
    var MTTAb mean(Recovery_b_Archi, UF; lam21, lam13, lam14, lam24, lam23)
    expr MTTAb
    end

[Figure: Output generated by SHARPE GUI]
[Figure: Plot between R(t) and time]

Conditional MTTF of a Fault-Tolerant System
• Consider the homogeneous CTMC models of three commonly used fault-tolerant system architectures:
  – The simplex system (S) consists of a single processor.
  – The duplex system (D) consists of two identical processors executing the same task in parallel.
  – The duplex system reconfigurable to the simplex system (DS) also consists of two processors executing the same task in parallel.
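The conditional MTTF discussed next can be computed from the fundamental matrix N = (−Q_TT)⁻¹. With h = N·Q_TA·1 the vector of probabilities of absorption into A, the quantity E[T·1{absorbed in A}] equals π_T(0)·N·h, and dividing by P(absorbed in A) = π_T(0)·h gives E[T | absorbed in A]. The CTMCs of S, D and DS are not reproduced here, so this sketch (assuming NumPy) applies the formula to the recovery block chain and its SHARPE rates instead, taking A = {UF} and B = {SF} as an illustrative choice.

```python
import numpy as np


def conditional_mttf(Q_TT, Q_TA, p_T0):
    """E[T | absorbed in A] and P(absorbed in A), for transient block Q_TT and rates Q_TA into A."""
    N = np.linalg.inv(-Q_TT)                     # expected time in each transient state
    h = N @ Q_TA @ np.ones(Q_TA.shape[1])         # P(absorbed in A | start in each state)
    p_A = p_T0 @ h
    return (p_T0 @ N @ h) / p_A, p_A


# Recovery block rates from its SHARPE input; T = (2, 1), A = {UF}.
lam21, lam13, lam14, lam24, lam23 = 0.00007, 0.00015, 0.00012, 0.00007, 0.0001
Q_TT = np.array([[-(lam21 + lam23 + lam24), lam21], [0.0, -(lam13 + lam14)]])
Q_TA = np.array([[lam24], [lam14]])               # 2 -> UF, 1 -> UF
cmttf, p_uf = conditional_mttf(Q_TT, Q_TA, np.array([1.0, 0.0]))
print(f"P(unsafe failure) = {p_uf:.6f}, conditional MTTF to unsafe failure = {cmttf:.1f}")
```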
[Figure: CTMC models of the three architectures]

• We compare the three architectures with respect to the probability of unsafe failure, the mean time to failure (MTTF) of the system, and the conditional MTTF to unsafe failure.
• To calculate the conditional MTTF, with T the set of transient states and A, B two sets of absorbing states, the Q matrix is partitioned as
  $Q = \begin{bmatrix} Q_{TT} & Q_{TA} & Q_{TB} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
• Here Q_TT is the partition of the generator matrix consisting of the states in T, Q_TA has the transition rates from states in T to states in A, and similarly Q_TB has the transition rates from states in T to states in B.
• Solving for the three architectures with different parameters, we obtain the dependability measures shown in the table.

[Table: Dependability measures for the three architectures]

Real Time System: Multiprocessor Revisited
• We return to the multiprocessor model discussed earlier, but we now consider the system failure state 0 as absorbing.
• Since task arrivals occur at rate λ and task service time is EXP(µ), when the reliability model is in state 2 the performance can be modeled by an M/M/2/b queue.
• We make the following reward rate assignment to the states:
• With this reward assignment, computing the expected accumulated reward until absorption gives the approximate number of tasks successfully completed until system failure (given in the textbook).
• Now we consider a hard deadline instead of a soft deadline: if an accepted job fails to complete within the deadline, we consider the system to have failed.
• Using the τ method, we can compute the values of τ2 and τ1 for the CTMC and the system MTTF that includes the effect of dynamic failures.
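The expected accumulated reward until absorption is Σ_i r_i τ_i over the transient states, so once the τ's are known it is a dot product. The reward assignment itself is not reproduced in this part; the sketch assumes, purely for illustration, that r_i is the throughput of an M/M/i/b queue (arrival rate times one minus the blocking probability). It also assumes a two-processor reliability chain shaped like the repair model earlier in this part (2 → 1 at 2γ, 1 → 0 at γ, 1 → 2 at δ). All names and values (gamma, delta, the queue parameters) are hypothetical, and NumPy is assumed.

```python
import math

import numpy as np


def mmcb_throughput(arrival, service, c, b):
    """Throughput of an M/M/c/b queue: arrival * (1 - P[b jobs in system])."""
    rho = arrival / service
    weights = [
        rho**k / math.factorial(k) if k <= c else rho**k / (math.factorial(c) * c ** (k - c))
        for k in range(b + 1)
    ]
    return arrival * (1 - weights[-1] / sum(weights))


# Hypothetical parameters (per hour): task arrival and service rates, buffer size b,
# processor failure rate gamma and repair rate delta.
arrival, service, b = 100.0, 60.0, 10
gamma, delta = 1e-4, 1.0

QB = np.array([
    [-2 * gamma, 2 * gamma],          # state 2
    [delta, -(gamma + delta)],        # state 1 (state 0 is absorbing)
])
tau = np.linalg.solve(QB.T, -np.array([1.0, 0.0]))   # tau Q_B = -pi_B(0)
rewards = np.array([mmcb_throughput(arrival, service, 2, b),
                    mmcb_throughput(arrival, service, 1, b)])
print(f"tau = {tau}, expected tasks completed before failure = {rewards @ tau:.3e}")
```

For the hard-deadline variant the same τ's feed the MTTF; only the chain (extra transitions to the failure state for missed deadlines) changes, not the computation.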

This note was uploaded on 04/08/2010 for the course COMPUTER E 409232 taught by Professor Mohammadabdolahiazgomiph.d during the Spring '10 term at Islamic University.
