chap8_p2 - Probability and Statistics with Reliability,...


Probability and Statistics with Reliability, Queuing and Computer Science Applications
Second edition, by K.S. Trivedi
Publisher: John Wiley & Sons

Chapter 8 (Part 2): Continuous Time Markov Chain Availability Modeling

Dept. of Electrical & Computer Engineering, Duke University
Email: kst@ee.duke.edu
URL: www.ee.duke.edu/~kst
Copyright © 2003 by K.S. Trivedi

2-State Markov Availability Model

[State diagram: UP (state 1) → DN (state 0) at failure rate λ; DN → UP at repair rate µ; 1/λ = MTTF, 1/µ = MTTR]

1) Steady-state balance equations for each state:
– Rate of flow in = rate of flow out
• State 1: $\mu \pi_0 = \lambda \pi_1$
• State 0: $\lambda \pi_1 = \mu \pi_0$
Two unknowns and two equations, but only one of the equations is independent.

2-State Markov Availability Model (Contd.)
• Need an additional equation: $\pi_0 + \pi_1 = 1$
\[ \frac{\lambda}{\mu}\pi_1 + \pi_1 = 1 \;\Rightarrow\; \pi_1 = \frac{1}{1 + \lambda/\mu} \]
\[ \pi_1 = A_{ss} = \frac{\mu}{\lambda + \mu} = \frac{1/\lambda}{1/\lambda + 1/\mu} = \frac{MTTF}{MTTF + MTTR} \]
\[ 1 - A_{ss} = \frac{MTTR}{MTTF + MTTR} \]
• Downtime in minutes per year (DTMY) $= \frac{MTTR}{MTTF + MTTR} \times 8760 \times 60$
• $A_{ss} = 0.99999 \Rightarrow 1 - A_{ss} = 10^{-5} \Rightarrow DTMY = 10^{-5} \times 525600 = 5.256$ min

2-State Markov Availability Model (Contd.)
2) Transient availability for each state:
– Rate of buildup = rate of flow in − rate of flow out
\[ \frac{d\pi_1}{dt} = \mu \pi_0(t) - \lambda \pi_1(t) \]
Since $\pi_0(t) + \pi_1(t) = 1$, we have
\[ \frac{d\pi_1}{dt} = \mu (1 - \pi_1(t)) - \lambda \pi_1(t), \qquad \text{i.e.} \qquad \frac{d\pi_1}{dt} + (\mu + \lambda)\pi_1(t) = \mu \]
Assuming $\pi_1(0) = 1$, this equation can be solved to obtain
\[ \pi_1(t) = A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t} \]

2-State Markov Availability Model (Contd.)
3) Reliability: $R(t) = e^{-\lambda t}$
4) Steady-state availability:
\[ \lim_{t \to \infty} A(t) = A_{ss} = \frac{\mu}{\lambda + \mu} \]

DTMC vs. CTMC
• Many books on fault-tolerant or dependable computing unnecessarily restrict themselves to viewing a CTMC through the limited prism of a DTMC, as in this state diagram:
[State diagram: two-state DTMC on states 0 and 1 with transition probabilities λ∆t and µ∆t and self-loop probabilities 1−λ∆t and 1−µ∆t]
• Instead, by using the rich theory of CTMCs directly, we gain efficiency in both expression and solution.
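The two-state formulas above can be checked numerically. The following is a minimal Python sketch; the function names and the example rates (MTTF = 1000 h, MTTR = 1 h) are illustrative assumptions, not values from the slides.

```python
import math

MINUTES_PER_YEAR = 8760 * 60  # 525600, the factor used in the slides


def steady_state_availability(lam, mu):
    """A_ss = mu / (lam + mu) = MTTF / (MTTF + MTTR)."""
    return mu / (lam + mu)


def transient_availability(lam, mu, t):
    """A(t) for a system that is up at t = 0, i.e. pi_1(0) = 1."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def downtime_minutes_per_year(availability):
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical rates per hour: MTTF = 1000 h, MTTR = 1 h.
lam, mu = 1.0 / 1000.0, 1.0
a_ss = steady_state_availability(lam, mu)
print("A_ss =", a_ss, "downtime (min/yr) =", downtime_minutes_per_year(a_ss))
print("A(0) =", transient_availability(lam, mu, 0.0))
print("A(50) =", transient_availability(lam, mu, 50.0))
print("five nines downtime (min/yr) =", downtime_minutes_per_year(0.99999))
```

A(t) starts at 1 and decays to A_ss at rate λ + µ, so with repair much faster than failure the transient dies out within a few repair times.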
Example: Defective Distribution
• We now consider task-oriented measures for the two-state availability model.
• Consider a task that needs x units of time to execute in the absence of failures. Let T(x) be the completion time of the task.
• First consider λ = 0, so that there are no failures. In this case T1(x) = x, and the distribution function of T1(x) is the unit step function at x.
• Next consider a nonzero value of λ but set µ = 0. Assuming that the server is up when the task arrives, the task will complete at time x provided the server does not fail in the interval (0, x).
• Otherwise, the task will never complete; hence T2(x) is a defective random variable with a defect at infinity equal to $1 - e^{-\lambda x}$, the probability that the task will never finish.

Example: Defective Distribution (contd.)
• The third case, with both failures and repairs, is relatively complex but has also been analyzed.
• If a server failure occurs before the task is completed, we need to consider two separate cases.
– If the work done so far is not lost, so that the task resumes from where it was interrupted once the server repair is completed, we have the preemptive resume (prs) case.
– Otherwise we have the preemptive repeat (prt) case.
• The LST of the completion time distribution in each of these two cases is given by
[equation missing from the extracted text]

Two-component system: Markov availability model
• Assume we have a two-component parallel redundant system with repair rate µ.
• Assume that the failure rate of both components is λ.
• The system is considered to have failed when both components have failed.

Markov availability model (Contd.)
• Let the number of properly functioning components be the state of the system. The state space is {0, 1, 2}, where 0 is the system down state.
• We wish to examine the effects of shared vs. non-shared repair.
Markov availability model (Contd.)
[State diagrams]
• Non-shared (independent) repair: 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate µ, 0 → 1 at rate 2µ
• Shared repair: 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate µ, 0 → 1 at rate µ

Markov availability model (Contd.)
• Note: the non-shared case can be modeled and solved using an RBD or an FTREE, but the shared case needs the use of Markov chains.

Steady-state balance equations
• For any state: rate of flow in = rate of flow out
• Consider the shared case:
\[ 2\lambda \pi_2 = \mu \pi_1 \]
\[ (\lambda + \mu)\pi_1 = 2\lambda \pi_2 + \mu \pi_0 \]
\[ \lambda \pi_1 = \mu \pi_0 \]
• $\pi_i$: steady-state probability that the system is in state i

Steady-state balance equations (Contd.)
• Hence
\[ \pi_1 = \frac{\mu}{\lambda}\pi_0, \qquad \pi_2 = \frac{\mu}{2\lambda}\pi_1 \]
• Since $\pi_0 + \pi_1 + \pi_2 = 1$, we have
\[ \pi_0 + \frac{\mu}{\lambda}\pi_0 + \frac{\mu}{2\lambda}\cdot\frac{\mu}{\lambda}\pi_0 = 1 \qquad \text{or} \qquad \pi_0 = \frac{1}{1 + \frac{\mu}{\lambda} + \frac{\mu^2}{2\lambda^2}} \]

Steady-state balance equations (Contd.)
• Steady-state unavailability in the shared case: $\pi_0 = 1 - A_{shared}$
• Similarly, for the non-shared case, the steady-state unavailability is
\[ 1 - A_{non\text{-}shared} = \frac{1}{1 + \frac{2\mu}{\lambda} + \frac{\mu^2}{\lambda^2}} \]
• Downtime in minutes per year = (1 − A) × 8760 × 60

WFS Example

A Workstations-Fileserver Example
• Computing system consisting of:
– A file-server
– Two workstations
– Computing network connecting them
• System is operational as long as:
– One of the workstations and
– The file-server are operational
• The computer network is assumed to be fault-free.

[Figure: The WFS Example]
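The closed-form unavailabilities of the two-component parallel system under shared and non-shared repair can be compared directly. This is a small sketch; the function names and the rates λ = 0.001/h, µ = 1/h are assumptions chosen only for illustration.

```python
MINUTES_PER_YEAR = 8760 * 60


def unavailability_shared(lam, mu):
    """pi_0 with a single shared repair facility: 1 / (1 + mu/lam + mu^2 / (2 lam^2))."""
    r = mu / lam
    return 1.0 / (1.0 + r + r * r / 2.0)


def unavailability_non_shared(lam, mu):
    """pi_0 with independent repair: 1 / (1 + 2 mu/lam + mu^2 / lam^2)."""
    r = mu / lam
    return 1.0 / (1.0 + 2.0 * r + r * r)


# Hypothetical rates per hour.
lam, mu = 0.001, 1.0
u_shared = unavailability_shared(lam, mu)
u_non_shared = unavailability_non_shared(lam, mu)
for name, u in (("shared", u_shared), ("non-shared", u_non_shared)):
    print(name, "unavailability =", u, "downtime (min/yr) =", u * MINUTES_PER_YEAR)
```

In the non-shared case the two components are independent, so the result equals the square of the single-component unavailability λ/(λ + µ). When µ is much larger than λ, sharing the repair facility roughly doubles the downtime.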
Markov Chain for WFS Example
• Assuming exponentially distributed times to failure:
– λw: failure rate of a workstation
– λf: failure rate of the file-server
• Assume that components are repairable:
– µw: repair rate of a workstation
– µf: repair rate of the file-server
• The file-server has (preemptive) priority for repair over the workstations (such repair priority cannot be captured by non-state-space models).

Markov Availability Model for WFS
[State diagram: states (2,1), (1,1), (0,1) in the top row and (2,0), (1,0), (0,0) in the bottom row; workstation failures at rates 2λw and λw, file-server failures at rate λf, repairs at rates µw and µf]
Since each state is reachable from every other state, the CTMC is irreducible. Furthermore, all states are positive recurrent (since it is a finite-state CTMC).

Markov Availability Model for WFS (Contd.)
• In the previous figure, the label (i,j) of each state is interpreted as follows: i is the number of workstations that are still functioning, and j is 1 or 0 depending on whether the file-server is up or down, respectively.
• Note that in the text, no component failures are allowed from system failure states; this is commonly assumed by many engineers in practice. Here we allow component failures from system failure states to show that this situation can also be modeled.

Markov Model
• Let {X(t), t ≥ 0} represent a finite-state Continuous Time Markov Chain (CTMC) with state space Ω.
• Infinitesimal generator matrix $Q = [q_{ij}]$:
• $q_{ij}$ (i ≠ j): transition rate from state i to state j
• $q_{ii} = -q_i = -\sum_{j \neq i} q_{ij}$, the diagonal element

Markov Availability Model for WFS (Contd.)
• For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the Q matrix is given by:
\[ Q = \begin{pmatrix}
-(\lambda_f + 2\lambda_w) & \lambda_f & 2\lambda_w & 0 & 0 & 0 \\
\mu_f & -(\mu_f + 2\lambda_w) & 0 & 2\lambda_w & 0 & 0 \\
\mu_w & 0 & -(\mu_w + \lambda_f + \lambda_w) & \lambda_f & \lambda_w & 0 \\
0 & 0 & \mu_f & -(\mu_f + \lambda_w) & 0 & \lambda_w \\
0 & 0 & \mu_w & 0 & -(\mu_w + \lambda_f) & \lambda_f \\
0 & 0 & 0 & 0 & \mu_f & -\mu_f
\end{pmatrix} \]
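The generator matrix above can be assembled from its transition list and solved for the steady-state vector (πQ = 0 with the probabilities summing to 1). The sketch below uses plain Gauss-Jordan elimination so that it needs no third-party packages; the helper names are hypothetical, and the rate values are the ones quoted on the deck's WFS results slide.

```python
STATES = [(2, 1), (2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]


def wfs_generator(lam_w, lam_f, mu_w, mu_f):
    """Build Q for the WFS model; state (i, j) = (working workstations, file-server up)."""
    rates = {
        ((2, 1), (2, 0)): lam_f, ((2, 1), (1, 1)): 2 * lam_w,
        ((2, 0), (2, 1)): mu_f, ((2, 0), (1, 0)): 2 * lam_w,
        ((1, 1), (2, 1)): mu_w, ((1, 1), (1, 0)): lam_f, ((1, 1), (0, 1)): lam_w,
        ((1, 0), (1, 1)): mu_f, ((1, 0), (0, 0)): lam_w,
        ((0, 1), (1, 1)): mu_w, ((0, 1), (0, 0)): lam_f,
        ((0, 0), (0, 1)): mu_f,
    }
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        i, j = STATES.index(src), STATES.index(dst)
        q[i][j] = rate
        q[i][i] -= rate  # diagonal entry is minus the total exit rate
    return q


def steady_state(q):
    """Solve pi Q = 0, sum(pi) = 1 by Gauss-Jordan elimination with partial pivoting."""
    n = len(q)
    # Row i of the system is column i of Q (one balance equation per state),
    # augmented with a zero right-hand side.
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    # The balance equations are linearly dependent: replace the last one
    # by the normalization condition.
    a[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


q = wfs_generator(lam_w=0.0001, lam_f=0.00005, mu_w=1.0, mu_f=0.5)
pi = dict(zip(STATES, steady_state(q)))
a_ss = pi[(2, 1)] + pi[(1, 1)]  # up states: file-server up and at least one workstation up
print("A_ss =", a_ss)
```

Because the file-server fails and is repaired at the same rates in every state, the total probability of the states with j = 0 should equal λf/(λf + µf), which gives a quick sanity check on the solution.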
Markov Model (steady-state)
• π: steady-state probability vector
\[ \pi Q = 0, \qquad \sum_{i \in \Omega} \pi_i = 1 \]
\[ \pi = (\pi_{(2,1)}, \pi_{(2,0)}, \pi_{(1,1)}, \pi_{(1,0)}, \pi_{(0,1)}, \pi_{(0,0)}) \]
• These are called the steady-state balance equations: rate of flow in = rate of flow out.
• After solving for π, obtain the steady-state availability
\[ A_{SS} = \pi_{(2,1)} + \pi_{(1,1)} \]

Markov Model (transient)
• π(t): transient state probability vector
• π(0): initial probability vector of the CTMC
• Transient behavior is described by the Kolmogorov differential equation (KDE):
\[ \frac{d\pi(t)}{dt} = \pi(t) Q, \qquad \text{given } \pi(0) \]

Markov Availability Model
• We compute the availability of the system: the system is available as long as it is in state (2,1) or (1,1).
• Instantaneous availability of the system:
\[ A(t) = \pi_{(2,1)}(t) + \pi_{(1,1)}(t), \qquad \lim_{t \to \infty} A(t) = A_{ss} \]

Markov Availability Model (Contd.)
• Define $L(t) = \int_0^t \pi(u)\,du$
• $L_{(i,j)}(t)$: expected total time spent in state (i,j) during (0,t)
• Integrating the KDE, we get the LTODE:
\[ \frac{dL(t)}{dt} = L(t) Q + \pi(0), \qquad L(0) = 0 \]
• Interval availability:
\[ A_I(t) = \frac{L_{(2,1)}(t) + L_{(1,1)}(t)}{t} \]

Availability (Contd.)
• Interval availability:
\[ A_I(t) = \frac{\text{Expected uptime in } (0,t]}{t} = \frac{\int_0^t A(x)\,dx}{t} \]
• Steady-state availability:
\[ A_{SS} = \lim_{t \to \infty} A(t) = \lim_{t \to \infty} A_I(t) \]
• There are three kinds of availability: instantaneous, interval and steady-state.

[Figure: Model made in SHARPE GUI]

[Figure: Analysis Frame]
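The KDE and the LTODE can be integrated together numerically. The sketch below uses a fixed-step fourth-order Runge-Kutta scheme (a choice made here for brevity, not the solver used by SHARPE) and validates it on the two-state model, where the closed forms are known. The rates λ = 0.1, µ = 1.0 and the horizon t = 5 are illustrative assumptions.

```python
import math


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def rhs(y, q, pi0):
    """Right-hand side for the stacked unknowns y = (pi(t), L(t))."""
    n = len(pi0)
    pi, cum = y[:n], y[n:]
    d_pi = vec_mat(pi, q)                                  # KDE: d pi/dt = pi Q
    d_cum = [a + b for a, b in zip(vec_mat(cum, q), pi0)]  # LTODE: dL/dt = L Q + pi(0)
    return d_pi + d_cum


def shifted(y, k, c):
    """Return y + c * k elementwise."""
    return [a + c * b for a, b in zip(y, k)]


def solve(q, pi0, t_end, steps=2000):
    """Integrate from 0 to t_end with classical RK4; returns (pi(t_end), L(t_end))."""
    h = t_end / steps
    y = list(pi0) + [0.0] * len(pi0)
    for _ in range(steps):
        k1 = rhs(y, q, pi0)
        k2 = rhs(shifted(y, k1, h / 2), q, pi0)
        k3 = rhs(shifted(y, k2, h / 2), q, pi0)
        k4 = rhs(shifted(y, k3, h), q, pi0)
        y = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
    n = len(pi0)
    return y[:n], y[n:]


# Two-state model, state order (up, down), starting in the up state.
lam, mu, t = 0.1, 1.0, 5.0
q = [[-lam, lam], [mu, -mu]]
pi_t, cum_t = solve(q, [1.0, 0.0], t)
a_inst, a_int = pi_t[0], cum_t[0] / t

s = lam + mu
a_inst_exact = mu / s + lam / s * math.exp(-s * t)
a_int_exact = mu / s + lam / (s * s * t) * (1 - math.exp(-s * t))
print(a_inst, a_inst_exact)
print(a_int, a_int_exact)
```

The same `solve` function accepts any generator matrix, so the six-state WFS matrix Q could be passed in unchanged. For stiff models with widely separated rates, a fixed-step explicit scheme would need very small steps, and a dedicated method such as uniformization is usually a better fit.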
Code (textual) generated by SHARPE GUI

    format 8
    factor on
    markov M1(lamW, lamF, muF, muW)
    2_1 1_1 2*lamW
    2_1 2_0 lamF
    1_1 0_1 lamW
    1_1 1_0 lamF
    1_1 2_1 muW
    0_1 1_1 muW
    0_1 0_0 lamF
    2_0 2_1 muF
    2_0 1_0 2*lamW
    1_0 1_1 muF
    1_0 0_0 lamW
    0_0 0_1 muF
    * Reward configuration defined:
    reward
    2_1 rew_M1_2_1
    1_1 rew_M1_1_1
    0_1 rew_M1_0_1
    2_0 rew_M1_2_0
    1_0 rew_M1_1_0
    0_0 rew_M1_0_0
    end
    * Initial Probabilities defined:
    2_1 init_M1_2_1
    1_1 init_M1_1_1
    0_1 init_M1_0_1
    2_0 init_M1_2_0
    1_0 init_M1_1_0
    0_0 init_M1_0_0
    end
    echo ********* Outputs asked for the model: M1 **************
    * UP configuration: up1
    bind
    rew_M1_2_1 1
    rew_M1_1_1 1
    rew_M1_0_1 0
    rew_M1_2_0 0
    rew_M1_1_0 0
    rew_M1_0_0 0
    end
    bind lamW 0.0003
    bind lamF 0.0001
    bind muF 1.0
    bind muW 1.0
    echo Input parameters values: lamW= 0.0003, lamF=0.0001, muF=1.0, muW=1.0
    echo Output:
    var SS_Avail exrss(M1; lamW, lamF, muF, muW)
    echo Steady_State Availability for M1
    expr SS_Avail

Code (textual) generated by SHARPE GUI (contd.)

    * DOWN configuration: up1
    bind
    rew_M1_0_1 525600
    rew_M1_2_0 525600
    rew_M1_1_0 525600
    rew_M1_0_0 525600
    rew_M1_2_1 0
    rew_M1_1_1 0
    end
    bind lamW 0.0003
    bind lamF 0.0001
    bind muF 1.0
    bind muW 1.0
    var Downtime exrss(M1; lamW, lamF, muF, muW)
    expr Downtime
    * UP configuration: up1
    bind
    rew_M1_2_1 1
    rew_M1_1_1 1
    rew_M1_0_1 0
    rew_M1_2_0 0
    rew_M1_1_0 0
    rew_M1_0_0 0
    end
    * Initial Probability: intit1
    bind
    init_M1_1_0 0
    init_M1_0_1 0
    init_M1_0_0 0
    init_M1_2_1 1
    init_M1_2_0 0
    init_M1_1_1 0
    end
    bind lamW 0.0003
    bind lamF 0.0001
    bind muF 1.0
    bind muW 1.0
    func Transient_Availability(t) exrt(t ;M1; lamW, lamF, muF, muW)
    loop t,1,100,10
    expr Transient_Availability(t)
    end
    end

[Figure: Output Generated by SHARPE]
Markov Availability Model Results
• $\lambda_w = 0.0001\ \mathrm{hr}^{-1}$, $\lambda_f = 0.00005\ \mathrm{hr}^{-1}$, $\mu_w = 1.0\ \mathrm{hr}^{-1}$, $\mu_f = 0.5\ \mathrm{hr}^{-1}$
• $A_{ss} = 0.9999$

Markov Reward Model: WFS Example
• For the WFS example, assign reward rates as follows: $r_{(2,1)} = 1$, $r_{(1,1)} = 1$, $r_{(0,1)} = 0$, $r_{(2,0)} = 0$, $r_{(1,0)} = 0$, $r_{(0,0)} = 0$
• Then the instantaneous availability of the system is
\[ A(t) = E[Z(t)] = \pi_{(2,1)}(t) + \pi_{(1,1)}(t) \]

Markov Reward Model: WFS Example (Contd.)
• Interval availability:
\[ A_I(t) = \frac{1}{t} E[Y(t)] = \frac{L_{(2,1)}(t) + L_{(1,1)}(t)}{t} \]
• Steady-state availability:
\[ A_{ss} = E[Z] = \pi_{(2,1)} + \pi_{(1,1)} \]

Condition based maintenance
• Preventive maintenance (PM) is useful where the device time-to-failure distribution has an increasing failure rate.
• We model the TTF by a hypoexponential HYPO(λ1, λ2) distribution.
• The time to trigger an inspection is assumed to be EXP(λin), the time to carry out the inspection is EXP(µin), the time to repair is EXP(µ), and the time to carry out PM is EXP(yµ).

Preventive Maintenance Example (contd.)
• CTMC for the PM model
[State diagram]
• Writing and solving the steady-state equations
[equations missing from the extracted text]
• Since only (0,0) and (0,1) are up states
[equation missing from the extracted text]

[Figure: Plot of SS availability as a function of MTBI = 1/λin]

[Figure: Model made in SHARPE GUI]

[Figure: Values of variables defined]

[Figure: Textual input file generated by SHARPE GUI]

[Figure: Downtime, Steady State and Transient Availability Calculation]

[Figure: Graph made in Matlab of Steady State Availability vs. MTTF]

2-component Availability model with finite Detection delay
• 2-component availability model without detection
delay
– Steady-state availability $A_{ss} = 1 - \pi_0$
• The fault detection stage takes a random time, EXP(δ)

Redundant System with Finite Detection Switchover Time
• After solving the Markov model, we obtain the steady-state probabilities $\pi_2, \pi_{1D}, \pi_1, \pi_0$:
\[ A_{sys} = \pi_2 + \pi_1 \ (\text{or} + \pi_{1D}) \]
• Can solve in closed form or using SHARPE

Closed-form
\[ \pi_0 \left[ 1 + \frac{\mu}{\lambda}\cdot\frac{\lambda+\delta}{\lambda+\mu+\delta} + \frac{\mu^2}{\lambda(\lambda+\mu+\delta)} + \frac{\mu^2}{2\lambda^2}\cdot\frac{\lambda+\delta}{\lambda+\mu+\delta} \right] = 1 \]
Writing E for the bracketed sum:
\[ \pi_0 = \frac{1}{E}, \qquad \pi_1 = \frac{\mu}{\lambda}\cdot\frac{\lambda+\delta}{\lambda+\mu+\delta}\cdot\frac{1}{E}, \qquad \pi_{1D} = \frac{\mu^2}{\lambda(\lambda+\mu+\delta)}\cdot\frac{1}{E}, \qquad \pi_2 = \frac{\mu^2}{2\lambda^2}\cdot\frac{\lambda+\delta}{\lambda+\mu+\delta}\cdot\frac{1}{E} \]
\[ A = \pi_2 + \pi_1 + r\,\pi_{1D} = \left( \frac{\mu^2}{2\lambda^2}\cdot\frac{\lambda+\delta}{\lambda+\mu+\delta} + \frac{\mu}{\lambda}\cdot\frac{\lambda+\delta}{\lambda+\mu+\delta} + r\,\frac{\mu^2}{\lambda(\lambda+\mu+\delta)} \right) \Big/ E \]

Redundant System with Finite Detection Switchover Time (contd.)
• Steady-state unavailability (assuming state 1D is down) is given by
[equation missing from the extracted text]
• Downtime in minutes per year is given by
[equation missing from the extracted text]
• Equivalent failure and repair rate (see p. 439, Ex 8.11)

Redundant System with Finite Detection Switchover Time (contd.)
• Quite often state 1D is considered down only if the sojourn time exceeds a threshold $t_{th}$
• We can deal with this via the assignment of a reward rate to the state, so that
[equation missing from the extracted text]
• Then the unavailability is given by
[equation missing from the extracted text]

Redundant System with Finite Detection Switchover Time (contd.)
[Figure: Plot of D(δ), D(δ, t_th), and D (for the 3-state model without state 1D) as functions of 1/δ (in seconds) for 1/λ = 10,000 h and 1/µ = 2 h]

2-component availability model with imperfect coverage
• Coverage factor = c (conditional probability that the fault is correctly handled)
• State 1C is a reboot (down) state.

2-component availability model: delay + imperfect coverage
• The model has detection delay and imperfect coverage
• Down states are 0, 1C and 1D.
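The closed-form probabilities for the detection-delay model (states 2, 1D, 1, 0) can be verified by substituting them into flow-balance equations. The transition structure used below is an assumption inferred from the closed form, since the state diagram itself is not part of the extracted text: 2 → 1D at 2λ, 1D → 1 at δ, 1D → 0 at λ, 1 → 0 at λ, 1 → 2 at µ, 0 → 1 at µ. The value 1/δ = 30 s is hypothetical; 1/λ = 10,000 h and 1/µ = 2 h are the values quoted for the plot.

```python
def closed_form(lam, mu, delta):
    """Steady-state probabilities of the 2-component model with detection delay."""
    s = lam + mu + delta
    weights = {
        "0": 1.0,
        "1": (mu / lam) * (lam + delta) / s,
        "1D": mu * mu / (lam * s),
        "2": mu * mu / (2 * lam * lam) * (lam + delta) / s,
    }
    e = sum(weights.values())  # the normalizing constant E
    return {state: w / e for state, w in weights.items()}


def balance_residuals(pi, lam, mu, delta):
    """Net probability flow into each state under the ASSUMED transition structure."""
    rates = {
        ("2", "1D"): 2 * lam, ("1D", "1"): delta, ("1D", "0"): lam,
        ("1", "0"): lam, ("1", "2"): mu, ("0", "1"): mu,
    }
    residual = {state: 0.0 for state in pi}
    for (src, dst), rate in rates.items():
        flow = pi[src] * rate
        residual[src] -= flow
        residual[dst] += flow
    return residual


def availability(pi, r=0.0):
    """A = pi_2 + pi_1 + r * pi_1D; r = 0 treats the detection state 1D as down."""
    return pi["2"] + pi["1"] + r * pi["1D"]


lam, mu = 1.0 / 10000.0, 1.0 / 2.0  # per hour, from the plot caption
delta = 3600.0 / 30.0               # hypothetical: mean detection time of 30 s
pi = closed_form(lam, mu, delta)
residuals = balance_residuals(pi, lam, mu, delta)
print(pi)
print("max |residual| =", max(abs(x) for x in residuals.values()))
print("A (1D down) =", availability(pi), " A (1D up) =", availability(pi, r=1.0))
```

All residuals vanishing (up to rounding) indicates that the reconstructed probabilities are a steady-state solution of the assumed chain. As δ grows without bound, state 1D disappears and π0 approaches the shared-repair value 1/(1 + µ/λ + µ²/(2λ²)).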
Modeling Software Faults: Operating System Failure
• Availability model with hardware and software (OS) redundancy; operational phase; Heisenbugs
• Assumptions:
– Hardware failures are permanent and require a repair or replacement action, while OS failures are cleared by a reboot.
– Repair or reboot takes place at rates µ and β for the hardware and the OS, respectively.

Modeling Software Faults: Operating System Failure (contd.)
• In state 1, both nodes and their OS are functioning properly.
• In state 2, one of the nodes has a hardware failure, and in state 3, both nodes have a hardware failure.
• These equations can be solved, in conjunction with
[equations missing from the extracted text]

Modeling Software Faults: Operating System Failure (contd.)
• Steady-state probabilities are given by
[equations missing from the extracted text]
• Solving for the steady-state availability we get
[equation missing from the extracted text]

[Figure: Model made in SHARPE GUI]

[Figure: Analysis Frame]

Code (textual) generated by SHARPE GUI

    markov 2node.rgl(lambda, mu, lambdaos, beta)
    1 2 2*lambda
    1 4 2*lambdaos
    2 3 lambda
    2 1 mu
    2 6 lambdaos
    3 2 mu
    4 1 beta
    4 5 lambdaos
    4 6 lambda
    6 2 beta
    5 4 2*beta
    * Reward configuration defined:
    reward
    1 rew_2node.rgl_1
    2 rew_2node.rgl_2
    3 rew_2node.rgl_3
    4 rew_2node.rgl_4
    6 rew_2node.rgl_6
    5 rew_2node.rgl_5
    end
    end
    echo ********* Outputs asked for the model: 2node.rgl **************
    * UP configuration: ava
    bind
    rew_2node.rgl_1 1
    rew_2node.rgl_2 1
    rew_2node.rgl_4 1
    rew_2node.rgl_3 0
    rew_2node.rgl_6 0
    rew_2node.rgl_5 0
    end
    bind lambda 0.00015
    bind mu 1/10
    bind lambdaos 0.000014
    bind beta 1/5
    var SS_Avail exrss(2node.rgl; lambda, mu, lambdaos, beta)
    echo Steady_State Availability for 2node.rgl
    expr SS_Avail
    * DOWN configuration: ava
    bind
    rew_2node.rgl_3 525600
    rew_2node.rgl_6 525600
    rew_2node.rgl_5 525600
    rew_2node.rgl_1 0
    rew_2node.rgl_2 0
    rew_2node.rgl_4 0
    end
    bind lambda 0.00015
    bind mu 1/10
    bind lambdaos 0.000014
    bind beta 1/5
    var Downtime exrss(2node.rgl; lambda, mu, lambdaos, beta)
    expr Downtime
    end

[Figure: Output showing Downtime and SS Availability]

Webserver Availability Model with warm Replication
• Two nodes for hardware redundancy
• Each node has a copy of the Webserver (software redundancy, replication)
• Primary node can fail
• Secondary node can fail
• Primary process can fail
• Secondary process can fail
• Failures may have imperfect coverage
• Time delay for fault detection
• Model of a real system developed at Avaya Labs

Modeling Software Faults: Application Failure
• Availability model with passive redundancy (warm replication) of the application; operational phase; Heisenbugs or hardware transients
• Assumptions:
– Web server software that fails at rate $\gamma_p$, running on a machine that fails at rate $\gamma_m$
– Mean time to detect a server process failure $\delta_p^{-1}$ and mean time to detect a machine failure $\delta_m^{-1}$
– Mean restart time of a machine $\tau_m^{-1}$
– Mean restart time of a server $\tau_p^{-1}$
• Reference: S. Garg, Y. Huang, C. Kintala, K. S. Trivedi and S. Yagnik, "Performance and Reliability Evaluation of Passive Replication Schemes in Application Level Fault-Tolerance," Proc. of the 29th Intl. Symp. on Fault-Tolerant Computing, FTCS-29, June 1999.

Parameters
• Process MTTF = 10 days (1/γp)
• Node MTTF = 20 days (1/γn)
• Process polling interval = 2 seconds (1/δp)
• Mean process restart time = 30 seconds (1/τp)
• Mean process failover time = 2 minutes (1/τn)
• Switching time with mean 1/τs
• c = 0.95

[Figure: Solution for warm replication]
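The two-node hardware/OS model defined in the SHARPE listing for `2node.rgl` above can also be solved outside SHARPE. The sketch below copies the transition list, the up states (1, 2, 4) and the parameter bindings from that listing, and finds the steady state by power iteration on the uniformized chain P = I + Q/q. This is a swapped-in numerical technique chosen for illustration, not SHARPE's algorithm, and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 525600  # same constant as the DOWN reward binding


def two_node_rates(lam, mu, lam_os, beta):
    """Transitions (source, target) -> rate, as listed in the 2node.rgl model."""
    return {
        (1, 2): 2 * lam, (1, 4): 2 * lam_os,
        (2, 3): lam, (2, 1): mu, (2, 6): lam_os,
        (3, 2): mu,
        (4, 1): beta, (4, 5): lam_os, (4, 6): lam,
        (6, 2): beta,
        (5, 4): 2 * beta,
    }


def steady_state_by_uniformization(rates, states, iterations=5000):
    """Iterate pi <- pi (I + Q/q) until (practically) stationary."""
    exit_rate = {s: 0.0 for s in states}
    for (src, _), rate in rates.items():
        exit_rate[src] += rate
    q = 1.25 * max(exit_rate.values())  # strictly above the largest exit rate
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: pi[s] * (1.0 - exit_rate[s] / q) for s in states}
        for (src, dst), rate in rates.items():
            nxt[dst] += pi[src] * rate / q
        pi = nxt
    return pi


states = [1, 2, 3, 4, 5, 6]
rates = two_node_rates(lam=0.00015, mu=1 / 10, lam_os=0.000014, beta=1 / 5)
pi = steady_state_by_uniformization(rates, states)
availability = pi[1] + pi[2] + pi[4]  # states with reward 1 in the UP configuration
unavailability = pi[3] + pi[5] + pi[6]
print("SS_Avail =", availability)
print("Downtime (min/yr) =", unavailability * MINUTES_PER_YEAR)
```

Power iteration converges here because the repair and reboot rates dominate the dynamics; for chains whose slowest rate is tiny compared with the uniformization rate q, a direct linear solve is the safer choice.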
Hierarchical modeling: Example
• Consider the availability model of a workstation consisting of three subsystems:
– A cooling subsystem with two fans,
– A dual power supply subsystem and
– A two-CPU processing subsystem.
• The workstation is considered to be unavailable when one or more of the subsystems have failed.

Hierarchical modeling: Example (contd.)
• Solving the first subsystem (fans) we have
[equations missing from the extracted text]

Hierarchical modeling: Example (contd.)
• Solving the second subsystem (power supply) we have
[equations missing from the extracted text]

Hierarchical modeling: Example (contd.)
• Solving the last subsystem (processors) we have
[equations missing from the extracted text]

Hierarchical modeling: Example (contd.)
• The overall availability of the system is the product of the individual block availabilities.
• Thus the system availability is given by
[equation missing from the extracted text]

[Figure: Model made in SHARPE GUI]

[Figure: Sub-models embedded in the RBD model: Fan submodel, Power-Supply submodel, Processors]

[Figure: Hierarchy parameters passed to main block]

[Figure: Inserting Parameters for Sub-Model]

[Figure: Analysis Frame]

[Figure: Output generated by SHARPE]

Modeling an N+1 Protection System

Outline
• Description of the system
• Using a rate approximation
• Using a 3-stage Erlang approximation to a uniform distribution
• Using a Semi-Markov model: approximation method using a 3-stage Erlang distribution
• Using equations of the underlying Semi-Markov Process
• Solutions for the models
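The hierarchical composition in the workstation example (solve each subsystem separately, then multiply the block availabilities in a series RBD) can be sketched as follows. The subsystem equations are not part of the extracted text, so this sketch assumes that each subsystem is a two-component parallel pair with independent repair; the rates are hypothetical placeholders.

```python
from math import prod  # Python 3.8+


def parallel_pair_availability(lam, mu):
    """ASSUMPTION: two identical components with independent repair; up if at least one works."""
    return 1.0 - (lam / (lam + mu)) ** 2


def series_availability(block_availabilities):
    """Series RBD: the system is up only when every block is up."""
    return prod(block_availabilities)


# Hypothetical (failure rate, repair rate) pairs per hour for each subsystem.
subsystems = {
    "fans": (1e-4, 0.5),
    "power supplies": (5e-5, 0.25),
    "processors": (2e-5, 1.0),
}
blocks = {name: parallel_pair_availability(lam, mu) for name, (lam, mu) in subsystems.items()}
system = series_availability(blocks.values())
print(blocks)
print("system availability =", system)
```

The product form relies on the subsystems failing and being repaired independently of one another; a repair facility shared across subsystems would couple them and require a single combined Markov model.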
Description of the system
• N = number of protected units (we use N = 1)
• λ = unit failure rate
• µ = unit restoration rate
• T = deterministic time between routine diagnostics
• c = probability that a protection switch successfully restores service
• d = probability that a failure in the standby unit is detected

Hot Standby with different coverages
[State diagram: states Normal (1+1), Protection Switch Failure, Simplex (1), Failure to Detect Protection Fault and Failed (0); arc labels (1−c)λ, (c+d)λ, (1−d)λ, λ, µ and 2µ]
• State numbering: Normal: 1; Protection Switch Failure: 2; Simplex: 3; Failure to Detect Protection Fault: 4; Failed: 5

Diagnostics: Using a rate approximation
[State diagram: as above, with an additional arc of rate 2/T leaving Failure to Detect Protection Fault]
• Time to diagnostic is exponentially distributed with mean T/2
• State numbering: Normal: 1; Protection Switch Failure: 2; Simplex: 3; Failure to Detect Protection Fault: 4; Failed: 5

[Figure: Comparison of probability density functions (pdf): 3-stage Erlang pdf vs. U(0,1) pdf; x-axis: time, y-axis: pdf]

[Figure: Comparison of cumulative distribution functions (cdf): 3-stage Erlang cdf vs. U(0,1) cdf; x-axis: time, y-axis: cdf]

Using a 3-stage Erlang approximation to a uniform distribution
[State diagram: as above, with Failure to Detect Protection Fault expanded into three stages (including s1 and s2), each left at rate 6/T or by a failure at rate λ]
• Time to diagnostic is uniformly distributed over (0,T), approximated by a 3-stage Erlang with mean T/2

Using a Semi-Markov model: approximation method using an Erlang distribution (N = 1)
[State diagram: as above, with the arc leaving Failure to Detect Protection Fault labeled E(t)]
• Time to diagnostic is uniformly distributed over (0,T), approximated by a 3-stage Erlang distribution with mean T/2
• E(t): 3-stage Erlang distribution, given by
\[ E(t) = 1 - \sum_{k=0}^{3-1} e^{-\frac{6}{T}t}\,\frac{\left(\frac{6}{T}t\right)^k}{k!} \]
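The comparison between the uniform diagnostic time and its 3-stage Erlang approximation can be reproduced numerically. The sketch below evaluates both distributions for T = 1, the value used in the comparison plots; the function names are illustrative assumptions.

```python
import math


def erlang3_cdf(t, T):
    """E(t) = 1 - sum_{k=0}^{2} exp(-6t/T) (6t/T)^k / k!, i.e. stage rate 6/T and mean T/2."""
    if t <= 0:
        return 0.0
    x = 6.0 * t / T
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(3))


def erlang3_pdf(t, T):
    """Density of the same 3-stage Erlang distribution: r^3 t^2 exp(-r t) / 2 with r = 6/T."""
    if t < 0:
        return 0.0
    r = 6.0 / T
    return r ** 3 * t ** 2 * math.exp(-r * t) / 2.0


def uniform_cdf(t, T):
    return min(max(t / T, 0.0), 1.0)


def uniform_pdf(t, T):
    return 1.0 / T if 0.0 <= t <= T else 0.0


T = 1.0
for i in range(0, 11):
    t = i / 10.0
    print(f"t={t:.1f}  erlang pdf={erlang3_pdf(t, T):.4f}  uniform pdf={uniform_pdf(t, T):.1f}"
          f"  erlang cdf={erlang3_cdf(t, T):.4f}  uniform cdf={uniform_cdf(t, T):.1f}")
```

Both distributions have mean T/2, but the Erlang density is unimodal (peaking at t = T/3) and has an unbounded tail, so the match is in the mean and the rough shape of the CDF rather than pointwise.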
Using Equations of the underlying Semi-Markov Process
• Steady-state solution
• One-step transition probability matrix P of the embedded DTMC (states ordered 1 to 5):
\[ P = \begin{pmatrix}
0 & \frac{1-c}{2} & \frac{c+d}{2} & \frac{1-d}{2} & 0 \\
0 & 0 & \frac{\mu}{\lambda+\mu} & 0 & \frac{\lambda}{\lambda+\mu} \\
\frac{\mu}{\lambda+\mu} & 0 & 0 & 0 & \frac{\lambda}{\lambda+\mu} \\
0 & 0 & \frac{1}{\lambda T}\left(1-e^{-\lambda T}\right) & 0 & 1-\frac{1}{\lambda T}\left(1-e^{-\lambda T}\right) \\
0 & 0 & 1 & 0 & 0
\end{pmatrix} \]

Using Equations of the underlying Semi-Markov Process (Contd.)
• Solve v = vP to obtain
\[ v = \left[\, v_1,\ \frac{1-c}{2}\,v_1,\ \frac{\lambda+\mu}{\mu}\,v_1,\ \frac{1-d}{2}\,v_1,\ \left( \frac{\lambda(1-c)}{2(\lambda+\mu)} + \frac{\lambda}{\mu} + \frac{1-d}{2}\left(1-\frac{1}{\lambda T}\left(1-e^{-\lambda T}\right)\right) \right) v_1 \,\right] \]
where
\[ v_1 = \frac{1}{1 + \frac{1-c}{2} + \frac{\lambda+\mu}{\mu} + \frac{1-d}{2} + \frac{\lambda(1-c)}{2(\lambda+\mu)} + \frac{\lambda}{\mu} + \frac{1-d}{2}\left(1-\frac{1}{\lambda T}\left(1-e^{-\lambda T}\right)\right)} \]

Using Equations of the underlying Semi-Markov Process (Contd.)
• Time to the next diagnostic is uniformly distributed over (0,T)
• $H_i(t)$: CDF of the sojourn time in state i
\[ H_1(t) = 1 - e^{-2\lambda t}, \qquad H_2(t) = 1 - e^{-(\lambda+\mu)t}, \qquad H_3(t) = 1 - e^{-(\lambda+\mu)t}, \qquad H_5(t) = 1 - e^{-2\mu t} \]
\[ H_4(t) = \begin{cases} 1 - \left(1 - \frac{t}{T}\right) e^{-\lambda t}, & t < T \\ 1, & t \ge T \end{cases} \]

Using Equations of the underlying Semi-Markov Process (Contd.)
• $h_i$: mean sojourn time in state i, $h_i = \int_0^\infty [1 - H_i(t)]\,dt$
\[ h_1 = \frac{1}{2\lambda}, \qquad h_2 = \frac{1}{\lambda+\mu}, \qquad h_3 = \frac{1}{\lambda+\mu}, \qquad h_4 = \frac{1}{\lambda} - \frac{1}{\lambda^2 T}\left(1-e^{-\lambda T}\right), \qquad h_5 = \frac{1}{2\mu} \]
• State probabilities of the SMP are given by
\[ \pi_i = \frac{v_i h_i}{\sum_{j=1}^{5} v_j h_j} \]
• Unavailability $= \pi_2 + \pi_5$

Solutions for the models
Parameter values assumed:
• N = 1
• c = 0.9
• d = 0.9
• λ = 0.0001 / hour
• µ = 1 / hour
• T = 1 hour
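The semi-Markov equations above can be evaluated directly for the stated parameter values. In the sketch below, state 4 enters only through the probability p43 that the diagnostic fires before the next failure and through the mean sojourn time h4. For a diagnostic time X racing against an EXP(λ) failure, both follow from the Laplace-Stieltjes transform of X evaluated at λ: p43 = E[e^{-λX}] and h4 = (1 − p43)/λ. The uniform case reproduces the formulas above; the exponential and Erlang variants are added here as an assumption-labeled extension so that the different diagnostic-time models can be compared. Function names are hypothetical.

```python
import math


def smp_state_probabilities(lam, mu, c, d, p43, h4):
    """pi_i = v_i h_i / sum_j v_j h_j for the 5-state protection model (v left unnormalized)."""
    v1 = 1.0
    v2 = (1 - c) / 2 * v1
    v3 = (lam + mu) / mu * v1
    v4 = (1 - d) / 2 * v1
    v5 = lam / (lam + mu) * (v2 + v3) + (1 - p43) * v4
    v = [v1, v2, v3, v4, v5]
    h = [1 / (2 * lam), 1 / (lam + mu), 1 / (lam + mu), h4, 1 / (2 * mu)]
    weights = [a * b for a, b in zip(v, h)]
    total = sum(weights)
    return [w / total for w in weights]


def lst_uniform(lam, T):
    """E[exp(-lam X)] for X ~ U(0, T); expm1 keeps precision when lam*T is small."""
    return -math.expm1(-lam * T) / (lam * T)


def lst_exponential(lam, T):
    """ASSUMED variant: X ~ EXP(2/T), the rate approximation."""
    return (2 / T) / (2 / T + lam)


def lst_erlang3(lam, T):
    """ASSUMED variant: X ~ 3-stage Erlang with stage rate 6/T."""
    return ((6 / T) / (6 / T + lam)) ** 3


def measures(lam, mu, c, d, lst):
    pi = smp_state_probabilities(lam, mu, c, d, p43=lst, h4=(1 - lst) / lam)
    unavailability = pi[1] + pi[4]            # states 2 and 5
    units = 2 * pi[0] + pi[2] + pi[3]         # average number of units available
    return unavailability, unavailability * 525600, units


lam, mu, c, d, T = 0.0001, 1.0, 0.9, 0.9, 1.0
results = {
    "U(0,T)": measures(lam, mu, c, d, lst_uniform(lam, T)),
    "Exp(2/T)": measures(lam, mu, c, d, lst_exponential(lam, T)),
    "Erlang-3": measures(lam, mu, c, d, lst_erlang3(lam, T)),
}
for name, (u, downtime, units) in results.items():
    print(f"{name:9s} unavailability={u:.8e} downtime={downtime:.5f} units={units:.8f}")
```

With λT = 10⁻⁴ the three transforms agree to roughly nine decimal places, so the choice of diagnostic-time distribution barely affects the steady-state measures at these parameter values; that is consistent with the deck's results table listing identical figures for every approximation.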
Results obtained
• Steady-state availability: probability of being in states "Normal", "Simplex", or "Failure to Detect Protection Fault"
• Steady-state unavailability: probability of being in states "Protection Switch Failure" or "Failed (0)"
• Average downtime in steady state: steady-state unavailability × number of minutes in a year
• Average number of units available: 2 × P(Normal) + 1 × P(Simplex) + 1 × P(Failure to Detect Protection Fault)

| Diagnostic start time approximation | Steady-state availability | Steady-state unavailability | Avg. downtime in steady state (minutes/year) | Avg. #units available (out of 1 + 1 spare) |
| Exp(2/T) | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00 |
| 3-stage Erlang with mean T/2 | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00 |
| Semi-Markov (3-stage Erlang approx. with mean T/2) | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00 |
| Semi-Markov Process equations, U([0,T]) | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00 |