Probability and Statistics with Reliability,
Queuing and Computer Science Applications
Second edition
by K.S. Trivedi
Publisher: John Wiley & Sons
Chapter 8 (Part 2): Continuous Time Markov Chain
Availability Modeling
Dept. of Electrical & Computer Engineering
Duke University
Email: kst@ee.duke.edu
URL: www.ee.duke.edu/~kst
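Copyright © 2003 by K.S. Trivedi

2-State Markov Availability Model
States: UP (state 1) and DN (state 0). The chain moves from state 1 to state 0 at the failure rate λ and from state 0 to state 1 at the repair rate µ, where
\[ \frac{1}{\lambda} = \text{MTTF}, \qquad \frac{1}{\mu} = \text{MTTR} \]
1) Steady-state balance equations for each state:
– Rate of flow IN = rate of flow OUT
• State 1: $\mu \pi_0 = \lambda \pi_1$
• State 0: $\lambda \pi_1 = \mu \pi_0$
2 unknowns, 2 equations, but there is only one independent equation.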
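
2-State Markov Availability Model (Contd.)
• Need an additional equation: $\pi_0 + \pi_1 = 1$. Substituting $\pi_0 = (\lambda/\mu)\,\pi_1$ from the balance equation:
\[ \frac{\lambda}{\mu}\pi_1 + \pi_1 = 1 \;\Rightarrow\; \pi_1 = \frac{1}{1 + \frac{\lambda}{\mu}} \]
\[ \pi_1 = A_{ss} = \frac{1}{1 + \frac{\lambda}{\mu}} = \frac{\mu}{\lambda + \mu} = \frac{1/\lambda}{1/\lambda + 1/\mu} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} \]
\[ 1 - A_{ss} = \frac{\text{MTTR}}{\text{MTTF} + \text{MTTR}} \]
• Downtime in minutes per year (DTMY):
\[ \text{DTMY} = \frac{\text{MTTR}}{\text{MTTF} + \text{MTTR}} \times 8760 \times 60 \]
$A_{ss} = 0.99999 \Rightarrow 1 - A_{ss} = 10^{-5} \Rightarrow \text{DTMY} = 5.256$ min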
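
The steady-state formulas above reduce to a few lines of arithmetic. A minimal Python sketch follows; the function names and the MTTF/MTTR values in the usage section are illustrative assumptions, not values from the slides.

```python
MINUTES_PER_YEAR = 8760 * 60  # 525,600 minutes, as used on the slide


def steady_state_availability(mttf: float, mttr: float) -> float:
    """A_ss = MTTF / (MTTF + MTTR) for the two-state model."""
    return mttf / (mttf + mttr)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: MTTF = 10,000 h, MTTR = 2 h.
    a_ss = steady_state_availability(10_000.0, 2.0)
    print(f"A_ss = {a_ss:.6f}")
    print(f"DTMY = {downtime_minutes_per_year(a_ss):.2f} min")
    # "Five nines" availability, as in the slide's example.
    print(f"DTMY at 0.99999 = {downtime_minutes_per_year(0.99999):.3f} min")
```

Keeping the two steps separate makes it easy to reuse the downtime conversion for any model that yields a steady-state availability.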

2-State Markov Availability Model (Contd.)
2) Transient availability for each state:
– Rate of buildup = rate of flow IN − rate of flow OUT
\[ \frac{d\pi_1}{dt} = \mu\,\pi_0(t) - \lambda\,\pi_1(t) \]
Since $\pi_0(t) + \pi_1(t) = 1$, we have
\[ \frac{d\pi_1}{dt} = \mu\,(1 - \pi_1(t)) - \lambda\,\pi_1(t) \]
\[ \frac{d\pi_1}{dt} + (\mu + \lambda)\,\pi_1(t) = \mu \]
Assuming $\pi_1(0) = 1$, this equation can be solved to obtain
\[ \pi_1(t) = A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\,e^{-(\lambda + \mu)t} \]

2-State Markov Availability Model (Contd.)
3) $R(t) = e^{-\lambda t}$
4) Steady-state availability:
\[ \lim_{t \to \infty} A(t) = A_{ss} = \frac{\mu}{\lambda + \mu} \]

DTMC vs. CTMC
• Many books on fault-tolerant or dependable computing unnecessarily restrict themselves so as to view a CTMC through the limited prism of a DTMC, as in this state diagram:
  State 1: to state 0 with probability λ∆t, self-loop with probability 1 − λ∆t
  State 0: to state 1 with probability µ∆t, self-loop with probability 1 − µ∆t
• Instead, by using the rich theory of CTMC directly, we can gain efficiency in expression and solution.
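
To see what the discretized view costs, the sketch below steps the two-state DTMC with one-step probabilities λ∆t and µ∆t and compares the result with the closed-form transient availability of the CTMC, A(t) = µ/(λ+µ) + λ/(λ+µ)·e^{−(λ+µ)t}, starting in the UP state. The rates, time horizon, and function names are illustrative assumptions.

```python
import math


def ctmc_availability(lam: float, mu: float, t: float) -> float:
    """Closed-form A(t) for the two-state CTMC, starting in the UP state."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def dtmc_availability(lam: float, mu: float, t: float, dt: float) -> float:
    """Approximate A(t) by stepping the discretized chain with step dt.

    One step: P(up) <- P(up) * (1 - lam*dt) + P(down) * mu*dt.
    """
    p_up = 1.0
    for _ in range(round(t / dt)):
        p_up = p_up * (1.0 - lam * dt) + (1.0 - p_up) * mu * dt
    return p_up


if __name__ == "__main__":
    lam, mu, t = 0.01, 1.0, 2.0  # hypothetical rates (per hour) and time (hours)
    exact = ctmc_availability(lam, mu, t)
    for dt in (0.5, 0.1, 0.01):
        approx = dtmc_availability(lam, mu, t, dt)
        print(f"dt={dt:<5} DTMC={approx:.8f} CTMC={exact:.8f} "
              f"error={abs(approx - exact):.2e}")
```

The DTMC result approaches the CTMC value only as ∆t shrinks, at the price of more steps, whereas the CTMC expression is exact for any t.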

Example: Defective Distribution
• We now consider task-oriented measures for the two-state availability model.
• Consider a task that needs x amount of time to execute in the absence of failures. Let T(x) be the completion time of the task. First consider λ = 0 so that there are no failures.
• In this case T1(x) = x, and the distribution function of T1(x) is the unit step function at x.
• Next consider a nonzero value of λ but set µ = 0. Assuming that the server is up when the task arrives, the task will complete at time x provided the server does not fail in the interval (0, x).
• Otherwise, the task will never complete, hence
\[ P(T_2(x) \le t) = \begin{cases} 0, & t < x \\ e^{-\lambda x}, & t \ge x \end{cases} \]

Example: Defective Distribution (contd.)
• T2(x) is a defective random variable with a defect at infinity equal to $1 - e^{-\lambda x}$, the probability that a task will never finish.

Example: Defective Distribution (contd.)
• The third case (nonzero λ and nonzero µ), which is relatively complex, has also been analyzed.
• If a server failure occurs before the task is completed, we need to consider two separate cases.
– If the work done so far is not lost, so that when the server repair is completed the task resumes from where it was interrupted, we have the preemptive resume (prs) case.
– Otherwise we have the preemptive repeat (prt) case.
• The LST of the completion time distributions of these two cases is given by

Two-component system: Markov availability model
• Assume we have a two-component parallel redundant system with repair rate µ.
• Assume that the failure rate of both the components is λ.
• When both the components have failed, the system is considered to have failed.

Markov availability model (Contd.)
• Let the number of properly functioning components be the state of the system. The state space is {0,1,2}, where 0 is the system down state.
• We wish to examine the effects of shared vs. non-shared repair.

Markov availability model (Contd.)
Non-shared (independent) repair:
  2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate µ, 0 → 1 at rate 2µ
Shared repair:
  2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate µ, 0 → 1 at rate µ

Markov availability model (Contd.)
• Note: The non-shared case can be modeled and solved using an RBD or an FTREE, but the shared case needs the use of Markov chains.

Steady-state balance equations
• For any state:
Rate of flow in = Rate of flow out
• Consider the shared case:
\[ 2\lambda\,\pi_2 = \mu\,\pi_1 \]
\[ (\lambda + \mu)\,\pi_1 = 2\lambda\,\pi_2 + \mu\,\pi_0 \]
\[ \lambda\,\pi_1 = \mu\,\pi_0 \]
• $\pi_i$: steady-state probability that the system is in state i
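
Steady-state balance equations (Contd.)
• Hence
\[ \pi_1 = \frac{\mu}{\lambda}\,\pi_0, \qquad \pi_2 = \frac{\mu}{2\lambda}\,\pi_1 \]
• Since
\[ \pi_0 + \pi_1 + \pi_2 = 1 \]
• We have
\[ \pi_0 + \frac{\mu}{\lambda}\,\pi_0 + \frac{\mu}{2\lambda}\,\frac{\mu}{\lambda}\,\pi_0 = 1 \]
or
\[ \pi_0 = \frac{1}{1 + \frac{\mu}{\lambda} + \frac{\mu^2}{2\lambda^2}} \]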

Steady-state balance equations (Contd.)
• Steady-state unavailability = $\pi_0 = 1 - A_{shared}$
• Similarly, for the non-shared case,
• Steady-state unavailability = $1 - A_{non\text{-}shared}$:
\[ 1 - A_{non\text{-}shared} = \frac{1}{1 + \frac{2\mu}{\lambda} + \frac{\mu^2}{\lambda^2}} \]
• Downtime in minutes per year = $(1 - A) \times 8760 \times 60$
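
The two closed forms can be compared numerically. The sketch below evaluates the shared and non-shared unavailabilities and converts each to downtime in minutes per year; the function names and the rate values are illustrative assumptions.

```python
def unavailability_shared(lam: float, mu: float) -> float:
    """pi_0 with one shared repair facility: 1 / (1 + mu/lam + mu^2 / (2 lam^2))."""
    r = mu / lam
    return 1.0 / (1.0 + r + r * r / 2.0)


def unavailability_non_shared(lam: float, mu: float) -> float:
    """pi_0 with independent repair: 1 / (1 + 2 mu/lam + mu^2 / lam^2)."""
    r = mu / lam
    return 1.0 / (1.0 + 2.0 * r + r * r)


def downtime_minutes_per_year(unavailability: float) -> float:
    """(1 - A) * 8760 * 60."""
    return unavailability * 8760 * 60


if __name__ == "__main__":
    lam, mu = 1.0e-4, 0.5  # hypothetical: MTTF = 10,000 h, MTTR = 2 h
    for name, fn in (("shared", unavailability_shared),
                     ("non-shared", unavailability_non_shared)):
        u = fn(lam, mu)
        print(f"{name:>10}: unavailability = {u:.3e}, "
              f"downtime = {downtime_minutes_per_year(u):.4f} min/year")
```

When µ is much larger than λ, the shared-repair unavailability is roughly twice the non-shared value, because the second failed unit has to wait for the single repair facility.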

WFS Example

A Workstations-Fileserver Example
• Computing system consisting of:
– A fileserver
– Two workstations
– Computing network connecting them
• System operational as long as:
– One of the workstations
and
– The fileserver are operational
• Computer network is assumed to be fault-free.

[Figure: The WFS Example]

Markov Chain for WFS Example
• Assuming exponentially distributed times to failure
– λw: failure rate of workstation
– λf: failure rate of fileserver
• Assume that components are repairable
– µw: repair rate of workstation
– µf: repair rate of fileserver
• Fileserver has (preemptive) priority for repair over workstations (such repair priority cannot be captured by non-state-space models)

Markov Availability Model for WFS
State diagram (states (i,j); transitions and rates):
  (2,1) → (1,1): 2λw     (1,1) → (0,1): λw
  (1,1) → (2,1): µw      (0,1) → (1,1): µw
  (2,0) → (1,0): 2λw     (1,0) → (0,0): λw
  (i,1) → (i,0): λf      (i,0) → (i,1): µf,   for i = 0, 1, 2
Since each state is reachable from every other state, the CTMC is irreducible. Furthermore, all states are positive recurrent (since it is a finite-state CTMC).

Markov Availability Model for WFS (Contd.)
• In the previous figure, the label (i,j) of each state is
interpreted as follows: i represents the number of
workstations that are still functioning and j is ‘1’ or ‘0’
depending on whether the fileserver is up or down
respectively.
• Note that in the text, no component failures are allowed
from system failure states; this is commonly assumed by
many engineers in practice. Here we allow component
failures from system failure states to show that this situation
can also be modeled.

Markov Model
• Let {X(t), t > 0} represent a finite-state Continuous Time Markov Chain (CTMC) with state space Ω.
• Infinitesimal generator matrix $Q = [q_{ij}]$:
• $q_{ij}$ (i ≠ j): transition rate from state i to state j
• $q_{ii} = -q_i = -\sum_{j \ne i} q_{ij}$, the diagonal element

Markov Availability Model for WFS (Contd.)
• For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the Q matrix is given by:
\[ Q = \begin{bmatrix}
-(\lambda_f + 2\lambda_w) & \lambda_f & 2\lambda_w & 0 & 0 & 0 \\
\mu_f & -(\mu_f + 2\lambda_w) & 0 & 2\lambda_w & 0 & 0 \\
\mu_w & 0 & -(\mu_w + \lambda_f + \lambda_w) & \lambda_f & \lambda_w & 0 \\
0 & 0 & \mu_f & -(\mu_f + \lambda_w) & 0 & \lambda_w \\
0 & 0 & \mu_w & 0 & -(\mu_w + \lambda_f) & \lambda_f \\
0 & 0 & 0 & 0 & \mu_f & -\mu_f
\end{bmatrix} \]

Markov Model (steady-state)
π: steady-state probability vector
\[ \pi Q = 0, \qquad \sum_{i \in \Omega} \pi_i = 1 \]
\[ \pi = (\pi_{(2,1)}, \pi_{(2,0)}, \pi_{(1,1)}, \pi_{(1,0)}, \pi_{(0,1)}, \pi_{(0,0)}) \]
These are called the steady-state balance equations:
Rate of flow in = Rate of flow out
After solving for π, obtain the steady-state availability
\[ A_{SS} = \pi_{(2,1)} + \pi_{(1,1)} \]
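
As an illustration of solving πQ = 0 together with the normalization condition, the sketch below builds the WFS generator matrix in the state order used above and solves the linear system by Gauss-Jordan elimination in plain Python. The rate values are the ones quoted with the results for this model (λw = 0.0001, λf = 0.00005, µw = 1.0, µf = 0.5, all per hour); the function names and the choice of solver are assumptions made for this example, and SHARPE's own solution method may differ.

```python
def wfs_generator(lam_w, lam_f, mu_w, mu_f):
    """Generator Q, states ordered (2,1), (2,0), (1,1), (1,0), (0,1), (0,0)."""
    q = [
        [0.0, lam_f, 2 * lam_w, 0.0, 0.0, 0.0],  # (2,1)
        [mu_f, 0.0, 0.0, 2 * lam_w, 0.0, 0.0],   # (2,0)
        [mu_w, 0.0, 0.0, lam_f, lam_w, 0.0],     # (1,1)
        [0.0, 0.0, mu_f, 0.0, 0.0, lam_w],       # (1,0)
        [0.0, 0.0, mu_w, 0.0, 0.0, lam_f],       # (0,1)
        [0.0, 0.0, 0.0, 0.0, mu_f, 0.0],         # (0,0)
    ]
    for i, row in enumerate(q):
        row[i] = -sum(row)  # q_ii = -(sum of the off-diagonal rates in row i)
    return q


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    # Work with Q^T so the unknowns form a column vector; append the RHS column.
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    # One balance equation is redundant: replace it by the normalization.
    a[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


if __name__ == "__main__":
    q = wfs_generator(lam_w=0.0001, lam_f=0.00005, mu_w=1.0, mu_f=0.5)
    pi = steady_state(q)
    a_ss = pi[0] + pi[2]  # up states are (2,1) and (1,1)
    print(f"A_ss = {a_ss:.8f}")
    print(f"Downtime = {(1 - a_ss) * 8760 * 60:.2f} min/year")
```

For a six-state chain a dense direct solve is more than adequate; larger models usually call for sparse or iterative methods.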

Markov Model (transient)
• π(t): transient state probability vector
• π(0): initial probability vector of the CTMC
• Transient behavior is described by the Kolmogorov differential equation (KDE):
\[ \frac{d}{dt}\pi(t) = \pi(t)\,Q, \qquad \text{given } \pi(0) \]

Markov Availability Model
• We compute the availability of the system: the system is available as long as it is in state (2,1) or (1,1).
• Instantaneous availability of the system:
\[ A(t) = \pi_{(2,1)}(t) + \pi_{(1,1)}(t) \]
\[ \lim_{t \to \infty} A(t) = A_{ss} \]

Markov Availability Model (Contd.)
Define
\[ L(t) = \int_0^t \pi(u)\,du \]
• $L_{(i,j)}(t)$: expected total time spent in state (i,j) during (0,t)
• Integrating the KDE, we get the LTODE:
\[ \frac{d}{dt}L(t) = L(t)\,Q + \pi(0), \qquad L(0) = 0 \]
• Interval availability:
\[ A_I(t) = \frac{L_{(2,1)}(t) + L_{(1,1)}(t)}{t} \]

Availability (Contd.)
• Interval availability:
\[ A_I(t) = \frac{\int_0^t A(x)\,dx}{t} = \frac{\text{Expected uptime in } (0, t]}{t} \]
• Steady-state availability:
\[ A_{SS} = \lim_{t \to \infty} A(t) = \lim_{t \to \infty} A_I(t) \]
• There are three kinds of availabilities!
– Instantaneous, Interval & Steady-state
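
The three measures can be compared on the two-state model, whose instantaneous availability has the closed form A(t) = µ/(λ+µ) + λ/(λ+µ)·e^{−(λ+µ)t} when the system starts in the UP state. In the sketch below the interval availability is obtained by numerically integrating A(x) with the trapezoidal rule; the rates, step count, and function names are illustrative assumptions.

```python
import math


def instantaneous(lam: float, mu: float, t: float) -> float:
    """A(t) for the two-state model, starting in the UP state."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def interval(lam: float, mu: float, t: float, steps: int = 10_000) -> float:
    """A_I(t) = (1/t) * integral of A(x) over (0, t], by the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (instantaneous(lam, mu, 0.0) + instantaneous(lam, mu, t))
    total += sum(instantaneous(lam, mu, k * h) for k in range(1, steps))
    return total * h / t


def steady_state(lam: float, mu: float) -> float:
    """A_ss = mu / (lam + mu)."""
    return mu / (lam + mu)


if __name__ == "__main__":
    lam, mu = 0.01, 1.0  # hypothetical rates per hour
    for t in (0.5, 2.0, 10.0, 100.0):
        print(f"t={t:>6}: A(t)={instantaneous(lam, mu, t):.6f} "
              f"A_I(t)={interval(lam, mu, t):.6f} "
              f"A_ss={steady_state(lam, mu):.6f}")
```

Because A(t) decreases from 1 toward A_ss in this model, the interval availability always lies above the instantaneous value, and both converge to A_ss as t grows.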

[Figures: Model made in SHARPE GUI; Analysis Frame]

Code (textual) generated by SHARPE GUI
format 8
factor on
markov M1(lamW, lamF, muF, muW)
2_1 1_1 2*lamW
2_1 2_0 lamF
1_1 0_1 lamW
1_1 1_0 lamF
1_1 2_1 muW
0_1 1_1 muW
0_1 0_0 lamF
2_0 2_1 muF
2_0 1_0 2*lamW
1_0 1_1 muF
1_0 0_0 lamW
0_0 0_1 muF
* Reward configuration defined:
reward
2_1 rew_M1_2_1
1_1 rew_M1_1_1
0_1 rew_M1_0_1
2_0 rew_M1_2_0
1_0 rew_M1_1_0
0_0 rew_M1_0_0
end
* Initial Probabilities defined:
2_1 init_M1_2_1
1_1 init_M1_1_1
0_1 init_M1_0_1
2_0 init_M1_2_0
1_0 init_M1_1_0
0_0 init_M1_0_0
end
echo ********* Outputs asked for the model: M1 **************
* UP configuration: up1
bind
rew_M1_2_1 1
rew_M1_1_1 1
rew_M1_0_1 0
rew_M1_2_0 0
rew_M1_1_0 0
rew_M1_0_0 0
end
bind lamW 0.0003
bind lamF 0.0001
bind muF 1.0
bind muW 1.0
echo Input parameters values: lamW= 0.0003,
lamF=0.0001, muF=1.0, muW=1.0
echo Output:
var SS_Avail exrss(M1; lamW, lamF, muF, muW)
echo Steady_State Availability for M1
expr SS_Avail

Code (textual) generated by SHARPE GUI (contd.)
* DOWN configuration: up1
bind
rew_M1_0_1 525600
rew_M1_2_0 525600
rew_M1_1_0 525600
rew_M1_0_0 525600
rew_M1_2_1 0
rew_M1_1_1 0
end
bind lamW 0.0003
bind lamF 0.0001
bind muF 1.0
bind muW 1.0
var Downtime exrss(M1; lamW, lamF, muF,
muW)
expr Downtime
* UP configuration: up1
bind
rew_M1_2_1 1
rew_M1_1_1 1
rew_M1_0_1 0
rew_M1_2_0 0
rew_M1_1_0 0
rew_M1_0_0 0
end
* Initial Probability: intit1
bind
init_M1_1_0 0
init_M1_0_1 0
init_M1_0_0 0
init_M1_2_1 1
init_M1_2_0 0
init_M1_1_1 0
end
bind lamW 0.0003
bind lamF 0.0001
bind muF 1.0
bind muW 1.0
func Transient_Availability(t) exrt(t ;M1;
lamW, lamF, muF, muW)
loop t,1,100,10
expr Transient_Availability(t)
end
end

[Figure: Output generated by SHARPE]

Markov Availability Model Results
$\lambda_w = 0.0001\ \text{hr}^{-1}$, $\lambda_f = 0.00005\ \text{hr}^{-1}$, $\mu_w = 1.0\ \text{hr}^{-1}$, $\mu_f = 0.5\ \text{hr}^{-1}$
$A_{ss} = 0.9999$

Markov Reward Model: WFS Example
• For the WFS example, assign reward rates as follows:
r(2,1) = 1, r(1,1) = 1, r(0,1) = 0, r(2,0) = 0, r(1,0) = 0 and r(0,0) = 0
• Then, the instantaneous availability of the system is
\[ A(t) = E[Z(t)] = \pi_{(2,1)}(t) + \pi_{(1,1)}(t) \]

Markov Reward Model: WFS Example (Contd.)
• Interval availability:
\[ A_I(t) = \frac{1}{t}E[Y(t)] = \frac{L_{(2,1)}(t) + L_{(1,1)}(t)}{t} \]
• Steady-state availability:
\[ A_{ss} = E[Z] = \pi_{(2,1)} + \pi_{(1,1)} \]

Condition-based maintenance
• Preventive maintenance is useful where the device time-to-failure distribution has an increasing failure rate.
• We model the TTF by a hypoexponential HYPO(λ1, λ2) distribution.
• Time to trigger inspection is assumed to be EXP(λin), time to carry out inspection is EXP(µin), time to repair is EXP(µ), and the time to carry out PM is EXP(yµ).

Preventive Maintenance Example (contd.)
• CTMC for PM Model
• Writing and solving steady-state eqns.
• Thus
• Since only (0,0) and (0,1) are up states
• Plot of SS availability as a function of MTBI = 1/λin

[Figures: Model made in SHARPE GUI; Values of variables defined; Textual input file generated by SHARPE GUI; Downtime, Steady State and Transient Availability Calculation; Graph made in Matlab of Steady State Availability vs. MTTF]

2-component Availability model with finite Detection delay
• 2-component availability model without detection delay
– Steady-state availability $A_{ss} = 1 - \pi_0$
• Fault detection stage takes random time, EXP(δ)

Redundant System with Finite Detection Switchover Time
• After solving the Markov model, we obtain the steady-state probabilities $\pi_2, \pi_{1D}, \pi_1, \pi_0$:
\[ A_{sys} = \pi_2 + \pi_1 \quad (\text{or } + \pi_{1D}) \]
• Can solve in closed form or using SHARPE
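
Closed-form
\[ \pi_0\left[1 + \frac{\mu}{\lambda}\,\frac{\lambda+\delta}{\lambda+\mu+\delta} + \frac{\mu^2}{\lambda(\lambda+\mu+\delta)} + \frac{\mu^2}{2\lambda^2}\,\frac{\lambda+\delta}{\lambda+\mu+\delta}\right] = 1 \]
Writing E for the bracketed sum:
\[ \pi_0 = \frac{1}{E} \]
\[ \pi_1 = \frac{\mu}{\lambda}\,\frac{\lambda+\delta}{\lambda+\mu+\delta}\,\frac{1}{E} \]
\[ \pi_{1D} = \frac{\mu^2}{\lambda(\lambda+\mu+\delta)}\,\frac{1}{E} \]
\[ \pi_2 = \frac{\mu^2}{2\lambda^2}\,\frac{\lambda+\delta}{\lambda+\mu+\delta}\,\frac{1}{E} \]
\[ A = \pi_2 + \pi_1 + r\,\pi_{1D} = \left(\frac{\mu^2}{2\lambda^2}\,\frac{\lambda+\delta}{\lambda+\mu+\delta} + \frac{\mu}{\lambda}\,\frac{\lambda+\delta}{\lambda+\mu+\delta} + r\,\frac{\mu^2}{\lambda(\lambda+\mu+\delta)}\right) \Big/ E \]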
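
The closed form can be checked numerically. The sketch below evaluates the four probabilities and the availability A = π2 + π1 + r·π1D. The values 1/λ = 10,000 h and 1/µ = 2 h are the ones used for the downtime plot in this deck; the detection time 1/δ = 10 s, the function names, and the balance equations quoted in the usage note are assumptions (the balance equations are inferred from the closed form, since the state diagram is not reproduced in this text).

```python
def detection_delay_probabilities(lam: float, mu: float, delta: float):
    """Closed-form steady-state probabilities (pi_2, pi_1D, pi_1, pi_0)."""
    d = lam + mu + delta
    t1 = (mu / lam) * (lam + delta) / d                   # pi_1 / pi_0
    t1d = mu ** 2 / (lam * d)                             # pi_1D / pi_0
    t2 = (mu ** 2 / (2 * lam ** 2)) * (lam + delta) / d   # pi_2 / pi_0
    e = 1.0 + t1 + t1d + t2                               # the bracketed sum E
    return t2 / e, t1d / e, t1 / e, 1.0 / e


def availability(lam: float, mu: float, delta: float, r: float) -> float:
    """A = pi_2 + pi_1 + r * pi_1D, with reward rate r (0 to 1) for state 1D."""
    p2, p1d, p1, _ = detection_delay_probabilities(lam, mu, delta)
    return p2 + p1 + r * p1d


if __name__ == "__main__":
    lam, mu = 1.0 / 10_000, 1.0 / 2   # per hour
    delta = 3600.0 / 10               # per hour; assumes a mean detection time of 10 s
    for r in (0.0, 1.0):
        a = availability(lam, mu, delta, r)
        print(f"r={r}: A={a:.9f}, downtime={(1 - a) * 8760 * 60:.4f} min/year")
```

Setting r = 0 treats state 1D as down and r = 1 treats it as up. The probabilities returned satisfy 2λπ2 = µπ1, (λ+δ)π1D = 2λπ2 and µπ0 = λ(π1 + π1D).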

Redundant System with Finite Detection Switchover Time (contd.)
• Steady-state unavailability (assuming state 1D is down) is given by
• Downtime in minutes per year is given by
• Equivalent failure and repair rate (see p. 439, Ex 8.11)

Redundant System with Finite Detection Switchover Time (contd.)
• Quite often state 1D is considered down if the sojourn time exceeds a threshold $t_{th}$
• We can deal with this via the assignment of a reward rate to the state so that
• Then unavailability is given by

Redundant System with Finite Detection Switchover Time (contd.)
• Plot of D(δ), D(δ, $t_{th}$), and D (for the 3-state model without state 1D) as functions of 1/δ (in seconds) for 1/λ = 10,000 h and 1/µ = 2 h.

2-component availability model with imperfect coverage
• Coverage factor = c (conditional probability that the fault is correctly handled)
• '1C' state is a reboot (down) state.

2-component availability model: delay + imperfect coverage
• Model has detection delay + imperfect coverage
• Down states are '0', '1C' and '1D'.

Modeling Software Faults: Operating System Failure
Availability model with hardware and software (OS) redundancy; operational phase; Heisenbugs
Assumptions
• Hardware failures are permanent and need a repair or replacement action, while OS failures are cleared by a reboot.
• Repair or reboot takes place at rates µ and β for the hardware and OS, respectively.

Modeling Software Faults: Operating System Failure (contd.)
• In state 1, both nodes and their OS are functioning properly.
• In state 2, one of the nodes has a hardware failure, and in state 3, both the nodes have hardware failures.
• These equations can be solved, in conjunction with

Modeling Software Faults: Operating System Failure (contd.)
• Steady-state probabilities are given by
• Solving for steady-state availability we get

[Figures: Model made in SHARPE GUI; Analysis Frame]

Code (textual) generated by SHARPE GUI
markov 2node.rgl(lambda, mu, lambdaos, beta)
1 2 2*lambda
1 4 2*lambdaos
2 3 lambda
2 1 mu
2 6 lambdaos
3 2 mu
4 1 beta
4 5 lambdaos
4 6 lambda
6 2 beta
5 4 2*beta
* Reward configuration defined:
reward
1 rew_2node.rgl_1
2 rew_2node.rgl_2
3 rew_2node.rgl_3
4 rew_2node.rgl_4
6 rew_2node.rgl_6
5 rew_2node.rgl_5
end
end
echo ********* Outputs asked for the model:
2node.rgl **************
* UP configuration: ava
bind
rew_2node.rgl_1 1
rew_2node.rgl_2 1
rew_2node.rgl_4 1
rew_2node.rgl_3 0
rew_2node.rgl_6 0
rew_2node.rgl_5 0
end
bind lambda 0.00015
bind mu 1/10
bind lambdaos 0.000014
bind beta 1/5
var SS_Avail exrss(2node.rgl; lambda, mu, lambdaos,
beta)
echo Steady_State Availability for 2node.rgl
expr SS_Avail
* DOWN configuration: ava
bind
rew_2node.rgl_3 525600
rew_2node.rgl_6 525600
rew_2node.rgl_5 525600
rew_2node.rgl_1 0
rew_2node.rgl_2 0
rew_2node.rgl_4 0
end
bind lambda 0.00015
bind mu 1/10
bind lambdaos 0.000014
bind beta 1/5
var Downtime exrss(2node.rgl; lambda, mu, lambdaos,
beta)
expr Downtime
end

[Figure: Output showing Downtime and SS Availability]

Webserver Availability Model with Warm Replication
• Two nodes for hardware redundancy
• Each node has a copy of the Webserver (software redundancy: replication)
• Primary node can fail
• Secondary node can fail
• Primary process can fail
• Secondary process can fail
• Failures may have imperfect coverage
• Time delay for fault detection
• Model of a real system developed at Avaya Labs

Modeling Software Faults: Application Failure
Availability model with passive redundancy (warm replication) of application; operational phase; Heisenbugs or hardware transients
Assumptions
• A web server software that fails at the rate γp, running on a machine that fails at the rate γm
• Mean time to detect server process failure is $\delta_p^{-1}$, and the mean time to detect machine failure is $\delta_m^{-1}$
• The mean restart time of a machine is $\tau_m^{-1}$
• The mean restart time of a server is $\tau_p^{-1}$
Reference: S. Garg, Y. Huang, C. Kintala, K. S. Trivedi and S. Yagnik, "Performance and Reliability Evaluation of Passive Replication Schemes in Application Level Fault-Tolerance," Proc. of the 29th Intl. Symp. on Fault-Tolerant Computing, FTCS-29, June 1999.

Parameters
• Process MTTF = 10 days (1/γp)
• Node MTTF = 20 days (1/γn)
• Process polling interval = 2 seconds (1/δp)
• Mean process restart time = 30 seconds (1/τp)
• Mean process failover time = 2 minutes (1/τn)
• Switching time with mean 1/τs
• c = 0.95

[Figure: Solution for warm replication]

Hierarchical modeling: Example
• Consider the availability model of a workstation consisting of three subsystems:
– A cooling subsystem with two fans,
– A dual power supply subsystem and
– A two-CPU processing subsystem.
• The workstation is considered to be unavailable when one or more of the subsystems have failed.

Hierarchical modeling: Example (contd.)
• Solving the first subsystem, the fans, we have

Hierarchical modeling: Example (contd.)
• Solving the second subsystem, the power supplies, we have

Hierarchical modeling: Example (contd.)
• Solving the last subsystem, the processors, we have

Hierarchical modeling: Example (contd.)
• The overall availability of the system can be determined by taking the product of the individual block availabilities.
• Thus system availability is given by

[Figures: Model made in SHARPE GUI; sub-models embedded in the RBD model (Fan submodel, Power Supply submodel, Processors)]

[Figures: Hierarchy parameters passed to main block; Inserting Parameters for SubModel; Analysis Frame; Output generated by SHARPE]

Modeling an N+1 Protection System

Outline
• Description of the system
• Using a rate approximation
• Using a 3-stage Erlang approximation to a uniform distribution
• Using a semi-Markov model: approximation method using a 3-stage Erlang distribution
• Using equations of the underlying semi-Markov process
• Solutions for the models

Description of the system
• N = Number of protected units (we use N=1)
• λ = Unit failure rate
• µ = Unit restoration rate
• T = deterministic time between routine diagnostics
• c = Probability that a protection switch successfully restores service
• d = Probability that a failure in the standby unit is detected

Hot Standby with different coverages
State numbering: Normal (1+1): 1; Protection Switch Failure: 2; Simplex (1): 3; Failure to Detect Protection Fault: 4; Failed (0): 5
Transitions:
  Normal → Protection Switch Failure: (1−c)λ
  Normal → Simplex: (c+d)λ
  Normal → Failure to Detect Protection Fault: (1−d)λ
  Protection Switch Failure → Simplex: µ
  Protection Switch Failure → Failed: λ
  Simplex → Normal: µ
  Simplex → Failed: λ
  Failure to Detect Protection Fault → Failed: λ
  Failed → Simplex: 2µ

Diagnostics; Using a rate approximation
Same states and transitions as above, plus:
  Failure to Detect Protection Fault → Simplex: 2/T
Time to diagnostic is exponentially distributed with mean T/2

[Figure: Comparison of probability density functions (pdf): 3-stage Erlang pdf vs. U(0,1) pdf; axes: time, pdf]

[Figure: Comparison of cumulative distribution functions (cdf): 3-stage Erlang cdf vs. U(0,1) cdf; axes: time, cdf]

Using a 3-stage Erlang approximation to a uniform distribution
Same states and transitions as in the rate approximation, except that the diagnostic transition from Failure to Detect Protection Fault to Simplex now passes through two additional stages, s1 and s2. Each of the three stages completes at rate 6/T, and a failure (rate λ) in any stage leads to Failed (0).
Time to diagnostic is uniformly distributed over (0,T), approximated by a 3-stage Erlang with mean T/2

Using a Semi-Markov model: approximation method using an Erlang distribution (N=1)
E(t): 3-stage Erlang distribution, given by
\[ E(t) = 1 - \sum_{k=0}^{3-1} \frac{\left(\frac{6}{T}t\right)^k}{k!}\,e^{-\frac{6}{T}t} \]
Same states as the hot standby model, with the transition from Failure to Detect Protection Fault to Simplex governed by the distribution E(t).
Time to diagnostic is uniformly distributed over (0,T), approximated by a 3-stage Erlang distribution with mean T/2
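
To get a feel for how well the two approximations match the uniform diagnostic time, the sketch below tabulates the CDF of U(0,T), of a 3-stage Erlang with mean T/2 (stage rate 6/T), and of an exponential with mean T/2 (rate 2/T). The function names and the evaluation grid are assumptions made for this illustration.

```python
import math


def erlang_cdf(t: float, T: float, stages: int = 3) -> float:
    """CDF of an Erlang distribution with `stages` stages and mean T/2."""
    if t <= 0:
        return 0.0
    x = (2.0 * stages / T) * t  # per-stage rate is 6/T for three stages
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(stages))


def uniform_cdf(t: float, T: float) -> float:
    """CDF of the uniform distribution over (0, T)."""
    return min(max(t / T, 0.0), 1.0)


def exponential_cdf(t: float, T: float) -> float:
    """Rate approximation: exponential with mean T/2, i.e. rate 2/T."""
    return 1.0 - math.exp(-2.0 * t / T) if t > 0 else 0.0


if __name__ == "__main__":
    T = 1.0
    print(f"{'t':>5} {'uniform':>9} {'erlang3':>9} {'exp':>9}")
    for i in range(0, 13, 2):
        t = i / 10.0
        print(f"{t:>5.1f} {uniform_cdf(t, T):>9.4f} "
              f"{erlang_cdf(t, T):>9.4f} {exponential_cdf(t, T):>9.4f}")
```

All three distributions share the mean T/2; adding Erlang stages reduces the variance and brings the shape closer to the uniform CDF than the single exponential does.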

Using Equations of the underlying Semi-Markov Process
• Steady-state solution
One-step transition probability matrix P of the embedded DTMC:
\[ P = \begin{bmatrix}
0 & \frac{1-c}{2} & \frac{c+d}{2} & \frac{1-d}{2} & 0 \\
0 & 0 & \frac{\mu}{\lambda+\mu} & 0 & \frac{\lambda}{\lambda+\mu} \\
\frac{\mu}{\lambda+\mu} & 0 & 0 & 0 & \frac{\lambda}{\lambda+\mu} \\
0 & 0 & \frac{1}{\lambda T}(1 - e^{-\lambda T}) & 0 & 1 - \frac{1}{\lambda T}(1 - e^{-\lambda T}) \\
0 & 0 & 1 & 0 & 0
\end{bmatrix} \]

Using Equations of the underlying Semi-Markov Process (Contd.)
Solve v = vP to obtain
\[ v = \left[\, v_1,\ \frac{1-c}{2}v_1,\ \frac{\lambda+\mu}{\mu}v_1,\ \frac{1-d}{2}v_1,\ \left(\frac{\lambda(1-c)}{2(\lambda+\mu)} + \frac{\lambda}{\mu} + \frac{1-d}{2}\left(1 - \frac{1}{\lambda T}(1 - e^{-\lambda T})\right)\right)v_1 \,\right] \]
where
\[ v_1 = \frac{1}{1 + \frac{1-c}{2} + \frac{\lambda+\mu}{\mu} + \frac{1-d}{2} + \frac{\lambda(1-c)}{2(\lambda+\mu)} + \frac{\lambda}{\mu} + \frac{1-d}{2}\left(1 - \frac{1}{\lambda T}(1 - e^{-\lambda T})\right)} \]

Using Equations of the underlying Semi-Markov Process (Contd.)
• Time to the next diagnostic is uniformly distributed over (0,T)
$H_i(t)$: CDF of the sojourn time in state i
\[ H_1(t) = 1 - e^{-2\lambda t}, \qquad H_2(t) = 1 - e^{-(\lambda+\mu)t}, \qquad H_3(t) = 1 - e^{-(\lambda+\mu)t} \]
\[ H_4(t) = \begin{cases} 1 - \left(1 - \frac{t}{T}\right)e^{-\lambda t}, & t < T \\ 1, & t \ge T \end{cases} \qquad H_5(t) = 1 - e^{-2\mu t} \]

Using Equations of the underlying Semi-Markov Process (Contd.)
$h_i$: mean sojourn time in state i,
\[ h_i = \int_0^\infty [1 - H_i(t)]\,dt \]
\[ h_1 = \frac{1}{2\lambda}, \quad h_2 = \frac{1}{\lambda+\mu}, \quad h_3 = \frac{1}{\lambda+\mu}, \quad h_4 = \frac{1}{\lambda} - \frac{1}{T\lambda^2}(1 - e^{-\lambda T}), \quad h_5 = \frac{1}{2\mu} \]
State probabilities of the SMP are given by
\[ \pi_i = \frac{v_i h_i}{\sum_{j=1}^{5} v_j h_j} \]
Unavailability = $\pi_2 + \pi_5$

Solutions for the models
Parameter values assumed:
• N=1
• c = 0.9
• d = 0.9
• λ = 0.0001 / hour
• µ = 1 / hour
• T = 1 hour

Results obtained
• Steady-state availability
Probability of being in states "Normal", "Simplex", or "Failure to Detect Protection Fault"
• Steady-state unavailability
Probability of being in states "Protection Switch Failure" or "Failed (0)"
• Average downtime in steady state
Steady-state unavailability × number of minutes in a year
• Average #units available
2·P(Normal) + 1·P(Simplex) + 1·P(Failure to Detect Protection Fault)

Diagnostic start time approximation | Steady-state availability | Steady-state unavailability | Avg. downtime in steady state (minutes/year) | Avg. #units available (out of 1 + 1 spare)
Exp(2/T) | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00
3-stage Erlang with mean T/2 | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00
Semi-Markov (3-stage Erlang approx. with mean T/2) | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00
Semi-Markov process equations, U([0,T]) | 9.99989992e-01 | 1.00075983e-05 | 5.25999 | 1.99977503e+00