ClassNotes04 - Reliability Engineering OEM 2009 ESI 6321...

Reliability Engineering
OEM 2009
ESI 6321 Applied Probability Methods in Engineering

Reliability Engineering
Reliability is the probability that a system will perform properly for a specified period of time.
Reliability depends on:
- Definition of "performing properly" (or "failure")
- Operating conditions
- Time (age of system)
Reliability as a function of time is an important system characteristic.

Failure rate
Informally, we will define the failure rate of a system as the rate at which the system will fail if the system has not yet failed for t time units. We will formalize this later.
How do we expect this rate to vary over time? Increasing? Decreasing? Constant? Mixed?

Failure rate
Nature of failures:
- "Infant mortality" failures
- Random failures
- "Age-related" failures
A common assumption is therefore that the failure rate follows a bathtub curve.

Time to failure
We will define reliability as it relates to time through the following random variable:
T = time to failure
Let F denote the cumulative distribution function (cdf) of this random variable:
$$F(t) = \Pr(T \le t)$$
= probability that failure takes place no later than time t.

Reliability
We then define reliability as a function of time as
$$r(t) \equiv \Pr(T > t) = 1 - F(t)$$
= probability that the system operates without failure for at least t time units.
It is easy to see that
$$r(0) = 1, \qquad \lim_{t \to \infty} r(t) = 0.$$

Example 1: reliability
The reliability of a machine is given by
$$r(t) = \exp\left(-0.04t - 0.008t^2\right)$$
where t is the age of the machine in years.
What is the distribution of T?
$$F(t) = 1 - r(t) = 1 - \exp\left(-0.04t - 0.008t^2\right)$$

[Figure: Example 1, graphical illustration]

Design life
The design life of a product is defined to be the duration for which a certain reliability level can be guaranteed.
In particular, if $r^*$ is the minimum desired reliability, the design life is the maximum value of $T^*$ such that
$$r(T^*) = r^*.$$
If the reliability function r is invertible, we have
$$T^* = r^{-1}(r^*).$$

Example 1 revisited: design life
The reliability of a machine is given by
$$r(t) = \exp\left(-0.04t - 0.008t^2\right)$$
where t is the age of the machine in years.
- If the design life is 1 year, what is apparently the minimum required reliability?
- What is the design life that maintains a reliability of 90%?

Failure rate - discrete time
In discrete time, we define the failure rate as the probability that the system will fail in period t+1 given that it has not yet failed through period t:
$$\lambda(t) = \Pr(T = t+1 \mid T > t) = \frac{\Pr(T = t+1,\ T > t)}{\Pr(T > t)} = \frac{\Pr(T = t+1)}{1 - F(t)}$$

Failure rate - discrete time: Example 2
Consider a system that only fails as a result of shocks occurring at discrete points in time. The probability of failure at each such point is constant, say p.
What is the distribution of T?
$$r(t) = 1 - F(t) = \Pr(T > t) = (1-p)^t \approx e^{-pt} \quad \text{(small } p\text{)}$$
$$\Pr(T = t+1) = (1-p)^t\, p$$
T has a geometric distribution with parameter p. Then
$$\lambda(t) = \frac{\Pr(T = t+1)}{1 - F(t)} = \frac{(1-p)^t\, p}{(1-p)^t} = p.$$
The geometric distribution is a constant failure rate distribution.

[Figure: Example 2, graphical representation]

Time to failure - continuous time
If T is a continuous random variable (and therefore has a density function, say f) we have:
$$f(t) = F'(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}$$
so that
$$f(t)\,\Delta t \approx \Pr(t < T \le t + \Delta t)$$
= probability that failure takes place shortly after time t.

Failure rate - continuous time
In continuous time, we define the failure rate as the rate at which the system will fail immediately following time t given that it has not yet failed through time t:
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t}
= \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t,\ T > t)/\Delta t}{\Pr(T > t)}
= \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t)/\Delta t}{1 - F(t)}
= \lim_{\Delta t \to 0} \frac{\left(f(t)\,\Delta t\right)/\Delta t}{1 - F(t)}
= \frac{f(t)}{1 - F(t)}$$

Example 1 revisited: failure rate
The reliability of a machine is given by
$$r(t) = \exp\left(-0.008t - 0.00032t^2\right)$$
where t is the age of the machine in years.
What is the failure rate function for this machine?
In general:
$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{F'(t)}{1 - F(t)} = -\frac{r'(t)}{r(t)}$$
So
$$r'(t) = -\left(0.008 + 0.00064t\right) \exp\left(-0.008t - 0.00032t^2\right)$$
$$\lambda(t) = -\frac{r'(t)}{r(t)} = 0.008 + 0.00064t$$

[Figure: Example 1 revisited, graphical illustration]

Failure rate - continuous time: Example 3
Consider a system whose time until failure has an exponential distribution with parameter $\lambda$: $T \sim \exp(\lambda)$.
$$F(t) = \Pr(T \le t) = 1 - e^{-\lambda t}$$
$$r(t) = 1 - F(t) = e^{-\lambda t}$$
$$f(t) = \lambda e^{-\lambda t}$$
Then
$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.$$
Like the geometric distribution, the exponential distribution is a constant failure rate distribution.

Constant failure rate
The constant failure rate property can be reformulated as:
- The rate at which the system fails is independent of how long the system has already been in operation.
- The distribution of the remaining time to failure of the system is independent of how long the system has already been in operation.
We often refer to this as the memoryless property.

Constant failure rate
A constant failure rate model can be a convenient and reasonable approximation:
- Early failures may be limited or eliminated using a wear-in period.
- If a system contains many similar components that are replaced when they fail, the failure rate of the system may appear constant.
- If the decreasing and increasing segments of the failure rate curve are moderate, a constant failure rate model can be used as a (somewhat pessimistic) approximation.
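
The relationships above can be checked numerically. The following Python sketch is illustrative only: the function names (`reliability`, `design_life`, `failure_rate_numeric`) are made up for this example, and it assumes the form $r(t) = \exp(-at - bt^2)$ used in Example 1, with the two coefficient pairs exactly as they appear in the notes. It evaluates the two design-life questions, compares a finite-difference estimate of $-r'(t)/r(t)$ with the closed-form failure rate, and checks the memoryless property for an exponential lifetime with an arbitrarily chosen rate.

```python
import math

def reliability(t, a, b):
    """r(t) = exp(-a*t - b*t**2), the functional form used in Example 1."""
    return math.exp(-a * t - b * t * t)

def design_life(r_star, a, b):
    """Solve r(T*) = r_star for T* >= 0, i.e. b*T^2 + a*T + ln(r_star) = 0."""
    c = math.log(r_star)  # negative for 0 < r_star < 1
    return (-a + math.sqrt(a * a - 4.0 * b * c)) / (2.0 * b)

def failure_rate_numeric(r, t, h=1e-5):
    """Approximate lambda(t) = -r'(t)/r(t) with a central difference."""
    return -(r(t + h) - r(t - h)) / (2.0 * h) / r(t)

# Design-life questions, coefficients from the design-life slide.
print("r(1 year)         =", reliability(1.0, 0.04, 0.008))
print("T* for r* = 0.90  =", design_life(0.90, 0.04, 0.008))

# Failure rate, coefficients from the failure-rate slide.
def r_example(t):
    return reliability(t, 0.008, 0.00032)

print("numeric lambda(10) =", failure_rate_numeric(r_example, 10.0))
print("closed form        =", 0.008 + 0.00064 * 10.0)

# Memoryless property for an exponential lifetime (rate chosen arbitrarily):
# Pr(T > s + t | T > s) = r(s + t) / r(s) should equal r(t).
lam = 0.5
def r_exp(t):
    return math.exp(-lam * t)

print(r_exp(3.0 + 2.0) / r_exp(3.0), r_exp(2.0))
```

Solving the quadratic directly works here only because the exponent of r(t) is a quadratic in t; for a general reliability function a numerical root finder would be needed instead.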

Reliability analysis
- In much of this class, we will assume that we know the failure rate function $\lambda(t)$.
- In many cases, we will pay particular attention to the constant failure rate case in continuous time.
- Focus will be on gaining insights.

Component failures and failure modes
So far we have considered the reliability of an entire system due to a combination of all causes of failure (failure modes).
It is often more convenient or tractable to focus on
- single failure modes
- single components
and express the system reliability (or failure rate function) as a function of the individual reliabilities (or failure rate functions).

Failure modes
Consider a system with n failure modes. For example:
- Manufacturing defects
- Random shocks
- Deterioration
Suppose these mode failures are independent of one another. Denote the time until failure mode i causes a failure by $T_i$, and the corresponding
- reliability by $r_i(t)$
- failure rate by $\lambda_i(t)$
How can we express the system reliability and system failure rate in terms of the individual reliabilities and failure rates?

Failure modes - system reliability
We can determine the system reliability as follows:
$$r(t) = \Pr(T > t) = \Pr\left(T_1 > t \cap T_2 > t \cap \dots \cap T_n > t\right)
= \Pr(T_1 > t) \cdot \Pr(T_2 > t) \cdots \Pr(T_n > t)
= \prod_{i=1}^{n} \Pr(T_i > t)
= \prod_{i=1}^{n} r_i(t)$$
Thus system reliability is the product of the individual reliabilities corresponding to the failure modes!

Constant failure rate: Example 4
Consider a system subject to two different types of shocks, both with constant failure rates but with different rates, say $\lambda_1$ and $\lambda_2$.
The individual shock types have reliability functions $r_1(t) = e^{-\lambda_1 t}$ and $r_2(t) = e^{-\lambda_2 t}$.
The system as a whole therefore has reliability function
$$r(t) = r_1(t) \cdot r_2(t) = e^{-\lambda_1 t} \cdot e^{-\lambda_2 t} = e^{-(\lambda_1 + \lambda_2)t}$$
This means that the system has constant failure rate equal to $\lambda = \lambda_1 + \lambda_2$.
Can this result be generalized?

Time-dependent failure rate
Consider a system with failure rate function given by $\lambda(t)$.
In case $\lambda(t) = \lambda$ for all t, we obtained
$$r(t) = e^{-\lambda t}.$$
For the general case, we can actually generalize this expression to
$$r(t) = e^{-\int_0^t \lambda(s)\,ds} = \exp\left(-\int_0^t \lambda(s)\,ds\right).$$
We can use this to express the system failure rate in terms of the failure rates of the individual failure modes.

Failure modes - system failure rate
Rewrite the reliability relationship further as follows:
$$r(t) = \prod_{i=1}^{n} r_i(t)
= \prod_{i=1}^{n} \exp\left(-\int_0^t \lambda_i(s)\,ds\right)
= \exp\left(-\int_0^t \sum_{i=1}^{n} \lambda_i(s)\,ds\right)
= \exp\left(-\int_0^t \lambda(s)\,ds\right)$$
Thus the system failure rate function is the sum of the individual failure rate functions corresponding to the failure modes!

[Figure: failure modes]

Failure modes - constant failure rate
We can conclude that if each of the n failure modes has a constant failure rate $\lambda_i$, the system has a constant failure rate as well:
$$\lambda = \sum_{i=1}^{n} \lambda_i$$
With a constant failure rate $\lambda_i$, the mean time to failure is given by $E(T_i) = 1/\lambda_i$; therefore,
$$\frac{1}{E(T)} = \sum_{i=1}^{n} \frac{1}{E(T_i)}
\quad \text{or} \quad
E(T) = \frac{1}{\sum_{i=1}^{n} \frac{1}{E(T_i)}} = \left(\sum_{i=1}^{n} \frac{1}{E(T_i)}\right)^{-1}$$
Note that this does not hold when the failure rate(s) are time-dependent!

Component failures
We can apply a similar reasoning to systems with n components, each with its own reliability properties.
- Component failures should be independent.
- Failure of a single component causes system failure.
[Diagram: components 1, 2, 3]

System vs. component/mode reliability
Improve system reliability by
- Improving component reliability
- Introducing redundancy into the system

Redundancy
Simplest types of redundancy:
Active parallel: two units run simultaneously; the "component" fails after both units have failed.
[Diagram: units 1 and 2]

Redundancy
Standby parallel: two units are available; the backup unit is used after the primary unit fails, and the "component" fails after both units have failed.
[Diagram: units 1 and 2]

Redundancy - active parallel
Two-unit active parallel system
- Reliabilities $r_1(t)$, $r_2(t)$
- Often, but not necessarily, the units will be identical
- Unit failures are independent
System reliability is equal to
$$r(t) = \Pr(T > t) = \Pr\left(T_1 > t \cup T_2 > t\right)
= \Pr(T_1 > t) + \Pr(T_2 > t) - \Pr\left(T_1 > t \cap T_2 > t\right)
= r_1(t) + r_2(t) - r_1(t)\,r_2(t)$$
Equivalently,
$$1 - r(t) = \Pr(T \le t) = \Pr\left(T_1 \le t \cap T_2 \le t\right) = \left(1 - r_1(t)\right) \cdot \left(1 - r_2(t)\right)$$

Redundancy - standby parallel
Two-unit standby parallel system
- Reliabilities $r_1(t)$, $r_2(t)$
- Often, but not necessarily, the units will be identical
- Unit failures are independent
System reliability is equal to
$$r(t) = \Pr(T > t) = \Pr(T_1 + T_2 > t)
= \int_0^\infty \Pr\left(T_1 + T_2 > t \mid T_1 = s\right) f_1(s)\,ds
= \int_0^\infty \Pr\left(T_2 > t - s\right) f_1(s)\,ds
= \int_t^\infty f_1(s)\,ds + \int_0^t r_2(t - s)\,f_1(s)\,ds
= r_1(t) + \int_0^t r_2(t - s)\,f_1(s)\,ds$$

Redundancy - constant failure rate, identical units
Let each unit have constant failure rate $\lambda$.
Two-unit active parallel system:
$$r(t) = e^{-\lambda t} + e^{-\lambda t} - e^{-\lambda t} \cdot e^{-\lambda t} = 2e^{-\lambda t} - e^{-2\lambda t}$$
Two-unit standby parallel system:
$$r(t) = e^{-\lambda t} + \int_0^t e^{-\lambda(t - s)}\,\lambda e^{-\lambda s}\,ds = (1 + \lambda t)\,e^{-\lambda t}$$

[Figure: graphical illustration]

Reliability - approximation for constant failure rate
There is a useful leading-order approximation of the reliability function, valid when $\lambda t$ is very small.
General (single unit):
$$r(t) \approx 1 - \lambda t$$
Two-unit active parallel system:
$$r(t) \approx 1 - (\lambda t)^2$$
Two-unit standby parallel system:
$$r(t) \approx 1 - \tfrac{1}{2}(\lambda t)^2$$

[Figure: graphical illustration]

Redundancy - other properties
We now have an expression for the reliability of two simple redundant systems. We can use this to derive the failure rate and distribution of time to failure.
How do we find the mean time to failure (MTTF) for these systems?
In general
$$E(T) = \int_0^\infty t f(t)\,dt = \int_0^\infty \left(1 - F(t)\right) dt = \int_0^\infty r(t)\,dt$$
- Two-unit active parallel system: we can use the fact that $T = \max(T_1, T_2)$
- Two-unit standby parallel system: we can use the fact that $T = T_1 + T_2$

Redundancy - active parallel, constant failure rate, identical units
Let each unit have constant failure rate $\lambda$.
Two-unit active parallel system:
$$r(t) = 2e^{-\lambda t} - e^{-2\lambda t}$$
$$\lambda(t) = -\frac{r'(t)}{r(t)} = \lambda \left(\frac{1 - e^{-\lambda t}}{1 - \tfrac{1}{2} e^{-\lambda t}}\right)$$
$$E(T) = \int_0^\infty \left(2e^{-\lambda t} - e^{-2\lambda t}\right) dt = \frac{3}{2\lambda} = 1\tfrac{1}{2}\,E(T_1)$$

Redundancy - standby parallel, constant failure rate, identical units
Let each unit have constant failure rate $\lambda$.
Two-unit standby parallel system:
$$r(t) = e^{-\lambda t} + \int_0^t e^{-\lambda(t - s)}\,\lambda e^{-\lambda s}\,ds = (1 + \lambda t)\,e^{-\lambda t}$$
$$\lambda(t) = -\frac{r'(t)}{r(t)} = \lambda \left(\frac{\lambda t}{1 + \lambda t}\right)$$
$$E(T) = \int_0^\infty (1 + \lambda t)\,e^{-\lambda t}\,dt = \frac{2}{\lambda} = 2E(T_1)$$

Limitations to the effect of redundancy
We have assumed that failures of the units are independent. In practice, this may not be the case, which can significantly impact the efficacy of redundancy.

Active parallel systems: common-mode failures
A single underlying cause may lead to failures of both units.
We can model this as a slightly expanded system with the parallel redundant units in series with a fictitious unit representing common-mode failure, with reliability $r_c(t)$.
[Diagram: units 1 and 2 with common-mode unit c]

Active parallel systems: common-mode failures
The system reliability becomes
$$r(t) = \left[1 - \left(1 - r_1(t)\right) \cdot \left(1 - r_2(t)\right)\right] \cdot r_c(t)$$
so
$$1 - r(t) = \left(1 - r_c(t)\right) + \left(1 - r_1(t)\right) \cdot \left(1 - r_2(t)\right) - \left(1 - r_c(t)\right) \cdot \left(1 - r_1(t)\right) \cdot \left(1 - r_2(t)\right)$$
This suggests that common-mode failures will often be dominant!
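
The closed-form results for two identical constant-failure-rate units can be verified numerically. The sketch below is an illustration under stated assumptions, not part of the original notes: the function names (`r_active`, `r_standby`, `mttf`) and the rate value 0.5 are arbitrary choices, and the MTTF integral $\int_0^\infty r(t)\,dt$ is approximated by a trapezoidal rule truncated at a large upper limit.

```python
import math

def r_active(t, lam):
    """Two-unit active parallel reliability: 2e^{-lam t} - e^{-2 lam t}."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)

def r_standby(t, lam):
    """Two-unit standby parallel reliability: (1 + lam t) e^{-lam t}."""
    return (1.0 + lam * t) * math.exp(-lam * t)

def mttf(r, upper, steps=100_000):
    """Trapezoidal approximation of E(T) = integral of r(t) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (r(0.0) + r(upper))
    for k in range(1, steps):
        total += r(k * h)
    return total * h

lam = 0.5            # hypothetical failure rate per unit time
upper = 60.0 / lam   # the integrands are negligible beyond this point

mttf_active = mttf(lambda t: r_active(t, lam), upper)
mttf_standby = mttf(lambda t: r_standby(t, lam), upper)
print(mttf_active, 1.5 / lam)    # compare with 3/(2*lam)
print(mttf_standby, 2.0 / lam)   # compare with 2/lam

# Small-time approximations, evaluated where lam*t = 0.01.
x = 0.01
t_small = x / lam
print(r_active(t_small, lam), 1.0 - x ** 2)
print(r_standby(t_small, lam), 1.0 - 0.5 * x ** 2)
```

The comparison makes the ranking visible: standby redundancy doubles the single-unit MTTF, while active redundancy adds only half of it.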

Active parallel systems: common-mode failures
Under constant failure rate, suppose that
- $\lambda_I = (1 - \beta)\lambda$ is the independent failure rate
- $\lambda_c = \beta\lambda$ is the common-mode failure rate
Then we obtain
$$r(t) = \left(2e^{-(1-\beta)\lambda t} - e^{-2(1-\beta)\lambda t}\right) \cdot e^{-\beta\lambda t}
= \left(2 - e^{-(1-\beta)\lambda t}\right) \cdot e^{-\lambda t}
= 2e^{-\lambda t} - e^{-(2-\beta)\lambda t}
\approx 1 - \beta\lambda t \quad \text{when } \lambda t \text{ is very small}$$

[Figure: graphical illustration ($\beta = 0.5$)]

More complex systems
Key is to decompose the system into simple subsystems of the following forms:
- Serial
- Active parallel
- Standby parallel
[Diagram: system with units 1 to 6]

More complex systems: another example
[Diagram: system with units 1, 2, 3, 3', 4, 5, 6]

Weibull distribution
An important distribution that is often used to model different failure patterns is the Weibull distribution.
Notation: $T \sim \text{Weibull}(m, \theta)$
Distribution function:
$$F(t) = 1 - e^{-(t/\theta)^m}$$
- $\theta$ is a scale parameter
- $m$ is a shape parameter
This distribution generalizes the exponential distribution: Weibull$(1, 1/\lambda)$ = exponential$(\lambda)$.

Weibull distribution: properties
Reliability function:
$$r(t) = e^{-(t/\theta)^m}$$
Failure rate function:
$$\lambda(t) = \frac{m}{\theta}\left(\frac{t}{\theta}\right)^{m-1}$$
- Decreasing failure rate if m < 1
- Constant failure rate if m = 1 (exponential distribution!)
- Increasing failure rate if m > 1

[Figure: Weibull distribution, failure rate]

Modeling general failure rate functions
Note that
- the exponential distribution has constant failure rate
- the Weibull distribution has either increasing or decreasing failure rate
How do we model more general failure rate functions?
We can use a combination of different Weibull functions:
$$\lambda(t) = \sum_{i=1}^{k} \frac{m_i}{\theta_i}\left(\frac{t}{\theta_i}\right)^{m_i - 1}$$
View this as different causes of failure (compare with serial system!).

Estimation
- Parameter estimation
  - Moment estimators
  - Maximum likelihood estimators
  - Other estimators (exponential distribution, Weibull distribution)
- Goodness-of-fit tests
  - What probability distribution appropriately describes (failure) data?

Moment Estimators
A formal approach to obtaining good point estimators of population parameters is the method of moments.
Recall that the random variables $(X_1, X_2, \dots, X_n)$ form a random sample of size n if
- the $X_i$'s are independent random variables
- every $X_i$ has the same probability distribution, say with mass function or density $f(x_i; \theta)$, where $\theta$ is the vector of unknown problem parameters.

Moment Estimators
The kth population moment is equal to
$$E(X^k), \quad k = 1, 2, \dots$$
and the kth sample moment is equal to
$$\frac{1}{n}\sum_{i=1}^{n} X_i^k, \quad k = 1, 2, \dots$$
If there are m unknown problem parameters, say $\theta_1, \dots, \theta_m$, then the moment estimators can be found by equating the first m population moments to the first m sample moments.

Moment Estimators
Example: Let $X_1, \dots, X_n$ be a random sample of size n from an exponential population with parameter $\lambda$. For example, observed times until failure.
The first population moment is $1/\lambda$, and the first sample moment is $\bar{X}$.
This yields the moment estimator:
$$\hat{\lambda} = \frac{1}{\bar{X}}$$
Is this estimator unbiased?

Maximum Likelihood Estimators
Another formal approach to obtaining good point estimators of population parameters is the method of maximum likelihood.
In the discrete case, the probability that we obtain the observed sample is equal to:
$$P(X_1 = x_1, \dots, X_n = x_n) = f(x_1; \theta) \cdot \dots \cdot f(x_n; \theta) \equiv L(\theta)$$
The maximum likelihood estimator (MLE) of $\theta$ is the value of $\theta$ that maximizes $L(\theta)$.
In the continuous case, $L(\theta)$ is the joint density function of the sample.

Maximum Likelihood Estimators
Example 1: For a normal population and a sample size of n, the maximum likelihood estimators of $\mu$ and $\sigma^2$ are:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n}\,S^2$$
Are these estimators unbiased?

Maximum Likelihood Estimators
Example 2: For an exponential population and a sample size of n, the maximum likelihood estimator of $\lambda$ is:
$$\hat{\lambda} = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}}$$

Maximum Likelihood Estimators
When the sample size is large, the maximum likelihood estimator is:
- Approximately unbiased
- Approximately normally distributed
- Invariant: if $\hat{\theta}$ is the MLE of $\theta$, then $h(\hat{\theta})$ is the MLE of $h(\theta)$.
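
As a small illustration of the exponential example, the Python sketch below simulates failure times and applies the estimator $\hat{\lambda} = 1/\bar{X}$, which is both the moment estimator and the MLE. The function name `exponential_rate_mle`, the seed, the sample size, and the true rate of 2.0 are all hypothetical choices made for this example.

```python
import random
import statistics

def exponential_rate_mle(sample):
    """MLE (and moment estimator) of the exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(sample)

rng = random.Random(6321)   # arbitrary seed, for reproducibility only
true_rate = 2.0             # hypothetical failure rate used to simulate data
sample = [rng.expovariate(true_rate) for _ in range(100_000)]

rate_hat = exponential_rate_mle(sample)
# Invariance: the MLE of the mean time to failure 1/lambda is 1/rate_hat,
# which is simply the sample mean.
mttf_hat = 1.0 / rate_hat
print(rate_hat, mttf_hat)
```

On the question of bias: for an exponential sample with n > 1, $E(\hat{\lambda}) = n\lambda/(n-1)$, so the estimator overestimates $\lambda$ slightly in small samples and is approximately unbiased when n is large, in line with the large-sample properties listed above.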

Confidence intervals
If the sample is "large enough", we can often use a normal approximation to establish confidence intervals.
For example, recall that a confidence interval for the population mean is given by
$$\left(\hat{\mu} - z_{\alpha/2}\,\hat{\sigma}/\sqrt{n},\ \ \hat{\mu} + z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}\right)$$

Confidence intervals
For an exponential population, recall that
$$\hat{\lambda} = \frac{1}{\hat{\mu}} = \frac{1}{\bar{X}}$$
A confidence interval on $\lambda = 1/\mu$ can then be approximated by
$$\left(\frac{1}{\hat{\mu} + z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}},\ \ \frac{1}{\hat{\mu} - z_{\alpha/2}\,\hat{\sigma}/\sqrt{n}}\right)$$

Confidence intervals
When the sample size is small, we should (try to) determine the exact distribution of the parameter estimator of interest.
Unfortunately, this is often a very difficult task.

Another approach to parameter estimation
A parameter estimation approach that is sometimes easier and definitely more easily generalized is one that is based on estimating the distribution function (c.d.f.) of the population.
To estimate the c.d.f., we
- sort the sample observations in ascending order;
- assign cumulative frequencies for this order.
The simplest way is:
$$\hat{F}(x_j) = \frac{j}{n} \quad \text{for } j = 1, \dots, n$$

Cumulative frequency
The upper tail of this estimate is not very good. Why?
Good alternatives are
$$\hat{F}(x_j) = \frac{j - \tfrac{1}{2}}{n}, \qquad \hat{F}(x_j) = \frac{j}{n+1}, \qquad \hat{F}(x_j) = \frac{j - 0.3}{n + 0.4}$$

Parameter estimation
Denote the general form of the c.d.f. by F(x).
Now if we can find a transformation that expresses a linear function of x in terms of F:
$$a + bx = g\left(F(x)\right)$$
we can
- plot the sample data $x_j$, $j = 1, \dots, n$ against $g\left(\hat{F}(x_j)\right)$
- estimate a corresponding linear regression line

Parameter estimation: Example, normal distribution
Recall that
$$F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$$
where $\Phi$ is the standard normal c.d.f.
Thus, we have
$$\frac{1}{\sigma}\,x - \frac{\mu}{\sigma} = \Phi^{-1}\left(F(x)\right)$$
We then plot the points
$$\left(x_j,\ \Phi^{-1}\left(\hat{F}(x_j)\right)\right) \quad \text{for } j = 1, \dots, n$$
We can use the coefficients of a linear regression line to estimate the population parameters.

Parameter estimation: Example, exponential distribution
For this distribution,
$$F(x) = 1 - e^{-\lambda x}$$
Thus, we have
$$\ln\left(\frac{1}{1 - F(x)}\right) = \lambda x$$
We then plot the points
$$\left(x_j,\ \ln\left(\frac{1}{1 - \hat{F}(x_j)}\right)\right) \quad \text{for } j = 1, \dots, n$$
We could use the coefficient of a linear regression line (no intercept!) to estimate the population parameter.

Parameter estimation: Example, Weibull distribution
For this distribution,
$$F(x) = 1 - \exp\left(-\left(\frac{x}{\theta}\right)^m\right)$$
Thus, we have
$$\ln\left(\ln\left(\frac{1}{1 - F(x)}\right)\right) = m \ln x - m \ln \theta$$
We then plot the points
$$\left(\ln x_j,\ \ln\left(\ln\left(\frac{1}{1 - \hat{F}(x_j)}\right)\right)\right) \quad \text{for } j = 1, \dots, n$$
We could use the coefficients of a linear regression line to estimate the population parameters.

Testing for Goodness of Fit
We have discussed the issue of estimating parameters assuming that the shape of the population/probability distribution is known.
We can then develop hypothesis testing procedures to draw conclusions on the population parameters based on a sample.
If the shape of the population distribution is unknown, we may want to test whether a particular distribution is a satisfactory approximation of the population.

Testing for Goodness of Fit
A general approach is to compare observed frequencies to frequencies that can be expected for a particular probability distribution.
- We hypothesize a particular probability distribution
- We test whether this is a reasonable model

Testing for Goodness of Fit
Informally, when using the last estimation approach we can do this by judging the fit of the linear regression.
However, it is possible to develop a more formal approach based on observed and theoretical (hypothesized) frequencies.

Example
Let X denote the number of flaws observed on a large coil of galvanized steel. 75 coils are inspected, with the following results:

Values      1    2    3    4    5    6    7    8
Observed    1   11    8   13   11   12   10    9

Is the Poisson distribution an adequate probability model for this data?
$$H_0: X \sim \text{Poisson} \qquad H_1: X \not\sim \text{Poisson}$$

Example
Compute the expected frequencies as follows:
$$E_1 = 75 \cdot P(X \le 1)$$
$$E_i = 75 \cdot P(X = i) \quad \text{for } i = 2, \dots, 7$$
$$E_8 = 75 \cdot P(X \ge 8)$$
where the probabilities are computed for the Poisson distribution with parameter $\hat{\lambda} = \bar{x} = 4.9$.

i       1     2     3     4     5     6     7     8
O_i     1    11     8    13    11    12    10     9
E_i   3.3   6.7  11.0  13.4  13.2  10.7   7.5   9.2

Chi-squared Test
The test statistic for this hypothesis is:
$$X_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(k - p - 1)$$
where
- k = number of classes = 8
- p = number of estimated distribution parameters = 1
- $\chi^2(k - p - 1)$ denotes the chi-squared distribution with k - p - 1 degrees of freedom

Example
The value of the test statistic is:
$$X_0^2 = \sum_{i=1}^{8} \frac{(O_i - E_i)^2}{E_i} = 6.56$$
and the critical value of a $\chi^2(8 - 1 - 1)$ distribution at significance level $\alpha = 0.05$ is 12.59 > 6.56.
Thus we cannot reject $H_0$: the data are consistent with a Poisson distribution.

Goodness of Fit Test
The test statistic is valid if the expected frequencies are not too small, say at least 3.
If some expected frequencies are too small, we can merge adjacent classes.
We can also apply the test to continuous distributions. In that case, we often choose the classes in such a way that the expected frequencies are all equal.

Goodness of Fit Test
For example, for testing for a normal probability distribution with mean 5 and standard deviation 2, we could use the following 10 classes.
[Table: the 10 classes are not included in the extracted text]

Censored observations
In the context of reliability, observations are sometimes not "complete":
- An item may not fail before the end of an experiment
- An item may have been in use for an unknown amount of time before the start of an experiment
- An item is not monitored continuously
Such observations are often called censored observations.

Censored observations
Such situations require specialized estimation procedures.
They use the fact that, even for a censored observation, we will have a bound on the value of the observation (i.e., the observed time until failure).
...
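
The chi-squared computation for the steel-coil example can be reproduced in a few lines. This is a minimal sketch rather than the course's own procedure: the helper names (`poisson_pmf`, `expected_counts`, `chi_squared_statistic`) are invented for illustration, the observed and expected counts and the critical value 12.59 are copied from the slides, and the recomputed expected counts use the rounded estimate 4.9, so they may differ slightly from the slide values.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_counts(n, lam, low=1, high=8):
    """Expected class counts with pooled tails: X <= low, low+1..high-1, X >= high."""
    first = sum(poisson_pmf(k, lam) for k in range(0, low + 1))
    middle = [poisson_pmf(k, lam) for k in range(low + 1, high)]
    last = 1.0 - first - sum(middle)
    return [n * p for p in [first] + middle + [last]]

def chi_squared_statistic(observed, expected):
    """Sum over classes of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [1, 11, 8, 13, 11, 12, 10, 9]                        # from the slides
slide_expected = [3.3, 6.7, 11.0, 13.4, 13.2, 10.7, 7.5, 9.2]   # from the slides
critical_value = 12.59  # chi-squared(6) at alpha = 0.05, as quoted in the notes

stat = chi_squared_statistic(observed, slide_expected)
print("test statistic:", stat)
print("reject H0:", stat > critical_value)

# Recompute the expected counts from the Poisson model with lambda-hat = 4.9.
recomputed = expected_counts(75, 4.9)
print([round(e, 1) for e in recomputed])
```

Pooling the tails into the first and last classes keeps every expected frequency above the "at least 3" rule of thumb mentioned in the notes.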