CPE 619 Summarizing Measured Data
Aleksandar Milenković
The LaCASA Laboratory
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
http://www.ece.uah.edu/~milenka
http://www.ece.uah.edu/~lacasa

2 Overview
Basic Probability and Statistics Concepts
  CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution
Summarizing Data by a Single Number
  Mean, Median, and Mode
  Arithmetic, Geometric, Harmonic Means
Mean of a Ratio
Summarizing Variability
  Range, Variance, Percentiles, Quartiles
Determining Distribution of Data
  Quantile-Quantile plots
3 Part III: Probability Theory and Statistics
How to report the performance as a single number? Is specifying the mean the correct way?
How to report the variability of measured quantities? What are the alternatives to variance and when are they appropriate?
How to interpret the variability? How much confidence can you put on data with a large variability?
How many measurements are required to get a desired level of statistical confidence?
How to summarize the results of several different workloads on a single computer system?
How to compare two or more computer systems using several different workloads? Is comparing the mean sufficient?
What model best describes the relationship between two variables? Also, how good is the model?

4 Basic Probability and Statistics Concepts
Independent Events
  Two events are called independent if the occurrence of one event does not in any way affect the probability of the other event.
Random Variable
  A variable is called a random variable if it takes one of a specified set of values with a specified probability.
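
The definition of independent events can be checked numerically. The sketch below assumes the standard product rule, P(A and B) = P(A) P(B), as the formal test for independence, and uses two fair dice as a hypothetical sample space; the event names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: all 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over (die1, die2)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def independent(a, b):
    """True if P(A and B) equals P(A) * P(B) (assumed product rule)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_is_even(o):
    return o[0] % 2 == 0


def second_is_six(o):
    return o[1] == 6


def sum_is_twelve(o):
    return o[0] + o[1] == 12
```

Here independent(first_is_even, second_is_six) holds because the two dice do not influence each other, while independent(second_is_six, sum_is_twelve) fails because knowing the second die shows six changes the probability of a total of twelve.
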
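5 CDF, PDF, and PMF
Cumulative Distribution Function (CDF)
  $F(a) = P(x \le a)$
Probability Density Function (PDF)
  $f(x) = \frac{dF(x)}{dx}$
Given a pdf f(x), the probability of x being in (x1, x2):
  $P(x_1 < x \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\,dx$
[Figures: CDF F(x) versus x, rising from 0 to 1; pdf f(x) versus x]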
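
The relationship between the CDF and the pdf can be illustrated with a short sketch. It assumes an exponential distribution with a hypothetical rate of 2.0 (chosen only because its CDF has a closed form) and compares the interval probability obtained from the CDF with a midpoint-rule integral of the pdf.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential distribution


def pdf(x):
    """f(x) = RATE * exp(-RATE * x) for x >= 0, else 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def cdf(x):
    """F(x) = P(X <= x) = 1 - exp(-RATE * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0


def prob_between_cdf(x1, x2):
    """P(x1 < X <= x2) as a difference of CDF values."""
    return cdf(x2) - cdf(x1)


def prob_between_pdf(x1, x2, steps=10_000):
    """P(x1 < X <= x2) as a midpoint-rule integral of the pdf."""
    width = (x2 - x1) / steps
    return sum(pdf(x1 + (i + 0.5) * width) for i in range(steps)) * width
```

Both functions should agree to within the numerical integration error, for example on the interval (0.5, 1.5).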

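6 CDF, PDF, and PMF (cont’d)
Probability Mass Function (PMF)
  For discrete random variables:
    The CDF is not continuous
    The PMF is used instead of the PDF
  $f(x_i) = p_i$
  $P(x_1 < x \le x_2) = F(x_2) - F(x_1) = \sum_{i:\, x_1 < x_i \le x_2} p_i$
[Figure: pmf f(x_i) versus x_i]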
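
For a discrete random variable the same interval probability is a sum of PMF values. A minimal sketch, assuming a fair six-sided die as the (hypothetical) random variable:

```python
from fractions import Fraction

# Hypothetical PMF: a fair six-sided die, p_i = 1/6 for each face.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x): sum of p_i over all x_i <= x (a step function)."""
    return sum(p for value, p in pmf.items() if value <= x)


def prob_between(x1, x2):
    """P(x1 < X <= x2): sum of p_i over all x_i in (x1, x2]."""
    return sum(p for value, p in pmf.items() if x1 < value <= x2)
```

Because the CDF only jumps at the faces 1 through 6, cdf(3) and cdf(3.5) are equal, and prob_between(2, 5) matches cdf(5) - cdf(2).
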
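7 Mean, Variance, CoV
Mean or Expected Value
  Discrete: $\mu = E(x) = \sum_{i=1}^{n} p_i x_i$
  Continuous: $\mu = E(x) = \int_{-\infty}^{+\infty} x f(x)\,dx$
Variance: the expected value of the square of the distance between x and its mean
  $\mathrm{Var}(x) = \sigma^2 = E[(x - \mu)^2]$
  Discrete: $\sigma^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$
  Continuous: $\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx$
Coefficient of Variation
  $\mathrm{CoV} = \sigma / \mu$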
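
The three quantities can be computed directly from a PMF. The sketch below uses a made-up distribution (the values and probabilities are illustrative assumptions, not measured data) and the discrete forms of the definitions.

```python
import math

# Hypothetical PMF: value -> probability (probabilities sum to 1).
example_pmf = {1: 0.2, 2: 0.5, 3: 0.2, 6: 0.1}


def mean(dist):
    """mu = sum of p_i * x_i."""
    return sum(p * x for x, p in dist.items())


def variance(dist):
    """sigma^2 = sum of p_i * (x_i - mu)^2."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


def coefficient_of_variation(dist):
    """CoV = sigma / mu (standard deviation relative to the mean)."""
    return math.sqrt(variance(dist)) / mean(dist)
```

For this example the mean is 2.4 and the variance is 1.84, so the standard deviation is a little over half of the mean.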

8 Covariance and Correlation
Covariance
  $\mathrm{Cov}(x, y) = E[(x - \mu_x)(y - \mu_y)] = E(xy) - E(x)E(y)$
  For independent variables, the covariance is zero
  Although independence always implies zero covariance, the reverse is not true
Correlation Coefficient: normalized value of covariance
  $\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$
  The correlation always lies between -1 and +1
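
A small counterexample shows why zero covariance does not imply independence. The sketch assumes a hypothetical pair where x is uniform on {-1, 0, 1} and y = x^2, so y is completely determined by x yet the covariance is zero.

```python
from fractions import Fraction

# Hypothetical joint PMF: x uniform on {-1, 0, 1}, y = x**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def expect(g):
    """E[g(x, y)] under the joint PMF."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def covariance():
    """Cov(x, y) = E[(x - mu_x) * (y - mu_y)]."""
    mu_x = expect(lambda x, y: x)
    mu_y = expect(lambda x, y: y)
    return expect(lambda x, y: (x - mu_x) * (y - mu_y))


def is_independent():
    """True only if P(x, y) = P(x) * P(y) for every pair of values."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    for x in xs:
        for y in ys:
            p_x = sum(p for (a, _), p in joint.items() if a == x)
            p_y = sum(p for (_, b), p in joint.items() if b == y)
            if joint.get((x, y), 0) != p_x * p_y:
                return False
    return True
```

Here covariance() is exactly zero, but is_independent() is false: for instance P(x = 0, y = 1) is 0 while P(x = 0) P(y = 1) is 2/9.
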
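9 Mean and Variance of Sums
If $x_1, x_2, \ldots, x_k$ are k random variables and if $a_1, a_2, \ldots, a_k$ are k arbitrary constants (called weights), then:
  $E(a_1 x_1 + a_2 x_2 + \cdots + a_k x_k) = a_1 E(x_1) + a_2 E(x_2) + \cdots + a_k E(x_k)$
For independent variables:
  $\mathrm{Var}(a_1 x_1 + a_2 x_2 + \cdots + a_k x_k) = a_1^2 \mathrm{Var}(x_1) + a_2^2 \mathrm{Var}(x_2) + \cdots + a_k^2 \mathrm{Var}(x_k)$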
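
Both rules for weighted sums can be verified exactly by enumeration. The sketch below assumes two hypothetical independent variables (a fair die and a fair 0/1 coin) and arbitrary weights 2 and -3; it builds the PMF of the weighted sum and compares its mean and variance with the weighted formulas.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent variables: a fair die and a fair 0/1 coin.
die = {v: Fraction(1, 6) for v in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def mean(pmf):
    return sum(p * x for x, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


def weighted_sum_pmf(pmfs, weights):
    """PMF of a1*x1 + ... + ak*xk, assuming the x_i are independent."""
    result = {}
    for combo in product(*(pmf.items() for pmf in pmfs)):
        value = sum(a * x for a, (x, _) in zip(weights, combo))
        p = Fraction(1)
        for _, p_i in combo:
            p *= p_i  # independence: joint probability is the product
        result[value] = result.get(value, 0) + p
    return result


y = weighted_sum_pmf([die, coin], [2, -3])
```

With these inputs, mean(y) equals 2 mean(die) - 3 mean(coin), and variance(y) equals 4 variance(die) + 9 variance(coin); note that the weights are squared in the variance, so the negative weight still adds variability.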

10 Quantiles, Median, and Mode
Quantile: The x value at which the CDF takes a value $\alpha$ is called the $\alpha$-quantile or $100\alpha$-percentile. It is denoted by $x_\alpha$:
  $P(x \le x_\alpha) = F(x_\alpha) = \alpha$
Median: The 50-percentile (or 0.5-quantile) of a random variable is called its median
Mode: The most likely value, that is, the $x_i$ that has the highest probability $p_i$, or the x at which the pdf is maximum, is called the mode of x
[Figures: CDF F(x) versus x with the levels 0.25, 0.50, and 0.75 marked between 0.00 and 1.00; pdf f(x) versus x]
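
These definitions can be sketched for a discrete distribution. Because a discrete CDF jumps, it may never equal $\alpha$ exactly, so the code assumes a common convention: the $\alpha$-quantile is the smallest value whose CDF is at least $\alpha$. The PMF is a made-up example.

```python
from fractions import Fraction

# Hypothetical PMF: value -> probability.
example_pmf = {
    1: Fraction(7, 20),
    2: Fraction(5, 20),
    3: Fraction(6, 20),
    4: Fraction(2, 20),
}


def quantile(pmf, alpha):
    """Smallest x with F(x) >= alpha (assumed convention for discrete data)."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= alpha:
            return x
    return max(pmf)


def median(pmf):
    """The 0.5-quantile."""
    return quantile(pmf, 0.5)


def mode(pmf):
    """The value with the highest probability."""
    return max(pmf, key=pmf.get)
```

In this example the mode is 1 (the most likely value) while the median is 2, which shows that the two summaries need not coincide.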
