
401-univariate

Course: PHYSICS 401, Fall 2008
School: Michigan
Review of basic probability and statistics

Probability: basic definitions

A random variable is the outcome of a natural process that cannot be predicted with certainty. Examples: the maximum temperature next Tuesday in Chicago, the price of WalMart stock two days from now, the result of flipping a coin, the response of a patient to a drug, and the number of people who will vote for a certain candidate in a future election. On the other hand, the time of sunrise next Tuesday is, for all practical purposes, exactly predictable from the laws of physics, and hence is not really a random variable (although technically it may be called a degenerate random variable). There is some grayness in this definition: eventually we may be able to predict the weather, or even sociological phenomena like voting patterns, with extremely high precision. From a practical standpoint this is not likely to happen any time soon, so we consider a random variable to be the state of a natural process that human beings cannot currently predict with certainty.

The set of all possible outcomes for a random variable is called the sample space. Corresponding to each point in the sample space is a probability, which is a number between 0 and 1. The sample space together with all of the probabilities is called the distribution. Properties of probabilities: (i) a probability is always a number between 0 and 1, and (ii) the sum of the probabilities over all points in the sample space is always exactly 1.

Example: If X is the result of flipping a fair coin, the sample space of X is {H, T} (H for heads, T for tails). Either outcome has probability 1/2, so we write $P(X = H) = 1/2$ (i.e. the probability that X is a head is 1/2) and $P(X = T) = 1/2$. The distribution can be written $\{H \to 1/2, T \to 1/2\}$.

Example: If X is the number of heads observed in four flips of a fair coin, the sample space of X is {0, 1, 2, 3, 4}. The probabilities are given by the binomial distribution, and the distribution is $\{0 \to 1/16, 1 \to 1/4, 2 \to 3/8, 3 \to 1/4, 4 \to 1/16\}$.
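The four-flip distribution can be reproduced from the binomial formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The sketch below is only an illustration: the function name `binomial_distribution` is hypothetical, and it assumes Python 3.8 or later for `math.comb`. Using `Fraction` keeps the probabilities exact, so property (ii) can be checked without rounding error.

```python
from fractions import Fraction
from math import comb


def binomial_distribution(n, p):
    """Return {k: P(X = k)}, where X is the number of heads in n independent
    flips of a coin that lands heads with probability p."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Four flips of a fair coin; Fraction keeps the probabilities exact.
dist = binomial_distribution(4, Fraction(1, 2))
for k, prob in dist.items():
    print(k, prob)

# Property (ii): the probabilities over the whole sample space sum to exactly 1.
print(sum(dist.values()))
```

This prints the pairs 0 1/16, 1 1/4, 2 3/8, 3 1/4, 4 1/16, followed by 1.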
Example: Suppose we select a point on the surface of the Earth at random and measure the temperature at that point with an infinitely precise thermometer. The temperature will certainly fall between −100°C and 100°C, but there are infinitely many values in that range. Thus we cannot represent the distribution using a list $\{x \to y, \ldots\}$, as above. Solutions to this problem will be discussed below.

A random variable is either qualitative or quantitative, depending on the type of value in the sample space. Quantitative random variables express values like temperature, mass, and velocity. Qualitative random variables express values like gender and race.

The cumulative distribution function (CDF) is a way to represent a quantitative distribution. For a random variable X, the CDF is the function F(t) such that $F(t) = P(X \le t)$. That is, the CDF is a function of t that specifies the probability of observing a value no larger than t.

Example: Suppose X follows a standard normal distribution. You may recall that this distribution has median 0, so that $P(X \le 0) = 1/2$ and $P(X \ge 0) = 1/2$. Thus for the standard normal distribution, $F(0) = 1/2$. There is no simple formula for F(t) when $t \ne 0$, but a table of values of F(t) is found in the back of almost any statistics textbook. A plot of F(t) is shown below.

[Figure: The standard normal CDF. Axes: t (horizontal) and P(X ≤ t) (vertical).]

Any CDF F(t) has the following properties: (i) $0 \le F(t) \le 1$, (ii) $F(-\infty) = 0$, (iii) $F(\infty) = 1$, and (iv) F is non-decreasing.

We can read probabilities of the form $P(X \le t)$ directly from the graph of the CDF. Since $P(X > t) = 1 - P(X \le t) = 1 - F(t)$, we can also read off a probability of the form $P(X > t)$ directly from a graph of the CDF.

[Figure: The standard normal CDF, marked with colored vertical line segments.]

The length of the green line is the probability of observing a value less than 1. The length of the blue line is the probability of observing a value greater than 1.
The length of the purple line is the probability of observing a value less than 1.

If $a \le b$, then for any random variable X, $P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$. Thus we can easily determine the probability of observing a value in an interval (a, b] from the CDF.

[Figure: The standard normal CDF. The length of the purple line is the probability of observing a value between 1 and 1.5.]

If a and b fall in an area where F is very steep, F(b) − F(a) will be relatively large. Thus we are more likely to observe values where F is steep than where F is flat.

A probability density function (PDF) is a different way to represent a probability distribution. The PDF for X is a function f(x) such that the probability of observing a value of X between a and b is equal to the area under the graph of f(x) between a and b, that is, $P(a < X \le b) = \int_a^b f(x)\,dx$. A plot of f(x) for the standard normal distribution is shown below. We are more likely to observe values where f is large than where f is small.

[Figure: The standard normal PDF. Axes: value (horizontal) and density (vertical).]

Probability: samples and populations

If we can repeatedly and independently observe a random variable n times, we have an independent and identically distributed sample of size n, or an iid sample of size n. This is also called a simple random sample, or SRS. (Note that the word "sample" is being used somewhat differently in this context compared to its use in the term "sample space".)

A central problem in statistics is to answer questions about an unknown distribution, called the population, based on a simple random sample that was generated by the distribution. This process is called inference. Specifically, given a numerical characteristic of a distribution, we may wish to estimate the value of that characteristic based on data.

In an iid sample, each point in the sample space will be observed with a certain frequency.
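The CDF and PDF relations described earlier in this section ($F(0) = 1/2$ for the standard normal, $P(X > t) = 1 - F(t)$, $P(a < X \le b) = F(b) - F(a)$, and probability as area under f) can be checked numerically. This is a sketch under stated assumptions: it computes the standard normal CDF through the error function `math.erf` (a standard identity, not something these notes use), approximates the area under f with a simple midpoint rule, and uses the interval (−1, 1.5] purely for illustration.

```python
import math


def normal_cdf(t):
    """Standard normal CDF F(t) = P(X <= t), via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


a, b = -1.0, 1.5                        # illustrative interval (a, b]
print(normal_cdf(0.0))                  # F(0) = 1/2, since the median is 0
print(1.0 - normal_cdf(1.0))            # P(X > 1) = 1 - F(1)
print(normal_cdf(b) - normal_cdf(a))    # P(a < X <= b) = F(b) - F(a)
print(area_under(normal_pdf, a, b))     # the same probability, as an area under f
```

The last two lines should agree to many decimal places, which is the sense in which the CDF and the PDF describe the same distribution.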
Suppose we flip a fair coin 20 times and observe 13 heads; the frequency of heads is then 13/20. Due to random variation, this frequency differs somewhat from the underlying probability, which is 1/2. If the sample is sufficiently large, frequencies and probabilities will be very similar (this is known as the law of large numbers).

Since probabilities can be estimated as frequencies, and the CDF is defined in terms of probabilities (i.e. $F(t) = P(X \le t)$), we can estimate the CDF with the empirical CDF (ECDF). Suppose that $X_1, X_2, \ldots, X_n$ are an iid sample. Then the ECDF evaluated at t is defined to be the proportion of the $X_i$ that are not larger than t. The ECDF is written $\hat F(t)$ (in general, the symbol $\hat\theta$ denotes an estimate, based on an iid sample, of a population characteristic named θ).

Example: Suppose we observe a sample of size n = 4 whose sorted values are 3, 5, 6, 11. Then $\hat F(t)$ is equal to 0 for $t < 3$, 1/4 for $3 \le t < 5$, 1/2 for $5 \le t < 6$, 3/4 for $6 \le t < 11$, and 1 for $t \ge 11$.

[Figure: The ECDF for the data set {3, 5, 6, 11}. Axes: t (horizontal) and P(X ≤ t) (vertical).]

Since the ECDF is a function of the sample, which is random, if we construct two ECDFs for two samples from the same distribution, the results will differ (even though the underlying population CDF is the same). This is called sampling variation. The next figure shows two ECDFs constructed from two independent samples of size 50 from a standard normal population.

[Figure: Two ECDFs for standard normal samples of size 50 (the CDF is shown in red). Axes: t and P(X ≤ t).]

The sampling variation gets smaller as the sample size increases. The figure below shows ECDFs based on SRSs of size n = 500.
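The ECDF definition translates directly into code: sort the sample once, then count how many values are not larger than t. The sketch below is illustrative only (the name `ecdf` is hypothetical); it uses `bisect.bisect_right` from the standard library to do the counting on the sorted values and reproduces the step function of the {3, 5, 6, 11} example.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of t:
    the proportion of observations that are not larger than t."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(t):
        # bisect_right gives the number of sorted values that are <= t
        return bisect_right(xs, t) / n

    return F_hat


F_hat = ecdf([11, 3, 6, 5])   # the data set {3, 5, 6, 11}, in any order
for t in (2, 3, 4.9, 5, 6, 10.5, 11, 20):
    print(t, F_hat(t))
```

The printed values step from 0 to 0.25, 0.5, 0.75, and 1 exactly at the observed values 3, 5, 6, and 11, matching the example.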
[Figure: Two ECDFs for standard normal samples of size 500 (the CDF is shown in red). Axes: t and P(X ≤ t).]

The sampling variation gets larger as the sample size decreases. The following figure shows ECDFs based on SRSs of size n = 10.

[Figure: Two ECDFs for standard normal samples of size 10 (the CDF is shown in red). Axes: t and P(X ≤ t).]

Given an SRS $X_1, \ldots, X_n$, a histogram formed from the SRS is an estimate of the PDF. To construct a histogram, select a bin width δ > 0, and let H(x) be the function such that, when $(k-1)\delta \le x < k\delta$, H(x) is the number of observed $X_i$ that fall between $(k-1)\delta$ and $k\delta$.

To directly compare a density and a histogram, they must be put on the same scale. A density describes a single observation (its total area is 1), while for n observations the expected number falling in a bin of width δ near x is approximately $n\delta f(x)$. So to compare a density to a histogram based on n observations using bins of width δ, the density must be scaled by nδ.

There is no single best way to select δ. A common rule of thumb is to use $\log_2(n) + 1$ bins, which gives the bin width
$$\delta = \frac{R}{\log_2(n) + 1},$$
where n is the number of data points and R is the range of the data (the greatest value minus the least value). This can be used to produce a reasonable value for δ.

Just as with the ECDF, sampling variation will cause the histogram to vary if the experiment is repeated. The next figure shows two replicates of a histogram generated from an SRS of 50 standard normal random draws.

[Figure: Two histograms for standard normal samples of size 50 (the scaled density is shown in red). Axes: x and f(x).]

As with the ECDF, larger sample sizes lead to less sampling variation. This is illustrated by comparing the previous figure to the figure below (n = 500).
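The histogram construction and the nδ scaling can be sketched as follows. This is a minimal illustration, not the code behind the figures: the helper names are hypothetical, the bins are anchored at multiples of δ exactly as in the definition of H(x), the bin width comes from the rule of thumb $\delta = R/(\log_2(n)+1)$, and the data are a simulated standard normal SRS with a fixed seed.

```python
import math
import random
from collections import Counter


def rule_of_thumb_width(sample):
    """Bin width delta = R / (log2(n) + 1), where R = max - min is the range."""
    n = len(sample)
    r = max(sample) - min(sample)
    return r / (math.log2(n) + 1)


def histogram_counts(sample, delta):
    """Map each bin index k to H, the number of observations x with
    (k - 1) * delta <= x < k * delta. Empty bins are simply absent."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return Counter(math.floor(x / delta) + 1 for x in sample)


def normal_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


random.seed(1)                        # fixed seed so the run is reproducible
n = 500
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
delta = rule_of_thumb_width(sample)
counts = histogram_counts(sample, delta)

# Compare each count H with the scaled density n * delta * f at the bin midpoint.
for k in sorted(counts):
    left, right = (k - 1) * delta, k * delta
    scaled = n * delta * normal_pdf((left + right) / 2)
    print(f"[{left:6.2f}, {right:6.2f})  H = {counts[k]:3d}  n*delta*f = {scaled:6.1f}")
```

Rerunning with a different seed changes the counts but not the scaled density, which is the sampling variation discussed above.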
[Figure: Two histograms for standard normal samples of size 500 (the scaled density is shown in red). Axes: x and f(x).]

The quantile function is the inverse of the CDF. It is the function Q(p) such that $F(Q(p)) = P(X \le Q(p)) = p$, where $0 \le p \le 1$. In words, Q(p) is the point in the sample space such that, with probability p, the observation will be less than or equal to Q(p). For example, Q(1/2) is the median: $P(X \le Q(1/2)) = 1/2$, and the 75th percentile is Q(3/4). A plot of the quantile function is just a plot of the CDF with the x and y axes swapped. Like the CDF, the quantile function is non-decreasing.

[Figure: The standard normal quantile function. Axes: p (horizontal) and t: P(X ≤ t) = p (vertical).]

Suppose we observe an SRS $X_1, X_2, \ldots, X_n$. Sort these values to give $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ (these are called the order statistics). The frequency of observing a value less than or equal to $X_{(k)}$ is k/n. Thus it makes sense to estimate Q(k/n) with $X_{(k)}$, i.e. $\hat Q(k/n) = X_{(k)}$.

It was easy to estimate Q(p) for p = 1/n, 2/n, ..., 1. To estimate Q(p) for other values of p, we use interpolation. Suppose $k/n < p < (k+1)/n$. Then Q(p) should be between Q(k/n) and Q((k+1)/n), i.e. between $X_{(k)}$ and $X_{(k+1)}$. To estimate Q(p), we draw a line between the points $(k/n, X_{(k)})$ and $((k+1)/n, X_{(k+1)})$ in the x-y plane. According to the equation of this line, we should estimate Q(p) as
$$\hat Q(p) = n\left[(p - k/n)\,X_{(k+1)} + ((k+1)/n - p)\,X_{(k)}\right].$$
Finally, for the special case $p < 1/n$, set $\hat Q(p) = X_{(1)}$. (There are many slightly different ways to define this interpolation. This is the definition that will be used in this course.)

The two figures below show empirical quantile functions for standard normal samples of sizes 50 and 500.
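The interpolation rule above can be written as a short function. This sketch follows the course's definition ($\hat Q(k/n) = X_{(k)}$, linear interpolation in between, and $\hat Q(p) = X_{(1)}$ for $p < 1/n$); the function name is hypothetical, and the clamp on k is an added safeguard against floating-point rounding in p·n, which does not change the result because the interpolation is continuous.

```python
import math


def empirical_quantile(sample, p):
    """Empirical quantile Q_hat(p) for 0 <= p <= 1, following the course's rule:
    Q_hat(k/n) = X_(k), linear interpolation between consecutive order
    statistics, and Q_hat(p) = X_(1) when p < 1/n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(sample)                  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    if p < 1.0 / n:
        return xs[0]
    # k satisfies k/n <= p < (k + 1)/n; clamp to guard against rounding in p * n.
    k = min(max(math.floor(p * n), 1), n)
    if k == n:                           # p == 1 gives the largest observation
        return xs[-1]
    x_k, x_k1 = xs[k - 1], xs[k]         # X_(k) and X_(k+1) in 0-based indexing
    return n * ((p - k / n) * x_k1 + ((k + 1) / n - p) * x_k)


data = [3, 5, 6, 11]
for p in (0.1, 0.25, 0.5, 0.6, 0.75, 1.0):
    print(p, empirical_quantile(data, p))
```

Under this rule the sample median of {3, 5, 6, 11} is $\hat Q(1/2) = X_{(2)} = 5$, rather than the average 5.5 that some other conventions use.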
[Figure: Two empirical quantile functions for standard normal samples of size 50 (the population quantile function is shown in red). Axes: p and t: P(X ≤ t) = p.]

[Figure: Two empirical quantile functions for standard normal samples of size 500 (the population quantile function is shown in red). Axes: p and t: P(X ≤ t) = p.]

Measures of location

When summarizing the properties of a distribution, the key features of interest are generally the most typical value and the level of variability. A measure of the most typical value is often called a measure of location. The most common measure of location is the mean, denoted μ. If f(x) is a density function, then the mean of the distribution is $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$. If the distribution has finitely many points in its sample space, it can be written $\{x_1 \to p_1, \ldots, x_n \to p_n\}$, and the mean is $\mu = p_1 x_1 + \cdots + p_n x_n$.

Think of the mean as the center of mass of the distribution. If you had an infinitely long board marked in inches from −∞ to ∞, and placed an object with mass $p_1$ at location $x_1$, an object with mass $p_2$ at $x_2$, and so on, then the mean would be the point at which the board balances.

The mean as defined above should really be called the population mean, since it is a function of the distribution rather than of a sample from the distribution. If we want to estimate the population mean based on an SRS $X_1, \ldots, X_n$, we use the sample mean, which is the familiar average: $\bar X = (X_1 + \cdots + X_n)/n$. This may also be denoted $\hat\mu$. Note that the population mean is sometimes called the expected value.

Although the mean is a mathematical function of the CDF and of the PDF, it is not easy to determine the mean just by visually inspecting graphs of these functions. An alternative measure of location is the median.
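Before turning to the median, the two formulas for the mean can be compared on the four-flip coin example: the population mean $p_1 x_1 + \cdots + p_n x_n$ computed from the distribution, and the sample mean $\bar X$ computed from a simulated SRS. The function names are hypothetical and the simulation (fixed seed, n = 10,000) is an assumption made only for illustration.

```python
import random
from fractions import Fraction
from math import comb


def distribution_mean(dist):
    """Population mean p1*x1 + ... + pn*xn of a finite distribution {x: p}."""
    return sum(p * x for x, p in dist.items())


def sample_mean(sample):
    """Sample mean X_bar = (X1 + ... + Xn) / n."""
    return sum(sample) / len(sample)


# Number of heads in four flips of a fair coin: {0: 1/16, 1: 1/4, 2: 3/8, 3: 1/4, 4: 1/16}
heads = {k: Fraction(comb(4, k), 16) for k in range(5)}
print(distribution_mean(heads))          # the balance point of the distribution: 2

# Estimate the same mean from a simulated SRS of size 10,000 (fixed seed).
random.seed(2)
srs = [sum(random.random() < 0.5 for _ in range(4)) for _ in range(10_000)]
print(sample_mean(srs))                  # close to 2, but not exactly 2
```

The population mean is exactly 2, the balance point of the symmetric distribution; the sample mean differs from it slightly because of sampling variation.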
The median can be easily determined from the quantile function: it is Q(1/2). It can also be determined from the CDF by moving horizontally from (0, 1/2) to the intersection with the CDF, then moving vertically down to the x axis. The x coordinate of the intersection point is the median. The population median can be estimated by the sample median $\hat Q(1/2)$ (defined above).

Suppose X is a random variable with median θ. Then we say that X has a symmetric distribution if $P(X < \theta - c) = P(X > \theta + c)$ for every value of c. An equivalent definition (for a continuous distribution) is that $F(\theta - c) = 1 - F(\theta + c)$. In a symmetric distribution the mean and median are equal (provided the mean exists). The density of a symmetric distribution is geometrically symmetric about its median. The histogram of a sample from a symmetric distribution will be approximately symmetric (only approximately, due to sampling variation).

[Figure: The standard normal CDF. The fact that this CDF corresponds to a symmetric distribution is reflected in the fact that lines of the same color have the same length.]

Suppose that for some values c > 0, $P(X > \theta + c)$ is much larger than $P(X < \theta - c)$. That is, we are much more likely to observe values c units larger than the median than values c units smaller than the median. Such a distribution is right-skewed.

[Figure: A right-skewed CDF. The fact that the vertical lines on the right are longer than the corresponding vertical lines on the left reflects the fact that the distribution is right-skewed.]

The following density function is for the same distribution as the preceding CDF. Right-skewed distributions are characterized by having long right tails in their density functions.

[Figure: A right-skewed density.]

If $P(X < \theta - c)$ is much larger than $P(X > \theta + c)$ for values of c > 0, then the distribution is left-skewed. The following figures show a CDF and density for a left-skewed distribution.
[Figure: A left-skewed CDF (left) and a left-skewed density (right).]

In a right-skewed distribution, the mean is typically greater than the median. In a left-skewed distribution, the median is typically greater than the mean. In a symmetric distribution, the mean and median are equal.

Measures of scale

A measure of scale assesses the level of variability in a distribution. The most common measure of scale is the standard deviation, denoted σ. If f(x) is a density function, then the standard deviation is $\sigma = \sqrt{\int (x - \mu)^2 f(x)\,dx}$. If the distribution has finitely many points in its sample space, $\{x_1 \to p_1, \ldots, x_n \to p_n\}$ (notation as used above), then the standard deviation is $\sigma = \sqrt{p_1 (x_1 - \mu)^2 + \cdots + p_n (x_n - \mu)^2}$. The square of the standard deviation is the variance, denoted σ².

The standard deviation (SD) measures the distance between a typical observation and the mean. Thus if the SD is large, observations tend to be far from the mean, while if the SD is small, observations tend to be close to the mean. This is why the SD is said to measure the variability of a distribution.

If we have data $X_1, \ldots, X_n$ and wish to estimate the population standard deviation, we use the sample standard deviation:
$$\hat\sigma = \sqrt{\frac{(X_1 - \bar X)^2 + \cdots + (X_n - \bar X)^2}{n - 1}}.$$
It may seem more natural to use n rather than n − 1 in the denominator. The result is similar unless n is quite small.

The scale can be assessed visually based on the histogram or ECDF. A relatively wider histogram or a relatively flatter ECDF suggests a more variable distribution. We must say "suggests" because, due to the sampling variation in the histogram and ECDF, we cannot be sure that what we are seeing is truly a property of the population.

Suppose that X and Y are two random variables. We can form a new random variable Z = X + Y. The mean of Z is the mean of X plus the mean of Y: $\mu_Z = \mu_X + \mu_Y$.
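The standard deviation formulas and the mean-versus-median rule for skewed data can be tried on small examples. In this sketch the helper names are hypothetical, and the seven-value data set is made up purely to have a long right tail; the standard-library functions `statistics.stdev` (n − 1 denominator) and `statistics.pstdev` (n denominator) serve as cross-checks.

```python
import math
import statistics


def distribution_sd(dist):
    """SD of a finite distribution {x: p}: sqrt(p1*(x1 - mu)^2 + ... + pn*(xn - mu)^2)."""
    mu = sum(p * x for x, p in dist.items())
    return math.sqrt(sum(p * (x - mu) ** 2 for x, p in dist.items()))


def sample_sd(sample):
    """Sample SD with n - 1 in the denominator."""
    n = len(sample)
    xbar = sum(sample) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))


# Number of heads in four flips of a fair coin: its SD is exactly 1.
heads = {0: 1 / 16, 1: 1 / 4, 2: 3 / 8, 3: 1 / 4, 4: 1 / 16}
print(distribution_sd(heads))

# A small hypothetical right-skewed data set: one value far out in the right tail.
data = [1, 2, 2, 3, 3, 4, 15]
print(statistics.mean(data), statistics.median(data))   # mean > median
print(sample_sd(data), statistics.stdev(data))          # both use n - 1
print(statistics.pstdev(data))                          # n in the denominator
```

With only seven values the n and n − 1 versions differ noticeably; for large samples the difference becomes negligible, as noted above.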
If X and Y are independent (to be defined later), then the variance of Z is the variance of X plus the variance of Y: $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$.

Resistance

Suppose we observe data $X_1, \ldots, X_{100}$, so the median is $X_{(50)}$ (recall the definitions of the order statistics and of $\hat Q$ given above; with n = 100, $\hat Q(1/2) = \hat Q(50/100) = X_{(50)}$). Then suppose we observe one additional value Z and recompute the median based on $X_1, \ldots, X_{100}, Z$. There are three possibilities: (i) $Z < X_{(50)}$, and the new median is $(X_{(49)} + X_{(50)})/2$; (ii) $X_{(50)} \le Z \le X_{(51)}$, and the new median is $(X_{(50)} + Z)/2$; or (iii) $Z > X_{(51)}$, and the new median is $(X_{(50)} + X_{(51)})/2$. In any case, the new median must fall between $X_{(49)}$ and $X_{(51)}$. When a single new observation, no matter how extreme, can change the value of a statistic by only a bounded amount, the statistic is said to be resistant.

On the other hand, the mean of $X_1, \ldots, X_{100}$ is $\bar X = (X_1 + \cdots + X_{100})/100$, and if we observe one additional value Z, then the mean of the new data set is $100\bar X/101 + Z/101$. Therefore, depending on the value of Z, the new mean can be any number. Thus the sample mean is not resistant. The standard deviation is not resistant either. A resistant measure of scale is the interquartile range (IQR), which is defined to be Q(3/4) − Q(1/4). It is estimated by the sample IQR, $\hat Q(3/4) - \hat Q(1/4)$.

Comparing two distributions graphically

One way to graphically compare two distributions is to plot their CDFs on a common set of axes. Two key features to look for are (i) the right/left position of the CDF (positions further to the right indicate greater location values), and (ii) the steepness (slope) of the CDF: a steep CDF (one that moves from 0 to 1 very quickly) suggests a less variable distribution compared to a CDF that moves from 0 to 1 more gradually.

Location and scale characteristics can also be seen in the quantile function, through (i) the vertical position of the quantile function (higher positions indicate greater location values), and (ii) the steepness (slope) of the quantile function.
A steep quantile function suggests a more variable distribution compared to a quantile function that is less steep.

The following four figures show ECDFs and empirical quantile functions for the average daily maximum temperature over certain months in 2002. Note that January is (of course) much colder than July, and (less obviously) January is more variable than July. Also, the distributions in April and October are very similar (April is a bit colder). Can you explain why January is more variable than July?

[Figure: The CDFs for January and July (average daily maximum temperature). Axes: t and P(X ≤ t).]

[Figure: The quantile functions for January and July (average daily maximum temperature). Axes: p and t: P(X ≤ t) = p.]

[Figure: The CDFs for April and October (average daily maximum temperature). Axes: t and P(X ≤ t).]

[Figure: The quantile functions for April and October (average daily maximum temperature). Axes: p and t: P(X ≤ t) = p.]

Comparisons of two distributions can also be made using histograms. Since the histograms must be plotted on separate axes, the comparisons are not as visually clear.

[Figure: Histograms for January and July (average daily maximum temperature). Axes: temperature and frequency.]

[Figure: Histograms for April and October (average daily maximum temperature). Axes: temperature and frequency.]

The standard graphical method for comparing two distributions is a quantile-quantile (QQ) plot. Suppose that $\hat Q_X(p)$ is the empirical quantile function for $X_1, \ldots, X_m$ and $\hat Q_Y(p)$ is the empirical quantile function for $Y_1, \ldots, Y_n$. If we make a scatterplot of the points $(\hat Q_X(p), \hat Q_Y(p))$ in the plane for every 0 < p < 1, we get something that looks like the following:

[Figure: QQ plot of average daily maximum temperature (July vs. January). Axes: January quantiles (horizontal) and July quantiles (vertical).]

The key feature in the plot is that every quantile in July is greater than the corresponding quantile in January. More subtly, since the slope of the points is generally shallower than 45°, we infer that January temperatures are more variable than July temperatures (if the slope were much greater than 45°, we would infer that July temperatures are more variable than January temperatures).

If we take it as obvious that it is warmer in July than in January, we may wish to modify the QQ plot to make it easier to make other comparisons. We may median center the data (subtract the median January temperature from every January temperature, and similarly with the July temperatures) to remove location differences. In the median-centered QQ plot, it is very clear that January temperatures are more variable throughout most of the range, although at the low end of the scale there are some points that do not follow this trend.

[Figure: QQ plot of median-centered average daily maximum temperature (July vs. January). Axes: January quantiles, median centered (horizontal) and July quantiles, median centered (vertical).]

A QQ plot can be used to compare the empirical quantiles of a sample $X_1, \ldots, X_n$ to the quantiles of a distribution such as the standard normal distribution. Such a plot is called a normal probability plot. The main application of a normal probability plot is to assess whether the tails of the data are thicker than, thinner than, or comparable to the tails of a normal distribution. The tail thickness determines how likely we are to observe extreme values.
A thick right tail indicates an increased likelihood of observing extremely large values (relative to a normal distribution). A thin right tail indicates a decreased likelihood of observing extremely large values. The left tail has the same interpretation, but with "extremely large" replaced by "extremely small" (where extremely small means far in the direction of −∞).

To assess tail thickness or thinness from a normal probability plot, it is important to note whether the data quantiles are on the X or the Y axis. Assuming that the data quantiles are on the Y axis: a thick right tail falls above the 45° diagonal, and a thin right tail falls below it; a thick left tail falls below the 45° diagonal, and a thin left tail falls above it. If the data quantiles are on the X axis, the opposite holds (thick right tails fall below the 45° diagonal, etc.).

Suppose we would like to assess whether the January or July maximum temperatures are normally distributed. To accomplish this, we perform the following steps. First we standardize the temperature data: for each of the two months, we compute the sample mean $\hat\mu$ and the sample standard deviation $\hat\sigma$, then transform each value using $Z \mapsto (Z - \hat\mu)/\hat\sigma$. Once this has been done, the transformed values for each month have sample mean 0 and sample standard deviation 1, and hence can be compared to a standard normal distribution. Next we construct a plot of the temperature quantiles (for the standardized data) against the corresponding population quantiles of the standard normal distribution. The simplest way to proceed is to plot $Z_{(k)}$ (where $Z_1, Z_2, \ldots$ are the standardized temperature data) against Q(k/n), where Q is the standard normal quantile function.
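The two steps (standardize, then pair order statistics with standard normal quantiles) can be sketched with the standard library's `statistics.NormalDist` (Python 3.8 or later). One deliberate deviation, labeled as an assumption: the normal quantile is taken at (k − 1/2)/n rather than k/n, because the standard normal quantile at p = n/n = 1 is infinite. The helper names are hypothetical, and simulated normal and exponential (right-skewed) samples stand in for the temperature data, which are not reproduced here.

```python
import random
import statistics
from statistics import NormalDist


def standardize(values):
    """Map each value z to (z - mu_hat) / sigma_hat, using the sample mean and the
    sample standard deviation (n - 1 denominator)."""
    mu_hat = statistics.fmean(values)
    sigma_hat = statistics.stdev(values)
    return [(z - mu_hat) / sigma_hat for z in values]


def normal_probability_points(values):
    """Pair the k-th smallest standardized value Z_(k) with a standard normal
    quantile, giving (normal quantile, data quantile) points for plotting with
    the data quantiles on the Y axis.

    Assumption: the normal quantile is taken at (k - 1/2)/n instead of k/n,
    because the standard normal quantile function is infinite at p = 1."""
    z = sorted(standardize(values))
    n = len(z)
    q = NormalDist().inv_cdf             # standard normal quantile function
    return [(q((k - 0.5) / n), z[k - 1]) for k in range(1, n + 1)]


random.seed(3)
normal_data = [random.gauss(50.0, 10.0) for _ in range(200)]
skewed_data = [random.expovariate(1.0) for _ in range(200)]   # right-skewed

for name, data in (("normal", normal_data), ("right-skewed", skewed_data)):
    pts = normal_probability_points(data)
    x_min, y_min = pts[0]
    x_max, y_max = pts[-1]
    # With data on the Y axis, top points above the 45-degree line suggest a thick right tail.
    print(f"{name:>12}: smallest ({x_min:.2f}, {y_min:.2f}), largest ({x_max:.2f}, {y_max:.2f})")
```

Plotting all of `pts` (for example with a plotting library of your choice) gives the normal probability plot itself; the printout only shows the extreme points, which are the ones that reveal tail thickness and skew.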
[Figure: QQ plots of standardized average daily maximum temperature in January (left) and July (right) against standard normal quantiles. Axes: standard normal quantiles (horizontal) and standardized January or July quantiles (vertical).]

In both cases, the tails of the data are roughly comparable to normal tails. For January both tails are slightly thinner than normal, and the left tail for July is slightly thicker than normal. The atypical points for July turn out to correspond to a few stations at very high elevations that are unusually cold in summer, e.g. Mount Washington and a few stations in the Rockies.

Normal probability plots can also be used to detect skew. The following two figures show the general pattern of the normal probability plot for left-skewed and for right-skewed distributions. The key to understanding these figures is to consider the extreme (largest and smallest) quantiles. In a right-skewed distribution, the largest quantiles will be much larger than the corresponding normal quantiles. In a left-skewed distribution, the smallest quantiles will be much smaller than the corresponding normal quantiles. Be sure to remember that "small" means closer to −∞, not closer to 0.

[Figure: Normal quantiles (vertical) plotted against the quantiles of a right-skewed distribution in one panel and of a left-skewed distribution in the other (horizontal).]

Note that the data quantiles are on the X axis (the reverse of the preceding normal probability plots). It is important that you be able to read these plots both ways.

Sampling distributions of statistics

A statistic is any function of a random variable (i.e. a function of data). For example, the sample mean, sample median, sample standard deviation, and sample IQR are all statistics. Since a statistic is formed from data, which is random, a statistic itself is random.
Hence a statistic is a random variable, and it has a distribution. The variation in this distribution is referred to as sampling variation. The distribution of a statistic is determined by the distribution of the data used to form the statistic. However, there is no simple procedure that can be used to determine the distribution of a statistic from the distribution of the data.

Suppose that $\bar X$ is the average of an SRS $X_1, \ldots, X_n$. The mean and standard deviation of $\bar X$ are related to the mean μ and standard deviation σ of each $X_i$ as follows: the mean of $\bar X$ is μ, and the standard deviation of $\bar X$ is $\sigma/\sqrt{n}$.

Many simple statistics are formed from an SRS, for example the sample mean, median, standard deviation, and IQR. For such statistics, the key characteristic is that the sampling variation becomes smaller as the sample size increases. The following figures show examples of this phenomenon.

[Figure: Sampling variation of the sample mean for standard normal SRSs of size 20, 50, and 500 (one histogram per sample size).]

[Figure: Sampling variation of the sample standard deviation for standard normal SRSs of size 20, 50, and 500 (one histogram per sample size).]

[Figure: ECDFs showing the sampling variation in the sample median for standard normal SRSs of size 20, 50, and 500.]

[Figure: QQ plot showing the sampling variation in the sample IQR for standard normal SRSs of size 20 (x axis) and 100 (y axis). The true value is 1.349.]
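A small simulation can reproduce the qualitative pattern in these figures: draw many standard normal SRSs of a given size, compute a statistic on each, and look at the spread of the results. This is a sketch, not the code behind the figures; the helper `sampling_sd`, the 1000 replications, and the seed are illustrative assumptions. It also checks the "true value" 1.349 of the standard normal IQR using `statistics.NormalDist` (Python 3.8 or later), and the factor $1/\sqrt{2}$ that governs what happens when n is doubled.

```python
import math
import random
import statistics
from statistics import NormalDist


def sampling_sd(statistic, n, reps=1000, seed=0):
    """Draw `reps` standard normal SRSs of size n, apply `statistic` to each,
    and return the standard deviation of the resulting values."""
    rng = random.Random(seed)
    values = [statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(values)


for n in (20, 50, 500):
    sd_mean = sampling_sd(statistics.fmean, n)
    sd_median = sampling_sd(statistics.median, n)
    print(f"n = {n:3d}: SD of sample mean = {sd_mean:.3f} "
          f"(sigma/sqrt(n) = {1 / math.sqrt(n):.3f}), SD of sample median = {sd_median:.3f}")

# Population IQR of the standard normal, Q(3/4) - Q(1/4): the "true value" in the IQR figure.
nd = NormalDist()
print(round(nd.inv_cdf(0.75) - nd.inv_cdf(0.25), 3))   # 1.349

# Doubling n multiplies sigma/sqrt(n) by 1/sqrt(2), a reduction of about 29%.
print(round(1 - 1 / math.sqrt(2), 3))                  # 0.293
```

The simulated SD of the sample mean should track $\sigma/\sqrt{n}$ closely; the sample median also shrinks with n, though no formula for its SD is given in these notes.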
In the case of the sample mean, we can directly state how the variation decreases as a function of the sample size: for an SRS of size n, the standard deviation of $\bar X$ is $\sigma/\sqrt{n}$, where σ is the standard deviation of one observation. The sample size must increase by a factor of 4 to cut the standard deviation in half. Doubling the sample size only reduces $\sigma/\sqrt{n}$ by around 30% (since $1 - 1/\sqrt{2} \approx 0.29$). For other statistics, such as the sample median or sample standard deviation, the variation also declines with sample size, but it is not easy to give a formula for the standard deviation in terms of the sample size. For most statistics, it is approximately true that increasing the sample size by a factor of F scales the standard deviation of the statistic by a factor of $1/\sqrt{F}$.

Hypothesis testing

In most practical data analysis it is possible to carry out inferences (from sample to population) based on graphical techniques (e.g. using the empirical CDF and quantile functions and the histogram). This type of inference may be considered informal, since it doesn't involve making quantitative statements about the likelihood that certain characteristics of the population hold. In certain cases it is important to make quantitative statements about the degree of uncertainty in an inference. This requires a formal and quantitative approach to inference.

In the standard setup we consider hypotheses, which are statements about a population. For example, the statement that the mean of a population is positive is a hypothesis. More concretely, we may be comparing incomes of workers with a BA degree to incomes of workers with an MA degree, and our hypothesis may be that the mean MA income minus the mean BA income is positive. Note that hypotheses are always statements about populations, not samples, so the means above are population means.

Generally we are comparing two hypotheses, which are conventionally referred to as the null hypothesis and the alternative hypothesis.
If the data are inconclusive or strongly support the null hypothesis, then we decide in favor of the null hypothesis. Only if the data strongly favor the alternative hypothesis do we decide in favor of the alternative hypothesis over the null.

Example: If hypothesis A represents a conventional wisdom that somebody is trying to overturn by proposing hypothesis B, then A should be the null hypothesis and B should be the alternative hypothesis. Thus, if somebody is claiming that cigarette smoking is not associated with lung cancer, the null hypothesis would be that cigarette smoking is associated with lung cancer, and the alternative would be that it is not. Then, once the data are collected and analyzed, if the results are inconclusive, we would stick with the standard view that smoking and lung cancer are related. Note that the conventional wisdom may change over time. One hundred years ago smoking was not widely regarded as dangerous, so the null and alternative may well have been switched back then.

Example: If the consequences of mistakenly accepting hypothesis A are more severe than the consequences of mistakenly accepting hypothesis B, then B should be the null hypothesis and A should be the alternative. For example, suppose that somebody is proposing that a certain drug prevents baldness, but it is suspected that the drug may be very toxic. If we adopt the use of the drug and it turns out to be toxic, people may die. On the other hand, if we do not adopt the use of the drug and it turns out to be effective and non-toxic, some people will needlessly become bald. The consequence of the first error is far more severe than the consequence of the second error. Therefore we take as the null hypothesis that the drug is toxic, and as the alternative we take the hypothesis that the drug is non-toxic and effective.
Note that if the drug were intended to treat late-stage cancer, the designation would not be as clear, because the risks of not treating the disease are as severe as the risk of a toxic reaction (both are likely to be fatal).

Example: If hypothesis A is a much simpler explanation for a phenomenon than hypothesis B, we should take hypothesis A as the null hypothesis and hypothesis B as the alternative hypothesis. This is called the principle of parsimony, or Occam's razor. Stated another way, if we have no reason to favor one hypothesis over another, the simplest explanation is preferred. Note that there is no general theoretical justification for this principle, and it does sometimes happen that the simplest possible explanation turns out to be incorrect.

Next we need to consider the level of evidence in the data for each of the two hypotheses. The standard method is to use a test statistic $T(X_1, \ldots, X_n)$ such that extreme values of $T$ indicate evidence for the alternative hypothesis, and non-extreme values of $T$ indicate evidence for the null hypothesis. "Extreme" may mean closer to $+\infty$ (a right-tailed test), closer to $-\infty$ (a left-tailed test), or closer to either of $\pm\infty$, depending on the context. The first two cases are called one-sided tests, while the final case is called a two-sided test. The particular definition of "extreme" for a given problem is called the rejection region.

Example: Suppose we are investigating a coin, and the null hypothesis is that the coin is fair (equally likely to land heads or tails) while the alternative is that the coin is unfairly biased in favor of heads. If we observe data $X_1, \ldots, X_n$ where each $X_i$ is H or T, then the test statistic $T(X_1, \ldots, X_n)$ may be the number of heads, and the rejection region would be large values of $T$ (since the maximum value of $T$ is $n$, we might also say $T$ close to $n$).
On the other hand, if the alternative hypothesis were that the coin is unfairly biased in favor of tails, the rejection region would be small values of $T$ (since the minimum value of $T$ is zero, we might also say $T$ close to zero). Finally, if the alternative hypothesis were that the coin is unfairly biased in any way, the rejection region would be large or small values of $T$ ($T$ close to 0 or $n$).

Example: Suppose we are investigating the effect of eating fast food on body shape. We choose to focus on the body mass index $X = \text{weight}/\text{height}^2$, which we observe for people $X_1, \ldots, X_m$ who never eat fast food and people $Y_1, \ldots, Y_n$ who eat fast food three or more times per week. Our null hypothesis is that the two populations have the same mean BMI, and the alternative hypothesis is that people who eat fast food have a higher mean BMI. We shall see that a reasonable test statistic is

$$T = \frac{\bar{Y} - \bar{X}}{\sqrt{\hat\sigma_X^2/m + \hat\sigma_Y^2/n}},$$

where $\hat\sigma_X$ and $\hat\sigma_Y$ are the sample standard deviations of the $X_i$ and the $Y_i$, respectively. The rejection region will be large values of $T$.

In making a decision in favor of the null or alternative hypothesis, two errors are possible:

- A type I error, or false positive, occurs when we decide in favor of the alternative hypothesis when the null hypothesis is true.
- A type II error, or false negative, occurs when we decide in favor of the null hypothesis when the alternative hypothesis is true.

Given the way that the null and alternative hypotheses are designated, a false positive is a more undesirable outcome than a false negative.

Once we have a test statistic $T$ and a rejection region, we would like to quantify the amount of evidence in favor of the alternative hypothesis. The standard method is to compute the probability of observing a value of $T$ as extreme as or more extreme than the observed value of $T$, assuming that the null hypothesis is true. This number is called the p-value.
It is the probability of a type I error (a false positive decision) for the rule that decides in favor of the alternative whenever $T$ is at least as extreme as the value we observed.

For a right-tailed test, the p-value is $P(T \ge T_{obs})$, where $T_{obs}$ denotes the test statistic value computed from the observed data, and $T$ denotes a test statistic value generated by the null distribution. When $T$ is continuous, the right-tailed p-value is equivalently $1 - F(T_{obs})$, where $F$ is the CDF of $T$ under the null hypothesis. For a left-tailed test, the p-value is $P(T \le T_{obs})$, or equivalently $F(T_{obs})$.

For a two-sided test we must locate the most typical value of $T$ under the null hypothesis and then consider extreme values centered around this point. Suppose that $\bar{T}$ is the expected value of the test statistic under the null hypothesis. Then the p-value is $P(|T - \bar{T}| \ge |T_{obs} - \bar{T}|)$, which can also be written

$$P(T \le \bar{T} - |T_{obs} - \bar{T}|) + P(T \ge \bar{T} + |T_{obs} - \bar{T}|).$$

Example: Suppose we observe 28 heads and 12 tails in 40 flips of a coin. Our observed test statistic value is $T_{obs} = 28$. Under the null hypothesis ($P(\text{heads}) = P(\text{tails}) = 1/2$) the probability of observing exactly $k$ heads out of 40 flips is $\binom{40}{k}/2^{40}$, where $\binom{n}{k} = n!/((n-k)!\,k!)$. Therefore the probability of observing a test statistic value of 28 or larger under the null hypothesis (i.e. the p-value) is $P(T = 28) + P(T = 29) + \cdots + P(T = 40)$, which equals

$$\binom{40}{28}/2^{40} + \binom{40}{29}/2^{40} + \cdots + \binom{40}{40}/2^{40}.$$

This value can be calculated on a computer. It is approximately .008, indicating that it is very unlikely to observe 28 or more heads in 40 flips of a fair coin. Thus the data suggest that the coin is not fair, and in particular that it is biased in favor of heads. Put another way, a rule that decides in favor of the alternative whenever 28 or more heads are observed would commit a type I error less than 1% of the time if the coin were fair. An alternative approach to calculating this p-value is to use a normal approximation.
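The exact tail sum in this example is easy to evaluate with integer arithmetic. The following is a minimal Python sketch using only the standard library (`math.comb` and `statistics.NormalDist`, Python 3.8+); the function names are illustrative, not taken from these notes. For comparison it also evaluates the two-sided version of the test and the normal approximation with mean $n/2$ and standard deviation $\sqrt{n}/2$.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_right_tail(t_obs: int, n: int, p: float = 0.5) -> float:
    """Exact right-tailed p-value P(T >= t_obs) for T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t_obs, n + 1))


def fair_coin_two_sided(t_obs: int, n: int) -> float:
    """Exact two-sided p-value P(|T - n/2| >= |t_obs - n/2|) under a fair coin."""
    center = n / 2
    extreme = [k for k in range(n + 1) if abs(k - center) >= abs(t_obs - center)]
    return sum(comb(n, k) for k in extreme) / 2**n


def normal_approx_right_tail(t_obs: int, n: int) -> float:
    """Normal approximation: standardize with mean n/2 and sd sqrt(n)/2."""
    z = 2 * (t_obs - n / 2) / sqrt(n)
    return 1 - NormalDist().cdf(z)


# 28 heads in 40 flips
print(round(binomial_right_tail(28, 40), 4))       # 0.0083
print(round(fair_coin_two_sided(28, 40), 4))       # 0.0166
print(round(normal_approx_right_tail(28, 40), 4))  # 0.0057
```

Because the fair-coin null distribution is symmetric, the exact two-sided value here is exactly twice the one-sided value.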
Under the null distribution, $T$ has mean $n/2$ and standard deviation $\sqrt{n}/2$ (recall that the standard deviation of the binomial distribution is $\sqrt{np(1-p)}$, and substitute $p = 1/2$). Thus the standardized test statistic is $T^*_{obs} = 2(T_{obs} - n/2)/\sqrt{n}$, which is 2.53 in this case. Since under the null hypothesis the standardized statistic has mean 0 and standard deviation 1, we may approximate its distribution with a standard normal distribution. Thus the p-value can be approximated as the probability that a standard normal value exceeds 2.53. From a table of the standard normal distribution, this is approximately .006, which is close to the true value of (approximately) .008 and can be calculated without the use of a computer.

Example: Again suppose we observe 28 heads out of 40 flips, but now we are considering the two-sided test. Under the null hypothesis, the expected value of $T$ is $\bar{T} = n/2 = 20$. Therefore the p-value is $P(|T - 20| \ge |T_{obs} - 20|)$, or $P(|T - 20| \ge 8)$. To compute the p-value exactly using the binomial distribution we calculate the sum $P(T = 0) + \cdots + P(T = 12) + P(T = 28) + \cdots + P(T = 40)$, which is equal to

$$\binom{40}{0}/2^{40} + \cdots + \binom{40}{12}/2^{40} + \binom{40}{28}/2^{40} + \cdots + \binom{40}{40}/2^{40}.$$

To approximate the p-value using the standard normal distribution, standardize the boundary points of the rejection region (12 and 28) just as $T_{obs}$ was standardized above. This yields $-2.53$ and $2.53$. From a normal probability table, $P(Z > 2.53) = P(Z < -2.53) \approx 0.006$, so the p-value is approximately 0.012. Under the normal approximation, the two-sided p-value will always be twice the one-sided p-value. For exact p-values this need not be true (for example, when the null distribution is not symmetric).

Example: Suppose we observe BMIs $Y_1, \ldots, Y_{30}$ with sample mean $\bar{Y} = 26$ and sample standard deviation $\hat\sigma_Y = 4$, and another group of BMIs $X_1, \ldots, X_{20}$ with $\bar{X} = 24$ and $\hat\sigma_X = 3$. The test statistic (formula given above) has value 2.02. Under the null hypothesis, this statistic approximately has a standard normal distribution.
The probability of observing a value greater than 2.02 (for a right-tailed test) is .022. This is the p-value.

Planning an experiment or study

When conducting a study, it is important to use a sample size that is large enough to provide a good chance of reaching the correct conclusion. Increasing the sample size generally increases the chances of reaching the right conclusion. However, every sample costs time and money to collect, so it is desirable to avoid making an unnecessarily large number of observations.

It is common to use a p-value cutoff of .01 or .05 to indicate strong evidence for the alternative hypothesis. Most people feel comfortable concluding in favor of the alternative hypothesis if such a p-value is found. Thus in planning, one would like to have a reasonable chance of obtaining such a p-value if the alternative is in fact true. On the other hand, consider yourself lucky if you observe a large p-value when the null is true, because you can cut your losses and move on to a new investigation.

In many cases, the null hypothesis is known exactly but the precise formulation of the alternative is harder to specify. For instance, I may suspect that somebody is using a coin that is biased in favor of heads. If $p$ is the probability of the coin landing heads, it is clear that the null hypothesis should be $p = 1/2$. However, it is not clear what value of $p$ should be specified for the alternative, beyond that $p$ should be greater than 1/2. The alternative value of $p$ may be left unspecified, or we may consider a range of possible values. The difference between a possible alternative value of $p$ and the null value of $p$ is the effect size.

If the alternative hypothesis is true, it is easier to get a small p-value when the effect size is large, i.e. for a situation in which the alternative hypothesis is far from the null hypothesis. This is illustrated by the following examples.

Suppose your null hypothesis is that a coin is fair, and the alternative is $p > 1/2$.
An effect size of 0.01 is equivalent to an alternative heads probability of 0.51. For reasonable sample sizes, data generated from the null and alternative hypotheses look very similar (e.g., under the null the probability of observing 10/20 heads is 0.17620, while under the alternative the same probability is 0.17549).

Now suppose your null hypothesis is that a coin is fair, the alternative hypothesis is $p > 1/2$, and the effect size is 0.4, meaning that the alternative heads probability is 0.9. In this case, for a sample size of 20, data generated under the alternative look very different from data generated under the null (the probability of getting exactly 10/20 heads under the alternative is around 1 in 155,000).

If the effect size is small, a large sample size is required to distinguish a data set generated by the null from a data set generated by the alternative. Consider the following two examples:

Suppose the null hypothesis is $p = 1/2$ and the effect size is 0.01. If the sample size is one million and the null hypothesis is true, with probability greater than 0.99 fewer than 501,500 heads will be observed. If the alternative is true, with probability greater than 0.99 more than 508,500 heads will be observed. Thus you are almost certain to identify the correct hypothesis based on such a large sample size.

On the other hand, if the effect size is 0.4 (i.e. $p = 0.5$ vs. $p = 0.9$), under the null chances are greater than 97% that 14 or fewer heads will be observed in 20 flips. Under the alternative, chances are greater than 98% that 15 or more heads will be observed in 20 flips. So only 20 observations are sufficient to have a very high chance of making the right decision in this case.

To rationalize the trade-off between sample size and accuracy in hypothesis testing, it is common to calculate the power for various combinations of sample size and effect size. The power is the probability of observing a given level of evidence for the alternative when the alternative is true.
Concretely, we may say that the power is the probability of observing a p-value smaller than .05 or .01 if the alternative is true.

Usually the effect size is not known. However, there are practical guidelines for establishing an effect size. Generally a very small effect is considered unimportant. For example, if patients treated under a new therapy survive less than one week longer on average compared to the old therapy, it may not be worth going to the trouble and expense of switching. Thus for purposes of planning an experiment, the effect size is usually taken to be the smallest difference that would lead to a change in practice.

Once the effect size is fixed, the power can be calculated for a range of plausible sample sizes, and power can then be plotted against sample size. A plot of power against sample size should always have an increasing trend. However, for technical reasons (such as the discreteness of a binomial test statistic), the curve may sometimes drop slightly before resuming its climb.

Example: For the one-sided coin flipping problem, suppose we would like to produce a p-value < .05 (when the alternative is true) for an effect size of .1, but we are willing to accept effect sizes as large as .3. The following figures show power vs. sample size curves for several effect sizes.

[Figure: Power of obtaining p-value < .05 vs. sample size for one-sided binomial test. Curves for effect sizes .05, .1, and .3; sample sizes 0 to 500.]

[Figure: Power of obtaining p-value < .01 vs. sample size for one-sided binomial test. Curves for effect sizes .05, .1, and .3; sample sizes 0 to 500.]

Example: For the two-sided coin flipping problem, all p-values are twice the corresponding value in the one-sided problem. Thus it takes a larger sample size to achieve the same power.
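Power curves like the ones described here can be sketched by brute force: for each possible head count $k$, compute the exact p-value under the fair-coin null, then add up the alternative probabilities of the counts whose p-value falls below the cutoff. The sketch below is an illustration under those assumptions, not the code that produced the figures; the function names are hypothetical. It keeps $n$ modest (roughly 1000 or less) so that `math.comb(n, k)` still converts to a float.

```python
from math import comb


def pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_p_value(k: int, n: int, two_sided: bool) -> float:
    """Exact p-value for k heads in n flips under the fair-coin null."""
    if two_sided:
        center = n / 2
        return sum(pmf(j, n, 0.5) for j in range(n + 1) if abs(j - center) >= abs(k - center))
    return sum(pmf(j, n, 0.5) for j in range(k, n + 1))  # right-tailed


def power(n: int, effect: float, alpha: float = 0.05, two_sided: bool = False) -> float:
    """Probability that the p-value is below alpha when P(heads) = 0.5 + effect."""
    p_alt = 0.5 + effect
    rejected = [k for k in range(n + 1) if exact_p_value(k, n, two_sided) < alpha]
    return sum(pmf(k, n, p_alt) for k in rejected)


for effect in (0.05, 0.1, 0.3):
    curve = [round(power(n, effect), 2) for n in (20, 50, 100, 200)]
    print(f"effect size {effect}: {curve}")
```

The same `pmf` helper reproduces the probabilities quoted for 10 heads in 20 flips (about 0.17620 under $p = 0.5$ and 0.17549 under $p = 0.51$). Passing `two_sided=True` gives the two-sided curves, and evaluating every $n$ in a range (rather than a few) shows the small dips caused by discreteness.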
[Figure: Power of obtaining p-value < .05 vs. sample size for two-sided binomial test. Curves for effect sizes .05, .1, and .3; sample sizes 0 to 500.]

[Figure: Power of obtaining p-value < .01 vs. sample size for two-sided binomial test. Curves for effect sizes .05, .1, and .3; sample sizes 0 to 500.]

Example: Recall the BMI hypothesis testing problem from above. The test statistic was

$$T = \frac{\bar{Y} - \bar{X}}{\sqrt{\hat\sigma_X^2/m + \hat\sigma_Y^2/n}}.$$

In order to calculate the p-value for a given value of $T_{obs}$, we need to know the distribution of $T$ under the null hypothesis. This can be done exactly, but for now we will accept as an approximation that $\hat\sigma_X$ and $\hat\sigma_Y$ are exactly equal to the population values $\sigma_X$ and $\sigma_Y$. With this assumption, the expected value of $T$ under the null hypothesis is 0, and its variance is 1. Thus we will use the standard normal distribution as an approximation for the distribution of $T$ under the null hypothesis. It follows that for the right-tailed test, $T$ must exceed $Q(0.95) \approx 1.64$ to obtain a p-value less than 0.05, where $Q$ is the standard normal quantile function.

Suppose that the $Y$ (fast food eating) sample size is always 1/3 greater than the $X$ (non fast food eating) sample size, so $n = 4m/3$. If the effect size is $c$ (so $\mu_Y - \mu_X = c$, where $\mu_X$ and $\mu_Y$ are the population mean BMIs), the test statistic can be written

$$T = c/\sigma + T^*, \qquad T^* = (\bar{Y} - \bar{X} - c)/\sigma,$$

where $\sigma = \sqrt{\sigma_X^2/m + 3\sigma_Y^2/(4m)}$ is the denominator of the test statistic. Under the alternative hypothesis, $T^*$ has mean 0 and standard deviation 1, so we will approximate its distribution with a standard normal distribution. Thus the power is

$$P(T > Q(.95)) = P(T^* > Q(.95) - c/\sigma),$$

where probabilities are calculated under the alternative hypothesis. This is equal to $1 - F(Q(.95) - c/\sigma)$, where $F$ is the standard normal CDF. Note that this is a function of both $c$ and $m$.
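The power formula $1 - F(Q(.95) - c/\sigma)$ translates directly into code. This sketch uses Python's `statistics.NormalDist` for $F$ and $Q$. The population standard deviations are not given in the notes, so `sigma_x = 3` and `sigma_y = 4` are purely illustrative assumptions (borrowed from the sample values in the BMI example); different values give different curves.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_power(c: float, m: int, sigma_x: float = 3.0, sigma_y: float = 4.0,
                 alpha: float = 0.05) -> float:
    """Approximate power of the right-tailed two-sample Z-test with n = 4m/3.

    sigma_x and sigma_y are assumed (illustrative) population standard deviations.
    """
    sigma = sqrt(sigma_x**2 / m + 3 * sigma_y**2 / (4 * m))  # denominator of T
    q = STD_NORMAL.inv_cdf(1 - alpha)                        # Q(1 - alpha)
    return 1 - STD_NORMAL.cdf(q - c / sigma)


for c in (1, 2, 3):
    row = [round(z_test_power(c, m), 2) for m in (10, 30, 60, 120)]
    print(f"effect size {c}: {row}")
```

Setting `alpha=0.01` gives the stricter curve; with `c = 0` the function returns `alpha` itself, the type I error rate.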
[Figure: Power of obtaining p-value < .05 vs. sample size for one-sided Z-test. Curves for effect sizes 1, 2, and 3; sample sizes 0 to 200.]

[Figure: Power of obtaining p-value < .01 vs. sample size for one-sided Z-test. Curves for effect sizes 1, 2, and 3; sample sizes 0 to 200.]

t-tests and Z-tests

Previously we assumed that the estimated standard deviations $\hat\sigma_X$ and $\hat\sigma_Y$ were exactly equal to the population values $\sigma_X$ and $\sigma_Y$. This allowed us to use the standard normal distribution to approximate p-values for the two-sample Z test statistic

$$\frac{\bar{Y} - \bar{X}}{\sqrt{\hat\sigma_X^2/m + \hat\sigma_Y^2/n}}.$$

The idea behind using the standard normal distribution here is:

- The variance of $\bar{X}$ is $\sigma_X^2/m$ and the variance of $\bar{Y}$ is $\sigma_Y^2/n$.
- $\bar{X}$ and $\bar{Y}$ are independent, so the variance of $\bar{Y} - \bar{X}$ is the sum of the variance of $\bar{Y}$ and the variance of $\bar{X}$.
- Hence $\bar{Y} - \bar{X}$ has variance $\sigma_X^2/m + \sigma_Y^2/n$.
- Under the null hypothesis, $\bar{Y} - \bar{X}$ has mean zero.
- Thus $(\bar{Y} - \bar{X})/\sqrt{\hat\sigma_X^2/m + \hat\sigma_Y^2/n}$ is approximately the standardization of $\bar{Y} - \bar{X}$.

In truth, $\sqrt{\hat\sigma_X^2/m + \hat\sigma_Y^2/n}$ and $\sqrt{\sigma_X^2/m + \sigma_Y^2/n}$ differ somewhat, as the former is a random variable while the latter is a constant. Therefore, p-values calculated assuming that the Z-statistic is normal are slightly inaccurate. To get exact p-values (assuming normally distributed data with equal population standard deviations), the following two-sample t-test statistic can be used:

$$T = \sqrt{\frac{mn}{m+n}}\,\frac{\bar{Y} - \bar{X}}{S_p},$$

where $S_p^2$ is the pooled variance estimate:

$$S_p^2 = \frac{(m-1)\hat\sigma_X^2 + (n-1)\hat\sigma_Y^2}{m+n-2}.$$

The distribution of $T$ under the null hypothesis is called $t_{m+n-2}$, or a t distribution with $m + n - 2$ degrees of freedom. p-values under a t distribution can be looked up in a table.

Example: Suppose we observe the following:

$X_1, \ldots, X_{10}$ with $\bar{X} = 1$, $\hat\sigma_X = 3$
$Y_1, \ldots, Y_8$ with $\bar{Y} = 3$, $\hat\sigma_Y = 2$

The Z test statistic is $(3 - 1)/\sqrt{9/10 + 4/8} \approx 1.69$, with a one-sided p-value of about 0.05. The pooled variance is $S_p^2 = (9 \cdot 9 + 7 \cdot 4)/(10 + 8 - 2) \approx 6.8$, so $S_p \approx 2.6$.
The two-sample t-test statistic is $\sqrt{80/18}\,(3 - 1)/2.6 \approx 1.62$, with $10 + 8 - 2 = 16$ df. The one-sided p-value is 0.06.

The two-sample Z- or t-test is used to compare two samples from two populations, with the goal of inferring whether the two populations have the same mean. A related problem is to consider a sample from a single population, with the goal of inferring whether the population mean is equal to a fixed value, usually zero. Suppose we only have one sample $X_1, \ldots, X_n$ and we compute the sample mean $\bar{X}$ and sample standard deviation $\hat\sigma$. Then we can use

$$T = \sqrt{n}\,(\bar{X} - \mu_0)/\hat\sigma$$

as a test statistic for the null hypothesis $\mu = \mu_0$ (where $\mu$ is the population mean of the $X_i$). Under the null hypothesis (and assuming normal data), $T$ follows a t-distribution with $n - 1$ degrees of freedom.

Most often the null hypothesis is $\mu = 0$. For example, suppose we wish to test the null hypothesis $\mu = 0$ against the alternative $\mu > 0$. The test statistic is $T = \sqrt{n}\,\bar{X}/\hat\sigma$. Under the null hypothesis $T$ has a $t_{n-1}$ distribution, which can be used to calculate p-values exactly. For example, if $\bar{X} = 6$, $n = 11$, and $\hat\sigma = 10$, then $T_{obs} = \sqrt{11} \cdot 3/5 \approx 2$; comparing it to the $t_{10}$ distribution gives a p-value of around .04.

If we use the same test statistic as above, but assume that $\hat\sigma = \sigma$, then we can use the normal approximation to get an approximate p-value. For the example above, the Z statistic p-value is .02, which gives an overly strong assessment of the evidence for the alternative compared to the exact p-value computed under the t distribution. If we were to use the two-sided alternative $\mu \ne 0$, then the p-value would be .07 under the $t_{10}$ distribution and .05 under the standard normal distribution.

For small degrees of freedom, the t distribution is substantially more variable than the standard normal distribution. Therefore under a t-distribution the p-values will be somewhat larger (suggesting less evidence for the alternative). If the sample size is larger than 50 or so, the two distributions are so close that they can be used interchangeably.
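The worked numbers in this section can be reproduced with a short standard-library sketch. Python's standard library has no t distribution, so `t_sf` below is a hypothetical helper that integrates the t density numerically with Simpson's rule; in practice a statistics package would normally be used instead. All function names are illustrative, and the sample standard deviations are assumed to use the $n - 1$ divisor, as in the pooled variance formula.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log1p(x * x / df))


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for T ~ t_df, using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def normal_sf(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 1 - NormalDist().cdf(z)


def two_sample_z(xbar, sx, m, ybar, sy, n):
    """(Ybar - Xbar) / sqrt(sx^2/m + sy^2/n)."""
    return (ybar - xbar) / sqrt(sx**2 / m + sy**2 / n)


def pooled_t(xbar, sx, m, ybar, sy, n):
    """sqrt(mn/(m+n)) (Ybar - Xbar) / Sp with the pooled Sp; df = m + n - 2."""
    sp = sqrt(((m - 1) * sx**2 + (n - 1) * sy**2) / (m + n - 2))
    return sqrt(m * n / (m + n)) * (ybar - xbar) / sp


# Two-sample example: Xbar = 1, sx = 3, m = 10; Ybar = 3, sy = 2, n = 8
z = two_sample_z(1, 3, 10, 3, 2, 8)
t = pooled_t(1, 3, 10, 3, 2, 8)
print(f"Z = {z:.2f}, p = {normal_sf(z):.3f};  t = {t:.2f}, p = {t_sf(t, 16):.3f}")

# One-sample example: Xbar = 6, sigma-hat = 10, n = 11, null mean 0
t1 = sqrt(11) * 6 / 10
print(f"one-sided: t10 p = {t_sf(t1, 10):.3f}, normal p = {normal_sf(t1):.3f}")
print(f"two-sided: t10 p = {2 * t_sf(t1, 10):.3f}, normal p = {2 * normal_sf(t1):.3f}")
```

The same `two_sample_z` call with the BMI summaries, `two_sample_z(24, 3, 20, 26, 4, 30)`, gives the value 2.02 used earlier, and `t_sf` with a large `df` is nearly identical to `normal_sf`, matching the remark that the two distributions can be used interchangeably for larger samples.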
[Figure: .95 quantile for the t-distribution as a function of sample size (0 to 100), and the .95 quantile for the standard normal distribution.]

[Figure: QQ plot comparing the quantiles of a standard normal distribution (x axis) to the quantiles of the t-distribution with two different degrees of freedom (df = 5 and df = 15).]

A special case of the one-sample test is the paired two-sample test. Suppose we make observations $X_1, Y_1$ on subject 1, $X_2, Y_2$ on subject 2, etc. For example, the observations might be before and after measurements of the same quantity (e.g. tumor size before and after treatment with a drug). Let $D_i = Y_i - X_i$ be the change for subject $i$. Now suppose we wish to test whether the before and after measurements for each subject have the same mean. To accomplish this we can do a one-sample Z-test or t-test on the $D_i$. If the data are paired, it is much better to do a paired test than to ignore the pairing and do an unpaired two-sample test. We will see why this is so later.

Example: Suppose we observe the following paired data (here the differences are taken as $D = X - Y$; the sign does not affect the comparison):

X   Y   D
5   4   1
2   1   1
9   7   2
7   3   4
6   5   1
3   1   2

Then $\bar{D} = 11/6 \approx 1.83$ and $\hat\sigma_D \approx 1.17$, so the paired test statistic is $\sqrt{6} \cdot 1.83/1.17 \approx 3.8$, which is highly significant. For the unpaired analysis, $\bar{X} = 16/3$, $\hat\sigma_X \approx 2.6$, $\bar{Y} = 21/6$, $\hat\sigma_Y \approx 2.3$, so the unpaired two-sample Z test statistic is $(16/3 - 21/6)/\sqrt{2.6^2/6 + 2.3^2/6} \approx 1.29$, which is not significant.

The one- and two-sample t-statistics only have a t-distribution when the underlying data have a normal distribution. Moreover, for the two-sample test the population standard deviations $\sigma_X$ and $\sigma_Y$ must be equal. If the sample size is large, then p-values computed from the standard normal or t-distributions will not be too far from the true values even if the underlying data are not normal, or if $\sigma_X$ and $\sigma_Y$ differ.
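The paired and unpaired statistics for the example data can be recomputed directly from the X and Y columns with Python's `statistics` module, whose `stdev` uses the $n - 1$ divisor. This minimal sketch takes the differences as $D = X - Y$; the helper names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Paired data from the example (six subjects)
x = [5, 2, 9, 7, 6, 3]
y = [4, 1, 7, 3, 5, 1]
d = [xi - yi for xi, yi in zip(x, y)]  # per-subject differences D = X - Y


def one_sample_stat(values):
    """sqrt(n) * mean / sd: the one-sample Z/t statistic for a null mean of 0."""
    return sqrt(len(values)) * mean(values) / stdev(values)


def unpaired_stat(a, b):
    """Two-sample Z statistic (mean(a) - mean(b)) / sqrt(sa^2/m + sb^2/n)."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


print(f"paired statistic   = {one_sample_stat(d):.2f}")    # 3.84
print(f"unpaired statistic = {unpaired_stat(x, y):.2f}")   # 1.29
```

The difference comes from the denominators: the subject-to-subject spread in X and Y is large, but it largely cancels in the within-subject differences, so the paired standard deviation is much smaller.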
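Summary of One and Two Sample Tests

Test statistic:
$\sqrt{m}\,\bar{X}/\hat\sigma_X$
$\sqrt{m}\,\bar{D}/\hat\sigma_D$
$(\bar{Y} - \bar{X})/\hat\sigma_{XY}$
$\sqrt{m}\,\bar{X}/\hat\sigma_X$
$\sqrt{m}\,\bar{D}/\hat\sigma_D$
$\sqrt{mn/(m+n)}\,(\bar{Y} - \bar{X})/S_p$

Reference dis...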

Michigan - POLISH - 432
WTRANSFORMATIONScomparative study of s o d transformationsCSST WORKING PAPERSThe University of MichiganAnn ArborCOLLECTIVE VIOLENCE AND COLLECTIVE LOYALTIES I N FRANCE: W Y THE H FRENCH REVOLUTION MADE A DIFFERENCE W i l l i a m H, Sewell, J
Michigan - POLISH - 450
&quot;Feeling History: Reflections On the Western Culture ControversynRenato Rosaldo CSST Working Paper #60 October 1990 CRSO Working Paper #450FEELING HISTORY: REFLECTIONS ON THE WESTERN CULTURE CONTROVERSY Renato Rosaldo Department of .Anthropology S
Michigan - POLISCI - 101
POLITICAL SCIENCE 101 INTRODUCTION TO POLITICAL THEORY: PERENNIAL QUESTIONS AND CLASSIC TEXTS Fall 2008 Professor Arlene W. Saxonhouse 7772 Haven Hall 764-6389 awsaxon@umich.edu Lecture MW 11:00, Aud B Angell Hall Office Hours: Tuesday 3:00-5:00 or b
Michigan - POLISCI - 302
.Movement and Countermovement: Loosely Coupled C o n f l i c tMayer N. Zald U n i v e r s i t y of Michigan B e r t Useem U n i v e r s i t y of I l l i n o i s a t Chicago C i r c l e October 1983..CRSO Working Paper 302Copies a v a i l a b
Michigan - POLISCI - 327
THE WEB OF POWER: Elites, Social Movements, and Structural Change, A Method of-AnalysisJeffrey P. BroadbentOctober, 1985I, IntroductionThe relation between macro-social structure and individual-level action is a central problem of sociology (G
Michigan - POLISCI - 330
Barbara A. Anderson Brian D. Silver Population Redistribution and the Ethnic Balance in Transcaucasia No. 95-330Research Report April 1995Barbara A. Anderson is Professor of Sociology and Research Associate at the University of Michigan Populatio
Michigan - POLISCI - 343
BEYOND AGREEMENT: Value Judgements i n C o n f l i c t R e s o l u t i o n and Cooperat i v e C o n f l i c t i n t h e Classroom A l f i e Kohn P M Working CA CRSO Working Paper # 6 Paper # 343 A p r i l 1987E s t a b l i s h e d i n J a n u a r y
Michigan - POLISCI - 344
PROGRAM IN COMPARATIVE STUDY OF SOCIAL TRANSFORMATIONS. William H Sewell, Jr. Terrence J McDonald . Sherry B. Ortner Jefferey M. PaigeMay 1987 CRSO # ~ ~ ~ / c s #1T sProgram in Comparative Study of Social TransformationsA Grant Proposal Funde
Michigan - POLISCI - 346
WHY DO GOVERNMENTS AWARD MONOPOLY RIGHTS TO PRIVATIZED TELEPHONE FIRMS? Bruno E. Viani* Telecommunications Policy Research Conference 2004, Arlington, VAAbstract I use an original dataset of 149 privatization sales of telephone firms in 74 countrie
Michigan - POLISCI - 348
INSTITUTIONALIZING CONFLICT MANAGEMENT ALTERNATIVESNancy ManringOct. 1987CRSO # 3 4 8 / ~ ~ ~ d #7INSTITUTIONALIZING CONFLICT MANAGEMENT ALTERNATNES1. INTRODUCTIONIn recent years, conflict management alternatives such as joint problem-solv
Michigan - POLISCI - 350
Yu Xie Kim AkinMigration of Scientists: Roles of Gender and the Family No. 95-350PSC Research Report November 1995Yu Xie is an Associate Professor of Sociology and Faculty Associate at the Population Studies Center, University of Michigan, Ann
Michigan - POLISCI - 356
Work in Progress 1976Highway Safety Research Institute The University of Michigan Ann Arbor, Michigan 48109UM-HSRI-76-1 January, 1976Copies of this publication and additional information about HSRl projects and research publications may be obtain
Michigan - POLISCI - 357
HISTORY, SOCIOLOGY,. AND THEORIES OF ORGANIZATION ~ a ~ N. rZald e CSST W o r k i n g -Paper 16 CRSO W o r k i n g Paper 1357July 1988HISTORY, SOCIOLOGY, AND THEORIES OF ORGANIZATION Mayer N. Z a l d CSST W o r k i n g Paper #6 J u l y 1988 CRSO
Michigan - POLISCI - 369
TRANSFORMATIONScomparative study of social transformationsCSST WORKING PAPERSThe University of Michigan Ann ArborSOCIOLOGY AS A DISCIPLINE: QUASI-SCIENCE AND QUASI HUMANITIES MAYER ZALD -CSST Working Paper #12 October 1988 CRSO Working Paper #3