

# Data_Acquisition_Supplement - Data Acquisition Overview...


## Data Acquisition Overview

GIGO (garbage in, garbage out) is a core principle in statistics. No amount of sophisticated data analysis can compensate for botched data acquisition. Successful data acquisition consists of three interrelated activities:

1. deciding what quantities should be measured,
2. specifying exactly how these quantities should be measured, and, finally,
3. collecting the data in a manner which supports optimal statistical analysis.

Let’s consider each in turn.

1. Deciding what to measure is outside the scope of this course because it requires subject-matter expertise. For example, deciding what variables need to be measured to successfully characterize a chemical process requires the expertise of a chemist and/or chemical engineer.
2. Similarly, specifying exactly how the quantities of interest should be measured requires subject-matter expertise. However, since the measurements should be done so as to maximize precision while minimizing variability, and these are statistical considerations, statistical expertise is involved. The exact specification of how a quantity should be measured is called an operational definition. The author discusses these in section 4.1 and provides several examples.
3. Finally, collecting the data in a way which supports optimal statistical analysis requires knowledge of the statistical methods to be used. Since all the statistical methods in this course require that the data satisfy, in some form, the IID (independent and identically distributed) assumption, acquiring data so as to satisfy this assumption is the primary goal.

There are two basic data acquisition scenarios: sampling a population and sampling a process. We will discuss sampling a population first, the topic of section 4.2.

## Sampling Populations

What is the goal of sampling a population? Suppose we want to describe some population, that is, we want to make quantitative statements about it.
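
As a rough illustration of what such a quantitative statement involves when only part of a population can be examined, the following Python sketch draws repeated simple random samples from a made-up population and records the fraction found in each. The population size (40,000), the true fraction (10%), the sample size (200), and the function name `sample_fraction` are all assumptions chosen for illustration; none of them come from the course text.

```python
import random


def sample_fraction(population, sample_size, rng):
    """Estimate the fraction of True entries from one simple random sample."""
    # rng.sample draws without replacement, so no member is examined twice.
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Hypothetical population: 40,000 members, 4,000 of whom (10%) have the attribute.
POPULATION_SIZE = 40_000
NUM_WITH_ATTRIBUTE = 4_000
population = [True] * NUM_WITH_ATTRIBUTE + [False] * (POPULATION_SIZE - NUM_WITH_ATTRIBUTE)

# Fixed seed so the sketch is reproducible.
rng = random.Random(0)

# Repeat the "study" 1,000 times, each time examining only 200 members.
estimates = [sample_fraction(population, 200, rng) for _ in range(1000)]

print("smallest estimate:", min(estimates))
print("largest estimate: ", max(estimates))
print("average estimate: ", sum(estimates) / len(estimates))
```

Each pass through the loop plays the role of one possible study. The gap between the smallest and largest estimate shows how far a single sample-based statement can land from the true 10%, even though the estimates average out close to it.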
For example, suppose we want to determine what fraction of IU students are not from Indiana, i.e., we want to say something like “10% of IU students are not from Indiana.” The ideal way to do this would be to examine each student’s records and determine whether their home address is in Indiana. Suppose that, due to time, money, or other constraints, this is not possible. In this case we are forced to make our
statement about the IU students (the population) based on examining a fraction (sample) of all the students. Since in most cases the sample is a minuscule fraction of the population, making statements (inferences) about a population based on a sample is a risky, error-prone endeavor.

An interesting example of the danger of inferences based on sample data is provided by recent developments in the dating of the Shroud of Turin, a 4 m linen cloth thought by many to be the burial shroud of Jesus of Nazareth. In 1988 a sample of cloth was taken from the Shroud and carbon-dated by two different laboratories. The labs gave consistent results and the Shroud was dated to about 1260-1390 AD. This showed that the Shroud was not the burial shroud of Jesus of Nazareth but instead was another medieval forgery. (There were