# In the modeling/homework assignments you will also

• In the modeling/homework assignments you will also encounter other aspects of statistics, such as the gathering, description, and summarization of data. Very importantly, you'll encounter issues related to the choice of a good statistical model.

I.10 A Typical Example

The data in this example are loosely based on a poll, as described in public_content/politics/elections/election_2012/election_2012_presidential_election/wisconsin/election_2012_wisconsin_president. The presidential elections in the United States work in a funny way: in each state there is essentially a separate election. Very important are the so-called "swing states", for which it is difficult to predict the outcome of the electoral process. In the 2012 election Wisconsin appeared to be such a swing state. A phone survey (July 25, 2012) of 480 likely voters yielded the following data: 248 individuals indicated they would vote for Barack Obama; 232 individuals indicated they would vote for Mitt Romney. What predictions can be made about the outcome of the Wisconsin election (if it were to take place on that same day)?

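
One possible way to start answering this question is to estimate the share of Obama voters and attach a rough uncertainty range to it. The sketch below assumes the 480 respondents behave like a simple random sample of voters and uses the normal (Wald) approximation for a 95% interval; both are assumptions made for illustration, and the function name `wald_interval` is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Return (p_hat, lower, upper) for a binomial proportion.

    Uses the normal (Wald) approximation; z = 1.96 corresponds to
    roughly 95% confidence. Assumes a simple random sample.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, p_hat - margin, p_hat + margin


# Poll data from the example: 248 of 480 respondents favour Obama.
p_hat, lower, upper = wald_interval(248, 480)
print(f"estimated share for Obama: {p_hat:.3f}")
print(f"approx. 95% interval: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval runs from roughly 0.47 to 0.56, so it contains 0.5: the poll on its own does not settle who would win.
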
I.11 German Tanks

During World War II it was important for the Allies to assess the number of German tanks and V2 rockets that the Germans were able to produce in a certain period of time. A lot of money was spent on intelligence to do so. However, the most successful and accurate approach was based on a relatively simple statistical method (and some naivety on the part of the Germans): each German tank that was captured had serial numbers on various parts (e.g., the engine block). As the name indicates, these were serial, essentially ranging from 1 to N. Assuming, simplistically, that each produced tank is equally likely to be captured gives a possible way to estimate N.

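I.12 German Tanks

A concrete instance: during a certain period six German tanks were captured, with serial numbers 17, 68, 94, 127, 135, 212. A good estimate for N can then be computed from these serial numbers.

The statistical estimates compare with the true values and the intelligence estimates as follows:

| Date | Estimate | True value | Intelligence estimate |
|---|---|---|---|
| June 1940 | 169 | 122 | 1000 |
| June 1941 | 244 | 271 | 1550 |
| August 1942 | 327 | 342 | 1550 |
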
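
The estimator used on the slide is not reproduced in the text above. A commonly used choice for this problem is m + m/k − 1, where m is the largest observed serial number and k is the number of captured tanks; the sketch below assumes that this is the intended estimator and applies it to the six serial numbers. The function name `estimate_total` is hypothetical.

```python
def estimate_total(serial_numbers):
    """Estimate N from serial numbers sampled without replacement from 1..N.

    Assumed estimator (not confirmed by the slide): m + m/k - 1,
    with m the sample maximum and k the sample size.
    """
    k = len(serial_numbers)
    if k == 0:
        raise ValueError("need at least one serial number")
    m = max(serial_numbers)
    return m + m / k - 1


# Serial numbers of the six captured tanks from the example.
captured = [17, 68, 94, 127, 135, 212]
n_hat = estimate_total(captured)
print(f"estimated number of tanks: {n_hat:.1f}")
```

The idea behind this choice is that the largest serial number seen can only underestimate N, so it is pushed upwards by about the average gap between the observed serial numbers.
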
I.13 Biology and Estimation of Missing Mass

Suppose you are working with biologists studying the ecosystem of a certain lake. They would like to know how many species of fish inhabit the lake. They set several (fish-friendly) nets in different places and record the catch. You later go fishing on the lake. What is the probability you'll encounter a species you haven't seen before? The Good-Turing estimator of this quantity, the number of species caught exactly once divided by the total number of fish caught, is 2/12 ≈ 0.167.
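
The Good-Turing estimate of the missing mass can be sketched in a few lines. The recorded catch is not reproduced in the text above, so the catch below is entirely hypothetical: it is made up only so that 12 fish are caught and 2 species appear exactly once, matching the 2/12 quoted on the slide. The function name `good_turing_missing_mass` is also hypothetical.

```python
from collections import Counter


def good_turing_missing_mass(catch):
    """Estimate the probability that the next fish is of an unseen species.

    Good-Turing missing-mass estimate: the number of species observed
    exactly once, divided by the total number of fish observed.
    """
    if not catch:
        raise ValueError("the catch must contain at least one fish")
    counts = Counter(catch)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(catch)


# HYPOTHETICAL catch (not the data from the slide): 12 fish in total,
# of which two species (eel, trout) were caught exactly once.
catch = ["perch"] * 5 + ["pike"] * 3 + ["carp"] * 2 + ["eel", "trout"]
missing_mass = good_turing_missing_mass(catch)
print(f"estimated probability of a new species: {missing_mass:.3f}")
```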

I.14 What is Data?

Definition: Data and Dataset

This seems a bit vague… For our purposes: data is a collection of numerical or categorical observations of a certain process (physical, biological, social, etc.). Depending on the questions one wants to answer, the order of the data might be important (e.g., the AEX over time); at other times it is irrelevant (e.g., exam grades of 2WS30 ordered by student last name).

I.15 A Typical Dataset

To better understand the impact of smoking during pregnancy, a large study was conducted in the USA. All the pregnancies under a certain health cooperative (in San Francisco) were monitored

