Statistics and data analysis (Math 301) Sun 111 Prof. Dr. M. S. Abdel Wahab

Describing a Normal Distribution and Summarizing Binary Data

Summarization

Summarization means going from complicated details to a simple, informative description that captures the most important characteristics of a data set. We are concerned with the following questions:

1. What is the shape of the distribution?
2. Where is the center of the distribution?
3. How spread out is the distribution?

We must also introduce the following expressions:

• The median
• The range
• The interquartile range

These can be used with ordered categorical and numerical data.

We start by finding a representative central value for the distribution.

• Identify the middle of the distribution in the following histogram.
• Where is the center? It is harder to say where the middle is because the distribution is skewed [it looks different on the two sides].

[Figure: histogram]

• Because of this uncertainty there is no unique answer, so the different summarization methods focus on different concepts.

1. The arithmetic average or mean:

$$\text{average} = \frac{\sum_{i=1}^{N} x_i}{N}$$

2. The mode: the most common value or category in the distribution (it is most useful with categorical data).

3. The median: the value that splits the distribution in half, so that half the values are above it and half are below it. It is easy to calculate and can be used with most distributions.

Summarizing a categorical variable (mode)

✰ The mode can be used with variables measured at any level.
✰ It is sometimes useful to consider the possibility that the data have more than one mode (we do not require the modes to be equally frequent).
✰ The data value that occurs most often is called the mode.
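The three measures of center listed above can be sketched in Python. This is only an illustration: the sample `values` is hypothetical (it does not come from the lecture) and was chosen to be skewed to the right. The mode is found by counting occurrences, which also works for categorical data.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical right-skewed sample (an assumption, not lecture data).
values = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]

avg = mean(values)      # arithmetic average: sum of the values divided by N
mid = median(values)    # middle of the sorted values

# Count how often each value occurs; the most frequent one is the mode.
mode_value, mode_count = Counter(values).most_common(1)[0]

print("mean:", avg)
print("median:", mid)
print("mode:", mode_value, "occurring", mode_count, "times")
```

For this sample the mean (5) lies above the median (3) because the two large values pull the average upward, which is the kind of disagreement a skewed distribution produces.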
Example: Consider the birthplaces of a class of 29 students. The mode in this case is California because it is the category with the largest number of students.

Take care with small samples: be careful when interpreting percentages based on a small amount of data.

3 out of 7 = 42.9%
252 out of 588 = 42.9%

Example: The results of an election to choose one of four candidates: Dina, Ehab, Salem, Farouk. The data set contains 36628 ballots forming a very long list of names like D, C, S, D, F, E, S, S, D …

• The mode is simply the name that occurs most often.
• The mode is the name of the person who wins the election. In this case not only is the mode computable, it is probably the most important single summary value that there is!

Summarizing ordered and numerical data (median)

✰ With ordered categories and measured numerical data we can still compute the mode.

Finding the median of a single group of numbers

Odd number of values: 1, 8, 4, 6, 8
Put the numbers in order: 1, 4, 6, 8, 8
The median = 6
Half of the remaining values are greater than 6 and half are smaller than 6.

Even number of values: 1, 9, 4, 6, 8, 13
In order: 1, 4, 6, 8, 9, 13
Median = (6 + 8)/2 = 7

Ranks of the data:...
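The median calculations for an odd and an even count of values, and the small-sample percentage comparison, can be reproduced with a short Python sketch. The helper names `median_of` and `percent` are hypothetical and introduced only for illustration; the numbers are the ones used in the examples.

```python
def median_of(numbers):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(numbers)          # put the numbers in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


odd_median = median_of([1, 8, 4, 6, 8])
even_median = median_of([1, 9, 4, 6, 8, 13])

small_sample = percent(3, 7)
large_sample = percent(252, 588)

print(odd_median, even_median)
print(small_sample, large_sample)
```

Both percentages come out the same, yet the first rests on only 7 observations, so a change in a single observation would shift it by about 14 percentage points.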
 Spring '11
 saaedabdalwahab
 Statistics, Normal Distribution, Standard Deviation, Probability distribution

