The SAGE Dictionary of Statistics (Cramer & Howitt)


The SAGE Dictionary of Statistics
a practical resource for students in the social sciences

Duncan Cramer and Dennis Howitt

SAGE Publications
London ● Thousand Oaks ● New Delhi

© Duncan Cramer and Dennis Howitt 2004

First published 2004

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Inquiries concerning reproduction outside those terms should be sent to the publishers.

SAGE Publications Ltd, 1 Oliver's Yard, 55 City Road, London EC1Y 1SP
SAGE Publications Inc., 2455 Teller Road, Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd, B-42, Panchsheel Enclave, Post Box 4109, New Delhi 110 017

British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library

ISBN 0 7619 4137 1
ISBN 0 7619 4138 X (pbk)

Library of Congress Control Number: 2003115348

Typeset by C&M Digitals (P) Ltd. Printed in Great Britain by The Cromwell Press Ltd, Trowbridge, Wiltshire

Contents

Preface
Some Common Statistical Notation
A to Z
Some Useful Sources

To our mothers – it is not their fault that lexicography took its toll.

Preface

Writing a dictionary of statistics is not many people's idea of fun. And it wasn't ours. Can we say that we have changed our minds about this at all? No. Nevertheless, now the reading and writing is over and those heavy books have gone back to the library, we are glad that we wrote it. Otherwise we would have had to buy it. The dictionary provides a valuable resource for students – and anyone else with too little time on their hands to stack their shelves with scores of specialist statistics textbooks.

Writing a dictionary of statistics is one thing – writing a practical dictionary of statistics is another. The entries had to be useful, not merely accurate. Accuracy is not that useful on its own. One aspect of the practicality of this dictionary is in facilitating the learning of statistical techniques and concepts. The dictionary is not intended to stand alone as a textbook – there are plenty of those. We hope that it will be more important than that. Perhaps only the computer is more useful.

Learning statistics is a complex business. Inevitably, students at some stage need to supplement their textbook. A trip to the library or the statistics lecturer's office is daunting. Getting a statistics dictionary from the shelf is the lesser evil. And just look at the statistics textbook next to it – you probably outgrew its usefulness when you finished the first year at university. Few readers, not even ourselves, will ever use all of the entries in this dictionary. That would be a bit like stamp collecting.
Nevertheless, all of the important things are here in a compact and accessible form for when they are needed. No doubt there are omissions but even The Collected Works of Shakespeare leaves out Pygmalion! Let us know of any. And we are not so clever that we will not have made mistakes. Let us know if you spot any of these too – modern publishing methods sometimes allow corrections without a major reprint.

Many of the key terms used to describe statistical concepts are included as entries elsewhere. Where we thought it useful we have suggested other entries that are related to the entry that might be of interest by listing them at the end of the entry under 'See' or 'See also'. In the main body of the entry itself we have not drawn attention to the terms that are covered elsewhere because we thought this could be too distracting to many readers. If you are unfamiliar with a term we suggest you look it up. Many of the terms described will be found in introductory textbooks on statistics. We suggest that if you want further information on a particular concept you look it up in a textbook that is ready to hand. There are a large number of introductory statistics texts that adequately discuss these terms and we would not want you to seek out a particular text that we have selected that is not readily available to you. For the less common terms we have recommended one or more sources for additional reading. The authors and year of publication for these sources are given at the end of the entry and full details of the sources are provided at the end of the book. As we have discussed some of these terms in texts that we have written, we have sometimes recommended our own texts!

The key features of the dictionary are:

• Compact and detailed descriptions of key concepts.
• Basic mathematical concepts explained.
• Details of procedures for hand calculations if possible.
• Difficulty level matched to the nature of the entry: very fundamental concepts are the most simply explained; more advanced statistics are given a slightly more sophisticated treatment.
• Practical advice to help guide users through some of the difficulties of the application of statistics.
• Exceptionally wide coverage and varied range of concepts, issues and procedures – wider than any single textbook by far.
• Coverage of relevant research methods.
• Compatible with standard statistical packages.
• Extensive cross-referencing.
• Useful additional reading.

One good thing, we guess, is that since this statistics dictionary would be hard to distinguish from a two-author encyclopaedia of statistics, we will not need to write one ourselves.
Duncan Cramer
Dennis Howitt

Some Common Statistical Notation

Roman letter symbols or abbreviations:

a         constant
df        degrees of freedom
F         F test
ln        natural or Napierian logarithm
M         arithmetic mean
MS        mean square
n or N    number of cases in a sample
p         probability
r         Pearson's correlation coefficient
R         multiple correlation
SD        standard deviation
SS        sum of squares
t         t test

Greek letter symbols:

α (lower case alpha)     Cronbach's alpha reliability, significance level or alpha error
β (lower case beta)      regression coefficient, beta error
γ (lower case gamma)
δ (lower case delta)
η (lower case eta)
κ (lower case kappa)
λ (lower case lambda)
ρ (lower case rho)
τ (lower case tau)
φ (lower case phi)
χ (lower case chi)

Some common mathematical symbols:

Σ    sum of
∞    infinity
=    equal to
<    less than
≤    less than or equal to
>    greater than
≥    greater than or equal to
√    square root

A

a posteriori tests: see post hoc tests

a priori comparisons or tests: where there are three or more means that may be compared (e.g. analysis of variance with three groups), one strategy is to plan the analysis in advance of collecting the data (or examining them). So, in this context, a priori means before the data analysis. (Obviously this would only apply if the researcher was not the data collector; otherwise it is in advance of collecting the data.) This is important because the process of deciding what groups are to be compared should be on the basis of the hypotheses underlying the planning of the research. By definition, this implies that the researcher is generally uninterested in general or trivial aspects of the data which are not the researcher's primary focus. As a consequence, just a few of the possible comparisons need to be made, as these contain the crucial information relative to the researcher's interests.

Table A.1 involves a simple ANOVA design in which there are four conditions – two are drug treatments and two are control conditions. There are two control conditions because in one case the placebo tablet is for drug A and in the other case the placebo tablet is for drug B. An appropriate a priori comparison strategy in this case would be:

• Mean_a against Mean_b
• Mean_a against Mean_c
• Mean_b against Mean_d

Table A.1 A simple ANOVA design

    Drug A    Drug B    Placebo control A    Placebo control B
    Mean_a    Mean_b    Mean_c               Mean_d

Notice that this is fewer than the maximum number of comparisons that could be made (a total of six). This is because the researcher has ignored issues which perhaps are of little practical concern in terms of evaluating the effectiveness of the different drugs. For example, comparing placebo control A with placebo control B answers questions about the relative effectiveness of the placebo conditions but has no bearing on which drug is the most effective overall.

The a priori approach needs to be compared with perhaps the more typical alternative research scenario – post hoc comparisons. The latter involves an unplanned analysis of the data following their collection. While this may be a perfectly adequate process, it is nevertheless far less clearly linked with the established priorities of the research than a priori comparisons. In post hoc testing, there tends to be an exhaustive examination of all of the possible pairs of means – so in the example in Table A.1 all four means would be compared with each other in pairs. This gives a total of six different comparisons.

In a priori testing, it is not necessary to carry out the overall ANOVA, since this merely tests whether there are differences across the various means. In these circumstances, failure of some means to differ from the others may produce non-significant findings due to conditions which are of little or no interest to the researcher. In a priori testing, the number of comparisons to be made has been limited to a small number of key comparisons. It is generally accepted that if there are relatively few a priori comparisons to be made, no adjustment is needed for the number of comparisons made. One rule of thumb is that if the comparisons are fewer in total than the degrees of freedom for the main effect minus one, it is perfectly appropriate to compare means without adjustment for the number of comparisons.

Contrasts are examined in a priori testing. This is a system of weighting the means in order to obtain the appropriate mean difference when comparing two means. One mean is weighted (multiplied by) +1 and the other is weighted −1. The other means are weighted 0. The consequence of this is that the two key means are responsible for the mean difference. The other means (those not of interest) are multiplied by zero, contribute nothing to the contrast, and hence cannot influence the mean difference.

There is an elegance and efficiency in the a priori comparison strategy. However, it does require an advanced level of statistical and research sophistication. Consequently, the more exhaustive procedure of the post hoc test (multiple comparisons test) is more familiar in the research literature. See also: analysis of variance; Bonferroni test; contrast; Dunn's test; Dunnett's C test; Dunnett's T3 test; Dunnett's test; Dunn–Sidak multiple comparison test; omnibus test; post hoc tests
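As a minimal illustration of how contrast weights isolate one planned comparison, here is a short Python sketch; the group means below are invented for the example and are not from any table in this dictionary.

    # Planned contrast: compare Mean_a (drug A) with Mean_c (placebo control A).
    means = {"drug_A": 7.2, "drug_B": 6.8, "placebo_A": 4.1, "placebo_B": 4.3}
    weights = {"drug_A": 1, "drug_B": 0, "placebo_A": -1, "placebo_B": 0}

    # The contrast estimate is the weighted sum of the means; groups weighted
    # 0 drop out, so only the two means of interest contribute.
    contrast = sum(weights[g] * means[g] for g in means)
    print(contrast)  # 7.2 - 4.1 = 3.1 (up to floating-point rounding)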
abscissa: this is the horizontal or x axis in a graph. See x axis

absolute deviation: this is the difference between one numerical value and another numerical value. Negative values are ignored as we are simply measuring the distance between the two numbers. Most commonly, absolute deviation in statistics is the difference between a score and the mean (or sometimes the median) of the set of scores. Thus, the absolute deviation of a score of 9 from the mean of 5 is 4. The absolute deviation of a score of 3 from the mean of 5 is 2 (Figure A.1).

Figure A.1 Absolute deviations [a number line showing an absolute deviation of 4 between the scores 5 and 9, and an absolute deviation of 2 between the scores 3 and 5]

One advantage of the absolute deviation over deviation is that the former totals (and averages) for a set of scores to values other than 0.0 and so gives some indication of the variability of the scores. See also: mean deviation; mean, arithmetic

acquiescence or yea-saying response set or style: this is the tendency to agree or to say 'yes' to a series of questions. This tendency is the opposite of disagreeing or saying 'no' to a set of questions, sometimes called a nay-saying response set. If agreeing or saying 'yes' to a series of questions results in a high score on the variable that those questions are measuring, such as being anxious, then a high score on the questions may indicate either greater anxiety or a tendency to agree. To control or to counteract this tendency, half of the questions may be worded in the opposite or reverse way, so that if a person has a tendency to agree the tendency will cancel itself out when the two sets of items are combined.
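To make the reverse-scoring remedy concrete, here is a minimal Python sketch; the 5-point agree–disagree scale and the item names are assumptions for the illustration and are not part of the entry.

    # Responses on a 1-5 agreement scale; items 3 and 4 are reverse-worded.
    responses = {"item1": 5, "item2": 4, "item3": 2, "item4": 1}
    reverse_keyed = {"item3", "item4"}

    # On a 1-5 scale a reverse-worded item is recoded as 6 - response, so
    # indiscriminate agreement no longer inflates the combined total.
    total = sum((6 - v) if k in reverse_keyed else v
                for k, v in responses.items())
    print(total)  # 5 + 4 + 4 + 5 = 18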
adding: see negative values

addition rule: a simple principle of probability theory is that the probability of either of two different outcomes occurring is the sum of the separate probabilities for those two different events (Figure A.2). So, the probability of a die landing 3 is 1 divided by 6 (i.e. 0.167) and the probability of a die landing 5 is 1 divided by 6 (i.e. 0.167 again). The probability of getting either a 3 or a 5 when tossing a die is the sum of the two separate probabilities (i.e. 0.167 + 0.167 = 0.333). Of course, the probability of getting any of the numbers from 1 to 6 spots is 1.0 (i.e. the sum of six probabilities of 0.167).

Figure A.2 Demonstrating the addition rule for the simple case of either heads or tails when tossing a coin [probability of a head = 0.5 and probability of a tail = 0.5; the probability of a head or a tail is the sum of the two separate probabilities according to the addition rule: 0.5 + 0.5 = 1]

adjusted means: see analysis of covariance

agglomeration schedule: a table that shows which variables or clusters of variables are paired together at different stages of a cluster analysis. See cluster analysis
Cramer (2003)

algebra: in algebra numbers are represented as letters and other symbols when giving equations or formulae. Algebra therefore is the basis of statistical equations. So a typical example is the formula for the mean:

    m = ΣX / N

In this, m stands for the numerical value of the mean, X is the numerical value of a score, N is the number of scores and Σ is the symbol indicating in this case that all of the scores under consideration should be added together. One difficulty in statistics is that there is a degree of inconsistency in the use of the symbols for different things. So, generally speaking, if a formula is used it is important to indicate what you mean by the letters in a separate key.

algorithm: this is a set of steps which describe the process of doing a particular calculation or solving a problem. It is a common term to use to describe the steps in a computer program to do a particular calculation. See also: heuristic

alpha error: see Type I or alpha error

alpha (α) reliability, Cronbach's: one of a number of measures of the internal consistency of items on questionnaires, tests and other instruments. It is used when all the items on the measure (or some of the items) are intended to measure the same concept (such as a personality trait like neuroticism). When a measure is internally consistent, all of the individual questions or items making up that measure should correlate well with the others. One traditional way of checking this is split-half reliability, in which the items making up the measure are split into two sets (odd-numbered items versus even-numbered items, or the first half of the items compared with the second half). The two separate sets are then summated to give two separate measures of what would appear to be the same concept. For example, the following four items serve to illustrate a short scale intended to measure liking for different foodstuffs:

1  I like bread    Agree  Disagree
2  I like cheese   Agree  Disagree
3  I like butter   Agree  Disagree
4  I like ham      Agree  Disagree

Responses to these four items are given in Table A.2 for six individuals.

Table A.2 Preferences for four foodstuffs plus a total for number of preferences

            Q1: bread   Q2: cheese   Q3: butter   Q4: ham   Total
Person 1    0           0            0            0         0
Person 2    1           1            1            0         3
Person 3    1           0            1            1         3
Person 4    1           1            1            1         4
Person 5    0           0            0            1         1
Person 6    0           1            0            0         1

One split half of the test might be made up of items 1 and 2, and the other split half is made up of items 3 and 4. These sums are given in Table A.3.

Table A.3 The data from Table A.2 with Q1 and Q2 added, and Q3 and Q4 added

            Half A: bread + cheese items   Half B: butter + ham items   Total
Person 1    0                              0                            0
Person 2    2                              1                            3
Person 3    1                              2                            3
Person 4    2                              2                            4
Person 5    0                              1                            1
Person 6    1                              0                            1

If the items measure the same thing, then the two split halves should correlate fairly well together. This turns out to be the case since the correlation of the two split halves with each other is 0.5 (although it is not significant with such a small sample size). Another name for this correlation is the split-half reliability.

Since there are many ways of splitting the items on a measure, there are numerous split halves for most measuring instruments. One could calculate the odd–even reliability for the same data by summing items 1 and 3 and summing items 2 and 4. These two forms of reliability can give different values. This is inevitable as they are based on different combinations of items. Conceptually, alpha is simply the average of all of the possible split-half reliabilities that could be calculated for any set of data. With a measure consisting of four items, these are items 1 and 2 versus items 3 and 4, items 2 and 3 versus items 1 and 4, and items 1 and 3 versus items 2 and 4. Alpha has a big advantage over split-half reliability: it is not dependent on arbitrary selections of items since it incorporates all possible selections of items.

In practice, the calculation is based on the repeated-measures analysis of variance. The data in Table A.2 could be entered into a repeated-measures one-way analysis of variance. The ANOVA summary table is to be found in Table A.4.

Table A.4 Repeated-measures ANOVA summary table for data in Table A.2

                      Sums of squares   Degrees of freedom   Mean square
Between treatments    0.000             3                    0.000 (not needed)
Between people        3.000             5                    0.600
Error (residual)      3.000             15                   0.200

We then calculate coefficient alpha from the following formula:

    alpha = (mean square between people − mean square residual) / mean square between people
          = (0.600 − 0.200) / 0.600 = 0.400 / 0.600 = 0.67

Of course, SPSS and similar packages simply give the alpha value. See internal consistency; reliability
Cramer (1998)
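The same value can be reached without the ANOVA table through the algebraically equivalent variance form of coefficient alpha, alpha = [k/(k − 1)] × [1 − (sum of the item variances)/(variance of the total scores)], where k is the number of items. Here is a minimal Python sketch applying that form to the Table A.2 data; the code and variable names are ours, not the dictionary's.

    # Item responses from Table A.2 (rows = persons, columns = Q1-Q4).
    items = [
        [0, 0, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [0, 1, 0, 0],
    ]

    def variance(xs):
        # Sample variance (n - 1 in the denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items[0])                                  # number of items
    item_vars = [variance([row[i] for row in items]) for i in range(k)]
    total_var = variance([sum(row) for row in items])  # variance of totals

    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(round(alpha, 2))  # 0.67, matching the ANOVA-based calculation above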
alternative hypothesis: see hypothesis; hypothesis testing

AMOS: this is the name of one of the computer programs for carrying out structural equation modelling. AMOS stands for Analysis of Moment Structures. Information about AMOS can be found on its website. See structural equation modelling

analysis of covariance (ANCOVA): analysis of covariance is abbreviated as ANCOVA. It is a form of analysis of variance (ANOVA). In the simplest case it is used to determine whether the means of the dependent variable for two or more groups of an independent variable or factor differ significantly w...