Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II.

Accuracy: Tools for Accurate and Reliable Statistical Computing

Micah Altman, Harvard University
Jeff Gill, University of California, Davis
Michael P. McDonald, George Mason University, Brookings Institution

Abstract

Most empirical social scientists are surprised that low-level numerical issues in software can have deleterious effects on the estimation process. Statistical analyses that appear to be perfectly successful can be invalidated by concealed numerical problems. We have developed a set of tools, contained in accuracy, a package for R and Splus, to diagnose problems stemming from numerical and measurement error and to improve the accuracy of inferences. The package provides a framework for gauging the computational stability of model results, tools for comparing model results, optimization diagnostics, and tools for collecting entropy for true random number generation.

Keywords: sensitivity analysis, statistical computation, numerical accuracy, generalized inverse, generalized Cholesky, starr test, global optimization.

1. Introduction

Social science data are often subject to measurement error, and nearly all methods of statistical computing can yield inaccurate results under some combinations of data and model specification. This is an issue of practical concern: our previous work, and that of others, demonstrates that published analyses are affected with surprising frequency (Altman, Gill, and McDonald 2004; McCullough and Vinod 1999). Unfortunately, most practitioners ignore even routine numerical issues in statistical computing. This is in contrast to some scientific and engineering disciplines, which require assessments of the accuracy of both data and results.

The simple example below, using generated data, shows how numeric issues can affect even seemingly simple calculations.
Consider the calculation of a variable’s dispersion. In R, the function sd returns the ‘sample standard deviation’, but does not provide an option to return the ‘population standard deviation’. The three functions below compute this quantity. The first two functions are direct implementations of textbook formulas, and the third simply adjusts the results of the sd function.

> sdp.formula1 <- function(x) {
+     n = length(x)
+     sqrt(n * sum(x^2) - sum(x)^2)/n
+ }
> sdp.formula2 <- function(x) {
+     sum(sqrt((x - sum(x)/length(x))^2))/length(x)
+ }
> sdp.formula3 <- function(x) {
+     sqrt(var(x) * (length(x) - 1)/length(x))
+ }

To see the damaging effects of numerical errors, we apply these functions to the following generated data frames, each of which produces columns of numbers, of increasing magnitude, that have standard deviations of 0.5....
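
The generated data frames themselves are cut off in this preview. As a minimal sketch of the effect being described, the script below applies the three functions to hypothetical columns (not the authors' data): ten values alternating between an offset and the offset plus one, so the population standard deviation is 0.5 at every magnitude. The helper make.column and the chosen offsets are assumptions made for illustration. Note that sdp.formula2, as printed, averages absolute deviations, so it agrees with the population standard deviation here only because every deviation has the same magnitude.

```r
# The three functions, reproduced from the text.
sdp.formula1 <- function(x) {
  n = length(x)
  sqrt(n * sum(x^2) - sum(x)^2)/n
}
sdp.formula2 <- function(x) {
  sum(sqrt((x - sum(x)/length(x))^2))/length(x)
}
sdp.formula3 <- function(x) {
  sqrt(var(x) * (length(x) - 1)/length(x))
}

# Hypothetical data, not the authors': values alternate between
# `offset` and `offset + 1`, so the population sd is 0.5.
make.column <- function(offset) offset + rep(c(0, 1), times = 5)

# Apply one function to a column built at each offset.
# suppressWarnings() hides the NaN warning sqrt() may raise if
# rounding error drives the argument in formula 1 below zero.
apply.at <- function(f, offsets) {
  sapply(offsets, function(o) suppressWarnings(f(make.column(o))))
}

offsets <- c(0, 1e4, 1e8)
results <- data.frame(
  offset   = offsets,
  formula1 = apply.at(sdp.formula1, offsets),
  formula2 = apply.at(sdp.formula2, offsets),
  formula3 = apply.at(sdp.formula3, offsets)
)
print(results)
```

In double precision, the squares near 1e16 can no longer hold the unit-sized differences between values, so the subtraction inside sdp.formula1 cancels almost all significant digits at the largest offset and its result moves far from 0.5. The two functions that centre the data before squaring are expected to stay at 0.5 for these columns.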
This note was uploaded on 05/12/2010 for the course APPLIED ST 2010 taught by Professor Various during the Spring '10 term at Universidad Nacional Agraria La Molina.
