LSMS
Living Standards Measurement Study
Working Paper No. 126

A Manual for Planning and Implementing the Living Standards Measurement Study Survey

The Living Standards Measurement Study

The Living Standards Measurement Study (LSMS) was established by the World Bank in 1980 to explore ways of improving the type and quality of household data collected by statistical offices in developing countries. Its goal is to foster increased use of household data as a basis for policy decisionmaking. Specifically, the LSMS is working to develop new methods to monitor progress in raising levels of living, to identify the consequences for households of past and proposed government policies, and to improve communications between survey statisticians, analysts, and policymakers.

The LSMS Working Paper series was started to disseminate intermediate products from the LSMS. Publications in the series include critical surveys covering different aspects of the LSMS data collection program and reports on improved methodologies for using Living Standards Survey (LSS) data. More recent publications recommend specific survey, questionnaire, and data processing designs and demonstrate the breadth of policy analysis that can be carried out using LSS data.

LSMS Working Paper Number 126

A Manual for Planning and Implementing the Living Standards Measurement Study Survey

Margaret E. Grosh
Juan Muñoz

The World Bank
Washington, D.C.

Copyright © 1996 The International Bank for Reconstruction and Development/THE WORLD BANK
1818 H Street, N.W.
Washington, D.C. 20433, U.S.A.
All rights reserved.
Manufactured in the United States of America
First printing May 1996

To present the results of the Living Standards Measurement Study with the least possible delay, the typescript of this paper has not been prepared in accordance with the procedures appropriate to formal printed texts, and the World Bank accepts no responsibility for errors. Some sources cited in this paper may be informal documents that are not readily available. The findings, interpretations, and conclusions expressed in this paper are entirely those of the author(s) and should not be attributed in any manner to the World Bank, to its affiliated organizations, or to members of its Board of Executive Directors or the countries they represent. The World Bank does not guarantee the accuracy of the data included in this publication and accepts no responsibility whatsoever for any consequence of their use. The boundaries, colors, denominations, and other information shown on any map in this volume do not imply on the part of the World Bank Group any judgment on the legal status of any territory or the endorsement or acceptance of such boundaries.

The material in this publication is copyrighted. Requests for permission to reproduce portions of it should be sent to the Office of the Publisher at the address shown in the copyright notice above. The World Bank encourages dissemination of its work and will normally give permission promptly and, when the reproduction is for noncommercial purposes, without asking a fee. Permission to copy portions for classroom use is granted through the Copyright Clearance Center, Inc., Suite 910, 222 Rosewood Drive, Danvers, Massachusetts 01923, U.S.A.

The complete backlist of publications from the World Bank is shown in the annual Index of Publications, which contains an alphabetical title list (with full ordering information) and indexes of subjects, authors, and countries and regions.
The latest edition is available free of charge from the Distribution Unit, Office of the Publisher, The World Bank, 1818 H Street, N.W., Washington, D.C. 20433, U.S.A., or from Publications, The World Bank, 66, avenue d'Iéna, 75116 Paris, France.

ISSN: 0253-4517

Margaret Grosh is a senior economist at the World Bank. Juan Muñoz is the Director of Sistemas Integrales, a survey research firm in Santiago, Chile.

Library of Congress Cataloging-in-Publication Data

A manual for planning and implementing the Living standards measurement study survey / [compiled by] Margaret E. Grosh, Juan Muñoz.
p. cm. (LSMS working paper; no. 126)
Includes bibliographical references (p. ).
ISBN 0-8213-3639-8
1. Cost and standard of living--Data processing--Planning. 2. Household surveys--Methodology. I. Grosh, Margaret E. II. Muñoz, Juan, 1947- . III. Series.
HD6978.M34 1996
339.4'7'0723--dc20    96-19418
CIP

Contents

Foreword
Abstract
Acknowledgments

Chapter 1. Introduction
   A. What This Manual Covers
   B. Who Should Read This Manual
   C. Some Assumptions Implicit in the Manual

Chapter 2: An Overview of LSMS Surveys
   Key Messages
   A. The "Prototype" LSMS Survey
      Purpose of LSMS Surveys
      Questionnaire Content
      Quality Control
      Planning and Budgeting
   B. Variations from the Prototype
      Common Variants
      Evolution in LSMS Surveys

Chapter 3. Questionnaire Development
   Key Messages
   A. Questionnaire Content
   B. The Process of Questionnaire Development
      The Actors
      The Iterative Process
      Field Test of the Questionnaire
   C. Questionnaire Format

Chapter 4: Sampling
   Key Messages
   A. Overview of Issues in Sample Design
   B. Sampling Practice in LSMS Surveys
   C. Implementing a Sample Design
      Determining the Basic Sample Design Parameters
      Implementation of the First Sampling Stage
      Implementation of the Second Sampling Stage
      Selecting Random Persons in a Household

Chapter 5: Field Operations
   Key Messages
   A. Standard LSMS Organization of Field Work
      Four-Week Interview Cycle
      Composition of Survey Staff
      Duties of Survey Team Members
      Team Logistics
      Complexity of LSMS Field Operations
      Alternatives to LSMS Standard Field Procedures and Their Implications
   B. Preparation for Field Work
      Personnel
      Training
      Manuals
      Developing Supervision Forms
      Scheduling Field Work
      Ensuring Collaboration by Households
      Piloting Field Procedures

Chapter 6. Data Management
   Key Messages
   A. An Overview of LSMS Data Management Philosophy
      Objectives
      Approach Developed
      Implications for Survey Planning
   B. Requirements for the Data Management System
      Ease of Analysis of Resulting Data Files
      Data Quality Checks During Data Entry
      After Data Entry
   C. File Structure Used in LSMS Data Entry Program
      Correspondence Between Records and Individual Units Observed
      Variable Number of Records
      Limiting the Length of a Record Type
      Identifiers
      Transformation

Chapter 7: Beginning Data Analysis
   Key Messages
   A. Policies and Project Components to Promote Data Use
   B. Documentation and Dissemination of Data Sets
      Data Use Policy
      Basic Documentation
      Unit Record Data Files
      Filing System
      Setting Service Standards
      Assignment of Responsibilities for Data Documentation and Dissemination
   C. The Abstract
   D. Examples of Further Analysis
      The Study of Poverty
      Understanding the Effects of the Economic Environment
      Provision of Public Services
      Impact of Government Programs
      The Determinants of Household Decisions

Chapter 8. Developing a Budget and Work Program
   Key Messages
   A. Assessing the Country's Statistical Capabilities
      Assessing the Outputs of the Statistical Agency
   B. Developing a Budget
      Actual Survey Budgets
      Base Case Prototype
      Discussion
      Sensitivity Analysis
   C. Developing the Work Program
      Management and Logistics
      Questionnaire Development
      Sampling
      Staffing and Training
      Data Management
      Field Work
      Data Analysis and Documentation

Annex I. Description of Questionnaires from Viet Nam LSMS
Annex II. Annotated List of Selected References
Annex III. LSMS Working Papers
Annex IV. Instructions for Price Questionnaire from Kagera Health and Development Survey
Annex V. Calendar of Events in Kagera, Tanzania
Annex VI. Questionnaire Verification Form Used in Pakistan LSMS
Annex VII. Inter-Record Checks in the Romania Integrated Household Survey
Annex VIII. Table of Contents from Abstract of Pakistan Integrated Household Survey
Annex IX. Table of Contents from Abstract of the 1993 Jamaica Survey of Living Conditions
Annex X. Calculating Basic Consumption Aggregates
Bibliography

Tables

Table 2.1: Description of LSMS-Type Surveys by Country
Table 3.1: Units of Quantity
Table 4.1: Sample Design in Selected LSMS Surveys
Table 7.1: Sample Size, Mean, and Standard Error of Estimate of Per Capita Consumption, 1992 and 1993 Jamaica Survey of Living Conditions (SLC)
Table 7.2: Some Characteristics of the Poor in Ecuador, 1994
Table 7.3: Determinants of Household Expenditure Levels
Table 7.4: Côte d'Ivoire 1985 - Distributional Characteristics of Coffee and Cocoa Farming
Table 7.5: Tunisia - Estimated Nutritional Effects of Alternative Price Policies
Table 7.6: Changes in Welfare in Lima, 1985 to 1990
Table 7.7: Access to Infrastructure in Rural Viet Nam
Table 7.8: Indonesia - The Distribution of Selected Subsidies
Table 7.9: Percent of Women Who Have Heard of, Ever Used, or Are Currently Using a Modern Method of Contraception, Ghana, 1988-89
Table 8.1: Approximate Survey Budgets from Selected Countries
Table 8.2: Generic, All-Inclusive Budget for a One-Year, 3,200-Household Living Standards Survey
Table 8.3: Sensitivity Analysis on the Budget

Boxes

Box 1.1: The Minimum Package of Reference Materials
Box 1.2: Guide for Those Who Will Read Only Portions of the Manual
Box 2.1: Common Use of LSMS Data
Box 2.2: Using LSMS Data to Inform Government Policy Choices
Box 2.3: Modules in LSMS Questionnaire
Box 3.1: Levels of Refinement in Determining Questionnaire Content
Box 3.2: Synergy in Elements of Questionnaire Design
Box 3.3: Translating the Questionnaire
Box 4.1: How Wrong Will Our Estimates Be?
Box 4.2: Sampling Error and Sample Size: A Case of Diminishing Returns
Box 4.3: Sample Size and Population Size
Box 4.4: Cluster Effect
Box 5.1: Day 2 of a Typical Interviewer Training Session
Box 6.1: Levels of Observation in the Kagera Health and Development Survey
Box 6.2: Setting Boundaries for Range Checks
Box 6.3: Sample Report of Inter-Record Checks
Box 6.4: Sample Page of the Household Printout
Box 6.5: File Structure, Identifiers, and the Interface Between Data Entry and Analysis
Box 6.6: An Evaluation of Data Entry Packages' Suitability for the LSMS
Box 6.7: A Sample Data Entry Screen
Box 7.1: The Role of Different Actors in Analysis
Box 7.2: Prototype Data Access Policy
Box 7.3: The Difference Between a Good and Bad Table
Box 7.4: Data Cleaning During Analysis
Box 8.1: Assessing the Products of a Statistical Institute
Box 8.2: Assessing the Inputs of a Statistical Institute
Box 8.3: Contracting Out for Technical Assistance

Figures

Figure 3.1: Illustration of Individual Identification and Skip Codes
Figure 3.2: Format When Only One of a Unit of Analysis Is Observed
Figure 3.3: Roster Arrangements
Figure 3.4: Illustration of Precoding and an Open-Ended List
Figure 3.5: Illustration of Case Conventions
Figure 3.6: Flow Chart of Health Module
Figure 3.7: Illustration of a Closed-Ended List
Figure 3.8: Illustration of Respondent-Selected Units
Figure 4.1: List of First Stage Sampling Units
Figure 4.2: Cumulative Totals in the List of First Stage Sampling Units
Figure 4.3: Selecting the First Stage Sampling Units
Figure 4.4: Assignment of Work Areas, Ghana Living Standards Survey, 1988-89
Figure 4.5: An Algorithm to Produce a Random Permutation of the Integers 1 to N
Figure 4.6: Typical Listing Form
Figure 4.7: List of Selected Dwellings
Figure 4.8: Sticker Used for Selecting a Random Individual Within the Household
Figure 5.1: Weekly Activities of the Field Members
Figure 5.2: A Calendar of Events
Figure 5.3: Interviewer and Data Entry Operator Training Program
Figure 5.4: Interviewer Evaluation Form
Figure 5.5: Page One of Pakistan Questionnaire Verification Form
Figure 5.6: Check-up Interview Form
Figure 5.7: Schedule for Field Work
Figure 7.1: Illustration of an Abstract for Primary Schools
Figure 7.2: Indonesia - Percent of Those Ill in Last Month Who Sought Health Care, by Decile and Place Where Care Sought, According to 1990 SUSENAS
Figure 7.3: Selected Indicators of Quality of Health Facilities in Jamaica, According to the Expanded Health Module, 1989 Survey of Living Conditions
Figure 7.4: User Fee Simulations for Children's Health Care in Sierra Regions of Peru, 1985
Figure 7.5: Workers' Income in the Bolivia Emergency Social Fund
Figure 7.6: Response of Private Transfers to Public Transfer Programs
Figure 7.7: Age-Specific Fertility Rate by Women's Age and Consumption Percentile, Côte d'Ivoire, 1985-87
Figure 8.1: Generic Time Table for Survey Management

Foreword

In making sound policy decisions, governments need to know how those decisions affect the populations in their countries. Answers to some of the important questions can come only from household survey data. For example: who is poor and who is rich, and why? Who uses government services such as schools, clinics, agricultural extension offices, welfare programs, and old-age pensions? Are those not using government services able to get services in the private sector?
How do households change their decisions about who works and how much, whether and where to send their children to school, and how many children to have? Answering these questions requires household survey data that cover many aspects of household welfare. Until a few years ago, such surveys were very rare in developing countries.

The Living Standards Measurement Study (LSMS) program was launched in 1980 to help foster the collection of good data from household surveys and improve their subsequent use in policymaking. The first LSMS surveys were implemented in Côte d'Ivoire in 1985 and in Peru in 1985/86. Since then, over forty LSMS surveys have been conducted in nineteen countries, and new LSMS surveys are currently in the field or being planned in nine additional countries.

LSMS surveys provide high-quality, timely, and comprehensive data on most aspects of household welfare (consumption; income from activities in the labor market, household enterprises, or agriculture; asset ownership; migration; health; education; nutrition; fertility; and anthropometrics). LSMS surveys have become a powerful tool for understanding household economic decisions and the effects of social and economic policies. The use of LSMS data in poverty assessments helps to ensure that the development community's efforts to reduce poverty can be guided by quantitative information on the levels, causes, and consequences of poverty.

The data have been used by governments in various direct and indirect ways. In Bolivia, LSMS data were used to help the government evaluate its public employment program. In Jamaica, the government used data from its LSMS survey to reformulate the food stamps program. In South Africa, the government used the data in designing its tax reform program.

LSMS surveys have evolved over time. Originally they were designed primarily to support research; now they are much more often driven by policy needs. The contents of the questionnaires have accordingly changed over the years and from country to country.
The modular design of the LSMS questionnaires has facilitated this flexibility and country-specificity. The surveys have also benefited from developments in computer technology. The LSMS program had to design its own software to lay out the first questionnaires, but now such software is available commercially. In the first surveys in Côte d'Ivoire and Peru, it was novel to carry out data entry on personal computers in regional field offices, with electricity often provided by gas-fueled generators. In the Nepal LSMS now in the field, data entry is carried out in the field on notebook computers powered by portable solar panels. In 1995, Tanzania became the first country to allow data from its LSMS survey to be put on the Internet for easy access by scholars worldwide.

The interest in conducting and analyzing surveys like the LSMS has grown markedly since the early days of the project. Such surveys are now being done in many more places than the LSMS division of the World Bank can work in. Since many of those now involved in the implementation of new LSMS-type surveys have little familiarity with the old surveys, it is important to ensure that the lessons from the first ten years of LSMS field work are widely available.

This manual is one of a series of efforts to compile and disseminate the lessons of LSMS experience. Here the focus is on the planning of the survey and the conduct of the field work. A comprehensive review of the content of the questionnaires and the way in which the various modules can be combined is currently under way, and the documentation and dissemination of data sets from surveys already fielded have recently been upgraded.

Lyn Squire, Director
Policy Research Department

Abstract

This manual explains the planning process, technical procedures, and standards used in Living Standards Measurement Study (LSMS) household surveys, including what these procedures entail, why they are used, and how they can be implemented.
The "what" is the factual description of procedures and standards. The explanation of the "why" will help the reader to understand the importance of the different procedures. Moreover, if some aspect of them is to be changed or eliminated in a particular country, knowing what they were designed to achieve may aid the survey planner in finding an alternative strategy to accomplish the same objective. The "how" comprises explicit instructions, along with examples of ways the procedures have been adapted in different LSMS surveys. Although the lessons presented here are derived from countries that have implemented LSMS surveys, many of them are applicable to surveys generally, and especially to those that are complex or particularly concerned with quality control.

Topics covered in this manual include the technical aspects of questionnaire formatting and testing, ways to implement a sample design, and what field work and data management procedures have been successful. Ideas about directions to pursue in analyzing the data are sketched. A brief description of how to assess local statistical capacity is included. Generic work plans and budgets are presented to give ballpark estimates of how long each process will take and what must be included in a budget.

This manual will be useful to a broad spectrum of those who collaborate on an LSMS survey, including the staffs of the statistical agency, planning agency, university, or international development agency that will design, finance, implement, and analyze the survey, and technical assistants who are not familiar with LSMS survey practice. The authors have tried to write so that persons who are not specialists can read all parts of the manual.

Acknowledgments

This document is an attempt to put on paper the oral tradition of the Living Standards Measurement Study (LSMS) surveys. As such, the authors are not the creators of the thoughts presented here, but something more like scribes.
The practices recorded here were developed over years of discussion and field work in which numerous people and agencies participated. We would like to acknowledge the irreplaceable contributions to defining the body of thought presented here made by the past and present staff of the LSMS Division at the World Bank, our colleagues in the academic world who have provided advice and criticism over the years, the many agencies that provided technical assistance and financing for the surveys, and, most importantly, the agencies that actually implemented the surveys.

We have also received help from many people in bringing this document about. Emmanuel Jimenez made the funds and time available and impressed on us the importance of the task. Martha Ainsworth allowed us to crib heavily from her writings on LSMS surveys. Martha Ainsworth, Harold Alderman, Ana Maria Arriagada, Benu Bidani, Gaurav Datt, Paul Glewwe, Christiaan Grootaert, Stephen Howes, Luisa Ferreira, Emmanuel Jimenez, Dean Jolliffe, Tim Marchant, P.B.K. Murthy, Raylynn Oliver, Giovanna Prennushi, Laura Rawlings, Chris Scott, Kinnon Scott, Jacques van der Gaag, and Robert Vos provided many useful comments on drafts of this work. Martha Ainsworth, Paul Glewwe, Christiaan Grootaert, and Emmanuel Jimenez drafted synopses of their research for Chapter 7. Stephanie Faul edited the drafts. Carlo del Ninno wrote Annex X. Jim Shafer handled the desktop publishing.

Chapter 1. Introduction

A. What This Manual Covers

This manual explains the planning process, technical procedures, and standards used in Living Standards Measurement Study (LSMS) household surveys, including what these procedures entail, why they are used, and how they can be implemented. The "what" is the factual description of procedures and standards. The explanation of the "why" will help the reader to understand the importance of the different procedures.
Moreover, if some aspect of the procedures is to be changed or eliminated in a particular country, knowing what they were designed to achieve may aid the survey planner in finding an alternative strategy to accomplish the same objective. The "how" comprises explicit instructions, along with examples of ways the procedures have been adapted in different countries that have implemented LSMS surveys. Although the lessons presented here are derived from LSMS surveys, many of them are applicable to surveys generally, and especially to those that are complex or particularly concerned with quality control.

This manual is part of a multi-pronged effort to document, evaluate, and improve LSMS surveys. As such it is not designed to stand alone, but to fill part of the gap in available materials. The planner of any new LSMS survey will need to consult many other documents as well as the one in hand. The basics are listed in Box 1.1, and a more extensive annotated bibliography is provided in Annex II.

Topics covered in this manual include the technical aspects of questionnaire formatting and testing, ways to implement a sample design, and what field work and data management procedures have been successful. Ideas about directions to pursue in analyzing the data are sketched. A brief section describes how to assess local statistical capacity. Generic work plans and budgets are presented to give ballpark estimates of how long each process will take and what must be included in a budget.

The manual does not attempt to cover institutional factors in developing the scope and design of the survey project, the content of the questionnaires, or analysis of the data. As this manual is being written, the LSMS division of the World Bank[1] is just beginning a major research effort that will culminate in a separate volume covering these themes. The final product of that effort is expected in about 1998, but draft papers should be available beginning in 1996.
Moreover, this manual provides only brief summaries of some technical topics for which extensive information is already available, for example, sampling and anthropometric measurement. Suggestions for further reading are provided in Annex II.

B. Who Should Read This Manual

This manual will be useful to a broad spectrum of those who collaborate on an LSMS survey, including the staffs of the statistical agency, planning agency, or university that will design, implement, and analyze the survey, and of the international agencies that finance the survey, as well as technical assistants who are not familiar with LSMS survey practice. The authors have tried to write so that persons who are not specialists can read all parts of the manual. The following aids are provided to make it easier for the reader to find the parts of this document most relevant to his or her particular purposes:

* brief suggestions are provided in Box 1.2;
* key messages are summarized in bullet form at the beginning of each chapter;
* the chapter's structure and potential usefulness to different audiences are outlined at the beginning of each chapter; and
* information that should be read by all audiences is presented first in the chapter, with information of interest primarily to the different specialized members of the technical teams presented in the latter parts.

[1] The name and place in the organization chart of the division that supports LSMS surveys has changed several times over the last 15 years. Currently it is the Poverty and Human Resources Division in the Policy Research Department. For simplicity, we will use the term LSMS division throughout the manual.

Box 1.1: The Minimum Package of Reference Materials

Those involved in developing a new LSMS survey will want to refer to many other documents. Some minimal suggestions are provided here. A more complete list of materials with annotations on content and complete descriptions of the references is provided in Annex II.

Materials on LSMS Surveys

The LSMS working papers are available through the World Bank Bookstore, affiliated bookstores throughout the world, and many libraries. Materials marked with an asterisk are available from the LSMS division of the World Bank for those involved in planning new LSMS surveys. Requests for these should be sent by electronic mail to LSMS@worldbank.org, by fax to LSMS Surveys at 202-522-1153, or by letter to LSMS Surveys, PRDPH, World Bank, 1818 H Street, N.W., Washington, D.C. 20433.

* For a discussion of strategic institutional choices: LSMS Working Paper 80
* For help in formatting questionnaires: this manual
* For help in designing the contents of questionnaires: LSMS Working Papers 24, 34, and 90; sample questionnaires from other countries; forthcoming work to revise the prototype modules*
* For help in planning field work: this manual
* For help in data management: this manual; examples of Basic Information Documents from other countries
* For lessons in building analytic capacity: case studies from several countries*

Selected Manuals Produced by Allied Survey Programs

For an overview of the Social Dimensions of Adjustment Surveys:
* Delaine et al. (1992) on Integrated Surveys
* Marchant and Grootaert (1991) on Priority Surveys
* Wold (1995) on Community Surveys

The UN National Household Survey Capability Program produced a series of manuals that may be of interest, especially:
* UNNHSCP (1986a) on How to Conduct Anthropometric Measurements
* UNNHSCP (1982) on Non-Sampling Error
* UNNHSCP (1986b) on Sample Frames and Sample Designs

Box 1.2: Guide for Those Who Will Read Only Portions of the Manual

Everyone who works on a survey will benefit from knowing how its different aspects fit together. An appreciation of the strategic issues involved in every facet of the survey will help planners, team members, and analysts collaborate more effectively and produce a high-quality survey. All the members of an LSMS team should therefore read as much of this manual as they can, and avoid focusing only on "their" chapter. Those who cannot read the whole manual should read the whole chapter most relevant to their specific role in the survey and the following parts of the rest of the manual:

* Chapter 2: Overview of LSMS Surveys - skim all sections
* Chapter 3: Questionnaire Development - Sections A and B, skim Section C
* Chapter 4: Sampling - Sections A and B
* Chapter 5: Field Operations - Section A
* Chapter 6: Data Management - Sections A and B
* Chapter 7: Beginning Data Analysis - skim Section A
* Chapter 8: Developing a Budget and Work Program - skim all sections

C. Some Assumptions Implicit in the Manual

Many strategic decisions must be made when designing a survey project, more than can be discussed in this manual. These issues are treated in other materials already available or scheduled to be made available soon. However, these choices have repercussions for the parts of planning a survey that are described in this manual, and therefore this subsection briefly mentions the issues and choices that are implicit in the rest of the document. These can be thought of as the "base case" for implementation of an LSMS survey. Packages can be tailored by adding or subtracting elements from the base case. The assumptions made here about these strategic decisions are as follows:

ONE YEAR VS. MULTIYEAR PROGRAM.
This manual describes a single year of an LSMS survey. When surveys are repeated once a year or once every two years, most of the same steps are required for each round. Some of them may be accomplished more easily, with less technical assistance, and with less need for new equipment. Their content, however, remains the same, so the manual is still fully applicable to multiyear projects.

HOW MUCH DATA ANALYSIS TO INCLUDE IN THE PROJECT. This manual focuses on the production of data, although projects often include a good deal of analysis as well. Thus the manual is a guide to what may be one component of a larger project, or to what may be a first project to be complemented by other projects that focus on analysis of the data.2

AMOUNT OF CAPACITY BUILDING. This manual again focuses on the narrowest likely definition of a project. Some training will take place in the scenario used here. It includes complete training for field staff, extensive training for the data manager, and some on-the-job training for the survey manager and field manager as they interact with the technical assistants. Projects that emphasize capacity building would arrange additional training for staff involved in questionnaire design and formatting, sampling, and data management and analysis.

SOURCE OF FINANCING. The term "survey project" is used throughout the manual as though a special source of funds were to be sought. This has usually been the case for LSMS surveys, although countries could of course finance them from their normal national budgets. The source of funds is largely immaterial to the information provided in this manual.

IMPLEMENTING AGENCY. This manual assumes that the survey will be carried out by the government's central statistical office, though in some countries a university or private research firm may be used instead. In the great majority of LSMS surveys, the agency chosen has been the government statistical agency.
PERMANENT VS. TEMPORARY STAFF. Lastly, there is the division of labor: how much should come from the permanent staff of the statistical agency and how much from people hired on short-term contracts. The first choice may be better for institution building. The second choice may be speedier and, depending on the wages that can be paid, may make it easier to ensure high-quality staff. This manual discusses the full staff necessary to carry out a survey without differentiating whether they come from inside or outside the statistical agency.

2. The LSMS division is sponsoring a review and evaluation of mechanisms for supporting data analysis in LSMS projects, the results of which should be available in 1996.

Chapter 2: An Overview of LSMS Surveys

Key Messages

* LSMS surveys are designed to produce a comprehensive monetary measure of welfare and its distribution; describe other aspects of welfare; describe patterns of access to and use of social services; allow study of the determinants of important social and economic outcomes; and allow study of how households behave in response to changes in the economic environment or government programs.
* LSMS surveys are integrated surveys covering a number of topics. The household questionnaire always produces comprehensive measures of consumption, usually comprehensive measures of income, and always covers a variety of sectoral issues, usually health, education, nutrition, and fertility. The community questionnaire describes the economic environment faced by the households in the sample. The price questionnaire gathers information on prices of basic goods in the community. Sometimes special questionnaires are used for health clinics and schools.
* LSMS surveys use an extensive set of quality control procedures to minimize errors and delays in data collection and processing. These are the topic of much of the rest of this manual.
* Many surveys in the LSMS family differ from the prototype in one or more aspects of purpose, content, or quality control.
This is natural, as each is adapted to fit the circumstances of the time and place where it was developed.

This chapter describes very briefly the purpose and contents of LSMS surveys and the factors affecting their evolution. These topics are really the theme of the planned companion manual to this one, so their treatment here is brief. This chapter may be skipped or skimmed by those who are already familiar with the content of LSMS surveys, but should be read by all others.

A. The "Prototype" LSMS Survey

Here we describe a "prototype" LSMS survey. The LSMS prototype is actually a composite based on experience with a number of surveys. Throughout the manual these surveys will be drawn upon for examples and illustrations of the concepts discussed. In fact, many of the surveys in the LSMS family have departed from the prototype in one or more ways in order to fulfill slightly different objectives or in response to institutional or budget constraints. The use of a survey to illustrate a specific point does not imply that all aspects of that survey are the same as for the LSMS "prototype."

Purpose of LSMS Surveys

The objective of LSMS surveys is to provide data adequate for the planning, monitoring, and analysis of economic policies and social programs with respect to their impact on household living standards, especially those of the poor. In order to achieve this objective, the data must be integrated, timely, and available for analysis of a variety of issues, often by many analysts using a wide range of techniques.

In terms of content, LSMS surveys provide an integrated view of household welfare and allow for the study of its determinants. The surveys are designed on the premise that quantifying and locating a problem is not enough: we need to learn how to solve it. For example, knowing how many poor there are, where they live, and what they do is only a part of the enquiry.
In order to devise cost-effective solutions, planners also need to understand in greater detail the causes and consequences of poverty and the effect of changes in government policies. The same principle applies to other problems such as illiteracy or malnutrition.

LSMS questionnaires therefore provide an integrated set of information. First, they are designed to measure the distribution of welfare and the level of poverty in economies where subsistence agriculture, informal household enterprises, seasonal employment, and non-cash payments are common. Second, they describe the patterns of access to and utilization of many public services: schooling, health care, electricity, water supply, and sanitation. Third, they are designed to help analysts understand how households react to the economic environment and government programs, for example, how household welfare might be affected by changes in the prices of major agricultural commodities or how the use of government health services might change if user fees were raised. Fourth, they are designed to support complex analyses of relationships between various aspects of household welfare, such as the impact of household income on the enrollment of children in school, the effect of education on childbearing behavior, or the impact of health status on employment.

In order to be relevant to policy analysis, survey data must be timely. The procedures designed for LSMS surveys result in data that are ready for analysis within about three months of the completion of fieldwork, as described in Chapters 5 and 6.

Finally, the most important tangible product of LSMS projects is not viewed as a set of standard tabulations, but as a data set that can be used by multiple users to answer many different questions. A rich abstract that presents some of the basic findings on the multiple aspects of welfare covered in the survey is certainly a useful reference and should be produced in the survey projects.
But in most cases the availability of the data in tabulated form will not be enough to carry out the kind of deeper investigations needed for poverty-related work and economic analysis in general. Some of these issues require sophisticated calculations and modelling tools (usually of a multidimensional nature), which require direct interaction between the analyst and the data. Moreover, much of such analysis requires knowledge of specific sectoral issues that cannot be expected to reside in statistical institutes. Thus the data sets must be produced and distributed to analysts outside of the statistical institute. Only thus can the less tangible product of the surveys, an improved understanding of poverty, social policy, and household behavior, be achieved. Some of the most common uses of LSMS data are shown in Boxes 2.1 and 2.2. Section D of Chapter 7 provides many more examples of the varied uses that have been made of LSMS data.

The reader will have noted that the LSMS questionnaires include modules on topics that are often the focus of single-purpose surveys, including some common and well-respected ones: labor force surveys, income and expenditure surveys, and demographic and health surveys. The LSMS modules do not collect the same depth of information on any single topic as single-topic surveys do, and LSMS samples may be smaller, so the precision of measurement of key outcomes may be lower than in the single-topic surveys. But because LSMS surveys collect information on so many aspects of welfare, they not only provide a good multidimensional summary of welfare, but also allow study of the interactions between these various factors.

LSMS surveys and other surveys can be combined into a program of household surveys in various ways, depending on the needs and constraints of the country. In Jamaica, the local (modified) version of the LSMS is carried out annually and tied to one quarter of the quarterly labor force survey.
Surveys on literacy, contraceptive prevalence, and income and expenditure are conducted every three to 10 years to round out the program. In several countries (Romania, Russia, Latvia, and Lithuania), one-year LSMS survey projects have been used to pilot alternatives for reforming or replacing ongoing survey programs. Sometimes, as in Peru, a series of single-year projects has provided a time series of data. For Africa, the Social Dimensions of Adjustment project recommends Integrated Surveys (which are much the same as LSMS surveys) every three to five years, with Priority Surveys (which usually cover the same general themes but with much less detailed questionnaires and larger samples) in the intervening years.

Questionnaire Content

In order to gather data consistent with their objectives, LSMS surveys normally use three different kinds of questionnaires: (1) the household questionnaire, in which household members are asked about many aspects of the household's welfare, especially consumption, income, and use of social services; (2) the community questionnaire, in which key community leaders and groups are asked about the infrastructure and services available in the community; and (3) the price questionnaire, in which vendors are asked about prices for selected items. The information usually collected in these is shown in Box 2.3. A fourth set of questionnaires, to collect information about schools or health facilities, is sometimes used as well.

HOUSEHOLD QUESTIONNAIRES. LSMS household questionnaires collect data on several major aspects of household well-being, as shown in Box 2.3. A more detailed summary of all the questionnaires used in Vietnam is included in Annex I. The full household questionnaire used in Côte d'Ivoire is presented and annotated in Grootaert (1986). The household questionnaire used in the Kagera region of Tanzania is presented and annotated in Ainsworth et al. (1992).

Because measuring welfare is a key objective of LSMS surveys, measures of consumption are strongly emphasized in the questionnaires.3 Detailed questions are asked about cash expenditures, the value of food items grown at home or received as gifts, and ownership of housing and durable goods (for example, cars, televisions, bicycles, and washing machines) to allow a use value to be assigned.4 Because understanding household behavior and determining the causes of poverty are also central LSMS objectives, the survey collects a wide range of income measures.

Box 2.1: Common Uses of LSMS Data

Measurement with reasonable accuracy of:
* number of persons in poverty
* distribution of welfare
* variables that pertain to many individuals or households in the sample, such as employment rates, rates of malnutrition, and mean consumption levels.

Description or analysis of:
* characteristics of different socioeconomic groups
* access to or use of major government services (health, education, water supply, electricity, roads)
* participation in large government programs
* incidence of taxes or subsidies on commonly consumed items
* interactions between aspects of welfare, such as the effect of health on labor supply, of parents' education on children's nutrition, or of education on earnings.

Complementary data will usually be required for:
* program impact evaluations
* program cost-effectiveness studies.

LSMS samples are usually too small to allow:
* measurement of variables that pertain to only a few households or individuals, such as infant mortality, patterns of morbidity, and rates of international migration
* description or analysis of government programs that reach only a small part of the population
* description of small socioeconomic groups or geographic units.

Box 2.2: Using LSMS Data to Inform Government Policy Choices

The data from LSMS surveys are designed to be used to understand living standards and the effects of government policies. Here we provide some brief examples of how governments and aid agencies are making use of them.

In 1989 the Jamaican government was considering whether to stop subsidizing the prices of basic foods and instead to expand its food stamp program. While the decision was being made, data from its LSMS became available. Analysis showed that most of the benefits from general price subsidies went to the non-poor, while most of the benefits of the food stamp program went to the poor. This helped the government to move ahead with a reform program. The government then commissioned further analysis of the LSMS data to show how many families needed help in purchasing a minimum food basket and how much help they needed. The government used this information to decide on new eligibility thresholds and benefit levels for the food stamp program. While this is perhaps the most concrete single use the Jamaican government has made of the data, they have been used in making other decisions as well: whether to change kerosene subsidies, whether and how to establish a "drug window" in public health clinics, and to study the effects of raising user fees for public health care. The survey is conducted annually, and poverty rates are monitored as well.

In South Africa, the 1993 LSMS survey provided for the first time a comprehensive, credible data set for the entire territory of South Africa, including the homelands. The survey was completed just before the last elections. The data were quickly put to extensive use by the new government and by academic researchers alike. The first product was an extensive statistical abstract, followed by a poverty profile prepared jointly by the Ministry of Reconstruction and Development and the World Bank, and by other studies and reports. The body of work has helped to shift the discussion in the country from debating the nature and extent of poverty to discussing options for alleviating it. For example, it was decided to admit young women in rural areas to future public works schemes, since the data showed that this group was often needy and could obtain child care. Also, because the survey data have revealed that the old age pension program is well targeted, attention has moved on to reforming other programs that may be less well targeted.

In Ecuador, the 1994 LSMS data were used first to produce a poverty assessment. This work was done by the World Bank in 1995 as part of an ongoing effort with the government to develop poverty alleviation strategies. The findings from the report were presented to the Cabinet. Wide-ranging discussions identified a number of issues on which the government would like further policy analysis. Arrangements to conduct these will be made over the next few months. The first use will probably be to revise the poverty maps used in targeting many government programs. As is usual, the current poverty maps are based on census data because they allow disaggregation to small geographic areas (parroquias). However, the weighting of the variables used to produce the composite poverty indicator is necessarily ad hoc, since the census contains no direct information on consumption or income. Since the LSMS data contain income and expenditure measures as well as the kind of indicators available in the census, the LSMS data will be used to help select and weight the indicators to be used in a revised census-based poverty map. This should enable the government to target its programs more accurately.
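Box 2.1 lists the number of persons in poverty as a core measurement, and the Jamaican example in Box 2.2 turns on which households receive a program's benefits. Both are weighted tabulations over household records. The following is a minimal sketch, assuming each record carries a sampling weight, the household size, per capita consumption, and the value of benefits received from one program; the class and function names are hypothetical, and the four records and the poverty line are invented for illustration, not LSMS data.

```python
from dataclasses import dataclass


@dataclass
class Household:
    weight: float                  # sampling weight: households this record represents
    size: int                      # number of household members
    consumption_per_capita: float  # annual consumption per member
    benefit: float                 # value received from the program under study


def is_poor(hh: Household, poverty_line: float) -> bool:
    return hh.consumption_per_capita < poverty_line


def headcount_ratio(households, poverty_line: float) -> float:
    """Weighted share of persons (not households) living below the poverty line."""
    persons = sum(h.weight * h.size for h in households)
    poor = sum(h.weight * h.size for h in households if is_poor(h, poverty_line))
    return poor / persons


def share_of_benefits_to_poor(households, poverty_line: float) -> float:
    """Weighted share of total program benefits received by poor households."""
    total = sum(h.weight * h.benefit for h in households)
    to_poor = sum(h.weight * h.benefit for h in households if is_poor(h, poverty_line))
    return to_poor / total


# Invented records for illustration only.
sample = [
    Household(weight=100, size=5, consumption_per_capita=300, benefit=40),
    Household(weight=100, size=4, consumption_per_capita=450, benefit=10),
    Household(weight=200, size=3, consumption_per_capita=900, benefit=30),
    Household(weight=100, size=2, consumption_per_capita=1500, benefit=60),
]
print(round(headcount_ratio(sample, poverty_line=500), 3))            # 0.529
print(round(share_of_benefits_to_poor(sample, poverty_line=500), 3))  # 0.294
```

In this invented sample about 53 percent of persons are poor but only about 29 percent of benefits reach them, the kind of pattern the Jamaican analysis found for general price subsidies.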
For individuals in formal sector jobs, the surveys include detailed questions about wages, bonuses, and various forms of in-kind compensation. Information is sought on secondary as well as principal jobs. At the household level, detailed agricultural and small enterprise modules are designed to yield estimates of net household income from these activities. Other sources of income, such as private transfers (e.g., child support or remittances from abroad), public transfers (in cash or in kind), miscellaneous earnings (e.g., lottery winnings), and interest income, are recorded as well.

In order to analyze the relationships among different aspects of the household's quality of life, such as the impact of parents' education on child nutrition or the effect of health on employment, it is necessary to collect several kinds of sectoral data from each household. The sectoral modules include health, education, fertility, anthropometrics, and migration.

3. Consumption-based measures are used in most LSMS studies of welfare. The data are, however, rich enough to allow other indicators of household welfare to be used (see Glewwe and van der Gaag, 1988).
4. These goods are not completely consumed when first acquired but are used over a long period of time. Household welfare due to the ownership of such goods can be based on the estimated yearly rental values of those goods. LSMS surveys collect data that are sufficient to impute rental values for both owner-occupied housing and ownership of durable goods.

Box 2.3: Modules in LSMS Questionnaires

Module | Respondent | Subject

HOUSEHOLD QUESTIONNAIRE
Household Composition | Head of household/principal respondent | Household roster, demographic data, information on parents of all household members

Consumption Modules
Food Expenditures | Best-informed household member | Food expenditures in the past 14 days and past 12 months; consumption of home production in past 12 months
Non-Food Expenditures | Best-informed household member | Expenditures in the past 14 days and past 12 months; remittances to other households
Housing | Head of household/principal respondent | Type of dwelling; housing and utilities expenditures
Durable Goods | Best-informed household member | Inventory of durable goods and their characteristics

Income-related Modules
Non-farm Self-employment | Best-informed household member for each of three businesses | Income, expenditures, and assets for three most important household businesses
Agro-pastoral Activities | Best-informed household member | Land, crops, income, and expenditure from raising crops and animals; livestock and farm equipment inventory
Economic Activities | All household members 7 years and older (all adults must respond for themselves) | Employment, income, and time data for the main and secondary jobs in the last 7 days and the last 12 months; employment history; unemployment spells in the last 12 months; time use in the home
Other Income | Best-informed household member | Income from other sources, including remittances from other households
Saving and Credit | Best-informed household member | Savings and net debt the day of the interview; characteristics of outstanding loans to and from household members

Sectoral Modules
Education | Head of household/principal respondent | Completed schooling and schooling expenditures for all household members 5 or older; schooling and other information of all non-member children under 30
Health | All household members (parents respond for young children) | Utilization of health services and medical expenditures for any illness in the last four weeks; utilization of and expenditures for preventive services in the last 12 months
Migration | All household members 15 years and older | Place of birth, time and current place of residence, and reasons for first and last moves
Fertility | One randomly selected woman 15 years or older | Birth history; use of maternity services and duration of breastfeeding for last live birth
Anthropometrics | - | Height and weight measurements of all household members

COMMUNITY QUESTIONNAIRE
Demographics | Community leader | Size, growth, ethnic mix
Economy and Infrastructure | Community leader | Economic activities, access to roads, electricity, water, public services such as public transport, mail service, etc.
Education | Headmaster or community leader | Location and characteristics of schools serving community
Health | Health workers or community leader | Location and characteristics of health facilities serving community
Agriculture | Extension agent or community leader | Farming practices, agricultural services available

PRICE QUESTIONNAIRE
Prices | Market, shops | Prices on frequently purchased items
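Box 2.3 and note 4 together imply that a household's consumption aggregate combines different recall periods with imputed values. The sketch that follows shows one way such pieces could be added up for a single household, under assumptions that are ours rather than the manual's: 14-day amounts are scaled by 365/14, 12-month amounts are used as reported, and the yearly use value of a durable good is approximated by a user-cost rule (current value times the sum of an assumed real interest rate and an assumed depreciation rate). The class, field, and function names are hypothetical, and the figures in the usage lines are invented.

```python
from dataclasses import dataclass, field

DAYS_PER_YEAR = 365


def annualize(amount: float, recall_days: int) -> float:
    """Scale a value reported for a short recall period up to a 12-month total."""
    return amount * DAYS_PER_YEAR / recall_days


def durable_use_value(current_value: float, real_rate: float, depreciation: float) -> float:
    """Assumed user-cost estimate of one durable good's yearly rental value."""
    return current_value * (real_rate + depreciation)


@dataclass
class HouseholdConsumption:
    # Illustrative fields, not actual LSMS variable names.
    food_14d: float              # food purchases, past 14 days
    home_production_12m: float   # own-produced food consumed, past 12 months
    nonfood_14d: float           # frequently bought non-food items, past 14 days
    nonfood_12m: float           # infrequently bought non-food items, past 12 months
    rent_12m: float              # actual or imputed rent for the dwelling, past 12 months
    # Each durable is a tuple (current_value, real_rate, depreciation).
    durables: list = field(default_factory=list)


def annual_consumption(hh: HouseholdConsumption) -> float:
    """Sum the components into one annual consumption figure for the household."""
    food = annualize(hh.food_14d, 14) + hh.home_production_12m
    nonfood = annualize(hh.nonfood_14d, 14) + hh.nonfood_12m
    durables = sum(durable_use_value(v, r, d) for v, r, d in hh.durables)
    return food + nonfood + hh.rent_12m + durables


hh = HouseholdConsumption(
    food_14d=140, home_production_12m=500,
    nonfood_14d=28, nonfood_12m=300,
    rent_12m=1200, durables=[(1000, 0.05, 0.10)],
)
# 3650 + 500 (food) + 730 + 300 (non-food) + 1200 (rent) + 150 (durables)
print(annual_consumption(hh))
```

A real aggregate also needs choices this sketch leaves out, such as deflating by regional price differences (one purpose of the price questionnaire) and adjusting for household size.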
The sectoral modules are designed to measure a few key outcomes (such as nutritional status, vaccination rates, incidence of diarrhea among children, and enrollment rates), to measure the use of services that might affect those outcomes, and to supplement information from the rest of the questionnaire to study why households use those services and what factors influence the outcomes.

COMMUNITY QUESTIONNAIRES. To help limit the length of the household questionnaire, the community questionnaire gathers information on local conditions that are common to all households in the area. This questionnaire is typically used only in rural areas, where local communities are easier to define. The information covered by the questionnaire typically includes the location and quality of nearby health facilities and schools, the condition of local infrastructure such as roads, sources of fuel and water, availability of electricity, means of communication, and local agricultural conditions and practices.

PRICE QUESTIONNAIRES. In countries where prices vary considerably among regions, it is important to gather information on the prices that households actually pay for goods and services.5 The price questionnaires compile information on the prices of the most important items that a household (particularly a poor household) must buy and that are widely available throughout the country. The prices are gathered in markets or shops in the communities where the households live.

SPECIAL FACILITY QUESTIONNAIRES. Sometimes special questionnaires are designed to gather detailed information on schools or health facilities. These have been used in at least one year of the surveys in Côte d'Ivoire, Ghana, Morocco, Jamaica, and Tanzania.

The rest of the manual refers to "the questionnaire" as though there were only one instrument instead of separate instruments for household, community, price, and facility information. This is conceptually, if not physically, accurate.
In formulating the contents of each questionnaire, the planners must ensure that the information serves the survey's analytical goals and is collected in an efficient manner. In formatting, the same principles apply to all the questionnaires, although some of the techniques may be more commonly used in one than another. The same principles of data processing apply. Occasionally the logistics of how questionnaires are managed may diverge, but this is done for convenience rather than because of conceptual differences.

5. See Ravallion and Bidani, 1994.

There has been substantial variation among the questionnaires used in different countries.6 The features discussed here may have been changed or omitted in specific cases because the objectives of the survey or country circumstances differed. Nonetheless, there are more commonalities than differences.

Quality Control

In addition to sharing common objectives and questionnaire content, the LSMS surveys use an extensive set of procedures to minimize errors and delays in data collection and processing. The reasons why each is used and how it is implemented are discussed in detail in the rest of this manual. Briefly, quality control elements include the following:

QUESTIONNAIRE FORMAT. A single paper questionnaire is used to obtain information on the household and the different individuals and businesses within it. The questionnaires are designed to minimize errors by the interviewer and other survey staff. The questionnaires are pre-coded, with extensive use of explicit skip patterns.

SAMPLING. The need to minimize non-sampling error is given heavy weight in decisions about sample size. Coupled with the analytic objectives of the surveys, this leads to small samples, usually on the order of 2,000-5,000 households.

FIELD WORK. Fieldwork and data entry are decentralized and supervision is very strict. A small number of highly trained interviewers is used, with the fieldwork spread over a full year. Each household is visited twice, two weeks apart.
In each visit, a series of "mini-interviews" with the different members of the household is conducted. Since each adult responds for him- or herself, the errors that can be introduced by proxy respondents or respondent fatigue are minimized.

DATA MANAGEMENT. Data entry and editing are carried out in the field concurrently with data collection, usually in the local survey offices. As data are entered, a number of data quality checks are carried out by the data entry program. This allows errors or inconsistencies found in the first half of the questionnaire to be checked during the second visit to the household.

6. In addition to the standard changes in vocabulary required to make a questionnaire relevant to a given country, the major differences include: (i) some modules have been excluded in some countries; (ii) the level of detail has varied widely within each module; (iii) the direct outcome measures (height, weight, upper arm circumference, cognitive skills) and the group of respondents for which they were measured have differed; (iv) the extent of price, community, or facility questionnaires has differed; and (v) a few LSMS surveys are actually linked to other surveys. In these cases, the LSMS questionnaires omit some modules for which information is gathered in the linked survey, and the data sets from the two surveys must be merged for analysis.

Planning and Budgeting

An important hallmark of LSMS surveys, and one of the reasons for their success, is the large role that analysts play in developing the whole survey. Not only is the questionnaire content determined by analysts who will actually use the data; their input is also used in field testing, in training field workers, in sampling, and in data management.

The work program for implementing an LSMS survey is divided into three phases.
The planning stage often lasts about a year, the field work is scheduled to take place over a full year, and the initial analytic phase of producing an abstract, documenting the data, and setting up other analyses may take about six months. The survey planner (e.g., the reader of this document) must realize that the many activities involved in planning the survey take place concurrently and that decisions in one area have repercussions for other areas. How to carry out the most important of these activities (designing the questionnaire, designing and drawing the sample, preparing for field work, preparing the data management system, and beginning to think about data analysis) is described in the following chapters. How long each of these may take under different circumstances and how they must be interlinked in time is discussed in detail in Chapter 8.

The formal budgets for LSMS surveys have varied widely from country to country, ranging from $155,000 to about $3,000,000. Most formal budgets have been in the neighborhood of $750,000 to $1,000,000, but the "prototype" all-inclusive budget developed in Chapter 8 is for $1,300,000. The differences are so large for three reasons. First, many of the inputs used are often supplied in kind by either the national statistical agency or the external agency helping to finance the survey, and thus are omitted from the budget. Second, the amount of various inputs may vary from country to country depending on the survey design and the existing institutional capacity. Third, the unit prices for locally supplied inputs may vary greatly from country to country.

B. Variations from the Prototype

The LSMS surveys are not a static, uniform product. Each is unique, and sometimes their differences are considerable.
This can be seen from Table 2.1, which lists surveys that share some or all of the characteristics of the LSMS surveys.7 More differences are expected to crop up in the future, and these may become increasingly large.

Table 2.1: Description of LSMS-Type Surveys by Country (a)

Country | Year of first survey | Rounds fielded to date | Households in sample | HH questionnaire scope | Price questionnaire | Community questionnaire | Facility questionnaire | Interview schedule | Panel | Educational testing (b) | Anthropometrics
Côte d'Ivoire | 1985/86 | 4 | 1600 | full | yes | yes | no | year-round | rotating | none | all
Peru 1985 | 1985/86 | 1 | 5120 | full | yes | yes | no | year-round | no | none | none
Ghana | 1987/88 | 2 | 3200 | full | yes | yes | health/ed | year-round | no | 9-55, m, r, R | all
Mauritania | 1987 | 2 | 1488 | full | yes | yes | no | year-round | no | none | all
Bolivia | 1989 | 5 | 10,000 | truncated | no | no | no | wave | no | none | no
Jamaica | 1988 | 8 | 2000-6000 | truncated | no | no | sometimes | wave | some | 7-18, m, r | child < 5
Morocco | 1990/91 | 1 | 3360 | full | yes | yes | health | year-round | no | 9-69, r, m | < 11, parents
Pakistan | 1991 | 1 | 4800 | full | yes | yes | health, ed | year-round | no | none | < 5, mother
Peru 1990/91/94 | 1991 | 1 | 1500/2200/3500 | full | no | no | no | wave | 85/90/91/94 Lima; 91-94 elsewhere | none | child < 5
Venezuela | 1991 | 3 | 14,000 | truncated | no | no | no | wave | yes | none | child < 5
Vietnam | 1992/93 | 1 | 4800 | full | yes | yes | no | year-round | no | none | all, h/w/c
Nicaragua | 1993 | 1 | 4200 | truncated | yes | yes | no | wave | no | none | child < 5
Guyana | 1992/93 | 1 | 5340 | truncated | no | no | no | wave | no | none | child < 5
Tanzania - national | 1993 | 1 | 5200 | truncated | yes | no | no | wave | no | none | no
Tanzania - Kagera Region | 1991 | 4 | 800 | expanded | yes | yes | health, edu, healers, NGOs | year-round | 4 period | none | all
South Africa | 1993 | 1 | 9000 | full | yes | yes | no | wave | no | some | child < 5
Romania | 1994/95 | continuous | 36,000 | full | no | no | no | continuous | yes | none | child < 5
Ecuador | 1994 | 1 | 4500 | full | no | yes | no | wave | no | none | no

Notes:
a. Researchers interested in using data from these surveys should refer to Grosh and Glewwe (1995).
b. In the column "Educational testing," numbers indicate the age range to which testing was applied; codes are m = mathematics, r = reading, R = Raven's Progressive Matrices Test.

Common Variants

The most common variants on the prototype are sketched here.

TRUNCATED QUESTIONNAIRES. The questionnaire has been severely truncated in some cases. This limits the range of possible analysis. The most common case is to forgo attempts to use income as a measure of household welfare and to understand the choices that households make about income generation. Sometimes community or price questionnaires are omitted. Another, less common, variation is to adopt a core and rotating module design in a multiyear survey plan.8 The core questionnaire would allow the measurement of consumption and a reduced set of other indicators of welfare and use of services. Each year a module of special emphasis studies a particular topic in depth. This maintains some, but not all, of the possibilities for intersectoral work that is one of the objectives of the prototype LSMS questionnaire. Modifying the questionnaire always affects what analysis is possible. It does not necessarily affect how sampling, field work, or data management is done. Therefore it will not be discussed further in this manual.

SINGLE INTERVIEW. Some surveys plan for each village and household to be visited only once. Until recently, this has only been the case when the questionnaires were severely truncated, but in the Nepal survey now in the field each village will be visited once while the full questionnaire will be maintained. Each household's interview will be conducted in more than one session if necessary for the convenience of the household, but not as a strict requirement of the survey protocol. The use of a single interview usually means that the standard checks on data quality from the first interview, and correction during the second interview, are not possible. There are also other minor implications for field operations and data management, which will be discussed in Chapter 5. The single visit to the household also eliminates the use of the recall period bounded between the two interviews, which has implications for the measurement of consumption.

CONCENTRATED PERIOD OF FIELDWORK. In some countries the field work has been carried out in just a few weeks or months rather than spread throughout the year. With respect to analysis, this limits the ability to study seasonality and affects how annualized values of income and consumption are calculated. It also has major implications for the organization of field work and data management, which will be discussed in Chapter 5.

7. There are other surveys that have objectives as similar to the LSMS "prototype" as some of those listed in this table. The Integrated Surveys in The Gambia, Guinea, Madagascar, Senegal, and Uganda, supported by the Social Dimensions of Adjustment project in the World Bank, or the RAND-sponsored Family Life Surveys in Malaysia and Indonesia, are examples. These are similar in spirit, but are omitted here because the authors do not know enough about how they were carried out to use them as illustrations in this manual. The surveys arranged by the Cornell Food and Nutrition Policy Program in Guinea and Mozambique and those carried out by the University of North Carolina Population Center in Russia and the Kyrgyz Republic are also very similar in spirit. Since we know more about how these were conducted, we will draw examples from them.
8. Discussions of core and rotating module surveys can be found in Grosh (1991) for the case of Jamaica, and World Bank (1993) for Indonesia.

REFORMS OF EXISTING SURVEYS. Finally, some countries are using parts of the LSMS experience not to carry out full-fledged LSMS surveys but to reform ongoing surveys.
In some cases they have added modules to questionnaires, in others they have adapted some of the features of field work or data management to improve data quality or the speed with which results are processed, and sometimes they have done both.

Evolution in LSMS Surveys

This manual presents information on what has proved useful in the surveys conducted in the last 10 years. Much of what has been learned is expected to continue to be relevant, though the way in which the principles espoused here are put into practice will evolve. The principal types of changes expected include the following:

CHANGING PURPOSE. The early LSMS surveys were carried out as research projects. Their first goal was to determine whether it was feasible to gather such comprehensive data sets. Their second goal was to conduct research to better understand household behavior and its implications for the design of government programs. Emphasis was on analysis of the determinants of welfare and their interactions, rather than on precise measurement of a few aspects of welfare.

When the first surveys proved feasible and their analyses fruitful, policymakers and their advisors realized that data from the surveys could be very useful in policymaking. The descriptions of the welfare of the population and of the use of government services were especially valued. Some of the results from the more sophisticated studies of the determinants of welfare and the impact of policies were also valued by the operations audience, but perhaps less so than by the academic community.

The shift in motivation for the surveys is leading to changes in them and to considerable variation from country to country. Some of the content is being adjusted. Often there is a desire for the estimates of indicators to be accurate at subnational levels. This requires a much larger sample and thus calls into question whether quality and comprehensiveness can be maintained.
Attention is more often being given to developing local capacity, both for data collection and for data analysis.

CHANGING ACTORS. The cast of actors involved in implementing new surveys has changed greatly in the last five years. In the early years of the LSMS surveys, the LSMS division of the World Bank wore many hats simultaneously. It usually provided the impetus to carry out a survey in a particular country, often arranged and administered financing, provided all technical assistance, and was often the main user of the data. Now these hats are being worn by many different actors. Indeed, several LSMS-type surveys have been developed without any involvement at all from the LSMS division of the World Bank. Its various functions are being parceled out to the national planning agency, to operational staff in the World Bank, to other international agencies, or to technical assistants. Since these alternative arrangements are new and varied, institutional methods that have worked in the past may need to be modified for some tasks, such as designing a questionnaire or organizing field work and data management. The survey planner who reads this manual will need an extra dose of imagination in determining how to apply the suggestions provided here to the institutional circumstances of the particular country.

CHANGING TECHNOLOGICAL ENVIRONMENT. Many of the practical issues in survey implementation are heavily influenced by available technology. Every technological change has implications for survey management, for technical assistance and training, and sometimes for costs. Three changes in technology (improved data entry programs, more portable hardware, and computer-assisted interviewing) may affect how LSMS surveys are implemented in the future. These innovations are already in view, and more are surely developing.

Commercially supported data entry packages may soon supersede the customized data entry programs that have been used for LSMS surveys to date.
When these packages are used, the amount of technical assistance or training required in the use of the software may be reduced, but it will still be important to ensure an adequate understanding of the conceptual issues of handling hierarchical file structures and determining range and consistency checks.

Hardware has evolved to the point where it would be simple to take the data entry function on the road with the interviewers rather than having the data entry operator and computer located at a base station. This will change some aspects of the day-to-day management of field work and quality control. Such a system is being used in the survey under way in Nepal as this manual is being written.

Advances in computer technology permit a still more ambitious proposition. Interviewers can enter data directly into a portable computer during the interview, thus eliminating the paper questionnaires completely. This system has already been piloted in Bolivia and is scheduled for piloting in the Indonesia Family Life Survey in 1996. The elimination of the paper questionnaire will require fundamentally new approaches to involving all the right people in questionnaire design, organizing and supervising field work, establishing quality control systems, and managing data.

Chapter 3. Questionnaire Development

Key Messages

* The process of defining the content of the questionnaire must be driven by analysts and by policy needs.
* Formatting a questionnaire is a complex art, and proper formatting is critical to survey success. It must be done by the survey planners and not relegated to lower-level or clerical staff.
* The field test is also critical to the success of the survey.
* For advice on formulating questionnaire content, survey planners must read the cited materials, especially Grootaert (1986) and Ainsworth and van der Gaag (1988), and the new set of revised questionnaire modules that should be available in draft form sometime in 1996 and in final form in 1998.
Those who have never had to analyze data from a questionnaire they have developed themselves may think that designing a questionnaire is easy. It is not. This chapter first gives an overview of the process of developing a questionnaire. It then discusses in detail how to produce a workable format. Sections A and B are recommended for all readers, while Section C may be skimmed lightly by those who are not involved in designing a questionnaire. For those who want more background and detail on general issues in questionnaire design, UNNHSCP (1985) is a useful introductory manual.

A. Questionnaire Content

The most important issues in designing a questionnaire are the analytic objectives and the measurement techniques to be used. Indeed, these are so important that they are treated separately in other LSMS documents. LSMS standards for the objectives of the surveys and the information requirements stemming from them are explained in Grootaert (1986) and Ainsworth and van der Gaag (1988). The closely related Integrated Survey supported for African countries by the Social Dimensions of Adjustment project is described in Delaine et al. (1992). A number of LSMS working papers on measurement were written before the LSMS questionnaires were formulated. The complete list of working papers is provided in Annex III.

As this manual is being drafted, the LSMS division of the World Bank is embarking on a review of the first 10 years of field experience to determine how questionnaires might need alteration, either because of changing objectives in LSMS surveys or to improve accuracy. The first results of the planned review will be available in about 1996 and are recommended to interested readers.

B. The Process of Questionnaire Development

Perhaps the most important way to ensure a successful questionnaire is to make sure that the right kinds of people are involved in its design. The second most important thing is to allow enough time and repeated iterations in the questionnaire development process.
The third critical element is the field test.

The Actors

THE ANALYSTS. The importance of the analyst in questionnaire design cannot be overemphasized. Much of the success of the LSMS stems from the fact that the questionnaires are designed by analysts. Drafting the questionnaire and coordinating inputs from others is usually best assigned to a small group of analysts who share two characteristics. First, they should know what subjects are of policy and analytic interest to the country. Second, they should have experience using data from similar surveys on a variety of topics. The team might be composed of one person from the national planning agency, one from academia, and one person who has helped analyze or design LSMS surveys in other countries.

It is crucial for the team to have extensive local expertise when designing the questionnaire. Indeed, it is preferable for local analysts to take primary responsibility. They bring an irreplaceable knowledge of the country's society and of existing programs, and they know what issues should be emphasized. They may be familiar with earlier local surveys about some of the topics covered by the LSMS, which will help them design precoded questions. Further, they will know the network of people and institutions that should be contacted during the survey design process.

Sometimes it is also desirable to have international analysts involved in questionnaire design, especially where local analysts are not familiar with surveys that have objectives similar to those of the LSMS. The international analysts can bring experience about what has worked in other LSMS surveys and why. Judicious use of past experience, rather than blind replication of old questionnaires that are ill-adapted to present circumstances, is probably best ensured by this balance of local and foreign expertise. Most LSMS surveys have probably erred on the side of having too little local input.
Where local input has been obtained, it has often been provided by statisticians from the statistical agency (data producers) rather than by social policy analysts in government and academia (data users). The statisticians often have only limited knowledge of sectoral policy issues and programs. They can improve nomenclature and precoding, but are not necessarily qualified to help set priorities among different possible objectives.

POLICYMAKERS. In defining the basic and subsidiary objectives of the survey, the team responsible for drafting the questionnaire must seek extensive input from policymakers and program managers. The first level is to decide what important issues should be covered. This will help establish the relative weight of the different modules in the questionnaire. Then the important issues can be identified within sectors. Once these are outlined, the question writers may also have to learn a fair amount about how specific programs work. This means that interviews with technical-level people in many agencies may be needed. Once this information is available, the actual drafting of the questionnaire may begin. Box 3.1 shows how progressively greater detail is required at each level of the process.

Box 3.1: Levels of Refinement in Determining Questionnaire Content

Writing the questionnaire involves moving from knowing the importance of broad issues to getting the details of specific questions straight. This box illustrates the successive levels of detail required.

Overarching Objectives: Define the objectives: for example, to study poverty; to understand the effects of government policies on households.

Balance between Sectors: Define which issues are most important: for example, the incidence of food price subsidies; the effect of changes in the accessibility or cost of government health and education services; the effect of changes in the economic climate due to structural adjustment or transition from a centrally planned to a market economy.

Balance within Sectors: Within the education sector, define which of the following are the most important for the country and moment: the levels and determinants of enrollment, poor attendance, learning, and differences in male and female indicators; the impact of the number of years of schooling on earnings in the formal sector and agriculture, and whether and how that impact differs between them; which children have textbooks or receive school lunches or scholarships; how much parents have to pay for schooling.

Write Questions to Study Specific Issues or Programs: In a case where it is decided that it is important to study who has access to textbooks, for example, the question writer will need to know: how many different subjects are supposed to have textbooks available; whether the books given out by the government are to be given to each child individually or are to be shared; whether they are to be taken home or used only in the classroom; whether they are to be used for only one year or several; whether they are to be paid for; when the books are supposed to be available; and whether textbooks bought from bookshops are better or worse than those provided by the school.

The step of ensuring adequate communication and consultation with policymakers is usually given much less attention than it deserves. People who are unfamiliar with survey work may find it difficult to read complicated questionnaires and imagine what analyses will be possible. It is therefore preferable to show policymakers and program managers, along with the questionnaire itself, examples of tables or other analyses that could be produced from it. These may be dummy tables or examples of work done for other countries on the basis of similar questions.
Where surveys are planned to be repeated in successive years, the first year's abstract is an excellent tool for obtaining feedback from policymakers for use in the future. A complementary strategy is to ask the policymakers what they want to know. The survey designer can then translate that need into appropriate questions or modules in the questionnaire.

THE DATA PRODUCERS. Input from the data manager is essential in questionnaire design. Often the data management process can be greatly simplified by minor changes in the layout or flow of the questionnaire that do not detract from its analytic content. The data manager should comment on every draft (see Box 3.2). It is also useful to have the input of the field manager, who will notice whether the instructions to the interviewer are clear, whether the skip codes are correct, and whether the format is consistent. There is, of course, a natural tension between the analysts, who want comprehensive information, and the field manager, who is likely to see all the disadvantages but few of the advantages of a lengthy, complex questionnaire, because field workers do not usually analyze the information they collect.

A real-life story illustrates the risks of not having the right kinds of people involved in questionnaire design. The questionnaire for the first year of the Jamaica Survey of Living Conditions (1988) was devised largely by international technical assistants who knew little of Jamaican social programs. Though largely successful in accomplishing its analytic objectives and well formatted, the questionnaire ended up with three important program-specific flaws. First, the consumption section lumped together one of the main subsidized staples with one of the main nonsubsidized staples, making the study of the incidence of food subsidies cumbersome and probably inaccurate, although food subsidy policy was one of the big issues being debated at the time. Second, the reference period for receipt of food stamps was given as a month, although the stamps are received only every two months, again making it difficult to study incidence, which was an important issue at the time. Third, the education module was very similar to that used in previous LSMS surveys, where the purpose was to study the determinants of enrollment in primary school. Since primary school enrollment in Jamaica is universal, nothing very interesting was learned, and the opportunity to study issues important in Jamaica, such as daily attendance, the extent of the textbook or school feeding programs, or patterns of secondary enrollment, was missed.

Fortunately, the Jamaican version of the LSMS is conducted annually, so these flaws were corrected in the second year's questionnaire. Moreover, they were pointed out in the draft abstract from the first year. They thus served as vivid examples, to many of the people involved in managing the survey from the second year on, of the importance of their input. In the end, this pedagogic purpose was quite useful.

The Iterative Process

The process of questionnaire development is an iterative one. After an initial version is drafted, it should be reviewed in detail by the various interested parties. The next draft takes the assembled criticisms into account. This process may be repeated several times. Translations may be required (see Box 3.3). A seminar may be conducted and further revisions made. Then a field test is conducted and the questionnaire revised again. Depending on the extent of revisions, a second field test may be needed for some parts of the questionnaire.

A few concrete indicators of the extent of the revisions made at each stage may be useful. It is completely standard for the international technical assistants to write letters of 20 or more single-spaced pages pointing out imperfections in the substantive formulation or formatting of the questionnaires, even when these are on their third or fourth draft. Often two or more people will write such letters, and only about half of their remarks will overlap; the others will relate to imperfections the other reviewer has missed.

Box 3.2: Synergy in Elements of Questionnaire Design

There is great synergy between the different aspects of questionnaire design: defining analytic content, simplifying field work, and specifying the data and quality checks. The story of the mother/father questions on the LSMS roster illustrates how a single change can serve all purposes.

The traditional method of assembling a roster establishes who is the head of household and then asks for the relationship between the head and each household member. Where family structure is complex, this can require many codes, and the relationships between various individuals often remain unclear. For example, is the sister of the head of household the mother or the aunt of the nephew of the head of household?

In Côte d'Ivoire in 1985 the writer of the data entry program suggested adding separate follow-up questions after the traditional relationship-to-head question. These asked whether the spouse, the father, and the mother of each household member were in the household and, if so, what their identification codes were. The data manager suggested these questions to allow powerful consistency checks on the age, sex, marital status, and relationship-to-head-of-household variables. The change also simplified field work by reducing the complexity of the codes needed for the relationship-to-head-of-household variable.

But perhaps the most important contribution has been analytic. Knowing which household members are the parents of the children in the household has proved helpful in modeling the determinants of child welfare, especially when addressing intra-household bargaining, an issue that was hardly even on the analytic agenda at the time the innovation was made. Needless to say, this system has been recommended ever since.
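The spouse, father, and mother identification codes described in Box 3.2 are what make automatic roster consistency checks possible. The Python sketch below illustrates the idea for a single household. The record layout, the field names, and the 12-year minimum parent-child age gap are hypothetical choices made for illustration; they are not the checks actually programmed in any LSMS data entry system.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: the minimum plausible age difference between a
# parent and a child. A real survey would set this from local demographics.
MIN_PARENT_AGE_GAP = 12


@dataclass
class Member:
    pid: int                      # person identification code on the roster
    sex: str                      # "M" or "F"
    age: int                      # age in completed years
    married: bool                 # marital status collapsed to yes/no
    spouse: Optional[int] = None  # spouse's code if in household, else None
    father: Optional[int] = None  # father's code if in household, else None
    mother: Optional[int] = None  # mother's code if in household, else None


def check_roster(members: List[Member]) -> List[str]:
    """Return consistency errors for one household roster (empty if clean)."""
    by_id = {m.pid: m for m in members}
    errors = []
    for m in members:
        # Parent codes: the parent must be on the roster, of the expected
        # sex, and sufficiently older than the child.
        for role, code, expected_sex in (("father", m.father, "M"),
                                         ("mother", m.mother, "F")):
            if code is None:
                continue
            parent = by_id.get(code)
            if parent is None:
                errors.append(f"person {m.pid}: {role} {code} not on roster")
                continue
            if parent.sex != expected_sex:
                errors.append(f"person {m.pid}: {role} {code} has sex {parent.sex}")
            if parent.age - m.age < MIN_PARENT_AGE_GAP:
                errors.append(f"person {m.pid}: {role} {code} is aged {parent.age}, "
                              f"child is aged {m.age}")
        # Spouse codes: must be reciprocal and agree with marital status.
        if m.spouse is not None:
            spouse = by_id.get(m.spouse)
            if spouse is None:
                errors.append(f"person {m.pid}: spouse {m.spouse} not on roster")
            elif spouse.spouse != m.pid:
                errors.append(f"person {m.pid}: spouse {m.spouse} does not name them as spouse")
            if not m.married:
                errors.append(f"person {m.pid}: spouse in household but not recorded as married")
    return errors


if __name__ == "__main__":
    roster = [
        Member(1, "M", 45, True, spouse=2),
        Member(2, "F", 40, True, spouse=1),
        Member(3, "F", 12, False, father=1, mother=2),
        Member(4, "M", 30, False, mother=3),  # "mother" younger than child
    ]
    for error in check_roster(roster):
        print(error)
```

On this sample roster only person 4 is flagged, because the member coded as their mother is younger than they are. With only a relationship-to-head code, that error could not be detected so directly, which is the point Box 3.2 makes about the data manager's suggestion.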
When the changes are marked on the questionnaire itself, it is rare for a single page not to be marked in red ink on the first couple of drafts. The formulation of the Nicaraguan LSMS questionnaire, which was not subject to unusual difficulties, took nine months and produced a foot-high stack of different versions of questionnaires.

Appropriate input from all the actors can be sought through the aggressive pursuit of informal contacts.

Box 3.3: Translating the Questionnaire

Translation may be required for three reasons, which have different implications for logistics.

Most importantly and most commonly, the survey may need to be administered in several languages. In the many countries where more than one language is spoken, good quality control for the field work requires providing written, verbatim questionnaires in as many of the languages as practical. Research reported by Scott and others (1988) demonstrates the importance of this. They conducted an experiment to measure interviewer errors when the interviewer was asked to provide verbal field interpretations, for example, to use a questionnaire written in English to conduct an interview in Tagalog or Cebuano, or to use a French questionnaire to conduct an interview in Baoule or Dioula. Interviewer error rates were two to four times higher with oral field interpretations than with written translations of the questionnaires.

When the reason for doing translations is to administer the final questionnaire in several languages, the preliminary drafts of the questionnaire can be developed only in the official language. Ideally, the questionnaire should then be translated and field tested in each language in which it will finally be written. In practice, the field test is often done using only oral interpretations of the official-language version of the questionnaire. Thus the wording in the local-language interviews during the field test may not correspond exactly to the verbatim wording worked out later for the written translations into local languages. While imperfect, this is often viewed as a reasonable tradeoff against the difficulties of field testing in each language.

When translating questionnaires, the classic practice is to translate from the language in which they were developed into the language(s) in which they will be administered and then translate them back into the original language. After the back translation, the two versions in the first language should be compared. Where the wording or meaning differs, the translation should be adjusted. The first translation should be done by a person or group of persons familiar with the purpose of the questions. The back translation should be done by someone who was not intimately involved in designing the questionnaire, in order to avoid contaminating the interpretation with prior knowledge.

Most LSMS questionnaires are printed only in the official language(s) of the country, and multilingual interviewing teams are used for the most common local languages. In this case a few key questions or phrases may be translated into these languages and presented in the interviewer manual. For less commonly spoken languages, local interpreters may have to be used. In this, the LSMS surveys have been within the range of normal survey practice, but behind the cutting edge of quality control. The cutting edge is defined by the World Fertility Survey, which used as a guideline that questionnaires should be prepared in any language that would account for more than 10 percent of the sample and that a minimum of 80 percent of the sample should be covered with a verbatim questionnaire in the language of interview.

A second reason for translations is that international technical assistants sometimes do not speak the predominant language of the country well enough to assist in designing the questionnaire directly in the official language of the country. The Vietnamese LSMS questionnaire, for example, was developed jointly in English and Vietnamese. In the Latin American countries, in contrast, the LSMS questionnaires have usually been drafted directly in Spanish. When translation is required as part of questionnaire development, it becomes necessary to update the translation of each draft, which can require substantial time and money.

Finally, translating questionnaires from the local official language into one or more of the major international languages (English, Spanish, or French) for the international research community can help stimulate data analysis that may be of interest to local policymakers. These translations may be done after the final questionnaire is developed, and back translations can be omitted.

Questionnaires should always be worded in simple terms used in the language as commonly spoken rather than in academic or formal language. The gap between the spoken and written languages, and the difficulty of balancing simplicity and precision, may be greater in local languages, especially those that are not commonly used in writing. The translators should therefore take special care in finding an appropriate balance.

Let us illustrate the kind of problem that may occur. The question "¿Estuvo enferma en las últimas cuatro semanas?" literally asks, in Spanish, whether the respondent was sick in the last four weeks. But in common Chilean usage, it could be understood as a polite euphemism for asking whether a woman has had a menstrual period in the last four weeks. An even more difficult problem in wording was revealed in the field test in Nepal. Apparently the most natural Nepali phrasing for "have you been ill?" is closer to "have you been to the doctor?" The change from the intended meaning was revealed when several respondents answered "no, I couldn't afford to go," an inappropriate response to "have you been ill?"
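The World Fertility Survey guideline quoted in Box 3.3 boils down to simple arithmetic on the expected language mix of the sample. The Python sketch below shows one way a planner might apply it, assuming a hypothetical table of sampled households by language of interview (in practice this would have to be estimated, for example from census data). The function name and the rule of adding the next most common languages until the 80 percent floor is reached are illustrative assumptions, not part of the guideline as stated.

```python
from typing import Dict, Iterable, List, Tuple


def plan_translations(sample_by_language: Dict[str, int],
                      required: Iterable[str] = (),
                      share_threshold: float = 0.10,
                      min_coverage: float = 0.80) -> Tuple[List[str], float]:
    """Choose languages for verbatim questionnaires (hypothetical helper).

    Step 1 follows the WFS guideline: translate every language spoken by more
    than `share_threshold` of the sample. Step 2 is our own assumption: if
    coverage is still below `min_coverage`, add the next most common
    languages until the floor is met. Returns (languages, coverage share).
    """
    total = sum(sample_by_language.values())
    if total <= 0:
        raise ValueError("the sample must contain at least one household")
    # Most common languages first; sorted() keeps input order for ties.
    ranked = sorted(sample_by_language,
                    key=lambda lang: sample_by_language[lang], reverse=True)

    chosen = list(required)  # e.g. the official language(s)
    for lang in ranked:
        if lang not in chosen and sample_by_language[lang] / total > share_threshold:
            chosen.append(lang)

    def coverage() -> float:
        return sum(sample_by_language.get(lang, 0) for lang in chosen) / total

    for lang in ranked:
        if coverage() >= min_coverage:
            break
        if lang not in chosen:
            chosen.append(lang)
    return chosen, coverage()


if __name__ == "__main__":
    # Hypothetical counts of sampled households by language of interview.
    sample = {"official": 500, "local_1": 250, "local_2": 90,
              "local_3": 80, "local_4": 80}
    languages, share = plan_translations(sample, required=["official"])
    print(languages, round(share, 2))
```

With these made-up counts, the two languages above the 10 percent threshold cover only 75 percent of the sample, so the sketch adds the next most common language and reaches 84 percent coverage with three verbatim questionnaires.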
In addition to such informal contacts, it is often preferable to add formal elements to the process. One option is to create a user committee. It can have several roles:

* it provides a forum to balance diverse objectives of the survey;
* it provides a mechanism by which any interested individual or agency can make suggestions for the survey (either through their representative on the committee or by addressing the committee as a whole);
* the committee members can help facilitate access to individuals and information in their agencies that the team drafting the questionnaire requires;
* it provides one mechanism for the plans and results of the survey to be known by policymakers;⁹ and
* because the members of the committee are familiar with both the policy questions and the survey's content, they are well positioned to foster data analysis.

The user committee should not assume too much authority for the technical and day-to-day management of the survey. For example, the committee should not be involved in the details of questionnaire format or the brand of computers to be purchased by the survey organization. Instead, the committee should help set the objectives of the survey, which have implications for questionnaire content, sample design, and cost.

The user committee might be chaired by the national planning agency or co-chaired by the planning and statistical agencies. Members should come from the sectoral ministries whose interests are of greatest concern in the surveys (such as health, education, welfare, agriculture, and family or women's affairs). Members of the policy research community (universities, independent research institutes, and international development agencies) should be included as well. Ideally, the individuals who serve on the committee know about surveys, are interested in the policies to be studied by the LSMS, and come from appropriate parts of their agencies.
Where that is not possible, it may be preferable to choose them based on their interest and knowledge rather than their institutional affiliation.

A formal seminar can also be a useful tool. The presentations could explain the plans for the survey, including its objectives, the questionnaires, the sample plan, and the approach to field work. Some general background on LSMS surveys elsewhere might also be included. The presentations should clearly define which decisions have already been firmly taken and are not open to change, and on which elements feedback is sought. It is usually necessary to mention any refinements made to the draft questionnaire between the time it was circulated before the seminar and the date of the seminar. The bulk of the discussion should solicit feedback on the content of the questionnaires and the plans for analysis. The participants should include staff from all the concerned government agencies, several local research institutions, and international development agencies. A seminar has the advantage of being able to involve a larger number of people than informal meetings or a user committee. It also means that those providing feedback do not need to draft formal written comments and can provide input immediately.

9. Of course, many other complementary mechanisms should be pursued as well.

Field Test of the Questionnaire

The field test is one of the most critical steps of survey preparation. Its goal is to ensure that the questionnaires are capable of collecting the information they are supposed to collect. The LSMS field test addresses the adequacy of the questionnaires at three levels:

AT THE LEVEL OF THE QUESTIONNAIRE AS A WHOLE. Is the full range of required information collected? Is the information collected by different parts of the questionnaire consistent? Are there any unintentional double counts of some variables?

AT THE LEVEL OF INDIVIDUAL MODULES. Does the module collect the intended information? Have all major activities been accounted for?
Are all major living arrangements, agricultural activities, and sources of in-kind and cash income accounted for? Are some questions irrelevant?

AT THE LEVEL OF INDIVIDUAL QUESTIONS. Is the wording clear? Does the question allow ambiguous responses? Are there alternative interpretations? Have all responses been anticipated?

It is important to obtain good coverage of all major socioeconomic groups in the field test. For example, sampled households should include rural and urban households; individuals employed in the formal sector, individuals employed in the informal sector, and farmers; farmers in the main agroecological regions and production schemes (independent, cooperative, and wage earners); and so forth. The households should not be selected at random. Instead, different types should be purposely included so that the various situations likely to be found during the survey are observed during the field test.

LSMS field tests usually conduct interviews in about 100 households. To get enough responses for some sections of the questionnaire, it may be necessary to visit extra households and conduct only partial interviews. For example, the original 100 households may not include enough pregnant women or people who have been ill in the month before the interview to test those modules effectively. In such a case, households with pregnant women or ill people should be located and interviewed using the health module.¹⁰

10. An alternative approach is to stretch the reference periods during the field test. For instance, instead of asking "were you ill or injured during the last 30 days?" as in the actual survey, it may be expedient to ask "were you ill or injured in the past 12 months?" or "when was the last time you were ill or injured?" This approach will simplify the logistics of finding enough persons to put through the module, but will not test as precisely whether respondents have problems recalling the information, since the recall period used in the field test will be longer than that in the final questionnaire.

A field test usually takes about one month to complete. More time is required if the final questionnaires are to be produced in more than one language, because each version of the questionnaire should be field tested.

While a final large field test (100 households or so) is desirable, quite a lot can be learned from smaller tests. As a general rule of thumb, half of the problems will probably show up in the first ten households interviewed. In one recent field test, the international technical assistants found enough fodder after three households to write six pages of comments about a single module. Such small tests may be particularly appropriate for new or difficult modules, as a preamble to a fuller test of the whole questionnaire.

The personnel involved in the field test should be the core headquarters team, a few experienced interviewers or field supervisors, and the analysts who helped design the questionnaire. It may also be helpful to include persons with experience in other LSMS surveys. The people should work in teams, and each team should include representatives with each kind of expertise. The number of teams involved in the field test should be kept small. Mechanisms should be set up to allow contact between the teams during the field test so they can compare notes on the problems they encounter and the solutions they have tried. Perhaps the best way to accomplish this is to have all the teams involved in the field test work for the first few days in one of the main cities. This way the teams can be in contact each evening during the period when the first and often biggest flaws in the draft questionnaire are uncovered. Agreements can be reached on modifying the questionnaire during the field test itself. Each interview during the field test should include the respondent, the interviewer, and an analyst or senior survey specialist.
During the field test, it is acceptable to tactfully interrupt the interview in order to refine the wording of a question or the responses coded for it (in the actual survey, of course, the interviews should be conducted in private and the wording on the questionnaire respected).

The interviewers used for the field test should be drawn from the agency's pool of experienced staff. In training for the field test it is assumed that the participants are generally good interviewers, familiar with basic interviewing practices and able to distinguish between problems caused by deficiencies in the questionnaire and problems caused by their lack of familiarity with it. Training focuses on the purpose of the survey and the structure and format of the questionnaire. Usually one week is adequate.

One to two weeks at the end of the field test should be set aside to review its results and to agree on what changes are required. Essentially, the group involved in the field test should go through the questionnaires module by module and discuss any issues that arose. The inevitable concern that the interviews are too long should be tempered by the knowledge that the time required for an interview falls dramatically, usually to half or less of the time taken in the field test, once the interviewers are well trained and have become familiar with the questionnaire. It is not necessary to enter the data from the field test, since the sample is so small and non-random that it is difficult to make any decisions based on the statistics produced.[11]

The personal participation of all senior staff (including analysts) is fundamental for both the field test and its evaluation. An anecdote will illustrate this. In one country, prior to the field test, a manager in the statistics office asserted that collecting information on family assets would be impossible because respondents would fear that the information would be used for taxation.
The module was included in the field test and no unusual difficulties were encountered. But the prime opponent of the module did not witness the field test, and some of those who did participate in the field test had to miss the module's evaluation. Despite the successful field experience, the module was removed from the questionnaire, largely because key decisionmakers participated in only part of the process.

Many small changes will probably result from field testing, including changes in the wording of questions, the format of the questionnaire, and the answer codes. If major modifications are indicated for the questionnaire's structure or for the way concepts are measured, the modified questions must be re-tested. For this reason it is sometimes desirable to begin the field test with alternative versions of particularly difficult, contentious, or important modules of the questionnaire.

Ideally the community and price questionnaires should be field tested at the same time, or very nearly the same time, as the household questionnaire. This allows the analysts involved to treat the resulting information as a single body and to take into account changes in one instrument that may have repercussions for other instruments. It can also reduce travel costs, since the community and price questionnaires should be tested in a variety of locations. In fact, experience with LSMS surveys has been that field testing of the community and price questionnaires is often neglected in favor of concentrating on the household questionnaire. The community and price modules may be tested late and haphazardly, or even not at all. It is probably not coincidental that users of the data seem to have more complaints about the community and price data than about the household data. If staff time constraints make it preferable that the questionnaires not be tested all at once, it is at least important to ensure that each is tested well.

11. The questionnaires from the field test will, however, be useful fodder for testing the data entry program.
The detailed facility questionnaires have sometimes been nearly as complex as the household questionnaire. Field testing facility questionnaires is essential, and that has, in fact, been the practice. Care should be taken to visit facilities in each of the major categories expected to be of analytic interest. For example, to field test a health facility questionnaire, visits might be made to public health posts, public clinics, private doctors' offices, public hospitals, and private hospitals, each in rural and urban areas. Since facility field tests are major undertakings in their own right, it is probably best to conduct them separately from the testing of the other questionnaires.

C. Questionnaire Format

The questionnaire's format is important because it clarifies the analytic objectives. Furthermore, a good format minimizes potential interviewer and data entry errors, thus improving data quality and the timeliness with which the data are available. While some of the contents will need to be changed from country to country, almost all of what has been learned about questionnaire formats in LSMS surveys is applicable to new countries. This section discusses characteristics that should be replicated in all LSMS surveys (and, in fact, are good practice for other surveys as well).

UNITS OF OBSERVATION. The art of designing a complex survey questionnaire is, to a large extent, the art of choosing appropriate units of observation. Often this is simple: for example, sex and age are clearly attributes of individuals, while the roofing material of the dwelling is an attribute of the household. Sometimes, however, it may not be obvious what the most natural level of observation is. To collect information on animal assets for a rural household, for instance, one could choose to observe individual animals and record things like species, breed, age, and size.
Alternatively, observers could note the animal species and then ask the farmer how many of these animals are owned, what it costs to feed them, and so forth. Precise definition of units of observation is especially important in LSMS surveys because so many units are used. The Kagera Health and Development Survey, for example, uses 22 separate units of observation in the household questionnaire alone (see Box 6.1).

The choice of the unit of observation will largely be determined by the information's expected analytic use. The designer's judgment on the cost or reliability of the information obtained may also affect the choice. For example, if the objective is to study how education affects wages, income data must necessarily be collected at the individual level, since education can only be observed at the individual level. Alternatively, if the analytic objective is to describe the regional incidence of poverty, knowing income at the household level suffices. It may nonetheless be preferable to gather the information separately for each wage earner and household enterprise and then aggregate it to get household income, on the grounds that this method probably yields more accurate information than a general question on total household income.

IDENTIFIERS. Each object observed in the survey must be uniquely identified. This usually requires two or three separate codes. The first code always identifies the household. The second code identifies the individual, business, or plot of land. Sometimes a third level applies, for example, all children ever born to each woman in the household or a series of assets for each business. The importance of adequate identifiers is so obvious that it is hard to believe that mistakes can be made, but they can. In one health survey we know of, the questionnaire consisted of two sheets of paper stapled together. One had information on the household, the other on individuals. In order to facilitate data entry, the pages of the questionnaire were separated.
Unfortunately, the household identifier was not put on the page for individuals, so it was impossible to link the two parts of the survey with each other.

The identifier used for each household and to link all its data should be short, so as not to take up undue space and to avoid errors caused by copying or typing the same long code over and over. Statistical institutions around the world like to identify households by long series of numbers and letters representing the geographical location and sampling procedure. This method is cumbersome and expensive; often a dozen digits or more are used to number a few hundred households. It is better to use a simple serial number that is written or stamped on the front page of the questionnaire. This number should be appended to every datum collected for that household. Geographic location, urban/rural status, sampling codes, and so forth are, of course, important attributes and as such must be included in the variables recorded about each household, but they need not be used for household identification. Possible improvements of the household serial number idea include:

* having the number pre-printed by the print shop, which will ensure that no serial number is duplicated;
* having the serial number printed on every page of the questionnaire, so that if pages become detached they can be matched with the rest of the questionnaire; and
* using a check digit[12] on the serial number to flag errors in copying the number.

Whenever possible, the identification codes for the second or third levels of observation should be pre-printed on the questionnaire pages to which they pertain. For example, the individual identification code is printed on every page for which individual data are collected. This ensures that the codes cannot be omitted, and there are fewer opportunities for errors in copying. An example of these codes appears in the left-most column of Figure 3.1.

QUESTIONNAIRE LAYOUT.
The LSMS questionnaires are designed so that only one questionnaire is needed for each household. This contrasts with a system sometimes found in simpler surveys, in which there is one household questionnaire and a separate set of individual questionnaires. That system requires the recording of identity codes to be perfect on all questionnaires. While perfection is always sought, it is rarely achieved, and separate questionnaires create the risk of improper matching. The extent of the difficulty is illustrated by the case of the Russia Longitudinal Monitoring Survey. Care was taken to ensure accurate coding and matching, but the error introduced was non-negligible. For the first round, in the summer of 1992, there were 3 percent fewer individual questionnaires than expected based on the household questionnaires. By the summer of 1993, in the third round of the survey, the discrepancy had grown to about 9.5 percent.

A grid is required in cases where there may be more than one of a unit of analysis in a household. For example, a household includes several persons and may also have several plots of land or grow several different crops. A grid is designed for each of these cases so that questions are arranged across the top and the units of observation (people, plots, or crops) down the side. Examples are shown in Figures 3.1 and 3.4 through 3.7. Note that the identification code for the unit of observation is either printed on the left side of the grid on each page or filled in by the interviewer in the first column.[13] In the grids for individuals, lines are differentiated with alternating shaded and unshaded blocks, or by printing the questionnaire in color with a different color used for each row or block of rows. This helps the interviewer record the answer on the correct line.

12. In a code number such as 49-601-666-3, the last number is the check digit. When an algorithm consisting of a series of arithmetic operations is performed on the code number, the result should give the check digit. An example of a check digit algorithm is as follows. Each digit is multiplied by a number determined by its place in the sequence and the results are summed. This sum is divided by a specific number and the remainder is subtracted from that number. The result is the check digit. Check digit algorithms are constructed so that common coding mistakes, such as the transposing or omitting of digits, will produce the wrong check digit.

13. The number of times the interviewer has to fill in the identification codes by hand should be minimized, as this introduces the possibility of errors.

Figure 3.1: Illustration of Individual Identification and Skip Codes
[Figure: a fold-out roster (ID code, name, sex, age) aligned with Section 5, Part C (Pension, Social Security, and Unemployment), questions 1-6: pension or social security payment in the last 12 months and amount received, work for pay or family gain in the past 7 days, availability for work, looking for work, and reasons for not looking for work, with skip arrows to later questions or to the next person.]

Figure 3.2: Format When Only One of a Unit of Analysis is Observed
[Figure: Section 2, Housing, Part A, Type of Dwelling, with questions in a single column on dwelling type, number of rooms, rooms used for a household business, length of residence, sharing with non-members, and interviewer-recorded materials of outside walls, floor, roof, and windows.]
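Footnote 12 describes a weighted check digit without fixing the weights or the divisor. The following is a minimal sketch under assumed choices, not necessarily the scheme used in any LSMS survey: weights 2, 3, 4, ... counted from the rightmost digit of the serial number, and a divisor of 11. Because 11 is prime and, for serials of up to nine digits, every weight stays between 2 and 10, a single mistyped digit or a swap of two different digits always changes the weighted sum modulo 11 and therefore yields the wrong check digit. A sum that calls for a check value of 10 cannot be printed as one digit; the sketch simply treats such serial numbers as not to be issued, which is also an assumption.

```python
from typing import Optional

# Assumptions (footnote 12 leaves these open): weights 2, 3, 4, ... counted
# from the rightmost digit, and a divisor of 11.
MODULUS = 11
MAX_DIGITS = 9  # keeps every weight (2..10) below the modulus


def check_digit(serial: str) -> Optional[int]:
    """Return the check digit for a serial number, or None when the
    arithmetic calls for a value of 10 (such serials are assumed unused)."""
    if not serial or len(serial) > MAX_DIGITS or any(c not in "0123456789" for c in serial):
        raise ValueError(f"expected 1 to {MAX_DIGITS} digits, got {serial!r}")
    # Multiply each digit by a weight set by its position, then sum.
    total = sum(int(d) * w for d, w in zip(reversed(serial), range(2, len(serial) + 2)))
    # Subtract the remainder from the divisor; a remainder of 0 gives 0.
    value = (MODULUS - total % MODULUS) % MODULUS
    return None if value == 10 else value


def with_check_digit(serial: str) -> str:
    """Append the check digit, e.g. before serials are sent to the print shop."""
    digit = check_digit(serial)
    if digit is None:
        raise ValueError(f"serial {serial!r} has no single-digit check value; skip it")
    return serial + str(digit)


def is_valid(code: str) -> bool:
    """True if the last digit of `code` is the check digit of the digits before it."""
    if len(code) < 2 or any(c not in "0123456789" for c in code):
        return False
    try:
        expected = check_digit(code[:-1])
    except ValueError:
        return False
    return expected is not None and expected == int(code[-1])


print(with_check_digit("1234"))              # 12343
print(is_valid("12343"), is_valid("21343"))  # True False (first two digits swapped)
```

In this design the check digit would be computed once, when serials are pre-printed, and `is_valid` would be run at data entry to flag copied or keyed identifiers that cannot be genuine.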
Exceptionally large households sometimes have so many members that there are not enough lines in the grids for individuals. In these cases a second questionnaire for the household will be required, and care should be taken to ensure that the right household and individual numbers are used. For example, the individual numbers in the second questionnaire should be changed to start with 16 instead of 1. These cases are a potential source of errors, so to minimize them, spaces for as many individuals as practical should be put in the grids. LSMS questionnaires usually have space for 12 to 15 individuals.

Where there is only one observation for a unit of analysis, the questions pertaining to that unit are arranged in a single column down the page.[14] For example, there will usually be only one dwelling per household.[15] Questions on the quality of the dwelling or its other characteristics can follow this simple format. The first page of the section on housing expenses from the Kagera Health and Development Survey questionnaire is shown in Figure 3.2. This format is often used in the community questionnaires as well, since they often have only one observation of each unit of analysis.

CONCEPTUAL STRUCTURE OF QUESTIONNAIRE. The questionnaire is divided into several parts or modules. Each module has a unifying theme, for example, labor or durable assets. Each module also pertains to a uniform unit of observation, e.g., individuals, crops, or items of expenditure. In LSMS surveys, the roster questions are administered first, in order to establish who should be included in subsequent sections of the questionnaire. Then mini-interviews are conducted with different household members. In these, each member is asked to answer each module that applies (e.g., health, education, employment) before the next mini-interview is conducted. The order of modules within the questionnaire and of questions within modules is therefore carefully thought out.
It should aid in establishing rapport with the respondent, provide a structure to the interview that makes sense to the respondent, and aid in the logistics of field work. The relatively uncontroversial modules (housing, health, and education) are placed early in the questionnaire. The modules for which greater rapport is needed, such as savings or fertility, are put at the end of the questionnaire. The consumption modules are administered in the second visit to the cluster so that the first visit may be used to define the beginning of the recall period for the purchase of food items.

14. There may in fact be two or more columns on the page to save on paper, but the columns are not related.

15. In some cultures, there may be a number of separate tents, huts, or structures that house a single household. Consideration should be given to whether these need to be enumerated separately or whether the attributes of the housing situation as a whole are more pertinent.

FOLD-OUT ROSTER PAGE. The roster page is printed in such a way that it extends to the left of the pages that pertain to individuals in the household, with lines for individuals aligned with items on the questionnaire.[16] This has been done in four different ways in LSMS surveys, as illustrated in Figure 3.3.

* In the first method the sheets in front of the roster are shorter than the cover and the sheets behind the roster, as shown in Format 1.
* The second, most common, method is shown in Format 2. The roster sheet is folded out to extend beyond the body of the questionnaire and its covers. In either of these formats, the roster is placed behind all the pages that pertain to individuals so that it is visible whenever individual questions are asked.
* An innovation in the Kagera Health and Development Survey in Tanzania was to make the rosters removable, as shown in Format 3. This was useful because the survey was designed as a four-wave panel.
The roster was inserted in a pocket in the back of the questionnaire in the first wave of the survey. When the second wave started, the roster was removed from the first questionnaire and placed in the back pocket of the second questionnaire. In this way individuals retained the same identification code from wave to wave. A few follow-up questions guaranteed that individuals who moved in or out of the household, or were born or died between rounds, were counted appropriately. After four waves of interviews conducted over two years, none of the roster cards were lost.
* The Tunisia questionnaire is shown in Format 4. It is oriented as "portrait" (a vertical page) rather than as "landscape" (a horizontal page) and is spiral bound so it opens flat. Each questionnaire "page" is then the full 11 x 17 inches of the two-page spread. The roster folds out to the left.

There may be more than one fold-out roster per questionnaire for different units of analysis. Any time there will be several pages of questions on the same level of analysis, and especially where there are many rows on the grid, a fold-out roster will be useful. For example, rosters might be made for crops grown, or for the list of landholdings.

PRECODING. Potential responses to almost all questions are given numbered codes, and the interviewer records only the response code on the questionnaire. Most of the codes are written directly in the box where the question appears. Where the list of codes is lengthy and applies to several questions, it is placed in a special box on the border of each page where it is needed, or on the back of the preceding page (which will be visible while the interviewer is filling out the page in question). Examples of both these situations are shown in Figure 3.4. Typically only a dozen or so questions require manual coding.

16. Years after the field work, photocopies of the questionnaire for use by analysts may be made if all the original questionnaires have been used up. In these cases the roster is usually reduced to fit on a regular sheet of paper, which hides its original, very important ability to facilitate the interviewer's accuracy.

Figure 3.3: Roster Arrangements
[Figure: four roster formats (Formats 1-4) on legal (14" x 8.5"), letter (8.5" x 11"), or ISO A4 paper, including a removable roster and a double-sized spread. Notes: In all formats, choose binding to make the questionnaire open flat. ID codes appear on the roster and on each individual page. Lines on the roster must be aligned with the pages in the questionnaire.]

Precoding allows the data to be entered into the computer straight from the completed questionnaire, thus eliminating the time-consuming and error-prone step of transcribing codes onto data entry sheets. Precoding requires that choices be clear, simple, and mutually exclusive; that they exhaust all likely answers; that respondents will not all fall into the same category; and that categories will not contain too few respondents to be meaningful. Designing adequate response codes requires good knowledge of the phenomenon being studied and thorough field testing. A standard technique to ensure that the codes are mutually exclusive is to add a qualifier where more than one answer could apply, for example, "What was the main reason for dropping out of school?" Other standard qualifiers are first, last, or principal. Alternatively, spaces for several responses (i.e., several variables) can be designated, with an instruction to code all responses that apply, or the two or three most important of these.
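Some of these requirements can be checked mechanically once precoded responses have been keyed in. The sketch below is a hypothetical illustration, not an LSMS procedure: the code list is modeled loosely on the "Where did you obtain them?" question in Figure 3.4 (code 3 for landlord is our reading of the figure), and the thresholds `max_share` and `min_count` are arbitrary values chosen for the example. It flags entries that are not one of the printed codes, categories that attract almost all respondents, and categories with too few respondents to be meaningful.

```python
from collections import Counter

# Hypothetical code list, loosely modeled on "Where did you obtain them?"
# in Figure 3.4 (code 3 = LANDLORD is an assumed reading of the figure).
SOURCE_CODES = {
    1: "PRIVATE DEALER",
    2: "GOVT AGENCY",
    3: "LANDLORD",
    4: "OTHER (SPECIFY)",
}


def invalid_entries(responses, codes):
    """Return the positions of responses that are not one of the printed codes."""
    return [i for i, r in enumerate(responses) if r not in codes]


def review_categories(responses, codes, max_share=0.9, min_count=5):
    """Flag categories that hold almost everyone or too few respondents.

    max_share and min_count are illustrative thresholds, not LSMS rules.
    """
    counts = Counter(r for r in responses if r in codes)
    total = sum(counts.values())
    warnings = []
    for code, label in codes.items():
        n = counts[code]  # Counter returns 0 for codes nobody chose
        if total and n / total > max_share:
            warnings.append(f"{label}: {n} of {total} responses; categories may be too coarse")
        elif n < min_count:
            warnings.append(f"{label}: only {n} of {total} responses")
    return warnings


responses = [1] * 20 + [2] * 6 + [3] + [9]  # 9 stands for a keying error
print(invalid_entries(responses, SOURCE_CODES))  # [27]
for warning in review_categories(responses, SOURCE_CODES):
    print(warning)
```

Such warnings call for judgment rather than automatic action: a rarely used "other (specify)" category, for instance, is usually a sign that the listed codes are working.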
A standard technique to ensure that codes include all possible answers is to add an "other (specify)" code to those questions where an explicit enumeration is impossible or inconvenient. In practice the detailed answers are almost never coded in the end, and so analysis is done lumping all those who answered "other" into a single category. To increase slightly the chances that the information recorded in the "other (specify)" answers is coded, it can be helpful to record all such answers on a special page of the questionnaire where they can be found easily.

There are, of course, limits on the kind of material that can be covered even in well-designed, precoded questions. This may be less of a disadvantage than some believe. Most of the analysis of LSMS questionnaires uses sophisticated quantitative techniques into which it is difficult to incorporate the exploratory, qualitative information gathered in open-ended questions. So even if such questions are asked, the extent of their actual use would probably be low. If extensive information of an exploratory, qualitative nature is desired, a different data collection instrument, or even another research technique altogether, may be needed.

Figure 3.4: Illustration of Precoding and an Open-Ended List
[Figure: Section 9, Farming and Livestock, Part D, Expenditures on Agricultural Inputs. A box on the page border lists crop codes (fibre crops, grains, pulses, fodder crops, oilseeds, vegetables, fruits and nuts, spices, sugar crops). Questions ask whether seeds or young plants were purchased for crops cultivated in the last rabi and kharif seasons, for which crops (all crops are listed before going on), how much was spent in rupees, where they were obtained (private dealer, government agency, landlord, other), and how they were paid for (cash, credit, cash/credit, advance by landlord).]

VERBATIM QUESTIONS WITH SIMPLE ANSWERS. All questions are written out and are to be read verbatim by the interviewer. This is done to make sure that questions are asked in a uniform way, since different wordings may elicit different responses. For example, the answer a respondent gives to "Can you read?" or to "Can you read, say, a newspaper or magazine?" might be somewhat different. Other changes may subtly change the time period referred to, as in the change from "Have you worked since you were married?" to "Did you work after you were married?"
Scott and others (1988) report results from rigorous field experiments that compared schedules, in which the topic but not the exact wording of a question was given, with verbatim questionnaires. Use of the schedules produced 7 to 20 times as many errors as the equivalent verbatim questionnaire.

In wording questions, it is important to find terms that reflect the language as it is commonly spoken. Use of language that is too formal or academic will make the interview stilted and unnatural. For example, "Did you spend any time doing housework?", if necessary followed by the probe "...such as cooking, mending, doing laundry, or cleaning...", is better than "Did you spend any time engaged in domestic labor? e.g., preparing food, repairing clothes, cleaning clothes, or cleaning house...." It is occasionally tricky to find terms that are simple, short, and yet precise, but that should always be the goal.

For most questions, the interviewer reads the question aloud and marks the code for the answer given by the respondent. For example, for the question "Have you been ill in the last four weeks?" the interviewer would write down a 1 for yes or a 2 for no. For a few questions the response categories are part of the question, for example, "Is the school you attend public or private?" For a few questions where the responses may vary or be worded differently by different respondents, the interviewer will read the response categories. For example, in question 2 shown in Figure 3.5, after reading "Did you work as..." the interviewer will read the responses "permanent labor," "seasonal labor," and "casual labor." This last technique should be used as little as possible, because respondents may not listen to the full list before answering.

The answers to the questions must be kept simple. This means that additional filter questions are often used. Adding enough filter questions to ensure simple answers can make the number of questions and skips seem high.
Figure 3.5: Illustration of Case Conventions
[Figure: Section 5, Wage Employment, Part A, Employment in Agriculture (all persons 10 years and older). An upper-case instruction, "EACH MEMBER OF THE HOUSEHOLD SHOULD ANSWER FOR HIMSELF/HERSELF. IF NOT, WRITE ID CODE OF RESPONDENT BELOW," precedes lower-case questions on paid work on someone else's farm over the past 12 months (the past rabi and kharif seasons), whether as permanent, seasonal, or casual labor, days and normal hours worked, and days worked in the past 7 days.]

Attempts to shorten or simplify the questionnaire that result in complex answers are common, but should be avoided. For example, in the Ghana LSMS agricultural module, question 7 asks "Do you or the members of your household have the right to sell all or part of their land to someone else if they wish?" The precoded answers are "Yes," "No," "Only after consulting family members who are not household members," and "Only after consulting the chief or the village elders." It is not clear that the respondents would necessarily distinguish between the simple yes and the yes qualified by the need for consultation. Thus an alternative formulation might be better. The first question could be left the same, but with only simple yes/no codes. Then for those who answered yes, a second question could be asked: "Do you need to consult with anyone outside the household before selling the land?" The response codes would be "Yes" and "No." Then a third question would be asked of those who answered yes to the second: "Whom must you consult?"
The response codes for this question would be "Family member," "Village headman," and so on. This formulation makes the questionnaire longer in terms of printed pages, but probably does not increase interview time, since some sort of probing would probably have been used frequently anyway. Most importantly, it makes the interpretation of the data much clearer.

SKIP CODES. Skip codes are used extensively in LSMS questionnaires. A skip code is an instruction to the interviewer to proceed to the next appropriate question. An example is shown in Figure 3.1. In this case, if the answer to question 1 is "no," the interviewer skips question 2 and proceeds to question 3. If the answer to question 1 is "yes," the interviewer proceeds with question 2. Question 3 uses the same construction, but the instruction is to proceed to the next person. Where the skip applies only when a particular answer is given, the skip arrow is positioned in parentheses next to or below the individual response to which it applies, as in questions 1 and 3. Another kind of skip instruction is shown in question 6. There the arrow is placed in a box below all the response codes, indicating that it applies regardless of which answer was given.

There are several advantages to extensive, explicit skip codes. Interviewers do not have to make decisions themselves, nor need they remember complicated rules printed in the manual but not on the questionnaire. This helps ensure that instructions will be followed uniformly. There is no danger that inapplicable questions will be asked, which would irritate the respondent, waste interview time, and confuse analysis. Because explicit skip codes are used, the "not applicable" code is seldom required on LSMS questionnaires.

Charting the flow of questions in a flow chart can be useful both in checking the logic of the questionnaire and in training interviewers. Figure 3.6 is a flow chart of a simplified but typical health module.
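The skip codes and the flow-chart check can be sketched together in a few lines. The encoding below is hypothetical (the names `SKIPS`, `route`, and `expected_counts` are illustrative and not part of any LSMS software); the question numbers follow the simplified health module of Figure 3.6. `route` follows the skip instructions for one person, and `expected_counts` propagates a base sample through the branches. Because every skip points forward to a higher-numbered question, a single pass in question order is enough.

```python
# Hypothetical encoding of the simplified health module in Figure 3.6.
# SKIPS maps a question number to (next_if_yes, next_if_no); questions that
# do not branch use the same target twice. None stands for "NEXT PERSON".
SKIPS = {
    1: (2, 16),    # ill or injured in the last week?  NO -> Q16
    2: (3, 3),     # days unable to do usual activities
    3: (4, 14),    # was anyone consulted?  NO -> Q14
    4: (5, 5), 5: (6, 6), 6: (7, 7), 7: (8, 8),   # who, where, cost, means of travel
    8: (9, 9), 9: (10, 10), 10: (11, 11),         # travel time, travel cost, wait
    11: (12, 14),  # stayed overnight?  NO -> Q14
    12: (13, 13),  # number of nights
    13: (14, 14),  # amount paid
    14: (15, 16),  # bought medicines?  NO -> Q16
    15: (16, 16),  # spending on medicines
    16: (None, None),  # health insurance (asked of everyone), then NEXT PERSON
}
BRANCHING = {1, 3, 11, 14}  # yes/no questions that carry a skip instruction


def route(answers):
    """Questions asked, in order, for one person's answers {question: bool}."""
    asked, q = [], 1
    while q is not None:
        asked.append(q)
        yes = answers.get(q, False) if q in BRANCHING else True
        q = SKIPS[q][0] if yes else SKIPS[q][1]
    return asked


def expected_counts(base, yes_share):
    """Expected number of people reaching each question.

    yes_share maps each branching question to the share answering yes.
    Every skip points to a higher-numbered question, so one pass in
    question order propagates the counts.
    """
    counts = {q: 0.0 for q in SKIPS}
    counts[1] = float(base)
    for q in sorted(SKIPS):
        yes_target, no_target = SKIPS[q]
        share = yes_share[q] if q in BRANCHING else 1.0
        for target, n in ((yes_target, counts[q] * share),
                          (no_target, counts[q] * (1.0 - share))):
            if target is not None:
                counts[target] += n
    return counts
```

For example, `route({1: True, 3: False, 14: True})` gives `[1, 2, 3, 14, 15, 16]`. With a base of 10,000 and the low ends of the yes-ranges printed on the flow chart (`{1: 0.10, 3: 0.40, 11: 0.05, 14: 0.60}`), `expected_counts` gives about 1,000 people asked about days lost, 400 asked about consultations, 20 asked about nights in hospital, 600 asked about spending on medicines, and all 10,000 asked about health insurance, which is the check the flow chart is meant to support.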
In the flow chart, the proportion of people who answer yes at each branch is recorded, based on results from several LSMS surveys. The number of individuals who would be asked each set of questions is shown on the left, assuming a base of 10,000 individuals in the sample. The flow chart makes it easy to check that the skip patterns lead people through the module appropriately. For example, we can check that the question on health insurance is asked of all persons, not just of those who are ill. Putting the appropriate proportions on the branches also makes it easy to check whether the effective sample size is large enough to support the planned analysis. For example, very few persons will answer the questions about hospitalization, so asking more questions on that topic would not really increase the potential for analysis. When this kind of analysis is done for the questionnaire as a whole, it gives a better sense of likely interview time than the number of pages in the questionnaire, since many whole modules or sub-sections of them will be skipped by many individuals.

[Figure 3.6: Flow Chart of Health Module. Base of 10,000 individuals. Share answering yes: ill or injured in the last week, 10-45% (1,000-4,500 persons); anyone consulted, 40-80% (400-3,600); stayed overnight at the clinic or hospital, 5-8% (20-288); bought medicines, 60-90% (600-4,050). The health insurance question reaches all 10,000.]

CASE CONVENTIONS. Everything that the interviewer is to read aloud is written in lower case letters. Answer codes that are not to be read aloud, and everything that is an instruction to the interviewer, are written in upper case letters.17 This makes it easy to include instructions on the questionnaire rather than relying on the interviewers' memory of the manual or of instructions given orally during training. On the page shown in Figure 3.5, instructions to the interviewer are printed above the grid, in the first column, and below the question for question 5.

17. In languages that do not have upper and lower case, some other way of distinguishing the instructions from the questions should be found. It may be possible to use italics, bold, a different font or size of letter, or a different color of ink.

ENUMERATION OF LISTS. There are two ways of gathering information about long lists of items. The LSMS questionnaires use both, depending on the circumstances. One approach is used when it is expected that many of the items in the list will apply to most households. In this case, a line for each item is put in the grid and the label for the item is printed in the first column of the grid. This approach is used in the consumption module, as shown in Figure 3.7. Although several dozen items are included, it is expected that most households will have consumed many of them. The first question is "During the last twelve months has your household consumed any [item]?" The interviewer first goes down the whole list asking this yes/no question. Then the interviewer returns to the first item that was consumed and asks all the follow-up questions for that item before proceeding to the next item. The complete enumeration of items consumed is done before asking the follow-up questions, so that respondents will not be tempted to say that they have not consumed something in order to avoid the follow-up questions.

The other approach is useful when it is expected that only a few of many possible items will pertain to any one household.
This approach is often used in the agriculture modules. An example is shown in Figure 3.4. The grid contains lines for several crops, but these are not pre-identified. Rather, the respondent names the crops for which seeds or plants were purchased and the interviewer writes their codes into the grid. Codes are provided for 103 crops. Obviously it would not be efficient to ask about inputs for each of 103 crops when any one household will only grow a few of them.

[Figure 3.7: Illustration of a Closed-Ended List. Section 12, Food Expenses and Home Production, Part A: Food Expenses]

PROBE QUESTIONS. Where it is expected that the respondent may omit useful information, indications to probe are included in the box for that question. Sample probing questions are usually included in the interviewer manual and occasionally in the questionnaire itself. An indication to probe occurs in question 5 of Figure 3.5. Probes are often used to ensure that all items in a respondent-determined list have been included. They may also be used to ensure that the respondent's answer is categorized properly. Such probes are common in the employment section, for example to determine correctly whether the respondent is unemployed, out of the labor force, or has a second job. Interviewers are also asked to probe for answers to "how much?" questions, such as those found throughout the consumption, agriculture, and small enterprise modules. Wherever probing is expected, the training of interviewers is intensive, so that they thoroughly understand what to probe for and how to do so without distorting information.

Because the interviewer probes for information, very few answers of "don't know" are expected, and no code for "don't know" is placed on the questionnaire. In the exceptional case where sound interviewing techniques do not produce an answer, the interviewer is instructed (in the manual and in training) to write "d.k." in the space reserved for an answer code. This is then encoded in the data entry program with a special non-numeric code. The end result for analysis is much the same as having a "don't know" code for each question. However, the system discourages the interviewer from accepting "don't know" answers, because they are handled differently and show up glaringly when the supervisor reviews the questionnaire.

RESPONDENT-CHOSEN UNITS. For many questions that involve payments or quantities, respondents are left to report their answers in whichever units they find convenient. Examples of this are found in Figure 3.8. In questions 13, 17, 19, and 21, the code of the time unit in which the respondent replies is placed in the box marked "time unit." The codes are provided in a box that runs above the grid.
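Once a time-unit code is recorded next to each amount, converting to annual figures is mechanical. The sketch below is a minimal illustration under stated assumptions: the unit names, the `PERIODS_PER_YEAR` table, and the function `annualize` are hypothetical, not the codes printed in Figure 3.8. It also keeps a "d.k." answer distinct from zero, and it refuses to annualize a daily or hourly rate unless an explicit count of paid periods per year is supplied, since assuming a full working year tends to overstate the earnings of intermittent workers.

```python
# Hypothetical conversion table: how many times a payment reported in each
# time unit recurs in a year. A real survey would key this on the time-unit
# codes printed on its own questionnaire.
PERIODS_PER_YEAR = {"week": 52, "fortnight": 26, "month": 12,
                    "quarter": 4, "year": 1}
DK = "d.k."  # written by the interviewer when no answer can be obtained


def annualize(amount, unit, times_per_year=None):
    """Convert one reported payment to an annual amount.

    If the questionnaire asked explicitly how many times per year the payment
    occurs, pass that count as times_per_year; it overrides the table. Units
    with no fixed frequency (for example "day" or "hour") require it.
    A "d.k." answer returns None so it is never confused with zero.
    """
    if amount == DK:
        return None
    if times_per_year is not None:
        return amount * times_per_year
    if unit not in PERIODS_PER_YEAR:
        raise ValueError(f"unit {unit!r} needs an explicit times_per_year")
    return amount * PERIODS_PER_YEAR[unit]
```

With a wage of $510 reported per week, `annualize(510, "week")` returns 26520. A daily rate of 20 from someone who (hypothetically) reported 150 paid days gives `annualize(20, "day", times_per_year=150)`, or 3000, while `annualize(20, "day")` raises an error instead of guessing.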
Allowing the respondent to select the time unit means that transactions are expressed in the units in which they normally occur, which may differ from household to household or person to person. This avoids inaccuracies in conversion. For example, a person who is paid $510 per week can respond precisely if allowed to respond on a per-week basis. If the response must be in dollars per month, the weekly figure might be rounded to $500 for ease of multiplication by the (approximately) four weeks per month. The annualized figure thus becomes $24,000 ($500 × 4 × 12) instead of the more accurate $26,520 ($510 × 52) obtained when the respondent picks the unit and the analyst carries out the conversion. Analysis is, of course, complicated by the need to convert observations in order to annualize the data correctly. But since this is all handled by computer, the issue is really trivial. The more important issue is to ensure that, where necessary, the questionnaire asks explicitly how many times per year the payments take place. For example, a worker who reports a daily wage rate may only be employed intermittently. Multiplying the daily wage by the number of working days per year (which itself differs from country to country) is likely to overstate the worker's earnings significantly.

Another major application of flexible units is for the "quantities produced or consumed" data in the agricultural section. In Ghana, for example, 22 unit codes were used, as shown in Table 3.1. These create a more complex problem for the analyst who tries to convert quantities to a standard unit. Only about half of the units used in this example are standardized. Even some of those (minibag, maxibag) are local terms that need to be documented well for users of the data who are not familiar with farming in Ghana.18

RESPONDENT CODES. It is sometimes of interest to know who is answering a certain section of the questionnaire.
This can be accomplished by leaving a space for the respondent code next to the beginning of the string of questions to which it pertains. The interviewer fills in the identification code of the person who actually responds to the question. An example of this is shown in Figure 3.5. The idea is that a proxy respondent may give less accurate information than the individual actually involved. For example, one household member may not know the exact salary of another. Some analysts may therefore wish to identify possible biases introduced by proxy respondents or omit their responses from some analyses. Although not used in every section of every LSMS questionnaire, respondent codes could be of interest in several modules.19

18. The conversion of quantities to standard units (e.g., bunches to kilos) is not required to calculate farm income, which was the purpose of the agriculture module in the Ghana LSMS. But, as is common with such rich data sets, analysts are using the data for other purposes as well, for which it is of interest to convert to standard quantities.

19. Wages and time-use information may be more accurately reported by the person affected than by another family member. Sensitive topics, such as those relating to contraceptive use or deliberately missing school, are more accurately answered by the individual than by someone else in the household. For the sections on household expenditures, farming, businesses, or use of credit, it may be important to know who answered the questions on behalf of the whole household.

[Figure 3.8: Illustration of Respondent-Selected Units. Section 5, Wage Employment, Part B: Employment outside Agriculture (cont.), Primary Off-Farm Employment]

Table 3.1: Units of Quantity

Code   Unit
*1     POUNDS
*2     KILOGRAM
*3     TON
*4     MINIBAG
6      SHEET
7      BASKET
8      BOWL
*9     AMERICAN TIN
10     TREE
11     STICK
12     BARREL
*14    LITER
*15    GALLON
*16    BEER BOTTLE
17     BUNCH
18     NUT
19     FRUIT
20     LOG
21     BOX
22     ALL

Note: Try to use the unit codes marked with (*) whenever possible.

SAMPLING AND SURVEY MANAGEMENT PAGE. Each questionnaire should include information on the sample and the management of the data collection process. The sampling information should include the serial number for the household, any codes required to describe the sampling strata, geographic location, whether rural or urban, etc., and whether the household interviewed was the one originally selected in the sample or a replacement household (see Chapter 4 for a discussion of replacement households). Information such as an address, an approximate location with a sketch of where the dwelling is, or a phone number where possible, will aid in re-visits to the household. It is often convenient to put this on the cover page or the first page of the questionnaire.

The information on the data collection process should include the factors that may help in management of the survey or in ex post methodological investigations. For example, the code numbers for the interviewer, anthropometrist, supervisor, and data entry operator who worked on that questionnaire should appear. Any information about whether the interview was completed and the number of callbacks made should be recorded. The language in which the interview was conducted should be noted. Some of this information can be recorded on the cover page for the household as a whole. In some cases, however, the answers may be specific to individuals. For example, some members of the household may speak the official language well enough to be interviewed in that language, while other members might need to be interviewed in a local language or through interpreters.

The date when the interview was conducted should be recorded as well. This is important not only for survey management but also for important parts of the analysis. In economies with high inflation, for example, the monetary information will have to be inflated or deflated to reflect prices at a common date. This can only be done correctly when the date of interview is known.

CARDSTOCK COVERS. LSMS questionnaires are usually printed with cardstock covers. Where these have been omitted because of cost, there have been problems with the front and back pages of the questionnaire coming loose. Since the front page usually carries the key household identifier information and the back page carries the household roster, any such loss is likely to render the rest of the questionnaire useless. The cardstock covers are well worth their cost.

IDENTIFYING SECTIONS. LSMS questionnaires are very bulky. The Nepal questionnaire, for example, has 70 pages. It is therefore useful to think of some ways to make it easy to find one's way around in them.
A few ideas are listed here, but other ideas could be substituted. First, it is useful to have page numbers on the pages and a table of contents of the sections at the beginning or at the end. Second, some inexpensive graphic techniques can be used to make it easier to tell where one is in the questionnaire. Some sections of the questionnaire can be printed on different colored paper or in different colored inks. Sheets of colored paper can be inserted between major portions of the questionnaire. It is also possible to print short, dark bars at the edge of each page, with the placement on the page being the same within modules but lower down (if on the vertical edge) or further to the right (if on the bottom edge) for each successive module. Using just one or a few of these techniques will be sufficient; the questionnaire should not become too rococo.

LEGIBILITY AND SPACING. There is an art to laying out the grids for a questionnaire. The lettering must be large enough to read, which is sometimes difficult to accomplish in the compact structure of the grid. Legibility is especially important, as interviews often take place in poor lighting: outdoors at dusk, or after dark in homes dimly lit with lanterns, oil lamps, or candles. The better print quality available now that laser printers are replacing dot matrix printers has helped, but poor legibility is an ongoing complaint among interviewers. There must also be enough white space in the layout of the questionnaire. Wherever the answer will be coded later, generous space should be allowed to write out fully the information required: the person's name, the name of the school the respondent attends, the respondent's occupation, etc. In other places, judicious use of white space makes the questionnaire easier to read and less confusing than if every bit of every page were crammed with print.

SOFTWARE FOR QUESTIONNAIRE LAYOUT.
Several widely available word processing and graphics software packages are now adequate for producing questionnaire page layouts.20 Revisions between drafts can now be made much more simply and cheaply than in the days when graphic artists had to draw each page by hand. The computerized approach also simplifies translation, as the verbal parts can be overwritten in the local language, leaving the skip codes, response codes, and general format intact.

20. This was not true for the first LSMS surveys. For them, a special program called GRIDS was developed. The options available on the market have since superseded GRIDS.

Chapter 4: Sampling

Key Messages

* LSMS samples are small, generally from 2,000 to 5,000 households, to balance sampling and non-sampling errors.
* LSMS samples are designed to represent the population of the country as a whole, as well as that of certain subgroups of the population, called "analytical domains."
* LSMS samples are drawn in two stages. In the first stage, a certain number of area units called Primary Sampling Units (PSUs) are selected. In the second stage, a certain number of households, usually 16, are selected in each of the designated PSUs. Both stages are random selections.
* Two-stage sampling reduces the cost and effort of sampling and of field work compared with single-stage sampling, but at the cost of increasing the sampling error. This is a result of the so-called "cluster effect."
* The first stage of sampling requires developing a sample frame from census files. The second stage requires listing all households in the selected PSUs and then choosing a random sample of those households for the final sample.
* To derive unbiased estimates from the survey, the values observed in the sample may need to be weighted. To compute the needed raising factors and correct the sampling errors, all stages of sampling must be carefully recorded and made available to the survey analysts, both in written documents and in the survey data sets.
Many of those who work on survey implementation or who use the resulting data never learn the details of how sample designs are chosen and implemented. This chapter tries to dispel some of the mystery. Section A reviews the basics of sample design; readers who already know something about the subject may skip it. Section B explains the choices made in the usual LSMS sample design and the reasons for them; all readers should read this section. Section C provides a step-by-step guide to carrying out the sampling; readers who will not be involved in sampling may skip or skim it.

A. Overview of Issues in Sample Design

The main objectives of an LSMS are understanding the determinants of household behavior and the overall distribution of welfare. The sample design should determine the number and location of the households to be observed in a way that best achieves these goals within budgetary and organizational constraints. The following issues must be considered:

* To reliably depict the overall situation of the population, the selected sample should contain a sufficient number of households, scattered as much as possible throughout the country. However, to reduce costs, simplify management, and control the quality of the interviews, the sample size and its geographical dispersal must be kept within reasonable limits.
* The population of the country may contain certain subgroups, such as urban and rural areas or other aggregates, that deserve to be studied separately. The sample of households should adequately represent each of these subgroups as well as the country as a whole.
* Each household in the country should be given a chance to be selected in the sample. To simplify survey design and analysis, this chance should be similar for all households, or at least for all households within the same large domain.
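The last requirement, an equal chance for every household, is what the two-stage design summarized in the Key Messages delivers: PSUs drawn with probability proportional to size, then a fixed number of households per PSU. The sketch below is illustrative only; the function names and the PSU sizes in the example are hypothetical, and systematic selection from a cumulated size list is one common way to implement the first stage, not a method this text prescribes. The docstring of `household_probability` shows the algebra: the PSU size cancels, so every household's overall chance is c·m divided by the total number of households. In practice, household counts found during listing usually differ from the frame sizes, so the equality holds only approximately.

```python
import random


def systematic_pps(sizes, c, start=None, rng=random):
    """Pick c PSU indices with probability proportional to size (systematic).

    sizes: number of households per PSU in the frame, in frame order.
    start: selection start in [0, interval); drawn at random if omitted.
    Assumes no PSU is larger than the interval total / c; such PSUs would
    need to be taken with certainty and handled separately.
    """
    total = sum(sizes)
    interval = total / c
    if start is None:
        start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(c)]
    chosen, lower, i = [], 0, 0
    for index, size in enumerate(sizes):
        upper = lower + size
        # A PSU is selected when a selection point falls in [lower, upper).
        while i < c and points[i] < upper:
            chosen.append(index)
            i += 1
        lower = upper
    return chosen


def household_probability(psu_size, total, c, m):
    """Overall chance that a given household is selected.

    Stage 1 (PPS): c * psu_size / total.  Stage 2 (m of psu_size households,
    equal chances): m / psu_size.  The product simplifies to c * m / total,
    the same for every household, which makes the design self-weighting.
    """
    return (c * psu_size / total) * (m / psu_size)


def sampling_weight(psu_size, total, c, m):
    """Expansion (raising) factor: the inverse of the selection probability."""
    return 1 / household_probability(psu_size, total, c, m)
```

With a hypothetical frame of eight PSUs, `systematic_pps([120, 80, 200, 50, 150, 100, 90, 110], c=3, start=100)` selects PSUs 0, 3, and 6. Taking 16 households in each, a household in the PSU of 120 and one in the PSU of 50 both have a selection probability of 48/900 (about 0.053) and a weight of 18.75.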
Some insights into how to arbitrate among these objectives and constraints can be obtained from a quick review of four concepts: sampling error, non-sampling error, multi-stage sampling, and analytical domains.

SAMPLING ERROR. Sampling error is the error inherent in making inferences for a whole population from observing only some of its members (see Box 4.1). Sampling theory studies the behavior of sampling error for different design options. It is usually assumed that one of the variables to be observed is of particular interest, for instance, household income, unemployment, or infant mortality, and that the sample design should maximize the precision of the estimates of this variable, given cost constraints. Several good textbooks explore this complex issue, and it does not need to be treated in detail here (see the reference list in Annex II). It is important, however, to bear in mind two general conclusions of sampling theory.

First, the law of diminishing returns underlies the relationship between sample size and sampling error. Roughly speaking, and other things being equal, the sampling error is inversely proportional to the square root of the sample size. This means that, even with the best design, to reduce the error of a particular sample by half, the number of households visited must be quadrupled (see Box 4.2).

Second, the sample size needed for a given level of precision is almost independent of the total population. For instance, a 500-household sample would give essentially the same sampling precision whether it is extracted from a population of 10,000 or 1,000,000 households, or indeed from an infinite population. Some people find it hard to believe that the sample size does not depend very much on the size of the population; they feel the relationship should be more or less proportional. An intuitive grasp of this seemingly striking statistical fact can be obtained by noticing that, in order to test whether the soup is salty enough, an army cook does not need to take a larger sip from the regimental pot than a housewife needs to take from the family saucepan (see Box 4.3). This does not necessarily mean that the size of an LSMS sample is independent of the size of the country. Large countries generally require larger samples, not because they are large but because large countries tend to demand results for a larger number of internal subdivisions. India, for example, would probably require state-level data from any survey.

Box 4.1: How Wrong Will Our Estimates Be?

Reports in the press of opinion polls often say something like, "Forty-two percent of those polled said they would vote for Candidate Jones; the margin for error on this poll is plus or minus two percent." The reason for the margin for error is that in doing sample surveys we observe only some members of a population rather than the whole population. Any conclusions we draw from studying the members of the sample may be a little different from what we would learn if we could study the whole population. It is desirable to know how far from the "truth" (what we would know if we studied the whole population) our estimates (what we do know from studying only the sample of the population) may be. Of course we cannot calculate this precisely, because to calculate it precisely would require knowing the "truth." Statistical theory, however, can help us establish boundaries on how large our errors might be, and therefore how much confidence we can put in our estimates.

Suppose we want to estimate the proportion of people who smoke, using data from a sample of the population. We want some predetermined level of certainty that our estimate is not too far from the true value of the proportion. We therefore calculate a range around our estimate of the proportion. This range is called the confidence interval. The formula used in calculating the confidence interval is

$$CI = \hat{P} \pm z_{\alpha}\,\hat{e}$$

where $\hat{P}$ is the estimate of the proportion from the sample, $\hat{e}$ is the estimate of the standard error, and $z_{\alpha}$ is a constant that depends on the degree of certainty, $\alpha$, that we want. If we want to be 95 percent certain that the true value lies within the confidence interval, $z_{\alpha}$ would be 1.96. For 99 percent confidence, $z_{\alpha}$ would be 2.58.

Suppose that 28 percent of our sample smokes ($\hat{P} = 0.28$), we have an estimated standard error of 1.5 percent, and we want to be 95 percent certain that the true value lies within our estimated interval. The interval in which we have 95 percent confidence that the true value lies would be from about 25 to 31 percent of the population (that is, 28 ± 1.5 × 1.96, or 28 ± 2.94).

Obviously, we want to have the smallest practical confidence interval. The smaller the estimate of the standard error, the smaller the confidence interval. The following boxes therefore discuss factors that influence the size of the standard error. To simplify the presentation, they discuss the true standard error, $e$, rather than our estimate of it, $\hat{e}$. But the intuition is the same for both.

Box 4.2: Sampling Error and Sample Size: A Case of Diminishing Returns

For a simple illustration of the diminishing returns relationship between sample size and sampling errors, consider the case where a proportion (for instance, the proportion of households with pre-school children) is estimated from a simple random sample of n households, extracted from an infinite population. Let p be the value of the proportion in the population. The standard error is:

$$e = \sqrt{\frac{p(1-p)}{n}}$$

The table below gives the values of e for different sample sizes and p = 50%:

Table B 4.2.1
Sample size (n):      100    200    500    1000   2000   5000   10000
Standard error (e):   5.00%  3.54%  2.24%  1.58%  1.12%  0.71%  0.50%

Notice that in order to reduce the error from 5.00% to 0.50% (a tenfold reduction), the sample must be increased a hundredfold, from 100 to 10,000 households. (See Cochran 1977, Chapter 3 for more information.)

NON-SAMPLING ERRORS.
Besides sampling errors, data from a household survey are vulnerable to other inaccuracies from causes as diverse as refusals, respondent fatigue, interviewer errors, or the lack of an adequate sample frame. These are collectively known as non-sampling errors. Non-sampling errors are harder to predict and quantify than sampling errors, but it is well accepted that good planning, management, and supervision of field operations are the most effective ways to keep them under control. Moreover, management and supervision are likely to be more difficult for larger samples than for smaller ones. Thus one would expect non-sampling error to increase with sample size.21

21. See UNNHSCP (1982) for a treatment of how to minimize non-sampling error.

Box 4.3: Sample Size and Population Size

The formula in Box 4.2 is valid for simple random sampling from an infinite population. For a finite population of N households, it should be corrected as follows:

$$e^2 = \frac{N-n}{N-1} \cdot \frac{p(1-p)}{n}$$

The term $\frac{N-n}{N-1}$ is called the finite population correction, which essentially depends on the sampling fraction n/N. Table B 4.3.1 shows the sample size n that is needed to achieve a 5% standard error for a proportion p = 50% and different population sizes N:

Table B 4.3.1
Population size (N):       500    1000   5000   10000  50000  Infinite
Sample size (n):           83     91     98     99     100    100
Sampling fraction (n/N):   0.166  0.091  0.020  0.010  0.005  0.000

Notice how little the required sample size n changes between a population of 5,000 and infinity. In national household surveys the finite population corrections are so small that they are almost always ignored.

MULTI-STAGE SAMPLING. Samplers usually do not have a single complete list of households from which to draw a random sample. Even if such a list were available, a sample taken from it would entail high travel costs because selected households would be spread thinly over the entire country. Both of these problems can be diminished by using two or more stages in sampling.
In the version of two-stage sampling generally used for LSMS surveys, a certain number of small area units are selected with Probability Proportional to Size (PPS), and then a fixed number of households are taken from each selected area, giving each household in the area the same chance of being chosen.22 The area units are usually the smallest recognizable geographic units in the national census. These are usually census enumeration areas (EAs), which are aggregates of 50 to 200 households. Less often, the first stage of sampling has used administrative units such as wards, sectors, etc. Whatever their nature, these may be called Primary Sampling Units, or PSUs. However, in many countries those PSUs that are exceptionally large have been divided into segments, one of which is selected per PSU, in order to economize on household listing. The final operational area units are then a mixture of PSUs and segments. To simplify the description, it is convenient to continue using the word "PSU" for both PSUs and segments.

22. The size of an area is generally defined as the number of households in the area. Alternative size measures are the number of dwellings and the total population.

The two-stage procedure just described has several advantages. It provides an approximately self-weighted sample (i.e., each household has roughly the same chance of being selected), which simplifies analysis. It also reduces the travel time of the field teams relative to a single-stage sample, because the households to be visited are clumped together in the PSUs rather than spread out evenly over the whole country. An additional advantage of selecting a fixed number of households in each PSU at the second stage is that this makes it easy to distribute the workload among field teams.

A two-stage sample, however, will yield larger errors than a simple random sample with the same number of households, because neighboring households tend to have similar characteristics.
A sample of households drawn in two stages will therefore reflect less of a population's diversity than a simple random sample of the same size. The influence of two-stage sampling on the precision of the estimates is called the cluster effect. As would be expected, the cluster effect grows with the number of households selected in each PSU. In other words, for a fixed total sample size, a design with more PSUs and fewer households in each PSU will provide more precise estimates than a design with fewer PSUs and more households in each PSU (see Box 4.4).

The field teams will typically spend a large amount of time, and thus incur substantial costs, travelling between PSUs. Surveying each PSU also entails certain costs that do not depend on the number of households to be visited there, such as the listing operation explained below. It may therefore be tempting to reduce the cost of the survey by increasing the number of households in each PSU and reducing the total number of PSUs accordingly. However, the cluster effect indicates that this may often be a false economy.

ANALYTICAL DOMAINS. For political or policy reasons, some subgroups of the population are so important that the survey is expected to provide separate, reliable results for them. Typical examples include division into urban and rural locations and into major administrative units such as states. The subgroups do not have to be geographical aggregates, however: for instance, urban households whose head works for the public sector became an explicit field of interest in certain SDA surveys.
The design will then have to ensure a minimum sample size within each of these subgroups, which can then be called analytical domains. For large domains this may occur automatically. In other cases it may be necessary to oversample certain analytical domains and to modify the expansion factors (also called "sampling weights") accordingly. The two-stage sampling procedure is then applied independently within each of those differently weighted domains.

Box 4.4: Cluster Effects

Suppose the sample of n households referred to in Box 4.1 is selected not by simple random sampling but in two stages (m households in each of c PSUs, with n = cm) and without stratification. The formula for the standard error should then be corrected as follows:

$$e^2_{\text{corrected}} = e^2 \left[1 + \rho\,(m - 1)\right]$$

The term in brackets is called the design effect (see Kish, 1965). It represents how much larger the squared standard error of a two-stage sample is than the squared standard error of a simple random sample of the same size. ρ is the so-called intra-cluster correlation coefficient. It measures the tendency of households within the same PSU to behave alike with regard to the variable of interest. For the example in Box 4.1, this would be the tendency of households with pre-school children to be clumped in the same PSUs. ρ is almost always positive, normally ranging from 0 (no intra-cluster correlation) to 1 (all households in the same PSU are exactly alike). For many variables of interest in LSMS surveys, ρ ranges from 0.01 to 0.10, but it can be 0.5 or larger for variables such as the household's access to running water. Table B 4.4.1 gives the design effects due to clustering for various values of ρ and m:

Table B 4.4.1: Design effect, by intra-cluster correlation (ρ)
Households per PSU (m) | ρ = 0.00 | 0.01 | 0.02 | 0.05 | 0.10 | 0.20 | 0.50
5 | 1.00 | 1.04 | 1.08 | 1.20 | 1.40 | 1.80 | 3.00
10 | 1.00 | 1.09 | 1.18 | 1.45 | 1.90 | 2.80 | 5.50
20 | 1.00 | 1.19 | 1.38 | 1.95 | 2.90 | 4.80 | 10.50
50 | 1.00 | 1.49 | 1.98 | 3.45 | 5.90 | 10.80 | 25.50
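The design effect in Box 4.4 is simple enough to tabulate directly. The Python sketch below uses hypothetical function names and a normal approximation with a 1.96 multiplier. It does three things:

- reproduces entries of Table B 4.4.1;
- compares two illustrative designs with the same 3,200 households;
- recomputes the 95 percent confidence interval of 33.65 to 46.35 percent quoted in the footnote on minimum domain sizes in Section C.

```python
import math


def design_effect(rho: float, m: int) -> float:
    """Box 4.4: squared standard error of a two-stage sample with m households
    per PSU, relative to a simple random sample of the same size."""
    return 1 + rho * (m - 1)


def effective_sample_size(n: int, rho: float, m: int) -> float:
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(rho, m)


def clustered_ci(p: float, n: int, rho: float, m: int, z: float = 1.96):
    """Approximate confidence interval for a proportion from a two-stage sample,
    ignoring the finite population correction."""
    se = math.sqrt(p * (1 - p) / n * design_effect(rho, m))
    return p - z * se, p + z * se


# Rows of Table B 4.4.1.
RHOS = (0.00, 0.01, 0.02, 0.05, 0.10, 0.20, 0.50)
for m in (5, 10, 20, 50):
    print(m, [f"{design_effect(r, m):.2f}" for r in RHOS])

# Same 3,200 households, rho = 0.05: 200 PSUs of 16 versus 100 PSUs of 32.
print(round(effective_sample_size(3200, 0.05, 16)))  # 1829
print(round(effective_sample_size(3200, 0.05, 32)))  # 1255

# Footnote example: p = 40%, n = 400, m = 16, rho = 0.05.
low, high = clustered_ci(0.40, 400, 0.05, 16)
print(f"{100 * low:.2f} to {100 * high:.2f} percent")  # 33.65 to 46.35 percent
```

With ρ = 0.05, halving the number of PSUs while doubling the take per PSU cuts the effective sample size from about 1,829 to about 1,255 households. This is the "false economy" described above.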
Analysts would often also like to have sufficient sample sizes in smaller analytical groups, such as rural locations in the irrigated parts of a certain region. They may even want to carry the disaggregation further, for example to study male- and female-headed households in rural irrigated areas separately. This ideal, however, cannot be fully achieved for all possible analytical domains, because it would result in a prohibitively large total sample. Defining the most significant partitions for a sample therefore entails establishing some priorities at the design stage. Often these priorities will be dictated not by policy relevance alone, but also by local statistical folklore and geopolitical considerations.23

B. Sampling Practice in LSMS Surveys

THE BASIC SAMPLE DESIGN. The sample size for LSMS surveys has usually been small, in the range of 2,000-5,000 households (see Table 4.1). The samples are usually two-stage.24 The Primary Sampling Units are area units selected with probability proportional to size. The second-stage units are households, with a fixed number of households per PSU, normally about 16. When a partition into differently weighted domains has been defined, the two-stage sampling procedure is conducted within each of them. The number of differently weighted domains has generally been kept low, between one and four.

Decisions about the sample design for LSMS surveys have been made on a somewhat qualitative (some would even say ad hoc) basis, rather than through the application of quantitative sampling formulae, for several reasons. First, one of the overriding objectives of the LSMS was to create very high quality data sets, so great weight has been given to minimizing non-sampling error. Because the questionnaire is complex and fieldwork requires extensive supervision, the consensus has been that non-sampling error could only be kept to the desired standard by using samples in the range of 2,000-5,000 households.
As a result, survey planners decided to accept higher sampling error in exchange for lower non-sampling error. Second, taking advantage of the wealth of information that LSMS surveys provide, and addressing the complex behavioral questions that motivate the surveys, requires sophisticated multivariate analytical techniques. The precision of estimates of means from simple two- or three-way tables was therefore not deemed of overwhelming importance. Moreover, in designing the LSMS it was judged of much greater analytical interest to have a large amount of information about a relatively small number of households than a little information about a larger sample. Third, given the multiple purposes of an LSMS survey, it is hard to select one single variable for the purpose of minimizing sampling error.

23. The partition of a sample into analytical domains is akin to the concept of "sample stratification." Sample stratification, however, is generally done to improve the overall precision of the sample, rather than to study each partition separately. A stratified design that seeks to reduce the overall error usually entails oversampling the parts of the population with the largest variance. In measuring welfare, this would entail oversampling the richer parts of the population.

24. Procedures with more stages are possible and, indeed, are sometimes followed by statistical agencies. In three-stage sampling, for instance, some larger areas (such as provinces) are selected first instead of selecting small area units directly; smaller areas are then chosen only within the first-stage areas so selected. The effect is that the small area units themselves (and not just the households) become clumped rather than being spread throughout the national territory. The most serious disadvantage of multi-stage sampling is that each additional sampling stage increases the sampling error, sometimes considerably. The one frequently quoted advantage of using more than two stages is that it reduces the amount of travel between survey localities. However, this does not apply to the LSMS because of the way field work is organized: the field teams return to a local headquarters between work in each locality. When they return to the field, it is just as easy to go to any one of their assigned localities as to any other. Therefore, we do not recommend using more than two stages of sampling in LSMS surveys.

Table 4.1: Sample Design in Selected LSMS Surveys

Country | Year | Sample size (households) | Households per cluster | No. of differently weighted analytical domains | Partition criteria
Côte d'Ivoire | 1985-88 | 1,600 per year | 16 | 1 | None
Peru | 1986 | 5,120 | 10 in Lima, 16 elsewhere | 25 | Metropolitan Lima; urban/rural in 12 locations
Ghana | 1988 | 3,200 | 16 | 1 | None
Mauritania | 1987 | 1,488 | 16 | 4 | Nouakchott, other cities, rural in river areas, other rural areas
Pakistan | 1991 | 4,800 | 16 | 4 | Four provinces: Punjab, Sind, Balochistan and NWFP
Tanzania (Kagera Region) | 1992-93 | 816 | 16 | 3 | Groups defined as a function of mortality rates and geographic location
Guinea (Conakry) | 1988 | 1,728 | 8 | 1 | None, but only in Conakry urban area
Mozambique | 1991 | 1,840 | 10 | 1 | None, but only in Maputo/Matola urban area
Nicaragua | 1993 | 4,200 | 10 | 14 | Urban/rural in seven regions
Viet Nam | 1992 | 4,800 | 16 | 1 | None
Nepal | 1995 | 3,300 | 12 | 4 | Mountain, urban hills, rural hills, Terai

Note: Though the Guinea and Mozambique surveys were conducted by the Cornell University Food Security Program, their purpose and methodology are very similar to those of the World Bank surveys. This makes them interesting examples of LSMS field implementation.

HOUSEHOLDS AND DWELLINGS. The basic analytical unit of LSMS surveys is the household.
Many surveys define the household as a group of people who share a roof and a cooking pot.25 LSMS surveys often also require individuals to have been present for at least three of the past twelve months in order to be considered household members. Heads of household and newborn infants are considered members even if they have not been present that long.

The second sampling stage almost always requires a field operation called "household listing." Enumerators visit each selected PSU to update the existing maps and prepare a list of all households currently living there. Households to be interviewed are then selected from this list. In practice, it is difficult to preserve the above definition of a household during this operation, because doing so would entail time-consuming interviewing throughout each PSU. Dwellings are therefore listed instead of households. A dwelling is defined as "a group of rooms or a single room occupied or intended for occupancy as separate living quarters by a family or some other group of persons living together, or by a person living alone."26 A listing of dwellings has two advantages: it is quicker to complete, and it is more permanent than a listing of households. Strictly speaking, therefore, LSMS samples are samples of dwellings rather than of households, though the listing operation is still traditionally called "household listing" rather than "dwelling listing."27 Some dwellings may be unoccupied and some may be occupied by two or more households, but the large majority of dwellings are occupied by a single household. (The average number of households per dwelling ranges from 0.9 to 1.1 in most countries.) If a dwelling with two households is selected in the sample, both are interviewed separately.
NON-RESPONSE AND HOUSEHOLD REPLACEMENT. Some households selected for the sample will not be interviewed, for one of the following reasons:

- the interviewer cannot locate the dwelling;
- the dwelling is uninhabited;
- the dwelling's residents are away from home and expected to remain so until after the end of the survey period in that area; or
- the residents refuse to be interviewed.

Non-responding households cannot be considered a random sample of all households. Non-response rates are always higher in urban than in rural areas, and higher in rich households than in poor households. They also tend clearly to decrease as the survey proceeds and the field staff become more experienced and persuasive. Surprisingly, refusal does not seem to be related to the length of the questionnaire, but to the unwillingness of certain people to be interviewed at all.28

25. For a discussion of the concept of household and its variants, and details on the operational definitions used by various UN agencies, see UNNHSCP (1989).

26. Kish (1965).

27. The confusion in terms is further complicated by the fact that in regions without street addresses and house numbers, dwellings are usually identified by the name of the head of the household currently living there.

There is a lot of controversy about what should be done about non-response. Some survey practitioners try to achieve the planned sample size by replacing the refusals with other households. Other specialists condemn these efforts as sterile, arguing that the resulting sample of non-refusals will still be biased by definition. Neither replacing nor failing to replace non-responding households solves the essential problem of bias. Everybody therefore agrees that all efforts should be made to keep non-response to a minimum, and that the choice of replacements, if any, should not be left to the interviewers, lest the result be a sample of "easy-to-interview" households.
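The reasons for non-response listed above have different implications, so it can help to tabulate them separately and by interviewer. The following Python sketch is a minimal illustration. It assumes hypothetical outcome codes for the four reasons and one record per selected household, and it does not reproduce any official LSMS program.

```python
from collections import Counter, defaultdict

# Hypothetical outcome codes: "interviewed" plus the four reasons for
# non-response listed in the text.
NON_RESPONSE = {"not_located", "uninhabited", "away", "refused"}
VALID_OUTCOMES = NON_RESPONSE | {"interviewed"}


def rates_by_interviewer(visits):
    """visits: iterable of (interviewer_id, outcome) pairs, one per selected household.

    Returns {interviewer_id: (non_response_rate, refusal_rate)}, both expressed
    as fractions of the households assigned to that interviewer.
    """
    counts = defaultdict(Counter)
    for interviewer, outcome in visits:
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome code: {outcome!r}")
        counts[interviewer][outcome] += 1

    rates = {}
    for interviewer, tally in counts.items():
        total = sum(tally.values())
        non_response = sum(tally[code] for code in NON_RESPONSE)
        rates[interviewer] = (non_response / total, tally["refused"] / total)
    return rates


# Hypothetical records for two interviewers with 16 households each.
visits = (
    [("A", "interviewed")] * 14 + [("A", "refused")] * 2
    + [("B", "interviewed")] * 15 + [("B", "away")]
)
for interviewer, (nr, refusal) in sorted(rates_by_interviewer(visits).items()):
    print(f"{interviewer}: non-response {nr:.1%}, refusals {refusal:.1%}")
```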
The solution adopted by LSMS surveys is pragmatic. It is based on the principle that interviewers should not be "rewarded" by having less work to do when a household does not respond. Non-responding households are replaced by other randomly selected households, using an explicit procedure that is explained in the next section of this chapter. All the details of this process (including the codes of the replaced and the replacing household and the reasons for replacement) are documented, both in the questionnaires and in the computer files. Each analyst can then decide individually whether or not to include the replacement households in the data sets being analyzed.

The survey managers should carefully monitor all replacements, especially those caused by refusal. Many surveys have demonstrated that refusal rates can be reduced to a minimum, since refusals often depend on the interviewers' attitudes and experience. There is empirical evidence that individual interviewers usually have very different refusal rates. It is useful to stress this to interviewers while monitoring the refusal rate of each interviewer.

Refusals and replacements have been relatively low in LSMS surveys. In the Mozambique survey, out of the first 560 households visited, only seven were not those originally selected and only three refused to be interviewed. This trifling number is all the more remarkable in a country at war. In Côte d'Ivoire, the non-response rate was 7.8 percent in the first year, of which 1.4 percent was refusals. In Peru (1985) the non-response rate was 17.4 percent, with a 1.4 percent refusal rate. The overall non-response rate during the first month of the Romania survey was 7 percent, though it reached 18 percent in some neighborhoods of Bucharest.

28. This is worth remembering when it is necessary to defend the rich content of the LSMS questionnaire against those who insist that it is unmanageably long.

C. Implementing a Sample Design

Determining the Basic Sample Design Parameters

As explained above, the decisions about the basic sample design parameters (the number of households in total, per PSU, and per analytical domain) are qualitative judgments based on past experience and on estimates of cost and manageability. The decisions about the basic sample for an LSMS generally follow these steps:

(1) A preliminary estimate of the total sample size is established. As explained above, the sample rarely exceeds about 5,000 households. It may be much smaller if a single analytical domain is required, or because of constraints on the budget or implementation capacity.

(2) Using data from the most recent census, this sample is distributed in proportion to the total number of households in the major regions, urban and rural locations, etc. In other words, the option of using a constant sampling fraction throughout the country (i.e., a self-weighted national sample) is taken as a starting point.

(3) If the sample seems insufficient for some particular analytical domains (fewer than, say, 300 to 400 households),29 the sample size may be increased in these domains and decreased in other domains.

While implementing step (2), parts of the population may be purposely excluded from the sample because of their inaccessibility or for security reasons. This happened in Peru, where in 1985 three provinces were controlled by guerrillas and/or drug dealers. It also happened in Pakistan, where the most remote parts of Balochistan were extremely hard to reach.30 Likewise, the Mauritania survey excluded the nomadic population. In these cases the survey is explicitly designed to represent only the rest of the country.

29. There is no rigorous, quantitative justification for using this particular number. Rather, a wide variety of analyses of different types on different variables have converged upon it as a reasonable ballpark figure. Analysts complain loudly when the numbers get much below this threshold, but are often reasonably content above it. Consider a variable with a proportion of 40 percent (for example, the percentage of households with pre-school-aged children). Ignoring the finite population correction, and assuming a typical LSMS take of 16 households per cluster and an intra-cluster correlation of 0.05, a 400-household sample gives a 95 percent confidence interval ranging from 33.65 to 46.35 percent. This underscores the need for caution in reporting results for very small subsets of the population.

30. The decision to exclude remote areas from the sample has to be considered carefully, though. These areas are often very large and tend to be frontier regions that are important to national politics (e.g., the Amazon basin in Brazil or the Chaco region in Paraguay), so the survey may look bad in the eyes of policy makers if they are excluded. However, these areas also tend to be so sparsely populated that, if included, only a few clusters will be selected for the sample, and the extra cost of visiting them would be manageable.

Step (3) may have to be repeated a few times until a satisfactory partition is achieved. The resources needed to conduct interviews can vary significantly across the territory: interviews are usually more expensive in rural areas and in the most isolated parts of the country. It is therefore useful and instructive to explore the alternative options with a spreadsheet that takes their budgetary and logistical implications into account. As a general guideline, we believe it is better to keep the number of partitions imposed in this way to a minimum, and to keep their sampling fractions as close as possible, so that the total sample does not differ too much from a self-weighted national sample. Reasonable statisticians and econometricians hold varying opinions of the theoretical virtues of self-weighting, but we are much swayed by more pragmatic issues.
The more complicated the sample design, the more often the sampler will make mistakes in executing the sampling, and the less often others will be able to detect them. There is also a long history of sampling weights being lost, incorrectly calculated, or omitted or misused in analysis. Self-weighted samples are much more robust to this kind of error than more complicated designs.

In a self-weighted sample, the proportions and averages obtained from the sample are unbiased estimates of the proportions and averages in the population. However, when adjustments are made in step (3), the sampling fractions will differ across analytical domains, and the sample will no longer be self-weighted. The households will then need to be weighted differently to obtain unbiased estimates. Let $N_k$ be the total number of households in the population of domain k and $n_k$ the number of households sampled in domain k. The weight $w_k$ to be applied to the values from that domain is

$$w_k = \frac{N_k}{n_k}$$

Note that $w_k$ is the inverse of the selection probability of each household in domain k. As with all sampling information, the basic set of weights resulting from this step of sample design (also known as expansion factors or raising factors) should be carefully documented and made available to the survey analysts.

The number of PSUs to be sampled is determined by the total sample size and the number of households to be interviewed in each PSU. The latter depends on both theoretical and practical considerations. On the one hand, the number of households per PSU affects the precision of the sample, as explained above in the discussion of cluster effects. On the other hand, it is a function of the length of the interviews, the number of interviewers in each team, and the time each team spends in the PSU. Typically, each field team visits 20 PSUs per year, spends two weeks in each PSU, and interviews 16 households in each. In some LSMS surveys, however, as few as 10 or as many as 24 households per PSU have been selected.
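The allocation steps and the weighting formula above can be sketched together. The Python below is a minimal illustration under stated assumptions:

- the domain names and census counts are hypothetical;
- taking households away from the larger domains in proportion to their size is only one simple rule among many;
- allocations are rounded to whole PSUs of m households.

The sketch also checks that, within a domain, the PPS first stage and the fixed take of m households give every household the same probability $n_k/N_k$, whose inverse is $w_k$.

```python
import math


def allocate(census_households, total_sample, m, minimum):
    """Steps (2) and (3): households to interview in each analytical domain.

    Starts from a proportional (self-weighted) allocation, raises any domain
    below `minimum` to the minimum, and shares the remaining sample among the
    other domains in proportion to their size. Results are rounded to whole
    PSUs of m households. Single pass: re-check the output, since shrinking
    the other domains could push one of them below the minimum as well.
    """
    grand_total = sum(census_households.values())
    proportional = {k: total_sample * N / grand_total for k, N in census_households.items()}
    small = {k for k, n in proportional.items() if n < minimum}
    remaining_sample = total_sample - minimum * len(small)
    remaining_households = sum(N for k, N in census_households.items() if k not in small)
    allocation = {}
    for k, N in census_households.items():
        n = minimum if k in small else remaining_sample * N / remaining_households
        allocation[k] = m * round(n / m)  # whole PSUs of m households
    return allocation


def weights(census_households, allocation):
    """w_k = N_k / n_k, the inverse of each household's selection probability."""
    return {k: census_households[k] / n for k, n in allocation.items()}


def selection_probability(c_k, m, psu_size, N_k):
    """Two-stage probability for a household in a PSU of psu_size households:
    c_k PPS draws out of N_k households, then m households out of psu_size."""
    return (c_k * psu_size / N_k) * (m / psu_size)


# Hypothetical census counts for three domains.
frame = {"Capital": 150_000, "Other urban": 450_000, "Rural": 1_400_000}
alloc = allocate(frame, total_sample=3_200, m=16, minimum=400)
print(alloc)  # {'Capital': 400, 'Other urban': 688, 'Rural': 2112}
print(weights(frame, alloc))  # Capital 375.0; the others about 654 and 663

# In the capital (25 PSUs), small and large PSUs give the same probability, 1/375.
print(selection_probability(25, 16, 80, 150_000), selection_probability(25, 16, 300, 150_000))

psus = sum(alloc.values()) // 16
print(psus, "PSUs; about", math.ceil(psus / 20), "teams at 20 PSUs per team-year")
```

For comparison, a self-weighted allocation of 3,200 households out of 2,000,000 would give every household a weight of 625. Once the capital is oversampled, its weight falls to 375 while the other domains rise to about 654 and 663, so these weights must be carried into every analysis.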
Implementation of the First Sampling Stage

THE SAMPLING FRAME. Implementation of the sample begins with the sample frame, the complete list or file of units from which the sample units are selected.31 To develop a sample frame from census data, it is important to obtain a computer-readable list of all PSUs, with a measure of size recorded for each of them, such as the number of households, the number of dwellings, or the population.32 All statistical agencies must eventually process this information in order to obtain the classic census tabulations for larger geographic aggregates, but the preparation of the PSU list as a separate by-product is often forgotten. When the list is not available, the data must be compiled and put into a computer file as quickly as possible. This should not take more than a few weeks, and the list usually fits on one diskette; it does not require that all data from the census be entered or analyzed.

Only the total number of households or dwellings in each PSU is really needed, but the list will probably also include the total population of each PSU, broken down by sex. This information should be entered into a spreadsheet like the one shown in Figure 4.1. If the sample considers differently weighted domains, the procedure described here should be applied independently within each of them (i.e., the sample frame data should be entered in a separate spreadsheet for each domain). The spreadsheet contains one line for each PSU, with columns for descriptive information: the province, district (or whatever administrative hierarchies are used locally), PSU number, population, number of males, number of females, and number of households or dwellings.

31. For an extensive discussion of sample frames, see UN (1986).

32. At least minimally sufficient census information has been available in most LSMS surveys. One exception was the Conakry 1988 survey.
There the last colonial census had recorded some 50,000 people in the city, which had grown to about 1 million by 1988. This situation was resolved with a special cartographic operation and a subsequent area sampling procedure. It does not need to be described further, because it is unlikely to be necessary in other countries. The current wave of LSMS surveys will benefit from the 1991-1993 wave of national censuses, which provide census data for most countries.

Figure 4.1: List of First Stage Sampling Units (spreadsheet rows 1-3 hold the column headings)

Row | A Province | B District | C PSU | D Population | E No. of males | F No. of females | G No. of households
4 | 1 | 1 | 1 | 365 | 180 | 185 | 62
5 | 1 | 1 | 2 | 262 | 143 | 119 | 43
6 | 1 | 1 | 3 | 357 | 172 | 185 | 58
7 | 1 | 1 | 4 | 503 | 267 | 236 | 71

After all the data have been entered, and before proceeding any further, a series of checks should be carried out to ensure that no PSUs have been omitted from the listing and that all the data are correct. These tests are relatively easy to implement within the spreadsheet, and may include the following:

(i) The total population in each PSU should equal the number of males plus the number of females in the PSU.
(ii) The masculinity rate (number of males as a percentage of the number of females) in each PSU should be within reasonable limits (e.g., between 80 and 120 percent).
(iii) The average household size in each PSU should be within reasonable limits (e.g., between 3 and 10 persons per household).
(iv) The total number of PSUs and households, as well as the totals by sex in each administrative unit, should be consistent with the other information available from the statistical agency.

The list should also be scanned to make sure that the PSUs are not too small. Small PSUs may be too homogeneous, and some of them could even be too small to select the required number of households in the second stage.
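Checks (i) to (iii), the totals needed for check (iv), and the merging of small PSUs with their neighbors (described next) can also be scripted outside the spreadsheet. Below is a minimal Python sketch. It assumes the frame is held as a list of dictionaries whose field names are hypothetical and mirror Figure 4.1, and it uses the illustrative limits given above.

```python
# Field names are hypothetical; they mirror the columns of Figure 4.1.
FIELDS = ("province", "district", "psu", "population", "males", "females", "households")
COUNTS = ("population", "males", "females", "households")

# The four PSUs shown in Figure 4.1.
FIGURE_4_1 = [dict(zip(FIELDS, row)) for row in [
    (1, 1, "1", 365, 180, 185, 62),
    (1, 1, "2", 262, 143, 119, 43),
    (1, 1, "3", 357, 172, 185, 58),
    (1, 1, "4", 503, 267, 236, 71),
]]


def check_psu(row, masculinity=(80, 120), household_size=(3, 10), min_households=30):
    """Apply checks (i)-(iii) and the minimum-size rule; return a list of problems."""
    problems = []
    if row["population"] != row["males"] + row["females"]:  # check (i)
        problems.append("population != males + females")
    if row["females"] > 0:  # check (ii): males as a percentage of females
        rate = 100 * row["males"] / row["females"]
        if not masculinity[0] <= rate <= masculinity[1]:
            problems.append(f"masculinity rate {rate:.1f}")
    else:
        problems.append("no females recorded")
    if row["households"] > 0:  # check (iii): persons per household
        size = row["population"] / row["households"]
        if not household_size[0] <= size <= household_size[1]:
            problems.append(f"household size {size:.1f}")
    if row["households"] < min_households:  # too small for the second stage
        problems.append(f"fewer than {min_households} households")
    return problems


def totals_by(rows, keys=("province", "district")):
    """Totals per administrative unit, to compare with agency figures (check iv)."""
    totals = {}
    for row in rows:
        unit = tuple(row[k] for k in keys)
        t = totals.setdefault(unit, {"psus": 0, "households": 0, "males": 0, "females": 0})
        t["psus"] += 1
        for field in ("households", "males", "females"):
            t[field] += row[field]
    return totals


def merge_small_psus(rows, min_households=30):
    """Append PSUs with fewer than min_households to the next PSU in code order."""
    def combine(a, b):
        merged_row = dict(a, psu=f"{a['psu']}+{b['psu']}")
        for field in COUNTS:
            merged_row[field] = a[field] + b[field]
        return merged_row

    merged, pending = [], None
    for row in rows:
        unit = dict(row) if pending is None else combine(pending, row)
        if unit["households"] < min_households:
            pending = unit  # still too small: carry it forward to the next PSU
        else:
            merged.append(unit)
            pending = None
    if pending is not None:  # a short group at the end joins the previous unit
        if merged:
            merged[-1] = combine(merged[-1], pending)
        else:
            merged.append(pending)
    return merged


for row in FIGURE_4_1:
    print("PSU", row["psu"], check_psu(row) or "ok")
print(totals_by(FIGURE_4_1))

# Hypothetical frame in which PSUs 2 and 3 are too small.
example = [dict(zip(FIELDS, row)) for row in [
    (1, 1, "1", 365, 180, 185, 62),
    (1, 1, "2", 60, 31, 29, 12),
    (1, 1, "3", 44, 20, 24, 9),
    (1, 1, "4", 357, 172, 185, 58),
]]
print([(u["psu"], u["households"]) for u in merge_small_psus(example)])
```

Run on the four PSUs of Figure 4.1, the masculinity check flags PSU 2 (143 males per 119 females, about 120.2 percent), just outside the illustrative 80-120 range. A flag is a prompt to verify the census figures, not proof of an error. The merge assumes the rows are in code order within a single district, so that consecutive PSUs are neighbors.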
PSUs smaller than 30 households should be appended to one of the neighboring PSUs. This is made easier by the fact that statistical agencies generally number the PSUs according to some geographical pattern, so that two PSUs with sequential codes will be neighbors. For example, when the sample frame was being developed for an LSMS planned for Paraguay, almost all PSUs in urban areas were smaller than 10 households, and an ad hoc computer program was written to create larger aggregates.

SELECTING PSUs. After the sample frame has been reviewed, the actual selection of the sample of PSUs to be visited by the survey can proceed. The method for making this random selection with PPS is explained below. Here we assume that the number of households is used as the measure of PSU size; the same method would apply if some other reasonable measure of PSU size were used.

Another column must be added to the spreadsheet for the cumulative number of households. This column will contain the total number of households up to and including the PSU on each line, as in column H in Figure 4.2. The last line in column H will therefore contain the total number of households.33

Figure 4.2: Cumulative Totals in the List of First Stage Sampling Units

Row | A Province | B District | C PSU | D Population | E No. of males | F No. of females | G No. of households | H Cumulative no. of households
4 | 1 | 1 | 1 | 365 | 180 | 185 | 62 | 62
5 | 1 | 1 | 2 | 262 | 143 | 119 | 43 | 105
6 | 1 | 1 | 3 | 357 | 172 | 185 | 58 | 163
7 | 1 | 1 | 4 | 503 | 267 | 236 | 71 | 234
... | ... | ... | ... | ... | ... | ... | ... | ...

The complete spreadsheet should be printed and kept for reference. Selecting PSUs with PPS can be done manually on the printout or automatically with the spreadsheet. For the sake of simplicity, the manual procedure is described here.

First, divide the total number of households by the number of PSUs to be selected and round the result to the nearest whole number. Call this number "SI" (the sampling interval):
$$\mathrm{SI} = \frac{\text{Number of households}}{\text{Number of PSUs to be selected}}$$

For instance, if the number of households is 200,000 and 184 PSUs are to be selected, then SI = 200,000 / 184 = 1,087 (after rounding).

Second, using a table of random numbers or a scientific pocket calculator, obtain a random number between 1 and SI. If a calculator is used, obtain a random number between 0 and 1, multiply it by SI, add 1, and drop the decimals. Call this number "RS" (the random start). Assume, for instance, that RS turns out to be 127.

Third, write a sequence of 184 numbers, starting with RS and repeatedly adding SI. With the above values of RS and SI, this sequence would start like this:

127
127 + 1,087 = 1,214
1,214 + 1,087 = 2,301
2,301 + 1,087 = 3,388
...

Fourth, starting with the first number in the sequence, scan the printout of the PSU list for the first PSU where the "Cumulative no. of households" is equal to or larger than this number. This PSU is selected for the sample. In the example above, the first number in the sequence is 127. Scanning the PSU list, the first and second PSUs should be skipped, because their cumulative numbers of households (62 and 105) are less than 127. The cumulative number of households for the third PSU, however, is 163, which is greater than 127. PSU number 3 in District 1 of Province 1 therefore becomes the first PSU selected in the sample (see Figure 4.3).

Figure 4.3: Selecting the First Stage Sampling Units

Row | A Province | B District | C PSU | D Population | E No. of males | F No. of females | G No. of households | H Cumulative no. of households | Selected
4 | 1 | 1 | 1 | 365 | 180 | 185 | 62 | 62 |
5 | 1 | 1 | 2 | 262 | 143 | 119 | 43 | 105 |
6 | 1 | 1 | 3 | 357 | 172 | 185 | 58 | 163 | yes (first number in the sequence, 127)
7 | 1 | 1 | 4 | 503 | 267 | 236 | 71 | 234 |
... | ... | ... | ... | ... | ... | ... | ... | ... |

33. Column H can easily be calculated within the spreadsheet with a simple formula. Continuing with the example in Figure 4.2, the formula G4 + H3 would be entered in cell H4 and then copied all the way down column H.
Finally, repeat the above procedure for the remaining 183 numbers in the sequence, and create a separate list of the province, district, and number of each PSU thus selected.34

34. This method is known as systematic sampling with PPS. Alternative methods for PPS selection are possible but seldom used in practice.

SORTING THE SAMPLE FRAME. The selection procedure described above will almost certainly result in a sample of households that preserves the overall characteristics of the sample frame. In other words, the proportion of urban households in the sample, the distribution of the sample by province, and so forth, will all be statistically similar to those in the general population. However, since the selection is random, some slight deviations may occur. For instance, by sheer bad luck the sample may contain a larger proportion of northern households than the sample frame. There is a simple way of making sure that one particular distributional criterion of the households is reproduced in the sample as well as possible: sort the PSUs in the sample frame according to that criterion (north to south, for instance) before the selection.35 In many cases, the "natural" order of the sample frame (according to the encoding of administrative units) will be adequate, and no further sorting will be necessary.

SEGMENTING LARGE PSUs. The household listing operation becomes too burdensome in PSUs larger than 300 households. The problem is aggravated by the PPS procedure, which tends to bring disproportionately many of the larger PSUs into the sample. One possible solution is simply to accept that household listing will be harder and longer than usual in those cases. If the PSUs are very large, however, or if many of them are selected in the sample, it may become necessary to split them into smaller units, called segments. This needs to be done only for the large PSUs actually selected in the sample.
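The four manual steps for systematic PPS selection, together with a flag for selected PSUs large enough to need segmenting, translate into a few lines of Python. This is a sketch, not the manual's own program. It rounds SI as the text does. Because rounding can push the last numbers of the sequence slightly past the grand total, it wraps such numbers back to the start of the list; this is an assumption, since the text does not address the case. The 300-household threshold comes from the paragraph above. The frame in the example is hypothetical apart from its first four PSUs, which are those of Figure 4.2.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(sizes, n_select, random_start=None, rng=None):
    """Systematic selection with probability proportional to size.

    sizes: size measure (e.g., households) of each PSU in frame order; sort the
    frame first if implicit stratification is wanted. Returns 0-based positions
    of the selected PSUs. A PSU larger than SI can be selected more than once.
    """
    total = sum(sizes)
    si = round(total / n_select)  # first step: sampling interval, rounded
    if random_start is None:  # second step: random start RS between 1 and SI
        random_start = (rng or random.Random()).randint(1, si)
    cumulative = list(accumulate(sizes))  # like column H in Figure 4.2
    selected = []
    for j in range(n_select):  # third step: RS, RS + SI, RS + 2 SI, ...
        target = random_start + j * si
        if target > total:  # assumption: wrap around when rounding SI overshoots
            target -= total
        # fourth step: first PSU whose cumulative count is >= target
        selected.append(bisect_left(cumulative, target))
    return selected


def needs_segmentation(sizes, selected, threshold=300):
    """Positions of selected PSUs too large to list comfortably."""
    return sorted({i for i in selected if sizes[i] > threshold})


# Hypothetical frame of 200,000 households: the first four PSUs are those of
# Figure 4.2, followed by one large PSU and many of ordinary size.
sizes = [62, 43, 58, 71, 1_200] + [250] * 794 + [66]
picked = systematic_pps(sizes, 184, random_start=127)
print(picked[:3])  # [2, 4, 8]: position 2 is PSU 3, as in Figure 4.3
print(needs_segmentation(sizes, picked))  # [4]: the 1,200-household PSU
```

Implicit stratification requires nothing more than sorting the frame rows (north to south, say) before extracting the list of sizes.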
Segmentation consists of dividing the area of the PSU into pieces, only one of which is selected in the sample. Segments should have clearly defined boundaries. A rough estimate of the number of households in each segment should be made, either from recent maps or aerial photographs or by means of a "quick count" of dwellings in the field. The original PSU in the list is replaced by the segments, each with its own size measure, the measures adding up to the size of the original PSU. Then only the selected segment needs to be listed.

35. Sorting the sample frame prior to systematic selection is sometimes referred to as "implicit stratification." This method is simpler and more reliable than forcibly allocating the number of PSUs to be selected to certain categories. The latter approach is prone to subjective decisions that unnecessarily sacrifice the self-weighted character of the sample or its domains. In addition, all too often these decisions are undocumented or the documentation is lost, so that the required corrective weights cannot be used.

PLANNING THE FIELD WORK. To distribute the selected PSUs among field teams, their locations should first be plotted on a map of the country. They can then be grouped into regions of approximately equal size, spreading the workloads evenly and reducing travel time as much as possible. As a by-product of this process, the best locations for the teams' base stations are determined. Figure 4.4, for example, shows the locations of the clusters surveyed in the 1988-89 Ghana LSMS survey, the regions covered by each field team, and their headquarters.

The next step is to establish the work schedule for each team, that is, to determine in advance when each PSU will be visited. In standard LSMS surveys, household interviews are conducted throughout a 12-month period. To even out the effects of seasonality, the order in which each team visits the PSUs assigned to it should be random.36
For the Nepal LSMS, this was done by giving a serial number to each of the 275 PSUs selected in the first sampling stage. The numbers 001 to 275 were assigned to the PSUs at random. The 275 PSUs were then distributed among the 12 field teams (unevenly in this case, given the differences in accessibility within the country), and a simple sort by PSU serial number produced a work schedule for each team. Most programming languages and other software have built-in random number generators. However, using them to assign serial numbers to a group of objects in random order (a problem technically known as "random permutation," or colloquially as "shuffling") is not as easy as it seems. A short algorithm in BASIC that produces a random permutation of the first N integers is given in Figure 4.5. The algorithm can easily be implemented in other languages.

36. It is sometimes argued that such a random arrangement is too expensive, because it forces the teams to move back and forth across their territories during the year rather than visiting the PSUs in a more orderly fashion. The latter option, however, risks confusing time and space at the analytical stage. If all PSUs in an area are visited in the same months, it may be unclear whether a certain condition is due to seasonality or to some geographic characteristic. An "orderly" arrangement of the PSUs is also unlikely to be more economical in any case, because LSMS surveys are designed so that field teams come back to their headquarters between field visits, a feature explained later, in Chapter 5.

Implementation of the Second Sampling Stage

HOUSEHOLD LISTING. A list of all dwellings in each selected PSU is needed to determine which dwellings will be visited in the survey. Usually this list will have to be created or updated for the survey, though in some cases it can be borrowed from a census or from another survey. The option of borrowing an existing list should be examined critically, however, to ensure that the existing lists are recent, complete, and have good addresses. In particular, demographic mobility makes it dangerous to use lists that will be more than one or two years old by the time of the actual field work. The standard for completeness is difficult to set, but under-enumeration of five percent in the census would be worrisome, and the standard could well be stricter. The information on the list should make it easy to locate the households once they are selected. In areas with a good street address system, addresses may be sufficient. Otherwise, grid codes on census maps may be used, or references to landmarks and the name of the household head.

Figure 4.4: Assignment of Work Areas, Ghana Living Standards Survey, 1988-89 [Map: work areas assigned to the data collection teams, with team base cities, sampled enumeration areas, region and international boundaries, primary roads, and the national capital.]
.( |' ~~~~~~~~~~~kr DIOR" --' 7 OD' 6J3' 70 nt~~~~~~~~~~~~~~~~~~~~~~~~~w#os e rA. nq w nd.n0.Iin of Cuine w<t; B X 6 :-- | i - C. :: -K D 37 . 477 A$A.J F Figure 4.5: An Algorithmto Producea Random Permutationof the Integers I to N randomize timer input N dim P(N) for 1=1 to N PMI)=I K=l+int(I*rnd)) swapP(l),P(K) next The statement "dim P(M)"initiatizes array with N elements. In the an P subsequent "for ... next"loop, the arrayelements are successively given the values 1, 2, 3, .. , 1, .. , N. The element I is interchanged with one of the etements already presentin the array(K), selected random. The initiat at vaLuesare givenin the "P(I)=I" statement the interchange done with the and is "swap P(l),P(K)"statement". The statement"1K=1+int(I*rnd)rn produces a random integer from 1 to I ("rnd" K, generates random a realnumberbetween and 1 0 and 'int"takesthe integer part of a nuTber). Household listing can be carried out either as a separate field operation conductedin all PSUs before the surveystarts or by the survey teams themselves when they first arrive in each PSU. The first option is more expensivebut more reliable. The expense is incurred because each locality must be visited twice, once during the listing and then again during the survey. It may also entail some difficultyin locatingthe selecteddwellingsduring the surveybecause of the time that will pass between the listing and the survey itself. Listing as a separate exercise is more reliable than listing as part of field work because staff that are specificallytrained and devoted to listing are less likely to bias the sample by excluding the dwellings that are harder to reach. (These dwellingsare usually inhabitedby poorer householdswho have arrived in the area recently). The survey teams, working under pressure to start interviewingquickly, are moreprone to make mistakesin this regard. 
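The Figure 4.5 algorithm carries over almost line for line to Python. The sketch below also reuses it for a Nepal-style schedule: random serial numbers first, then a sort within each team. It is a minimal sketch, not the program actually used in Nepal. The function names, PSU labels, and the `team_of` mapping are hypothetical. The loop is the "inside-out" form of the Fisher-Yates shuffle. Python's standard `random.shuffle` implements the same Fisher-Yates idea, and the loop is written out here only to mirror the Basic code.

```python
import random


def random_permutation(n, rng=random):
    """Random permutation of 1..n, mirroring the loop in Figure 4.5."""
    p = [0] * (n + 1)              # p[0] is unused, so indices match P(1..N)
    for i in range(1, n + 1):
        p[i] = i                   # P(I) = I
        k = rng.randint(1, i)      # same role as K = 1 + INT(I * RND): uniform on 1..I
        p[i], p[k] = p[k], p[i]    # SWAP P(I), P(K)
    return p[1:]


def work_schedules(psus, team_of, rng=random):
    """Nepal-style schedules: give the PSUs serial numbers 1..N at random,
    then list each team's PSUs in serial-number order (the visiting order).

    psus: list of PSU identifiers; team_of: dict PSU -> team (hypothetical inputs).
    """
    serial_of = dict(zip(psus, random_permutation(len(psus), rng)))
    schedules = {}
    for psu in sorted(psus, key=serial_of.__getitem__):
        schedules.setdefault(team_of[psu], []).append(psu)
    return schedules, serial_of
```

Passing a seeded generator such as `random.Random(1995)` makes the draw reproducible, so the schedule can be documented and regenerated later.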
Also, with separate listing, the dwellings to be surveyed can be randomly selected from the lists in a single central location using reliable and uniform procedures.

The two most important characteristics of the list are that all dwellings in each PSU be included on it and that it allow the selected dwellings to be located easily.³⁷ Some practical guidelines can help attain these objectives:

* Field work should always start with a cartographic reconnaissance. The maps do not need to be very precise in terms of scale or the locations of the dwellings. They should, however, show the PSU boundaries and the landmarks used to split the PSU into smaller areas. This helps to organize the daily work of the different enumerators.
* Each enumerator should scan the assigned area in an orderly fashion, striving to keep neighboring dwellings close to each other in the list.
* As a rule of thumb, the time needed to list a PSU can be estimated from a standard daily yield that ranges from 80 dwellings per enumerator in urban areas to 50 in rural areas.
* The lists should reflect the proper concepts of dwellings and households. Enumerators should be trained to tell the difference between the two.
* Dwellings should be clearly listed with appropriate addresses so that interviewers can find them easily during the survey. Where street names and house numbers are not well established, designers should use some imagination to achieve this goal. In many surveys dwellings are numbered as part of the listing operation, either by affixing a numbered sticker to the outside of the home or by painting a number on the wall or door. At the time of writing (fall 1995), the possibility of using Global Positioning Systems (GPSs) to support the field work of future LSMS surveys is being considered. GPSs are battery-operated devices the size of a pocket calculator, currently commercially available for about $500. They use satellite signals to pinpoint the user's position with remarkable accuracy: within 10 meters or so in three dimensions (latitude, longitude, and altitude). Enumerators could use GPSs to record the dwelling locations during the listing operation, and interviewers would use them later to locate the dwellings selected for the sample.
* The complete list should always be recorded in a standard form with one line per dwelling. The list can be several pages long, depending on the size of the PSU and the number of enumerators engaged in the operation. The precise layout of such a form depends on local conditions; a typical listing form is shown in Figure 4.6.

37. The importance of listing procedures is underscored by the experience of the Côte d'Ivoire LSMS surveys. The mean household size observed in the survey dropped from 8.31 to 6.33 persons between 1985 and 1988. Close investigation of this striking phenomenon suggests that it was probably caused by a change in the listing method (see Coulombe and Demery, 1993, and Demery and Grootaert, 1993). In 1985 and 1986 shortcut procedures were used, rather than the recommended full listing of the households in reasonably sized PSUs. The implications for policy analysis of the apparently inaccurate sampling in the early years were considerable. Demery and Grootaert (1993) calculated weights to try to correct for the change in sampling procedures. They then calculated mean consumption, poverty, and a series of other important indicators using both the weighted and the unweighted data, and found substantial differences. For example, the head count estimate of poverty in 1986 fell by 14 percent when corrective weights were applied. The bias differed widely among socioeconomic groups and regions. The time series analysis of poverty was also affected: the unweighted data apparently underestimated the increase in poverty between 1985 and 1987.
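The rule of thumb for listing time reduces to a one-line calculation. The sketch below assumes that the expected number of dwellings comes from the census size measure. The dictionary and function names are hypothetical. The two rates are the endpoints quoted in the guidelines, so areas that are neither clearly urban nor clearly rural would need an intermediate value.

```python
import math

# Rule-of-thumb listing yields from the text, in dwellings per enumerator per day.
DAILY_YIELD = {"urban": 80, "rural": 50}


def listing_days(expected_dwellings, enumerators, area_type):
    """Rough number of field days to list one PSU, rounded up to whole days.

    Covers listing only; travel and the initial cartographic reconnaissance
    are not included.
    """
    per_day = DAILY_YIELD[area_type] * enumerators
    return math.ceil(expected_dwellings / per_day)
```

Take a rural PSU of about 250 dwellings listed by two enumerators. The calculation is 250 / (2 × 50) = 2.5, which rounds up to three days.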
Figure 4.6: Typical Listing Form

    Region: __________    Province: __________    Locality: __________
    Enumerator: __________    PSU Code: ______    Page: ____
    Date of the Listing: __________

    Serial | Address of the dwelling | Head of the household
    -------+-------------------------+----------------------
           |                         |

Supervision of the listing operation is crucial. Listers have an obvious incentive not to be too diligent in locating hard-to-find or remote dwellings. The office has no easily monitored criterion for judging how diligent they are being, so field supervision is key. Supervisory staff (or other listers) must revisit a subset of listed areas, especially the difficult parts of them, to verify the listing.³⁸ In some settings, lists from other sources might help in this process. For example, if the PSUs can be identified with electoral areas, voting lists might be used. Not every resident of the PSU will be on the voting list, but any address on the voting list should appear in the PSU listing.

Columns may be added to the model form in Figure 4.6 for key landmarks, the occupation of the head of the household, or any other information that could help in finding the dwelling. It may also be useful to have the enumerators fill in separate lines for buildings that are not dwellings, such as shops and offices. In that case, a special check column should be added so that the real dwellings can be told apart from the other buildings. However, only the essential information needed to identify the dwelling should be recorded. Including too much data slows the field process and risks shifting the enumerators' interest from listing to interviewing.

So far this discussion has assumed that the maps from the most recent census are available, so that the listing consists of updating the list of dwellings within well-defined boundaries. In fact, some or even all of the maps have often been lost in the intervening years.³⁹ In such cases, it is sometimes possible to reconstruct the maps. This can be done, for example, when only an occasional map has been lost and the maps for the contiguous sampling units still exist.

38. It can be especially useful to do this around dusk, when lights or smoke from cooking fires may help locate dwellings. Carrying binoculars may be useful for finding dwellings across ravines or down roads marked "no trespassing."

39. The sampling chapter in Delaine and others (1992) treats the related problem of what to do when the boundaries were poorly defined in the original maps.

Maps can also be reconstructed when the sampling units correspond to an administrative unit that the populace or officials will recognize. This is often the case, especially in rural areas. For example, a sampling unit might correspond to a ward or village. In this case there is a special detail to watch for. Say that PSU 348 was labeled Alama, the name of the ward to which it corresponds. It would seem a straightforward matter to send listers to the ward of Alama, have them establish its boundaries, and start listing. But Alama may have grown a good deal in the years since the census, and its area may have been subdivided into new wards. The central area will still be called Alama, but the new wards will have other names, say Bendicion, Caceres, Durango, and Esperanza. A lister who goes to Alama and asks where its boundaries are will be told about the new boundaries, which cover only a fraction of the original Alama. All the area covered by Bendicion, Caceres, Durango, and Esperanza would then be left unlisted. The population of these areas would effectively be excluded from the sample. The solution to these problems lies in checking with the appropriate authorities (the ministry of local government, ward officials, and so on) whether the boundaries and names have remained the same since the last mapping.
This check should be made by the statistical agency's central office for the country as a whole and then verified by the individual listers.

ADJUSTING FOR DIFFERENCES IN PSU SIZE. Differences are sure to be found between the "census" size of each PSU (the size that was used for PPS selection in the first sampling stage) and the "observed" size (from the listing operation). For instance, the listing operation of the Nepal LSMS was conducted in mid-1994, two years after the 1992 census. It showed that in 153 of the 275 selected clusters, the census and observed cluster sizes differed by more than ten percent. The ratio of census to observed size ranged from 0.23 to 3.84, with a mean of 1.06. These differences are partly due to imperfections in the census and partly due to demographic mobility. Whatever the reason, the differences alter the self-weighted character of the sample in each analytical domain. The sampling weights must therefore be corrected in order to obtain unbiased point estimates from the survey.

Assume for simplicity that the number of households was used as the measure of PSU size in the first sampling stage. Let C_i and O_i be the census and observed numbers of households in PSU i (belonging to weighted domain k). The expansion factor w_i for the households in that PSU should then be

$$ w_i = w_k \, \frac{O_i}{C_i} $$

where w_k = N_k/n_k is the basic sampling weight of domain k, defined earlier (see the section "Determining the Basic Sample Design Parameters"). The correction works in the intuitive direction. A PSU that has grown since the census (O_i > C_i) was selected with a probability that understates its current size. Each of its households therefore represents more households in the population and receives a larger weight. The formula would be slightly different if some other measure of size (such as the population or the number of dwellings) had been used in the first sampling stage. The complete list of weights w_i for all PSUs should be carefully kept and made available to analysts as part of the survey documentation and data sets. Better still, the lists of all C_i and O_i values should be kept as well.

SELECTING DWELLINGS. The dwellings to be visited are selected by systematic sampling from the PSU listings.
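The weight correction can be applied PSU by PSU. The minimal sketch below assumes household counts as the size measure and dictionaries keyed by a hypothetical PSU identifier. It computes the expansion factors w_i = w_k · O_i / C_i and the kind of discrepancy summary reported for Nepal. Measuring the ten-percent discrepancy relative to the census size is an assumption, since the text does not say which size the comparison used.

```python
def expansion_factors(w_k, census, observed):
    """w_i = w_k * O_i / C_i for every PSU i in one weighted domain k.

    w_k: basic sampling weight of the domain (N_k / n_k).
    census, observed: dicts PSU -> number of households (C_i, O_i).
    """
    return {psu: w_k * observed[psu] / census[psu] for psu in census}


def size_discrepancies(census, observed, tolerance=0.10):
    """Minimum, maximum, and mean of C_i / O_i, plus the PSUs whose observed
    size differs from the census size by more than `tolerance` of C_i."""
    ratios = [census[psu] / observed[psu] for psu in census]
    flagged = [psu for psu in census
               if abs(observed[psu] - census[psu]) > tolerance * census[psu]]
    return min(ratios), max(ratios), sum(ratios) / len(ratios), flagged
```

A PSU recorded with 80 households in the census but listed with 100 gets 1.25 times the domain weight. One whose size did not change keeps w_k unchanged.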
A few extra dwellings are also selected, to be used if replacements are needed in the field. The selection procedure is generally well known to statistical officers everywhere; it is illustrated below and in Figure 4.7. This example assumes that 16 dwellings are to be interviewed and that 4 extra dwellings are to be selected in each PSU as replacements. The exercise is to select those 20 dwellings, based on the information contained in a typical listing form such as that shown in Figure 4.6.

First, count the total number of dwellings in the PSU and record it in the space at the top of the form. Assume, for example, that there are 86 dwellings in the PSU.

Second, divide the total number of dwellings by the number of dwellings to be selected and keep the first decimal place. The result is called the sampling interval (SI) and is also recorded at the top of the form. In this example the number of dwellings to be selected is 20, so SI is 4.3 (86 / 20 = 4.3).

Third, select a one-decimal random number less than the sampling interval. In the example, this is a number from 0.0 to 4.2; it can be obtained by selecting a random integer from 00 to 42 and inserting a decimal point before the last digit. Add 1 to that random number. The result is called the "random start" (RS) and is also recorded at the top of the form. Assume, for instance, that RS turns out to be 3.2.

Write the 20 numbers obtained by starting with RS and repeatedly adding SI. With the above values of RS and SI, the 20 numbers would be:

    3.2                       41.9 + 4.3 = 46.2
    3.2 + 4.3 = 7.5           46.2 + 4.3 = 50.5
    7.5 + 4.3 = 11.8          50.5 + 4.3 = 54.8
    11.8 + 4.3 = 16.1         54.8 + 4.3 = 59.1
    16.1 + 4.3 = 20.4         59.1 + 4.3 = 63.4
    20.4 + 4.3 = 24.7         63.4 + 4.3 = 67.7
    24.7 + 4.3 = 29.0         67.7 + 4.3 = 72.0
    29.0 + 4.3 = 33.3         72.0 + 4.3 = 76.3
    33.3 + 4.3 = 37.6         76.3 + 4.3 = 80.6
    37.6 + 4.3 = 41.9         80.6 + 4.3 = 84.9

Finally, take the integer part of each number.
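The procedure above can be checked or automated with a short function. This sketch, with a hypothetical name, does all the arithmetic in tenths so that repeated additions of 4.3 cannot drift through floating-point rounding. Truncating the division to one decimal guarantees that the last number never exceeds the number of dwellings. The `random_tenths` argument is the random integer from the third step (22 stands for 2.2), so a draw made by hand can be reproduced exactly.

```python
import random


def systematic_selection(total_dwellings, n_select, random_tenths=None):
    """Sequence numbers of the dwellings chosen by the one-decimal systematic
    procedure described in the text, with all arithmetic done in tenths.

    random_tenths: the random integer 0..(SI in tenths - 1), e.g. 22 for 2.2;
    drawn at random if omitted.
    """
    si = (10 * total_dwellings) // n_select      # sampling interval in tenths: 86/20 -> 43
    if si < 10:
        raise ValueError("sampling interval below 1.0: more selections than dwellings")
    if random_tenths is None:
        random_tenths = random.randrange(si)     # 0 .. si-1, i.e. 0.0 .. SI-0.1
    if not 0 <= random_tenths < si:
        raise ValueError("random number must be smaller than the sampling interval")
    rs = random_tenths + 10                      # random start = random number + 1.0
    # RS, RS+SI, RS+2*SI, ...; the integer part is the value in tenths // 10
    return [(rs + j * si) // 10 for j in range(n_select)]


print(systematic_selection(86, 20, 22))
# -> [3, 7, 11, 16, 20, 24, 29, 33, 37, 41, 46, 50, 54, 59, 63, 67, 72, 76, 80, 84]
```

The printed list reproduces the 20 sequence numbers of the worked example (RS = 3.2, SI = 4.3).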
The 20 numbers obtained in this way (3, 7, 11, 16, 20, 24, 29, 33, 37, 41, 46, 50, 54, 59, 63, 67, 72, 76, 80, and 84) are the sequence numbers of the dwellings to be visited in the survey. The corresponding lines in the listing should be transferred to another form, called the List of Selected Dwellings (see Figure 4.7). The households to be visited during the survey are those on the sixteen unshaded lines of the form. The dwellings on the shaded lines are kept in reserve for possible replacements.

The field team responsible for the PSU will need both the full listing form with all dwellings and the list of selected dwellings during the survey. The full listing helps the team locate the selected dwellings in the field by referring to their neighbors. Because this requirement entails the risk of losing these documents, it is highly recommended to give the field teams photocopies and to file the original lists securely for at least five to ten years. The lists are precious material for central supervision. They may even be needed long after the end of the original project, for panel or follow-up surveys or as base material for other surveys conducted by the statistical agency.

REPLACING HOUSEHOLDS. The above selection procedure implicitly assumes that it may be impossible to interview the households in some of the selected dwellings, so a standard procedure for replacing them is needed. The most frequent reasons for replacement are:

* The dwelling is unoccupied and is likely to remain unoccupied for the full survey period.
* The dwelling has disappeared or is not being used for housing.
* The dwelling cannot be located because the information in the listing is wrong or insufficient (for example, illegible names or addresses).
* The household refuses to be interviewed.

Figure 4.7: List of Selected Dwellings

    Region: __________    Province: __________    Locality: __________
    PSU Code: ______    Total No. of Dwellings: ____    Interval: ____    Random start: ____

    Serial number | Page and serial | Address of    | Head of the | Household
    in the sample | number in list  | the dwelling  | household   | size
    --------------+-----------------+---------------+-------------+----------
    01            |                 |               |             |
    02            |                 |               |             |
    ...           |                 |               |             |
    20            |                 |               |             |

    [Four evenly spaced rows among 01 to 20 are shaded to mark the reserve dwellings.]

These cases should be carefully studied by the team supervisor. Only when the supervisor is convinced that the interview is impossible should the dwelling be replaced with the one on the nearest shaded line in the form.⁴⁰

40. Notice that the shaded lines in the form are evenly interspaced between the unshaded lines. The idea is to replace households by near neighbors, which are likely to have similar socioeconomic characteristics. Shading every fifth line allows up to 4 of the 16 selected dwellings to be replaced (a 25 percent non-response rate). A smaller proportion of replacements could be insufficient in certain worst-case PSUs. Shading a much larger proportion of lines could be taken by some field supervisors as an invitation to replace with abandon.

If the dwelling is occupied by a household different from the one recorded during the listing operation, the new household is interviewed without more ado. As said before, LSMS samples are actually samples of dwellings, so such cases should not be counted as non-responses.

Selecting Random Persons in a Household⁴¹

To reduce interview time, the LSMS questionnaire is sometimes designed so that certain modules are applied to one randomly selected person in the household. The Côte d'Ivoire LSMS, for instance, collected fertility information from one woman aged 15 years or older. The other random selections described so far are most reliably carried out at central offices. In contrast, the choice of a person at random in each household must be made by the interviewer in the field. A simple procedure must be devised for this that gives each eligible person the same chance of selection. It must also be verifiable, so that the interviewer's work can be checked for accuracy; this rules out dice or other "truly random" methods. Instead of the traditional Kish tables (Kish, 1965), LSMS surveys have opted for an original, alternative method.⁴²

As explained in the chapter on questionnaire design, each household member is assigned an identification code, generally from 01 to 20, in the household roster of the questionnaire. An adhesive label with a different random permutation of these numbers is affixed to each questionnaire. To select the person, the interviewer scans the list of identification codes on the label until reaching the code of an individual who meets the defined eligibility criteria. Figure 4.8 shows one of these labels.

Figure 4.8: Sticker Used for Selecting a Random Individual Within the Household

    03 06 07 08 11 12 10 17 04 02 16 15 05 18 19 01 13 20 09 14

41. This section has been adapted from Ainsworth and Muñoz (1986).

42. Kish tables do not always give exactly the same chance of selection to every eligible individual. A more serious disadvantage of Kish tables is that they require the eligible individuals to be given a serial number prior to the selection, in addition to their standard ID codes. Two numbering systems for the same person may confuse the interviewer, and the LSMS method avoids them.

The procedure is simple but requires careful training of the interviewers.
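The sticker method can be sketched in a few lines. Every eligible member is equally likely to come first on a uniformly random label, which is why the method gives each eligible person the same chance of selection. Rerunning the scan on the same label always gives the same answer, which is what makes it verifiable. In this minimal sketch the function names are hypothetical. Labels are drawn with the standard library's `random.sample` rather than the Figure 4.5 loop. The Figure 4.8 codes are assumed to be read in the order printed.

```python
import random

ID_CODES = list(range(1, 21))   # roster ID codes 01..20, as in the text


def make_label(rng=random):
    """One sticker: a random permutation of the roster ID codes 01..20."""
    return rng.sample(ID_CODES, len(ID_CODES))


def select_person(label, eligible_ids):
    """Scan the label in order and return the first code that belongs to an
    eligible household member, or None if no member is eligible."""
    eligible = set(eligible_ids)
    for code in label:
        if code in eligible:
            return code
    return None
```

Take the Figure 4.8 label and suppose (hypothetically) that the women aged 15 or older hold IDs 02, 04, and 05. The scan stops at 04. Code 02, often the head's wife, is selected only when it happens to come first among the eligible codes.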
Interviewers should scan the list of ID codes one line after the other, always from left to right. They should cross out every number that is rejected and circle the number of the first person who qualifies. This was not made clear at the beginning in Côte d'Ivoire. At least one interviewer always searched for code 02 (usually the head's wife) and circled it without considering other women's IDs. The supervisor can verify the process by repeating the procedure with the label stuck to each questionnaire. It can also be checked by the data entry program.

The labels for all questionnaires in a survey can be quickly generated with a personal computer. A complete program is not given here because it needs to be adapted to specific circumstances as well as to the number of ID codes needed. A different random permutation for each sticker can be produced with the algorithm presented in Figure 4.5.

Chapter 5: Field Operations

Key Messages

* LSMS field operations are conducted by independent teams. Each team is headed by a supervisor and composed of interviewers, a data entry operator, and a driver. An anthropometrist may also be included to record height and weight data and to help complete the community-level questionnaires.
* Each team works throughout one year, visiting two PSUs per month. Each interviewer visits eight households in each PSU.
* Each household questionnaire is completed in two rounds, with visits two weeks apart. Data from the questionnaires are entered into a computer between the two rounds. If errors are detected, they are corrected immediately by revisiting the households.
* The supervisor controls the quality of the team's field work through a variety of means, including check-up interviews in some of the households.
* Central supervision of the field teams is performed by the survey core staff: the survey manager, the field manager, and the data manager.
The field operations for LSMS surveys face two major challenges: to collect high-quality data and to make the information available for analysis quickly. These goals have often been met, thanks largely to the procedures developed for field work. Since the first two LSMS surveys were conducted in 1985, field operations have evolved in response to changing technologies and to differing conditions in specific countries. Certain basic features, however, have remained relatively stable.

Section A of this chapter describes recommended LSMS field procedures and is of interest to all readers. Section B describes how to prepare for the field work. It will interest those involved in planning the field work for a survey; others may skip or skim it.

A. Standard LSMS Organization of Field Work

Four-Week Interview Cycle

LSMS field operations are organized in four-week cycles, spread over a 12-month period. In each four-week cycle, the field team completes the interviews for the sample households in two of the sample clusters (localities). The data entry operator works on the computer installed in the team's base office, while the rest of the team travels between the office and the two localities.

The household questionnaire is divided into two parts, or "rounds," which take approximately equal interview time. In the first week of the cycle, the first round of interviewing is completed in locality A. During the second week, the first round of interviewing is completed in locality B. The data from the first round of interviews in locality A are entered into the computer during the second week. Many common errors can be detected at this stage, as will be described in Chapter 6. During the third week, the interviewers return to locality A to carry out the second round of interviews. They can also correct any errors found in the first round of data. Meanwhile, the data entry operator enters the data from the first-round interviews in locality B.
During the fourth week of the cycle, the interviewers return to locality B. They complete the second round of interviews and make any corrections needed to the first round. The week-by-week activities of the team members are summarized in Figure 5.1.

During the first visit to the household, the interviewer fills in the roster section and tries to schedule interviews with all of the people who should be respondents for the other modules in the questionnaire. Some sections pertain to individuals, such as health, education, and employment. For these, the interviewer tries to interview in person each household member aged 7 years or older. The adults who are most responsible for young children respond on their behalf. Usually the interviewer tries to complete the roster in all households in the first day or two in the locality. Some of the mini-interviews may be done on these first visits if that happens to be convenient for the respondents. More often, however, a series of mini-interviews is scheduled for the remainder of the week. Other sections pertain to part or all of the household, such as household purchases, farming activities, and housing quality. For these, the best-informed person is identified during the round one interview, and an appointment is made to interview that person in round two.

In each mini-interview the interviewer goes through all pertinent modules in the order in which they appear in the questionnaire. Thus the first week's interview with the woman of the house would cover her health, education, labor activities, and so on. If she had a child under the age of 7, the interview would then proceed with the appropriate modules for health, education, child health, and so on, with the mother responding for the child. The interview with a teenage member of the household would take place separately, possibly at a different time or on a different day in the first week.
In the second week in the locality, the woman of the household might be interviewed again for the consumption modules.

Figure 5.1: Weekly Activities of the Field Team Members

    Week 1, locality A
      Supervisor:
        1  Introduces the team to local authorities
        2  Selects households or locates households selected previously
        3  Contacts selected households and determines replacements if needed
        4  Observes one interview per interviewer
        5  Verifies that questionnaires are complete and encodes items if needed
        6  Re-interviews randomly selected households
        7  Collects community-level data
        8  Gives completed questionnaires to data entry operator
        9  Reviews printouts of round two from previous locality
      Interviewers: Conduct round one in all households
      Anthropometrist: Weighs and measures individuals in all households
      Data entry operator: Corrects inconsistencies and enters round two data from previous locality

    Week 2, locality B
      Supervisor: 1 to 8 (same as week 1); 9 Reviews printouts of round one, locality A
      Interviewers: Conduct round one in all households
      Anthropometrist: Weighs and measures individuals in all households
      Data entry operator: Enters round one data for locality A

    Week 3, locality A
      Supervisor: 1 to 8 (same as week 1); 9 Reviews printouts of round one, locality B
      Interviewers: Conduct round two in all households; correct round one errors detected by data entry program
      Anthropometrist: Weighs and measures selected individuals
      Data entry operator: Enters round one data for locality B

    Week 4, locality B
      Supervisor: 1 to 8 (same as week 1); 9 Reviews printouts of round two, locality A
      Interviewers: Conduct round two in all households; correct round one errors detected by data entry program
      Anthropometrist: Weighs and measures selected individuals
      Data entry operator: Corrects inconsistencies and enters round two data for locality A

This practice raises data quality in several ways. First, since the person best qualified to answer each part of the questionnaire is interviewed, inaccurate responses are avoided. Second, the whole interview can take three or more hours in each household. Breaking it into a series of more manageable mini-interviews, usually no longer than 30 minutes each, minimizes respondent fatigue.
Third, since the mini-interviews are scheduled at the respondents' convenience (within the week in the locality), the refusal rate is also minimized. Fourth, the period between the two interviews bounds the recall period for many of the consumption questions, which minimizes one kind of recall error.

43. Adapted from Ainsworth and Muñoz (1986).

The actual length of the interview varies widely from one household to another and between countries. Differences between countries depend on the length of the questionnaires used. Within countries, interview time depends on the number of people in the household and the number of different household activities. For instance, the agriculture module is given only to farmers, and the module on household enterprises applies only to the self-employed. Likewise, the health section can take a few seconds or several minutes, depending on whether the respondent has been sick recently. Interviewers generally need to visit each household several times. It is therefore more useful to measure interview time (and hence interviewer productivity) in "households per week" or "households per day" rather than in "hours per household." With the standard LSMS setup, interviewers are expected to complete eight half-interviews per week, an average of two half-interviews per day. (The term "half-interview" refers to the splitting of the questionnaire into two rounds of approximately equal length.)

The LSMS field organization offers several powerful advantages. First, and perhaps most important, it raises data quality. The concurrent data entry makes it possible to correct mistakes while interviewers are still in the field. Spreading the interviews over a full year also makes it possible to use a small number of field teams. With a small number of teams, training can be centralized, which helps ensure that all field staff receive the same instructions.
Each interviewer conducts many interviews and thus becomes more adept than in surveys that rely on larger numbers of teams. Using a small number of teams also makes close supervision by the central office possible. Perhaps more important, it makes management easier: it is difficult to imagine that quality would remain as high with hundreds of teams as with a handful. Second, concurrent data entry also makes the whole data set available for analysis just days or weeks after the final interview, which meets the goal of timely data availability. Third, the year-round field work ensures that estimates derived from the whole data set will not be subject to seasonal biases, an analytical and measurement advantage. Fourth, each team must be equipped with a vehicle and a computer, so having fewer teams lowers overhead costs.

The LSMS field organization also has a few disadvantages. For one thing, the field workers need to be highly competent. They often command high salaries, partly because of their high skill level and partly because of the hardship of continuous travel. Some statistical agencies employ permanent interviewers who work on a series of surveys with short field work periods. In such agencies the LSMS teams may seem outside the mainstream, and the wage differentials may be difficult to accept. Frustration is also often expressed that the long period of field work extends the lead time between the decision to carry out the survey and the availability of data. LSMS surveys are usually able to produce preliminary results after the first six months of field work, but the delay is sometimes of real concern. However, household surveys typically suffer long lags between the end of interviewing and the availability of data. Given that record, the total lead time for LSMS surveys remains better than the average for national household surveys of similar complexity, even with the long period of field work.
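The four-week rotation and the lag between interviewing and data entry can be written down as a small table-building function. This is a sketch of the standard setup described in this section: two localities per cycle, with the two rounds two weeks apart. The class and function names are hypothetical. The supervisor's and anthropometrist's tasks from Figure 5.1 are left out for brevity.

```python
from typing import List, NamedTuple, Optional


class Week(NamedTuple):
    number: int
    locality: str       # where the interviewers work that week
    interviewers: str
    data_entry: str     # what the data entry operator keys in at the base office


def cycle_plan(a: str, b: str, previous: Optional[str] = None) -> List[Week]:
    """Week-by-week plan of one four-week cycle for localities a and b,
    following the interviewer and data entry columns of Figure 5.1.

    previous: the locality whose round-two data are still to be entered
    (locality b of the preceding cycle), or None for the first cycle.
    """
    carry_over = (f"round two, {previous}" if previous
                  else "no carry-over (first cycle)")
    return [
        Week(1, a, "round one", carry_over),
        Week(2, b, "round one", f"round one, {a}"),
        Week(3, a, "round two; correct round-one errors", f"round one, {b}"),
        Week(4, b, "round two; correct round-one errors", f"round two, {a}"),
    ]
```

To chain cycles over the year, pass the second locality of one cycle as `previous` for the next. The last round-two data of a cycle are entered during week 1 of the following cycle.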
Composition of Survey Staff

The key headquarters staff includes the survey manager, data manager, and field manager. These are the minimum requirements. In most countries this base structure must be strengthened by appointing a deputy data manager and a deputy field manager, and sometimes it is useful to employ a secretary and a bookkeeper. The need for a bookkeeper will be greater where financing is being provided by more than one agency or where substantial procurement will take place.

The LSMS core staff should be organized to work together as a team, with the survey manager as the only head during the entire preparation stage, instead of having individuals report to separate divisions in the data collection agency. This is especially important, and sometimes hard to achieve, in large national statistical agencies that are organized under the traditional departmental structure, with a census department, a household survey department, a data processing department, and so forth. If the LSMS is designed as a permanent effort rather than a one-shot exercise, the statistical agency may decide on a different managerial structure once the survey has become a routine activity. Where each member of the core team reports to a department head rather than to a single head of the LSMS survey (for example, the data manager to the head of the data processing department, the field manager to the head of the surveys department, etc.), it is exceedingly difficult to ensure that the numerous details all get done in time to dovetail with each other, and tasks often fall through the cracks altogether. The most recent case of this in LSMS experience is Tunisia. Although full LSMS field work techniques had been planned, at the time of this writing the survey has been in the field for two months, the computers are still stuck in customs, the data entry program is not finished, and data entry operators have not been trained.
This probably has multiple causes, including a lack of conviction that concurrent data entry is worthwhile, but clearly the fact that arrangements for field work and data management were not coordinated by a single team was critical to the failure to realize the original plan.

Each of the LSMS field teams is headed by a supervisor and usually includes two interviewers, a driver with a car, and a data entry operator. This standard setup has been used in Cote d'Ivoire, Peru, Ghana, Mauritania, Tanzania, and elsewhere. In some countries local conditions have dictated changes in the composition of the field teams:
* If the survey is to collect weight and height data for household members, a specialized technician, the anthropometrist, may be included in the team.[44]
* When a team has to work mainly in large urban areas, a third interviewer may sometimes be added. This allows additional interviews to take place at a low marginal cost.[45]
* Cultural constraints in certain countries may require that an adult be interviewed by someone of the same sex. This was the case in the Pakistan LSMS. Since it would likewise not have been appropriate for a woman to travel alone with several men, each team had two female and two male interviewers. The female interviewers also served as anthropometrists.

Duties of Survey Team Members

SURVEY MANAGER. The survey manager should have decision-making authority. This person coordinates the questionnaire's design, maintains communication with the technical assistance suppliers and data users, sets up the activities leading up to the survey in liaison with the existing statistical structures, manages the implementation of the survey itself, and ensures that data documentation and dissemination procedures are put in place.

44. The LSMS's practice of using a special team member for anthropometry rather than assigning this task to the interviewers developed in part for reasons specific to the first countries where the surveys were done, and these reasons may not be persuasive in other countries. If anthropometry is assigned to the interviewers, then the logistics of whether to have one set of equipment or two, how to avoid carrying it around more than necessary, how to arrange adequate training, and the like need to be worked out, but they should not be insurmountable. Also, since the anthropometrists often provide a good deal of assistance to the supervisors in administering the community and price questionnaires, eliminating them may have repercussions for the supervisors' total workload, which would need to be considered.

45. The only additional cost is the third person's salary and travel expenses. No extra supervisors, data entry operators, cars, or computers are needed. One obvious disadvantage of the larger team is that supervision is somewhat diluted. The cost/benefit tradeoff of increasing the size of the interview team is, however, also affected by the sample design and cluster size. Normally all team members need to work in the same location, where households will tend to be similar. Thus the marginal gain in the accuracy of the estimates from adding an interviewer to an existing team and location will be lower than from using the extra interviewer in a different location. For instance, three teams of two interviewers working in three clusters would provide more accurate estimates than two teams of three interviewers working in two clusters. However, in big cities it is possible to have the three interviewers work in different clusters and still be within reach of the team supervisor. This requires selecting more clusters in the first sampling stage and fewer households per cluster in the second. Alternatively, the number of days spent in each cluster might be reduced.

DATA MANAGER.
The data manager designs and develops the data entry program and has input into the data entry aspects of the questionnaire design. This person writes the data entry manuals, selects and trains the data entry operators, prepares the databases for analysis, and helps prepare tabulations and graphs for the first statistical abstract.

FIELD MANAGER. The field manager designs and supervises the sampling procedures and the household listing operation, and is responsible for preparing the pilot survey and the field test. This person also designs the field operational procedures and the field manuals and is responsible for selecting and training the field staff. When the survey is fielded, this person carries out the central supervision of the teams. This includes reviewing the various written supervision instruments described below and occasionally conducting the same kind of observation and double-checking of interviews as the field supervisors do.

SUPERVISOR. As the person primarily responsible for the quality of information collected in the field, the supervisor is the most important member of the field team. The supervisor's main responsibilities include:
* OVERALL FIELD SUPERVISION, COORDINATION, AND MONITORING OF DATA COLLECTION ACTIVITIES. An important part of this task is coordinating the work of the anthropometrist and interviewer in each household, and of the male and female interviewers when staff of both genders are needed. This is particularly important when an exchange of questionnaires between them becomes necessary. In addition, the supervisor may on occasion have to assist the interviewers in locating households and securing their willingness to respond to the survey. If necessary, supervisors will select replacement households in line with criteria determined for the survey as a whole by the central managers.
* PUBLIC RELATIONS.
The supervisor should establish contact with local authorities in each area visited by the survey and deliver letters of introduction, specially prepared brochures, and any other materials and information that might be necessary to ensure their cooperation.
* PREPARATION OF THE QUESTIONNAIRES. The supervisor must copy the household number and the name of the household head onto each questionnaire before giving it to the interviewers.
* COMPLETING THE COMMUNITY-LEVEL QUESTIONNAIRES. When the task cannot be delegated to the anthropometrist, the supervisor must complete the community, price, and facility questionnaires. For the community questionnaire, part of the information is derived by observing the place and recording aspects actually experienced by the team (e.g., the condition of the roads or the distance to the nearest large city). Other data must be obtained from knowledgeable people in the locality, such as mayors, village elders, police chiefs, and so forth. For this part of the questionnaire, the supervisor has a great deal of discretion over whom to choose as respondent.[46] For the part of the community questionnaire or the facility questionnaires pertaining to local schools and health facilities, the supervisor must identify which schools and clinics to include and then interview the headmaster and clinic administrator or their delegates. For the price questionnaire, the supervisor determines which markets or shops to visit and fills out the price questionnaire. The supervisor explains to the vendor the purpose of the survey and gathers price information in an interview. The person collecting prices does not bargain for the goods.[47] Most food and some non-food items are weighed as well. The full instructions for the price questionnaire used in the Kagera Health and Development Survey are given in Annex IV.
* MONITORING, REVIEW, AND EVALUATION OF THE QUALITY OF FIELD INTERVIEWS. The supervisor is expected to routinely observe interviews without advance notice to the interviewer.
The supervisor should give immediate feedback based on established criteria for evaluating interviewers, using the "Interviewer Evaluation" form provided for this purpose. (Examples of this form and information on how it is designed are provided below.)
* QUALITY CONTROL OF COMPLETED QUESTIONNAIRES. Once data are collected for each round of the survey, the supervisor should check that the interviewer's writing is legible, that skip patterns were followed, and that the instructions in the questionnaire were followed. The "Questionnaire Verification" form is used to record the information from the quality check. (See below for more details on the design of the form.)
* CHECK-UP INTERVIEWS. The supervisor should also revisit randomly selected households in each location to verify that the interviewers have visited the household and to cross-check some of the information provided by the household. The results are recorded on the "Check-up Interview" form (see below).
* CHECKING THE DATA ENTRY PRINTOUTS. The supervisor should compare the printout with the data on the questionnaire and should check the errors in the data that were detected by the data entry program. Either the supervisor or one of the interviewers should revisit the household, if possible, to correct the errors.
* MANAGEMENT OF PERSONNEL, EQUIPMENT, AND VEHICLES. The supervisor is responsible for managing the team's support staff (i.e., the data entry operator and the driver). The supervisor should ensure that the staff work efficiently to provide trouble-free data collection, and is responsible for the proper handling and care of the computer equipment and vehicles. In certain cases the supervisor may also be responsible for managing the team's finances, including the monthly payroll of salaries and bonuses.
* EXCHANGE OF INFORMATION BETWEEN CENTRAL SURVEY STAFF AND FIELD TEAMS. As the main channel of communication, the supervisor sees to it that any advice or instructions from the central management staff are relayed to and followed by the field team, and that the central staff are regularly informed of the progress of data collection.

In order to manage the field work effectively, the supervisor must have a thorough understanding of the tasks required of each member of the team. The supervisor should be able to respond to specific interviewing problems that may arise in the field and may on occasion need to conduct interviews personally if one of the team's regular interviewers falls ill or becomes otherwise unavailable.

46. In contrast, the SDA community survey handbook (Wold, 1995) recommends using group interviews, where the group is meant to include representatives of different subgroups within the community (men and women, poor and better-off, different ethnic groups). Two interviewers are used in such cases, one to lead the discussion and another to record the results.

47. Since outsiders generally have to pay higher prices than local residents where bargaining is the norm, the question arises as to whether this procedure for gathering data results in accurate prices. The reader must remember, however, that in the small towns and villages where the procedure is most often used, the news of the survey team's arrival will probably have already spread and been commented on. Thus the social context of the transaction is different than if someone from the capital city arrived and just wanted to buy some food for his own consumption. The SDA community survey handbook (Wold, 1995) nonetheless recommends using one of three alternative procedures: hire local residents to conduct the price survey, interview a community group, or gather price information on the household questionnaires.

INTERVIEWERS. The interviewer's main responsibilities include:
* ESTABLISHING CONTACT WITH THE HOUSEHOLDS.
With the help of the supervisor, the interviewer must first introduce himself or herself to each household and explain the survey's objectives and methodology in simple terms. The interviewer should explain that the household was selected at random, along with many other households in the country, to help planners understand people's living conditions. It should then be made clear that the survey is not concerned in any way with taxes and that all information will be kept confidential.
* SELECTING INDIVIDUAL RESPONDENTS. The interviewer should complete the family roster, determine who the members of the household are, and agree upon a convenient schedule for interviewing individual household members. The interviewer should make every effort to interview each adult member personally, in private if possible. This may require visiting the household several times during the survey period or going to the farm or place of business of the respondent.[48]
* CONDUCTING THE INTERVIEWS. The interviewer should conduct the interview in accordance with good survey practice. For example, interviewers should be polite but maintain a neutral attitude toward whatever answers the respondent gives. They should respect the wording of verbatim questions and follow the flow dictated by the skip pattern, without any variation.
* PROBING. Probing for responses may be necessary, either because of explicit instructions in the questionnaire (such as when looking for secondary activities or establishing the list of crops grown by a farmer) or to help respondents when they cannot answer exactly. The latter may be necessary, for instance, to obtain approximate amounts spent on certain budget items or to record approximate birthdates. As noted in the chapter on questionnaire design, approximate answers are always preferable to "don't knows." For the recording of dates, interviewers are usually provided with a calendar of events (see Figure 5.2).

ANTHROPOMETRIST.
The anthropometrist is responsible for measuring the height and weight of designated individuals. In the most ambitious LSMS surveys, all individuals will have their weight and height measured. In other countries some subset, often children under the age of five, or children and their mothers, will be measured. On the first trip to the cluster the anthropometrist tries to weigh and measure everyone for whom anthropometric data are to be gathered. The heights, weights, and ages are entered with the rest of the data. The data entry program will flag observations whose combination of height, weight, and age is unusual (more than three standard deviations from the established norms).[49] The program will also randomly select a portion of individuals in each household (usually 20 percent) to be remeasured as a check for measurement error. During the second visit to the cluster, the anthropometrist will weigh and measure anyone missed the first time and will remeasure the individuals indicated by the data entry program. For individuals measured twice, the measurements are compared by the data entry program. Significant differences are flagged so that the anthropometrist and supervisor will be alerted that there may be a problem in the quality of measurements. The anthropometrist may also be assigned to help collect information for the price, community, or facility questionnaires.

48. In marketing research surveys, where usually just one person is selected to answer a few questions in each household, it is generally accepted that at least three attempts to personally interview that person should be made, at different dates and times, before abandoning the household or accepting a proxy response. LSMS interviewers, who generally have to spend two weeks in each locality, should be still more enterprising in getting their respondent.

49. The norms used in the data entry program are based on those used by the World Health Organization.

Figure 5.2: A Calendar of Events

A calendar of events is a list of important milestones likely to be commonly remembered. Calendars are typically done in two formats. One gives a great deal of detail for the five years preceding the survey. It is used to place accurately the month of birth of young children, since accurate measurement of their ages is required to judge their nutritional status accurately. A hypothetical example of such a calendar is shown here. Further guidelines for constructing and using such calendars are given in UNNHSCP (1986b). Less detailed calendars covering longer time periods can be used to establish the age of adults. The calendar used in the Kagera Health and Development Survey is shown in Annex V.

[Hypothetical calendar grid: months (January to December) by year (1986 to 1994), with a row of annual events. Annual events include New Year's, the beginning of school, Lent/Easter, Labor Day, National Independence Day, National Hero's Day, the beginning of the rainy season, the mid-term school holiday, the beginning of the harvest, the Harvest Festival, and the beginning of the school holiday. Year-specific entries include the World Soccer Cup, national elections, the national soccer tourney, an earthquake in the north, the Winter and Summer Olympics, "Pepe wins featherweight crown," the National Census, the "Great riots of 1990," martial law announced, curfew lifted, "The big flood," "Carmen wins Miss Universe contest," and a sugar scandal.]

DRIVER. The driver not only provides transportation from the regional base to the locality itself, but also helps ferry team members to the various households, farms, and markets they may need to visit. When these are spread out and the supervisor, both interviewers, and the anthropometrist all need to visit two or more places in one day, the driver will be kept very busy.

DATA ENTRY OPERATOR.
The data entry operator enters the data from each round of interviews in the week following data collection. This person reviews all errors and inconsistencies flagged by the data entry program, corrects those that result from his or her own mistakes or omissions, and produces printouts of the remaining errors so that they can be reviewed by the supervisor in time for corrections to be made in the field.

Team Logistics

A sufficient initial stock and a steady supply of survey materials and inputs should be ensured for all field teams throughout the whole survey period. The list of materials includes obvious items such as questionnaires, pencils, and erasers; less obvious ones such as diskettes, clipboards, and briefcases; and a myriad of country-specific items ranging from raincoats to camping stoves. The elements that have proven hardest to manage in all countries are fuel, oil, and everything related to car maintenance. Probably the best solution is to provide a cash revolving fund and make the supervisor accountable for it. This may be hard to do in the more bureaucratic statistical agencies.

Complexity of LSMS Field Operations

One of the main reasons why countries considering an LSMS survey decide not to do one is that the field work seems too intimidating. Reading this chapter might even reinforce the impression that the field work is difficult because there are so many details to get right. The intent is, of course, different. All surveys require getting myriad details right, and at the right time. We hope that by providing guidelines and examples here we can make it easier to plan for and carry out a survey. The reader should further note that while some aspects of LSMS field work procedures are new, many others are not. The LSMS surveys have, however, been unusually vigorous in implementing quality control procedures.
For example, the job description of survey supervisors in all sorts of surveys around the world includes the notion of check-up interviews. But in practice this is done far too little: the ratio of interviewers to supervisors is too high, the distances involved too great, access to vehicles too limited, and the importance of the task deemed too low. The LSMS surveys, in contrast, have low interviewer-to-supervisor ratios, supervisors who travel with the interviewers, a set standard for the number of interviewers that must be checked, forms to note results from the checks for feedback to the interviewer, and a mechanism to verify that the check-up interviews are in fact done.

There is no denying that LSMS surveys are complex and demanding. But the difference between what LSMS surveys require and what should be done to guarantee high-quality data from other surveys is not so large. Indeed, it is often smaller than the difference between what should be done for the other surveys and what actually is done. Thus the reluctance to undertake an LSMS may be due not solely to its inherent complexity, but partly to its adherence to high standards.

Alternatives to LSMS Standard Field Procedures and Their Implications

Most countries that have implemented LSMS surveys have not traditionally used the mobile field team organization for field work. Indeed, many countries considering an LSMS are reluctant to undertake such major changes in their procedures. This section therefore discusses the implications of common alternative systems of field work.

SHORT, INTENSIVE PERIOD OF FIELD WORK. In the traditional setup used by many statistical agencies, the interviews are conducted during a shorter period (usually one to three months) rather than spread throughout a year. This requires using a large number of interviewers for each survey. In some cases the interviewer staff is permanently employed, and their time is allocated to a series of different surveys during the year.
In other cases, the permanent staff is very small, and new, temporary interviewers are recruited for each survey. Besides being familiar to statistical agencies, this organizational setup presents certain advantages, especially for conducting single-purpose surveys with simple questionnaires. It can shorten the period during which the survey is in the field and provides a massive number of completed questionnaires in a short period of time. Also, in circumstances of high inflation, concentrating all interviews in a short period may be the only way of obtaining expenditure data that are comparable across households.

This system of data collection also has some serious disadvantages, which are especially worrisome in the case of complex surveys such as the LSMS. First, it is very difficult to provide uniform, good-quality training to a large number of field operators. Training in such cases often must be done in batches, which entails the risk of giving different instructions to different interviewers. Sometimes the same trainers go to different regions to train different groups of interviewers. Often this is not practical, so central trainers train other persons who then train local groups of interviewers, which makes uniformity even more difficult to achieve. These problems will be especially acute when temporary interviewers are used. Furthermore, the interviewers never become very experienced with the questionnaire. In the typical LSMS setup, each interviewer would be responsible for interviewing 320 households. In contrast, in one recent survey with purposes similar to those of the LSMS, the number of interviewers was so large that on average each conducted only twelve interviews, fewer than an LSMS interviewer would do in the first two weeks in the field. Second, it is also hard to implement effective procedures for supervising the interviewers. Even when such procedures are in place, problems can be detected but are seldom corrected in time.
Moreover, this approach is often (though not necessarily) associated with high ratios of interviewers to supervisors, say five or ten to one, which compounds the difficulties in providing adequate supervision. Third, this scheme makes it almost impossible to integrate coding, data entry, and data editing with field operations. These are designed as independent, ex post facto activities, either with classic batch techniques (i.e., straight data entry followed by a series of editing programs) or with programs that check the quality of the data as they are entered. This serious disadvantage of short, intensive surveys is shared by other departures from the standard LSMS setup, and is explored in more detail below.

USE OF A MASTER SAMPLE. The so-called master sample is the other common method of organizing field work. A master sample is a large number of clusters (usually several hundred) that are selected by the statistical agency at one moment (generally just after the census) and for which updated household listings are maintained, so that the households to be interviewed in each survey can be selected from that pool. Often, arrangements are made for a resident in or near each cluster to become the interviewer for all surveys in the years until the next census. This setup has many appealing features for a country that expects to conduct a program of household surveys over several years, the most obvious being the economies of scale it provides to the survey sampling component. It is not necessary to select a new set of clusters for each survey, nor to conduct a household listing operation each time. Another advantage is that travel costs are reduced, because each interviewer only has to move within the relatively small area of his or her own cluster. Finally, for continued surveys, the concept of a master sample lends itself easily to various strategies for gathering panel data. For complex surveys such as the LSMS, however, the master sample presents certain inconveniences.
The first, very serious one is that it is virtually impossible to give hundreds of interviewers the kind of intensive, uniform training that the LSMS requires. Even in the unlikely event that gathering them all for one month (and finding enough trainers, etc.) were possible, that would entail travel and lodging costs large enough to wipe out the savings gained from their immobility during the surveys done until the next census. Another inconvenience is the difficulty of maintaining effective supervision of the interviewers. This can only be done if supervisors travel extensively, which would again erode the benefits of having static interviewers. Third, it can be difficult to arrange for concurrent data entry since the interviews are so dispersed. It is sometimes mentioned as a benefit of master samples that interviewers will become familiar with the households to be interviewed, which may minimize non-response. This feature, however, is more of a disadvantage than an advantage, because answers given to acquaintances are essentially unreliable.[50]

50. In market research surveys, interviewers are indeed instructed never to interview persons they know beforehand if they happen to be selected in the sample.

ONE-ROUND INTERVIEWS WITH MOBILE DATA ENTRY. The Nepal LSMS (which began in June 1995) faces significant obstacles in implementing standard LSMS procedures. These are caused by difficult access to most parts of the country. The average time to reach each locality is expected to be about two days (each way), and cases of five days or more are not uncommon. In addition, most travel has to be done on foot, so interviewers have to carry all the necessary equipment and survey materials with them. An added complication is that electricity is rarely available, except in Kathmandu and a few other cities. Obviously, the standard LSMS setup of two visits to each household, two weeks apart, could not be implemented under these conditions. Instead, the field teams visit each locality only once and spend the necessary time there to complete the questionnaires for all households. Rather than staying at a regional office, the data entry operator travels with the rest of the team and enters the data from the questionnaires onto a portable computer while the team is still in the locality. Although the localities to be visited by each team during the year were arranged at random, the teams will not always go back to their regional headquarters between localities. If two localities that have to be visited consecutively also happen to be close to each other, the team may proceed directly from one to the next.

Using mobile interview teams required solving two further problems. The first is ensuring a good level of central supervision of the field teams. This will require spending more time and money to get to the remote areas than would be necessary elsewhere. It will also require finding the field team's exact location once the central staff reach approximately the right area. Since the field teams do not report to the base office on as regular a schedule as in the standard setup, this may require some extra trekking or detective work. The second problem was how to operate the data entry computer without access to electricity, sometimes for weeks at a time. The option selected was to provide each team with a set of solar panels, high-performance batteries, and other electrical equipment. In addition to the electrical paraphernalia, the teams have to transport the computer itself, the printer, and a sufficient supply of paper, diskettes, and so forth. The advantage is that this will allow proven field and data entry procedures to be used with only minor modification. The modifications being made relate mostly to reducing the amount of paper used in the supervision and data management process, in order to reduce the weight that has to be carried around.
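The visit plan described for the Nepal teams (localities taken in random order, with a return to headquarters only when consecutive localities are far apart) can be sketched as follows. This is a hypothetical illustration, not the procedure actually used in Nepal. It assumes that "arranged at random" means each team's visit order for the year was shuffled, that coordinates are available for each locality, and that straight-line (haversine) distance compared with an arbitrary threshold stands in for the real judgment about whether two localities are "close," which on foot would depend on terrain and trails. The names `plan_year`, `BASE`, and `max_direct_km` are illustrative.

```python
import math
import random
from collections.abc import Iterable

BASE = "BASE"  # hypothetical label for the team's regional headquarters
EARTH_RADIUS_KM = 6371.0


def distance_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle (haversine) distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def plan_year(localities: Iterable[str],
              coords: dict[str, tuple[float, float]],
              max_direct_km: float,
              seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle a team's localities for the year and build an itinerary.

    The team starts and ends at BASE. Between two consecutive localities it
    returns to BASE only if they are more than `max_direct_km` apart;
    otherwise it proceeds directly from one to the next.
    """
    order = list(localities)
    random.Random(seed).shuffle(order)  # seeded, so the plan is reproducible
    itinerary = [BASE]
    for i, loc in enumerate(order):
        if i > 0 and distance_km(coords[order[i - 1]], coords[loc]) > max_direct_km:
            itinerary.append(BASE)  # too far apart: go back to headquarters first
        itinerary.append(loc)
    itinerary.append(BASE)
    return order, itinerary


if __name__ == "__main__":
    # Hypothetical (latitude, longitude) coordinates for four localities.
    coords = {"A": (27.70, 85.30), "B": (27.72, 85.33),
              "C": (28.20, 83.98), "D": (26.45, 87.27)}
    order, itinerary = plan_year(coords.keys(), coords, max_direct_km=25.0, seed=1995)
    print(order)
    print(itinerary)
```

Randomizing the order, rather than visiting nearby localities in sequence, is what spreads each region's localities across the seasons; the direct-travel rule only saves trips back to base and does not change the order.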
The main disadvantages of this approach to mobile data entry are the risk of something going wrong with such an elaborate setup and the weight the teams will have to carry. As this manual goes to press (in fall 1995), no major problems have been detected.

ENHANCING AN EXISTING SURVEY BASED ON LSMS EXPERIENCE. In some countries, the lessons from LSMS experience are being used to enhance an existing survey rather than to start a completely new survey. In this subsection we make some general observations about issues that arise in the enhancement process. Then we describe the enhancements carried out or planned in Indonesia and Bangladesh.

An enhancement program can involve one or several aspects of a survey: the questionnaire, the sample, the field work techniques, and/or the data management. Conceptually, it is very straightforward to add modules to an existing survey to move it toward the integrated content of an LSMS. However, as the total complexity and length of the questionnaire grow and begin to approach those of the full LSMS, it will become increasingly important to ensure that the quality control mechanisms are able to cope with the new requirements. This will probably involve adopting some or all of the LSMS field work and data management techniques. It is also straightforward to think about enhancing the field work and data management techniques independently of the questionnaire. Enhancements of field work and data management will, however, generally require more management commitment to implement than those that pertain only to the questionnaire. The field work involves a much larger number of people and more fundamental administrative and management systems than the design of the questionnaire. Thus the consensus that enhancement is necessary and beneficial will need to be held by a wider group of people. Some of these may initially feel threatened by the closer quality control or change in job description that may be implied.
Enhancement programs are often harder to carry out than entirely new surveys. Sometimes the opposition to changing procedures is greater for an established product than for an "experimental" survey. Enhancement programs may require significant creativity and managerial oversight to blend the old procedures with the new in an appropriate way. There is also a tendency to provide inadequate financing and supervision for enhancement programs, since they do not lead to a "new" product.

In Indonesia, the SUSENAS survey was to be reformed in 1991 (for a fuller discussion of the reforms, see World Bank, 1993). The SUSENAS questionnaires have had a core and rotating module design for many years. The core is administered annually and the rotating modules are alternated. The field work is organized along a master sample principle, with a very large permanent field staff (on the order of 2,000 interviewers). Before the reform, the sample was representative at the province level, with about 64,000 households in the sample.

One element of the reform plan was to refine the way the core and rotating modules fit together. In particular, the aim was to include a measure of consumption in the core rather than as one of the periodic rotating modules. Other indicators that needed to be measured annually, or that were needed for the analysis of the information in the various core modules, were also moved to the core. The second element of the reform was to produce results representative at the district level, for which about 200,000 households would be interviewed. The increase in sample size was achieved by reallocating the time of the permanent interview force away from other tasks to spend more time on the SUSENAS. Their annual quota of interviews for the SUSENAS moved from an average of about 30 to an average of about 100. Fundamental changes in the field work, supervision, and data entry techniques were not contemplated.
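The SUSENAS figures quoted above can be cross-checked with simple arithmetic. The sketch below rests on an assumption of ours, not a statement in the text: that all of the roughly 2,000 permanent interviewers work on SUSENAS and each completes the average annual quota. Under that assumption, the old quota approximately reproduces the pre-reform sample of about 64,000 households, and the new quota reproduces the district-level target of about 200,000, a workload increase of roughly 3.3 times per interviewer.

```python
# Back-of-envelope check of the SUSENAS sample sizes.
# Assumption (not from the text): all ~2,000 permanent interviewers work on
# SUSENAS and each completes the average annual quota quoted in the text.
INTERVIEWERS = 2000


def annual_households(interviewers: int, quota_per_interviewer: int) -> int:
    """Households interviewed per year if every interviewer meets the quota."""
    return interviewers * quota_per_interviewer


before = annual_households(INTERVIEWERS, 30)   # pre-reform average quota
after = annual_households(INTERVIEWERS, 100)   # post-reform average quota

print(f"before reform: {before:,} households (text: about 64,000)")
print(f"after reform:  {after:,} households (text: about 200,000)")
print(f"workload increase per interviewer: {100 / 30:.1f}x")
```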
Thus it was necessary to devise a consumption module that would not radically shift the balance between the core and rotating modules in the overall burden of field work. To achieve the goal of collecting more useful information without having to revamp field work procedures, a carefully controlled experiment using alternative formulations of the consumption module was carried out on samples large enough to be statistically meaningful. The statistical comparisons of the resulting measures of consumption suggested that a relatively short consumption module could be used to gather information of acceptable quality.

Two features of the Indonesia experience are worth noting. First, the scientific rigor of the experiments that preceded the reformulation of the questionnaire was unusually high. This reflected a very strong management commitment to the reform process. Second, because the field work and data management procedures were not fundamentally changed, there were relatively few technical or political difficulties in implementing the reform program, which proceeded rather smoothly.

In Bangladesh, the starting point in 1995 was the Household Expenditure Survey (HES), a classic household budget survey that had been in existence for almost two decades. The questionnaire content was standard for a budget survey. A master sample approach was used, with interviewers coming from the approximately 400 thana (county) level offices. A detailed daily diary system for recording household expenditures was followed. Data entry was ex post facto and centralized, with data editing in the traditional sense.

The enhancement plan was as follows. In terms of survey content, a new community module and a module on education were to be added to the questionnaire used in previous HES rounds. An LSMS-type data entry program was to be developed, and data entry was to be carried out at a regional level (26 offices nationwide).
Apart from the LSMS-type components, core interviewer training and supervision procedures were to remain unchanged.

Recognizing the long tradition of household surveys in Bangladesh, a central objective of the enhancement program was to emphasize ownership by the Bureau of Statistics. Accordingly, management of the surveys and design of the new questionnaires remained largely the responsibility of the Bureau of Statistics and local experts, with technical expertise provided as necessary by expatriate consultants. This objective also dictated, to an extent, the desire to limit changes so as to keep them manageable.

At the time of this writing, the enhanced survey has only been in the field for a month, so it is too early to judge the overall success of the program. Some early lessons from the program are, however, worth highlighting. The development of new questionnaires proved to be more time consuming than expected, largely because the design team was unfamiliar with rigorous questionnaire design aimed at ensuring consistency and ease of interviewing. Somewhat late in the planning process, it was also realized that a Demographic and Health Survey and a labor force survey were planned for the same year in the same clusters. In order to link the data files, it was therefore decided to interview the same households for all three surveys, yielding a richer database than the HES alone would provide. However, since each survey had a separate management, the capacity for coordination was limited, and the fine tuning of questionnaires to simplify matching at the individual level or to avoid unnecessary overlap in content was not possible. The coordination mechanism used was an identical system of household identifiers across all three surveys.
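A shared household identifier is enough to join the three surveys' household-level files, which is why it could serve as the coordination mechanism even without further fine tuning. The minimal sketch below illustrates this; the ID layout (a 4-digit cluster number followed by a 2-digit household number), the field names, the sample values, and the helper names `make_hhid` and `link_by_hhid` are all hypothetical. Matching individuals across the surveys would need more than this, for example consistent person numbers within each household, which the text notes could not be arranged.

```python
from typing import Dict, List


def make_hhid(cluster: int, household: int) -> str:
    """Hypothetical ID scheme: 4-digit cluster number + 2-digit household number."""
    return f"{cluster:04d}{household:02d}"


def link_by_hhid(*surveys: Dict[str, dict]) -> Dict[str, List[dict]]:
    """For each household ID present in ALL surveys, list its records in survey order."""
    common = set(surveys[0])
    for survey in surveys[1:]:
        common &= set(survey)  # keep only households interviewed in every survey
    return {hhid: [survey[hhid] for survey in surveys] for hhid in sorted(common)}


# Hypothetical household-level records keyed by the shared identifier.
hes = {make_hhid(12, 3): {"monthly_expenditure": 5400},
       make_hhid(12, 7): {"monthly_expenditure": 3900}}
dhs = {make_hhid(12, 3): {"children_ever_born": 4}}
lfs = {make_hhid(12, 3): {"members_employed": 2},
       make_hhid(12, 7): {"members_employed": 1}}

linked = link_by_hhid(hes, dhs, lfs)
print(linked)
```

Household 001207 drops out of the linked file because it has no record in the hypothetical health survey file, which is the kind of gap a common identifier makes easy to detect.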
Also, to allow capacity building and to limit the changes being made in field procedures in this first contact with LSMS-type techniques, it was decided to focus only on the HES in terms of the quality control mechanisms of the field work, data entry, and data cleaning processes.

The enhancements of the data entry program and survey logistics were not easy to achieve. Despite detailed plans to the contrary, implementation remained a serious problem. Particular difficulties were experienced with finalizing the data entry program, ensuring that the data entry operators were sufficiently trained, and making the rest of the field staff sufficiently aware of the new method of interacting with the data entry process. Within the first month of field work, steps were taken to overcome these difficulties. New regional managers were appointed, further functions were added to the data entry program to make it more user friendly and self-instructing, and more training was arranged for data entry operators. These remedial measures appear to be working. However, it is clear that the Bureau of Statistics underestimated the level of management effort involved, both to organize the new procedures and to fit them into the existing survey management processes.

An important lesson from this experience so far is that the requirements for successful implementation of an enhancement program can be even more onerous than for the successful implementation of a new LSMS survey. Management must be interested in and committed to the program; they must also see how it may or may not dovetail with existing procedures. A core team with adequate management and logistic support must be put in charge of the survey. Consultants should be used when needed for specific tasks, but cannot completely substitute for the core team. Finally, field staff (interviewers, supervisors, and data entry operators) must be adequately trained.
They must know not only how to do their specific task, but also the quality control principles for the survey as a whole and how their roles fit into them.

THE EFFECTS OF IMPERFECT TRAINING AND SUPERVISION. Many of the variations from the prototype LSMS organization of field work make it difficult to achieve adequate training and supervision or concurrent data entry. We therefore discuss the implications of these in turn.

A lively illustration of the importance of interviewer training and supervision can be obtained from market research. Market research surveys are more abundant than national surveys, occur in a much shorter time frame (days or weeks instead of months), involve smaller budgets (thousands vs. millions of dollars), and are subject to more immediate empirical, even painful, sanctions.

Years ago in Chile, a multinational company commissioned a market study to assist them in deciding the best package for a product they sold. Due to a restrictive schedule and budget and the sheer incompetence of the survey planners, the study was not supervised and the interviewers received no training. The data collected indicated a marked preference for a 2 kg package as opposed to the 1 kg package that was available at the time. Important strategic decisions were made, millions spent, and the product packaging was changed. But this item, a perishable food product, spent months on the shelf in its new packaging without selling. An ex post facto evaluation showed that the culprit in this debacle was a lack of supervision and training. As it turned out, the photo cards accompanying the questionnaire had been applied incorrectly by certain interviewers, thus skewing the collected data. The pictures on the photo cards were smaller than the actual size of the packages, a fact that interviewers were supposed to point out to the respondents. Proper training would have stressed this point.
Supervision would also have detected the mistakes as they were being made and would have given the firm an opportunity to correct them immediately. This and many other anecdotes have been incorporated into the annals of marketing lore, demonstrating that good supervision and training are an intrinsic part of the terms of reference of serious market research. Since LSMS surveys are so much more complex, the need for excellent interviewer training and supervision is all the greater.

THE EFFECTS OF NOT INTEGRATING DATA MANAGEMENT INTO FIELD OPERATIONS. A common feature of the departures from the standard LSMS setup seen so far is that they make it difficult or impossible to integrate data entry and field operations. As explained earlier in this chapter, the standard LSMS uses dedicated data entry operators and computers in each field team and organizes data collection in two rounds to allow for the correction of inconsistencies by reinterviewing the households in the field. Under other organizational schemes, data entry may be forced to become a separate operation, usually performed in a single location (or perhaps in a few centers), after field work is completed and without actually revisiting the households.

Even under these less than ideal circumstances, the use of data entry programs that can check the quality of the information while it is entered (sometimes referred to as "intelligent data entry programs") can improve the quality and timeliness of the survey data sets. At the very least, a good program should be able to detect many of the errors produced by the data entry operators themselves and ensure that the data sets are "formally correct"; that is, that there are no alphabetic codes in numeric fields, no out-of-format records, and so forth. Furthermore, if the survey is conducted over a sufficiently long period, data entry can be organized as an ongoing activity, conducted in parallel to data collection and with a minimum delay from it.
In this case, an intelligent data entry program may indirectly improve the quality of field work by providing early warning of the most common interviewer errors and allowing corrective measures to be taken while the survey is still in the field.

The Romania LSMS provides an example. There the survey used over 500 interviewers, each working in a different cluster throughout the year (the master sample strategy). Because of the obvious difficulties of properly selecting and providing uniform instructions to such a huge number of people, the first month of the survey (March 1994) was expected to be a test of field operations and procedures, and the information collected was then to be excluded from the survey data sets. However, even performing the necessary assessments and defining the subsequent corrective actions would have been impossible if an intelligent data entry program had not been available, as was fortunately the case. Data entry for the Romania LSMS was done in 50 regional offices during the month following data collection, and the resulting household files were sent to Bucharest by modem for centralized tabulation and analysis. The printouts with errors and inconsistencies were reviewed locally by regional supervisors who, albeit unable to return to the households for corrections, could at least point them out to the interviewers in good time, so that the same mistakes were not repeated. Thus, in Romania, decentralized, intelligent data entry both provided the momentum for quality control and ensured that the same criteria were applied consistently throughout the country. It is probably not an exaggeration to say that the data entry program became the de facto survey supervisor during the hectic early days of the Romanian LSMS. While perhaps not an ideal system, the Romanian survey found a way to preserve many of the principles of LSMS field work within the very different framework of the master sample.
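To make the idea of an intelligent data entry program concrete, here is a minimal sketch assuming a hypothetical data dictionary for the household roster. It applies the kinds of checks discussed in this chapter: formal correctness (no alphabetic codes in numeric fields, no out-of-format records), range checks, and an inter-record consistency check (exactly one head per household). It then tallies error types by interviewer, the kind of summary that can give early warning of recurring mistakes, as the error printouts did for Romania's regional supervisors. The variable names, codes, and ranges are illustrative assumptions, not the rules of any actual LSMS program.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Variable:
    """One entry in a (hypothetical) data dictionary."""
    name: str
    numeric: bool
    low: Optional[int] = None   # smallest valid value, if range-checked
    high: Optional[int] = None  # largest valid value, if range-checked


# Hypothetical roster dictionary; relation code 1 = head of household.
ROSTER_DICTIONARY = [
    Variable("hhid", numeric=True),
    Variable("person", numeric=True, low=1, high=30),
    Variable("relation", numeric=True, low=1, high=9),
    Variable("age", numeric=True, low=0, high=110),
    Variable("sex", numeric=True, low=1, high=2),
]
KNOWN_FIELDS = {v.name for v in ROSTER_DICTIONARY}

Record = Dict[str, str]  # one roster line, as keyed by the operator


def check_record(rec: Record) -> List[str]:
    """Formal-correctness and range checks on a single roster line."""
    errors = []
    extra = sorted(set(rec) - KNOWN_FIELDS)
    if extra:
        errors.append(f"format: unexpected fields {extra}")
    for var in ROSTER_DICTIONARY:
        if not var.numeric:
            continue  # this sketch only checks numeric fields
        raw = rec.get(var.name, "")
        if not raw.isdecimal():
            # catches blanks and alphabetic codes typed into numeric fields
            errors.append(f"format: {var.name}={raw!r} is not numeric")
            continue
        value = int(raw)
        too_low = var.low is not None and value < var.low
        too_high = var.high is not None and value > var.high
        if too_low or too_high:
            errors.append(f"range: {var.name}={value} outside {var.low}-{var.high}")
    return errors


def check_household(roster: List[Record]) -> List[str]:
    """Inter-record consistency check: exactly one head per household."""
    heads = sum(1 for rec in roster if rec.get("relation") == "1")
    if heads != 1:
        return [f"consistency: {heads} heads of household (expected 1)"]
    return []


def household_errors(roster: List[Record]) -> List[str]:
    """All errors for one household, as they would appear on an error report."""
    return [e for rec in roster for e in check_record(rec)] + check_household(roster)


def error_tally(batches: List[Tuple[str, List[Record]]]) -> Counter:
    """Count error types per interviewer, for early warning on recurring mistakes."""
    tally: Counter = Counter()
    for interviewer, roster in batches:
        for error in household_errors(roster):
            tally[(interviewer, error.split(":", 1)[0])] += 1
    return tally


roster_a = [  # interviewer 07: letter O keyed instead of zero
    {"hhid": "001203", "person": "1", "relation": "1", "age": "45", "sex": "1"},
    {"hhid": "001203", "person": "2", "relation": "2", "age": "3O", "sex": "2"},
]
roster_b = [  # interviewer 12: implausible age and two heads of household
    {"hhid": "001207", "person": "1", "relation": "1", "age": "140", "sex": "1"},
    {"hhid": "001207", "person": "2", "relation": "1", "age": "38", "sex": "2"},
]
for line in household_errors(roster_b):
    print(line)
print(error_tally([("07", roster_a), ("12", roster_b)]))
```

Checks like these catch keying slips and internal contradictions; as the text stresses, they cannot tell whether an internally consistent answer is true.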
However, it should be borne in mind that only a real revisit to the household, as in the standard LSMS, can ensure that the data sets are not just internally consistent but also reflect the reality in the field. It should also be pointed out that the quality assurances provided by intelligent data entry are complements, not alternatives, to the other supervision tools (interviewer evaluation, questionnaire verification, and check-up interviews) that are described in Section B of this chapter. These should be implemented and enforced, regardless of the options adopted for data entry.

B. Preparation for Field Work

This section discusses the main features of preparation for field work, other than designing the questionnaire or drawing the sample. Many of the data quality control mechanisms rely extensively on the preparation for the field work. Survey preparation is thus very important and should be given due time and attention. Often there is a temptation to skimp on preparation in order to move to the field too rapidly. This temptation should be avoided.

Personnel

The survey's success relies on its staff. In this section we discuss some of the criteria that have proved successful in recruitment for LSMS surveys.

* THE SURVEY MANAGER should be a social scientist or statistician with a grasp of the goals to be achieved by the survey. Often this person will have a graduate degree in statistics, economics, or demography. At a minimum the survey manager should have an undergraduate university degree. Since the survey manager will be expected to maintain a permanent dialogue with top levels of the statistical agency and the sectoral ministries, as well as to liaise with financing agencies, data users, and technical assistants, a certain seniority is desirable.

* THE DATA MANAGER can be a systems analyst or senior programmer with prior experience in statistical data management.
However, as substantial LSMS-specific data management skills will have to be transferred to that person in any case, it is usually better (and easier) to look for an economist or statistician with knowledge of computer programming instead.

* THE FIELD MANAGER is usually a person from the statistical agency and should have substantial managerial skills, inside knowledge of the statistical agency, and experience in conducting household surveys in the country.

All members of the core staff (not just the data manager) should be familiar enough with personal computers to use word processing software and spreadsheets.

* THE SUPERVISORS should have completed secondary education and, within the possibilities of the local labor market, some advanced education, preferably in the social sciences or humanities. In several of the LSMS surveys, former primary school teachers have proved to be excellent team supervisors. However, experience in managing people and resources and the ability to foster teamwork are more important than credentials.

* THE INTERVIEWERS should also have a good secondary education, but they do not need to have pursued further studies. In fact, further studies may be a disadvantage, as graduates tend to be more likely to abandon the survey halfway through if they are offered a better or more interesting job.

* THE ANTHROPOMETRISTS should also have obtained a school leaving qualification. However, it is not necessary, as tends to be assumed, for anthropometrists to be nurses or other people with clinical experience. Weighing and measuring children in the field is very different from weighing and measuring them in a clinic, and extensive training is needed whatever the recruits' professional background.

Similarly, experience with computers is not an essential requisite for the DATA ENTRY OPERATORS, but it is helpful if they have keyboard skills. It is not difficult to learn to enter data, and it is not necessary to understand how a program works in order to use it successfully.
It is better, in fact, if the data entry operators are interested in the survey rather than in computers; that way, when they note incorrect answers, they are more likely to use the same terms as the questionnaire instead of explaining the error in computer terms.

In some countries it may also be desirable that team members, especially interviewers, be fluent in two or more languages. Moreover, it is best if the interviewers in each team have complementary language skills so that between them they can conduct all (or nearly all) the interviews themselves.51 In the 1985 Peru survey, for instance, all field staff in the sierras were able to speak either Aymara or Quechua (or both) in addition to Spanish.

Unless specific cultural or religious conditions dictate otherwise, there seems to be no a priori reason for preferring male to female interviewers or vice versa. There is some anecdotal evidence, however, that households are less likely to refuse women interviewers. In the 1990 Peru (Lima only) survey, for example, only female interviewers were used. Prior experience there had shown that it was culturally more acceptable for female respondents to admit female interviewers to their homes when the men were absent. Moreover, the survey was carried out during a time of widespread terrorist activity, and women were considered less threatening. Even where male interviewers are used, anecdotal reports from several countries suggest that females may find it easier to establish rapport for the modules on fertility and child mortality. However, a comparison of results from the Côte d'Ivoire LSMS (which used only male interviewers) and the Ivorian Fertility Survey (which used only female interviewers) found basically the same levels of fertility; the interviewer's gender was not shown to have any effect on the reported number of children ever born. (Answers might have been different for questions about sexual behavior or contraception.)

Hiring good field workers is tricky.
Obviously all team members must be diligent, organized, and responsible. They should be enthusiastic about the survey and good at establishing rapport with the community members to be interviewed. Because it is often difficult to assess these characteristics in a short job interview, LSMS surveys hire more field staff than will be required, usually 15 to 25 percent more. All the hires are trained. The rules are made clear to the prospective interviewers from the very beginning: they will have to work hard, including Saturdays and Sundays, in rain and snow, and with unusual working hours. Anthropometrists must be willing to travel with heavy anthropometric equipment. During the training period the candidates' work characteristics and ability to establish rapport with interviewees are displayed and can be assessed more accurately.52 Then, after the training period, a final selection is made.

51. If interpreters are needed, they are usually other members of the household or community members acquainted with the respondent. Though both the presence of a third party and the fact that the interpreter is acquainted with the respondent violate some of the basic rules of interviewing, this may be the only possible option for some interviews. Interviewers should be aware of, and be trained to avoid, a still more serious problem posed by a local interpreter, which is a tendency to answer for the respondent.

52. Training more recruits than needed also provides a pool of candidates from which to draw replacements if any member of the team must be replaced because of illness, poor performance, or resignation.

Regular supervision that offers practical solutions to practical problems can be helpful in maintaining morale and professional standards. As discussed in the section on the supervisor's duties, supervision abounds in LSMS surveys.

Training

As in all surveys, good training contributes greatly to the quality of the data collection effort. There are several types of training used for LSMS surveys.

* THE SURVEY MANAGER AND FIELD MANAGER are assumed to be professionals already knowledgeable about surveys in general. Thus, the only training they need is in the peculiarities of the LSMS survey. This training occurs on the job as they prepare the survey in collaboration with people familiar with the LSMS in other countries.

* For the DATA MANAGER, training needs are usually more specific, but they are also met on the job. About two to four weeks of close collaboration with people who have developed and applied integrated data management techniques for other complex surveys are usually required. Training is both theoretical and practical. The conceptual framework includes criteria for survey data consistency, error levels (range checks and consistency checks), design of a dictionary of variables, file management for data entry and analysis, and questionnaire design techniques for effective data management. The practical part of the training consists of translating the questionnaire structure into a set of linked data entry screens, graphically designing several of these screens, and defining the most important range and consistency checks. Thus during the training, part or all of the data entry program is prepared.

* SUPERVISORS. Some supervisors will be trained on the job, as they will take part in the project from the early stages of field testing and will actively participate in preparations for the survey. However, some aspects of the job should be presented formally, through training sessions and in the supervisor manual. These include: LSMS objectives, sample design, contents and design of the survey, structure of the interviews, community questionnaires, structure of the management team, structure of the field teams, quality control criteria, coding, and household replacement criteria. One or two weeks should be allocated for this training.
* INTERVIEWERS AND DATA ENTRY OPERATORS. LSMS surveys normally allow four weeks to train interviewers and data entry operators. A general outline of these courses is shown in Figure 5.3, and an example of the detailed coverage in the first days of training is given in Box 5.1. The training period for LSMS surveys is much longer than for other surveys (which tend to average less than one week) for two reasons. First, LSMS surveys have made an unusual effort to reduce non-sampling errors, and the training of all personnel is key to this process. Second, the LSMS questionnaires are far more complex than those of most other surveys, so more training is required to achieve a given level of understanding. The training must cover the basic structure of how to understand and use the questionnaire, but it must go much further. In order to probe effectively, the interviewers must thoroughly understand the economic concepts being measured, especially in the labor activities, household enterprise, agriculture, and consumption modules.

* ANTHROPOMETRISTS. Anthropometrists should be trained at the same time as interviewers and data entry operators. It is tempting to assume that anthropometric training can be done in a few hours or days, because measuring and weighing people seems to be such an easy thing. It is not. Anthropometric training requires about two weeks (see UNNHSCP, 1986a). It does not need to be coordinated so closely with the training of the rest of the staff, but it can benefit from sharing some common sessions on the survey's general objectives and methodology. Of course, additional, specific training will be required if the anthropometrists are to assist in completing the community, price, or facility questionnaires.

Figure 5.3: Interviewer and Data Entry Operator Training Program (Weeks 1-4)

Data entry operators: Introduction to the survey. Introduction to personal computers and printers. Unpacking the computer. Diskette management. The data entry program. Presentation of round 1 data entry screens. Practice of round 1 (trainees enter questionnaires completed by interviewer trainees the previous week). Presentation of round 2 data entry screens. Practice of round 2 data entry screens (trainees enter questionnaires completed by interviewer trainees the previous week). Inter-record checks.

Interviewers: Introduction to the survey. General survey procedures. The questionnaire. Definition of a household. Theory of round 1 sections. Field practice of round 1 (trainees must conduct at least two observed interviews, one urban and one rural). Interpretation of the data entry program error reports. Theory of round 2 sections. Field practice of round 2 (trainees revisit households visited in the second week).

Box 5.1: Day 2 of a Typical Interviewer Training Session

The first day of training is usually taken up with introductions. There may be a formal opening ceremony with big shots and benedictions, after which the trainers, core survey staff, and interviewers are introduced. Finally, there may be an overview of the purposes of the survey, the role of the interviewer, and the structure of the questionnaire.

Detailed coverage of each module will usually begin on the second day. The roster is usually covered first. Though it is only one page long and takes very little interview time, it usually receives a day or more of training time. The definition of a household is essential to the success of the survey but may be easily misunderstood, so it must be covered thoroughly. In addition, many of the features of the questionnaire and of good interview technique are introduced at this time.

Each trainer has an individual approach, although obviously certain specific information always needs to be covered. An example of one trainer's technique can be instructive. In this case the trainer begins by giving the definition of the household. On the chalkboard or flip chart, the trainer sketches a simple household with stick figures and balloons describing each member's name, age, relationship to other members, and so forth. An overhead projector displays a copy of the roster page for a household; this is printed on a transparent film so that it may be filled in with markers as the session proceeds. The trainer demonstrates filling in the questionnaire, explaining in the process how to read skip codes, showing that instructions to the interviewer are in capital letters and not to be read aloud, and so forth. The proper way to code answers is explained, as is the need for legible handwriting. The meaning of each question is explained, along with any factors that are important in achieving a correct answer. For example, the age variable is meant to record the number of complete years that a person has lived, so that someone who is 35 years and 9 months old would have a recorded age of 35.

After the trainer has filled in the roster for the first person or two in the sample household, or even for the whole first household, the trainer has interviewers take turns coming to the front of the room to fill in a line in further examples. Eventually, instead of having the examples sketched on the board, the instructor begins to play the role of various survey respondents so that the interviewers have to extract the information for the examples. To keep this session lively, the trainer may use a few props, such as hats, pieces of clothing, or objects, to help the interviewers imagine they are speaking with people of different genders, ages, and ethnic or economic backgrounds. As the interviewers catch on to the basic concepts, more complicated cases are introduced: heads of household living away from home, children in boarding schools, domestic servants, boarders, guests, and so forth. As a change of pace from the examples, a more formal presentation is made on how to use the calendar of events to establish ages. Calendar use is then incorporated into the examples. By the end of the afternoon, the class breaks into pairs and the interviewers request information and fill out rosters for each other. The results of this trial run are discussed at the beginning of the third day.

Going into this kind of detail for every section of the questionnaire and providing interviewers the opportunity to practice administering each section and to receive feedback requires a great deal of time. This explains why the whole training takes four weeks.

There are several other things to note about training. First, it requires considerable preparation on the part of the instructor, who must have concocted numerous examples that illustrate all possible complications. The instructor should have thought out in what order to present material and have made up checklists that can be used to verify that all the concepts implicit in the presentation, such as skip patterns and probing, have been introduced. Second, the training is very interactive, with interviewers going to the board, interviewing the trainer in front of the others, carrying out practice interviews with each other, and doing other exercises. This is only possible if the group is kept small, to 20 or 30 people. Third, the training will be enriched by use of as much audiovisual equipment as is practical in the setting. This will require foresight in gathering the materials required and doing a technical rehearsal far enough in advance of the training to solve any problems that arise.

Notice that the training programs for data entry operators and interviewers are coordinated. When fine tuning the training program for a specific LSMS, a common session for both interviewers and operators may be useful.
This reflects the importance of doing the actual required tasks as part of the training and the fact that the work of all staff will be coordinated once the survey is fielded.

In order to ensure that uniform criteria and instructions are conveyed, most LSMS surveys strive for centralized training. This is also a reason to keep the number of teams small. In Côte d'Ivoire, Peru, Ghana, and Mauritania, for instance, all the interviewers could sit in one room. This becomes difficult with more than 10 teams, as in Pakistan or Viet Nam, and parallel training sessions are needed. This entails close coordination and monitoring of the different lecturers. In more extreme situations, such as the Romania survey, which employs over 500 interviewers, decentralized training is the only possible option. In these cases training must be done in two steps: a group of trainers is trained first, so that they can later train others in different locations. All these factors should be carefully considered when planning a course of training because of the need for suitable rooms, audiovisual equipment, and so forth. Other logistical arrangements for training include lodging and transportation for the trainees who come from outside.

The training plan should emphasize practice interviews with households. Indeed, in the plan shown in Figure 5.3, half of the interviewers' time is spent actually in the field. This is the only way to discover whether interviewers really understand what they are meant to learn. Not even practice interviews with each other will be as useful. Moreover, the interviewers will at first be rather shy with households and need time to get over that before the survey starts. Parts of the practice interviews should be observed by trainers, their assistants, or supervisors, to help detect where the interviewers are having problems.

When planning the training, it is therefore important to select two localities, one urban and one rural, that are close to the training quarters.
The survey planners and team supervisors should visit those places well in advance and select (not necessarily at random) households willing to cooperate with the survey in both places, so that team members can interview them during their training.
Depending on the number of field staff to be trained, the experience and skills of the core staff, and language constraints, the training may be conducted by the core staff, international technical assistants, or a combination of the two. In Côte d'Ivoire and Ghana, training was done primarily by consultants, and in Peru primarily by the local core staff, with consultants acting as advisors on the side. In Pakistan, where fifteen teams of six people (mostly non-English speakers) had to be trained, a small team of local consultants was trained in the afternoon by foreign experts in English; the next morning, each of these consultants in turn trained a group of field staff in Urdu.
The data entry program should be close to its final form at the moment of training, though fine-tuning and debugging are almost always necessary during the training period, because the data from the actual questionnaires completed by the trainees reveal situations that were not foreseen during program development.
The importance of interviewer training can hardly be overstressed. In one recent LSMS survey, one interview team decided that collecting data on both wages in kind and consumption in kind was double counting, so it stopped collecting either! This team apparently understood neither the role of the questions in analysis (analysts wanted to be able to measure both the total value of income and the total value of consumption, and knew how to avoid double counting) nor the role of the interviewers (to administer the questionnaires as designed).⁵³
Manuals
The main written materials used for training supervisors, interviewers, anthropometrists, and data entry operators are the questionnaires and the field manuals. Manuals are usually reproduced by photocopying.⁵⁴
It is recommended to reproduce many more manuals than needed for training, at least a few hundred copies of each, because apart from their obvious use as a support for field operations, the manuals are also valuable tools for the survey analysts.
53. Both the data entry program and good supervision from the central office should help to detect such mistakes early in the survey and correct them before data collection gets very far.
54. Because of print shop delays, it often happens that household questionnaires cannot be printed in time for the training. In those cases a few questionnaires may have to be made by photocopying as well.
The basic contents of each kind of manual are described in the following paragraphs. An idea of their detail and clarity is provided in Annex IV, which reproduces a section from the interviewer manual for the Kagera Health and Development Survey.
* SUPERVISOR MANUAL. This manual should start by making explicit the objectives, methodology, and organization of the survey. It should then specify the supervisor's responsibilities and duties, and how the supervisor is connected to the survey's core management team and to the statistical agency's regular organization. Another chapter of the manual should be devoted to the procedures to be carried out in each cluster, including completing the community, price, and facilities questionnaires, and the public relations tasks needed to ensure cooperation from the local authorities and the selected households. The difficulties in locating the selected households and ways to deal with refusals and other forms of non-response (as well as the selection and documentation of replacements) should also be made clear. Sections in the supervisor manual should address the relationship between the supervisor and the interviewers, including procedures for preparing questionnaires for both survey rounds, and the use of the supervision forms for interviewer evaluation, questionnaire verification, and check-up interviews.
The latter should include detailed instructions on dealing with problems that might be found. If the survey collects anthropometric data, the manual should also indicate how the supervisor is to manage the anthropometrist's work and its relation to the interviewers' work. The manual should also specify procedures for coding the open questions in the questionnaires, including the complete code lists to be used for occupations, activities, and geographic locations. An important part of the supervisor manual should be devoted to data entry. It should explain how and when the questionnaires are to be given to the data entry operator and how to interpret the data entry printouts along with the rest of the inconsistencies signaled by the operator on the questionnaires. The manual should also explain how the data entry diskettes are to be sent to the survey core management team.
* INTERVIEWER MANUAL. The fundamental objectives of the interviewer manual are to provide concepts and definitions, to define field procedures, and to ensure uniform criteria in the few parts of the questionnaire that are not self-explanatory. The manual should include general sections on the survey's objectives and methodology, the attitudes and behavior expected from the interviewer, the relationship between the interviewer and the supervisor, the structure of the questionnaire, the conventions used in the questionnaire's design, and the interpretation of the data entry program outputs, as well as specific sections on each module of the questionnaire. Some of the documents used in other LSMS surveys are available from the LSMS division of the World Bank and can be used as guidelines. In many other surveys the interviewer manual contains a list of all questions in the questionnaire, with detailed instructions on how to ask and record the answers to every single one (for instance, "Question 4 (Gender). Record the gender of the respondent, using code '1' for male and '2' for female," and so forth).
Such an exhaustive approach would be both tedious and useless for LSMS surveys, given the length of the questionnaires and the fact that they are precoded and have explicit skip patterns. Instead, the LSMS manuals should focus on clarifying economic concepts such as the difference between wage earning and self-employment, the treatment of sharecropping, and so on.
* DATA ENTRY OPERATOR MANUAL. This manual should explain in great detail the role of the operator in the field operational setup and how outputs of the program (e.g., on-line messages and printouts) are to be transferred from the operator to the team supervisor. Contrary to what might be expected, this manual needs to make very little reference to the computer or the data entry program. The use of the latter should be intuitive enough not to need further explanation.
* ANTHROPOMETRIST MANUAL. This manual is not country-specific and generally can be based on existing material (such as United Nations, 1986). If the anthropometrist is made responsible for completing the community, price, and facilities questionnaires, a separate manual should be prepared for these tasks.
Developing Supervision Forms
Three of the tasks of the team supervisors should be supported by written documents, known as supervision forms. These are (1) interviewer evaluation, (2) questionnaire verification, and (3) check-up interviews. The forms are intended to give these tasks formal definition, as opposed to loosely defined responsibilities left to the supervisor's personal initiative, and to make it possible to supervise the supervisors themselves (that is, to make supervision tasks verifiable by the survey core staff). Guidelines for the design of these forms are given below, with examples taken from the Pakistan Integrated Household Survey.
INTERVIEWER EVALUATION. The purpose of interviewer evaluation is to monitor the performance and attitudes of the interviewers.
At least once a week (more often for weak interviewers), the supervisor should sit in on interviews conducted by each of the interviewers in order to check that they are administering the questionnaire correctly. The supervisor witnesses an interview strictly as an observer and should not talk to the interviewer or the respondent. The interviewer should be informed that he or she is not allowed to ask for advice during the interview and must behave as though the supervisor were not present. The interviewer evaluation form allows the supervisor to make notes on any questions or concepts that the interviewer may have difficulty asking or understanding. It should be filled in on the spot, before the details of the interview are forgotten. The main points to consider when designing the interviewer evaluation form are well illustrated in the form used in the Pakistan LSMS, shown in Figure 5.4.
Interviewer evaluations also offer the chance to spot weaknesses in the questionnaire and suggest improvements for future versions. The form might also contain space for noting problems or difficulties in the interviewing process, particularly with respect to inappropriately worded questions, concepts that are unclear to the respondent, or questions that are not answered because they are too personal or too sensitive.
QUESTIONNAIRE VERIFICATION. The purpose of this operation is to ensure that the questionnaire is completely filled out; that is, that everyone who should have been interviewed has replied and that every section is complete. Verification should be done the day after the questionnaire is completed, before the supervisor leaves the area and before the questionnaires are given to the data entry operator. A questionnaire verification form should be designed to assist the supervisor in this task. It should be filled out for all questionnaires after each round of the survey.
If problems are found in the questionnaire, it should be returned to the interviewer with instructions to correct them immediately, before leaving the area. The supervisor must keep the verification forms for each questionnaire until the end of the second round. After data for the second round have been entered, the forms will be kept at the field office with the questionnaires. Questionnaire verification is not supposed to replace the exhaustive quality controls performed later by the data entry program; rather, it serves as an early warning of major omissions that can be amended by sending the interviewer back to the household before the team leaves the area.

Figure 5.4: Interviewer Evaluation Form
INTERVIEWER EVALUATION
CRITERIA                                                      RATING: Satisfactory / Unsatisfactory
A. Comportment of the Interview
   1. Did the interviewer greet everyone before beginning the interview?
   2. Did the interviewer introduce himself or herself and explain that he or she is working for the Federal Bureau of Statistics?
   3. Did the interviewer properly explain the objectives of the survey, how the household was chosen, and that the interview would be completely confidential?
   4. Was the interviewer polite and patient with the respondents during the interview?
   5. Did the interviewer thank everyone at the end?
B. Interview Respondents
   1. Did the interviewer ask the questions as they appear in the questionnaire?
   2. Did the interviewer try to interview the appropriate person in each section of the questionnaire?
   3. Did the interviewer accept "I don't know" as an answer without probing?
C. Time Spent on the Interview
   1. Did the interviewer avoid long discussion of the questions with the respondents while still being patient and polite?
   2. If the interviewer received irrelevant or complicated answers, did he or she break in too suddenly?
   3. Did the interviewer rush through the interview, thereby encouraging respondents to answer questions quickly?
D. Impartiality
   1. Did the interviewer maintain a neutral attitude toward the questions and answers during the interview?
   2. Did the interviewer volunteer an opinion?
   3. Did the interviewer appear surprised, shocked, or disapproving about any of the answers?
   4. Did the interviewer suggest answers when asking the questions?
SUPERVISOR: ____________    DATE: ____________

Figure 5.5 shows the first page of the questionnaire verification form used for the Pakistan LSMS. The full form has four pages and is shown in Annex VI. Typical items to be considered in questionnaire verification are:
* MANDATORY SECTIONS. Some sections, such as housing and the inventory of durable goods, should be present in all questionnaires. Other sections, like farming, should almost always be present in certain locations but not in others.
* COMPLETENESS OF INDIVIDUAL SECTIONS. Depending on age, sex, or some other characteristic, certain sections of the questionnaire may or may not have to be completed. For instance, all women 15 to 49 years old, but no men, should answer the fertility section.
* COMPLETENESS OF LISTS. If the exhaustive approach is used to scan item lists in certain sections of the questionnaire (see Chapter 3), then all Yes/No questions should be completed, and a series of answers should follow each item marked "Yes."
* FILTER QUESTIONS AND OTHER MAJOR SKIPS. Some questionnaire sections are headed by "filter" questions, which indicate whether or not the section is applicable to a particular household. The entries in the questionnaire should be consistent with these filters and skip patterns.
CHECK-UP INTERVIEWS. The purpose of the check-up interview is to confirm that the interviewer is indeed interviewing completely and accurately. The check-up interviews convey the importance of accuracy and completeness to the interviewer. This reinforcement is important in maintaining high standards, even among diligent interviewers.
The check-up interview may also reveal any unsatisfactory interviewers so that corrective action may be taken. These random revisits tend to be ignored or neglected by official statistical agencies throughout the world. However, they are the best way to ensure effective interviews and are a standard procedure in all serious marketing research surveys. It is generally considered acceptable to conduct check-up interviews in 15 to 25 percent of the households. The check-up interviews should not take longer than 15 minutes. It should be kept in mind that a difference between a response in the re-interview and a response in the original interview does not necessarily mean that the interviewer is not doing a careful job. Respondents may provide different information at different times, and sometimes the respondents contacted by the interviewer and by the supervisor may not be the same. However, numerous differences indicate the need to follow up with the interviewer regarding possible causes.

Figure 5.5: Page One of Pakistan Questionnaire Verification Form
PROVINCE ____  SUB-UNIVERSE ____  STRATUM ____  PRIMARY SAMPLING UNIT ____  HOUSEHOLD ____  RESULT ____
Columns: Section | Question | Round One Check | Satisfactory | To be redone | Notes/Remarks
Section   Question   Round One Check
1A        2-5        These questions must be completed for all names in Q.1.
1A        9          All persons were correctly classified as members of the household.
1A        A-B        A cross was written in column A for all members of the household (code 1 in Q.9) and the age in years was copied from Q.5 to column B.
2                    This section was completed.
3A                   A line is filled in for each household member 5 years or older.
4A, 4B, 4C, 5A, 5B, 6A, 6B, 6C, M7A:
                     A line is filled in for each household child 5 years or under.
                     A line is filled in for all household members.
                     A line is filled in for each household member 10 years or older.
                     A line is filled in for each household member 10 years or older.
                     If the answer is 1 (YES), a line is filled in for each household member 10 years and older. The ID code of the best-informed person is to be transferred to the second page (Summary of survey results).
                     If the answer is 1 (YES), Q.1-5 for the First, Second, or Third Enterprise should be filled in. Industry codes of all enterprises must be filled in, and ID codes of best-informed persons must be transferred to the second page (Summary of survey results).
                     A line is filled in for each female member 10 years or older.
                     This section is completed.
M7B       1          If the answer is 1 (YES), then Q.2-43 must be filled in.
M7C       1          If the answer is 1 (YES), then Q.2-16 must be filled in.
M7D       1          If the answer is 1 (YES), then Q.2-12 must be filled in.
M7E       1          If the answer is 1 (YES), then Q.2-28 must be filled in.

In LSMS surveys, supervisors fill out a check-up interview form to document the results of the re-interview. This ensures that the double-checking is thorough and impartial. It also allows the headquarters staff to supervise the supervisors effectively. Figure 5.6 shows the check-up interview form initially used for the Pakistan LSMS. The most important things to control for in the check-up interview form are the questions for which certain answers may represent substantial differences in interview time later. For instance, the simple omission of a person from the roster means that this person does not need to be looked for and interviewed later. "Rounding up" a woman's age a little bit may make her ineligible to answer the fertility section, and so forth. Other typical omissions are not probing hard enough for secondary activities (especially when the person is self-employed) or not probing for the complete list of crops grown. More subtle omissions are not considering an illness serious enough to be reported in the health section, or not considering purchases of small amounts worth including in the list of expenditures. Apart from those, the check-up interview form may include certain observational records (like some of the building materials in the housing section) and other questions that are deemed unlikely to change between the interview and the check-up interview.
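The two mechanical parts of the check-up procedure, drawing a random share of households for unannounced revisits and comparing a few key answers from the two visits, can be sketched in Python. This is an illustration only, assuming the relevant answers have been keyed for both visits: the `KeyAnswers` record, its fields, and the 20 percent default share (inside the 15 to 25 percent range given above) are hypothetical choices, not part of any LSMS software. In line with the text, the comparison only lists discrepancies for the supervisor to follow up; it does not pass judgment on the interviewer.

```python
import random
from dataclasses import dataclass


@dataclass
class KeyAnswers:
    """Hypothetical record of a few key answers from one household visit."""
    household_id: str
    members: list[tuple[str, int]]  # (sex "M"/"F", age in years) per roster line
    n_crops: int                    # number of crops reported


def select_for_checkup(household_ids, share=0.20, seed=None):
    """Draw a simple random sample of households for unannounced check-up visits."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(share * len(household_ids)))
    return sorted(rng.sample(list(household_ids), k))


def women_15_49(members):
    """Count roster members eligible for the fertility section (women aged 15-49)."""
    return sum(1 for sex, age in members if sex == "F" and 15 <= age <= 49)


def compare(original, checkup):
    """List discrepancies worth following up with the interviewer."""
    issues = []
    if len(checkup.members) != len(original.members):
        issues.append("roster size differs")  # a person may have been omitted
    if women_15_49(checkup.members) != women_15_49(original.members):
        # e.g. an age "rounded up" past 49 removes a woman from the fertility section
        issues.append("number of women eligible for fertility section differs")
    if checkup.n_crops > original.n_crops:
        issues.append("fewer crops reported in original interview")
    return issues
```

The optional seed only makes the draw reproducible for later audit by the central office; the list of selected households must still be kept from the interviewers.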
Interviewers should be made aware that some check-up interviews will take place, though of course they should never know in advance the households where these will be conducted. In marketing research surveys, which are always brief field operations, it is also considered that the contents of the check-up interview (that is, the questions that will be re-asked) should be kept secret from the interviewers. In LSMS surveys this is impossible, given that they are conducted over a much longer period of time, but it is possible to modify parts of the check-up interview a few times during the period of field operations.
The check-up interview forms are also instrumental tools for centrally supervising the field teams. Occasionally, the survey field manager should join each team, select a few of these forms, and take them back to the same households for another check. There do not need to be very many of these double checks, but they should be random and unexpected.⁵⁵
55. This should be done even though it often causes the interviewees considerable irritation. In fact, this reaction confirms the original interview and the supervisor's visit. As far as the super-supervisor is concerned, the ideal response to a knock on the target household's door is for the household member to snap, "Oh no, not the people from the household survey again! This is the third time you've been around pestering us!" At this point the super-supervisor should apologize profusely and beat a hasty retreat.

Figure 5.6: Check-up Interview Form
PROVINCE ____  SUB-UNIVERSE ____  STRATUM ____  PRIMARY SAMPLING UNIT ____  HOUSEHOLD ____  RESULT ____
Columns: Section | Questions | Satisfactory | Unsatisfactory | Comments
2    a) What type of dwelling unit does the household reside in?
     b) Does the household rent or own the unit?
3    a) Which members of the household have attended school? How much schooling have they completed?
4    a) Has anyone in the household been ill recently?
5    a) Is any member of the household an agricultural laborer? Are they permanent workers, seasonal workers, or casual workers?
     b) Was any member an employee outside the agriculture sector? What were their occupations? Which industries were they employed in?
6    a) Does any member of the household work on his/her own account or operate a business? Which member(s)? What type of work do they do?
7    a) What do you cook your meals on (i.e., open fire, stove, etc.)?
     b) How do you heat your dwelling during cold months?
9    a) How much total land is owned by your household? How much land is owned close to the village? How much land is owned far away from the village?
     b) Which crops did you grow during the last completed rabi and kharif seasons? (probe) If wheat or rice was grown, how many acres of each did you harvest?
     c) What kinds of agricultural machinery do you own?
12   a) What kinds of foodstuffs has your household purchased during the past two weeks? Were some of these purchases on udhar or credit?
13   a) How many children has your wife had? How many boys? How many girls?
15   a) Do you currently have any loans outstanding? Who did you borrow from?
SUPERVISOR: ____________    DATE: ____________

Scheduling Field Work
As explained in the chapter on sampling, the task assignment of each team should be done concurrently with the first stage of sample selection. The clusters are distributed among the field teams, and the order in which each team will visit the clusters assigned to it is decided at random. The schedule of each team should then be made explicit, to indicate what each team is supposed to be doing each week of the survey year. With the standard LSMS setup explained in Section A, the 20 clusters assigned to each team will have to be grouped into 10 "pairs." As four weeks are needed to visit each pair, each team will devote 40 weeks to field work during the survey year. The remaining 12 weeks of the year should be scheduled for things such as:
REST.
The schedule should include several rest periods, because field work is very intensive and the staff are not supposed to have much free time during the 40 weeks of work. The field staff are either working in a cluster or traveling between the clusters and the team base station. Weekends are rarely devoted to rest, because in most places these are the best days for finding respondents at home.
CATCHING UP. Bad roads, material breakdowns, natural disasters, and various other situations may make it difficult for some teams to keep their work deadlines. It is necessary to allow some slack time in the calendar for catching up with these contingencies.
PROJECT EVALUATION. After the first month of field operations, and perhaps also at other key points of the calendar, it is advisable to bring the teams back to the central survey headquarters to discuss and solve the problems found so far.
RETRAINING. If the survey is to be conducted for more than one year, it will be necessary to bring the teams back to the central office at the end of the first year for training in the new procedures for the second year. New material can include changes in the questionnaires, procedures for revisiting certain households if the second year contains a panel component, and so forth.
Figure 5.7 shows an idealized schedule of field work for the first year of an LSMS survey with 100 PSUs, numbered randomly from 001 to 100.⁵⁶ These are assigned to five teams, sorted by PSU within each team, and grouped in pairs. For instance, Team 1 will visit PSUs 009, 011, 013, 015, and so on. The teams first go to the field for four weeks to interview one pair of PSUs each (for Team 1, this is composed of PSUs 009 and 011). They then come back to the central office to evaluate the experience during weeks 5 and 6. Over the next 10 weeks (weeks 7 to 16), each team will interview two more pairs; the last two weeks are devoted to rest or, if necessary, to catching up on contingencies. This is repeated three more times.
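The calendar arithmetic described above (20 PSUs per team, sorted and grouped in pairs, four weeks per pair, 40 field weeks plus 12 other weeks) can be sketched in Python. This is an illustration under stated assumptions, not LSMS software: the assignment of PSUs to teams is taken as already made during sample selection, the `TEMPLATE` list simply transcribes the week blocks of Figure 5.7, and the function names are hypothetical.

```python
# Week blocks transcribed from Figure 5.7. "PAIR" marks a four-week field
# cycle in which the team visits its next pair of PSUs; any other label is a
# non-field period shared by all teams.
TEMPLATE = [
    (1, 4, "PAIR"), (5, 6, "Evaluation of first month"),
    (7, 10, "PAIR"), (11, 14, "PAIR"), (15, 16, "Catch-up and rest"),
    (17, 20, "PAIR"), (21, 24, "PAIR"), (25, 26, "Catch-up and rest"),
    (27, 30, "PAIR"), (31, 34, "PAIR"), (35, 38, "PAIR"),
    (39, 39, "Catch-up and rest"), (40, 40, "Important national holiday"),
    (41, 44, "PAIR"), (45, 46, "Catch-up and rest"),
    (47, 50, "PAIR"), (51, 52, "Rest (and training for second year)"),
]


def check_template(template, weeks_in_year=52):
    """Verify that the blocks cover weeks 1..weeks_in_year with no gaps or overlaps."""
    expected = 1
    for start, end, _ in template:
        if start != expected or end < start:
            raise ValueError(f"gap or overlap at week {start}")
        expected = end + 1
    if expected != weeks_in_year + 1:
        raise ValueError("template does not cover the whole year")


def pair_up(psus):
    """Sort a team's PSUs by number and group them into consecutive pairs."""
    ordered = sorted(psus)
    if len(ordered) % 2:
        raise ValueError("each team needs an even number of PSUs")
    return [tuple(ordered[i:i + 2]) for i in range(0, len(ordered), 2)]


def team_schedule(psus, template=TEMPLATE):
    """Return (first_week, last_week, activity) rows for one team."""
    check_template(template)
    pairs = pair_up(psus)
    n_cycles = sum(1 for _, _, label in template if label == "PAIR")
    if n_cycles != len(pairs):
        raise ValueError(f"{n_cycles} field cycles but {len(pairs)} pairs")
    next_pair = iter(pairs)
    return [(start, end, next(next_pair) if label == "PAIR" else label)
            for start, end, label in template]


if __name__ == "__main__":
    # Team 1's PSUs from Figure 5.7, deliberately given in arbitrary order.
    team1 = [99, 9, 57, 11, 13, 92, 15, 17, 27, 28,
             31, 36, 39, 60, 62, 70, 75, 79, 83, 96]
    for start, end, activity in team_schedule(team1):
        print(f"weeks {start:2d}-{end:2d}: {activity}")
```

Keeping the calendar as data rather than code makes it easy to move a holiday or a catch-up period between four-week cycles without touching the pairing logic.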
At the end of the year, after each team interviews its last pair of PSUs, everybody comes back to the central offices to be trained in the second year's procedures.

Figure 5.7: Schedule for Field Work
Weeks    Team 1    Team 2    Team 3    Team 4    Team 5
1-4      009,011   002,010   001,019   003,004   006,012
5-6      Evaluation of first month
7-10     013,015   032,045   005,007   020,021   016,022
11-14    017,027   047,048   008,014   026,029   024,025
15-16    Catch-up and rest
17-20    028,031   049,050   018,023   035,041   034,037
21-24    036,039   055,056   030,033   044,052   038,040
25-26    Catch-up and rest
27-30    057,060   058,063   043,046   064,066   042,051
31-34    062,070   065,074   053,059   069,073   054,061
35-38    075,079   080,081   067,071   076,082   068,077
39       Catch-up and rest
40       Important national holiday
41-44    083,092   085,089   072,084   087,091   078,088
45-46    Catch-up and rest
47-50    096,099   093,100   086,097   095,098   090,094
51-52    Rest (and training for second year)

56. The numbers shown in Figure 5.7 represent the order in which the actual PSUs were drawn, not the geographic codes for the PSUs. The numbers may seem to indicate that time and space will be correlated in the field, but this is not the case.

Putting catch-up and rest periods together tends to reduce the time taken for minor contingencies. With the incentive of being able to take the full two weeks as leave rather than working to catch up, the field staff exhibit considerable diligence in overcoming minor contingencies and sticking to the schedule.
This basic schedule can be elaborated by staggering the catch-up and rest periods for the different teams. This helps to avoid any bias that might be caused by seasonal factors. Since the work is already well spread out through the year, this has seldom been done in practice.
Designing the actual schedule is very country-specific, as it is usually developed around national holidays and other significant dates, with the goal of either excluding or including them in the work period.
In Muslim countries, for instance, the month of Ramadan is particularly interesting to observe because of the differences in household consumption patterns; Ramadan, however, is not a good month to train interviewers or to initiate the survey. For some important holidays, especially those that last only a few days or a week, it may be unreasonable to expect field crews to work or respondents to be willing to be interviewed. Christmas in the United States is an example of such a holiday. In these cases, the schedule should be planned so that the holiday week falls between the four-week, two-PSU cycles. That will maintain the interval between interviews, which is important when there are recall periods bounded by the first interview. Such an adjustment is shown in Figure 5.7 for the holiday in week 40.
Ensuring Collaboration by Households
The most important way to ensure collaboration by the households is to use polite, diligent, well-trained interviewers and to have them make several return visits to ensure that the household is contacted and a time convenient for the interview is arranged. Some additional measures may be needed. There are no fixed prescriptions that will work everywhere, but experience from various countries should be observed and evaluated.
* USE OF MASS MEDIA. In general, the use of mass media is a waste of money, because the mass media reach many people whom the survey does not. However, if mass media coverage can be obtained free, it can be useful. Even if it is limited to a short newspaper story or some radio or TV briefs at the start of the survey, it can boost the field teams' morale and self-confidence at this critical time. (Occasionally interviewers keep the old newspapers throughout the entire survey period, to show the households that they are official and serious.)
Obtaining free publicity requires some imagination; for the Peru 1985 survey, the head of the National Statistical Institute took advantage of the monthly release of the Consumer Price Index to publicize the survey. (The announcement of the CPI was understandably a major media event, given inflation levels at that time.)
* TARGETED PUBLICITY. This may include letters to the households and leaflets (preferably in color, with graphs or other illustrations) that explain the purpose of the survey and the sampling methodology in simple terms. In Ghana, the publicity was handled as follows: one to two weeks before the team arrived in an urban cluster for interviewing, the supervisor sent out letters to inform the heads of households of the team's arrival in the community and the date the team would possibly visit. The supervisor then visited the foremost local political figures (such as the members of the Revolutionary Defense Committee) and the heads of all selected households.
* MATERIAL INCENTIVES. Sometimes a gift or payment is given to the households in return for their collaboration. There is some controversy about the quality and quantity of the material incentives that should be used to foster the households' collaboration. LSMS surveys generally follow the standard practice of each statistical agency. Some agencies consider incentives to be standard procedure for all surveys. This was the case in Romania, where the households interviewed for the earlier Family Budget Survey received a monthly cash payment (albeit a very modest one); the Romanian LSMS inherited that feature. Other statistical agencies are reluctant even to consider rewarding the households in any way, to keep households from becoming increasingly demanding, which would affect all the household surveys conducted in the country. This is the case in Jamaica. A relatively inexpensive alternative, likely to be cost-effective and acceptable in all countries, consists of giving away small presents to the interviewed households.
These can include T-shirts, calendars, brief statistical brochures, and similar items. In Peru (1990 and 1991), for example, households were given copies of an attractive popular magazine published by the private survey firm that conducted the survey. Ideally, the giveaways should have little or no intrinsic value. This both ensures that they do not affect the household welfare measurement and reduces the accounting controls required.
* COMMUNITY LEVEL. Publicity and motivation at the local community level are especially important in rural areas. Local authorities should be contacted and convinced of the usefulness of the survey. In rural areas of Ghana, letters were sent to the local chief or regent. The weekend before the survey, the team paid a courtesy call to the chief/regent and other prominent members of the community to explain the objectives of the survey, to introduce the team members, and to discuss the survey schedule for the week. The supervisor often used the occasion to administer the community questionnaire. After this meeting the interviewers contacted the selected households to introduce themselves and to make appointments for interviews.
Piloting Field Procedures
Since the LSMS field procedures have worked well in several countries, the pilot test for field procedures is less to determine whether they can work in general than to fine-tune the details of how they are implemented in the specific country. After the first four-week cycle of field work, all the teams convene in one place for a week or two. They discuss their experience and compare notes about problems and how they could be resolved. This has been done in most of the LSMS surveys conducted under the standard scheme presented in Section A. Most problems found during the field test fall into three classes:
* REFINING LOGISTICS. In spite of all precautions, some problems with the supply of one survey material or another are always found, fuel for the cars being the most common. Sometimes this is due to excessive bureaucracy at the central level, but often it is also due to the supervisors' failure to understand the extent of their autonomy.
* DEBUGGING THE DATA ENTRY PROGRAM. A major subject of discussion is the working of the data entry program. Again, as is always the case with software, no lab testing is ever able to show all the hidden features of the program that will be revealed when data from numerous real households are entered. More importantly, the need to program new consistency checks that were unforeseen by the survey data managers will become obvious after the first few weeks of field work.
* STATISTICAL QUALITY CONTROL OF THE DATA. Concurrent data entry makes it possible to conduct some preliminary statistical analysis of the data collected in the first month. From an analytical standpoint, the data from just one month carry little statistical weight unless the total sample is exceptionally large, but they often can provide interesting insights into the quality of the field work. For instance, after the first month of the Mauritania survey, the frequency distribution of the last digit of the ages recorded in years showed too large a proportion of "zeroes" and "fives" (demographers always expect this to happen, but not to that extent). More interesting, the same phenomenon was observed in the last digit of weights (in tenths of a kilo) and heights (in tenths of a centimeter) that were recorded by the anthropometrists. The early detection of this problem allowed for corrective retraining, and also gave the data entry program credibility in the eyes of the field teams.
It may happen that the problems revealed in the four-week assessment are serious enough to make the data collected during the first four weeks unreliable. Though this has not been the case in the LSMS surveys conducted so far, it is a real possibility, and the survey planner should be prepared to deal with this contingency by excluding the first month from the data sets.
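The last-digit check used in Mauritania is easy to run on concurrently entered data. The Python sketch below computes the share of each final digit and a simple heaping ratio: the observed share of final digits 0 and 5 divided by the 0.2 expected if final digits were evenly spread. The ratio is an illustrative statistic chosen here (related to, but not the same as, the demographers' Whipple index), and the 1.5 review threshold is a placeholder that a survey team would set for itself.

```python
from collections import Counter


def last_digit_shares(values, scale=1):
    """Share of each final digit 0-9.

    scale=1 for ages in whole years; scale=10 for weights or heights
    recorded to one decimal place (tenths of a kilo or of a centimeter).
    """
    if not values:
        raise ValueError("no values to check")
    digits = [round(v * scale) % 10 for v in values]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(10)}


def heaping_ratio(values, scale=1):
    """Observed share of final digits 0 and 5 relative to the 0.2 expected
    under evenly spread final digits; about 1.0 means no visible heaping."""
    shares = last_digit_shares(values, scale)
    return (shares[0] + shares[5]) / 0.2


def flag_heaping(values, scale=1, threshold=1.5):
    """True if the heaping ratio exceeds a (hypothetical) review threshold."""
    return heaping_ratio(values, scale) > threshold
```

Run separately per interviewer or per anthropometrist, the same function points to whose work needs retraining rather than only signaling that a problem exists somewhere.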
This would entail either a smaller total sample or expanding the data collection process by an extra month.

In cases with more innovative features, the field procedures may be piloted before the actual interviewing starts. In Nepal, for example, a test of logistics was done to see how well it would work to have the data entry operator and computer travel with the field team. In Romania, because the survey involves 500 interviewers, it was impossible to bring them all to Bucharest for discussions. The first month of the survey (March 1994) was considered a de facto field test of the survey, with the explicit intention of excluding the 3,000 households collected in it from the data sets if too many implementation problems were found. This, in fact, proved to be the case.

Chapter 6. Data Management

Key Messages

* Integrated data entry and field work are key to the timeliness and the quality of data from LSMS surveys.
* The data management approach for the LSMS surveys has four primary features: (1) pre-coded, verbatim questionnaires; (2) error detection at the time of data entry; (3) data entry that is concurrent with field work; and (4) correction of suspected errors in the field.
* There must be substantial interaction between the data manager and the analysts during the drafting of the questionnaire and the definition of error checks.
* To ensure smooth field operations and credible data, the data entry program must be well developed and tested before field work starts. Sufficient time must be allowed for these procedures.
* Five kinds of checks should be made on the data as they are entered: (1) range checks should be defined for every variable; (2) checks should be possible between entered data and reference tables; (3) skip checks should be defined for all skips, both within and between different units of observation; (4) checks for consistency of answers to different questions should be made, both within and between different units of observation; and (5) checks on typographical accuracy should be possible.
* Before distributing the data files to analysts, the statistical office should check the structural consistency of the data files: that the files include all households and no redundancies, and that all files can be merged properly.
* When the full LSMS data management procedures have been used, ex post facto checks for logical consistency of the files (for missing values, outliers, etc.) will be redundant with the checks done at data entry. Any further treatment of these problems should be left to analysts, since there is no universally acceptable solution to them, and their treatment is very difficult to document adequately but often critical to the interpretation of the analysis.
* The number of different levels of observation in LSMS surveys creates complexities in data management that are best handled by a file structure that: (1) assigns one record to each individual unit observed; (2) allows a variable number of records of each record type; (3) limits the number of variables in a record type to what can be contained on a data entry screen; and (4) uses a complete set of identifiers on each record.

Good data management is critical to ensure the timeliness and quality of the survey data. This chapter describes the problems to be addressed in managing LSMS data sets and the techniques that have been developed to deal with them.
Sections A and B should be read by all readers; they provide an overview of the LSMS data management philosophy and of the requirements for the data management system. Section C describes the file structure used in the customized LSMS data entry program and is of interest to readers who will be involved in data management or in choosing a data entry program.

A. An Overview of LSMS Data Management Philosophy

Objectives

Two principles guided the development of the LSMS data management system: timeliness and quality. The primary reason for this type of survey is to provide policymakers and analysts with information about household behavior and welfare.57 For the data to be useful, they must be recent. The LSMS surveys also aspire to collect data of very high quality. The LSMS system speeds and simplifies analysis and gives the results credibility.

Approach Developed

In order to achieve the goals of timeliness and high quality, the LSMS approach uses four key features: (1) pre-coded, verbatim questionnaires with explicit skip patterns; (2) error detection in the data entry program; (3) data entry concurrent with field work; and (4) correction of errors in the field.

PRE-CODED VERBATIM QUESTIONNAIRES. As explained in the chapter on questionnaire development, nearly all questions on the LSMS questionnaires are pre-coded or require numeric answers, and the remaining few questions are coded in the field. This eliminates the coding step in the data management process, which often took months or years in other surveys and which introduces the possibility of errors.

57. Before the LSMS model was developed, data from complex surveys (such as agricultural surveys, nutrition surveys, or household expenditure surveys) could take two to five years from the completion of field work to become available for analysis. While general survey practice has improved somewhat since then, the problem has by no means disappeared.
ERROR DETECTION AT THE TIME OF DATA ENTRY. The data are subjected to extensive checks on their validity and consistency at the time they are recorded, as will be explained in detail later in this chapter.

CONCURRENT DATA ENTRY. LSMS surveys enter data concurrently with the field work. As explained in the chapter on field work, this eliminates a long inactive period. Completed questionnaires are not stored unattended while field work progresses. Instead, the time-consuming task of data entry is carried out simultaneously with field interviews. The system of concurrent data entry detects errors quickly and allows interviewers to revisit the household to try to correct apparent errors. Suspect data in the first half of the questionnaire that are flagged by the data entry program can be checked or corrected during the second interview. For data gathered during the second interview, the opportunity to correct any errors is not guaranteed. However, in urban areas, and occasionally in rural areas, there is no great difficulty in visiting a household a third time if this is required to correct errors from the second interview.

CORRECTION OF SUSPECTED ERRORS IN THE FIELD. Correcting the errors in the field greatly speeds the process of data editing and correction, since only a single, quick, and conclusive iteration is needed. It also dramatically increases the level of certainty that the appropriate correction is being made.58 Even when such revisits are not possible, concurrent data entry improves the field work because it provides immediate feedback on common errors and problems. Thus, corrective measures can be taken early on rather than having errors replicated throughout the whole survey.

Implications for Survey Planning

The use of LSMS data management procedures has some implications for other parts of survey planning, as follows:

INTEGRATION OF DATA MANAGEMENT AND QUESTIONNAIRE DESIGN. Data management should be integrated into questionnaire design.
The data manager should be consulted on each major draft of the questionnaire, since he or she will have an especially sharp eye for flaws in the definition of units of observation, skip patterns, etc. Likewise, the analysts who have helped to write the questions should help the data manager determine appropriate range and consistency checks.

58. With ex post facto data entry, the best that can be achieved is internal consistency of the data sets. There is no guarantee that they truly reflect the reality found in the households.

SKILLS REQUIRED OF THE DATA MANAGER. The data manager's role is much more creative in this system of data management than in the old-fashioned system, where a programmer waited to be told what to program. In this system the data manager needs to be creative and adept at taking initiative. He or she should have sufficient statistical or economic training to determine the content of the data quality checks independently. The basic programming skills required to learn the techniques for LSMS data management are familiarity with the standard DOS commands and with a programming language. The specific skills of using a particular data entry software package are usually part of the on-the-job training for the LSMS survey. In other words, the data manager does not need to be a professional programmer; indeed, experience suggests that it may be better if he or she is not too enamored of computers per se.

TIMETABLE. The data entry program for questionnaires entered in the field must be carefully developed, tested, and corrected before field operations begin. A data entry program that does not work well loses its credibility and its usefulness as a supervisory tool. LSMS surveys have usually allowed six to eight weeks for the complete preparation and testing of the data entry program, and another two weeks between the training of the data entry operators and interviewers and the commencement of field work.
Regrettably, this entire period is often absorbed by the program for the household questionnaire. Programs for the community, price, and facility questionnaires have not been designed or tested as well in advance of the field operations, with noticeable consequences for the quality of the resulting data.

B. Requirements for the Data Management System

The minimum requirement for a satisfactory data management system is that its output be a useful, high-quality data set. This section describes the requirements for achieving this goal.

Ease of Analysis of Resulting Data Files

The structure of the final files must facilitate analysis by commonly used statistical software. As the LSMS questionnaire contains so many units of observation, achieving this objective is not a trivial matter. Consider, for example, how the ease of use differs between alternative file structures for the demographic data on the household roster. One possibility is to have one record for each person in the roster, with one field for age and one field for sex. Another possibility is to have the whole roster in a single record, with separate fields for the age of person 1, the age of person 2, the sex of person 1, the sex of person 2, etc. Creating a table of sex by age requires a single program statement in the first case, whereas in the second case it requires first combining the information from each of the sex variables and each of the age variables. The second approach will also have to allow a number of variables for each personal characteristic (sex, age, education, etc.) equal to the largest likely household size, say 20 or 25 persons, even though the average size is much smaller. Even worse structures are possible: in one recent survey, the structure created two variables for sex for each person, male yes/no and female yes/no. This meant yet another step of aggregation was required before substantive analysis was possible.
It also introduced the possibility that individuals could be coded as both male and female or as neither male nor female.

One of the principal challenges of managing LSMS data is to produce files from complex questionnaires that are easy to use for analysis. The questionnaires' complexity is due to the numerous different levels of observation and their interrelationships, rather than to large file size.59 In the number of levels of observation and their interrelations, LSMS surveys are among the most complex to be found anywhere. The household questionnaires typically have approximately two dozen levels of observation. The questionnaires for schools or health clinics may again have several levels of observation, but the community and price questionnaires usually have many fewer, sometimes only one. Box 6.1 shows the whole list of levels of observation in the Kagera Health and Development Survey.

To avoid repeating the work of labeling variables in the statistical software during the analytical stage, the program used for data entry should define the file structures in the formats of common statistical software such as SAS, SPSS, and Stata, as well as in the .DBF format used by database managers such as dBASE, Clipper, and FoxPro. This can save several weeks of work, since LSMS questionnaires can have hundreds or even thousands of variables. Many of them are categorical variables, a few of which have lengthy code lists (like occupations, geographic locations, or consumption items) and many more of which have shorter code lists (type of school or clinic attended, places where credit is available). These are all labeled in the data entry screens, so it makes sense not to have to duplicate the work later.

The data should also not be cluttered with unnecessary codes for "not applicable." Because of the explicit skip patterns built into the LSMS questionnaire, blanks can be reliably interpreted as "not applicable."
At the data entry stage, this means that time need not be taken up by filling in artificial "not applicable" codes such as 999. This also greatly simplifies analysis, since such 999s would otherwise have to be excluded by hand from all averages, cross-tabulations, models, and so forth.

59. In terms of bulk, LSMS data sets are large but not extraordinarily so. The average LSMS survey gathers information from some 3,000 households. The data from each household can be stored in about 10 kilobytes (the range usually being between 5 and 20 kilobytes), so the entire survey may require around 30 megabytes of disk space, a figure that personal computers now available can handle quite easily. When not in active use, the data can be compressed for storage to about one eighth of their uncompressed size.

Box 6.1: Levels of Observation in the Kagera Health and Development Survey

Household Questionnaire
  Household
  Individual member of household
  Children living elsewhere
  Children ever born to women in household
  Dead household members
  Dead non-resident relatives
  Plot of land
  Crop grown
  Type of crop processing
  Item of farm equipment
  Livestock type
  Livestock product
  Business
  Business input expenditure
  Business asset
  Item of fishing equipment
  Fishing input expenditure
  Dwelling or buildings
  Durable goods owned
  Home-produced crop consumption item
  Purchased food consumption item
  Household expenditure item

Health Facility Questionnaire
  Facility
  Type of personnel
  Type of vehicle
  Services offered
  Vaccines
  Contraceptive methods
  Types of support received
  Drug supply
  Outpatient consultations by diagnostic category

Traditional Healers Questionnaire
  Healer
  Health conditions
  Prescriptions and referrals

Primary School Questionnaire
  School
  Grade
  Types of support received

Community Questionnaire
  Community
  Credit and lending agencies
  Primary schools
  Secondary schools
  Health service providers
  Major crops grown
  Type of agricultural labor

Data Quality Checks During Data Entry

When they are entered, data should be
subjected to five kinds of quality checks: range checks, checks against reference data, skip checks, consistency checks, and typographical checks. Each is discussed in turn in this section.

RANGE CHECKS. There should be a range check on every variable in the survey. Categorical variables should take on only defined values. For example, for a yes/no question, the only legal codes should be "1" (yes) and "2" (no). Any other value should be flagged as an error. Chronological variables should contain valid dates. For example, the date February 29 should be allowed only in leap years. Numeric variables should be verified to lie within prescribed minimum and maximum values. For example, the age of each person should lie between 0 and 95 years (see Box 6.2 for a discussion of how to set the boundaries for the ranges on numeric variables).

An error flag, such as a beep and a flashing field on the screen, may be set off when an out-of-range value is entered. If the error is merely typographical, the data entry operator can correct it immediately. It should, however, be possible to override the flag if the value entered represents what is on the questionnaire. In that case a written error report should be generated so that the supervisor and interviewer can verify the value during the second interview. The suspect datum may be stored in a special format that registers its questionable status, but this format should be such that the analyst can use the datum in analysis if he or she deems it appropriate.

REFERENCE TABLES. For the anthropometric module, the validity checks should be made by comparing the individual's height, weight, and age with the World Health Organization standard reference tables. Any value for the standard indicators (height-for-age, weight-for-age, and weight-for-height) that falls more than three standard deviations from the norm should be flagged as a possible error so that the measurement can be repeated.
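The range and reference-table checks just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LSMS data entry program: the record layout and field names (`in_school`, `age`, `birth_date`) are hypothetical, and the simple (value - median) / SD form of the anthropometric test is an assumed simplification whose reference median and standard deviation would have to be looked up in the WHO tables. Flags are returned as warnings rather than rejections, so an operator can keep a value that matches the questionnaire and pass it on to the written error report.

```python
from datetime import date

# Illustrative check definitions taken from the examples in the text:
# yes/no questions coded 1 (yes) and 2 (no); ages between 0 and 95.
YES_NO = {1, 2}
AGE_RANGE = (0, 95)


def is_valid_date(year, month, day):
    """True if the fields form a real calendar date (Feb 29 only in leap years)."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def range_flags(person):
    """Return (field, message) pairs for out-of-range values in one record.

    Flags are warnings: the operator may keep a flagged value that matches
    the questionnaire, and it is then listed on the written error report.
    """
    flags = []
    if person["in_school"] not in YES_NO:
        flags.append(("in_school", "code must be 1 (yes) or 2 (no)"))
    lo, hi = AGE_RANGE
    if not lo <= person["age"] <= hi:
        flags.append(("age", f"age {person['age']} outside {lo}-{hi}"))
    if not is_valid_date(*person["birth_date"]):
        flags.append(("birth_date", "not a valid calendar date"))
    return flags


def z_flag(value, ref_median, ref_sd, limit=3.0):
    """True if an anthropometric value lies more than `limit` SDs from the norm.

    ref_median and ref_sd are assumed to come from the WHO reference tables
    for the person's age and sex (not included here).
    """
    return abs(value - ref_median) / ref_sd > limit


if __name__ == "__main__":
    record = {"in_school": 3, "age": 120, "birth_date": (1995, 2, 29)}
    for field_name, message in range_flags(record):
        print(field_name, "->", message)
    # Made-up reference values, for illustration only.
    print(z_flag(70.0, ref_median=87.0, ref_sd=3.5))
```

Returning a list of flags, instead of raising an error at the first bad value, mirrors the override rule in the text: nothing blocks entry, and every suspect value ends up on the report for the field team.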
A similar check using an outside body of data can be performed on food composition data, but so far the Romania survey is the only LSMS survey that has had such a check. The Romania survey verified that the monthly per capita energy intake for the household lay within reasonable ranges, and also checked that the per capita energy provided by each individual food did not exceed certain absolute maximums or certain fractions of the total energy intake.

Box 6.2: Setting Boundaries for Range Checks

Setting the boundaries for the range checks on some numeric variables is an art. Optimally, for example, the maximum permissible value for expenditures on a particular food item should be selected by referring to a previous household survey, choosing a value that leaves 97 or 99 percent of the households beneath the bound, and then updating that value for inflation. Such a rigorous method of determining ranges can hardly be applied to all variables; setting up the ranges in practice may require some guesswork. In so doing, it is worth keeping in mind that the purpose of range checking is not to flag absolutely impossible values but to warn of probably erroneous values. The temptation to set up extremely wide ranges (like, for instance, $100 per week for caviar, just in case Mr. Rockefeller happens to be selected in the sample) should be avoided. Setting tighter ranges entails, of course, the risk of flagging a few "false positives," but human supervisors are there precisely to apply their judgment to these situations; the data entry program should allow the operator to enter an out-of-range value if it correctly reflects what is written on the questionnaire and is not due to a typographical error. These values should, however, be flagged so that the interviewer and supervisor can determine in the field whether they are correct.

SKIP CHECKS. Skip checks verify whether the skip codes have been followed appropriately.
For example, a simple skip check verifies that questions to be asked only of schoolchildren are not recorded for a child who answered "no" to an initial question on school enrollment. A more complicated check would verify that the right modules of the questionnaire have been filled in for each respondent. Depending on his or her age and sex, each member of the household is supposed to answer (or skip) specific sections of the questionnaire. For instance, children less than 5 years old should be measured in the anthropometric section but should not be asked the questions about occupation. Women aged 15-49 may be included in the fertility section, but men may not.

The data entry program should not actually follow the skip codes itself. For example, if a "no" answer is entered to the question "Are you enrolled in school?" the fields to enter data about the kind of school attended, grade in school, and so on should still be presented to the data entry operator. If answers are actually recorded on the questionnaire, they can then be entered, and the program will flag an incorrect skip. The supervisor or interviewer can determine the nature of the mistake. It may well be that the "no" was supposed to be a "yes." If the data entry program had automatically skipped the following fields, the error would not have been detected or remedied. All the skip codes in the questionnaire should be verified in the data entry program. This may involve hundreds of checks.

CONSISTENCY CHECKS. Consistency checks verify that values from one question are consistent with values from another question. A simple check occurs when both values are from the same unit of observation, for example the date of birth and age of a given individual. More complicated consistency checks involve comparing information from two or more different units of observation. There are many complex consistency checks that are applicable in almost all LSMS surveys and so have become something of a de facto standard.
For example:

DEMOGRAPHIC CONSISTENCY OF THE HOUSEHOLD. The consistency between the ages and genders of all household members is checked against the kinship relationships. For example, parents should be at least (say) 15 years older than their children, spouses should be of different sexes, etc.

CONSISTENCY OF OCCUPATIONS. The presence or absence of certain sections should be consistent with the occupations declared individually by the household members. For instance, the farming section should be present if and only if some household members are reported as independent farmers in the labor section.

CONSISTENCY OF AGE AND OTHER INDIVIDUAL CHARACTERISTICS. It is possible to check that the age of each person is consistent with personal characteristics such as marital status, relationship to the head of the household, grade of current enrollment (for children currently in school), or last grade completed (for those who have dropped out). For example, an 8-year-old child should not be in a grade higher than 3.

EXPENDITURES. Several different consistency checks are possible. Only in a household where one or more of the individual records shows that a child is attending school should there be positive amounts in the household consumption record for items such as school books and school fees. Likewise, only households that have electrical service should report expenditures on electricity.

It is very important to be able to do both skip and consistency checks that involve more than one unit and level of observation at a time.60 This criterion should be given heavy weight in choosing a software package for data entry, because complex checks are numerous and tend to disclose the most important flaws in the field work, as well as the flaws that the interviewer or supervisor is least likely to find by visually checking the questionnaire. A list of all the checks between units of observation that were included in the Romania survey is given in Annex VII.
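Skip and consistency checks that span several units of observation can be sketched as one function run over a whole household, as below. The household structure, field names, and codes are hypothetical assumptions made for the example. The age-for-grade rule (grade no higher than age minus 5) is likewise an assumed illustration consistent with the 8-year-old example above; real thresholds would be set country by country. Each check only produces a message, because the resolution has to be made in the field.

```python
# Hypothetical threshold: "parents should be at least (say) 15 years older".
MIN_PARENT_GAP = 15


def household_checks(hh):
    """Skip and consistency checks across units of observation in one household.

    hh is a hypothetical structure:
      hh["roster"]:   list of dicts with keys id, age, sex (1 male, 2 female),
                      rel ("head", "spouse", or "child" of the head),
                      in_school (1 yes, 2 no), grade (None when left blank)
      hh["spending"]: dict mapping consumption item -> amount
    Returns a list of messages for the supervisor's written report.
    """
    roster = hh["roster"]
    head = next((p for p in roster if p["rel"] == "head"), None)
    if head is None:
        return ["household: no head in roster"]
    msgs = []
    for p in roster:
        pid = p["id"]
        # Skip check: school questions should be blank if not enrolled.
        if p["in_school"] == 2 and p["grade"] is not None:
            msgs.append(f"person {pid}: not enrolled but grade recorded")
        # Age vs. grade (assumed rule: grade <= age - 5, so age 8 -> grade 3).
        if p["grade"] is not None and p["grade"] > p["age"] - 5:
            msgs.append(f"person {pid}: grade {p['grade']} too high for age {p['age']}")
        # Demographic consistency with kinship relationships.
        if p["rel"] == "spouse" and p["sex"] == head["sex"]:
            msgs.append(f"person {pid}: spouse has the same sex as the head")
        if p["rel"] == "child" and head["age"] - p["age"] < MIN_PARENT_GAP:
            msgs.append(f"person {pid}: less than {MIN_PARENT_GAP} years younger than head")
    # Expenditure consistency: school fees only if some member attends school.
    attending = any(p["in_school"] == 1 for p in roster)
    if hh["spending"].get("school_fees", 0) > 0 and not attending:
        msgs.append("household: school fees reported but no member attends school")
    return msgs


if __name__ == "__main__":
    hh = {
        "roster": [
            {"id": 1, "age": 30, "sex": 1, "rel": "head", "in_school": 2, "grade": None},
            {"id": 2, "age": 28, "sex": 1, "rel": "spouse", "in_school": 2, "grade": None},
            {"id": 3, "age": 20, "sex": 2, "rel": "child", "in_school": 2, "grade": 4},
        ],
        "spending": {"school_fees": 100},
    }
    for message in household_checks(hh):
        print(message)
```

Because the function sees the roster and the consumption record together, it can express the "between units" and "between levels" checks the text stresses, which a field-by-field range check cannot.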
Since the resolution of checks on different units of observation often requires going back to the household, or at minimum a thoughtful perusal of the questionnaire, a written report should be generated for the supervisor and interviewer to use in the process. An example is shown in Box 6.3.

There is no natural limit on the number of consistency checks that can exist. Well-written versions of the data entry program for a full LSMS survey may have several hundred of them. In general, the more checks that are defined, the higher will be the quality of the final data set. However, given that the time available to write the whole data entry program is always limited (usually to about two months), some expertise and good judgment are required to decide exactly which checks should be included.

CHECKING FOR TYPOGRAPHICAL ERRORS. In most LSMS surveys the data entry program can print out the values entered in a format similar to that of the questionnaire. This printout serves two purposes. It can be checked visually against the original questionnaire (this is the duty of the supervisor), and the values that raised flags on the range, skip, and consistency checks are printed within bold boxes so that they show up easily for the interviewer to correct in the household. An example is shown in Box 6.4.61

Box 6.3: Sample Report of Inter-Record Checks

  Household 02024: PART 3: Inter-record checks:
  ---- Error number 1: Person No. 03 answered Part B but was not a household member in Wave 3
  ---- Error number 2: Person No. 09 is missing from the household roster
  ---- Error number 3: Child No. 14C reports different ages on Section 2 and the Yellow Roster
  ---- Error number 4: Child No. 33C is not on Section 2: Children residing elsewhere
  ---- Error number 5: Woman No. 02 must answer questions 3 to 15 on Section 9.
  ---- Error number 6: INTER-WAVE CHECK: SECTION 4, QUESTION 8:
       Household reported 2 family businesses during Wave 3 but only 1 during Wave 4.
       Please verify whether the household had these businesses in the past 6 months during round two.
  6 errors detected in this Household

This figure offers an example of the inter-record checks generated by the data entry program for the fourth wave of the Kagera Health and Development Survey. Once the operator has completed the data entry process for a household, he or she runs the inter-record checks, which produce listings like the one shown. The operator can also look at the listing on the screen, since some of the errors might be typographical mistakes that the data entry operator can fix. Otherwise, the supervisor receives the listing along with the questionnaire, since these types of inconsistencies have to be corrected during the second visit to the household. The KHDS inter-record checks are particularly interesting because the survey had a four-wave panel design, so inter-wave checks had to be programmed into the process as well as the standard inter-record checks.

60. The level of observation is the kind of thing being observed: persons, plots of land, crops, household businesses. The units of observation are the different individuals within each set: person 1 or person 2, rice or corn. An example of a check between units of observation would be to check that the parents of a child are at least fourteen years older than the child. An example of a check between levels of observation would be to check that if the head of household is a farmer, the agricultural module has been filled out.

61. Specialized pages to be used in interviews may also be printed. For example, the program prints a page for the anthropometrist to use as a questionnaire during the second interview. It has the same format as the original questionnaire page used in the first interview and lists the names of the individuals to be reweighed.
(Individuals falling more than three standard deviations from the reference norms are flagged as possible errors, and 20 percent of individuals are randomly selected for remeasuring as a validity check.)

Box 6.4: Sample Page of the Household Printout

  [Printout: Household 02024, Section 1: Household Roster, Page 3]

The figure shows a printout of a questionnaire page. The operator usually requests a complete printout of the whole questionnaire along with the inter-record checks, though he or she also has the option of selecting a single page, as shown here. In this particular example, the page printed is the roster page of the Tanzania LSMS questionnaire. The format was specifically chosen to mimic as closely as possible the actual layout of the questionnaire. The bold box surrounding the answers of individual number 06 to questions 6 and 7 denotes that an inconsistency was detected. How the error flag would appear on the data entry screen is shown in Box 6.7.

It must be admitted that the tedious job of visually checking the printout against the questionnaire is probably not done as rigorously as would be needed to substitute fully for double-blind data entry. Thus typographical errors that result in valid values may occur. These are probably most prevalent in the consumption or income sections, since the ranges for valid values are wide and relatively few consistency checks are possible. For example, an expenditure of either $14 or $41 would be valid for the monthly consumption of a staple food. The same mistake for age, however, might be caught by the consistency checks with marital status or family relations.
For example, a married or widowed adult aged 41 whose age is mistakenly entered as 14 will show up with an error flag in the intra-record check on age and marital status. The impact of such errors in the consumption section is probably small, given the small fraction of the total that any one item contributes.

In Romania, an innovation helped detect typographical errors. Lines for check totals were added to the bottom of each page of the consumption module. The interviewer used a pocket calculator to total the value of expenditures on each page by hand and filled in the line. The resulting figure was entered with the raw data. Then an inter-record check was added in the data entry program to verify that the sum of the items entered equaled the check total.

After Data Entry

After data entry has been completed in the field, the central office has a few steps to perform. First, the data manager should gather the files with household data prepared by the various data entry operators throughout the country and verify that all households from each period are included without duplication. Though a good system of household identifiers should virtually ensure that no households are duplicated, there is still room for human error, such as entering data from one household on two different computers or reading a diskette twice at the central office. The same process may also be necessary for the data from the community, price, and facility questionnaires.

Second, depending on the file structure used by the data entry program, the numerous individual household-based files may have to be converted into the few larger thematic files that are useful in data analysis. This process is illustrated in Box 6.5.

Third, the files should be converted to the format of the software that will be used for analysis while producing the abstract. In fact, the files might be converted to additional formats in order to facilitate use by clients who use different software packages.
Whatever other formats are produced, a master version of the files should always be maintained in ASCII, since it is the universal standard readable by all other software. The LSMS division, for example, distributes data sets in SAS, Stata, and ASCII formats. After conversion, it should be checked that the conversions between software packages were performed correctly, with the data assigned to the proper variables and the labels transferred correctly.

Fourth, the data manager should check the structural consistency of the files: that the different thematic files with data from the household questionnaire can all be matched with each other, and that information from the household can be merged with information from the community and price questionnaires. Problems are most commonly found in merging information from the three questionnaires, so this aspect should be checked most closely.

At this stage it is useful to compile basic univariate statistics for each variable. For qualitative variables (i.e., those that have only a small number of possible values, like yes/no questions), frequencies should be produced. For quantitative variables, the minimum, maximum, and mean values should be reported. These results should then be examined for rough plausibility. If, for example, the mean height for adults is reported as 15 meters, it is a sure sign that something has gone awry in the reading of the variable, since the plausible answer would be in the neighborhood of 1.5 meters.

A different aspect of data cleaning refers to checking for logical consistency in the observations. This refers to a hunt for, and resolution of, blanks or missing data, invalid data, outliers, or inconsistencies between observations: the very things the data entry program and concurrent data entry were designed to detect and prevent. Doing this step centrally is thus redundant.

When the full LSMS field work and data management strategies have been adopted, the LSMS division has always recommended that any data cleaning with respect to logical consistency be left to individual analysts. The files are thus ready for distribution to analysts at this stage. One reason for recommending that checks for logical consistency be left to analysts is that no consensus exists as to how to identify or treat outliers and missing observations.62 Since no procedure will satisfy all analysts, it has been deemed best to give them the raw data and let each analyst perform whatever cleaning he or she thinks best. Moreover, it is very important for analysts to know exactly what has been done to the data so they can interpret their findings correctly. Since documenting data editing is very difficult, it may be preferable to leave editing to the individual analyst.

Naturally, one of the important analysts of the data will be the statistical institute itself. Thus, the recommendation that the data be made publicly available after checking only for structural consistency does not preclude further data cleaning with respect to logical consistency as part of the analysis process.

Box 6.5: File Structure, Identifiers, and the Interface Between Data Entry and Analysis

When data entry is integrated with field operations, the most natural unit for data management is the household file (the set of records of different types that pertain to one household), whereas at the analytical stage it is the thematic file (the set of all records of the same type generated by all households). An important step in the data management process is therefore to transform one form of file organization into the other. The record format used in the custom LSMS data entry program makes this process conceptually trivial, as illustrated below with a simple example. In practice, the process may be complicated by the large bulk of the data sets.

Consider a three-household LSMS survey with a three-section questionnaire: housing, roster, and budget. The housing page contains information on building materials for walls and roof; the roster contains the name, sex, and age of all household members; the budget page records the amounts spent by the household on various items. Such a questionnaire would generate three record types: 001 for Housing, 002 for Roster, and 003 for Budget. Assuming that the three households are numbered 11111, 22222, and 33333, the data entry program would generate three files, as shown in Figure 6.5.A. Notice that each record is uniquely identified by a record type, a household number, and whatever extra identifiers are needed to distinguish individual records of the same type within a household. In this case, each person in the roster has a two-digit ID code, and each budget item has a three-digit item code (for instance, code "103" may mean "bread").

Figure 6.5.A: The Household Files

Household 11111
001 11111 1 2
002 11111 01 JOE 1 37
002 11111 02 JANET 2 33
002 11111 03 JIMMY 1 12
002 11111 04 JUDY 2 10
003 11111 103 000040
003 11111 217 002000
003 11111 260 000150

Household 22222
001 22222 1 1
002 22222 01 MOE 1 25
002 22222 02 MARY 2 23
003 22222 096 005500
003 22222 103 000012
003 22222 199 000125
003 22222 205 001200

Household 33333
001 33333 1 1
002 33333 01 SAM 1 40
002 33333 02 SANDRA 2 35
002 33333 03 SAMMY 1 15
003 33333 015 000234
003 33333 103 000020
003 33333 201 000999

Records from all files should first be piled up together and then sorted by record type, household number, and any extra identifiers. This is easily done with any standard sorting program and is illustrated in Figure 6.5.B.

Figure 6.5.B: Piling Up and Sorting

All households (piled up):
001 11111 1 2
002 11111 01 JOE 1 37
002 11111 02 JANET 2 33
002 11111 03 JIMMY 1 12
002 11111 04 JUDY 2 10
003 11111 103 000040
003 11111 217 002000
003 11111 260 000150
001 22222 1 1
002 22222 01 MOE 1 25
002 22222 02 MARY 2 23
003 22222 096 005500
003 22222 103 000012
003 22222 199 000125
003 22222 205 001200
001 33333 1 1
002 33333 01 SAM 1 40
002 33333 02 SANDRA 2 35
002 33333 03 SAMMY 1 15
003 33333 015 000234
003 33333 103 000020
003 33333 201 000999

All households (sorted by record type, then household number, then any extra identifiers):
001 11111 1 2
001 22222 1 1
001 33333 1 1
002 11111 01 JOE 1 37
002 11111 02 JANET 2 33
002 11111 03 JIMMY 1 12
002 11111 04 JUDY 2 10
002 22222 01 MOE 1 25
002 22222 02 MARY 2 23
002 33333 01 SAM 1 40
002 33333 02 SANDRA 2 35
002 33333 03 SAMMY 1 15
003 11111 103 000040
003 11111 217 002000
003 11111 260 000150
003 22222 096 005500
003 22222 103 000012
003 22222 199 000125
003 22222 205 001200
003 33333 015 000234
003 33333 103 000020
003 33333 201 000999

Finally, the sorted file may be split into thematic files, as shown in Figure 6.5.C. Each such file is a flat file, in which records represent homogeneous statistical units and can be submitted to standard statistical software for separate analysis. The presence of household numbers in each record makes it possible to link these thematic files for more ambitious manipulation. The size of the thematic files should not exceed what would be easy to handle with the level of hardware and software expected of the final users.

Figure 6.5.C: Thematic Files

Theme 001: Housing
001 11111 1 2
001 22222 1 1
001 33333 1 1

Theme 002: Demographics
002 11111 01 JOE 1 37
002 11111 02 JANET 2 33
002 11111 03 JIMMY 1 12
002 11111 04 JUDY 2 10
002 22222 01 MOE 1 25
002 22222 02 MARY 2 23
002 33333 01 SAM 1 40
002 33333 02 SANDRA 2 35
002 33333 03 SAMMY 1 15

Theme 003: Budget
003 11111 103 000040
003 11111 217 002000
003 11111 260 000150
003 22222 096 005500
003 22222 103 000012
003 22222 199 000125
003 22222 205 001200
003 33333 015 000234
003 33333 103 000020
003 33333 201 000999
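The reorganization in Box 6.5 (pile up, sort, split), followed by the univariate summaries recommended earlier in this section, can be sketched in Python using the records of Figure 6.5.A. This is a minimal sketch, not the LSMS processing software: it assumes space-separated fields for readability (the actual files use fixed-width columns), the number of identifier fields per record type is read off the example, and the function names are hypothetical. A production version would read the household files from disk and handle far larger volumes.

```python
from collections import Counter, defaultdict
from statistics import mean

# Household files from Figure 6.5.A, one list of records per household.
# Fields are space-separated here for readability (an assumption).
HOUSEHOLD_FILES = [
    ["001 11111 1 2",
     "002 11111 01 JOE 1 37", "002 11111 02 JANET 2 33",
     "002 11111 03 JIMMY 1 12", "002 11111 04 JUDY 2 10",
     "003 11111 103 000040", "003 11111 217 002000", "003 11111 260 000150"],
    ["001 22222 1 1",
     "002 22222 01 MOE 1 25", "002 22222 02 MARY 2 23",
     "003 22222 096 005500", "003 22222 103 000012",
     "003 22222 199 000125", "003 22222 205 001200"],
    ["001 33333 1 1",
     "002 33333 01 SAM 1 40", "002 33333 02 SANDRA 2 35",
     "002 33333 03 SAMMY 1 15",
     "003 33333 015 000234", "003 33333 103 000020", "003 33333 201 000999"],
]

# Leading identifier fields per record type: record type and household
# number, plus a person ID (002) or an item code (003).
ID_FIELDS = {"001": 2, "002": 3, "003": 3}


def sort_key(record):
    """Sort by record type, then household number, then any extra identifiers."""
    fields = record.split()
    return tuple(fields[:ID_FIELDS[fields[0]]])


def to_thematic_files(household_files):
    """Pile up all household files, sort them, and split them by record type."""
    piled_up = [rec for hh_file in household_files for rec in hh_file]
    piled_up.sort(key=sort_key)
    themes = defaultdict(list)
    for rec in piled_up:
        themes[rec.split()[0]].append(rec)
    return dict(themes)


def frequencies(values):
    """Frequency table for a qualitative variable."""
    return dict(Counter(values))


def describe(values):
    """Minimum, maximum, and mean for a quantitative variable."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


if __name__ == "__main__":
    themes = to_thematic_files(HOUSEHOLD_FILES)
    for record_type, records in themes.items():
        print(record_type, len(records), "records")
    sex = [rec.split()[4] for rec in themes["002"]]
    spending = [int(rec.split()[3]) for rec in themes["003"]]
    print(frequencies(sex))    # {'1': 5, '2': 4}
    print(describe(spending))  # {'min': 12, 'max': 5500, 'mean': 1028}
```

Because every identifier is zero-padded to a fixed width, sorting the identifier fields as text gives the same order as sorting them as numbers, which is why an ordinary sorting program suffices for the step shown in Figure 6.5.B.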
If the statistical institute does carry out such cleaning, it may wish to make public its corrections (e.g., imputations of alternate values for outliers or missing observations) along with its other computed variables, such as income or consumption aggregates. These computed variables should be labeled as such and distributed in addition to, not instead of, the original data. The original data, free from alterations, must be made available to outside analysts.

This small role of the central office in cleaning data is possible because so much of the data quality control has been moved to the decentralized data entry stage. Before personal computers made this possible, data entry and data editing had to be done separately after the field work was concluded. For certain simple surveys, ex post facto data editing might converge to a relatively "clean" data set after a few iterations, usually within a year or so. In complex surveys, however, implementing quality controls and correcting the detected inconsistencies in batch operations becomes difficult. Editing a complex survey in this way could take several years, and in extreme cases the resulting data set, even if internally consistent, can be extremely unreliable because of the myriad undocumented decisions that had to be made along the way. Now personal computers can perform powerful quality checks at the time of data entry, and data entry, data editing, and field work can be integrated into a single process. The need for time-consuming and inaccurate ex post facto data editing has thus been eliminated.

62. Some analysts do nothing. Others spend a long time identifying these problem cases. Some analysts drop the problem observations. Others develop elaborate routines to impute some kind of correction.

Finally, and very importantly, the central office must add the variables containing the sampling weights and prepare adequate documentation to accompany the data sets. This is treated in detail in Section B of Chapter 7.

C. File Structure Used in LSMS Data Entry Program

A custom data entry program was developed for the first LSMS surveys and has since been used in most of the surveys assisted by the LSMS division of the World Bank (see Box 6.6). The file structure it uses is described here. Experience has proved this structure to be satisfactory. It handles well the complexities that arise from having so many different levels of observation, and it addresses the perennial goals of limiting errors by data entry operators, minimizing storage requirements, and interfacing well with statistical software at the analytic phase.

In describing the structure, two terms must be defined: the record type and the record. Imagine a matrix of information like the grids on the questionnaire. The columns are the questions or variables. The rows are the different individuals (or plots of land, or businesses, or whatever) to which the questions pertain. The variables may be divided into smaller, more manageable subsets, which are referred to as record types (such as employment, health, and so forth). The information pertaining to any row (an individual, for example) within a record type is a record.

Correspondence Between Records and Individual Units Observed

The data structure maintains a one-to-one correspondence between the individual units within each level of observation and the records in the computer files. For example, to manage the data listed on the household roster, a record type would be defined for the variables on the roster, and the data corresponding to each individual would be stored in a separate record of that type. Similarly, in the consumption module a record type would correspond to food items, and the data corresponding to each individual item would be stored in a separate record of that type.

Variable Number of Records

The number of records in each record type is allowed to vary. This economizes on the storage space required, since the files need not allow every case to be the largest possible.
For example, the number of bytes necessary to enter the roster data for every person in one particular household will be determined by that household's size (an average of about 5 persons per household), rather than being a fixed number that has to allow for the largest conceivable household (which might be 20 or 25 persons).

Box 6.6: An Evaluation of Data Entry Packages' Suitability for the LSMS

When the first LSMS surveys were undertaken, there was little data entry software available that could take into account both the complexities of the LSMS questionnaire and the integration of data entry with the field work. A customized data entry package was developed and has been used for most of the LSMS surveys that have been implemented, especially for those that have integrated data management and field work.

The number and sophistication of commercially available data entry packages on the market and in use in statistical agencies has grown markedly since the first LSMS. In an effort to identify other data entry options for the LSMS, the World Bank contracted an independent software testing firm (National Software Testing Laboratories) to evaluate six data entry packages and one database package to determine their adequacy for use with an LSMS. The software packages evaluated were:

Package                   Developer
IMPS 3.1                  U.S. Bureau of the Census
BLAISE III 1.0            Central Bureau of Statistics of the Netherlands
ISSA 2.28                 Macro International
Rode/PC 3.09              DXP/IDES
EPI-INFO 6.0              U.S. Centers for Disease Control
SPSS/DE 5.02              SPSS Inc.
Paradox 4.5 (for DOS)     Borland Corporation

Based on the evaluation criteria presented in this chapter, IMPS was the only software package found to meet all of the requirements of the LSMS data entry process. Practical experience using it for LSMS surveys has been limited.
The Romania team started to use IMPS, since they were already familiar with it, but found it difficult to implement all of the data quality checks they wanted and thus switched to the custom program. In Ecuador, the statistical agency used IMPS and found it satisfactory for its LSMS survey. BLAISE III 1.0 was unable to generate reports and had no method for forcing out-of-range values, and ISSA 2.28 could handle only 940 variables, well below the needs of the LSMS. Note that at the time of the evaluation, in fall 1994, new versions of both Blaise and ISSA were being developed that would deal with these constraints. The other packages had a variety of limitations that made them unsuitable for use with an LSMS survey.

It should be kept in mind that software is developing rapidly and that there may well be other packages available now or in the near future that meet the needs of the LSMS data entry process. When reviewing software options, the requirements listed in this chapter should be used as the benchmark. If the proposed software cannot function as outlined here, a different option should be selected.

This structure can also be generalized to situations of higher complexity. So far we have considered the case of a single household, which might have from one to 25 members. A more extreme situation can be found in the fertility section, where data are sometimes collected for every child ever born to each woman older than 15. The questionnaire may include space for 5 such women and 15 children for each of them. The recommended approach uses just one record for each child actually recorded in the household (usually a reasonably small number), whereas the alternative approach might require keeping space for 75 children in all households. One can even imagine that the questionnaire might include information on each of several illnesses for each child of each woman, adding yet another factor to the total number of cases.
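The contrast between one record per child actually reported and space reserved for 75 children in every household can be illustrated with a short Python sketch. It also shows how a multi-part identifier (record type, household number, the mother's roster ID, and a child number as a fourth code) keeps every record unique. The record type code "020", the sex field, and the function names are hypothetical assumptions made for illustration, not part of any LSMS questionnaire.

```python
from typing import NamedTuple

MAX_WOMEN = 5                # spaces for women on the questionnaire
MAX_CHILDREN_PER_WOMAN = 15  # spaces for children per woman


class ChildRecord(NamedTuple):
    record_type: str  # hypothetical code for the fertility record type
    household: str    # second identifier: household number
    woman_id: str     # third identifier: roster ID of the mother
    child_id: str     # fourth identifier: child's sequence number
    sex: int          # example data field (hypothetical)


def child_records(household, children_by_woman, record_type="020"):
    """Variable-length structure: one record per child actually reported.

    children_by_woman maps a woman's roster ID to the list of her
    children's sex codes, in birth order.
    """
    return [
        ChildRecord(record_type, household, f"{woman:02d}", f"{n:02d}", sex)
        for woman, sexes in sorted(children_by_woman.items())
        for n, sex in enumerate(sexes, start=1)
    ]


def fixed_slots_per_household():
    """Fixed-length alternative: space reserved in every household."""
    return MAX_WOMEN * MAX_CHILDREN_PER_WOMAN


if __name__ == "__main__":
    records = child_records("22222", {2: [1, 2, 1], 5: [2]})
    print(len(records), "records instead of", fixed_slots_per_household(), "slots")
    # 4 records instead of 75 slots
    keys = {rec[:4] for rec in records}  # (type, household, woman, child)
    assert len(keys) == len(records)     # every unit has a unique identifier
    for rec in records:
        print(rec)
```

Adding a further level, such as illnesses per child, would only add another identifier field and more records for the households that report them, rather than multiplying the space reserved for every household.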
Limiting the Length of a Record Type

Having a record type contain the same amount of information as each screen presented to the data entry operator can help eliminate data entry errors. In some cases, this can be much less information than it would be conceptually acceptable to group together for each observation. Consider the health module, which may be laid out across three or four physical pages of the questionnaire. Conceptually, the information on each person in the health section could be treated as one record type. But that amount of information is often difficult to fit into an easy-to-read layout on a single computer screen (see Box 6.7). Moreover, using a single screen for all of a person's health information would require the data entry operators to flip back and forth through the questionnaire as they entered all the information for the first individual, then for the second, and so on. In this case the data entry screen (and thus the record type) would better correspond to a string of questions on a single physical page of the questionnaire. That way the data entry operators can fill in one screen for each individual and then turn to the next page of the questionnaire.

Identifiers

It is essential that each statistical unit have a unique identification code. In the LSMS data entry program, each unit of observation is assigned a code in three or more parts. The first part is the "record type," which appears at the beginning of each record. It tells whether the information is, for example, from the cover page, the third page of the health module, or the food expenditure section. The record type is followed, in all records, by the household number. In most record types, a third identifier will be necessary to distinguish between separate units within the household, for instance, the person's ID or the code of the expenditure item. In a few cases there will be only one unit for the level of observation, and thus the third identifier is unnecessary.
For example, housing characteristics are usually gathered for only one home per household. In a few cases there may be an additional fourth code. For example, the third identifier might be the household enterprise, and the fourth code would apply to each piece of equipment owned by each enterprise.

Box 6.7: Sample Data Entry Screen

HHOLD: 02024      SECTION 1: HOUSEHOLD ROSTER: (Record Type 002)      ID CODE: 06

1A STILL LIVING HERE?: 1 <YES
2  SEX: 1 <MALE
3  REL. TO HEAD: 03 <SON/DAUGHTER
4  CAN TELL DATE...?: 1 <YES
5  DATE OF BIRTH   DAY: 19   MONTH: 09   YEAR: 79
6  HOW OLD IS...?  YRS: 013   MTHS: _
7  MARITAL STATUS: 5 <WIDOW/WIDOWER
8  PARTNER LIVES HERE?:
9  ID CODE OF PARTNER:
10 No. OF MTHS AWAY: 00
11 WILL BE RESIDING HERE?:
12 HOUSEHOLD MEMBER?: 1 <YES
13 MTHS AWAY IN LAST 6M?: 0

This box shows the data entry screen used for the KHDS family roster data. The fields that were underlined on the original screen correspond to actual data entry fields. In practice, depending on the hardware used, these fields can appear in different colors, in reverse video, or in any other format that catches the operator's eye. The rest of the screen contains the names of the variables and the meaning of recorded values for qualitative variables. For example, for variable "SEX", code 1 means "MALE". There are also comments on the screen for the data entry operator, such as the title at the top, "SECTION 1: HOUSEHOLD ROSTER: (Record Type 002)". Notice that the data entry program repeats the household number in the upper left-hand corner of every screen, even though the operator only has to enter it once when beginning each questionnaire. The identification code for the person is in the upper right-hand corner.

On this screen, two variables present an inconsistency: the person is 13 years old and his marital status is widowed. On an actual computer screen the age and marital status fields would be blinking to flag the inconsistency.
Though the techniques are easy to learn, designing a data entry screen takes more skill than one might think. Here the designer decided to include the question number as part of the label for each variable, to make it easier for the operator to link each field with its corresponding box in the questionnaire. Screen size is a limiting factor, since the screen has only 80 columns and 25 lines. This screen is simple enough, but when there are more than 20 variables, they cannot be laid out one variable per line. It takes practice and creativity to fit all the variables into some sort of logical order without overcrowding the available space, and to abbreviate the labels in such a way that they are understood by the operators as well as by the survey users at the analytical stage. This is particularly important because the variable labels defined for data entry are usually borrowed by the statistical software. A good designer can fit up to 50 variables on a screen without making it look overcrowded, but that takes practice.

Transformation

Once data entry is completed, the structure of the files should be transformed from that appropriate for data entry to that appropriate for analysis. This process is illustrated in Box 6.5.

Chapter 7: Beginning Data Analysis

Key Messages

* Data from LSMS surveys support a wide range of analysis on many topics, with methods ranging from simple descriptions to complex behavioral models.

* Achieving full use of data from an LSMS survey requires that analysis be considered from the beginning. Identifying the uses to which the data will be put is a key part of planning, and many parties should be involved in analysis and in planning for it.

* Plans should be made early on about how to promote use of the data through making them widely available, commissioning studies, holding workshops, or whatever other method is appropriate for the specific country.
* Adequate documentation must be prepared as a matter of course so that analysts can make effective use of the data sets.

* The basic abstract should present a limited number of tables that are analytically interesting. They should be clearly presented and may be supplemented by graphs.

* More analytically complex studies can be done on a number of topics. Examples include an analysis of poverty (how many poor there are, what they are like, and the reasons for their poverty); social services (access to services, use, quality, and the effect of changes in prices and quality on use); the impact of social programs; determinants of household behavior (what affects decisions about labor supply, school enrollment, fertility, and participation in transfer schemes); and other such studies.

The payoff to conducting surveys is in the analysis of the data they produce. Analysis will improve understanding of household welfare and augment the government's ability to make good policy decisions. This chapter is concerned with how to start the policy analysis process. Section A highlights activities that may accompany the survey project in order to promote data use by many analysts. Section B describes the requirements for the documentation and dissemination of data sets. Section C gives some guidelines for producing a basic abstract from the survey. Section D outlines more sophisticated work. All readers of this manual should at least skim this whole chapter. It is supplemented by Annex X, which outlines how to construct some of the basic aggregates from the survey.

A. Policies and Project Components to Promote Data Use

LSMS surveys are so rich in information that exploiting their full potential requires much more than a simple abstract.63 It is thus important to facilitate data analysis as much as possible, beginning with the design of the survey itself. Box 7.1 summarizes the roles often played by different actors in data analysis.
In recent years LSMS survey projects have increasingly included activities to promote data use. The scope of what is included varies widely from country to country, depending on the funding available, the enthusiasm of project designers, and the expectations of how much analysis might occur in the absence of any specific activities to support data use. To date, we have not evaluated experience systematically enough to provide recipes for what is "enough" or what "works."64 In this section we list some of the initiatives that have been tried, that might be tried, or that seem promising. They are presented to help the survey planner start a brainstorming process that will lead to actions appropriate for his or her own country:

* Hold a seminar or workshop to publicize the abstract as soon as it is available. At this workshop, advertise the availability of the data and make copies of the data and documentation available on diskettes. Have some remarks or presentations designed to get people thinking about what further analysis might be possible.

* Hold a second workshop six months or a year after the data are available to present all the analysis done during that time. This could be a simple workshop. Alternatively, some papers could be commissioned for it, or a competition could award prizes for the best papers.

* Provide funds for specific offices in the government (e.g., planning, health, agriculture, and so on) to commission analyses. These could be for topics identified at the time the survey is designed. Alternatively, a dollar allocation could be made initially and the agenda left open.

* Identify a few key policy issues and ensure that high-quality analysis of these takes place promptly and is discussed with policymakers.

* Ensure that any international agencies engaged in policy dialogue are aware of the data's existence. Often these agencies have studies planned that would benefit from the data.

63. The abstract is, of course, very important and is discussed in Section C.

64. A qualitative evaluation should be available by 1996.

Box 7.1: The Role of Different Actors in Analysis

Since LSMS surveys are rich in analytic potential, many different actors should have a role in their analysis. This box describes the typical sharing of roles, though of course the roles may differ in different countries.

Central Statistical Agency. The central statistical agency has two main roles in data analysis. First, it will usually produce a basic abstract. Second, it will supply data sets and their documentation to other users. In a few countries, the statistical agency may also take on other data analysis functions. It may, for example, conduct its own program of analysis driven by sectoral or policy questions. In general, however, such analysis requires not only statistical infrastructure but also detailed sectoral knowledge and, sometimes, complementary data that are usually found in the sectoral ministries or research institutions rather than in the central statistical agency.

Planning Agency. The planning agency is often the agency responsible for conducting or contracting studies of interest to several sectors of government. Defining a poverty line, studying the extent of poverty, and identifying characteristics of the poor are common examples of such work. Studying the incidence of government subsidies across different socioeconomic groups and other targeting questions are additional examples. The planning agency may also take an active role in promoting analysis on a variety of topics by other agencies.

Sectoral Ministries. Sectoral ministries (e.g., of Health, Education, Agriculture) may separately use the LSMS data to look at the coverage of the services they provide. They may be interested in analyses of how changes in the accessibility, quality, or pricing of their services would affect their use and the revenues from user fees. They may be interested in knowing what aspects of household behavior or government action most affect the outcomes or indicators that their ministries are most interested in.
For example, what are the determinants of child malnutrition or of school enrollment? The sectoral ministries may have the analysis performed by their own staff members. More often, they will find it convenient to contract out the analysis to persons or agencies more specialized in quantitative statistical analysis.

Universities and Private Research Institutes. Since universities and private research institutes encompass a wide range of disciplines and interests, it is difficult to generalize about what they may do. They may do any of the above kinds of analysis on any sectoral topic. They are probably the most likely to do analysis that not only describes the current situation but also analyzes how it came about or could be changed.

International Development Agencies. International development agencies may conduct, fund, or use the results from all of the types of analysis listed above, since they require a solid empirical foundation for their policy advice and project evaluations.

* Provide training through in-service courses in-country. This might include training in any of four areas: (1) lessons in a common statistical software package (e.g., how to run a table or regression); (2) training in statistics (e.g., tests for significant differences in tables, regression analysis); (3) workshops on how to present the results of statistical analysis simply and clearly; and (4) seminars on special topics of analytic interest (e.g., how to draw a poverty line or conduct incidence analysis). The balance among the four and the level of sophistication of each will depend on the prevailing analytic skills in the target audience(s). The courses might be offered to staff in the planning and statistics agencies, staff in line ministries, or university researchers. Such seminars cannot replace sound university and post-graduate training in the social sciences, but they can help polish skills that may have been neglected for lack of an opportunity to apply them.
* Ensure that key offices in government have adequate hardware and software to conduct data analysis.

* Provide technical assistance to planning and evaluation offices in key government agencies.

* Sponsor a peer-reviewed working paper series on quantitative policy analysis.

* Put the data sets into the data banks of principal universities and/or make them available through the Internet.

* Contact the professors of quantitative methods courses in universities and encourage them to use the data sets in their classroom exercises.

* Advertise the availability of the abstract and of the data in places where local graduate students studying overseas will find out about them, perhaps through alumni newsletters or mailing lists held by the principal sources of overseas scholarship funds.

* Provide graduate scholarships in local universities for students who concentrate on quantitative policy analysis. Require them to work in appropriate government offices during or after their studies.

* Translate the questionnaires and documentation into English or other international languages so that researchers from around the world can have effective access to the data.

B. Documentation and Dissemination of Data Sets

If more than one agency (or indeed, more than one person) is intended to use the data, a system of documentation and dissemination is required. Good support for data dissemination includes five things: (1) an open data access policy; (2) good basic documentation; (3) organized files of the unit record data (also known as "raw data"); (4) a filing system that guarantees that the data and important records will be permanently available; and (5) clear assignment of service standards and responsibilities for handling data documentation and dissemination.

Data Use Policy

In simple terms, data users should have early, unrestricted access to the survey unit record data files.65 This should be formally recorded in a document that explicitly states the policy and is signed by someone in authority: at least the head of the statistical agency and possibly the minister of planning or finance. An example of such a document is provided in Box 7.2.

Obvious though it may be that collecting data is only useful if the data are used, and that increasing the number of users means more use, statistical agencies may still be reluctant to distribute the unit record data. Some common arguments and their counter-arguments follow:

QUALITY CONTROL. The argument may be that the data need to be edited at the central level to ensure quality. As explained in Chapter 6, if the data entry program is prepared carefully and data entry is properly integrated with field operations, no further editing is needed or desirable at the central level. If the statistical agency wishes to add constructed variables (such as income and consumption aggregates) to the public use files, these will take time to prepare. The agency should set a performance standard of providing these supplemental files within a reasonable time, such as six months, of the end of field work, and the project budget should ensure that the resources to actually do the work in a timely fashion are made available.

SENSITIVITY. Sometimes the data reveal facts about the living standards in a country that the government may be reluctant to publish widely. The political repercussions can indeed be sensitive in some settings, but before data access is restricted, some factors should be considered. First, for those households suffering deprivations, the survey's showing that deprivation exists will not be surprising. Second, good data will make it difficult for people to exaggerate the extent of deprivation and, indeed, will sometimes reveal it to be less than popularly thought.
Third, analysis of the survey data may help the government make better policy decisions about how to minimize poverty. Fourth, the government may be able to broaden the social policy dialogue in a way that is politically positive. Analysis of the survey may help all parties to understand the constraints on action and thus aid in forming reasonable expectations and a consensus on policies to reduce poverty.

ANALYSIS WILL BE PROVIDED BY THE STATISTICAL INSTITUTE. Sometimes the data collection agency suggests that it will be able to provide users with the tables they need, as opposed to the unit record data that would allow them to conduct the analyses themselves. It may be a desirable additional service if the statistical agency will perform analyses for users not able to do so themselves, but this does not replace dissemination of the complete data sets. Modern statistical modelling requires continuous interaction between the analyst and the data; it cannot be reduced to the scrutiny of a set of predetermined tabulations produced by intermediaries. This is especially true for surveys and analyses as complex as the LSMS.

65. In many surveys assisted by the World Bank, the data access policies have been more restrictive. The data are usually owned by the host country, and the World Bank has the unlimited right to use the data for internal purposes. The World Bank may sometimes release the data to third-party users, but only with written authorization from the host government. In some cases this permission is freely and promptly given, while in others it may be a slow process to get permission, and in still others permission for distribution to third parties is only rarely granted. In the last few years, data access policies have become noticeably more open. A widespread consensus now exists that the World Bank should not assist with surveys unless the unit record data will be widely available to many users, especially to local government agencies, but also including national and international academics and international development institutions.

Box 7.2: Prototype Data Access Policy

A data access agreement should contain the points shown here in boldface type. The policy should be publicly known; one of the best ways to accomplish this is to include the agreement in the abstract published from the data.

Intended users. The data from the .... [Country X].... Living Standards Survey are intended for use by all researchers in government agencies, universities, private research institutes, international development organizations, and other similar institutions. Researchers are requested to give due recognition to the source of the data in all publications and to provide copies of all publications arising from the analysis of the data to the libraries at the ..... [statistical agency, planning agency, international agencies, and university library].

Procedures for obtaining data. Requests for access to the data should be accompanied by a one- to two-page outline of the proposed analysis. This will be kept on file so that other interested researchers may contact the analyst about the outcome of the work. Requests should be submitted to .... [name, title, address and phone number, fax number, and internet ID].

Performance standards. The .... [agency implementing survey].... will normally have the data available for public use not later than six months after field work has been completed. The request for data sets and basic documentation will normally be processed within three weeks of receipt. [A small fee consonant with the costs of staff time and supplies used in copying data and documents may be charged.] The data sets will be made available in ASCII and .... [any other].... format.

CONFIDENTIALITY. It is sometimes argued that releasing the unit record data violates the confidentiality of the responses made by each household.
However, since the unit record data files need not contain any identification of the households other than a numeric identification code used to match sub-files to one another, this argument carries little weight.⁶⁶

Sometimes the reasons above are given as pretexts to hide a simple and painful truth: statistical agencies may not have the data sets and documentation sufficiently well organized to feel confident about offering public access to them. The solution to this problem is to ensure that building this capacity is included in the design of the project.

Basic Documentation

The importance of adequate documentation can hardly be overstated.⁶⁷ No reliance should be placed on individuals' memories or personal files. Not only are these haphazard, but it is inevitable that with time the staff will move into other positions or out of the institution altogether. The LSMS division of the World Bank, through its own experience and in its dealings with statistical agencies around the world, has seen crucial documents and even whole years' worth of data lost.

Each data user should easily be able to obtain three documents: the questionnaire, a summary basic information document, and the abstract. These should be sufficient to fill two needs: to allow the potential analyst to determine whether the data suit his or her needs, and to enable those who do decide to use the data to make full, appropriate, and easy use of them. The user should also be able to obtain, as needed, other documents such as the sampling plans and manuals for field staff.

The summary basic information document should include the following:

QUESTIONNAIRE. A synopsis of the questionnaire may be included. It can be very short, since the questionnaires themselves should be made available to all users.

SAMPLE. A concise but complete description of the sample design and its implementation should be given.
The description of the design should include the sample size and cluster size, the number of strata used and any implicit stratification, the number of stages used in the sample, the number of sampling units at each stage, and the probability of selection at each stage. The description of the implementation should cover any deviations from the original design, especially those involving the level and sources of non-response and the number of replacement households selected. The data files themselves must contain the sampling variables, especially the sampling weights (raising factors) that should be applied to the raw data to obtain unbiased estimates of population means, and the strata and cluster codes that should be used in adjusting sampling errors for sample design features.

66. Sometimes, and especially for panel surveys, records that include the names or addresses of the households may be kept. There is no need for these to be given to data analysts.

67. Even if there are restrictions on access to the data, the documentation is important and should be completed for qualified users.

FIELD WORK. This section should describe the basic field procedures and quality control techniques. Anything (good or bad) that would influence the interpretation or credibility of the data set as a whole or of particular variables should be explained.

GUIDELINES FOR USING THE DATA. Full information on how to link the various parts of the survey must be clearly stated. The linkages may be between different parts of the household questionnaire; between the household, price, community, and facility questionnaires; between years of the panel; or between survey data and other external data sets. The codes for any items not pre-coded in the questionnaire should be made available in the text or appendices of the document.
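To make the linkage and weighting requirements above concrete, here is a minimal Python sketch, assuming a household file that carries the sampling variables and a member (roster) file that carries only the numeric household code. The field names (`hhid`, `stratum`, `cluster`, `weight`, `literate`) and the records are hypothetical, and the sketch assumes every member of a sampled household is enumerated, so each member inherits the household's raising factor. It links the two files and contrasts the weighted estimate of a population mean with the unweighted sample mean.

```python
# Hypothetical household file: one record per household, with the sampling
# variables (stratum, cluster, raising factor) that must travel with the data.
HOUSEHOLDS = [
    {"hhid": 101, "stratum": 1, "cluster": 10, "weight": 250.0},
    {"hhid": 102, "stratum": 1, "cluster": 10, "weight": 250.0},
    {"hhid": 201, "stratum": 2, "cluster": 20, "weight": 600.0},
]

# Hypothetical member file: several records per household, linked to the
# household file only through the numeric household code.
MEMBERS = [
    {"hhid": 101, "pid": 1, "literate": 1},
    {"hhid": 101, "pid": 2, "literate": 0},
    {"hhid": 102, "pid": 1, "literate": 1},
    {"hhid": 201, "pid": 1, "literate": 0},
    {"hhid": 201, "pid": 2, "literate": 1},
]


def link_members(members, households, keys=("stratum", "cluster", "weight")):
    """Attach household-level sampling variables to each member record.

    Records whose household code does not match are returned separately,
    because a failed link signals a documentation or data problem."""
    by_hh = {h["hhid"]: h for h in households}
    linked, unmatched = [], []
    for m in members:
        h = by_hh.get(m["hhid"])
        if h is None:
            unmatched.append(m)
        else:
            linked.append({**m, **{k: h[k] for k in keys}})
    return linked, unmatched


def weighted_mean(records, var, weight="weight"):
    """Estimate of the population mean: sum(w * y) / sum(w)."""
    total_w = sum(r[weight] for r in records)
    return sum(r[weight] * r[var] for r in records) / total_w


if __name__ == "__main__":
    linked, unmatched = link_members(MEMBERS, HOUSEHOLDS)
    print("unmatched records:", len(unmatched))
    print("weighted literacy rate:", round(weighted_mean(linked, "literate"), 3))
    print("unweighted share:", sum(r["literate"] for r in linked) / len(linked))
```

With these made-up records the weighted literacy rate (about 0.56) differs from the unweighted share (0.60), which is why the weights must be included in the data files and documented.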
For items with very detailed coding, such as industry or occupation codes, it is useful to include the codes at the one- or two-digit level of aggregation and to steer the reader to the full code books.

Any problems encountered in the data and the solutions adopted should be specified. Some illustrations are useful here. Occasionally a particular subset of the data is deemed too flawed to use. It is important to state clearly why the raw data are not available or to inform the user of precautions to be observed in using those variables. For example, the anthropometric data from the 1988 Jamaica LSMS survey had some observations recorded in English measurements and some in metric, and it was impossible to distinguish with certainty which was which. Sometimes flaws have been fixed: for example, the identity or location codes of certain questionnaires have been corrected ex post facto, or responses to "other (specify)" questions have been coded.

DOCUMENTATION OF CONSTRUCTED DATA SETS. Often the survey institute will make some constructed variables available in the public use files. The most common and useful of these are price indices and aggregates of household consumption or income, as well as adjustments to these to account for variation in prices over time and space. Z scores for anthropometric variables are sometimes also provided. Each constructed variable should have a clear explanation of how it was constructed, so that the user can determine whether it suits his or her specific analytic purpose and how to interpret it. It is desirable but not essential to also include (in the appendices or electronic files) the programs that were used to construct the variables in question. Constructed data files should be distributed in addition to, not instead of, the raw data used in their construction.

DESCRIPTION OF FILES. The contents and names of the data files should be mapped to the corresponding sections in the questionnaire, and the system of variable names and labels should be sketched.
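One way to provide the computer-readable description of the files called for here, and the variable locations discussed under Unit Record Data Files, is a small layout table from which both the ASCII reader and the printed codebook are generated. The sketch below is illustrative only: it assumes a fixed-width ASCII file, and the layout (variable names, column positions, labels) is hypothetical, standing in for whatever the survey's own record layout specifies.

```python
# Hypothetical record layout for one thematic file (a household roster):
# (variable name, 1-based starting column, width in characters, label).
LAYOUT = [
    ("hhid", 1, 5, "Household identification code"),
    ("pid", 6, 2, "Person number within household"),
    ("sex", 8, 1, "Sex (1 = male, 2 = female)"),
    ("age", 9, 3, "Age in completed years"),
]


def parse_fixed_width(lines, layout):
    """Split fixed-width ASCII records into dicts of integer fields.

    Blank (or absent) fields become None so that missing values stay
    visible instead of silently turning into zeros."""
    records = []
    for line in lines:
        rec = {}
        for name, start, width, _label in layout:
            raw = line[start - 1:start - 1 + width].strip()
            rec[name] = int(raw) if raw else None
        records.append(rec)
    return records


def codebook(layout):
    """Render the same layout as a plain-text codebook for the documentation."""
    return "\n".join(
        f"{name:<6} cols {start}-{start + width - 1}  {label}"
        for name, start, width, label in layout
    )


if __name__ == "__main__":
    print(codebook(LAYOUT))
    print(parse_fixed_width(["00101011034", "00101022009", "00102031"], LAYOUT))
```

Keeping a single layout as the source for both the reader and the codebook is one way to stop the documentation and the data files from drifting apart as the files are revised.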
It is also helpful to state the size of each file.

REFERENCES TO OTHER DOCUMENTS. At minimum, a list of ancillary documents should be included. References to other analyses done with the same or similar data sets may also be useful, though these may be difficult to organize.

In addition to the summary documentation, the documents from which the summary draws should be made available to users who desire more detail on certain aspects of the survey. Most important among these will be the documents on sampling, the full code books, and the manuals for supervisors, interviewers, anthropometrists, and data entry operators.

It is useful if the documentation can be produced both in the national language and in a language used by those in development agencies and international academia. Producing these documents in the local language will encourage local use and is natural if the survey agency is producing them. Translating them into English or another international language can be a low-cost way for the project to encourage analysis by other researchers.

Unit Record Data Files

The unit record data should be made available in some reasonably user-friendly way. This normally consists of a series of files, one for each section of the questionnaire, that contain all the records for each household (e.g., the thematic files explained in Chapter 6).⁶⁸ The files should always be made available in ASCII; it may also be easy and useful to distribute them already translated into the formats of the most commonly used statistical packages. The location of the variables in the raw data files should be clearly documented, ideally in a computer-readable form. If the statistical institute has taken the trouble to give detailed descriptive labels to variable names and codes, it will be performing a useful service by making these available in formats usable by the various statistical software packages.

68. A single file is usually not convenient, given the hierarchical nature of the data structure.
It may also be too large for some users. Though those adept with computers can usually digest a single file, many local analysts have computers and software that are somewhat less than state of the art. They often do not have manuals (or cannot read them fluently because they are available only in English) and may be relatively inexperienced in handling data files.

Filing System

It is important to store the various paper and electronic documents appropriately. Though exactly how to do this cannot be prescribed, certain practices are required for safe storage.

INVENTORY CONTROL. It is essential that master copies of all the important files be kept in a separate archive and be used only to make new copies. In this way there is no danger of the final copy being given away or damaged. It is convenient to have multiple spare copies of the most commonly used documents available for immediate dissemination.

SECURITY. To ensure that the data are not lost or corrupted, the agency in charge of the data should maintain a master and a backup copy of the data files. The master data set should include all the necessary files, but no redundant or outdated files.⁶⁹ The backup copy of the data set should contain only the files in the master data set. Only those who are responsible for them should have access to the master files. This is most easily accomplished by storing the files with a password required for write access. Reasonable safeguards against loss of the backup should be taken. This might mean putting copies into a fire-proof safe or depositing backups in a separate building (such as a regional office of the statistical agency, the records center of the planning agency, or a university library). Both the master file and the backup copies should occasionally be re-written to reduce the chance of media failure.

INSTITUTIONAL MEMORY. More than one person should be familiar with the
filing system and passwords so that if the primary person responsible is absent on holiday or sick leave, or quits, the documentation is not lost.

Setting Service Standards and Assignment of Responsibilities for Data Documentation and Dissemination

It is important to think through the tasks that will be required to document a data set and to support its dissemination for several years after the survey. For institutes that have traditionally focused only on publishing standard abstracts, this may require some innovative thinking.

First, the agency will have to think about what product or service it will offer and how. For example, it may determine that it will offer individual researchers paper copies of descriptive documents and electronic data files in ASCII. This will require that it set aside enough staff time and develop procedures agile enough to handle the anticipated number of requests within a reasonable response time. The agency could instead, or additionally, place all the information on the Internet or in public use accounts at universities. This would take slightly more preparation time initially but, if many users have access to those services, would lower the number of individual requests for the data that the agency would have to cope with.

69. Often a particular data set will go through many versions before it has been correctly compiled. Over the life of the data files, there may be problems with recording all the changes, but confusion can be reduced by keeping only the correct files in the master data set. To ensure that the history of all changes is not lost, the person in charge of archiving the data files may also want to keep copies of all versions; these should be kept separate from the master files.

Then, responsibilities for carrying out the different functions of data documentation and dissemination must be clearly assigned. Otherwise there is a tendency for things to fall through the cracks.
Writing the Basic Information document, for example, will require input from several people, especially the sampler, the data manager, and the analysts, but none might think it principally their duty. Likewise, the archival function has a tendency to be split between a secretary for the paper records and the data manager for the electronic records, a recipe that leads easily to gaps. And who must do each part of servicing a request for data must be specified: who, if anyone, needs to grant permission, who assembles the information to send, and who keeps whatever files are required. It would appear that the lack of clearly assigned responsibilities for data dissemination may impede data access as much as poor policies do.

In the short run, it should be the survey manager who is responsible for organizing the documentation and beginning the data dissemination function. In the medium and longer run, that individual may be assigned to other activities, so a more permanent assignment of responsibilities should be made. Note that it is not a good idea to set up a system where someone at the top of the statistical agency has to respond to each request for data. These people are too busy, and if they must act on each individual request, potential analysts will likely get very bad service. Rather, authority should be placed closer to the working level.

C. The Abstract

The abstract is not only the first product of an LSMS survey; it is usually the most widely read. This section discusses its contents, its format, and the process required to produce it.

CONTENT. The abstract should present a carefully selected set of tables. The tables should include the basic description of the different facets of living standards. For example, employment status, housing conditions, literacy and enrollment, nutritional status, incidence of ill health and use of health care services, and availability of basic infrastructure such as transportation, water, and electricity should be included.
Abstracts should present the frequencies or means of the selected set of indicators of living standards. They should also tabulate these for selected socioeconomic groups. For example, they might present literacy or employment rates by rural/urban area, by gender, and by age. The tabulations should show the contrasts that are most important to the country and topic at hand. For example, gender or regional differences in school enrollment may be very large in some countries and very small in others. Where they are small, it is less necessary to report them.

More ambitious abstracts will also present cross-tabulations by welfare groups such as quintile of consumption.⁷⁰ The most ambitious abstracts will include tabulations by categories of poor or non-poor.⁷¹ If poverty lines are used in defining socioeconomic groups, the same tabulations should be presented by quintile or decile.⁷² The quintiles or deciles present more information about the full distribution of welfare, and their definition is less controversial than that of a poverty line. The calculation of welfare measures and poverty lines requires a high level of programming complexity and analytic decision-making. If available, such tables are always interesting and useful additions to the abstract. But if their calculation will greatly slow down the abstract's production or generate too much controversy over methods, it may be best to leave them out of the basic abstract and produce separate, later reports on poverty.

Many abstracts err on the side of including too many tables, a large number of which are of little analytic interest. At best, an overly thick volume may make it difficult for the user to find the items of interest to him or her and thus discourage use of the whole abstract. Worse, the mechanical tabulation of many variables by many others often results in tables with so few observations in each cell that conclusions are likely to be misleading. This is often not apparent given the format of the tables.
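The quintile tabulations described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the procedure used for any particular abstract: households are ranked by per capita consumption, each household's raising factor is multiplied by its size so that the groups divide the population (not the households) into fifths, and a household is placed in the group in which its poorest-ranked member falls. The field names and the minimum cell size of 30 used to flag thin cells are hypothetical choices.

```python
import math


def assign_quintiles(households, n_groups=5):
    """Rank households by per capita consumption and assign welfare groups
    holding roughly 1/n_groups of the weighted population each.

    The person weight is the household raising factor times household size.
    Ties keep their input order because sorted() is stable."""
    ranked = sorted(households, key=lambda h: h["cons"] / h["size"])
    total = sum(h["weight"] * h["size"] for h in ranked)
    poorer = 0.0  # weighted population ranked below the current household
    for h in ranked:
        share = poorer / total
        # The household joins the group in which its first member falls;
        # the tiny tolerance stops exact boundaries being lost to rounding.
        h["quintile"] = min(n_groups, math.floor(n_groups * share + 1e-9) + 1)
        poorer += h["weight"] * h["size"]
    return households


def rate_by_group(records, group, var, weight="weight", min_n=30):
    """Weighted percentage of records with var coded 1 (versus 0) in each
    group, with the unweighted N and a flag for cells thinner than min_n."""
    cells = {}
    for r in records:
        cell = cells.setdefault(r[group], [0.0, 0.0, 0])
        cell[0] += r[weight] * r[var]
        cell[1] += r[weight]
        cell[2] += 1
    return {g: (100.0 * num / den, n, n < min_n)
            for g, (num, den, n) in sorted(cells.items())}


if __name__ == "__main__":
    # Ten hypothetical one-person households with equal weights.
    hh = [{"cons": float(c), "size": 1, "weight": 1.0, "piped_water": int(c > 5)}
          for c in range(1, 11)]
    assign_quintiles(hh)
    for q, (pct, n, thin) in rate_by_group(hh, "quintile", "piped_water").items():
        print(q, round(pct, 1), n, "too few observations" if thin else "")
```

Because whole households are assigned to a single group, the groups are only approximately equal in population when some households are large relative to the sample. The flag on thin cells speaks to the warning above about tables with too few observations per cell.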
Annex VIII shows the table of contents for the statistical abstract of the 1991 Pakistan Integrated Household Survey. It is an excellent basic abstract. It presents information broken down by one or two factors for each variable (region, rural/urban, age, sex, and education, as appropriate). Annex IX gives the table of contents from the 1993 Jamaica Survey of Living Conditions. It is more ambitious: it presents most tables disaggregated by quintile as well as by rural/urban area, parish, and, where appropriate, age and sex. It presents some longitudinal tables. It also contains many more tables, the list having grown over the years in response to readers' comments on the abstracts from the five previous surveys. Thus, while the number of tables is higher than often recommended, they are analytically useful rather than merely mechanical.

70. Annex X provides guidance on how to calculate these.

71. Ravallion (1992) provides recommendations on how to draw a poverty line.

72. These divide the population into five or 10 equal-sized groups on the basis of a welfare indicator. For analysis of LSMS data, this is most often per capita household consumption.

It is useful to include a basic description of the survey in addition to the tables. This should include the survey content, the sample plan and its implementation, and the field work techniques used. Sometimes the full questionnaire and basic information document are bound right into the abstract as appendices. At minimum, reference should be made to where to obtain them. The policy on data access might be stated in the abstract.

It is often useful to include in the abstract some data from sources other than the survey. Comparisons of indicators obtained from other sources may be of interest, for example, comparing the age structure to that of the census, or the mean per capita consumption figure from the survey to that from the national accounts. When such comparisons accord well, they lend confidence to other analysis.
When there are large discrepancies, it is important to note which technical factors might explain the differences. It may also be useful to add background information. For example, in a table on the incidence of a government program, it might be useful to note the cost of the program. This enhances the policy impact of the abstract, but usually requires an inter-agency team to help write it, since the statistical agency cannot be expected to be well versed in many sectoral programs.

In addition to the basic abstract suitable for use by policymakers and data analysts, it can be useful to produce information in other formats. In Nicaragua, for example, the statistical institute produced a very short abstract in cartoon form for distribution in primary schools (see Figure 7.1).

FORMAT. Producing a well-formatted table is an art. It is often learned by first producing confusing or cumbersome tables and then gradually refining them. Studying existing abstracts to see which best convey information is useful (see Box 7.3 for an example). Here we give some principles that help produce clear tables.

* Row and column headers should be clear. They should use normal language rather than computer variable names. If they contain scales, the high and low ends should be indicated. For example, rather than labeling quintiles as 1, 2, 3, 4, and 5, they might be labeled as 1 (poorest), 2, 3, 4, 5 (richest). Detailed notes at the bottom of the table will often be required to supplement the short labels possible in row and column headers. These notes should provide complete definitions of the concepts covered. For example, a table might be titled "Employment Status by Age Group" and have columns labeled employed, unemployed, out of the labor force, and so forth. In that case the definition of employment used should be included at the bottom of the table, for example, "worked one hour or more for pay during the week preceding the interview."
Many of these definitions vary slightly from survey to survey, so it is important to be clear.

Figure 7.1: Illustration of an Abstract for Primary Schools
[Cartoon panel titled "Ways of disposing of garbage" (Formas de eliminación y disposición de la basura)]
Source: Nicaragua Instituto Nacional de Estadísticas y Censos (1994)

Box 7.3: The Difference Between a Good and a Bad Table

The difference between good and bad table formats can be seen in Tables A, B, and C in this box. All are on the same topic from the Jamaica Survey of Living Conditions. The first two were included in the 1988 abstract, the last in the 1992 abstract. (The 1988 tables were formatted by one of the authors of this volume, so it is fair game to criticize them here. The 1992 table was produced by the joint efforts of the Statistical and Planning Institutes of Jamaica.)

The 1992 table is much better than the 1988 tables. Note that by transposing the rows and columns, the later table can combine the two earlier tables into one single table, include what would have been two more tables, and still be easier to read.

The 1992 table also has much better labeling. The title makes it clear that the group included is only those ill or injured who sought health care during the four-week reference period. The column headers make it explicit whether the care was sought in the private or public sector. To interpret the 1988 table, the reader had to know that doctor's office meant private sector and health centers meant public sector, and the distinction was not made at all for hospitals. The 1988 table indicated neither that the 1, 2, 3, 4, 5 in the column headers refer to quintiles of per capita household consumption nor which was wealthy and which was poor. The 1992 table does both.

The later table also gives the number of observations in each row. From the percentages in the table, it is possible to calculate the number of observations in each cell.
The later table could make explicit that the percentages in each three-column grouping sum to a hundred, but since the number of columns in each grouping is small, it is fairly clear to the reader that they do, even though that feature has been omitted.

EXAMPLES OF BAD TABLES

Table A: Place of Consultation by Consumption Level

Place of Consultation         1      2      3      4      5  Jamaica
Hospital                   22.0   34.8   25.6   31.2   14.1     25.0
Health Center              40.4   16.5   19.4   12.8    8.5     18.1
Doctor's Office            37.6   46.3   52.7   52.5   66.9     52.9
Pharmacy                    0.0    0.0    2.3    0.7    1.7      2.1
Provider's Home             0.0    0.8    0.0    0.0    0.6      0.3
Patient's Home              0.0    0.0    0.0    1.4    2.9      1.0
Other                       0.0    0.8    0.8    0.0    1.4      0.6

Table B: Place of Consultation by Area

Place of Consultation     Kingston Metropolitan Area   Other Towns   Rural   Jamaica
Hospital                                        32.2          33.8    19.9      25.0
Health Center                                   16.8          10.3    20.1      18.1
Doctor's Office                                 46.5          52.9    56.1      52.9
Pharmacy                                         1.0           1.5     2.7       2.1
Provider's Home                                  0.0           0.0     0.5       0.3
Patient's Home                                   2.5           1.5     0.3       1.0
Other                                            1.0           0.0     0.0       0.6

EXAMPLE OF A GOOD TABLE

Table C.
Use of Public/Private Sector by Ill/Injured Persons for Medical Care, Purchase of Medications and Hospitalization During the Four-Week Reference Period, by Area, Quintile, Sex, and Age

                        Percentage of those       Percentage purchasing     Percentage hospitalization
                        seeking medical care      medications               (of those seeking medical care)
Source of care:           Public Private    Both    Public Private    Both    Public Private    Both
Classification            Sector  Sector            Sector  Sector            Sector  Sector
Area
KMA (N=321)                 26.0    62.6    10.4      10.4    66.8     4.5       5.0     0.8     0.6
Other towns (N=345)         24.8    68.6     6.6       6.8    71.0     1.2       3.4     2.3     0.7
Rural areas (N=1,159)       27.4    63.7     8.9       9.8    52.1     2.7       5.5     0.8     0.3
Quintile
Poorest (N=353)             46.3    48.8     4.9      13.9    45.1     0.8       9.0     1.6     0.0
2 (N=335)                   41.8    48.4     9.8      14.5    45.4     4.0       5.3     2.6     0.7
3 (N=378)                   28.8    65.9     5.4       9.4    57.4     2.0       3.0     0.0     0.5
4 (N=381)                   27.1    65.4     7.5       7.0    60.3     1.9       5.6     0.9     0.5
5 (N=378)                   12.3    78.1     9.6       4.0    73.7     3.1       4.4     0.9     0.4
Sex
Male (N=834)                27.6    62.5     9.9       9.0    60.6     2.5       4.2     0.7     0.7
Female (N=990)              29.0    64.2     6.7       9.7    59.4     3.2       5.5     1.4     0.3
Age (years)
0-9 (N=488)                 36.0    55.4     8.6       8.6    57.8     1.9       5.0     0.7     1.0
10-19 (N=227)               34.5    63.2     2.3      11.3    59.4     1.0       3.8     2.2     0.0
20-29 (N=132)               26.0    63.7    10.3      14.8    59.5     2.5       7.1     1.3     0.0
30-39 (N=146)               22.4    69.6     8.0       3.9    63.5     0.0       9.1     1.9     2.2
40-49 (N=146)               18.1    71.4    10.5      12.6    61.7     4.9       9.6     0.0     0.0
50-59 (N=151)               20.8    70.0     9.2       4.8    69.8     3.7       0.8     1.1     0.0
60-64 (N=101)               32.8    58.8     8.4       4.2    54.6     2.0       2.6     0.0     0.0
65+ (N=433)                 27.4    64.7     8.0      11.8    56.9     5.5       3.0     1.2     0.0
Jamaica (N=1,825)           28.5    63.4     8.1       8.9    58.5     2.4       5.1     1.1     0.4

* It must be clear what is tabulated. Is it the number of occurrences of each event? Is it a rate, percentage, or mean? If it is a rate or percentage, what is the divisor? If it is in monetary units, what is the currency, and what period or region's prices apply? Are the percentages based on the totals of the rows or of the columns? This can be made clear by including a row or column with the total percent.

* It should be clear whether the table pertains to the whole sample or to a subset of it.
For example, in a table of mean receipt of remittances, it is critical to specify whether the average is taken across all households, regardless of whether they receive remittances, or only across the households that do receive remittances.

* Groupings should correspond to what is meaningful or common for the country and topic. For example, in the education module, enrollment or attendance rates should be shown for the sub-groups of children appropriate for different levels of schooling. This might mean primary, middle, and secondary. Some systems, however, do not differentiate between middle and secondary, so the exact age cut-offs will differ slightly among countries.

* Good tables indicate not only the percentages or means for each cell, but also the number of observations (N) in each cell. In some cases this can be done neatly within the cell itself. In other cases, it may produce a cluttered-looking table. Sometimes putting N on the row and column headers, or in separate rows and columns, will be neater and allow the interested analyst to calculate N for each cell.

* It is ideal, though admittedly rarely done, to present standard errors or confidence intervals or tests of significant differences between different cells. Even if this is not done for every table, it would be desirable for at least a few key tables, such as the levels of consumption, poverty, or malnutrition among different socioeconomic groups (see Table 7.1 for an example). The text should not discuss differences unless they are large and statistically significant.

Table 7.1: Sample Size, Mean, and Standard Error of Estimate of Per Capita Consumption, 1992 and 1993 Jamaica Survey of Living Conditions (SLC)

              SLC 92                                SLC 93
                    No. of        Mean    Standard        No. of        Mean    Standard
                Households Consumption   Error (%)      Households Consumption   Error (%)
Area             in Sample   (1992 J$)                   in Sample   (1993 J$)
KMA                  1,001      22,653         3.6           647      30,766         4.4
Other Towns            841      18,032         3.0           384      23,523         6.3
Rural Areas          2,643      13,889         2.2           932      18,517         3.6
Jamaica              4,485      16,998         2.0         1,963      23,408         2.7

Note: KMA = Kingston Metropolitan Area.
Source: Statistical Institute of Jamaica (STATIN) and Planning Institute of Jamaica (PIOJ), 1995, Appendix 11.4, p. 126.

* To the extent practical, it can help the reader to understand the tables if similar formats are used. For example, if a series of variables are being cross-tabulated by area (capital city, other urban areas, rural areas), agroclimatic zone (coast, mountains, jungle), and sex (male, female), it can be useful to have these three sub-blocks appear in the same order down the rows of each table and have the columns always be specific to the new variable (or vice versa).

* Graphs may be used to enhance the presentation. Where they are used, it is important to ensure that the actual numbers behind them and their definitions are maintained. Sometimes this can be accomplished by thoroughly labeling the graph. More often the full table must be produced along with the graph. Tufte (1983) is a useful reference on how to use graphics effectively.

* It is not essential that much text be used to describe each table. Often such text is repetitive and boring, and writing it can slow the production of the abstract.

PROCESS. As should be clear from the above, producing an abstract is not a strictly mechanical exercise. The most obvious requirements involve the computer programming. A good quality personal computer will be sufficient. For example, at the time of this writing, a 486DX with 8 megabytes of memory and 50 megahertz speed would be considered desirable, but the earlier abstracts were written using much less sophisticated machines. There are several common statistical packages that will do the job, although some are slightly better than others at one or another aspect.
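Several of the principles listed above (stating the divisor, showing N, and reporting standard errors for key estimates) can be illustrated with one short sketch. Python is used here only for illustration; the same calculations can be done in the common statistical packages. The sketch computes mean remittance receipts over all households and over recipients only, each with its N, and a standard error by Taylor linearization that treats clusters as primary sampling units drawn with replacement within strata. Field names and data are hypothetical, and strata with a single cluster are simply skipped, a simplification that an actual analysis would need to handle and document.

```python
import math
from collections import defaultdict


def mean_and_n(records, var, weight="weight", subset=None):
    """Weighted mean of var over the records passing subset (all records if
    subset is None), returned with the unweighted number of observations."""
    rows = [r for r in records if subset is None or subset(r)]
    total_w = sum(r[weight] for r in rows)
    return sum(r[weight] * r[var] for r in rows) / total_w, len(rows)


def linearized_se(records, var, weight="weight", stratum="stratum", cluster="cluster"):
    """Standard error of the weighted mean by Taylor linearization, with
    clusters treated as PSUs selected with replacement within strata."""
    total_w = sum(r[weight] for r in records)
    mean = sum(r[weight] * r[var] for r in records) / total_w
    # Cluster totals of the linearized variable z = w * (y - mean) / sum(w).
    z = defaultdict(float)
    for r in records:
        z[(r[stratum], r[cluster])] += r[weight] * (r[var] - mean) / total_w
    by_stratum = defaultdict(list)
    for (h, _c), zc in z.items():
        by_stratum[h].append(zc)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # simplification: single-cluster strata add nothing here
        zbar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((zc - zbar) ** 2 for zc in zs)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical households: remittances received, with design codes.
    hh = [
        {"remit": 0.0, "weight": 1.0, "stratum": 1, "cluster": 1},
        {"remit": 100.0, "weight": 1.0, "stratum": 1, "cluster": 1},
        {"remit": 0.0, "weight": 1.0, "stratum": 1, "cluster": 2},
        {"remit": 300.0, "weight": 1.0, "stratum": 1, "cluster": 2},
    ]
    print("all households (mean, N):", mean_and_n(hh, "remit"))
    print("recipients only (mean, N):",
          mean_and_n(hh, "remit", subset=lambda r: r["remit"] > 0))
    print("standard error, all households:", linearized_se(hh, "remit"))
```

The two means answer different questions, so a table should say which one it reports. As a check on the formula, when every household is its own cluster in a single stratum with equal weights, the linearized standard error reduces to the familiar sample standard deviation divided by the square root of n.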
An important criterion in choosing a package should be which one is most commonly known in the country in question, since computer programming skills are often relatively scarce and jobs requiring them are subject to high rates of turnover. A proficient programmer may be able to turn out the tables for an abstract in four to six weeks. More time will be required if the programmer is still learning the particular software package, or how to manipulate large and complex data sets efficiently. More time will also be required if it is not clear to analysts exactly how tables should be set up (which is often the case for the first abstract from a novel data set).

The production of the abstract requires significant analytic input. It obviously requires knowing which issues are most important. It also requires knowing many sector-specific details. For example, to a person outside the health sector it might seem natural to produce a table of vaccination rates for young children including all children under the age of 60 months. But health experts would not prepare the table this way, because vaccinations should be spread over the first several months of a child's life. So health analysts generally look at rates for children from 11 (or 12) to 60 months. It is these rates that can be compared to a standard of 100 percent full coverage or the 80 percent goal for the decade. Thus it is important to have input from a range of sector specialists in writing the abstract.

The consultations as to what should be in an abstract, how to clean the data (see Box 7.4), what definitions should be used, and what interpretations can be drawn mirror the process of questionnaire design. An initial list of tables, or set of actual tables, can be completed by a small team based on discussions held during questionnaire design and the review of relevant abstracts from similar surveys. Then that draft should be passed around to individuals knowledgeable about different topics.
They should critique the draft with regard to i) whether some tables should be added or deleted; ii) whether the definitions, groupings, etc. are appropriate; and iii) whether the presentation is clear to them as potential users.

When the survey plan includes a full year of field work, it can be very useful to produce a preliminary abstract, based on the first six months of data, to be distributed to a limited number of experts. This allows ample time for the programmers to hone their skills and for full consultation among many potential users. Then the final abstract can be produced from the full data set quickly after the remaining field work is completed.

D. Examples of Further Analysis

The LSMS data sets allow exceptionally rich work beyond what might be included in an abstract. This section gives some examples of the kinds of uses to which data from surveys like the LSMS have been put. It is written as a "sample book" of common analyses based on data produced by LSMS surveys. The goal is neither to explain how to do the analysis nor to provide a connected discourse on poverty and household behavior. Rather, the examples are shown to spark the creativity of the survey planners in setting the agenda of analysis in each country.

This section sketches only a few of the major issues that can be addressed with LSMS data. A single example from a sector is presented here as an illustration, even when parallel questions in other sectors could be addressed as well. The emphasis is on analysis with the most immediate policy implications. Much more analysis that could contribute to a basic understanding of households is not covered here. Thus, the examples shown are by no means an exhaustive catalogue of either potential or existing analyses. An excellent outline of analytic work to understand the effects of structural adjustment using data such as those produced by LSMS surveys is provided in Demery, Ferroni, and Grootaert (1993).
Deaton (forthcoming) provides something of a textbook on the statistical issues that arise in the policy analysis of household data for selected topics.

Box 7.4: Data Cleaning During Analysis

In doing the analysis for the abstract, the statistical institute moves from its role as data producer to that of data analyst. In this role, it will confront the problem of data cleaning at a more complex level than it did in performing the minimum dissemination function. In particular it will face the problems of missing observations and outliers. A missing value occurs when information that should have been filled in was not. For example, a person who reports being ill and going to a doctor may not report how much the visit cost or how long he had to wait. An outlier can be an extreme but correct value (Mr. Rockefeller spending $100 a week on caviar). It can also be so unlikely that it is almost certainly a mistake (a poor subsistence farmer spending $100 a week on caviar). Occasionally other information about the household (that it is Mr. Rockefeller's) can help to determine that an extreme value is plausible. In many cases, however, it is very difficult to tell whether a value is correct or not. Even when outliers represent correct information about unusual cases, they can have a large effect on statistics from the survey, noticeably raising means and standard deviations.

Virtually every analyst will want to consider what to do about missing observations and outliers. Their decisions will, however, vary according to the analytic question being addressed, the statistic being employed, and the number of "problem cases" in the data. Let us consider three examples of problems and the common solutions for them.

Cross-tabulations of Qualitative Data. Consider a cross-tabulation of sex by primary school enrollment. Each variable in the table has two possible answers: male/female and yes/no.
If all answers are present and in the correct range, the table will have two columns and two rows. If the data quality mechanisms were not fully employed, there may be some cases where the information is missing or an answer other than 1 or 2 was recorded. The table produced would then have extra rows or columns for missing values and invalid answers. Since these are nonsensical and do not add to understanding the issue being analyzed, the observations concerned are normally dropped from the analysis. The analyst should, however, note how often they occur. Invalid or missing answers in one or two percent of cases are not too disturbing. If they were to occur in 10 or 20 percent of cases, it would be a sign that something is seriously awry in the data set.

Minor Omissions in Aggregate Variables. Suppose the analyst is to compute total household expenditures and the problem is that a few households do not report the amount spent on kitchen matches. Some analysts would ignore the problem altogether. Kitchen matches constitute an infinitesimal share of total household expenditures, so ignoring them would cause little distortion in comparisons between households that reported this expenditure and those that did not. Other analysts might omit those households from the aggregate. Since there are few of them, these analysts would not worry unduly about any resulting bias or loss of degrees of freedom.

Larger Omissions in Aggregate Variables. If the problem is that a few households did not report the value of expenditures on a staple food like rice or tortillas, then some solution must be found, since the item is likely to account for several percent of total household expenditure. Some analysts would drop those households from the data set. Others would impute a value of expenditure based on "similar" households. This might be done by using the average value observed for the other households in the same cluster.
Or it might be the average value for households that share three characteristics. The first is the same region (perhaps the rural area of the coastal zone). The second is the same household size. The third is the same economic status (perhaps the same quintile of expenditure, where the expenditure variable includes all items save the staple in question). In either case the decisions will affect the analysis. The "corrected" data become more homogeneous than the original data, the variances are lower, and the researcher has replaced some data with assumptions.

Treatment of problem data is an area of great controversy where reasonable professionals differ, often heatedly. It is therefore difficult to give firm guidelines on which solution to adopt. All analysts will, however, agree on four principles:
* Strict use of all quality control procedures in data management and data entry to minimize the problem as much as possible.
* Rigorous explanation of what procedures were used. This would include the number of cases treated, the decision rule used to determine that a case was a problem, and what was done. If the treatment consisted of imputations, then the full formula used in making the imputations should be given.
* Provision of the original raw data to all users (possibly in addition to "cleaned" data). Users can then apply other cleaning procedures if they prefer, or if they deem the documentation of the cleaning procedures inadequate.
* Use of statistics that are relatively insensitive to outliers where outliers are seriously problematic, e.g., medians instead of means or inter-quartile ranges instead of variances.

The Study of Poverty

POVERTY PROFILES. Poverty profiles show several dimensions of poverty. These include who the poor are, where they live, and how they earn their living. They also cover the poor's access to and use of government services and subsidies, and their living standards with regard to health, education, nutrition, and so forth. To cover the multiple dimensions of poverty, much of the information in a good abstract is used.
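At its simplest, a poverty profile is a set of group statistics computed separately for the poor and the non-poor. The Python sketch below uses only the standard library. The records, field names (`pce`, `area`, `head_school`), poverty line and the `poverty_profile` helper are all hypothetical. In line with the last two principles of Box 7.4, it counts the records dropped for missing data and reports medians alongside means.

```python
from statistics import mean, median

# Hypothetical household records (illustrative values only): per capita
# expenditure, area, and years of schooling of the head (None = missing).
records = [
    {"pce": 40.0, "area": "urban", "head_school": 4},
    {"pce": 55.0, "area": "urban", "head_school": 6},
    {"pce": 150.0, "area": "urban", "head_school": 12},
    {"pce": 30.0, "area": "rural", "head_school": 2},
    {"pce": 35.0, "area": "rural", "head_school": None},
    {"pce": 90.0, "area": "rural", "head_school": 5},
]
POVERTY_LINE = 60.0  # hypothetical, in the same units as pce


def poverty_profile(rows, line, var):
    """Mean and median of `var` by (area, poverty status).

    Rows with a missing `var` are dropped but counted, so the number of
    problem cases can be reported alongside the table."""
    groups, dropped = {}, 0
    for r in rows:
        if r[var] is None:
            dropped += 1
            continue
        status = "poor" if r["pce"] < line else "non-poor"
        groups.setdefault((r["area"], status), []).append(r[var])
    table = {key: (mean(vals), median(vals)) for key, vals in sorted(groups.items())}
    return table, dropped


table, dropped = poverty_profile(records, POVERTY_LINE, "head_school")
for (area, status), (avg, med) in table.items():
    print(f"{area:5} {status:8} mean={avg:.1f} median={med:.1f}")
print("dropped for missing data:", dropped)
```

For brevity, the sketch ignores sampling weights and household size. A profile built from actual LSMS data would weight each household by its sampling weight and, for person-level statistics, by its size.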
Table 7.2 reproduces part of a single table from the Ecuador Poverty Report (World Bank, 1995a).

Table 7.2: Some Characteristics of the Poor in Ecuador, 1994

                                          Urban            Rural            Total
                                      Poor  Non-Poor   Poor  Non-Poor   Poor  Non-Poor
Education
  Education of household head (years)
    National                           5.2     9.1      3.2     4.7      4.0     7.5
    Costa                              4.9     8.3      2.8     3.9      3.9     7.1
    Sierra                             5.8    10.5      3.4     5.1      4.1     8.0
    Oriente                            5.9     8.8      4.5     7.4      4.6     7.8
Health
  Diseases treated informally
    National                          24.8    14.8     32.7    24.1     29.4    18.0
    Costa                             27.3    19.0     45.3    33.7     36.4    22.6
    Sierra                            19.7     9.6     21.4    19.4     20.8    13.7
    Oriente                           26.3    10.7     20.1    14.4     20.4    13.2
Employment
  Informal sector
    National                          54.6    44.1     27.9    35.8     39.2    41.7
    Costa                             54.6    44.1     19.6    24.8     37.6    41.6
    Sierra                            56.3    41.3     35.1    42.6     42.3    41.9
    Oriente                           54.9    40.8     25.7    41.1     27.3    40.9
  Regulated sector
    National                          15.5    35.3      3.4     9.9      8.6    26.7
    Costa                             11.8    31.1      1.1     3.1      6.6    24.4
    Sierra                            22.1    41.3      5.4    12.6     11.1    29.2
    Oriente                            8.7    40.0      6.4    26.8      6.5    31.0
Basic services
  Sewerage connection (%)
    National                          57.3    83.4     12.4    28.2     29.6    63.8
    Costa                             43.5    74.4     11.7    17.0     27.3    58.9
    Sierra                            78.9    95.6     13.5    35.4     33.5    69.5
    Oriente                           62.9    87.9      7.0    31.1     10.8    50.6
  Electricity supply (%)
    National                          97.8    99.5     62.0    75.8     75.8    91.1
    Costa                             97.9    99.4     55.5    63.3     76.4    89.6
    Sierra                            97.7    99.7     69.8    84.3     78.4    93.0
    Oriente                           93.6    96.5     36.3    74.4     40.1    81.9
  Water from public net (%)
    National                          61.2    78.8     18.3    23.0     34.8    59.3
    Costa                             48.9    67.1      6.1     9.1     27.2    51.4
    Sierra                            79.9    94.5     27.9    34.0     43.8    68.2
    Oriente                           85.3    92.5     12.1    23.2     17.0    47.2
  Waste collection (%)
    National                          59.7    76.7      1.1     5.6     23.5    51.5
    Costa                             52.2    68.9      1.3     6.8     26.6    52.1
    Sierra                            70.5    87.7      0.9     3.9     22.2    51.3
    Oriente                           59.9    84.9      1.8    21.5      5.7    43.3

Source: World Bank (1995a), Tables 2a and 2b.

Some of the findings include:
* The education level of the head of household is very strongly associated with the level of poverty. The average poor household head in both urban and rural Ecuador has not completed primary school, which lasts 6 years.
In rural Ecuador, many poor household heads have barely completed the basic cycle of primary school (3 years). Not surprisingly, literacy at the national level now stands at about 90 percent, yet more than one-third of the extremely poor in the rural Sierra cannot read or write. In contrast, the average schooling of the urban non-poor household head is well into secondary school. In the Sierra it even goes beyond the basic secondary school cycle (9 years).
* A broad sectoral breakdown of the labor force reveals that informal activities play different roles for the urban and rural poor. The breakdown distinguishes among the informal, modern, public, and a narrowly defined farm sector. As expected, employment shares in the farm sector are negatively correlated with per capita expenditures. Those in the public and modern sectors are positively correlated with them. The more interesting finding, however, relates to the role of the informal sector. In urban areas, the informal sector absorbs a higher share of the poor labor force than of the non-poor labor force, especially among women. About 65 percent of employed poor women work in the informal sector, which is their predominant point of entry into the labor market. In rural areas, the opposite is the case: informal sector activity is higher for the non-poor than for the poor. Rural off-farm employment plays an important role in supplementing agricultural income, and for the poor it has a high potential to become a road out of poverty. A broad definition of off-farm employment includes both primary and secondary occupations. On that definition, as much as half of the non-poor of working age appear to have some employment in the off-farm sector.
* The link between poverty and basic services is not uniform but depends on area, region, and type of service. The rural non-poor are worse off than the urban poor with regard to water supply, hygiene facilities, garbage disposal, and electricity connection.
However, services can serve different functions in urban and rural areas. For example, the threat from a lack of hygiene facilities is much lower in rural areas than in the overcrowded urban centers, especially in the Costa, where the climate helps to breed diseases. Not all services distinguish the living conditions of the poor from those of the non-poor. Electricity in urban Ecuador now reaches nearly every household, regardless of its poverty status. In rural areas, however, there is a strong relationship between electricity connection and poverty, most markedly in the Sierra and the Oriente. Similarly, telephone service is not a distinguishing factor for the urban population but is for the rural population.

DETERMINANTS OF HOUSEHOLD WELFARE IN COTE D'IVOIRE, 1985. LSMS surveys are not only useful for measuring poverty. They can also be used to investigate the causes of poverty, which should provide useful information for designing policies to reduce it. An example of such a study is that of Glewwe (1990, 1991), which investigated the determinants of household expenditures using the 1985 Cote d'Ivoire LSMS survey. Using multiple regression methods, he investigated the impact on per capita household expenditures of education levels, household assets, land owned, and local infrastructure (see footnote 73). Separate regressions were run for urban and rural areas; some of the results are shown in Table 7.3. In urban areas, the education levels of both male and female household members had a positive impact on household expenditures. Several types of household assets also had a strong positive effect on household welfare. These were the value of the home (if owned), the value of household business assets, and the value of savings in financial institutions. Once all these factors were accounted for, regional differences in household expenditure levels (measured by dummy variables for each region) were insignificant.
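The regression approach can be sketched in Python with NumPy, under several assumptions: synthetic data, ordinary least squares and the log of per capita expenditure as the dependent variable. The variable names, the `ols` helper and the "true" coefficients are illustrative and do not reproduce Glewwe's specification. Following footnote 73, the sketch models expenditure directly rather than a poor/non-poor indicator, and it fits urban and rural households separately.

```python
import numpy as np


def ols(y, X):
    """OLS with an intercept; returns coefficients and conventional t-statistics."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


# Synthetic data; the "true" coefficients below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 4000
urban = rng.integers(0, 2, n).astype(bool)
sec_male = rng.integers(0, 2, n)      # most educated male has secondary schooling
log_home = rng.normal(10.0, 1.0, n)   # log value of owned home
region_b = rng.integers(0, 2, n)      # regional dummy with no true effect
log_pce = (5.0 + np.where(urban, 0.4, 0.05) * sec_male
           + 0.1 * log_home + rng.normal(0.0, 0.5, n))
X = np.column_stack([sec_male, log_home, region_b])

# Separate regressions for urban and rural households.
results = {}
for label, mask in (("urban", urban), ("rural", ~urban)):
    beta, t = ols(log_pce[mask], X[mask])
    results[label] = (beta, t)
    print(label, "coef:", np.round(beta[1:], 3), "t:", np.round(t[1:], 1))
```

Because education is given a large effect only in the synthetic urban sample, the two fits differ in the way the text describes for Cote d'Ivoire. The regional dummy, which has no true effect, should come out insignificant in both.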
In rural areas of Cote d'Ivoire, the education of household members had very little effect on household expenditures. This anomaly raises concern about the relevance of education for individuals employed in traditional occupations. As in urban areas, household assets were generally positively associated with welfare levels. Land ownership in rural areas also had a strong impact on household expenditure levels, and cocoa land appeared to have a substantially stronger impact than did coffee land. Infrastructure had substantial predictive power in rural areas. Households located in villages nearer to both paved roads and public markets were relatively better off, as were households living in areas with higher wage levels.

These results have several policy implications. First, the education of both men and women is an important determinant of household welfare in Cote d'Ivoire, especially in urban areas. Second, in rural areas education has very little relationship to household welfare. This suggests that schools are performing poorly or that the skills learned have little relevance to employment prospects in rural Cote d'Ivoire. Third, cocoa land has a higher impact on household welfare than coffee land does, which suggests that coffee cultivation should not be subsidized or encouraged in any way. Finally, the impact of roads and distances to the market suggests that infrastructure improvements could have high returns in rural areas.

Table 7.3: Determinants of Household Expenditure Levels (regressions for urban and rural areas)
Education level of most educated male: Elementary; Junior secondary; Senior secondary; University
Education level of most educated female: Elementary; Junior secondary; Senior secondary; University
Value of selected household assets: Home; Business assets; Savings
Hectares of agricultural land: Cocoa trees; Coffee trees
Distance to nearest: Paved road; Market
Unskilled wage (males)
Coefficients (t-statistics in parentheses): -0.0432 (-2.9) -0.0895 (-3.3) 0.3764 (6.4) 0.1721 (4.3) 0.0439 (1.3) 0.0644 (5.3) 0.0419 (3.3) 0.0815 (4.7) 0.1655 (4.9) 0.1130 (1.7) 0.2418 (3.1) 0.3451 (3.4) 0.5208 (4.1) 0.0740 (1.0) 0.2771 (2.2) 0.3760 (5.3) 0.6202 (8.6) 0.7957 (9.6) 0.9333 (9.4) 0.0406 (0.6) 0.0820 (0.9) 0.0561 (0.4)
Source: Glewwe (1990).

73. One technical note. It may be more intuitively appealing to classify households as poor and non-poor based on their expenditure levels and then estimate a probit or logit regression of the determinants of poverty. This estimation technique, however, ignores a large amount of information contained in the household expenditure variable, so it is a very inefficient estimation method. It is more informative to use household expenditures directly as the dependent variable.

Understanding the Effects of the Economic Environment

CHANGES IN PRODUCER PRICES. One of the common discussions in Cote d'Ivoire during the mid-1980s concerned what the pricing policy for coffee and cocoa should be. Producer prices were maintained well below international prices, and the taxes raised from the policy were a major source of government revenue. The analysis performed by Deaton and Benjamin (1988), shown in Table 7.4, helps to understand one dimension of the possible effects of changes in the coffee and cocoa pricing policy. The first row of the table shows how persons living in farm households are distributed across deciles of per capita household consumption. Fourteen percent fall in the poorest decile and only 2.7 percent in the richest. Thus farm households were poorer than average in 1985. The subset of farm households that produced cocoa and coffee, on the other hand, was slightly concentrated in the middle of the income distribution.
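The note to Table 7.4 describes how its deciles are built. Each person is accorded the per capita expenditure of his or her household, and each decile holds 10 percent of people, not of households. A minimal Python sketch of that convention follows. The function names are hypothetical, and ties in expenditure are broken arbitrarily. A household that straddles a decile cut-off has its members split between the deciles, and its value (for example, its cocoa sales) is allocated in proportion to those members.

```python
def person_decile_counts(pce, size):
    """Number of each household's members falling in each decile of people.

    Every member is accorded the household's per capita expenditure (pce);
    people are ranked from poorest to richest and cut into ten groups of
    equal size, so a household may straddle two or more deciles."""
    order = sorted(range(len(pce)), key=lambda i: pce[i])
    total = sum(size)
    counts = [[0.0] * 10 for _ in pce]
    below = 0.0  # number of people ranked below the current household
    for i in order:
        start, end = below, below + size[i]
        for d in range(10):
            lo, hi = total * d / 10, total * (d + 1) / 10
            counts[i][d] = max(0.0, min(end, hi) - max(start, lo))
        below = end
    return counts


def percent_by_decile(values, size, counts):
    """Percent of the total of `values` accruing to each decile, allocating a
    household's value in proportion to its members in each decile."""
    total = sum(values)
    shares = [0.0] * 10
    for v, s, row in zip(values, size, counts):
        for d in range(10):
            shares[d] += 100.0 * v * row[d] / (s * total)
    return shares


pce = [20.0, 35.0, 90.0]   # hypothetical per capita expenditure
size = [8, 1, 1]           # household sizes
counts = person_decile_counts(pce, size)
# A "farm people" row: farm households contribute their members, others zero.
farm_people = percent_by_decile([8, 1, 0], size, counts)
print([round(s, 1) for s in farm_people])
```

In this toy example the eight-member poor household fills deciles 1 through 8 on its own. This is why a person-based ranking can look quite different from a household-based one.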
Table 7.4: Cote d'Ivoire 1985, Distributional Characteristics of Coffee and Cocoa Farming
Percentages in countrywide deciles of population (decile 1 = poorest, 10 = richest)

Decile             1     2     3     4     5     6     7     8     9    10   Total
Farm people      14.0  13.6  13.2  12.5  11.4  11.4   8.1   7.2   6.0   2.7   100
Cocoa people      9.5   9.8  13.0  13.9  12.3  13.6   9.9   8.6   5.7   3.7   100
Coffee people     9.0  11.5  13.9  14.1  12.4  13.0   9.2   8.1   6.0   3.1   100
Land cultivated  11.6   9.6  11.2  10.3   9.2  22.2   7.6   9.2   5.6   3.7   100
Cocoa sales       9.1   3.0   6.9   4.6   5.3  49.0   5.5   6.6   3.9   6.1   100
Coffee sales      7.8   6.5   8.7  12.7  13.8   9.2  12.4  16.1   9.2   3.6   100

Average rank of people in agricultural households: 40th percentile
Average rank of people in cocoa households: 45th percentile
Average rank of people in coffee households: 43rd percentile
Note: Each row adds up to 100 percent. For example, the first row shows the distribution of people living in farm households across the deciles of population for the whole country. The last row shows the fraction of total coffee sales that accrues to people in each decile. Each person is accorded the per capita total expenditure of the household to which he or she belongs, and each decile refers to 10 percent of people, not of households.
Source: Deaton and Benjamin (1988), Table 11, page 38.

The fifth row shows that cocoa sales were heavily concentrated in the sixth decile. Coffee sales were much less concentrated, but a more than proportional share came from the middle deciles. This implies that if cocoa or coffee prices had been increased, the gains would have been spread over the whole welfare distribution but slightly concentrated in the middle deciles. The change in income would have been neither strongly pro-poor nor pro-rich. Of course, a more complete evaluation of the effects of such price changes would have had to take account of changes in behavior induced by the changes in income.
For example, farmers might have used inputs more extensively and would have spent their income in ways that would have had effects throughout the economy. Perhaps most importantly, government revenue would have declined significantly. An increase in coffee or cocoa prices would therefore necessarily have had to be accompanied by some policy of reducing expenditures or of raising revenue from an alternate source.

CHANGES IN CONSUMER PRICES. It is similarly often important to understand the effect of consumer price changes on household welfare. Important changes can come about as a result of reforms in tax, subsidy, or trade policies. In Tunisia, the consumer prices of several staple goods have been fixed by the government and heavily subsidized for many years. Since 1990, the government has been incrementally changing the level of subsidies and the commodities that are subsidized. The aim is to improve the effectiveness and reduce the fiscal costs of the subsidy program. Table 7.5 shows some analysis done in the course of discussions between the Government of Tunisia and the World Bank to determine what policy changes should be adopted (see World Bank, 1995b). The effect of various price changes on the caloric intake of expenditure quintiles was simulated.

Table 7.5: Tunisia: Estimated Nutritional Effects of Alternative Price Policies, by Expenditure Quintile

Expenditure quintile                          1 (poorest)   2      3      4    5 (richest)  Average
Impact of hypothetical price changes:
(1) 50% subsidy cut
   Percent change in calories as share
   of total caloric intake                      -30.1     -24.3  -22.2  -20.6    -15.3      -21.9
   Resulting caloric intake (kcal)               1483      1688   1813   1975     2549       1902
(2) Targeted cut
   Percent change in calories as share
   of total caloric intake                      -19.5     -20.9  -22.6  -22.6    -22.5      -21.7
   Resulting caloric intake (kcal)               1708      1764   1803   1925     2332       1907
1993 levels (kcal)                               2122      2230   2330   2487     3009       2435
Subsidized goods as a share of total
intake, 1993 (%)                                 58.9      49.4   47.4   42.4     28.4       45.3

Notes: Scenario (1): impact on quantities consumed of cutting subsidies by 50 percent from 1993 levels. Scenario (2): impact on quantities consumed of eliminating subsidies on specific goods (sterilized milk, gros pain, bottled generic oil). A negative number signals a loss in calorie intake. Estimations omit the introduction of new goods since 1993. Recommended daily allowance: 2,165 calories per capita (INS).
Source: World Bank (1995b), Tables 28 and 29.

The simulations take into account the changes in consumption shares of specific foodstuffs due to price changes (e.g., total price elasticities), holding all other factors constant. Under a hypothetical policy of reducing subsidies across the board by 50 percent, caloric intake was estimated to fall by 30 percent among the poorest quintile. Targeted subsidy cuts on specific goods are predicted to lead to a much smaller reduction for the poorest group, about 19.5 percent. Yet the simulations reveal that the subsidy cuts under both scenarios would generate comparable fiscal savings for the Tunisian government. Not surprisingly, the government has adopted a strategy that includes targeted subsidy changes.

CHANGES IN THE WHOLE ECONOMY. Household welfare is obviously affected by the health of the economy as a whole. In Peru, the economy went through considerable upheaval in the late 1980s. GDP per capita fell by about a quarter. The price index (1980 = 100) rose from 3,474 in 1985 to 40,216,592 in 1990. Net international reserves plummeted. Glewwe and Hall (1992) analyzed data from LSMS surveys in Lima in 1985 and 1990 to show how household welfare changed during this period.

Table 7.6: Changes in Welfare in Lima, 1985 to 1990

                               All Lima 1985-86         All Lima 1990        Percent change
                             Mean         (% of       Mean         (% of     in expenditures
                             expenditures  pop.)      expenditures  pop.)    since 1985
SEX
  Male                        7,943.2     (86.6)       3,613.6     (85.4)        -54.5
  Female                      6,681.0     (13.4)       3,012.2     (14.6)        -54.9
EDUCATION LEVEL
  None                        4,288.5      (2.8)       1,770.7      (3.5)        -58.7
  Primary                     5,677.6     (37.1)       2,324.4     (32.6)        -59.1
  Secondary General           7,145.7     (35.4)       3,209.8     (44.1)        -55.1
  Secondary Technical         7,087.5      (5.3)       3,798.2      (2.3)        -46.4
  University                 15,112.3     (15.5)       6,945.7     (12.9)        -54.0
  Other Post-Secondary        7,634.3      (3.9)       4,665.0      (4.6)        -38.9
EMPLOYER OF HEAD
  Government                  9,474.3     (19.1)       4,155.0     (14.6)        -56.1
  Private                     7,604.0     (35.2)       3,321.2     (34.4)        -56.3
  Private Home                3,931.5      (1.3)       1,782.4      (0.7)        -54.7
  Self-Employed               7,126.7     (36.3)       3,466.2     (36.4)        -51.4
OCCUPATION OF HEAD
  Agriculture                 6,430.0      (3.7)       3,189.4      (2.1)        -50.4
  Sales/Services              7,532.4     (27.8)       3,259.3     (30.3)        -56.7
  Industry/Crafts             5,858.5     (37.3)       2,793.3     (34.7)        -52.3
  White Collar               11,307.8     (23.0)       5,195.3     (19.8)        -54.1
UNEMPLOYED                    8,098.5      (2.9)       2,763.5      (5.1)        -65.9
RETIRED                       7,495.9      (4.9)       3,733.3      (6.9)        -50.2
ALL LIMA                      7,774.4      (100)       3,531.7      (100)        -54.6

Notes: Population percentages do not add to 100 due to missing information for 0.3 percent of observations in 1985-86 and 1.8 percent in 1990. If one outlying value is included in the calculations, the 1990 mean expenditure level for those with secondary technical training becomes 6,252.4 for All Lima. This would represent a -11.8 percent change in expenditures since 1985.
Source: Glewwe and Hall (1992), Table 5, p. 21.

The key findings of the analysis were that the welfare of the average household in Lima fell by slightly over half (see Table 7.6). The welfare of the poorest dropped even more than the average. Poverty, defined as the inability to cover the household's basic nutritional requirements, increased from 0.5 percent to 17.3 percent of the population. Households headed by persons with little or no education experienced the greatest loss of welfare.
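The percentage changes in Table 7.6 and the poverty rates quoted above rest on two simple calculations, sketched here in Python. The function names are hypothetical. The headcount function assumes per capita expenditure as the welfare measure and household size as the person weight. Glewwe and Hall define poverty by basic nutritional requirements, and their poverty line is not reproduced here.

```python
def percent_change(old, new):
    """Percent change in a group's mean expenditure between two survey rounds."""
    return 100.0 * (new - old) / old


def headcount(pce, size, line):
    """Percent of people (not households) in households whose per capita
    expenditure is below the poverty line."""
    poor = sum(s for x, s in zip(pce, size) if x < line)
    return 100.0 * poor / sum(size)


# Means for male-headed households and for all of Lima, from Table 7.6.
print(round(percent_change(7943.2, 3613.6), 1))   # -54.5
print(round(percent_change(7774.4, 3531.7), 1))   # -54.6
# Secondary technical, 1990 mean with the outlier included (table note).
print(round(percent_change(7087.5, 6252.4), 1))   # -11.8
```

The last line reproduces the table note and shows how much a single outlying value can move a group mean.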
Finally, households headed by women were not hurt worse than other households.

Provision of Public Services

Several aspects of the provision of public services can be studied with household survey data.

Table 7.7: Access to Infrastructure in Rural Viet Nam

                                   South                        North
                          Total   Non-Poor   Poor      Total   Non-Poor   Poor
Passable road              58.0     58.1     57.9       76.8     88.5     69.4
Public transport           61.2     61.1     61.3       47.2     54.3     42.7
Electricity/generator      91.6     91.6     91.6       85.6     90.0     82.8
Pipe-borne water            7.5      9.3      5.8        3.6      5.6      2.3
Permanent market           71.5     72.6     70.4       43.5     55.6     35.8
Post office                46.8     43.4     50.3       27.7     28.9     26.9
Lower secondary school     82.9     81.9     83.8       90.6     92.6     84.9
Upper secondary school     10.6     12.3      8.9        9.3      9.4      9.3
Dispensary                 55.6     60.0     51.3       19.7     20.0     19.6
Pharmacy                   78.3     80.7     76.0       65.5     72.0     61.3
Clinic                     92.2     90.1     94.2       93.9     97.1     91.9
Doctor                     50.9     60.8     41.0       34.7     42.5     29.8
Physician                 100.0    100.0    100.0       94.0     96.8     92.2
Nurse                      94.4     95.2     93.7       88.4     88.8     88.2
Ag extension office        18.4     22.2     14.5       27.8     29.9     26.4
Ag ext. agent visited      72.1     68.9     75.3       71.3     75.8     68.3
Cooperative                 8.7      8.9      8.4       90.6     94.2     88.3
Adult literacy program     81.9     81.0     82.8       85.3     86.9     84.3
Labor exchange             93.0     92.7     93.4       97.4     97.1     97.6

Note: The poverty line used is calculated for seven different regions and separately for urban and rural areas in each region. The national average poverty line is 1,117 thousand dong per person per year.
Source: World Bank (1994), Annex 3.1, Tables 4 and 5, pp. 168-169.

WHO HAS ACCESS. The first question to address in thinking about service provision is: who has access to services? In addressing this issue, the data from the community questionnaires can be especially helpful. Table 7.7 shows a subset of the information available for rural areas from the Viet Nam LSMS used in the poverty assessment (World Bank, 1994b). It shows that, in general, the poor have less access to services than the non-poor, but that the differences are relatively small.
Health facilities are more accessible in the south than in the north, but agricultural services and literacy programs are less so.

WHO USES SERVICES. The next question is: who uses public services? Household surveys can also answer this if they include appropriate questions. Figure 7.2 shows results from the 1990 Indonesian SUSENAS as reported in World Bank (1993). Among those ill during the month preceding the field work, 67 percent of those in the richest decile sought health care, while 56 percent of those in the poorest decile sought care. More marked differences were shown in the places where health care was sought. Among the poorest decile, 37 percent went to public health centers and sub-centers, and only 3 percent of the ill sought care in private doctors' offices. In contrast, in the richest decile only 17 percent of those ill sought care in public health centers and sub-centers, and 31 percent went to private doctors.

Figure 7.2: Indonesia: Percent of Those Ill in Last Month Who Sought Health Care, by Decile and Place Where Care Sought, According to 1990 SUSENAS

Decile          Hospital   Private   Public health   Public       Others   Total
                           doctors   center          subcenters
1 (poorest)         2          3          21             15          16       57
2                   2          4          24             13          19       62
3                   2          5          26             12          17       62
4                   2          6          26             11          19       64
5                   3          7          24             11          18       63
6                   4          8          24              9          20       65
7                   5          9          24              9          18       65
8                   5         13          22              8          19       67
9                   7         19          19              6          14       65
10 (richest)        9         31          13              4          10       67

Source: World Bank (1993), Figure 1.10, p. 18.

HOW IS THE VALUE OF THE SUBSIDY DISTRIBUTED? To complete the calculation of incidence, information on the use of services from the household survey must be supplemented with information on the costs of providing services. This can come either from budget accounts or from special studies. When such information is available, it is possible to conduct analyses like that shown in Table 7.8.
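The incidence calculation combines two sources. Use rates come from the household survey, and the unit cost (or unit subsidy) of each service comes from budget accounts or special studies. A minimal Python sketch follows. All numbers, the service levels and the `incidence` helper are hypothetical. The subsidy is treated as the unit cost net of any fee, and per capita subsidy is simply the use rate times the unit subsidy, summed over levels.

```python
# Hypothetical inputs, per person per month, for two expenditure deciles.
# Unit subsidies would come from budget accounts or special cost studies;
# use rates (pupils per person) from the household survey.
unit_subsidy = {"primary": 2000.0, "secondary": 5000.0}   # Rp. per pupil
pupils_per_person = {
    "poorest": {"primary": 0.25, "secondary": 0.02},
    "richest": {"primary": 0.10, "secondary": 0.12},
}
expenditure_per_person = {"poorest": 10000.0, "richest": 60000.0}


def incidence(use, subsidy, expenditure):
    """Per capita subsidy received (use x unit subsidy, summed over levels)
    and its share of per capita household expenditure, in percent."""
    per_capita = sum(use[level] * subsidy[level] for level in subsidy)
    return per_capita, 100.0 * per_capita / expenditure


for group in ("poorest", "richest"):
    rp, pct = incidence(pupils_per_person[group], unit_subsidy,
                        expenditure_per_person[group])
    print(f"{group}: Rp. {rp:.0f} per person per month, {pct:.2f}% of expenditure")
```

With these made-up inputs the richest group captures more in absolute terms (Rp. 800 against Rp. 600). The subsidy is nonetheless a much larger share of the poorest group's expenditure, which is the same qualitative pattern as in Table 7.8.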
The value of subsidies to education is greater than the value of subsidies to health and to household consumption of kerosene. In absolute terms, the richest decile captures roughly two to five times as much subsidy as the poorest decile. However, the subsidies account for a greater share of household expenditure for the poor than for the rich, indicating that they do help equalize the distribution of welfare.

WHAT IS THE QUALITY OF SERVICES? Data from the facility surveys that sometimes accompany LSMS surveys can describe the quality of services available. A very sophisticated facility survey was carried out in conjunction with the 1989-II Jamaica Survey of Living Conditions. It surveyed all public and private hospitals, all public health centers, and a sample of private health centers. Information on staff, buildings, equipment, supplies, and finances was collected. A wealth of information was available and synthesized in Peabody et al. (1993). Among the insights gained (see Figure 7.3), public facilities (urban and rural) turn out to provide better perinatal diagnosis and counseling, immunization, and family planning than private facilities do. Private facilities, in contrast, are in better repair, better able to do laboratory testing, and have more equipment and supplies. In general, differences in the quality indices between public urban and public rural facilities were small.

WHAT WOULD HAPPEN IF USER FEES WERE RAISED? An important policy question in several sectors is the impact of cost recovery on the use of services and on the revenues of the service providers. Both of these have been analyzed extensively using LSMS data, mostly for health but also for education. Figure 7.4 shows a simulation of how the use of health services for children might change in response to four alternative pricing policies.
The simulation was carried out by Gertler and van der Gaag (1990) for the Sierra regions of Peru using the 1985 LSMS data. It is run twice. The first run assumes that private doctors do not raise their fees when fees for public sector health care rise; the second assumes that they do. In both cases, children would use less health care. In the first case, some of the ill would stop seeking care. Others would still use health care but would switch from public clinics to private doctors. In the second simulation, the use of private doctors would actually fall.

Table 7.8: Indonesia: The Distribution of Selected Subsidies

                                          Year   Poorest   Richest   National
                                                 decile    decile    average
Subsidy per capita (Rp. per month)
  Education                               1989    1,161     2,469     1,520
  Health                                  1989      113       313       213
  Kerosene                                1990       94       447       243
Subsidy as percentage of household expenditure
  Education                               1989    13.18      4.04      6.57
  Health                                  1989     1.00      0.38      0.70
  Kerosene                                1990     0.84      0.56      0.82

Source: Drawn from World Bank (1993), Annex 2.2, Tables 3, 4, 8, 9, 13 and 14.

Figure 7.3: Selected Indicators of Quality of Health Facilities in Jamaica, According to the Expanded Health Module, 1989 Survey of Living Conditions
[Figure: two bar-chart panels, "Quality of Prenatal Care: Maternal Diagnosis and Maternal Counselling" and "Basic Equipment and Function in Facilities", comparing public urban, public rural, and private facilities on a 0-100 percent scale. Indicators: percent of clinics performing tests (7 or more out of 8); percent of clinics offering 15 of 20 counselling services; with 60 percent of equipment working; with 60 percent of primary equipment present.]
Note: The source did not give the tables with the exact numbers. These graphs are approximations of the originals based on visual inspection.
Source: Peabody et al. (1993), various figures.

Impact of Government Programs

Finally, it is of interest to know the impact of government programs.
Impact evaluations often require special sampling or other data sets to complement household survey data. Three examples in which the special design features were kept fairly simple follow.

Figure 7.4: User Fee Simulations for Children's Health Care in Sierra Regions of Peru, 1985
[Figure: two panels, "Sierra, no private doctor price response" and "Sierra, equal private doctor price response"; vertical axis: percentage of ill population seeking care; horizontal axis: price scenario (Base, 1-4); bars for private doctor, clinic, and hospital.]
Notes on price scenarios: Base: all fees at 0 intis. Scenario 1: hospital fees set at 7.5 intis. Scenario 2: hospital fees set at 15 intis. Scenario 3: hospital fees set at 15 intis and clinic fees at 7.5 intis. Scenario 4: hospital fees set at 15 intis and clinic fees at 15 intis.
Note: The source did not give the tables with the exact numbers. These graphs are approximations of the originals based on visual inspection.
Source: Gertler and van der Gaag (1990), Figure 7-4, p. 113.

HOW MUCH DO WORKERS BENEFIT FROM PUBLIC WORKS SCHEMES? Governments often fund public works schemes as part of their poverty alleviation efforts. The idea is that only the truly poor are willing to accept temporary jobs that require hard physical labor and pay low wages, so the jobs will be self-targeting. It is important to evaluate not only whether the schemes are in fact well targeted but also how much workers benefit. Often the workers on such schemes could not afford to be completely idle if the scheme did not exist. Instead they might be selling chewing gum on street corners or turning up each day at places where daily laborers are hired. The earnings from these other activities might be low, but they would bring in some income. Thus for the workers, the monetary benefit of a public works job is the difference between the wage it pays and whatever they could earn in their alternate activities.
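The benefit calculation just described can be sketched in Python. First, predict each scheme worker's alternative earnings from an earnings model fitted on a comparison sample. Second, subtract that prediction from the scheme wage. All data and the `fit_line` helper are hypothetical. The earnings model is deliberately reduced to a single regressor (years of schooling). A real model would use log earnings with more covariates and would need to consider how workers select into the scheme. The sketch does not reproduce the ESF study's own simulation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on a single regressor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Hypothetical comparison sample from a national survey:
# years of schooling and weekly earnings of non-participants.
school = [0, 2, 4, 6, 8]
earnings = [20.0, 24.0, 28.0, 32.0, 36.0]
a, b = fit_line(school, earnings)

# Hypothetical scheme workers: schooling and the weekly scheme wage.
workers = [{"school": 2, "wage": 45.0}, {"school": 5, "wage": 45.0}]
for w in workers:
    alternative = a + b * w["school"]       # predicted earnings without the scheme
    gain = w["wage"] - alternative          # net benefit, not the full wage
    print(f"alt={alternative:.1f} gain={gain:.1f} ({100 * gain / alternative:.1f}% increase)")
```

The point of the design is in the `gain` line. Counting the full scheme wage as the benefit would overstate it by exactly the predicted alternative earnings.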
In order to evaluate the benefits from the public works schemes financed by the Bolivian Emergency Social Fund (ESF), an extension of the Bolivian Permanent Survey was arranged. The Permanent Survey was carried out periodically in urban areas throughout the country. In conjunction with the 1988 Permanent Survey, a questionnaire very similar to that of the Permanent Survey was administered to a sample of workers on ESF projects in urban areas. The two data sets were combined for the analysis. Data from the national survey were used to simulate what ESF workers would have been earning had they not worked for the ESF.

Some of the results derived by Newman, Jorgensen, and Pradhan (1992) are shown in Figure 7.5. In the absence of the ESF, most of its workers would have fallen in the bottom four income deciles, so the targeting of the scheme was good. Moreover, with the ESF, the distribution of workers' income shifts up, so workers were made better off. In Figure 7.5, the black bars represent what workers would earn without the ESF and the white bars what they would earn with it. The distribution of ESF workers clearly moves to the right, indicating that they move up the income distribution with their ESF jobs. The average ESF worker experienced a 45 percent increase in weekly earnings over what he would have earned in the absence of the ESF.

THE EFFECT OF GOVERNMENT TRANSFERS ON PRIVATE TRANSFERS. Private, non-market transfers occur almost everywhere in the world, but they are a particularly important part of economic life in developing countries. While 15 percent of individuals in the United States report receiving transfers, the figure in developing countries is much higher: 19 to 47 percent (Cox and Jimenez, 1993). Policymakers need to consider these patterns in their decisionmaking. First, the appropriate size of the public safety net depends in part on the size of the private safety net already in place.
Tight public budgets imply that spending must be concentrated where the private network helps least. Second, private transfers might respond to changes in government programs in ways that could dilute or perhaps reinforce program effectiveness. For example, an increase in publicly funded pension benefits may not benefit the elderly as much as expected if their children react by cutting their private support.

Household surveys are key to analyzing patterns of private, interhousehold transfers of goods in kind and cash. They can be used to explain how the patterns and amounts of private transfers are related to access to public transfers and to other household characteristics. These estimated relationships can then be used to simulate what would happen if the amounts of those public transfers were to change. Researchers have used household data sets from several developing countries (Peru, Côte d'Ivoire, Ghana, the Philippines, Colombia, Poland, Kyrgyzstan, and Russia) to study the role of transfers.

Figure 7.5: Workers' Income in the Bolivia Emergency Social Fund
[Bar chart of ESF workers by decile of primary earnings in the Bolivia Permanent Survey, with and without the Emergency Social Fund.]
Note: The source did not give the table with the exact numbers. This graph is an approximation of the original based on visual inspection.
Source: Newman, Jorgensen, and Pradhan (1992), Figure 4.3, p. 61.

What are the results from recent research? Researchers have found that private transfers are directed toward households that are also the recipients of government programs, such as the poor, the elderly, the infirm, those who lack access to formal credit (such as women and young people), and the unemployed. Moreover, private transfers respond to government policy, with important operational implications for the incidence of public transfers. The evidence shows that public transfers can "crowd out" private ones.
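The simulation step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a constant crowding-out rate, so that private transfers fall by a fixed amount for every 100 units of added public transfers; that linear form is an assumption made here for simplicity, not the functional form estimated in the studies cited, and the function name `net_gain` is hypothetical. The rates plugged in are the Cox and Jimenez (1993) estimates for urban Peru and the Philippines reported in Figure 7.6.

```python
def net_gain(public_increase, crowd_out_per_100):
    """Net gain to a household after private transfers respond.

    Assumes, for illustration, that private transfers fall by
    `crowd_out_per_100` units for every 100 units of new public transfers.
    """
    private_decline = public_increase * crowd_out_per_100 / 100
    return public_increase - private_decline


# Estimated declines in private transfers per 100 units of public transfers
scenarios = {
    "Peru, public pensions (intis)": 17,
    "Philippines, public pensions (pesos)": 37,
    "Philippines, unemployment insurance (pesos)": 92,
}

for label, rate in scenarios.items():
    print(f"{label}: net gain {net_gain(100, rate):.0f} of 100")
```

Because the rate is treated as a constant here, the net gain scales in proportion to the public increase. An estimated behavioral function would typically let the private response vary with household income and other characteristics.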
Cox and Jimenez (1993) estimate that, in Peru, an increase of 100 intis in public pensions would be associated with a decline in private transfers of 17 intis, leaving a net gain of 83 intis for the older household (Figure 7.6a). The "crowding out" results are most striking for the Philippines, a country with a minimal welfare state and widespread private transfers. For example, a 100-peso increase in public pensions to a retired household would cause private transfers to decline by an estimated 37 pesos (Figure 7.6b). And if unemployment insurance were instituted in the Philippines, reductions in private transfers would be so large that jobless households would be only slightly better off (Figure 7.6c). Thus, while public transfers would still benefit targeted households, the benefits would be smaller than those implied by analyses that ignore private transfer behavior.

Figure 7.6: Response of Private Transfers to Public Transfer Programs
7.6a: How private transfers would respond to an expansion of public pensions in urban Peru. If the government gave a 100-inti increase in public pensions to a retired household, private transfers to the household would decline by about 17 intis, leaving a net gain of 83 intis for the household.
7.6b: How private transfers would respond to an expansion of public pensions in the urban Philippines. If the government gave a 100-peso increase in public pensions to a retired household, private transfers to the household would decline by about 37 pesos, leaving a net gain of 63 pesos for the household.
7.6c: How private transfers would respond to an expansion of unemployment insurance in the urban Philippines. If the government created unemployment insurance that gave an additional 100 pesos to a jobless household, private transfers to the household would decline by about 92 pesos, leaving a net gain of only 8 pesos for the household.
Source: Cox and Jimenez (1993).

HOW SCHOOLS CAN INCREASE LEARNING. The contribution of education to economic development is now widely accepted. Yet in many developing countries schools are very inefficient in teaching basic skills. Research on how schools can be improved is therefore a high priority. By supplementing the Ghana Living Standards Survey with detailed school questions and tests of cognitive achievement, a standard LSMS survey was augmented to examine the impact of school and teacher characteristics on school achievement. Results are reported in Glewwe and Jacoby (1992). The major findings of the research on determinants of achievement in Ghanaian middle schools were: (1) having blackboards available (which are not always found in Ghanaian schools) raises mathematics and English reading achievement; (2) schools with classrooms that leak when it rains have lower mathematics and reading achievement; and (3) increased availability of textbooks raises reading achievement. Not only does improved school quality lead to increased learning per year of schooling, but it also raises the number of years attended by the typical student. To supplement the household data, cost information was gathered for blackboards, roof repairs, and textbooks, and benefit/cost ratios were calculated for each type of investment. Blackboards had the highest ratio, followed by repairing leaking roofs and providing additional textbooks. These investments would be more cost-effective than the commonly discussed options of building more schools or hiring more teachers.

The Determinants of Household Decisions
If the government hopes to influence certain outcomes, such as whether parents will enroll their children in school, the nutritional status of children, or the number of children a woman bears, it must understand what factors influence household decisions.
Thus much "basic" research is useful background to government policy. A great deal of analysis of this sort has been done using LSMS data.

THE DETERMINANTS OF FERTILITY AND CONTRACEPTIVE USE. The main advantage of using LSMS surveys for analysis of demographic behavior is the wealth of economic variables that can be linked to individuals and households. Demographic surveys, such as the World Fertility Survey and the Demographic and Health Surveys, collect vast amounts of information on demographic variables. They provide the basis for estimating very precise levels of fertility, mortality, nuptiality, contraceptive use, breast feeding, and so forth. However, this greater demographic coverage comes at the expense of not collecting other information on women, children, and households that would help to explain what motivates these demographic outcomes. The LSMS surveys, on the other hand, typically collect information on a subset of these demographic variables (fertility, child mortality, contraceptive use) but can link them to a vast number of economic variables measured in the household and community: household income, expenditure, wealth, and productive assets; education, training, and labor force participation of women, children, and all other household members; past and current investments in the schooling and health care of children; the price, quality, and availability of health care and family planning services in the community; the price, quality, and availability of child schooling; and wage and price levels in the communities. The LSMS surveys also establish extensive links between different individuals within the household, making it possible to perform detailed analyses of household composition and issues such as child fostering.

LSMS surveys have been used to analyze many demographic issues,74 including:
* What is the effect of female schooling, male schooling, and household income on fertility?
* What are the factors that induce couples to have fewer children and invest more in each child?
* How do the availability, quality, and price of family planning services affect contraceptive use? What are the socioeconomic characteristics of users and non-users? Of those with access to family planning and those without?
* What are the economic factors affecting child mortality? How does the family's child mortality experience in turn affect fertility decisions?

74. See for example: Ainsworth (1989, 1990, and 1992); Benefo and Schultz (1994); Montgomery and Kouamé (1995); Oliver (1995a,b); and Schafgans (1991).

Figure 7.7 and Table 7.9 illustrate some of the potential uses of LSMS data in analyzing fertility and contraceptive use. In Côte d'Ivoire, women in the highest consumption quintile have the lowest age-specific fertility rates, but those in the lowest consumption quintile have the next lowest current fertility (see Figure 7.7). On the other hand, current fertility is sharply lower among all women with secondary schooling and among women over 30 with primary schooling (not shown). This suggests that increasing incomes among the poorest Ivorian women will raise fertility unless levels of female schooling are also raised. Contraceptive use is far more sensitive than fertility to differentials in income and female schooling, however. In Ghana, for example, knowledge, ever use, and current use of a modern method of contraception increase with levels of female schooling and household income and decline with distance to a source of family planning, although beyond 4 miles additional distance does not seem to reduce use further (see Table 7.9).
Multivariate analysis of current use of contraception revealed that increased female education and household expenditure are strongly and independently associated with greater contraceptive use; that reducing the distance to family planning facilities would have only a small effect on raising contraceptive use; but that improving the availability of spermicides in public and private facilities would raise use by a larger amount.

THE DETERMINANTS OF LATE SCHOOL ENROLLMENT. In many developing countries, children enroll in primary school at a relatively late age, such as 7, 8, or 9. From the point of view of learning and subsequent income, these delays appear to be wasteful. Data from the 1988-89 Ghana Living Standards Survey were used to examine several hypotheses concerning the determinants of late enrollment (see Glewwe and Jacoby 1992). Little support was found for the conjecture that credit constraints led to delayed enrollment, and no support was found for the hypothesis that overcrowded schools led to rationing of school places. However, convincing evidence was found that malnutrition led to delayed enrollment. Stunted children (as measured by low height for age) were much more likely to enroll in school at a late age than otherwise identical well-nourished children. These findings indicate that nutrition interventions may lead to improved education outcomes.

This section has given a brief tour of the rich and varied policy analysis possible with data from LSMS surveys. It should be clear why an abstract is not enough to exploit this potential, and why mechanisms to disseminate data and to support many researchers should be planned from the outset.

Figure 7.7: Age-specific fertility rates by women's age and consumption percentile, Côte d'Ivoire, 1985-87
[Line chart of age-specific fertility rates (0.0 to 0.4) against woman's age (15 to 50) for women in the lowest 20 percent, the 40th-60th percentile, and the highest 20 percent of consumption.]
Source: Montgomery and Kouamé (1995).
Table 7.9: Percent of women who have heard of, ever used, or are currently using a modern method of contraception, Ghana, 1988-89

Explanatory variable                     Knowledge   Ever use   Current use
Women's education
  None                                       66          10           2
  1-6 years                                  91          24           7
  7-10 years                                 95          45          10
  More than 10 years                         98          55          16
Expenditure per adult
  Lowest                                     73          16           3
  Second                                     81          21           6
  Third                                      84          29           6
  Highest                                    90          40           9
Miles to the nearest source of family planning
  None                                       92          39          11
  1-3 miles                                  89          29           6
  4-8 miles                                  72          17           3
  More than 8                                72          18           4

Source: Oliver (1995a).

Chapter 8. Developing a Budget and Work Program

Key Messages
* Before designing an LSMS survey, it is important to assess the country's survey infrastructure and examine the history of previous surveys and censuses.
* The budget for an LSMS survey can vary depending on local factors: whether or not items provided in kind are included in the budget; the size of the staff and amount of equipment; and prices of items. The budget also depends on two design-related factors: the size of the survey sample and the length of the questionnaire.
* A prototype budget can help the planner make sure that all relevant items are included in the budget and provides a starting point against which to measure costs. A prototype budget is provided in Section B.
* The work program takes place in three phases. Planning often takes around a year. Field work takes another year, while writing an abstract, documenting the data sets, and starting dissemination of the data require at least another six months.
* Many preparatory tasks take place simultaneously. Effective scheduling requires knowledge of how long each activity takes and how the activities are interlinked.

This chapter is designed to help the survey planner set realistic work plans and budgets for an LSMS survey. It covers the planning and data collection phases. Only the most minimal set-up for data analysis is covered: the production of the abstract and documentation of the data sets.
Further time and money should usually be set aside for more analysis, but since the extent of this work varies so widely, it is not covered here. Section A addresses how to assess the existing capacity of the statistical institute. Section B provides a generic budget that includes all the major items required to carry out an LSMS survey. Section C presents a generic work program and guidelines for adjusting it to the existing capacity of the institute. Obviously the details of the work program and budget will vary widely from country to country. The survey planner will need to adapt the generic information presented here to the specific circumstances at hand. This framework should be used to make sure that all required elements have been included. The planner will need to consider how big a gap there is between the statistical agency's existing capacity and what is needed to carry out the LSMS survey. This should be done separately for each of the elements required for the survey, rather than on the basis of some sort of general average.

A. Assessing the Country's Statistical Capabilities
Before planning a project, it is necessary to know not only the desired end product but also the starting point. This section discusses how to assess existing statistical capacity, starting with the outputs of the statistical agency and then assessing the inputs it has available.

Assessing the Outputs of the Statistical Agency
This approach is based on the philosophy that "the proof is in the pudding." If an agency has successfully carried out complex surveys in the past, this is a promising indicator that it may be able to do so again. If an institute has never carried out a complex survey, it will probably need a larger infusion of outside resources and take longer to do an LSMS survey than an institute with extensive survey experience. The first thing to examine is the record of surveys over the last five or 10 years and those planned for the next two or three years.
The best finding would likely be an institute that has certain nationwide surveys it carries out regularly with resources from its normal budget (say, a census every 10 years, a household budget survey every five or 10 years, labor force surveys every six months, a consumer price index every month, or similar surveys) and a diverse set of ad hoc surveys, which may be financed on a contract basis. The regular nationwide surveys imply some stability and permanent capacity, while the ad hoc surveys indicate flexibility and client orientation. Next, one should find out about each of the surveys. Questions to verify are listed in Box 8.1. The aim is to assess the complexity and quality of recent surveys. The assessment of outputs should include the analysis and dissemination record as well as the data collection itself.

As part of the verification process, the assessor should try to obtain written materials about the different surveys. This serves two purposes. First, if the agency cannot produce key documents (questionnaires, sample plans, or abstracts) for recent surveys, it is a telling sign that some aspect of its capacity is weak. In this case it is worth trying to determine why the documents are not available. If they were never produced, the survey implementation may be of poor quality. If they were produced but no copies are on file, there may be a lack of good management. If there are archival copies but no extras, it may indicate that operating funds (such as the photocopy budget) are scarce. If the documents are considered "secret," client orientation is extremely poor. Second, it is much easier to assess quality from written documents, especially to distinguish between average and excellent work. For example, it is a bad sign if a survey does not have an interviewer manual. But if it does have a manual, it is important to see the manual to judge how well it is done.
Box 8.1: Assessing the Products of a Statistical Institute
For each important survey that has been carried out over the previous three to five years, the assessor should try to determine answers to the following questions:

Questionnaire
How many different units of observation does it use?
How coherent is the content?
How good is its formatting?
How long is the average interview?

Sample
How large was the sample?
How many strata and clusters were there?
Is the sample nationwide?

Field Work
What was the ratio of supervisors to interviewers?
How many re-interviews were conducted?
Were there any written supervision instruments?
What was the non-response rate due to refusals?
How good are the manuals?

Data Management
What kind of data quality assurance procedures were used?
How soon after the completion of field work were data available for analysis?
What kind of documentation is provided with the raw data to users?

Dissemination Record
What publications are available from the survey?
How much time elapsed between the field work and publication?
How sophisticated is the analysis and presentation?
Does the institute supply the unit record data to any users, or only to those who financed the survey?

The assessment of the statistical agency's capacity must also include an assessment of the inputs available in the institute. A list of questions to verify is given in Box 8.2. The assessor should take particular interest in how the agency is budgeted (see Table 8.1). It will then be possible to determine how much additional staff and equipment are needed, based on what can be provided locally, and to arrange a financing plan for the total budget. Furthermore, knowing the existing capacity will help customize the work plan (discussed in Section C of this chapter).

Box 8.2: Assessing the Inputs of a Statistical Institute
Personnel
How many people are on staff in relevant positions (field supervisors, interviewers, data entry personnel, programmers, samplers)?
What is their level of formal education?
How much experience do they have?
What is the turnover rate (differentiated by type of job)?
Do the people who worked on previous complex surveys still work at the agency?
How much are staff paid compared to what they could earn elsewhere?

Equipment
How many vehicles does the agency have? How are they deployed?
How many and what type of personal computers does the agency have? Who uses them and for what purposes?
Are there sufficient peripherals (printers, uninterruptible power supplies, modems, air conditioners, cables, etc.)?
What software does the agency have? Who uses it, and for what purpose?
What is the availability of office equipment (phones, faxes, photocopiers)?
How adequate is the supply of consumables (paper, diskettes, printer ribbons, pencils, etc.)?

Sample Frame
When was the last census done?
What publications are available, and at what level of disaggregation?
What unit record data are available, and at what level of disaggregation?
What methodological documents are available?
What is the size distribution of the census units?

Client Orientation
What official and de facto data access policies exist?
What fora exist for obtaining feedback from data users?

B. Developing a Budget
Few aspects of survey design are less conducive to generic statements than the development of the survey budget. On top of the country-specific technical details of each budget item, the shape of the budget itself and the way it is broken down may be dictated by the accounting procedures of the national or donor agencies. Errors and omissions made at this point are extremely hard to correct later and can have serious consequences for the quality of the survey.

Table 8.1: Approximate Survey Budgets from Selected Countries(a)

Country        Sample size   Budget (millions of US$)
Jamaica           2,000             0.155
Ghana             3,200             0.819
Morocco           3,400             1.178
Pakistan          4,800             1.024
Viet Nam          4,800             0.700
Nicaragua         4,200             0.781
Nepal             3,300             0.737
Brazil            4,480             3.129

Note: a.
These are the main budgets formulated when the projects were proposed, rather than the ex post facto amounts. No adjustments have been made for inflation, although the budgets were done in years between 1987 and 1994.

Actual Survey Budgets
Survey budgets for several of the LSMS surveys are shown in Table 8.1. They vary 20-fold, from a low of about US$150,000 for one year of the Jamaican survey of about 2,000 households to a high of about US$3 million for the Brazilian sample of 4,480 households. Several surveys form a cluster, with budgets that range from US$750,000 to US$1,000,000. The large differences between actual budgets stem from the different number of units of each input needed in each country, their prices, and whether they were included in the budget or left off-budget because they were supplied in kind. The importance of these factors is illustrated by looking at how three items were handled in the real budgets included in Table 8.1.

TRANSPORT DECISIONS. In Jamaica, no vehicles were included in the budget. The field work plan called for using public transportation or vehicles already owned by the survey agency. In Nepal, four jeeps are included in the budget at a price of about $12,000 each. The rest of the field work will be done in cities accessible by public transport or in remote areas inaccessible to vehicles. In Brazil, the budget calls for fourteen vehicles, one for each team. All are Brazilian-made and will cost about $45,000 each.

PERSONNEL COSTS. No allowance is made in the Vietnamese budget for field staff, since the costs were wholly borne by the statistical agency. In Nepal about $40,000 is allowed; in Nicaragua, about $80,000; and in Brazil, $800,000.

TECHNICAL ASSISTANCE. In Jamaica, most technical assistance was provided by World Bank staff and hence was not budgeted. Only about $50,000 of on-budget technical assistance was used in the first year.
In Brazil, the budget allows $158,000 for technical assistance, and in Pakistan about $200,000 was budgeted.

Base Case Prototype
Since the actual budgets vary so widely in what was included (as well as in the unit costs and number of units of each item), it is useful to develop a "prototype" budget, which is shown in Table 8.2. It is designed for a one-year, 3,200-household LSMS survey. It is followed by comments on some of the budget items to explain how they can vary depending on local conditions. At the very least the hypothetical budget should be useful as a checklist of the things that should not be forgotten when costing a survey; at best, the comments may serve as guidelines on how to tailor the budget to the country.

This budget shows all the major inputs that are needed, regardless of whether they must be purchased new for the survey or will be provided in kind by the statistical agency or by the international aid agency that helps to finance the survey. Usually the infrastructure, and occasionally staff and vehicles, are contributed by the statistical agency. In the past, a large share of technical assistance has been supplied in kind by the World Bank and not included in project budgets. Here we include the costs of such technical assistance in the prototype budget, because technical assistance is increasingly contracted out. Some of the implications of this are discussed in Box 8.3. The amounts in Table 8.2 are in a hypothetical common currency suggestive of 1994 US dollars. Real budgets are usually written both in dollars and in the local currency. An allowance for inflation may be needed in long projects or in countries where inflation is high.

SALARIES. The budget in Table 8.2 assumes a full complement of headquarters staff: a project manager, a data manager, a field manager, two assistant managers, a secretary, and an accountant. It also assumes the standard LSMS setup of field operations, where 10 field teams visit 3,200 households in one year.
Each team is composed of one supervisor, two interviewers, one anthropometrist, one data entry operator, and one driver. Some surveys make do with fewer core staff. The number of field teams budgeted here is average, but it can easily be larger or smaller.

Table 8.2: Generic, All-Inclusive Budget for a One-Year, 3,200-Household Living Standards Survey

Item                           No.   Level of effort   Unit amount   Total amount
(1) Base salaries                                                         385,300
  Project manager                1   30 months               800           24,000
  Data manager                   1   30 months               600           18,000
  Field manager                  1   30 months               600           18,000
  Assistant managers             2   30 months               450           27,000
  Accountant                     1   24 months               450           10,800
  Secretary                      1   30 months               350           10,500
  Supervisors                   10   14 months               400           56,000
  Interviewers                  20   13 months               350           91,000
  Anthropometrists              10   13 months               350           45,500
  Data entry operators          10   13 months               350           45,500
  Drivers                       10   13 months               300           39,000
(2) Travel allowance                                                      114,400
  Project manager                1   90 days                  40            3,600
  Data manager                   1   60 days                  40            2,400
  Field manager                  1   90 days                  40            3,600
  Assistant managers             2   60 days                  40            4,800
  Supervisors                   10   200 days                 10           20,000
  Interviewers                  20   200 days                 10           40,000
  Anthropometrists              10   200 days                 10           20,000
  Drivers                       10   200 days                 10           20,000
(3) Materials                                                             313,330
  Vehicles                      12                        15,000          180,000
  Fuel                          12   13 months               220           34,320
  Car maintenance               12   13 months               110           17,160
  Data entry computers          10                         1,200           12,000
  Data entry printers           10                           500            5,000
  UPS, stabilizers, etc.        12                           800            9,600
  Air conditioners and safety   12                         1,200           14,400
  Core team computers            4                         1,400            5,600
  Computer for analysis          1                         6,000            6,000
  Core team printer              1                           500              500
  Laser printer                  1                         1,500            1,500
  Computer supplies             15   13 months                50            9,750
  Photocopier                    1                         4,000            4,000
  Fax machine                    1                           500              500
  Stationery, toner, etc.            30 months                50            1,500
  Measuring tapes (adults)      10                            50              500
  Scales (adults)               10                           150            1,500
  Measuring boards (children)   10                           300            3,000
  Scales (children)             10                           150            1,500
  Survey material               10                            50              500
(4) Printing and xerox                                                     16,500
  Questionnaires             4,000                             2            8,000
  Manuals                      400                             5            2,000
  First abstract             1,000                             4            4,000
  Other                        500                             5            2,500
(5) Consultancy and travel                                                236,500
  Foreign consultants                14 man/months        10,000          140,000
  Local consultants                  5 man/months          2,500           12,500
  International travel               12 trips              4,000           48,000
  International per diem             240 days                150           36,000
(6) Other                                                                 147,000
  Office space                                                            100,000
  Communications                     30 months               200            6,000
  Pilot survey                                                              5,000
  Household listing                                                        20,000
  Software                                                                 10,000
  Translation                                                               6,000
SUBTOTAL                                                                1,213,030
Contingency (10.0%)                                                       121,303
TOTAL                                                                   1,334,333

Box 8.3: Contracting Out for Technical Assistance
In the first eight years or so of LSMS experience, the World Bank provided a good deal of the technical assistance and management oversight of LSMS surveys itself, using staff or consultants on its payroll. More recently, this role has been assigned to technical assistants outside the Bank. This brings the conduct of LSMS surveys closer to that of other kinds of projects (where the Bank provides financing for the country to purchase technical assistance) and allows the Bank to support a larger number of LSMS-type surveys than would be possible if all the technical assistance had to be provided by a small pool of people in the Bank. The Bank is, however, still learning how to contract out to best advantage. Listed here are some of the pitfalls experienced to date so that they may be avoided in future surveys.

Mismatch between Budgets and Desired Product. A very common problem where the implementation of the surveys has been contracted out is that the allocated budget falls far short of what it would take to produce the desired product. It is not unusual for the allocated budget to be only half of what is realistically required, and in some cases it has been a much smaller fraction.
This happens most often and most severely when the idea of doing a survey is slotted into a larger project and the budget is assigned without a mission or two to clarify what product is actually wanted and what the existing statistical infrastructure is. The solution is to treat an LSMS survey like any other complex project element and use successive missions to define three levels of issues: (1) the big picture: in addition to a data set, how ambitious will the project be in building capacity for data collection? In sponsoring analysis of the data set? In building capacity for data analysis? (2) the medium picture: what will the basic parameters of the survey look like? Will something like the full set of LSMS questionnaires be used, or will they be truncated? Will the field work and data management use the full LSMS procedures? How large will the sample be? and (3) the smaller details: what are the total requirements for inputs? Who will finance each?

Inadequate Terms of Reference. In several cases the terms of reference have not been sufficiently specific in defining what sort of survey and institutional process was wanted. In some cases the consultants did what seemed reasonable to them, but it was not what the country or the Bank really wanted. In other cases, consulting firms were asked to bid on terms of reference that included making decisions on items that would affect the costs, e.g., how large the sample should be and whether or not to do anthropometrics. This obviously made it difficult for the firms to bid sensibly on the projects. The solution is to write better terms of reference. This means that the task manager must allocate adequate time to the task and, above all, should consult other task managers and survey specialists in the Bank about the strengths and weaknesses of the terms of reference that have been used in various countries to date.

Inadequate Learning from Experience with LSMS Surveys.
All too often, new LSMS-type surveys are planned without taking into account the lessons of earlier surveys. This can result in the budget and terms-of-reference problems described above, in badly designed questionnaires, in inadequate quality control, and so on. The solution is two-fold. First, the LSMS division in the World Bank is making an effort to make the lessons of experience more widely available; writing this manual is an important part of that effort. The LSMS division will provide examples of questionnaires, manuals, basic information documents, abstracts, and other key products to those planning new surveys.⁷⁵ The division can provide additional support to surveys sponsored by the World Bank. It sponsors a training course for Bank task managers on how to do LSMS surveys, and it spends a portion of its time helping those working on new surveys by critiquing draft terms of reference, budgets, work programs, questionnaires, and the like. The other part of the solution, of course, is that those working on new surveys must make the effort to learn from experience. In a surprising number of cases the people in charge of new surveys do not do this.

75. Those who are planning new LSMS surveys should send an electronic mail message to LSMS@worldbank.org to request these materials.

Interaction between the World Bank and the Technical Assistants. Although the technical assistance is contracted out, World Bank staff or consultants will have to be involved in the development of surveys sponsored by the World Bank, as follows. In the project identification phase, Bank staff or consultants will help define the project and its budget, write the terms of reference for the technical assistants, and supervise the process of selecting them. In the implementation phase, Bank staff provide the consultants with the lessons of experience and supervise their work.
This should include providing two or three days of orientation for the contractor; reviewing successive drafts of the questionnaire, manuals, and data entry program; and participating in the field test. It should also include being "on call" to answer queries about specific issues as they arise. In the analysis phase, Bank staff and consultants will again be involved to ensure that the survey documentation, the calculation of consumption aggregates, and the calculation of poverty lines are appropriate. Bank staff will also have a very important role in ensuring that the links between the survey, policy analysis, and policy makers are made.

Obviously, adequate time and funds must be allowed both for the Bank staff and consultants and for the technical assistants to carry out their respective roles in this collaborative effort. At the time of writing, there have not yet been enough successful experiences to know exactly what the right amount is. The current guess is on the order of 15 to 25 weeks of staff or consultant time over the 30 calendar months from the beginning of project planning to the production and distribution of the abstract and documented data sets, of which half or more should be spent in the country where the survey is being developed. The technical assistance contracts should provide time and travel funds for those involved in the project to attend the orientation course. The time for interactions on each of the specific sub-products (questionnaires, manuals, and so on) will be lumped in with the interactions with the other parties involved in each. Successful interaction requires allotting both enough days of technical assistance for each sub-task and enough lead time for each iterative process.

Advantages of Being Simultaneously Technical Assistants, Data Users, Policy Advisers, and Financiers.
Finally, it should be recognized that there are some advantages to having the Bank act simultaneously as technical assistant, data user, and financier. First, the technical advice the Bank gave was consistent with what it wanted as a data user. Since competent, reasonable survey experts and analysts will differ on some issues, this consistency is not guaranteed when the technical assistance is divorced from the user, even when all of the problems with outside technical assistance described above are solved. Second, in its role as data user and policy adviser, the Bank is often present when policy decisions are being debated. When the same individuals are working on the survey, they can provide an exceptionally good channel for ensuring that the survey adequately addresses policy issues and that its results are considered in policy decisions. Third, implementation can be much easier and faster if the financiers do not have to be separately tutored, courted, or cajoled into setting budgets and timetables appropriately. Losing the synergy between roles is probably inherent in contracting out the technical assistance. The impact of the loss may be minimized by writing terms of reference that clearly specify the analytic requirements of the surveys; providing in the project for feedback mechanisms between data users and survey designers; providing in the project for data analysis and for transmission of information to policy makers; and allowing adequate time for the Bank's task manager to supervise the project.

For a one-year LSMS survey, most field staff salaries should be budgeted for 13 months (the 12 months of field work plus one month of training), but it is wise to allow an extra month for the supervisors, whose participation may be needed earlier to help in tasks such as the field test of the questionnaire.
Core staff salaries should be budgeted for about 30 months to allow for preparatory activities and for post-fieldwork analysis and documentation.

Determining appropriate salary levels is almost always a thorny issue. In most countries it is difficult to obtain the level of effort and competence required for a successful LSMS survey at civil service wage rates. Though there are nearly always some diligent, knowledgeable people working in the statistical agency at regular government wages, these few dedicated souls are usually already overburdened with work that should be done by those in vacant posts. It is unrealistic to expect them to take on an LSMS survey in addition to their other tasks, and it is likewise unrealistic to expect to hire more such people easily. A way has to be found to reward those who work on the survey so that they are willing to accept the difficulties of setting it up and managing it well. The job entails many months of intense work, and it is unrealistic to assume that statistical agency officials, always badly paid even by government standards,⁷⁶ will do it well without proper incentives. The problem is that paying high wages to new hires from outside discourages the permanent staff, and paying some permanent staff more than others may lead to similar resentment. But relying on permanent staff with no extra rewards will, in most countries, doom the survey to low-quality work far behind the hoped-for schedule. The underlying conundrum of the civil service (low salaries leading to low productivity leading to low salaries) is a much larger problem than can be solved in planning one survey. The survey planner must therefore find some solution, usually involving a good deal of compromise, that is tolerable in the specific country. The problem must be approached with a mixture of creativity, diplomatic skill, and some research into how it has been solved previously for similar projects in the country.
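The duration rules given for salaries (13 months for most field staff, one extra month for supervisors, and about 30 months for the core team) translate directly into salary lines. The Python sketch below multiplies headcount, a monthly rate, and the number of months for each category. The headcounts and monthly rates are hypothetical placeholders, not figures from Table 8.2, and any incentive scheme adopted would add further lines.

```python
# Minimal sketch of salary budgeting by staff category, assuming the
# duration rules in the text. Headcounts and monthly rates are
# hypothetical placeholders, not values from Table 8.2.
from dataclasses import dataclass

FIELD_WORK_MONTHS = 12  # one-year survey
TRAINING_MONTHS = 1     # field staff training before field work
CORE_MONTHS = 30        # preparation, field work, analysis, documentation


@dataclass
class StaffLine:
    role: str
    headcount: int
    monthly_rate: float  # hypothetical salary per person per month
    months: int


def salary_budget(lines):
    """Return the salary cost of each role plus an overall 'TOTAL'."""
    costs = {s.role: s.headcount * s.monthly_rate * s.months for s in lines}
    costs["TOTAL"] = sum(costs.values())
    return costs


staff = [
    # Most field staff: 12 months of field work plus 1 month of training.
    StaffLine("interviewers", 20, 500, FIELD_WORK_MONTHS + TRAINING_MONTHS),
    # Supervisors: one extra month for early tasks such as the field test.
    StaffLine("supervisors", 10, 700, FIELD_WORK_MONTHS + TRAINING_MONTHS + 1),
    # Core team: the whole 30-month project.
    StaffLine("core team", 3, 1500, CORE_MONTHS),
]

for role, cost in salary_budget(staff).items():
    print(f"{role:>12}: {cost:>9,.0f}")
```

Keeping the number of months as an explicit field makes it easy to see the cost of, for example, extending the supervisors' contracts by one more month.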
A typical solution to the salary problem is to establish a system of performance-based incentives linked to the extra activities the staff must undertake because of the survey. These incentives might include overtime premia, travel allowances, and production bonuses. In some cases the staff may act as temporary consultants to the project, either corporately or as individuals. The payment problem is often harder to solve for the core team, which is almost always composed of regular staff members of the statistical agency, than for the field staff, who are more often hired externally.

76. Survey contractors in the private sector know this very well. A typical call for bids to conduct a marketing research survey usually establishes certain minimum levels of remuneration for interviewers and supervisors.

TRAVEL ALLOWANCE. This budget item is very country-specific. Each field team devotes about 40 weeks during the survey year to effective survey work, but how much travel is involved will vary. Some teams will spend much of the year visiting households in the same localities where they are based, so no travel allowance is necessary; this will be the case for the team based in the capital city. In other cases, the localities will be close enough to the team's headquarters to allow daily trips from the base station, so that a small meal allowance rather than a full per diem will suffice. In still other cases, as in remote rural areas, much of the year will be taken up with travel. The per diem amounts and the number of travel days shown in Table 8.2 are intended to be illustrative averages, but a good budget should be based on an estimate of the proportion of localities that will fall into each of the three accessibility situations mentioned (in the team's base station itself, close to the base station, and far from the base station).
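The weighting by accessibility situation can be made explicit. In the Python sketch below, a team's travel budget is its number of survey days times the number of travelling staff times an average daily rate, where the average weights the rate for each situation by the share of survey days spent in it (treating the share of localities as the share of days). The 40 weeks of effective survey work come from the text; the five-day workweek, the team size, and all allowance amounts are hypothetical. When no estimate of the shares exists, the sketch takes the most conservative reading of the advice to err on the safe side and assumes a full per diem on every day.

```python
# Minimal sketch of a field team's travel-allowance budget, weighting each
# accessibility situation by its share of survey days. The 40 weeks come
# from the text; the workweek, team size, and daily rates are hypothetical.

WEEKS_OF_FIELD_WORK = 40  # effective survey weeks per team (from the text)
WORKDAYS_PER_WEEK = 5     # assumption

# Hypothetical daily allowance per person in each accessibility situation.
DAILY_RATE = {
    "base": 0.0,   # locality is the team's base station: no allowance
    "near": 5.0,   # day trips from the base station: small meal allowance
    "far": 25.0,   # remote locality: full per diem
}


def team_travel_allowance(shares=None, persons=4):
    """Return one team's travel allowance for the survey year.

    `shares` maps each situation to its fraction of survey days. With no
    estimate, the sketch errs on the safe side and assumes a full per diem
    on every day.
    """
    if shares is None:
        shares = {"far": 1.0}
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    days = WEEKS_OF_FIELD_WORK * WORKDAYS_PER_WEEK
    average_rate = sum(DAILY_RATE[k] * share for k, share in shares.items())
    return days * persons * average_rate


print(team_travel_allowance({"base": 1.0}))                          # capital-city team
print(team_travel_allowance({"base": 0.1, "near": 0.3, "far": 0.6}))  # mostly rural team
print(team_travel_allowance())                                        # no estimate available
```

Summing these per-team figures over all teams gives a travel allowance line grounded in the expected mix of localities rather than in a single national average.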
Estimating these proportions requires some educated guessing and/or prior knowledge of survey work in the country; if neither is available, it is best to err on the safe side and assume that most localities will require expensive per diems. Generally, no per diem differentials are made among the field team's staff, though in some cases the supervisor may be expected to have an extra "representation" responsibility and be given a slightly larger daily amount. The per diems of the core team, who also have to travel countrywide for central supervision, are usually much larger.⁷⁷

MATERIALS. This budget item covers several groups of expenses.

TRANSPORTATION. The sample budget in Table 8.2 assumes that new cars will be purchased for the survey and that these will be regular two-wheel-drive vehicles. It also assumes that one car will be needed for each field team and that two extra cars will be needed for the core staff. The prices do not include import duties, as official government development programs almost always benefit from some kind of tax exemption. All these assumptions are, of course, country-dependent. Sometimes the statistical agency may provide vehicles from its existing fleet; in certain countries it may be politically incorrect to suggest that anything less than four-wheel-drive vehicles is suitable for the field teams, or that the core staff might travel in cars that are not air-conditioned. Fuel and car maintenance costs can be estimated from assumptions about the distance to be traveled (usually 2,500 to 3,000 miles per car per month).

77. Remember that travel allowances for the core staff can also be used as a way to increase their base salaries.

COMPUTERS AND PRINTERS. Each field team will need its own data entry computer and printer.
Technically, the data entry computers can be very simple,⁷⁸ but if new machines have to be purchased it would be unwise to select anything less than the standard entry-level configuration of the moment (as this is being written, an 80486SX 25-megahertz [MHz] machine with 4 megabytes [MB] of random access memory [RAM] and an 80 MB hard drive). The printers can be narrow-carriage, dot-matrix machines. Most of the core team's computers can also be entry-level machines, and one simple dot-matrix printer can be shared by all members for most tasks. The data manager's machine, however, should have the largest configuration that can reasonably be acquired (at present, something like an 80486DX 66 MHz or Pentium machine with 8 MB of RAM and a 400 MB hard drive, plus a fast laser printer). Some form of backup system, such as a cartridge tape drive or Bernoulli box, should be included in the data manager's setup. Individual data entry operators can back up daily onto standard diskettes. As noted earlier, the field computers should be installed at the field base stations, in reasonably safe premises, with uninterruptible power supplies (UPS) and air conditioners. All the core staff machines can share one or two UPS units. The budget should also ensure an adequate stock of computer supplies (diskettes, printer paper, ribbons, toner, and so forth) throughout the survey period.

OFFICE EQUIPMENT. At least one photocopier and a fax machine (and in certain cases even basic furniture) should be made available to the project and budgeted for.

ANTHROPOMETRIC EQUIPMENT AND SURVEY MATERIAL. If the survey includes an anthropometric module, each team should be equipped with one set of boards and scales.

PRINTING AND PHOTOCOPIES. This budget item depends on the printing facilities that may be available in house at the statistical agency, as well as on the bulk of the reports to be produced as a direct result of the survey.
The only reports included in the budget in Table 8.2 are the preliminary and final statistical abstracts.

78. The 1984 Côte d'Ivoire survey was done with standard IBM PCs (8088 machines with 128K of RAM and no hard drive).

CONSULTANCY AND TRAVEL. The amount of consultancy will obviously vary widely depending on the statistical agency's capabilities, the amount of training needed, and the amount of analysis included in the project design. As a median, the budget in Table 8.2 includes 14 months of international consultancy and five of local consultancy. The low end of the range of international consultancy required is about six months. This would be relevant when the survey institute has all the basic technical skills required and needs principally to learn about LSMS surveys themselves. The six months might include three months of contact with analysts who have helped design questionnaires and write abstracts and documentation for other LSMS surveys; one month of contact with those who have helped arrange the organization and logistics of other LSMS surveys; one month with someone to teach local staff how to customize the LSMS data entry program; and one month for other training and consultations, including the training of the anthropometrists. Several months of local consultancy might be used to hire local policy analysts to draft the questionnaire and to help the statistical agency draft the abstract. The high end of the range of international consultancy is on the order of 36 months.⁷⁹
This would be required where more technical training is desired and where a technical adviser is hired full-time for two years to assist or substitute for the survey manager.⁸⁰ The remaining 12 months of short-term consultancy might be used as follows: three months to develop questionnaires; one month to help design the logistics; two months to prepare the data entry program; one month to train the anthropometrists; and five months to provide training in analytic software and assistance in producing the first abstract and documentation. Local consultancies would also be arranged to assist in drafting the questionnaire and abstract and to supplement the training in analytic software. Depending on the country, some of the required technical assistance may be available locally, especially for the design of special questionnaire modules and the community questionnaire; a small allotment is therefore made in the budget.

International travel and per diem are necessary both for foreign experts traveling to the country and for core staff members traveling abroad for training. In several past surveys, the latter has proved to be a cost-effective way of conducting certain tasks such as data entry program development or drafting the first statistical abstract.

79. Consider, for instance, the selection of the localities to be visited by the survey. With a computerized sample frame, an expert can do this in one afternoon. However, if each step of the process has to be explained and discussed didactically, the same task may extend to two weeks or more.

80. Such long-term contracts cost a good deal less than the $10,000/month budgeted for the short-term contracts.

OTHER COSTS. There are a number of other costs that are difficult to classify but important nonetheless.

OFFICE SPACE. The core team offices are usually provided by the statistical agency, representing one of the national contributions to the project. The agency may or may not also provide the premises to be used as field team headquarters and other amenities such as utilities and furniture, which would otherwise have to be included in the budget.

COMMUNICATIONS. This item should cover both the cost of national long-distance calls during the survey period and the cost of international calls, e-mail, couriers, and the like, which are especially important to ensure frequent and efficient contact among the local core staff, the international consultants, and the international agencies during the survey's preparatory phase.

PILOT SURVEY AND HOUSEHOLD LISTING. The actual cost of these activities depends heavily on local conditions and can vary widely from the amounts indicated in Table 8.2. The listing may not be required at all, or it may take up as much as a third of the cost of the field work. These activities have been included here mainly as a reminder of their tendency to be accidentally omitted from survey budgets.

SOFTWARE. Operating systems are usually included with computers. If the customized LSMS data entry program is used, it will be tailored to the survey as part of the project. If a commercially available package is used, enough copies for all the data entry operators and data managers should be purchased. All computers should be equipped with virus detection software. Some additional software for the core team computers will also have to be purchased. A major statistical program is essential, as well as several copies or a corporate license of a standard word processor and a spreadsheet. A graphics package, a presentation manager, and some compilers are also useful.

CONTINGENCY. A contingency item should always be appended to any budget, and that of an LSMS survey is certainly no exception. Given the uncertainties faced by the survey planner when the budget is being developed, this should be five to 10 percent of the total cost.
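As a quick check of the arithmetic, the Python sketch below applies a contingency rate to a budget subtotal. With the subtotal from Table 8.2 (1,213,030) and the 10 percent rate applied there, it reproduces the table's contingency of 121,303 and total of 1,334,333. The function name and the choice to reject rates outside the recommended 5 to 10 percent range are made for this illustration only.

```python
# Minimal sketch: add a contingency line to a budget subtotal.
# The 5-10 percent bounds follow the recommendation in the text.

def add_contingency(subtotal, percent=10):
    """Return (contingency, total), rounded to whole currency units."""
    if not 5 <= percent <= 10:
        raise ValueError("contingency should be 5 to 10 percent of the total")
    contingency = round(subtotal * percent / 100)
    return contingency, subtotal + contingency


contingency, total = add_contingency(1_213_030, 10)
print(f"Contingency: {contingency:,}  Total: {total:,}")
# Contingency: 121,303  Total: 1,334,333
```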
The budget should be costed in detail, but when the project is submitted for funding it is safer to keep the details as a technical reference and to cluster the budget items into larger groups. This usually makes them conform to the agency's reporting requirements and gives the survey managers more accounting flexibility later.

Discussion

The budget in Table 8.2 may look daunting when compared with the budgets of other surveys previously conducted in the country. Evaluators should consider, though, that this budget is all-inclusive. It takes into account many direct and indirect costs that are often excluded from other budgets. In other words, the budget is intended to reflect the total cost of the survey regardless of who finances its different parts. In particular, the total of expenditures for local salaries and travel allowances has been costed; this item can often be covered by the regular budget of the statistical agency. A realistic amount of technical assistance has been explicitly included in the budget as well. For many surveys this is provided in kind by the sponsoring international agency; in many LSMS surveys part of it was provided in kind by World Bank staff. As the number of new surveys increases and the role of the LSMS division evolves, more technical assistance must come from hired consultants. It should also be noted that a substantial part of the budget is devoted to the purchase of cars, computers, and other equipment, most of which will become part of the statistical agency's assets and be useful for other projects after the survey is finished. Indeed, past projects may have left the agency with similar equipment that can be used in this survey, considerably reducing the costs to be financed. The cost of office space may also be absorbed in kind by the statistical agency.
It is certainly preferable to have the headquarters team present in the main offices of the agency, and if the field teams can take advantage of regional offices, that is useful as well.

Sensitivity Analysis

Table 8.3 shows how the total budget might vary if the sample size and the number of years of the survey were changed. A little over half the budget for a one-year survey is devoted to start-up costs. These are mostly for salaries of the core team during preparation, international consultancies, and purchase of equipment. The cost of the field staff, supplies, and infrastructure during the field work constitutes the difference between the start-up and one-year costs. Thus the cost of a two-year survey is much lower than double the cost of a one-year survey: the second year includes the extra year of salaries and supplies but less technical assistance than the first. In general, it adds about 60 percent of the cost of the first year's survey.

Table 8.3: Sensitivity Analysis on the Budget

Time period                           1,600 households  3,200 households  4,800 households
Start-up costs only                            592,000           717,000           842,000
Start-up plus one year of survey               991,000         1,340,000         1,687,000
Start-up plus two years of survey            1,529,000         2,100,000         2,671,000

The fact that so much of the survey's cost goes to financing preparatory activities means that the bulk of disbursements will come early in the work. This should be considered when planning the project's cash flow. About 40 percent of the total one-year cost is proportional to the sample size: this is the cost of field team salaries, their equipment, and their supplies. Thus, increasing the sample size from 3,200 to 4,800 households (that is, increasing the number of field teams from 10 to 15) will only increase survey costs by about a quarter.
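The ratios quoted in the sensitivity discussion can be recomputed from Table 8.3. The Python sketch below takes the table's figures as given and derives, for each sample size, the start-up share of the one-year budget and the cost of a second year relative to start-up plus one year, as well as the effect of moving from 3,200 to 4,800 households. Reading "the cost of the first year's survey" as the start-up-plus-one-year total is an interpretation made for this illustration.

```python
# Figures from Table 8.3.
# households -> (start-up only, start-up + 1 year, start-up + 2 years)
TABLE_8_3 = {
    1600: (592_000, 991_000, 1_529_000),
    3200: (717_000, 1_340_000, 2_100_000),
    4800: (842_000, 1_687_000, 2_671_000),
}


def cost_ratios(households):
    """Return (start-up share of one-year cost, second-year increment share)."""
    startup, one_year, two_years = TABLE_8_3[households]
    return startup / one_year, (two_years - one_year) / one_year


for households in sorted(TABLE_8_3):
    startup_share, second_year_share = cost_ratios(households)
    print(f"{households:,} households: start-up = {startup_share:.0%} of one-year cost; "
          f"second year adds {second_year_share:.0%}")

# Effect of going from 10 to 15 field teams (3,200 -> 4,800 households).
increase = TABLE_8_3[4800][1] / TABLE_8_3[3200][1] - 1
print(f"One-year cost rises by {increase:.0%}")
```

With these figures the start-up share runs from about 50 to 60 percent, a second year adds roughly 54 to 58 percent, and the step from 3,200 to 4,800 households raises the one-year cost by about 26 percent, in line with the approximate values given in the text.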
The advantages of increasing the sample size at the marginal cost of adding new survey teams should, however, be carefully weighed against the increased managerial complexity this entails.

C. Developing the Work Program

To set up any household survey, a large number of people need to undertake preparatory activities in a coordinated way. The situation is even more delicate for LSMS surveys, because they differ from what is considered standard statistical practice in most countries. It is therefore essential to establish a plan for all the activities needed to implement an LSMS survey. Though such a plan must of course be tailored to each specific situation, the elements it should include are common to most countries. The length of time needed for some activities may vary; for others it may differ little from country to country.

The timetable in Figure 8.1 is a schematic representation of the most important activities that must be completed to conduct a generic one-year LSMS survey. The time frame is a 30-month period, of which the first 12 months are devoted to preparatory tasks, months 13 to 24 to data collection in the field, and months 25 to 30 to preparation of the statistical abstract, dissemination of data sets, and policy analysis based on the survey data.

Figure 8.1: Generic Time Table for Survey Management (activities charted against months 1 to 30)
1 Management and logistics: 1.02 Appoint core staff; 1.03 Core staff logistics; 1.04 Acquire anthropometric equipment; 1.05 Acquire computers; 1.06 Acquire survey materials; 1.07 Mobility strategy and vehicle acquisition; 1.08 Publicity and household motivation strategy.
2 Questionnaire development: 2.01 Identify policy-relevant issues; 2.02 Prepare draft of the household questionnaire; 2.03 Distribute draft of household questionnaire; 2.04 Seminar; 2.05 Finalize the draft and plan the field test; 2.06 Field test; 2.07 Review field test; 2.08 Print household questionnaire; 2.09 Prepare community and price questionnaires.
3 Sampling: 3.01 Sample design; 3.02 Develop the sample frame; 3.03 Select sampling units; 3.03 Plan the field assignments; 3.04 Dwelling listing and cartographic updating; 3.05 Select dwellings in each CU.
4 Staffing and training: 4.01 Select and train field test staff; 4.02 Prepare supervision procedures and manual; 4.03 Prepare interviewer manual; 4.04 Select interviewers; 4.05 Train interviewers.
5 Data management: 5.01 Prepare first version of data entry program; 5.02 Final version of data entry program; 5.03 Prepare data entry manual; 5.04 Install data entry computers for training; 5.05 Train data entry operators; 5.06 Install data entry computers in the field; 5.07 Data entry in the field (first month); 5.08 Evaluation and debugging; 5.09 Define data management procedures; 5.10 Data entry in the field.
6 Field work: 6.01 Survey in the field (first month, field test); 6.02 Evaluation of field test; 6.03 Survey in the field.
7 Data analysis and documentation: 7.01 Define first plan of tabulations; 7.02 Create data sets for first six months; 7.03 Prepare preliminary statistical abstract; 7.04 Distribute preliminary abstract; 7.05 Seminar; 7.06 Revise contents of abstract; 7.07 Create complete data sets; 7.08 Prepare complete statistical abstract; 7.09 Prepare survey documentation; 7.10 Distribute raw data sets to analysts.

The tasks are divided into seven main areas: (1) finance, management, and logistics; (2) questionnaire development; (3) sampling; (4) staffing and training; (5) data management; (6) field work; and (7) data analysis and documentation. The first five areas consist mainly of preparatory tasks conducted between months 1 and 15. Field work takes place in months 13 to 24, and data analysis and documentation between months 16 and 30. Certain activities can be conducted in parallel, but some sequences must be respected.
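Sequencing constraints of this kind can be checked mechanically before calendar dates are assigned. The Python sketch below performs a simple forward pass (the first half of a critical-path calculation) over a handful of activities from Figure 8.1, using the standard-library graphlib module (Python 3.9 or later). Only the constraint that the data entry program follows the completed questionnaire reflects the text; the other dependencies and all durations are hypothetical.

```python
from graphlib import TopologicalSorter

# code: (duration in months, [codes that must finish first]).
# Durations and most dependencies are hypothetical.
PLAN = {
    "2.02": (2, []),                # prepare draft household questionnaire
    "2.06": (1, ["2.02"]),          # field test
    "2.07": (1, ["2.06"]),          # review field test (questionnaire final)
    "4.03": (2, ["2.07"]),          # prepare interviewer manual
    "5.02": (2, ["2.07"]),          # final version of data entry program
    "4.05": (1, ["4.03", "5.02"]),  # train interviewers
}


def earliest_starts(plan):
    """Forward pass: earliest start month (1-based) of every activity."""
    graph = {code: preds for code, (_, preds) in plan.items()}
    start = {}
    for code in TopologicalSorter(graph).static_order():
        _, preds = plan[code]
        # An activity can begin in the month after its last predecessor ends.
        start[code] = max((start[p] + plan[p][0] for p in preds), default=1)
    return start


for code, month in sorted(earliest_starts(PLAN).items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{code} can start in month {month}")
```

If the dependencies in a draft work program accidentally form a cycle, static_order() raises graphlib.CycleError, which is itself a useful consistency check before the plan is anchored on the field test and the training of the field teams.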
The simultaneous development of several tasks is sometimes precluded by the limited availability of resources (mainly staff time at the core level), whereas in other cases the sequences are imposed by logical constraints (for example, the data entry program can only be completed after the questionnaire is complete). Two activities are extremely important to the planning process in this regard: the field testing of the questionnaire (Activity 2.06 in the timetable) and the training of the field teams (Activity 4.05). The real scheduling of preliminary tasks, which consists of assigning actual calendar dates to all activities, is usually done around these two key events. The sections that follow explain the activities that differ most from normal survey procedures or are most likely to be overlooked.

Management and Logistics

AGREEMENTS AND INSTITUTIONAL ENVIRONMENT. The main institutional actors in the survey process are the data users, the data producers, the providers of technical assistance, and the financiers.⁸¹ These actors must be identified as early as possible and their relationships clearly agreed upon. Three key items must be agreed upon: financing, access to the data, and the mechanisms by which data users provide input into the survey design. The local data users may include the national planning agency, sectoral ministries, and local universities. International development organizations such as the World Bank, several U.N. agencies, and some bilateral aid agencies are also important data users. In most countries the most natural data collecting organization will be the official national statistical institute, which is often one of the survey initiators. Alternatively, the survey may be conducted by one of the local data users or contracted to a private organization. Developing new LSMS surveys is easier when it is possible to use experience from past surveys.
Although some of this can be done through written documents, much of it requires personal contact; hence the need for international technical assistance. In the first LSMS surveys this was supplied exclusively by World Bank staff and consultants. Increasingly, however, a wider group of individuals and agencies is becoming involved. Financing for an LSMS survey generally comes from several sources. The World Bank itself may be a major provider, usually through the evaluation component of a larger loan in one of the social sectors. Bilateral or international agencies such as the United States Agency for International Development, the Japanese Grant Facility, or the United Nations Development Program may also be interested in supporting the project, especially if it includes an institution-building component. Specialized international agencies may help with specific budget items (for instance, UNICEF may provide anthropometric equipment and training if one of the survey modules assesses child nutrition). The country's own financing is usually an in-kind contribution of office space and personnel.

FINANCING. The first agreements are, of course, those related to the survey financing. These should be established as soon as an initial budget is drafted and should make clear who will pay for what, when, and what administrative procedures will be used for spending the money.

DATA ACCESS POLICY. The next agreement should concern user access to the survey data, as discussed in Chapter 7.

81. Sometimes the same persons or agencies can appear in more than one role. In the first surveys, the World Bank was the sole financier and provider of technical assistance and distressingly close to being the only data user. More recently, a variety of agencies have participated in each of these roles.

MECHANISMS FOR USER INPUT INTO THE SURVEY DESIGN. Experience to date suggests that the surveys that have been used the most are those that had the most input from potential users.
These mechanismsmay be formalizedand continuous, as in the case of an official steering committee, more occasional, as through workshops held at key stages, or informal, through consultationsthroughoutthe process. APPOINTMENT CORE STAFF. A central core staff, composedof at least OF the survey manager, data manager, and field manager, must be appointed early in the planning process and be made responsible for running the survey on a day-to-daybasis during the preparation period. PROCUREMENT. The equipmentand supplies must be availablein time to ensure adequatequality in planningand carrying out the field work. It would not be much of an exaggerationto say that delays or difficultiesin obtainingsome of the basic items has occasioned more headaches and absorbed more time than some of the more substantivetasks in survey preparation (such as questionnaire design or drawing the sample). Anecdotes of such problems and their consequencesare legion, but one here will suffice. In the first year of the Jamaicasurvey, original procurementarrangements called for the anthropometrists' scales and measuring boards to be purchased through an international procurement agency. The arrangements languished becauseof unclear responsibilities,inattention,and bureaucraticdelay. Suddenly the training for the anthropometristsand the beginningof field work were only a few days and weeks away, respectively. The consultant engaged to train the anthropometrists happened to have a supply of equipment on hand, but the equipment was calibrated in the English system and the labels on the questionnaireswere in the metric system. Althoughexplicit instructions were given during training that the measuring units were to be recorded from the equipmentand not convertedby the anthropometrists, these instructionswere not followeduniformly. In the end, some field workers recorded weights in pounds and others in kilos. 
Despite several attempts to rectify the situation during the ex post facto (not concurrent) data entry, the anthropometric data from that year had to be discarded.⁸²

The procedures for purchasing will vary according to the item, its cost, and the rules of the country or agency financing the survey. Moreover, the tasks in procurement for LSMS surveys are no different from those for procurement in general. It is therefore not necessary to discuss the "how to" in detail here, but an outline of when the various items will be needed is provided. The survey planner will have to work out the details based on country-specific arrangements.

82. If the survey had been organized using the standard two-round interview and concurrent decentralized data entry, the problem probably could have been caught early enough to solve it in the field. The Jamaican survey, however, uses a single interview, a short period of field work (usually about 10 weeks), and ex post facto data entry.

MAKE LOGISTIC ARRANGEMENTS FOR CORE STAFF. Essential tasks here include obtaining and equipping office premises from which the core team can work, and deciding on transportation procedures for these staff.

ACQUIRE ANTHROPOMETRIC EQUIPMENT. If the survey includes an anthropometric module, special equipment must be bought. The procurement procedures must be initiated well in advance, because the suppliers of good-quality equipment are few and probably far away. The measuring boards and scales used in clinics are not suitable for field work.

ACQUIRE COMPUTERS. Each field team needs computers and printers. In addition, a few extra machines will have to be acquired for the core staff at the statistical agency headquarters. The field machines will only be needed at the time of data entry operator training (about two months before the survey starts). The core staff machines, however, should be available as soon as possible, because they will be needed for early tasks like questionnaire development.

ACQUIRE SURVEY MATERIALS.
The interviewers will need standard equipment such as pencils, erasers, clipboards, simple pocket calculators, and briefcases in which to carry the questionnaires. Badges and credentials should be produced so the interviewers can identify themselves. In some countries interviewers may also need items such as boots and raincoats. (For the Côte d'Ivoire LSMS, each team was also provided with a tent and camp beds!) How many and what kinds of items are needed will depend on the climate and the availability of lodgings.

MOBILITY STRATEGY AND VEHICLE ACQUISITION. A decision will have to be made on whether to acquire cars for all teams or whether some of them can rely on public transportation. Vehicle procurement is usually a long procedure. It should be initiated as soon as possible so that the cars are available before the survey goes to the field. However, even an efficient bureaucracy is unlikely to produce the cars in time for certain pre-survey activities such as the field testing or the household listing operation. Special arrangements may therefore be needed to have a few vehicles available early on; they may be borrowed from the existing fleet or rented.

If public transportation is to be used, other logistical issues should also be considered. For example, it will have to be decided how and by whom the transportation budget will be managed. Ideally, it should be managed by the team supervisors, but the accounting procedures should be devised so that they are neither too lax nor too bureaucratic. In a few countries, government employees on official business receive reduced or zero fares on public transportation. Where this arrangement is to be used, the appropriate credentials and authorizations should be obtained.

HOUSEHOLD PUBLICITY AND MOTIVATION STRATEGY. The use of mass media and the preparation of targeted motivational materials can be developed in parallel with the rest of the preliminary tasks.
It is best to initiate these activities early because they tend to require substantial attention from the survey manager.

Questionnaire Development

IDENTIFY POLICY-RELEVANT ISSUES. The main issues to be addressed by the survey should be made explicit as early as possible. Meetings with the various interested parties should be scheduled, either on a short and intense timetable or over a longer period. Often key decisions will be made when the international technical assistants are present on a series of short trips. Decisions on successively more detailed issues can be made on each trip.

PREPARE DRAFT OF THE HOUSEHOLD QUESTIONNAIRE. The hard part of this task is, of course, the intellectual translation of relevant concepts and policy issues into concrete questions, a theme already addressed in Chapter 3. However, the mechanical part of the process (that is, the physical production of a lengthy paper document) cannot be overlooked. Worse still would be to treat it as a clerical job that can be delegated to a secretary. Though it is usually done on a computer, questionnaire drafting is not the job of the data manager either. More often than not, the survey manager will have to assume this task personally. Efficient word processing software makes drafting the questionnaire easier, but it will probably take two months or more to complete. In fact, the job should be spread over more calendar months to allow comments as needed.

Extra time may be needed for translations. If only the final questionnaire is to be translated into one or more commonly spoken languages, two or three weeks may be sufficient. Translations may also be needed to facilitate discussions among local and international members of the team drafting the questionnaire. In that case, two to three weeks will be needed just to translate the first draft. Several more days may be needed for each subsequent draft, to update the translation to reflect the revisions.
DISTRIBUTE DRAFT OF THE HOUSEHOLD QUESTIONNAIRE. Two weeks to a month should be allowed for the draft to be analyzed by international consultants, data users, subject-matter specialists in international agencies, local sectoral ministries, and academics. The process of revision and comments may be repeated two or three times if necessary.

SEMINAR. Some of the people to whom the draft questionnaire is distributed will provide written comments. More feedback, however, can be obtained from a brief seminar of one or two days, conducted one month after the draft is circulated.

FINALIZE THE DRAFT AND PLAN THE FIELD TEST. The questionnaire draft is usually revised for about two weeks, and perhaps for two more weeks if it has to be translated at this stage. During this time, logistical arrangements for the field test can be completed. These include selecting and briefing a small number of experienced interviewers, who will conduct the field test along with the core staff. They also include ensuring the interviewers' transportation, lodging, and communications during the field test. Around 200 questionnaires will have to be reproduced, probably by photocopying rather than printing.

FIELD TESTING THE QUESTIONNAIRE. At least four weeks should be allowed to field-test the questionnaire, with one or two additional weeks to review it in the central office. As explained in Chapter 3, it is extremely important that the survey core staff, the survey planner, and experienced consultants participate personally in both activities. Their schedules must be planned so that this participation is possible.

PRINT HOUSEHOLD QUESTIONNAIRE. It is never a good idea to put too much pressure on a print shop, because small flaws in the printing can cause large problems in field work. Plenty of time should be allowed, and proper quality control procedures should be agreed on with the printers. The core team should monitor the work frequently and regularly. About five weeks should be allowed in total.

PREPARE COMMUNITY AND PRICE QUESTIONNAIRES. The community and price questionnaires should be developed in parallel with the household questionnaire. These questionnaires are not a major undertaking compared with the household questionnaire, but they tend to be overlooked. Some checkpoints on their development should be established during the months of survey preparation. Ideally, these questionnaires should be field tested at the same time as the household questionnaire. However, as explained in Chapter 3, manpower constraints may dictate that they be tested at a different time.

Sampling

SAMPLE DESIGN. This part of the survey planning process is as much a financial and political issue as it is a technical one. Its broader parameters (such as total sample size and stratification) are often decided along with the idea of conducting the survey. If this has not been the case, an early decision must be made on the number of explicit strata and how the sample will be allocated among them. Also, the number of households to be visited in each cluster has to be determined, and from it the number of clusters to select in each stratum. The length of time required to reach these decisions depends largely on the difficulty of establishing a consensus. As little as two weeks or as long as two months may be required.

DEVELOP THE SAMPLE FRAME. The actual implementation of this task may vary widely among different countries. If there is no recent census, or if the census records are not properly kept, developing a sample frame may take several months and have a substantial impact on the survey budget. In countries with a recent census and strong statistical capabilities, developing the sample frame may require virtually no work at all.

SELECT SAMPLING UNITS. This consists of two steps. First, the sample frame is sorted according to any desired implicit stratification criteria. Second, the required number of primary sampling units is selected in each stratum with probability proportional to size (PPS).
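The two sampling steps just described can be sketched in a few lines of code. The first step converts a household allocation into a number of clusters per stratum. The second draws the clusters with PPS. The manual does not prescribe a particular PPS algorithm here. The sketch below uses systematic PPS selection with a random start, which is one common technique, and assumes the frame is a list of (cluster ID, measure of size) pairs, such as census household counts. The function names, the 16-household cluster take, and the stratum sizes are hypothetical illustrations.

```python
import math
import random


def clusters_per_stratum(households_by_stratum, households_per_cluster):
    """Number of clusters (PSUs) to select in each explicit stratum.

    households_by_stratum maps a stratum name to the number of households
    allocated to it; the same number of households is visited per cluster.
    """
    return {
        stratum: math.ceil(n_households / households_per_cluster)
        for stratum, n_households in households_by_stratum.items()
    }


def select_pps_systematic(frame, n_clusters, rng, sort_key=None):
    """Systematic PPS selection of n_clusters units within one explicit stratum.

    frame: list of (cluster_id, measure_of_size) pairs (hypothetical layout).
    sort_key: optional function ordering the frame (implicit stratification).
    A unit whose size exceeds the sampling interval can be selected twice.
    """
    units = sorted(frame, key=sort_key) if sort_key is not None else list(frame)
    total_size = sum(size for _, size in units)
    interval = total_size / n_clusters
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n_clusters)]

    selected, cumulative, t = [], 0.0, 0
    for cluster_id, size in units:
        cumulative += size
        # Every target below the running total falls inside this unit.
        while t < n_clusters and targets[t] < cumulative:
            selected.append(cluster_id)
            t += 1
    return selected


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any actual LSMS survey.
    print(clusters_per_stratum({"urban": 1600, "rural": 3200}, 16))
    # {'urban': 100, 'rural': 200}
    frame = [("C01", 120), ("C02", 340), ("C03", 90), ("C04", 410), ("C05", 200)]
    print(select_pps_systematic(frame, 2, random.Random(2024)))
```

Sorting before the systematic pass is what makes the implicit stratification work. Because the selection interval walks through the sorted list, the selected clusters are spread across the sort order.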
Selecting the sampling units used to be a major undertaking B.C. (before computers). Now it can be done in a few days at most if the sample frame is in a computer file.

PLAN THE FIELD ASSIGNMENTS. The selected clusters need to be distributed among the field teams, and the order in which they will be visited throughout the year must be decided. Planning the task assignments can usually be completed in a few full-time days.

DWELLING LISTING AND CARTOGRAPHIC UPDATING. A new listing of dwellings or households will almost always be needed in the selected clusters.⁸³ The time and resources needed for the dwelling listing are to some extent country-specific. A first rough estimate, however, can be obtained by assuming a yield of 80 dwellings per interviewer-day in urban areas and 50 in rural areas. The maps of all selected CUs should be made available. Statistical agencies always prepare these maps as part of the census operations, but the maps tend to get lost shortly after the census, so sometimes new ones will be needed. If this is the case, cartography and the dwelling listing should be planned in parallel to avoid prolonging the survey preparation period.

SELECT DWELLINGS IN EACH CLUSTER. A sample of the same number of dwellings is needed in each cluster. Additional dwellings should be selected to act as reserves in case a dwelling or household has to be replaced, as explained in Chapter 4. This task can be completed in about one full-time month in the central offices. It can be done either after the dwelling listing is completed for all CUs or as a parallel activity.

83. The 1995 Tunisia survey is an exception. The household listings were available for the whole country in a computer file prepared from the March 1994 census, which was both recent and of very good quality.

Staffing and Training

SELECT AND TRAIN FIELD TEST STAFF. In addition to the core staff and consultants, a few interviewers should participate in the field test.
They should be selected on the basis of their experience, because they cannot be formally trained (there will be no manuals at that point of questionnaire development). Depending on their performance during the field test, they can be considered for later promotion to survey supervisors.

PREPARE SUPERVISION PROCEDURES AND SUPERVISOR AND INTERVIEWER MANUALS. Preparing the supervisor and interviewer manuals and the supervision forms is the field manager's most important work during the two-month period between the field test and interviewer training.

SELECT INTERVIEWERS. More interviewers than needed should be selected for training. This allows for a final choice after their performance is assessed during the practical part of the training. The selection process should be initiated three to six weeks before the interviewer training is scheduled. Selection may take longer if the interviewers are to be hired externally rather than from the ranks of the statistical agency. It may also take longer, and even involve decentralized personnel searches, if interviewers with specific geographic, ethnic, or linguistic backgrounds are sought.

TRAIN INTERVIEWERS. Training should be held over four weeks, as explained in Chapter 5. It should include practical sessions of interviewing real households interspersed with classroom work on theory. This means that households must be selected for field training that were not part of the field test and will not be part of the final sample. Transportation must also be arranged. Anthropometric training should be treated as a separate module. A large supply of babies and children is needed for this aspect of training. A nursery school willing to make its children available should therefore be found near the training headquarters.

Data Management

PREPARE FIRST VERSION OF THE DATA ENTRY PROGRAM.
Development of a first version of the data entry program should start as soon as possible. In addition to its obvious importance, it is also the first and most important training activity for the data manager. However, even a rough first version of the program cannot be initiated before a relatively developed questionnaire is available. Usually, this happens shortly before the field test.

Two to four weeks should be allowed to develop the first version of the data entry program. Much of this time will be absorbed in on-the-job training in conceptual issues related to data management for LSMS surveys. Thus the required time will be determined not primarily by the programming skill of the data manager but by the manager's experience with such complex surveys. As part of the training, the questionnaire's organization into sections and modules will be translated into a set of data entry screens, and as many of these screens as possible will be designed. Fields and ranges for all variables and the corresponding intra-record checks must be defined for each screen.

FINAL VERSION AND DEBUGGING OF THE DATA ENTRY PROGRAM. While the questionnaire is field tested and finalized, the data manager will complete the design and intra-record check definitions for the rest of the screens. It is generally better to postpone the definition and programming of inter-record checks until all or most of the individual screens have reached their final form. The survey data manager is the main person responsible for testing and debugging the program thoroughly. However, the first real test of the program comes during the training of the interviewers and data entry operators, when actual questionnaires will be completed and entered. Generally, still more refining and debugging is necessary after the first month of field operations.

DATA ENTRY MANUAL. The data entry manual can be written in about two weeks.

COMPUTER INSTALLATION AND DATA ENTRY OPERATOR TRAINING.
The data entry computers should initially be installed in one large room, either at the statistical agency's central offices or in some other room made available for training the data entry operators. It is better to start thinking about the logistical implications well in advance, because finding adequate premises may be harder than it seems. A frequent problem is ensuring a safe and adequate power supply and plugs for all the machines. The operators are trained at the same time as the interviewers, in theoretical and practical sessions. This means the operators enter the data from the questionnaires that the interviewers actually completed during the practical part of their training. When training is completed, the computers must be moved to the teams' base stations throughout the country. It is technically possible for each operator to carry and install the team's computer personally (this must be part of the training anyway). In some countries, however, this may void the supplier's guarantee, and different arrangements are then needed.

DATA MANAGEMENT PROCEDURES. The data manager has three tasks. The first is to consolidate all the information arriving from the field.⁸⁴ The second is to ensure its completeness, that is, to establish that each data entry operator is sending data for all the households the team was supposed to visit. The third is to prepare the unit record data sets to be released to the survey analysts. Consolidation is usually done on a monthly basis. It is better to postpone the data manager's training on data set formation until after the field test. These operations are very different from the previous task of preparing and debugging the data entry program, and they call for different skills (the former requires imagination, whereas the latter requires discipline). It is also better to test the central data management procedures with the actual survey data that will be available after the field test.

Field Work

PILOT TEST OF FIELD PROCEDURES.
As explained in Chapter 5, a separate pilot test of field procedures is rarely used in LSMS surveys. Instead, the field experience is reviewed after the first month of field work. The review itself will take one to two weeks. A formal pilot test may be planned before the beginning of field work. In that case, it should take place after the questionnaire has been field tested and the data entry program completed, but before the field staff are trained. This may add as much as two months to the preparation period, which could be inserted at about month 10 in Figure 8.1.

FIELDING THE SURVEY. The survey should be fielded as soon as possible after the interviewers and data entry operators are trained. However, at least one week must usually be allowed for the data entry computers to be moved to and installed at the field teams' base stations. The survey will be in the field for one year.

Data Analysis and Documentation

PRELIMINARY STATISTICAL ABSTRACT. The first plan of tabulations can be made after the survey has been in the field for about four months. The list of tables can be circulated among the users for comment. Then, as soon as the first six months' data are available, work can begin on the first statistical abstract. It usually takes about two weeks to prepare the data for analysis and four to six weeks to prepare the abstract. The abstract should be disseminated widely. A month or so after it has been made available, a seminar should be held. This will give the survey further publicity. More importantly, the seminar can be used to critique the preliminary abstract so that the final abstract reflects the users' interests.

84. Exactly how this is done depends on the software used for data entry. In most of the LSMS surveys conducted so far, each household is kept as a separate file, and the information is transferred from the field stations to the central office in lots made by month or by CU.

STATISTICAL ABSTRACT. A polished version of the statistical abstract should be assembled from the full year's data. This can usually be prepared within about three months of the end of field work. The final data will take a week or so to come in from the field. Constructing the complete data sets will then take approximately two weeks. The analysis itself will take about four to six weeks. Many of the analytic programs used in the preliminary abstract will require only minor modifications.

DATA SET DOCUMENTATION AND DISSEMINATION. The basic survey documentation should be prepared concurrently with the abstract. This normally takes about two to four weeks. Mechanisms to permanently support the dissemination of the basic documentation and data sets should be put in place.

FURTHER ANALYSIS. More in-depth analysis will continue over months and years. Some may be financed specifically under the project that paid for data collection; more may be sponsored from other sources.

Annex I. Description of Questionnaires from Viet Nam LSMS Survey

Questionnaires

Household Questionnaire

The household questionnaire contains modules (sections) to collect data on household demographic structure, education, health, employment, migration, housing conditions, fertility, agricultural activities, household non-agricultural businesses, food expenditures, non-food expenditures, remittances and other income sources, savings and loans, and anthropometric (height and weight) measures. For some sections, the individual designated by the household members as the household head provided responses. These sections are survey information, housing, respondents for second round, remittances and other incomes, and credit and savings. For other sections, a member identified as most knowledgeable provided responses. These sections are agro-pastoral activities, non-farm self-employment, food expenditures, and non-food expenditures. Identification codes for respondents of different sections indicate who provided the information.
In sections where the information collected pertains to individuals (education, health, employment, migration, fertility), each member of the household was asked to respond for himself or herself, except that parents were allowed to respond for younger children. In the employment and fertility sections, it is possible that the information was not provided by the relevant person; variables in these sections indicate when this is the case.

The household questionnaire was completed in two interviews two weeks apart. Sections 0-8 were administered in the first interview, sections 9-14 in the second interview, and section 15 in both interviews. The survey was designed so that sensitive issues such as savings were discussed near the end. The content of each module is briefly described below.

I. FIRST INTERVIEW

Section 0 SURVEY INFORMATION
0A HOUSEHOLD HEAD AND RESPONDENT INFORMATION
0B SUMMARY OF SURVEY RESULTS
0C OBSERVATIONS AND COMMENTS

The date of the interview, the religion and ethnic group of the household head, the language used by the respondent, and other technical information related to the interview are noted. Section 0B summarizes the results of the survey visits, i.e., whether a section was completed on the first visit or the second visit. Section 0C, which is not entered into the computer, contains remarks of the interviewer and the supervisor. Because the data in Section 0C are retained only on the questionnaires, researchers cannot gain access to them without checking the original questionnaires in Hanoi.

Section 1 HOUSEHOLD MEMBERSHIP
1A HOUSEHOLD ROSTER
1B INFORMATION ON PARENTS OF HOUSEHOLD MEMBERS
1C CHILDREN RESIDING ELSEWHERE

The roster in Section 1A lists the age, sex, marital status, and relation to household head of all people who spent the previous night in that household, and of household members who are temporarily away from home. The household head is listed first and receives the personal ID code of 1.
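Every roster member carries a personal ID code, and sections record who answered. Comparing the two lets an analyst separate self-reports from proxy reports, such as a parent answering for a young child. The sketch below is a minimal illustration of that comparison. It assumes each individual-level record holds the subject's personal ID and the respondent's personal ID. The class, field, and function names are hypothetical; the actual variable names in the Viet Nam data sets are not given here.

```python
from dataclasses import dataclass


@dataclass
class IndividualRecord:
    household_id: int
    person_id: int      # subject's personal ID code from the Section 1A roster (head = 1)
    respondent_id: int  # personal ID code of the member who actually answered


def is_proxy(record):
    """True when another household member answered on the subject's behalf."""
    return record.respondent_id != record.person_id


def proxy_share(records):
    """Share of a section's individual records that were answered by proxy."""
    if not records:
        return 0.0
    return sum(is_proxy(r) for r in records) / len(records)


def respondent_relation(record, roster):
    """Relation to the head of whoever answered, from a roster keyed by
    (household_id, person_id); None if that ID is not on the roster."""
    return roster.get((record.household_id, record.respondent_id))


if __name__ == "__main__":
    roster = {(101, 1): "head", (101, 2): "spouse", (101, 3): "child"}
    employment = [
        IndividualRecord(101, 1, 1),  # head answered for self
        IndividualRecord(101, 2, 2),  # spouse answered for self
        IndividualRecord(101, 3, 2),  # spouse answered for the child
    ]
    print(proxy_share(employment))  # 1 of 3 records is a proxy response
    print(respondent_relation(employment[2], roster))  # spouse
```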
Household members were defined to include "all the people who normally live and eat their meals together in this dwelling." Those who were absent more than nine of the last twelve months were excluded, except for the head of the household and infants less than three months old. A lunar calendar is provided in the questionnaire to help respondents recall the year and month they were born. For individuals who are married and whose spouse resides in the household, the personal ID number of the spouse is noted. This way, information on the spouse can be collected by appropriately merging information from the roster and other parts of the survey.

Section 1B collects information on the parents of all household members. For individuals whose parents reside in the household, the parents' personal ID numbers are noted, and information on them can be obtained by appropriately merging information from other parts of the survey. For individuals whose parents do not reside in the household, information is recorded on whether each parent is alive, as well as on their schooling and occupation.

In Section 1C, information is collected for children of household members who live elsewhere. This information is only collected for children below thirty years of age. Children who have died are not included. All living children are listed along with the personal ID numbers of their father and mother (if the parents reside in the household). Then information on the age, schooling, and current place of residence of each such child is recorded.

Section 2 SCHOOLING

In Section 2, data were collected on self-reported literacy and numeracy, school attendance, completion, and current enrollment for all household members of creche or pre-school age and older. The interpretation of creche or pre-school age appears to have varied. As a result, education information is available for some children of pre-school age, but not all pre-school children were included in this section.
For ages six and above, however, information is available for nearly all individuals, so in essence the data on schooling can be said to apply to all persons ages six and above. For those who were enrolled in school at the time of the survey, information on school attendance, distance, travel time, expenses, and scholarships was also collected.

Section 3 HEALTH

Data on any illness or injury experienced in the 4 weeks preceding the date of interview were obtained for all household members in this section. For those who reported being ill in the past 4 weeks, information was obtained on the duration and type of illness, type of care sought, distance to health provider, travel time, and cost of medication and consultation. All individuals, whether or not ill in the past 4 weeks, were asked if they had been ill in the year before the survey, and if so, the total amount they had spent on health care in the previous year. At the request of the World Health Organization, several questions on smoking were asked of all individuals 6 years of age and older.

Section 4 EMPLOYMENT
4A TYPE OF WORK AND JOB SEARCH
4B MAIN JOB DURING THE PAST SEVEN DAYS
4C SECONDARY JOB DURING THE PAST SEVEN DAYS
4D SEARCH FOR ADDITIONAL EMPLOYMENT
4E MAIN JOB DURING THE PAST TWELVE MONTHS
4F EMPLOYMENT HISTORY
4G SECONDARY JOB DURING THE PAST TWELVE MONTHS
4H OTHER ACTIVITIES

All individuals age six and older were asked to respond to the economic activity questions in Section 4, beginning with questions on the nature of their work in the last seven days. For persons who did not work in the last seven days, data were collected on job search and on reasons for not seeking employment. For work in the last seven days, information was collected on hours, length of employment, type of employer, taxes, distance and travel time to place of work, money and in-kind compensation, and benefits. Similar questions were asked on the secondary job in the last seven days.
Questions were asked on search for additional employment, including the kind of work sought and the lowest acceptable wage. The main work in the last twelve months may have been different from the main or secondary job in the last seven days. If so, the complete set of questions was answered for that work as well. Type of work and years of experience were collected for any work prior to the main job in the last twelve months. Again, if there was a secondary job in the last twelve months different from the other jobs, data on work conditions and compensation were collected. Days and hours spent doing household chores were collected for each household member age seven and older. Occupation and sector of employment codes are not included in the household questionnaire; they appear in the supervisor's manual.

Section 5 MIGRATION

All household members age 15 or older responded to the questions on migration in Section 5. Respondents not born at their current place of residence were asked whether the place of birth was a village, town, city, or other. The age at which such individuals left their place of birth was recorded, as well as the main reason for leaving. In addition, individuals were asked three further questions: the main reason for coming to the current place of residence, the region from which they had come, and whether the previous place was a village, town, or city. Finally, respondents were asked in how many places they had lived for periods of more than three months in their life.

Section 6 HOUSING
6A TYPE OF DWELLING
6B HOUSING EXPENSES
6C HOUSING CHARACTERISTICS

Section 6 contains information on the type of dwelling, housing expenses, and housing characteristics for all households interviewed. Information was collected on the number of rooms in the dwelling, ownership status, wall material, roof material, water source, toilet type, utilities expenses, and square meters of living area.
Respondents for all 4,800 households were asked for the resale value of the dwelling, regardless of whether it was owned or rented. This section also contains information on the type of cooking fuel used, the time and distance involved in collecting wood, and whether it is the primary cooking fuel used by the household.

Section 7 RESPONDENTS CHOSEN FOR ROUND TWO (the second interview)

In Section 7, the principal respondent for Round One was asked to identify three household members:
1) the member who knows the most about all the agricultural and livestock activities of the household;
2) the member who shops for food; and
3) the member who knows the most about the other household expenses, income, and savings of household members.
The respondent was also asked to identify the three most important businesses and trades belonging to the household, and the household members who know the most about them. Finally, a woman was selected at random from among the women in the household between the ages of 15 and 49 to respond to the fertility module.

In principle, those identified in this section for interviewing in later sections should be the ones who are actually interviewed in those sections. This is true for many households. In some cases, however, the respondents for the agriculture, food expense, and non-food expense sections differ from those identified in this section. This can happen if the person identified was not present at the time the section was completed (i.e., during the second visit to the household).

Section 8 FERTILITY
8A FERTILITY HISTORY
8B FAMILY PLANNING

In each household, one woman 15-49 years old, randomly selected in Section 7, responded to the questions in Section 8. If a household contained no woman in this age range, Section 8 was not completed. The woman was asked if she had ever been pregnant and, if so, whether she had ever given birth.
Women who respond that they have are asked the birth date and sex of all children they have given birth to, includingchildren who did not survive. If the child is not alive the woman is asked how long it survived. The womanis asked about the birth and breastfeedingof her last child, the age at which she was married, and the number of miscarriages she has had. Section 8B gathers informationon knowledge,use, source, and cost of six modem and six traditional methods of family planning. In using data from this section it should be kept in mind that unlike the Demographicand Health Surveys and the World Fertility Surveys, interviewerswere not necessarilywomen. 11. SECOND INTERVIEW Section 9 AGRO-PASTORAL ACTIVITIES 9A1 AGRICULTURAL LAND 9A2 FORESTLAND 9A3 SELLING BUYINGLAND OR 9A4 VACANT LOT, BALDHILL, LANDCLEARING RECLAMATION 9A5 AGRICULTURAL TAXES 9B1 PADDY 9B2 OTHERFOODCROPS 9B3 ANNUAL INDUSTRIAL CROPS 9B4 PERENNIAL INDUSTRIAL CROPS 9B5 FRUITCROPS 9B6 FORESTTREES 9C CROPBYPRODUCTS 9D FARMINPUTS 9E TRANSFORMATION HOMEGROWN OF CROPS 9F LIVESTOCK 9G OTHERANIMAL PRODUCTS WATERPRODUCTS 9H RAISING/PLANTING 91 EXTENSION CONTACTS FOR LIVESTOCK 9J LIVESTOCK EXPENDITURES 9K HANDTOOLS 9L FARMING EQUIPMENT In Section 9 the respondent was the household member identified in Section 7 as the one most knowledgeableabout the household'sagricultural and pastoral activities. Most questionsrefer to the past twelve months. This section is by far the largest sectionof the householdquestionnaire,with many subsections that contain informationon differentaspects of agriculturalproductionand related livestockactivities - collectivelyreferred to as agro-pastoralactivities. Sections9A1 to 9A5 collectinformationon household's controlover land of different tenures. These include land allocated by the commune, auctioned land, privately held land, rented/sharecroppedland, and swidden land. 
In each case, data are obtained on total land area, irrigated area, and payments for the use of the land. For annual crop land, information is also obtained on land quality. Similar information is obtained on cultivated water surface, forest land controlled, land reclaimed from bald hills, newly ploughed land, and roadside/riverside land. These sections also obtain data on purchases and sales of land and on land taxes paid by the household.

Sections 9B1 to 9B6 contain detailed output information for all crops grown by the household. This information is obtained separately for each crop and includes (in most cases) the quantity produced, the value of output, the quantity sold in the market and given to the cooperative, the quantity kept for seed, the quantity fed to livestock, and the quantity given as gifts. For paddy, information is obtained separately for the summer, winter, and autumn crops. Note that although data are obtained for each crop a household cultivates, the land tenure (and land size) information cannot be linked to the output information to determine the tenure structure of the land on which a given crop is grown, unless the household cultivates only one crop.

Section 9C contains information on crop byproducts. Section 9D obtains detailed information on seeds, manure, fertilizer, insecticides, and transportation for all crops cultivated by a household. This information is also crop-specific and can, in principle, be linked with the output information in the earlier sections by matching the datasets on household codes and crop codes. Information on other inputs, such as hired labor and packing and storage costs, is obtained at an aggregate level for each household. Other crop-specific information obtained in this section consists of data on home consumption and on the use of agricultural extension services.
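Because the Section 9D input records and the Section 9B output records both carry household and crop codes, linking them amounts to a join on that pair of keys. The Python sketch below illustrates the idea under stated assumptions: the field names (hhid, crop, qty_produced, value_output, fertilizer_cost) and the values are hypothetical, not the actual variable names or data of the Viet Nam survey files, and several input records for the same household and crop are assumed to be additive.

```python
from collections import defaultdict

# Hypothetical extracts of the crop output (Sections 9B1-9B6) and crop input
# (Section 9D) files. Field names and values are illustrative only.
output_rows = [
    {"hhid": 101, "crop": 1, "qty_produced": 1200.0, "value_output": 960.0},
    {"hhid": 101, "crop": 7, "qty_produced": 300.0, "value_output": 450.0},
    {"hhid": 102, "crop": 1, "qty_produced": 800.0, "value_output": 640.0},
]
input_rows = [
    {"hhid": 101, "crop": 1, "fertilizer_cost": 120.0},
    {"hhid": 101, "crop": 1, "fertilizer_cost": 30.0},  # second record, same household and crop
    {"hhid": 102, "crop": 1, "fertilizer_cost": 90.0},
]


def link_inputs_to_output(outputs, inputs, cost_field):
    """Left-join crop input costs onto crop output records.

    The join key is the (household code, crop code) pair. Input records that
    share a key are summed. Output records with no matching input record get
    None, so a missing report is not confused with a zero cost.
    """
    totals = defaultdict(float)
    for row in inputs:
        totals[(row["hhid"], row["crop"])] += row[cost_field]

    linked = []
    for row in outputs:
        merged = dict(row)  # copy so the original records are left unchanged
        # dict.get does not trigger the defaultdict factory, so unmatched keys yield None
        merged[cost_field] = totals.get((row["hhid"], row["crop"]))
        linked.append(merged)
    return linked


if __name__ == "__main__":
    for rec in link_inputs_to_output(output_rows, input_rows, "fertilizer_cost"):
        print(rec["hhid"], rec["crop"], rec["fertilizer_cost"])
    # 101 1 150.0
    # 101 7 None
    # 102 1 90.0
```

This kind of join only works for the crop-specific items. Inputs that are recorded at the household level (hired labor, packing and storage) cannot be attributed to individual crops this way, and the join does not overcome the separate limitation on linking land tenure to crop output.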
Section 9E contains information on home-grown crops that were transformed (processed) and subsequently sold. This includes data on output for sale, the codes of household members who participated in the production process, the number of sales, revenues from these sales, and costs of production. Section 9F collects information on livestock, poultry, and other animals that are either consumed by the household or generate income. These data include an inventory of the current numbers held; the numbers born, sold, consumed, given away, or lost; and the numbers bought by the household. Also included is information on the value of current stocks, revenue from sales, and purchase costs. Section 9G then collects information on animal products such as milk, eggs, silk, and manure; here the information is restricted to revenue from sales. Section 9H collects similar information for water animals (fish, shrimp, etc.). Section 9I collects information on extension services for livestock, and Section 9J contains information on livestock expenditures. Finally, Sections 9K and 9L collect data on implements and farm machinery owned by the household.

Section 10 NON-FARM SELF-EMPLOYMENT
10A WORKING CONDITIONS
10B EXPENDITURES
10C REVENUES
10D BUSINESS ASSETS

Section 10 gathers data on the three most important enterprises operated by the household. The respondent for each enterprise is the household member most familiar with its operation (as identified in Section 7). Data are gathered on the ownership, number of employees, and type of employee compensation for each enterprise. For each business, expenditures over the last twelve months on wages, raw materials, and taxes are collected. The respondent is asked how much, in money and goods, was received from sales, and how much of the enterprise's product was consumed by the household, since the first interview.
Information on ownership, sales, and purchases of assets (buildings, land, vehicles, tools, and other durable goods) in the last twelve months is also collected.

Section 11 FOOD EXPENSES AND HOME PRODUCTION
11A HOLIDAY EXPENSES
11B NORMAL EXPENSES

Section 11A records the amounts spent on holidays, primarily Tet (New Year), 15th January, 15th July, the Moon Festival, and Independence Day. The list of food items covered in Section 11A is shorter than that in Section 11B. The main reason for separating holiday expenses from normal expenses, a departure from the standard LSMS survey format, is that spending during the Tet holiday in Viet Nam often departs significantly from normal patterns, in particular through unusually high expenditures.

Section 11B collects detailed information on market purchases and consumption from home production for forty-five food items. Information is obtained on expenses since the interviewer's first visit. For a longer recall period (twelve months), data are obtained on the months (of the preceding twelve) in which each food item was purchased, the number of times purchases were made during those months, the quantity purchased each time, and the value per purchase. These four pieces of information can be combined to obtain total food expenditure in the twelve months before the interview. In effect, this is a variable-recall procedure, because the time frame for which a respondent provides purchase information can differ between food items as well as across respondents. Besides market purchases (including barter), information is also collected on consumption from home production. Again, data are obtained on the months in which each item was consumed, but unlike market purchases, the quantity and value of consumption apply to the whole year.
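The twelve-month recall answers in Section 11B combine multiplicatively: for each item, annual value = (months with purchases) x (purchases in each of those months) x (value per purchase), and the same product with the quantity per purchase gives the annual quantity. The Python sketch below assumes the frequency question is answered as the usual number of purchases per month in the months of purchase, which is an assumption about the questionnaire wording; the class and field names are hypothetical. It covers market purchases only, since home-production quantities and values already refer to the whole year and need no multiplication.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FoodPurchaseRecall:
    """Twelve-month recall answers for one food item in one household.

    Assumption: times_per_month is the usual number of purchases in each month
    in which the item was bought. Field names are hypothetical.
    """

    months_purchased: int        # months (0-12) in which the item was bought
    times_per_month: float       # usual number of purchases in each of those months
    quantity_per_purchase: float
    value_per_purchase: float

    def __post_init__(self) -> None:
        # The recall window is the preceding twelve months.
        if not 0 <= self.months_purchased <= 12:
            raise ValueError("months_purchased must be between 0 and 12")

    def annual_quantity(self) -> float:
        return self.months_purchased * self.times_per_month * self.quantity_per_purchase

    def annual_value(self) -> float:
        return self.months_purchased * self.times_per_month * self.value_per_purchase


def annual_food_expenditure(items: list[FoodPurchaseRecall]) -> float:
    """Total twelve-month market purchases over all food items in one household."""
    return sum(item.annual_value() for item in items)


if __name__ == "__main__":
    rice = FoodPurchaseRecall(months_purchased=12, times_per_month=4,
                              quantity_per_purchase=5.0, value_per_purchase=10.0)
    fish = FoodPurchaseRecall(months_purchased=6, times_per_month=2,
                              quantity_per_purchase=1.5, value_per_purchase=8.0)
    print(rice.annual_value(), fish.annual_value())  # 480.0 96.0
    print(annual_food_expenditure([rice, fish]))     # 576.0
```

Because months_purchased differs across items and households, each item's total is built from a different set of months, which is the variable-recall feature described in the text.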
Section 12 NON-FOOD EXPENDITURES & INVENTORY OF DURABLE GOODS
12A DAILY EXPENSES
12B ANNUAL EXPENSES
12C INVENTORY OF DURABLE GOODS
12D EXPENSES FOR REMITTANCES

Section 12 collects information on non-food household expenditures from the household member identified in Section 7 as the one most able to answer non-food expenditure questions. In Section 12A, respondents were asked to recall the amount spent since the first interview (approximately two weeks earlier) on daily expenses such as lottery tickets, cigarettes, soap, personal care products, cooking fuel, matches and candles, and gasoline. In Section 12B, expenditure data for both the last two weeks and the last twelve months were collected for shoes, cloth, clothing, home repairs, public transport, paper supplies, kitchen equipment, medical services, domestic servants, jewelry, entertainment, and other goods (see the household questionnaire). The purchase price, year of purchase, and resale value of durable goods owned were collected in Section 12C. The relation and location of the recipients of remittances sent out by the household are recorded in Section 12D (remittances received by the household are recorded in Section 13A).

Section 13 OTHER INCOME
13A INCOME FROM REMITTANCES
13B MISCELLANEOUS INCOME

Section 13 collects data on money and goods that come into the household as remittances or from other sources unrelated to employment, such as employee welfare funds, dowries, sales of consumer durables, and rental of buildings.

Section 14 CREDIT AND SAVING
14A MONEY AND GOODS LENT AND BORROWED
14B LOANS CONTRACTED
14C SAVINGS

Section 14 collects information on the indebtedness of household members to people or institutions outside the household.
If money or goods have been borrowed, or borrowed and repaid, by any household member in the last twelve months, information is collected on those loans, including the source and amount of the loan, interest, side payments, collateral, repayment schedule, reason for borrowing, and number of loans from the same source. The household is asked where its savings, if any, are held: in a bank, a housing savings bank, a rural savings bank, a foreign currency account, other bank accounts, bonds, stocks, or at home. The respondent is also asked the total value of all savings accounts.

Section 15 ANTHROPOMETRICS

Anthropometric measurements were completed for each household member. Data were collected on the household member's age, gender, date of measurement, weight, height, and arm circumference. It was also noted whether female respondents were pregnant or breastfeeding. If a person was not measured, the reason was noted.

Community Questionnaire

A community questionnaire was administered by the team supervisor and completed with the help of village chiefs, teachers, government officials, and health care workers. The questionnaire was administered only in rural areas, i.e., commune numbers 1 to 120. Section 1 (DEMOGRAPHIC INFORMATION) covers the population of the community, a list of principal ethnic groups and religions, the length of time the community has existed, and whether or not it has grown. Section 2 (ECONOMY AND INFRASTRUCTURE) asks for a list of principal economic activities and about access to a motorable road, electricity, pipe-borne water, a restaurant or food stall, a post office, a bank, a daily market, and public transport. There are also questions on employment, migration for jobs, and the existence of community development projects. Section 3 (EDUCATION) asks the distance to primary and middle schools.
For up to three primary schools, the nearest middle school, and the nearest secondary school, information is obtained on whether the school is public or private, whether it is for boys, girls, or both, how many classes there are, and when it was built. Enrollment rates and reasons why children do not attend school are also collected. Section 4 (HEALTH) collects data on distance and travel time to the nearest of each of several types of health workers (doctor, nurse, pharmacist, midwife, family planning worker, community health worker, traditional birth attendant, and traditional healer) and of each of several types of health facilities (hospital, dispensary, pharmacy, maternity home, health post, and family planning clinic). The questions in Section 5 (AGRICULTURE) cover the types of crops grown in the community, how often and when they are planted and harvested, and how the harvest is generally sold. This section also includes questions on the availability of an extension center, agricultural cooperatives, and machinery, and on the use of pesticides and irrigation. Qualitative data on last year's rainfall, the local land market, the prevalence of sharecropping, and agricultural and non-agricultural wages in the community are also gathered.

Price Questionnaire

In rural areas (commune numbers 1 to 120), price data were collected by the team supervisor from local markets for thirty-six food items, thirty-one nonfood items, nine medicines, seven insecticides/fertilizers, and five types of services. Three separate observations were made, and these did not necessarily involve actual purchases. In some communes fewer than three observations were made, either because there were not three distinct markets or for some other reason. A separate set of prices is available for urban areas (commune numbers 121 to 150).
These were collected by the General Statistical Office as part of a separate effort to construct a spatial price index, and their values appear to be comparable to those of the rural prices.

Annex II. Annotated List of Selected References

THE SOCIAL DIMENSIONS OF ADJUSTMENT SURVEY PROGRAM

Delaine, Ghislaine, and others. 1992. The Social Dimensions of Adjustment Integrated Survey: A Survey to Measure Poverty and Understand the Effects of Policy Change on Households. Social Dimensions of Adjustment Working Paper No. 14. World Bank, Washington, D.C.
The SDA Integrated Survey is quite similar to the LSMS surveys; indeed, it is an outgrowth of them. Unlike this manual, the SDA manual gives particular attention to explaining the objectives of the survey, the content of the prototype questionnaire, and its analysis. It puts more emphasis on some of the theoretical issues in sampling and data management. However, it was written at the beginning of the SDA survey program, before a body of practical experience had been amassed.

Marchant, Timothy, and Christiaan Grootaert. 1991. The Social Dimensions of Adjustment Priority Survey. World Bank, Washington, D.C.
The SDA Priority Survey is designed as a lighter survey that uses a much shorter questionnaire and a larger sample to gather information that is less detailed but covers many of the same topics as the LSMS or the SDA Integrated Survey. The contents of this manual are analogous to those of the manual on the Integrated Survey.

INSTITUTIONAL ISSUES IN PROJECT DESIGN

Grosh, Margaret E. 1991. The Household Survey as a Tool for Policy Change: Lessons from the Jamaica Survey of Living Conditions. Living Standards Measurement Study Working Paper No. 80. World Bank, Washington, D.C.
Using the Jamaican survey as a case study, Grosh discusses seven strategic choices in designing a survey project. These include, among others, how much institutional capacity to build and how to build it, how to involve users in steering the survey, and how much emphasis to put on speed versus quality in data collection.
QUESTIONNAIRE DESIGN

In addition to the existing materials referenced below, the reader should keep abreast of the results of a major research initiative launched in 1995. It is undertaking a complete review and critique of the content of the LSMS questionnaires and will recommend changes that should be adopted.

General Overview

Grootaert, Christiaan. 1986. Measuring and Analyzing the Level of Living in Developing Countries: An Annotated Questionnaire. Living Standards Measurement Study Working Paper No. 24. World Bank, Washington, D.C.
This document frequently serves as the description of the LSMS "prototype" questionnaire. Some improvements and many country-specific variations have been made since, but only one other well-annotated questionnaire is available, so this remains a classic reference.

Ainsworth, Martha, Godlike Koda, George Lwihula, Phare Mujinja, Mead Over, and Innocent Semali. 1992. Measuring the Impact of Fatal Adult Illness in Sub-Saharan Africa: An Annotated Household Questionnaire. Living Standards Measurement Study Working Paper No. 90. World Bank, Washington, D.C.
The Tanzania survey is one of the most specialized and most ambitious of the LSMS surveys. This questionnaire is interesting not only because it is well documented, but also because it goes further than most in trying to address time use, intrahousehold allocation, household dynamics, and behavior related to illness and death. Some parts of it may be too detailed to be of interest in more general-purpose surveys.

Ainsworth, Martha, and Jacques van der Gaag. 1988. Guidelines for Adapting the LSMS Living Standards Questionnaires to Local Conditions. Living Standards Measurement Study Working Paper No. 34. World Bank, Washington, D.C.
This paper discusses how to think about changing the sections, emphasis, and wording of the questionnaires used in Côte d'Ivoire and Peru to make them applicable to a new country.
The document is a good beginning, though the changes it suggests may not go deep enough. Also, since it was written early in the history of LSMS surveys, it incorporates few real-life examples.

United Nations National Household Survey Capability Programme (UNNHSCP). 1985. Development and Design of Survey Questionnaires. United Nations Department of Technical Cooperation for Development and Statistical Office, New York.
This is a basic primer on issues of measurement, question formulation, and questionnaire formatting. Because the manual tries to address all kinds of surveys, its treatment is confined to a rather general level.

Experiences with Specific Modules

Grosh, Margaret E., and Henri-Pierre Jeancard. 1995. "The Sensitivity of Consumption Aggregates to Data Collection Methods: Some Preliminary Evidence from the Jamaican and Ghanaian LSMS Surveys." Poverty and Human Resources Division, Policy Research Department, World Bank, Washington, D.C.
This paper addresses the sensitivity of consumption estimates to three variations in the consumption module: the length of the recall period used, the omission of some subcomponents, and the use of an alternative point-of-purchase orientation for the questions.

Jolliffe, Dean. 1995. "Review of the LSMS Agricultural Activities Survey Module." Poverty and Human Resources Division, Policy Research Department, World Bank, Washington, D.C.
This paper reviews the experience with the Ghana and Viet Nam LSMS agricultural modules. Minor changes to the module are suggested when its purpose is to measure net farm income. Much larger reformulations are suggested when its purpose is to understand farm behavior.

Scott, Christopher, and Ben Amenuvegbe. 1990. Effect of Recall Duration on Reporting of Household Expenditures: An Experimental Study in Ghana. Social Dimensions of Adjustment in Sub-Saharan Africa Working Paper No. 6. World Bank, Washington, D.C.
Analysis of a special experiment shows that for frequently purchased items, recall erodes rapidly over short time periods.

Vijverberg, Wim. 1991. Measuring Income from Family Enterprises with Household Surveys. Living Standards Measurement Study Working Paper No. 84. World Bank, Washington, D.C.
After reviewing the Côte d'Ivoire, Peru 1985, and Ghana LSMS data sets, Vijverberg shows that the estimates of enterprise income resulting from different approaches to calculation (profits, net revenues, earnings) are not consistent. He proposes some modifications to the module.

World Bank. 1993. Indonesia: Public Expenditures, Prices and the Poor. Report No. 11293-IND. Indonesia Resident Mission, Country Department III, East Asia and Pacific Region, Washington, D.C.
Analysis of a data collection experiment in the Indonesian SUSENAS survey suggests that very short consumption modules can give results very similar to those from much longer and more costly modules. The data chapter also discusses the core-and-rotating-module design of the survey and the choices made in reforming it.

SAMPLING

Cochran, William G. 1977. Sampling Techniques. 3rd ed. John Wiley and Sons, New York.

Kish, Leslie. 1965. Survey Sampling. John Wiley and Sons, New York.

Azorin Poch, Ernesto. 1967. Curso de Muestreo y Aplicaciones. Aguilar S.A., Madrid.
The above references are among the classics of the field. Their drawback is that they were all written before the modern computer age. What is now feasible is different, so some of their recommendations are no longer well justified, and certain branches of the field are underdeveloped in them.

Verma, Vijay. 1991. Sampling Methods: Training Handbook. Statistical Institute for Asia and the Pacific, Tokyo.
This is an excellent introduction to sampling as it is actually practiced in household surveys, at a level that is deeper than was possible in this manual yet less academic than the classics.

Grosbras, Jean-Marie, and Jean-Claude Deville. 1987.
"Algorithmes de tirage." In Droesbeke, Jean-Jacques, et al., editors. Les Sondages. Economica, Paris.
Provides guidelines for developing algorithms to select samples from computerized files (with and without replacement, with PPS, etc.).

UNNHSCP. 1982. Non-Sampling Errors in Household Surveys (Assessment and Control). United Nations Department of Technical Cooperation for Development and Statistical Office, New York.
This document reviews the different sources of non-sampling errors and ways to control them, especially errors stemming from inadequate or incomplete sample frames.

UNNHSCP. 1986b. Sampling Frames and Sample Designs for Integrated Household Survey Programmes. United Nations Department of Technical Cooperation for Development and Statistical Office, New York.
This manual covers how to design and maintain sampling frames for integrated household survey programs. The reader may find particularly useful the discussion on listing and updating the sample frame, which we passed over lightly in this manual. Extensive treatment is given to the use of master samples.

Scott, Christopher. 1990. "Master Sample: Advantages and Drawbacks." Inter-stat, March 1990, No. 2, 33-42. Eurostat/ODA/INSEE. French version: 1989. "Échantillon-maître: avantages et inconvénients." STATECO, December 1989, No. 60, 91-105. INSEE.
Provides a balanced assessment of the pros and cons of master samples, based on the author's experience in nine Latin American and Asian countries.

Howes, Stephen, and Jean Lanjouw. 1994. "Making Poverty Comparisons Taking Into Account Survey Design: How and Why." First draft. World Bank and Yale University.
This paper demonstrates the importance of adjusting standard errors for sample design features such as stratification and clustering. Sensitivity analysis is done using data sets from the Pakistan and Ghana LSMS surveys.

Sampling in LSMS Surveys

Coulombe, Harold, and Lionel Demery. 1993. Household Size in Côte d'Ivoire: Sampling Bias in the CILSS.
Living Standards Measurement Study Working Paper No. 97. World Bank, Washington, D.C.
Household size as measured in the CILSS declined more from 1985 to 1988 than was plausible. The paper examines possible causes and concludes that changes in sampling procedures are the likely culprits.

Scott, Christopher, and Ben Amenuvegbe. 1989. Sample Designs for the Living Standards Surveys in Ghana and Mauritania / Plans de sondage pour les enquêtes sur le niveau de vie au Ghana et en Mauritanie. Living Standards Measurement Study Working Paper No. 49. World Bank, Washington, D.C.
After first-stage sampling with probability proportional to size, one can update the size measures and preserve self-weighting by reallocating workloads: instead of one workload in each primary sampling unit, one allocates none, one, two, or (rarely) three. The paper explains how this was applied to two LSMS surveys.

ANTHROPOMETRICS

UNNHSCP. 1986a. How to Weigh and Measure Children: Assessing the Nutritional Status of Young Children in Household Surveys. United Nations Department of Technical Cooperation for Development and Statistical Office, New York.
The standard reference on how to carry out anthropometric measurements.

Kostermans, Kees. 1994. Assessing the Quality of Anthropometric Data: Background and Illustrated Guidelines for Survey Managers. Living Standards Measurement Study Working Paper No. 101. World Bank, Washington, D.C.
The sensitivity of estimates of malnutrition to errors in the measurements is simulated using data from the Pakistan LSMS survey. Suggestions are made for the analyst on how to assess the quality of existing data sets, and for survey planners on how to carry out quality control through supervision and data management.

DATA ANALYSIS

Examples of Simple Descriptive Analysis

A number of abstracts published by the governmental statistical institutes are available from the LSMS Division.

Glewwe, Paul. 1987a. The Distribution of Welfare in Peru in 1985-86.
Living Standards Measurement Study Working Paper No. 42. World Bank, Washington, D.C. Also available in French and Spanish.
Something of an annotated abstract for the survey.

On Measuring Poverty

Ravallion, Martin. 1992. Poverty Comparisons: A Guide to Concepts and Methods. Living Standards Measurement Study Working Paper No. 88. World Bank, Washington, D.C.
A detailed primer on how to measure poverty and make comparisons between time periods or regions. Assumes some facility with mathematical notation, but little prior knowledge of the subject.

Howes, Stephen, and Jean Olson Lanjouw. 1994. "Making Poverty Comparisons Taking Into Account Survey Design: How and Why." First draft. World Bank and Yale University.
Most poverty analysis is based on the assumption that the household surveys used are simple random samples of the national population. This is often not true: most are two- or three-stage samples, many are not self-weighting, and stratification is common. This paper shows how to correct standard errors for common sample designs. It also provides empirical examples using the Pakistan and Ghana LSMS data. Correct standard errors for well-known poverty measures can be about one-third higher than uncorrected ones.

Howes, Stephen. 1994. "SAS Dominance Module." Draft software package.
Howes has made publicly available a set of SAS routines to conduct statistical tests on differences between common measures of poverty, welfare, and inequality. The routines can be run using either the PC or the mainframe version of SAS.

Glewwe, Paul, and Jacques van der Gaag. 1988. Confronting Poverty in Developing Countries: Definitions, Information and Policies. Living Standards Measurement Study Working Paper No. 48. World Bank, Washington, D.C.
Using data from the Côte d'Ivoire LSMS, illustrates the degree to which different measures of household welfare identify the same households as poor.

Kakwani, Nanak. 1990.
Poverty and Economic Growth: With Application to Côte d'Ivoire. Living Standards Measurement Study Working Paper No. 63. World Bank, Washington, D.C.
The paper explores the relationship between economic trends and poverty, and develops a methodology to measure separately the impact on poverty of changes in average income and of changes in income inequality. The methodology is applied to data from the 1985 Living Standards Survey in Côte d'Ivoire.

Ravallion, Martin, and Gaurav Datt. 1991. Growth and Redistribution Components of Changes in Poverty Measures: A Decomposition with Applications to Brazil and India in the 1980s. Living Standards Measurement Study Working Paper No. 83. World Bank, Washington, D.C.
The authors show how to decompose changes in poverty measures into growth and redistribution components. Analysis is provided for Brazil and India.

Ravallion, Martin. 1994. "How Well Can Methodology Substitute for Data? Five Experiments in Poverty Analysis." Policy Research Department, World Bank, Washington, D.C.
One of the experiments is an attempt to forecast poverty using aggregate statistics (such as agricultural wages and yields) in the absence of household survey data. The paper finds that one-year forecasts are reasonably accurate, but that sizable drift can occur after just a year or two.

Sophisticated Analysis

The majority of the LSMS Working Papers contain applications of modern econometric modeling to household survey data. Many themes and countries are covered. The reader is advised to look at the full list of papers on the inside covers of recent working papers. The abstracts of the first 59 papers are contained in a booklet compiled by Brenda Rosa, available from the same source as the LSMS Working Papers.

Deaton, Angus. 1994. "The Analysis of Household Surveys: Microeconometric Analysis for Development Policy." Book manuscript. Poverty and Human Resources Division, Policy Research Department, World Bank, Washington, D.C.
This book is designed as a reference text for the policy analyst new to the sophisticated analysis of household survey data. Although basic knowledge of statistics is assumed, the book goes to considerable effort to explain the intuition and policy relevance of the statistics and econometrics it teaches.

Demery, Lionel, Marco Ferroni, and Christiaan Grootaert. 1993. Understanding the Social Effects of Policy Reform. A World Bank Study. World Bank, Washington, D.C.
This is a compendium of thought on how to analyze the effects of policy reform (especially the kinds of policies that make up structural adjustment packages) on various social dimensions of welfare. Each chapter treats a separate issue: poverty, employment and earnings, migration, education, health, nutrition, fertility, women, and smallholder agriculture. The chapter authors are among the leading experts in applying household data to the topic at hand.

Annex III. LSMS Working Papers

No.  TITLE (AUTHOR)
1  Living Standards Surveys in Developing Countries (Chander/Grootaert/Pyatt)
2  Poverty and Living Standards in Asia: An Overview of the Main Results and Lessons of Selected Household Surveys (Visaria)
3  Measuring Levels of Living in Latin America: An Overview of Main Problems
4  Towards More Effective Measurement of Levels of Living, and Review of Work of the United Nations Statistical Office (UNSO) Related to Statistics of Level of Living (United Nations Statistical Office)
5  Conducting Surveys in Developing Countries: Practical Problems and Experience in Brazil, Malaysia, and The Philippines (Scott/de Andre/Chander)
6  Household Survey Experience in Africa (Booker/Singh/Savane)
7  Measurement of Welfare: Theory and Practical Guidelines (Deaton)
8  Employment Data for the Measurement of Living Standards (Mehran)
9  Income and Expenditure Surveys in Developing Countries: Sample Design and Execution (Wahab)
10  Reflections of the LSMS Group Meeting (Saunders/Grootaert)
11  Three Essays on a Sri Lanka Household Survey (Deaton)
12  The ECIEL Study of Household Income and Consumption in Urban Latin America: An Analytical History (Musgrove)
13  Nutrition and Health Status Indicators: Suggestions for Surveys of the Standard of Living in Developing Countries (Martorell)
14  Child Schooling and the Measurement of Living Standards (Birdsall)
15  Measuring Health as a Component of Living Standards (Ho)
16  Procedures for Collecting and Analyzing Mortality Data in LSMS (Sullivan/Cochrane/Kalsbeek)
17  The Labor Market and Social Accounting: A Framework of Data Presentation (Grootaert)
18  Time Use Data and the Living Standards Measurement Study (Acharya)
19  The Conceptual Basis of Measures of Household Welfare and Their Implied Survey Data Requirements (Grootaert)
20  Statistical Experimentation for Household Surveys: Two Case Studies of Hong Kong (Grootaert/Cheung/Fung/Tam)
21  The Collection of Price Data for the Measurement of Living Standards (Wood/Knight)
22  Household Expenditure Surveys: Some Methodological Issues (Grootaert/Cheung)
23  Collecting Panel Data in Developing Countries: Does It Make Sense? (Ashenfelter/Deaton/Solon)
24  Measuring and Analyzing Levels of Living in Developing Countries: An Annotated Questionnaire (Grootaert)
25  The Demand for Urban Housing in the Ivory Coast (Grootaert/Dubois)
26  The Côte d'Ivoire Living Standards Survey: Design and Implementation (English-French) (Ainsworth/Muñoz)
27  The Role of Employment and Earnings in Analyzing Levels of Living: A General Methodology with Applications to Malaysia and Thailand (Grootaert)
28  Analysis of Household Expenditures (Deaton/Case)
29  The Distribution of Welfare in Côte d'Ivoire in 1985 (English-French) (Glewwe)
30  Quality, Quantity, and Spatial Variation of Price: Estimating Price Elasticities from Cross-Sectional Data (Deaton)
31  Financing the Health Sector in Peru (Suarez-Berenguela)
32  Informal Sector, Labor Markets, and Returns to Education in Peru (Suarez-Berenguela)
33  Wage Determinants in Côte d'Ivoire (Van der Gaag/Vijverberg)
34  Guidelines for Adapting the LSMS Living Standards Questionnaires to Local Conditions (Ainsworth/Van der Gaag)
35  The Demand for Medical Care in Developing Countries: Quantity Rationing in Rural Côte d'Ivoire (Dor/Van der Gaag)
36  Labor Market Activity in Côte d'Ivoire and Peru (Newman)
37  Health Care Financing and the Demand for Medical Care (Gertler/Locay/Sanderson)
38  Wage Determinants and School Attainment among Men in Peru (Stelcner/Arriagada/Moock)
39  The Allocation of Goods within the Household: Adults, Children, and Gender (Deaton)
40  The Effects of Household and Community Characteristics on the Nutrition of Preschool Children: Evidence from Rural Côte d'Ivoire (Strauss)
41  Public-Private Sector Wage Differentials in Peru, 1985-86 (Stelcner/Van der Gaag/Vijverberg)
42  The Distribution of Welfare in Peru in 1985-86 (Glewwe)
43  Profits from Self-Employment: A Case Study of Côte d'Ivoire (Vijverberg)
44  The Living Standards Survey and Price Policy Reform: A Study of Cocoa and Coffee Production in Côte d'Ivoire (Deaton/Benjamin)
45  Measuring the Willingness to Pay for Social Services in Developing Countries (Gertler/Van der Gaag)
46  Nonagricultural Family Enterprises in Côte d'Ivoire: A Developing Analysis (Vijverberg)
47  The Poor during Adjustment: A Case Study of Côte d'Ivoire (Glewwe/de Tray)
48  Confronting Poverty in Developing Countries: Definitions, Information, and Policies (Glewwe/Van der Gaag)
49  Sample Designs for the Living Standards Surveys in Ghana and Mauritania (English-French) (Scott/Amenuvegbe)
50  Food Subsidies: A Case Study of Price Reform in Morocco (English-French) (Laraki)
51  Child Anthropometry in Côte d'Ivoire: Estimates from Two Surveys, 1985-86 (Strauss/Mehra)
52  Public-Private Sector Wage Comparisons and Moonlighting in Developing Countries: Evidence from Côte d'Ivoire and Peru (Van der Gaag/Stelcner/Vijverberg)
53  Socioeconomic Determinants of Fertility in Côte d'Ivoire (Ainsworth)
54  The Willingness to Pay for Education in Developing Countries: Evidence from Rural Peru (Gertler/Glewwe)
55  Rigidité des salaires: Données microéconomiques et macroéconomiques sur l'ajustement du marché du travail dans le secteur moderne (French only) (Levy/Newman)
56  The Poor in Latin America during Adjustment: A Case Study of Peru (Glewwe/de Tray)
57  The Substitutability of Public and Private Health Care for the Treatment of Children in Pakistan (Alderman/Gertler)
58  Identifying the Poor: Is 'Headship' a Useful Concept? (Rosenhouse)
59  Labor Market Performance as a Determinant of Migration (Vijverberg)
60  The Relative Effectiveness of Private and Public Schools: Evidence from Two Developing Countries (Jimenez/Cox)
61  Large Sample Distribution of Several Inequality Measures: With Application to Côte d'Ivoire (Kakwani)
62  Testing for Significance of Poverty Differences: With Application to Côte d'Ivoire (Kakwani)
63  Poverty and Economic Growth: With Application to Côte d'Ivoire (Kakwani)
6...