# CH1 - STAT1303A Data Management 1 Introduction to Data...

STAT1303A Data Management 1. Introduction to Data Management

## 1 Introduction to Data Management

Before we introduce the concept of data management, we cover some related concepts.

The first concept is statistics. Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. Simply speaking, statistics is based on numbers (data). For example,

- The total number of postgraduate students decreased from 1,654 in 2004/5 to 1,406 in 2005/6.
- The proportion of postgraduate students who are from the Faculty of Science is 35.5%.
- Based on a sample survey of 100 postgraduate students, the proportion of students who are from the Faculty of Science is 36%.

The second concept is data. Data is defined in terms of a variable, which is a characteristic that varies from one person or object to another. Data is then the information obtained by observing values of a variable. For example,

- Variables for a student: age, marks in STAT1303, and gender.
- Data for a student: for student A, we have his/her data of 20 years, 67.5, and male.

### 1.1 Data Management and Data Management System

#### 1.1.1 Data Management

Large-scale data collection and analysis is common today. For example, in academic research, huge amounts of genetic data are investigated in biomedical research, and survey data in social science research. In particular, the genetic data studied in biomedical research can help to develop new methods to treat diseases. In government research, enormous amounts of survey data are obtained from the census and the general household survey at regular intervals. Social policies, e.g. on transportation, can then be determined from the observed survey data, which indicate the demographic differences among the various districts of Hong Kong.
In business research, business opportunities can be discovered from the results of marketing research and data mining after huge amounts of data about potential customers are studied. Consequently, efficient data management can provide reliable and accurate information for decision making in various areas.

HKU STAT1303A (2009-10, Semester 1)

Indeed, data management is the systematic handling of information, ultimately stored in electronic form, to preserve the value of the contents for future investigation. Equivalently, data management is a collection of tasks and processes that achieve the goal of storing information in electronic form and preserving the value of its contents for future investigation.

#### 1.1.2 Data Management Tasks

A data management task (or process) is a set of activities or procedures directly related to accomplishing a specific data management objective. Typically, these include

1. Data collection
2. Coding of data
3. Data entry
4. Data editing and cleaning
5. Data backup and security
6. Documentation
7. Data summarization and presentation
8. Data manipulation and preparation related to developing a statistical analysis plan for the data

#### 1.1.3 Data Management System
