Chapter 3 Data Summarization

STAT1303 Data Management 3. Data Summarization

3 Data Summarization

In Chapter 2, we illustrated how to input data into SAS with the DATA step. The next step is to use SAS procedures (PROCs) to summarize the data, since the amount of data for analysis is typically huge. This chapter therefore first introduces some basic features of SAS procedures and then uses the procedures for data summarization to demonstrate their role in data analysis.

3.1 Introduction to SAS Procedures

The procedures (PROCs) in the Base SAS module are mainly used for data summarization and give descriptive statistics about the data. Descriptive statistics play a role in many data management tasks:

1. Data presentation - a few reports, tables, and graphs can give interested parties most of the information they want.
2. Data cleaning and validation - errors in data collection and data entry quite often result in observations 'different' from the other observations. By identifying unusual observations, one may spot these errors.
3. Data exploration - exploring the structure and relationships of the variables in the data, as well as the patterns of unusual observations, helps in understanding the data.
4. Data manipulation and preparation - summary statistics may be useful in data manipulation and preparation tasks. Subsequently, the data set can be used for further analysis.

3.1.1 SAS PROCs

SAS PROCs are pre-written programs. Thus, using a PROC is like filling out a form: we fill in the blanks of the PROC and choose from a list of options. Each PROC has its own unique form with its own list of options. All PROCs have required statements and most of them have optional statements. For example, the print procedure requires only PROC PRINT, although we can add many optional statements to PROC PRINT.
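
The form-filling analogy can be sketched with a small example. The data set SCORES, its variable names, and its values are made up for illustration and do not come from the course material. The first PROC PRINT uses only the required statement; because DATA= is omitted, SAS prints the most recently created data set. The second fills in some of the optional blanks.

```sas
/* Hypothetical data set for illustration only. */
DATA scores;
   INPUT name $ test1 test2;
   DATALINES;
Amy 78 85
Ben 64 70
Cat 91 88
;
RUN;

/* Required statement only: prints all variables of the most
   recently created data set (here, SCORES). */
PROC PRINT;
RUN;

/* Same procedure with optional blanks filled in:
   DATA= names the input data set, NOOBS drops the Obs column,
   and VAR selects the variables to print. */
PROC PRINT DATA=scores NOOBS;
   VAR name test2;
RUN;
```

Naming the data set explicitly with DATA= is usually the safer habit, since the default depends on which data set happened to be created last.
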
HKU STAT1303 (2011-12, Semester 1)

3.1.2 Basic structure of a SAS PROC

All PROCs start with a PROC statement, followed by a number of required or optional statements. A new DATA step (DATA statement), a new PROC (PROC statement), or the RUN statement ends the current PROC.

A PROC statement starts with the PROC keyword followed by the name of the procedure. Options follow the procedure name. The DATA= option, which names the input data set, is common to almost all procedures:

    PROC procedure_name DATA=data-set <options>;
       BY by_details;
       LABEL label_details;
       WHERE where_details;

3.2 PROC PRINT

The main function of this procedure is to print the observations in a data set. It can be as simple as

    PROC PRINT DATA=contact;
    RUN;

Then the observations in the data set CONTACT will be printed in the Output window.

The general syntax of PROC PRINT takes the form

    PROC PRINT DATA=data-set NOOBS LABEL;
       VAR variable-list;
       ID variable-list;
       SUM variable-list;
       BY variable-list;
       LABEL var = var-label ...;
       WHERE expression;

Here, data-set is the name of the input data set, variable-list is a list of variables in data-set, var is a variable, var-label is a character constant, and expression is a SAS expression. More optional statements can be found in the SAS Procedures Guide.
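
As a rough illustration of the general form above, the following sketch fills in each optional statement once. The contents of CONTACT are not shown in the notes, so the variables (cust, region, calls, cost), their values, and the label text are all assumptions made for this example.

```sas
/* Hypothetical contents for CONTACT; the notes do not list its variables. */
DATA contact;
   INPUT cust $ region $ calls cost;
   DATALINES;
C01 East 5 12.5
C02 West 3 7.0
C03 East 8 20.0
C04 West 1 2.5
;
RUN;

/* A BY statement expects the data to be sorted by the BY variable. */
PROC SORT DATA=contact;
   BY region;
RUN;

/* The LABEL option tells PROC PRINT to use labels as column headings. */
PROC PRINT DATA=contact LABEL;
   ID cust;                          /* identify rows by cust instead of Obs */
   VAR calls cost;                   /* variables to print */
   SUM cost;                         /* totals for cost, per BY group and overall */
   BY region;                        /* one section per region */
   LABEL calls = 'Number of calls'
         cost  = 'Cost';
   WHERE calls > 2;                  /* keep only observations with calls > 2 */
RUN;
```

NOOBS is left out here because the ID statement already replaces the Obs column. Without the LABEL option on the PROC PRINT statement, the LABEL statement would have no visible effect on the column headings.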

This note was uploaded on 02/09/2012 for the course STAT 1301 taught by Professor Smslee during the Spring '08 term at HKU.
