Chapter 3 Data Summarization

STAT1303 Data Management

3 Data Summarization

In Chapter 2, we illustrated how to input data into SAS with the Data Step. The next step is to use SAS procedures (PROCs) to summarize the data, since the amount of data for analysis is typically huge. This chapter therefore first introduces some basic features of SAS procedures, and then uses the procedures for data summarization to demonstrate their role in data analysis.

3.1 Introduction to SAS Procedures

The procedures (PROCs) in the Base SAS module are mainly used to summarize data and to give descriptive statistics about the data. Descriptive statistics play a role in many data management tasks, including:

1. Data presentation - a few reports, tables, and graphs can give interested parties most of the information they want.
2. Data cleaning and validation - errors in data collection and data entry quite often result in observations that are 'different' from the other observations. By identifying unusual observations, one may spot these errors.
3. Data exploration - exploring the structure of the data and the relationships among its variables, as well as the patterns of unusual observations, helps in understanding the data.
4. Data manipulation and preparation - summary statistics may be useful in data manipulation and preparation tasks, after which the data set can be used for further analysis.

3.1.1 SAS PROCs

SAS PROCs are pre-written programs, so using a PROC is like filling out a form: we fill in the blanks of the PROC and choose from a list of options. Each PROC has its own unique form with its own list of options. All PROCs have required statements, and most of them also have optional statements. For example, the print procedure requires only the PROC PRINT statement, although many optional statements can be added to it.
3.1.2 Basic structure of a SAS PROC

All PROCs start with a PROC statement, followed by a number of required or optional statements. The current PROC is ended by a new Data Step (DATA statement), a new PROC (PROC statement), or the RUN statement. A PROC statement starts with the PROC keyword followed by the name of the procedure. Options follow the procedure name. Typically, the DATA= option, which names the input data set, is common to most procedures:

    PROC procedure_name DATA=data-set <options>;
        BY by_details;
        LABEL label_details;
        WHERE where_details;

HKU STAT1303 (2011-12, Semester 1)

3.2 PROC PRINT

The main function of this procedure is to print the observations in a data set. It can be as simple as

    PROC PRINT data=contact;
    RUN;

The observations in the data set CONTACT will then be printed in the Output window. The general syntax of PROC PRINT takes the form

    PROC PRINT DATA=data-set NOOBS LABEL;
        VAR variable-list;
        ID variable-list;
        SUM variable-list;
        BY variable-list;
        LABEL var = var-label ...;
        WHERE expression;

Here, data-set is the name of the input data set, variable-list is a list of variables in the data set data-set, var is a variable, var-label is a character constant, and expression is a SAS expression. Further optional statements are described in the SAS Procedures Guide.
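As a minimal sketch of how several of these optional statements might be combined, the following program builds a small data set and prints it. The data set SALES, its variables (region, rep, units, revenue), and all values are hypothetical and chosen only for illustration; they do not come from the course material.

```sas
/* Hypothetical data set SALES: names and values are made up for illustration. */
DATA sales;
    INPUT region $ rep $ units revenue;
    DATALINES;
East Chan 10 250
East Lee 4 100
West Wong 7 175
West Ho 12 300
;
RUN;

/* A BY statement in PROC PRINT expects the data to be sorted by the BY variable. */
PROC SORT DATA=sales;
    BY region;
RUN;

PROC PRINT DATA=sales NOOBS LABEL;   /* NOOBS: no Obs column; LABEL: use labels as headings */
    VAR rep units revenue;           /* variables to print, in this order */
    SUM revenue;                     /* print totals of revenue */
    BY region;                       /* a separate listing for each region */
    LABEL rep = 'Sales representative'
          revenue = 'Revenue';
    WHERE units >= 5;                /* keep only observations with at least 5 units */
RUN;
```

The PROC SORT step is included because BY-group processing in PROC PRINT assumes the observations are already grouped by the BY variable. The ID statement is left out here since NOOBS already suppresses the observation-number column.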