CH7 - STAT1303A Data Management 7. Combining Data Sets 7...

STAT1303A Data Management 7. Combining Data Sets

7 Combining Data Sets

Powerful statistical analysis methods are built into SAS PROCs, and the observation is the fundamental unit of operation in data analysis tasks within SAS PROCs. However, the data may not be in a single data set, and a logical unit of data may not be in a single observation. Therefore, we have to logically relate and combine our data before performing data analysis tasks.

Usually, a complete collection of data records (observations) comes from different sources (data sets). For example, the data may be archived according to year or branch, whereas many SAS PROCs operate on a single data set. Similarly, a complete collection of information about a single entity (a complete single observation) may come from different sources (observations in a number of data sets). For example, information about a patient may comprise (i) medical records, (ii) medical test results, (iii) hospital visit records, and (iv) demographic records, etc. Furthermore, the unit of data analysis in many SAS PROCs is a single observation within a single data set. These points explain why we need to combine SAS data sets.

In this chapter, some techniques for combining data sets are discussed, namely stacking, matching, merging and updating.

7.1 Methods for combining SAS data sets

7.1.1 Stacking

Observations from two or more data sets are combined into a single data set. Each observation in the combined data set comes from exactly one of the input data sets. There are two types of stacking: concatenation, in which the data sets are stacked sequentially, and interleaving, in which observations from data sets ordered by one or more common variables are stacked in that order.

7.1.1.1 Concatenation

Concatenation results in a data set that has as many observations as the sum of the observations in the data sets being combined. There are a number of methods that can be used to concatenate data sets. The most general way is to use a SET statement with multiple data sets.
Alternatively, PROC APPEND can be used to concatenate data sets.

HKU STAT1303A (2009-10, Semester 1)
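As a minimal sketch of the PROC APPEND route mentioned above, the following program appends one data set to a copy of another. The data values mirror the JANPAT and FEBPAT records of Example 7.1, but the use of list input (blank-separated fields) and the intermediate copy step are assumptions made for illustration, not part of the original notes.

```sas
* Illustrative copies of the two monthly data sets (list input is an assumption);
data janpat;
   input patno $ weight;
   cards;
001 140
004 180
010 210
024 137
;
data febpat;
   input patno $ weight;
   cards;
002 123
004 178
009 160
024 142
;

* Copy JANPAT first so that the original January data set is left unchanged;
data jan2feb;
   set janpat;
run;

* Add the observations of FEBPAT to the end of the base data set JAN2FEB;
proc append base=jan2feb data=febpat;
run;

proc print data=jan2feb;
run;
```

Because both data sets here have the same variables with the same attributes, the result should contain the same eight observations as a SET statement naming both data sets. PROC APPEND does not reread the observations already in the BASE= data set, which can make it attractive when the base data set is large.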
[Figure: data set JANPAT (Patient # 001, 004, 010, 024) and data set FEBPAT (Patient # 002, 004, 009, 024) are stacked into the combined data set (Patient # 001, 004, 010, 024, 002, 004, 009, 024).]

Example 7.1. Concatenate two data sets JANPAT and FEBPAT.

data janpat;
   *same layout for both data sets;
   *each data line begins in column 4 (three leading blanks) to match @4 and @7;
   input @4 patno $3. @7 weight 3.;
   cards;
   001140
   004180
   010210
   024137
;
data febpat;
   *same layout for both data sets;
   input @4 patno $3. @7 weight 3.;
   cards;
   002123
   004178
   009160
   024142
;
data jan2feb;
   set janpat febpat;
run;
proc print data=jan2feb;
run;

The output is
Background image of page 2
Obs    patno    weight
  1     001       140
  2     004       180
  3     010       210
  4     024       137
  5     002       123
  6     004       178
  7     009       160
  8     024       142

Data sets JANPAT and FEBPAT hold the patient records for January and February, respectively. Then, the combined data set JAN2FEB holds patient records
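For contrast with the concatenated output above, the interleave form of stacking described in Section 7.1.1 can be sketched as follows. This is only an illustration under stated assumptions: the data values mirror JANPAT and FEBPAT, list input is used instead of column pointers, and the output data set name INTERLEAVED is hypothetical.

```sas
* Illustrative copies of the two monthly data sets (list input is an assumption);
data janpat;
   input patno $ weight;
   cards;
001 140
004 180
010 210
024 137
;
data febpat;
   input patno $ weight;
   cards;
002 123
004 178
009 160
024 142
;

* Interleaving requires each input data set to be sorted by the BY variable;
proc sort data=janpat;
   by patno;
run;
proc sort data=febpat;
   by patno;
run;

* SET with a BY statement stacks the observations in PATNO order;
data interleaved;
   set janpat febpat;
   by patno;
run;

proc print data=interleaved;
run;
```

With these values the observations should appear in the order 001, 002, 004, 004, 009, 010, 024, 024. When both data sets contain the same PATNO value, the observation from the data set named first in the SET statement (JANPAT) comes first.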

This document was uploaded on 05/04/2011.
