Combining data sets

- Needs
- Stacking
- Matching and merging
- Updating

Need for combining data
- Facts
  - Powerful statistical analysis methods are built into SAS PROCs.
  - The observation is the fundamental unit of operation in the data analysis tasks performed by SAS PROCs.
- However,
  - the data may not be in a single data set, and
  - a logical unit of data may not be in a single observation.
- Therefore, we have to logically relate and combine our data before performing data analysis tasks.

- A complete collection of data records (observations) may come from different sources (data sets).
  - Data are often archived by year or by branch.
  - Many SAS PROCs operate on a single data set.
- A complete collection of information about a single entity (a complete single observation) may come from different sources (observations in several data sets).
  - Information about a patient: (i) medical records, (ii) medical test results, (iii) hospital visit records, (iv) demographic records, etc.
  - The unit of data analysis in many SAS PROCs is a single observation within a single data set.

Methods for combining SAS data sets
- Stacking
  - Observations from two or more data sets are combined into a single data set.
  - Each observation in the combined data set comes from exactly one of the original data sets.
  - Two subtypes:
    - Concatenating: stack the data sets one after another.
    - Interleaving: stack observations from the data sets, ordered by the values of one or more common variables.
- Matching
  - Observations from two or more data sets are matched into one observation in the combined data set.
  - The operation uses two or more independent SET statements, one for each data set.
  - The matching operation stops when any one of the data sets reaches its end, so it is not a complete merging of the data sets.
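Interleaving and matching can be sketched in SAS as follows. This is a minimal example under stated assumptions. The data sets janpat and febpat are built inline from the patient numbers in the concatenation example later in these notes, and the variable name patid is hypothetical. Interleaving requires each input to be sorted by the BY variable. Matching uses RENAME= because a variable read by the second SET statement would otherwise overwrite the same-named variable read by the first.

```sas
/* Hypothetical inputs: patient numbers from the concatenation example; */
/* the variable name patid is an assumption.                             */
data janpat;
    input patid $;
    datalines;
024
010
004
001
;
run;

data febpat;
    input patid $;
    datalines;
024
009
004
002
;
run;

/* Interleaving: each input must first be sorted by the BY variable. */
proc sort data=janpat out=jan_sorted; by patid; run;
proc sort data=febpat out=feb_sorted; by patid; run;

data interleaved;
    set jan_sorted feb_sorted;   /* stack the two data sets ...            */
    by patid;                    /* ... with observations ordered by patid */
run;

/* Matching: one SET statement per data set. Each iteration reads the  */
/* next observation from each data set, and the step ends as soon as   */
/* either data set runs out of observations.                           */
data matched;
    set janpat(rename=(patid=jan_id));
    set febpat(rename=(patid=feb_id));
run;
```

Because both inputs have four observations, matched has four observations, each holding one janpat ID and one febpat ID side by side. If one input were shorter, the output would stop at that length.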
Unlike matching, merging combines all observations from all data sets in some way to form the resulting data set.

- Merging
  - Observations from two or more data sets are merged into one observation in the combined data set.
  - Two subtypes:
    - One-to-one merge: combine observations from the data sets into a single observation based on their relative positions in the original data sets.
    - Match-merge: combine observations from the data sets according to the values of one or more common variables.
- Updating: applying changes to a master data set from a transaction data set, based on the values of key variables.

Concatenation

Figure: the data sets janpat (Patient # 024, 010, 004, 001) and febpat (Patient # 024, 009, 004, 002) are stacked into the combined data set jan2feb (Patient # 024, 010, 004, 001, 024, 009, 004, 002).

    data jan2feb;
        set janpat febpat;   /* all janpat observations, then all febpat observations */
    run;

- Concatenation stacks data sets: the observations of one data set are placed after those of another.
- The resulting data set has as many observations as the sum of the observations in the data sets being combined.
- Several methods can concatenate data sets. The most general is a SET statement that lists multiple data sets.
- Note: PROC APPEND is another way to concatenate data sets.
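The merging and updating methods described earlier, and the PROC APPEND alternative for concatenation, can be sketched in SAS as follows. This is a minimal illustration, not the course's own code. The data sets demog, visits, trans, newpat, and allpat and the variables patid, age, and nvisits are all hypothetical. The one-to-one merge deliberately uses patient IDs that do not line up, to show that it pairs observations by position only.

```sas
/* Hypothetical data: patient demographics and visit counts. */
data demog;
    input patid $ age;
    datalines;
001 34
002 51
004 47
;
run;

data visits;
    input patid $ nvisits;
    datalines;
001 3
004 1
009 2
;
run;

/* One-to-one merge: no BY statement, so observations are paired by     */
/* position. For a variable in both inputs (patid), the value read last */
/* (from visits) wins, so age 51 of patient 002 ends up next to 004.    */
data onetoone;
    merge demog visits;
run;

/* Match-merge: both inputs are sorted by patid (true here). A patient  */
/* missing from one input gets missing values for that input's columns. */
data matchmerged;
    merge demog visits;
    by patid;
run;

/* Updating: apply transaction values to the master by key. By default, */
/* a missing transaction value leaves the master value unchanged.       */
data trans;
    input patid $ age;
    datalines;
002 52
004 .
;
run;

data demog_updated;
    update demog trans;
    by patid;
run;

/* PROC APPEND: creates allpat from demog (BASE= does not exist yet), */
/* then adds the newpat observations to its end.                      */
data newpat;
    input patid $ age;
    datalines;
010 29
;
run;

proc append base=allpat data=demog; run;
proc append base=allpat data=newpat; run;
```

In this sketch, matchmerged has one observation per distinct patid (001, 002, 004, 009). In demog_updated, patient 002's age changes to 52 while patient 004 keeps 47.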
This note was uploaded on 02/09/2012 for the course STAT 1301 taught by Professor Smslee during the Spring '08 term at HKU.