CH8 - STAT1303A Data Management 8. Rearranging Data 8...

STAT1303A Data Management 8. Rearranging Data
HKU STAT1303A (2009-10, Semester 1)

8 Rearranging Data

Chapter 7 illustrated how to combine SAS data sets, whether they are created by SAS PROCs or by users themselves. However, these techniques are not sufficient for data analysis. In this chapter, we expand the techniques used in Chapter 9, e.g. we expand the observations of a SAS data set into several data sets. When arranging the relevant information within a single source data set, we may come across the following situations:

1. A single observation in a source data set forms multiple observations in the destination data set.
2. Multiple observations in a source data set form a single observation in the destination data set.
3. Convert variables into observations or observations into variables.
4. Split a single observation into many (one to many) by using the techniques of array and DO-loop.
5. Combine multiple observations into a single observation (many to one) by using the RETAIN statement and the automatic variables FIRST.variable and LAST.variable.

8.1 One-to-many Problem

Suppose we have a data set of observations that contain multiple occurrences of a medical diagnosis. Each observation represents one person, and a person may have multiple medical diagnoses. There are altogether over 500 different medical diagnoses (codes 001-500). We have two tasks in this example:

(1) Create a table that shows how often each diagnosis occurs.
(2) Create a list of patients who have both of 2 specific diagnoses.

Example 8.1. Create a SAS data set DIAGS from the raw data file, in which each diagnosis is represented by one of the variables DX1, DX2, ..., DX5.

    * Example 8.1 read in data;
    data diags;
        infile "D:\temp\dx.dat" missover;
        length id $3 dx1-dx5 $3;
        input id dx1-dx5;
    run;

Part of the file:

    001 328 138 412
    002 116 440 082 368
    003 153 428 442 340
    004 359 146 410 299
    005 428 442
    006 092 488 162 210 086
    007 308 113
    008 142 158 403
    009 074
    010 041 207 495 243
    011 353 478 496 002
    012 491 015
    ...

Note that to read in the variables DX1, DX2, DX3, DX4, DX5, we have used the abbreviation DX1-DX5, which stands for DX1 to DX5. This notation is useful for shortening a list of variables. The MISSOVER option sets the remaining diagnosis variables to missing when a line holds fewer than 5 diagnoses, instead of reading them from the next line.

The first task is to create a table that shows how often each diagnosis occurs within the data set. The difficulty is that the diagnoses are stored in 5 different variables in the data set. There are three methods to create the frequency table for diagnosis:

METHOD 1. The first method is to perform PROC FREQ on each of the 5 diagnosis variables and combine the results of the 5 tables.

Example 8.2.

    * Example 8.2 list frequency of diagnosis codes;
    * primitive method: list each of the 5 diagnoses and add up manually (!);
    proc freq data=diags;
        tables dx1-dx5 / nopercent nocum;
    run;

An example of the one-way frequency table produced for each diagnosis variable is shown as follows:

    dx1  Frequency    dx2  Frequency    dx3  Frequency
    - - - - - - - - - ...
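As an illustration of the array and DO-loop technique listed in item 4 of the chapter introduction, the sketch below shows one possible way to approach both tasks without adding up five tables by hand. It is not necessarily the second or third method of these notes. The data set names DIAGS_LONG and BOTH, the variables DX, I, HAS_A and HAS_B, and the two diagnosis codes 428 and 442 are assumptions chosen only for illustration; the code relies on DIAGS having been created as in Example 8.1.

```sas
* Sketch only: DIAGS_LONG, BOTH, DX, HAS_A and HAS_B are hypothetical names;

* Task 1: turn one observation per patient into one observation per diagnosis;
data diags_long;
    set diags;
    array dxs{5} $ dx1-dx5;          /* group the five diagnosis variables */
    length dx $3;
    do i = 1 to 5;
        if dxs{i} ne ' ' then do;    /* skip empty diagnosis slots */
            dx = dxs{i};
            output;                  /* write one observation per diagnosis */
        end;
    end;
    keep id dx;
run;

* A single one-way table now counts every diagnosis code;
proc freq data=diags_long;
    tables dx / nopercent nocum;
run;

* Task 2: keep patients who have both of two diagnoses;
* the codes 428 and 442 are assumed here purely as an example;
data both;
    set diags;
    array dxs{5} $ dx1-dx5;
    has_a = 0;
    has_b = 0;
    do i = 1 to 5;
        if dxs{i} = '428' then has_a = 1;
        if dxs{i} = '442' then has_b = 1;
    end;
    if has_a and has_b;              /* subsetting IF: keep only matching patients */
    keep id;
run;

proc print data=both;
run;
```

Because each diagnosis becomes its own observation in DIAGS_LONG, a single TABLES request replaces the five separate tables of Example 8.2. For the second task the wide layout is kept, since both codes must be found within the same patient's observation.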
This document was uploaded on 05/04/2011.
