# lecture2 - Handout 2 Data Exploration Reading Assignment...


Handout 2: Data Exploration

Reading Assignment: Sections 1.3, 1.4, 1.5, 1.6, and Chapter 3

We previously looked at methods of sampling and the measurement levels of data. Suppose that you are a project development manager for an energy project. Your company, a new wind power energy company in Michigan, wants to help minimize emissions while producing optimal energy levels, and wishes to compare its emissions with those of the rest of the nation as well as those within Michigan.

Now we will begin to discuss analyzing data; however, before doing in-depth analyses, it is important to summarize what information is present in your data. Please note that we will be using data from the annual U.S. Electric Power Industry Estimated Emissions Report for this handout. Below is a sample of ten of the 16,156 observations.

From the sample of data, we see that there are seven variables: Year, State, Type of Producer, Energy Source, CO2, SO2, and NOX. However, just by looking at the sample, we do not get all the information. It is known that the data were collected from all 50 states between 1990 and 2009. Also, the carbon dioxide, sulfur dioxide, and nitrogen oxide measurements (all in metric tons) are taken from all eight energy sources and all seven types of producers. (Knowing this information, can you speculate whether the entire dataset is a sample or a population?)

Looking at the sample of the data and the variable descriptions, state the level of measurement of each variable in the table below. Also, are the numerical variables continuous, taking on any value in an interval (e.g. height, blood pressure), or discrete, taking on only one of a countable list of distinct values (e.g. number of roommates living with you)?

| Variable | Description |
|---|---|
| Year | |
| State | |
| Type of Producer | |
| Energy Source | |
| CO2 | |
| SO2 | |
| NOX | |


## Categorical Data

Recall that categorical data consist of group or category names, which may or may not have a logical ordering. To summarize a categorical variable, we count how many subjects fall within each possible category. Percentages are typically reported rather than counts alone, because they make categories easier to compare. This method can also be used for summarizing two or more categorical variables, which we will discuss at a later time.

A relative frequency table is a listing of all possible categories along with their relative frequencies, typically given as a proportion or percent. Both counts and percentages are commonly given together (see the table below).

| Energy Source | Frequency | Relative Frequency | Relative Percentage |
|---|---|---|---|
| Natural Gas | 4322 | 0.27 | 27% |
| Petroleum | 4090 | 0.25 | 25% |
| Coal | 2695 | 0.17 | 17% |
| Other | 1786 | 0.11 | 11% |
| Other Biomass | 1415 | 0.09 | 9% |
| Wood & Wood Derived Fuels | 1066 | 0.07 | 7% |
| Other Gases | 675 | 0.04 | 4% |
| Geothermal | 107 | 0.01 | 1% |
| Grand Total | 16156 | 1.00 | 100% |

Relative Frequency Table of Energy Source

A bar chart is useful for summarizing one (see figure below) or two categorical variables. Bar charts can be very helpful when comparing two categorical variables, as will be shown later.
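The relative frequency table for Energy Source can be reproduced from the raw counts alone. The following is a minimal sketch in Python, not part of the original handout: the counts are copied from the table, and the function name `relative_frequency_table` is a hypothetical helper introduced here for illustration.

```python
# Counts per energy source, copied from the relative frequency table above.
counts = {
    "Natural Gas": 4322,
    "Petroleum": 4090,
    "Coal": 2695,
    "Other": 1786,
    "Other Biomass": 1415,
    "Wood & Wood Derived Fuels": 1066,
    "Other Gases": 675,
    "Geothermal": 107,
}


def relative_frequency_table(counts):
    """Return (rows, total); each row is (category, count, proportion, percent).

    Hypothetical helper for illustration. Rows are sorted by count,
    largest first, matching the layout of the handout's table.
    """
    total = sum(counts.values())
    rows = []
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        proportion = count / total  # relative frequency as a proportion
        rows.append((category, count, round(proportion, 2), round(proportion * 100)))
    return rows, total


rows, total = relative_frequency_table(counts)
for category, count, proportion, percent in rows:
    print(f"{category:<28}{count:>6}{proportion:>6.2f}{percent:>5}%")
print(f"{'Grand Total':<28}{total:>6}")
```

Each relative frequency is simply the category count divided by the grand total of 16156. Note that the individually rounded percentages add up to 101 rather than 100; this is an ordinary rounding effect, and the 100% in the Grand Total row refers to the unrounded total.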
• Fall '08
• Heun
