
Lab 5: The Sampling Distribution and the Central Limit Theorem (Part 2) Solutions
INST 314 Fall 2018

Motivation: The Price of Diamonds

In the ggplot2 package, there is a dataset called diamonds containing attributes for almost 54,000 diamonds. In this lab, let’s pretend that it’s the entire stock of a certain diamond resale company and that we’re interested in the average price of diamonds sold by the company. In other words, the prices in the diamonds dataset constitute our population.

In the previous lab, we looked at the relationship between two categorical variables, generating a sampling distribution by randomizing one in order to treat them as independent. In this lab, we’ll start out with the population. Now, remember, we (almost) never have access to the entire population like this; we’re looking at the population here to see the Central Limit Theorem in action.

The Central Limit Theorem for Sample Means

Let’s start as always by bringing in the packages that we’ll be using.

    require(openintro)
    require(dplyr)
    require(ggplot2)
    # Bring in the Million Songs dataset for examples
    songs <- read.csv("/Users/bkim/Downloads/MillionSongsFinal.csv")

Now that we’ve brought in the appropriate packages and initialized our seed, let’s get to the interesting part. In this lab, just like the previous lab, we will use simulations to find the sampling distribution. Just to recap, there are a few basic steps to how this works:

1. Do the sampling process
2. Compute and store the sample statistic
3. Repeat many times

The only difference is, this time, we’re going to be randomly drawing from the population instead of randomizing a value.

Looking at the Population

Let’s start by looking at an example dataset which we should all be familiar with by now, the Million Songs dataset. We will treat all the songs in the dataset as our population and look at the loudness of songs in this dataset. Let’s look at the overall distribution of our population. First, some numerical summaries.

    songs %>% select(loudness) %>% summary()

    ##     loudness
    ##  Min.   :-32.339
    ##  1st Qu.:-13.575
    ##  Median :-10.258
    ##  Mean   :-11.229
    ##  3rd Qu.: -7.543
    ##  Max.   : -1.997

Looks like the mean is slightly smaller than the median. That suggests we might be looking at a left-skewed distribution, or possibly just have some outliers on the low end. Let’s take a look at a graphical view of the distribution.
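
The plot itself did not survive in this copy. As a rough sketch of what a graphical view of the population could look like, the following draws a histogram of loudness with ggplot2. The bin width and axis labels are assumptions, not the lab's own choices, and the stand-in data frame is purely hypothetical: it is only created when the real `songs` data has not been loaded, so that the snippet runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for the population, used ONLY when the real
# `songs` data frame is not already loaded. The negated gamma draw is an
# assumption chosen to give a left-skewed, negative-valued variable.
if (!exists("songs")) {
  set.seed(314)  # arbitrary seed, not taken from the lab
  songs <- data.frame(loudness = -rgamma(10000, shape = 4, rate = 0.35))
}

# Histogram of the population distribution of loudness.
# binwidth = 1 is an assumed value; adjust it to taste.
p <- ggplot(songs, aes(x = loudness)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Loudness", y = "Number of songs")

print(p)
```

If the distribution is left-skewed, the histogram should show a longer tail toward the more negative loudness values, which would agree with the mean sitting below the median.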
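
The three simulation steps recapped earlier (do the sampling, compute and store the sample statistic, repeat many times) can be sketched in base R as follows. This is a minimal illustration rather than the lab's own solution code: the seed, the sample size of 50, the 1,000 repetitions, and the stand-in population are all assumed values. The stand-in is only created when the real `songs` data frame is not loaded.

```r
set.seed(314)  # arbitrary seed, assumed for reproducibility

# Hypothetical stand-in population, used ONLY when `songs` is not loaded.
if (!exists("songs")) {
  songs <- data.frame(loudness = -rgamma(10000, shape = 4, rate = 0.35))
}

sample_size <- 50    # assumed number of songs per sample
n_reps      <- 1000  # assumed number of repetitions

# Step 3: repeat many times.
sample_means <- replicate(n_reps, {
  # Step 1: draw a random sample from the population (without replacement).
  draw <- sample(songs$loudness, size = sample_size)
  # Step 2: compute the sample statistic; replicate() stores it.
  mean(draw)
})

# Compare the simulated sampling distribution with the population.
mean(sample_means)    # should be close to the population mean
sd(sample_means)      # spread of the sample means (the standard error)
mean(songs$loudness)  # population mean
sd(songs$loudness)    # population standard deviation
```

The vector `sample_means` is the simulated sampling distribution of the sample mean. Under the Central Limit Theorem it should look roughly bell-shaped and be centered near the population mean, with a much smaller spread than the population itself.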
