ECS 132 Homework 3

Problem A:

Download the DNC e-mail co-recipient dataset from Canvas. Here we will explore the degree distribution of the resulting graph.

The format seems to be

    recip1ID recip2ID nummsgs

One might treat this as a weighted graph, but we will not do so. Ignore the third column.

The data seems to be duplicated, with (a,b) and (b,a) both appearing in the data. This is odd, as the third column differs. Let's take only the rows in which recip1ID < recip2ID. This indeed cuts the data in half.

Explore how well a power law fits the data, as follows. Let m_i denote the count of recipients having degree i in the data, i = 1,2,3,... The form of the pmf (check this!) implies that a plot of log(m_i) against log(i) should look like a straight line. There will be points above and below the line, due to sampling variation, but the trend should look linear. Make this plot, and comment.

NOTE: In this and all future work for the course, you must use R to generate your graphs. This can be either base R, ggplot2 (appendix in our book) or lattice.

Assuming a power law, estimate γ. (We use the term "estimate" here because the data is only a sample from a population, real or conceptual.) Use R's lm() function for this. We will study this function in detail later, but for instance the following would find the intercept and slope of a line fit through the points (1,1), (2,2), (3,4):

    > x <- 1:3
    > y <- c(1,2,4)
    > lm(y ~ x)
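The steps described above can be sketched in base R as follows. This is only an illustration, not a prescribed solution: the small `edges` data frame is a made-up stand-in for the real file (whose name and path are not given here), the variable names are hypothetical, and the last line assumes the power-law pmf is proportional to i^(-γ), so that the slope of the fitted line estimates -γ.

```r
# Hypothetical toy edge list standing in for the real data.
# Columns follow the stated format: recip1ID recip2ID nummsgs.
# With the actual file one would load it with read.table() instead.
edges <- data.frame(
  recip1ID = c(1, 2, 1, 3, 1, 4, 2, 3, 1, 5),
  recip2ID = c(2, 1, 3, 1, 4, 1, 3, 2, 5, 1),
  nummsgs  = c(5, 4, 2, 2, 1, 3, 7, 6, 1, 1)
)

# Keep one copy of each pair and ignore the third column.
e <- edges[edges$recip1ID < edges$recip2ID, c("recip1ID", "recip2ID")]

# Degree of each recipient: the number of edges it appears in.
deg <- table(c(e$recip1ID, e$recip2ID))

# m_i: the number of recipients having degree i (only observed degrees appear).
m  <- table(as.integer(deg))
i  <- as.integer(names(m))
mi <- as.integer(m)

# Log-log plot; under a power law the points should scatter around a line.
plot(log(i), log(mi), xlab = "log(i)", ylab = "log(m_i)")

# Least-squares line through the log-log points.
fit <- lm(log(mi) ~ log(i))
abline(fit)

# Assumption: pmf proportional to i^(-gamma), so gamma is minus the slope.
gamma_hat <- -unname(coef(fit)[2])
```

Degrees that never occur in the data have m_i = 0 and are left out automatically, which avoids taking log(0). With the real dataset the toy `edges` object would be replaced by the downloaded file, and the resulting plot and estimate would be what the problem asks you to comment on.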

Term: Fall
Professor: Ghosal, D
Tags: Probability theory, Exponential distribution, Discrete probability distribution