Election fraud is in the news! The US recently held a national referendum, the subject of which is
very mysterious (and not all that important). The US Secretary of State posted the election results to its
web site. One of your colleagues collected a snapshot of the election results page and cleaned it up a bit. She
sent it to you via email as the file exercise.html. However, there is still some work to do to extract the data!
Updated results will likely be available by the time we evaluate your script, so focus on correct logic rather
than obtaining the exact numbers.
Examine the file and load the data into a data.frame in R. Complete the following tasks:
a. (6 pts) Get an overall count of yes and no votes. Store the number of yes votes in a variable named
total_yes, and store the number of no votes in a variable named total_no.
b. (7 pts) Count the number of cities in which the referendum passed. Store this number in a variable
c. (8 pts) Which city had the highest percentage of yes votes? How about the highest percentage of no
votes? Store your answers in the variables named highest_pct_yes and highest_pct_no, respectively.
d. (8 pts) One sign of fraud might be that the total number of votes cast in a city exceeds the population
of that city. Generate a list of potentially fraudulent cities and store the list in the vector variable
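For reference, here is a minimal base-R sketch of tasks a to d once the results are already in a data.frame. Several things are assumptions because the task text doesn't pin them down: the column names (city, population, vote_yes, vote_no), the three made-up cities and their numbers, the name num_cities_passed for part b (borrowed from the code in this question), and the hypothetical name suspicious_cities for part d. "Percentage" is read as a share of votes cast, not of population.

```r
# Made-up sample data; column names are assumptions about exercise.html.
results <- data.frame(
  city       = c("Alpha", "Beta", "Gamma"),
  population = c(1000, 800, 500),
  vote_yes   = c(400, 300, 350),
  vote_no    = c(350, 450, 200),
  stringsAsFactors = FALSE
)

# a. Overall counts of yes and no votes
total_yes <- sum(results$vote_yes, na.rm = TRUE)
total_no  <- sum(results$vote_no, na.rm = TRUE)

# b. Number of cities where the referendum passed (more yes than no votes)
num_cities_passed <- sum(results$vote_yes > results$vote_no, na.rm = TRUE)

# c. City with the highest share of yes votes, and of no votes,
#    measured against the votes cast in that city
votes_cast <- results$vote_yes + results$vote_no
highest_pct_yes <- results$city[which.max(results$vote_yes / votes_cast)]
highest_pct_no  <- results$city[which.max(results$vote_no / votes_cast)]

# d. Cities where more votes were cast than there are residents
#    (suspicious_cities is a hypothetical name; use the one the task asks for)
suspicious_cities <- results$city[votes_cast > results$population]
```

The key difference from the code in this question is indexing: `which.max()` returns the position of the largest ratio, and `results$city[...]` turns that position into a city name, whereas `city == ratio` compares names with numbers and yields a logical vector.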
I don't know how I'm supposed to extract the data. Below is my code, but everything comes up as 0L or NULL. When I tried adding is.na or as.numeric, it only changed the result to 1L.
Here's the HTML:
Here's my code:
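library(rvest)  # provides read_html()

html_data <- read_html('exercise.html')
# na.rm = TRUE belongs to sum(); the original call also had an unbalanced parenthesis
total_yes <- sum(as.numeric(html_data$vote_yes), na.rm = TRUE)
total_no <- sum(html_data$vote_no)
num_cities_passed <- sum(html_data$vote_yes > html_data$vote_no)
num_of_votes <- html_data$vote_yes + html_data$vote_no
# pick the city at the position of the largest yes / no share
highest_pct_yes <- html_data$city[which.max(html_data$vote_yes / num_of_votes)]
highest_pct_no <- html_data$city[which.max(html_data$vote_no / num_of_votes)]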
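A likely reason everything is 0L or NULL: `read_html()` returns a parsed document (an `xml_document`), not a data.frame, so `html_data$vote_yes` finds no such element and is NULL, and `sum(NULL)` is `0L`. The sketch below shows one way to get a data.frame first. It assumes the results sit in a single HTML `<table>` whose header row contains city, population, vote_yes and vote_no; the real markup of exercise.html isn't shown here, so the CSS selector, column names, and the inline stand-in table are all assumptions. `html_element()` needs rvest 1.0 or later (older versions call it `html_node()`).

```r
library(rvest)

# Inline stand-in for exercise.html (hypothetical markup and numbers).
# With the real file this would be: page <- read_html("exercise.html")
page <- read_html('
<table id="results">
  <tr><th>city</th><th>population</th><th>vote_yes</th><th>vote_no</th></tr>
  <tr><td>Alpha</td><td>1,000</td><td>400</td><td>350</td></tr>
  <tr><td>Beta</td><td>800</td><td>300</td><td>450</td></tr>
  <tr><td>Gamma</td><td>500</td><td>350</td><td>200</td></tr>
</table>')

# Select the <table> node and parse its rows; the <th> row becomes the header.
results <- as.data.frame(html_table(html_element(page, "table")))

# Counts such as "1,000" stay as text, so strip the commas and convert.
to_number <- function(x) as.numeric(gsub(",", "", x))
num_cols <- c("population", "vote_yes", "vote_no")
results[num_cols] <- lapply(results[num_cols], to_number)

str(results)
```

With `results` in hand, the calculations in the code can use `results$vote_yes`, `results$city`, and so on in place of `html_data$...`. If the page lays the data out with `<div>`s or classes instead of a table, `html_elements()` plus `html_text2()` on the matching selectors would be the usual route instead.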