Immediately after the successful 2012 Obama campaign, Campaign Manager and former White House Deputy Chief of Staff for Operations Jim Messina discussed their "Targeted Sharing" analytics program: "I think it's one of the most important things we did." Targeted Sharing (TS) addressed the problem that young voters were important for the Obama campaign, but were difficult to reach via the traditional channels employed by political campaigns (direct mail, phone, email). The TS solution involved targeting the friends of the roughly half-million "authorized" Facebook Obama supporters. The analytics involved prioritizing the friends for the "targeted" sharing of campaign content, as oversharing leads to Facebook deprioritizing the content in the friends' streams.
Your liberal leanings led you to volunteer, and now you have been selected to lead the design of an analytics method for prioritized targeting of the 70 million unique friends for a "getting out the vote" message. Choose one of the methods we have covered this semester. Explain your design by answering the following questions. The notion of "proxies" for unavailable data may be helpful. One or two sentences will suffice to answer each question.
1) What precisely is your problem formulation? What general category of data mining task does this correspond to?
2) What is your data representation. Is this a supervised or unsupervised formulation? Why? What features will you use? Describe a few elements of your feature vector precisely.
3) What method do you propose to use? Why?
4) How will you evaluate whether your model has captured any generalizable knowledge? Explain two different ways, and what metric you propose to employ.
5) If the evaluation shows that it has indeed captured generalizable knowledge, why will (help) to solve the prioritization problem? Explain precisely. Are there other important factors that should be included in the solution?