Plausibly Deniable Search

Mummoorthy Murugesan
Department of Computer Science
Purdue University, W. Lafayette
firstname.lastname@example.org

Chris Clifton
Department of Computer Science
Purdue University, W. Lafayette
email@example.com

Abstract

Query-based web search is becoming an integral part of many people's daily activities. Most do not realize that their search history can be used to identify them (and their interests). In July 2006, AOL released an anonymized search query log of some 600K randomly selected users. While valuable as a research tool, the anonymization was insufficient: individuals could be identified from the queries alone. Government requests for such logs serve to increase the concern. We propose a client-centered approach based on plausibly deniable search: actual user queries are replaced with a set of queries that hide the actual query. By using a singular-value decomposition approach (demonstrated on TREC-4), we are able to generate cover queries that have characteristics similar to the actual user query (although on unrelated topics), preventing the actual query from standing out from the cover queries.

1 Introduction

Search engines such as Google, Yahoo!, and MSN boast huge user bases. Logs of the queries can give extensive insight into people's interests and activities. This data can be used in the aggregate, but it can also be used to develop profiles of individuals, raising concerns about the privacy of users.

Anonymizing the logs does not solve this issue. In July 2006, AOL released an anonymized search query log of around 600,000 randomly selected users. The logs had been anonymized (at the server side) by removing individually identifying information such as IP address, username, and any other personal information associated with that user, and assigning a random ID to each user instead. However, this simple anonymization proved ineffective; the query itself often contained identifying information (e.g., ego-surfing).

Adding to this concern are recent government attempts to obtain query logs. The U.S. Government has subpoenaed search logs from the major search engines. While the subpoena did not request identifying information, it did request the query text, which, as AOL discovered, may be inherently identifying. This can have serious implications both for individuals and for the search engine companies; witness the use of information obtained from Yahoo! in the jailing of a Chinese dissident, and the aftermath. We are aware of only one that addresses protecting search text...

* This material is based upon work supported by the National Science Foundation under Grant No. 0428168. Partial support for this work was provided by MURI award FA9550-08-1-0265 from the Air Force Office of Scientific Research.
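The core idea in the abstract, sending the actual query hidden among cover queries so that it does not stand out, can be illustrated with a small client-side sketch. This is a minimal sketch under stated assumptions, not the authors' method: the candidate pool, the function names `pick_cover_queries` and `deniable_query_set`, and the selection rule (same number of terms, no shared terms) are hypothetical stand-ins for the paper's SVD-based generation of cover queries that have similar characteristics but cover unrelated topics.

```python
import random
from typing import List, Optional, Sequence


def pick_cover_queries(real_query: str, pool: Sequence[str], k: int,
                       rng: random.Random) -> List[str]:
    """Choose k cover queries from a hypothetical candidate pool.

    Stand-in heuristic (NOT the paper's SVD method): a cover query must
    have the same number of terms as the real query ("similar
    characteristics") and share no terms with it ("unrelated topic").
    """
    real_terms = set(real_query.lower().split())
    n_terms = len(real_query.split())
    candidates = [
        q for q in pool
        if len(q.split()) == n_terms and not (real_terms & set(q.lower().split()))
    ]
    if len(candidates) < k:
        raise ValueError("not enough suitable cover queries in the pool")
    return rng.sample(candidates, k)


def deniable_query_set(real_query: str, pool: Sequence[str], k: int,
                       seed: Optional[int] = None) -> List[str]:
    """Return the real query hidden among k cover queries, in random order.

    Every query in the returned list would be sent to the search engine,
    so the engine sees k + 1 equally plausible queries; the client keeps
    only the results for real_query.
    """
    rng = random.Random(seed)
    batch = pick_cover_queries(real_query, pool, k, rng) + [real_query]
    rng.shuffle(batch)  # position in the batch must not reveal the real query
    return batch


if __name__ == "__main__":
    # Hypothetical pool; a real system would generate these queries.
    pool = ["tomato growing tips", "jazz piano chords", "used car prices",
            "cheap flights paris", "knee pain treatment"]
    print(deniable_query_set("knee surgery recovery", pool, k=3, seed=1))
```

In the usage example, "knee pain treatment" is never chosen because it shares the term "knee" with the real query. Matching term counts and forbidding term overlap are only crude proxies; the abstract's point is that cover queries must resemble the real query closely enough that it cannot be singled out, which is what the SVD approach is meant to achieve.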