Privacy-preserving Anonymization of Set-valued Data

Manolis Terrovitis, Dept. of Computer Science, University of Hong Kong, firstname.lastname@example.org
Nikos Mamoulis, Dept. of Computer Science, University of Hong Kong, email@example.com
Panos Kalnis, Dept. of Computer Science, National University of Singapore, firstname.lastname@example.org

ABSTRACT

In this paper we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the point of view of the adversary. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, but at a high cost that makes it inapplicable to large, realistic problems. Then, we propose two greedy heuristics, which scale much better and in most cases find a solution close to the optimal. The proposed algorithms are experimentally evaluated using real datasets.

1. INTRODUCTION

We consider the problem of publishing set-valued data, while preserving the privacy of individuals associated with them. Consider a database D, which stores information about items purchased at a supermarket by various customers.
We observe that the direct publication of D may result in unveiling the identity of the person associated with a particular transaction, if the adversary has some partial knowledge about a subset of items purchased by that person. For example, assume that Bob went to the supermarket on a particular day and purchased a set of items including ...

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permission from the publisher, ACM....
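
The attack described in the introduction, and the k^m-anonymity guarantee named in the abstract, can be illustrated with a small sketch. The excerpt does not spell out the definition, so the code assumes the usual reading: a database is k^m-anonymous if every combination of at most m items that occurs in some transaction occurs in at least k transactions. The function names and the toy database are hypothetical and are not taken from the paper, and the brute-force check is only meant for tiny inputs; it is not one of the paper's algorithms.

```python
from collections import Counter
from itertools import combinations


def matching_transactions(db, known_items):
    """Indices of transactions that contain every item the adversary knows."""
    known = set(known_items)
    return [i for i, t in enumerate(db) if known <= set(t)]


def is_km_anonymous(db, k, m):
    """Brute-force check (assumed definition): every itemset of size <= m
    that occurs in the database occurs in at least k transactions."""
    support = Counter()
    for t in db:
        items = sorted(set(t))
        for size in range(1, min(m, len(items)) + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return all(count >= k for count in support.values())


# Hypothetical toy database: one set of purchased items per transaction.
D = [
    {"milk", "beer", "diapers"},
    {"milk", "beer"},
    {"milk", "bread"},
    {"milk", "bread", "beer"},
]

# An adversary who knows a single rare item pins down one transaction.
print(matching_transactions(D, {"diapers"}))
# Knowing two common items still leaves several candidates.
print(matching_transactions(D, {"milk", "beer"}))
# The rare item alone breaks the guarantee for k = 2, m = 1.
print(is_km_anonymous(D, k=2, m=1))
```

In this toy database the single item "diapers" appears in only one transaction, so an adversary who knows it can isolate that transaction. Generalizing or otherwise transforming such items until every small combination is shared by at least k transactions is the kind of repair the abstract refers to.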
This note was uploaded on 11/12/2010 for the course CSCI 271 taught by Professor Wilczynski during the Spring '08 term at USC.