HKU CS Tech Report TR-2008-04

Anonymity in Unstructured Data

Manolis Terrovitis, Nikos Mamoulis §, and Panos Kalnis

Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, [email protected]
§ Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, [email protected]
Department of Computer Science, National University of Singapore, 117590, Singapore, [email protected]

Abstract

In this paper we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer that could serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive; instead, we consider all items both as potential quasi-identifiers and as potential sensitive data, depending on the point of view of the adversary. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, but at a high cost that makes it inapplicable to large, realistic problems. We then propose two greedy heuristics, which scale much better and in most cases find a solution close to the optimal. The proposed algorithms are experimentally evaluated using real datasets.

1 Introduction

We consider the problem of publishing set-valued data while preserving the privacy of the individuals associated with them. Consider a database D, which stores information about items purchased at a supermarket by
various customers. We observe that the direct publication of D may reveal the identity of the person associated with a particular transaction if the adversary has partial knowledge about a subset of the items purchased by that person. For example, assume that Bob went to the supermarket on a particular day and purchased a set of items including coffee, bread, brie cheese, diapers, milk, tea, scissors, and a light bulb. Assume that some of the items purchased by Bob were on top of his shopping bag (e.g., brie cheese, scissors, light bulb) and were spotted by his neighbor Jim while both were on the same bus. Bob would not like Jim to find out which other items he bought. However, if the supermarket decides to publish its transactions and there is only one transaction containing brie cheese, scissors, and a light bulb, Jim can immediately infer that this transaction corresponds to Bob and can find out the complete contents of his shopping bag.
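
The attack in this example can be sketched in a few lines of Python. The toy database, the function names, and the reading of k^m-anonymity used below are illustrative assumptions, not taken from the report: this excerpt names the guarantee without defining it, so the check assumes it means that every combination of at most m items occurring in the data is contained in at least k transactions. The check is a brute-force illustration and is not one of the algorithms the paper proposes.

```python
from collections import Counter
from itertools import combinations


def matching_transactions(db, known_items):
    """Return the indices of the transactions that contain every known item."""
    known = set(known_items)
    return [i for i, t in enumerate(db) if known <= set(t)]


def is_km_anonymous(db, k, m):
    """Brute-force check of the ASSUMED reading of k^m-anonymity:
    every itemset of size <= m that occurs in db occurs in >= k transactions."""
    counts = Counter()
    for t in db:
        items = sorted(set(t))
        for size in range(1, min(m, len(items)) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return all(c >= k for c in counts.values())


# Hypothetical published database; transaction 0 is Bob's shopping bag.
db = [
    {"coffee", "bread", "brie cheese", "diapers", "milk", "tea",
     "scissors", "light bulb"},
    {"coffee", "bread", "milk"},
    {"brie cheese", "bread", "tea"},
    {"scissors", "light bulb", "milk"},
]

# The items Jim saw on top of Bob's bag.
seen = {"brie cheese", "scissors", "light bulb"}

print(matching_transactions(db, seen))   # a single match re-identifies Bob
print(is_km_anonymous(db, k=2, m=3))     # False for this toy database
```

In this toy database only one transaction contains all three items Jim saw, so the rest of Bob's purchases are exposed. Under the assumed definition, a database passing the check with k = 2 and m = 3 would leave Jim with at least two candidate transactions for any three items he observed.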