778_2006_Article_39 - The VLDB Journal (2008) 17:789804 DOI

The VLDB Journal (2008) 17:789–804
DOI 10.1007/s00778-006-0039-5

REGULAR PAPER

Providing k-anonymity in data mining

Arik Friedman · Ran Wolff · Assaf Schuster

Received: 30 September 2005 / Revised: 24 May 2006 / Accepted: 2 August 2006 / Published online: 10 January 2007
© Springer-Verlag 2007

A. Friedman (✉) · A. Schuster
Computer Science Department, Technion – Israel Institute of Technology, Haifa, Israel
e-mail: arikf@cs.technion.ac.il

A. Schuster
e-mail: assaf@cs.technion.ac.il

R. Wolff
Management Information Systems Department, Haifa University, Haifa, Israel
e-mail: rwolff@mis.haifa.ac.il

Abstract

In this paper we present extended definitions of k-anonymity and use them to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. Our extension provides a tool that measures the amount of anonymity retained during data mining. We show that our model can be applied to various data mining problems, such as classification, association rule mining and clustering. We describe two data mining algorithms which exploit our extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, we show that our method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.

1 Introduction

In recent years the data mining community has faced a new challenge. Having shown how effective its tools are in revealing the knowledge locked within huge databases, it is now required to develop methods that restrain the power of these tools to protect the privacy of individuals. This requirement arises from popular concern about the powers of large corporations and government agencies, a concern which has been reflected in the actions of legislative bodies (e.g., the debate about and subsequent elimination of the Total Information Awareness project in the US [10]). In an odd turn of events, the same corporations and government organizations which are the cause of concern are also among the main pursuers of such privacy-preserving methodologies. This is because of their pressing need to cooperate with each other on many data analytic tasks (e.g., for cooperative cyber-security systems, failure analysis in integrative products, detection of multilateral fraud schemes, and the like).

The first approach toward privacy protection in data mining was to perturb the input (the data) before it is mined [4]. Thus, it was claimed, the original data would remain secret, while the added noise would average out in the output. This approach has the benefit of simplicity. At the same time, it takes advantage of the statistical nature of data mining and directly protects the privacy of the data. The drawback of the perturbation approach is that it lacks a formal framework for proving how much privacy is guaranteed. This lack has been exacerbated by some recent evidence that for some data, and some kinds of noise, perturbation provides no privacy at all [20, 24]. Recent models for studying the privacy attainable
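The claim that perturbation noise "would average out in the output" can be illustrated with a minimal sketch. It assumes additive, independent, zero-mean Gaussian noise and a synthetic age attribute; the text above does not specify a noise model, and the names perturb, noise_scale and ages are hypothetical. The sketch shows only the averaging effect and makes no statement about how much privacy the noise provides.

```python
import random
import statistics


def perturb(values, noise_scale, rng):
    """Return a copy of values with independent zero-mean Gaussian noise added.

    Assumption: additive Gaussian noise; the source does not name a noise model.
    """
    return [v + rng.gauss(0.0, noise_scale) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical sensitive attribute: synthetic ages of 10,000 individuals.
ages = [rng.randint(18, 90) for _ in range(10_000)]
noisy_ages = perturb(ages, noise_scale=20.0, rng=rng)

# An aggregate computed on the perturbed data stays close to the true value,
# because zero-mean noise cancels out over many records.
true_mean = statistics.fmean(ages)
noisy_mean = statistics.fmean(noisy_ages)

# Individual records, by contrast, can be shifted by a large amount.
largest_change = max(abs(a - b) for a, b in zip(ages, noisy_ages))

print(f"true mean:      {true_mean:.2f}")
print(f"perturbed mean: {noisy_mean:.2f}")
print(f"largest change to a single record: {largest_change:.2f}")
```

With 10,000 records and a noise standard deviation of 20, the standard error of the perturbed mean is about 20 / sqrt(10,000) = 0.2, so the two means are expected to agree closely even though single records may move by tens of years.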