angin-neville-snakdd2008

A Shrinkage Approach for Modeling Non-Stationary Relational Autocorrelation

Pelin Angin
Department of Computer Science
Purdue University
pangin@cs.purdue.edu

Jennifer Neville
Department of Computer Science and Statistics
Purdue University
neville@cs.purdue.edu

ABSTRACT

Recent research has shown that collective classification in relational data often exhibits significant performance gains over conventional approaches that classify instances individually. This is primarily due to the presence of autocorrelation in relational datasets, which means that the class labels of related entities are correlated and inferences about one instance can be used to improve inferences about linked instances. Statistical relational learning techniques exploit relational autocorrelation by modeling global autocorrelation dependencies under the assumption that the level of autocorrelation is stationary throughout the dataset. To date, there has been no work examining the appropriateness of this stationarity assumption. In this paper, we examine two real-world datasets and show that there is significant variance in the autocorrelation dependencies throughout the relational data graphs. To account for this, we develop a technique for modeling non-stationary autocorrelation in relational data. We compare against two baseline techniques, which model either the local or the global autocorrelation dependencies in isolation, and show that a shrinkage model results in significantly improved model accuracy.

1. INTRODUCTION

Relational data offer unique opportunities to boost model accuracy and to improve decision-making quality if the learning algorithms can effectively model the additional information the relationships provide. The power of relational data lies in combining intrinsic information about objects in isolation with information about related objects and the connections among those objects.
For example, relational information is often central to the task of fraud detection because fraud and malfeasance are social phenomena, communicated and encouraged by the presence of other individuals who also wish to commit fraud (e.g., [4]).

In particular, the presence of autocorrelation provides a strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency be-

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee....
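
As a rough illustration of the relational autocorrelation discussed in the abstract, the sketch below measures how strongly a binary class label is correlated across linked nodes in a toy graph. It is not the method of this paper: the function name `relational_autocorrelation`, the toy `labels` and `edges`, and the choice of Pearson correlation over linked pairs are all assumptions made for the example.

```python
def relational_autocorrelation(labels, edges):
    """Pearson correlation of a label across linked pairs (illustrative only).

    labels: dict mapping node -> 0/1 class label (hypothetical toy input)
    edges:  iterable of undirected (u, v) pairs
    """
    # Count each undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [labels[u], labels[v]]
        ys += [labels[v], labels[u]]
    n = len(xs)
    if n == 0:
        raise ValueError("graph has no edges")
    # By symmetry xs and ys have the same mean and variance,
    # so the Pearson coefficient reduces to covariance / variance.
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("labels are constant across linked nodes")
    return cov / var


# Toy graph (assumed data): two triangles with opposite labels, joined by one edge.
labels = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
print(relational_autocorrelation(labels, edges))
```

A value near 1 means linked nodes tend to share a label, and a value near -1 means they tend to differ. Computing the same quantity on different regions of a graph is one simple way to check whether the level of autocorrelation looks stationary, under the assumptions above.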
This note was uploaded on 11/12/2010 for the course CSCI 271 taught by Professor Wilczynski during the Spring '08 term at USC.
