
CSE-TR-546-08


Novel Methods in Information Retrieval

Vahed Qazvinian, Dragomir R. Radev
School of Information and Department of EECS
University of Michigan
Winter, 2008
{vahed,radev}@umich.edu

Contents

1 Introduction
2 Random Walks
   2.1 Preliminaries
   2.2 Random Walks and Electrical Circuits
   2.3 Random Walks on Graphs
3 Semi-supervised Learning
   3.1 Survey
      3.1.1 Co-training
      3.1.2 Graph-based Methods
   3.2 Semi-supervised Classification with Random Walks
   3.3 Graph-Based Semi-supervised Learning with Harmonic Functions
4 Evaluation in Information Retrieval
   4.1 Overview
      4.1.1 Binary Relevance, Set-based Evaluation
      4.1.2 Evaluation of Ranked Retrieval
      4.1.3 Non-binary Relevance Evaluation
   4.2 Assessing Agreement, Kappa Statistics
      4.2.1 An Example
   4.3 Statistical Testing of Retrieval Experiments
5 Blog Analysis
   5.1 Introduction
   5.2 Implicit Structure of Blogs
      5.2.1 iRank
   5.3 Information Diffusion in Blogs
   5.4 Bursty Evolution
      5.4.1 Time Graphs
      5.4.2 Method
6 Lexical Networks
   6.1 Introduction
      6.1.1 Small World of Human Languages
   6.2 Language Growth Model
   6.3 Complex Networks and Text Quality
      6.3.1 Text Quality
      6.3.2 Summary Evaluation
   6.4 Compositionality in Quantitative Semantics
7 Text Summarization
   7.1 LexRank
   7.2 Summarization of Medical Documents
   7.3 Summarization Evaluation
   7.4 MMR
8 Graph Mincuts
   8.1 Semi-supervised Learning
      8.1.1 Why This Works
   8.2 Randomized Mincuts
      8.2.1 Graph Construction
   8.3 Sentiment Analysis
   8.4 Energy Minimization
      8.4.1 Moves
9 Graph Learning I
   9.1 Summarization
   9.2 Cost-effective Outbreak
      9.2.1 Web Projections
   9.3 Co-clustering
10 Graph Learning II
   10.1 Dimensionality Reduction
   10.2 Semi-Supervised Learning
   10.3 Diffusion Kernels
      10.3.1 Reformulation
11 Sentiment Analysis I
   11.1 Introduction
   11.2 Unstructured Data
   11.3 Appraisal Expressions
   11.4 Blog Sentiment
   11.5 Online Product Review
12 Sentiment Analysis II
   12.1 Movie Sale Prediction
   12.2 Subjectivity Summarization
   12.3 Unsupervised Classification of Reviews
   12.4 Opinion Strength
13 Spectral Methods
   13.1 Spectral Partitioning
   13.2 Community Finding Using Eigenvectors
   13.3 Spectral Learning
   13.4 Transductive Learning
References

1 Introduction

Information retrieval (IR) is the science of searching for information within documents, relational databases, and the World Wide Web (WWW). In this work, we review some novel methods in IR theory. The report covers a number of state-of-the-art methods across a wide range of topics, with a focus on graph-based techniques in IR.

This report is based on the literature review done as a requirement of the Directed Study course, EECS 599, at the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. The first author would like to thank Prof. Dragomir R. Radev for his discussions, reviews, and useful suggestions in support of this work.

2 Random Walks

2.1 Preliminaries

A random walk is a mathematical formalization of successive movements in random directions. This analysis applies to computer science, physics, probability theory, economics, and a number of other fields. In general, the position of a random walker is determined by its previous state (position) and a random variable that determines the subsequent step length and direction. More formally,

\[ X(t + \tau) = X(t) + \Phi(\tau) \]

where $X(t)$ is the position of the random walker at time $t$ and $\Phi(\tau)$ is a random variable that determines the rule for taking the next step. Different categorizations of random walks have been proposed, based on whether they are discrete or continuous, biased or unbiased, one-dimensional or of higher dimension.
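As a minimal sketch of the update rule above, the following simulates a discrete one-dimensional walk in which each step is +1 with probability `p` and -1 otherwise. The function names, the unit step length, and the per-trial seeding are illustrative assumptions and do not come from the report.

```python
import random


def simulate_walk(steps, p=0.5, seed=0):
    """Return the positions X(0), X(1), ..., X(steps) of a 1-D walk.

    Each step adds +1 with probability p and -1 with probability 1 - p
    (an assumed unit step length).
    """
    rng = random.Random(seed)
    x = 0
    positions = [x]
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
        positions.append(x)
    return positions


def fraction_returned(trials, steps, p=0.5):
    """Estimate how often a walk revisits the origin within `steps` steps."""
    returned = 0
    for trial in range(trials):
        path = simulate_walk(steps, p, seed=trial)
        if 0 in path[1:]:  # skip the starting position itself
            returned += 1
    return returned / trials
```

With `p = 0.5` the estimated fraction approaches 1 as `steps` grows, while a strongly biased walk (for example `p = 1.0`) never revisits the origin.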
The simplest form of a random walk is a discrete one-dimensional walk in which $\Phi(\tau)$ is a Bernoulli random variable, with $p$ being the probability of a step in the positive direction and $1 - p$ the probability of a step in the negative direction. Pólya's theorem [63] indicates that a random walker on an infinite lattice in $d$-dimensional space is bound to return to the starting point when $d \le 2$, but has a positive probability of escaping to infinity without returning to the starting point when $d \ge 3$. A walk that is certain to return to its starting point is called recurrent, while a walk with a positive probability of escaping is called transient.

An infinite lattice can be approximated by a very large finite graph that fits inside it. Let $G_r$ be the subgraph of an infinite lattice in which no node has a lattice distance greater than $r$ from the origin. This means that no shortest path along the edges of the lattice that starts at the origin has length greater than $r$. Let $G^{(r)}$ be the sphere of radius $r$ about the origin (the points at distance exactly $r$ from the origin). $G^{(r)}$ looks like a square in a 2-dimensional lattice. Now consider a random walker that starts its walk from the origin. If $p_{esc}^{(r)}$ is the probability that the random walker reaches $G^{(r)}$ before returning to the origin, then the escape probability of the random walker is $\lim_{r \to \infty} p_{esc}^{(r)}$, where $p_{esc}^{(r)}$ decreases as $r$ increases. With this definition, if the limit is 0 the walk is recurrent; otherwise it is transient.

In the first part of this chapter we give an overview of the electrical approach to proving Pólya's theorem discussed in [28], and in the second part we discuss the work described by Aldous and Fill [5].

2.2 Random Walks and Electrical Circuits

One approach has been to interpret Pólya's theorem as a statement about electric networks, and then to prove the theorem from the electrical point of view [28].
To determine $p_{esc}^{(r)}$ electrically, the authors in [28] ground all the points of the sphere $G^{(r)}$, maintain the origin at one volt, and measure the current $i^{(r)}$ flowing into the circuit. They show that

\[ p_{esc}^{(r)} = \frac{i^{(r)}}{2d} \]

where $d$ is the dimension of the lattice. Since the voltage is 1, $i^{(r)}$ is the effective conductance between the origin and $G^{(r)}$:

\[ i^{(r)} = \frac{1}{R_{eff}^{(r)}} \]

so

\[ p_{esc}^{(r)} = \frac{1}{2d\,R_{eff}^{(r)}} \]

If the effective resistance from the origin to infinity is $R_{eff}$, then

\[ p_{esc} = \frac{1}{2d\,R_{eff}} \]

Thus if the effective resistance of the $d$-dimensional lattice is infinite, the escape probability is zero and the walk is recurrent. In other words, simple random walk on a graph is recurrent if and only if a corresponding network of 1-ohm resistors has infinite resistance out to infinity.

It is trivial that an infinite line of resistors has infinite resistance. Using this fact, Doyle and Snell [28] conclude that simple random walk on the 1-dimensional lattice is recurrent, which confirms Pólya's theorem.

For the two-dimensional case, they use the Shorting Law, under which shorting nodes together can only decrease resistance, to show that the resistance of the lattice network out to infinity is at least

\[ \sum_{n=1}^{\infty} \frac{1}{4n + 8} \]

which diverges. So in the two-dimensional case $R_{eff} = \infty$ and

\[ p_{esc} = \frac{1}{2d\,R_{eff}} = 0 \]

which again confirms that simple random walk on the 2-dimensional lattice is recurrent, as stated by Pólya's theorem.

For the three-dimensional lattice, denote by $p(a, b, c; n)$ the probability that the random walker is at $(a, b, c)$ at time $n$. Then $p(0, 0, 0; 0) = 1$ and

\[
\begin{aligned}
p(a, b, c; n) ={} & \tfrac{1}{6}\, p(a-1, b, c; n-1) + \tfrac{1}{6}\, p(a+1, b, c; n-1) \\
 &+ \tfrac{1}{6}\, p(a, b-1, c; n-1) + \tfrac{1}{6}\, p(a, b+1, c; n-1) \\
 &+ \tfrac{1}{6}\, p(a, b, c-1; n-1) + \tfrac{1}{6}\, p(a, b, c+1; n-1)
\end{aligned}
\]

Using the technique of generating functions they reach

\[ p(a, b, c; n) = \frac{1}{(2\pi)^3} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi} \left( \frac{\cos x + \cos y + \cos z}{3} \right)^{n} \cos(xa)\cos(yb)\cos(zc)\, dx\, dy\, dz \]

The expected number of visits to the origin is then obtained by setting $a = b = c = 0$ and summing over $n$:

\[ m = \frac{3}{(2\pi)^3} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi} \frac{dx\, dy\, dz}{3 - (\cos x + \cos y + \cos z)} \]

A closed-form evaluation of this integral in terms of gamma functions is given in [34]:

\[ m = \frac{\sqrt{6}}{32\pi^3}\, \Gamma\!\left(\tfrac{1}{24}\right) \Gamma\!\left(\tfrac{5}{24}\right) \Gamma\!\left(\tfrac{7}{24}\right) \Gamma\!\left(\tfrac{11}{24}\right) = 1.516386\ldots \]

where

\[ \Gamma(x) = \int_{0}^{\infty} e^{-t}\, t^{x-1}\, dt \]

If $u$ is the probability that a random walker, starting at the origin, will return to the origin, then the probability that the walker will be there exactly $k$ times (counting the initial time) is $u^{k-1}(1 - u)$. Thus, if $m$ is the expected number of times at which the walker is at the origin:

\[ m = \sum_{k=1}^{\infty} k\, u^{k-1} (1 - u) = \frac{1}{1 - u} \]

so

\[ u = \frac{m - 1}{m} \]

For this 3-dimensional lattice, the probability that a random walker starting at the origin returns to the origin is therefore

\[ u = 0.340537\ldots \]

which means that in the 3-dimensional lattice there is a non-zero probability that the random walker will not return to the origin and will escape.

2.3 Random Walks on Graphs

In this section, we mainly focus on the graph application of Markov chains [5]. A Markov chain is a stochastic process $\{X_n, n = 0, 1, 2, \ldots\}$ that takes on a finite or countable number of possible values. Whenever the process is in state $i$, there is a fixed probability $P_{ij}$ that it will be in state $j$ at the next step:

\[ P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij} \]

where

\[ P_{ij} \ge 0, \quad i, j \ge 0, \qquad \sum_{j=0}^{\infty} P_{ij} = 1 \]

$P$ is called the one-step transition probability matrix:

\[
P = \begin{pmatrix}
P_{00} & P_{01} & P_{02} & \cdots \\
P_{10} & P_{11} & P_{12} & \cdots \\
\vdots & \vdots & \vdots & \\
P_{i0} & P_{i1} & P_{i2} & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix}
\]

If $\pi$ is the stationary distribution of a finite irreducible discrete-time chain $(X_t)$, the chain is reversible if

\[ \pi_i\, p_{ij} = \pi_j\, p_{ji} \]

In fact, if $(X_t)$ is the stationary chain ($X_0$ has distribution $\pi$) then

\[ (X_0, X_1, \ldots, X_t) \stackrel{d}{=} (X_t, X_{t-1}, \ldots, X_0) \]

In the fourth section, the authors discuss coupling, which is a methodology for finding an upper bound on

\[ \bar{d}(t) := \max_{i,j} \left\| P_i(X_t \in \cdot) - P_j(X_t \in \cdot) \right\| \]
A coupling is a joint process $((X_t^{(i)}, X_t^{(j)}),\ t \ge 0)$ such that $X_t^{(i)}$ is distributed as the chain started at $i$, and $X_t^{(j)}$ is distributed as the chain started at $j$. The assumption is then made that there is a random time $T^{ij} < \infty$ such that

\[ X_t^{(i)} = X_t^{(j)}, \qquad T^{ij} \le t < \infty \]

and $T^{ij}$ is called the coupling time. The coupling inequality is

\[ \left\| P_i(X_t \in \cdot) - P_j(X_t \in \cdot) \right\| = \left\| P(X_t^{(i)} \in \cdot) - P(X_t^{(j)} \in \cdot) \right\| \le P(X_t^{(i)} \ne X_t^{(j)}) \le P(T^{ij} > t) \]

Calculations for random walks on large graphs can be done in two settings: first, when the graph is essentially 1-dimensional, and second, when the graph is highly symmetric. Aldous and Fill [5] further discuss trees, and show that on an $n$-vertex tree a random walk's stationary distribution is

\[ \pi_v = \frac{r_v}{2(n - 1)} \]

where $r_v$ is the degree of vertex $v$.

3 Semi-supervised Learning

3.1 Survey

Many semi-supervised learning methods have been proposed, including EM with generative mixture models, self-training, co-training, transductive support vector machines, and graph-based methods. There is no direct answer as to which one should be used or which is best. The reason is that labeled data is scarce and semi-supervised learning methods tend to make strong model assumptions, which depend heavily on the problem structure. The big picture is that semi-supervised learning methods use unlabeled data, which carries information about $p(x)$, to modify the hypotheses $p(y|x)$ derived from labeled data alone. In this section, unless otherwise noted, semi-supervised learning refers to semi-supervised classification, in which one has additional unlabeled data and the goal is classification.

3.1.1 Co-training

Co-training [55] is a semi-supervised learning technique which makes three preliminary assumptions:

1. The feature space can be split into two separate sets.
2. Each feature set is sufficient to train a good classifier.
3. The two sets are conditionally independent given the class.

Initially, each classifier is trained using the initial labeled data on the corresponding feature set. Then both classifiers classify all unlabeled data.
In each iteration, each classifier adds to the labeled data the $n$ negative and $p$ positive data points about which it is most confident, and returns the rest to the shared unlabeled data. Both classifiers then re-train using the new labeled data.

Input:
   F = (F_1, F_2, ..., F_m), m semantically different feature sets (which can be considered different views of the data)
   C = (C_1, C_2, ..., C_m), m supervised learning classifiers (each of which corresponds to a unique feature set)
   L, a set of labeled training data
   U, a set of unlabeled training data

The procedure of co-training is as follows.

   Train each classifier C_i initially on L with respect to F_i
   For each classifier C_i:
      C_i labels the unlabeled training data from U based on F_i
      C_i chooses the top p positive and top n negative labeled examples E from U according to the confidence of the prediction
      Remove E from U
      Add E into L with the corresponding labels predicted by C_i

More generally, we can define learning paradigms that use the agreement among different learners, in which multiple hypotheses are trained from the shared labeled data and are required to make similar predictions on the given unlabeled data. In general, multiview learning models do not require the assumptions made by co-training.

3.1.2 Graph-based Methods

In graph-based methods, the techniques are applied to graphs whose nodes are the labeled and unlabeled examples and whose edges encode the similarity of nodes. Graph-based methods estimate a function $f$ on the graph, where $f$ should satisfy two properties:

1. It should be close to the given labels $y_L$ on the labeled nodes (loss function).
2. It should be smooth on the whole graph (regularizer).

Most graph-based methods differ from one another only in their loss functions and regularizers. It is more important to construct a good graph than to choose among the different methods. We list some of the methods first and discuss graph construction afterwards.
Mincut. Blum and Chawla [14] represent the problem as a mincut problem in which positive labels act as sources and negative labels act as sinks. The goal is to find a minimum set of edges whose removal blocks all flow from the sources to the sinks. After that, all nodes connected to the sources are labeled positive, and all nodes connected to the sinks are labeled negative. The loss function is a quadratic loss with infinite weight,

\[ \infty \sum_{i \in L} (y_i - y_{i|L})^2 \]

which ensures that the values on the labeled data are fixed at their given labels. The regularizer is

\[ \frac{1}{2} \sum_{i,j} w_{ij}\, |y_i - y_j| = \frac{1}{2} \sum_{i,j} w_{ij}\, (y_i - y_j)^2 \]

The equality holds since $y$ takes binary values. Putting it all together, the goal of the mincut method is to minimize

\[ \infty \sum_{i \in L} (y_i - y_{i|L})^2 + \frac{1}{2} \sum_{i,j} w_{ij}\, (y_i - y_j)^2 \]

subject to the constraint $y_i \in \{0, 1\}$ for all $i$.

Gaussian Random Fields. Zhu et al. [74] introduce the Gaussian random fields and harmonic function method, which is a relaxation of discrete Markov random fields. It can be viewed as having a quadratic loss function with infinite weight, so that the labeled data are fixed at the given label values, and a regularizer based on the graph combinatorial Laplacian $\Delta$:

\[ \infty \sum_{i \in L} (f_i - y_i)^2 + \frac{1}{2} \sum_{i,j} w_{ij}\, (f_i - f_j)^2 = \infty \sum_{i \in L} (f_i - y_i)^2 + f^T \Delta f \]

Allowing $f_i \in \mathbb{R}$ is the key relaxation of mincut.

Local and Global Consistency. The method of local and global consistency [76] uses the loss function

\[ \sum_{i=1}^{n} (f_i - y_i)^2 \]

but in the regularizer it uses the normalized Laplacian $D^{-1/2} \Delta D^{-1/2}$. The regularizer is

\[ \frac{1}{2} \sum_{i,j} w_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^2 = f^T D^{-1/2} \Delta D^{-1/2} f \]

Local Learning Regularization. The solution of graph-based methods can often be viewed as local averaging. In other words, the solution $f(x_i)$ at an unlabeled point $x_i$ is the weighted average of its neighbors' solutions. Since most of the nodes are unlabeled, we do not require the solution to be exactly equal to the local average, but regularize $f$ so that the two are close. A more general approach is to generalize local averaging to a local linear fit.
To do so, one builds a local linear model from $x_i$'s neighbors to predict the value at $x_i$. The solution $f(x_i)$ is then regularized to be close to this predicted value.

Tree-Based Bayes. In this method, a tree $T$ is constructed with the labeled and unlabeled data as its leaf nodes, together with a mutation process. In the mutation process a label at the root propagates down to the leaves, and while moving down along edges a label mutates at a constant rate. This makes the tree uniquely define the probability distribution $P(X|Y)$ on discrete labelings $Y$. In fact, if two leaf nodes are closer in the tree, they have a higher probability of sharing the same label.

Although the graph is at the core of graph-based methods, its construction for this purpose has not been studied widely. Maier and Hein [47] propose a method to denoise point samples from a manifold. This can be considered a preprocessing step that yields a better, less noisy graph on which to do semi-supervised learning. Such preprocessing results in better semi-supervised classification accuracy.

3.2 Semi-supervised Classification with Random Walks

In this section we review a method of semi-supervised classification using random walks described in [68]. Szummer and Jaakkola [68] first create a weighted K-nearest-neighbor graph, with a given metric, and form the one-step transition probability matrix as

\[ p_{ik} = \frac{w_{ik}}{\sum_j w_{ij}} \]

if $i$ and $k$ are neighbor nodes, and $p_{ik} = 0$ otherwise. Two points are considered close if they have nearly the same distribution over the starting states. As $t \to \infty$, all the points become indistinguishable if the original neighbor graph is connected. If $P$ is the one-step transition probability matrix, then $P^t$ is the $t$-step transition probability matrix, which is row-stochastic, i.e. its rows sum to one. A point $k$, whether labeled or unlabeled, is interpreted as a sample from the $t$-step Markov random walk.
The posterior probability of the label for point $k$ is given by

\[ P_{post}(y \mid k) = \sum_i P(y \mid i)\, P^t_{ik} \]

The class is then the one that maximizes the posterior probability:

\[ c_k = \arg\max_c \{ P_{post}(y = c \mid k) \} \]

The authors then use two separate techniques to estimate the unknown parameters $P(y \mid i)$: maximum likelihood with EM, and maximum margin subject to constraints. They conclude that the Markov random walk representation provides a robust approach to classifying datasets with significant manifold structure and very few labeled data.

3.3 Graph-Based Semi-supervised Learning with Harmonic Functions

Most semi-supervised learning algorithms are based on the manifold structure and the assumption that similar examples should belong to the same class. The goal in graph classification is to learn a function $f : V \to \mathbb{R}$. For all labeled nodes $v_i$ we have $f(v_i) \in \{0, 1\}$. The desired situation is that close and similar nodes are assigned to the same class. Therefore, an energy function is defined as:

\[ E(f) = \frac{1}{2} \sum_{i,j} w_{ij}\, (f(i) - f(j))^2 \]

Solving the classification problem now reduces to minimizing the energy function. A solution $f$ to this minimization problem is a harmonic function with the following property:

\[ f(j) = \frac{1}{d_j} \sum_{i \sim j} w_{ij}\, f(i), \qquad j = l + 1, \ldots, l + u \]

The values of the labeled nodes are kept constant (0 or 1), while every other node takes the weighted average of its neighbors' values. The harmonic function is a real relaxation of the mincut function. Unlike mincut, whose solution is not unique (and whose multi-class generalization is NP-hard), the minimum-energy harmonic function is unique and efficiently computable. Although the harmonic function does not take discrete values, one can use domain knowledge or hard assignments to map from $f$ to labels.
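The averaging property above suggests a simple way to compute the harmonic function: hold the labeled values fixed and repeatedly replace each unlabeled value by the weighted average of its neighbors. The sketch below does this in plain Python. The dict-of-dicts graph representation, the function name, the 0.5 starting value, and the stopping tolerance are assumptions made for illustration; the cited work solves the same system in closed form with matrix operations. It also assumes every connected component contains at least one labeled node.

```python
def harmonic_labels(weights, labels, iterations=1000, tol=1e-9):
    """Iteratively compute a harmonic function on a weighted graph.

    weights: dict mapping node -> {neighbor: edge weight}, symmetric.
    labels:  dict mapping each labeled node -> 0 or 1 (kept fixed).
    Returns a dict mapping every node -> real value in [0, 1].
    """
    f = {node: float(labels.get(node, 0.5)) for node in weights}
    for _ in range(iterations):
        largest_change = 0.0
        for j, neighbors in weights.items():
            if j in labels:
                continue  # labeled nodes keep their given value
            degree = sum(neighbors.values())
            if degree == 0:
                continue  # isolated node: nothing to average
            new_value = sum(w * f[i] for i, w in neighbors.items()) / degree
            largest_change = max(largest_change, abs(new_value - f[j]))
            f[j] = new_value
        if largest_change < tol:
            break
    return f


# Hypothetical four-node path graph a - b - c - d with unit weights.
path_graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
values = harmonic_labels(path_graph, {"a": 1, "d": 0})
predicted = {node: int(value > 0.5) for node, value in values.items()}
```

On this path the values interpolate linearly between the two labeled endpoints, so thresholding at 0.5 assigns `b` to the positive class and `c` to the negative class.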
4 Evaluation in Information Retrieval

4.1 Overview

The first part of this study gives an overview of information retrieval (IR) evaluation in general. Concepts in this section mainly follow [49]. To evaluate the effectiveness of an IR system, one needs three things:

1. A document collection.
2. A set of queries.
3. A set of relevance judgments for each query (usually a binary judgment: relevant or not relevant).

A document is called relevant if it addresses the user's information need. The effectiveness of some IR systems depends on the values of a number of parameters. Manning et al. [49] add that it is wrong to report results on a test collection which were obtained by tuning these parameters to maximize performance on that collection.

4.1.1 Binary Relevance, Set-based Evaluation

The two basic measures for an IR system are precision and recall. Precision ($P$) is the fraction of retrieved documents that are relevant, and recall ($R$) is the fraction of relevant documents that are retrieved.

                 Retrieved              Not Retrieved
Relevant         true positive (tp)     false negative (fn)
Not Relevant     false positive (fp)    true negative (tn)

Table 1: Retrieval contingency table

According to Table 1, the precision and recall values are calculated as:

\[ P = \frac{tp}{tp + fp} \tag{1} \]

\[ R = \frac{tp}{tp + fn} \tag{2} \]

Another measure for evaluating an IR system is accuracy, the fraction of its classifications that are correct:

\[ \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{3} \]

In most cases the data is extremely skewed: normally over 99.9% of the documents are not relevant to the query [49]. A system can therefore score a high accuracy simply by deeming every document nonrelevant, while trying to find some relevant documents will result in a large number of false positives. A measure that trades off precision against recall is the F-measure:

\[ F = \frac{1}{\alpha / P + (1 - \alpha) / R} = \frac{(\beta^2 + 1)\, P R}{\beta^2 P + R}, \qquad \beta^2 = \frac{1 - \alpha}{\alpha} \tag{4} \]

Values of $\beta < 1$ emphasize precision, while values of $\beta > 1$ emphasize recall.
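The set-based measures of this section can be sketched directly from the contingency counts. The function names and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the counts in the usage lines are invented to show the effect of skewed data.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F) from contingency-table counts.

    Denominators of zero are mapped to 0.0 (an assumed convention).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    beta_sq = beta * beta
    f_measure = (beta_sq + 1) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_measure


def accuracy(tp, fp, fn, tn):
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Invented counts: 30 relevant retrieved, 10 nonrelevant retrieved,
# 20 relevant missed.
p, r, f1 = precision_recall_f(tp=30, fp=10, fn=20)

# A system that retrieves nothing on a skewed collection (1 relevant
# document among 1000) still has high accuracy but zero recall.
skewed_accuracy = accuracy(tp=0, fp=0, fn=1, tn=999)
_, skewed_recall, _ = precision_recall_f(tp=0, fp=0, fn=1)
```

The second usage shows why accuracy is a poor fit for retrieval: the do-nothing system scores 0.999 accuracy while finding no relevant documents.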
4.1.2 Evaluation of Ranked Retrieval

The measures mentioned in the previous section are set-based: they use an unordered set of retrieved documents to evaluate the IR system. The first approach to evaluating ranked retrieval is to draw the precision-recall plot. The precision-recall plot has a distinctive sawtooth shape: if the $(k + 1)$th document retrieved is nonrelevant, then recall is the same as for the top $k$ documents, but precision has dropped. If it is relevant, then both precision and recall increase and the curve moves up and to the right. Usually the interpolated precision ($P_{interp}$) is plotted. $P_{interp}$ at a certain recall level $r$ is the highest precision found at any recall level $r' \ge r$:

\[ P_{interp}(r) = \max_{r' \ge r} P(r') \tag{5} \]

The intuition behind this definition is that anyone would be willing to retrieve a few more documents if doing so increased precision. The 11-point interpolated average precision is a traditional approach and was used in the first 8 TREC Ad Hoc evaluations. The interpolated precision is measured at the 11 recall levels 0.0, 0.1, ..., 1.0.

Another standard measure in TREC is Mean Average Precision (MAP), which reports a single value for the effectiveness of an IR system. Average precision rewards returning relevant documents earlier. It is the average of the precision values obtained by cutting the ranked list after each relevant document retrieved in turn. The mean average precision is then the mean of the average precisions computed for each of the queries separately. If the set of relevant documents for a query $q_j \in Q$ is $\{d_1, \ldots, d_{m_j}\}$ and $R_{jk}$ is the set of ranked results from the top result down to document $d_k$, then

\[ \text{MAP}(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \text{Precision}(R_{jk}) \tag{6} \]

MAP is roughly the average area under the precision-recall curve for a set of queries.
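Equation (6) can be sketched in a few lines. The function names and the document identifiers below are made up for illustration, and the sketch assumes that a relevant document which is never retrieved contributes a precision of 0 to the average.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at the rank of each relevant document.

    ranked:   list of document ids in ranked order.
    relevant: set of document ids judged relevant for the query.
    Relevant documents missing from `ranked` contribute 0 (assumption).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision of the top `rank` results
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average_precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Two hypothetical queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Because each query's average precision is computed before averaging, every query carries equal weight in the final score regardless of how many relevant documents it has.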
Precision at $k$ measures the accuracy of an information retrieval system based on the first few pages of results, and does not require any estimate of the size of the set of relevant documents. An alternative is R-precision, which requires a set of known relevant documents of size $R$; precision is then calculated over the top $R$ documents returned.

4.1.3 Non-binary Relevance Evaluation

Normalized discounted cumulative gain (NDCG) [38] is an increasingly adopted measure designed for situations of non-binary relevance. Like precision at $k$, it is evaluated over the top $k$ search results. If $R(j, d)$ is the relevance score the annotators gave to document $d$ for query $j$, then

\[ \text{NDCG}(Q, k) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} Z_k \sum_{m=1}^{k} \frac{2^{R(j,m)} - 1}{\log(1 + m)} \tag{7} \]

$Z_k$ is a normalization factor calculated so that a perfect ranking's NDCG at $k$ is 1.

4.2 Assessing Agreement, Kappa Statistics

Once we have the set of documents and the set of queries, we need to collect relevance assessments. This task is quite expensive, and it requires that judges agree on the relevance of a document for each given query. The kappa statistic [19] is

\[ \kappa = \frac{\Pr(A) - \Pr(E)}{1 - \Pr(E)} \tag{8} \]

where $\Pr(A)$ is the relative observed agreement among judges, and $\Pr(E)$ is the probability that the judges agree by chance. This formula is used to calculate kappa when there are only two judges. When there are more than two relevance judgments for each query, Fleiss' kappa is calculated:

\[ \kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e} \tag{9} \]

$1 - \bar{P}_e$ is the maximum agreement attainable above chance, and $\bar{P} - \bar{P}_e$ is the agreement actually achieved above chance. A value of 1 means complete agreement, while a value of 0 means agreement no better than chance.

                           Judge 1 relevance
                           Yes     No      Total
Judge 2        Yes         300     10      310
relevance      No           20     70       90
               Total       320     80      400

Table 2: Example on $\kappa$ statistics
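Equation (8) can be sketched for the two-judge case using the counts of Table 2. The function name and argument order are illustrative assumptions, and the chance agreement is computed from marginals pooled across both judges, which is one common convention rather than the only one.

```python
def kappa_two_judges(both_yes, only_first_yes, only_second_yes, both_no):
    """Kappa for two judges making binary relevance judgments.

    Chance agreement uses marginals pooled over both judges (an assumed
    convention; per-judge marginals are an alternative).
    """
    n = both_yes + only_first_yes + only_second_yes + both_no
    p_agree = (both_yes + both_no) / n

    first_yes = both_yes + only_first_yes    # "yes" votes by judge 1
    second_yes = both_yes + only_second_yes  # "yes" votes by judge 2
    p_relevant = (first_yes + second_yes) / (2 * n)
    p_nonrelevant = 1 - p_relevant

    p_chance = p_relevant ** 2 + p_nonrelevant ** 2
    return (p_agree - p_chance) / (1 - p_chance)


# Counts from Table 2: 300 agreed "yes", 70 agreed "no",
# 20 where only judge 1 said "yes", 10 where only judge 2 said "yes".
kappa = kappa_two_judges(300, 20, 10, 70)
```

For these counts the observed agreement is 0.925 and the pooled chance agreement is about 0.665, giving a kappa of roughly 0.776.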
Krippendorff [41] argues that finding associations between two variables that both rely on coding schemes with $\kappa < 0.7$ is often impossible, and that content analysis researchers generally regard $\kappa > 0.8$ as good reliability, with $0.67 < \kappa < 0.8$ allowing provisional conclusions to be drawn. Kappa is widely accepted in the field of content analysis and is interpretable.

4.2.1 An Example

Table 2 shows an example from [49] of the judgments of two assessors.

\[
\begin{aligned}
P(A) &= (300 + 70)/400 = 370/400 = 0.925 \\
P(\text{nonrelevant}) &= (80 + 90)/(400 + 400) = 170/800 = 0.2125 \\
P(\text{relevant}) &= (320 + 310)/(400 + 400) = 630/800 = 0.7875 \\
P(E) &= P(\text{nonrelevant})^2 + P(\text{relevant})^2 = 0.2125^2 + 0.7875^2 = 0.665 \\
\kappa &= (P(A) - P(E))/(1 - P(E)) = (0.925 - 0.665)/(1 - 0.665) = 0.776
\end{aligned}
\]

4.3 Statistical Testing of Retrieval Experiments

An evaluation study is not complete without some measurement of the significance of the differences between retrieval methods. Hull [37] focuses on comparing two or more retrieval methods using statistical significance tests. The t-test compares the magnitude of the difference between methods to the variation among the differences between the scores for each query. If the average difference is large compared to its standard error, then the methods are significantly different. The t-test assumes that the error follows a normal distribution. Two nonparametric alternatives to the t-test are the paired Wilcoxon signed-rank test and the sign test.

Let $X_i$ and $Y_i$ be the scores of retrieval methods $X$ and $Y$ for query $i$, and let $D_i = Y_i - X_i = \theta + \epsilon_i$, where the $\epsilon_i$ are independent. The null hypothesis is $\theta = 0$.

Paired t-test:

\[ t = \frac{\bar{D}}{s(D_i) / \sqrt{n}} \]

Paired Wilcoxon test:

\[ T = \frac{\sum_i R_i}{\sqrt{\sum_i R_i^2}}, \qquad R_i = \operatorname{sign}(D_i) \cdot \operatorname{rank}|D_i| \]

Sign test:

\[ T = \frac{2 \sum_i I[D_i > 0] - n}{\sqrt{n}} \]

where $I[D_i > 0] = 1$ if $D_i > 0$ and 0 otherwise.

5 Blog Analysis

5.1 Introduction

Nowadays weblogs (a.k.a. blogs) play an important role in social interactions. People who maintain and update blogs, so-called bloggers, engage in a series of interactions and interconnections with other people [12].
Each blog usually has a fairly constant number of regular readers. These readers may link to the blog or comment on its posts, which is said to be a motive for future postings [69]. Although the term blog was not coined until 1997 [2], today many people maintain blogs of their own and update them regularly, writing about their feelings, thoughts, and anything else they desire. The role of blogs as a social network is clear, and so many research projects are devoted to analyzing relations in blogs, from political conclusions [2] to finding mutual awareness and communities [45]. There are two reasons why blogs are systematically studied [42]:

1. Sociological reasons: the cultural difference between the blogspace and regular webpages is that the blogspace focuses heavily on local community interactions between a small number of bloggers. There is also a sequence of responses when hot topics arise. This makes it necessary to see whether these dynamics can be explained and modeled.
2. Technical reasons: the blogspace provides timestamps for blog posts and comments, which makes it easier to analyze this space over time. Analysis of the bursty evolution in blogs concentrates on bursts in the evolution of the blogspace.

5.2 Implicit Structure of Blogs

Blogs are well suited to tracking memes, as they are constantly used and have an underlying structure. Blogs are widely used to record diaries and are easy to update, which makes online document growth faster. In addition, the hyperlinks between bloggers form a network structure in the blogspace, which makes it a wonderful test bench for analyzing information dynamics in networks. The microscopic patterns of blog epidemics are split into implicit and explicit. Blog epidemic studies try to address two questions: Do different types of information have different epidemic profiles? Do different types of epidemic profiles group similar types of information?
The main contributions of [3] are a characterization of general categories of information epidemics and a tool to infer and visualize the paths of specific infections in the blog network. The microscale dynamics of blogs are also studied in [3], where the authors specify two major factors to consider when addressing epidemics in blogs: timing and graph. A few problems have to be considered in this analysis. First, the root is not always clearly identified. Second, there might be multiple possibilities for the infection of a single node. As an example, assume A, B, and C are blogs, B links to A, and C links to both A and B. C might be infected either by B (after B is infected by A) or directly by A. Third, there is uncrawled space, and the whole blogspace structure is not known to the analyst.

Explicit analysis is easy and can be done using the link structure. It is usually hard, however, to analyze the implicit structure of blogs. This is known as the link inference problem. Possible approaches use machine learning algorithms such as support vector machines or logistic regression. The full text of blogs, blogs in common, links in common, and the history of infection are used as available sources of information. One experiment that Adar et al. [3] carry out is to find pairs that are infected and linked, and pairs that are infected but unlinked. Although in the general case the machine learning approach shows more than 90 percent accuracy on the classification task for both experiments, it does not work well in the specific case, in which for a given epidemic it connects all blogs. Figure 1 is taken from [3] and shows the distribution of the similarity of pairs in their links to other blogs.

Figure 1: The results on the similarity of pairs in [3]

5.2.1 iRank

Adar et al. [3] conclude with a description of a new ranking algorithm for blogs, iRank. Unlike traditional ranking algorithms, iRank uses the implicit link structure to find the blogs that initiate epidemics.
It focuses on finding the true information sources in a network. If n blogs point to one blog, and that blog points to another single information resource, the latter might be the true source of the information. The iRank algorithm first draws a weighted edge for every pair of blogs that cite the same URL u:

$$W^u_{ij} = W(d_{ij})$$

The weights are then normalized so that the outgoing weights of each node sum to one, and finally PageRank is run on the weighted graph.

5.3 Information Diffusion in Blogs

The dynamics of information in the blogosphere is studied in [36] and is the main focus of this section. Over the past two decades there has been increasing interest in observing information flows, as well as in influencing and creating these flows. Information diffusion is characterized along two dimensions:

Topics: This includes identifying a set of postings that are about the same topic, and then characterizing the different patterns into which the set of postings may fall. Topics, according to [36], are unions of chatter and spikes. The former refers to discussions whose subtopic flow is determined by the authors' decisions. The latter refers to short-term, high-intensity discussions of real-world events.

Individuals: Individual behavior in publishing online diaries differs dramatically. [36] characterizes four categories of individuals based on their typical posting behavior within the life cycle of a topic.

The corpus was collected by crawling 11,804 RSS blog feeds. Gruhl et al. [36] collected 2K-10K blog postings per day, for a total of 401,021 postings in their data set. Based on their manual analysis of 12K individual words highly ranked by tf-idf, they decompose topics along two axes: chatter (internally driven, sustained discussion) and spikes (externally induced sharp rises in postings). Depending on the average chatter level, topics can be placed into one of the following three categories:

1. Just Spike
2. Spiky Chatter (topics that have a significant chatter level but are also highly sensitive to external events)
3. Mostly Chatter

The model Topic = Chatter + Spikes is further refined to see whether spikes themselves are decomposable. For each proper noun x they compute the support s (the number of times x co-occurred with the target) and the reverse confidence $c_r = P(\text{target} \mid x)$. To generate a reasonable term set, thresholds on s and $c_r$ are used. For these terms they look at the co-occurrences and define spikes as areas where the number of posts in a given day exceeds a certain number. The distribution of the non-spike daily average is approximated by

$$\Pr(\text{avg. number of posts per day} > x) \approx c\,e^{-x}$$

They also observe that most spikes in the manually labeled chatter topics last about 5-10 days.

Table 3 shows the distribution of the number of posts by user. The authors of [36] try to locate users whose posts fall in the categories shown in Table 3. Each post on day i has a probability $p_i$ of falling into a given category. They gather all blog posts that contain a particular topic into a list $[(u_1, t_1), (u_2, t_2), \ldots, (u_k, t_k)]$ sorted by publication date, in which $u_i$ is the ID of blog i and $t_i$ is the first time at which blog $u_i$ contained a reference to the topic. The aim is then to induce the relevant edges among a candidate set of $\Theta(n^2)$ edges.

Table 3: Distribution of the number of posts by user

Predicate | Algorithm | Region | % of Topics
RampUp | All days in the first 20% of post mass below mean, and average day during this period below $\mu - \sigma/2$ | First 20% of post mass | 3.7%
Ramp-Down | All days in the last 20% of post mass below mean, and average day during this period below $\mu - \sigma/2$ | Last 20% of post mass | 5.1%
Mid-High | All days during the middle 25% of post mass above mean, and average day during this period above $\mu + \sigma/2$ | Middle 25% of post mass | 9.4%
Spike | For some day, the number of posts exceeds $\mu + \sigma/2$ | From spike to inflection point below $\mu$, in both directions | 18.2%

If a appears in the traversal sequence and b does not appear later in the same sequence, this provides valuable information about the pair (a, b): if b were a regular reader of a, then memes discussed by a should sometimes appear in b. The authors present an EM-like algorithm to induce the parameters of the transmission graph, in which they first compute soft assignments for each new edge infection, and then update the edge parameters to increase the likelihood of the assigned infections. They estimate the copy probability $\kappa$ and the inverse mean propagation delay r as

$$r = \frac{\sum_{j \in S_1} p_j}{\sum_{j \in S_1} p_j \delta_j} \qquad \text{and} \qquad \kappa = \frac{\sum_{j \in S_1} p_j}{\sum_{j \in S_1 \cup S_2} \Pr(r \le \delta_j)}$$

where $\Pr(a \le b) = (1 - a)(1 - (a - 1)^b)$ is the probability that a geometric distribution with parameter a has value at most b.

5.4 Bursty Evolution

The study of the evolution of blogs is tightly coupled with the notion of a time graph. There is a clear difference between the community structure of the web and that of blogs. Blog communities are usually formed as a result of a debate over time, which leaves a number of hyperlinks in the blogspace. Therefore, the community structure in blogs should be studied over short intervals, since heavy linkage within a short period of time is not significant when analyzed over a long time span.

5.4.1 Time Graphs

A time graph G = (V, E) consists of

1. A set V of nodes, with each node v associated with a time interval D(v) (its duration).
2. A set E of edges, where each edge is defined as a triple e = (u, v, t) in which u and v are nodes and t is a point in the time interval $D(u) \cap D(v)$.

5.4.2 Method

The algorithm in [42] consists of two steps: pruning and expansion. The pruning step first removes nodes with degree zero and one, and then checks all nodes with degree 2 to see whether their neighbors are connected and form triangles. In that case, they are passed to the expansion procedure (i.e., growing the seed into a dense subgraph to find a dense community)
and the resulting expansion is reported as a community and removed from the graph.

The study of the evolution of the three largest strongly connected components (SCCs) in blogs shows that, at the beginning of the study period, the number of blog pages is significant but there is no strongly connected component of more than a few nodes. Around the beginning of the second year, a few components representing 1-2% of the nodes in the graph appear, and they maintain the same relative size for the next year. In the fourth year, however, the size of the component increases to over 20% by the present day. The giant component still appears to be expanding rapidly, doubling in size approximately every three months. Results in [42] also indicate that the number of nodes in the communities of a randomized blogspace is an order of magnitude smaller than in the real blogspace. This shows that community formation in blogspace is not merely a property of the growth of the graph. In addition, the SCC in the randomized blogspace grows much faster than in the original blogspace. This means that the striking community structure in blogspace results from links that are indeed driven by topicality.

6 Lexical Networks

6.1 Introduction

One of the major differences between humans and other species is that human beings use languages of a kind not shared by any other species. Human language allows the construction of a virtually infinite range of combinations from a limited set of basic units. We are able to produce words and form sentences rapidly and in a highly reliable fashion. The generation and evolution of languages lead to the creation of new language entities. In this chapter we look at some previous work on lexical networks. A lexical network is a complete weighted graph in which vertices represent linguistic entities, such as words or documents, and edge weights express a relationship between two nodes.
Three main classes of lexical networks are word-based, sentence-based, and document-based networks. Word-based lexical networks can be further divided into networks based on co-occurrence, syllables, semantic networks, and synthetic networks.

6.1.1 Small World of Human Languages

The small-world property [72] of language networks has been analyzed before [67]. Ferrer i Cancho et al. [32] showed that the syntactic network of heads and their modifiers exhibits small-world properties. The small-world property is also shown for co-occurrence networks by [31]. A co-occurrence network is a network with words as nodes, in which an edge appears between two nodes if the corresponding words appear in the same sentence. Many co-occurrences are due to syntactic relationships between words or to dependency relationships [53]. In this section we take a deeper look at the work in [31]. They set the maximum distance at which two words are assumed to co-occur to the minimum distance at which most co-occurrences are likely to happen:

- Many co-occurrences take place at a distance of one.
- Many co-occurrences take place at a distance of two.

It has also been shown that co-occurrences happen at distances greater than two [22]. However, for four reasons they decide to consider only a maximum distance of two for a co-occurrence:

1. No system is available that handles distances greater than two, and all achievements in computational linguistics are based on a maximum distance of two.
2. The method fails to capture exact relationships at distances greater than two.
3. To make the procedure as automatic as possible, they stick with distance two and try to collect as many links as possible.
4. Long-distance syntactic dependencies imply smaller syntactic links.

The authors consider the graph of human language, $\Omega_L$, defined by $\Omega_L = (W_L, E_L)$, where $W_L = \{w_i\}$ $(i = 1, \ldots, N_L)$ is the set of $N_L$ words and $E_L = \{\{w_i, w_j\}\}$ is the set of edges or connections between words.
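The construction of the edge set described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming sentences are already tokenized into lists of lower-cased words; the function names `cooccurrence_edges` and `degrees` are hypothetical and do not come from [31].

```python
from collections import defaultdict


def cooccurrence_edges(sentences, max_distance=2):
    """Return the set of undirected edges {w_i, w_j} linking two distinct
    words that occur at most `max_distance` positions apart in a sentence."""
    edges = set()
    for words in sentences:
        for i, word in enumerate(words):
            # Look ahead at the next `max_distance` positions only.
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                if word != words[j]:
                    edges.add(frozenset((word, words[j])))
    return edges


def degrees(edges):
    """Degree of every word, i.e. the number of distinct neighbours."""
    deg = defaultdict(int)
    for edge in edges:
        for word in edge:
            deg[word] += 1
    return dict(deg)


if __name__ == "__main__":
    sentences = [["the", "lions", "stay", "in", "the", "garden"]]
    edges = cooccurrence_edges(sentences)
    print(len(edges), degrees(edges))
```

Storing each edge as a `frozenset` makes the graph undirected and removes duplicate links, so the degree of a word counts distinct neighbours rather than co-occurrence events.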
They call the biggest connected components of the networks that result from the basic and the improved method, respectively, the unrestricted word network (UWN) and the restricted word network (RWN). They analyze the degree distribution of the unrestricted and the restricted word network for about three quarters of the $10^7$ words of the British National Corpus (http://info.ox.ac.uk/bnc/). The authors further show that the degree increases as a function of frequency, with exponent 0.80 for the first segment and 0.66 for the second. In summary, they show that the graph connecting words in language exhibits the same statistical features as other complex networks. They observe short distances between words, arising from the small-world structure. They conclude that language evolution might have involved the selection of a particular arrangement of connections between words.

6.2 Language Growth Model

In this section we summarize the work by Dorogovtsev and Mendes [27], which describes the evolution of language networks. Human language can be described as a network of linked words, in which neighboring words in sentences are connected by edges. Dorogovtsev and Mendes [27] treat language as a self-organizing network in which nodes interact with each other and the network grows. They describe the network growth with the following rules:

- At each time step, a new vertex (word) is added to the network.
- At its birth, a new node connects to several old nodes, whose number is of the order of 1. By convention it is assumed that a new node attaches to only one node, chosen with probability proportional to its degree.
- At each time increment t, ct edges emerge in the network between old words, where c is a constant. New edges connect nodes with probability proportional to their degree.
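The growth rules above can be simulated directly. The following Python sketch is an illustration under stated assumptions, not the procedure of [27]: the seed graph (two words joined by one edge), the rounding of ct down to an integer, the order of the two rules within a step, and the function name `grow_word_web` are all choices made here for the example.

```python
import random


def grow_word_web(steps, c, seed=0):
    """Simulate the growth rules for `steps` time steps.

    Returns (degree, num_edges), where degree[v] is the degree of word v.
    Preferential attachment is done by sampling from `stubs`, a list in
    which every node appears once per incident edge end.
    """
    rng = random.Random(seed)
    degree = [1, 1]   # assumed seed graph: two words joined by one edge
    stubs = [0, 1]
    num_edges = 1

    def add_edge(u, v):
        nonlocal num_edges
        degree[u] += 1
        degree[v] += 1
        stubs.extend((u, v))
        num_edges += 1

    for t in range(1, steps + 1):
        # int(c * t) new edges between old words, both ends chosen with
        # probability proportional to degree (multi-edges are allowed here).
        for _ in range(int(c * t)):
            u = rng.choice(stubs)
            v = rng.choice(stubs)
            while v == u:
                v = rng.choice(stubs)
            add_edge(u, v)
        # The new word attaches to one old word, chosen proportionally to degree.
        target = rng.choice(stubs)
        degree.append(0)
        add_edge(len(degree) - 1, target)
    return degree, num_edges


if __name__ == "__main__":
    degree, num_edges = grow_word_web(steps=200, c=0.05)
    print(len(degree), num_edges, max(degree))
```

Under these assumptions the total degree after t steps grows roughly like $2t + ct^2$, since each step contributes one attachment edge and about ct edges between old words.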
In the continuous approximation, the degree of a vertex born at time i and observed at time t is described by

$$\frac{\partial k(i, t)}{\partial t} = (1 + 2ct)\,\frac{k(i, t)}{\int_0^t k(u, t)\,du}$$

The total degree of the network at time t is

$$\int_0^t k(u, t)\,du = 2t + ct^2$$

Solving for k(i, t) with the initial condition k(t, t) = 1 results in

$$k(i, t) = \left(\frac{ct}{ci}\right)^{1/2} \left(\frac{2 + ct}{2 + ci}\right)^{3/2}$$

The nonstationary degree distribution is then

$$P(k, t) = \frac{1}{ct}\,\frac{ci(2 + ci)}{1 + 2ci}\,\frac{1}{k}$$

This results in a power-law degree distribution with two different regions, intersecting at

$$k_{cross} \approx \sqrt{ct}\,(2 + ct)^{3/2}$$

Below this point, the degree distribution is stationary:

$$P(k) \cong \frac{1}{2}\,k^{-3/2}$$

Above the crossover point they obtain

$$P(k, t) \cong \frac{1}{4}\,(ct)^3\,k^{-3}$$

which is a nonstationary degree distribution in this region.

6.3 Complex Networks and Text Quality

6.3.1 Text Quality

The first work at which we take a closer look is the text assessment method described by Antiqueira et al. [6], who approach the text assessment problem using the concept of complex networks. In their work they model a text as a complex network in which each of the N words is represented as a node, and each connection between words as a weighted edge between the respective nodes. They also make use of a list of stop words to eliminate terms not associated with concepts. They define two measures, instrength and outstrength, corresponding to weighted indegree and outdegree respectively:

$$K_{in}(i) = \sum_{j=1}^{N} W(j, i) \qquad \text{and} \qquad K_{out}(i) = \sum_{j=1}^{N} W(i, j)$$

Antiqueira et al. [6] reach several conclusions:

1. The quality of the text tends to decrease with outdegree.
2. Quality is almost independent of (increases only slightly with) the outdegree for good-quality texts, while for low-quality texts the quality increases with the outdegree.
3. The quality of the text decreases with the clustering coefficient.
4. In low-quality texts there is much higher variability in the clustering coefficients, which also tend to be high.
5. Quality tends to increase with the size of the minimum path for all three definitions of path used in their work.

6.3.2 Summary Evaluation

A summary evaluation method based on complex networks is discussed in [62]. The first step in defining a measure to evaluate summaries using complex network concepts is to represent summaries as complex networks. In this representation the terms of the summary are the nodes of the network, and each association is determined by a simple adjacency relation: there is an edge between every pair of adjacent words in the summary. The weight of an edge is the number of times the two terms are adjacent in the summary. By modeling summaries as complex networks and introducing a network metric, the authors show that it is possible to evaluate different automatic summaries. Their measure is based on the number of strongly connected components in the network. More formally, they define a deviation measure:

$$\text{deviation} = \frac{\sum_{M=1}^{A} |f(M) - g(M)| / N}{A}$$

where f(M) is the function that determines the number of components for M word associations and g(M) is the function that determines the linear variation of components for M word associations. In this measure N is the number of different words in the text and A is the total number of word associations found. The measure is assessed by using it to evaluate three automatic summarizers, GistSumm [61], GEI [60], and SuPor [56], against a manual summary.

6.4 Compositionality in Quantitative Semantics

The last article that we review in this chapter is on compositionality in quantitative semantics [52]. The authors introduce a principle of latent compositionality into quantitative semantics. They implement a Hierarchical Constraint Satisfaction Problem (HCSP) as the fundamental text representation model. This implementation utilizes an extended model of semantic spaces which is sensitive to text structure and thus leaves behind the bag-of-features approach.
A simple example shows how different models behave when interpreting the text "All sculptures are restored. Only the lions stay in the garden."

Vector Space (VS): The representation of this text in the vector space model is a bag of words from which stop words are filtered out. The important words (garden, lion, restore, sculpture, stay) are used to build a vector of weighted terms that represents the text in the model.

Latent Semantic Analysis (LSA): LSA extends the previous approach. Unlike the VS model, it does not rely on a weighted bag of words but utilizes factor-analytic representations. In this model, the sample is represented by the strongest factors that locate the representations of the input words garden, lion, restore, sculpture, stay within the semantic space. This can therefore be regarded as a bag-of-feature-vectors representation, but it still ignores the structure of the text.

Fuzzy Linguistics (FL): FL derives a representation of the text based on its integration hierarchy. This model expects the text as a whole to suggest words like sculptural art, park, collection, but not veldt, elephant, or zoo. This analysis assumes the present order of the sentences. It does not allow reinterpreting lion if the order of the sentences is reversed. Moreover, the example presupposes that all coherence relations as well as the integration hierarchy have been identified beforehand.

The major contribution of [52] is, however, a formalization of the principle of compositionality in terms of semantic spaces. In particular, they try to fill the gap of the missing linguistic interpretability of statistical meaning representations.

7 Text Summarization

This chapter covers some of the salient previous work on text summarization.

7.1 LexRank

LexRank is a stochastic graph-based method to determine the lexical centrality of documents within a cluster, and is introduced in [30]. The network on which LexRank is applied is a complete weighted network of linguistic entities.
These entities can be sentences or documents. In its application to summarization, LexRank is used to find the centrality of sentences within a text document. To form the network, LexRank utilizes tf-idf term vectors and the so-called bag-of-words representation. This enables LexRank to use the cosine similarity of documents as edge weights to build the entire network. The cosine similarity of two document vectors is calculated as

$$Sim(\vec{d_i}, \vec{d_j}) = \frac{\vec{d_i} \cdot \vec{d_j}}{|\vec{d_i}|\,|\vec{d_j}|}$$

In this representation of documents, $\vec{d_i} = (w_{1i}, w_{2i}, \ldots, w_{Vi})$, where V is the size of the vocabulary and

$$w_{ji} = tf_{ji} \cdot idf_j$$

Here $tf_{ji}$ is the normalized term frequency of term j in document i, and $idf_j$ is the inverse document frequency of term j. The centrality of a node u in this network is related to its neighbors' centralities and is calculated as

$$p(u) = \frac{d}{N} + (1 - d) \sum_{v \in adj[u]} \frac{Sim(u, v)}{\sum_{z \in adj[v]} Sim(z, v)}\,p(v)$$

where d is the damping factor that ensures convergence of the values. LexRank is evaluated on 30 documents of DUC 2003 and 50 documents of DUC 2004 using the ROUGE evaluation system (footnote 1).

1 http://www.isi.edu/~cyl/ROUGE

7.2 Summarization of Medical Documents

In this section we review the work in [4], whose main aim is to survey recent work on the summarization of medical documents. One of the major problems of the medical domain is that it suffers particularly from information overload. This is important to address, as physicians need to be able to access up-to-date information quickly and efficiently according to their particular needs. The need for automatic summarization is even more crucial in this field because of the number and diversity of medical information sources. Summarizers help medical researchers determine the main points of a document as quickly as possible. A number of factors have to be taken into consideration when developing a summarization system:

Input: One major factor is the type of the input to such a system.
Input can be single-document or multi-document. Additionally, the input language should be taken into account, which results in three different types of summarization systems: monolingual, multilingual, or cross-lingual. The third input factor is the medium of the input, which can be anything from plain text to images and sound.

Purpose: These factors concern the possible uses of the summary and its potential readers. A summary can be indicative or informative. The former does not claim to substitute for the source documents, while the latter substitutes for the original documents. In another categorization, a summary can be generic or user-oriented. Generic systems create a summary of a document or a set of documents based on all the information found in the documents. User-oriented systems create outputs that are relevant to a given input query. From another perspective, a summary can be either general-purpose or domain-specific.

Output: The evaluation type of an automatic summarization system determines its type of output, which can be either qualitative or quantitative. The last, yet major, factor in creating a summary is the relation that the summary has to the source document. A summary can be an abstract or an extract. Extracts include material (i.e., sentences, paragraphs, or even phrases) from the source documents. An abstract, on the other hand, tries to identify the most salient concepts in the source documents and then utilizes natural language processing techniques to present them neatly.

7.3 Summarization Evaluation

The existing evaluation metrics can be split into two categories: intrinsic and extrinsic. An intrinsic method evaluates the system summary independently of the purpose that the summary is supposed to satisfy. An extrinsic evaluation, on the other hand, evaluates the produced summary in terms of a specific task.
The quality measures used by intrinsic methods can be the integrity of the summary's sentences, the existence of anaphors, the readability of the summary, and the fidelity of the summary compared to the source documents. Comparison against a gold summary is another way of doing an intrinsic evaluation. Gold summaries are ideal human-made summaries that can be compared to system summaries. However, it is usually hard to make annotators agree on what constitutes a gold summary. Tables 4 and 5 summarize some of the extractive and abstractive methods of text summarization, respectively.

Table 4: Summarizing systems with extractive techniques
Columns: Input, Purpose, Output, Method, Evaluation, ref.
Input: Single-document, English, text; Single-document, English, text; Single-document, multilingual, text; Single-document, multilingual, text; Multi-document, multilingual, text (English, Chinese); Single-document, English, text; Multi-document, English, text; Single-document, English, text
Purpose: Generic, domain-specific (technical papers); Generic, domain-specific (scientific articles on specific topics); Generic, domain-specific (news); User-oriented, domain-specific (scientific and technical texts); Generic, domain-specific (news); Generic, general purpose; User-oriented, general purpose; Generic, domain-specific (scientific articles)
Output: Sentences; Sentences; Sentences; Sentences; Sentences; Paragraphs; Text regions (sentences, paragraphs, sections); Sentences
Method: Statistics (Edmundsonian paradigm), no revision; Use of thematic keywords, no revision, Statistics (Edmundsonian paradigm); Statistics (Edmundsonian paradigm), no revision; Statistics (Edmundsonian paradigm), use of thesauri, revision; Language processing (to identify keywords); Graph-based, statistics (cosine similarity, vector space model); Graph-based, cohesion relations, language processing; Tree-based, language processing (to identify the RST relation markers)
Evaluation: Intrinsic; Intrinsic; Extrinsic; Intrinsic; Intrinsic, extrinsic; Intrinsic, extrinsic
ref.: [29]; [46]; [24]; [1]; [20]; [66]; [48]; [50, 51]

Table 5: Summarizing systems with abstractive techniques
Columns: Input, Purpose, Output, Method, Evaluation, ref.
Input: Single-document, English, text; Multi-document, English, text; Single-document, English, text; Single-document, English, text; Single-document, multilingual, text
Purpose: Informative, user-oriented, domain-specific; Informative, user-oriented, domain-specific; Generic, domain-specific (news articles); Informative, user-oriented, domain-specific
Output: Scripts; Templates; Clusters; Ontology-based representation; Conceptual representation in UNL
Method: Script activation, canned generation; Information extraction, NLG; Syntactic processing of representative sentences, NLG; Syntactic processing of representative sentences, ontology-based annotation, NLG; Statistics (for scoring each UNL sentence), removing redundant words, combining sentences
Evaluation: Evaluation of system components; Intrinsic; Extrinsic
ref.: [25]; [64]; [10]; [65]; [27]

7.4 MMR

In this section we review the MMR method, a diversity-based re-ranking algorithm that is used to produce summaries. Conventional IR methods maximize the relevance of the retrieved documents to the query. This is, however, tricky when there are many potentially relevant documents with partial information overlap. Maximal Marginal Relevance (MMR) is a user-tunable re-ranking method that can either drill down on a narrow topic or retrieve a broad range of relevance-bearing documents. The criterion considered in MMR is relevant novelty. The first way to obtain it is to measure novelty and relevance independently, and then to use a linear combination of the two metrics as a novelty-relevance measure. This linear combination is called the marginal relevance. A document therefore has a high marginal relevance if it is both relevant and has minimal information overlap with the previously selected documents.

$$MMR = \arg\max_{D_i \in R \setminus S} \left[ \lambda\,Sim_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} Sim_2(D_i, D_j) \right]$$

with Q being the query, R the set of retrieved documents, and S the subset of R that has already been selected. This can be used specifically in summarization. In that case, the top-ranked passages of a document are chosen for inclusion in the summary. This kind of summarization is shown to work better for longer documents (which contain more passage redundancy across document sections) [18]. MMR is also extremely useful for extracting passages from multiple documents that are about the same topic.

8 Graph Mincuts

This chapter reviews some of the salient previous work on using graph mincuts to solve machine learning problems.
8.1 Semi-supervised Learning

A major issue in learning algorithms is the lack of sufficient labeled data. Learning methods are commonly used to classify text, web pages, and images, and they need sufficiently large annotated corpora. Unlike labeled data, whose creation is quite tedious, unlabeled data is easy to gather. For example, in a classification problem one can easily access a large number of unlabeled text documents, while the number of manually labeled documents is usually small. This has led to great interest in semi-supervised methods.

The first paper that we review uses graph mincuts to classify data points using both labeled and unlabeled data [15]. Given a combination of labeled and unlabeled datasets, Blum and Chawla [15] construct a graph on the examples such that the minimum cut of the graph yields an optimal binary labeling of the data according to a predefined optimization function. In particular, the goal in [15] is to

1. Find the global optimum of the objective function, which is better than a local optimum.
2. Use this method to obtain a significant difference in terms of prediction accuracy.

Blum and Chawla [15] also assume that the unlabeled data is drawn from the same distribution as the labeled data. The classification algorithm described in [15] has five steps:

1. Construct a weighted graph G = (V, E), where V is the set of all data points together with a sink $v_-$ and a source $v_+$. The sink and the source are the classification nodes.
2. Connect the classification nodes to the labeled examples that have the same label, with edges of infinite weight. That is, for every positively labeled example v add $e(v_+, v) = \infty$, and for every negatively labeled example v add $e(v_-, v) = \infty$.
3. Assign weights between example vertices using a similarity/distance function.
4. Determine the min-cut of the graph. This means finding a set of edges of minimum total weight whose removal disconnects $v_+$ from $v_-$.
The removal of the edges on the cut splits the graph into two parts, $V_+$ and $V_-$.
5. Assign positive labels to all nodes in $V_+$ and negative labels to all nodes in $V_-$.

8.1.1 Why This Works

The correctness of this algorithm depends on the choice of the weight function. For certain learning algorithms, we can define edge weights so that the mincut algorithm produces a labeling that minimizes the leave-one-out cross-validation error when applied to the entire set $L \cup U$. Additionally, for certain learning algorithms, we can define the edge weights so that the labeling produced by the mincut algorithm results in zero leave-one-out cross-validation error on L. This follows from the results below (see [15] for proofs):

- If we define edge weights between examples so that for each pair of nodes x, y we have $nn_{xy} = 1$ if y is the nearest neighbor of x and $nn_{xy} = 0$ otherwise, and define $w(x, y) = nn_{xy} + nn_{yx}$, then for any binary labeling of the examples the cost of the associated cut is equal to the number of leave-one-out cross-validation mistakes made by 1-nearest neighbor on $L \cup U$.
- Given a locally weighted averaging algorithm, we can define edge weights so that the minimum cut yields a labeling that minimizes the $L_1$-norm leave-one-out cross-validation error.
- Let w be the weight function used for the symmetric weighted nearest neighbor algorithm. If the same function is used for finding the graph mincut, the resulting classification has zero leave-one-out cross-validation error on U.
- Suppose the data is generated at random in a set of k $(\epsilon, \delta/4)$-round regions, such that the distance between any two regions is at least $\delta$ and the classification depends only on the region to which a point belongs. If the weighting function is $w(x, y) = 1$ if $d(x, y) < \delta$ and $w(x, y) = 0$ otherwise, then $O((k/\epsilon) \log k)$ labeled examples and $O((1/V_{\delta/4}) \log(1/V_{\delta/8}))$ unlabeled examples are sufficient to classify a $1 - O(\epsilon)$ fraction of the unlabeled examples correctly.
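The five steps of the classification algorithm, combined with the threshold weight function from the last result, can be sketched as follows. This is a minimal illustration rather than the implementation of [15]: it assumes one-dimensional points, uses the Edmonds-Karp augmenting-path method as the max-flow solver (the source does not name one), and the function name `mincut_labels` is hypothetical.

```python
from collections import deque

INF = float("inf")


def mincut_labels(points, labels, delta):
    """Label every point +1 or -1 with a graph mincut.

    points: list of numbers (assumed 1-D for simplicity)
    labels: dict mapping the index of a labeled point to +1 or -1
    delta:  two points are joined by a unit-weight edge if they are
            closer than delta, i.e. w(x, y) = 1 if d(x, y) < delta
    """
    n = len(points)
    src, snk = n, n + 1                       # classification nodes v+ and v-
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):                        # step 3: similarity weights
        for j in range(i + 1, n):
            if abs(points[i] - points[j]) < delta:
                cap[i][j] = cap[j][i] = 1.0
    for i, y in labels.items():               # step 2: infinite-weight edges
        if y > 0:
            cap[src][i] = INF
        else:
            cap[i][snk] = INF

    def reachable_from_source():
        """Breadth-first search in the residual graph; returns parent links."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:                               # step 4: max-flow = min-cut
        parent = reachable_from_source()
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow
    # Step 5: nodes still reachable from v+ form V+; all others get -1
    # (including points connected to no labeled example at all).
    return [1 if i in parent else -1 for i in range(n)]


if __name__ == "__main__":
    pts = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
    print(mincut_labels(pts, {0: +1, 5: -1}, delta=0.15))
```

Taking the source side of the residual graph as $V_+$ means that, when several minimum cuts exist, the one closest to the source is returned. On a chain of equally weighted edges this gives a very unbalanced labeling.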
Blum and Chawla [15] show the efficiency of their method using standard data as well as synthetic corpora. They also show that the method is robust to noise.

8.2 Randomized Mincuts

The method of randomized mincuts extends the mincut approach by adding randomness to the graph structure, under some preliminary assumptions. Blum et al. [16] assume that similar examples have similar labels. A natural way to use unlabeled data is therefore to combine it with nearest-neighbor prediction. In other words, similar unlabeled examples should be put into the same class. The mincut approach to classification has several attractive properties. First, the mincut can be found in polynomial time using network flow. Second, it can be viewed as giving the most probable configuration of labels in the associated Markov random field. However, the method has some shortcomings. Consider, as an example, a line of n vertices between two points s and t. This graph has n - 1 cuts of size 1, and the cut at the very left will be quite unbalanced. The method described in [16] is a simple way of addressing a number of these drawbacks using randomization. Specifically, the method repeatedly adds random noise to the edge weights, then solves the mincut for the resulting graph, and outputs a fractional label for each example. In this section we take a closer look at this method.

A natural energy function to consider, for labelings $f(i) \in \{-1, +1\}$, is

$$E(f) = \frac{1}{2} \sum_{i,j} w_{ij}\,|f(i) - f(j)| = \frac{1}{4} \sum_{i,j} w_{ij}\,(f(i) - f(j))^2$$

This energy defines a Markov random field over labelings, $P(f) = \frac{1}{Z} e^{-\beta E(f)}$, where $\beta$ is a temperature parameter and the partition function Z normalizes over all labelings. Solving for the lowest-energy configuration in this Markov random field produces a partition of the whole dataset that maximizes self-consistency. The randomized mincut procedure is as follows: given a graph G constructed from the dataset, produce a collection of cuts by repeatedly adding random noise to the edge weights and then solving the mincut in the resulting graph. Finally, remove cuts that are highly unbalanced.
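The randomized procedure just described can be illustrated on the line-graph example. The sketch below is not the implementation of [16]: to stay short and self-contained it finds each minimum cut by exhaustive search over labelings, which is feasible only for tiny graphs, and the unit base weights, the uniform noise, the balance threshold `min_side`, and both function names are assumptions made for the example.

```python
import itertools
import random


def min_cut_labeling(n, weights, labeled):
    """Cheapest +1/-1 labeling consistent with `labeled`, found by brute force.

    weights: dict mapping an edge (i, j) to its weight
    labeled: dict mapping a labeled node to +1 or -1
    """
    free = [v for v in range(n) if v not in labeled]
    best, best_cost = None, float("inf")
    for bits in itertools.product((1, -1), repeat=len(free)):
        f = dict(labeled)
        f.update(zip(free, bits))
        # Cut cost: total weight of edges whose end points disagree.
        cost = sum(w for (i, j), w in weights.items() if f[i] != f[j])
        if cost < best_cost:
            best, best_cost = f, cost
    return best


def randomized_mincut(n, edges, labeled, rounds=200, noise=0.5,
                      min_side=0.2, seed=0):
    """Fraction of retained noisy mincuts in which each node is positive."""
    rng = random.Random(seed)
    votes = [0] * n
    kept = 0
    for _ in range(rounds):
        # Perturb the (assumed unit) edge weights with random noise.
        noisy = {e: 1.0 + rng.uniform(0.0, noise) for e in edges}
        f = min_cut_labeling(n, noisy, labeled)
        positives = sum(1 for v in range(n) if f[v] == 1)
        if min(positives, n - positives) < min_side * n:
            continue                          # discard highly unbalanced cuts
        kept += 1
        for v in range(n):
            if f[v] == 1:
                votes[v] += 1
    if kept == 0:
        return None
    return [votes[v] / kept for v in range(n)]


if __name__ == "__main__":
    line = [(i, i + 1) for i in range(5)]     # s = node 0, t = node 5
    print(randomized_mincut(6, line, {0: +1, 5: -1}))
```

On the line graph every noisy mincut removes a single edge, so the fractional labels decrease from the positive end to the negative end, and the nodes in the middle receive intermediate values that reflect the uncertainty of their label.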
Blum et al. [16] show that only $O(k \log n)$ labeled examples are needed to be confident in a consistent cut of $k$ edges.

8.2.1 Graph Construction

For a given distance metric, we can construct a graph in various ways. The graph should either be connected or have a small number of connected components that cover all examples. If $t$ components are needed to cover a $1 - \epsilon$ fraction of the points, a graph-based method will need at least $t$ labeled examples to perform well. Blum et al. [16] construct the graph by simply building a minimum spanning tree on the entire dataset. Their experiments analyze three datasets: handwritten digits, newsgroups, and the UCI dataset. The results suggest that the randomized mincut algorithm is applicable to various settings.

8.3 Sentiment Analysis

The analysis of opinions and subjectivity has received a great amount of attention lately because of its various applications. Pang and Lee [59] discuss an approach to sentiment analysis using graph mincuts. Previous approaches focused on selecting lexical features and classified sentences based on those features. In contrast, the method in [59] (1) labels the sentences in the document as either subjective or objective and then (2) applies a standard machine learning classifier to the resulting extract.

Document polarity can be considered a special case of text classification with sentiment-based rather than topic-based categories. Therefore, one can apply standard machine learning techniques to this problem. The pipeline for extracting the sentiment is as follows:

n-sentence review -> subjectivity tagging -> m-sentence extract -> positive/negative

The cut-based classification uses two weight functions: $ind_j(x)$ and $assoc(x, y)$, defined below. Here, $ind_j(x)$ measures the closeness of data point $x$ to class $j$, and the association function measures the similarity of two data points to each other.
The mincut minimizes

$$\sum_{x \in C_1} ind_2(x) + \sum_{x \in C_2} ind_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} assoc(x_i, x_k)$$

Every cut corresponds to a partition of the items with a cost equal to the partition cost above, and the mincut minimizes this cost. The individual scores are set to $ind_1(x) = Pr^{NB}_{sub}(x)$ and $ind_2(x) = 1 - Pr^{NB}_{sub}(x)$, where $Pr^{NB}_{sub}(x)$ is the Naive Bayes estimate of the probability that sentence $x$ is subjective (SVM weights can also be used). The degree of proximity is used as the association score of two nodes:

$$assoc(x_i, x_j) = \begin{cases} f(j - i) \cdot c & \text{if } (j - i) \le T \\ 0 & \text{otherwise} \end{cases}$$

where $f(d)$ specifies how the influence of proximal sentences decays with respect to distance $d$. Pang and Lee [59] use $f(d) = 1$, $e^{1-d}$, and $1/d^2$. They show that, for a Naive Bayes polarity classifier, the subjectivity extracts are a more effective input than the original documents. They also conclude that employing the minimum cut framework may result in an efficient algorithm for sentiment analysis.

8.4 Energy Minimization

The last work we review is a fast approximation method for energy minimization using mincuts, described in [17]. The motivation for this work comes from computer vision. As mentioned before, these problems can be naturally formulated in terms of energy minimization: one aims to find a labeling function $f$ that minimizes the energy

$$E(f) = E_{smooth}(f) + E_{data}(f)$$

in which $E_{smooth}$ measures the extent to which $f$ is not smooth, and $E_{data}$ measures the disagreement between $f$ and the observed data. For example, $E_{data}$ can be

$$E_{data}(f) = \sum_{p \in P} (f_p - I_p)^2$$

with $I_p$ being the observed intensity of pixel $p$. The main difficulty with such minimization problems is their large computational cost. These energy functions normally have several local minima, and finding the global minimum is hard because the space of possible labelings has dimension $|P|$, which can be many thousands.
The energy function can also be written as

$$E(f) = \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q) + \sum_{p \in P} D_p(f_p)$$

where $N$ is the set of interacting pairs of pixels (typically, adjacent pixels). The method described in [17] generates a local minimum with respect to two types of very large moves: expansion and swap.

8.4.1 Moves

A labeling $f$ can be represented by a partition of image pixels $\mathbf{P} = \{P_l \mid l \in L\}$, where $P_l = \{p \in P \mid f_p = l\}$ is the subset of pixels labeled $l$. Given a pair of labels $\alpha, \beta$, a move from partition $\mathbf{P}$ to a new partition $\mathbf{P}'$ is called an $\alpha$-$\beta$ swap if $P_l = P'_l$ for any label $l \ne \alpha, \beta$. That is, the only difference between $\mathbf{P}$ and $\mathbf{P}'$ is that some pixels labeled $\alpha$ in $\mathbf{P}$ are now labeled $\beta$ in $\mathbf{P}'$, and some pixels labeled $\beta$ in $\mathbf{P}$ are now labeled $\alpha$ in $\mathbf{P}'$. A move from partition $\mathbf{P}$ to a new partition $\mathbf{P}'$ is called an $\alpha$-expansion for a label $\alpha$ if $P_\alpha \subset P'_\alpha$ and $P'_l \subset P_l$ for any label $l \ne \alpha$. This move allows a set of image pixels to change their labels to $\alpha$. The two algorithms are as follows.

Swap:
1. Start with an arbitrary labeling $f$.
2. Set Success = 0.
3. For each pair of labels $\{\alpha, \beta\} \subset L$:
   Find $\hat{f} = \arg\min E(f')$ among $f'$ within one $\alpha$-$\beta$ swap of $f$.
   If $E(\hat{f}) < E(f)$, set $f = \hat{f}$ and Success = 1.
4. If Success = 1, go to 2.
5. Return $f$.

Expansion:
1. Start with an arbitrary labeling $f$.
2. Set Success = 0.
3. For each label $\alpha \in L$:
   Find $\hat{f} = \arg\min E(f')$ among $f'$ within one $\alpha$-expansion of $f$.
   If $E(\hat{f}) < E(f)$, set $f = \hat{f}$ and Success = 1.
4. If Success = 1, go to 2.
5. Return $f$.

Given an input labeling $f$ and a pair of labels $\alpha, \beta$, they find a labeling $\hat{f}$ that minimizes $E$ over all labelings within one $\alpha$-$\beta$ swap of $f$. Likewise, given an input labeling $f$ and a label $\alpha$, they find a labeling $\hat{f}$ that minimizes $E$ over all labelings within one $\alpha$-expansion of $f$. Both steps are solved using graph mincuts. They further prove that a local minimum $\hat{f}$ with respect to expansion moves satisfies

$$E(\hat{f}) \le 2c\, E(f^*)$$

where $f^*$ is the global minimum and $c$ is a constant that depends on the interaction penalties $V$. To evaluate their method, they present the results of experiments on visual correspondence for stereo, motion, and image restoration.
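The expansion loop can be illustrated on a tiny one-dimensional "image". In this sketch the inner minimization over one expansion move is done by brute-force enumeration rather than by a graph mincut as in [17], which is only feasible for a handful of pixels; the Potts smoothness penalty, the squared data term, and the sample intensities are assumptions chosen for illustration.

```python
from itertools import product


def energy(f, observed, lam=1.0):
    """E(f) = sum_p (f_p - I_p)^2 + lam * #(adjacent pixels with different labels)."""
    data = sum((fp - ip) ** 2 for fp, ip in zip(f, observed))
    smooth = sum(lam for a, b in zip(f, f[1:]) if a != b)
    return data + smooth


def best_expansion(f, alpha, observed, lam):
    """Lowest-energy labeling within one alpha-expansion of f (brute force)."""
    best = tuple(f)
    for mask in product([False, True], repeat=len(f)):
        # Pixels selected by the mask switch to alpha; the rest keep their label.
        g = tuple(alpha if m else fp for m, fp in zip(mask, f))
        if energy(g, observed, lam) < energy(best, observed, lam):
            best = g
    return best


def expansion_algorithm(observed, labels, lam=1.0):
    """Repeat expansion moves over all labels until no move lowers the energy."""
    f = tuple(labels[0] for _ in observed)   # arbitrary initial labeling
    success = True
    while success:
        success = False
        for alpha in labels:
            g = best_expansion(f, alpha, observed, lam)
            if energy(g, observed, lam) < energy(f, observed, lam):
                f, success = g, True
    return f


# Hypothetical noisy intensities around two plateaus (0 and 2).
observed = [0.1, 0.0, 0.2, 1.9, 2.1, 2.0]
restored = expansion_algorithm(observed, labels=(0, 1, 2))
```

Replacing `best_expansion` with a mincut construction is what makes each move tractable on real images, where enumerating all subsets of pixels is impossible.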
In image restoration they try to recover the original image from a noisy, corrupted image. In this example, the labels are all possible colors.

9 Graph Learning I

In this chapter and the next, we review some prior learning techniques that use graphs as their framework. We look at an application to text summarization, the problem of outbreak detection in blog networks and in sensor placement, the problem of web projection, and the co-clustering technique.

9.1 Summarization

The first section of this chapter describes a summarization method based on the approach in [75]. Zha [75] describes text summarization as the task of taking a textual document, extracting appropriate content, and presenting the most important facts of the text to the user in a way that matches the user's need. Zha [75] adopts an unsupervised approach and explicitly models both keyphrases and the sentences that contain them, using bipartite graphs. For each document, two sets of objects are generated: the set of terms $T = \{t_1, t_2, \dots, t_n\}$ and the set of sentences $S = \{s_1, s_2, \dots, s_m\}$. A bipartite graph is then constructed from $T$ to $S$, with an edge between $t_i$ and $s_j$ if $t_i$ appears in $s_j$. The weight of the edge is a nonnegative value and can be set proportional to the number of times $t_i$ appears in $s_j$.

The major principle in this paper is the following: a term should have a high salience score if it appears in many sentences with high salience scores, while a sentence should have a high salience score if it contains many terms with high salience scores. More formally,

$$u(t_i) \propto \sum_{v(s_j) \sim u(t_i)} w_{ij}\, v(s_j), \qquad v(s_j) \propto \sum_{u(t_i) \sim v(s_j)} w_{ij}\, u(t_i)$$

where $a \sim b$ denotes the existence of an edge between $a$ and $b$. In matrix form these equations become

$$u = \frac{1}{\sigma} W v, \qquad v = \frac{1}{\sigma} W^T u$$

It is then clear that $u$ and $v$ are the left and right singular vectors of $W$ corresponding to the singular value $\sigma$, and that, if $\sigma$ is the largest singular value of $W$, both $u$ and $v$ have nonnegative components.
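The mutual reinforcement equations above can be iterated directly. The following is a minimal sketch in plain Python, assuming a small dense term-by-sentence matrix and a fixed number of iterations instead of a convergence test; the function names and the example matrix are illustrative.

```python
import math


def normalize(x):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x]


def salience_scores(W, iterations=100):
    """Alternate u <- Wv and v <- W^T u, normalizing after each step.

    W[i][j] is the weight of the edge between term i and sentence j.
    Returns (u, sigma, v): term scores, leading singular value, sentence scores.
    """
    n_terms, n_sents = len(W), len(W[0])
    v = [1.0] * n_sents
    u = [1.0] * n_terms
    for _ in range(iterations):
        u = normalize([sum(W[i][j] * v[j] for j in range(n_sents))
                       for i in range(n_terms)])
        v = normalize([sum(W[i][j] * u[i] for i in range(n_terms))
                       for j in range(n_sents)])
    # sigma = u^T W v for the converged unit vectors.
    sigma = sum(u[i] * W[i][j] * v[j]
                for i in range(n_terms) for j in range(n_sents))
    return u, sigma, v


# Hypothetical 2-term, 2-sentence graph: term 0 appears in both sentences.
W = [[1.0, 1.0],
     [0.0, 1.0]]
u, sigma, v = salience_scores(W)
```

Because $W$ is nonnegative and the iteration starts from a positive vector, the resulting scores stay nonnegative, matching the property stated above.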
For numerical computation of the largest singular value triple $\{u, \sigma, v\}$, Zha [75] uses a variation of the power method. Specifically, the author initializes $v$ to the vector of all ones. The following updates are then repeated until convergence:

1. $u = W v$, $u = u / \|u\|$
2. $v = W^T u$, $v = v / \|v\|$

Zha [75] uses this salience score to identify salient sentences within topics, which are found by clustering. For sentence clustering, an undirected weighted graph is built with sentences as nodes and with edges indicating that two sentences share a term. The resulting weight matrix $[w_{ij}]$ is as sparse as $W^T W$. Two sentences $s_i$ and $s_j$ are said to be nearby if one follows the other in the linear order of the document. The weights are then modified as

$$w_{ij}(\alpha) = \begin{cases} w_{ij} + \alpha & \text{if } s_i \text{ and } s_j \text{ are nearby} \\ w_{ij} & \text{otherwise} \end{cases}$$

For a fixed $\alpha$ they apply spectral clustering to obtain a set of sentence clusters $\Pi(\alpha)$. The problem is thus reduced to choosing $\alpha$ by minimizing a clustering cost function $GCV(\alpha)$, defined in [75] in terms of $n$, $k$, and $J(W, \Pi(\alpha))$, where $k$ is the number of desired sentence clusters, $W$ is the weight matrix of the term-sentence bipartite graph, and $\Pi(\alpha)$ is the set of clusters obtained by applying spectral clustering to the modified weight matrix $W(\alpha)$. Here

$$J(W, \Pi) = \sum_{i=1}^{k} \sum_{s=1}^{n_i} \| w_s^{(i)} - m_i \|^2, \qquad m_i = \sum_{s=1}^{n_i} w_s^{(i)} / n_i$$

They then apply the traditional K-means algorithm, performing the following steps in each iteration:

1. For each sentence vector $w$, find the center $m_i$ that is closest to $w$, and associate $w$ with this cluster.
2. Compute the new set of centers by taking the center of mass of the sentence vectors associated with each center.

This finds a local minimum of $J(W, \Pi)$ with respect to $\Pi$. They then reformulate the sum-of-squares cost as a matrix trace maximization with special constraints; relaxing these constraints leads to a trace maximization problem that has a globally optimal solution.
Formally, let $m_i = W_i e / n_i$ and

$$X = \mathrm{diag}(e/\sqrt{n_1}, \dots, e/\sqrt{n_k})$$

The sum-of-squares function can be written as

$$J(W, \Pi) = \mathrm{trace}(W^T W) - \mathrm{trace}(X^T W^T W X)$$

so minimizing it amounts to solving

$$\max_{X^T X = I_k} \mathrm{trace}(X^T W^T W X)$$

Their algorithm is summarized as follows:

1. Compute the $k$ eigenvectors $V_k = [v_1, \dots, v_k]$ of $W_s(\alpha)$ corresponding to the $k$ largest eigenvalues.
2. Compute the pivoted QR decomposition of $V_k^T$ as $V_k^T P = QR = Q[R_{11}, R_{12}]$, where $Q$ is a $k$-by-$k$ orthogonal matrix, $R_{11}$ is a $k$-by-$k$ upper triangular matrix, and $P$ is a permutation matrix.
3. Compute $\hat{R} = R_{11}^{-1}[R_{11}, R_{12}]P^T = [I_k, R_{11}^{-1} R_{12}]P^T$.

The cluster assignment of each sentence is then determined by the row index of the largest element, in absolute value, of the corresponding column of $\hat{R}$.

9.2 Cost-effective Outbreak

The outbreak detection problem in networks is to find the most effective way to select a set of nodes that detects a spreading process in the network. The problem has attracted interest because many real-world problems can be modeled in this setting. A solution is suggested in [44], where the authors apply it to city water pipe networks and to blog networks. In the former, a limited budget is available to place sensors at some nodes of the network so that water contaminants can be detected as quickly as possible. The latter concerns information spread in the blogspace, where a user wants to read a limited number of posts and still get the most up-to-date information about a story propagating through the network.

More formally, in both problems we seek to select a subset $A$ of nodes in a graph $G = (V, E)$ that detects an outbreak quickly. The outbreak starts at some node and spreads through the edges. Each edge has an associated time that the contaminant takes to reach the target node.
Depending on the nodes we select for sensors, we achieve a certain placement score, given by a set function $R$ that maps every placement $A$ to a real number $R(A)$. There is also a cost function $c(A)$ associated with each placement, and the total cost of the sensor placement must not exceed our budget $B$. Sensors placed in the network can be of different types and quality, resulting in a variety of costs. Let $c(s)$ denote the cost of buying a sensor $s$. Then the cost of a placement $A$ is $c(A) = \sum_{s \in A} c(s)$. The problem is to maximize the reward subject to the budget constraint:

$$\max_{A \subseteq V} R(A) \quad \text{subject to} \quad c(A) \le B$$

Depending on the time $t = T(i, s)$ at which we detect the outbreak in a scenario $i$, we incur a penalty $\pi_i(t)$, which is a function that depends on the scenario. The goal is then to minimize the expected penalty over all possible scenarios:

$$\pi(A) \equiv \sum_i P(i)\, \pi_i(T(i, A))$$

where, for a placement $A \subseteq V$, $T(i, A)$ is the minimum of $T(i, s)$ over all $s \in A$, that is, the time until event $i$ is first detected by one of the sensors in $A$. $P$ is the probability distribution over the events, which we assume is given beforehand.

An alternative formulation defines a penalty reduction function that is specific to every scenario,

$$R_i(A) = \pi_i(\emptyset) - \pi_i(A), \qquad R(A) = \sum_i P(i)\, R_i(A) = \pi(\emptyset) - \pi(A)$$

where $\pi_i(A)$ abbreviates $\pi_i(T(i, A))$ and $\pi_i(\emptyset)$ is the penalty incurred when no sensor is placed. The penalty reduction function has several important properties. First, $R(\emptyset) = 0$. Second, $R$ is nondecreasing: for $A \subseteq B$, $R(A) \le R(B)$. Third, adding a sensor to a small placement $A$ improves the score at least as much as adding that sensor to a larger placement $B \supseteq A$. This property of a set function is called submodularity. Leskovec et al. [44] show that, for all placements $A \subseteq B \subseteq V$ and sensors $s \in V \setminus B$,

$$R(A \cup \{s\}) - R(A) \ge R(B \cup \{s\}) - R(B)$$

Maximizing submodular functions is NP-hard in general.
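The three properties of the penalty reduction function can be verified exhaustively on a small instance. The sketch below assumes a capped detection-time penalty $\pi_i(t) = \min(t, T_{max})$ and a hypothetical table of detection times for three sensors and three scenarios; none of these numbers come from [44].

```python
from itertools import combinations

INF = float("inf")
T_MAX = 10.0  # assumed cap: penalty when an outbreak is never detected

# Hypothetical detection times T(i, s) for scenario i and sensor s.
detect_time = {
    "leak_a": {"s1": 1.0, "s2": 4.0, "s3": INF},
    "leak_b": {"s1": 6.0, "s2": 2.0, "s3": 3.0},
    "leak_c": {"s1": INF, "s2": 8.0, "s3": 1.0},
}
prob = {"leak_a": 0.5, "leak_b": 0.3, "leak_c": 0.2}
sensors = ["s1", "s2", "s3"]


def penalty(t):
    """pi_i(t): detection time, capped at T_MAX."""
    return min(t, T_MAX)


def reward(placement):
    """R(A) = sum_i P(i) * (pi_i(no sensors) - pi_i(T(i, A)))."""
    total = 0.0
    for scenario, p in prob.items():
        t = min((detect_time[scenario][s] for s in placement), default=INF)
        total += p * (penalty(INF) - penalty(t))
    return total


def subsets(items):
    """All subsets of items as frozensets."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(tol=1e-9):
    """Check R(A + s) - R(A) >= R(B + s) - R(B) for all A <= B and s not in B."""
    all_sets = list(subsets(sensors))
    for big in all_sets:
        for small in all_sets:
            if not small <= big:
                continue
            for s in set(sensors) - big:
                gain_small = reward(small | {s}) - reward(small)
                gain_big = reward(big | {s}) - reward(big)
                if gain_small < gain_big - tol:
                    return False
    return True
```

The check is exponential in the number of sensors and is meant only to make the diminishing-returns inequality concrete.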
They develop the CELF (cost-effective lazy forward selection) algorithm, which exploits submodularity to find a near-optimal solution and works well when costs are not constant, a case in which the standard greedy algorithm can fail badly. Their algorithm is guaranteed to achieve at least a fraction $\frac{1}{2}(1 - 1/e)$ of the optimal solution, even when every node has a different cost. As far as running time is concerned, the CELF algorithm runs 700 times as fast as the standard greedy algorithm.

9.2.1 Web Projections

Information retrieval methods usually assume that documents are independent. However, effective web search is not achievable unless the hyperlink relations between web pages are taken into consideration. Web projection concentrates on the creation and use of graphical properties of a web subgraph [43]. A web projection graph is a subgraph of the larger web graph, projected using a seed set of documents retrieved as the result of a query.

Leskovec et al. [43] investigate how query search results project onto the web graph, how search result quality can be determined using properties of the projection graph, and whether the behavior of users reformulating queries can be estimated from the projection graph.

They start with a query and collect the initial seed data using a search engine. Leskovec et al. [43] then project the retrieved documents onto the web graph. This yields an induced subgraph called the query projection graph. A query connection graph is then created by adding intermediate nodes that make the query projection graph connected. These connecting nodes are not part of the search results.

Using these two graphs (the query projection graph and the query connection graph), they construct features that describe the topological properties of the graphs. These features are used in machine learning techniques to build predictive models.
Overall, they extract 55 features that best describe the topological properties of the two graphs. These 55 features are grouped into four categories:

Query projection graph features, which measure aspects of the connectivity of the query projection graph.
Query connection graph features, which measure aspects of the connectivity of the query connection graph. These are useful for capturing relations between projection nodes and connection nodes.
Combination features, which are compositions of features from the other groups.
Query features, which represent the non-graphical properties of the result set. These features are calculated using the query text and the returned list of documents relevant to the query.

For the projection, their data contains 22 million web pages from a crawl of the web. The largest connected component is said to have 21 million nodes, while the second largest connected component contains merely several hundred nodes.

The problem addressed in [43] is to project the results of a query onto the web graph and extract the attributes described above. These can then be used to learn a model that predicts a query's class. They also learn models to predict user behavior when reformulating queries.

9.3 Co-clustering

The basic idea of clustering is to extract unique content-bearing words from a set of text documents. If these words are treated as features, then documents can be represented as vectors of these features. A possible representation for such a setting is a word-by-document matrix, in which a non-zero entry indicates the presence of a particular word (row) in a document (column). Words can be clustered on the basis of the documents they appear in. This clustering assumes that the more often two terms co-occur in documents, the closer they are. This type of clustering can be useful for building a statistical thesaurus and for query reformulation. The problem of simultaneously clustering words and documents is discussed in [26].
The primary assumption in [26] is that word clustering influences document clustering, and document clustering in turn influences word clustering. A given word $w_i$ belongs to the word cluster $W_m$ if its association with the document cluster $D_m$ is greater than its association with any other document cluster. A natural measure of the association of a word with a document cluster is the sum of the edge weights to all documents in the cluster [26]:

$$W_m = \Big\{ w_i : \sum_{j \in D_m} A_{ij} \ge \sum_{j \in D_l} A_{ij}, \ \forall l = 1, \dots, k \Big\}$$

This means that each word cluster is determined by the document clustering. Similarly, each document cluster is determined by the word clustering:

$$D_m = \Big\{ d_j : \sum_{i \in W_m} A_{ij} \ge \sum_{i \in W_l} A_{ij}, \ \forall l = 1, \dots, k \Big\}$$

According to Dhillon [26], the best word and document clustering corresponds to a minimum $k$-partitioning of the document-word bipartite graph. This clustering is performed using a spectral algorithm, and experimental results on the Medline (1300 medical abstracts) and Cranfield (1400 aeronautical systems abstracts) corpora show that it works well.

10 Graph Learning II

In this chapter we continue our review of learning techniques that use graphs as their underlying framework.

10.1 Dimensionality Reduction

In artificial intelligence and information retrieval, researchers often encounter high-dimensional data that is naturally generated from a smaller number of dimensions. Such data is intrinsically low dimensional but lies in a higher-dimensional space. The problem of dimensionality reduction is to represent high-dimensional data in a lower-dimensional space. To do so, Belkin and Niyogi [11] propose an algorithm with several properties. First, the algorithm is quite simple, with few local computations. Second, the authors use the Laplace-Beltrami operator in their algorithm. Third, the algorithm performs the dimensionality reduction in a geometric fashion.
Fourth, the use of Laplacian eigenmaps preserves locality, which makes the algorithm insensitive to noise and outliers.

The problem of dimensionality reduction can be formulated as follows: given $k$ points $x_1, \dots, x_k$ in $R^l$, find a set of $k$ points $y_1, \dots, y_k$ in $R^m$ such that $m \ll l$ and $y_i$ represents $x_i$. The algorithm described in [11] is summarized as follows:

1. Construct a graph by putting an edge between nodes $i$ and $j$ if $x_i$ and $x_j$ are close. Proximity can be decided either by thresholding the Euclidean distance between the nodes, or by connecting two nodes when each is among the other's $k$ nearest neighbors.
2. Weight the graph using a weight function. This can be a heat kernel, $W_{ij} = e^{-\|x_i - x_j\|^2 / t}$, or a simpler choice: set $W_{ij} = 1$ if and only if $i$ and $j$ are connected.
3. Compute eigenvalues and eigenvectors for the generalized eigenvector problem $L f = \lambda D f$. Here $D$ is the diagonal weight matrix with $D_{ii} = \sum_j W_{ji}$, and $L = D - W$ is the Laplacian matrix. Let $f_0, \dots, f_{k-1}$ be the solutions of this equation, sorted by eigenvalue: $L f_i = \lambda_i D f_i$ with $0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{k-1}$. Leave out $f_0$, the eigenvector corresponding to eigenvalue 0, and use the next $m$ eigenvectors to embed the data in an $m$-dimensional Euclidean space: $x_i \to (f_1(i), \dots, f_m(i))$.

The authors conduct experiments on a simple synthetic example, the swiss roll, as well as on an example from vision with vertical and horizontal bars in a visual field. In both cases they use the simpler weight function ($W_{ij} \in \{0, 1\}$) and show that it works well for the evaluated datasets.

10.2 Semi-Supervised Learning

In this section we focus on a semi-supervised learning method using harmonic functions, described in [74]. Semi-supervised learning has received great attention because labeled examples are expensive and time consuming to create. Building a large labeled dataset requires considerable effort from skilled human annotators.
The work in [74] adopts Gaussian fields over a continuous state space rather than random fields over a discrete label set. The authors assume the solution is based solely on the structure of the data manifold. The framework of the solution is as follows. Suppose $(x_1, y_1), \dots, (x_l, y_l)$ are labeled and $(x_{l+1}, y_{l+1}), \dots, (x_{l+u}, y_{l+u})$ are unlabeled data points, where $l \ll u$ and $n = l + u$. Construct a connected graph over the data points with the weight function

$$w_{ij} = \exp\Big( -\sum_{d=1}^{m} \frac{(x_{id} - x_{jd})^2}{\sigma_d^2} \Big)$$

where $x_{id}$ is the $d$-th component of instance $x_i$, represented as a vector $x_i \in R^m$, and the $\sigma_d$ are length scale hyperparameters for each dimension. They first compute a function $f$ on the graph and then assign labels based on that function. To assign a probability distribution to functions $f$, they form the Gaussian field

$$p_\beta(f) = \frac{e^{-\beta E(f)}}{Z_\beta}$$

where $\beta$ is an inverse temperature parameter, $Z_\beta$ is the partition function, and $E(f)$ is the quadratic energy function

$$E(f) = \frac{1}{2} \sum_{i,j} w_{ij} (f(i) - f(j))^2$$

Zhu et al. [74] show that the minimum energy function is harmonic:

$$f = \arg\min_{f|_L = f_l} E(f)$$

A harmonic threshold is then used to determine the class labels: assign 1 if $f > 1/2$ and 0 otherwise. Further, they describe how to incorporate external classifiers, that is, classifiers that are trained separately on the labeled data points and are already at hand. To combine an external classifier with the harmonic function, for each unlabeled node $i$ they create a dongle node with the label given by the external classifier, set the transition probability from $i$ to its dongle to $\eta$, and discount all other transitions from $i$ by $1 - \eta$. They then perform the harmonic minimization on this augmented graph.

10.3 Diffusion Kernels

This section covers a feature extraction method described in [71]. The article uses a graph of genes, in which two genes are linked whenever they catalyze two successive reactions, to see whether the knowledge represented by this graph can help improve the performance of gene function prediction.
The formulation of the problem is as follows. The discrete set $X$ represents the set of genes, with $|X| = n$. The set of expression profiles is a mapping $e : X \to R^p$, with $p$ being the number of measurements. In this setting, $e(x)$ is the expression profile of gene $x$. The goal is to use the graph to extract features from the expression profiles.

Let $F$ be the set of features. Each feature is simply a mapping from the set of genes to a real number, $f : X \to R$. The normalized variance of a linear feature is defined by

$$\forall f_{e,v} \in G, \quad V(f_{e,v}) = \frac{\sum_{x \in X} f_{e,v}(x)^2}{\|v\|^2}$$

where $G$ is the set of linear features. Linear features with a large normalized variance are called relevant. These can be extracted using principal component analysis techniques. Additionally, if a feature varies slowly between adjacent nodes in the graph, it is called smooth, as opposed to rugged. A good feature is both smooth and relevant. If we define two functions $h_1 : F \to R^+$ and $h_2 : G \to R^+$ for smoothness and relevance respectively, the problem can be written as the following regularized optimization problem:

$$\max_{(f_1, f_2) \in F_0 \times G} \frac{f_1^T f_2}{\sqrt{f_1^T f_1 + \delta h_1(f_1)}\ \sqrt{f_2^T f_2 + \delta h_2(f_2)}}$$

Here $\delta$ is a parameter that controls the trade-off between smoothness and relevance.

To specify a smoothness function of a feature on a graph, the method in [71] uses the feature's energy at high frequency, computed through its Fourier transform. Let $L$ be the Laplacian of the graph, $0 = \lambda_1 \le \dots \le \lambda_n$ its eigenvalues, and $\{\phi_i\}$ the orthonormal set of corresponding eigenvectors. The Fourier decomposition of any feature $f \in F$ is

$$f = \sum_{i=1}^{n} \hat{f}_i \phi_i$$

where $\hat{f}_i = \phi_i^T f$. The smoothness function for a feature $f \in F$ is calculated as

$$\|f\|^2_{K_\zeta} = \sum_{i=1}^{n} \frac{\hat{f}_i^2}{\zeta(\lambda_i)}$$

where $\zeta$ is a monotonically decreasing mapping and $K_\zeta : X^2 \to R$ is defined by

$$K_\zeta(x, y) = \sum_{i=1}^{n} \zeta(\lambda_i)\, \phi_i(x)\, \phi_i(y)$$

The matrix $K_\zeta$ is positive definite because the mapping $\zeta$ takes only positive values. As $i$ increases, $\lambda_i$ increases, so $\zeta(\lambda_i)$ decreases. Consequently, the above norm takes a higher value on a feature with a lot of energy at high frequency, and it can therefore serve as a smoothness function.
The exponential function is a good example of $\zeta$. The relevance of a feature is also defined in [71] as $h_2(f_{e,v}) = \|f_{e,v}\|_H$, where $H$ is the reproducing kernel Hilbert space (RKHS) associated with the linear kernel $K(x, y) = e(x)^T e(y)$.

10.3.1 Reformulation

If $K_1 = \exp(-\tau L)$ is the diffusion kernel and $K_2(x, y) = e(x)^T e(y)$ is the linear kernel, then taking $h_1(f) = \|f\|_{H_1}$ and $h_2(f) = \|f\|_{H_2}$, the problem can be expressed in its dual form as

$$\max_{(\alpha, \beta) \in F^2} \gamma(\alpha, \beta) = \frac{\alpha^T K_1 K_2 \beta}{(\alpha^T (K_1^2 + \delta K_1) \alpha)^{1/2}\ (\beta^T (K_2^2 + \delta K_2) \beta)^{1/2}}$$

This formulation is a generalization of canonical correlation analysis known as kernel-CCA, which is discussed in [7]. The results reported in the article are encouraging. The performance is shown to be above 80% for some classes, and the method seems successful on some classes that cannot be learned using SVM.

11 Sentiment Analysis I

11.1 Introduction

This chapter and the next review some papers on sentiment and opinion analysis. Researchers have been investigating automatic text categorization and sentiment analysis for the past two decades. Sentiment analysis seeks to characterize opinionated natural language text.

11.2 Unstructured Data

The growth in the number of Internet users and of content-generating Internet applications has resulted in a great increase in the amount of unstructured text on the Internet. Blogs, discussion groups, forums, and similar pages are examples of such growing content. This creates the need for a sentiment analysis technique that can handle the lack of text structure. Most previous work on sentiment analysis assumes pairwise independence between the features used in classification. In contrast, the work in [9] proposes a machine learning technique for learning the predominant sentiments of on-line texts that captures dependencies among words and finds a minimal, sufficient vocabulary for the categorization task.
Xue Bai, Rema Padman and Edoardo Airoldi [9] present a two-stage Markov Blanket Classifier (MBC) that learns conditional dependencies among the words and encodes them into a Markov Blanket Directed Acyclic Graph (MB DAG) for the sentiment variable. The classifier then uses a metaheuristic search called Tabu Search (TS). Detecting dependencies is important, as it allows finding semantic relations between subjective words.

Before discussing the methodology, let us give some background. A Markov Blanket (MB) for a random variable $Y$ is a subset $Q$ of a set of random variables $X$ such that $Y$ is independent of $X \setminus Q$ conditional on all variables in $Q$. Different MB DAGs that entail the same factorization for the conditional probability of $Y$, conditional on a set of variables $X$, are said to belong to the same Markov equivalence class.

The search used in this paper is Tabu Search, a meta-heuristic strategy that helps local search heuristics explore the state space by guiding them out of local optima [35]. The basic Tabu Search is simple. It starts with a solution and iteratively chooses the best move according to a fitness function, while ensuring that solutions previously met are not revisited in the short term.

The algorithm can be summarized as follows. It first generates an initial MB for $Y$ from the data. To do so, it collects the variables that are within two hops of $Y$ in the graphical representation: potential parents and children, plus their potential parents and children. The graph so far is undirected. To direct the graph, the method applies the rules described in [8], then prunes the remaining undirected and bi-directed edges to avoid cycles, passes the result to Tabu Search, and returns the MB DAG.

They evaluate their algorithm on data containing approximately 29,000 posts to the rec.arts.movies.reviews newsgroup archived at the Internet Movie Database (IMDb).
To put the data into the right format, they convert the explicit ratings into one of three categories: positive, negative, or neutral. They extract all the words that appear in more than 8 documents, which leaves a total of 7,716 words as input features. Each document in this experiment is therefore represented as

$$X = [X_1, \dots, X_{7716}]$$

11.3 Appraisal Expressions

Appraisal expression extraction can be viewed as a fundamental task in sentiment analysis. An appraisal expression is a textual unit expressing an evaluative stance toward some target [13]. The paper discussed in this section tries to characterize the evaluative attributes of these textual units. The method in [13] extracts adjectival expressions. An appraisal expression, by definition, contains a source, an attitude, and a target, each represented by different attributes. As a simple example, consider the following sentence:

I found the movie quite monotonous.

Here the speaker (the source) has a negative opinion (quite monotonous) towards the movie (the target). Appraisal theory is based on the following definitions:

Attitude type: affect, appreciation, or judgment.
Orientation: positive or negative.
Force: intensity of the appraisal.
Polarity: marked or unmarked.
Target type: semantic type of the target.

They use a chunker to find attitude groups and targets using a pre-built lexicon, which contains head adjectives and appraisal modifiers. After that, the system links each attitude to a target. Each sentence is parsed into a dependency representation, and linkages are ranked so that the paths in the dependency tree connecting words in the source to words in the target can be found.

After these linkages are made, this information is used to disambiguate the multiple senses that an appraisal expression may present. Let us denote the linkage type used in a given appraisal expression by $\lambda$, the set of all possible linkages by $L$, and a specific linkage type by $l$.
Also, let us denote the target type of a given appraisal expression by $\tau$, the set of all target types by $T$, and a specific target type by $t$. The goal is to estimate, for each appraisal expression $e$ in the corpus, the probability of its attitude type being $a$, given the expression's target type $t$ and its linkage type $l$:

$$P(A = a \mid e) = P(A = a \mid \tau = t, \lambda = l)$$

Let the model of this probability be $M$:

$$P_M(A = a \mid \tau = t, \lambda = l) = \frac{P_M(\tau = t, \lambda = l \mid A = a)\, P_M(A = a)}{P_M(\tau = t, \lambda = l)}$$

Assuming conditional independence yields

$$\frac{P_M(\tau = t \mid A = a)\, P_M(\lambda = l \mid A = a)\, P_M(A = a)}{P_M(\tau = t)\, P_M(\lambda = l)}$$

Given a set of appraisal expressions $E$ extracted by chunking and linkage detection, the goal is to find the maximum likelihood model

$$M^* = \arg\max_M \prod_{e \in E} \sum_{a \in A} P_M(A = a \mid e)$$

They perform two separate evaluations: one of the overall accuracy of the entire system, and one specifically of the accuracy of the probabilistic disambiguator.

11.4 Blog Sentiment

In this section, we discuss the work in [21], which describes textual and linguistic features that are extracted and used in a classifier to analyze the sentiment of blog posts. The aim is to determine whether a post is subjective, and whether it expresses a good opinion or a bad one.

The training dataset used in this paper is created in two steps. First, the authors obtain the data via RSS feeds. Objective feeds come from news sites such as CNN and NPR, as well as from various sites about health, world, and science. Subjective feeds include content from newspaper columns, letters to the editor, reviews, and political blogs. In the second step, Chesley et al. [21] manually verify each document to confirm whether it is subjective, objective, positive, or negative.

The authors use three classes of features to classify the sentiment: textual features, part-of-speech features, and lexical semantic features. Each post is then represented as a vector of features for an SVM.
Chesley et al. [21] use the online Wikipedia dictionary (Wiktionary) to determine the polarity of adjectives in the text, as adjectives in Wiktionary are often defined by a list of synonyms. This rests on the assumption that an adjective will most likely have the same polarity as its synonyms. Each blog post, in this method, is represented as a vector with count values for each feature. A binary classification is then performed for each post using a Support Vector Machine (SVM) classifier. Their hold-out experiments show that for objective and positive posts, positive adjectives acquired from Wikipedia's Wiktionary play a key role in increasing overall accuracy.

11.5 Online Product Review

A large amount of web content is subjective and reflects people's reviews of different products and services. Analyzing reviews gives a user an aggregate view of the entire collection of opinions and segments the articles into classes that can be explored further. In this section we review the sentiment analysis method in [23], which uses a rather large dataset and n-grams instead of unigrams. Their dataset consists of over 200k online reviews with an average length of more than 800 bytes. Three classifiers are described in [23].

The Passive-Aggressive (PA) classifiers are a family of margin-based online learning algorithms and are similar to SVM. In fact, they can be viewed as an online SVM. PA tries to find a hyperplane that divides the instances into two groups. The margin is proportional to an instance's distance to the hyperplane. When the classifier makes an error, the PA algorithm uses the margin to update the current classifier. Choosing PA instead of SVM for classification has two advantages. First, the PA algorithm follows an online learning pattern. Second, the PA algorithm has a theoretical loss bound, which makes the performance of the classifier predictable [23].
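The margin-driven update described above can be sketched in a few lines. The following is a minimal illustration of the standard Passive-Aggressive rule for binary labels in {-1, +1} with hinge loss, not the implementation used in [23]; the function names `pa_update`, `train_pa`, and `predict`, as well as the toy feature vectors in the usage note, are assumptions made for the example.

```python
def pa_update(w, x, y):
    """One Passive-Aggressive step for a label y in {-1, +1}.

    If the example is classified with margin >= 1 the weights are left
    unchanged (passive); otherwise they are moved just far enough to
    satisfy the margin on this example (aggressive).
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)           # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return w                            # passive step
    tau = loss / norm_sq                    # step size of the basic PA rule
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def train_pa(examples, n_features, epochs=5):
    """Run the online update over (x, y) pairs for a few passes."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in examples:
            w = pa_update(w, x, y)
    return w


def predict(w, x):
    """Sign of the linear score; ties are assigned to the positive class."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1
```

For instance, training on the two toy examples `([1.0, 0.0], +1)` and `([0.0, 1.0], -1)` separates them after a single pass, and later passes leave the weights untouched because both margins are already satisfied.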
The second classifier is the Language Modeling (LM) based classifier, a generative method that classifies a word, sentence, or string by calculating its probability of generation. The probability of a string $s = w_1 \dots w_l$ is calculated as

\[ P(s) = \prod_{i=1}^{l} P(w_i \mid w_1^{i-1}) \]

where $w_i^j = w_i \dots w_j$. Cui et al. [23] use Good-Turing estimation in their method, which states that if an n-gram occurs $r$ times, it should be treated as if it had occurred $r^*$ times:

\[ r^* = (r + 1) \frac{n_{r+1}}{n_r} \]

where $n_r$ denotes the number of n-grams that occur $r$ times in the training data.

Winnow is the third classifier described in [23]. Winnow is an online learning algorithm that has been used for sentiment analysis before. It learns a linear classifier from the bag-of-words representation of documents to predict the polarity of a review $x$. More formally,

\[ h(x) = \sum_{w \in V} f_w c_w(x) \]

where $c_w(x) = 1$ if word $w$ appears in review $x$ and 0 otherwise. Winnow uses a threshold value V to classify the reviews as positive or negative.

The main contribution of this work, however, is to explore the role of n-grams as features in analyzing sentiment when $n \geq 3$. In their study they set $n = 6$ and extract nearly 3 million n-grams after removing those that appeared in fewer than 20 reviews. They calculate the $\chi^2$ score for each n-gram. Assume the following assignments:

t: a term.
c: a class.
A: the number of times t and c co-occur.
B: the number of times t occurs without c.
C: the number of times c occurs without t.
D: the number of times neither t nor c occurs.
N: the number of documents.

Then

\[ \chi^2(t, c) = \frac{N (AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)} \]

After calculating this score for each n-gram, the n-grams are sorted in descending order by their $\chi^2$ scores, and the top M ranked n-grams are chosen as features for the classification procedure. They show that high-order n-grams improve the performance of the classifiers, especially on negative instances.
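The $\chi^2$ ranking described above can be illustrated with a short sketch. It assumes that every document is counted exactly once, so that N = A + B + C + D; the function names `chi_square` and `top_features` and the example n-grams are hypothetical and only serve the illustration.

```python
def chi_square(A, B, C, D):
    """Chi-square score of a term/class pair from its contingency counts.

    A: term and class co-occur     B: term occurs without the class
    C: class occurs without term   D: neither occurs
    Assumes N = A + B + C + D (each document counted once).
    """
    N = A + B + C + D
    denominator = (A + C) * (B + D) * (A + B) * (C + D)
    if denominator == 0:
        return 0.0                      # degenerate table: no evidence either way
    return N * (A * D - C * B) ** 2 / denominator


def top_features(counts, M):
    """Return the M features with the highest chi-square scores.

    counts maps each feature (for example an n-gram) to its (A, B, C, D) tuple.
    """
    ranked = sorted(counts, key=lambda f: chi_square(*counts[f]), reverse=True)
    return ranked[:M]
```

A feature that appears only in one class (such as a strongly negative phrase) receives a high score, while a feature spread evenly over both classes scores zero and is dropped first.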
They also claim that their observations, based on a large-scale dataset, had not been tested before.

12 Sentiment Analysis II

In this chapter we continue our review of some novel techniques in sentiment analysis. We look at work on the classification of reviews and on subjectivity strength.

12.1 Movie Sale Prediction

Analyzing weblogs to extract opinions is important since they contain discussions about different products, as well as opinionated text segments. Mishne and Glance [54] have looked at blogs to examine the correlation between sentiment and references. The first set of data that the authors use is IMDB, the Internet Movie Database. For each movie they obtained the date of its opening, as well as the gross income during the weekend. Additionally, they have a set of weblog data, from which they extract the posts relevant to each movie. The relevance of a post to a movie is determined by the following criteria:

The post should fall within the period starting a month before and ending a month after the date of the movie's release.
The post should contain a link to the movie's IMDB page, or the exact movie name should appear in the content of the post together with one of the words "movie", "watch", "see", or "film".

For each relevant post, Mishne and Glance extract part of the text by selecting k words around the hyperlink where the movie is referred to. The value of k varies between 6 and 250. For each context, they calculate the sentiment score as described in [58]. Their analysis was carried out on 49 movies released between February and August 2005. The polarity score of this sentiment analysis is fitted to a log-linear distribution, with the majority of scores falling within a range of 4 to 7.
In their experiments, they use Pearson's correlation between several sentiment-derived metrics (number of positive contexts, number of negative contexts, total number of non-neutral contexts, ratio between positive and negative contexts, and the mean and variance of sentiment values) and both the income per screen and the raw sales of each movie. The experiments in [54] show that in the domain of movies, there is a high correlation between references to movies in blog entries and the movies' gross income. They also demonstrate that the number of positive reviews correlates better than raw counts in the period prior to the release of a movie.

12.2 Subjectivity Summarization

The analysis of different opinions and subjectivity has received a great amount of attention lately because of its various applications. In this section we review a new approach to sentiment analysis described by Pang and Lee [59]. Two factors distinguish this work from previous work that focused on selecting lexical features and classified sentences based on such features. First, in this work, the labels assigned to sentences are either subjective or objective. Second, a standard machine learning classifier is applied to the resulting extract. Document polarity can be considered a special case of text classification in which we classify text by sentiment rather than by topic-based categories. This enables us to apply machine learning techniques to this problem. To extract sentiments within a corpus, a system should review the sentences and tag their subjectivity. Subjective sentences should further be classified into positive and negative sentences.

Suppose we have n items, $x_1, \dots, x_n$, and we want to classify them into $C_1$ and $C_2$. The paper uses a mincut-based classification with two weight functions: $ind_j(x)$ and $assoc(x, y)$. Here, $ind_j(x)$ measures the closeness of data point $x$ to class $j$, and the association function measures the similarity of two data points to each other.
We would like to maximize each item's net happiness (i.e., its individual score for the class it is assigned to, minus its individual score for the other class). They also aim to penalize putting highly associated items into different classes. Thus, the problem reduces to an optimization problem, and the goal is to minimize

\[ \sum_{x \in C_1} ind_2(x) + \sum_{x \in C_2} ind_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} assoc(x_i, x_k) \]

To solve this minimization problem, the authors build an undirected graph G with vertices $\{v_1, \dots, v_n, s, t\}$. Edges $(s, v_i)$ with weights $ind_1(x_i)$ and edges $(v_i, t)$ with weights $ind_2(x_i)$ are added. After that, $\binom{n}{2}$ edges $(v_j, v_k)$ with weights $assoc(x_j, x_k)$ are added. Now, every cut in the graph G corresponds to a partition of the items with a cost equal to the partition cost, and the mincut minimizes this cost. The individual scores $ind_1(x)$ and $ind_2(x)$ are set to $Pr^{NB}_{sub}(x)$ and $1 - Pr^{NB}_{sub}(x)$ respectively, where $Pr^{NB}_{sub}(x)$ is the Naive Bayes estimate of the probability that the sentence $x$ is subjective (SVM weights can also be used). The degree of proximity is used as the association score of two nodes:

\[ assoc(x_i, x_j) = \begin{cases} f(j - i) \cdot c & \text{if } j - i \leq T \\ 0 & \text{otherwise} \end{cases} \]

The function $f(d)$ specifies how the influence of proximal sentences decays with respect to the distance $d$. The authors used $f(d) = 1$, $e^{1-d}$, and $1/d^2$. Pang and Lee [59] show that, for the Naive Bayes polarity classifier, the subjectivity extracts are more effective inputs than the original documents. The authors also conclude that employing the mincut framework can result in an efficient algorithm for sentiment analysis.

12.3 Unsupervised Classification of Reviews

This section focuses on the unsupervised approach to classifying reviews described in [70]. The PMI-IR method is used to estimate the orientation of a phrase. This method uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of words or phrases. The sentiment extraction method in [70] is simple.
First, it extracts phrases that contain adjectives and adverbs. This is done by applying a part-of-speech tagger and then extracting the phrases whose tags appear in a specific order. In the second step, the semantic orientation of the extracted phrases is estimated using PMI-IR. The PMI score between two words $w_1$ and $w_2$ is calculated as follows:

\[ PMI(w_1, w_2) = \log_2 \frac{p(w_1 \,\&\, w_2)}{p(w_1)\, p(w_2)} \]

where $p(w_1 \,\&\, w_2)$ is the probability that the two words co-occur. This ratio is a measure of the degree of statistical dependence between the words, and its logarithm is the amount of information that we acquire about the presence of one of the words when the other is observed.

Semantic Orientation (SO) is then calculated as

\[ SO(phrase) = PMI(phrase, \text{"excellent"}) - PMI(phrase, \text{"poor"}) \]

With some minor algebraic manipulation:

\[ SO(phrase) = \log_2 \frac{hits(phrase \text{ NEAR "excellent"}) \cdot hits(\text{"poor"})}{hits(phrase \text{ NEAR "poor"}) \cdot hits(\text{"excellent"})} \]

where hits(query) is the number of hits returned for the query. The system then classifies the review based on the average semantic orientation of its phrases. In experiments with 410 reviews from Epinions, the algorithm reaches an average accuracy of 74%. Movie reviews seem difficult to classify, and the accuracy on movie reviews is about 66%. On the other hand, for banks and automobiles, the accuracy is 80% to 84%. Travel reviews are an intermediate case. This might suggest that in movies, the whole is not the sum of the parts, while in banks and automobiles it is.

12.4 Opinion Strength

Subjective expressions are terms or phrases that express sentiments. According to [73], the subjective strength of a word or phrase is the strength of the opinion or emotion that is expressed. Their examination of the annotated data shows that strong subjectivity is expressed in many different ways. Subjectivity clues (PREV) include words and phrases obtained from manually annotated sources.
Due to the variety of clues and their sources, the set of PREV clues is not limited to a fixed word list or to words of a particular part of speech. The syntactic clues are mostly developed using a supervised learning procedure. The training data is based on both a human-annotated corpus (the MPQA corpus) and a large unannotated corpus in which sentences are automatically identified as subjective or objective through a bootstrapping algorithm. The learning procedure consists of three steps. First, the sentences in the training data are parsed with the Collins parser. Second, five syntactic clues are formed for each word $w$ in every parse tree. The class of a word in a parse tree can be one of the following:

root(w, t): word w with POS tag t is the root of the parse tree.
leaf(w, t): word w with POS tag t is a leaf in the dependency parse tree.
node(w, t): word w with POS tag t.
bilex(w, t, r, w_c, t_c): word w with POS tag t is modified by word w_c with POS tag t_c, with grammatical relationship r.
allkids(w, t, r_1, w_1, t_1, ..., r_n, w_n, t_n): word w with POS tag t has n children w_i with tags t_i that modify w with grammatical relationship r_i.

Some rules are set to specify the usefulness of a clue. A clue is considered potentially useful if more than x% of its occurrences in the training data are in phrases marked as subjective, where x is a parameter tuned on the training data. The authors set x = 70 in their experiments. Useful clues are then put into three classes: highly reliable, somewhat reliable, and not very reliable. These reliability levels are determined by occurrence. For each clue $c$ they calculate $P(strength(c) = s)$, the probability of $c$ being in an annotation of strength $s$. If $P(strength(c) = s) \geq T$ for some threshold $T$, $c$ is put in the set with strength $s$. Their experiments show promising results in extracting opinions in nested clauses and classifying their strength.
Syntactic clues are used as features, and the classification is done using the bootstrapping method with 10-fold cross-validation experiments. Improvements in accuracy of the new system over a baseline range from 23% to 79%.

13 Spectral Methods

This last chapter covers spectral methods on graphs. In particular, we look at transductive learning and spectral graph partitioning.

13.1 Spectral Partitioning

Spectral partitioning is a powerful, yet expensive, technique for graph clustering [33]. The method is based on the definition of the incidence matrix. The incidence matrix $In(G)$ of the graph G is an $|N| \times |E|$ matrix, with one row for each node and one column for each edge. A column is zero except for two entries, i and j, which are +1 and -1 respectively if there is an edge $e(i, j)$ in the graph. Additionally, the Laplacian matrix $L(G)$ of G is an $|N| \times |N|$ symmetric matrix, with one row and one column for each node, where

\[ (L(G))(i, j) = \begin{cases} deg(i) & \text{if } i = j \\ -1 & \text{if } i \neq j \text{ and there is an edge } (i, j) \\ 0 & \text{otherwise.} \end{cases} \]

$L(G)$ has several properties:

1. Symmetry. This causes the eigenvalues of $L(G)$ to be real, and its eigenvectors to be real and orthogonal.
2. If $e = [1, \dots, 1]^T$, then $L(G)\, e = 0$.
3. $In(G)\, (In(G))^T = L(G)$.
4. If $L(G)\, v = \lambda v$ for a non-zero $v$, then $\lambda = \frac{\| In(G)^T v \|_2^2}{\| v \|_2^2}$.
5. $0 \leq \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$, where $\lambda_i$ are the eigenvalues of $L(G)$.
6. The number of connected components of G is equal to the number of $\lambda_i$ that are equal to 0. This means that $\lambda_2 \neq 0$ if and only if G is connected. ($\lambda_2(L(G))$ is called the algebraic connectivity of G.)

Algorithm 1 Spectral partitioning algorithm
Compute the eigenvector $v_2$ corresponding to $\lambda_2$ of $L(G)$
for each node n of G do
    if $v_2(n) < 0$ then
        Put node n in partition $N^-$
    else
        Put node n in partition $N^+$
    end if
end for

It is shown in [33] that if G is connected, and $N^-$ and $N^+$ are defined by Algorithm 1, then $N^-$ is connected.
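Algorithm 1 can be tried out on a small graph with the following sketch. It is an illustration under stated assumptions rather than the procedure from [33]: the eigenvector for $\lambda_2$ is approximated by power iteration on the shifted matrix $cI - L(G)$ with the constant vector projected out, the graph is assumed to be connected with at least one edge, and the starting vector is assumed not to be orthogonal to the sought eigenvector. The names `laplacian`, `fiedler_vector`, and `spectral_partition` are hypothetical. In practice a library eigensolver would normally replace the power iteration.

```python
def laplacian(n, edges):
    """Graph Laplacian: degree on the diagonal, -1 for every edge (i, j)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (shift * I - L); the shift 2 * max degree bounds the
    largest eigenvalue of L, and subtracting the mean removes the constant
    eigenvector that belongs to eigenvalue 0.
    """
    n = len(L)
    shift = 2.0 * max(L[i][i] for i in range(n))
    v = [i - (n - 1) / 2.0 for i in range(n)]   # zero-mean starting vector
    for _ in range(iterations):
        w = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]               # stay orthogonal to [1, ..., 1]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_partition(n, edges):
    """Split the nodes by the sign of the approximate eigenvector v2."""
    v2 = fiedler_vector(laplacian(n, edges))
    negative = [i for i in range(n) if v2[i] < 0]
    positive = [i for i in range(n) if v2[i] >= 0]
    return negative, positive
```

On two triangles joined by a single edge, for example `edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]`, the sign pattern separates the two triangles, which is the cut through the bridging edge.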
It can also be shown that if $G_1 = (N, E_1)$ is a subgraph of $G = (N, E)$, such that $G_1$ is less connected than G, then $\lambda_2(L(G_1)) \leq \lambda_2(L(G))$.

13.2 Community Finding Using Eigenvectors

In this section we give an overview of the method of finding communities in networks using eigenvectors suggested by Newman [57]. The problem addressed is the clustering of network data. Networks have received great attention in physics and other fields as a foundation for the mathematical representation of a variety of complex systems, including biological and social systems, the Internet, the World Wide Web, and many others. There has been a lot of effort to devise algorithms to cluster graphs. However, the major problem is to develop algorithms that can be run on parallel or distributed computing systems. If A is the adjacency matrix of the graph, in which $A_{ij}$ is 1 if there is an edge between the two vertices i and j, then the number of edges going from one cluster to another is

\[ R = \frac{1}{2} \sum_{i, j \text{ in different clusters}} A_{ij} \]

The index vector $s$ specifies, for each vertex, whether it is in the first group or the second. It assigns a vertex the value $s_i = 1$ in the former case and $s_i = -1$ in the latter. The following is an immediate consequence of the index vector's definition:

\[ \frac{1}{2}(1 - s_i s_j) = \begin{cases} 1 & \text{if } i, j \text{ are in different groups} \\ 0 & \text{if } i, j \text{ are in the same group} \end{cases} \]

Then

\[ R = \frac{1}{4} \sum_{ij} (1 - s_i s_j) A_{ij} \]

Also, the degree of node i, $k_i$, is calculated as

\[ k_i = \sum_j A_{ij} \]

The above equations can be reformulated in matrix form as

\[ R = \frac{1}{4} s^T L s \]

where $L_{ij} = k_i \delta_{ij} - A_{ij}$, and $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise. L is called the Laplacian matrix of the graph. Assume $\lambda_i$ is the eigenvalue of L corresponding to the eigenvector $v_i$, and that $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$. Furthermore, if $n_1$, $n_2$ are the required sizes of the groups, then

\[ a_1^2 = (v_1^T s)^2 = \frac{(n_1 - n_2)^2}{n} \]

where $a_i = v_i^T s$.

Newman [57] defines the modularity Q of a graph as Q = (number of edges within communities) - (expected number of such edges).
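The cut-size identity $R = \frac{1}{4} s^T L s$ from the derivation above can be checked numerically on a small graph. The sketch below computes R once by counting the edges between the two groups and once through the Laplacian form; the helper names `adjacency`, `cut_size_direct`, and `cut_size_laplacian` and the example graph are assumptions made for illustration.

```python
def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


def cut_size_direct(A, s):
    """R = 1/2 * sum of A_ij over ordered pairs (i, j) in different groups."""
    n = len(A)
    return sum(A[i][j] for i in range(n) for j in range(n) if s[i] != s[j]) / 2


def cut_size_laplacian(A, s):
    """R = 1/4 * s^T L s with L_ij = k_i * delta_ij - A_ij."""
    n = len(A)
    degree = [sum(row) for row in A]            # k_i = sum_j A_ij
    total = 0
    for i in range(n):
        for j in range(n):
            l_ij = (degree[i] if i == j else 0) - A[i][j]
            total += s[i] * l_ij * s[j]
    return total / 4
```

For two triangles joined by one edge, the index vector `[1, 1, 1, -1, -1, -1]` gives a cut size of 1 with both functions, and moving vertex 2 to the other group raises it to 2.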
The goal of network clustering is then reduced to the problem of optimizing the modularity of the network. More formally, Q can be defined as

\[ Q = \frac{1}{2m} \sum_{ij} [A_{ij} - P_{ij}]\, \delta(g_i, g_j) \]

where $P_{ij}$ is the expected number of edges falling between a particular pair of vertices i and j. In this setting, $A_{ij}$ is the actual number of such edges, and $\delta(r, s) = 1$ if $r = s$ and 0 otherwise. Note that $\delta(g_i, g_j) = \frac{1}{2}(s_i s_j + 1)$; thus,

\[ Q = \frac{1}{4m} \sum_{ij} [A_{ij} - P_{ij}] (s_i s_j + 1) \]

Since $\sum_{ij} A_{ij} = \sum_{ij} P_{ij} = 2m$, the constant term vanishes, which gives the matrix form

\[ Q = \frac{1}{4m} s^T B s \]

where $B_{ij} = A_{ij} - P_{ij}$. B is called the modularity matrix. He rewrites this expression as

\[ Q = \frac{1}{4m} \sum_i \alpha_i^2 \beta_i \]

where $\beta_i$ is the eigenvalue of B corresponding to the eigenvector $u_i$, and $\alpha_i = u_i^T s$. Thus, Newman shows that the modularity of a network can be expressed using the eigenvalues and eigenvectors of a single matrix, the modularity matrix. Using this expression, Newman [57] derives algorithms for identifying communities and for detecting bipartite or k-partite structure in networks, as well as a new community centrality measure.

13.3 Spectral Learning

Spectral learning techniques are algorithms that use information contained in the eigenvectors of a data affinity matrix to detect structure. The method described in [40] is summarized as follows:

1. Given data B, the affinity matrix is $A \in R^{n \times n} = f(B)$.
2. Form D, a diagonal matrix, with $D_{ii} = \sum_j A_{ij}$.
3. Normalize: $N = (A + d_{max} I - D)/d_{max}$.
4. Let $x_1, \dots, x_k$ be the k largest eigenvectors of N and form the normalized matrix $X = [x_1, \dots, x_k] \in R^{n \times k}$.
5. Assign data point $x_i$ to cluster j if and only if the row $X_i$ is assigned to j.

Kamvar et al. [40] specify a data-to-data Markov transition matrix. If A is the similarity matrix of the documents, then the equality $A = B^T B$ holds for a term-document matrix B. To map document similarities to transition probabilities, let us define $N = D^{-1} A$, where D is a diagonal matrix with $D_{ii} = \sum_j A_{ij}$. Here, N corresponds to transitioning with probability proportional to relative similarity values.
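The two normalizations mentioned above, $N = D^{-1} A$ and $N = (A + d_{max} I - D)/d_{max}$, can be compared with a small sketch. Both produce a matrix whose rows sum to 1, so each row can be read as transition probabilities; the additive form also keeps N symmetric when A is symmetric. The function names and the toy affinity matrix are assumptions made for the example, and every row of A is assumed to have a non-zero sum.

```python
def row_sums(A):
    """Diagonal of D: D_ii = sum_j A_ij."""
    return [sum(row) for row in A]


def normalize_divisive(A):
    """N = D^{-1} A: divide every row by its own sum."""
    d = row_sums(A)
    n = len(A)
    return [[A[i][j] / d[i] for j in range(n)] for i in range(n)]


def normalize_additive(A):
    """N = (A + d_max * I - D) / d_max.

    Rows with a small sum receive the missing mass as a self-transition,
    so off-diagonal entries are scaled by the same constant everywhere.
    """
    d = row_sums(A)
    d_max = max(d)
    n = len(A)
    return [[(A[i][j] + (d_max - d[i] if i == j else 0.0)) / d_max
             for j in range(n)]
            for i in range(n)]
```

With the toy matrix `[[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]`, the additive form yields a symmetric stochastic matrix, whereas the divisive form is stochastic but no longer symmetric.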
Four different datasets are used in [40] to compare the spectral learning algorithm to K-means: 20 newsgroups, 3 newsgroups, LYMPHOMA, and SOYBEAN. They further suggest an algorithm for classification:

1. Define A as previously mentioned.
2. For each pair of points (i, j) that are in the same class, assign $A_{ij} = A_{ji} = 1$.
3. For each pair (i, j) that are in different classes, set $A_{ij} = A_{ji} = 0$.
4. Normalize: $N = \frac{1}{d_{max}} (A + d_{max} I - D)$.

If natural classes occur in the data, the Markov chain described above should have cliques. The key differences between the spectral classifier and the clustering algorithm are that A incorporates labeling information, and that a classifier is used in the spectral space rather than a clustering method. This algorithm is able to classify documents by the similarity of their transition probabilities to known subsets of B, and this is what makes the method novel. The complete procedure is summarized as follows:

1. Given the data B, the affinity matrix is $A \in R^{n \times n} = f(B)$.
2. Form D, a diagonal matrix, with $D_{ii} = \sum_j A_{ij}$.
3. Normalize: $N = (A + d_{max} I - D)/d_{max}$.
4. Let $x_1, \dots, x_k$ be the k largest eigenvectors of N and form the normalized matrix $X = [x_1, \dots, x_k] \in R^{n \times k}$.
5. Represent each data point i by the row $X_i$ of X.
6. Classify the rows as points using a classifier.
7. Assign data point i to the class that $X_i$ is assigned to.

13.4 Transductive learning

Transductive learning refers to applications in which the examples to be classified are already known when a classifier is trained. Relevance feedback is an example of such a task, in which users give positive and negative labels to examples that are in the training set. The learning task is defined on a fixed array X of n points $(x_1, x_2, \dots, x_n)$. The classification label for each data point is denoted by $y_i$ and can be either +1 or -1. A transductive learner can analyze the location of all points and so can structure its hypothesis space based on the input. The spectral graph transducer (SGT) method is described in [39].
This algorithm takes as input the training labels $Y_l$ and a weighted undirected graph on X with adjacency matrix A. The similarity-weighted k-nearest-neighbor graph can be constructed as

\[ A_{ij} = \begin{cases} \dfrac{sim(x_i, x_j)}{\sum_{x_k \in knn(x_i)} sim(x_i, x_k)} & \text{if } x_j \in knn(x_i) \\ 0 & \text{otherwise} \end{cases} \]

First, the diagonal degree matrix B is computed, in which $B_{ii} = \sum_j A_{ij}$. Then the Laplacian matrix can be computed as $L = B - A$. The normalized Laplacian is computed as

\[ L = B^{-1} (B - A) \]

The smallest d + 1 eigenvalues (excluding the first) and their corresponding eigenvectors of L are chosen and assigned to D and V respectively. For each new training set, with $l_+$ positive and $l_-$ negative examples and $l = l_+ + l_-$:

Set $\hat{\gamma}_+ = \sqrt{l_- / l_+}$ and $\hat{\gamma}_- = -\sqrt{l_+ / l_-}$.
Set $C_{ii} = \frac{l}{2 l_+}$ for positive examples and $C_{ii} = \frac{l}{2 l_-}$ for negative examples, where C is the diagonal cost matrix of the training samples. This guarantees equal total costs for positive and negative samples.
Compute $G = (D + c V^T C V)$ and $b = c V^T C \hat{\gamma}$. Find $\lambda^*$, the smallest eigenvalue of

\[ \begin{bmatrix} G & -I \\ -\frac{1}{n} b b^T & G \end{bmatrix} \]

The predictions can then be computed as $z^* = V (G - \lambda^* I)^{-1} b$.
Finally, to make the hard class assignment, we can threshold $z^*$ with respect to $\frac{1}{2}(\hat{\gamma}_+ + \hat{\gamma}_-)$ and set $y_i = sign(z_i - \frac{1}{2}(\hat{\gamma}_+ + \hat{\gamma}_-))$.

References

[1] L. Abderrafih. Multilingual alert agent with automatic text summarization.
[2] Lada Adamic and Natalie Glance. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the WWW2005 Conference's 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis, and Dynamics, 2005.
[3] Eytan Adar, Li Zhang, Lada A. Adamic, and Rajan M. Lukose. Implicit structure and the dynamics of Blogspace. 2004.
[4] S. Afantenos, V. Karkaletsis, and P. Stamatopoulos. Summarization from medical documents: a survey. Artificial Intelligence In Medicine, 33(2):157-177, 2005.
[5] David J. Aldous and James A. Fill. Reversible Markov Chains and Random Walks on Graphs. Book in preparation, http://www.stat.berkeley.edu/~aldous/book.html, 200X.
[6] L. Antiqueira, MGV Nunes, ON Oliveira Jr, and L.F. Costa. Complex networks in the assessment of text quality. Arxiv preprint physics/0504033, 2005.
[7] Francis R. Bach and Michael I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3:1-48, 2003.
[8] Glymour C Padman R. Spirtis P. Bai, X. and J. Ramsey. Mb fan search classifier for large data sets with few cases. In Technical Report CMU-CALD-04-102. School of Computer Science, Carnegie Mellon University, 2004.
[9] Xue Bai, Rema Padman, and Edoardo Airoldi. Sentiment extraction from unstructured text using tabu search-enhanced markov blanket. In Workshop on Mining the Semantic Web, 10th ACM SIGKDD Conference, 2004.
[10] R. Barzilay and M. Elhadad. Using Lexical Chains for Text Summarization. Advances in Automatic Text Summarization, 1999.
[11] Mikhail Belkin and Partha Niyogi. Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373-1396, 2003.
[12] R. Blood. How blogging software reshapes the online community. Communications of the ACM., 47:53-55, 2004.
[13] Kenneth Bloom, Navendu Garg, and Shlomo Argamon. Extracting appraisal expressions. In HLT-NAACL 2007, pages 308-315, 2007.
[14] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In Proc. 19th International Conference on Machine Learning (ICML-2001), 2001.
[15] Avrim Blum and Shuchi Chawla. Learning from labeled and unlabeled data using graph mincuts. pages 19-26, 2001.
[16] Avrim Blum, John D. Lafferty, Mugizi Robert Rwebangira, and Rajashekar Reddy. Semi-supervised learning using randomized mincuts. 2004.
[17] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization via graph cuts. In Proceedings of the International Conference on Computer Vision (ICCV 1), pages 377-384, 1999.
[18] Jaime G. Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. pages 335-336, 1998.
[19] Jean Carletta. Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist., 22(2):249-254, 1996.
[20] H.H. Chen and C.J. Lin. A multilingual news summarizer. Proceedings of the 18th conference on Computational linguistics-Volume 1, pages 159-165, 2000.
[21] Paula Chesley, Bruce Vincent, Li Xu, and Rohini Srihari. Using verbs and adjectives to automatically classify blog sentiment. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs, 2006.
[22] N. Chomsky. Syntactic Structures. Walter de Gruyter, 2002.
[23] Hang Cui, Vibhu O. Mittal, and Mayur Datar. Comparative experiments on sentiment classification for online product reviews. In AAAI, 2006.
[24] H. Dalianis, M. Hassel, K. de Smedt, A. Liseth, TC Lech, and J. Wedekind. Porting and evaluation of automatic summarization. Nordisk Sprogteknologi, 1988.
[25] G. DeJong. An Overview of the FRUMP System. Strategies for Natural Language Processing, 1982.
[26] Inderjit S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 269-274, New York, NY, USA, 2001. ACM.
[27] Sergey N. Dorogovtsev and José Fernando F. Mendes. Language as an evolving word Web. Proceedings of the Royal Society of London B, 268(1485):2603-2606, December 22, 2001.
[28] P. Doyle and J. Snell. Random walks and electric networks. Math. Assoc. America., Washington, 1984.
[29] HP Edmundson. New Methods in Automatic Extracting. Advances in Automatic Text Summarization, 1999.
[30] Güneş Erkan and Dragomir R. Radev. Lexrank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR), 2004.
[31] Ramon Ferrer i Cancho and Ricard V. Solé. The small-world of human language. Proceedings of the Royal Society of London B, 268(1482):2261-2265, November 7 2001.
[32] Ramon Ferrer i Cancho, Ricard V. Solé, and Reinhard Köhler. Patterns in syntactic dependency networks. 69(5), May 26, 2004.
[33] Notes for Lecture 23 April 9 Berkeley. Graph partitioning, http://www.cs.berkeley.edu/~demmel/cs267/lecture20/lecture20.html. 1999.
[34] ML Glasser and IJ Zucker. Extended Watson Integrals for the Cubic Lattices. Proceedings of the National Academy of Sciences of the United States of America, 74(5):1800-1801, 1977.
[35] Fred Glover and Fred Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA, 1997.
[36] Daniel Gruhl, R. Guha, David Liben-Nowell, and Andrew Tomkins. Information diffusion through blogspace. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pages 491-501, New York, NY, USA, 2004. ACM.
[37] David Hull. Using statistical testing in the evaluation of retrieval experiments. In SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pages 329-338, New York, NY, USA, 1993. ACM.
[38] Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422-446, 2002.
[39] Thorsten Joachims. Transductive learning via spectral graph partitioning. pages 290-297, 2003.
[40] Sepandar D. Kamvar, Dan Klein, and Christopher D. Manning. Spectral learning. pages 561-566, 2003.
[41] Klaus Krippendorff. Content Analysis: An Introduction to Its Methodology. Sage Publications, London, 1980.
[42] Ravi Kumar, Jasmine Novak, Prabhakar Raghavan, and Andrew Tomkins. On the bursty evolution of blogspace. In WWW '03: Proceedings of the 12th international conference on World Wide Web, pages 568-576, New York, NY, USA, 2003. ACM.
[43] Jure Leskovec, Susan Dumais, and Eric Horvitz. Web projections: learning from contextual subgraphs of the web. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 471-480, New York, NY, USA, 2007. ACM.
[44] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 420-429, New York, NY, USA, 2007. ACM.
[45] Y. Lin, H. Sundaram, Y. Chi, J. Tatemura, and B. Tseng. Discovery of blog communities based on mutual awareness. In Proceedings of the WWW06 Workshop on Web Intelligence, 2006.
[46] HP Luhn. The Automatic Creation of Literature Abstracts. Advances in Automatic Text Summarization, 1999.
[47] M Maier M Hein. Manifold denoising. Advances in Neural Information Processing Systems (NIPS), 2006.
[48] I. Mani and E. Bloedorn. Summarizing Similarities and Differences Among Related Documents. Advances in Automatic Text Summarization, 1999.
[49] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
[50] D. Marcu. The rhetorical parsing of natural language texts. Proceedings of the 35th annual meeting on Association for Computational Linguistics, pages 96-103, 1997.
[51] D. Marcu. The Theory and Practice of Discourse Parsing and Summarization. MIT Press, 2000.
[52] A. Mehler. Compositionality in quantitative semantics. A theoretical perspective on text mining. Aspects of Automatic Text Analysis, Studies in Fuzziness and Soft Computing, Berlin. Springer, 2006.
[53] I.A. Mel'čuk. Dependency Syntax: Theory and Practice. State University of New York Press, 1988.
[54] Gilad Mishne and Natalie Glance. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW 2006), 2006.
[55] T. Mitchell. The role of unlabeled data in supervised learning. In Proc. of 6th International Symposium on Cognitive Science (invited paper), San Sebastian, Spain, 1999.
[56] M. Módolo. SuPor: um Ambiente para a Exploração de Métodos Extrativos para a Sumarização Automática de Textos em Português. PhD thesis, Dissertação de Mestrado. Departamento de Computação, UFSCar. São Carlos-SP, 2003.
[57] Mark J. Newman. Finding community structure in networks using the eigenvectors of matrices, 2006. http://arxiv.org/abs/physics/0605087.
[58] Kamal Nigam and Matthew Hurst. Towards a robust metric of opinion. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, 2004.
[59] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL2004, pages 271-278. Association for Computational Linguistics, 2004.
[60] T.A.S. Pardo and L.H.M. Rino. Descrição do GEI-Gerador de Extratos Ideais para o Português do Brasil. Série de Relatórios do NILC NILC-TR-04-07, Núcleo Interinstitucional de Lingüística Computacional (NILC), São Carlos-SP, 8.
[61] T.A.S. Pardo and L.H.M. Rino. GistSumm: A Summarization Tool Based on a New Extractive Method. Computational Processing of the Portuguese Language: Proceedings, 2003.
[62] Thiago Alexandre Salgueiro Pardo, Lucas Antiqueira, Maria das Graças Volpe Nunes, Osvaldo N. Oliveira Jr., and Luciano da Fontoura Costa. Modeling and evaluating summaries using complex networks. In Proceedings of Computational Processing of the Portuguese Language, the Seventh International Workshop (PROPOR '06), pages 1-10. Springer, 2006.
[63] G. Pólya. Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend der Irrfahrt im Straßennetz. Math. Annalen, 84:149-160, 1921.
[64] D.R. Radev and K. McKeown. Generating Natural Language Summaries from Multiple On-Line Sources. Computational Linguistics, 24(3):469-500, 1998.
[65] H. Saggion and G. Lapalme. Generating indicative-informative summaries with sumUM. Computational Linguistics, 28(4):497-526, 2002.
[66] G. Salton, A. Singhal, M. Mitra, and C. Buckley. Automatic Text Structuring and Summarization. Advances in Automatic Text Summarization, 1999.
[67] Ricard V. Solé, Bernat Corominas Murtra, Sergi Valverde, and Luc Steels. Language networks: Their structure, function and evolution. Technical Report 05-12-042, Santa Fe Institute Working Paper, 2005.
[68] Martin Szummer and Tommi Jaakkola. Partially labeled classification with markov random walks. In Advances in Neural Information Processing Systems 15, Cambridge, MA, 2001. MIT Press.
[69] E. M. Trevino. Blogger motivations: Power, pull, and positive feedback. In Internet Research 6.0, 2005.
[70] Peter Turney. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In acl2002, pages 417-424, 2002.
[71] Jean-Philippe Vert and Minoru Kanehisa. Graph-driven features extraction from microarray data using diffusion kernels and kernel CCA. 2002.
[72] Duncan J. Watts and Steven Strogatz. Collective dynamics of small-world networks. Nature, 393:440-442, June 1998.
[73] Theresa Wilson, Janyce Wiebe, and Rebecca Hwa. Just how mad are you? Finding strong and weak opinion clauses. In aaai2004, pages 761-766, 2004.
[74] X. Zhu and Z. Ghahramani and J. Lafferty. Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. In Proceedings of International Conference on Machine Learning, 2003.
[75] Hongyuan Zha. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 113-120, New York, NY, USA, 2002. ACM.
[76] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. Technical Report MPI no. 112, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2003.