# Equations

The Vector (Space) Model

$$
\mathrm{sim}(q, d_j) = \cos(\theta) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \frac{\sum_{i=1}^{t} w_{ij}\, w_{iq}}{\sqrt{\sum_{i=1}^{t} w_{ij}^{2}}\;\sqrt{\sum_{i=1}^{t} w_{iq}^{2}}}
$$

where $\cdot$ is the inner product operator and $|\vec{q}|$ is the length of $\vec{q}$.

A normalized tf factor is given by

$$
f(i,j) = \mathrm{tf}(i,j) = \frac{\mathrm{freq}(i,j)}{\max_{l} \mathrm{freq}(l,j)}
$$

The inverse document frequency (idf) factor is

$$
\mathrm{idf}(i) = \log\frac{N}{n_i}
$$

The tf-idf weighting scheme is

$$
w_{ij} = f(i,j) \times \log\frac{N}{n_i}
$$

For the query term weights, a suggestion is

$$
w_{iq} = \left(0.5 + \frac{0.5\,\mathrm{freq}(i,q)}{\max_{l} \mathrm{freq}(l,q)}\right) \times \log\frac{N}{n_i}
$$
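
The weighting and similarity formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are lists of tokens, $N$ is taken to be the number of documents in the list, $n_i$ the number of documents containing term $i$, and the logarithm is natural (the slides do not fix a base; the cosine is unaffected because every weight scales by the same constant). The function names are made up for this example.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """docs: list of token lists. Returns ([{term: w_ij}, ...], {term: idf})."""
    n_docs = len(docs)  # N (assumed: collection size)
    # n_i: number of documents that contain term i (assumed definition)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / n_i) for term, n_i in df.items()}
    weights = []
    for doc in docs:
        freq = Counter(doc)
        if not freq:  # empty document: no weights
            weights.append({})
            continue
        max_freq = max(freq.values())
        # w_ij = (freq(i,j) / max_l freq(l,j)) * log(N / n_i)
        weights.append({t: (f / max_freq) * idf[t] for t, f in freq.items()})
    return weights, idf

def query_weights(query, idf):
    """w_iq = (0.5 + 0.5 * freq(i,q) / max_l freq(l,q)) * log(N / n_i)."""
    freq = Counter(t for t in query if t in idf)  # ignore unknown terms
    if not freq:
        return {}
    max_freq = max(freq.values())
    return {t: (0.5 + 0.5 * f / max_freq) * idf[t] for t, f in freq.items()}

def cosine(d, q):
    """Cosine of the angle between two sparse weight vectors (dicts)."""
    dot = sum(w * q.get(t, 0.0) for t, w in d.items())
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

docs = [["a", "b", "a"], ["b", "c"], ["c", "c", "d"]]
weights, idf = tf_idf_weights(docs)
q = query_weights(["a"], idf)
sims = [cosine(w, q) for w in weights]  # only the first document contains "a"
```

Sparse dictionaries are used instead of dense vectors so that terms absent from a document simply contribute zero to the inner product.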

Probabilistic Queries

If a DB contains $N$ documents, $n$ of which are relevant to a query $Q$, then the probability that a document $D$ selected at random is relevant to $Q$ is:

$$
P(\mathrm{rel}) = \frac{n}{N}
$$

Thus, the probability that $D$ is non-relevant is:

$$
P(\overline{\mathrm{rel}}) = \frac{N - n}{N}
$$

Bayes' Theorem

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{1}
$$

$$
P(B \mid A) = \frac{P(B \cap A)}{P(A)} \tag{2}
$$

From (1), $P(A \mid B)\,P(B) = P(A \cap B)$, so combining with (2):

$$
P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} \tag{3}
$$

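
As a small worked check of these probabilities, the sketch below evaluates them with exact fractions. All numbers are invented for illustration (a collection of 1000 documents with 50 relevant ones, and made-up values for $P(B \mid A)$ and $P(B)$), and the function names are hypothetical.

```python
from fractions import Fraction

def p_relevant(n_relevant, n_total):
    """P(rel) = n / N for a document drawn uniformly at random."""
    return Fraction(n_relevant, n_total)

def bayes(p_a, p_b_given_a, p_b):
    """Equation (3): P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

# Illustrative numbers only: N = 1000 documents, n = 50 relevant.
p_rel = p_relevant(50, 1000)   # 1/20
p_nonrel = 1 - p_rel           # 19/20, i.e. (N - n) / N

# A = "document is relevant", B = "document contains some term".
# P(B|A) = 3/5 and P(B) = 1/10 are assumed values, not from the slides.
p_a_given_b = bayes(p_rel, Fraction(3, 5), Fraction(1, 10))  # 3/10
```
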
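
Probabilistic Model Odds Ratio

$$
\mathrm{sim}(d_j, q) = \frac{P(R \mid d_j)}{P(\bar{R} \mid d_j)} = \frac{P(d_j \mid R)\,P(R)}{P(d_j \mid \bar{R})\,P(\bar{R})} \sim \frac{P(d_j \mid R)}{P(d_j \mid \bar{R})}
$$

Of the $t$ index terms, let $s$ be the number present in $d_j$ (those with $g_i(d_j) = 1$), so that $t - s$ are absent ($g_i(d_j) = 0$). Then

$$
\mathrm{sim}(d_j, q) \sim \frac{\prod_{g_i(d_j)=1} P(k_i \mid R) \times \prod_{g_i(d_j)=0} P(\bar{k}_i \mid R)}{\prod_{g_i(d_j)=1} P(k_i \mid \bar{R}) \times \prod_{g_i(d_j)=0} P(\bar{k}_i \mid \bar{R})}
$$

$$
\sim \sum_{g_i(d_j)=1} \log_2 \frac{P(k_i \mid R)}{P(k_i \mid \bar{R})} + \sum_{g_i(d_j)=0} \log_2 \frac{P(\bar{k}_i \mid R)}{P(\bar{k}_i \mid \bar{R})}
$$

where the first sum has $s$ terms and the second has $t - s$ terms.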
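
The log form of the odds ratio can be sketched as follows. The sketch assumes that estimates of $P(k_i \mid R)$ and $P(k_i \mid \bar{R})$ are already available as dictionaries (how to estimate them is not covered here), that all of them lie strictly between 0 and 1, and it uses the identity $P(\bar{k}_i \mid R) = 1 - P(k_i \mid R)$. The function and parameter names are hypothetical.

```python
import math

def log_odds_score(doc_terms, p_rel, p_nonrel):
    """Sum of log2 odds over all index terms.

    doc_terms: set of terms present in the document (g_i(d_j) = 1).
    p_rel[k]:    assumed estimate of P(k | R), strictly between 0 and 1.
    p_nonrel[k]: assumed estimate of P(k | not R), strictly between 0 and 1.
    """
    score = 0.0
    for term, pr in p_rel.items():
        pn = p_nonrel[term]
        if term in doc_terms:
            # term present: log2( P(k|R) / P(k|not R) )
            score += math.log2(pr / pn)
        else:
            # term absent: log2( P(not k|R) / P(not k|not R) )
            score += math.log2((1 - pr) / (1 - pn))
    return score

# Made-up probabilities for two index terms.
p_rel = {"x": 0.8, "y": 0.5}
p_nonrel = {"x": 0.2, "y": 0.5}
with_x = log_odds_score({"x"}, p_rel, p_nonrel)     # about  2.0
without_x = log_odds_score(set(), p_rel, p_nonrel)  # about -2.0
```

A term whose probability is the same in relevant and non-relevant documents (here `"y"`) contributes zero whether or not it occurs.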

Fuzzy Set Theory

Definition. Let $A$ and $B$ be two fuzzy subsets of $U$. Also, let $\neg A$ be the complement of $A$.

$$
\mu(\neg A, u) = 1 - \mu(A, u)
$$

$$
\mu(A \cup B, u) = \max(\mu(A, u), \mu(B, u))
$$

$$
\mu(A \cap B, u) = \min(\mu(A, u), \mu(B, u))
$$

$$
\mu(i, j) = 1 - \prod_{k_l \in d_j} \left(1 - c(i, l)\right)
$$

Let $c(i, l)$ be a normalized correlation factor for $(k_i, k_l)$:

$$
c(i, l) = \frac{n(i, l)}{n_i + n_l - n(i, l)}
$$

$n_i$: number of documents which contain $k_i$

$n_l$: number of documents which contain $k_l$

$n(i, l)$: number of documents which contain both $k_i$ and $k_l$

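
A minimal sketch of the correlation factor and the membership degree, assuming each document is represented as a Python set of terms. The function names and the tiny collection are invented for illustration.

```python
def correlation(docs, ki, kl):
    """c(i,l) = n(i,l) / (n_i + n_l - n(i,l)) over a list of term sets."""
    n_i = sum(1 for d in docs if ki in d)
    n_l = sum(1 for d in docs if kl in d)
    n_il = sum(1 for d in docs if ki in d and kl in d)
    denom = n_i + n_l - n_il
    return n_il / denom if denom else 0.0

def membership(docs, ki, j):
    """mu(i,j) = 1 - product over terms k_l in d_j of (1 - c(i,l))."""
    prod = 1.0
    for kl in docs[j]:
        prod *= 1.0 - correlation(docs, ki, kl)
    return 1.0 - prod

def fuzzy_not(mu_a):
    return 1.0 - mu_a

def fuzzy_or(mu_a, mu_b):
    return max(mu_a, mu_b)

def fuzzy_and(mu_a, mu_b):
    return min(mu_a, mu_b)

docs = [{"a", "b"}, {"a"}, {"b", "c"}]
c_ab = correlation(docs, "a", "b")  # 1 shared document out of 3 -> 1/3
mu_a_d0 = membership(docs, "a", 0)  # d0 contains "a" itself -> 1.0
mu_a_d2 = membership(docs, "a", 2)  # related only through "b" -> about 1/3
```

Note that a document can have a non-zero degree of membership for a term it does not contain, as long as it contains a correlated term.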

Extended Boolean Operations

OR:

$$
A_n \ \text{OR}\ B_m = (A_n \ \text{OR}\ B) \ \text{AND}\ (A \ \text{OR}\ B_m)
$$

AND:

$$
A_n \ \text{AND}\ B_m = (A_n \ \text{AND}\ B) \ \text{OR}\ (A \ \text{AND}\ B_m)
$$

NOT:

$$
A_n \ \text{AND NOT}\ B_m = (A \ \text{AND NOT}\ B_m) \ \text{OR}\ (A_n \ \text{AND NOT}\ B)
$$

Forming the Term Vectors

The vector associated with the index term $k_i$ is computed as:

$$
\vec{k_i} = \frac{\sum_{\forall r,\ g_i(m_r)=1} c_{i,r}\, \vec{m_r}}{\sqrt{\sum_{\forall r,\ g_i(m_r)=1} c_{i,r}^{2}}}
$$

where

$$
c_{i,r} = \sum_{d_j \,\mid\, \forall l,\ g_l(d_j) = g_l(m_r)} w_{ij}
$$

($c_{i,r}$ is a correlation factor).

A degree of correlation between the terms $k_i$ and $k_j$ can now be computed as:

$$
\vec{k_i} \cdot \vec{k_j} = \sum_{\forall r \,\mid\, g_i(m_r)=1 \,\wedge\, g_j(m_r)=1} c_{i,r} \times c_{j,r}
$$
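
The term-vector construction can be sketched as below. The sketch assumes a dense weight matrix `weights[j][i]` holding $w_{ij}$, identifies each minterm $m_r$ with the 0/1 pattern of which terms have non-zero weight in a document, and stores vectors sparsely as dictionaries keyed by that pattern. The correlation function follows the sum exactly as written above, without dividing by the vector norms. All names and the sample matrix are hypothetical.

```python
import math
from collections import defaultdict

def correlation_factors(weights):
    """weights[j][i] = w_ij. Returns c where c[i][pattern] = c_{i,r}."""
    n_terms = len(weights[0])
    c = [defaultdict(float) for _ in range(n_terms)]
    for doc in weights:
        # minterm m_r of this document: g_l(d_j) = 1 iff w_lj > 0
        pattern = tuple(1 if w > 0 else 0 for w in doc)
        for i, w in enumerate(doc):
            if pattern[i] == 1:  # only minterms with g_i(m_r) = 1
                c[i][pattern] += w
    return c

def term_vector(c_i):
    """Normalized vector of term i over the minterm basis."""
    norm = math.sqrt(sum(v * v for v in c_i.values()))
    return {p: v / norm for p, v in c_i.items()} if norm else {}

def term_correlation(c_i, c_j):
    """Sum of c_{i,r} * c_{j,r} over minterms where both terms are active."""
    return sum(v * c_j[p] for p, v in c_i.items() if p in c_j)

# Four documents, three terms (made-up weights).
weights = [[2, 1, 0],
           [1, 0, 0],
           [0, 3, 1],
           [1, 2, 0]]
c = correlation_factors(weights)
k0 = term_vector(c[0])
corr_01 = term_correlation(c[0], c[1])  # shared minterm (1,1,0): 3 * 3 = 9
corr_02 = term_correlation(c[0], c[2])  # no shared minterm: 0
```

Only minterms that actually occur in the collection are stored, which avoids enumerating all $2^t$ patterns.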

Precision and Recall

Let $R$ be the set of relevant documents, and $|R|$ be the number of

## This document was uploaded on 10/18/2011.
