11s1: COMP9417 Machine Learning and Data Mining
Learning and Logic
April 19, 2010

Acknowledgement: Material derived from slides for the book
Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997
http://www-2.cs.cmu.edu/~tom/mlbook.html
and the book Inductive Logic Programming: Techniques and Applications
by N. Lavrac and S. Dzeroski, Ellis Horwood, New York, 1994
(available at http://www-ai.ijs.si/SasoDzeroski/ILPBook/)
and the paper by A. Cootes, S.H. Muggleton, and M.J.E. Sternberg
“The automatic discovery of structural principles describing
protein fold space”. Journal of Molecular Biology, 2003.
(available at http://www.doc.ic.ac.uk/~shm/jnl.html)
and the book Data Mining (2e), Ian H. Witten and Eibe Frank,
Morgan Kaufmann, 2005. http://www.cs.waikato.ac.nz/ml/weka

Aims
This lecture will introduce you to theoretical and applied aspects of
representing hypotheses for machine learning in first-order logic. Following
it you should be able to:
• outline the key differences between propositional and first-order learning
• describe the problem of learning relations and some applications
• reproduce the basic FOIL algorithm and its use of information gain
• outline the problem of induction in terms of inverse deduction
• describe inverse resolution and least general generalisation
• define the θ-subsumption generality ordering for clauses
[Recommended reading: Mitchell, Chapter 10]
[Recommended exercises: 10.5 – 10.7 (10.8)]

Relevant programs
Progol
http://www.doc.ic.ac.uk/~shm/progol.html
Aleph
http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph
FOIL
http://www.rulequest.com/Personal/
iProlog
http://www.cse.unsw.edu.au/~claude/research/software/
Golem
http://www.doc.ic.ac.uk/~shm/golem.html
See also:
http://www-ai.ijs.si/~ilpnet2/systems/

Representation in Propositional Logic
Propositional variables: P, Q, R, . . .
Negation: ¬S, ¬T, . . .
Logical connectives: ∧, ∨, ←, ↔
Well-formed formulae: P ∨ Q, (¬R ∧ S) → T, etc.
Inference rules:
modus ponens: Given B and A ← B infer A
modus tollens: Given ¬A and A ← B infer ¬B
Enable sound or valid inference.

Meaning in Propositional Logic
Propositional variables stand for declarative sentences (properties):
P the paper is red
Q the solution is acid
Potentially useful inferences:
P → Q If the paper is red then the solution is acid
Meaning of such formulae can be understood with a truth table:
P  Q  P → Q
T  T  T
T  F  F
F  T  T
F  F  T

Representation in First-Order Predicate Logic
We have a richer language for developing formulae:
constant symbols: Fred, Jane, Copper, Manganese, . . .
function symbols: Cons, Succ, . . .
variable symbols: x, y, z, . . .
predicate symbols: Parent, Likes, Binds, . . .
We still have:
Negation: ¬Likes(Bob, Footy), . . .
Logical connectives: ∧, ∨, ←, ↔
but we also have quantification:
∀x Likes(x, Fred), ∃y Binds(Copper, y)
And we still have well-formed formulae and inference rules . . .

Meaning in First-Order Logic
Same basic idea as propositional logic, but more complicated.
Give meaning to first-order logic formulae by interpretation with respect
to a given domain D by associating
• each constant symbol with some element of D
• each n-ary function symbol with some function from Dⁿ to D
• each n-ary predicate symbol with some relation in Dⁿ
For variables, essentially consider associating all or some domain elements
in the formula, depending on quantification.
Interpretation is association of a formula with a truth-valued statement
about the domain.
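To make the idea of an interpretation concrete, here is a minimal Python sketch that evaluates the two quantified formulae from the first-order representation slide over a small finite domain. The domain, the mapping of constants and the relations are hypothetical, invented only for illustration; only the formulae come from the slides.

```python
from typing import Callable, Dict, Set, Tuple

# A hypothetical finite interpretation: a domain D, a mapping from constant
# symbols to elements of D, and for each binary predicate symbol a relation
# over D (a set of pairs). None of these facts come from the slides.
D: Set[str] = {"fred", "jane", "bob", "copper", "sulphur"}

constants: Dict[str, str] = {
    "Fred": "fred", "Jane": "jane", "Bob": "bob",
    "Copper": "copper", "Sulphur": "sulphur",
}

relations: Dict[str, Set[Tuple[str, str]]] = {
    "Likes": {("fred", "fred"), ("jane", "fred"), ("bob", "fred")},
    "Binds": {("copper", "sulphur")},
}

def holds(pred: str, a: str, b: str) -> bool:
    """Truth value of a binary atom whose arguments are already elements of D."""
    return (a, b) in relations[pred]

def forall(body: Callable[[str], bool]) -> bool:
    """∀x body(x): body must hold for every element of D."""
    return all(body(d) for d in D)

def exists(body: Callable[[str], bool]) -> bool:
    """∃y body(y): body must hold for at least one element of D."""
    return any(body(d) for d in D)

# ∀x Likes(x, Fred) is false here: copper and sulphur are in D but do not like Fred.
all_like_fred = forall(lambda x: holds("Likes", x, constants["Fred"]))
# ∃y Binds(Copper, y) is true here: y = sulphur is a witness.
copper_binds_something = exists(lambda y: holds("Binds", constants["Copper"], y))

print(all_like_fred, copper_binds_something)  # False True
```

Restricting D to {fred, jane, bob} would make ∀x Likes(x, Fred) true under the same relations: the truth value of a formula depends on the domain chosen for the interpretation.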
Learning First Order Rules
Why do that?
• trees, rules so far have allowed only comparisons of a variable with a
constant value (e.g., sky = sunny, temperature < 45)
• these are propositional representations – have same expressive power
as propositional logic
• to express more powerful concepts, say involving relationships between
example objects, propositional representations are insufficient, and we
need a more expressive representation
E.g., to classify X depending on its relation R to another object Y

Learning First Order Rules
How to learn concepts about nodes in a graph?
• Cannot use fixed set of attributes where each attribute describes a
linked node (how many attributes?)
• Cannot use fixed set of attributes to learn connectivity concepts . . .

Learning First Order Rules
BUT in first order logic sets of rules can represent graph concepts such as
Ancestor(x, y) ← Parent(x, y)
Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
The declarative programming language Prolog is based on the Horn
clause subset of first-order logic – a form of Logic Programming:
• Prolog is a general purpose programming language: logic programs
are sets of first order rules
• “pure” Prolog is Turing complete, i.e., can simulate a Universal
Turing machine (every computable function)
• learning in this representation is called Inductive Logic Programming
(ILP)

Prolog definitions for relational concepts
Some Prolog syntax:
• all predicate and constant names begin with a lower-case letter
– predicate (relation) names, e.g. uncle, adjacent
– constant names, e.g. fred, banana
• all variable names begin with an upper-case letter
– X, Y, Head, Tail
• a predicate is specified by its name and arity (number of arguments),
e.g.
– male/1 means the predicate “male” with one argument
– sister/2 means the predicate “sister of” with two arguments

Prolog definitions for relational concepts
• predicates are defined by sets of clauses, each with that predicate in its
head
– e.g. the recursive definition of ancestor/2
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
• clause head, e.g. ancestor/2, is to the left of the ’:-’
• clause body, e.g. parent(X,Z), ancestor(Z,Y), is to the right of
the ’:-’

Prolog definitions for relational concepts
• each instance of a relation name in a clause is called a literal
• a definite clause has exactly one literal in the clause head
• a Horn clause has at most one literal in the clause head
• Prolog programs are sets of Horn clauses
• Prolog is a form of logic programming (many approaches)
• related to SQL, functional programming, . . .

Induction as Inverted Deduction
Induction is finding h such that
$$(\forall \langle x_i, f(x_i) \rangle \in D)\; B \wedge h \wedge x_i \vdash f(x_i)$$
where
• xi is ith training instance
• f(xi) is the target function value for xi
• B is other background knowledge
So let’s design inductive algorithm by inverting operators for automated
deduction!

Induction as Inverted Deduction
“pairs of people, ⟨u, v⟩ such that child of u is v,”
f(xi): Child(Bob, Sharon)
xi: Male(Bob), Female(Sharon), Father(Sharon, Bob)
B: Parent(u, v) ← Father(u, v)
What satisfies $(\forall \langle x_i, f(x_i) \rangle \in D)\; B \wedge h \wedge x_i \vdash f(x_i)$?
h1: Child(u, v) ← Father(v, u)
h2: Child(u, v) ← Parent(v, u)
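One simple way to check the entailment condition for the Child example is naive forward chaining over function-free Horn clauses. The Python sketch below (one possible check, not how any particular ILP system does it; helper names are illustrative) confirms that both h1 and h2 satisfy B ∧ h ∧ xi ⊢ f(xi). It follows the slides' notation, treating lower-case terms as variables and capitalised terms as constants.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, ...]            # (predicate, arg1, ..., argn)
Rule = Tuple[Atom, List[Atom]]    # (head, body)

def is_var(term: str) -> bool:
    # Convention for this sketch: lower-case terms (u, v) are variables,
    # capitalised terms (Bob) are constants.
    return term[:1].islower()

def match(pattern: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` becomes the ground `fact`; None if impossible."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    new = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new

def forward_chain(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """All ground atoms derivable from `facts` using function-free Horn `rules`."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings: List[Dict[str, str]] = [{}]
            for literal in body:                  # join body literals left to right
                extended = []
                for b in bindings:
                    for fact in derived:
                        m = match(literal, fact, b)
                        if m is not None:
                            extended.append(m)
                bindings = extended
            for b in bindings:
                new_fact = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

x_i = {("Male", "Bob"), ("Female", "Sharon"), ("Father", "Sharon", "Bob")}
B: List[Rule] = [(("Parent", "u", "v"), [("Father", "u", "v")])]
h1: List[Rule] = [(("Child", "u", "v"), [("Father", "v", "u")])]
h2: List[Rule] = [(("Child", "u", "v"), [("Parent", "v", "u")])]
f_xi = ("Child", "Bob", "Sharon")

for name, h in [("h1", h1), ("h2", h2)]:
    print(name, f_xi in forward_chain(x_i, B + h))   # h1 True, then h2 True
print("B alone", f_xi in forward_chain(x_i, B))      # B alone False
```

Note that h2 derives Child(Bob, Sharon) only through B (it mentions Parent), whereas h1 needs only the description xi.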
Induction as Inverted Deduction
Induction is, in fact, the inverse operation of deduction, and cannot
be conceived to exist without the corresponding operation, so that
the question of relative importance cannot arise. Who thinks of asking
whether addition or subtraction is the more important process in
arithmetic? But at the same time much difference in difficulty may
exist between a direct and inverse operation; . . . it must be allowed
that inductive investigations are of a far higher degree of difficulty and
complexity than any questions of deduction. . . . (W.S. Jevons, 1874)
[Figure: A photograph of the Logic Piano invented by William Stanley Jevons. Photograph taken at the Sydney Powerhouse Museum on March 5, 2006. This item is part of the collection of the Museum of the History of Science, Oxford and was on loan to the Powerhouse Museum. From: http://commons.wikimedia.org/wiki/File:William_Stanley_Jevons_Logic_Piano.jpg, April 19, 2010]
Induction as Inverted Deduction
We have mechanical deductive operators
$$F(A, B) = C, \text{ where } A \wedge B \vdash C$$
need inductive operators
$$O(B, D) = h \text{ where } (\forall \langle x_i, f(x_i) \rangle \in D)\; (B \wedge h \wedge x_i) \vdash f(x_i)$$

Induction as Inverted Deduction
Positives:
• Subsumes earlier idea of finding h that “fits” training data
• Domain theory B helps define meaning of “fit” the data
$$B \wedge h \wedge x_i \vdash f(x_i)$$
• Suggests algorithms that search H guided by B
Negatives:
• Doesn’t allow for noisy data. Consider
$$(\forall \langle x_i, f(x_i) \rangle \in D)\; (B \wedge h \wedge x_i) \vdash f(x_i)$$
• First order logic gives a huge hypothesis space H
→ overfitting...
→ intractability of calculating all acceptable h’s

Deduction: Resolution Rule
P ∨ L
¬L ∨ R
------
P ∨ R
1. Given initial clauses C1 and C2, find a literal L from clause C1 such
that ¬L occurs in clause C2
2. Form the resolvent C by including all literals from C1 and C2, except
for L and ¬L. More precisely, the set of literals occurring in the
conclusion C is
$$C = (C_1 - \{L\}) \cup (C_2 - \{\neg L\})$$
where ∪ denotes set union, and “−” denotes set difference.
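The resolution rule can be written almost literally in Python. This is a sketch under two conventions of our own: a clause is a frozen set of literal strings, and a leading `~` marks negation.

```python
from typing import FrozenSet, List

Clause = FrozenSet[str]   # a clause is a set of literals, read as their disjunction

def neg(lit: str) -> str:
    """Complement of a literal; in this sketch a leading '~' marks negation."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1: Clause, c2: Clause) -> List[Clause]:
    """Every C = (C1 - {L}) ∪ (C2 - {¬L}) with L in C1 and ¬L in C2."""
    return [(c1 - {lit}) | (c2 - {neg(lit)})
            for lit in sorted(c1) if neg(lit) in c2]

# The rule on the slide: P ∨ L and ¬L ∨ R resolve to P ∨ R.
print([sorted(c) for c in resolvents(frozenset({"P", "L"}), frozenset({"~L", "R"}))])
# [['P', 'R']]
```

The function returns a list because two clauses can clash on more than one literal; each choice of L gives a separate resolvent.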
Inverting Resolution
[Figure: resolution (left) and inverse resolution (right) on the same clauses
C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study
C: PassExam ∨ ¬Study
Resolution derives C from C1 and C2; inverse resolution recovers C2 from C1 and C.]

Inverting Resolution (Propositional)
1. Given initial clauses C1 and C, find a literal L that occurs in clause
C1, but not in clause C.
2. Form the second clause C2 by including the following literals
$$C_2 = (C - (C_1 - \{L\})) \cup \{\neg L\}$$
3. Given initial clauses C2 and C, find a literal ¬L that occurs in clause
C2, but not in clause C.
4. Form the first clause C1 by including the following literals
$$C_1 = (C - (C_2 - \{\neg L\})) \cup \{L\}$$
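The propositional inverse resolution steps can be sketched in Python as well. In this sketch a clause is a frozen set of literal strings and a leading `~` marks negation (our convention); each constructed parent is checked by resolving it forwards again. Function names are illustrative.

```python
from typing import FrozenSet, List

Clause = FrozenSet[str]

def neg(lit: str) -> str:
    """Complement of a literal; a leading '~' marks negation in this sketch."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(c: Clause, given: Clause) -> List[Clause]:
    """For each literal L in `given` but not in the resolvent `c`, build the
    other parent (C - (given - {L})) ∪ {¬L}.

    With given = C1 this is steps 1-2; with given = C2 (L then plays the role
    of ¬L) it is steps 3-4. Only the parent given by the formula is returned;
    in general other parents (e.g. ones repeating literals of `given`) also work.
    """
    return [(c - (given - {lit})) | {neg(lit)} for lit in sorted(given - c)]

def resolvents(c1: Clause, c2: Clause) -> List[Clause]:
    """Forward resolution, used here only to check the inverse step."""
    return [(c1 - {lit}) | (c2 - {neg(lit)}) for lit in sorted(c1) if neg(lit) in c2]

C = frozenset({"PassExam", "~Study"})
C1 = frozenset({"PassExam", "~KnowMaterial"})
for C2 in inverse_resolve(C, C1):
    print(sorted(C2), C in resolvents(C1, C2))   # ['KnowMaterial', '~Study'] True
```

Calling `inverse_resolve(C, C2)` with the recovered C2 gives back C1, which is the symmetric pair of steps 3 and 4.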
Duce operators
Each operator is read as: pre-conditions on left, post-conditions on right.

V operator, same head: Identification
p ← A, B        q ← B
p ← A, q        p ← A, q

V operator, different head: Absorption
p ← A, B        p ← q, B
q ← A           q ← A

W operator, same head: Intra-construction
p ← A, B1       w ← B1
p ← A, B2       w ← B2
                p ← A, w

W operator, different head: Inter-construction
p1 ← A, B1      p1 ← w, B1
p2 ← A, B2      p2 ← w, B2
                w ← A

First order resolution
First order resolution:
1. Find a literal L1 from clause C1, literal L2 from clause C2, and
substitution θ such that L1θ = ¬L2θ
2. Form the resolvent C by including all literals from C1θ and C2θ, except
for L1θ and L2θ. More precisely, the set of literals occurring in the
conclusion C is
$$C = (C_1 - \{L_1\})\theta \cup (C_2 - \{L_2\})\theta$$
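First-order resolution needs a unifier to find θ. The Python sketch below handles only function-free literals, assumes the two clauses share no variables (standardised apart), and follows the slides' notation in treating lower-case terms as variables. As an example it replays, in the forward direction, the GrandChild/Father derivation that Cigol inverts.

```python
from typing import Dict, FrozenSet, Iterator, Optional, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]   # (positive?, predicate, args)
Clause = FrozenSet[Literal]
Subst = Dict[str, str]

def is_var(t: str) -> bool:
    # Slides' notation: lower-case terms are variables, capitalised terms constants.
    return t[:1].islower()

def walk(t: str, s: Subst) -> str:
    """Follow bindings in s until reaching a constant or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a: Tuple[str, ...], b: Tuple[str, ...], s: Subst) -> Optional[Subst]:
    """Most general unifier of two argument tuples (no function symbols), or None."""
    s = dict(s)
    for x, y in zip(a, b):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None            # two different constants cannot be unified
    return s

def substitute(literal: Literal, s: Subst) -> Literal:
    sign, pred, args = literal
    return (sign, pred, tuple(walk(t, s) for t in args))

def resolve(c1: Clause, c2: Clause) -> Iterator[Tuple[Clause, Subst]]:
    """Yield (C, θ) with L1θ = ¬L2θ and C = (C1 - {L1})θ ∪ (C2 - {L2})θ."""
    for l1 in c1:
        for l2 in c2:
            if l1[0] != l2[0] and l1[1] == l2[1] and len(l1[2]) == len(l2[2]):
                theta = unify(l1[2], l2[2], {})
                if theta is not None:
                    rest = (c1 - {l1}) | (c2 - {l2})
                    yield frozenset(substitute(l, theta) for l in rest), theta

def lit(positive: bool, pred: str, *args: str) -> Literal:
    return (positive, pred, args)

rule = frozenset({lit(True, "GrandChild", "y", "x"),
                  lit(False, "Father", "x", "z"),
                  lit(False, "Father", "z", "y")})
step1 = [c for c, _ in resolve(frozenset({lit(True, "Father", "Tom", "Bob")}), rule)]
middle = frozenset({lit(True, "GrandChild", "Bob", "x"), lit(False, "Father", "x", "Tom")})
step2 = [c for c, _ in resolve(frozenset({lit(True, "Father", "Shannon", "Tom")}), middle)]
print(middle in step1)   # True
print(step2)             # [frozenset({(True, 'GrandChild', ('Bob', 'Shannon'))})]
```

The first step has two resolvents, one for each negative Father literal in the rule; the one using {Bob/y, Tom/z} is the clause GrandChild(Bob, x) ∨ ¬Father(x, Tom).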
Inverting First order resolution
Factor θ into θ1θ2, where θ1 binds variables of C1 and θ2 binds variables of C2:
$$C = (C_1 - \{L_1\})\theta_1 \cup (C_2 - \{L_2\})\theta_2$$
C2 should have no common literals with C1, so
$$C - (C_1 - \{L_1\})\theta_1 = (C_2 - \{L_2\})\theta_2$$
By definition of resolution
$$L_2 = \neg L_1\theta_1\theta_2^{-1}$$
and therefore
$$C_2 = (C - (C_1 - \{L_1\})\theta_1)\theta_2^{-1} \cup \{\neg L_1\theta_1\theta_2^{-1}\}$$

Cigol
[Figure: resolution tree that Cigol inverts, working upwards from GrandChild(Bob, Shannon)]
Father(Tom, Bob) and GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y), with {Bob/y, Tom/z}, resolve to GrandChild(Bob, x) ∨ ¬Father(x, Tom)
Father(Shannon, Tom) and GrandChild(Bob, x) ∨ ¬Father(x, Tom), with {Shannon/x}, resolve to GrandChild(Bob, Shannon)

Subsumption and Generality
θ-subsumption
C θ-subsumes D if there is a substitution θ such that Cθ ⊆ D.
C is at least as general as D (C ≤ D) if C θ-subsumes D.
If C θ-subsumes D then C logically entails D (but not the reverse).
θ-subsumption is a quasi-order (a partial order once clauses that subsume each
other are treated as equivalent), thus generates a lattice in which any two
clauses have a least-upper-bound and a greatest-lower-bound.
The least general generalisation (LGG) of two clauses is their least-upper-bound
in the θ-subsumption lattice.

LGG
Plotkin (1969, 1970, 1971), Reynolds (1969)
• LGG of clauses is based on LGGs of literals
• LGG of literals is based on LGGs of terms, i.e. constants and variables
• LGG of two different constants is a variable, i.e. a minimal generalisation

LGG of atoms
Two atoms are compatible if they have the same predicate symbol and
arity (number of arguments)
• lgg(t, t) is t itself for identical terms t
• lgg(a, b) for different constants or functions with different function
symbols is the variable X
• lgg(f(a1, ..., an), f(b1, ..., bn)) is f(lgg(a1, b1), ..., lgg(an, bn))
• lgg(Y1, Y2) for different variables Y1, Y2 is the variable X
Note:
1. must ensure that the same variable replaces a given pair of arguments
everywhere that pair occurs in the atom
2. must ensure introduced variables appear nowhere in the original atoms

LGG of clauses
The LGG of two clauses C1 and C2 is formed by taking the LGGs of each
literal in C1 with every compatible literal in C2.
Clauses form a subsumption lattice, with LGG as least upper bound and
MGI (most general instance) as lower bound.
Lifts the concept learning lattice to a first-order logic representation.
Leads to relative LGGs with respect to background knowledge.
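The LGG rules for terms and atoms translate into a short recursive function, often called anti-unification. In this Python sketch, representing compound terms as tuples and naming fresh variables V0, V1, … are our assumptions.

```python
from typing import Dict, Tuple, Union

# A constant or variable is a str; a compound term f(t1, ..., tn) is the tuple
# ("f", t1, ..., tn). Atoms have the same shape: ("pred", t1, ..., tn).
Term = Union[str, Tuple]

def lgg(s: Term, t: Term, table: Dict[Tuple[Term, Term], str]) -> Term:
    """Plotkin-style least general generalisation of two terms.

    `table` records the variable that replaced each pair of differing subterms,
    so a repeated pair always gets the same variable (note 1). Fresh names are
    V0, V1, ...; we assume these names do not occur in the inputs (note 2).
    """
    if s == t:                                        # identical terms
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):  # same function symbol and arity
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:                           # different constants/functors/variables
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]

def lgg_atoms(a1: Tuple, a2: Tuple) -> Tuple:
    """LGG of two compatible atoms (same predicate symbol and arity)."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        raise ValueError("atoms are not compatible")
    return lgg(a1, a2, {})

print(lgg_atoms(("On", "s1", "a", "b"), ("On", "s2", "f", "e")))  # ('On', 'V0', 'V1', 'V2')
print(lgg_atoms(("p", "a", "a"), ("p", "b", "b")))                # ('p', 'V0', 'V0')
print(lgg_atoms(("q", ("f", "a"), "c"), ("q", ("g", "a"), "c")))  # ('q', 'V0', 'c')
```

For the LGG of two clauses, the same `table` would be shared across all pairs of compatible literals, so that a given pair of terms generalises to the same variable throughout the clause.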
Subsumption lattice
[Figure: subsumption lattice showing lgg(a,b) above clauses a and b, and mgi(a,b) below them]

RLGG – LGG relative to background knowledge
Example from Quinlan (1991)
Given two ground instances of target predicate Q/k, Q(c1, c2, . . . , ck)
and Q(d1, d2, . . . , dk), plus other logical relations representing background
knowledge that may be relevant to the target concept, the relative least
general generalisation (rlgg) of these two instances is:
$$Q(\mathrm{lgg}(c_1, d_1), \mathrm{lgg}(c_2, d_2), \ldots, \mathrm{lgg}(c_k, d_k)) \leftarrow \{\mathrm{lgg}(r_1, r_2)\}$$
for every pair r1, r2 of ground instances from each relation in the
background knowledge.
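θ-subsumption, the ordering behind this lattice, can be tested by searching for a substitution that maps every literal of C onto some literal of D. Below is a backtracking sketch in Python for function-free clauses; the clause encoding and the Prolog-style convention that upper-case terms are variables are assumptions. The example clauses reuse the scene predicates Scene, On, Circle and Square.

```python
from typing import Dict, List, Optional, Tuple

# A literal is (positive?, predicate, args); a clause is a list of literals.
# Prolog convention for this sketch: terms of C starting with an upper-case
# letter are variables; every term of D is treated as fixed.
Literal = Tuple[bool, str, Tuple[str, ...]]
Subst = Dict[str, str]

def is_var(t: str) -> bool:
    return t[:1].isupper()

def match(lit: Literal, target: Literal, theta: Subst) -> Optional[Subst]:
    """Extend theta so that lit·theta == target (one-way matching), or return None."""
    if lit[:2] != target[:2] or len(lit[2]) != len(target[2]):
        return None
    theta = dict(theta)
    for a, b in zip(lit[2], target[2]):
        if is_var(a):
            if theta.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return theta

def theta_subsumes(c: List[Literal], d: List[Literal],
                   theta: Optional[Subst] = None) -> Optional[Subst]:
    """Return some θ with Cθ ⊆ D, or None if C does not θ-subsume D.

    Maps each literal of C onto a literal of D, backtracking when the
    substitution becomes inconsistent; worst-case exponential in |C|.
    """
    theta = {} if theta is None else theta
    if not c:
        return theta
    for target in d:
        extended = match(c[0], target, theta)
        if extended is not None:
            result = theta_subsumes(c[1:], d, extended)
            if result is not None:
                return result
    return None

def clause(head: Tuple[str, ...], *body: Tuple[str, ...]) -> List[Literal]:
    """head ← body as a list of literals (body literals negated)."""
    return [(True, head[0], head[1:])] + [(False, b[0], b[1:]) for b in body]

D = clause(("Scene", "s1"), ("On", "s1", "a", "b"), ("Left-of", "s1", "b", "c"),
           ("Circle", "a"), ("Square", "b"), ("Triangle", "c"))
C = clause(("Scene", "A"), ("On", "A", "B", "C"), ("Circle", "B"))
C_bad = clause(("Scene", "A"), ("Circle", "B"), ("Square", "B"))

print(theta_subsumes(C, D))      # {'A': 's1', 'B': 'a', 'C': 'b'}
print(theta_subsumes(C_bad, D))  # None
```

C_bad fails because it asks for one object that is both a circle and a square, and no substitution can map both of its body literals into D.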
RLGG Example
[Figure: two scenes, s1 and s2]
This figure depicts two scenes s1 and s2 and may be described by the
predicates Scene/1, On/3, Left-of/3, Circle/1, Square/1 and Triangle/1.

RLGG Example
Predicate   Ground Instances (tuples)
Scene       {<s1>, <s2>}
On          {<s1, a, b>, <s2, f, e>}
Left-of     {<s1, b, c>, <s2, d, e>}
Circle      {<a>, <f>}
Square      {<b>, <d>}
Triangle    {<c>, <e>}
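Given the ground instances in the table, the RLGG clause can be computed mechanically. This Python sketch follows the simplified computation the example uses: for each relation it takes the LGG of the s1 tuple with the s2 tuple only, sharing one variable table so that each pair of constants maps to a single variable. A full rlgg would consider every pair of ground instances and then need clause reduction; that is not attempted here.

```python
from typing import Dict, Tuple

Atom = Tuple[str, ...]   # (predicate, arg1, ..., argn)

def lgg_terms(a: str, b: str, table: Dict[Tuple[str, str], str]) -> str:
    """LGG of two constants: the constant itself if equal, else the variable for this pair."""
    if a == b:
        return a
    if (a, b) not in table:
        table[(a, b)] = chr(ord("A") + len(table))   # A, B, C, ... (enough for 26 pairs)
    return table[(a, b)]

def lgg_atom(r1: Atom, r2: Atom, table: Dict[Tuple[str, str], str]) -> Atom:
    if r1[0] != r2[0] or len(r1) != len(r2):
        raise ValueError("atoms are not compatible")
    return (r1[0],) + tuple(lgg_terms(a, b, table) for a, b in zip(r1[1:], r2[1:]))

def show(atom: Atom) -> str:
    return f"{atom[0]}({', '.join(atom[1:])})"

# Ground instances from the table: (tuple for scene s1, tuple for scene s2).
examples = (("Scene", "s1"), ("Scene", "s2"))
background: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "On": (("s1", "a", "b"), ("s2", "f", "e")),
    "Left-of": (("s1", "b", "c"), ("s2", "d", "e")),
    "Circle": (("a",), ("f",)),
    "Square": (("b",), ("d",)),
    "Triangle": (("c",), ("e",)),
}

table: Dict[Tuple[str, str], str] = {}   # shared, so each pair of constants gets one variable
head = lgg_atom(examples[0], examples[1], table)
body = [lgg_atom((pred,) + t1, (pred,) + t2, table)
        for pred, (t1, t2) in background.items()]
clause = show(head) + " ← " + ", ".join(show(atom) for atom in body)
print(clause)
# Scene(A) ← On(A, B, C), Left-of(A, D, E), Circle(B), Square(D), Triangle(E)
```

Because the table is shared, lgg(a, f) becomes B in both On and Circle, which is what links the circle to the object in the On relation in the final clause.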
RLGG Example
To compute RLGG of the two scenes generate the clause:
Scene(lgg(s1, s2)) ←
On(lgg(s1, s2), lgg(a, f), lgg(b, e)),
Left-of(lgg(s1, s2), lgg(b, d), lgg(c, e)),
Circle(lgg(a, f)),
Square(lgg(b, d)),
Triangle(lgg(c, e))

RLGG Example
Compute LGGs to introduce variables into the final clause:
Scene(A) ←
On(A, B, C),
Left-of(A, D, E),
Circle(B),
Square(D),
Triangle(E)

Example: First Order Rule for Classifying Web Pages
[Slattery, 1997]
course(A) ←
has-word(A, instructor),
not has-word(A, good),
link-from(A, B),
has-word(B, assign),
not link-from(B, C)
Train: 31/31, Test: 31/34
Can learn graph-type representations.

Learning First Order Rules
• to learn logic programs we can adopt propositional rule learning methods
• the target relation is clause head, e.g. ancestor/2
– think of this as the consequent
• the clause body is constructed using predicates from background
knowledge
– think of this as the antecedent
• unlike propositional rules first order rules can have
– variables
– tests on more than one variable at a time
– recursion
• learning is set up as a search through the hypothesis space of first order
rules
FOIL(Target_predicate, Predicates, Examples)
  Pos := positive Examples
  Neg := negative Examples
  Learned_rules := {}
  while Pos is not empty, do
    // Learn a NewRule
    NewRule := most general rule possible
    NewRuleNeg := Neg
    while NewRuleNeg is not empty, do
      // Add a new literal to specialize NewRule
      Candidate_literals := generate candidates
      Best_literal := argmax over L ∈ Candidate_literals of Foil_Gain(L, NewRule)
      add Best_literal to NewRule preconditions
      NewRuleNeg := subset of NewRuleNeg that satisfies NewRule preconditions
    Learned_rules := Learned_rules + NewRule
    Pos := Pos − {members of Pos covered by NewRule}
  Return Learned_rules
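The `generate candidates` step in the pseudocode can be sketched as follows. This Python version assumes FOIL's usual candidate forms: a predicate applied to existing and new variables with at least one existing variable, equality between existing variables, and the negation of each. The fresh variable names N0, N1, … and the example predicates Father/2 and Female/1 are illustrative.

```python
from itertools import product
from typing import Dict, List, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]   # (positive?, predicate, argument variables)

def _canonical_new_vars(args: Tuple[str, ...], rule_vars: List[str]) -> bool:
    """True if new variables appear as N0, N1, ... in order of first use,
    which skips candidates that differ only by renaming new variables."""
    seen: List[str] = []
    for a in args:
        if a not in rule_vars and a not in seen:
            seen.append(a)
    return seen == [f"N{i}" for i in range(len(seen))]

def candidate_literals(rule_vars: List[str], predicates: Dict[str, int]) -> List[Literal]:
    """Candidate specialisations for a rule whose variables are `rule_vars`.

    Q(v1, ..., vr) for each predicate Q of arity r, where each vi is an existing
    or new variable and at least one vi already occurs in the rule; then
    Equal(xj, xk) for pairs of existing variables; then the negation of each.
    """
    positive: List[Literal] = []
    for pred, arity in predicates.items():
        pool = rule_vars + [f"N{i}" for i in range(arity)]
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args) and _canonical_new_vars(args, rule_vars):
                positive.append((True, pred, args))
    for i, xj in enumerate(rule_vars):
        for xk in rule_vars[i + 1:]:
            positive.append((True, "Equal", (xj, xk)))
    return positive + [(False, p, a) for _, p, a in positive]

# Specialising GrandDaughter(x, y) <- with hypothetical predicates Father/2, Female/1.
cands = candidate_literals(["x", "y"], {"Father": 2, "Female": 1})
print(len(cands))  # 22
```

Here the 11 positive candidates are Female(x), Female(y), eight Father literals over x, y and one new variable, and Equal(x, y); the other 11 are their negations. Including the target predicate in `predicates` would also allow recursive literals, as in the ancestor example.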
Specializing Rules in FOIL
Learning rule: P(x1, x2, . . . , xk) ← L1 . . . Ln
Candidate specializations add new literal of form:
• Q(v1, . . . , vr), where at least one of the vi in the created literal must
already exist as a variable in the rule.
• Equal(xj, xk), where xj and xk are variables already present in the
rule
• The negation of either of the above forms of literals

Completeness and Consistency (Correctness)
[Figures not recoverable]

Variable Bindings
• A substitution replaces variables by terms
• Substitution θ applied to literal L is written Lθ
• If θ = {x/3, y/z} and L = P(x, y) then Lθ = P(3, z)
FOIL bindings are substitutions mapping each variable to a constant:
GrandDaughter(x, y) ←
With 4 constants in our examples we have 16 possible bindings:
{x/Victor, y/Sharon}, {x/Victor, y/Bob}, . . .
With 1 positive example of GrandDaughter, other 15 bindings are negative:
GrandDaughter(Victor, Sharon)

Information Gain in FOIL
$$\mathit{Foil\_Gain}(L, R) \equiv t \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$$
Where
• L is the candidate literal to add to rule R
• p0 = number of positive bindings of R
• n0 = number of negative bindings of R
• p1 = number of positive bindings of R + L
• n1 = number of negative bindings of R + L
• t is the number of positive bindings of R also covered by R + L
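The gain formula is easy to compute directly. In this Python sketch the counts for the rule GrandDaughter(x, y) ← (p0 = 1, n0 = 15) come from the slide, while the counts after adding a literal (p1 = 1, n1 = 3, t = 1) are hypothetical. Returning 0 when no positive binding survives is our own assumption.

```python
from math import log2

def foil_gain(p0: int, n0: int, p1: int, n1: int, t: int) -> float:
    """Foil_Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).

    p0, n0: positive/negative bindings of R; p1, n1: of R + L;
    t: positive bindings of R still covered by R + L.
    """
    if p1 == 0 or t == 0:
        return 0.0   # assumption: a literal that keeps no positive binding gains nothing
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# R = GrandDaughter(x, y) <- has 1 positive and 15 negative bindings (from the slide).
# Hypothetical: adding L leaves 1 positive and 3 negative bindings, with t = 1.
print(foil_gain(p0=1, n0=15, p1=1, n1=3, t=1))  # 2.0
```

Identifying the positive binding needs log2(16) = 4 bits before adding L and log2(4) = 2 bits after, and t = 1 binding benefits, so the gain is 2.0.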
Information Gain in FOIL
Note
• $-\log_2 \frac{p_0}{p_0 + n_0}$ is minimum number of bits to identify an arbitrary positive
binding among the bindings of R
• $-\log_2 \frac{p_1}{p_1 + n_1}$ is minimum number of bits to identify an arbitrary positive
binding among the bindings of R + L
• Foil_Gain(L, R) measures the reduction due to L in the total number
of bits needed to encode the classification of all positive bindings of R

FOIL Example
Learning with FOIL
Background
[Figure: family tree over Fred - Mary, Alice - Tom, Bob - Cindy, John - Barb, Ann - Frank, Carol and Ted]
Target Predicate: ancestor
FOIL Example
New clause: ancestor(X,Y) :-.
Best antecedent: parent(X,Y) Gain: 31.02
Learned clause: ancestor(X,Y) :- parent(X,Y).
New clause: ancestor(X,Y) :-.
Best antecedent: parent(Z,Y) Gain: 13.65
Best antecedent: ancestor(X,Z) Gain: 27.86
Learned clause: ancestor(X,Y) :- parent(Z,Y),
ancestor(X,Z).

Completeness and Correctness
Definition: ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(Z,Y),
ancestor(X,Z).

FOIL Example
[Figure: directed graph over nodes 0 to 8, where an arrow from x to y represents LinkedTo(x,y)]
Instances:
• pairs of nodes, e.g. ⟨1, 5⟩, with graph described by literals LinkedTo(0,1),
¬LinkedTo(0,8) etc.
Target function:
• CanReach(x,y) true iff directed path from x to y
Hypothesis space:
• Each h ∈ H is a set of Horn clauses using predicates LinkedTo (and
CanReach)

FOIL as a propositional learner
• target predicate is usual form of class value and attribute values
– Class1(V1, V2, . . . , Vm), Class2(V1, V2, . . . , Vm), . . .
• literals restricted to those in typical propositional learners
– Vi = const, Vi > num, Vi ≤ num
• plus extended set
– Vi = Vj, Vi ≥ Vj
• FOIL results vs C4.5
– accuracy competitive, especially with extended literal set
– FOIL required longer computation
– C4.5 more compact, i.e. better pruning
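The recursive ancestor definition learned in the FOIL example can be checked by bottom-up evaluation to a fixed point. This is a Python sketch; the edge set is hypothetical (not the graph from the slide), and the same code computes CanReach if the pairs are read as LinkedTo facts.

```python
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]

def ancestor_closure(parent: Set[Pair]) -> Set[Pair]:
    """Least fixed point of the learned definition
         ancestor(X,Y) :- parent(X,Y).
         ancestor(X,Y) :- parent(Z,Y), ancestor(X,Z).
    computed by naive bottom-up evaluation."""
    ancestor = set(parent)                          # first clause
    while True:
        derived = {(x, y)                           # second clause
                   for (z, y) in parent
                   for (x, z2) in ancestor if z2 == z}
        if derived <= ancestor:                     # nothing new: fixed point reached
            return ancestor
        ancestor |= derived

# Hypothetical LinkedTo edges, used in place of parent facts.
linked_to = {(0, 1), (1, 2), (1, 3), (3, 4), (5, 6)}
can_reach = ancestor_closure(linked_to)
print(sorted(can_reach))
# [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (3, 4), (5, 6)]
```

The loop terminates because each round only adds pairs of existing nodes, and there are finitely many such pairs.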
FOIL learns Prolog programs from examples
• from I. Bratko’s book “Prolog Programming for Artificial
Intelligence”
• introductory list programming problems
• training sets by randomly sampling from universe of 3 and 4 element
lists
• FOIL learned most predicates completely and correctly
– some predicates learned in restricted form
– some learned in more complex form than in book
– most learned in few seconds, some much longer

Determinate Literals
• adding a new literal Q(X, Y) where Y is the unique value for X
• this will result in zero gain! (if every binding is extended exactly once, then p1 = p0 and n1 = n0)
• FOIL gives a small positive gain to literals introducing a new variable
• BUT there may be many such literals

Determinate Literals
Refining clause A ← L1, L2, . . . , Lm−1
• a new literal Lm is determinate if
– Lm introduces new variable(s)
– there is exactly one extension of each positive tuple that satisfies Lm
– there is no more than one extension of each negative tuple that
satisfies Lm
So Lm preserves all positive tuples and does not increase the set of
bindings.
Determinate literals allow growing the clause to overcome greedy search
myopia without blowing up the search space.

Identifying document components
• Problem: learn rules to locate logical components of documents
• documents have varying numbers of components
• relationships (e.g. alignment) between pairs of components
• inherently relational task
• target relations to identify sender, receiver, date, reference, logo.

Identifying document components
• background knowledge
– 20 single page documents
– 244 components
– 57 relations specifying
∗ component type (text or picture)
∗ position on page
∗ alignment with other components
• test set error from 0% to 4%

Identifying document components
[Figure not recoverable]

Text applications of first-order logic in learning
Q: when to use first-order logic in machine learning?
A: when relations are important.

Representation for text
Example: text categorization, i.e. assign a document to one of a finite set
of categories.
Propositional learners:
• use a “bag-of-words”, often with frequency-based measures
• disregards word order, e.g. equivalence of
That’s true, I did not do it
That’s not true, I did do it
First-order learners: word-order predicates in background knowledge
has_word(Doc, Word, Pos)
Pos1 < Pos2

Learning information extraction rules
What is information extraction? Fill a pre-defined template from a given
text.
Partial approach to finding meaning of documents.
Given: examples of texts and filled templates
Learn: rules for filling template slots based on text
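To see why word order calls for relations, this Python sketch compares the two example sentences from the text-representation slide, first as bags of words and then as has_word(Doc, Word, Pos) facts queried with Pos1 < Pos2. The tokenisation (lower-casing, dropping commas, splitting on spaces) and the document identifiers d1 and d2 are assumptions.

```python
from collections import Counter
from typing import Set, Tuple

def tokens(text: str) -> list:
    # Assumed tokenisation: lower-case, drop commas, split on whitespace
    return text.lower().replace(",", "").split()

def bag_of_words(text: str) -> Counter:
    return Counter(tokens(text))

def has_word_facts(doc: str, text: str) -> Set[Tuple[str, str, int]]:
    """Ground facts has_word(Doc, Word, Pos), with word positions counted from 1."""
    return {(doc, word, pos) for pos, word in enumerate(tokens(text), start=1)}

def word_before(facts, doc: str, first: str, second: str) -> bool:
    """has_word(Doc, first, Pos1), has_word(Doc, second, Pos2), Pos1 < Pos2."""
    pos1 = [p for d, w, p in facts if d == doc and w == first]
    pos2 = [p for d, w, p in facts if d == doc and w == second]
    return any(p1 < p2 for p1 in pos1 for p2 in pos2)

d1 = "That's true, I did not do it"
d2 = "That's not true, I did do it"
facts = has_word_facts("d1", d1) | has_word_facts("d2", d2)

print(bag_of_words(d1) == bag_of_words(d2))  # True: identical bags of words
print(word_before(facts, "d1", "not", "true"),
      word_before(facts, "d2", "not", "true"))  # False True
```

The bag-of-words representations are equal, so no propositional learner using them can separate the two sentences; the positional facts distinguish them with a single relational test.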
Sample Job Posting
Subject: US-TN-SOFTWARE PROGRAMMER
Date: 17 Nov 1996 17:37:29 GMT
Organization: Reference.Com Posting Service
Message-ID: [email protected]

SOFTWARE PROGRAMMER

Position available for Software Programmer experienced in generating software for PC-Based Voice Mail systems. Experienced in C Programming. Must be familiar with communicating with and controlling voice cards; preferable Dialogic, however, experience with others such as Rhetorix and Natural Microsystems is okay. Prefer 5 years or more experience with PC Based Voice Mail, but will consider as little as 2 years. Need to find a Senior level person who can come on board and pick up code with very little training. Present Operating System is DOS. May go to OS-2 or UNIX in future.

Please reply to:
Kim Anderson
AdNET
(901) 458-2888 fax
[email protected]

Example filled template
id: [email protected]
title: SOFTWARE PROGRAMMER
salary:
company:
recruiter:
state: TN
city:
country: US
language: C
platform: PC | DOS | OS-2 | UNIX
application:
area: Voice Mail
req years experience: 2
desired years experience: 5
req degree:
desired degree:
post date: 17 Nov 1996

A learning method for Information Extraction
Rapier (Califf and Mooney, 2002) is an ILP-based approach which learns
information extraction rules based on regular expression-type patterns
Pre-Filler Patterns: what must match before filler
Filler Patterns: what the filler pattern is
Post-Filler Patterns: what must match after filler
Algorithm uses a combined bottom-up (specific-to-general) and top-down
(general-to-specific) approach to generalise rules.
syntactic analysis: Brill part-of-speech tagger
semantic analysis: WordNet (Miller, 1993)
A learning method for Information Extraction
Example rules from text to fill the city slot in a job template:
“. . . located in Atlanta, Georgia.”
“. . . offices in Kansas City, Missouri.”
Pre-Filler Pattern:
1) word: in
   tag: in
Filler Pattern:
1) list: max length: 2
   tag: nnp
Post-Filler Pattern:
1) word: ,
   tag: ,
2) tag: nnp
   semantic: state
where nnp denotes a proper noun (syntax) and state is a general label
from the WordNet ontology (semantics).

Progol
Progol: Reduce combinatorial explosion by generating most specific
acceptable h as lower bound on search space
1. User specifies H by stating predicates, functions, and forms of
arguments allowed for each
2. Progol uses sequential covering algorithm.
For each ⟨xi, f(xi)⟩
• Find most specific hypothesis hi s.t. B ∧ hi ∧ xi ⊢ f(xi)
– actually, considers only k-step entailment
3. Conduct general-to-specific search bounded by specific hypothesis hi,
choosing hypothesis with minimum description length

Protein structure
fold(’Four-helical up-and-down bundle’,P) :-
    helix(P,H1),
    length(H1,hi),
    position(P,H1,Pos),
    interval(1 <= Pos <= 3),
    adjacent(P,H1,H2),
    helix(P,H2).
“The protein P has fold class ’Four-helical up-and-down bundle’ if it
contains a long helix H1 at a secondary structure position between
1 and 3 and H1 is followed by a second helix H2”.

Protein structure
[Figure: secondary structure of two proteins, 1omd (EF-Hand) and 2mhr (Four-helical up-and-down bundle), with helices (H) and strands (E) labelled by residue ranges, e.g. H:1[19-37], E:1[57-59]]

Protein structure classification
• Classification of protein structure has largely been driven by careful inspection of experimental
data by human experts
• Protein structures are now being produced rapidly by structural-genomics
projects
• Machine-learning strategy that automatically determines structural
principles describing 45 classes of fold
• Rules learnt were both statistically significant and meaningful to protein
experts
A. Cootes, S.H. Muggleton, and M.J.E. Sternberg
“The automatic discovery of structural principles describing
protein fold space”. Journal of Molecular Biology, 2003.
available at http://www.doc.ic.ac.uk/~shm/jnl.html
Immunoglobulin:- Has antiparallel sheets B and C; B has 3 strands, topology 123;
C has 4 strands, topology 2134.
TIM barrel:- Has between 5 and 9 helices; Has a parallel sheet of 8
strands.
SH3:- Has an antiparallel sheet B. C and D are the 1st and 4th
strands in the sheet B respectively. C and D are the end
strands of B and are 4.360 (+/- 2.18) angstroms apart. D
contains a proline in the c-terminal end.

Summary
• can be viewed as an extended approach to rule learning
• BUT: much more ...
• learning in a general-purpose programming language
• use of rich background knowledge
• incorporate arbitrary program elements into clauses (rules)
• background knowledge can grow as a result of learning
• control search with declarative bias
• learning probabilistic logic programs
