COMP9417 Machine Learning and Data Mining, 11s1
Genetic Algorithms
April 5, 2011

Acknowledgement: Material derived from slides for the book
Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997
http://www-2.cs.cmu.edu/~tom/mlbook.html

[Recommended reading: Mitchell, Chapter 9]
[Recommended exercises: 9.1 (9.2-9.4)]

Aims

This lecture will enable you to describe and reproduce machine learning approaches using genetic algorithms. Following it you should be able to:
• outline the framework of evolutionary computation
• reproduce the prototypical genetic algorithm for machine learning
• design representations for rule learning by a genetic algorithm
• describe genetic algorithm operators such as mutation and crossover
• outline the schema theorem
• describe genetic programming
• define the Baldwin effect
Evolutionary Computation

• Computational procedures patterned after biological evolution
• Search method that probabilistically applies operators to a set of points in the search space
• Can be viewed as a form of stochastic optimization
  – aim to find approximate solutions to difficult optimization problems

Biological Evolution

Lamarck and others:
• Species "transmute" over time

Darwin and Wallace:
• Consistent, heritable variation among individuals in population
• Natural selection of the "fittest"

Mendel and genetics:
• A mechanism for inheriting traits
• mapping: genotype → phenotype

A Genetic Algorithm for Machine Learning

GA(Fitness, Fitness_threshold, p, r, m)
  Initialize: P ← p random hypotheses
  Evaluate: for each h in P, compute Fitness(h)
  While [max_h Fitness(h)] < Fitness_threshold Do
    1. Select: Probabilistically select (1 − r)·p members of P to add to P_S, where

           Pr(hi) = Fitness(hi) / Σj=1..p Fitness(hj)

    2. Crossover: Probabilistically select (r·p)/2 pairs of hypotheses from P.
       For each pair ⟨h1, h2⟩, produce two offspring by applying the Crossover
       operator. Add all offspring to P_S.
    3. Mutate: Invert a randomly selected bit in m·p random members of P_S
    4. Update: P ← P_S
    5. Evaluate: for each h in P, compute Fitness(h)
  Return the hypothesis from P with highest fitness.

Representing Hypotheses for Genetic Algorithms

Represent
    (Outlook = Overcast ∨ Rain) ∧ (Wind = Strong)
by
    Outlook   Wind
    011       10

Represent
    IF Wind = Strong THEN PlayTennis = yes
by
    Outlook   Wind   PlayTennis
    111       10     10
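The prototypical algorithm above can be sketched in Python for fixed-length bitstring hypotheses. This is an illustrative sketch, not code from the lecture; the one-max fitness function and all parameter values in the toy run are my own choices:

```python
import random

def ga(fitness, threshold, p, r, m, length):
    """Prototypical GA over fixed-length bitstrings (hypotheses as '0'/'1' strings)."""
    pop = ["".join(random.choice("01") for _ in range(length)) for _ in range(p)]
    while max(fitness(h) for h in pop) < threshold:
        total = sum(fitness(h) for h in pop)
        weights = [fitness(h) / total for h in pop]
        # Select: fitness-proportionate selection of (1 - r)*p survivors
        survivors = random.choices(pop, weights=weights, k=int((1 - r) * p))
        # Crossover: (r*p)/2 pairs, each producing two offspring (single-point)
        offspring = []
        for _ in range(int(r * p / 2)):
            h1, h2 = random.choices(pop, weights=weights, k=2)
            cut = random.randint(1, length - 1)
            offspring += [h1[:cut] + h2[cut:], h2[:cut] + h1[cut:]]
        pop = survivors + offspring
        # Mutate: flip one randomly chosen bit in m*p random members
        for i in random.sample(range(len(pop)), int(m * len(pop))):
            j = random.randrange(length)
            bit = "1" if pop[i][j] == "0" else "0"
            pop[i] = pop[i][:j] + bit + pop[i][j + 1:]
    return max(pop, key=fitness)

# Toy run: fitness = number of 1-bits ("one-max", with small smoothing so
# all-zero strings still get nonzero selection probability)
random.seed(0)
best = ga(lambda h: h.count("1") + 0.01, 8, 20, 0.4, 0.1, 8)
print(best)  # all-ones string, since the loop only exits at fitness >= 8
```

The loop structure mirrors the five steps above: select, crossover, mutate, update, evaluate.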
Operators for Genetic Algorithms

Mutation – new version of a single parent
Crossover – two new offspring from two parents
Parameters for operators chosen randomly at each application

Single-point crossover (n = number of bits contributed by first parent):
    mask:      11111000000
    parents:   11101001000   00001010101
    offspring: 11101010101   00001001000

Two-point crossover (n0, n1 = number of bits contributed by second, then first parent):
    mask:      00111110000
    parents:   11101001000   00001010101
    offspring: 11001011000   00101000101

Uniform crossover (non-contiguous bits chosen at random define the contribution by each parent):
    mask:      10011010011
    parents:   11101001000   00001010101
    offspring: 10001000100   01101011001

Point mutation (single bit chosen at random and flipped):
    11101001000 → 11101011000

Many variations of these are possible; they can be domain-specific.
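The mask-based operators can be written generically. A sketch (the convention that offspring 1 takes parent 1's bits where the mask is 1, parent 2's elsewhere, follows the examples above):

```python
def crossover(p1, p2, mask):
    """Mask-based crossover: offspring 1 takes p1 where mask is '1', p2 elsewhere;
    offspring 2 is the mirror image."""
    o1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    o2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return o1, o2

def point_mutation(s, pos):
    """Flip the single bit at position pos."""
    return s[:pos] + ("1" if s[pos] == "0" else "0") + s[pos + 1:]

p1, p2 = "11101001000", "00001010101"
print(crossover(p1, p2, "11111000000"))  # single-point: ('11101010101', '00001001000')
print(crossover(p1, p2, "00111110000"))  # two-point:    ('00101000101', '11001011000')
print(crossover(p1, p2, "10011010011"))  # uniform:      ('10001000100', '01101011001')
print(point_mutation(p1, 6))             # '11101011000'
```

A contiguous run of 1s starting at the left gives single-point crossover, an interior run gives two-point crossover, and a random mask gives uniform crossover.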
Selecting Most Fit Hypotheses

How to measure fitness? If hypotheses are rules:
• classification accuracy on a data set, or
• accuracy combined with other factors, such as rule complexity

Fitness proportionate selection:

    Pr(hi) = Fitness(hi) / Σj=1..p Fitness(hj)

Also called roulette wheel selection. Interpretation: select a hypothesis h on the basis of its fitness relative to the combined fitness of the population.

Other strategies may be better than fitness proportionate selection (it can lead to "crowding" – many copies of similar individuals).

Tournament selection:
• Pick h1, h2 at random with uniform probability
• With probability p, select the more fit

Rank selection:
• Sort all hypotheses by fitness
• Probability of selection is proportional to rank
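The three selection strategies can be sketched as follows (the helper names and the default winning probability are mine):

```python
import random

def roulette(pop, fitness):
    """Fitness-proportionate (roulette wheel) selection."""
    total = sum(fitness(h) for h in pop)
    return random.choices(pop, weights=[fitness(h) / total for h in pop], k=1)[0]

def tournament(pop, fitness, p_win=0.8):
    """Pick two at random with uniform probability; with probability p_win
    keep the fitter of the pair."""
    h1, h2 = random.sample(pop, 2)
    fitter, weaker = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return fitter if random.random() < p_win else weaker

def rank_select(pop, fitness):
    """Sort by fitness; selection probability proportional to rank (1 = least fit)."""
    ranked = sorted(pop, key=fitness)
    return random.choices(ranked, weights=range(1, len(pop) + 1), k=1)[0]

pop = ["1111", "1010", "0000"]
fit = lambda h: h.count("1") + 1   # smoothed so every individual is selectable
chosen = roulette(pop, fit)
```

Tournament and rank selection both look only at fitness *order*, which is what makes them less prone to crowding than raw fitness-proportionate selection.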
GABIL [DeJong et al. 1993]

Learns a disjunctive set of propositional rules (competitive with C4.5).

Fitness:

    Fitness(h) = (percent correct(h))²

Representation:
• want variable-length rule sets
• want only well-formed bitstring hypotheses

Genetic operators: ???

Example: the rule set

    IF a1 = T ∧ a2 = F THEN c = T;  IF a2 = T THEN c = F

is represented by

    a1   a2   c    a1   a2   c
    10   01   1    11   10   0
Crossover with Variable-Length Bitstrings

Start with

    h1:  a1   a2   c    a1   a2   c
         10   01   1    11   10   0
    h2:  01   11   0    10   01   0

1. Choose crossover points for h1, e.g., after bits 1 and 8:

    h1:  1[0 01 1 11 1]0 0

2. Now restrict the points in h2 to those that produce bitstrings with well-defined semantics, e.g., ⟨1, 3⟩, ⟨1, 8⟩, ⟨6, 8⟩.

If we choose ⟨1, 3⟩:

    h2:  0[1 1]1 0 10 01 0

we get the crossover result:

    h3:  a1   a2   c
         11   10   0

    h4:  a1   a2   c    a1   a2   c    a1   a2   c
         00   01   1    11   11   0    10   01   0
GABIL Extensions

Add new genetic operators, also applied probabilistically:
1. AddAlternative: generalize the constraint on ai by changing a 0 to 1
2. DropCondition: generalize the constraint on ai by changing every 0 to 1

And add new fields to the bitstring to determine whether to allow these:

    a1   a2   c    a1   a2   c    AA   DC
    01   11   0    10   01   0    1    0

So now the learning strategy also evolves, i.e., learning to learn!

GABIL Results

Performance of GABIL comparable to symbolic rule/tree learning methods C4.5, ID5R, AQ14.

Average performance on a set of 12 synthetic problems:
• GABIL without AA and DC operators: 92.1% accuracy
• GABIL with AA and DC operators: 95.2% accuracy
• symbolic learning methods ranged from 91.2% to 96.6%
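The two generalization operators can be sketched on a single attribute's substring (function names are mine):

```python
import random

def add_alternative(attr_bits):
    """AddAlternative: generalize by setting one randomly chosen 0 bit to 1,
    i.e., allow one more value for this attribute."""
    zeros = [i for i, b in enumerate(attr_bits) if b == "0"]
    if not zeros:
        return attr_bits          # already fully general
    i = random.choice(zeros)
    return attr_bits[:i] + "1" + attr_bits[i + 1:]

def drop_condition(attr_bits):
    """DropCondition: generalize by setting every bit to 1,
    removing the constraint on this attribute entirely."""
    return "1" * len(attr_bits)

print(drop_condition("0110"))    # '1111'
print(add_alternative("10"))     # '11' (the only 0 bit is flipped)
```

Both operators only ever generalize a rule, so applying them cannot make a hypothesis cover fewer examples.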
Schemas

How to characterize the evolution of the population in a GA?

Schema = string containing 0, 1, * ("don't care")
• Typical schema: 10**0*
• Instances of the above schema: 101101, 100000, ...

Characterize the population by the number of instances representing each possible schema:
• m(s, t) = number of instances of schema s in the population at time t

Consider Just Selection

• f̄(t) = average fitness of the population at time t
• m(s, t) = number of instances of schema s in the population at time t
• û(s, t) = average fitness of the instances of s at time t

Probability of selecting h in one selection step:

    Pr(h) = f(h) / Σi=1..n f(hi)
          = f(h) / (n · f̄(t))

Probability of selecting an instance of s in one step:

    Pr(h ∈ s) = Σ{h ∈ s ∩ pt} f(h) / (n · f̄(t))
              = (û(s, t) / (n · f̄(t))) · m(s, t)

Expected number of instances of s after n selections:

    E[m(s, t + 1)] = (û(s, t) / f̄(t)) · m(s, t)

Schema Theorem

    E[m(s, t + 1)] ≥ (û(s, t) / f̄(t)) · m(s, t) · (1 − pc · d(s)/(l − 1)) · (1 − pm)^o(s)

where
• m(s, t) = number of instances of schema s in the population at time t
• f̄(t) = average fitness of the population at time t
• û(s, t) = average fitness of the instances of s at time t
• pc = probability of the single-point crossover operator
• pm = probability of the mutation operator
• l = length of the individual bit strings
• o(s) = number of defined (non-"*") bits in s
• d(s) = distance between the leftmost and rightmost defined bits in s
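The schema quantities used in the theorem are easy to compute directly. A small sketch (function names are mine, not from the slides):

```python
def matches(schema, s):
    """True if bitstring s is an instance of schema: every defined
    (non-'*') position must agree."""
    return all(c == "*" or c == b for c, b in zip(schema, s))

def o(schema):
    """Number of defined (non-'*') bits in the schema."""
    return sum(c != "*" for c in schema)

def d(schema):
    """Distance between the leftmost and rightmost defined bits."""
    defined = [i for i, c in enumerate(schema) if c != "*"]
    return defined[-1] - defined[0] if defined else 0

s = "10**0*"
print(o(s), d(s))                                   # 3 4
print(matches(s, "101101"), matches(s, "100000"))   # True True

# m(s, t): count the instances of s in a population
pop = ["101101", "111111", "100000"]
print(sum(matches(s, h) for h in pop))              # 2
```

Here o(s) and d(s) are exactly the quantities that scale the mutation and crossover penalty terms in the schema theorem: schemas that are short (small d) and low-order (small o) are the least likely to be disrupted.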
Genetic Programming

Population of programs represented by trees, e.g. the expression sin(x) + x² + y.

[Tree diagrams omitted: the example expression as a program tree, and crossover producing offspring by exchanging randomly chosen subtrees between two parent trees.]

Block Problem

Goal: spell UNIVERSAL by stacking labelled blocks (u, n, i, v, e, r, s, a, l).

Terminals:
• CS ("current stack") = name of the top block on the stack, or F
• TB ("top correct block") = name of the topmost correct block on the stack
• NN ("next necessary") = name of the next block needed above TB in the stack

Primitive functions:
• (MS x): ("move to stack"), if block x is on the table, moves x to the top of the stack and returns the value T. Otherwise, does nothing and returns the value F.
• (MT x): ("move to table"), if block x is somewhere in the stack, moves the block at the top of the stack to the table and returns the value T. Otherwise, returns F.
• (EQ x y): ("equal"), returns T if x equals y, and returns F otherwise.
• (NOT x): returns T if x = F, else returns F.
• (DU x y): ("do until"), executes the expression x repeatedly until expression y returns the value T.

Block Problem Learned Program

Trained to fit 166 test problems. Using a population of 300 programs, GP found this program after 10 generations:

    (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))

Genetic Programming

More interesting example: design electronic filter circuits.
• Individuals are programs that transform a beginning circuit to a final circuit, by adding/subtracting components and connections
• Use a population of 640,000, run on a 64-node parallel processor
• Discovers circuits competitive with the best human designs
John Koza and colleagues – many applications of GP.

GP for Classifying Images

Fitness: based on coverage and accuracy.
Representation:
• Primitives include Add, Sub, Mult, Div, Not, Max, Min, Read, Write, If-Then-Else, Either, Pixel, Least, Most, Ave, Variance, Difference, Mini, Library
• Mini refers to a local subroutine that is separately co-evolved
• Library refers to a global library subroutine (evolved by selecting the most useful minis)
Genetic operators:
• Crossover, mutation
• Create "mating pools" and use rank proportionate reproduction

Biological Evolution

Lamarck (19th century):
• Believed individual genetic makeup was altered by lifetime experience
• But current evidence contradicts this view
What is the impact of individual learning on population evolution?

Baldwin Effect

Assume:
• Individual learning has no direct influence on individual DNA
• But the ability to learn reduces the need to "hard wire" traits in DNA
Then:
• Ability of individuals to learn will support a more diverse gene pool
  – because learning allows individuals with various "hard wired" traits to be successful
• More diverse gene pool will support faster evolution of the gene pool
→ individual learning (indirectly) increases the rate of evolution

Plausible example:
1. New predator appears in environment
2. Individuals who can learn (to avoid it) will be selected
3. Increase in learning individuals will support a more diverse gene pool
4. Resulting in faster evolution
5. Possibly resulting in selection for new non-learned traits, such as an instinctive fear of the predator
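A minimal interpreter for the block-problem primitives lets one run the learned program shown above. The state representation, the evaluation order inside do-until, and the iteration limit are my assumptions, not details from the slides:

```python
GOAL = list("universal")

class World:
    def __init__(self, stack, table):
        self.stack = list(stack)   # bottom ... top
        self.table = set(table)

    # Terminals
    def CS(self):
        """Current stack: top block, or F (False) if the stack is empty."""
        return self.stack[-1] if self.stack else False

    def NN(self):
        """Next necessary: the next goal block needed above the correct prefix."""
        n = 0
        while n < len(self.stack) and self.stack[n] == GOAL[n]:
            n += 1
        return GOAL[n] if n < len(GOAL) else False

    # Primitive functions
    def MS(self, x):
        """Move to stack: if x is on the table, put it on top of the stack."""
        if x in self.table:
            self.table.remove(x)
            self.stack.append(x)
            return True
        return False

    def MT(self, x):
        """Move to table: if x is somewhere in the stack, move the TOP block."""
        if x in self.stack:
            self.table.add(self.stack.pop())
            return True
        return False

def DU(body, until, limit=100):
    """(DU x y): evaluate body repeatedly until the predicate returns T."""
    for _ in range(limit):
        if until():
            return True
        body()
    return False

# The learned program (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN))):
# unstack everything, then stack the needed blocks in goal order.
w = World(stack="nu", table="iversal")        # a scrambled starting state
DU(lambda: w.MT(w.CS()), lambda: not w.CS())  # (DU (MT CS) (NOT CS))
DU(lambda: w.MS(w.NN()), lambda: not w.NN())  # (DU (MS NN) (NOT NN))
print("".join(w.stack))  # 'universal'
```

Running it shows why the evolved program solves every instance: the first loop clears the stack, and the second rebuilds it block by block via NN.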
Computer Experiments on Baldwin Effect

Evolve simple neural networks:
• Some network weights fixed during lifetime, others trainable
• Genetic makeup determines which are fixed, and their weight values

Results:
• With no individual learning, the population failed to improve over time
• When individual learning was allowed:
  – Early generations: population contained many individuals with many trainable weights
  – Later generations: higher fitness, while the number of trainable weights decreased

Summary: Genetic Algorithms

• Approach learning as an optimization problem (optimize fitness)
• Conduct randomized, parallel, hill-climbing search through H
• Nice feature: evaluation of Fitness can be very indirect
  – consider learning a rule set for multistep decision making
  – no issue of assigning credit/blame to individual steps
• Generalized to Genetic Programming
• Also consider alternative methods of probabilistic optimization
