6.4 Genetic Algorithms



COMP9417 Machine Learning and Data Mining, 11s1
Genetic Algorithms
April 5, 2011

Acknowledgement: Material derived from slides for the book
Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997
http://www-2.cs.cmu.edu/~tom/mlbook.html

Aims

This lecture will enable you to describe and reproduce machine learning approaches using genetic algorithms. Following it you should be able to:

• outline the framework of evolutionary computation
• reproduce the prototypical genetic algorithm for machine learning
• design representations for rule learning by a genetic algorithm
• describe genetic algorithm operators such as mutation and crossover
• outline the schema theorem
• describe genetic programming
• define the Baldwin effect

[Recommended reading: Mitchell, Chapter 9]
[Recommended exercises: 9.1 (9.2-9.4)]

Evolutionary Computation

• Computational procedures patterned after biological evolution
• Search method that probabilistically applies operators to a set of points in the search space
• Can be viewed as a form of stochastic optimization – the aim is to find approximate solutions to difficult optimization problems

Biological Evolution

Lamarck and others:
• Species “transmute” over time

Darwin and Wallace:
• Consistent, heritable variation among individuals in a population
• Natural selection of the “fittest”

Mendel and genetics:
• A mechanism for inheriting traits
• mapping: genotype → phenotype

A Genetic Algorithm for Machine Learning

GA(Fitness, Fitness_threshold, p, r, m)

Initialize: P ← p random hypotheses
Evaluate: for each h in P, compute Fitness(h)
While [max_h Fitness(h)] < Fitness_threshold Do
  1. Select: Probabilistically select (1 − r)·p members of P to add to P_s, where

         Pr(h_i) = Fitness(h_i) / Σ_{j=1..p} Fitness(h_j)

  2. Crossover: Probabilistically select (r·p)/2 pairs of hypotheses from P. For each pair ⟨h1, h2⟩, produce two offspring by applying the Crossover operator. Add all offspring to P_s.
  3. Mutate: Invert a randomly selected bit in m·p random members of P_s
  4. Update: P ← P_s
  5. Evaluate: for each h in P, compute Fitness(h)
Return the hypothesis from P with the highest fitness.
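To make the loop above concrete, here is a minimal Python sketch of the prototypical GA over fixed-length bitstring hypotheses. It is not the lecture's code: the function and parameter names (run_ga, n_bits, the one-max fitness in the usage line) are illustrative assumptions.

```python
import random

def run_ga(fitness, fitness_threshold, p=100, r=0.6, m=0.05, n_bits=20):
    """Prototypical GA: population size p, crossover fraction r, mutation rate m."""
    P = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(p)]

    def select(pop):
        # fitness-proportionate ("roulette wheel") selection:
        # Pr(h_i) = Fitness(h_i) / sum_j Fitness(h_j)
        weights = [fitness(h) for h in pop]
        return random.choices(pop, weights=weights, k=1)[0][:]

    while max(fitness(h) for h in P) < fitness_threshold:
        Ps = [select(P) for _ in range(int((1 - r) * p))]        # 1. Select
        for _ in range(int(r * p / 2)):                          # 2. Crossover
            h1, h2 = select(P), select(P)
            cut = random.randrange(1, n_bits)                    # single-point crossover
            Ps += [h1[:cut] + h2[cut:], h2[:cut] + h1[cut:]]
        for h in random.sample(Ps, int(m * p)):                  # 3. Mutate
            i = random.randrange(n_bits)
            h[i] = 1 - h[i]
        P = Ps                                                   # 4. Update, 5. Evaluate
    return max(P, key=fitness)

# Toy usage: maximise the number of 1-bits ("one-max"),
# stopping once some string has at least 18 ones.
best = run_ga(fitness=sum, fitness_threshold=18)
```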
Representing Hypotheses for Genetic Algorithms

Represent

    (Outlook = Overcast ∨ Rain) ∧ (Wind = Strong)

by

    Outlook 011    Wind 10

Represent

    IF Wind = Strong THEN PlayTennis = yes

by

    Outlook 111    Wind 10    PlayTennis 10

Operators for Genetic Algorithms

                           Initial strings    Crossover mask    Offspring

    Single-point crossover:  11101001000        11111000000      11101010101
                             00001010101                         00001001000

    Two-point crossover:     11101001000        00111110000      11001011000
                             00001010101                         00101000101

    Uniform crossover:       11101001000        10011010011      10001000100
                             00001010101                         01101011001

    Point mutation:          11101001000                         11101011000

• Mutation – new version of single parent
• Crossover – two new offspring from two parents
• Parameters for operators chosen randomly at each application
• Point mutation: single bit chosen at random and flipped
• Single-point crossover: n – number of bits contributed by first parent
• Two-point crossover: n0, n1 – number of bits contributed by second, then first parent
• Uniform crossover: non-contiguous bits chosen at random define contribution by each parent
• Many variations of these are possible; can be domain-specific
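The mask-based operators in the table above can be written directly as bit-list manipulations. The following is a minimal sketch, assuming parents are equal-length lists of 0/1 integers; the function names are illustrative, not from the lecture.

```python
import random

def apply_mask(p1, p2, mask):
    """Offspring pair: first child takes p1's bit where mask=1, p2's where mask=0."""
    c1 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    c2 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    return c1, c2

def single_point_crossover(p1, p2):
    cut = random.randrange(1, len(p1))                     # n bits come from the first parent
    mask = [1] * cut + [0] * (len(p1) - cut)               # e.g. 11111000000
    return apply_mask(p1, p2, mask)

def two_point_crossover(p1, p2):
    i, j = sorted(random.sample(range(1, len(p1)), 2))     # middle segment is swapped
    mask = [0] * i + [1] * (j - i) + [0] * (len(p1) - j)   # e.g. 00111110000
    return apply_mask(p1, p2, mask)

def uniform_crossover(p1, p2):
    mask = [random.randint(0, 1) for _ in p1]              # e.g. 10011010011
    return apply_mask(p1, p2, mask)

def point_mutation(p):
    c = p[:]
    c[random.randrange(len(c))] ^= 1                       # flip one randomly chosen bit
    return c

# Reproducing the single-point example from the table above:
p1 = [int(b) for b in "11101001000"]
p2 = [int(b) for b in "00001010101"]
mask = [int(b) for b in "11111000000"]
print(apply_mask(p1, p2, mask))    # -> 11101010101 and 00001001000
```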
Selecting Most Fit Hypotheses

How to measure fitness? If hypotheses are rules:
• classification accuracy on a data set, or
• accuracy combined with other factors, such as rule complexity

Fitness proportionate selection:

    Pr(h_i) = Fitness(h_i) / Σ_{j=1..p} Fitness(h_j)

Also called roulette wheel selection. Interpretation: select a hypothesis h on the basis of its fitness relative to the combined fitness of the population.

Other strategies may be better than fitness proportionate selection (it can lead to “crowding” – many copies of similar individuals).

Tournament selection:
• Pick h1, h2 at random with uniform probability
• With probability p, select the more fit

Rank selection:
• Sort all hypotheses by fitness
• Probability of selection is proportional to rank

GABIL [DeJong et al. 1993]

Learns a disjunctive set of propositional rules (competitive with C4.5).

Fitness: Fitness(h) = (percent_correct(h))^2

Representation:

    IF a1 = T ∧ a2 = F THEN c = T; IF a2 = T THEN c = F

represented by

    a1 10   a2 01   c 1     a1 11   a2 10   c 0

Genetic operators: ???
• want variable-length rule sets
• want only well-formed bitstring hypotheses

Crossover with Variable-Length Bitstrings

Start with

    h1:  a1 10   a2 01   c 1   a1 11   a2 10   c 0
    h2:      01      11    0      10      01     0

1. choose crossover points for h1, e.g., after bits 1, 8:

    h1:  a1 1[0  a2 01   c 1   a1 11   a2 1]0   c 0

2. now restrict points in h2 to those that produce bitstrings with well-defined semantics, e.g., ⟨1, 3⟩, ⟨1, 8⟩, ⟨6, 8⟩.

If we choose ⟨1, 3⟩, go from

    h1:  a1 1[0  a2 01   c 1   a1 11   a2 1]0   c 0
    h2:  0[1 1]1   0   10   01   0

to get the crossover result:

    h3:  a1 11   a2 10   c 0
    h4:  a1 00   a2 01   c 1   a1 11   a2 11   c 0   a1 10   a2 01   c 0

GABIL Extensions

Add new genetic operators, also applied probabilistically:

1. AddAlternative: generalize the constraint on ai by changing a 0 to 1
2. DropCondition: generalize the constraint on ai by changing every 0 to 1

And, add new fields to the bitstring to determine whether to allow these:

    a1 01   a2 11   c 0     a1 10   a2 01   c 0     AA 1   DC 0

So now the learning strategy also evolves, i.e., learning to learn!

GABIL Results

Performance of GABIL is comparable to symbolic rule/tree learning methods C4.5, ID5R, AQ14.

Average performance on a set of 12 synthetic problems:
• GABIL without AA and DC operators: 92.1% accuracy
• GABIL with AA and DC operators: 95.2% accuracy
• symbolic learning methods ranged from 91.2% to 96.6%

Schemas

How to characterize the evolution of the population in a GA?

Schema = string containing 0, 1, * (“don’t care”)
• Typical schema: 10**0*
• Instances of the above schema: 101101, 100000, ...

Characterize the population by the number of instances representing each possible schema:
• m(s, t) = number of instances of schema s in the population at time t

Consider Just Selection

• f̄(t) = average fitness of the population at time t
• m(s, t) = instances of schema s in the population at time t
• û(s, t) = average fitness of instances of s at time t

Probability of selecting h in one selection step:

    Pr(h) = f(h) / Σ_{i=1..n} f(h_i) = f(h) / (n · f̄(t))

Probability of selecting an instance of s in one step:

    Pr(h ∈ s) = Σ_{h ∈ s ∩ p_t} f(h) / (n · f̄(t)) = (û(s, t) / (n · f̄(t))) · m(s, t)

Expected number of instances of s after n selections:

    E[m(s, t+1)] = (û(s, t) / f̄(t)) · m(s, t)

Schema Theorem

    E[m(s, t+1)] ≥ (û(s, t) / f̄(t)) · m(s, t) · (1 − p_c · d(s)/(l − 1)) · (1 − p_m)^o(s)

where
• m(s, t) = instances of schema s in the population at time t
• f̄(t) = average fitness of the population at time t
• û(s, t) = average fitness of instances of s at time t
• p_c = probability of the single-point crossover operator
• p_m = probability of the mutation operator
• l = length of the individual bit strings
• o(s) = number of defined (non-“*”) bits in s
• d(s) = distance between leftmost and rightmost defined bits in s
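As a numerical illustration of the schema theorem's lower bound, here is a small sketch that evaluates the right-hand side for a toy population. The helper names (matches, schema_bound) and the sample population are assumptions for illustration.

```python
def matches(schema, h):
    """True if bitstring h is an instance of schema (a string over 0, 1, *)."""
    return all(c == "*" or c == b for c, b in zip(schema, h))

def schema_bound(schema, population, fitness, p_c=0.7, p_m=0.001):
    """Lower bound on E[m(s, t+1)] under selection, single-point crossover, mutation."""
    l = len(schema)
    inst = [h for h in population if matches(schema, h)]
    m_st = len(inst)                                          # m(s, t)
    if m_st == 0:
        return 0.0
    u_hat = sum(fitness(h) for h in inst) / m_st              # û(s, t)
    f_bar = sum(fitness(h) for h in population) / len(population)   # f̄(t)
    defined = [i for i, c in enumerate(schema) if c != "*"]
    o_s = len(defined)                                        # o(s): defined bits
    d_s = defined[-1] - defined[0] if defined else 0          # d(s): defining length
    return (u_hat / f_bar) * m_st * (1 - p_c * d_s / (l - 1)) * (1 - p_m) ** o_s

# Toy population of 6-bit strings, fitness = number of 1-bits
pop = ["101101", "100000", "111111", "000011", "101000", "110110"]
print(schema_bound("10**0*", pop, fitness=lambda h: h.count("1")))
```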
Genetic Programming

Population of programs represented by trees, e.g. a tree for

    sin(x) + x^2 + y

Crossover

[Figure: crossover of two program trees by swapping randomly chosen subtrees, producing two offspring trees]

Block Problem

[Figure: blocks labelled u, n, i, v, e, r, s, a, l, some on the table and some in a stack]

Goal: spell UNIVERSAL

Primitive functions:
• (MS x): (“move to stack”), if block x is on the table, moves x to the top of the stack and returns the value T. Otherwise, does nothing and returns the value F.
• (MT x): (“move to table”), if block x is somewhere in the stack, moves the block at the top of the stack to the table and returns the value T. Otherwise, returns F.
• (EQ x y): (“equal”), returns T if x equals y, and returns F otherwise.
• (NOT x): returns T if x = F, else returns F.
• (DU x y): (“do until”) executes the expression x repeatedly until expression y returns the value T.

Terminals:
• CS (“current stack”) = name of the top block on the stack, or F
• TB (“top correct block”) = name of the topmost correct block on the stack
• NN (“next necessary”) = name of the next block needed above TB in the stack

Block Problem: Learned Program

Trained to fit 166 test problems. Using a population of 300 programs, GP found this after 10 generations:

    (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
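To see what the learned program is doing, here is a minimal Python sketch of the block world and of the evolved program transcribed by hand. The class and variable names (BlockWorld, GOAL, the sample scrambled stack) are assumptions for illustration, not from Koza's system, and the stack is represented bottom-to-top.

```python
GOAL = list("universal")                 # target spelling, bottom of stack to top

class BlockWorld:
    def __init__(self, stack, table):
        self.stack = list(stack)         # bottom ... top
        self.table = set(table)          # blocks lying on the table

    def _correct_prefix(self):           # how many blocks from the bottom are already right
        n = 0
        while n < len(self.stack) and self.stack[n] == GOAL[n]:
            n += 1
        return n

    # Terminals
    def CS(self):                        # current stack: top block, or F (False)
        return self.stack[-1] if self.stack else False

    def TB(self):                        # topmost correct block, or F
        n = self._correct_prefix()
        return self.stack[n - 1] if n else False

    def NN(self):                        # next block needed above TB, or F
        n = self._correct_prefix()
        return GOAL[n] if n < len(GOAL) else False

    # Primitive functions
    def MS(self, x):                     # move x from the table to the top of the stack
        if x in self.table:
            self.table.remove(x); self.stack.append(x); return True
        return False

    def MT(self, x):                     # if x is in the stack, move the TOP block to the table
        if x in self.stack:
            self.table.add(self.stack.pop()); return True
        return False

def DU(body, until, limit=1000):         # "do until": run body until the predicate is T
    for _ in range(limit):               # limit guards against non-terminating programs
        body()
        if until():
            return True
    return False

# (EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
def learned_program(w):
    a = DU(lambda: w.MT(w.CS()), lambda: not w.CS())   # unstack everything
    b = DU(lambda: w.MS(w.NN()), lambda: not w.NN())   # then stack the needed blocks in order
    return a == b                                      # EQ of the two results

w = BlockWorld(stack="srevinu", table="al")            # one scrambled starting state
learned_program(w)
print("".join(w.stack))                                # -> universal
```

Read off the transcription, the evolved strategy is simply: unstack everything, then rebuild the goal stack block by block.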
Genetic Programming

A more interesting example: design electronic filter circuits.

• Individuals are programs that transform a beginning circuit to a final circuit, by adding/subtracting components and connections
• Use a population of 640,000, run on a 64-node parallel processor
• Discovers circuits competitive with the best human designs

John Koza and colleagues – many applications of GP.

GP for Classifying Images

Fitness: based on coverage and accuracy

Representation:
• Primitives include Add, Sub, Mult, Div, Not, Max, Min, Read, Write, If-Then-Else, Either, Pixel, Least, Most, Ave, Variance, Difference, Mini, Library
• Mini refers to a local subroutine that is separately co-evolved
• Library refers to a global library subroutine (evolved by selecting the most useful minis)

Genetic operators:
• Crossover, mutation
• Create “mating pools” and use rank proportionate reproduction

Biological Evolution

Lamarck (19th century):
• Believed individual genetic makeup was altered by lifetime experience
• But current evidence contradicts this view

What is the impact of individual learning on population evolution?

Baldwin Effect

Assume:
• Individual learning has no direct influence on individual DNA
• But the ability to learn reduces the need to “hard wire” traits in DNA

Then:
• The ability of individuals to learn will support a more diverse gene pool
  – because learning allows individuals with various “hard wired” traits to be successful
• A more diverse gene pool will support faster evolution of the gene pool

→ individual learning (indirectly) increases the rate of evolution

Plausible example:
1. New predator appears in the environment
2. Individuals who can learn (to avoid it) will be selected
3. Increase in learning individuals will support a more diverse gene pool
4. Resulting in faster evolution
5. Possibly resulting in selection for new non-learned traits such as an instinctive fear of the predator

Computer Experiments on the Baldwin Effect

Evolve simple neural networks:
• Some network weights fixed during lifetime, others trainable
• Genetic makeup determines which are fixed, and their weight values

Results:
• With no individual learning, the population failed to improve over time
• When individual learning was allowed:
  – Early generations: population contained many individuals with many trainable weights
  – Later generations: higher fitness, while the number of trainable weights decreased

Summary: Genetic Algorithms

• Conduct randomized, parallel, hill-climbing search through H
• Approach learning as an optimization problem (optimize fitness)
• Nice feature: evaluation of Fitness can be very indirect
  – consider learning a rule set for multistep decision making
  – no issue of assigning credit/blame to individual steps
• Generalized to Genetic Programming
• Also consider alternative methods of probabilistic optimization
