1.2 Introduction to Machine Learning




11s1: COMP9417 Machine Learning and Data Mining
Introduction to Machine Learning
March 1, 2011

Aims

This lecture will provide the basis for you to be able to describe the motivation, scope and some application areas of machine learning. Following it you should be able to:

• describe the general learning problem
• state some of the steps in setting up a learning problem
• list some applications of machine learning
• list some issues in machine learning

Acknowledgement: Material derived from slides for the book Machine Learning, Tom Mitchell, McGraw-Hill, 1997
http://www-2.cs.cmu.edu/~tom/mlbook.html

Overview

[Recommended reading: Mitchell, Chapter 1]
[Recommended exercises: 1.1, 1.2, optionally 1.5]

• Why Machine Learning?
• What is a well-defined learning problem?
• An example: learning to play checkers (draughts)
• What questions should we ask about Machine Learning?

Why Machine Learning?

• Considerable progress in algorithms and theory
• Growing flood of online data
• Increasing computational power
• Many successful commercial/scientific applications
Some definitions

machine learning: the science of algorithmic methods of learning from experience with the goal of improving performance on selected tasks

data mining: the use of machine learning or statistical algorithms to search large amounts of data for hidden patterns or relationships that are interesting and potentially useful

Three niches for machine learning:

• Data mining: using historical data to improve decisions
  – medical records → medical knowledge
• Software applications we can't program by hand
  – autonomous robots
  – speech recognition
• Self-customizing programs
  – Web sites that learn user interests

Typical data mining task

[Figure: a series of records for Patient103 at time=1, time=2, ..., time=n, each with features such as Age: 23, FirstPregnancy: no, Anemia: no, Diabetes: no/YES, PreviousPrematureBirth: no, Ultrasound: ?/abnormal, Elective C-Section: ?/no, Emergency C-Section: ?/Yes.]
Given:

• 9714 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features

Learn to predict:

• Classes of future patients at high risk for Emergency Cesarean Section

Data mining result: one of 18 learned rules:

If No previous vaginal delivery, and
   Abnormal 2nd Trimester Ultrasound, and
   Malpresentation at admission
Then Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63; over test data: 12/20 = .60

Credit Risk Analysis

[Figure: a series of records for Customer103 at time=t0, ..., time=tn, with features such as Years of credit: 9, Loan balance: $2,400 ... $4,500, Income: $52k/?, Max billing cycles late: 3 ... 6, used to predict Profitable customer?: ?/No.]

Rules learned from synthesized data:

If Other-Delinquent-Accounts > 2, and
   Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = No  [Deny Credit Card application]

If Other-Delinquent-Accounts = 0, and
   (Income > $30k) OR (Years-of-Credit > 3)
Then Profitable-Customer? = Yes  [Accept Credit Card application]

Other Prediction Problems

Customer purchase behaviour:

[Figure: records for Customer103 over time with features such as Sex: M, Age: 53, Income: $50k, Own House: Yes, Other delinquent accts: 2/3, Computer: 386 PC/Pentium, MS Products: Word, used to predict Purchase Excel?: ?/Yes.]
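Learned if-then rules like those above are straightforward to execute once discovered. A minimal sketch in Python of applying the C-section rule; the feature names are invented for illustration, and only the 0.6 probability comes from the slide:

```python
# Hypothetical sketch of applying one learned rule from the C-section
# example. Feature names are invented; the 0.6 probability is the
# estimate quoted in the learned rule above.

def emergency_csection_risk(patient):
    """Return the rule's probability estimate if the rule fires, else None."""
    if (not patient["previous_vaginal_delivery"]
            and patient["abnormal_2nd_trimester_ultrasound"]
            and patient["malpresentation_at_admission"]):
        return 0.6
    return None  # rule does not apply; other learned rules would be tried

high_risk = {
    "previous_vaginal_delivery": False,
    "abnormal_2nd_trimester_ultrasound": True,
    "malpresentation_at_admission": True,
}
print(emergency_csection_risk(high_risk))  # 0.6
```

In a real system a rule learner would output many such rules, each with its own probability estimated from the training data, and they would be tried in priority order.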
Customer retention:

[Figure: records for Customer103 over time with features such as Sex: M, Age: 53, Income: $50k, Own House: Yes, Checking: $5k/$20k/$0, Savings: $15k/$0, used to predict Current-customer?: yes/No.]

Process optimization:

[Figure: records for Product72 over time with features such as Stage: mix/cook/cool, Mixing-speed: 60rpm, Temperature: 325, Fan-speed: medium, Viscosity: 1.3/3.2, Fat content: 15%/12%, Density: 2.8/1.1/1.2, Spectral peak: 2800/3200/3100, used to predict Product underweight?: ??/Yes.]

Tasmanian Apple Thinning

Apple orchards are important in primary production in Tasmania, and there is a long history of apple thinning. Apple trees are naturally biennial bearing: they flower heavily one year, producing a large crop of small fruit (the "on" year), followed by light flowering the next year with a small crop of large, poor-quality fruit.

Thinning is most economically done by applying sprays of chemicals that act similarly to plant hormones and cause the abortion of flowers and fruitlets at an early stage of development. Early thinning favours the development of the desirable high density of cells in the fruit.

Orchardists must decide on the concentration of thinning agent at blossom time. If the concentration is too low, thinning is not effective and the cost of hand thinning is prohibitive; if it is too high, there is a risk of losing all the fruit. The decision is difficult because of the large number of variables to be taken into account:
• trees: cultivar, rootstock and age
• physiology: previous crop, vigour, number of blossom buds
• pruning: severity of detailed pruning, limb thinning, and penetration of light into the canopy
• market: size of fruit required for the market
• spraying: type of spray machinery and volume of water to be used in the machinery

The resulting system comprised 60 tasks (some with 50 decision tree leaves, i.e. rule paths), plus 30 other variables and 40 procedures, supported by a customized help file of 5,000 words.

BG Gas Drilling: "Stuck Pipe"

Drilling is a hugely expensive process, with a North Sea operation typically incurring rig costs of around $50,000 per day. Clearly, anything that helps to reduce the time when a drilling rig is not productive has the potential to achieve huge savings.

Daily report data came from two databases. One was old and included incomplete or absent data, particularly IADC (International Association of Drilling Contractors) codes. The other was compiled more recently and included a large amount of additional data about well-site geology, drilling costs, etc. There were sixty recorded occurrences of Stuck Pipe in 170 BG wells, making it possible to mine the data and determine trends. Much of the time invested by the project team was concentrated on getting the data in good order.

Results indicate that the length of time the hole has been open, the properties of the drilling mud, and the frequency with which the mud is conditioned all play a significant role in the incidence of Stuck Pipe.
Nissan: Car selection

Starting from the basic choices of 3 alternative engines, 3 types of suspension, 2 types of transmission, 9 colours and 3 styles of seat fabric, customers can go far further and create a car to suit their own personality. With 670,000 possible combinations, "it is a totally new concept", says Takao Ohmura, Sales Manager of Tokyo Nissan Computer Systems.

A guidebook explains the options in table form, and these tables were input into XpertRule. Normally it is difficult to utilise such a large matrix, but XpertRule was able to automatically generate a decision tree structure to arrive at the correct model from the attributes and values in the tables. It met three major requirements: (1) model selection and checking must be completed in three minutes; (2) the ability to run on Nissan dealers' hardware; and (3) ease of maintaining the system after the launch of the Cefiro model.

Channel 4 TV scheduling

During the day, Channel 4's strength is the housewife market, whilst in the evenings its strength lies in its varied targeting ability. In comparison with ITV, Channel 4 audiences contain a greater proportion of younger, lighter, up-market, male viewers (audience research has also identified Channel 4's ability to target cluster groups with names such as "Progressive Priscillas" and "Free-thinking Franks").

Advertisers may specify that commercials be placed first in the break, last in the break, or "Top & Tail" in a break, making break sequencing a challenge if optimal use of airtime is to be achieved. Defining a knowledge-based system to solve the problem requires a number of prioritised "rules": top of the list is the need for no overlaps or gaps, with Top-and-Tail or First-and-Last network spots also receiving high priority.
Lower down the list are First and Last Super-macro spots and non-reporting Super-macros sequenced to play at the same time.

Optimization problems: as the number of possible combinations grows, it becomes impractical to try all combinations to arrive at a solution in a reasonable time. Rules of thumb can be used to narrow down the options, but in most cases good rules are not available or are difficult to capture. Numerical optimization techniques are currently available in most advanced spreadsheets, but these tend to be incapable of optimizing problems involving sequencing or scheduling, and they are "exploitation" rather than "exploration" techniques. The solution involved genetic algorithm techniques, which allow the exploration of large search spaces for optimal or near-optimal solutions.

Decision tree for IVF [R.R. Saith et al.]

Earlier studies used discriminant analysis, which expresses relationships between the features and transfer outcome in the form of mathematical equations. A similar approach was followed in this study, in which features recorded at the oocyte and follicle stage that could be traced to each embryo were included. The study differs from those of Nayudu et al. in that it uses an analysis technique that is novel to IVF: the 'class probability tree' analysis method used here (described below) defines relationships in the form of easily understandable rules rather than as mathematical equations.
Unlike other studies, which pre-select features, all features (a total of 53) recorded by embryologists in the IVF treatment records of patients were included. Normal practice in the Oxford IVF Unit is to select and transfer a batch of three embryos from an average of seven available embryos, giving a take-home-baby rate of ~22%. Since it is not possible to identify which one or two of the three embryos resulted in pregnancy in the case of a singleton or twin pregnancy, data relating to all three embryos of the transferred batch were included in this study. Data for embryo batches that resulted in a successful IVF treatment cycle ('take home baby') were compared with data for batches of embryos that resulted in a negative pregnancy test ('no take home baby') on transfer.

Class probability trees and rules

[Figure 1: A hypothetical class probability tree and the extracted rules. Each class probability box in the tree contains cases with a similar pattern of features. The proportion of cases in the box belonging to either the 'take home baby' (TH) or 'no take home baby' (NTH) class is shown. Estimated class probabilities for prediction of a new case from its feature values are based on the values in these boxes. The estimates are expressed as percentages against each rule. The class label attached to the rule is that of the class with the higher probability.]

Like traditional methods of multivariate statistics, the class probability tree method can analyse a large number of features simultaneously. Unlike traditional methods, which only capture linear relationships between features, complex inter-feature interactions are automatically taken into account. Results are expressed as rules and are easier to understand and apply than mathematical equations.

Problems Too Difficult to Program by Hand

ALVINN [Pomerleau] drives 70 mph on highways!
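The class probability boxes described above reduce to counting class proportions among the cases routed to a leaf. A minimal sketch with made-up TH/NTH cases (not data from the study):

```python
# Sketch of a class-probability leaf: the cases sharing a pattern of
# features, and the proportion belonging to each class (TH / NTH).
# The five cases below are invented for illustration.

def class_probabilities(cases):
    """Return {class_label: proportion} for the cases in one leaf."""
    counts = {}
    for label in cases:
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(cases) for label, n in counts.items()}

leaf = ["TH", "TH", "NTH", "TH", "NTH"]  # cases routed to the same leaf
probs = class_probabilities(leaf)
print(probs)                      # {'TH': 0.6, 'NTH': 0.4}
print(max(probs, key=probs.get))  # TH, the class label attached to the rule
```

A real class probability tree would first partition the cases by feature tests; the leaf's class proportions then become the probability estimates attached to the extracted rule.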
The class probability tree analysis technique works by analysing data (here follicle, oocyte and embryo features) related to a sufficient number of cases (here batches of embryos). The cases belong to different classes or groups [here the 'take home baby' (TH) or the 'no take home baby' (NTH) class] and the pattern of features characterizing each class is discovered. The pattern-class relationships are initially expressed as trees, which are then re-expressed as a set of easily understandable statements or rules.

As can be seen in Figure 1, the tree starts as a root node with a set of cases (in this example, 100 batches of embryos, called the training set) that are to be used to construct outcome-predicting rules. The cases are known to belong to mutually exclusive classes (here TH and NTH). Each training case of a known class is described by its feature values (in the study here, 53 features).

Earlier studies (e.g. Steer et al., 1992) pre-defined embryo grades and then assessed the relationship between these grades and development. Some have explored the independent contribution of either oocyte or follicle features to the outcome of transfer of associated embryos (Laufer et al., 1985; Cornwallis et al., 1990; Andersen, 1991; Smith et al., 1991). A different approach was adopted by Nayudu et al. (1987, 1989), who simultaneously investigated features of the oocyte, follicle and embryo stages, as well as maternal features, all in the same analysis. This is important given that the development of the embryo cannot be viewed in isolation from the influence of the stages preceding its formation. The data were analysed using statistical techniques of logistic regression and discriminant analysis.

[ALVINN network figure: a 30x32 sensor input retina feeding 4 hidden units and 30 output units, spanning steering directions from Sharp Left through Straight Ahead to Sharp Right.]

Stanley: DARPA Grand Challenge Champion 2005

• won 2 million dollars (US); first team to complete the 132-mile course
• modified VW Touareg R5 with drive-by-wire; took 6 hours 54 minutes, averaging over 19 mph
• seven Pentium M computers, GPS and various sensors
• localization, mapping and collision avoidance: Bayesian / statistical methods

Software that "adapts" to the user

• Brin & Page: PhD students in databases / data mining at Stanford
• the PageRank algorithm (1998)
• Google business model: technology targets advertisements to users based on their activity

Scientific Discovery: The Robot Scientist

[Figure 1: The Robot Scientist hypothesis-generation and experimentation loop.]

Hypotheses are formed and their consequences tested by experiment. The Robot Scientist follows this paradigm: the logical inference mechanism of abduction is employed to form new hypotheses, and that of deduction to test which hypotheses are consistent (see Methods and Supplementary Information). The system has been physically implemented and conducts biological assays with minimal human intervention after the robot is set up. The hardware platform consists of a liquid-handling robot with its control PC, a plate reader with its control PC, and a master PC to control the system and do the scientific reasoning. The software platform consists of background knowledge about the biological problem, a logical inference engine, hypothesis generation code (abduction), experiment selection code (deduction), and the Laboratory Information Management System (LIMS) code that integrates the whole system. The robot conducts assays by pipetting and mixing liquids on microtitre plates.
Given a computed definition of one or more experiments, code has been developed that designs a layout of reagents on the liquid-handling platform that will allow these experiments, with controls, to be performed efficiently. In addition, the liquid-handling robot is automatically programmed to plate out the yeast and media into the correct wells. The system measures the concentration of yeast in the wells of the microtitre trays using the adjacent plate reader and returns the results to the LIMS (although microtitre trays are still moved in and out of incubators manually).

[Photo: the Robot Scientist in the lab.]

Botros, van Dijk & Killian (2007): Cochlear implant adjustment model

• AutoNRT expert systems use neural response telemetry (ECAP) input
• Decision/Regression tree learning: Quinlan's C5 and Cubist

The Robot Scientist project (2004-current)

The model comprises 248 coding sequences (open reading frames; ORFs), enzymes and metabolites in a pathway. All objects (ORFs, proteins and metabolites) and relationships (coding, reactions, transport and feedback) are described as logical formulae. The structure of the metabolic pathway is that of a directed graph, with metabolites as nodes and enzymes as arcs. An arc corresponds to a reaction. The compounds at each node are the set of all metabolites that can be synthesized by the reactions leading to it. Reactions are modelled as unidirectional transformations. Using this formalism, a model of the AAA pathway has been implemented in the logic programming language Prolog (the complete model is provided in Supplementary Information). Prolog makes it possible both to inspect the biological knowledge in the model directly and to compute the predictions of the model automatically.

The model infers (deduces) that a knock-out mutant will grow if, and only if, a path can be found from input metabolites to the three aromatic amino acids. This allows the model to compute the phenotype of a particular knockout, or to be used to infer missing reactions that could explain an observed phenotype (abduction). Most hypothesis generation in modern biology is arguably abductive: what is inferred are not general hypotheses, which would be inductive, but specific facts about biological entities.

The original bioinformatic information for the AAA model was taken mainly from the KEGG catalogue of metabolism. The model was then tested with all possible auxotrophic experiments involving a single metabolite, and was altered manually to fit the empirical results. To ensure that the model was not 'over-fitted', all possible auxotrophic experiments with pairs of metabolites were carried out. The model correctly predicted at least 98.5% of the experiments (Supplementary Information). To the best of our knowledge, no bioinformatic model has been as thoroughly tested with knockout mutants.

Machine learning is the branch of artificial intelligence that seeks to develop computer systems that improve their performance automatically with experience. It has much in common with statistics, but differs in having a greater emphasis on algorithms, data representation and making acquired knowledge explicit.
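The growth deduction used by the model above (a knockout grows if, and only if, a path of intact reactions connects the input metabolites to the target amino acids) is essentially graph reachability. A toy sketch in Python; the pathway, metabolite names and ORF names are invented, and the real model is a Prolog encoding of the full AAA pathway:

```python
# Toy sketch of the deductive step: a knockout mutant is predicted to
# grow iff the target metabolite is reachable via reactions whose
# enzyme's ORF has not been knocked out. Pathway data are invented.

from collections import deque

# reactions: (substrate, product, enzyme_orf)
REACTIONS = [
    ("A", "B", "orf1"),
    ("B", "C", "orf2"),
    ("A", "C", "orf3"),  # alternative route bypassing orf2
]

def grows(inputs, target, knocked_out):
    """Deduce growth by breadth-first search over intact reactions."""
    reachable = set(inputs)
    queue = deque(inputs)
    while queue:
        m = queue.popleft()
        for substrate, product, orf in REACTIONS:
            if substrate == m and orf not in knocked_out and product not in reachable:
                reachable.add(product)
                queue.append(product)
    return target in reachable

print(grows({"A"}, "C", knocked_out={"orf2"}))          # True (via orf3)
print(grows({"A"}, "C", knocked_out={"orf2", "orf3"}))  # False
```

Running the same check for every candidate knockout gives the predicted phenotypes; abduction works in the other direction, proposing missing reactions that would make an observed phenotype consistent.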
[Source for the Robot Scientist material: Nature, vol. 427, 15 January 2004, www.nature.com/nature]

Measuring Neural Activity

Neurosemantics [Just, M. et al., PLoS ONE, Volume 5, Issue 1, e8622, January 2010]

A cross-validation scheme makes it possible to measure how well a model can generate a prediction for an item (the neural representation of a particular noun) on which it has not been trained [1]. The success of any such generative approach demonstrates more than just a mathematical characterization of a phenomenon: the ability to extend prediction to new items provides an additional test of the theoretical account of the phenomenon. In brief, two words are left out of the training set at each fold (say, apartment and carrot in one of the folds), and a regression model is trained using the data from the remaining 58 words to determine the regression weights to be associated with each of the four factors. To make the prediction, the regression model then used the four factor profiles at the corresponding voxel locations (obtained from the data of participants other than the one being analyzed).

The voxels were selected similarly to the other machine learning protocols. For each of the four factor locations, five voxels with the highest product of their correlation with the corresponding factor profile times their stability were selected, for a total of 80 voxels. This selection procedure was performed separately for each of the 1,770 runs, leaving two words out at each iteration.

[Figure 1: Locations of the voxel clusters (spheres) associated with the four factors. The spheres (shown as surface projections) are centered at the cluster centroid, with a radius equal to the mean radial dispersion of the cluster voxels. doi:10.1371/journal.pone.0008622.g001]

[Figure 5: Observed and predicted activation images of apartment and carrot for one of the participants. A single coronal slice at MNI y = 46 mm is shown. Dark and light blue ellipses indicate the L PPA and R Precuneus shelter-factor locations respectively. Both the observed and predicted images of apartment have high activation levels in both locations; by contrast, both the observed and predicted images of carrot have low activation levels in these locations. doi:10.1371/journal.pone.0008622.g005]

[Table 3, listing the MNI centroid coordinates, sizes and radii of the voxel clusters associated with the four factors (shelter, manipulation, eating, word length), is omitted here; see doi:10.1371/journal.pone.0008622.t003.]

Learning to win at Jeopardy

IBM's "Watson" question-answering machine:

• a goal of Artificial Intelligence for more than 4 decades
• many previous attempts failed
• hard problems: natural-language understanding, strategy, ...
• Watson uses machine learning on question-answer examples
• in Jan 2011, beat two human former champions at Jeopardy
• potential applications in medicine and other knowledge-based areas

Where Is this Headed?

Mature algorithms:

• decision trees, regression, neural nets, Bayesian methods ...
• can be applied to standard database relations or flat files
• established software and services industry

Opportunity for tomorrow: enormous impact

• Learn across full mixed-media data
• Learn across multiple internal databases, plus the web and newsfeeds
• Learn by active experimentation
• Learn more complex functions
• Learn by analogy
• Cumulative, lifelong learning and adaptation
• Programming languages and systems with learning embedded?

Relevant Disciplines

• Artificial intelligence
• Computational complexity theory
• Statistics
• Information theory
• Bayesian methods
• Control theory
• Philosophy
• Psychology and neurobiology
• Physics
• ...

A definition of the learning problem

Learning = improving with experience at some task:

• improve over task T,
• with respect to performance measure P,
• based on experience E.

E.g., learn to play checkers (draughts):

• T: play checkers
• P: % of games won in world tournament
• E: opportunity to play against self

Learning to Play Checkers

• T: play checkers
• P: percent of games won in world tournament
• What experience?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm to learn it?

Type of Training Experience

• Direct or indirect?
• Teacher or not?

A problem: is the training experience representative of the performance goal?

Choose the Target Function

• ChooseMove : Board → Move ??
• V : Board → ℝ ??
• ...

Possible Definition for Target Function V

• if b is a final board state that is won, then V(b) = 100
• if b is a final board state that is lost, then V(b) = −100
• if b is a final board state that is drawn, then V(b) = 0
• if b is not a final state in the game, then V(b) = V(b′), where b′ is the best final board state that can be achieved starting from b and playing optimally until the end of the game.

This gives correct values, but is not operational.

Choose Representation for Target Function

• collection of rules?
• neural network?
• polynomial function of board features?
• ...

A Representation for Learned Function

V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)

where:

• bp(b): number of black pieces on board b
• rp(b): number of red pieces on b
• bk(b): number of black kings on b
• rk(b): number of red kings on b
• bt(b): number of red pieces threatened by black (i.e., which can be taken on black's next turn)
• rt(b): number of black pieces threatened by red

Obtaining Training Examples

• V(b): the true target function
• V̂(b): the learned function
• Vtrain(b): the training value

One rule for estimating training values:

Vtrain(b) ← V̂(Successor(b))
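Under the linear representation above, evaluating a board is just a dot product between the weight vector and the feature vector (bp, rp, bk, rk, bt, rt). A minimal sketch; the weight and feature values are illustrative, not learned:

```python
# Sketch of the learned linear evaluation function
# V̂(b) = w0 + w1·bp(b) + ... + w6·rt(b).
# Extracting features from an actual board is assumed done elsewhere;
# here we pass the feature vector [bp, rp, bk, rk, bt, rt] directly.

def v_hat(weights, features):
    """Evaluate a board from its feature vector: w0 + sum(wi * fi)."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * f for w, f in zip(ws, features))

weights = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]  # illustrative values
features = [12, 12, 0, 0, 1, 0]  # e.g. opening-like position, black threatens one piece
print(v_hat(weights, features))  # 0.5
```

With equal material (12 pieces each, no kings) the pieces cancel and only the threat feature contributes, so this position evaluates slightly in black's favour.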
Choose Weight Tuning Rule

LMS weight update rule. Do repeatedly:

• Select a training example b at random
1. Compute error(b):
   error(b) = Vtrain(b) − V̂(b)
2. For each board feature fi, update weight wi:
   wi ← wi + c · fi · error(b)

c is some small constant, say 0.1, to moderate the rate of learning.

Design Choices (Completed Design)

• Determine type of training experience: games against experts | games against self | table of correct moves | ...
• Determine target function: Board → Move | Board → Value | ...
• Determine representation of learned function: polynomial | linear function of six features | artificial neural network | ...
• Determine learning algorithm: gradient descent | linear programming | ...

Some Issues in Machine Learning

• What algorithms can approximate functions well (and when)?
• How does the number of training examples influence accuracy?
• How does the complexity of the hypothesis representation impact it?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?
• How can prior knowledge of the learner help?
• What clues can we get from biological learning systems?
• How can systems alter their own representations?
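The LMS weight-update rule in the checkers design above can be sketched as a short training loop. The toy training set is invented (a target scoring 100 points per piece of material advantage on two-feature "boards"), just to show the update converging:

```python
# Sketch of the LMS weight-update rule: repeatedly pick a training
# example, compute the error against the training value, and nudge
# each weight in proportion to its feature. Training data are invented.

import random

def v_hat(w, x):
    """Linear evaluation: w0 plus weighted board features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def lms_train(examples, n_features, c=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (n_features + 1)           # w0 plus one weight per feature
    for _ in range(steps):
        x, v_train = rng.choice(examples)  # select a training example at random
        error = v_train - v_hat(w, x)      # 1. compute error(b)
        w[0] += c * error                  # bias treated as a constant feature 1
        for i, xi in enumerate(x):
            w[i + 1] += c * xi * error     # 2. wi <- wi + c * fi * error(b)
    return w

# toy target: V = 100 * (bp - rp) on two-feature boards [bp, rp]
examples = [([2, 1], 100.0), ([1, 2], -100.0), ([3, 1], 200.0), ([1, 1], 0.0)]
w = lms_train(examples, n_features=2)
print(round(v_hat(w, [2, 2])))  # 0: equal material evaluates to (about) zero
```

In the full checkers design the training values Vtrain(b) are not given by a teacher but bootstrapped from the learner's own estimate of the successor position, per the rule on the previous slide.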