Tutorial11-solutions

Course: COMP 3702, Fall 2009
School: Allan Hancock College
Tutorial 11: Neural networks

Question 1

3 inputs, 1 bias, 1 output, logistic (sigmoid) or threshold activation function.

Pattern -> Calculation -> Function -> Output
1 -> u = 1*0 + -2*-1 + 2*.5 - .5 = 2.5 -> threshold -> output = 1
2 -> u = 2.5 -> logistic -> output = 1/(1+exp(-2.5)) ~= 0.9
3 -> u = 0*4 + 1*2 + 2*-1.5 - 1.2 = -2.2 -> threshold -> output = 0
4 -> u = -2.2 -> logistic -> output = 1/(1+exp(2.2)) ~= 0.1
5 -> u = 1*1 + 1*-2 + -2*-1 = 1 -> logistic -> output = 1/(1+exp(-1)) ~= 0.7
6 -> u = 1*1 + 1*-2 + -1*-1 = 0 -> logistic -> output = 1/(1+exp(-0)) = 0.5
7 -> u = 1*1 + 1*-2 + -1*-1 + 1 = 1 -> logistic -> output = 1/(1+exp(-1)) ~= 0.7

Question 2 (example answers)

a) w1 = 2, w2 = 2, w0 = b = -3

b) a^b^!c (! = NOT, ^ = AND)
Use a single threshold unit with 3 inputs: a, b and c. The given function can be modelled (for 0/1 binary data anyway) by using the following set of weights: 1, 1 and -1 on the a, b and c inputs respectively, with a -1.5 bias weight.

[Figure: a single threshold unit with inputs a, b, c; weights w1=1, w2=1, w3=-1; bias b=-1.5; activation f=threshold; output y]

Question 3

a) X1=[0 0]; hidden unit activation values ui (i=1,2)
u1 = 0*-4 + 0*4 +2 = 2 ; h1 = 1/(1+exp(-u1)) ~= 0.88
u2 = 0*-5 + 0*5 -3 = -3 ; h2 = 1/(1+exp(-u2)) ~= 0.047
output unit activation value z
z = 0.88*-4 + 0.047*3 +2 = -1.38 ; o (output) = 1/(1+exp(-z)) ~= 0.20
This is <0.5 , so counts as class 0.

X2=[0 1]
u1 = 0*-4 + 1*4 +2 = 6 ; h1 = 1/(1+exp(-u1)) ~= 0.998
u2 = 0*-5 + 1*5 -3 = 2 ; h2 = 1/(1+exp(-u2)) ~= 0.88
z = 0.998*-4 + 0.88*3 +2 = 0.648 ; o (output) = 1/(1+exp(-z)) ~= 0.657
This is >0.5 , so counts as class 1.

X3=[1 0]
u1 = 1*-4 + 0*4 +2 = -2 ; h1 = 1/(1+exp(-u1)) ~= 0.12
u2 = 1*-5 + 0*5 -3 = -8 ; h2 = 1/(1+exp(-u2)) ~= 0.00034
z ~= 0.12*-4 +2 = 1.52 (the h2 term, 0.00034*3, is negligible) ; o (output) = 1/(1+exp(-z)) ~= 0.821
This is >0.5 , so counts as class 1.

X4=[1 1]
u1 = 1*-4 + 1*4 +2 = 2
u2 = 1*-5 + 1*5 -3 = -3
-> same as X1
z = 0.88*-4 + 0.047*3 +2 = -1.38 ; o (output) = 1/(1+exp(-z)) ~= 0.20
This is <0.5 , so counts as class 0.
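The four forward passes in Question 3(a) can be checked with a short script. This is a minimal sketch: the weights are read off the worked calculations above, and the names `HIDDEN`, `OUTPUT`, `forward` and `classify` are illustrative choices rather than anything defined in the tutorial.

```python
import math


def sigmoid(u):
    """Logistic activation g(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


# Hidden units as (weight on x1, weight on x2, bias), taken from the
# u1 and u2 calculations in Question 3(a).
HIDDEN = [(-4.0, 4.0, 2.0), (-5.0, 5.0, -3.0)]
# Output unit as (weight on h1, weight on h2, bias), taken from the z calculation.
OUTPUT = (-4.0, 3.0, 2.0)


def forward(x1, x2):
    """Return (hidden activations, output pre-activation z, output o)."""
    h = [sigmoid(w1 * x1 + w2 * x2 + b) for (w1, w2, b) in HIDDEN]
    z = OUTPUT[0] * h[0] + OUTPUT[1] * h[1] + OUTPUT[2]
    return h, z, sigmoid(z)


def classify(x1, x2):
    """Class 1 if the output exceeds 0.5, otherwise class 0."""
    return 1 if forward(x1, x2)[2] > 0.5 else 0


if __name__ == "__main__":
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        h, z, o = forward(*x)
        print(x, [round(v, 4) for v in h], round(z, 3), round(o, 3), classify(*x))
```

The script uses unrounded hidden activations, so z can differ from the hand calculation in the third decimal place; the class assignments are unaffected.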
b) Decision boundary is at:
o (output) = 1/(1+exp(-z)) = 0.5
-> multiply both sides by 2(1+exp(-z)):
2 = 1 + exp(-z) -> exp(-z) = 1
-> take logs of both sides (base e):
-z = 0 -> z = 0
z = -4h1 + 3h2 + 2 = 0 -> h2 = 4/3 h1 - 2/3.

I'll leave it to you to plot this straight line. Given that h1 and h2 are constrained to fall in the range [0,1], there is no point in going beyond this on this plot.

Re: translating this back into the input space. Too hard. One could assume that h1=0.5 was meaningful and translate that back into a decision boundary in the x1,x2 space (similarly for h2). But it really isn't meaningful - the only decision is at the output.

The following is too much for this course (so feel free to ignore if studying for the exam), but some idea of how to get an output-based decision boundary right back to the inputs (through the hidden layer) follows:

-4h1 + 3h2 + 2 = 0 (a straight line) (eqn 1).
h1 = g(-4x1 + 4x2 + 2) ; h2 = g(-5x1 + 5x2 - 3).

Let theta = x2 - x1 in the above eqn, so that h1 = g(4 theta + 2) and h2 = g(5 theta - 3), then put the h1 and h2 terms into eqn 1. Call the result eqn 2:

-4/(1 + e^{-4 theta - 2}) + 3/(1 + e^{-5 theta + 3}) + 2 = 0 (eqn 2).

Since g(x) is a sigmoid = 1/(1+e^{-x}), there are a couple of terms in the denominators of fractions. Multiply eqn 2 by each of these denominators and I get a few terms - the most complex involve e^{theta}. I let phi = e^{theta}, and the decision boundary equation reduces to the following:

1 + 5e^{-2} phi^{-4} - 2e^{3} phi^{-5} + 2e phi^{-9} = 0,

or, multiplying through by phi^9,

phi^9 + 5e^{-2} phi^5 - 2e^{3} phi^4 + 2e = 0.

Only positive roots matter, since phi = e^{theta} > 0. The coefficient signs change twice, so (by Descartes' rule of signs) there are at most two positive roots. From part a), z is negative at theta = 0 (X1 and X4) and positive at theta = 1 (X2) and theta = -1 (X3), so there are exactly two: one with theta between -1 and 0, and one with theta between 0 and 1. If one solution is phi1, then we can phrase that as an equation in x1 and x2: it is just a straight line of the form x2 = x1 + log(phi1). The two solutions therefore give two parallel lines. The band between them (which contains X1 and X4) is class 0, and the regions on either side (containing X2 and X3) are class 1.

Note: this network was unusually easy (!) to analyse because the input weights were similar for h1 and h2.
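With the substitution theta = x2 - x1, finding the input-space boundary becomes a one-dimensional root search on z(theta) = -4 g(4 theta + 2) + 3 g(5 theta - 3) + 2. The sketch below solves z(theta) = 0 directly by bisection instead of going through the polynomial in phi. The brackets [-1, 0] and [0, 1] are an assumption based on the signs of z found in part a) (negative at theta = 0, positive at theta = -1 and theta = 1), and the helper names are illustrative.

```python
import math


def sigmoid(u):
    """Logistic activation g(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def z_of_theta(theta):
    """Output pre-activation z as a function of theta = x2 - x1."""
    h1 = sigmoid(4.0 * theta + 2.0)
    h2 = sigmoid(5.0 * theta - 3.0)
    return -4.0 * h1 + 3.0 * h2 + 2.0


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)


# Assumed brackets, chosen from the signs of z at theta = -1, 0 and 1.
BRACKETS = [(-1.0, 0.0), (0.0, 1.0)]
roots = [bisect(z_of_theta, lo, hi) for lo, hi in BRACKETS]

if __name__ == "__main__":
    for theta in roots:
        print(f"boundary line: x2 = x1 + ({theta:.4f}), phi = {math.exp(theta):.4f}")
```

Each root gives one line x2 = x1 + theta in the input space. Since z_of_theta(0) is negative, points with theta between the two roots fall on the class 0 side.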
In general, you would get harder equations to solve and probably no straight lines.

Question 4

Classification: ...

