## Series Introduction

This is the fourth article in a series on building a .NET library. Here are the links to Parts 1 to 3:

Build Simple AI .NET Library - Part 1 - Basics First

Build Simple AI .NET Library - Part 2 - Machine Learning Introduction

Build Simple AI .NET Library - Part 3 - Perceptron

My objective is to create a simple AI library that covers a couple of advanced AI topics such as genetic algorithms, ANN, fuzzy logic and other evolutionary algorithms. The only challenge in completing this series will be finding enough time to work on the code and the articles.

Having the code itself might not be the main target; understanding these algorithms is. I hope it will be useful to someone someday.

## Article Introduction - Part 4 "Beyond Perceptron"

In the last article, we created a Perceptron that acts as a binary linear classifier. We will continue the discussion about the Perceptron to create more complicated layouts for more complicated problems.

I strongly advise reviewing Build Simple AI .NET Library - Part 2 - Machine Learning Introduction before moving any further in this article.

## More Perceptron Examples

As mentioned, the Perceptron is the simplest processing element of an ANN. It is a powerful algorithm, yet it is very limited: remember, it is mainly used as a binary linear classifier.

What about more complicated classification problems, such as binary but non-linear classification? We will see through a couple of examples how to develop more complex layouts of Perceptrons.

To be all on the same page, let's verify some definitions:

`Binary Classifier`

- A problem where the output has only 2 possible answers, classifications or groups. The classification can be linear or non-linear

`Linear Classifier`

- The inputs are linearly separable: you can draw a straight line to separate both groups

`Non-linear Classifier`

- The classification is not possible via a straight line

`Problem Dimensions`

- The input matrix (vector) can be considered as the features of the problem being optimized. In the last example of article 3, we had 2 inputs, X and Y, which are the coordinates of each point. Another way to view inputs is to consider the number of inputs as the dimensions of the problem, so a 4D problem means it has 4 different features (AI can resolve problems of higher dimensions that are impossible for the human brain to visualize)

## Using Perceptron to Optimize Binary Functions

To better understand the Perceptron and its limitations, we will check its use in the optimization of binary functions: NOT, OR, AND and XOR.

**NOT Function**

So this is a 1-dimensional problem. Let's design the Perceptron as follows:

h(x) = W_{0} + W_{1} * X_{1}

As the output is 0 or 1, the Step activation function is a good choice (it returns 1 when its argument is >= 0 and 0 otherwise)

Then Y = Step(h(x))

From the NOT truth table, output Y is 1 when X is 0, so h(x) shall be >= 0 if X = 0

h(x) = W_{0} + W_{1} * X_{1} >= 0 when X = 0

W_{0} >= 0 for X = 0, so let's select **W**_{0} = 1

h(x) = 1 + W_{1} * X_{1}

Now, the second possible value of Y is 0 for X = 1

h(x) < 0 for X = 1

1 + W_{1} * X_{1} < 0 for X = 1

1 + W_{1} < 0 for X = 1

W_{1} < -1, so let's select **W**_{1} = -1.5

Finally, **h(x) = 1 - 1.5 * X**
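As a quick check of the derivation above, here is a minimal Python sketch of the NOT Perceptron. The function names `step` and `not_perceptron` are only illustrative and are not part of the library built in this series; the weights are the ones selected above.

```python
def step(h):
    """Step activation: 1 when h >= 0, otherwise 0."""
    return 1 if h >= 0 else 0


def not_perceptron(x):
    """NOT gate using the weights selected above: W0 = 1, W1 = -1.5."""
    w0, w1 = 1.0, -1.5
    return step(w0 + w1 * x)


# Print the truth table produced by the Perceptron.
for x in (0, 1):
    print(x, not_perceptron(x))
```

Any other pair with W0 >= 0 and W0 + W1 < 0 would satisfy the same two conditions.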

**OR Function**

This is a 2-dimensional problem, so let's plot X1 & X2

These groups are linearly separable, as a straight line can be drawn to separate them, such as the following one

Again, we will use the Step activation function for this Perceptron

h(x) = W_{0} + W_{1} * X_{1} + W_{2} * X_{2}

Y = Step(h(x))

From the truth table, we know that Y = 0 for X1 = X2 = 0, which means

h(x) < 0 for X1 = X2 = 0

W_{0} < 0 for X1 = X2 = 0, so let's select **W**_{0} = -0.5

h(x) = -0.5 + W_{1} * X_{1} + W_{2} * X_{2}

We select one line from the graph, the one that intercepts the X1 axis at 0.5 and the X2 axis at 0.5 (other lines can work as separators as well)

From the truth table, Y = 1 for X1 = 1 and X2 = 0, then h(x) >= 0 for X1 = 1 and X2 = 0

-0.5 + W_{1} * X_{1} + W_{2} * X_{2} >= 0 for X1 = 1 and X2 = 0

-0.5 + W_{1} * 1 + W_{2} * 0 >= 0

-0.5 + W_{1} >= 0

W_{1} >= 0.5, so let's select **W**_{1} = 1

h(x) = -0.5 + 1 * X_{1} + W_{2} * X_{2}

Also, Y = 1 for X1 = 0 and X2 = 1, then h(x) >= 0 for X1 = 0 and X2 = 1

-0.5 + 1 * X_{1} + W_{2} * X_{2} >= 0 for X1 = 0 and X2 = 1

-0.5 + W_{2} >= 0

W_{2} >= 0.5, so let's select **W**_{2} = 1

Finally `h(x) = -0.5 + X1 + X2`

Let's confirm truth table:

| X1 | X2 | Desired | h(x) = -0.5 + X_{1} + X_{2} | Y |
| --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1.5 | 1 |
| 1 | 0 | 1 | 0.5 | 1 |
| 0 | 1 | 1 | 0.5 | 1 |
| 0 | 0 | 0 | -0.5 | 0 |
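The same confirmation can be reproduced in a few lines of Python. This is only a sketch: `or_h` and `or_rows` are hypothetical names, and the rows are generated in the same order as the table.

```python
from itertools import product


def step(h):
    """Step activation: 1 when h >= 0, otherwise 0."""
    return 1 if h >= 0 else 0


def or_h(x1, x2):
    """Weighted sum for the OR Perceptron: W0 = -0.5, W1 = 1, W2 = 1."""
    return -0.5 + x1 + x2


# Each row holds (X1, X2, h(x), Y) for one input combination.
or_rows = [
    (x1, x2, or_h(x1, x2), step(or_h(x1, x2)))
    for x1, x2 in product((1, 0), repeat=2)
]

for row in or_rows:
    print(row)
```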

**AND Function**

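Similarly, this is a 2D problem, and the Perceptron has the same form as for OR: h(x) = W_{0} + W_{1} * X_{1} + W_{2} * X_{2}, with the Step activation function

By following the same procedure as for OR above, we can derive values for W0, W1 & W2

One possible combination is `h(x) = -1.5 + X1 + X2`

To verify the truth table: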

| X1 | X2 | Desired | h(x) = -1.5 + X_{1} + X_{2} | Y |
| --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 0.5 | 1 |
| 1 | 0 | 0 | -0.5 | 0 |
| 0 | 1 | 0 | -0.5 | 0 |
| 0 | 0 | 0 | -1.5 | 0 |
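Since `-1.5, 1, 1` is only one possible combination, a small brute-force search shows that other weights fit the AND truth table too. This is a sketch under assumptions: the candidate grid from -2 to 2 in steps of 0.5 is an arbitrary choice, and `fits` and `and_solutions` are illustrative names.

```python
from itertools import product

# AND truth table: (X1, X2) -> desired output.
AND_TABLE = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 0}


def step(h):
    """Step activation: 1 when h >= 0, otherwise 0."""
    return 1 if h >= 0 else 0


def fits(w0, w1, w2, table):
    """True when the Perceptron reproduces every row of the truth table."""
    return all(
        step(w0 + w1 * x1 + w2 * x2) == y for (x1, x2), y in table.items()
    )


# Assumed search grid: -2.0, -1.5, ..., 2.0 for each weight.
candidates = [i / 2 for i in range(-4, 5)]
and_solutions = [
    w for w in product(candidates, repeat=3) if fits(*w, AND_TABLE)
]

print((-1.5, 1.0, 1.0) in and_solutions)  # the combination used above
print(and_solutions[:5])                  # a few other valid combinations
```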

So, the final Perceptron uses W_{0} = -1.5, W_{1} = 1 and W_{2} = 1 with the Step activation function.

**XOR Function**

This is a problem: this function cannot be linearly separated, as there is no single line that can separate the 2 groups.

So a single Perceptron cannot resolve this problem, and this is the main limitation of the Perceptron (only binary linear classification)
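For readers who prefer an algebraic argument to the plot, the same conclusion follows from the four conditions a single Perceptron would have to satisfy. This is a short derivation using the same Step convention as above (output 1 when h(x) >= 0):

```latex
\begin{aligned}
(X_1, X_2) = (0,0),\ Y = 0 &: \quad W_0 < 0 \\
(X_1, X_2) = (1,0),\ Y = 1 &: \quad W_0 + W_1 \ge 0 \\
(X_1, X_2) = (0,1),\ Y = 1 &: \quad W_0 + W_2 \ge 0 \\
(X_1, X_2) = (1,1),\ Y = 0 &: \quad W_0 + W_1 + W_2 < 0 \\[4pt]
\text{adding rows 2 and 3} &: \quad 2W_0 + W_1 + W_2 \ge 0
  \;\Rightarrow\; W_0 + W_1 + W_2 \ge -W_0 > 0
\end{aligned}
```

The last line contradicts the fourth condition, so no choice of W0, W1 and W2 can reproduce XOR with one Perceptron.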

Yet the Perceptron is a powerful algorithm and can be used in other formations to optimize more complicated problems.

Let's go back to the XOR function and try to understand it better. We will use `Venn diagrams` to help with that. Venn diagrams are graphical representations of different logical operations

Venn diagram for OR gate shall be

and here is for AND

Here is XOR

From the Venn diagrams, we can read the XOR gate as the result of UNION (OR) excluding the INTERSECTION (AND) area. In other words:

A XOR B = (A + B) - (A.B)
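The identity can be checked over all four input pairs. In this minimal sketch, `+` is read as OR and `.` as AND on 0/1 values, and `xor_from_or_and` is an illustrative name.

```python
from itertools import product


def xor_from_or_and(a, b):
    """(A + B) - (A.B), reading + as OR and . as AND on 0/1 values."""
    return (a | b) - (a & b)


# Compare against Python's built-in XOR operator for every input pair.
for a, b in product((0, 1), repeat=2):
    print(a, b, xor_from_or_and(a, b), a ^ b)
```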

We already used Perceptrons to implement the AND & OR functions above, so why not use more than one Perceptron to implement the function above? One possible implementation could be

**AND function**

The 2D AND function is already implemented, and we may use the same one

**OR Function**

We did not implement a 3D OR function, which takes X1, X2 and the AND output as its 3 inputs. To do so, let's start by extending the XOR truth table with the AND output

| X1 | X2 | X1 AND X2 | Desired |
| --- | --- | --- | --- |
| 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 |

So we need to find the weights of h(x) for the OR function Perceptron that fulfill the table above, where

h(x) = W_{0} + W_{1} * X_{1} + W_{2} * X_{2} + W_{3} * X_{3} (X3 = X1 AND X2)

The activation function will be Step as well.

**Let's start with the last combination: X1 = 0, X2 = 0 & X1 AND X2 = 0, then Y = 0**

h(x) = W_{0} + W_{1} * X_{1} + W_{2} * X_{2} + W_{3} * X_{3} < 0 for X1 = 0, X2 = 0 & X1 AND X2 = 0

W_{0} < 0, so let's select **W**_{0} = -1

**For the combination X1 = 1, X2 = 0 & X1 AND X2 = 0, then Y = 1**

h(x) = -1 + W_{1} * X_{1} + W_{2} * X_{2} + W_{3} * X_{3} >= 0 for X1 = 1, X2 = 0 & X1 AND X2 = 0

-1 + W_{1} >= 0

W_{1} >= 1, so let's select **W**_{1} = 2

**For the combination X1 = 0, X2 = 1 & X1 AND X2 = 0, then Y = 1**

h(x) = -1 + 2 * X_{1} + W_{2} * X_{2} + W_{3} * X_{3} >= 0 for X1 = 0, X2 = 1 & X1 AND X2 = 0

-1 + W_{2} >= 0

W_{2} >= 1, so let's select **W**_{2} = 2

**For the combination X1 = 1, X2 = 1 & X1 AND X2 = 1, then Y = 0**

h(x) = -1 + 2 * X_{1} + 2 * X_{2} + W_{3} * X_{3} < 0 for X1 = 1, X2 = 1 & X1 AND X2 = 1

-1 + 2 + 2 + W_{3} < 0

3 + W_{3} < 0

W_{3} < -3, so let's select **W**_{3} = -4

**Final** `h(x) = -1 + 2 * X1 + 2 * X2 - 4 * X3 (X3 = X1 AND X2)`
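To see the two Perceptrons working together, here is a minimal Python sketch that chains the AND Perceptron from earlier into the output Perceptron derived above. The function names are illustrative only and do not come from the library.

```python
def step(h):
    """Step activation: 1 when h >= 0, otherwise 0."""
    return 1 if h >= 0 else 0


def and_perceptron(x1, x2):
    """AND Perceptron from the previous section: h(x) = -1.5 + X1 + X2."""
    return step(-1.5 + x1 + x2)


def xor_network(x1, x2):
    """Output Perceptron fed with X1, X2 and X3 = X1 AND X2."""
    x3 = and_perceptron(x1, x2)
    return step(-1 + 2 * x1 + 2 * x2 - 4 * x3)


# Print the truth table in the same order as the table above.
for x1, x2 in ((1, 1), (1, 0), (0, 1), (0, 0)):
    print(x1, x2, xor_network(x1, x2))
```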

The final **Perceptron Network** shall be

Well, let's try to reformulate the graphical representation of the layout above. Each Perceptron shall be denoted by its function.

Instead of having only 1 AND Perceptron, let's add 1 Perceptron that generates X1 and another one that generates X2:

Now, let's add dummy Perceptrons that receive the inputs and just pass them to the next level of Perceptrons

Clearly, the above is a better representation, and it is called **MLP (Multi-Layer Perceptron Network)**. This is exactly the common layout for an ANN (Artificial Neural Network)

Inputs are received by a set of Perceptrons equal in number to the inputs; this is called the `Input Layer`

Output is generated by Perceptrons, 1 Perceptron per output; this is called the `Output Layer`

The processing Perceptrons in the middle, between the input layer and the output layer, are called the `Hidden Layer`

Each ANN can have only 1 Input Layer and 1 Output Layer but could have one or multiple hidden layers. The number of hidden layers is based on the complexity of the problem being optimized.
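The layered view can be sketched as a generic forward pass, where every layer is just a list of weight rows. This is an illustration under assumptions, not the library's design: the pass-through weights `-0.5, 1, 0` and `-0.5, 0, 1` are my own choice for Perceptrons that reproduce X1 and X2 on 0/1 inputs, while the AND and output weights are the ones derived earlier.

```python
def step(h):
    """Step activation: 1 when h >= 0, otherwise 0."""
    return 1 if h >= 0 else 0


def layer(inputs, weights):
    """Evaluate one layer; each row is [W0, W1, ..., Wn] for one Perceptron."""
    return [
        step(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
        for row in weights
    ]


HIDDEN = [
    [-0.5, 1, 0],  # assumed pass-through Perceptron for X1
    [-0.5, 0, 1],  # assumed pass-through Perceptron for X2
    [-1.5, 1, 1],  # AND Perceptron
]
OUTPUT = [[-1, 2, 2, -4]]  # output Perceptron derived for XOR


def mlp_xor(x1, x2):
    inputs = [x1, x2]               # input layer: values passed on unchanged
    hidden = layer(inputs, HIDDEN)  # hidden layer: X1, X2, X1 AND X2
    return layer(hidden, OUTPUT)[0] # output layer: one Perceptron


for x1, x2 in ((1, 1), (1, 0), (0, 1), (0, 0)):
    print(x1, x2, mlp_xor(x1, x2))
```

Adding another hidden layer would only mean adding another list of weight rows and one more `layer` call.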

We have demonstrated that adding 1 Perceptron in addition to the output Perceptron can give the network additional power

There are many algorithms for ANN training, and ANN itself has many types as well. We will discuss the most common types and algorithms in the next articles

However, it all starts from the Perceptron concept and builds upon it; hence, it was important to cover the Perceptron in as much detail as possible, although the examples might not seem complicated.