to train convolutional neural networks by backpropagation. Their model achieved outstanding results at the time (matched only by support vector machines) and was adopted to recognize digits for processing deposits in ATM machines. Some ATMs still run the code that Yann and his colleague Leon Bottou wrote in the 1990s!

8.6.1 LeNet

In a rough sense, we can think of LeNet as consisting of two parts: (i) a block of convolutional layers; and (ii) a block of fully-connected layers. Before getting into the weeds, let's briefly review the model in Fig. 8.6.1.

Fig. 8.6.1: Data flow in LeNet 5. The input is a handwritten digit, the output a probability over 10 possible outcomes.

The basic units in the convolutional block are a convolutional layer and a subsequent average pooling layer (note that max-pooling works better, but it had not yet been invented in the 1990s). The convolutional layer is used to recognize the spatial patterns in the image, such as lines and the parts of objects, and the subsequent average pooling layer is used to reduce the dimensionality. The convolutional layer block is composed of repeated stacks of these two basic units. Each convolutional layer uses a 5×5 kernel and processes each output with a sigmoid activation function (again, note that ReLUs are now known to work more reliably, but had not been invented yet). The first convolutional layer has 6 output channels, and the second convolutional layer increases the channel depth further to 16.

However, coinciding with this increase in the number of channels, the height and width shrink considerably. Therefore, increasing the number of output channels makes the parameter sizes of the two convolutional layers similar. The two average pooling layers are of size 2×2 and take stride 2 (note that this means they are non-overlapping). In other words, each pooling layer downsamples the representation to precisely one quarter of its pre-pooling size.

The convolutional block emits an output with shape (batch size, channel, height, width). Before we can pass the convolutional block's output to the fully-connected block, we must flatten each example in the mini-batch. In other words, we take this 4D input and transform it into the 2D input expected by fully-connected layers: as a reminder, the first dimension indexes the examples in the mini-batch and the second gives the flat vector representation of each example.

LeNet's fully-connected layer block has three fully-connected layers, with 120, 84, and 10 outputs, respectively. Because we are still performing classification, the 10-dimensional output layer corresponds to the number of possible output classes.

While getting to the point where you truly understand what's going on inside LeNet may have taken a bit of work, you can see below that implementing it in a modern deep learning library is remarkably simple. Again, we'll rely on the Sequential class.
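Before looking at the library implementation, it is worth checking the shape arithmetic above by hand. The sketch below is a minimal pure-Python walkthrough, not the book's network code; it assumes a 28×28 MNIST-style input and a padding of 2 on the first 5×5 convolution (so that its output stays 28×28), neither of which is stated explicitly in the text above.

```python
# Sketch: trace the spatial shapes through LeNet's convolutional block.
# Assumptions (not stated in the text): 28x28 input, padding=2 on the
# first 5x5 convolution so its output remains 28x28.

def conv2d_shape(h, w, kernel=5, padding=0, stride=1):
    """Output height/width of a convolution (standard floor formula)."""
    return ((h + 2 * padding - kernel) // stride + 1,
            (w + 2 * padding - kernel) // stride + 1)

def pool2d_shape(h, w, pool=2, stride=2):
    """2x2 pooling with stride 2 halves each spatial dimension."""
    return ((h - pool) // stride + 1, (w - pool) // stride + 1)

h, w = 28, 28
h, w = conv2d_shape(h, w, kernel=5, padding=2)   # conv1: 6 channels, 28x28
h, w = pool2d_shape(h, w)                        # pool1: 6 channels, 14x14
h, w = conv2d_shape(h, w, kernel=5)              # conv2: 16 channels, 10x10
h, w = pool2d_shape(h, w)                        # pool2: 16 channels, 5x5

flat = 16 * h * w  # flattened vector length per example
print(h, w, flat)  # 5 5 400
```

This confirms that each example leaves the convolutional block as a 16×5×5 volume, i.e. a 400-dimensional vector once flattened, which is what the first fully-connected layer (120 outputs) consumes.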