Computational graph (no code involved)

This question checks your understanding of defining arbitrary network architectures and computing any derivative needed for optimization. Consider a neural network with N input units, N output units, and K hidden units. The activations are computed as follows:

    z = W^{(1)} x + b^{(1)}
    h = σ(z)
    y = x + W^{(2)} h

where σ denotes the logistic function, applied elementwise. The cost combines a squared difference with the target s (with a 0.5 factor) and a regularization term given by the dot product of the hidden activations with an external vector r. More concretely:

    E = R + S
    R = r^⊤ h
    S = (1/2) ‖y − s‖²

a) Draw the computation graph relating x, z, h, y, R, S, and E.

b) Derive the backpropagation equations for computing ∂E/∂W^{(1)}. To make things simpler, you may use σ' to denote the derivative of the logistic function.
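For reference, one common way to organize the derivation in part b) is with error-signal notation, writing v̄ for ∂E/∂v. A sketch, assuming the activation and cost equations as reconstructed above:

    \begin{align*}
    \bar{y} &= y - s \\
    \bar{h} &= (W^{(2)})^\top \bar{y} + r \\
    \bar{z} &= \bar{h} \circ \sigma'(z) \\
    \partial E / \partial W^{(1)} &= \bar{z}\, x^\top
    \end{align*}

The extra r in h̄ arises because h feeds into both y and the regularizer R, so both paths contribute to ∂E/∂h.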
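Although the question itself involves no code, a derivation like this can be sanity-checked numerically. A minimal NumPy sketch, assuming the reconstructed equations above (all names and shapes here are illustrative, not part of the question):

    import numpy as np

    # Sketch only: forward pass and dE/dW1 under the assumed equations
    # z = W1 x + b1, h = sigmoid(z), y = x + W2 h,
    # E = r.h + 0.5 * ||y - s||^2.

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost(W1, b1, W2, x, r, s):
        z = W1 @ x + b1              # pre-activation, shape (K,)
        h = sigmoid(z)               # hidden activation, shape (K,)
        y = x + W2 @ h               # output with skip connection, shape (N,)
        return r @ h + 0.5 * np.sum((y - s) ** 2), z, h, y

    def grad_W1(W1, b1, W2, x, r, s):
        _, z, h, y = cost(W1, b1, W2, x, r, s)
        y_bar = y - s                # dE/dy
        h_bar = W2.T @ y_bar + r     # h feeds both y and the regularizer
        z_bar = h_bar * h * (1 - h)  # sigma'(z) = sigmoid(z)(1 - sigmoid(z))
        return np.outer(z_bar, x)    # dE/dW1, shape (K, N)

    # Finite-difference check of the analytic gradient.
    rng = np.random.default_rng(0)
    N, K = 4, 3
    W1 = rng.standard_normal((K, N)); b1 = rng.standard_normal(K)
    W2 = rng.standard_normal((N, K))
    x = rng.standard_normal(N); r = rng.standard_normal(K); s = rng.standard_normal(N)

    analytic = grad_W1(W1, b1, W2, x, r, s)
    eps = 1e-6
    numeric = np.zeros_like(W1)
    for i in range(K):
        for j in range(N):
            Wp = W1.copy(); Wp[i, j] += eps
            Wm = W1.copy(); Wm[i, j] -= eps
            numeric[i, j] = (cost(Wp, b1, W2, x, r, s)[0]
                             - cost(Wm, b1, W2, x, r, s)[0]) / (2 * eps)

    print(np.max(np.abs(analytic - numeric)))  # ~1e-9 if the derivation is right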