3.5.4 MNIST Handwritten Digits

The MNIST dataset has pixel values in the range [0,255]. We thus start with a simple rescaling that maps the data into the range [0,1]. In practice, subtracting the mean value of each example can also help feature learning. Note: one could also apply PCA/ZCA whitening to MNIST, but this is not often done in practice.
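The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not a prescribed implementation: the function name preprocess_mnist and the one-example-per-row array layout are assumptions, and the input here is random stand-in data rather than real MNIST digits.

```python
import numpy as np

def preprocess_mnist(images):
    """Rescale raw pixels to [0, 1], then remove each example's mean.

    `images` is assumed to have shape (num_examples, num_pixels) and to
    hold raw intensities in [0, 255].
    """
    x = np.asarray(images, dtype=np.float64) / 255.0   # rescale to [0, 1]
    x_centered = x - x.mean(axis=1, keepdims=True)     # per-example mean removal
    return x, x_centered

# Stand-in data with the MNIST shape (28 * 28 = 784 pixels per example).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, 784))
scaled, centered = preprocess_mnist(raw)
```

After the second step every row of centered has mean zero, so the values are no longer confined to [0,1].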

Chapter Four Deep Networks

4.1 Overview

In the previous sections, you constructed a 3-layer neural network comprising an input, a hidden and an output layer. While fairly effective for MNIST, this 3-layer model is a shallow network; by this, we mean that the features (the hidden layer activations a^(2)) are computed using only "one layer" of computation (the hidden layer).

In this section, we begin to discuss deep neural networks, meaning ones in which we have multiple hidden layers; this will allow us to compute much more complex features of the input. Because each hidden layer computes a non-linear transformation of the previous layer, a deep network can have significantly greater representational power (i.e., can learn significantly more complex functions) than a shallow one.
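As a rough sketch of what "multiple hidden layers" means computationally, the forward pass below applies the same non-linear step once per layer. The layer sizes (784, 200, 200, 10), the sigmoid activation on every layer, and the names forward, weights and biases are assumptions chosen for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, f=sigmoid):
    """Return the activations of every layer, starting with the input."""
    activations = [x]
    for W, b in zip(weights, biases):
        # Each layer is a non-linear transformation of the previous one.
        activations.append(f(W @ activations[-1] + b))
    return activations

# Hypothetical architecture: 784 inputs, two hidden layers, 10 outputs.
layer_sizes = [784, 200, 200, 10]
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.01, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

acts = forward(rng.random(784), weights, biases)
```

Here acts[1] and acts[2] are the two hidden-layer feature vectors; a shallow network would stop after acts[1].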

Note that when training a deep network, it is important to use a non-linear activation function f(·) in each hidden layer. This is because a stack of linear layers would itself compute only a linear function of the input (i.e., composing multiple linear functions together results in just another linear function), and would thus be no more expressive than a single layer of hidden units.
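This collapse can be checked numerically. In the sketch below, two stacked layers with an identity (linear) activation are compared against a single layer whose weights are W2 W1 and whose bias is W2 b1 + b2. The matrix sizes and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked layers with a linear (identity) activation.
two_layer = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

linear_collapses = np.allclose(two_layer, one_layer)

# With a non-linearity (tanh here) between the layers, the output is
# in general no longer reproduced by that single linear layer.
nonlinear = W2 @ np.tanh(W1 @ x + b1) + b2
```

The same argument extends to any number of linear layers, since each additional layer only multiplies in one more matrix.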

4.2 Advantages of deep networks

Why do we want to use a deep network? The primary advantage is that it can compactly represent a significantly larger set of functions than shallow networks. Formally, one can show that there are functions which a k-layer network can represent compactly (with a number of hidden units that is polynomial in the number of inputs), but which a (k−1)-layer network cannot represent unless it has an exponentially large number of hidden units.

To take a simple example, consider building a Boolean circuit/network to compute the parity (or XOR) of n input bits. Suppose each node in the network can compute either the logical OR of its inputs (or the OR of the negation of the inputs), or compute the logical AND. If we have a network with only one input, one hidden, and one output layer, the parity function would require a number of nodes that is exponential in the input size n. If however we are allowed a deeper network, then the network/circuit size can be only polynomial in n.
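One way to see the gap is to build both circuits for small n. The sketch below is an illustration under stated assumptions, not a proof of the lower bound: the deep circuit is a balanced tree of two-input XOR gates, each built from an OR node, an OR-of-negations node and an AND node, while the shallow count is for one particular construction, an OR over one AND term per odd-weight input pattern. The function names are hypothetical.

```python
from itertools import product

def xor_gate(a, b):
    """XOR from the allowed nodes: (a OR b) AND (NOT a OR NOT b)."""
    return (a or b) and ((not a) or (not b))

def parity_deep(bits):
    """Parity of a non-empty bit sequence via a tree of XOR gates.

    Returns (parity, number_of_xor_gates). For n inputs the tree uses
    n - 1 gates, i.e. 3 * (n - 1) AND/OR nodes, in about log2(n) levels.
    """
    level = [bool(b) for b in bits]
    gates = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(xor_gate(level[i], level[i + 1]))
            gates += 1
        if len(level) % 2:          # odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0], gates

def shallow_parity_terms(n):
    """AND terms in the OR-of-ANDs form: one per odd-weight input pattern."""
    return sum(1 for bits in product([0, 1], repeat=n) if sum(bits) % 2 == 1)
```

For n inputs the tree needs n − 1 XOR gates, whereas this shallow construction needs 2^(n−1) AND terms, which is already 512 for n = 10.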

By using a deep network, in the case of images, one can also start to learn part-whole decompositions. For example, the first layer might learn to group together pixels in an image in order to detect edges (as seen in the earlier exercises). The second layer might then group together edges to detect longer contours, or perhaps detect simple "parts of objects." An even deeper layer might then group together these contours or detect even more complex features.

Finally, cortical computations (in the brain) also have multiple layers of processing. For example, visual images are processed in multiple stages by the brain, by cortical area "V1", followed by cortical area "V2" (a different part of the brain), and so on.

4.3 Difficulty of training deep architectures

While the theoretical benefits of deep networks in terms of their compactness and expressive power have been appreciated for many decades, until recently researchers had little success training deep architectures.

The main approach that researchers were using was to randomly initialize the weights of a deep network, and then train it on a labeled training set using a supervised learning objective, for example by applying gradient descent to try to drive down the training error. However, this usually did not work well. There were several reasons for this.
