Multilayer Perceptrons

2019/07/01

Perceptron

A perceptron can only represent linearly separable functions, so it cannot represent XOR.

Multilayer Perceptron

An MLP can represent any decision boundary: complex boundaries are built by superimposing multiple simple decision boundaries.

Activation Functions

Sigmoid

Range (0, 1), so the output can be read as a probability; has a relatively small gradient (saturates easily).

Hyperbolic Tangent (tanh)

Range (-1, 1), zero-centered, but with a relatively large saturation region.

Rectified Linear Unit (ReLU)

Computationally efficient with faster convergence, but neurons are prone to dying (the "dead ReLU" problem).
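A minimal NumPy sketch of the three activations above (function names are mine, not from the notes):

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered variant, range (-1, 1).
    return np.tanh(x)

def relu(x):
    # Cheap to compute; gradient is 0 for x < 0 (dead-neuron risk).
    return np.maximum(0.0, x)
```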

The Exponential Function exp

exp overflows easily for large inputs, so it needs special handling:
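The handling itself is missing from the notes; the usual fix is the max-subtraction trick, shown here for softmax (a sketch, assuming that is the intended treatment):

```python
import numpy as np

def stable_softmax(x):
    # Subtracting the max makes every exponent <= 0, so exp() cannot
    # overflow; the shift cancels out in the normalization.
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)
```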

Entropy

Entropy

The expected number of bits needed to encode the information (uncertainty) in a distribution $q$: $H(q) = -\sum_x q(x) \log q(x)$

Relative Entropy

The expected number of extra bits needed to encode information from $p$ when using a code optimized for $q$: $D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$

Cross Entropy

Cross Entropy = Relative Entropy + Entropy: $H(p, q) = -\sum_x p(x) \log q(x) = D_{\mathrm{KL}}(p \,\|\, q) + H(p)$

Cross-entropy loss as the cost function: $L = -\sum_i y_i \log \hat{y}_i$
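A quick NumPy check of the identity above (the distributions p and q below are made up for illustration):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # "true" distribution (made up)
q = np.array([0.5, 0.3, 0.2])   # model distribution (made up)

entropy_p     = -np.sum(p * np.log(p))        # H(p)
kl_p_q        =  np.sum(p * np.log(p / q))    # D_KL(p || q)
cross_entropy = -np.sum(p * np.log(q))        # H(p, q)

# Cross Entropy = Relative Entropy + Entropy
assert np.isclose(cross_entropy, entropy_p + kl_p_q)
```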

Backpropagation

Automatic Differentiation
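As a concrete illustration (not the notes' own derivation), here is manual backpropagation through a one-hidden-layer MLP with sigmoid activations and binary cross-entropy loss; all names and shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_backward(x, y, W1, b1, W2, b2):
    # ----- forward -----
    h_pre = W1 @ x + b1          # hidden pre-activation
    h     = sigmoid(h_pre)       # hidden activation
    o_pre = W2 @ h + b2          # output pre-activation
    y_hat = sigmoid(o_pre)       # predicted probability

    # ----- backward (chain rule, layer by layer) -----
    d_o   = y_hat - y                  # dL/do_pre for BCE + sigmoid
    dW2   = np.outer(d_o, h)
    db2   = d_o
    d_h   = W2.T @ d_o                 # propagate into hidden layer
    d_pre = d_h * h * (1.0 - h)        # sigmoid derivative
    dW1   = np.outer(d_pre, x)
    db1   = d_pre
    return dW1, db1, dW2, db2
```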

Practical Training Strategies

Stochastic Gradient Descent

SGD with momentum
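A sketch of both update rules (plain SGD and SGD with momentum), with illustrative hyperparameter values:

```python
def sgd_step(w, grad, lr=0.01):
    # Vanilla SGD: step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, mu=0.9):
    # The velocity v accumulates an exponentially decaying average of
    # past gradients, damping oscillations between steps.
    v = mu * v - lr * grad
    return w + v, v
```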

Learning rate decay

Though some algorithms can adjust the learning rate adaptively, a good choice of learning rate $\eta$ can yield better performance. To make the network converge stably and quickly, we can use a learning rate that decays over time.

Exponential decay strategy

1/t decay strategy
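The notes only name the two schedules; their usual formulations are as follows (eta0 and k are hypothetical hyperparameters):

```python
import math

def exponential_decay(eta0, k, t):
    # eta_t = eta_0 * exp(-k * t)
    return eta0 * math.exp(-k * t)

def one_over_t_decay(eta0, k, t):
    # eta_t = eta_0 / (1 + k * t)
    return eta0 / (1.0 + k * t)
```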

Weight decay

L1 Regularization

L2 Regularization
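A sketch of how each penalty modifies the gradient (lam is the regularization strength; the function names are mine):

```python
import numpy as np

def l1_grad(w, grad, lam):
    # L1 penalty lam * sum(|w|) pushes weights toward exactly zero (sparsity).
    return grad + lam * np.sign(w)

def l2_grad(w, grad, lam):
    # L2 penalty (lam/2) * sum(w**2) shrinks weights proportionally,
    # which is why it is also called "weight decay".
    return grad + lam * w
```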

Dropout
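A sketch of inverted dropout, one common formulation (the notes give no details, so the drop probability p and the train/test handling are assumptions):

```python
import numpy as np

def dropout(h, p=0.5, train=True):
    # Scaling the surviving activations by 1/(1-p) at train time keeps
    # the expected activation unchanged, so test time needs no rescaling.
    if not train:
        return h
    mask = (np.random.rand(*h.shape) >= p) / (1.0 - p)
    return h * mask
```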

Weight Initialization

Proper initialization avoids exponentially shrinking or magnifying signal magnitudes as they propagate through the layers.
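The notes do not name a scheme, but Xavier/Glorot initialization is a standard one with exactly this property; a sketch:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Variance 2 / (fan_in + fan_out) keeps signal magnitudes roughly
    # constant in both the forward and backward pass.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.randn(fan_out, fan_in) * std
```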

Babysitting the Learning Process

Overfit on a small dataset

  1. Training error fluctuating? Decrease the learning rate.
  2. Some of the units saturated? Scale down the initialization, and properly normalize the inputs.
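One way to run the overfitting check above: train on a handful of examples and confirm the training loss can be driven toward zero; if it cannot, the model or training loop likely has a bug. A self-contained toy example (logistic regression on the AND function; all values illustrative):

```python
import numpy as np

# Tiny dataset: the AND function on 4 points (linearly separable).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])

w, b, lr = np.zeros(2), 0.0, 0.5
for step in range(2000):
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = y_hat - y                      # BCE gradient w.r.t. logits
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(f"final training loss: {loss:.4f}")  # should be small and still decreasing
```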

Expressiveness of DL

Shallow MLP

Universal approximation: a two-layer MLP can approximate any decision boundary (Hornik, 1991).

However, a two-layer MLP may require an exponentially large number of hidden units ($K^N \to \infty$).

Space Folding

This illustrates, intuitively, how a nonlinear map folds a low-dimensional space into a higher-dimensional one where the data becomes linearly separable.
