table of contents

vanishing gradient

2023-05-03

in deep neural networks, after propagating through several layers the gradient becomes very small, which implies slow or even halted learning. this phenomenon is called the vanishing gradient problem (Ovidiu Calin, 2020). it is recommended to use relu instead of sigmoid (or other sigmoidal functions) as the activation function in deep feedforward neural networks to reduce the risk of a vanishing gradient.
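
for illustration, here is a small numpy sketch (my own, not from the referenced book) that backpropagates a unit gradient through 50 dense layers and compares the resulting gradient norm at the input for sigmoid versus relu activations. the depth, width, and weight scale are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64

def input_gradient_norm(activation, activation_grad):
    # forward pass through `depth` dense layers, keeping the pre-activations
    weights = [rng.normal(0.0, np.sqrt(2.0 / width), (width, width)) for _ in range(depth)]
    x = rng.normal(size=(width, 1))
    pre_activations = []
    for w in weights:
        z = w @ x
        pre_activations.append(z)
        x = activation(z)
    # backward pass: start from a unit upstream gradient and push it back to the input
    grad = np.ones((width, 1))
    for w, z in zip(reversed(weights), reversed(pre_activations)):
        grad = w.T @ (grad * activation_grad(z))
    return np.linalg.norm(grad)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = lambda z: sigmoid(z) * (1.0 - sigmoid(z))
relu = lambda z: np.maximum(z, 0.0)
relu_grad = lambda z: (z > 0).astype(float)

print("sigmoid:", input_gradient_norm(sigmoid, sigmoid_grad))  # collapses toward 0
print("relu:   ", input_gradient_norm(relu, relu_grad))        # stays at a usable scale
```

the sigmoid derivative is at most 0.25, so each extra layer can only shrink the gradient, while relu passes the gradient through unchanged wherever its input is positive.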

gradients can also become too large as a result of the repeated multiplications; this is called the exploding gradient problem.
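
a similarly rough sketch of the exploding case: repeatedly multiplying the upstream gradient by layer jacobians whose entries are drawn a bit too large makes its norm grow exponentially with depth (the 2/sqrt(width) scale below is just an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 64, 50

# start from a unit upstream gradient and push it through `depth` layer jacobians
grad = np.ones((width, 1))
for _ in range(depth):
    jacobian = rng.normal(0.0, 2.0 / np.sqrt(width), (width, width))  # scale chosen too large on purpose
    grad = jacobian.T @ grad

print(np.linalg.norm(grad))  # roughly doubles per layer, ending up astronomically large
```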