Swish Activation¶
swish_actv_type
swish_actv_type(
scale=1.0,
beta=1.0,
attributes=...
)
The Swish activation function is a self-gated activation function. TThis activation function is also known as SiLU (Sigmoid Linear Unit).
\[f(x) = s x \cdot \sigma(\beta x) = s \frac{x}{1 + e^{-\beta x}}\]
where \(s\) is a scaling factor (default 1.0), \(\sigma\) is the sigmoid function and \(\beta\) is a parameter that controls the “steepness” of the activation.
By default, \(\beta = 1.0\).
Swish has been shown to work better than ReLU on deeper models across a variety of challenging datasets. It is smooth and non-monotonic.
Arguments¶
scale (real): Scaling factor for the output. Default:
1.0.beta (real): Parameter that controls the “steepness” of the activation. Default:
1.0.attributes (array): Optional ONNX attributes.
Shape:¶
Input: Any shape.
Output: Same shape as input.