# Multiplicative [Attention](Attention.md)
- Scores the decoder state $h_{i}$ against the encoder state $s_{j}$ with a learned bilinear form $W_{a}$:
- $f_{att}(h_{i}, s_{j}) = h_{i}^{T}W_{a}s_{j}$
- Since [Additive Attention](Additive%20Attention.md) outperforms the unscaled dot product as the dimensionality grows, [Scaled Dot Product Attention](Scaled%20Dot%20Product%20Attention.md) divides the score by a scaling factor of $\sqrt{d_k}$
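A minimal NumPy sketch of the bilinear score above, with one decoder state attending over several encoder states (all names and dimensions are illustrative, not from the note):

```python
import numpy as np

rng = np.random.default_rng(0)

d_h, d_s, n = 4, 6, 5                  # hypothetical dims: h_i, s_j, num encoder states
h = rng.standard_normal(d_h)           # decoder state h_i
S = rng.standard_normal((n, d_s))      # encoder states s_1..s_n (rows)
W_a = rng.standard_normal((d_h, d_s))  # learned weight matrix W_a

# f_att(h_i, s_j) = h_i^T W_a s_j, computed for all j at once
scores = h @ W_a @ S.T                 # shape (n,)

# softmax over encoder positions gives the attention weights
weights = np.exp(scores - scores.max())
weights /= weights.sum()
```

The single matrix `W_a` is what distinguishes this from plain dot-product attention, which requires $h_i$ and $s_j$ to share a dimensionality.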