Mila 0.13.48
Deep Neural Network Library
Configuration interface for the Grouped-Query Attention component.
#include <stdexcept>
#include <string>
#include <utility>
#include <sstream>
import Serialization.Metadata;
import Dnn.ComponentConfig;
import Dnn.TensorTypes;
import Dnn.Component;
Classes | |
| class | Mila::Dnn::GqaConfig |
| Configuration class for the Grouped-Query Attention module. | |
Namespaces | |
| namespace | Mila |
| Mila main API namespace. | |
| namespace | Mila::Dnn |
Configuration interface for the Grouped-Query Attention component.
Grouped-Query Attention (GQA) extends Multi-Head Attention by decoupling the number of query (Q) heads from the number of key/value (K/V) heads. Each K/V head is shared by a contiguous group of num_heads / num_kv_heads Q heads, which reduces KV cache size and K/V memory bandwidth during inference by a factor of num_heads / num_kv_heads relative to standard Multi-Head Attention.
Special cases:
num_kv_heads == num_heads → standard Multi-Head Attention
num_kv_heads == 1 → Multi-Query Attention (MQA)
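The contiguous grouping described above can be sketched as a simple index mapping. The helpers below (gqa_group_size and kv_head_for_query) are hypothetical names introduced only for illustration and are not part of the Mila API. The sketch also assumes that num_heads must be a positive multiple of num_kv_heads, which this page does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helpers for illustration only; not part of the Mila API.

// Number of Q heads that share one K/V head.
// Assumption: num_heads is a positive multiple of num_kv_heads.
inline std::size_t gqa_group_size(std::size_t num_heads, std::size_t num_kv_heads) {
    if (num_heads == 0 || num_kv_heads == 0 || num_heads % num_kv_heads != 0) {
        throw std::invalid_argument(
            "num_heads must be a positive multiple of num_kv_heads");
    }
    return num_heads / num_kv_heads;
}

// Index of the K/V head that serves Q head q_head under contiguous grouping.
inline std::size_t kv_head_for_query(std::size_t q_head,
                                     std::size_t num_heads,
                                     std::size_t num_kv_heads) {
    assert(q_head < num_heads);
    return q_head / gqa_group_size(num_heads, num_kv_heads);
}
```

For example, with num_heads = 32 and num_kv_heads = 8 the group size is 4, so Q heads 0 to 3 read K/V head 0 and Q heads 4 to 7 read K/V head 1. With num_kv_heads == num_heads the mapping is the identity (Multi-Head Attention), and with num_kv_heads == 1 every Q head maps to K/V head 0 (Multi-Query Attention).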