Mila 0.13.48
Deep Neural Network Library
GroupedQueryAttention.Config.ixx File Reference

Configuration interface for the Grouped-Query Attention component.

#include <stdexcept>
#include <string>
#include <utility>
#include <sstream>
import Serialization.Metadata;
import Dnn.ComponentConfig;
import Dnn.TensorTypes;
import Dnn.Component;

Classes

class  Mila::Dnn::GqaConfig
 Configuration class for the Grouped-Query Attention module.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn

Detailed Description

Configuration interface for the Grouped-Query Attention component.

Grouped-Query Attention (GQA) extends Multi-Head Attention by decoupling the number of Q heads (num_heads) from the number of K/V heads (num_kv_heads). Each K/V head is shared by a contiguous group of num_heads / num_kv_heads Q heads, which reduces KV cache size and K/V memory traffic during inference by a factor of num_heads / num_kv_heads relative to standard Multi-Head Attention.

Special cases:
  num_kv_heads == num_heads → standard Multi-Head Attention
  num_kv_heads == 1 → Multi-Query Attention (MQA)
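
The head grouping described above can be illustrated with a minimal sketch. The helpers gqa_group_size and kv_head_for_q_head are hypothetical names used only for this example and are not part of the Mila API. The sketch assumes contiguous grouping and that num_heads is a positive multiple of num_kv_heads; how GqaConfig itself validates these values is not shown on this page.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of Mila): number of Q heads that share one
// K/V head. This is also the KV cache reduction factor relative to MHA.
// Assumption: num_heads must be a positive multiple of num_kv_heads.
constexpr std::size_t gqa_group_size(std::size_t num_heads,
                                     std::size_t num_kv_heads) {
    if (num_heads == 0 || num_kv_heads == 0 ||
        num_heads % num_kv_heads != 0) {
        throw std::invalid_argument(
            "num_heads must be a positive multiple of num_kv_heads");
    }
    return num_heads / num_kv_heads;
}

// Hypothetical helper (not part of Mila): index of the K/V head that serves
// Q head `q_head`, assuming contiguous groups, i.e. Q heads [0, g) map to
// K/V head 0, Q heads [g, 2g) map to K/V head 1, and so on.
constexpr std::size_t kv_head_for_q_head(std::size_t q_head,
                                         std::size_t num_heads,
                                         std::size_t num_kv_heads) {
    if (q_head >= num_heads) {
        throw std::out_of_range("q_head must be less than num_heads");
    }
    return q_head / gqa_group_size(num_heads, num_kv_heads);
}
```

For example, with num_heads = 32 and num_kv_heads = 8, each group holds 4 Q heads, so Q heads 0 through 3 read K/V head 0 and Q head 4 reads K/V head 1. With num_kv_heads = 32 the group size is 1 (the Multi-Head Attention case), and with num_kv_heads = 1 every Q head maps to K/V head 0 (the MQA case).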