Mila 0.13.48
Deep Neural Network Library
Non-owning transient scratch state for CudaGqaOp inference paths.
import Dnn.ITensor;
Classes | |
| struct | Mila::Dnn::Compute::GqaState |
| Non-owning pointers to shared transient GQA scratch buffers. | |
Namespaces | |
| namespace | Mila |
| Mila main API namespace. | |
| namespace | Mila::Dnn |
| namespace | Mila::Dnn::Compute |
Non-owning transient scratch state for CudaGqaOp inference paths.
Tensors are owned by the caller (LlamaTransformer) and shared across all GQA layers. Each layer is called sequentially, so a single allocation suffices.
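The ownership pattern described above can be illustrated with a minimal, self-contained sketch. It is not Mila's actual definition: `ScratchBuffer`, `GqaStateSketch`, its `scores` and `expanded_kv` fields, `run_layer`, and `run_all_layers` are hypothetical names invented for illustration, and a host-side `std::vector<float>` stands in for the device tensors that the real `GqaState` presumably references. The sketch only shows a caller allocating scratch storage once and handing every sequentially executed layer the same non-owning view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor; real code would reference Mila tensors.
struct ScratchBuffer {
    std::vector<float> data;
};

// Hypothetical non-owning view: it holds raw pointers only and never
// allocates or frees the buffers it points to.
struct GqaStateSketch {
    ScratchBuffer* scores = nullptr;       // assumed scratch buffer
    ScratchBuffer* expanded_kv = nullptr;  // assumed scratch buffer
};

// Stand-in for one GQA layer: overwrites the shared scratch and returns
// the first value it wrote. Previous contents are not preserved.
float run_layer(const GqaStateSketch& state, int layer_index) {
    assert(state.scores != nullptr && state.expanded_kv != nullptr);
    const float value = static_cast<float>(layer_index + 1);
    for (float& x : state.scores->data) x = value;
    for (float& x : state.expanded_kv->data) x = value;
    return state.scores->data.empty() ? 0.0f : state.scores->data.front();
}

// Stand-in for the caller: owns the buffers, allocates them once, and runs
// the layers one after another with the same view. Returns how many layers
// saw the original allocation.
std::size_t run_all_layers(int num_layers, std::size_t scratch_elems) {
    ScratchBuffer scores;
    ScratchBuffer expanded_kv;
    scores.data.resize(scratch_elems);       // single allocation per buffer
    expanded_kv.data.resize(scratch_elems);

    GqaStateSketch state;
    state.scores = &scores;
    state.expanded_kv = &expanded_kv;

    const float* base = scores.data.data();
    std::size_t reused = 0;
    for (int layer = 0; layer < num_layers; ++layer) {
        run_layer(state, layer);
        if (state.scores->data.data() == base) ++reused;
    }
    return reused;
}
```

Sharing is safe in this sketch only because each layer finishes with the scratch before the next one starts. If layers ran concurrently, each would need its own buffers. The view must also not outlive the buffers it points to.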