Mila 0.13.48
Deep Neural Network Library
GqaState.ixx File Reference

Non-owning transient scratch state for CudaGqaOp inference paths.

import Dnn.ITensor;

Classes

struct  Mila::Dnn::Compute::GqaState
 Non-owning pointers to shared transient GQA scratch buffers.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn
namespace  Mila::Dnn::Compute

Detailed Description

Non-owning transient scratch state for CudaGqaOp inference paths.

The tensors are owned by the caller (LlamaTransformer) and shared across all GQA layers. Because the layers execute sequentially, only one layer uses the scratch buffers at a time, so a single allocation suffices.
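
The ownership pattern described here can be illustrated with a minimal CPU-only sketch. It is not Mila's implementation: `ScratchTensor`, `ScratchState`, `Layer`, `Owner`, and all field names are hypothetical stand-ins, and the real GqaState members and ITensor interface are not shown on this page. The sketch only demonstrates one owner allocating scratch once and lending non-owning pointers to each layer in turn.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tensor; Mila's real ITensor interface is not shown here.
struct ScratchTensor {
    std::vector<float> data;
};

// Non-owning view of the shared scratch buffers (field names are assumptions).
struct ScratchState {
    ScratchTensor* scores = nullptr;   // borrowed workspace, never freed by the view
    ScratchTensor* context = nullptr;  // borrowed workspace, never freed by the view
};

// Hypothetical layer: borrows the scratch buffers for the duration of one call.
struct Layer {
    float scale;

    float run(const ScratchState& s, float input) const {
        assert(s.scores && s.context);      // the caller must have bound the buffers
        s.scores->data[0] = input * scale;  // overwrites whatever the previous layer left
        s.context->data[0] = s.scores->data[0] + 1.0f;
        return s.context->data[0];
    }
};

// Hypothetical owner: allocates the scratch once and lends it to every layer.
struct Owner {
    ScratchTensor scores{std::vector<float>(4)};
    ScratchTensor context{std::vector<float>(4)};
    std::vector<Layer> layers;

    float forward(float x) {
        ScratchState state{&scores, &context};  // pointers only, no allocation
        for (const Layer& layer : layers) {
            x = layer.run(state, x);  // sequential calls, so reuse is safe
        }
        return x;
    }
};
```

In this sketch the scratch contents are transient: each layer overwrites them, so nothing in the buffers should be relied on after a call returns. The view must also not outlive the owner, since it holds raw pointers.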