Capability interface for KV-cache state management. More...

Inheritance diagram for Mila::Dnn::Compute::IKvCacheLifecycle:

[legend]

Public Member Functions
virtual	~IKvCacheLifecycle ()=default
virtual void	initializeKvCache (int batch_size, int max_sequence_length)=0
	Allocate the KV cache for a given batch size and maximum sequence length.
virtual void	resetKvCache ()=0
	Reset the KV cache to an empty state, preserving the allocation.

Detailed Description

Capability interface for KV-cache state management.

Implemented by attention operations (GQA, MHA) that allocate and maintain key/value caches across autoregressive decode steps. This concern is orthogonal to positional dispatch — an operation may implement both IPositionalUnaryOp and IKVCacheLifecycle.

Constructor & Destructor Documentation

◆ ~IKvCacheLifecycle()

virtual Mila::Dnn::Compute::IKvCacheLifecycle::~IKvCacheLifecycle ( )

virtualdefault

Member Function Documentation

◆ initializeKvCache()

virtual void Mila::Dnn::Compute::IKvCacheLifecycle::initializeKvCache	(	int	batch_size,
		int	max_sequence_length )

pure virtual

Allocate the KV cache for a given batch size and maximum sequence length.

Parameters

batch_size	Number of sequences in the batch.
max_sequence_length	Maximum number of tokens the cache must hold.

Implemented in Mila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TPrecision >, Mila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TensorDataType::BF16 >, Mila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TensorDataType::FP32 >, Mila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TPrecision >, Mila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TensorDataType::BF16 >, and Mila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TensorDataType::FP32 >.

◆ resetKvCache()

virtual void Mila::Dnn::Compute::IKvCacheLifecycle::resetKvCache ( )