Quantization-specific KV cache compression policies.

#include <concepts>
import Dnn.TensorDataType;
import Dnn.Quantization.KvCache.Policy;

Classes
struct	Mila::Dnn::Quant::KvCache::PerChannelKvFp8< TStorage >
	Symmetric per-head per-token FP8 KV cache compression policy.

Namespaces
namespace	Mila
	Mila main API namespace.
namespace	Mila::Dnn
namespace	Mila::Dnn::Quant
namespace	Mila::Dnn::Quant::KvCache

Concepts
concept	Mila::Dnn::Quant::KvCache::QuantKvPolicy
	Concept for quantization-based KV cache compression policies.

Detailed Description

Quantization-specific KV cache compression policies.

Refines KvCachePolicy with the dtype and scale granularity fields required by the FP8 cache write and read kernels in CudaGqaOp.

A future SlidingWindowPolicy would import Dnn.Quantization.KvCache.Policy directly and would not import this module, as sliding window eviction carries no dtype fields.

Note: Alpha.6 target: PerChannelKvFp8<> (symmetric per-head per-token FP8 compression applied identically to K and V).
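
For orientation, symmetric per-head per-token scaling is commonly computed as one scale per (head, token) vector, with no zero point. The sketch below assumes the E4M3 FP8 format, whose largest finite magnitude is 448; the page does not say which FP8 format the kernels use. The helper names `symmetric_scale` and `to_fp8_range` are hypothetical. The code runs on the host and stops before the final rounding to an 8-bit value, which the CudaGqaOp kernels would perform on the device.

```cpp
#include <array>
#include <cstddef>

// Assumption: E4M3 storage. 448 is the largest finite E4M3 magnitude.
inline constexpr float kFp8E4m3Max = 448.0f;

constexpr float abs_f(float x) { return x < 0.0f ? -x : x; }

// One scale per (head, token): the vector's absolute maximum mapped onto
// the FP8 range. An all-zero vector gets scale 1 to avoid dividing by zero.
template <std::size_t HeadDim>
constexpr float symmetric_scale(const std::array<float, HeadDim>& v) {
    float amax = 0.0f;
    for (float x : v) {
        const float a = abs_f(x);
        if (a > amax) amax = a;
    }
    return amax > 0.0f ? amax / kFp8E4m3Max : 1.0f;
}

// Scales a value into the FP8 range and clamps it. The result is still a
// float; rounding to the FP8 grid is left out of this sketch.
constexpr float to_fp8_range(float x, float scale) {
    const float q = x / scale;
    if (q > kFp8E4m3Max) return kFp8E4m3Max;
    if (q < -kFp8E4m3Max) return -kFp8E4m3Max;
    return q;
}
```

Applying the policy identically to K and V would mean calling the same pair of helpers on both tensors, each with its own per-(head, token) scale. Dequantization on read multiplies the stored value by that scale.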
