Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Quant::KvCache::QuantKvPolicy Concept Reference [export]

Concept for quantization-based KV cache compression policies.

Concept definition

template<typename T>
concept QuantKvPolicy = KvCachePolicy<T> && requires {
    { T::kStorageDtype } -> std::convertible_to<TensorDataType>;
    { T::kScaleDtype } -> std::convertible_to<TensorDataType>;
    { T::kPerHeadPerToken } -> std::convertible_to<bool>;
    { T::kSymmetric } -> std::convertible_to<bool>;
};
Definition at line 42 of file QuantPolicy.ixx.

Detailed Description


Refines KvCachePolicy by additionally requiring the dtype fields (kStorageDtype, kScaleDtype) and the granularity and symmetry flags (kPerHeadPerToken, kSymmetric) consumed by the FP8 cache kernels in CudaGqaOp. The operation relies on this concept wherever it must inspect the storage and scale dtypes, for example during kernel selection and scale tensor allocation.

Note
NoKvCompression satisfies KvCachePolicy but does not satisfy QuantKvPolicy. CudaGqaOp guards all QuantKvPolicy-specific paths with if constexpr (kKvCompressed) before accessing these fields.
Template Parameters
T: Candidate policy type.