Mila 0.13.48
Deep Neural Network Library
QuantPolicy.ixx File Reference

Quantization-specific KV cache compression policies.

#include <concepts>
import Dnn.TensorDataType;
import Dnn.Quantization.KvCache.Policy;

Classes

struct  Mila::Dnn::Quant::KvCache::PerChannelKvFp8< TStorage >
 Symmetric per-head per-token FP8 KV cache compression policy.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn
namespace  Mila::Dnn::Quant
namespace  Mila::Dnn::Quant::KvCache

Concepts

concept  Mila::Dnn::Quant::KvCache::QuantKvPolicy
 Concept for quantization-based KV cache compression policies.

Detailed Description

Quantization-specific KV cache compression policies.

Refines KvCachePolicy with the dtype and scale-granularity fields required by the FP8 cache write and read kernels in CudaGqaOp.

A future SlidingWindowPolicy would import Dnn.Quantization.KvCache.Policy directly rather than this module, because sliding-window eviction carries no dtype fields.

Note
Alpha.6 target: PerChannelKvFp8<> (symmetric per-head, per-token FP8 compression applied identically to K and V).
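As a rough illustration of symmetric per-head, per-token FP8 compression, the sketch below computes one scale per (head, token) row as amax / 448 and emulates rounding to FP8 E4M3 in float, with a zero point of 0. The choice of E4M3, the [heads][tokens][head_dim] layout, and every function name are assumptions for illustration; the actual PerChannelKvFp8 policy and the CudaGqaOp kernels may differ. Calling the same routine on the K tensor and on the V tensor mirrors "applied identically to K and V".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest finite FP8 E4M3 value (the variant without infinities).
constexpr float kFp8E4M3Max = 448.0f;

// Round to the nearest E4M3 value (3 mantissa bits, exponent bias 7) and
// saturate at +/-448. Emulated in float here; a CUDA kernel would use
// hardware conversion instead.
float round_to_e4m3(float v) {
    float a = std::fabs(v);
    if (a == 0.0f) return v;
    int e = 0;
    std::frexp(a, &e);                           // a = m * 2^e, m in [0.5, 1)
    int exp2 = std::max(e - 1, -6);              // floor(log2 a), clamped to subnormal range
    float quantum = std::ldexp(1.0f, exp2 - 3);  // spacing of representable values
    float q = std::nearbyint(a / quantum) * quantum;
    return std::copysign(std::min(q, kFp8E4M3Max), v);
}

// Hypothetical container: FP8-representable values kept as float, plus one
// scale per (head, token) row.
struct QuantizedKv {
    std::vector<float> values;
    std::vector<float> scales;
};

// Assumed layout: [heads][tokens][head_dim], row-major, so each contiguous
// run of head_dim elements is one (head, token) pair.
QuantizedKv quantize_per_head_per_token(const std::vector<float>& x,
                                        std::size_t heads, std::size_t tokens,
                                        std::size_t head_dim) {
    assert(x.size() == heads * tokens * head_dim);
    QuantizedKv out{std::vector<float>(x.size()),
                    std::vector<float>(heads * tokens)};
    for (std::size_t row = 0; row < heads * tokens; ++row) {
        const float* src = x.data() + row * head_dim;
        float amax = 0.0f;
        for (std::size_t d = 0; d < head_dim; ++d)
            amax = std::max(amax, std::fabs(src[d]));
        // Symmetric scaling: amax maps onto the largest FP8 magnitude.
        float scale = amax > 0.0f ? amax / kFp8E4M3Max : 1.0f;
        out.scales[row] = scale;
        for (std::size_t d = 0; d < head_dim; ++d)
            out.values[row * head_dim + d] = round_to_e4m3(src[d] / scale);
    }
    return out;
}

// Read path: multiply each element by the scale of its (head, token) row.
std::vector<float> dequantize(const QuantizedKv& q, std::size_t head_dim) {
    std::vector<float> x(q.values.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] = q.values[i] * q.scales[i / head_dim];
    return x;
}
```

A per-token scale should keep an outlier in one token from inflating the rounding error of every other token in the same head, at the cost of storing one extra scale per (head, token) alongside the FP8 cache.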