Mila 0.13.48
Deep Neural Network Library
Quantization-specific KV cache compression policies.
Classes
| Kind | Name | Description |
|---|---|---|
| struct | Mila::Dnn::Quant::KvCache::PerChannelKvFp8< TStorage > | Symmetric per-head per-token FP8 KV cache compression policy. |

Namespaces
| Kind | Name | Description |
|---|---|---|
| namespace | Mila | Mila main API namespace. |
| namespace | Mila::Dnn | |
| namespace | Mila::Dnn::Quant | |
| namespace | Mila::Dnn::Quant::KvCache | |

Concepts
| Kind | Name | Description |
|---|---|---|
| concept | Mila::Dnn::Quant::KvCache::QuantKvPolicy | Concept for quantization-based KV cache compression policies. |
This module refines KvCachePolicy with the dtype and scale-granularity fields that the FP8 cache write and read kernels in CudaGqaOp require.
A future SlidingWindowPolicy would import Dnn.Quantization.KvCache.Policy directly rather than this module, because sliding-window eviction carries no dtype fields.
PerChannelKvFp8<>: symmetric per-head per-token FP8 compression, applied identically to K and V.
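The arithmetic behind symmetric per-head per-token scaling can be sketched on the CPU. This sketch assumes the E4M3 FP8 format (largest finite value 448) and a [tokens][heads][head_dim] layout, and it emulates FP8 rounding in software with floats because standard C++ has no FP8 type. The names Fp8Tensor, round_to_e4m3, quantize_per_head_per_token and dequantize are hypothetical and do not describe the CudaGqaOp kernels or the PerChannelKvFp8 interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest finite magnitude of FP8 E4M3 (assumed storage format).
constexpr float kE4M3Max = 448.0f;

// Software emulation of rounding to the nearest E4M3 value (3 mantissa bits,
// min normal 2^-6, subnormal step 2^-9), saturating finite inputs at +/-448.
float round_to_e4m3(float x) {
    if (x == 0.0f || !std::isfinite(x)) return x;  // zero, inf and NaN pass through
    int exp = 0;
    std::frexp(std::fabs(x), &exp);  // |x| = m * 2^exp with m in [0.5, 1)
    // Spacing of E4M3 values in this binade is 2^(exp-1-3); subnormals use 2^-9.
    const float step = std::ldexp(1.0f, std::max(exp - 4, -9));
    const float r = std::nearbyint(std::fabs(x) / step) * step;  // ties to even
    return std::copysign(std::min(r, kE4M3Max), x);
}

// Compressed form of one K or V tensor laid out as [tokens][heads][head_dim].
struct Fp8Tensor {
    std::vector<float> values;  // E4M3-representable codes, held in floats here
    std::vector<float> scales;  // one scale per (token, head) row
};

// Symmetric per-head per-token quantization: each head_dim-long row gets its own scale.
Fp8Tensor quantize_per_head_per_token(const std::vector<float>& x, std::size_t tokens,
                                      std::size_t heads, std::size_t head_dim) {
    assert(x.size() == tokens * heads * head_dim);
    Fp8Tensor out{std::vector<float>(x.size()), std::vector<float>(tokens * heads)};
    for (std::size_t row = 0; row < tokens * heads; ++row) {
        const float* src = x.data() + row * head_dim;
        float amax = 0.0f;
        for (std::size_t d = 0; d < head_dim; ++d) amax = std::max(amax, std::fabs(src[d]));
        // Symmetric: zero maps to zero and the row's amax maps to the FP8 maximum.
        // An all-zero row falls back to scale 1 to avoid dividing by zero.
        const float scale = amax > 0.0f ? amax / kE4M3Max : 1.0f;
        out.scales[row] = scale;
        for (std::size_t d = 0; d < head_dim; ++d)
            out.values[row * head_dim + d] = round_to_e4m3(src[d] / scale);
    }
    return out;
}

// Dequantize: multiply each code by the scale of its (token, head) row.
std::vector<float> dequantize(const Fp8Tensor& q, std::size_t head_dim) {
    std::vector<float> y(q.values.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = q.values[i] * q.scales[i / head_dim];
    return y;
}
```

Because the policy treats K and V the same way, the same call would compress both tensors. One consequence of keeping a scale per (token, head) row is that appending a new token computes only that token's scales and never forces earlier rows to be rescaled.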