Mila 0.13.48
Deep Neural Network Library
Dnn.Quantization.KvCache.QuantPolicy Module Reference

Concepts

concept  Mila::Dnn::Quant::KvCache::QuantKvPolicy
 Concept for quantization-based KV cache compression policies.

Classes

struct  Mila::Dnn::Quant::KvCache::PerChannelKvFp8< TStorage >
 Symmetric per-head per-token FP8 KV cache compression policy.
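
As a rough illustration of what "symmetric per-head per-token" scaling usually means, the sketch below computes one scale per row (one head at one token) and maps the row's largest magnitude onto the FP8 maximum, with no zero-point. This is not Mila's implementation: every name here (QuantizedRow, quantize_row, dequantize_row, kFp8E4m3Max) is hypothetical, and the choice of the E4M3 format with a maximum finite value of 448 is an assumption, since this page does not say which FP8 variant the policy uses. The scaled values are kept as float for readability; a real policy would additionally round them to an FP8 bit pattern in TStorage.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; none of these names come from Mila.
// Assumption: FP8 E4M3, whose largest finite magnitude is 448.
constexpr float kFp8E4m3Max = 448.0f;

struct QuantizedRow {
    float scale;                // one scale per (head, token) row
    std::vector<float> values;  // scaled values, clamped to the FP8 range
};

// Symmetric quantization: the scale maps the row's largest magnitude to
// the FP8 maximum, and no zero-point offset is applied.
inline QuantizedRow quantize_row(const std::vector<float>& row) {
    assert(!row.empty());
    float abs_max = 0.0f;
    for (float v : row) {
        abs_max = std::max(abs_max, std::fabs(v));
    }
    // Fall back to a scale of 1 for an all-zero row to avoid dividing by zero.
    const float scale = abs_max > 0.0f ? abs_max / kFp8E4m3Max : 1.0f;

    QuantizedRow out{scale, {}};
    out.values.reserve(row.size());
    for (float v : row) {
        // A real policy would round this value to the FP8 grid here.
        out.values.push_back(std::clamp(v / scale, -kFp8E4m3Max, kFp8E4m3Max));
    }
    return out;
}

// Dequantization multiplies each stored value by the row's scale.
inline std::vector<float> dequantize_row(const QuantizedRow& q) {
    std::vector<float> out;
    out.reserve(q.values.size());
    for (float v : q.values) {
        out.push_back(v * q.scale);
    }
    return out;
}
```

Storing one scale per head and token keeps an outlier in one row from degrading the precision of every other row, at the cost of one extra float per row.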

Files

file  /__w/Mila/Mila/Mila/Src/Dnn/Quantization/KvCache/QuantPolicy.ixx
 Quantization-specific KV cache compression policies.