Mila 0.13.48
Deep Neural Network Library
LanguageModelConfig.ixx File Reference

CRTP base configuration for all deployable Mila language models.

#include <stdexcept>
#include <string>
import Dnn.TensorTypes;

Classes

struct  Mila::Dnn::LanguageModelConfig< TDerived >
 CRTP base configuration for all deployable Mila language models.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn

Enumerations

enum class  Mila::Dnn::KvCacheCompression { Mila::Dnn::None , Mila::Dnn::FP8 }
 KV cache storage and compression strategy for GroupedQueryAttention.
enum class  Mila::Dnn::WeightQuantization { Mila::Dnn::None , Mila::Dnn::FP8 , Mila::Dnn::FP4 }
 Weight storage and matmul strategy for Linear components.

Detailed Description

CRTP base configuration for all deployable Mila language models.

LanguageModelConfig<TDerived> owns the deployment concerns that are universal across all language model architectures:

  1. context_length — maximum sequence length the model is built for. RoPE embeddings and KV cache buffers are sized to this.
  2. WeightQuantization — weight storage and matmul strategy for Linear components. Defaults to WeightQuantization::None (BF16 weights).
  3. KvCacheCompression — KV cache storage and compression strategy for GroupedQueryAttention components. Defaults to KvCacheCompression::None (no compression).
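
As a rough illustration of the three settings listed above, the sketch below holds them in a plain struct with the documented defaults. The struct name DeploymentSettings, the member names other than context_length, the std::size_t type, and the zero-length check are assumptions made for this example, not the library's actual definitions.

```cpp
#include <cstddef>
#include <stdexcept>

namespace sketch {

// Enumerators mirror the ones listed in this file reference.
enum class WeightQuantization { None, FP8, FP4 };
enum class KvCacheCompression { None, FP8 };

// Hypothetical plain holder for the three universal deployment settings.
struct DeploymentSettings {
    std::size_t context_length;
    WeightQuantization weight_quantization = WeightQuantization::None;   // BF16 weights
    KvCacheCompression kv_cache_compression = KvCacheCompression::None;  // no compression

    explicit DeploymentSettings(std::size_t length) : context_length(length) {
        // Assumed validation; the real class may check differently or not at all.
        if (length == 0) {
            throw std::invalid_argument("context_length must be greater than zero");
        }
    }
};

} // namespace sketch
```

Constructing sketch::DeploymentSettings( 4096 ) would leave both enum members at None, which matches the defaults described in items 2 and 3.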

CRTP Pattern

All fluent setters return TDerived& so that chains work correctly across both base and derived methods without casting at the call site:

QwenModelConfig config = QwenModelConfig( context_length )
    .withFP8Quantization()   // returns QwenModelConfig&
    .withThinkingMode();     // returns QwenModelConfig&
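
One way such a chain can be made to work is shown in the sketch below: the base setter casts `*this` to TDerived& before returning, so a derived-only setter can follow it. LanguageModelConfigSketch, QwenLikeConfig, the member names, and makeQwenLikeConfig are hypothetical stand-ins; only the setter names and the TDerived& return type come from this page.

```cpp
#include <cstddef>

namespace sketch {

enum class WeightQuantization { None, FP8, FP4 };

template <typename TDerived>
struct LanguageModelConfigSketch {
    std::size_t context_length;
    WeightQuantization weight_quantization = WeightQuantization::None;

    explicit LanguageModelConfigSketch(std::size_t length) : context_length(length) {}

    // Base setter returns the derived type, so derived-only setters can follow it.
    TDerived& withWeightQuantization(WeightQuantization quantization) {
        weight_quantization = quantization;
        return self();
    }

private:
    // Safe because TDerived is required to inherit from this template.
    TDerived& self() { return static_cast<TDerived&>(*this); }
};

// Hypothetical derived config with one architecture-specific option.
struct QwenLikeConfig : LanguageModelConfigSketch<QwenLikeConfig> {
    bool thinking_mode = false;

    using LanguageModelConfigSketch<QwenLikeConfig>::LanguageModelConfigSketch;

    QwenLikeConfig& withThinkingMode() {
        thinking_mode = true;
        return *this;
    }
};

// Base setter first, derived setter second. This compiles only because the
// base setter returns QwenLikeConfig& rather than a reference to the base.
inline QwenLikeConfig makeQwenLikeConfig(std::size_t context_length) {
    QwenLikeConfig config = QwenLikeConfig(context_length)
        .withWeightQuantization(WeightQuantization::FP8)
        .withThinkingMode();
    return config;
}

} // namespace sketch
```

If the base setter returned a reference to the base type instead, the call to withThinkingMode() would fail to compile, which is the problem the CRTP parameter avoids.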

Relationship to ModelConfig

ModelConfig<TDevice, TPrecision> is the structural base for all Mila models. LanguageModelConfig is the deployment configuration counterpart for the language model branch of that hierarchy. Vision model configurations would derive from a sibling VisionModelConfig<TDerived>, not from this class.

Relationship to BuildContext

LanguageModelConfig is the public API surface for deployment configuration. BuildContext is the internal carrier that passes those settings through the component tree. fromPretrained() projects LanguageModelConfig into BuildContext exactly once; the two are never the same object.
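
The projection can be pictured as a one-way copy, as in the sketch below. The fields of the real BuildContext are not documented on this page, so PublicConfig, BuildContextSketch, their members, and projectToBuildContext are all hypothetical names chosen for illustration.

```cpp
#include <cstddef>

namespace sketch {

enum class WeightQuantization { None, FP8, FP4 };
enum class KvCacheCompression { None, FP8 };

// Stand-in for the public configuration object.
struct PublicConfig {
    std::size_t context_length = 0;
    WeightQuantization weight_quantization = WeightQuantization::None;
    KvCacheCompression kv_cache_compression = KvCacheCompression::None;
};

// Hypothetical internal carrier; the real BuildContext may look quite different.
struct BuildContextSketch {
    std::size_t max_sequence_length = 0;
    WeightQuantization weight_quantization = WeightQuantization::None;
    KvCacheCompression kv_cache_compression = KvCacheCompression::None;
};

// One-way copy of the deployment settings. The result shares no state with
// the config, so later changes to the config do not reach the context.
inline BuildContextSketch projectToBuildContext(const PublicConfig& config) {
    BuildContextSketch context;
    context.max_sequence_length = config.context_length;
    context.weight_quantization = config.weight_quantization;
    context.kv_cache_compression = config.kv_cache_compression;
    return context;
}

} // namespace sketch
```

Keeping the two types separate in this way lets the public configuration change shape without touching the components that only read the internal carrier.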

Quantization Presets vs Fine-Grained Control

Convenience preset methods express common deployment decisions in user vocabulary. Fine-grained setters are available for atypical configurations:

// Preset: FP8 weights + FP8 KV cache
LlamaModelConfig config = LlamaModelConfig( context_length )
    .withFP8Quantization();

// Fine-grained: FP4 weights, no KV compression
LlamaModelConfig config = LlamaModelConfig( context_length )
    .withWeightQuantization( WeightQuantization::FP4 )
    .withKvCacheCompression( KvCacheCompression::None );
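
A preset of this kind could plausibly be a thin wrapper over the fine-grained setters, as in the sketch below. This is an assumption about the implementation, not something this page confirms; ConfigSketch, LlamaLikeConfig, and the member names are hypothetical, while the setter names and the "FP8 weights + FP8 KV cache" meaning of the preset are taken from the example above.

```cpp
#include <cstddef>

namespace sketch {

enum class WeightQuantization { None, FP8, FP4 };
enum class KvCacheCompression { None, FP8 };

template <typename TDerived>
struct ConfigSketch {
    std::size_t context_length;
    WeightQuantization weight_quantization = WeightQuantization::None;
    KvCacheCompression kv_cache_compression = KvCacheCompression::None;

    explicit ConfigSketch(std::size_t length) : context_length(length) {}

    // Fine-grained setters: each changes exactly one setting.
    TDerived& withWeightQuantization(WeightQuantization quantization) {
        weight_quantization = quantization;
        return static_cast<TDerived&>(*this);
    }

    TDerived& withKvCacheCompression(KvCacheCompression compression) {
        kv_cache_compression = compression;
        return static_cast<TDerived&>(*this);
    }

    // Preset: assumed here to forward to the two fine-grained setters.
    TDerived& withFP8Quantization() {
        return withWeightQuantization(WeightQuantization::FP8)
            .withKvCacheCompression(KvCacheCompression::FP8);
    }
};

// Hypothetical derived config with no extra options of its own.
struct LlamaLikeConfig : ConfigSketch<LlamaLikeConfig> {
    using ConfigSketch<LlamaLikeConfig>::ConfigSketch;
};

} // namespace sketch
```

Under this arrangement a later fine-grained call overrides the preset, so `.withFP8Quantization().withKvCacheCompression( KvCacheCompression::None )` would keep FP8 weights while disabling KV cache compression.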