CRTP base configuration for all deployable Mila language models.

#include <stdexcept>
#include <string>
import Dnn.TensorTypes;

Classes
struct	Mila::Dnn::LanguageModelConfig< TDerived >
	CRTP base configuration for all deployable Mila language models.

Namespaces
namespace	Mila
	Mila main API namespace.
namespace	Mila::Dnn

Enumerations
enum class	Mila::Dnn::KvCacheCompression { Mila::Dnn::None , Mila::Dnn::FP8 }
	KV cache storage and compression strategy for GroupedQueryAttention.
enum class	Mila::Dnn::WeightQuantization { Mila::Dnn::None , Mila::Dnn::FP8 , Mila::Dnn::FP4 }
	Weight storage and matmul strategy for Linear components.

Detailed Description

CRTP base configuration for all deployable Mila language models.

LanguageModelConfig<TDerived> owns the deployment concerns that are universal across all language model architectures:

context_length — maximum sequence length the model is built for. RoPE embeddings and KV cache buffers are sized to this.
WeightQuantization — weight storage and matmul strategy for Linear components. Defaults to WeightQuantization::None (BF16 weights).
KvCacheCompression — KV cache storage and compression strategy for GroupedQueryAttention components. Defaults to KvCacheCompression::None (no compression).
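Because context_length fixes the size of the preallocated KV cache and KvCacheCompression fixes the bytes stored per cached element, the two settings together determine the cache's memory footprint. The following is a minimal arithmetic sketch, not Mila code: the sketch namespace, the function names and the per-layer shape parameters are hypothetical, it assumes an uncompressed cache stores BF16 (2 bytes per element), and it ignores any scale metadata an FP8 scheme might keep.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical mirror of the documented enumeration.
enum class KvCacheCompression { None, FP8 };

// Assumed element sizes: BF16 (2 bytes) uncompressed, 1 byte under FP8.
constexpr std::uint64_t kvBytesPerElement( KvCacheCompression compression ) {
    return compression == KvCacheCompression::FP8 ? 1 : 2;
}

// Per-sequence KV cache size for a GQA model whose buffers are
// preallocated to context_length tokens. The leading factor of 2
// accounts for the separate K and V tensors in every layer.
constexpr std::uint64_t kvCacheBytes( std::uint64_t context_length,
                                      std::uint64_t num_layers,
                                      std::uint64_t num_kv_heads,
                                      std::uint64_t head_dim,
                                      KvCacheCompression compression ) {
    return 2 * num_layers * context_length * num_kv_heads * head_dim
         * kvBytesPerElement( compression );
}

} // namespace sketch
```

With illustrative values of 28 layers, 8 KV heads and a head dimension of 128, a context_length of 4096 gives 469,762,048 bytes (448 MiB) per sequence uncompressed, and half that (224 MiB) with KvCacheCompression::FP8.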

CRTP Pattern

All fluent setters return TDerived& so that chains work correctly across both base and derived methods without casting at the call site:

QwenModelConfig config = QwenModelConfig( context_length )
    .withFP8Quantization()   // returns QwenModelConfig&
    .withThinkingMode();     // returns QwenModelConfig&
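The mechanism behind these chains can be sketched as follows. This illustrates the CRTP technique under stated assumptions and is not Mila's actual class: the sketch namespace, the getter names, the ExampleModelConfig type and its withThinkingMode() body are hypothetical, and the constructor's rejection of a non-positive context_length is only suggested by the header's <stdexcept> include.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <utility>

namespace sketch {

enum class WeightQuantization { None, FP8, FP4 };
enum class KvCacheCompression { None, FP8 };

template <typename TDerived>
struct LanguageModelConfig {
public:
    constexpr explicit LanguageModelConfig( std::int64_t context_length )
        : context_length_( context_length ) {
        // Assumed validation; the real class may check differently.
        if ( context_length <= 0 )
            throw std::invalid_argument( "context_length must be positive" );
    }

    // Base setters return TDerived&, so a chain can continue with
    // derived-only methods without a cast at the call site.
    constexpr TDerived& withWeightQuantization( WeightQuantization q ) {
        weight_quantization_ = q;
        return self();
    }

    constexpr TDerived& withKvCacheCompression( KvCacheCompression c ) {
        kv_cache_compression_ = c;
        return self();
    }

    constexpr std::int64_t contextLength() const { return context_length_; }
    constexpr WeightQuantization weightQuantization() const { return weight_quantization_; }
    constexpr KvCacheCompression kvCacheCompression() const { return kv_cache_compression_; }

private:
    // Valid only because every TDerived inherits from LanguageModelConfig<TDerived>.
    constexpr TDerived& self() { return static_cast<TDerived&>( *this ); }

    std::int64_t context_length_;
    WeightQuantization weight_quantization_ = WeightQuantization::None;
    KvCacheCompression kv_cache_compression_ = KvCacheCompression::None;
};

// Hypothetical architecture-specific config adding its own fluent option.
struct ExampleModelConfig : public LanguageModelConfig<ExampleModelConfig> {
public:
    constexpr explicit ExampleModelConfig( std::int64_t context_length )
        : LanguageModelConfig( context_length ) {}

    constexpr ExampleModelConfig& withThinkingMode() {
        thinking_mode_ = true;
        return *this;
    }

    constexpr bool thinkingMode() const { return thinking_mode_; }

private:
    bool thinking_mode_ = false;
};

// The base setter already yields the derived type.
static_assert( std::is_same_v<
    decltype( std::declval<ExampleModelConfig&>().withWeightQuantization( WeightQuantization::FP8 ) ),
    ExampleModelConfig& > );

} // namespace sketch
```

Here ExampleModelConfig( 4096 ).withWeightQuantization( WeightQuantization::FP8 ).withThinkingMode() compiles because the first call already returns ExampleModelConfig&. The cost is that every derived config must name itself as the template argument, which is what makes the static_cast in self() valid.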

Relationship to ModelConfig

ModelConfig<TDevice, TPrecision> is the structural base for all Mila models. LanguageModelConfig is the deployment configuration counterpart for the language model branch of that hierarchy. Vision model configurations would derive from a sibling VisionModelConfig<TDerived>, not from this class.

Relationship to BuildContext

LanguageModelConfig is the public API surface for deployment configuration; BuildContext is the internal carrier that passes those settings through the component tree. fromPretrained() projects a LanguageModelConfig into a BuildContext exactly once, and the two are never the same object.
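The one-way projection can be pictured as copying the public settings into a separate internal value. This sketch uses hypothetical names throughout (DeploymentConfig, the BuildContext fields and project()); Mila's real BuildContext and fromPretrained() are not shown in this file and may carry more state.

```cpp
#include <cstdint>

namespace sketch {

enum class WeightQuantization { None, FP8, FP4 };
enum class KvCacheCompression { None, FP8 };

// Simplified, non-CRTP stand-in for the user-facing configuration.
struct DeploymentConfig {
    std::int64_t context_length = 0;
    WeightQuantization weights = WeightQuantization::None;
    KvCacheCompression kv_cache = KvCacheCompression::None;
};

// Hypothetical internal carrier handed down the component tree. It is a
// separate type, so components never see or mutate the user's config.
struct BuildContext {
    std::int64_t max_sequence_length;
    WeightQuantization linear_weights;   // read by Linear components
    KvCacheCompression attention_kv;     // read by GroupedQueryAttention
};

// One-time, by-value projection from public config to internal carrier.
constexpr BuildContext project( const DeploymentConfig& config ) {
    return BuildContext{ config.context_length, config.weights, config.kv_cache };
}

} // namespace sketch
```

Because the carrier is a copy, later edits to the user's config cannot leak into components that were already built; that is one plausible reason for keeping the two objects distinct.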

Quantization Presets vs Fine-Grained Control

Convenience preset methods express common deployment decisions in user vocabulary. Fine-grained setters are available for atypical configurations:

// Preset: FP8 weights + FP8 KV cache
LlamaModelConfig fp8_config = LlamaModelConfig( context_length )
    .withFP8Quantization();

// Fine-grained: FP4 weights, no KV compression
LlamaModelConfig fp4_config = LlamaModelConfig( context_length )
    .withWeightQuantization( WeightQuantization::FP4 )
    .withKvCacheCompression( KvCacheCompression::None );
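A preset can be read as shorthand for the fine-grained calls it replaces. The sketch below assumes that withFP8Quantization() does nothing beyond setting both fields to FP8, its documented effect (FP8 weights plus FP8 KV cache); the standalone QuantizationSettings type is hypothetical and stands in for the CRTP base.

```cpp
namespace sketch {

enum class WeightQuantization { None, FP8, FP4 };
enum class KvCacheCompression { None, FP8 };

struct QuantizationSettings {
    WeightQuantization weights = WeightQuantization::None;
    KvCacheCompression kv_cache = KvCacheCompression::None;

    // Fine-grained setters: each controls exactly one field.
    constexpr QuantizationSettings& withWeightQuantization( WeightQuantization q ) {
        weights = q;
        return *this;
    }

    constexpr QuantizationSettings& withKvCacheCompression( KvCacheCompression c ) {
        kv_cache = c;
        return *this;
    }

    // Preset expressed purely through the fine-grained setters,
    // so the two paths cannot drift apart.
    constexpr QuantizationSettings& withFP8Quantization() {
        return withWeightQuantization( WeightQuantization::FP8 )
              .withKvCacheCompression( KvCacheCompression::FP8 );
    }
};

constexpr bool operator==( const QuantizationSettings& a, const QuantizationSettings& b ) {
    return a.weights == b.weights && a.kv_cache == b.kv_cache;
}

} // namespace sketch
```

Routing the preset through the fine-grained setters keeps a single code path per field, so any validation added to a setter also applies to the preset.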
