CRTP base configuration for all deployable Mila language models.

Public Member Functions
	LanguageModelConfig ()=default
	LanguageModelConfig (dim_t context_length)
	Construct with a required context length.
std::string	baseToString () const
	Produce the base fields portion of a toString() summary.
dim_t	getContextLength () const noexcept
KvCacheCompression	getKvCacheCompression () const noexcept
WeightQuantization	getWeightQuantization () const noexcept
TDerived &	withContextLength (dim_t context_length)
	Set the maximum sequence length.
TDerived &	withFP4Quantization ()
	FP4 quantization — FP4 weights, FP8 KV cache.
TDerived &	withFP8Quantization ()
	FP8 quantization — FP8 weights, FP8 KV cache.
TDerived &	withFullPrecision ()
	Full precision — BF16 weights, BF16 KV cache.
TDerived &	withKvCacheCompression (KvCacheCompression kv)
	Set the KV cache compression mode independently.
TDerived &	withWeightQuantization (WeightQuantization wq)
	Set the weight quantization mode independently.

Protected Attributes
dim_t	context_length_ { 0 }
KvCacheCompression	kv_cache_compression_ { KvCacheCompression::None }
WeightQuantization	weight_quantization_ { WeightQuantization::None }

Detailed Description

template<typename TDerived>
struct Mila::Dnn::LanguageModelConfig< TDerived >

CRTP base configuration for all deployable Mila language models.

Template Parameters

TDerived Concrete config type (e.g. LlamaModelConfig). Every fluent setter returns TDerived&, so base-class and derived-class setters can be chained in a single expression.
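The CRTP arrangement can be sketched as follows. This is a minimal illustration under stated assumptions, not Mila's source: `dim_t` is assumed to be a signed 64-bit integer, and `ToyModelConfig` with its `withNumLayers()` setter is a hypothetical derived config. It shows that the base setter returns the derived type, so a derived-only setter can follow it in the same chain.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Assumption: stand-in for Mila's dim_t; the real typedef may differ.
using dim_t = std::int64_t;

// Minimal sketch of the CRTP base; not Mila's actual implementation.
template <typename TDerived>
struct LanguageModelConfig {
    TDerived& withContextLength(dim_t context_length) {
        if (context_length == 0)
            throw std::invalid_argument("context_length must be > 0");
        context_length_ = context_length;
        return self();
    }
    dim_t getContextLength() const noexcept { return context_length_; }

protected:
    // Downcast to the concrete config so chained calls keep the derived type.
    TDerived& self() { return static_cast<TDerived&>(*this); }
    dim_t context_length_{ 0 };
};

// Hypothetical derived config with one architecture-specific setter.
struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {
    ToyModelConfig& withNumLayers(int n) { num_layers_ = n; return *this; }
    int getNumLayers() const noexcept { return num_layers_; }

private:
    int num_layers_{ 0 };
};

// Base setter first, derived setter second: this compiles only because
// withContextLength() returns ToyModelConfig&, not LanguageModelConfig&.
inline ToyModelConfig makeToyConfig() {
    return ToyModelConfig{}.withContextLength(2048).withNumLayers(12);
}
```

Had the base setter returned `LanguageModelConfig&` instead, the call to `withNumLayers()` would not compile, and callers would have to split the chain or put derived setters first.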

Constructor & Destructor Documentation

◆ LanguageModelConfig() [1/2]

template<typename TDerived>

Mila::Dnn::LanguageModelConfig< TDerived >::LanguageModelConfig ( )

default

◆ LanguageModelConfig() [2/2]

template<typename TDerived>

Mila::Dnn::LanguageModelConfig< TDerived >::LanguageModelConfig ( dim_t context_length )

inline explicit

Construct with a required context length.

Parameters

context_length Maximum sequence length in tokens. Must be > 0.

Exceptions

std::invalid_argument if context_length is zero.
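A sketch of the validating `explicit` constructor, under the same assumptions (`dim_t` as `std::int64_t`, hypothetical `ToyModelConfig`). The derived type inherits the base constructors through a using-declaration; `rejectsContextLength()` is a hypothetical helper used only to observe the documented exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using dim_t = std::int64_t;  // Assumption: Mila's dim_t may be a different type.

template <typename TDerived>
struct LanguageModelConfig {
    LanguageModelConfig() = default;

    // explicit: an integer cannot silently convert into a config object.
    explicit LanguageModelConfig(dim_t context_length) {
        if (context_length == 0)
            throw std::invalid_argument("context_length must be > 0");
        context_length_ = context_length;
    }
    dim_t getContextLength() const noexcept { return context_length_; }

protected:
    dim_t context_length_{ 0 };
};

// Hypothetical derived config that reuses the base constructors.
struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {
    using LanguageModelConfig<ToyModelConfig>::LanguageModelConfig;
};

// Hypothetical helper: true if construction throws std::invalid_argument.
inline bool rejectsContextLength(dim_t n) {
    try {
        ToyModelConfig cfg{ n };
        (void)cfg;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```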

Member Function Documentation

◆ baseToString()

template<typename TDerived>

std::string Mila::Dnn::LanguageModelConfig< TDerived >::baseToString ( ) const

inline

Produce the base fields portion of a toString() summary.

Concrete model configs call this from their own toString() implementation and append architecture-specific fields.
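The composition pattern described above might look like the following sketch. The output format is hypothetical (the real baseToString() format is not documented here), only the context length is printed for brevity, and `ToyModelConfig` is a made-up derived config.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using dim_t = std::int64_t;  // Assumption: stand-in for Mila's dim_t.

template <typename TDerived>
struct LanguageModelConfig {
    // Hypothetical format: prints base fields only (here, just the context length).
    std::string baseToString() const {
        return "context_length=" + std::to_string(context_length_);
    }

protected:
    dim_t context_length_{ 0 };
};

// Hypothetical derived config with one architecture-specific field.
struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {
    ToyModelConfig(dim_t ctx, int layers) : num_layers_{ layers } {
        context_length_ = ctx;
    }

    // Base fields first, then the architecture-specific ones.
    std::string toString() const {
        return "ToyModelConfig(" + baseToString() +
               ", num_layers=" + std::to_string(num_layers_) + ")";
    }

private:
    int num_layers_;
};
```

Keeping the shared fields in one base method means every derived toString() reports them identically, and adding a new base field updates all models at once.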

◆ getContextLength()

template<typename TDerived>

dim_t Mila::Dnn::LanguageModelConfig< TDerived >::getContextLength ( ) const

inline noexcept


◆ getKvCacheCompression()

template<typename TDerived>

KvCacheCompression Mila::Dnn::LanguageModelConfig< TDerived >::getKvCacheCompression ( ) const

inline noexcept

◆ getWeightQuantization()

template<typename TDerived>

WeightQuantization Mila::Dnn::LanguageModelConfig< TDerived >::getWeightQuantization ( ) const

inline noexcept

◆ withContextLength()

template<typename TDerived>

TDerived & Mila::Dnn::LanguageModelConfig< TDerived >::withContextLength ( dim_t context_length )

inline

Set the maximum sequence length.

Required before passing the config to fromPretrained(). RoPE embeddings and KV cache buffers are sized to this value at build time.

Parameters

context_length Maximum sequence length in tokens. Must be > 0.

Exceptions

std::invalid_argument if context_length is zero.
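The "required before fromPretrained()" rule can be pictured with the sketch below. It assumes that a context length of 0 means "not yet set" (matching the `context_length_ { 0 }` default); `readyToBuild()` is a hypothetical stand-in for whatever check a loader performs, not a Mila function.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using dim_t = std::int64_t;  // Assumption: stand-in for Mila's dim_t.

template <typename TDerived>
struct LanguageModelConfig {
    TDerived& withContextLength(dim_t context_length) {
        // Validate before mutating, so a rejected call leaves the config unchanged.
        if (context_length == 0)
            throw std::invalid_argument("context_length must be > 0");
        context_length_ = context_length;
        return static_cast<TDerived&>(*this);
    }
    dim_t getContextLength() const noexcept { return context_length_; }

protected:
    dim_t context_length_{ 0 };  // 0 = not yet set
};

struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {};  // hypothetical

// Hypothetical pre-build check; Mila's actual validation may differ.
inline bool readyToBuild(const ToyModelConfig& cfg) {
    return cfg.getContextLength() > 0;
}
```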

◆ withFP4Quantization()

template<typename TDerived>

TDerived & Mila::Dnn::LanguageModelConfig< TDerived >::withFP4Quantization ( )

inline

FP4 quantization — FP4 weights, FP8 KV cache.

Maps to PerGroupFp4<> on Linear (planned) and PerChannelKvFp8<> on GroupedQueryAttention. Compresses weights more aggressively than FP8, at the cost of some quality loss relative to FP8. The KV cache stays at FP8 because FP4 KV cache is not a Mila target.
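A sketch of what this preset does to the two quantization axes. Only `None` is documented for each enum; the enumerators `Fp4` and `Fp8` below are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical enumerators: only None appears in the documentation.
enum class WeightQuantization { None, Fp8, Fp4 };
enum class KvCacheCompression { None, Fp8 };

template <typename TDerived>
struct LanguageModelConfig {
    // FP4 weights, but the KV cache stays at FP8 (FP4 KV cache is not a target).
    TDerived& withFP4Quantization() {
        weight_quantization_ = WeightQuantization::Fp4;
        kv_cache_compression_ = KvCacheCompression::Fp8;
        return static_cast<TDerived&>(*this);
    }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }
    KvCacheCompression getKvCacheCompression() const noexcept { return kv_cache_compression_; }

protected:
    KvCacheCompression kv_cache_compression_{ KvCacheCompression::None };
    WeightQuantization weight_quantization_{ WeightQuantization::None };
};

struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {};  // hypothetical
```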

◆ withFP8Quantization()

template<typename TDerived>

TDerived & Mila::Dnn::LanguageModelConfig< TDerived >::withFP8Quantization ( )

inline

FP8 quantization — FP8 weights, FP8 KV cache.

Maps to PerChannelFp8<> on Linear and PerChannelKvFp8<> on GroupedQueryAttention. Offers a good quality/compression trade-off for standard inference on NVIDIA Ada Lovelace and later GPUs.
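The mapping from config values to component strategies might be dispatched roughly as sketched here. The functions `linearStrategy()` and `attentionKvStrategy()` are hypothetical and return only strategy names as strings; no Mila types are instantiated, and the enumerators `Fp8` and `Fp4` are assumed names.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical enumerators: only None appears in the documentation.
enum class WeightQuantization { None, Fp8, Fp4 };
enum class KvCacheCompression { None, Fp8 };

template <typename TDerived>
struct LanguageModelConfig {
    TDerived& withFP8Quantization() {
        weight_quantization_ = WeightQuantization::Fp8;
        kv_cache_compression_ = KvCacheCompression::Fp8;
        return static_cast<TDerived&>(*this);
    }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }
    KvCacheCompression getKvCacheCompression() const noexcept { return kv_cache_compression_; }

protected:
    KvCacheCompression kv_cache_compression_{ KvCacheCompression::None };
    WeightQuantization weight_quantization_{ WeightQuantization::None };
};

struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {};  // hypothetical

// Hypothetical builder-side dispatch for the Linear weight strategy.
constexpr std::string_view linearStrategy(WeightQuantization wq) {
    switch (wq) {
        case WeightQuantization::Fp8: return "PerChannelFp8<>";
        case WeightQuantization::Fp4: return "PerGroupFp4<>";
        default:                      return "none";
    }
}

// Hypothetical builder-side dispatch for the GroupedQueryAttention KV cache.
constexpr std::string_view attentionKvStrategy(KvCacheCompression kv) {
    return kv == KvCacheCompression::Fp8 ? "PerChannelKvFp8<>" : "none";
}
```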

◆ withFullPrecision()

template<typename TDerived>

TDerived & Mila::Dnn::LanguageModelConfig< TDerived >::withFullPrecision ( )

inline

Full precision — BF16 weights, BF16 KV cache.

Resets both quantization axes to their defaults (WeightQuantization::None and KvCacheCompression::None). Use it to state intent explicitly or to override a previously applied preset.
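Overriding an earlier preset can be sketched as below. The sketch assumes setters simply assign, so the last call wins; the enumerator `Fp8` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical enumerators: only None appears in the documentation.
enum class WeightQuantization { None, Fp8 };
enum class KvCacheCompression { None, Fp8 };

template <typename TDerived>
struct LanguageModelConfig {
    TDerived& withFP8Quantization() {
        weight_quantization_ = WeightQuantization::Fp8;
        kv_cache_compression_ = KvCacheCompression::Fp8;
        return self();
    }
    // Back to the defaults: no quantization on either axis (BF16 weights and KV cache).
    TDerived& withFullPrecision() {
        weight_quantization_ = WeightQuantization::None;
        kv_cache_compression_ = KvCacheCompression::None;
        return self();
    }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }
    KvCacheCompression getKvCacheCompression() const noexcept { return kv_cache_compression_; }

protected:
    TDerived& self() { return static_cast<TDerived&>(*this); }
    KvCacheCompression kv_cache_compression_{ KvCacheCompression::None };
    WeightQuantization weight_quantization_{ WeightQuantization::None };
};

struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {};  // hypothetical
```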

◆ withKvCacheCompression()

template<typename TDerived>

TDerived & Mila::Dnn::LanguageModelConfig< TDerived >::withKvCacheCompression ( KvCacheCompression kv )

inline

Set the KV cache compression mode independently.

Use this when no preset provides the desired combination of KV cache compression and weight quantization. Only the KV cache axis changes; the weight quantization mode is left as is.
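For example, FP8 weights with an uncompressed KV cache has no preset, but can be built by applying the FP8 preset and then overriding only the KV axis. This sketch assumes setters simply assign (so the last call on an axis wins) and uses the hypothetical enumerator name `Fp8`.

```cpp
#include <cassert>

// Hypothetical enumerators: only None appears in the documentation.
enum class WeightQuantization { None, Fp8 };
enum class KvCacheCompression { None, Fp8 };

template <typename TDerived>
struct LanguageModelConfig {
    TDerived& withFP8Quantization() {
        weight_quantization_ = WeightQuantization::Fp8;
        kv_cache_compression_ = KvCacheCompression::Fp8;
        return self();
    }
    // Touches only the KV cache axis; weight quantization is unchanged.
    TDerived& withKvCacheCompression(KvCacheCompression kv) {
        kv_cache_compression_ = kv;
        return self();
    }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }
    KvCacheCompression getKvCacheCompression() const noexcept { return kv_cache_compression_; }

protected:
    TDerived& self() { return static_cast<TDerived&>(*this); }
    KvCacheCompression kv_cache_compression_{ KvCacheCompression::None };
    WeightQuantization weight_quantization_{ WeightQuantization::None };
};

struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {};  // hypothetical
```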

Parameters

kv	KV cache compression mode to apply.

◆ withWeightQuantization()

template<typename TDerived>

TDerived & Mila::Dnn::LanguageModelConfig< TDerived >::withWeightQuantization ( WeightQuantization wq )

inline

Set the weight quantization mode independently.

Use this when no preset provides the desired combination of weight quantization and KV cache compression. Only the weight axis changes; the KV cache compression mode is left as is.
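For example, BF16 weights with an FP8 KV cache can be obtained by applying the FP4 preset (which sets an FP8 KV cache) and then resetting only the weight axis. As before, this is a sketch that assumes setters simply assign; the enumerators `Fp4` and `Fp8` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical enumerators: only None appears in the documentation.
enum class WeightQuantization { None, Fp8, Fp4 };
enum class KvCacheCompression { None, Fp8 };

template <typename TDerived>
struct LanguageModelConfig {
    TDerived& withFP4Quantization() {
        weight_quantization_ = WeightQuantization::Fp4;
        kv_cache_compression_ = KvCacheCompression::Fp8;
        return self();
    }
    // Touches only the weight axis; KV cache compression is unchanged.
    TDerived& withWeightQuantization(WeightQuantization wq) {
        weight_quantization_ = wq;
        return self();
    }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }
    KvCacheCompression getKvCacheCompression() const noexcept { return kv_cache_compression_; }

protected:
    TDerived& self() { return static_cast<TDerived&>(*this); }
    KvCacheCompression kv_cache_compression_{ KvCacheCompression::None };
    WeightQuantization weight_quantization_{ WeightQuantization::None };
};

struct ToyModelConfig : LanguageModelConfig<ToyModelConfig> {};  // hypothetical
```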

Parameters

wq	Weight quantization mode to apply.

Member Data Documentation

◆ context_length_

template<typename TDerived>

dim_t Mila::Dnn::LanguageModelConfig< TDerived >::context_length_ { 0 }

protected

◆ kv_cache_compression_

template<typename TDerived>

KvCacheCompression Mila::Dnn::LanguageModelConfig< TDerived >::kv_cache_compression_ { KvCacheCompression::None }

protected

◆ weight_quantization_

template<typename TDerived>

WeightQuantization Mila::Dnn::LanguageModelConfig< TDerived >::weight_quantization_ { WeightQuantization::None }

protected

The documentation for this struct was generated from the following file:

/__w/Mila/Mila/Mila/Src/Dnn/Core/LanguageModelConfig.ixx
