Deployment configuration for Llama language models.

Public Member Functions
	LlamaModelConfig ()=default
	Default constructor.
	LlamaModelConfig (dim_t context_length)
	Construct with a required context length.
std::string	toString () const
	Produce a human-readable summary of the Llama model configuration.
Public Member Functions inherited from Mila::Dnn::LanguageModelConfig< LlamaModelConfig >
	LanguageModelConfig ()=default
std::string	baseToString () const
	Produce the base fields portion of a toString() summary.
dim_t	getContextLength () const noexcept
KvCacheCompression	getKvCacheCompression () const noexcept
WeightQuantization	getWeightQuantization () const noexcept
LlamaModelConfig &	withContextLength (dim_t context_length)
	Set the maximum sequence length.
LlamaModelConfig &	withFP4Quantization ()
	FP4 quantization — FP4 weights, FP8 KV cache.
LlamaModelConfig &	withFP8Quantization ()
	FP8 quantization — FP8 weights, FP8 KV cache.
LlamaModelConfig &	withFullPrecision ()
	Full precision — BF16 weights, BF16 KV cache.
LlamaModelConfig &	withKvCacheCompression (KvCacheCompression kv)
	Set the KV cache compression mode independently.
LlamaModelConfig &	withWeightQuantization (WeightQuantization wq)
	Set the weight quantization mode independently.

Additional Inherited Members
Protected Attributes inherited from Mila::Dnn::LanguageModelConfig< LlamaModelConfig >
dim_t	context_length_
KvCacheCompression	kv_cache_compression_
WeightQuantization	weight_quantization_

Detailed Description

Deployment configuration for Llama language models.

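Inherits all fluent setters and accessors from LanguageModelConfig<LlamaModelConfig>. Because the base class template is parameterized on the derived type, each inherited setter returns LlamaModelConfig&, so a chain can mix base and derived methods without casting.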
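
The chaining behavior can be illustrated with a minimal, self-contained sketch. It does not include the real Mila headers: the types below are simplified stand-ins, and the dim_t alias, the enumerator names, the default values, and the toString() output format are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

// Simplified stand-ins for illustration; the real Mila definitions may differ.
using dim_t = std::int64_t;                        // assumed underlying type
enum class WeightQuantization { BF16, FP8, FP4 };  // assumed enumerator names
enum class KvCacheCompression { BF16, FP8 };       // assumed enumerator names

// CRTP base: setters return Derived&, so a chain never decays to the base type.
template <typename Derived>
struct LanguageModelConfig {
    Derived& withContextLength(dim_t context_length) {
        context_length_ = context_length;
        return self();
    }
    Derived& withFP8Quantization() {
        weight_quantization_ = WeightQuantization::FP8;
        kv_cache_compression_ = KvCacheCompression::FP8;
        return self();
    }
    dim_t getContextLength() const noexcept { return context_length_; }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }

protected:
    dim_t context_length_ = 0;
    WeightQuantization weight_quantization_ = WeightQuantization::BF16;   // assumed default
    KvCacheCompression kv_cache_compression_ = KvCacheCompression::BF16;  // assumed default

private:
    Derived& self() { return static_cast<Derived&>(*this); }
};

struct LlamaModelConfig : LanguageModelConfig<LlamaModelConfig> {
    // Derived-only method; the output format here is invented for the sketch.
    std::string toString() const {
        return "LlamaModelConfig{context_length=" + std::to_string(context_length_) + "}";
    }
};

// The inherited setter already yields the derived type.
static_assert(std::is_same<
                  decltype(std::declval<LlamaModelConfig&>().withContextLength(1)),
                  LlamaModelConfig&>::value,
              "base setter should return LlamaModelConfig&");

inline std::string describeFp8Config() {
    LlamaModelConfig config;
    // Two base setters followed by a derived method, in one chain and with no cast.
    std::string summary = config.withContextLength(4096).withFP8Quantization().toString();
    assert(config.getContextLength() == 4096);  // the chain mutated `config` in place
    return summary;
}
```

If the setters returned a reference to the base instead, the trailing toString() call in the chain would not compile, since toString() is declared only on the derived struct.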

Constructor & Destructor Documentation

Mila::Dnn::LlamaModelConfig::LlamaModelConfig ( )

default

Default constructor.

context_length defaults to zero. Call withContextLength() before passing the configuration to fromPretrained(), or use the explicit constructor instead.
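
A short sketch of the default-construct-then-set path follows. The struct is a reduced stand-in rather than the real Mila type, the dim_t alias is assumed, and hasUsableContextLength() is a hypothetical helper that exists only in this example. fromPretrained() is deliberately not called, because its signature is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

using dim_t = std::int64_t;  // assumed underlying type

// Reduced stand-in for LlamaModelConfig: only the context-length pieces.
struct LlamaModelConfig {
    LlamaModelConfig() = default;

    LlamaModelConfig& withContextLength(dim_t context_length) {
        context_length_ = context_length;
        return *this;
    }
    dim_t getContextLength() const noexcept { return context_length_; }

private:
    dim_t context_length_ = 0;  // matches the documented default of zero
};

// Hypothetical helper (not part of Mila): true once a context length is set.
inline bool hasUsableContextLength(const LlamaModelConfig& config) noexcept {
    return config.getContextLength() > 0;
}

inline LlamaModelConfig makeConfiguredDefault() {
    LlamaModelConfig config;                   // context length is still zero here
    assert(!hasUsableContextLength(config));
    config.withContextLength(2048);            // set it before handing the config on
    return config;
}
```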

Mila::Dnn::LlamaModelConfig::LlamaModelConfig ( dim_t context_length )

inline explicit

Construct with a required context length.

Parameters

context_length Maximum sequence length in tokens. Must be > 0.

Exceptions

std::invalid_argument if context_length is zero.
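
The validation described above might look like the following sketch. It uses a reduced stand-in struct, not the actual Mila implementation. The dim_t alias is assumed to be a signed integer, and rejecting negative values is an assumption derived from the "Must be > 0" requirement; only the zero case is documented as throwing.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using dim_t = std::int64_t;  // assumed underlying type

// Reduced stand-in for LlamaModelConfig with the validating constructor.
struct LlamaModelConfig {
    LlamaModelConfig() = default;

    explicit LlamaModelConfig(dim_t context_length) {
        // Zero is the documented failure case; negatives are rejected by assumption.
        if (context_length <= 0) {
            throw std::invalid_argument("context_length must be > 0");
        }
        context_length_ = context_length;
    }

    dim_t getContextLength() const noexcept { return context_length_; }

private:
    dim_t context_length_ = 0;
};

// Returns true if construction succeeds, false if std::invalid_argument is thrown.
inline bool constructionSucceeds(dim_t context_length) {
    try {
        LlamaModelConfig config(context_length);
        return config.getContextLength() == context_length;
    } catch (const std::invalid_argument&) {
        return false;
    }
}

inline void demoConstruction() {
    assert(constructionSucceeds(4096));  // valid length is stored
    assert(!constructionSucceeds(0));    // zero is rejected
}
```

Because the constructor is explicit, a bare integer does not convert implicitly to a configuration; callers have to write LlamaModelConfig(4096) or LlamaModelConfig{4096}.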

std::string Mila::Dnn::LlamaModelConfig::toString ( ) const

inline

Produce a human-readable summary of the Llama model configuration.

The documentation for this struct was generated from the following file: