Mila 0.13.48
Deep Neural Network Library
Deployment configuration for Llama language models.


Public Member Functions

LlamaModelConfig () = default
    Default constructor.
LlamaModelConfig (dim_t context_length)
    Construct with a required context length.
std::string toString () const
    Produce a human-readable summary of the Llama model configuration.

Public Member Functions inherited from Mila::Dnn::LanguageModelConfig< LlamaModelConfig >

LanguageModelConfig () = default
std::string baseToString () const
    Produce the base fields portion of a toString() summary.
dim_t getContextLength () const noexcept
KvCacheCompression getKvCacheCompression () const noexcept
WeightQuantization getWeightQuantization () const noexcept
LlamaModelConfig & withContextLength (dim_t context_length)
    Set the maximum sequence length.
LlamaModelConfig & withFP4Quantization ()
    FP4 quantization: FP4 weights, FP8 KV cache.
LlamaModelConfig & withFP8Quantization ()
    FP8 quantization: FP8 weights, FP8 KV cache.
LlamaModelConfig & withFullPrecision ()
    Full precision: BF16 weights, BF16 KV cache.
LlamaModelConfig & withKvCacheCompression (KvCacheCompression kv)
    Set the KV cache compression mode independently.
LlamaModelConfig & withWeightQuantization (WeightQuantization wq)
    Set the weight quantization mode independently.

Additional Inherited Members

Protected Attributes inherited from Mila::Dnn::LanguageModelConfig< LlamaModelConfig >

dim_t context_length_
KvCacheCompression kv_cache_compression_
WeightQuantization weight_quantization_
Deployment configuration for Llama language models.
Inherits all fluent setters and accessors from LanguageModelConfig<LlamaModelConfig>. Chains work across base and derived methods without casting.
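The cast-free chaining described above is what a CRTP base provides: each base setter returns a reference to the derived type rather than to the base. The following is a minimal, self-contained sketch of that pattern, not Mila's actual source. `ConfigBaseSketch`, `LlamaConfigSketch`, `describeExample`, the enum values, the `dim_t` alias, the BF16 defaults, and the `toString()` output format are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins; the real Mila types are not defined on this page.
using dim_t = std::size_t;
enum class WeightQuantization { BF16, FP8, FP4 };
enum class KvCacheCompression { BF16, FP8 };

// CRTP base: every setter returns Derived&, so a chain keeps the derived type.
template <typename Derived>
class ConfigBaseSketch {
public:
    Derived& withContextLength(dim_t context_length) {
        context_length_ = context_length;
        return self();
    }
    // Preset pairing taken from the member list: FP8 weights, FP8 KV cache.
    Derived& withFP8Quantization() {
        weight_quantization_ = WeightQuantization::FP8;
        kv_cache_compression_ = KvCacheCompression::FP8;
        return self();
    }
    // Independent override of the KV cache mode.
    Derived& withKvCacheCompression(KvCacheCompression kv) {
        kv_cache_compression_ = kv;
        return self();
    }
    dim_t getContextLength() const noexcept { return context_length_; }
    WeightQuantization getWeightQuantization() const noexcept { return weight_quantization_; }
    KvCacheCompression getKvCacheCompression() const noexcept { return kv_cache_compression_; }

protected:
    dim_t context_length_ = 0;
    // Assumed defaults (BF16); the page does not state them.
    WeightQuantization weight_quantization_ = WeightQuantization::BF16;
    KvCacheCompression kv_cache_compression_ = KvCacheCompression::BF16;

private:
    Derived& self() { return static_cast<Derived&>(*this); }
};

class LlamaConfigSketch : public ConfigBaseSketch<LlamaConfigSketch> {
public:
    // Derived-only method; the output format here is invented for the sketch.
    std::string toString() const {
        return "LlamaConfigSketch(context_length=" + std::to_string(context_length_) + ")";
    }
};

// Base setters followed by a derived method, with no cast in between.
std::string describeExample() {
    LlamaConfigSketch cfg;
    return cfg.withContextLength(4096)
              .withFP8Quantization()
              .withKvCacheCompression(KvCacheCompression::BF16)
              .toString();
}
```

Because each setter returns `Derived&`, the final `.toString()` resolves against the derived class directly. With a non-template base returning `Base&`, the same chain would need a cast before any derived-only call.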
LlamaModelConfig () [default]
Default constructor.
The context length defaults to zero. Call withContextLength() before passing the configuration to fromPretrained(), or use the explicit constructor instead.

LlamaModelConfig (dim_t context_length) [inline, explicit]
Construct with a required context length.
Parameters
| context_length | Maximum sequence length in tokens. Must be > 0. |
Exceptions
| std::invalid_argument | if context_length is zero. |
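The contrast between the two constructors can be sketched as follows: the default constructor leaves the context length at zero, while the explicit one rejects zero immediately. This is a self-contained illustration under stated assumptions, not the library's implementation. `ContextLengthConfigSketch`, `rejects`, the `dim_t` alias, and the exception message are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

using dim_t = std::size_t;  // assumption: the real dim_t is not defined on this page

class ContextLengthConfigSketch {
public:
    // Default construction leaves the context length at zero (unset).
    ContextLengthConfigSketch() = default;

    // explicit prevents an accidental implicit conversion from an integer.
    explicit ContextLengthConfigSketch(dim_t context_length)
        : context_length_(context_length) {
        if (context_length == 0) {
            throw std::invalid_argument("context_length must be > 0");
        }
    }

    dim_t getContextLength() const noexcept { return context_length_; }

private:
    dim_t context_length_ = 0;
};

// Returns true if constructing with the given length throws std::invalid_argument.
bool rejects(dim_t context_length) {
    try {
        ContextLengthConfigSketch cfg(context_length);
        (void)cfg;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

Validating in the constructor means a configuration built through the explicit path never carries a zero context length. The default path defers that check to whatever consumes the configuration.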

std::string toString () const [inline]
Produce a human-readable summary of the Llama model configuration.
