Mila 0.13.48
Deep Neural Network Library
LlamaModelConfig.ixx File Reference

Deployment configuration for Llama language models.

#include <stdexcept>
#include <string>
#include <type_traits>
#include <concepts>
import Dnn.TensorTypes;
import Dnn.LanguageModelConfig;

Classes

struct  Mila::Dnn::LlamaModelConfig
 Deployment configuration for Llama language models.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn

Detailed Description

Deployment configuration for Llama language models.

LlamaModelConfig is the concrete configuration type passed to LlamaModel::fromPretrained(). It inherits the deployment settings common to all language models from its base, LanguageModelConfig<LlamaModelConfig>:

  • context_length — maximum sequence length
  • weight_quantization — Linear weight storage strategy
  • kv_cache_compression — GroupedQueryAttention cache strategy
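The base-class relationship above follows the curiously recurring template pattern (CRTP). The sketch below shows one way such a base could hold the three fields and return the derived type from its fluent setters so that chained calls keep the concrete type. It is an illustration under assumptions, not Mila's actual implementation: the `sketch` namespace, the enum values other than `WeightQuantization::FP8`, the `KVCacheCompression` enum, the `withKVCacheCompression` setter, and the zero-length check are all hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical enums; the real Mila definitions may differ.
enum class WeightQuantization { None, FP8 };
enum class KVCacheCompression { None, FP8 };

// CRTP base: every setter returns Derived& so a chain of calls
// still yields the concrete configuration type.
template <typename Derived>
struct LanguageModelConfig {
    std::size_t context_length;
    WeightQuantization weight_quantization = WeightQuantization::None;
    KVCacheCompression kv_cache_compression = KVCacheCompression::None;

    explicit LanguageModelConfig(std::size_t ctx) : context_length(ctx) {
        // Assumed validation; the real class may check differently.
        if (ctx == 0) {
            throw std::invalid_argument("context_length must be > 0");
        }
    }

    Derived& withWeightQuantization(WeightQuantization q) {
        weight_quantization = q;
        return static_cast<Derived&>(*this);
    }

    // Hypothetical setter, named here only for illustration.
    Derived& withKVCacheCompression(KVCacheCompression c) {
        kv_cache_compression = c;
        return static_cast<Derived&>(*this);
    }

    // Convenience: FP8 weights and FP8 KV cache together.
    Derived& withFP8Quantization() {
        weight_quantization = WeightQuantization::FP8;
        kv_cache_compression = KVCacheCompression::FP8;
        return static_cast<Derived&>(*this);
    }
};

// The derived type adds no fields; it only reuses the base constructor.
struct LlamaModelConfig : LanguageModelConfig<LlamaModelConfig> {
    using LanguageModelConfig<LlamaModelConfig>::LanguageModelConfig;
};

// Chained setters preserve the derived type.
static_assert(std::is_same_v<
    decltype(std::declval<LlamaModelConfig&>().withFP8Quantization()),
    LlamaModelConfig&>);

}  // namespace sketch
```

In this sketch, `sketch::LlamaModelConfig( 4096 ).withWeightQuantization( sketch::WeightQuantization::FP8 )` changes only the weight storage and leaves `kv_cache_compression` at its default.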

All Llama architectural parameters (num_layers, num_heads, hidden_dim, rope_theta, vocab_size, etc.) are read from checkpoint metadata at load time; they are not deployment concerns. LlamaModelConfig therefore adds no fields of its own to those inherited from the base.
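To illustrate the split between checkpoint-derived architecture and caller-supplied deployment settings, the sketch below parses architectural values from a key/value metadata map. Everything in it is hypothetical: the `LlamaArchitecture` struct, the `Metadata` alias, the key names, and the `readArchitecture` helper are assumptions made for illustration, and Mila's real checkpoint format and loader may look quite different.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

namespace sketch {

// Hypothetical container for values that come from the checkpoint,
// never from the deployment configuration.
struct LlamaArchitecture {
    std::size_t num_layers;
    std::size_t num_heads;
    std::size_t hidden_dim;
    std::size_t vocab_size;
    double rope_theta;
};

// Hypothetical metadata representation: plain string key/value pairs.
using Metadata = std::unordered_map<std::string, std::string>;

// Look up a required key, failing loudly if the checkpoint lacks it.
inline const std::string& require(const Metadata& m, const std::string& key) {
    auto it = m.find(key);
    if (it == m.end()) {
        throw std::runtime_error("missing checkpoint metadata key: " + key);
    }
    return it->second;
}

inline std::size_t toSize(const std::string& s) {
    return static_cast<std::size_t>(std::stoull(s));
}

// Build the architecture description purely from checkpoint metadata.
inline LlamaArchitecture readArchitecture(const Metadata& m) {
    return LlamaArchitecture{
        toSize(require(m, "num_layers")),
        toSize(require(m, "num_heads")),
        toSize(require(m, "hidden_dim")),
        toSize(require(m, "vocab_size")),
        std::stod(require(m, "rope_theta")),
    };
}

}  // namespace sketch
```

Under this arrangement the caller never states the layer count or head count; a missing key surfaces as a load-time error instead of a silently wrong configuration.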

Usage

// Standard BF16 inference
auto bf16_config = LlamaModelConfig( context_length );

// FP8 weights + FP8 KV cache
auto fp8_config = LlamaModelConfig( context_length )
    .withFP8Quantization();

// FP8 weights only, no KV compression
auto fp8_weights_config = LlamaModelConfig( context_length )
    .withWeightQuantization( WeightQuantization::FP8 );