Mila 0.13.48
Deep Neural Network Library
Configuration class for GPT transformer blocks.


Public Member Functions | |
| GptBlockConfig (dim_t model_dim, dim_t num_heads) | |
| Construct a GPT block configuration. | |
| void | fromMetadata (const SerializationMetadata &meta) |
| Populate configuration from provided metadata. | |
| ActivationType | getActivationType () const noexcept |
| dim_t | getEffectiveHiddenDimension () const noexcept |
| Get effective hidden dimension with default fallback. | |
| dim_t | getHiddenSize () const noexcept |
| dim_t | getModelDim () const noexcept |
| dim_t | getNumHeads () const noexcept |
| float | getResidualScale () const noexcept |
| SerializationMetadata | toMetadata () const |
| Convert configuration into a SerializationMetadata object. | |
| std::string | toString () const override |
| Produce a short, human-readable summary of the configuration. | |
| bool | useBias () const noexcept |
| void | validate () const override |
| Validate configuration parameters. | |
| template<typename Self> | |
| decltype(auto) | withActivation (this Self &&self, ActivationType activation_type) |
| template<typename Self> | |
| decltype(auto) | withBias (this Self &&self, bool use_bias) |
| template<typename Self> | |
| decltype(auto) | withHiddenSize (this Self &&self, dim_t hidden_dim) |
| template<typename Self> | |
| decltype(auto) | withMaxSequenceLength (this Self &&self, dim_t max_seq_len) |
| Set maximum sequence length for block-level positional handling. | |
| template<typename Self> | |
| decltype(auto) | withResidualScale (this Self &&self, float scale) |
| Public Member Functions inherited from Mila::Dnn::ComponentConfig | |
| virtual | ~ComponentConfig ()=default |
| Virtual destructor for polymorphic base. | |
Private Attributes | |
| ActivationType | activation_type_ = ActivationType::Gelu |
| dim_t | hidden_dim_ = 0 |
| dim_t | max_seq_len_ = 2048 |
| dim_t | model_dim_ |
| dim_t | num_heads_ |
| float | residual_scale_ = 1.0f |
| bool | use_bias_ = false |
Configuration class for GPT transformer blocks.
Holds the model dimension, attention head count, MLP/attention options, and basic activation/residual settings.
| GptBlockConfig (dim_t model_dim, dim_t num_heads) |
Construct a GPT block configuration.
| model_dim | Model dimension. Must be > 0. |
| num_heads | Number of query attention heads. Must be > 0 and must divide model_dim evenly. |
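The divisibility requirement reflects that each attention head works on an equal slice of the model dimension. The arithmetic can be sketched as follows; `headsDivideModelDim` and `headDim` are hypothetical free functions, and the `dim_t` alias is an assumed stand-in, none of which are taken from Mila itself.

```cpp
#include <cassert>
#include <cstdint>

using dim_t = std::int64_t;  // assumption: stand-in for Mila's dim_t

// Hypothetical helper: true when the constructor preconditions hold.
// The short-circuit on num_heads > 0 prevents a modulo by zero.
inline bool headsDivideModelDim(dim_t model_dim, dim_t num_heads) {
    return model_dim > 0 && num_heads > 0 && model_dim % num_heads == 0;
}

// Hypothetical helper: width of each head's slice.
// Only meaningful when headsDivideModelDim() returns true.
inline dim_t headDim(dim_t model_dim, dim_t num_heads) {
    return model_dim / num_heads;
}
```

For instance, a model dimension of 768 split across 12 heads gives 64 features per head, whereas 768 with 5 heads fails the check.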
| void fromMetadata (const SerializationMetadata &meta) | inline, virtual |
Populate configuration from provided metadata.
Implementations should read available keys and leave missing keys at their current/default values to preserve forward/backward compatibility.
| meta | Metadata to read configuration values from. |
Implements Mila::Dnn::ComponentConfig.
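The "leave missing keys at their defaults" rule might look like the sketch below. The interface of SerializationMetadata is not shown on this page, so a plain `std::map` stands in for it; the struct, the function names, and the key strings are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Assumption: a simple string-to-integer map stands in for SerializationMetadata.
using MetaSketch = std::map<std::string, std::int64_t>;

// Hypothetical subset of the block's fields, with the defaults listed above.
struct BlockFieldsSketch {
    std::int64_t hidden_dim = 0;
    std::int64_t max_seq_len = 2048;
};

// Overwrite a field only when its key is present; otherwise keep the current value.
inline void readIfPresent(const MetaSketch& meta, const std::string& key,
                          std::int64_t& field) {
    auto it = meta.find(key);
    if (it != meta.end()) {
        field = it->second;
    }
}

inline void fromMetadataSketch(BlockFieldsSketch& cfg, const MetaSketch& meta) {
    readIfPresent(meta, "hidden_dim", cfg.hidden_dim);    // hypothetical key name
    readIfPresent(meta, "max_seq_len", cfg.max_seq_len);  // hypothetical key name
}
```

Metadata written by an older version that lacks `max_seq_len` would then load with the field still at 2048 rather than failing.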

| ActivationType getActivationType () const | inline, noexcept |
| dim_t getEffectiveHiddenDimension () const | inline, noexcept |
Get effective hidden dimension with default fallback.
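This page does not state what the fallback value is. Since `hidden_dim_` defaults to 0, one plausible reading is that 0 means "unset" and the getter substitutes a value derived from the model dimension. The sketch below assumes the conventional GPT expansion of four times the model dimension; both the factor and the function name are assumptions, not Mila's documented behavior.

```cpp
#include <cassert>
#include <cstdint>

using dim_t = std::int64_t;  // assumption: stand-in for Mila's dim_t

// Hypothetical fallback rule: an explicit hidden size wins, otherwise
// use 4 * model_dim (assumed here; not confirmed by the documentation).
inline dim_t effectiveHiddenDim(dim_t hidden_dim, dim_t model_dim) {
    return hidden_dim > 0 ? hidden_dim : 4 * model_dim;
}
```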

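| dim_t getHiddenSize () const | inline, noexcept |
| dim_t getModelDim () const | inline, noexcept |
| dim_t getNumHeads () const | inline, noexcept |
| float getResidualScale () const | inline, noexcept |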
| SerializationMetadata toMetadata () const | inline, virtual |
Convert configuration into a SerializationMetadata object.
Implementations should include any fields required to fully reconstruct the configuration via fromMetadata.
Implements Mila::Dnn::ComponentConfig.

| std::string toString () const | inline, override, virtual |
Produce a short, human-readable summary of the configuration.
Implementations should return a compact, single-line description suitable for logging and debugging.
Implements Mila::Dnn::ComponentConfig.

| bool useBias () const | inline, noexcept |
| void validate () const | inline, override, virtual |
Validate configuration parameters.
Call this to confirm that the configuration describes a valid, constructible component. Implementations must throw std::invalid_argument (or a derived exception) when validation fails.
| std::invalid_argument | If the configuration is invalid. |
Implements Mila::Dnn::ComponentConfig.
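A caller typically wraps validation in a try/catch before building the component. The sketch below uses a hypothetical stand-in struct and applies only the constraints stated in the parameter descriptions on this page; which checks the real validate() performs is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using dim_t = std::int64_t;  // assumption: stand-in for Mila's dim_t

// Hypothetical stand-in for GptBlockConfig, reduced to the validated fields.
struct BlockConfigSketch {
    dim_t model_dim;
    dim_t num_heads;
    dim_t max_seq_len;

    // Throws std::invalid_argument on the first violated constraint.
    void validate() const {
        if (model_dim <= 0)
            throw std::invalid_argument("model_dim must be > 0");
        if (num_heads <= 0)
            throw std::invalid_argument("num_heads must be > 0");
        if (model_dim % num_heads != 0)
            throw std::invalid_argument("num_heads must divide model_dim evenly");
        if (max_seq_len <= 0)
            throw std::invalid_argument("max_seq_len must be > 0");
    }
};

// Caller-side pattern: report validity instead of letting the exception escape.
inline bool isConstructible(const BlockConfigSketch& cfg) {
    try {
        cfg.validate();
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Catching by const reference also handles exceptions derived from std::invalid_argument, which the contract above permits.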
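| template<typename Self> decltype(auto) withActivation (this Self &&self, ActivationType activation_type) | inline |
| template<typename Self> decltype(auto) withBias (this Self &&self, bool use_bias) | inline |
| template<typename Self> decltype(auto) withHiddenSize (this Self &&self, dim_t hidden_dim) | inline |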

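| template<typename Self> decltype(auto) withMaxSequenceLength (this Self &&self, dim_t max_seq_len) | inline |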
Set maximum sequence length for block-level positional handling.
| max_seq_len | Maximum sequence length (must be > 0). |
| template<typename Self> decltype(auto) withResidualScale (this Self &&self, float scale) | inline |
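The `this Self &&self` parameter in the with* setters is a C++23 explicit object parameter ("deducing this"), which lets one template serve both lvalue and temporary configurations in a fluent chain. The sketch below reproduces the pattern on a hypothetical stand-in struct; the setter bodies are assumed, and a conventional reference-returning fallback is included only so the sketch compiles on pre-C++23 compilers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for GptBlockConfig; only the setter pattern is shown.
struct BlockConfigSketch {
    bool  use_bias = false;
    float residual_scale = 1.0f;
    int   max_seq_len = 2048;

#if defined(__cpp_explicit_this_parameter)
    // C++23: Self deduces from the value category of the object expression,
    // and std::forward hands the same category back to the caller.
    template <typename Self>
    decltype(auto) withBias(this Self&& self, bool use_bias) {
        self.use_bias = use_bias;
        return std::forward<Self>(self);
    }
    template <typename Self>
    decltype(auto) withResidualScale(this Self&& self, float scale) {
        self.residual_scale = scale;
        return std::forward<Self>(self);
    }
    template <typename Self>
    decltype(auto) withMaxSequenceLength(this Self&& self, int max_seq_len) {
        self.max_seq_len = max_seq_len;
        return std::forward<Self>(self);
    }
#else
    // Pre-C++23 fallback: ordinary setters returning an lvalue reference.
    BlockConfigSketch& withBias(bool b) { use_bias = b; return *this; }
    BlockConfigSketch& withResidualScale(float s) { residual_scale = s; return *this; }
    BlockConfigSketch& withMaxSequenceLength(int n) { max_seq_len = n; return *this; }
#endif
};
```

A chain such as `BlockConfigSketch{}.withBias(true).withMaxSequenceLength(1024)` then configures a temporary in a single expression, while the same calls on a named object modify it in place.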
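| ActivationType activation_type_ = ActivationType::Gelu | private |
| dim_t hidden_dim_ = 0 | private |
| dim_t max_seq_len_ = 2048 | private |
| dim_t model_dim_ | private |
| dim_t num_heads_ | private |
| float residual_scale_ = 1.0f | private |
| bool use_bias_ = false | private |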