Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::GptBlockConfig Class Reference
export

Configuration class for GPT transformer blocks.

Public Member Functions

 GptBlockConfig (dim_t model_dim, dim_t num_heads)
 Construct a GPT block configuration.
void fromMetadata (const SerializationMetadata &meta)
 Populate configuration from provided metadata.
ActivationType getActivationType () const noexcept
dim_t getEffectiveHiddenDimension () const noexcept
 Get effective hidden dimension with default fallback.
dim_t getHiddenSize () const noexcept
dim_t getModelDim () const noexcept
dim_t getNumHeads () const noexcept
float getResidualScale () const noexcept
SerializationMetadata toMetadata () const
 Convert configuration into a SerializationMetadata object.
std::string toString () const override
 Produce a short, human-readable summary of the configuration.
bool useBias () const noexcept
void validate () const override
 Validate configuration parameters.
template<typename Self>
decltype(auto) withActivation (this Self &&self, ActivationType activation_type)
template<typename Self>
decltype(auto) withBias (this Self &&self, bool use_bias)
template<typename Self>
decltype(auto) withHiddenSize (this Self &&self, dim_t hidden_dim)
template<typename Self>
decltype(auto) withMaxSequenceLength (this Self &&self, dim_t max_seq_len)
 Set maximum sequence length for block-level positional handling.
template<typename Self>
decltype(auto) withResidualScale (this Self &&self, float scale)
Public Member Functions inherited from Mila::Dnn::ComponentConfig
virtual ~ComponentConfig ()=default
 Virtual destructor for polymorphic base.

Private Attributes

ActivationType activation_type_ = ActivationType::Gelu
dim_t hidden_dim_ = 0
dim_t max_seq_len_ = 2048
dim_t model_dim_
dim_t num_heads_
float residual_scale_ = 1.0f
bool use_bias_ = false

Detailed Description

Configuration class for GPT transformer blocks.

Holds the model dimension, attention head count, MLP/attention options, and basic activation/residual settings.

Constructor & Destructor Documentation

◆ GptBlockConfig()

Mila::Dnn::GptBlockConfig::GptBlockConfig ( dim_t model_dim,
dim_t num_heads )
inline

Construct a GPT block configuration.

Parameters
model_dim  Model dimension. Must be > 0.
num_heads  Number of query attention heads. Must be > 0 and must divide model_dim evenly.

Member Function Documentation

◆ fromMetadata()

void Mila::Dnn::GptBlockConfig::fromMetadata ( const SerializationMetadata & meta)
inline, virtual

Populate configuration from provided metadata.

Implementations should read available keys and leave missing keys at their current/default values to preserve forward/backward compatibility.

Parameters
meta  Metadata to read configuration values from.

Implements Mila::Dnn::ComponentConfig.
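
The "leave missing keys at their current/default values" rule can be sketched as follows. This is a minimal illustration, not the library's implementation: `Metadata` is a hypothetical stand-in for `SerializationMetadata` (whose real interface is not shown on this page), `BlockSettings` is a hypothetical subset of the configuration, `dim_t` is assumed to be a 64-bit signed integer, and the key names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Assumption: dim_t is a 64-bit signed integer; the real alias may differ.
using dim_t = std::int64_t;

// Hypothetical stand-in for SerializationMetadata: a flat map of integer values.
using Metadata = std::unordered_map<std::string, std::int64_t>;

// Hypothetical subset of the configuration, using the defaults documented
// under "Private Attributes".
struct BlockSettings {
    dim_t hidden_dim = 0;
    dim_t max_seq_len = 2048;
    bool use_bias = false;
};

// Overwrites `target` only when `key` is present; otherwise keeps its value.
template <typename T>
void read_if_present(const Metadata& meta, const std::string& key, T& target) {
    const auto it = meta.find(key);
    if (it != meta.end()) {
        target = static_cast<T>(it->second);
    }
}

// Key names are illustrative, not the library's actual keys.
void from_metadata(const Metadata& meta, BlockSettings& settings) {
    read_if_present(meta, "hidden_dim", settings.hidden_dim);
    read_if_present(meta, "max_seq_len", settings.max_seq_len);
    read_if_present(meta, "use_bias", settings.use_bias);
}
```

With this shape, metadata written by an older version that lacks a key still loads, and the corresponding field keeps whatever value it had before the call.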

◆ getActivationType()

ActivationType Mila::Dnn::GptBlockConfig::getActivationType ( ) const
inline, noexcept

◆ getEffectiveHiddenDimension()

dim_t Mila::Dnn::GptBlockConfig::getEffectiveHiddenDimension ( ) const
inline, noexcept

Get effective hidden dimension with default fallback.

Returns
Hidden dimension; falls back to 4 × model_dim if no hidden dimension has been set.
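
The fallback described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes that `dim_t` is a 64-bit signed integer and that "not set" means the stored value is still 0, the default shown for `hidden_dim_`. The free function `effective_hidden_dim` is a hypothetical name.

```cpp
#include <cstdint>

// Assumption: dim_t is a 64-bit signed integer; the real alias may differ.
using dim_t = std::int64_t;

// Hypothetical helper mirroring the documented fallback:
// an unset hidden dimension (assumed to be 0) resolves to 4 * model_dim.
constexpr dim_t effective_hidden_dim(dim_t model_dim, dim_t hidden_dim) noexcept {
    return hidden_dim != 0 ? hidden_dim : 4 * model_dim;
}

// Unset: falls back to 4 * 768.
static_assert(effective_hidden_dim(768, 0) == 3072);
// Explicitly set: returned unchanged.
static_assert(effective_hidden_dim(768, 2048) == 2048);
```

Under these assumptions, `getHiddenSize()` would report the raw stored value while `getEffectiveHiddenDimension()` reports the value a block would actually use.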

◆ getHiddenSize()

dim_t Mila::Dnn::GptBlockConfig::getHiddenSize ( ) const
inline, noexcept

◆ getModelDim()

dim_t Mila::Dnn::GptBlockConfig::getModelDim ( ) const
inline, noexcept

◆ getNumHeads()

dim_t Mila::Dnn::GptBlockConfig::getNumHeads ( ) const
inline, noexcept

◆ getResidualScale()

float Mila::Dnn::GptBlockConfig::getResidualScale ( ) const
inline, noexcept

◆ toMetadata()

SerializationMetadata Mila::Dnn::GptBlockConfig::toMetadata ( ) const
inline, virtual

Convert configuration into a SerializationMetadata object.

Implementations should include any fields required to fully reconstruct the configuration via fromMetadata.

Returns
SerializationMetadata Metadata representation of the config.

Implements Mila::Dnn::ComponentConfig.

◆ toString()

std::string Mila::Dnn::GptBlockConfig::toString ( ) const
inline, override, virtual

Produce a short, human-readable summary of the configuration.

Implementations should return a compact, single-line description suitable for logging and debugging.

Returns
std::string Human-readable summary of the configuration.

Implements Mila::Dnn::ComponentConfig.

◆ useBias()

bool Mila::Dnn::GptBlockConfig::useBias ( ) const
inline, noexcept

◆ validate()

void Mila::Dnn::GptBlockConfig::validate ( ) const
inline, override, virtual

Validate configuration parameters.

Call this to ensure the configuration represents a valid, constructible component. Implementations must throw std::invalid_argument (or a derived exception) when validation fails.

Exceptions
std::invalid_argument  If the configuration is invalid.

Implements Mila::Dnn::ComponentConfig.
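
A validation routine of this kind might look like the sketch below. It is not the library's implementation: it assumes that `validate()` enforces the constraints documented on the constructor parameters and on `withMaxSequenceLength`, that `dim_t` is a 64-bit signed integer, and it uses a hypothetical `BlockDims` struct and `is_valid` helper rather than the real class.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Assumption: dim_t is a 64-bit signed integer; the real alias may differ.
using dim_t = std::int64_t;

// Hypothetical stand-in holding only the fields whose constraints are documented.
struct BlockDims {
    dim_t model_dim;
    dim_t num_heads;
    dim_t max_seq_len = 2048;

    // Throws std::invalid_argument on the first violated constraint.
    void validate() const {
        if (model_dim <= 0)
            throw std::invalid_argument("model_dim must be > 0");
        if (num_heads <= 0)
            throw std::invalid_argument("num_heads must be > 0");
        if (model_dim % num_heads != 0)
            throw std::invalid_argument("num_heads must divide model_dim evenly");
        if (max_seq_len <= 0)
            throw std::invalid_argument("max_seq_len must be > 0");
    }
};

// Convenience wrapper for callers that prefer a boolean result.
inline bool is_valid(const BlockDims& dims) {
    try {
        dims.validate();
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

For example, `BlockDims{768, 12}` passes because 768 is divisible by 12, while `BlockDims{768, 5}` is rejected because 5 does not divide 768 evenly.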

◆ withActivation()

template<typename Self>
decltype(auto) Mila::Dnn::GptBlockConfig::withActivation ( this Self && self,
ActivationType activation_type )
inline

◆ withBias()

template<typename Self>
decltype(auto) Mila::Dnn::GptBlockConfig::withBias ( this Self && self,
bool use_bias )
inline

◆ withHiddenSize()

template<typename Self>
decltype(auto) Mila::Dnn::GptBlockConfig::withHiddenSize ( this Self && self,
dim_t hidden_dim )
inline

◆ withMaxSequenceLength()

template<typename Self>
decltype(auto) Mila::Dnn::GptBlockConfig::withMaxSequenceLength ( this Self && self,
dim_t max_seq_len )
inline

Set maximum sequence length for block-level positional handling.

Parameters
max_seq_len  Maximum sequence length (must be > 0).

◆ withResidualScale()

template<typename Self>
decltype(auto) Mila::Dnn::GptBlockConfig::withResidualScale ( this Self && self,
float scale )
inline

Member Data Documentation

◆ activation_type_

ActivationType Mila::Dnn::GptBlockConfig::activation_type_ = ActivationType::Gelu
private

◆ hidden_dim_

dim_t Mila::Dnn::GptBlockConfig::hidden_dim_ = 0
private

◆ max_seq_len_

dim_t Mila::Dnn::GptBlockConfig::max_seq_len_ = 2048
private

◆ model_dim_

dim_t Mila::Dnn::GptBlockConfig::model_dim_
private

◆ num_heads_

dim_t Mila::Dnn::GptBlockConfig::num_heads_
private

◆ residual_scale_

float Mila::Dnn::GptBlockConfig::residual_scale_ = 1.0f
private

◆ use_bias_

bool Mila::Dnn::GptBlockConfig::use_bias_ = false
private