Configuration class for TransformerBlock.
Provides a type-safe fluent interface for configuring TransformerBlock modules.
◆ TransformerBlockConfig()
Mila::Dnn::TransformerBlockConfig::TransformerBlockConfig ( const std::vector< size_t > & input_shape, size_t num_heads )  [inline]
Constructor with required parameters.
- Parameters
  - input_shape: The shape of the input tensor [batch_size, sequence_length, embedding_dim]
  - num_heads: The number of attention heads
◆ getActivationType()
ActivationType Mila::Dnn::TransformerBlockConfig::getActivationType ( ) const  [inline]
Get the activation type for the MLP.
◆ getDropout()
float Mila::Dnn::TransformerBlockConfig::getDropout ( ) const  [inline]
Get the dropout rate.
◆ getHiddenDimension()
size_t Mila::Dnn::TransformerBlockConfig::getHiddenDimension ( ) const  [inline]
Get the hidden dimension for the feed-forward network.
◆ getInputShape()
const std::vector< size_t > & Mila::Dnn::TransformerBlockConfig::getInputShape ( ) const  [inline]
Get the input tensor shape.
◆ getNumHeads()
size_t Mila::Dnn::TransformerBlockConfig::getNumHeads ( ) const  [inline]
Get the number of attention heads.
◆ useBias()
bool Mila::Dnn::TransformerBlockConfig::useBias ( ) const  [inline]
Check if bias is enabled.
◆ usePreLayerNorm()
bool Mila::Dnn::TransformerBlockConfig::usePreLayerNorm ( ) const  [inline]
Check if using pre-layer normalization.
◆ validate()
void Mila::Dnn::TransformerBlockConfig::validate ( ) const  [inline, virtual]
Validate configuration parameters.
- Exceptions
  - std::invalid_argument: If validation fails
Reimplemented from Mila::Dnn::ComponentConfig.
◆ withActivation()
TransformerBlockConfig & Mila::Dnn::TransformerBlockConfig::withActivation ( ActivationType activation_type )
Configure the activation function for the MLP.
- Parameters
  - activation_type: The activation function type
- Returns
  - TransformerBlockConfig&: Reference to this for method chaining
◆ withBias()
TransformerBlockConfig & Mila::Dnn::TransformerBlockConfig::withBias ( bool use_bias )
Configure whether to use bias in attention and feedforward layers.
- Parameters
  - use_bias: Whether to use bias
- Returns
  - TransformerBlockConfig&: Reference to this for method chaining
◆ withDropout()
TransformerBlockConfig & Mila::Dnn::TransformerBlockConfig::withDropout ( float dropout )
Configure the dropout rate.
- Parameters
  - dropout: Dropout probability (0.0 to 1.0)
- Returns
  - TransformerBlockConfig&: Reference to this for method chaining
◆ withHiddenDimension()
TransformerBlockConfig & Mila::Dnn::TransformerBlockConfig::withHiddenDimension ( size_t hidden_dim )
Configure the hidden dimension for the feed-forward network.
- Parameters
  - hidden_dim: Size of the hidden layer in the feed-forward network
- Returns
  - TransformerBlockConfig&: Reference to this for method chaining
◆ withPreLayerNorm()
TransformerBlockConfig & Mila::Dnn::TransformerBlockConfig::withPreLayerNorm ( bool use_pre_ln )
Configure whether to use pre-layer normalization architecture.
- Parameters
  - use_pre_ln: Whether to use pre-layer normalization
- Returns
  - TransformerBlockConfig&: Reference to this for method chaining
◆ activation_type_
ActivationType Mila::Dnn::TransformerBlockConfig::activation_type_  [private]
◆ dropout_
float Mila::Dnn::TransformerBlockConfig::dropout_ = 0.0f  [private]
◆ hidden_dim_
size_t Mila::Dnn::TransformerBlockConfig::hidden_dim_ = 0  [private]
◆ input_shape_
std::vector<size_t> Mila::Dnn::TransformerBlockConfig::input_shape_  [private]
◆ num_heads_
size_t Mila::Dnn::TransformerBlockConfig::num_heads_  [private]
◆ use_bias_
bool Mila::Dnn::TransformerBlockConfig::use_bias_ = true  [private]
◆ use_pre_ln_
bool Mila::Dnn::TransformerBlockConfig::use_pre_ln_ = true  [private]