Mila
Deep Neural Network Library

Layer Normalization module.


Public Types
  using ModuleBase = Module<TDeviceType, TInput, TOutput>
      Alias for the base module type.
  using MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>
      Memory resource type used for tensors, selected based on the device type.

Public Types inherited from Mila::Dnn::Module<TDeviceType, TInput, TOutput>
  using MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>
Public Member Functions
  LayerNorm(const std::string& device_name, const LayerNormConfig& config)
      Constructs a new LayerNorm module with a device name.
  LayerNorm(std::shared_ptr<DeviceContext> device_context, const LayerNormConfig& config)
      Constructs a new LayerNorm module with a provided device context.
  void backward(const Tensor<TInput, MR>& input, const Tensor<TOutput, MR>& output_grad, Tensor<TInput, MR>& input_grad)
      Performs the backward pass of the Layer Normalization operation.
  void forward(const Tensor<TInput, MR>& input, Tensor<TOutput, MR>& output)
      Performs the forward pass of the Layer Normalization operation.
  std::shared_ptr<Tensor<TInput, MR>> getBias()
      Gets the bias tensor used after normalization and scaling.
  std::shared_ptr<Tensor<TInput, MR>> getWeight()
      Gets the weight tensor used for scaling after normalization.
  bool hasBias() const
      Gets whether the module has a bias tensor.
  void load(ModelArchive& archive) override
      Deserializes the module state from a ZIP archive.
  size_t parameterCount() const override
      Gets the number of trainable parameters in this module.
  void save(ModelArchive& archive) const override
      Serializes the module state to a ZIP archive.
  std::string toString() const override
      Generates a string representation of this module's configuration.
Public Member Functions inherited from Mila::Dnn::Module<TDeviceType, TInput, TOutput>
  Module(const std::string& device_name, const ComponentConfig& config)
      Constructor with a device name.
  Module(std::shared_ptr<DeviceContext> context, const ComponentConfig& config)
      Constructor with a specific device context.
  virtual ~Module() = default
      Virtual destructor for proper cleanup in derived classes.
  std::shared_ptr<Compute::DeviceContext> getDeviceContext() const
      Gets the device context for this module.
  Compute::DeviceType getDeviceType() const
      Gets the device type of the current device context.
  std::string getName() const
      Gets the name of the module.
  const auto& getParameterTensors() const
      Gets the parameter tensors of this module.
  const ComputePrecision::Policy& getPrecision() const
      Gets the compute precision policy of this module.
  const auto& getStateTensors() const
      Gets the state tensors of this module.
  bool isTraining() const
      Checks whether the module is in training mode.
  virtual void setTraining(bool is_training)
      Sets the training mode of this module.
Private Member Functions
  void createOperation()
      Creates the appropriate Layer Normalization operation based on the current device context.
  void initializeTensors()
      Initializes the tensors needed for the Layer Normalization operation.
Private Attributes
  std::shared_ptr<Tensor<TOutput, MR>> bias_ { nullptr }
      The bias tensor added after normalization and scaling.
  LayerNormConfig config_
      Configuration for the LayerNorm module.
  std::shared_ptr<Tensor<TOutput, MR>> mean_ { nullptr }
      The mean tensor used for normalization.
  std::shared_ptr<UnaryOperation<TDeviceType, TInput, TOutput>> operation_ { nullptr }
      The underlying operation that implements Layer Normalization.
  std::vector<std::shared_ptr<Tensor<TOutput, MR>>> output_state_
      Collection of output state tensors for caching.
  std::vector<std::shared_ptr<Tensor<TOutput, MR>>> parameters_
      Collection of trainable parameters for this module.
  OperationAttributes properties_
      Operation attributes and configuration.
  std::shared_ptr<Tensor<TOutput, MR>> rstd_ { nullptr }
      The reciprocal standard deviation tensor.
  std::shared_ptr<Tensor<TOutput, MR>> weight_ { nullptr }
      The weight tensor for scaling after normalization.
Additional Inherited Members

Protected Member Functions inherited from Mila::Dnn::Module<TDeviceType, TInput, TOutput>
  const std::string parametersToString() const
      Helper method to convert parameters to a string representation.
  const std::string stateToString() const
      Helper method to convert state tensors to a string representation.

Protected Attributes inherited from Mila::Dnn::Module<TDeviceType, TInput, TOutput>
  std::unordered_map<std::string, std::shared_ptr<Tensor<TOutput, MR>>> parameter_map_ = {}
      Map of parameter names to parameter tensors.
  std::unordered_map<std::string, std::shared_ptr<Tensor<TOutput, MR>>> state_map_ = {}
      Map of state names to state tensors.
Layer Normalization module.
Layer Normalization is a technique used to normalize the inputs across features for each data sample in a batch. It helps stabilize and accelerate deep neural network training by reducing internal covariate shift.
The operation can be expressed as: y = ((x - mean) / sqrt(variance + epsilon)) * weight + bias
Unlike Batch Normalization, Layer Normalization computes statistics independently for each sample in a batch, making it well-suited for variable-length sequences and recurrent neural networks.
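The formula above can be sketched in plain C++ for a single sample. This is a minimal, library-independent illustration of the math, not Mila's Tensor-based implementation:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal sketch of the LayerNorm forward formula for one sample:
//   y = ((x - mean) / sqrt(variance + epsilon)) * weight + bias
// Statistics are computed over the feature dimension of the sample.
std::vector<float> layer_norm(const std::vector<float>& x,
                              const std::vector<float>& weight,
                              const std::vector<float>& bias,
                              float epsilon = 1e-5f) {
    const std::size_t n = x.size();

    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= static_cast<float>(n);

    float variance = 0.0f;
    for (float v : x) variance += (v - mean) * (v - mean);
    variance /= static_cast<float>(n);

    // Reciprocal standard deviation, corresponding to the cached rstd_ tensor.
    const float rstd = 1.0f / std::sqrt(variance + epsilon);

    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i)
        y[i] = (x[i] - mean) * rstd * weight[i] + bias[i];
    return y;
}
```

With weight fixed to ones and bias to zeros, the output is simply the standardized input, which sums to approximately zero.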
ModuleBase  [export]
    Alias for the base module type.

MR  [export]
    Memory resource type used for tensors, selected based on the device type.
LayerNorm(const std::string& device_name, const LayerNormConfig& config)  [inline, explicit, export]

Constructs a new LayerNorm module with a device name.
Creates a new DeviceContext internally using the provided device name. This constructor is useful for creating standalone modules without pre-existing device contexts.

Parameters:
    device_name  The name of the device to use (e.g., "CPU", "CUDA:0").
    config       Configuration parameters for the LayerNorm module.

Throws:
    std::invalid_argument  If the device name is invalid or the configuration is invalid.
    std::runtime_error     If the device type doesn't match the template parameter TDeviceType.

LayerNorm(std::shared_ptr<DeviceContext> device_context, const LayerNormConfig& config)  [inline, explicit, export]

Constructs a new LayerNorm module with a provided device context.
Uses a pre-existing DeviceContext instance. This constructor is useful when integrating the module into a larger network that shares device contexts across modules.

Parameters:
    device_context  The device context to use for this module.
    config          Configuration parameters for the LayerNorm module.

Throws:
    std::invalid_argument  If device_context is null or the configuration is invalid.
    std::runtime_error     If the device context type doesn't match the template parameter TDeviceType.
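A usage sketch built only from signatures shown on this page; how a LayerNormConfig is populated is not documented here, so it is left opaque, and the template argument order is an assumption:

```cpp
// Hedged sketch: constructor signature and device-name strings ("CPU",
// "CUDA:0") come from this page; the template argument order and the
// construction of config are assumptions.
LayerNormConfig config = /* ... configured elsewhere ... */;
LayerNorm<DeviceType::Cuda, float, float> norm("CUDA:0", config);
```

The first constructor creates a DeviceContext internally from the name, which is convenient for standalone use; the second accepts a shared context when several modules run on the same device.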

void backward(const Tensor<TInput, MR>& input, const Tensor<TOutput, MR>& output_grad, Tensor<TInput, MR>& input_grad)  [inline, export]

Performs the backward pass of the Layer Normalization operation.
Computes gradients with respect to the input and to the trainable parameters (the weight tensor and, if present, the bias tensor).

Parameters:
    input        The input tensor from the forward pass.
    output_grad  The gradient of the loss with respect to the output.
    input_grad   The tensor to store the gradient with respect to the input.
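For a single sample, the gradients the backward pass must produce can be sketched as below. This follows the standard LayerNorm backward derivation and is not Mila's actual kernel code; the cached mean and reciprocal standard deviation correspond to the mean_ and rstd_ tensors from the forward pass:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard single-sample LayerNorm backward sketch (not Mila's kernel code).
// With xhat_i = (x_i - mean) * rstd:
//   dweight_i = dy_i * xhat_i
//   dbias_i   = dy_i
//   dx_i = rstd * (dy_i*w_i - avg_j(dy_j*w_j) - xhat_i * avg_j(dy_j*w_j*xhat_j))
void layer_norm_backward(const std::vector<float>& x,
                         const std::vector<float>& weight,
                         const std::vector<float>& dy,
                         float mean, float rstd,
                         std::vector<float>& dx,
                         std::vector<float>& dweight,
                         std::vector<float>& dbias) {
    const std::size_t n = x.size();
    std::vector<float> xhat(n);
    float sum_dyw = 0.0f, sum_dyw_xhat = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        xhat[i] = (x[i] - mean) * rstd;
        const float dyw = dy[i] * weight[i];
        sum_dyw += dyw;
        sum_dyw_xhat += dyw * xhat[i];
    }
    const float avg_dyw = sum_dyw / static_cast<float>(n);
    const float avg_dyw_xhat = sum_dyw_xhat / static_cast<float>(n);

    dx.resize(n); dweight.resize(n); dbias.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        dweight[i] = dy[i] * xhat[i];       // gradient w.r.t. the weight
        dbias[i]   = dy[i];                 // gradient w.r.t. the bias
        dx[i] = rstd * (dy[i] * weight[i] - avg_dyw - xhat[i] * avg_dyw_xhat);
    }
}
```

Because the two subtracted averages project out the mean and scale directions, the input gradient of a single sample always sums to (approximately) zero.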
void createOperation()  [inline, export, private]

Creates the appropriate Layer Normalization operation based on the current device context.
This method initializes the operation_ member with the appropriate implementation of Layer Normalization for either CPU or CUDA, based on the current device context. It also passes the config object to the operation.

void forward(const Tensor<TInput, MR>& input, Tensor<TOutput, MR>& output)  [inline, export]

Performs the forward pass of the Layer Normalization operation.
Normalizes the input tensor across the specified axis, then scales and shifts the result using the weight and bias tensors.

Parameters:
    input   The input tensor to be normalized.
    output  The output tensor where the results will be stored.
std::shared_ptr<Tensor<TInput, MR>> getBias()  [inline, export]

Gets the bias tensor used after normalization and scaling.
The bias tensor is added after normalization and scaling.
std::shared_ptr<Tensor<TInput, MR>> getWeight()  [inline, export]

Gets the weight tensor used for scaling after normalization.
The weight tensor is applied as a scale factor to the normalized values.
bool hasBias() const  [inline, export]

Gets whether the module has a bias tensor.

void initializeTensors()  [inline, export, private]

Initializes the tensors needed for the Layer Normalization operation.
Creates and initializes the weight and (if enabled) bias parameter tensors, as well as the mean and reciprocal standard deviation state tensors used during the forward pass.


void load(ModelArchive& archive) override  [inline, override, export, virtual]

Deserializes the module state from a ZIP archive.
Loads the trainable parameters (weight, bias) from the provided archive.

Parameters:
    archive  The ZIP archive to load the module state from.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

size_t parameterCount() const override  [inline, override, export, virtual]

Gets the number of trainable parameters in this module.
Counts the total number of trainable parameters, which includes the weight tensor and, if present, the bias tensor.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
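The counting rule above can be made concrete. This is an illustrative helper, not part of Mila's API: for a normalized feature dimension of F elements, the weight tensor contributes F parameters and the bias, when enabled, contributes another F:

```cpp
#include <cstddef>

// Illustrative counting rule (not a Mila function): weight contributes one
// value per normalized feature; bias, when present, contributes the same.
std::size_t layer_norm_parameter_count(std::size_t features, bool has_bias) {
    return features + (has_bias ? features : 0);
}
```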


void save(ModelArchive& archive) const override  [inline, override, export, virtual]

Serializes the module state to a ZIP archive.
Saves the trainable parameters (weight, bias) to the provided archive.

Parameters:
    archive  The ZIP archive to save the module state to.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

std::string toString() const override  [inline, override, export, virtual]

Generates a string representation of this module's configuration.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

std::shared_ptr<Tensor<TOutput, MR>> bias_ { nullptr }  [export, private]
    The bias tensor added after normalization and scaling.

LayerNormConfig config_  [export, private]
    Configuration for the LayerNorm module.

std::shared_ptr<Tensor<TOutput, MR>> mean_ { nullptr }  [export, private]
    The mean tensor used for normalization. Stores the mean values computed during the forward pass.

std::shared_ptr<UnaryOperation<TDeviceType, TInput, TOutput>> operation_ { nullptr }  [export, private]
    The underlying operation that implements Layer Normalization.

std::vector<std::shared_ptr<Tensor<TOutput, MR>>> output_state_  [export, private]
    Collection of output state tensors for caching.

std::vector<std::shared_ptr<Tensor<TOutput, MR>>> parameters_  [export, private]
    Collection of trainable parameters for this module.

OperationAttributes properties_  [export, private]
    Operation attributes and configuration.

std::shared_ptr<Tensor<TOutput, MR>> rstd_ { nullptr }  [export, private]
    The reciprocal standard deviation tensor. Stores the reciprocal of the standard deviation values (1/sqrt(variance + epsilon)) computed during the forward pass.

std::shared_ptr<Tensor<TOutput, MR>> weight_ { nullptr }  [export, private]
    The weight tensor for scaling after normalization.