Mila
Deep Neural Network Library
Layer Normalization module.
Public Types

using ModuleBase = Module< TDeviceType, TInput, TOutput >
    Alias for base module type.

using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
    Memory resource type used for tensors, selected based on device type.

Public Types inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >

using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
Public Member Functions

LayerNorm (const std::string &device_name, const LayerNormConfig &config)
    Constructs a new LayerNorm module with a device name.

LayerNorm (std::shared_ptr< DeviceContext > device_context, const LayerNormConfig &config)
    Constructs a new LayerNorm module with a provided device context.

void backward (const Tensor< TInput, MR > &input, const Tensor< TOutput, MR > &output_grad, Tensor< TInput, MR > &input_grad)
    Performs the backward pass of the Layer Normalization operation.

void forward (const Tensor< TInput, MR > &input, Tensor< TOutput, MR > &output)
    Performs the forward pass of the Layer Normalization operation.

std::shared_ptr< Tensor< TInput, MR > > getBias ()
    Gets the bias tensor used after normalization and scaling.

std::shared_ptr< Tensor< TInput, MR > > getWeight ()
    Gets the weight tensor used for scaling after normalization.

bool hasBias () const
    Gets whether the module has a bias tensor.

void load (ModelArchive &archive) override
    Deserializes the module state from a ZIP archive.

size_t parameterCount () const override
    Gets the number of trainable parameters in this module.

void save (ModelArchive &archive) const override
    Serializes the module state to a ZIP archive.

std::string toString () const override
    Generates a string representation of this module's configuration.

Public Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >

Module (const std::string &device_name, const ComponentConfig &config)
    Constructor with device name.

Module (std::shared_ptr< DeviceContext > context, const ComponentConfig &config)
    Constructor with a specific device context.

virtual ~Module ()=default
    Virtual destructor for proper cleanup in derived classes.

std::shared_ptr< Compute::DeviceContext > getDeviceContext () const
    Get the device context for this module.

Compute::DeviceType getDeviceType () const
    Get the device type of the current device context.

std::string getName () const
    Get the name of the module.

const auto & getParameterTensors () const
    Get the parameter tensors of this module.

const ComputePrecision::Policy & getPrecision () const

const auto & getStateTensors () const
    Get the state tensors of this module.

bool isTraining () const
    Check if the module is in training mode.

virtual void setTraining (bool is_training)
    Set the training mode of this module.
Private Member Functions

void createOperation ()
    Creates the appropriate Layer Normalization operation based on the current device context.

void initializeTensors ()
    Initializes the tensors needed for the Layer Normalization operation.
Private Attributes

std::shared_ptr< Tensor< TOutput, MR > > bias_ { nullptr }
    The bias tensor added after normalization and scaling.

LayerNormConfig config_
    Configuration for the LayerNorm module.

std::shared_ptr< Tensor< TOutput, MR > > mean_ { nullptr }
    The mean tensor used for normalization.

std::shared_ptr< UnaryOperation< TDeviceType, TInput, TOutput > > operation_ { nullptr }
    The underlying operation that implements Layer Normalization.

std::vector< std::shared_ptr< Tensor< TOutput, MR > > > output_state_
    Collection of output state tensors for caching.

std::vector< std::shared_ptr< Tensor< TOutput, MR > > > parameters_
    Collection of trainable parameters for this module.

OperationAttributes properties_
    Operation attributes and configuration.

std::shared_ptr< Tensor< TOutput, MR > > rstd_ { nullptr }
    The reciprocal standard deviation tensor.

std::shared_ptr< Tensor< TOutput, MR > > weight_ { nullptr }
    The weight tensor for scaling after normalization.
Additional Inherited Members

Protected Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >

const std::string parametersToString () const
    Helper method to convert parameters to string representation.

const std::string stateToString () const
    Helper method to convert state tensors to string representation.

Protected Attributes inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >

std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > parameter_map_ = {}
    Map of parameter names to parameter tensors.

std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > state_map_ = {}
    Map of state names to state tensors.
Detailed Description

Layer Normalization module.
Layer Normalization is a technique used to normalize the inputs across features for each data sample in a batch. It helps stabilize and accelerate deep neural network training by reducing internal covariate shift.
The operation can be expressed as: y = ((x - mean) / sqrt(variance + epsilon)) * weight + bias
Unlike Batch Normalization, Layer Normalization computes statistics independently for each sample in a batch, making it well-suited for variable-length sequences and recurrent neural networks.
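As a minimal reference sketch of this computation in plain C++ (independent of the Mila tensor API; all names here are illustrative), normalizing each row of a contiguous [rows x features] buffer:

#include <cmath>
#include <cstddef>
#include <vector>

// Normalizes each row over the feature axis, then scales and shifts:
// y = ((x - mean) / sqrt(variance + epsilon)) * weight + bias
void layer_norm_forward( const std::vector<float>& x, std::vector<float>& y,
                         const std::vector<float>& weight, const std::vector<float>& bias,
                         std::size_t rows, std::size_t features, float epsilon = 1e-5f ) {
    for ( std::size_t r = 0; r < rows; ++r ) {
        const float* xr = &x[ r * features ];
        float* yr = &y[ r * features ];

        // Per-sample statistics over the feature axis.
        float mean = 0.0f;
        for ( std::size_t i = 0; i < features; ++i ) mean += xr[ i ];
        mean /= features;

        float variance = 0.0f;
        for ( std::size_t i = 0; i < features; ++i ) {
            const float d = xr[ i ] - mean;
            variance += d * d;
        }
        variance /= features;

        // Reciprocal standard deviation, the quantity the module caches in rstd_.
        const float rstd = 1.0f / std::sqrt( variance + epsilon );
        for ( std::size_t i = 0; i < features; ++i )
            yr[ i ] = ( xr[ i ] - mean ) * rstd * weight[ i ] + bias[ i ];
    }
}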

Member Typedef Documentation

using ModuleBase = Module< TDeviceType, TInput, TOutput >  [export]
Alias for base module type.

using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >  [export]
Memory resource type used for tensors, selected based on device type.

Constructor & Destructor Documentation

LayerNorm (const std::string &device_name, const LayerNormConfig &config)  [inline, explicit, export]
Constructs a new LayerNorm module with a device name.
Creates a new DeviceContext internally using the provided device name. This constructor is useful for creating standalone modules without pre-existing device contexts.
Parameters
    device_name  The name of the device to use (e.g., "CPU", "CUDA:0").
    config       Configuration parameters for the LayerNorm module.

Exceptions
    std::invalid_argument  If the device name is invalid or the configuration is invalid.
    std::runtime_error     If the device type does not match the template parameter TDeviceType.
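A minimal construction sketch, assuming a default-constructed LayerNormConfig is usable as-is (its setters are not documented on this page, and TOutput is assumed to default to TInput):

using namespace Mila::Dnn;

LayerNormConfig config;  // assumed valid with defaults; adjust via its setters as needed
auto ln = LayerNorm< Compute::DeviceType::Cuda, float >( "CUDA:0", config );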

LayerNorm (std::shared_ptr< DeviceContext > device_context, const LayerNormConfig &config)  [inline, explicit, export]
Constructs a new LayerNorm module with a provided device context.
Uses a pre-existing DeviceContext instance. This constructor is useful when integrating the module into a larger network that shares device contexts across modules.
Parameters
    device_context  The device context to use for this module.
    config          Configuration parameters for the LayerNorm module.

Exceptions
    std::invalid_argument  If device_context is null or the configuration is invalid.
    std::runtime_error     If the device context type does not match the template parameter TDeviceType.
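A sketch of sharing one context across modules; the DeviceContext constructor shown is an assumption, not taken from this page:

auto ctx = std::make_shared< Compute::DeviceContext >( "CUDA:0" );  // assumed constructor
auto ln1 = LayerNorm< Compute::DeviceType::Cuda, float >( ctx, config );
auto ln2 = LayerNorm< Compute::DeviceType::Cuda, float >( ctx, config );  // shares the same device context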

Member Function Documentation

void backward (const Tensor< TInput, MR > &input, const Tensor< TOutput, MR > &output_grad, Tensor< TInput, MR > &input_grad)  [inline, export]
Performs the backward pass of the Layer Normalization operation.
Computes gradients with respect to the input and the trainable parameters. The Layer Normalization backward pass produces gradients for the input tensor and for the weight and (if present) bias tensors, reusing the mean and reciprocal standard deviation cached during the forward pass.

Parameters
    input        The input tensor from the forward pass.
    output_grad  The gradient of the loss with respect to the output.
    input_grad   The tensor to store the gradient with respect to the input.
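For reference, these gradients can be written in the notation of the formula above; this is the standard layer-normalization derivation, not code lifted from the Mila source. With $r = 1/\sqrt{\sigma^2 + \epsilon}$ (the cached rstd_), $\hat{x}_i = (x_i - \mu)\,r$, $g_i = w_i\,\partial L/\partial y_i$, and $N$ the number of normalized features:

\frac{\partial L}{\partial x_i}
  = r\left( g_i - \frac{1}{N}\sum_j g_j - \hat{x}_i\,\frac{1}{N}\sum_j g_j\,\hat{x}_j \right),
\qquad
\frac{\partial L}{\partial w_i} = \sum_{\text{batch}} \frac{\partial L}{\partial y_i}\,\hat{x}_i,
\qquad
\frac{\partial L}{\partial b_i} = \sum_{\text{batch}} \frac{\partial L}{\partial y_i}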

void createOperation ()  [inline, export, private]
Creates the appropriate Layer Normalization operation based on the current device context.
This method initializes the operation_ member with the appropriate implementation of Layer Normalization for either CPU or CUDA, based on the current device context. It also passes the config object to the operation.
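Such device-based dispatch typically looks like the following sketch; the concrete operation type names are hypothetical, not taken from the Mila source:

// Hypothetical sketch of the dispatch inside createOperation().
if constexpr ( TDeviceType == DeviceType::Cuda ) {
    operation_ = std::make_shared< CudaLayerNormOp< TInput, TOutput > >( config_ );  // hypothetical type
} else {
    operation_ = std::make_shared< CpuLayerNormOp< TInput, TOutput > >( config_ );   // hypothetical type
}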

void forward (const Tensor< TInput, MR > &input, Tensor< TOutput, MR > &output)  [inline, export]
Performs the forward pass of the Layer Normalization operation.
Normalizes the input tensor across the specified axis, then scales and shifts the result using the weight and bias tensors.
Parameters
    input   The input tensor to be normalized.
    output  The output tensor where the results will be stored.
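A hypothetical call sketch; constructing a Tensor from a shape is assumed, as no Tensor constructors are documented on this page:

std::size_t batch_size = 32, features = 768;  // illustrative sizes
Tensor< float, CudaMemoryResource > input( { batch_size, features } );   // assumed shape constructor
Tensor< float, CudaMemoryResource > output( { batch_size, features } );  // same shape as the input
ln.forward( input, output );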

std::shared_ptr< Tensor< TInput, MR > > getBias ()  [inline, export]
Gets the bias tensor used after normalization and scaling.
The bias tensor is added after normalization and scaling.

std::shared_ptr< Tensor< TInput, MR > > getWeight ()  [inline, export]
Gets the weight tensor used for scaling after normalization.
The weight tensor is applied as a scale factor to the normalized values.

bool hasBias () const  [inline, export]
Gets whether the module has a bias tensor.

void initializeTensors ()  [inline, export, private]
Initializes the tensors needed for the Layer Normalization operation.
Creates and initializes the weight tensor, the bias tensor (when the configuration enables bias), and the mean and reciprocal standard deviation tensors that cache per-sample statistics from the forward pass.

void load (ModelArchive &archive) override  [inline, export, virtual]
Deserializes the module state from a ZIP archive.
Loads the trainable parameters (weight, bias) from the provided archive.
Parameters
    archive  The ZIP archive to load the module state from.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

size_t parameterCount () const override  [inline, export, virtual]
Gets the number of trainable parameters in this module.
Counts the total number of trainable parameters, which includes the weight tensor and, if present, the bias tensor.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
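For example, with an illustrative normalized feature dimension of 768, parameterCount() returns 768 + 768 = 1536 when a bias tensor is present, and 768 otherwise.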

void save (ModelArchive &archive) const override  [inline, export, virtual]
Serializes the module state to a ZIP archive.
Saves the trainable parameters (weight, bias) to the provided archive.
Parameters
    archive  The ZIP archive to save the module state to.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
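A hypothetical round-trip sketch; how a ModelArchive is opened is not documented on this page, so the constructor below is an assumption:

ModelArchive archive( "layernorm_state.zip" );  // assumed constructor
ln.save( archive );   // writes weight (and bias, if present)
// ... later, on a module built with the same configuration:
ln.load( archive );   // restores the saved parameters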

std::string toString () const override  [inline, export, virtual]
Generates a string representation of this module's configuration.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

Member Data Documentation

std::shared_ptr< Tensor< TOutput, MR > > bias_ { nullptr }  [export, private]
The bias tensor added after normalization and scaling.

LayerNormConfig config_  [export, private]
Configuration for the LayerNorm module.

std::shared_ptr< Tensor< TOutput, MR > > mean_ { nullptr }  [export, private]
The mean tensor used for normalization.
Stores the mean values computed during the forward pass.

std::shared_ptr< UnaryOperation< TDeviceType, TInput, TOutput > > operation_ { nullptr }  [export, private]
The underlying operation that implements Layer Normalization.

std::vector< std::shared_ptr< Tensor< TOutput, MR > > > output_state_  [export, private]
Collection of output state tensors for caching.

std::vector< std::shared_ptr< Tensor< TOutput, MR > > > parameters_  [export, private]
Collection of trainable parameters for this module.

OperationAttributes properties_  [export, private]
Operation attributes and configuration.

std::shared_ptr< Tensor< TOutput, MR > > rstd_ { nullptr }  [export, private]
The reciprocal standard deviation tensor.
Stores the reciprocal of the standard deviation values (1/sqrt(variance + epsilon)) computed during the forward pass.

std::shared_ptr< Tensor< TOutput, MR > > weight_ { nullptr }  [export, private]
The weight tensor for scaling after normalization.