Mila
Deep Neural Network Library
Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput > Class Template Reference

Layer Normalization module. More...


Public Types

using ModuleBase = Module< TDeviceType, TInput, TOutput >
 Alias for base module type.
 
using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
 Memory resource type used for tensors, selected based on device type.
 
- Public Types inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
 

Public Member Functions

 LayerNorm (const std::string &device_name, const LayerNormConfig &config)
 Constructs a new LayerNorm module with a device name.
 
 LayerNorm (std::shared_ptr< DeviceContext > device_context, const LayerNormConfig &config)
 Constructs a new LayerNorm module with a provided device context.
 
void backward (const Tensor< TInput, MR > &input, const Tensor< TOutput, MR > &output_grad, Tensor< TInput, MR > &input_grad)
 Performs the backward pass of the Layer Normalization operation.
 
void forward (const Tensor< TInput, MR > &input, Tensor< TOutput, MR > &output)
 Performs the forward pass of the Layer Normalization operation.
 
std::shared_ptr< Tensor< TInput, MR > > getBias ()
 Gets the bias tensor used after normalization and scaling.
 
std::shared_ptr< Tensor< TInput, MR > > getWeight ()
 Gets the weight tensor used for scaling after normalization.
 
bool hasBias () const
 Gets whether the module has a bias tensor.
 
void load (ModelArchive &archive) override
 Deserializes the module state from a ZIP archive.
 
size_t parameterCount () const override
 Gets the number of trainable parameters in this module.
 
void save (ModelArchive &archive) const override
 Serializes the module state to a ZIP archive.
 
std::string toString () const override
 Generates a string representation of this module's configuration.
 
- Public Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
 Module (const std::string &device_name, const ComponentConfig &config)
 Constructor with device name.
 
 Module (std::shared_ptr< DeviceContext > context, const ComponentConfig &config)
 Constructor with a specific device context.
 
virtual ~Module ()=default
 Virtual destructor for proper cleanup in derived classes.
 
std::shared_ptr< Compute::DeviceContext > getDeviceContext () const
 Get the device context for this module.
 
Compute::DeviceType getDeviceType () const
 Get the device type of the current device context.
 
std::string getName () const
 Get the name of the module.
 
const auto & getParameterTensors () const
 Get the parameter tensors of this module.
 
const ComputePrecision::Policy & getPrecision () const
 Get the compute precision policy for this module.
 
const auto & getStateTensors () const
 Get the state tensors of this module.
 
bool isTraining () const
 Check if the module is in training mode.
 
virtual void setTraining (bool is_training)
 Set the training mode of this module.
 

Private Member Functions

void createOperation ()
 Creates the appropriate Layer Normalization operation based on the current device context.
 
void initializeTensors ()
 Initializes the tensors needed for the Layer Normalization operation.
 

Private Attributes

std::shared_ptr< Tensor< TOutput, MR > > bias_ { nullptr }
 The bias tensor added after normalization and scaling.
 
LayerNormConfig config_
 Configuration for the LayerNorm module.
 
std::shared_ptr< Tensor< TOutput, MR > > mean_ { nullptr }
 The mean tensor used for normalization.
 
std::shared_ptr< UnaryOperation< TDeviceType, TInput, TOutput > > operation_ { nullptr }
 The underlying operation that implements Layer Normalization.
 
std::vector< std::shared_ptr< Tensor< TOutput, MR > > > output_state_
 Collection of output state tensors for caching.
 
std::vector< std::shared_ptr< Tensor< TOutput, MR > > > parameters_
 Collection of trainable parameters for this module.
 
OperationAttributes properties_
 Operation attributes and configuration.
 
std::shared_ptr< Tensor< TOutput, MR > > rstd_ { nullptr }
 The reciprocal standard deviation tensor.
 
std::shared_ptr< Tensor< TOutput, MR > > weight_ { nullptr }
 The weight tensor for scaling after normalization.
 

Additional Inherited Members

- Protected Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
const std::string parametersToString () const
 Helper method to convert parameters to string representation.
 
const std::string stateToString () const
 Helper method to convert state tensors to string representation.
 
- Protected Attributes inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > parameter_map_ = {}
 Map of parameter names to parameter tensors.
 
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > state_map_ = {}
 Map of state names to state tensors.
 

Detailed Description

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
requires ValidTensorTypes<TInput, TOutput>
class Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >

Layer Normalization module.

Layer Normalization is a technique used to normalize the inputs across features for each data sample in a batch. It helps stabilize and accelerate deep neural network training by reducing internal covariate shift.

The operation can be expressed as: y = ((x - mean) / sqrt(variance + epsilon)) * weight + bias

Unlike Batch Normalization, Layer Normalization computes statistics independently for each sample in a batch, making it well-suited for variable-length sequences and recurrent neural networks.
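
To make the formula concrete, here is a minimal CPU reference sketch, independent of Mila's tensor types (the function and its buffer layout are illustrative, not the module's actual implementation):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Normalizes each row of a [rows x cols] buffer over its last dimension,
    // then applies the per-feature weight (scale) and bias (shift).
    // y must be pre-sized to rows * cols.
    void layer_norm_reference( const std::vector< float >& x,
                               const std::vector< float >& weight,
                               const std::vector< float >& bias,
                               std::vector< float >& y,
                               std::size_t rows, std::size_t cols,
                               float epsilon = 1e-5f )
    {
        for ( std::size_t r = 0; r < rows; ++r ) {
            const float* row = x.data() + r * cols;

            // Per-sample statistics across the features (not across the batch).
            float mean = 0.0f;
            for ( std::size_t c = 0; c < cols; ++c ) mean += row[ c ];
            mean /= static_cast< float >( cols );

            float variance = 0.0f;
            for ( std::size_t c = 0; c < cols; ++c ) {
                const float d = row[ c ] - mean;
                variance += d * d;
            }
            variance /= static_cast< float >( cols );

            // Matches the module's cached rstd_: 1 / sqrt(variance + epsilon).
            const float rstd = 1.0f / std::sqrt( variance + epsilon );

            for ( std::size_t c = 0; c < cols; ++c )
                y[ r * cols + c ] = ( row[ c ] - mean ) * rstd * weight[ c ] + bias[ c ];
        }
    }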

Template Parameters
TDeviceType  The device type (CPU or CUDA) on which to perform computations.
TInput       Data type of the input tensor elements.
TOutput      Data type of the output tensor elements; defaults to TInput.

Member Typedef Documentation

◆ ModuleBase

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
using Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::ModuleBase = Module<TDeviceType, TInput, TOutput>
export

Alias for base module type.

◆ MR

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
using Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>
export

Memory resource type used for tensors, selected based on device type.
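
For illustration, a compile-time check of the documented selection (a minimal sketch; the exact namespaces of the memory-resource types are assumptions, the conditional itself is as documented):

    #include <type_traits>

    // On a CUDA instantiation MR resolves to CudaMemoryResource; a CPU
    // instantiation would resolve to CpuMemoryResource instead.
    static_assert( std::is_same_v<
        Mila::Dnn::LayerNorm< Mila::Dnn::Compute::DeviceType::Cuda >::MR,
        Mila::Dnn::Compute::CudaMemoryResource > );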

Constructor & Destructor Documentation

◆ LayerNorm() [1/2]

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::LayerNorm ( const std::string &  device_name,
const LayerNormConfig &  config 
)
inline explicit export

Constructs a new LayerNorm module with a device name.

Creates a new DeviceContext internally using the provided device name. This constructor is useful for creating standalone modules without pre-existing device contexts.

Parameters
device_name  The name of the device to use (e.g., "CPU", "CUDA:0").
config       Configuration parameters for the LayerNorm module.
Exceptions
std::invalid_argument  If the device name or the configuration is invalid.
std::runtime_error     If the device type doesn't match the template parameter TDeviceType.
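
A minimal construction sketch, assuming LayerNormConfig lives in Mila::Dnn, is default-constructible, and is configured through its own API (not covered on this page):

    // Hypothetical usage; only the constructor signature is taken from
    // this page, the config setup is an assumption.
    Mila::Dnn::LayerNormConfig config;  // set normalized shape, epsilon, bias, ... per its API

    // The module creates its own DeviceContext from the device name.
    Mila::Dnn::LayerNorm< Mila::Dnn::Compute::DeviceType::Cuda > ln( "CUDA:0", config );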

◆ LayerNorm() [2/2]

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::LayerNorm ( std::shared_ptr< DeviceContext >  device_context,
const LayerNormConfig &  config 
)
inline explicit export

Constructs a new LayerNorm module with a provided device context.

Uses a pre-existing DeviceContext instance. This constructor is useful when integrating the module into a larger network that shares device contexts across modules.

Parameters
device_context  The device context to use for this module.
config          Configuration parameters for the LayerNorm module.
Exceptions
std::invalid_argument  If device_context is null or the configuration is invalid.
std::runtime_error     If the device context type doesn't match the template parameter TDeviceType.

Member Function Documentation

◆ backward()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
void Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::backward ( const Tensor< TInput, MR > &  input,
const Tensor< TOutput, MR > &  output_grad,
Tensor< TInput, MR > &  input_grad 
)
inline export

Performs the backward pass of the Layer Normalization operation.

Computes gradients with respect to input and parameters. Layer Normalization's backward pass requires computing:

  1. Gradient with respect to weight and bias (if present)
  2. Gradient with respect to input, which is more complex due to the normalization operation's chain rule derivatives
Parameters
input        The input tensor from the forward pass.
output_grad  The gradient of the loss with respect to the output.
input_grad   The tensor to store the gradient with respect to input.
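
The page does not spell these derivatives out, so for reference, the standard layer normalization gradients (derived from the formula above, not taken from the Mila source). Writing r = 1/sqrt(variance + epsilon) and x_hat = (x - mean) * r over N normalized features:

    % Parameter gradients, accumulated over samples i (item 1 above):
    \frac{\partial L}{\partial \gamma_j} = \sum_i \frac{\partial L}{\partial y_{ij}} \, \hat{x}_{ij}
    \qquad
    \frac{\partial L}{\partial \beta_j} = \sum_i \frac{\partial L}{\partial y_{ij}}

    % Input gradient, per sample, with g_j = \gamma_j \, \partial L / \partial y_j (item 2 above):
    \frac{\partial L}{\partial x_j}
        = \frac{r}{N} \left( N g_j - \sum_k g_k - \hat{x}_j \sum_k g_k \hat{x}_k \right)

Here gamma and beta are the weight and bias tensors; the last line is the chain-rule expression that makes the input gradient more complex than the parameter gradients.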

◆ createOperation()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
void Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::createOperation ( )
inline export private

Creates the appropriate Layer Normalization operation based on the current device context.

This method initializes the operation_ member with the appropriate implementation of Layer Normalization for either CPU or CUDA, based on the current device context. It also passes the config object to the operation.


◆ forward()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
void Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::forward ( const Tensor< TInput, MR > &  input,
Tensor< TOutput, MR > &  output 
)
inline export

Performs the forward pass of the Layer Normalization operation.

Normalizes the input tensor across the specified axis, then scales and shifts the result using the weight and bias tensors.

Parameters
input   The input tensor to be normalized.
output  The output tensor where the results will be stored.
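
A minimal call sketch; tensor creation is assumed to happen elsewhere, the Tensor namespace is an assumption, and only the documented forward() signature is relied upon:

    // Hypothetical helper; TInput/TOutput are fixed to float for brevity.
    template< Mila::Dnn::Compute::DeviceType D >
    void run_forward( Mila::Dnn::LayerNorm< D >& ln,
                      const Mila::Dnn::Tensor< float, typename Mila::Dnn::LayerNorm< D >::MR >& input,
                      Mila::Dnn::Tensor< float, typename Mila::Dnn::LayerNorm< D >::MR >& output )
    {
        // y = ((x - mean) / sqrt(variance + epsilon)) * weight + bias
        ln.forward( input, output );
    }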

◆ getBias()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr< Tensor< TInput, MR > > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::getBias ( )
inline export

Gets the bias tensor used after normalization and scaling.

The bias tensor is added after normalization and scaling.

Returns
std::shared_ptr<Tensor<TInput, MR>> Shared pointer to the bias tensor.

◆ getWeight()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr< Tensor< TInput, MR > > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::getWeight ( )
inline export

Gets the weight tensor used for scaling after normalization.

The weight tensor is applied as a scale factor to the normalized values.

Returns
std::shared_ptr<Tensor<TInput, MR>> Shared pointer to the weight tensor.

◆ hasBias()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
bool Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::hasBias ( ) const
inline export

Gets whether the module has a bias tensor.

Returns
bool True if the module has a bias tensor, false otherwise.

◆ initializeTensors()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
void Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::initializeTensors ( )
inline export private

Initializes the tensors needed for the Layer Normalization operation.

Creates and initializes (see the sketch after this list):

  • weight tensor (initialized to ones)
  • bias tensor (initialized to zeros)
  • mean tensor (for storing means during forward pass)
  • reciprocal standard deviation tensor (for storing 1/std during forward pass)
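
In plain terms, the defaults make the module an identity transform before training. A sketch with hypothetical sizes (the shapes are assumptions; the initial values are as documented):

    #include <cstddef>
    #include <vector>

    const std::size_t features = 768;  // normalized dimension (assumed)
    const std::size_t samples  = 32;   // batch size x sequence length (assumed)

    std::vector< float > weight( features, 1.0f );  // ones: scaling starts as identity
    std::vector< float > bias( features, 0.0f );    // zeros: no initial shift
    std::vector< float > mean( samples );           // per-sample means, filled in forward
    std::vector< float > rstd( samples );           // per-sample 1/sqrt(variance + epsilon)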

◆ load()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
void Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::load ( ModelArchive &  archive)
inline override export virtual

Deserializes the module state from a ZIP archive.

Loads the trainable parameters (weight, bias) from the provided archive.

Parameters
archive  The ZIP archive to load the module state from.

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.


◆ parameterCount()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
size_t Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::parameterCount ( ) const
inline override export virtual

Gets the number of trainable parameters in this module.

Counts the total number of trainable parameters, which includes the weight tensor and, if present, the bias tensor.

Returns
size_t The total number of parameters.

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
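
For example, assuming the weight and bias each hold one element per normalized feature, a normalized dimension of 768 with bias enabled gives 768 + 768 = 1536 parameters; with hasBias() returning false it would be 768.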


◆ save()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
void Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::save ( ModelArchive &  archive) const
inline override export virtual

Serializes the module state to a ZIP archive.

Saves the trainable parameters (weight, bias) to the provided archive.

Parameters
archive  The ZIP archive to save the module state to.

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
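
A hypothetical round-trip through the documented interface; how a ModelArchive is opened is not covered on this page and is left to its own API:

    // Sketch only: the save()/load() signatures are as documented above,
    // everything else (including ModelArchive's namespace) is an assumption.
    void checkpoint_roundtrip( Mila::Dnn::LayerNorm<>& ln, ModelArchive& archive )
    {
        ln.save( archive );  // writes weight (and bias, if present) into the ZIP
        ln.load( archive );  // restores the same parameters from the archive
    }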


◆ toString()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::string Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::toString ( ) const
inline override export virtual

Generates a string representation of this module's configuration.

Returns
std::string A formatted string with module information.

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.


Member Data Documentation

◆ bias_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr<Tensor<TOutput, MR> > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::bias_ { nullptr }
export private

The bias tensor added after normalization and scaling.

◆ config_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
LayerNormConfig Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::config_
export private

Configuration for the LayerNorm module.

◆ mean_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr<Tensor<TOutput, MR> > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::mean_ { nullptr }
export private

The mean tensor used for normalization.

Stores the mean values computed during the forward pass.

◆ operation_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr<UnaryOperation<TDeviceType, TInput, TOutput> > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::operation_ { nullptr }
export private

The underlying operation that implements Layer Normalization.

◆ output_state_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::vector<std::shared_ptr<Tensor<TOutput, MR> > > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::output_state_
export private

Collection of output state tensors for caching.

◆ parameters_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::vector<std::shared_ptr<Tensor<TOutput, MR> > > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::parameters_
export private

Collection of trainable parameters for this module.

◆ properties_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
OperationAttributes Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::properties_
export private

Operation attributes and configuration.

◆ rstd_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr<Tensor<TOutput, MR> > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::rstd_ { nullptr }
export private

The reciprocal standard deviation tensor.

Stores the reciprocal of the standard deviation values (1/sqrt(variance + epsilon)) computed during the forward pass.

◆ weight_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TInput = float, typename TOutput = TInput>
std::shared_ptr<Tensor<TOutput, MR> > Mila::Dnn::LayerNorm< TDeviceType, TInput, TOutput >::weight_ { nullptr }
export private

The weight tensor for scaling after normalization.

