Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::LayerNorm< TDeviceType, TPrecision > Class Template Reference [export]

Device-templated Layer Normalization component.

Public Types

using ComponentBase = Component<TDeviceType, TPrecision>
using MR = typename DeviceTypeTraits<TDeviceType>::memory_resource
using TensorType = Tensor<TPrecision, MR>

Public Member Functions

 LayerNorm (const std::string &name, const LayerNormConfig &config, std::optional< DeviceId > device_id=std::nullopt)
 Construct LayerNorm with optional ExecutionContext ownership.
 ~LayerNorm () override=default
TensorType & backward (const TensorType &input, const TensorType &output_grad)
 Run backward pass and return a reference to the component-owned input-gradient tensor.
TensorType & forward (const TensorType &input)
 Run forward pass and return a reference to the component-owned output tensor.
DeviceId getDeviceId () const override
 Get the compute device id associated with this component.
std::vector< ITensor * > getGradients () const override
 Return non-owning pointers to parameter gradient tensors.
MemoryStats getMemoryStats () const override
 Return the current memory allocation breakdown for this component.
std::vector< ITensor * > getParameters () const override
 Return non-owning pointers to parameter tensors.
const ComponentType getType () const override
 Get the component type identifier.
void loadParameter (const std::string &name, const ITensorBlob &blob) override
 Load a parameter from serialized tensor data.
size_t parameterCount () const override
 Return number of trainable parameters.
void save_ (ModelArchive &archive, SerializationMode mode) const override
void synchronize () override
 Wait for outstanding device work submitted by this component.
std::string toString () const override
 Produce a short, human-readable description of the component.
void zeroGradients () override
 Clear all model-owned gradients for this component.
Public Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
 Component (const std::string &name)
 Construct component with required name identifier.
virtual ~Component ()=default
virtual void build (const BuildContext &context) final
 Build the component with the provided BuildContext (canonical overload).
const std::string getName () const
 Get the component's name identifier.
virtual std::vector< std::string > getParameterNames () const
 List all available parameter names for this component.
RuntimeMode getRuntimeMode () const noexcept
 Get the current RuntimeMode of this Component.
TrainingMode getTrainingMode () const noexcept
 The current runtime behavioral mode of this Component.
virtual bool isBuilt () const final
 Returns true if build() has completed successfully.
bool isInferenceMode () const noexcept
bool isTrainingMode () const noexcept
void setTrainingMode (TrainingMode mode)
 Set the runtime behavioral mode for this Component.

Protected Member Functions

void onBuilding (const BuildContext &context) override
 Hook invoked during build() to initialize component with input shape.
void onExecutionContextSet () override
 Hook invoked after ExecutionContext is set.
void onTrainingModeChanging (TrainingMode training_mode) override
 Hook invoked when training mode is about to change.
Protected Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
IExecutionContext * getExecutionContext () const
 Get the shared execution context.
bool hasExecutionContext () const noexcept
 Check if execution context has been set.
template<TensorDataType TParameterPrecision, typename TMemoryResource>
void loadParameterFromBlob (const std::string &param_name, const Serialization::ITensorBlob &blob, Tensor< TParameterPrecision, TMemoryResource > &target, const shape_t &expected_shape)
 Load a tensor blob into a parameter tensor with validation.
void setExecutionContext (IExecutionContext *context)
 Set the execution context for this component.

Private Member Functions

dim_t computeNormalizedFeatureCount (const shape_t &input_shape) const
void createOperation ()
void initializeGradients ()
void initializeParameters (const shape_t &input_shape)
 Single parameter allocation routine.
void validateBuildContext (const BuildContext &context) const
void validateInputShape (const shape_t &input_shape) const

Private Attributes

std::shared_ptr< TensorType > bias_ { nullptr }
std::shared_ptr< TensorType > bias_grad_ { nullptr }
LayerNormConfig config_
std::unique_ptr< TensorType > input_grad_ { nullptr }
std::shared_ptr< UnaryOperation< TDeviceType, TPrecision > > operation_ { nullptr }
std::unique_ptr< TensorType > output_ { nullptr }
std::optional< TensorType > output_view_
std::unique_ptr< IExecutionContext > owned_exec_context_ { nullptr }
std::shared_ptr< TensorType > weight_ { nullptr }
std::shared_ptr< TensorType > weight_grad_ { nullptr }

Additional Inherited Members

Static Public Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
static constexpr DeviceType getDeviceType ()
 Compile-time device type for this component instance.
static constexpr TensorDataType getPrecision () noexcept
 Compile-time tensor precision for this component instance.
Protected Attributes inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
BuildContext build_context_ { shape_t{ 1 }, RuntimeMode::Training }
 The BuildContext stored at build time.

Detailed Description

template<DeviceType TDeviceType, TensorDataType TPrecision>
requires PrecisionSupportedOnDevice<TPrecision, TDeviceType>
class Mila::Dnn::LayerNorm< TDeviceType, TPrecision >

Device-templated Layer Normalization component.

Provides forward and backward APIs that operate on concrete Tensor types. Delegates heavy compute to a UnaryOperation backend. Parameters (weight/bias) and parameter gradients are owned by the component.

Constructor & Destructor Documentation

◆ LayerNorm()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::LayerNorm ( const std::string & name,
const LayerNormConfig & config,
std::optional< DeviceId > device_id = std::nullopt )
[inline, explicit, export]

Construct LayerNorm with optional ExecutionContext ownership.

Parameters
name       Component name (used for tensor names).
config     LayerNorm configuration (normalized_shape, axis, epsilon, bias).
device_id  If provided, the component creates and owns an ExecutionContext bound to this device; otherwise a parent must supply one before building.
Exceptions
std::invalid_argument  if the device type of the provided device_id does not match TDeviceType.
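
The device_id check described above can be sketched as follows. This is a minimal, self-contained illustration and not Mila's implementation: the DeviceType enumerators, the DeviceId fields, and the helper name validateDeviceId are all assumptions made for the example.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins; Mila's real DeviceType and DeviceId may differ.
enum class DeviceType { Cpu, Cuda };

struct DeviceId {
    DeviceType type;
    int index;
};

// Returns true when the component should create and own an execution
// context, false when a parent is expected to supply one before build().
// Throws std::invalid_argument when the id belongs to another device type.
template <DeviceType TDeviceType>
bool validateDeviceId(const std::optional<DeviceId>& device_id)
{
    if (!device_id.has_value()) {
        return false;  // no id given: context must come from a parent
    }
    if (device_id->type != TDeviceType) {
        throw std::invalid_argument("device_id type does not match template device type");
    }
    return true;  // id matches: component owns its context
}
```

Under these assumptions, validateDeviceId<DeviceType::Cpu>(std::nullopt) returns false, while passing a Cuda id to the Cpu instantiation throws.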

◆ ~LayerNorm()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::~LayerNorm ( )
[override, export, default]

Member Function Documentation

◆ backward()

template<DeviceType TDeviceType, TensorDataType TPrecision>
TensorType & Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::backward ( const TensorType & input,
const TensorType & output_grad )
[inline, export]

Run backward pass and return a reference to the component-owned input-gradient tensor.

The returned reference refers to a Tensor owned by this component. The backend operation_->backward will write/accumulate into the provided input-gradient tensor.

Preconditions:

  • Component must be built and in training mode.
  • Backend operation must be initialized.
  • Component-owned input-gradient buffer must be allocated (done during build or on training start).
Parameters
input        Original forward input tensor (device-bound).
output_grad  Gradient with respect to the component output (device-bound).
Returns
Reference to the component-owned input-gradient Tensor.
Exceptions
std::runtime_error  on precondition violations.
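
For orientation, the math a layer-normalization backward pass typically performs can be sketched on plain vectors. This is an illustrative CPU reference under stated assumptions, not the Mila backend: the struct LayerNormGrads and the function layerNormBackward are hypothetical names, rows are assumed contiguous with `features` values each, and gradients are accumulated with += to mirror the write/accumulate behaviour described above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical gradient bundle; Mila keeps these in component-owned tensors.
struct LayerNormGrads {
    std::vector<float> input_grad;   // same size as the input
    std::vector<float> weight_grad;  // one value per feature
    std::vector<float> bias_grad;    // one value per feature
};

// Accumulates gradients for y = weight * xhat + bias, where
// xhat = (x - mean) / sqrt(variance + epsilon) over each row.
void layerNormBackward(const std::vector<float>& input,
                       const std::vector<float>& output_grad,
                       const std::vector<float>& weight,
                       std::size_t features,
                       float epsilon,
                       LayerNormGrads& grads)
{
    const std::size_t rows = input.size() / features;
    const float n = static_cast<float>(features);

    for (std::size_t r = 0; r < rows; ++r) {
        const float* x = input.data() + r * features;
        const float* dy = output_grad.data() + r * features;
        float* dx = grads.input_grad.data() + r * features;

        // Recompute the forward statistics for this row.
        float mean = 0.0f;
        for (std::size_t i = 0; i < features; ++i) mean += x[i];
        mean /= n;
        float variance = 0.0f;
        for (std::size_t i = 0; i < features; ++i) {
            const float d = x[i] - mean;
            variance += d * d;
        }
        variance /= n;
        const float rstd = 1.0f / std::sqrt(variance + epsilon);

        // Two row-wise reductions over g = dy * weight.
        float sum_g = 0.0f;
        float sum_g_xhat = 0.0f;
        for (std::size_t i = 0; i < features; ++i) {
            const float xhat = (x[i] - mean) * rstd;
            const float g = dy[i] * weight[i];
            sum_g += g;
            sum_g_xhat += g * xhat;
        }

        for (std::size_t i = 0; i < features; ++i) {
            const float xhat = (x[i] - mean) * rstd;
            const float g = dy[i] * weight[i];
            grads.weight_grad[i] += dy[i] * xhat;
            grads.bias_grad[i] += dy[i];
            dx[i] += rstd * (g - sum_g / n - xhat * sum_g_xhat / n);
        }
    }
}
```

Because the sketch accumulates, the caller is expected to zero the gradient buffers first. A useful sanity check is that the input gradients of each row sum to approximately zero, since the normalization is invariant to shifting the row by a constant.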

◆ computeNormalizedFeatureCount()

template<DeviceType TDeviceType, TensorDataType TPrecision>
dim_t Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::computeNormalizedFeatureCount ( const shape_t & input_shape) const
[inline, export, private]

◆ createOperation()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::createOperation ( )
[inline, export, private]

◆ forward()

template<DeviceType TDeviceType, TensorDataType TPrecision>
TensorType & Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::forward ( const TensorType & input)
[inline, export]

Run forward pass and return a reference to the component-owned output tensor.

The returned reference refers to a Tensor owned by this component. The backend operation_->forward will write into the provided output tensor.

Preconditions:

  • Component must be built.
  • Backend operation must be initialized.
  • Component-owned output buffer must be allocated (done during build).
Parameters
input  Input Tensor bound to the component device.
Returns
Reference to the component-owned output Tensor.
Exceptions
std::runtime_error  on precondition violations.
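
As a reference for what the forward pass computes, here is a minimal CPU sketch of layer normalization over the last dimension. It is not the Mila backend: the function name layerNormForward is hypothetical, rows are assumed contiguous with `features` values each, and an empty bias vector is used to model a configuration without bias.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the backend forward kernel; not part of Mila.
// Normalizes each row to zero mean and unit variance, then applies the
// learned scale (weight) and, if present, the learned shift (bias).
std::vector<float> layerNormForward(const std::vector<float>& input,
                                    const std::vector<float>& weight,
                                    const std::vector<float>& bias,
                                    std::size_t features,
                                    float epsilon)
{
    std::vector<float> output(input.size());
    const std::size_t rows = input.size() / features;
    const float n = static_cast<float>(features);

    for (std::size_t r = 0; r < rows; ++r) {
        const float* x = input.data() + r * features;
        float* y = output.data() + r * features;

        float mean = 0.0f;
        for (std::size_t i = 0; i < features; ++i) mean += x[i];
        mean /= n;

        // Biased variance (divide by n), as is usual for layer norm.
        float variance = 0.0f;
        for (std::size_t i = 0; i < features; ++i) {
            const float d = x[i] - mean;
            variance += d * d;
        }
        variance /= n;

        const float rstd = 1.0f / std::sqrt(variance + epsilon);
        for (std::size_t i = 0; i < features; ++i) {
            const float shift = bias.empty() ? 0.0f : bias[i];
            y[i] = (x[i] - mean) * rstd * weight[i] + shift;
        }
    }
    return output;
}
```

With unit weights and no bias, each output row sums to approximately zero, which makes a convenient check when comparing against a device backend.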

◆ getDeviceId()

template<DeviceType TDeviceType, TensorDataType TPrecision>
DeviceId Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::getDeviceId ( ) const
[inline, override, export, virtual]

Get the compute device id associated with this component.

Must return the device on which parameters and operations execute.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getGradients()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::vector< ITensor * > Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::getGradients ( ) const
[inline, override, export, virtual]

Return non-owning pointers to parameter gradient tensors.

Only valid when isTrainingMode() is true.

Exceptions
std::runtime_error  if called when not in training mode or before the component has been built.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getMemoryStats()

template<DeviceType TDeviceType, TensorDataType TPrecision>
MemoryStats Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::getMemoryStats ( ) const
[inline, override, export, virtual]

Return the current memory allocation breakdown for this component.

Reflects allocations at the moment of the call. The returned stats naturally track the component lifecycle:

  • After construction: parameters only
  • After build( Inference ): parameters + T=1 state buffers
  • After build( Training ): parameters + T=full state buffers
  • After setEvaluation( false ): parameters + state + gradients

For CompositeComponent and Network, the returned stats are the recursive aggregate of all child components.

May be called at any time — no lifecycle preconditions.

Returns
MemoryStats reflecting current allocations.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getParameters()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::vector< ITensor * > Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::getParameters ( ) const
[inline, override, export, virtual]

Return non-owning pointers to parameter tensors.

The returned tensor pointers remain valid for the lifetime of the component. Order should be canonical (weights before biases).
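
The non-owning, weights-before-biases convention can be illustrated with a small stand-alone sketch. Everything here is a stand-in for the example: the ITensor interface below only mimics the name used in this reference, and FakeTensor and FakeLayerNorm are hypothetical types. The sketch also shows how a leaf component might derive its parameter count from the same tensors.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal stand-in interface; Mila's real ITensor is richer.
struct ITensor {
    virtual ~ITensor() = default;
    virtual std::size_t size() const = 0;
};

struct FakeTensor : ITensor {
    explicit FakeTensor(std::size_t n) : data(n, 0.0f) {}
    std::size_t size() const override { return data.size(); }
    std::vector<float> data;
};

// Hypothetical leaf component that owns a weight and an optional bias.
struct FakeLayerNorm {
    FakeLayerNorm(std::size_t features, bool has_bias)
        : weight_(std::make_shared<FakeTensor>(features)),
          bias_(has_bias ? std::make_shared<FakeTensor>(features) : nullptr) {}

    // Non-owning pointers in canonical order: weight first, then bias.
    std::vector<ITensor*> getParameters() const {
        std::vector<ITensor*> params;
        if (weight_) params.push_back(weight_.get());
        if (bias_) params.push_back(bias_.get());
        return params;
    }

    // Element count across all owned parameter tensors.
    std::size_t parameterCount() const {
        std::size_t count = 0;
        for (const ITensor* p : getParameters()) count += p->size();
        return count;
    }

    std::shared_ptr<FakeTensor> weight_;
    std::shared_ptr<FakeTensor> bias_;  // stays null when bias is disabled
};
```

The returned pointers stay valid only while the owning component is alive, so callers such as optimizers should not outlive it.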

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getType()

template<DeviceType TDeviceType, TensorDataType TPrecision>
const ComponentType Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::getType ( ) const
[inline, override, export, virtual]

Get the component type identifier.

Used for serialization and runtime type identification.

Returns
Component type enum value.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ initializeGradients()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::initializeGradients ( )
[inline, export, private]

◆ initializeParameters()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::initializeParameters ( const shape_t & input_shape)
[inline, export, private]

Single parameter allocation routine.

If input_shape is provided the allocator will compute channel count and outer_shape for axis-mode or normalized-shape-mode. If only normalized_shape is available, channels are computed from that shape and outer_shape is left empty.

Parameters
input_shape  Build-time input shape.
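
One plausible way to derive the channel count and outer_shape in normalized-shape mode is sketched below. This is an assumption about the general technique, not Mila's code: shape_t is modelled as a vector of sizes, NormLayout and splitForNormalizedShape are hypothetical names, and axis mode is not covered because its exact semantics are not stated here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using shape_t = std::vector<std::size_t>;  // stand-in; Mila's shape_t may differ

// Hypothetical result type: leading (outer) dimensions plus feature count.
struct NormLayout {
    shape_t outer_shape;
    std::size_t channels;
};

// Splits input_shape into outer dimensions and the trailing dimensions that
// must match normalized_shape; channels is the product of the latter.
NormLayout splitForNormalizedShape(const shape_t& input_shape,
                                   const shape_t& normalized_shape)
{
    if (normalized_shape.empty() || normalized_shape.size() > input_shape.size()) {
        throw std::invalid_argument("normalized_shape rank is incompatible with input shape");
    }
    const std::size_t outer_rank = input_shape.size() - normalized_shape.size();

    NormLayout layout{
        shape_t(input_shape.begin(),
                input_shape.begin() + static_cast<std::ptrdiff_t>(outer_rank)),
        1};

    for (std::size_t i = 0; i < normalized_shape.size(); ++i) {
        if (input_shape[outer_rank + i] != normalized_shape[i]) {
            throw std::invalid_argument("trailing input dimensions do not match normalized_shape");
        }
        layout.channels *= normalized_shape[i];
    }
    return layout;
}
```

For an input of shape [4, 128, 768] and normalized_shape [768], the sketch yields channels = 768 and outer_shape = [4, 128].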

◆ loadParameter()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::loadParameter ( const std::string & name,
const ITensorBlob & blob )
[inline, override, export, virtual]

Load a parameter from serialized tensor data.

Loads raw tensor bytes directly into an existing parameter tensor, handling precision conversion and device upload as needed.

The component validates that the blob's shape matches the parameter's expected shape, then delegates to the backend to perform:

  • Precision conversion (blob dtype → parameter dtype)
  • Device upload (CPU bytes → target device)
Parameters
name  Parameter name used to locate the target tensor.
blob  Serialized tensor metadata and raw bytes.
Exceptions
std::runtime_error  if the component has no parameters to load.
std::runtime_error  if the blob shape doesn't match the parameter shape.
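
The validate-then-convert flow can be sketched in isolation. The types below are hypothetical stand-ins (FakeBlob is not ITensorBlob, FakeParameter is not a Mila Tensor), the blob is assumed to hold raw double values, and the device-upload step is omitted because the target lives in host memory.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical serialized tensor: shape metadata plus raw bytes of doubles.
struct FakeBlob {
    std::vector<std::size_t> shape;
    std::vector<unsigned char> bytes;
};

// Hypothetical host-side parameter stored in single precision.
struct FakeParameter {
    std::vector<std::size_t> shape;
    std::vector<float> values;
};

// Validates the blob shape, then converts double -> float element by element.
void loadFromBlob(const std::string& name, const FakeBlob& blob, FakeParameter& target)
{
    if (blob.shape != target.shape) {
        throw std::runtime_error("shape mismatch for parameter '" + name + "'");
    }
    const std::size_t count = target.values.size();
    if (blob.bytes.size() != count * sizeof(double)) {
        throw std::runtime_error("byte size mismatch for parameter '" + name + "'");
    }
    for (std::size_t i = 0; i < count; ++i) {
        double value = 0.0;
        // memcpy avoids alignment assumptions about the raw byte buffer.
        std::memcpy(&value, blob.bytes.data() + i * sizeof(double), sizeof(double));
        target.values[i] = static_cast<float>(value);  // precision conversion
    }
}
```

Checking the shape before touching the bytes keeps a failed load from leaving the parameter partially overwritten.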

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ onBuilding()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::onBuilding ( const BuildContext & context)
[inline, override, export, protected, virtual]

Hook invoked during build() to initialize component with input shape.

Validates input shape, allocates parameters if needed, binds parameters to the backend operation, triggers backend build, and allocates the component-owned forward output and input-gradient tensors.

Output buffer — allocated at the full input shape.

LayerNorm is a general component with no knowledge of sequence dimensions or inference decode paths. The parent Network or Transformer is responsible for passing the correct input shape via BuildContext:

  • Training: full sequence shape, e.g. [B, T, features]
  • Inference: decode shape, e.g. [1, 1, features], for the decode path, or prefill shape, e.g. [1, T_chunk, features], for prefill

In all cases LayerNorm simply allocates at inputShape() — no special casing for inference or sequence dimensions.
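
To make the consequence of allocating at the full input shape concrete, the following sketch computes the output buffer size for a training shape and a decode shape. The helper names elementCount and outputBufferBytes are hypothetical, and shape_t is modelled as a vector of sizes for the example.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

using shape_t = std::vector<std::size_t>;  // stand-in; Mila's shape_t may differ

// Product of all dimensions; an empty shape is treated as a single element.
std::size_t elementCount(const shape_t& shape)
{
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}

// Bytes needed for an output buffer allocated at exactly the input shape.
std::size_t outputBufferBytes(const shape_t& input_shape, std::size_t element_size)
{
    return elementCount(input_shape) * element_size;
}
```

With 4-byte elements, a training shape of [4, 128, 768] needs 1,572,864 bytes, while a decode shape of [1, 1, 768] needs only 3,072 bytes, which is why the caller's choice of BuildContext shape matters.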

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ onExecutionContextSet()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::onExecutionContextSet ( )
[inline, override, export, protected, virtual]

Hook invoked after ExecutionContext is set.

Creates the backend operation and performs any eager parameter allocation if normalized_shape was supplied at construction time.

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ onTrainingModeChanging()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::onTrainingModeChanging ( TrainingMode training_mode)
[inline, override, export, protected, virtual]

Hook invoked when training mode is about to change.

Propagates training state to the backend operation and allocates or clears parameter gradient buffers as appropriate.
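
A minimal sketch of this allocate-on-training, release-on-inference pattern follows. It is an illustration under assumptions rather than Mila's implementation: FakeNorm and its members are hypothetical, a bool stands in for TrainingMode, and "clears" is interpreted here as releasing the buffers (zeroing them in place would be an equally valid reading).

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical component that owns parameter gradients only while training.
struct FakeNorm {
    explicit FakeNorm(std::size_t features) : features_(features) {}

    // Stand-in for the mode-change hook: allocate zeroed gradients when
    // entering training, release them when leaving.
    void setTraining(bool training) {
        if (training == training_) return;
        if (training) {
            weight_grad_ = std::make_unique<std::vector<float>>(features_, 0.0f);
            bias_grad_ = std::make_unique<std::vector<float>>(features_, 0.0f);
        } else {
            weight_grad_.reset();
            bias_grad_.reset();
        }
        training_ = training;
    }

    // Non-owning pointers to the gradients; only valid in training mode.
    std::vector<std::vector<float>*> getGradients() const {
        if (!training_) {
            throw std::runtime_error("getGradients requires training mode");
        }
        return {weight_grad_.get(), bias_grad_.get()};
    }

    std::size_t features_;
    bool training_ = false;
    std::unique_ptr<std::vector<float>> weight_grad_;
    std::unique_ptr<std::vector<float>> bias_grad_;
};
```

Tying gradient lifetime to the mode keeps inference-only deployments from paying for buffers they never use.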

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ parameterCount()

template<DeviceType TDeviceType, TensorDataType TPrecision>
size_t Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::parameterCount ( ) const
[inline, override, export, virtual]

Return number of trainable parameters.

For leaf components this is the element count of owned parameter tensors. CompositeComponent and Network implementations should return the recursive aggregate across all children.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ save_()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::save_ ( ModelArchive & archive,
SerializationMode mode ) const
[inline, override, export, virtual]

◆ synchronize()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::synchronize ( )
[inline, override, export, virtual]

Wait for outstanding device work submitted by this component.

On CPU this may be a no-op. Use to ensure results are visible to the host or to measure synchronous timings.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ toString()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::string Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::toString ( ) const
[inline, override, export, virtual]

Produce a short, human-readable description of the component.

Implementations should keep output concise and avoid throwing.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ validateBuildContext()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::validateBuildContext ( const BuildContext & context) const
[inline, export, private]

◆ validateInputShape()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::validateInputShape ( const shape_t & input_shape) const
[inline, export, private]

◆ zeroGradients()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::LayerNorm< TDeviceType, TPrecision >::zeroGradients ( )
[inline, override, export, virtual]

Clear all model-owned gradients for this component.

Default implementation is a no-op. Composite components should override to recurse to children. Leaf components should override to zero their parameter and activation gradients using device-aware helpers.

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

The documentation for this class was generated from the following file:
  • /__w/Mila/Mila/Mila/Src/Dnn/Components/Normalization/LayerNorm/LayerNorm.ixx