Mila 0.13.48
Deep Neural Network Library
CUDA implementation of Layer Normalization.


Public Types | |
| using | CudaExecutionContext = ExecutionContext<DeviceType::Cuda> |
| using | MR = CudaDeviceMemoryResource |
| using | NativeType = typename Mila::Dnn::Compute::Cuda::TensorDataTypeMap<TPrecision>::device_type |
| using | TensorType = Tensor<TPrecision, MR> |
| using | UnaryOperationBase = UnaryOperation<DeviceType::Cuda, TPrecision> |
| Public Types inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision > | |
| using | MR |
| using | TensorInputType |
| using | TensorOutputType |
| Public Types inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| using | DataTypeTraits |
Public Member Functions | |
| CudaLayerNormOp (IExecutionContext *context, const LayerNormConfig &config) | |
| void | backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const override |
| Execute backward pass (hot path). | |
| void | build (const BuildContext &config) override |
| Prepare operation for execution with concrete input shape. | |
| void | forward (const ITensor &input, ITensor &output) const override |
| Execute forward pass (hot path). | |
| const LayerNormConfig & | getConfig () const |
| std::string | getName () const override |
| Human-readable operation name. | |
| OperationType | getOperationType () const override |
| Operation type identifier. | |
| void | setGradients (ITensor *weight_grad, ITensor *bias_grad) override |
| Bind component-owned parameter gradient tensors for training. | |
| void | setParameters (ITensor *weight, ITensor *bias) override |
| Bind component-owned parameter tensors. | |
| Public Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision > | |
| virtual | ~UnaryOperation ()=default |
| Public Member Functions inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| virtual | ~Operation ()=default |
| virtual void | clearGradients () noexcept |
| Clear any cached gradient pointers held by the operation. | |
| virtual TensorDataType | getDataType () const |
| Tensor data type for this operation. | |
| virtual DeviceType | getDeviceType () const |
| Device type for this operation. | |
| virtual std::size_t | getStateMemorySize () const |
| Returns the number of bytes of state memory allocated by this operation. | |
| virtual bool | isBuilt () const |
| Whether build() completed successfully for a concrete input shape. | |
| virtual bool | isEvalMode () const |
| Query whether operation is configured for evaluation (inference) mode. | |
| virtual void | setTrainingMode (TrainingMode training_mode) |
| Configure operation training-mode behavior. | |
Private Member Functions | |
| void | computeRuntimePartition_ (const shape_t &input_shape, int64_t &norm_axis, int &outer_size, int &inner_size, int &norm_dim, int64_t &num_slices, int64_t &normalized_features) const |
| void | validateNormalizedShape_ (const shape_t &input_shape) const |
| void | validateRuntimeShape_ (const shape_t &input_shape) const |
Private Attributes | |
| NativeType * | bias_ { nullptr } |
| NativeType * | bias_grad_ { nullptr } |
| LayerNormConfig | config_ |
| CudaExecutionContext * | context_ |
| Detail::cuda_layernorm_impl< NativeType > | impl_ |
| int | max_inner_size_ { 0 } |
| shape_t | max_input_shape_ |
| int | max_norm_dim_ { 0 } |
| dim_t | max_num_slices_ { 0 } |
| int | max_outer_size_ { 0 } |
| NativeType * | mean_ { nullptr } |
| std::shared_ptr< TensorType > | mean_tensor_ |
| int64_t | norm_axis_ { -1 } |
| NativeType * | rstd_ { nullptr } |
| std::shared_ptr< TensorType > | rstd_tensor_ |
| NativeType * | weight_ { nullptr } |
| NativeType * | weight_grad_ { nullptr } |
| int64_t | weight_size_ { 0 } |
Additional Inherited Members | |
| Static Public Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| static constexpr TensorDataType | data_type |
| static constexpr DeviceType | device_type |
| Static Protected Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision > | |
| static const TensorInputType & | asInputTensor (const ITensor &t) |
| static TensorOutputType & | asOutputTensor (ITensor &t) |
| Protected Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| bool | is_built_ |
| TrainingMode | training_mode_ |
CUDA implementation of Layer Normalization.
Normalizes activations along a specified axis by computing mean and variance, then applying an affine transformation with learnable weight and bias parameters.
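For a single normalized slice x_1, ..., x_N (the N values along the normalization axis), the computation described above can be written as follows. This is the standard layer-norm formulation, offered as a sketch: the stabilizing constant ε and the use of the biased (population) variance are assumptions not stated on this page. μ and r correspond to the mean and rstd values that are cached for backward().

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad
\sigma^2 = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu)^2, \qquad
r = \frac{1}{\sqrt{\sigma^2 + \epsilon}} \\
\hat{x}_n &= (x_n - \mu)\, r, \qquad
y_n = w_n\, \hat{x}_n + b_n
\end{aligned}
```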
Design philosophy:
Template Parameters
| TPrecision | Abstract tensor precision (FP32, FP16, etc.) |
| using Mila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOp< TPrecision >::CudaExecutionContext = ExecutionContext<DeviceType::Cuda> |
| using Mila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOp< TPrecision >::MR = CudaDeviceMemoryResource |
| using Mila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOp< TPrecision >::NativeType = typename Mila::Dnn::Compute::Cuda::TensorDataTypeMap<TPrecision>::device_type |
| using Mila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOp< TPrecision >::TensorType = Tensor<TPrecision, MR> |
| using Mila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOp< TPrecision >::UnaryOperationBase = UnaryOperation<DeviceType::Cuda, TPrecision> |
CudaLayerNormOp (IExecutionContext *context, const LayerNormConfig &config) [inline]

void backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const [inline, override, virtual]
Execute backward pass (hot path).
Computes the input gradient and accumulates the parameter gradients, reusing the forward-pass statistics (mean, rstd) cached by forward().
Parameters
| input | Original forward-pass input (required for gradient computation) |
| output_grad | Gradient of loss with respect to output |
| input_grad | Gradient of loss with respect to input (computed) |
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >.
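The gradient math behind backward() can be illustrated with a minimal CPU reference. This sketch assumes the standard layer-norm gradient, a contiguous [outer_size, norm_dim, inner_size] layout, and one cached mean/rstd value per slice; the function name layernorm_backward_ref and its signature are hypothetical and not part of Mila. With g_n = dy_n * w_n and xhat_n = (x_n - mean) * rstd, it computes dx_n = rstd * (g_n - mean(g) - xhat_n * mean(g * xhat)). It overwrites dx and accumulates (+=) into dweight and dbias; whether the CUDA kernels also accumulate into input_grad is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the LayerNorm backward pass; not Mila's API.
// Layout assumption: x, dy, dx are contiguous [outer, norm_dim, inner];
// mean/rstd hold one value per slice s = o * inner + i (cached by forward).
void layernorm_backward_ref(const std::vector<float>& x,
                            const std::vector<float>& dy,
                            const std::vector<float>& mean,
                            const std::vector<float>& rstd,
                            const std::vector<float>& weight,
                            std::vector<float>& dx,
                            std::vector<float>& dweight,  // accumulated (+=)
                            std::vector<float>& dbias,    // accumulated (+=)
                            int outer, int norm_dim, int inner) {
    assert(norm_dim > 0 && static_cast<int>(weight.size()) == norm_dim);
    const std::size_t total = static_cast<std::size_t>(outer) * norm_dim * inner;
    assert(x.size() == total && dy.size() == total && dx.size() == total);

    for (int o = 0; o < outer; ++o) {
        for (int i = 0; i < inner; ++i) {
            const std::size_t s = static_cast<std::size_t>(o) * inner + i;
            // Flat index of element n along the normalized axis of this slice.
            auto at = [&](int n) {
                return (static_cast<std::size_t>(o) * norm_dim + n) * inner + i;
            };

            // First pass: slice-wide reductions, plus parameter gradients.
            float sum_g = 0.0f, sum_g_xhat = 0.0f;
            for (int n = 0; n < norm_dim; ++n) {
                const float xhat = (x[at(n)] - mean[s]) * rstd[s];
                const float g = dy[at(n)] * weight[n];  // dL/dxhat
                sum_g += g;
                sum_g_xhat += g * xhat;
                dweight[n] += dy[at(n)] * xhat;
                dbias[n] += dy[at(n)];
            }
            const float mean_g = sum_g / norm_dim;
            const float mean_g_xhat = sum_g_xhat / norm_dim;

            // Second pass: dx = rstd * (g - mean(g) - xhat * mean(g * xhat)).
            for (int n = 0; n < norm_dim; ++n) {
                const float xhat = (x[at(n)] - mean[s]) * rstd[s];
                const float g = dy[at(n)] * weight[n];
                dx[at(n)] = rstd[s] * (g - mean_g - xhat * mean_g_xhat);
            }
        }
    }
}
```

A useful sanity check: within each slice the entries of dx sum to zero, because the normalized output is unchanged when a constant is added to every input in the slice.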

void build (const BuildContext &config) [inline, override, virtual]
Prepare operation for execution with concrete input shape.
Cold-path initialization: computes normalization axis, partitions tensor dimensions, and allocates forward-pass statistics storage.
Dimension partitioning:
Example: For shape [2, 3, 4, 5] with axis=2:
Forward-pass statistics (mean, rstd) are allocated with size outer_size * inner_size to store one mean/rstd value per normalized slice.
Parameters
| input_shape | Shape of input tensor to be normalized |
Exceptions
| std::runtime_error | If parameters have not been bound via setParameters() |
| std::invalid_argument | If the input shape is incompatible with the configuration |
| std::invalid_argument | If the computed normalization axis is out of range |
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
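The dimension partitioning can be sketched as follows, assuming the common convention that outer_size is the product of the dimensions before the normalization axis, norm_dim is the extent of that axis, and inner_size is the product of the dimensions after it (consistent with the outer_size * inner_size statistics count described for build()), and that negative axes count from the end. The Partition struct and partition_shape function are hypothetical; this page does not show how computeRuntimePartition_ is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the partitioning performed in build(); the field
// names mirror computeRuntimePartition_'s outputs, but this is not Mila code.
struct Partition {
    std::int64_t norm_axis;   // axis after resolving negative values
    std::int64_t outer_size;  // product of dims before the axis
    std::int64_t norm_dim;    // extent of the normalized axis
    std::int64_t inner_size;  // product of dims after the axis
    std::int64_t num_slices;  // outer_size * inner_size (one mean/rstd each)
};

Partition partition_shape(const std::vector<std::int64_t>& shape, std::int64_t axis) {
    const auto rank = static_cast<std::int64_t>(shape.size());
    if (axis < 0) axis += rank;  // assumption: -1 selects the last axis
    if (axis < 0 || axis >= rank)
        throw std::invalid_argument("normalization axis out of range");

    Partition p{axis, 1, shape[static_cast<std::size_t>(axis)], 1, 0};
    for (std::int64_t d = 0; d < axis; ++d)
        p.outer_size *= shape[static_cast<std::size_t>(d)];
    for (std::int64_t d = axis + 1; d < rank; ++d)
        p.inner_size *= shape[static_cast<std::size_t>(d)];
    p.num_slices = p.outer_size * p.inner_size;

    // Sanity check: every element maps to exactly one (slice, position) pair.
    std::int64_t total = 1;
    for (std::int64_t dim : shape) total *= dim;
    assert(p.num_slices * p.norm_dim == total);
    return p;
}
```

Under these assumptions, shape [2, 3, 4, 5] with axis=2 gives outer_size = 6, norm_dim = 4, inner_size = 5, and 30 slices, so 30 mean and 30 rstd values are stored.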

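void computeRuntimePartition_ (const shape_t &input_shape, int64_t &norm_axis, int &outer_size, int &inner_size, int &norm_dim, int64_t &num_slices, int64_t &normalized_features) const [inline, private]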


void forward (const ITensor &input, ITensor &output) const [inline, override, virtual]
Execute forward pass (hot path).
Computes the normalized output and caches the forward-pass statistics (mean, rstd) that backward() needs for gradient computation.
Parameters
| input | Input tensor to normalize |
| output | Normalized output tensor (same shape as input) |
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >.
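To make the cached statistics concrete, here is a minimal CPU reference for forward(). It assumes a contiguous [outer_size, norm_dim, inner_size] layout, the biased variance, and a stabilizing constant eps; the function layernorm_forward_ref is hypothetical, not part of Mila, and says nothing about how the CUDA kernels are actually organized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the LayerNorm forward pass; not Mila's API.
// Layout assumption: x and y are contiguous [outer, norm_dim, inner];
// one mean/rstd value is cached per slice s = o * inner + i for backward.
void layernorm_forward_ref(const std::vector<float>& x,
                           const std::vector<float>& weight,
                           const std::vector<float>& bias,  // empty = no bias
                           std::vector<float>& y,
                           std::vector<float>& mean,
                           std::vector<float>& rstd,
                           int outer, int norm_dim, int inner, float eps) {
    assert(norm_dim > 0 && static_cast<int>(weight.size()) == norm_dim);
    for (int o = 0; o < outer; ++o) {
        for (int i = 0; i < inner; ++i) {
            const std::size_t s = static_cast<std::size_t>(o) * inner + i;
            // Flat index of element n along the normalized axis of this slice.
            auto at = [&](int n) {
                return (static_cast<std::size_t>(o) * norm_dim + n) * inner + i;
            };

            float m = 0.0f;
            for (int n = 0; n < norm_dim; ++n) m += x[at(n)];
            m /= norm_dim;

            float var = 0.0f;  // biased (population) variance
            for (int n = 0; n < norm_dim; ++n) {
                const float d = x[at(n)] - m;
                var += d * d;
            }
            var /= norm_dim;

            const float r = 1.0f / std::sqrt(var + eps);
            mean[s] = m;  // cached for backward
            rstd[s] = r;  // cached for backward

            // Normalize, then apply the affine transform.
            for (int n = 0; n < norm_dim; ++n) {
                const float b = bias.empty() ? 0.0f : bias[n];
                y[at(n)] = (x[at(n)] - m) * r * weight[n] + b;
            }
        }
    }
}
```

For x laid out as {1, 10, 3, 30} with outer_size = 1, norm_dim = 2, inner_size = 2 and eps = 0, the two slices are {1, 3} and {10, 30}; the sketch caches mean = {2, 20} and rstd = {1, 0.1}, and produces y = {-1, -1, 1, 1} with unit weight and no bias.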

const LayerNormConfig & getConfig () const [inline]
std::string getName () const [inline, override, virtual]
Human-readable operation name.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
OperationType getOperationType () const [inline, override, virtual]
Operation type identifier.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
void setGradients (ITensor *weight_grad, ITensor *bias_grad) [inline, override, virtual]
Bind component-owned parameter gradient tensors for training.
Caches native device gradient pointers that backward() writes to. The weight gradient is required; the bias gradient is optional, depending on the configuration.
Parameters
| weight_grad | Gradient accumulator for weight parameter (required) |
| bias_grad | Gradient accumulator for bias parameter (optional) |
Exceptions
| std::invalid_argument | If weight_grad is null or not a CUDA tensor |
| std::invalid_argument | If bias_grad is required by the configuration but is null or not a CUDA tensor |
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

void setParameters (ITensor *weight, ITensor *bias) [inline, override, virtual]
Bind component-owned parameter tensors.
Caches native device pointers for zero-overhead hot-path access. The weight is required; the bias is optional, depending on the configuration.
Parameters
| weight | Scaling parameter applied after normalization (required) |
| bias | Shift parameter applied after normalization (optional) |
Exceptions
| std::invalid_argument | If weight is null or not a CUDA tensor |
| std::invalid_argument | If bias is required by the configuration but is null or not a CUDA tensor |
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

void validateNormalizedShape_ (const shape_t &input_shape) const [inline, private]


void validateRuntimeShape_ (const shape_t &input_shape) const [inline, private]

