Mila
Deep Neural Network Library
Mila::Dnn::Gelu< TDeviceType, TDataType > Class Template Reference export

Gaussian Error Linear Unit (GELU) activation function module. More...

Inheritance diagram for Mila::Dnn::Gelu< TDeviceType, TDataType >:
Collaboration diagram for Mila::Dnn::Gelu< TDeviceType, TDataType >:

Public Types

using ModuleBase = Module< TDeviceType, TDataType, TDataType >
 Alias for base module type.
 
using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
 Memory resource type determined based on device type.
 
- Public Types inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
 

Public Member Functions

 Gelu (const std::string &device_name, const GeluConfig &config)
 Constructs a Gelu module using device name and configuration.
 
 Gelu (std::shared_ptr< DeviceContext > device_context, const GeluConfig &config)
 Constructs a Gelu module with an existing device context and configuration.
 
void backward (const Tensor< TDataType, MR > &input, const Tensor< TDataType, MR > &output_grad, Tensor< TDataType, MR > &input_grad)
 Performs backward propagation, computing gradients for GELU activation.
 
void forward (const Tensor< TDataType, MR > &input, Tensor< TDataType, MR > &output)
 Performs forward propagation through the GELU activation function.
 
GeluConfig::ApproximationMethod getApproximationMethod () const
 Returns the current approximation method used by this GELU instance.
 
void load (ModelArchive &archive) override
 Deserializes module state from a ZIP archive.
 
size_t parameterCount () const override
 Returns the number of trainable parameters in this module.
 
void save (ModelArchive &zip) const override
 Serializes module state to a ZIP archive.
 
std::string toString () const override
 Generates a string representation of this module's configuration.
 
- Public Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
 Module (const std::string &device_name, const ComponentConfig &config)
 Constructor with device name.
 
 Module (std::shared_ptr< DeviceContext > context, const ComponentConfig &config)
 Constructor with a specific device context.
 
virtual ~Module ()=default
 Virtual destructor for proper cleanup in derived classes.
 
std::shared_ptr< Compute::DeviceContext > getDeviceContext () const
 Get the device context for this module.
 
Compute::DeviceType getDeviceType () const
 Get the device type of the current device context.
 
std::string getName () const
 Get the name of the module.
 
const auto & getParameterTensors () const
 Get the parameter tensors of this module.
 
const ComputePrecision::Policy & getPrecision () const
 
const auto & getStateTensors () const
 Get the state tensors of this module.
 
bool isTraining () const
 Check if the module is in training mode.
 
virtual void setTraining (bool is_training)
 Set the training mode of this module.
 

Private Member Functions

void createOperation ()
 Initializes the appropriate GELU operation implementation.
 

Static Private Member Functions

static std::string approximationMethodToString (GeluConfig::ApproximationMethod method)
 Converts approximation method enum to human-readable string.
 

Private Attributes

GeluConfig config_
 Configuration for the GELU module.
 
std::shared_ptr< UnaryOperation< TDeviceType, TDataType, TDataType > > operation_ { nullptr }
 The underlying computational operation that implements GELU.
 
std::vector< std::shared_ptr< Tensor< TDataType, MR > > > output_state_
 Output state cache for backward propagation.
 
std::vector< std::shared_ptr< Tensor< TDataType, MR > > > parameters_
 Parameter tensors for the operation.
 
OperationAttributes properties_
 Additional attributes for operation customization.
 

Additional Inherited Members

- Protected Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
const std::string parametersToString () const
 Helper method to convert parameters to string representation.
 
const std::string stateToString () const
 Helper method to convert state tensors to string representation.
 
- Protected Attributes inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > parameter_map_ = {}
 Map of parameter names to parameter tensors.
 
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > state_map_ = {}
 Map of state names to state tensors.
 

Detailed Description

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
requires ValidFloatTensorType<TDataType>
class Mila::Dnn::Gelu< TDeviceType, TDataType >

Gaussian Error Linear Unit (GELU) activation function module.

GELU is defined mathematically as: GELU(x) = x · Φ(x)

Where Φ(x) is the cumulative distribution function of the standard normal distribution.

Three approximation methods are supported (configured via GeluConfig):

  1. Exact: Uses the error function - most accurate but computationally expensive
  2. Tanh: Fast approximation using tanh - GELU(x) ≈ 0.5x(1 + tanh(√(2/π)(x + 0.044715x³)))
  3. Sigmoid: Fast approximation using sigmoid - GELU(x) ≈ x · sigmoid(1.702x)

Note: Currently only the Tanh approximation is fully supported in the implementation.

Template Parameters
TDeviceType: Computing device type (CPU or CUDA)
TDataType: Floating-point data type for computations (e.g., float, half)

Member Typedef Documentation

◆ ModuleBase

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
using Mila::Dnn::Gelu< TDeviceType, TDataType >::ModuleBase = Module<TDeviceType, TDataType, TDataType>
export

Alias for base module type.

◆ MR

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
using Mila::Dnn::Gelu< TDeviceType, TDataType >::MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>
export

Memory resource type determined based on device type.

Automatically selects appropriate memory resource (CPU or CUDA) based on TDeviceType.

Constructor & Destructor Documentation

◆ Gelu() [1/2]

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
Mila::Dnn::Gelu< TDeviceType, TDataType >::Gelu ( const std::string &  device_name,
const GeluConfig &  config 
)
inline explicit export

Constructs a Gelu module using device name and configuration.

Creates a new DeviceContext internally using the provided device name. This constructor is useful for creating standalone modules without pre-existing device contexts.

Parameters
device_name: Device identifier string (e.g., "cpu", "cuda:0")
config: Configuration parameters for the GELU module
Exceptions
std::invalid_argument: If the device name is invalid or the configuration is invalid
std::runtime_error: If the device type doesn't match template parameter TDeviceType

◆ Gelu() [2/2]

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
Mila::Dnn::Gelu< TDeviceType, TDataType >::Gelu ( std::shared_ptr< DeviceContext >  device_context,
const GeluConfig &  config 
)
inline explicit export

Constructs a Gelu module with an existing device context and configuration.

Uses a pre-existing DeviceContext instance. This constructor is useful when integrating the module into a larger network that shares device contexts across modules.

Parameters
device_context: Shared pointer to an existing device context
config: Configuration parameters for the GELU module
Exceptions
std::invalid_argument: If device_context is null or the configuration is invalid
std::runtime_error: If the device context type doesn't match template parameter TDeviceType

Member Function Documentation

◆ approximationMethodToString()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
static std::string Mila::Dnn::Gelu< TDeviceType, TDataType >::approximationMethodToString ( GeluConfig::ApproximationMethod  method)
inline static export private

Converts approximation method enum to human-readable string.

Parameters
method: The approximation method to convert
Returns
String representation of the approximation method

◆ backward()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::backward ( const Tensor< TDataType, MR > &  input,
const Tensor< TDataType, MR > &  output_grad,
Tensor< TDataType, MR > &  input_grad 
)
inline export

Performs backward propagation, computing gradients for GELU activation.

Computes the gradient of the GELU function with respect to its inputs, which is needed for training via backpropagation.

The GELU derivative is: d/dx GELU(x) = Φ(x) + x · Φ'(x)

Where Φ'(x) is the derivative of the CDF, i.e. φ(x), the PDF of the standard normal distribution.

Parameters
input: Original input tensor from the forward pass
output_grad: Gradient tensor from the next layer (∂L/∂output)
input_grad: Output tensor to store the computed gradients (∂L/∂input)

◆ createOperation()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::createOperation ( )
inline export private

Initializes the appropriate GELU operation implementation.

Creates the device-specific operation implementation based on the template parameter TDeviceType and registers it with the operation registry.

The operation choice is determined at compile-time via constexpr branching.


◆ forward()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::forward ( const Tensor< TDataType, MR > &  input,
Tensor< TDataType, MR > &  output 
)
inline export

Performs forward propagation through the GELU activation function.

Applies the GELU transformation element-wise to each value in the input tensor. The specific approximation method used is determined by the GeluConfig setting.

Parameters
input: Input tensor to transform
output: Tensor where results will be stored (must be pre-allocated with matching dimensions)

◆ getApproximationMethod()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
GeluConfig::ApproximationMethod Mila::Dnn::Gelu< TDeviceType, TDataType >::getApproximationMethod ( ) const
inline export

Returns the current approximation method used by this GELU instance.

Returns
Current approximation method from GeluConfig::ApproximationMethod enum

◆ load()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::load ( ModelArchive &  archive)
inline override export virtual

Deserializes module state from a ZIP archive.

Implementation of the Module interface for deserialization. Since GELU has no learnable parameters, this is a no-op implementation.

Parameters
archive: ZIP archive for deserialization

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

◆ parameterCount()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
size_t Mila::Dnn::Gelu< TDeviceType, TDataType >::parameterCount ( ) const
inline override export virtual

Returns the number of trainable parameters in this module.

GELU is a parameterless activation function with no trainable weights.

Returns
Always returns 0

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

◆ save()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::save ( ModelArchive &  zip) const
inline override export virtual

Serializes module state to a ZIP archive.

Implementation of the Module interface for serialization. Since GELU has no learnable parameters, this is a no-op implementation.

Parameters
zip: ZIP archive for serialization

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

◆ toString()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::string Mila::Dnn::Gelu< TDeviceType, TDataType >::toString ( ) const
inline override export virtual

Generates a string representation of this module's configuration.

Returns
Formatted string with module name, device information, and approximation method

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.


Member Data Documentation

◆ config_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
GeluConfig Mila::Dnn::Gelu< TDeviceType, TDataType >::config_
export private

Configuration for the GELU module.

Stores the settings that define how the GELU function should be computed, particularly which approximation method to use.

◆ operation_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::shared_ptr<UnaryOperation<TDeviceType, TDataType, TDataType> > Mila::Dnn::Gelu< TDeviceType, TDataType >::operation_ { nullptr }
export private

The underlying computational operation that implements GELU.

This pointer is initialized based on the device type and configuration, providing the device-specific implementation of the GELU function.

◆ output_state_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::vector<std::shared_ptr<Tensor<TDataType, MR> > > Mila::Dnn::Gelu< TDeviceType, TDataType >::output_state_
export private

Output state cache for backward propagation.

Stores intermediate results from the forward pass that may be needed during backward propagation to efficiently compute gradients.

◆ parameters_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::vector<std::shared_ptr<Tensor<TDataType, MR> > > Mila::Dnn::Gelu< TDeviceType, TDataType >::parameters_
export private

Parameter tensors for the operation.

Empty for GELU since it has no trainable parameters, but required by the UnaryOperation interface.

◆ properties_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
OperationAttributes Mila::Dnn::Gelu< TDeviceType, TDataType >::properties_
export private

Additional attributes for operation customization.

Holds configuration values that might be needed by specific implementations of the GELU operation.

