Mila
Deep Neural Network Library
Mila::Dnn::Gelu< TDeviceType, TDataType > Class Template Reference export

Gaussian Error Linear Unit (GELU) activation function module. More...

Inheritance diagram for Mila::Dnn::Gelu< TDeviceType, TDataType >:
Collaboration diagram for Mila::Dnn::Gelu< TDeviceType, TDataType >:

Public Types

using ModuleBase = Module< TDeviceType, TDataType, TDataType >
 Alias for base module type.
 
using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
 Memory resource type determined based on device type.
 
- Public Types inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource >
 

Public Member Functions

 Gelu (const std::string &device_name, const GeluConfig &config)
 Constructs a Gelu module using device name and configuration.
 
 Gelu (std::shared_ptr< DeviceContext > device_context, const GeluConfig &config)
 Constructs a Gelu module with an existing device context and configuration.
 
void backward (const Tensor< TDataType, MR > &input, const Tensor< TDataType, MR > &output_grad, Tensor< TDataType, MR > &input_grad)
 Performs backward propagation, computing gradients for GELU activation.
 
void forward (const Tensor< TDataType, MR > &input, Tensor< TDataType, MR > &output)
 Performs forward propagation through the GELU activation function.
 
GeluConfig::ApproximationMethod getApproximationMethod () const
 Returns the current approximation method used by this GELU instance.
 
void load (ModelArchive &archive) override
 Deserializes module state from a ZIP archive.
 
size_t parameterCount () const override
 Returns the number of trainable parameters in this module.
 
void save (ModelArchive &zip) const override
 Serializes module state to a ZIP archive.
 
std::string toString () const override
 Generates a string representation of this module's configuration.
 
- Public Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
 Module (const std::string &device_name, const ComponentConfig &config)
 Constructor with device name.
 
 Module (std::shared_ptr< DeviceContext > context, const ComponentConfig &config)
 Constructor with a specific device context.
 
virtual ~Module ()=default
 Virtual destructor for proper cleanup in derived classes.
 
std::shared_ptr< Compute::DeviceContext > getDeviceContext () const
 Get the device context for this module.
 
Compute::DeviceType getDeviceType () const
 Get the device type of the current device context.
 
std::string getName () const
 Get the name of the module.
 
const auto & getParameterTensors () const
 Get the parameter tensors of this module.
 
const ComputePrecision::Policy & getPrecision () const
 
const auto & getStateTensors () const
 Get the state tensors of this module.
 
bool isTraining () const
 Check if the module is in training mode.
 
virtual void setTraining (bool is_training)
 Set the training mode of this module.
 

Private Member Functions

void createOperation ()
 Initializes the appropriate GELU operation implementation.
 

Static Private Member Functions

static std::string approximationMethodToString (GeluConfig::ApproximationMethod method)
 Converts approximation method enum to human-readable string.
 

Private Attributes

GeluConfig config_
 Configuration for the GELU module.
 
std::shared_ptr< UnaryOperation< TDeviceType, TDataType, TDataType > > operation_ { nullptr }
 The underlying computational operation that implements GELU.
 
std::vector< std::shared_ptr< Tensor< TDataType, MR > > > output_state_
 Output state cache for backward propagation.
 
std::vector< std::shared_ptr< Tensor< TDataType, MR > > > parameters_
 Parameter tensors for the operation.
 
OperationAttributes properties_
 Additional attributes for operation customization.
 

Additional Inherited Members

- Protected Member Functions inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
const std::string parametersToString () const
 Helper method to convert parameters to string representation.
 
const std::string stateToString () const
 Helper method to convert state tensors to string representation.
 
- Protected Attributes inherited from Mila::Dnn::Module< TDeviceType, TInput, TOutput >
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > parameter_map_ = {}
 Map of parameter names to parameter tensors.
 
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > state_map_ = {}
 Map of state names to state tensors.
 

Detailed Description

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
requires ValidFloatTensorType<TDataType>
class Mila::Dnn::Gelu< TDeviceType, TDataType >

Gaussian Error Linear Unit (GELU) activation function module.

GELU is defined mathematically as: GELU(x) = x · Φ(x)

Where Φ(x) is the cumulative distribution function of the standard normal distribution.

Three approximation methods are supported (configured via GeluConfig):

  1. Exact: Uses the error function - most accurate but computationally expensive
  2. Tanh: Fast approximation using tanh - GELU(x) ≈ 0.5x(1 + tanh(√(2/π)(x + 0.044715x³)))
  3. Sigmoid: Fast approximation using sigmoid - GELU(x) ≈ x · sigmoid(1.702x)

Note: Currently only the Tanh approximation is fully supported in the implementation.

Template Parameters
TDeviceType: Computing device type (CPU or CUDA)
TDataType: Floating-point data type for computations (e.g., float, half)

Member Typedef Documentation

◆ ModuleBase

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
using Mila::Dnn::Gelu< TDeviceType, TDataType >::ModuleBase = Module<TDeviceType, TDataType, TDataType>
export

Alias for base module type.

◆ MR

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
using Mila::Dnn::Gelu< TDeviceType, TDataType >::MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>
export

Memory resource type determined based on device type.

Automatically selects appropriate memory resource (CPU or CUDA) based on TDeviceType.

Constructor & Destructor Documentation

◆ Gelu() [1/2]

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
Mila::Dnn::Gelu< TDeviceType, TDataType >::Gelu ( const std::string &  device_name,
const GeluConfig &  config 
)
inline explicit export

Constructs a Gelu module using device name and configuration.

Creates a new DeviceContext internally using the provided device name. This constructor is useful for creating standalone modules without pre-existing device contexts.

Parameters
device_name: Device identifier string (e.g., "cpu", "cuda:0")
config: Configuration parameters for the GELU module
Exceptions
std::invalid_argument: If the device name is invalid or the configuration is invalid
std::runtime_error: If the device type doesn't match template parameter TDeviceType

◆ Gelu() [2/2]

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
Mila::Dnn::Gelu< TDeviceType, TDataType >::Gelu ( std::shared_ptr< DeviceContext >  device_context,
const GeluConfig &  config 
)
inline explicit export

Constructs a Gelu module with an existing device context and configuration.

Uses a pre-existing DeviceContext instance. This constructor is useful when integrating the module into a larger network that shares device contexts across modules.

Parameters
device_context: Shared pointer to an existing device context
config: Configuration parameters for the GELU module
Exceptions
std::invalid_argument: If device_context is null or the configuration is invalid
std::runtime_error: If the device context type doesn't match template parameter TDeviceType

Member Function Documentation

◆ approximationMethodToString()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
static std::string Mila::Dnn::Gelu< TDeviceType, TDataType >::approximationMethodToString ( GeluConfig::ApproximationMethod  method)
inline static export private

Converts approximation method enum to human-readable string.

Parameters
method: The approximation method to convert
Returns
String representation of the approximation method

◆ backward()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::backward ( const Tensor< TDataType, MR > &  input,
const Tensor< TDataType, MR > &  output_grad,
Tensor< TDataType, MR > &  input_grad 
)
inline export

Performs backward propagation, computing gradients for GELU activation.

Computes the gradient of the GELU function with respect to its inputs, which is needed for training via backpropagation.

The GELU derivative is: d/dx GELU(x) = Φ(x) + x · Φ'(x)

Where Φ'(x) is the derivative of the CDF, i.e. φ(x), the PDF of the standard normal distribution.

Parameters
input: Original input tensor from the forward pass
output_grad: Gradient tensor from the next layer (∂L/∂output)
input_grad: Output tensor to store the computed gradients (∂L/∂input)

◆ createOperation()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::createOperation ( )
inline export private

Initializes the appropriate GELU operation implementation.

Creates the device-specific operation implementation based on the template parameter TDeviceType and registers it with the operation registry.

The operation choice is determined at compile-time via constexpr branching.


◆ forward()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::forward ( const Tensor< TDataType, MR > &  input,
Tensor< TDataType, MR > &  output 
)
inline export

Performs forward propagation through the GELU activation function.

Applies the GELU transformation element-wise to each value in the input tensor. The specific approximation method used is determined by the GeluConfig setting.

Parameters
input: Input tensor to transform
output: Tensor where results will be stored (must be pre-allocated with matching dimensions)

◆ getApproximationMethod()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
GeluConfig::ApproximationMethod Mila::Dnn::Gelu< TDeviceType, TDataType >::getApproximationMethod ( ) const
inline export

Returns the current approximation method used by this GELU instance.

Returns
Current approximation method from GeluConfig::ApproximationMethod enum

◆ load()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::load ( ModelArchive &  archive)
inline override export virtual

Deserializes module state from a ZIP archive.

Implementation of the Module interface for deserialization. Since GELU has no learnable parameters, this is a no-op implementation.

Parameters
archive: ZIP archive for deserialization

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

◆ parameterCount()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
size_t Mila::Dnn::Gelu< TDeviceType, TDataType >::parameterCount ( ) const
inline override export virtual

Returns the number of trainable parameters in this module.

GELU is a parameterless activation function with no trainable weights.

Returns
Always returns 0

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

◆ save()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
void Mila::Dnn::Gelu< TDeviceType, TDataType >::save ( ModelArchive &  zip) const
inline override export virtual

Serializes module state to a ZIP archive.

Implementation of the Module interface for serialization. Since GELU has no learnable parameters, this is a no-op implementation.

Parameters
zip: ZIP archive for serialization

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.

◆ toString()

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::string Mila::Dnn::Gelu< TDeviceType, TDataType >::toString ( ) const
inline override export virtual

Generates a string representation of this module's configuration.

Returns
Formatted string with module name, device information, and approximation method

Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.


Member Data Documentation

◆ config_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
GeluConfig Mila::Dnn::Gelu< TDeviceType, TDataType >::config_
export private

Configuration for the GELU module.

Stores the settings that define how the GELU function should be computed, particularly which approximation method to use.

◆ operation_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::shared_ptr<UnaryOperation<TDeviceType, TDataType, TDataType> > Mila::Dnn::Gelu< TDeviceType, TDataType >::operation_ { nullptr }
export private

The underlying computational operation that implements GELU.

This pointer is initialized based on the device type and configuration, providing the device-specific implementation of the GELU function.

◆ output_state_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::vector<std::shared_ptr<Tensor<TDataType, MR> > > Mila::Dnn::Gelu< TDeviceType, TDataType >::output_state_
export private

Output state cache for backward propagation.

Stores intermediate results from the forward pass that may be needed during backward propagation to efficiently compute gradients.

◆ parameters_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
std::vector<std::shared_ptr<Tensor<TDataType, MR> > > Mila::Dnn::Gelu< TDeviceType, TDataType >::parameters_
export private

Parameter tensors for the operation.

Empty for GELU since it has no trainable parameters, but required by the UnaryOperation interface.

◆ properties_

template<DeviceType TDeviceType = DeviceType::Cuda, typename TDataType = float>
OperationAttributes Mila::Dnn::Gelu< TDeviceType, TDataType >::properties_
export private

Additional attributes for operation customization.

Holds configuration values that might be needed by specific implementations of the GELU operation.

