Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Gelu< TDeviceType, TPrecision > Class Template Reference [export]

Gaussian Error Linear Unit (GELU) activation component.


Public Types

using ComponentBase = Component<TDeviceType, TPrecision>
using MR = typename DeviceTypeTraits<TDeviceType>::memory_resource
using TensorType = Tensor<TPrecision, MR>

Public Member Functions

 Gelu (const std::string &name, const GeluConfig &config, std::optional< DeviceId > device_id=std::nullopt)
 ~Gelu () override=default
TensorType & backward (const TensorType &input, const TensorType &output_grad)
 Compute gradients with respect to the component input.
TensorType & forward (const TensorType &input)
 Run the forward computation for this GELU component.
ApproximationMethod getApproximationMethod () const
 Return the configured GELU approximation method.
DeviceId getDeviceId () const override
 Get the device identifier for this component.
std::vector< ITensor * > getGradients () const override
 Get parameter gradient tensors.
MemoryStats getMemoryStats () const override
 Return memory allocation breakdown.
std::vector< ITensor * > getParameters () const override
 Get trainable parameter tensors.
const ComponentType getType () const override
 Get the component type identifier.
size_t parameterCount () const override
 Number of trainable parameters.
void save_ (ModelArchive &archive, SerializationMode mode) const override
 Persist component state to archive.
void synchronize () override
 Wait for all asynchronous work submitted by this component to complete.
std::string toString () const override
 Generate human-readable description of the component.
Public Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
 Component (const std::string &name)
 Construct component with required name identifier.
virtual ~Component ()=default
virtual void build (const BuildContext &context) final
 Build the component with the provided BuildContext (canonical overload).
const std::string getName () const
 Get the component's name identifier.
virtual std::vector< std::string > getParameterNames () const
 List all available parameter names for this component.
RuntimeMode getRuntimeMode () const noexcept
 Get the current RuntimeMode of this Component.
TrainingMode getTrainingMode () const noexcept
 The current runtime behavioral mode of this Component.
virtual bool isBuilt () const final
 Returns true if build() has completed successfully.
bool isInferenceMode () const noexcept
 Convenience accessor: true if currently in Eval (inference) mode.
bool isTrainingMode () const noexcept
virtual void loadParameter (const std::string &name, const Serialization::ITensorBlob &blob)
 Load a parameter from serialized tensor data.
void setTrainingMode (TrainingMode mode)
 Set the runtime behavioral mode for this Component.
virtual void zeroGradients ()
 Clear all model-owned gradients for this component.

Static Public Member Functions

static std::unique_ptr< Gelu > fromArchive_ (ModelArchive &archive, const std::string &component_name, IExecutionContext *exec_context)
Static Public Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
static constexpr DeviceType getDeviceType ()
 Compile-time device type for this component instance.
static constexpr TensorDataType getPrecision () noexcept
 Compile-time tensor precision for this component instance.

Protected Member Functions

void onBuilding (const BuildContext &build_context) override
 Hook invoked during build() to initialize backend operation.
void onExecutionContextSet () override
 Hook invoked after ExecutionContext is set.
void onTrainingModeChanging (TrainingMode training_mode) override
 Hook invoked when training mode changes.
Protected Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
IExecutionContext * getExecutionContext () const
 Get the shared execution context.
bool hasExecutionContext () const noexcept
 Check if execution context has been set.
template<TensorDataType TParameterPrecision, typename TMemoryResource>
void loadParameterFromBlob (const std::string &param_name, const Serialization::ITensorBlob &blob, Tensor< TParameterPrecision, TMemoryResource > &target, const shape_t &expected_shape)
 Load a tensor blob into a parameter tensor with validation.
void setExecutionContext (IExecutionContext *context)
 Set the execution context for this component.

Private Types

using OpType = typename OperationTraits<OperationType::GeluOp, TDeviceType, TPrecision>::type

Private Member Functions

void createOperation ()
 Create backend UnaryOperation from OperationRegistry.

Static Private Member Functions

static void validateMetadata_ (const SerializationMetadata &meta, const std::string &component_name)
 Validate metadata from archive during deserialization.

Private Attributes

GeluConfig config_
std::unique_ptr< TensorType > input_grad_ { nullptr }
std::shared_ptr< OpType > operation_ { nullptr }
std::unique_ptr< TensorType > output_ { nullptr }
std::optional< TensorType > output_view_
std::unique_ptr< IExecutionContext > owned_exec_context_ { nullptr }

Additional Inherited Members

Protected Attributes inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
BuildContext build_context_ { shape_t{ 1 }, RuntimeMode::Training }
 The BuildContext stored at build time.

Detailed Description

template<DeviceType TDeviceType, TensorDataType TPrecision>
requires PrecisionSupportedOnDevice<TPrecision, TDeviceType>
class Mila::Dnn::Gelu< TDeviceType, TPrecision >

Gaussian Error Linear Unit (GELU) activation component.

Device-templated GELU component that performs forward (and optionally backward) computation by delegating to a registered device-specific UnaryOperation implementation found via the OperationRegistry.

Template Parameters
TDeviceType: Compile-time device identifier (DeviceType::Cpu or DeviceType::Cuda).
TPrecision: Tensor data precision used by this component.

Construction Modes:

  • Standalone mode: Construct with DeviceId to create and own an ExecutionContext. The component manages the context lifetime and uses it for operation execution.
  • Shared mode: Construct without DeviceId; parent (Network/CompositeComponent) provides ExecutionContext via setExecutionContext() after construction.
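The two construction modes differ only in who owns the execution context. The sketch below models that ownership rule in isolation, using hypothetical stand-in types (ExecContext, ActivationSketch) rather than Mila's IExecutionContext or Gelu; it is an illustration under those assumptions, not the library's implementation.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for an execution context (not Mila's IExecutionContext).
struct ExecContext {
    int device_index;
};

// Hypothetical component illustrating standalone vs shared ownership.
class ActivationSketch {
public:
    // Standalone mode when a device index is given: create and own a context.
    // Shared mode otherwise: wait for a parent to call setExecutionContext().
    explicit ActivationSketch(std::optional<int> device_index = std::nullopt) {
        if (device_index) {
            owned_ = std::make_unique<ExecContext>(ExecContext{ *device_index });
            setExecutionContext(owned_.get());
        }
    }

    void setExecutionContext(ExecContext* ctx) {
        if (ctx == nullptr) throw std::invalid_argument("null execution context");
        context_ = ctx;  // non-owning; the parent keeps ownership in shared mode
    }

    bool hasExecutionContext() const noexcept { return context_ != nullptr; }
    bool ownsContext() const noexcept { return owned_ != nullptr; }

    int deviceIndex() const {
        if (!context_) throw std::runtime_error("execution context not set");
        return context_->device_index;
    }

private:
    std::unique_ptr<ExecContext> owned_;  // populated only in standalone mode
    ExecContext* context_ = nullptr;      // the context actually used for execution
};
```

In standalone mode the owned context is destroyed together with the component; in shared mode the parent must outlive the component, since only a raw pointer is kept.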

Ownership:

Preconditions:

  • The component's build(const BuildContext&) must complete before forward() is called so that the backend operation is fully initialized.
  • Shared mode requires parent to call setExecutionContext() before build().

Behavior:

  • Stateless: no trainable parameters; parameterCount() returns 0. save_() and fromArchive_() are minimal but include template metadata so a loader can validate the instantiation parameters.
  • forward() delegates to the backend UnaryOperation. The caller is responsible for providing device-compatible input tensors.
  • backward() computes input gradients for GELU activation (no parameter gradients).

Threading / Synchronization:

  • Component does not guarantee thread-safety; call synchronize() to wait for outstanding device work to complete when needed.

Constructor & Destructor Documentation

◆ Gelu()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::Gelu< TDeviceType, TPrecision >::Gelu ( const std::string & name,
const GeluConfig & config,
std::optional< DeviceId > device_id = std::nullopt )
[inline] [explicit] [export]

◆ ~Gelu()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::Gelu< TDeviceType, TPrecision >::~Gelu ( )
[override] [export] [default]

Member Function Documentation

◆ backward()

template<DeviceType TDeviceType, TensorDataType TPrecision>
TensorType & Mila::Dnn::Gelu< TDeviceType, TPrecision >::backward ( const TensorType & input,
const TensorType & output_grad )
[inline] [export]

Compute gradients with respect to the component input.

Delegates to the backend UnaryOperation::backward implementation to compute the gradient of GELU with respect to the input. The component owns the input gradient buffer, which is allocated during onBuilding().

The gradient computation follows the chain rule, applied elementwise: ∂L/∂input = ∂L/∂output · ∂GELU(input)/∂input
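For the exact form GELU(x) = x·Φ(x), where Φ is the standard normal CDF and φ its density, the local derivative is Φ(x) + x·φ(x). The scalar reference below is a minimal sketch of that chain rule, assuming the exact (erf-based) approximation; the function names geluExactGrad and geluBackward are hypothetical and do not correspond to Mila's backend kernels, which may use the tanh approximation instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Local derivative of exact GELU, gelu(x) = x * Phi(x):
//   d/dx gelu(x) = Phi(x) + x * phi(x)
double geluExactGrad(double x) {
    const double pi = 3.14159265358979323846;
    const double cdf = 0.5 * (1.0 + std::erf(x / std::sqrt(2.0)));   // Phi(x)
    const double pdf = std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);  // phi(x)
    return cdf + x * pdf;
}

// Hypothetical reference backward pass:
//   input_grad[i] = output_grad[i] * dGELU(input[i]) / dinput[i]
std::vector<double> geluBackward(const std::vector<double>& input,
                                 const std::vector<double>& output_grad) {
    assert(input.size() == output_grad.size());
    std::vector<double> input_grad(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        input_grad[i] = output_grad[i] * geluExactGrad(input[i]);
    }
    return input_grad;
}
```

A central finite difference of the forward formula is a cheap way to check such a derivative against a backend implementation.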

Parameters
input: Const reference to the original forward input.
output_grad: Const reference to the gradient w.r.t. the component output (∂L/∂output).
Returns
Reference to the component-owned tensor containing the computed input gradient.
Exceptions
std::runtime_error if component has not been built via build().
std::runtime_error if component is not in training mode.
std::runtime_error if operation backend is not initialized.
Note
GELU has no parameters, so no parameter gradients are computed.
The implementation may accumulate into the returned tensor (backend-dependent).
Training mode must be enabled via setTrainingMode() before gradients can be computed.

◆ createOperation()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::createOperation ( )
[inline] [export] [private]

Create backend UnaryOperation from OperationRegistry.

Called by onExecutionContextSet() hook. Looks up "GeluOp" in the OperationRegistry and creates a device-specific implementation.

Exceptions
std::runtime_error if operation creation fails.
std::runtime_error if "GeluOp" is not registered for this device/precision.
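The lookup-then-create step can be pictured as a factory map keyed by (operation name, device, precision). The sketch below is a minimal model under that assumption; OpRegistry, UnaryOp, Device, and Precision are hypothetical stand-ins, and Mila's actual OperationRegistry interface may differ. It reproduces the two failure modes listed above: an unregistered key and a factory that returns nothing.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-ins; not Mila's real enums or operation base class.
enum class Device { Cpu, Cuda };
enum class Precision { FP32, FP16 };

struct UnaryOp {
    virtual ~UnaryOp() = default;
    virtual float apply(float x) const = 0;
};

class OpRegistry {
public:
    using Key = std::tuple<std::string, Device, Precision>;
    using Factory = std::function<std::unique_ptr<UnaryOp>()>;

    void add(const std::string& name, Device d, Precision p, Factory f) {
        factories_[Key{ name, d, p }] = std::move(f);
    }

    // Throws if nothing is registered for (name, device, precision),
    // or if the registered factory fails to produce an operation.
    std::unique_ptr<UnaryOp> create(const std::string& name, Device d, Precision p) const {
        auto it = factories_.find(Key{ name, d, p });
        if (it == factories_.end()) {
            throw std::runtime_error("'" + name + "' is not registered for this device/precision");
        }
        std::unique_ptr<UnaryOp> op = it->second();
        if (!op) throw std::runtime_error("operation creation failed for '" + name + "'");
        return op;
    }

private:
    std::map<Key, Factory> factories_;
};
```

Keying on device and precision lets one name ("GeluOp") map to several device-specific implementations, which is what allows the component template to stay device-agnostic.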

◆ forward()

template<DeviceType TDeviceType, TensorDataType TPrecision>
TensorType & Mila::Dnn::Gelu< TDeviceType, TPrecision >::forward ( const TensorType & input)
[inline] [export]

Run the forward computation for this GELU component.

Dispatches to the device-specific UnaryOperation backend via a cached output view. The view is pre-initialized at build time and rebuilt only when the input shape changes (e.g. prefill vs decode). No heap allocation occurs in steady state.
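The caching strategy amounts to allocating storage once for the build-time shape and recomputing only the view's extent when the input shape changes. The sketch below models that idea with hypothetical types (Shape, View, CachedOutputView) standing in for Mila's shape_t and tensor views; it is an illustration under those assumptions, not the component's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::int64_t>;  // hypothetical stand-in for shape_t

// Non-owning window onto preallocated storage.
struct View {
    float* data;
    std::size_t size;
};

// Hypothetical sketch: storage is sized once for the largest build-time shape;
// only the view's extent is recomputed when the input shape changes.
class CachedOutputView {
public:
    explicit CachedOutputView(std::size_t max_elements) : storage_(max_elements) {}

    View viewFor(const Shape& shape) {
        if (!has_shape_ || shape != current_shape_) {
            std::size_t n = 1;
            for (std::int64_t d : shape) n *= static_cast<std::size_t>(d);
            if (n > storage_.size()) {
                throw std::length_error("input shape exceeds build-time capacity");
            }
            current_shape_ = shape;  // may allocate, but only on a shape change
            current_size_ = n;
            has_shape_ = true;
            ++rebuilds_;
        }
        // Same shape as last call: no allocation, same backing pointer.
        return View{ storage_.data(), current_size_ };
    }

    int rebuilds() const noexcept { return rebuilds_; }

private:
    std::vector<float> storage_;
    Shape current_shape_;
    std::size_t current_size_ = 0;
    bool has_shape_ = false;
    int rebuilds_ = 0;
};
```

Alternating between a prefill shape and a decode shape rebuilds the view each time the shape switches, while repeated calls with the same shape reuse it unchanged.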

Parameters
input: Const reference to the input tensor.
Returns
Reference to the cached output view.
Exceptions
std::runtime_error if component has not been built via build().

◆ fromArchive_()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::unique_ptr< Gelu > Mila::Dnn::Gelu< TDeviceType, TPrecision >::fromArchive_ ( ModelArchive & archive,
const std::string & component_name,
IExecutionContext * exec_context )
[inline] [static] [export]

◆ getApproximationMethod()

template<DeviceType TDeviceType, TensorDataType TPrecision>
ApproximationMethod Mila::Dnn::Gelu< TDeviceType, TPrecision >::getApproximationMethod ( ) const
[inline] [export]

Return the configured GELU approximation method.

Returns
Configured GeluConfig::ApproximationMethod value (Exact or Tanh).
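The two methods correspond to the standard GELU formulas: Exact computes x·Φ(x) via erf, and Tanh uses the widely used approximation 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). The scalar sketch below shows both; GeluApprox and gelu are hypothetical names that only mirror the Exact/Tanh choice, not Mila's ApproximationMethod enum or kernels.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical mirror of the Exact/Tanh choice (not Mila's enum).
enum class GeluApprox { Exact, Tanh };

double gelu(double x, GeluApprox method) {
    const double pi = 3.14159265358979323846;
    if (method == GeluApprox::Exact) {
        // x * Phi(x), with Phi the standard normal CDF expressed via erf
        return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0)));
    }
    // Tanh approximation: cheaper than erf on some hardware, slightly less accurate
    const double k = std::sqrt(2.0 / pi);
    return 0.5 * x * (1.0 + std::tanh(k * (x + 0.044715 * x * x * x)));
}
```

Near x = 1 the two forms differ by roughly 1.5e-4, so switching methods between training and inference can cause small numerical drift.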

◆ getDeviceId()

template<DeviceType TDeviceType, TensorDataType TPrecision>
DeviceId Mila::Dnn::Gelu< TDeviceType, TPrecision >::getDeviceId ( ) const
[inline] [override] [export] [virtual]

Get the device identifier for this component.

Returns the DeviceId from the ExecutionContext. In standalone mode, this is the device specified at construction. In shared mode, this is the parent's device.

Returns
DeviceId indicating device type and index.
Exceptions
std::runtime_error if ExecutionContext has not been set.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ getGradients()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::vector< ITensor * > Mila::Dnn::Gelu< TDeviceType, TPrecision >::getGradients ( ) const
[inline] [override] [export] [virtual]

Get parameter gradient tensors.

GELU has no trainable parameters and therefore no parameter gradients.

Returns
Empty vector.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getMemoryStats()

template<DeviceType TDeviceType, TensorDataType TPrecision>
MemoryStats Mila::Dnn::Gelu< TDeviceType, TPrecision >::getMemoryStats ( ) const
[inline] [override] [export] [virtual]

Return memory allocation breakdown.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getParameters()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::vector< ITensor * > Mila::Dnn::Gelu< TDeviceType, TPrecision >::getParameters ( ) const
[inline] [override] [export] [virtual]

Get trainable parameter tensors.

GELU has no trainable parameters.

Returns
Empty vector.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getType()

template<DeviceType TDeviceType, TensorDataType TPrecision>
const ComponentType Mila::Dnn::Gelu< TDeviceType, TPrecision >::getType ( ) const
[inline] [override] [export] [virtual]

Get the component type identifier.

Used for serialization and runtime type identification.

Returns
Component type enum value.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ onBuilding()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::onBuilding ( const BuildContext & build_context)
[inline] [override] [export] [protected] [virtual]

Hook invoked during build() to initialize backend operation.

Delegates shape-dependent initialization to the backend UnaryOperation. Invoked by build(), which must complete before forward() or backward() is called.

State guards:

  • Expects ExecutionContext to be set (required for operation creation)
  • Expects operation to be initialized (created in onExecutionContextSet)
  • Expects component to be unbuilt (guaranteed by Component::build)
Parameters
build_context: Build context carrying the expected shape for input tensors.
Exceptions
std::invalid_argument if the input shape is incompatible with the component configuration.
std::runtime_error if backend allocation or build fails.
std::runtime_error if operation is not initialized.

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ onExecutionContextSet()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::onExecutionContextSet ( )
[inline] [override] [export] [protected] [virtual]

Hook invoked after ExecutionContext is set.

Called by Component::setExecutionContext() after the context is registered. Creates the backend UnaryOperation using the OperationRegistry.

This hook is triggered in two scenarios:

  • Standalone mode: immediately in the constructor, after the owned context is created
  • Shared mode: When parent calls setExecutionContext() after construction

State guards:

Exceptions
std::runtime_error if operation creation fails.
std::runtime_error if "GeluOp" is not registered for this device/precision.

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ onTrainingModeChanging()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::onTrainingModeChanging ( TrainingMode training_mode)
[inline] [override] [export] [protected] [virtual]

Hook invoked when training mode changes.

Propagates the training mode to the backend operation. Called by Component::setTrainingMode() with the training mutex held.

State guards:

  • Expects operation to be initialized (should be created in onExecutionContextSet)
  • Can be called before or after build()
Parameters
training_mode: New training mode.
Note
Do not call setTrainingMode() from this hook (reentrancy is prohibited).
If operation is not initialized, silently returns (may occur during construction).

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ parameterCount()

template<DeviceType TDeviceType, TensorDataType TPrecision>
size_t Mila::Dnn::Gelu< TDeviceType, TPrecision >::parameterCount ( ) const
[inline] [override] [export] [virtual]

Number of trainable parameters.

GELU is stateless and exposes no trainable parameters.

Returns
0

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ save_()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::save_ ( ModelArchive & archive,
SerializationMode mode ) const
[inline] [override] [export] [virtual]

Persist component state to archive.

GELU is stateless (no trainable tensors) but persists:

  • Component type ("Gelu") and version (1)
  • Component name from config
  • Template parameters (device type and precision) for loader validation
  • Serialized GeluConfig (approximation method)

Files written:

Parameters
archive: Archive to write to.
mode: Serialization mode (currently unused; all state is always saved).

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ synchronize()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::synchronize ( )
[inline] [override] [export] [virtual]

Wait for all asynchronous work submitted by this component to complete.

Synchronizes the underlying ExecutionContext. On CPU implementations this may be a no-op. Use to ensure results are visible on the host or to measure synchronous timings.

Exceptions
std::runtime_error if ExecutionContext has not been set.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ toString()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::string Mila::Dnn::Gelu< TDeviceType, TPrecision >::toString ( ) const
[inline] [override] [export] [virtual]

Generate human-readable description of the component.

Produces a multi-line string showing:

Returns
Formatted string representation.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ validateMetadata_()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Gelu< TDeviceType, TPrecision >::validateMetadata_ ( const SerializationMetadata & meta,
const std::string & component_name )
[inline] [static] [export] [private]

Validate metadata from archive during deserialization.

Verifies:

  • Version is 1
  • Type is "Gelu"
  • Device type matches TDeviceType
  • Precision matches TPrecision
Parameters
meta: Parsed metadata.
component_name: Component name for error messages.
Exceptions
std::runtime_error if validation fails.
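The four checks can be expressed as a template that compares archived values against the instantiation's compile-time parameters and throws on the first mismatch. The sketch below assumes a hypothetical Metadata record and hypothetical Device/Precision enums; Mila's SerializationMetadata stores these fields in its own format, so only the control flow is meant to carry over.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for Mila's device/precision enums and archived metadata.
enum class Device { Cpu, Cuda };
enum class Precision { FP32, FP16 };

struct Metadata {
    int version;
    std::string type;
    Device device;
    Precision precision;
};

// Compares archived metadata with this instantiation's template parameters;
// throws std::runtime_error naming the component on the first mismatch.
template <Device TDevice, Precision TPrecision>
void validateMetadata(const Metadata& meta, const std::string& component_name) {
    const std::string prefix = "Gelu '" + component_name + "': ";
    if (meta.version != 1)
        throw std::runtime_error(prefix + "unsupported version " + std::to_string(meta.version));
    if (meta.type != "Gelu")
        throw std::runtime_error(prefix + "expected type 'Gelu', got '" + meta.type + "'");
    if (meta.device != TDevice)
        throw std::runtime_error(prefix + "device type does not match this instantiation");
    if (meta.precision != TPrecision)
        throw std::runtime_error(prefix + "precision does not match this instantiation");
}
```

Because the expected device and precision are template arguments, a CPU/FP32 loader rejects an archive written by a CUDA instantiation before any backend operation is created.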

The documentation for this class was generated from the following file:
  • /__w/Mila/Mila/Mila/Src/Dnn/Components/Activations/Gelu/Gelu.ixx