Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Softmax< TDeviceType, TPrecision > Class Template Reference [export]

Softmax activation module (device-templated).


Public Types

using ComponentBase = Component<TDeviceType, TPrecision>
using MR = typename DeviceTypeTraits<TDeviceType>::memory_resource
using TensorType = Tensor<TPrecision, MR>

Public Member Functions

 Softmax (const std::string &name, const SoftmaxConfig &config, std::optional< DeviceId > device_id=std::nullopt)
 ~Softmax () override=default
void backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad)
 Backward pass - delegates to backend operation.
void forward (const ITensor &input, ITensor &output)
 Forward pass - delegates to backend operation.
int64_t getAxis () const noexcept
 Get the softmax axis.
DeviceId getDeviceId () const override
 Get the device identifier for this module.
std::vector< ITensor * > getGradients () const override
 Get parameter gradient tensors.
MemoryStats getMemoryStats () const override
 Return the current memory allocation breakdown for this component.
std::vector< ITensor * > getParameters () const override
 Get trainable parameter tensors.
const ComponentType getType () const override
 Get the component type identifier.
size_t parameterCount () const override
 Number of trainable parameters.
void save_ (ModelArchive &archive, SerializationMode mode) const override
 Persist module state to archive.
void synchronize () override
 Wait for all asynchronous work submitted by this module to complete.
std::string toString () const override
 Generate human-readable description of the module.
Public Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
 Component (const std::string &name)
 Construct component with required name identifier.
virtual ~Component ()=default
virtual void build (const BuildContext &context) final
 Build the component with the provided BuildContext (canonical overload).
const std::string getName () const
 Get the component's name identifier.
virtual std::vector< std::string > getParameterNames () const
 List all available parameter names for this component.
RuntimeMode getRuntimeMode () const noexcept
 Convenience accessor — true if currently in Eval mode.
TrainingMode getTrainingMode () const noexcept
 The current runtime behavioral mode of this Component.
virtual bool isBuilt () const final
 Returns true if build() has completed successfully.
bool isInferenceMode () const noexcept
bool isTrainingMode () const noexcept
virtual void loadParameter (const std::string &name, const Serialization::ITensorBlob &blob)
 Load a parameter from serialized tensor data.
void setTrainingMode (TrainingMode mode)
 Set the runtime behavioral mode for this Component.
virtual void zeroGradients ()
 Clear all model-owned gradients for this component.

Protected Member Functions

void onBuilding (const BuildContext &build_config) override
 Hook invoked during build() to initialize component with input shape.
void onExecutionContextSet () override
 Hook invoked after ExecutionContext is set.
void onTrainingModeChanging (TrainingMode training_mode) override
 Hook invoked when training mode changes.
Protected Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
IExecutionContext * getExecutionContext () const
 Get the shared execution context.
bool hasExecutionContext () const noexcept
 Check if execution context has been set.
template<TensorDataType TParameterPrecision, typename TMemoryResource>
void loadParameterFromBlob (const std::string &param_name, const Serialization::ITensorBlob &blob, Tensor< TParameterPrecision, TMemoryResource > &target, const shape_t &expected_shape)
 Load a tensor blob into a parameter tensor with validation.
void setExecutionContext (IExecutionContext *context)
 Set the execution context for this component.

Private Types

using OpType = typename OperationTraits<OperationType::SoftmaxOp, TDeviceType, TPrecision>::type

Private Member Functions

void createOperation ()
 Create the backend compute operation.
void validateInputShape (const ITensor &input) const
 Validate input shape for softmax operation.
void validateInputShape (const shape_t &input_shape) const
 Validate input shape for softmax operation.

Private Attributes

SoftmaxConfig config_
std::shared_ptr< OpType > operation_ { nullptr }
std::unique_ptr< IExecutionContext > owned_exec_context_ { nullptr }

Additional Inherited Members

Static Public Member Functions inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
static constexpr DeviceType getDeviceType ()
 Compile-time device type for this component instance.
static constexpr TensorDataType getPrecision () noexcept
 Compile-time tensor precision for this component instance.
Protected Attributes inherited from Mila::Dnn::Component< TDeviceType, TPrecision >
BuildContext build_context_ { shape_t{ 1 }, RuntimeMode::Training }
 The BuildContext stored at build time.

Detailed Description

template<DeviceType TDeviceType, TensorDataType TPrecision>
requires PrecisionSupportedOnDevice<TPrecision, TDeviceType>
class Mila::Dnn::Softmax< TDeviceType, TPrecision >

Softmax activation module (device-templated).

Delegates computation to a device-specific UnaryOperation implementation registered in the OperationRegistry.

Softmax is a stateless activation function with no trainable parameters. The operation computes: softmax(x) = exp(x - max(x)) / sum(exp(x - max(x))) across a specified axis.
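
The formula above can be sketched for a single 1-D vector as follows. This is a minimal illustration only: `softmax1d` is a hypothetical helper, not part of the Mila API, and the real backend operation works on tensors rather than `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Mila API): numerically stable softmax of a 1-D vector.
std::vector<float> softmax1d(const std::vector<float>& x) {
    assert(!x.empty());
    // Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    const float x_max = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - x_max);
        sum += y[i];
    }
    // Normalize so the outputs sum to 1.
    for (float& v : y) v /= sum;
    return y;
}
```

Subtracting max(x) does not change the result mathematically, since the common factor cancels between numerator and denominator; it only guards against overflow for large inputs.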

Construction Modes:

  • Standalone mode: Construct with DeviceId to create and own an ExecutionContext. The component manages the context lifetime and uses it for operation execution.
  • Shared mode: Construct without DeviceId; parent (Network/CompositeComponent) provides ExecutionContext via setExecutionContext() after construction.
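
The two construction modes can be pictured with a small self-contained sketch. Everything here is a hypothetical stand-in (`FakeContext`, `FakeComponent`, `setContext`), chosen only to mirror the pattern described above; it is not the Mila implementation and makes no claim about how Mila stores its context internally.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical stand-in for an execution context; not a Mila type.
struct FakeContext { int device_index; };

// Hypothetical component showing the standalone vs. shared pattern.
class FakeComponent {
public:
    // Standalone mode when a device index is given, shared mode otherwise.
    explicit FakeComponent(std::optional<int> device_index = std::nullopt) {
        if (device_index) {
            owned_ = std::make_unique<FakeContext>(FakeContext{*device_index});
            setContext(owned_.get());
        }
    }

    // Shared mode: the parent passes in a context whose lifetime it manages.
    void setContext(FakeContext* ctx) { context_ = ctx; }

    bool hasContext() const noexcept { return context_ != nullptr; }
    bool ownsContext() const noexcept { return owned_ != nullptr; }
    int deviceIndex() const { assert(context_); return context_->device_index; }

private:
    std::unique_ptr<FakeContext> owned_;   // set only in standalone mode
    FakeContext* context_ = nullptr;       // context used for execution
};
```

In this sketch a component built with a device index is usable immediately, while one built without it has no context until the parent calls `setContext()`.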

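Ownership: In standalone mode the component owns its ExecutionContext (owned_exec_context_). In shared mode the context is provided by the parent.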

Template Parameters
TDeviceType  Device type (DeviceType::Cpu or DeviceType::Cuda)
TPrecision  Abstract tensor precision (TensorDataType)

Constructor & Destructor Documentation

◆ Softmax()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::Softmax< TDeviceType, TPrecision >::Softmax ( const std::string & name,
const SoftmaxConfig & config,
std::optional< DeviceId > device_id = std::nullopt )
inline explicit export

◆ ~Softmax()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::Softmax< TDeviceType, TPrecision >::~Softmax ( )
override export default

Member Function Documentation

◆ backward()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::backward ( const ITensor & input,
const ITensor & output_grad,
ITensor & input_grad )
inline export

Backward pass - delegates to backend operation.

Computes gradient: dX = Y * (dY - dot(Y, dY)) where Y is the softmax output.
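
As a rough illustration of the gradient formula above for one 1-D slice, assuming the softmax output Y is already available: `softmaxBackward1d` is a hypothetical helper and not the backend operation Mila dispatches to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Mila API): dX = Y * (dY - dot(Y, dY)) for one slice.
std::vector<float> softmaxBackward1d(const std::vector<float>& y,
                                     const std::vector<float>& dy) {
    assert(y.size() == dy.size());
    // dot(Y, dY) is a single scalar shared by every element of the slice.
    float dot = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i) dot += y[i] * dy[i];

    std::vector<float> dx(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) dx[i] = y[i] * (dy[i] - dot);
    return dx;
}
```

Because the entries of Y sum to 1, the entries of dX always sum to 0, and a constant upstream gradient yields a zero input gradient.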


◆ createOperation()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::createOperation ( )
inline export private

Create the backend compute operation.

Uses the shared ExecutionContext from the base class to request a device-specific UnaryOperation from the OperationRegistry.


◆ forward()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::forward ( const ITensor & input,
ITensor & output )
inline export

Forward pass - delegates to backend operation.

Computes softmax activation across the configured axis.
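
One common way to apply softmax along an arbitrary axis is to view the tensor as outer × axis × inner and normalize each strided slice. The sketch below assumes a contiguous row-major float buffer; `softmaxAlongAxis` and its parameters are hypothetical and do not describe Mila's actual backend kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Mila API): softmax along `axis` of a row-major buffer.
void softmaxAlongAxis(const std::vector<float>& in, std::vector<float>& out,
                      const std::vector<std::size_t>& shape, std::size_t axis) {
    assert(axis < shape.size());
    // Collapse the dimensions before and after the axis into two extents.
    std::size_t outer = 1, inner = 1;
    for (std::size_t i = 0; i < axis; ++i) outer *= shape[i];
    for (std::size_t i = axis + 1; i < shape.size(); ++i) inner *= shape[i];
    const std::size_t n = shape[axis];
    assert(n > 0 && in.size() == outer * n * inner);
    out.resize(in.size());

    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t k = 0; k < inner; ++k) {
            // Elements of one slice are `inner` apart in memory.
            const std::size_t base = o * n * inner + k;
            float m = in[base];
            for (std::size_t j = 1; j < n; ++j) m = std::max(m, in[base + j * inner]);
            float sum = 0.0f;
            for (std::size_t j = 0; j < n; ++j) {
                const std::size_t idx = base + j * inner;
                out[idx] = std::exp(in[idx] - m);
                sum += out[idx];
            }
            for (std::size_t j = 0; j < n; ++j) out[base + j * inner] /= sum;
        }
    }
}
```

For a 2×2 input, axis 1 normalizes each row and axis 0 normalizes each column; the outer and inner extents depend only on the shape and axis, so they can be computed once and reused across calls.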


◆ getAxis()

template<DeviceType TDeviceType, TensorDataType TPrecision>
int64_t Mila::Dnn::Softmax< TDeviceType, TPrecision >::getAxis ( ) const
inline export noexcept

Get the softmax axis.

Returns
The axis along which softmax is computed.

◆ getDeviceId()

template<DeviceType TDeviceType, TensorDataType TPrecision>
DeviceId Mila::Dnn::Softmax< TDeviceType, TPrecision >::getDeviceId ( ) const
inline override export virtual

Get the device identifier for this module.

Returns the DeviceId from the ExecutionContext. In standalone mode, this is the device specified at construction. In shared mode, this is the parent's device.

Returns
DeviceId indicating device type and index.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ getGradients()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::vector< ITensor * > Mila::Dnn::Softmax< TDeviceType, TPrecision >::getGradients ( ) const
inline override export virtual

Get parameter gradient tensors.

Softmax has no trainable parameters, therefore no gradients.

Returns
Empty vector.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getMemoryStats()

template<DeviceType TDeviceType, TensorDataType TPrecision>
MemoryStats Mila::Dnn::Softmax< TDeviceType, TPrecision >::getMemoryStats ( ) const
inline override export virtual

Return the current memory allocation breakdown for this component.

Reflects allocations at the moment of the call. The returned stats naturally track the component lifecycle:

  • After construction: parameters only
  • After build( Inference ): parameters + T=1 state buffers
  • After build( Training ): parameters + T=full state buffers
  • After setEvaluation( false ): parameters + state + gradients

For CompositeComponent and Network, the returned stats are the recursive aggregate of all child components.

May be called at any time — no lifecycle preconditions.

Returns
MemoryStats reflecting current allocations.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getParameters()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::vector< ITensor * > Mila::Dnn::Softmax< TDeviceType, TPrecision >::getParameters ( ) const
inline override export virtual

Get trainable parameter tensors.

Softmax has no trainable parameters.

Returns
Empty vector.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ getType()

template<DeviceType TDeviceType, TensorDataType TPrecision>
const ComponentType Mila::Dnn::Softmax< TDeviceType, TPrecision >::getType ( ) const
inline override export virtual

Get the component type identifier.

Used for serialization and runtime type identification.

Returns
Component type enum value.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ onBuilding()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::onBuilding ( const BuildContext & build_config)
inline override export protected virtual

Hook invoked during build() to initialize component with input shape.

Softmax is stateless and has no parameters to allocate. This method validates the input shape and delegates to the backend operation's build method to cache dimension computations.

Parameters
build_config  Build context providing the expected shape for input tensors.
Exceptions
std::invalid_argument  if the input shape is invalid or the axis is out of bounds.
std::runtime_error  if the backend build fails.

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ onExecutionContextSet()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::onExecutionContextSet ( )
inline override export protected virtual

Hook invoked after ExecutionContext is set.

Called by Component::setExecutionContext() after the context is registered. Creates the backend UnaryOperation using the OperationRegistry.

This hook is triggered in two scenarios:

  • Standalone mode: Immediately in constructor after owned context creation
  • Shared mode: When parent calls setExecutionContext() after construction
Exceptions
std::runtime_error  if operation creation fails.

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ onTrainingModeChanging()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::onTrainingModeChanging ( TrainingMode training_mode)
inline override export protected virtual

Hook invoked when training mode changes.

Propagates training mode to the backend operation. Called by Component::setTrainingMode() with the training mutex held.

Parameters
training_mode  New training mode state.
Note
Do not call setTrainingMode() from this hook (reentrancy prohibited).

Reimplemented from Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ parameterCount()

template<DeviceType TDeviceType, TensorDataType TPrecision>
size_t Mila::Dnn::Softmax< TDeviceType, TPrecision >::parameterCount ( ) const
inline override export virtual

Number of trainable parameters.

Softmax is stateless and exposes no trainable parameters.

Returns
0

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ save_()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::save_ ( ModelArchive & archive,
SerializationMode mode ) const
inline override export virtual

Persist module state to archive.

Softmax is stateless (no trainable tensors) but persists:

  • Module type and version metadata
  • Configuration (axis)
Parameters
archive  Archive to write to.
mode  Serialization mode (currently unused for stateless components).

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.

◆ synchronize()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::synchronize ( )
inline override export virtual

Wait for all asynchronous work submitted by this module to complete.

Synchronizes the underlying ExecutionContext. On CPU implementations this may be a no-op. Use to ensure results are visible on the host or to measure synchronous timings.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ toString()

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::string Mila::Dnn::Softmax< TDeviceType, TPrecision >::toString ( ) const
inline override export virtual

Generate human-readable description of the module.

Produces a multi-line string showing:

  • Module name
  • Device type
  • Axis configuration
Returns
Formatted string representation.

Implements Mila::Dnn::Component< TDeviceType, TPrecision >.


◆ validateInputShape() [1/2]

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::validateInputShape ( const ITensor & input) const
inline export private

Validate input shape for softmax operation.

Ensures the input has valid rank and the configured axis is within bounds.


◆ validateInputShape() [2/2]

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Softmax< TDeviceType, TPrecision >::validateInputShape ( const shape_t & input_shape) const
inline export private

Validate input shape for softmax operation.

Ensures the input has valid rank and the configured axis is within bounds.
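
A check of this kind might look like the sketch below. `validateAxis` is a hypothetical free function, and two details are assumptions rather than documented behaviour: that a rank of zero is rejected, and that only non-negative axes are accepted.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not Mila API): reject shapes and axes softmax cannot use.
void validateAxis(const std::vector<std::int64_t>& shape, std::int64_t axis) {
    // Assumption: a scalar (rank 0) input is treated as invalid.
    if (shape.empty()) {
        throw std::invalid_argument("softmax: input must have rank >= 1");
    }
    const auto rank = static_cast<std::int64_t>(shape.size());
    // Assumption: negative axes are not supported in this sketch.
    if (axis < 0 || axis >= rank) {
        throw std::invalid_argument("softmax: axis out of bounds");
    }
}
```

Throwing `std::invalid_argument` matches the exception type documented for `onBuilding()`; the error messages are illustrative.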


The documentation for this class was generated from the following file:
  • /__w/Mila/Mila/Mila/Src/Dnn/Components/Normalization/Softmax.ixx