Mila
Deep Neural Network Library
Multi-Layer Perceptron (MLP) block for neural networks.
Public Types

- `using CompositeModuleBase = CompositeModule<TDeviceType, TDataType>`: Alias for the base module type.
- `using MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>`: Memory resource type used for tensors, selected based on device type.

Public Types inherited from CompositeModule<TDeviceType, TDataType>

- `using ModuleBase = Module<TDeviceType, TDataType, TDataType>`: Base class type for the module.
- `using MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, HostMemoryResource>`: Memory resource type based on device type.

Public Types inherited from Module

- `using MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>`
Public Member Functions

- `MLP(const std::string& device_name, const MLPConfig& config)`: Constructs a new MLP module with a device name.
- `MLP(std::shared_ptr<DeviceContext> device_context, const MLPConfig& config)`: Constructs a new MLP module with a provided device context.
- `void backward(const Tensor<TDataType, MR>& input, const Tensor<TDataType, MR>& output_grad, Tensor<TDataType, MR>& input_grad)`: Performs the backward pass of the MLP block.
- `void forward(const Tensor<TDataType, MR>& input, Tensor<TDataType, MR>& output)`: Performs the forward pass of the MLP block.
- `void load(ModelArchive& archive) override`: Deserializes the module state from a ZIP archive.
- `size_t parameterCount() const override`: Gets the number of trainable parameters in this module.
- `void save(ModelArchive& archive) const override`: Serializes the module state to a ZIP archive.
- `std::string toString() const override`: Generates a string representation of this module's configuration.
Public Member Functions inherited from CompositeModule<TDeviceType, TDataType>

- `CompositeModule()`: Default constructor.
- `CompositeModule(const std::string& device_name, const ComponentConfig& config)`: Constructor with device name.
- `CompositeModule(std::shared_ptr<DeviceContext> context, const ComponentConfig& config)`: Constructor with device context.
- `virtual ~CompositeModule() = default`: Virtual destructor.
- `CompositeModule& addModule(const std::string& name, std::shared_ptr<Module<TDeviceType, TDataType, TDataType>> module)`: Add a named child module to this module.
- `CompositeModule& addModule(std::shared_ptr<Module<TDeviceType, TDataType, TDataType>> module)`: Add an unnamed child module to this module.
- `std::shared_ptr<Module<TDeviceType, TDataType, TDataType>> getModule(const std::string& name) const`: Get a specific sub-module by name.
- `const std::vector<std::shared_ptr<Module<TDeviceType, TDataType, TDataType>>>& getModules() const`: Get all sub-modules contained in this module.
- `const std::unordered_map<std::string, std::shared_ptr<Module<TDeviceType, TDataType, TDataType>>>& getNamedModules() const`: Get all named sub-modules contained in this module.
- `bool hasModule(const std::string& name) const`: Check if a sub-module with the given name exists.
- `bool removeModule(const std::string& name)`: Remove a sub-module by name.
- `bool replaceModule(const std::string& name, std::shared_ptr<Module<TDeviceType, TDataType, TDataType>> module)`: Replace an existing sub-module with a new one.
- `void setTraining(bool is_training) override`: Set the training mode for this module and all its sub-modules.
Public Member Functions inherited from Module

- `Module(const std::string& device_name, const ComponentConfig& config)`: Constructor with device name.
- `Module(std::shared_ptr<DeviceContext> context, const ComponentConfig& config)`: Constructor with a specific device context.
- `virtual ~Module() = default`: Virtual destructor for proper cleanup in derived classes.
- `std::shared_ptr<Compute::DeviceContext> getDeviceContext() const`: Get the device context for this module.
- `Compute::DeviceType getDeviceType() const`: Get the device type of the current device context.
- `std::string getName() const`: Get the name of the module.
- `const auto& getParameterTensors() const`: Get the parameter tensors of this module.
- `const ComputePrecision::Policy& getPrecision() const`
- `const auto& getStateTensors() const`: Get the state tensors of this module.
- `bool isTraining() const`: Check if the module is in training mode.
Private Member Functions

- `void initializeModules()`: Initializes all submodules for the MLP.

Private Attributes

- `Tensor<TDataType, MR> act_output_`: Output tensor from the activation function.
- `std::shared_ptr<Module<TDeviceType, TDataType>> activation_ { nullptr }`: Activation function module.
- `MLPConfig config_`: Configuration for the MLP module.
- `std::shared_ptr<Dropout<TDeviceType, TDataType>> dropout1_ { nullptr }`: Optional dropout module.
- `Tensor<TDataType, MR> dropout1_output_`: Output tensor from dropout.
- `std::shared_ptr<Linear<TDeviceType, TDataType>> fc1_ { nullptr }`: First linear layer (input_features -> hidden_size).
- `Tensor<TDataType, MR> fc1_output_`: Output tensor from the first linear layer.
- `std::shared_ptr<Linear<TDeviceType, TDataType>> fc2_ { nullptr }`: Second linear layer (hidden_size -> input_features).
- `Tensor<TDataType, MR> fc2_output_`: Output tensor from the second linear layer.
- `std::shared_ptr<LayerNorm<TDeviceType, TDataType>> norm1_ { nullptr }`: Optional layer normalization module.
- `Tensor<TDataType, MR> norm1_output_`: Output tensor from layer normalization.
- `Tensor<TDataType, MR> residual_input_`: Cached input tensor for the residual connection.
Additional Inherited Members

- `const std::string parametersToString() const`: Helper method to convert parameters to a string representation.
- `const std::string stateToString() const`: Helper method to convert state tensors to a string representation.
- `std::unordered_map<std::string, std::shared_ptr<Tensor<TOutput, MR>>> parameter_map_ = {}`: Map of parameter names to parameter tensors.
- `std::unordered_map<std::string, std::shared_ptr<Tensor<TOutput, MR>>> state_map_ = {}`: Map of state names to state tensors.
Detailed Description

Multi-Layer Perceptron (MLP) block for neural networks.
This module implements a two-layer MLP with an activation function in between: input -> Linear -> Activation -> Linear -> output
Optionally includes:
- layer normalization after the first linear layer
- dropout after the activation function
- a residual connection that adds the input to the output
MLP blocks are fundamental components in many network architectures, including transformers where they typically follow attention layers and process token representations.
Template Parameters

- `TDeviceType`: The device type (CPU or CUDA) on which to perform computations.
- `TDataType`: The data type used for tensor elements throughout the network.
`using CompositeModuleBase = CompositeModule<TDeviceType, TDataType>` (export)

Alias for base module type.
`using MR = std::conditional_t<TDeviceType == DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource>` (export)

Memory resource type used for tensors, selected based on device type.
`MLP(const std::string& device_name, const MLPConfig& config)` (inline, explicit, export)
Constructs a new MLP module with a device name.
Creates a new DeviceContext internally using the provided device name. This constructor is useful for creating standalone modules without pre-existing device contexts.
Parameters
- `device_name`: The name of the device to use (e.g., "CPU", "CUDA:0").
- `config`: Configuration parameters for the MLP module.

Exceptions
- `std::invalid_argument`: if the device name is invalid or the configuration is invalid.
- `std::runtime_error`: if the device type doesn't match the template parameter TDeviceType.
`MLP(std::shared_ptr<DeviceContext> device_context, const MLPConfig& config)` (inline, explicit, export)
Constructs a new MLP module with a provided device context.
Uses a pre-existing DeviceContext instance. This constructor is useful when integrating the module into a larger network that shares device contexts across modules.
Parameters
- `device_context`: The device context to use for this module.
- `config`: Configuration parameters for the MLP module.

Exceptions
- `std::invalid_argument`: if device_context is null or the configuration is invalid.
- `std::runtime_error`: if the device context type doesn't match the template parameter TDeviceType.
`void backward(const Tensor<TDataType, MR>& input, const Tensor<TDataType, MR>& output_grad, Tensor<TDataType, MR>& input_grad)` (inline, export)
Performs the backward pass of the MLP block.
Computes gradients for all components in the network by working backwards from the output gradient. Handles residual connections, dropout, layer normalization, and activation functions.
Parameters
- `input`: The input tensor from the forward pass.
- `output_grad`: The gradient of the loss with respect to the output.
- `input_grad`: The tensor to store gradients with respect to the input.
`void forward(const Tensor<TDataType, MR>& input, Tensor<TDataType, MR>& output)` (inline, export)
Performs the forward pass of the MLP block.
Processes the input through the full network: Linear -> (LayerNorm) -> Activation -> (Dropout) -> Linear -> (Residual)
When in inference mode with fused operations enabled, uses optimized execution.
Parameters
- `input`: The input tensor to be processed.
- `output`: The output tensor where the results will be stored.
`void initializeModules()` (inline, export, private)
Initializes all submodules for the MLP.
Creates and configures:
- the first and second linear layers (fc1_, fc2_)
- the activation function module
- the optional layer normalization module
- the optional dropout module
`void load(ModelArchive& archive) override` (inline, override, export, virtual)
Deserializes the module state from a ZIP archive.
Loads the state of all submodules from the provided ZIP archive.
Parameters
- `archive`: The ZIP archive to load the module state from.
Reimplemented from Mila::Dnn::CompositeModule< TDeviceType, TDataType >.
`size_t parameterCount() const override` (inline, override, export, virtual)
Gets the number of trainable parameters in this module.
Counts the total number of parameters across all submodules.
Reimplemented from Mila::Dnn::CompositeModule< TDeviceType, TDataType >.
`void save(ModelArchive& archive) const override` (inline, override, export, virtual)
Serializes the module state to a ZIP archive.
Saves the state of all submodules to the provided ZIP archive.
Parameters
- `archive`: The ZIP archive to save the module state to.
Reimplemented from Mila::Dnn::CompositeModule< TDeviceType, TDataType >.
`std::string toString() const override` (inline, override, export, virtual)
Generates a string representation of this module's configuration.
Reimplemented from Mila::Dnn::CompositeModule< TDeviceType, TDataType >.