Mila
Deep Neural Network Library
An encoder module that provides token and positional embeddings.
Public Types | |
using | ModuleBase = Module< TDeviceType, TInput, TOutput > |
Alias for base module type. | |
using | MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource > |
Memory resource type determined based on device type. | |
Public Member Functions | |
Encoder (const std::string &device_name, const EncoderConfig &config) | |
Constructs a new Encoder module with a device name. | |
Encoder (std::shared_ptr< DeviceContext > device_context, const EncoderConfig &config) | |
Constructs a new Encoder module with a provided device context. | |
void | forward (const Tensor< TInput, MR > &input, Tensor< TOutput, MR > &output) |
Performs the forward pass of the encoder. | |
size_t | getChannels () const |
Gets the number of channels (embedding dimension). | |
size_t | getMaxSequenceLength () const |
Gets the maximum sequence length. | |
size_t | getVocabularyLength () const |
Gets the vocabulary length. | |
void | load (ModelArchive &archive) override |
Loads the encoder parameters from a zip archive. | |
size_t | parameterCount () const override |
Gets the number of parameters in the module. | |
void | save (ModelArchive &archive) const override |
Saves the encoder parameters to a zip archive. | |
std::string | toString () const override |
Gets the module information as a string. | |
Public Member Functions inherited from Module | |
Module (const std::string &device_name, const ComponentConfig &config) | |
Constructor with device name. | |
Module (std::shared_ptr< DeviceContext > context, const ComponentConfig &config) | |
Constructor with a specific device context. | |
virtual | ~Module ()=default |
Virtual destructor for proper cleanup in derived classes. | |
std::shared_ptr< Compute::DeviceContext > | getDeviceContext () const |
Get the device context for this module. | |
Compute::DeviceType | getDeviceType () const |
Get the device type of the current device context. | |
std::string | getName () const |
Get the name of the module. | |
const auto & | getParameterTensors () const |
Get the parameter tensors of this module. | |
const ComputePrecision::Policy & | getPrecision () const |
const auto & | getStateTensors () const |
Get the state tensors of this module. | |
bool | isTraining () const |
Check if the module is in training mode. | |
virtual void | setTraining (bool is_training) |
Set the training mode of this module. | |
Private Member Functions | |
void | createOperation () |
Creates the computational operation based on current device context. | |
void | initializeTensors () |
Initializes the token and positional embedding tensors. | |
Private Attributes | |
OperationAttributes | attributes_ |
Operation-specific attributes and configuration. | |
EncoderConfig | config_ |
Configuration for the Encoder module. | |
std::shared_ptr< UnaryOperation< TDeviceType, TInput, TOutput > > | operation_ { nullptr } |
The computational operation that implements the encoder logic. | |
std::vector< std::shared_ptr< Tensor< TOutput, MR > > > | output_state_ |
Output state tensors used for intermediate values. | |
std::vector< std::shared_ptr< Tensor< TOutput, MR > > > | parameters_ |
Vector of parameter tensors that will be used during forward/backward passes. | |
std::shared_ptr< Tensor< TOutput, MR > > | wpe_ { nullptr } |
Position embedding table with shape (maxT,C), encodes token position information. | |
std::shared_ptr< Tensor< TOutput, MR > > | wte_ { nullptr } |
Token embedding table with shape (V,C), maps token IDs to vector representations. | |
Additional Inherited Members | |
Member Functions inherited from Module | |
const std::string | parametersToString () const |
Helper method to convert parameters to string representation. | |
const std::string | stateToString () const |
Helper method to convert state tensors to string representation. | |
Attributes inherited from Module | |
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > | parameter_map_ = {} |
Map of parameter names to parameter tensors. | |
std::unordered_map< std::string, std::shared_ptr< Tensor< TOutput, MR > > > | state_map_ = {} |
Map of state names to state tensors. | |
An encoder module that provides token and positional embeddings.
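The Encoder transforms input token IDs into continuous vector representations by looking up each token ID in the token embedding table (wte), looking up its sequence position in the position embedding table (wpe), and adding the two vectors.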
This implementation supports both CPU and CUDA execution depending on the device context. The encoder is a fundamental component in transformer architectures, providing the initial representation of tokens that subsequent layers will process.
TDeviceType | The device type (CPU or CUDA) on which to perform computations. |
TInput | The data type of the input token IDs (typically int). |
TOutput | The data type of the output embeddings (typically float). |
using ModuleBase = Module< TDeviceType, TInput, TOutput > | export |
Alias for base module type.
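using MR = std::conditional_t< TDeviceType==DeviceType::Cuda, CudaMemoryResource, CpuMemoryResource > | export |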
Memory resource type determined based on device type.
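The compile-time selection behind MR can be sketched in isolation. The enum and the two empty structs below are hypothetical stand-ins for the library's types (DeviceType::Cpu in particular is an assumed enumerator name); only the std::conditional_t expression mirrors the alias documented above.

```cpp
#include <type_traits>

// Hypothetical stand-ins; the real Mila types are not reproduced here.
enum class DeviceType { Cpu, Cuda };
struct CpuMemoryResource {};
struct CudaMemoryResource {};

// Same selection rule as the MR alias: a CUDA device type selects the CUDA
// memory resource, and any other device type falls back to the CPU resource.
template <DeviceType TDeviceType>
using MR = std::conditional_t<TDeviceType == DeviceType::Cuda,
                              CudaMemoryResource, CpuMemoryResource>;

// The choice is made at compile time, so it can be checked at compile time.
static_assert(std::is_same_v<MR<DeviceType::Cuda>, CudaMemoryResource>);
static_assert(std::is_same_v<MR<DeviceType::Cpu>, CpuMemoryResource>);
```

Because the alias is resolved per template instantiation, a tensor typed for one memory resource cannot be passed where the other is expected without a compile error.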
Encoder (const std::string &device_name, const EncoderConfig &config) | inline, explicit, export |
Constructs a new Encoder module with a device name.
device_name | The name of the device to use (e.g., "CPU", "CUDA:0"). |
config | Configuration parameters for the Encoder module. |
std::invalid_argument | If the device name or the configuration is invalid |
std::runtime_error | If device type doesn't match template parameter TDeviceType |
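The two exception cases above can be illustrated with a standalone check. This is a minimal sketch, not the library's validation code: parseDeviceName, checkDeviceName, and the DeviceType enumerators are hypothetical names, and the accepted name grammar ("CPU" or "CUDA:" followed by digits) is an assumption based on the examples in the parameter description.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the library's device type enumeration.
enum class DeviceType { Cpu, Cuda };

// Hypothetical parser: accepts "CPU" or "CUDA:<digits>" and throws
// std::invalid_argument for any other name.
inline DeviceType parseDeviceName(const std::string& name) {
    if (name == "CPU") return DeviceType::Cpu;
    const std::string prefix = "CUDA:";
    const bool has_prefix = name.compare(0, prefix.size(), prefix) == 0;
    if (has_prefix && name.size() > prefix.size() &&
        std::all_of(name.begin() + prefix.size(), name.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; })) {
        return DeviceType::Cuda;
    }
    throw std::invalid_argument("invalid device name: " + name);
}

// Throws std::runtime_error when the named device does not match the
// device type fixed by the template parameter.
template <DeviceType TDeviceType>
void checkDeviceName(const std::string& device_name) {
    if (parseDeviceName(device_name) != TDeviceType) {
        throw std::runtime_error("device '" + device_name +
                                 "' does not match template parameter TDeviceType");
    }
}
```

With this sketch, checkDeviceName<DeviceType::Cpu>("CUDA:0") reports a mismatch through std::runtime_error, while an unrecognized name such as "TPU" is rejected earlier with std::invalid_argument.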
Encoder (std::shared_ptr< DeviceContext > device_context, const EncoderConfig &config) | inline, explicit, export |
Constructs a new Encoder module with a provided device context.
device_context | The device context to use for this module. |
config | Configuration parameters for the Encoder module. |
std::invalid_argument | If device_context is null or configuration is invalid |
std::runtime_error | If device context type doesn't match template parameter TDeviceType |
void createOperation () | inline, export, private |
Creates the computational operation based on current device context.
Instantiates either a CPU or CUDA encoder operation based on the current device context. The operation implements the actual embedding lookup and addition logic during forward pass.
void forward (const Tensor< TInput, MR > &input, Tensor< TOutput, MR > &output) | inline, export |
Performs the forward pass of the encoder.
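Transforms input token IDs into continuous embeddings by looking up each token's embedding in the token embedding table (wte) and adding the position embedding for its sequence position from the position embedding table (wpe).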
input | The input tensor containing token IDs with shape (B,T). |
output | The output tensor that will contain embeddings with shape (B,T,C). |
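The shapes above suggest the following CPU reference for the lookup-and-add step. It is a minimal sketch under stated assumptions: tensors are flat row-major std::vector buffers instead of Tensor objects, and encodeForward is a hypothetical free function, not the operation that the module dispatches to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shapes (all row-major): tokens (B,T), wte (V,C), wpe (maxT,C), result (B,T,C).
// For every batch b and position t: out[b,t,:] = wte[tokens[b,t],:] + wpe[t,:].
std::vector<float> encodeForward(const std::vector<int>& tokens,
                                 const std::vector<float>& wte,
                                 const std::vector<float>& wpe,
                                 std::size_t B, std::size_t T, std::size_t C) {
    assert(tokens.size() == B * T);
    assert(T * C <= wpe.size());  // T must not exceed maxT
    std::vector<float> out(B * T * C);
    for (std::size_t b = 0; b < B; ++b) {
        for (std::size_t t = 0; t < T; ++t) {
            assert(tokens[b * T + t] >= 0);
            const std::size_t token = static_cast<std::size_t>(tokens[b * T + t]);
            assert((token + 1) * C <= wte.size());  // token ID must be below V
            for (std::size_t c = 0; c < C; ++c) {
                out[(b * T + t) * C + c] = wte[token * C + c] + wpe[t * C + c];
            }
        }
    }
    return out;
}
```

The position row depends only on t, so every sequence in the batch receives the same positional offsets; only the token rows differ between sequences.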
size_t getChannels () const | inline, export |
Gets the number of channels (embedding dimension).
size_t getMaxSequenceLength () const | inline, export |
Gets the maximum sequence length.
size_t getVocabularyLength () const | inline, export |
Gets the vocabulary length.
void initializeTensors () | inline, export, private |
Initializes the token and positional embedding tensors.
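Creates and initializes the token embedding table (wte) with shape (V,C) and the position embedding table (wpe) with shape (maxT,C).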
Both tensors are initialized using Xavier initialization to ensure proper gradient flow during training. The tensors are registered as parameters in the module's parameter map for training and serialization.
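Xavier initialization can be sketched as follows. The description above does not say which variant is used, so this example assumes the uniform (Glorot) form with bound sqrt(6 / (fan_in + fan_out)) and treats the two table dimensions as fan-in and fan-out; xavierLimit and xavierUniform are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Xavier (Glorot) uniform bound for a (rows, cols) table. Treating rows as
// fan-in and cols as fan-out is an assumption made for this sketch.
inline float xavierLimit(std::size_t rows, std::size_t cols) {
    return std::sqrt(6.0f / static_cast<float>(rows + cols));
}

// Fills a row-major (rows, cols) table with samples drawn uniformly from
// the interval bounded by -limit and +limit.
std::vector<float> xavierUniform(std::size_t rows, std::size_t cols, unsigned seed) {
    assert(rows > 0 && cols > 0);
    const float limit = xavierLimit(rows, cols);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-limit, limit);
    std::vector<float> table(rows * cols);
    for (float& w : table) w = dist(gen);
    return table;
}
```

Under these assumptions, a token table would be created with xavierUniform(V, C, seed) and a position table with xavierUniform(maxT, C, seed). The bound shrinks as the table grows, which keeps the scale of the initial embeddings modest for large vocabularies.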
void load (ModelArchive &archive) override | inline, override, export, virtual |
Loads the encoder parameters from a zip archive.
Deserializes all parameter tensors (wte and wpe) from the specified zip archive. This enables loading pretrained models for inference or continued training.
archive | The model archive (zip) to load the parameters from. |
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
size_t parameterCount () const override | inline, override, export, virtual |
Gets the number of parameters in the module.
Counts all learnable parameters in the encoder, which includes all elements in the token embedding table (wte) and position embedding table (wpe).
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
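Given the table shapes (V,C) and (maxT,C), the count reduces to simple arithmetic. The helper below is a hypothetical free function used only to show the calculation; the example sizes are those of the smallest GPT-2 configuration and are not taken from this library.

```cpp
#include <cstddef>

// Parameter count of the two embedding tables: V*C entries in wte plus
// maxT*C entries in wpe.
constexpr std::size_t encoderParameterCount(std::size_t V, std::size_t maxT,
                                            std::size_t C) {
    return V * C + maxT * C;
}

// Example sizes: V = 50257, maxT = 1024, C = 768.
// 50257 * 768 = 38,597,376 and 1024 * 768 = 786,432.
static_assert(encoderParameterCount(50257, 1024, 768) == 39383808);
```

For realistic vocabulary sizes the token table dominates the total, since V is usually far larger than maxT.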
void save (ModelArchive &archive) const override | inline, override, export, virtual |
Saves the encoder parameters to a zip archive.
Serializes all parameter tensors (wte and wpe) to the specified zip archive. This enables model persistence for later reuse or distribution.
archive | The model archive (zip) to save the parameters to. |
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
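The save and load pair amounts to a round trip over named parameter tensors. The sketch below shows that idea with a plain binary stream standing in for ModelArchive, whose interface is not documented on this page. TensorMap, saveParameters, and loadParameters are hypothetical names, the byte layout is invented for the example, and it is not a zip format.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: parameter name ("wte", "wpe") to flat values.
using TensorMap = std::map<std::string, std::vector<float>>;

inline void writeU64(std::ostream& out, std::uint64_t v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}

inline std::uint64_t readU64(std::istream& in) {
    std::uint64_t v = 0;
    in.read(reinterpret_cast<char*>(&v), sizeof v);
    if (!in) throw std::runtime_error("truncated parameter stream");
    return v;
}

// Layout: entry count, then per entry: name length, name bytes,
// value count, raw float values (host byte order).
void saveParameters(std::ostream& out, const TensorMap& params) {
    writeU64(out, params.size());
    for (const auto& entry : params) {
        writeU64(out, entry.first.size());
        out.write(entry.first.data(), static_cast<std::streamsize>(entry.first.size()));
        writeU64(out, entry.second.size());
        out.write(reinterpret_cast<const char*>(entry.second.data()),
                  static_cast<std::streamsize>(entry.second.size() * sizeof(float)));
    }
}

TensorMap loadParameters(std::istream& in) {
    TensorMap params;
    const std::uint64_t count = readU64(in);
    for (std::uint64_t i = 0; i < count; ++i) {
        std::string name(static_cast<std::size_t>(readU64(in)), '\0');
        in.read(&name[0], static_cast<std::streamsize>(name.size()));
        std::vector<float> values(static_cast<std::size_t>(readU64(in)));
        in.read(reinterpret_cast<char*>(values.data()),
                static_cast<std::streamsize>(values.size() * sizeof(float)));
        if (!in) throw std::runtime_error("truncated parameter stream");
        params.emplace(std::move(name), std::move(values));
    }
    return params;
}
```

Writing to a std::stringstream opened in binary mode and reading it back should reproduce the original map exactly, which is the property a real archive-backed implementation also needs for loading pretrained weights.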
std::string toString () const override | inline, override, export, virtual |
Gets the module information as a string.
Provides a human-readable description of the encoder configuration, including dimensions, parameter counts, and tensor information.
Implements Mila::Dnn::Module< TDeviceType, TInput, TOutput >.
OperationAttributes attributes_ | export, private |
Operation-specific attributes and configuration.
EncoderConfig config_ | export, private |
Configuration for the Encoder module.
std::shared_ptr< UnaryOperation< TDeviceType, TInput, TOutput > > operation_ { nullptr } | export, private |
The computational operation that implements the encoder logic.
std::vector< std::shared_ptr< Tensor< TOutput, MR > > > output_state_ | export, private |
Output state tensors used for intermediate values.
Not used in this module.
std::vector< std::shared_ptr< Tensor< TOutput, MR > > > parameters_ | export, private |
Vector of parameter tensors that will be used during forward/backward passes.
Contains both the token embeddings (wte) and position embeddings (wpe).
std::shared_ptr< Tensor< TOutput, MR > > wpe_ { nullptr } | export, private |
Position embedding table with shape (maxT,C), encodes token position information.
maxT is the maximum sequence length and C is the embedding dimension.
std::shared_ptr< Tensor< TOutput, MR > > wte_ { nullptr } | export, private |
Token embedding table with shape (V,C), maps token IDs to vector representations.
V is the vocabulary size and C is the embedding dimension.