Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision > Class Template Reference
export

Device-agnostic AdamW optimizer. More...


Public Types

using ExecutionContextType = ExecutionContext<TDeviceType>
using OptimizerType = CpuAdamWOptimizer<TPrecision>

Public Member Functions

 AdamWOptimizer (IExecutionContext *exec_context, const AdamWConfig &config)
 Construct AdamW optimizer from fluent AdamWConfig.
 ~AdamWOptimizer () override=default
void addParameter (ITensor *param, ITensor *grad) override
 Register a parameter tensor for optimization.
float getBeta1 () const noexcept
float getBeta2 () const noexcept
float getEpsilon () const noexcept
float getLearningRate () const override
 Get the current learning rate.
size_t getParameterCount () const noexcept
size_t getStepCount () const noexcept
float getWeightDecay () const noexcept
void setLearningRate (float learning_rate) override
 Set the learning rate for future updates.
void setWeightDecay (float weight_decay)
void step () override
 Perform one optimization step.
Public Member Functions inherited from Mila::Dnn::Compute::Optimizer< TDeviceType, TPrecision >
virtual ~Optimizer ()=default

Private Attributes

AdamWConfig config_
IExecutionContext * context_
std::shared_ptr< OptimizerType > impl_

Detailed Description

template<DeviceType TDeviceType, TensorDataType TPrecision>
requires PrecisionSupportedOnDevice<TPrecision, TDeviceType>
class Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >

Device-agnostic AdamW optimizer.

Dispatches to the appropriate device-specific implementation (CPU or CUDA) based on the TDeviceType template parameter. Uses AdamWConfig for fluent configuration of hyperparameters.
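The dispatch described above can be illustrated with a compile-time type selection. This is a minimal sketch under stated assumptions, not Mila's actual mechanism: `CpuAdamWSketch`, `CudaAdamWSketch`, `BackendFor`, and `AdamWFrontEndSketch` are hypothetical names, and the local `DeviceType` enum is a stand-in for the library's own type.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the library's device tag.
enum class DeviceType { Cpu, Cuda };

// Hypothetical backends; real implementations would hold optimizer state.
struct CpuAdamWSketch  { std::string name() const { return "cpu"; } };
struct CudaAdamWSketch { std::string name() const { return "cuda"; } };

// Select the backend type at compile time from the device tag.
template <DeviceType TDevice>
using BackendFor = std::conditional_t<TDevice == DeviceType::Cpu,
                                      CpuAdamWSketch, CudaAdamWSketch>;

// Thin front end that owns the selected backend and forwards calls to it.
template <DeviceType TDevice>
class AdamWFrontEndSketch {
public:
    AdamWFrontEndSketch() : impl_(std::make_shared<BackendFor<TDevice>>()) {}
    std::string backendName() const { return impl_->name(); }

private:
    std::shared_ptr<BackendFor<TDevice>> impl_;
};
```

Because the backend is chosen by the template parameter, no runtime branch is needed on each call; `AdamWFrontEndSketch<DeviceType::Cuda>().backendName()` resolves directly to the CUDA stand-in.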

Template Parameters
TDeviceType    Device type (DeviceType::Cpu or DeviceType::Cuda)
TPrecision    Tensor precision (TensorDataType::FP32, FP16, BF16)

Member Typedef Documentation

◆ ExecutionContextType

template<DeviceType TDeviceType, TensorDataType TPrecision>
using Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::ExecutionContextType = ExecutionContext<TDeviceType>

◆ OptimizerType

template<DeviceType TDeviceType, TensorDataType TPrecision>
using Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::OptimizerType = CpuAdamWOptimizer<TPrecision>

Constructor & Destructor Documentation

◆ AdamWOptimizer()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::AdamWOptimizer ( IExecutionContext * exec_context,
const AdamWConfig & config )
inline explicit

Construct AdamW optimizer from fluent AdamWConfig.

Parameters
exec_context    Execution context for device resources
config    Fluent AdamWConfig describing hyperparameters
Exceptions
std::invalid_argument    if exec_context is null
std::invalid_argument    if config.validate() fails
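The two failure modes listed above can be sketched as follows. Everything here is hypothetical: `AdamWConfigSketch`, its `with...` setters, and the specific range checks are assumptions made for illustration, since this page does not document the real AdamWConfig interface beyond `validate()`.

```cpp
#include <stdexcept>

// Hypothetical fluent config; setter names and checks are assumptions.
class AdamWConfigSketch {
public:
    AdamWConfigSketch& withLearningRate(float lr) { lr_ = lr; return *this; }
    AdamWConfigSketch& withWeightDecay(float wd) { weight_decay_ = wd; return *this; }

    // Throws std::invalid_argument when a hyperparameter is out of range.
    void validate() const {
        if (!(lr_ > 0.0f))
            throw std::invalid_argument("learning rate must be positive");
        if (weight_decay_ < 0.0f)
            throw std::invalid_argument("weight decay must be non-negative");
    }

    float learningRate() const { return lr_; }

private:
    float lr_ = 1e-3f;
    float weight_decay_ = 0.01f;
};

// Hypothetical stand-in for an execution context.
struct ContextSketch {};

class OptimizerCtorSketch {
public:
    OptimizerCtorSketch(ContextSketch* ctx, const AdamWConfigSketch& config)
        : context_(ctx), config_(config) {
        // Reject a null context first, then let the config check itself.
        if (context_ == nullptr)
            throw std::invalid_argument("exec_context is null");
        config_.validate();
    }

    float learningRate() const { return config_.learningRate(); }

private:
    ContextSketch* context_;
    AdamWConfigSketch config_;
};
```

Validating in the constructor means an optimizer object never exists in a half-configured state, so later calls do not need to re-check the hyperparameters.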

◆ ~AdamWOptimizer()

template<DeviceType TDeviceType, TensorDataType TPrecision>
Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::~AdamWOptimizer ( )
override default

Member Function Documentation

◆ addParameter()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::addParameter ( ITensor * param,
ITensor * grad )
inline override virtual

Register a parameter tensor for optimization.

Adds a parameter-gradient pair to the optimizer's update list. The optimizer will allocate internal state tensors (momentum, variance, etc.) matching the parameter shape and device placement.

Parameters
param    Pointer to the parameter tensor to be optimized
grad    Pointer to the gradient tensor (must match param shape)
Exceptions
std::invalid_argument    if param or grad is nullptr
std::invalid_argument    if param and grad shapes don't match
std::invalid_argument    if param and grad are on different devices
std::runtime_error    if state allocation fails
Note
Must be called after model->build() when parameter shapes are known
Parameter and gradient must persist for the optimizer's lifetime
Calling multiple times with same parameter updates the gradient reference
State tensors are initialized to zero on first registration
See also
step()

Example:

auto params = model->getParameters();
auto grads = model->getGradients();
for (size_t i = 0; i < params.size(); ++i) {
    optimizer->addParameter(params[i], grads[i]);
}
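The registration notes for addParameter() (null and shape checks, zero-initialized state, and re-registration replacing the gradient reference) can be sketched with a small map keyed by parameter pointer. This is an illustration under assumptions, not the library's implementation: `TensorSketch` and `RegistrySketch` are hypothetical, and shape is reduced to element count.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical flat tensor; "shape" is reduced to element count here.
struct TensorSketch { std::vector<float> data; };

class RegistrySketch {
public:
    void addParameter(TensorSketch* param, TensorSketch* grad) {
        if (param == nullptr || grad == nullptr)
            throw std::invalid_argument("param and grad must be non-null");
        if (param->data.size() != grad->data.size())
            throw std::invalid_argument("param and grad shapes must match");

        auto it = slots_.find(param);
        if (it != slots_.end()) {
            // Already registered: only the gradient reference changes.
            it->second.grad = grad;
            return;
        }
        // First registration: allocate zeroed moment buffers.
        Slot slot;
        slot.grad = grad;
        slot.m.assign(param->data.size(), 0.0f);
        slot.v.assign(param->data.size(), 0.0f);
        slots_.emplace(param, std::move(slot));
    }

    std::size_t parameterCount() const { return slots_.size(); }
    TensorSketch* gradFor(TensorSketch* param) const { return slots_.at(param).grad; }

private:
    struct Slot {
        TensorSketch* grad = nullptr;
        std::vector<float> m;  // first-moment state
        std::vector<float> v;  // second-moment state
    };
    std::unordered_map<TensorSketch*, Slot> slots_;
};
```

Storing raw pointers is what makes the lifetime note matter: the registry does not own the tensors, so they must outlive it.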

Implements Mila::Dnn::Compute::Optimizer< TDeviceType, TPrecision >.

◆ getBeta1()

template<DeviceType TDeviceType, TensorDataType TPrecision>
float Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getBeta1 ( ) const
inline noexcept

◆ getBeta2()

template<DeviceType TDeviceType, TensorDataType TPrecision>
float Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getBeta2 ( ) const
inline noexcept

◆ getEpsilon()

template<DeviceType TDeviceType, TensorDataType TPrecision>
float Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getEpsilon ( ) const
inline noexcept

◆ getLearningRate()

template<DeviceType TDeviceType, TensorDataType TPrecision>
float Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getLearningRate ( ) const
inline override virtual

Get the current learning rate.

Returns the base learning rate used for parameter updates. Some optimizers may apply adaptive per-parameter learning rates internally (Adam, AdamW), but this method returns the global scaling factor.

Returns
Current learning rate as a float
Note
For adaptive optimizers, actual effective learning rate per parameter may differ due to momentum and variance scaling
See also
setLearningRate()

Implements Mila::Dnn::Compute::Optimizer< TDeviceType, TPrecision >.

◆ getParameterCount()

template<DeviceType TDeviceType, TensorDataType TPrecision>
size_t Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getParameterCount ( ) const
inline noexcept

◆ getStepCount()

template<DeviceType TDeviceType, TensorDataType TPrecision>
size_t Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getStepCount ( ) const
inline noexcept

◆ getWeightDecay()

template<DeviceType TDeviceType, TensorDataType TPrecision>
float Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::getWeightDecay ( ) const
inline noexcept

◆ setLearningRate()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::setLearningRate ( float learning_rate)
inline override virtual

Set the learning rate for future updates.

Updates the base learning rate used by the optimizer. Typically used for learning rate schedules (decay, warmup, cyclic, etc.).

Parameters
learning_rate    New learning rate (must be positive)
Exceptions
std::invalid_argument    if learning_rate <= 0
Note
Takes effect immediately for the next step() call
Does not affect optimizer state (momentum, variance)
For learning rate schedules, call this at epoch or iteration boundaries
See also
getLearningRate()

Example with learning rate decay:

float initial_lr = 0.001f;
optimizer->setLearningRate(initial_lr);
for (size_t epoch = 0; epoch < num_epochs; ++epoch) {
    // Training loop...
    // Halve the learning rate every 10 epochs
    if (epoch > 0 && epoch % 10 == 0) {
        float new_lr = optimizer->getLearningRate() * 0.5f;
        optimizer->setLearningRate(new_lr);
        std::cout << "Learning rate: " << new_lr << std::endl;
    }
}

Implements Mila::Dnn::Compute::Optimizer< TDeviceType, TPrecision >.

◆ setWeightDecay()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::setWeightDecay ( float weight_decay)
inline

◆ step()

template<DeviceType TDeviceType, TensorDataType TPrecision>
void Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::step ( )
inline override virtual

Perform one optimization step.

Updates all registered parameters using their accumulated gradients according to the optimizer's update rule (SGD, Adam, AdamW, etc.). This is the HOT PATH method called every training iteration.

For algorithms with state (Adam, AdamW):

  • Updates first and second moment estimates
  • Applies bias correction if needed
  • Computes parameter update
  • Writes updated parameters back to tensors
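The four steps listed above correspond to the textbook AdamW rule with decoupled weight decay. The scalar reference below is a sketch of that rule over flat float buffers; it is not taken from Mila's CPU or CUDA kernels, and `AdamWHyperSketch` and `adamwStepSketch` are hypothetical names with commonly used default values.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hyperparameter bundle; defaults are common choices, not Mila's.
struct AdamWHyperSketch {
    float lr = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float eps = 1e-8f;
    float weight_decay = 0.01f;
};

// One AdamW step over flat buffers. `t` is the 1-based step count.
inline void adamwStepSketch(std::vector<float>& param,
                            const std::vector<float>& grad,
                            std::vector<float>& m,
                            std::vector<float>& v,
                            std::size_t t,
                            const AdamWHyperSketch& h) {
    // Bias-correction denominators: 1 - beta^t.
    const float bc1 = 1.0f - std::pow(h.beta1, static_cast<float>(t));
    const float bc2 = 1.0f - std::pow(h.beta2, static_cast<float>(t));

    for (std::size_t i = 0; i < param.size(); ++i) {
        // Update first and second moment estimates.
        m[i] = h.beta1 * m[i] + (1.0f - h.beta1) * grad[i];
        v[i] = h.beta2 * v[i] + (1.0f - h.beta2) * grad[i] * grad[i];

        // Apply bias correction.
        const float m_hat = m[i] / bc1;
        const float v_hat = v[i] / bc2;

        // Adaptive update plus decoupled weight decay, written back in place.
        param[i] -= h.lr * (m_hat / (std::sqrt(v_hat) + h.eps)
                            + h.weight_decay * param[i]);
    }
}
```

The weight-decay term acts on the parameter directly rather than being folded into the gradient, which is the difference between AdamW and Adam with L2 regularization.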
Exceptions
std::runtime_error    if no parameters have been registered
std::runtime_error    if gradient data is invalid or null
Note
Gradients should be computed via backward() before calling step()
For CUDA implementations, may be asynchronous (uses device stream)
Increments internal step counter for algorithms requiring it (Adam, AdamW)
See also
addParameter()
backward()

Typical sequence:

model->zeroGradients(); // Clear previous gradients (model-managed)
model->forward(input, output); // Forward pass
loss = computeLoss(output, target);
model->backward(input, loss_grad); // Compute gradients
optimizer->step(); // Update parameters

Implements Mila::Dnn::Compute::Optimizer< TDeviceType, TPrecision >.

Member Data Documentation

◆ config_

template<DeviceType TDeviceType, TensorDataType TPrecision>
AdamWConfig Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::config_
private

◆ context_

template<DeviceType TDeviceType, TensorDataType TPrecision>
IExecutionContext* Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::context_
private

◆ impl_

template<DeviceType TDeviceType, TensorDataType TPrecision>
std::shared_ptr<OptimizerType> Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >::impl_
private

The documentation for this class was generated from the following file:
  • /__w/Mila/Mila/Mila/Src/Dnn/Optimizers/AdamW.ixx