Mila 0.13.48
Deep Neural Network Library
Abstract base class for parameter optimizers.

Public Member Functions
| virtual | ~Optimizer () = default |
| virtual void | addParameter (ITensor *param, ITensor *grad) = 0 |
| | Register a parameter tensor for optimization. |
| virtual float | getLearningRate () const = 0 |
| | Get the current learning rate. |
| virtual void | setLearningRate (float learning_rate) = 0 |
| | Set the learning rate for future updates. |
| virtual void | step () = 0 |
| | Perform one optimization step. |
Abstract base class for parameter optimizers.
Optimizers update model parameters from their computed gradients according to a specific update rule (SGD, Adam, AdamW, etc.).
Template Parameters:
| TDeviceType | Device where optimization occurs (DeviceType::Cpu or DeviceType::Cuda) |
| TPrecision | Abstract tensor precision (TensorDataType::FP32, FP16, BF16) |
Typical usage pattern:
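The original example for this pattern is not reproduced on this page. The following is only a minimal sketch of how the four documented members might be used together. `ITensor` here is a hypothetical flat-buffer stand-in, not Mila's real type, and `ToySgd` and `runToyTraining` are illustrative names that do not exist in the library. The stand-in `Optimizer` also omits the `TDeviceType` and `TPrecision` template parameters for brevity.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for Mila's ITensor: just a flat float buffer.
struct ITensor {
    std::vector<float> data;
};

// Mirrors the four pure virtual members documented on this page
// (template parameters omitted in this sketch).
class Optimizer {
public:
    virtual ~Optimizer() = default;
    virtual void addParameter(ITensor* param, ITensor* grad) = 0;
    virtual float getLearningRate() const = 0;
    virtual void setLearningRate(float learning_rate) = 0;
    virtual void step() = 0;
};

// Toy plain-SGD optimizer, present only to make the sketch runnable.
class ToySgd final : public Optimizer {
public:
    explicit ToySgd(float lr) : lr_(lr) {}

    void addParameter(ITensor* param, ITensor* grad) override {
        if (param == nullptr || grad == nullptr)
            throw std::invalid_argument("param and grad must be non-null");
        if (param->data.size() != grad->data.size())
            throw std::invalid_argument("param and grad shapes must match");
        pairs_.emplace_back(param, grad);
    }
    float getLearningRate() const override { return lr_; }
    void setLearningRate(float learning_rate) override {
        if (learning_rate <= 0.0f)
            throw std::invalid_argument("learning_rate must be positive");
        lr_ = learning_rate;
    }
    void step() override {
        if (pairs_.empty())
            throw std::runtime_error("no parameters registered");
        for (auto& pair : pairs_) {
            ITensor* param = pair.first;
            const ITensor* grad = pair.second;
            for (std::size_t i = 0; i < param->data.size(); ++i)
                param->data[i] -= lr_ * grad->data[i];
        }
    }

private:
    float lr_;
    std::vector<std::pair<ITensor*, ITensor*>> pairs_;
};

// Minimizes f(w) = w^2 for `iterations` steps and returns the final w.
float runToyTraining(int iterations) {
    ITensor weight{{1.0f}};
    ITensor grad{{0.0f}};

    ToySgd sgd(0.25f);
    Optimizer& optimizer = sgd;              // use only the abstract interface
    optimizer.addParameter(&weight, &grad);  // 1. register once, before training

    for (int i = 0; i < iterations; ++i) {
        grad.data[0] = 2.0f * weight.data[0];  // 2. "backward": df/dw = 2w
        optimizer.step();                      // 3. apply the update rule
    }
    return weight.data[0];
}
```

With a learning rate of 0.25 each step halves `w`, so the loop converges toward the minimum at zero. A real training loop would replace step 2 with the model's forward and backward passes.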
Implementation Requirements:
virtual ~Optimizer () = default   [virtual, default]
virtual void addParameter (ITensor *param, ITensor *grad) = 0   [pure virtual]
Register a parameter tensor for optimization.
Adds a parameter-gradient pair to the optimizer's update list. The optimizer will allocate internal state tensors (momentum, variance, etc.) matching the parameter shape and device placement.
Parameters:
| param | Pointer to the parameter tensor to be optimized |
| grad | Pointer to the gradient tensor (must match the shape of param) |
Exceptions:
| std::invalid_argument | if param or grad is nullptr |
| std::invalid_argument | if param and grad shapes don't match |
| std::invalid_argument | if param and grad are on different devices |
| std::runtime_error | if state allocation fails |
Example:
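The original example is not reproduced on this page. As a rough illustration of the documented argument checks, the sketch below applies the three `std::invalid_argument` conditions listed above. `DeviceStub`, `TensorStub`, `checkParamGradPair`, and `isRejected` are hypothetical names introduced for this sketch. Mila's actual `ITensor` interface and the order in which its implementations check these conditions may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical device tag and tensor stub; not Mila's real types.
enum class DeviceStub { Cpu, Cuda };

struct TensorStub {
    std::vector<std::size_t> shape;
    DeviceStub device;
};

// Applies the three documented argument checks for a parameter-gradient pair.
void checkParamGradPair(const TensorStub* param, const TensorStub* grad) {
    if (param == nullptr || grad == nullptr)
        throw std::invalid_argument("param or grad is nullptr");
    if (param->shape != grad->shape)
        throw std::invalid_argument("param and grad shapes don't match");
    if (param->device != grad->device)
        throw std::invalid_argument("param and grad are on different devices");
}

// Returns true when the pair would be rejected with std::invalid_argument.
bool isRejected(const TensorStub* param, const TensorStub* grad) {
    try {
        checkParamGradPair(param, grad);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

A caller that builds parameter and gradient tensors together (same shape, same device) should never trip these checks, so in practice they mainly catch wiring mistakes at setup time rather than during training.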
Implemented in Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >, Mila::Dnn::Compute::CudaAdamWOptimizer< TPrecision >, and Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >.
virtual float getLearningRate () const = 0   [pure virtual]
Get the current learning rate.
Returns the base learning rate used for parameter updates. Some optimizers may apply adaptive per-parameter learning rates internally (Adam, AdamW), but this method returns the global scaling factor.
Implemented in Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >, Mila::Dnn::Compute::CudaAdamWOptimizer< TPrecision >, and Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >.
virtual void setLearningRate (float learning_rate) = 0   [pure virtual]
Set the learning rate for future updates.
Updates the base learning rate used by the optimizer. Typically used for learning rate schedules (decay, warmup, cyclic, etc.).
Parameters:
| learning_rate | New learning rate (must be positive) |
Exceptions:
| std::invalid_argument | if learning_rate <= 0 |
Example with learning rate decay:
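The original example is not reproduced on this page. One possible schedule, shown here only as a sketch, is exponential decay driven through `setLearningRate`. `OptimizerStub` and `applyExponentialDecay` are hypothetical names, and the stub models only the two learning-rate members documented above, including the documented rejection of non-positive values.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in exposing only the two learning-rate members
// documented on this page; the real class has more.
class OptimizerStub {
public:
    explicit OptimizerStub(float lr) : lr_(lr) {}

    float getLearningRate() const { return lr_; }

    void setLearningRate(float learning_rate) {
        if (learning_rate <= 0.0f)
            throw std::invalid_argument("learning_rate must be positive");
        lr_ = learning_rate;
    }

private:
    float lr_;
};

// Exponential decay: lr(epoch) = initial_lr * decay^epoch.
// Works with any type that provides setLearningRate(float).
template <typename TOptimizer>
void applyExponentialDecay(TOptimizer& optimizer, float initial_lr,
                           float decay, int epoch) {
    const float lr = initial_lr * std::pow(decay, static_cast<float>(epoch));
    optimizer.setLearningRate(lr);
}
```

Calling `applyExponentialDecay(optimizer, 0.1f, 0.5f, epoch)` at the start of each epoch would halve the rate every epoch. Computing the rate from `initial_lr` rather than repeatedly scaling `getLearningRate()` avoids compounding rounding error, at the cost of having to remember the initial value.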
Implemented in Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >, Mila::Dnn::Compute::CudaAdamWOptimizer< TPrecision >, and Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >.
virtual void step () = 0   [pure virtual]
Perform one optimization step.
Updates all registered parameters using their accumulated gradients according to the optimizer's update rule (SGD, Adam, AdamW, etc.). This method is on the hot path: it is called on every training iteration.
For algorithms with state (Adam, AdamW):
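The details for stateful algorithms are not reproduced on this page. To illustrate what such state typically looks like, here is a sketch of one textbook AdamW update on a flat float buffer, with first and second moment estimates and a step counter. `AdamWConfig`, `AdamWState`, and `adamwStep` are hypothetical names, and Mila's own AdamW implementations may differ in details such as where epsilon is applied or how weight decay is scaled.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hyperparameter bundle with commonly used default values.
struct AdamWConfig {
    float lr = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float eps = 1e-8f;
    float weight_decay = 0.01f;
};

// Per-parameter optimizer state: moment estimates and the step counter.
struct AdamWState {
    std::vector<float> m;  // first moment (running mean of gradients)
    std::vector<float> v;  // second moment (running mean of squared gradients)
    int t = 0;             // number of steps taken so far
};

// Applies one AdamW update in place. State is lazily sized on first use.
void adamwStep(std::vector<float>& param, const std::vector<float>& grad,
               AdamWState& state, const AdamWConfig& cfg) {
    if (state.m.empty()) {
        state.m.assign(param.size(), 0.0f);
        state.v.assign(param.size(), 0.0f);
    }
    ++state.t;

    // Bias-correction denominators for the zero-initialized moments.
    const float bias1 = 1.0f - std::pow(cfg.beta1, static_cast<float>(state.t));
    const float bias2 = 1.0f - std::pow(cfg.beta2, static_cast<float>(state.t));

    for (std::size_t i = 0; i < param.size(); ++i) {
        state.m[i] = cfg.beta1 * state.m[i] + (1.0f - cfg.beta1) * grad[i];
        state.v[i] = cfg.beta2 * state.v[i] + (1.0f - cfg.beta2) * grad[i] * grad[i];
        const float m_hat = state.m[i] / bias1;
        const float v_hat = state.v[i] / bias2;
        // Decoupled weight decay: applied to the parameter, not the gradient.
        param[i] -= cfg.lr * (m_hat / (std::sqrt(v_hat) + cfg.eps)
                              + cfg.weight_decay * param[i]);
    }
}
```

Because the moments start at zero, the very first update has magnitude close to `lr` regardless of the gradient's scale, which is the effect of the bias correction.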
Exceptions:
| std::runtime_error | if no parameters have been registered |
| std::runtime_error | if gradient data is invalid or null |
Typical sequence:
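The original sequence is not reproduced on this page. A common per-iteration order, sketched below, is to clear stale gradients, run the forward and backward passes, and then call `step()`. This assumes that gradients accumulate into the registered gradient buffer and that clearing them is the caller's responsibility, which this page does not confirm. `SgdStub` and `trainIteration` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal optimizer: plain SGD over one flat buffer.
struct SgdStub {
    std::vector<float>* param;
    std::vector<float>* grad;
    float lr;

    void step() {
        for (std::size_t i = 0; i < param->size(); ++i)
            (*param)[i] -= lr * (*grad)[i];
    }
};

// One training iteration in the usual order. `backward` is a caller-supplied
// callable that adds (accumulates) fresh gradients into `grad`.
template <typename TOptimizer, typename TBackward>
void trainIteration(TOptimizer& optimizer, std::vector<float>& grad,
                    TBackward backward) {
    std::fill(grad.begin(), grad.end(), 0.0f);  // 1. clear stale gradients
    backward(grad);                             // 2. forward + backward pass
    optimizer.step();                           // 3. update parameters
}
```

Skipping step 1 would mix the previous iteration's gradients into the current update whenever the backward pass accumulates rather than overwrites.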
Implemented in Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >, Mila::Dnn::Compute::CudaAdamWOptimizer< TPrecision >, and Mila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision >.