Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision > Class Template Reference export

CPU-specific AdamW optimizer implementation.

Public Types

using ExecutionContextType = ExecutionContext<DeviceType::Cpu>
using HostType = typename TensorHostTypeMap<TPrecision>::host_type
using MR = CpuMemoryResource
using TensorType = Tensor<TPrecision, MR>

Public Member Functions

 CpuAdamWOptimizer (IExecutionContext *context, const AdamWConfig &config)
 Construct CPU AdamW optimizer.
 ~CpuAdamWOptimizer () override=default
void addParameter (ITensor *param, ITensor *grad) override
 Register a parameter-gradient pair for optimization.
float getBeta1 () const noexcept
 Get beta1 parameter.
float getBeta2 () const noexcept
 Get beta2 parameter.
float getEpsilon () const noexcept
 Get epsilon parameter.
float getLearningRate () const override
 Get current learning rate.
size_t getParameterCount () const noexcept
 Get number of registered parameter groups.
size_t getStepCount () const noexcept
 Get current step count.
float getWeightDecay () const noexcept
 Get weight decay parameter.
void setLearningRate (float learning_rate) override
 Set learning rate for future steps.
void step () override
 Perform one AdamW optimization step.
Public Member Functions inherited from Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >
virtual ~Optimizer ()=default

Private Member Functions

void updateParameter (HostType *param_data, const HostType *grad_data, float *m_data, float *v_data, size_t num_params, float beta1_correction, float beta2_correction)
 Update a single parameter using AdamW algorithm.

Static Private Member Functions

static std::string shapeToString (const shape_t &shape)
 Convert shape to string for error messages.

Private Attributes

AdamWConfig config_
IExecutionContext * context_
std::vector< const HostType * > grad_data_
std::vector< ITensor * > grads_
float learning_rate_
std::vector< float * > m_data_
std::vector< std::shared_ptr< Tensor< TensorDataType::FP32, MR > > > m_states_
std::vector< HostType * > param_data_
std::vector< ITensor * > params_
size_t step_count_ { 0 }
std::vector< float * > v_data_
std::vector< std::shared_ptr< Tensor< TensorDataType::FP32, MR > > > v_states_

Detailed Description

template<TensorDataType TPrecision>
requires PrecisionSupportedOnDevice<TPrecision, DeviceType::Cpu>
class Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >

CPU-specific AdamW optimizer implementation.

Implements the AdamW algorithm using scalar CPU loops. Maintains per-parameter state tensors (first moment, second moment) and performs synchronous parameter updates.

AdamW algorithm:

  • m_t = beta1 * m_{t-1} + (1 - beta1) * g_t (first moment)
  • v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2 (second moment)
  • m_hat = m_t / (1 - beta1^t) (bias correction)
  • v_hat = v_t / (1 - beta2^t) (bias correction)
  • theta_t = theta_{t-1} - lr * (m_hat / (sqrt(v_hat) + eps) + wd * theta_{t-1})
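
The five update rules above can be traced for a single scalar weight. The following is a minimal sketch, not Mila code: `AdamWScalarState` and `adamwScalarStep` are hypothetical names introduced only for illustration, and the hyperparameters are passed explicitly rather than read from an `AdamWConfig`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration type (not part of Mila): optimizer state for one weight.
struct AdamWScalarState {
    float m = 0.0f;  // first moment estimate, m_t
    float v = 0.0f;  // second moment estimate, v_t
    int   t = 0;     // number of steps taken so far
};

// Applies one AdamW update to a single scalar weight and returns the new value.
inline float adamwScalarStep(float theta, float g, AdamWScalarState& s,
                             float lr, float beta1, float beta2,
                             float eps, float wd) {
    assert(beta1 > 0.0f && beta1 < 1.0f && beta2 > 0.0f && beta2 < 1.0f);
    s.t += 1;                                       // t starts at 1
    s.m = beta1 * s.m + (1.0f - beta1) * g;         // first moment
    s.v = beta2 * s.v + (1.0f - beta2) * g * g;     // second moment
    const float m_hat = s.m / (1.0f - std::pow(beta1, static_cast<float>(s.t)));
    const float v_hat = s.v / (1.0f - std::pow(beta2, static_cast<float>(s.t)));
    // Weight decay is applied to theta directly, not folded into the gradient.
    return theta - lr * (m_hat / (std::sqrt(v_hat) + eps) + wd * theta);
}
```

On the first step from zero state, bias correction gives m_hat = g and v_hat = g^2, so the adaptive term has magnitude close to 1 and the weight moves by roughly lr in the direction opposite the gradient, plus the decay term lr * wd * theta.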

Features:

  • Decoupled weight decay (AdamW variant)
  • Bias correction for moments
  • FP32 state for numerical stability
  • Synchronous execution

Template Parameters
TPrecision: Abstract tensor precision (TensorDataType)

Member Typedef Documentation

◆ ExecutionContextType

template<TensorDataType TPrecision>
using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::ExecutionContextType = ExecutionContext<DeviceType::Cpu>

◆ HostType

template<TensorDataType TPrecision>
using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::HostType = typename TensorHostTypeMap<TPrecision>::host_type

◆ MR

template<TensorDataType TPrecision>
using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::MR = CpuMemoryResource

◆ TensorType

template<TensorDataType TPrecision>
using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::TensorType = Tensor<TPrecision, MR>

Constructor & Destructor Documentation

◆ CpuAdamWOptimizer()

template<TensorDataType TPrecision>
Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::CpuAdamWOptimizer ( IExecutionContext * context,
const AdamWConfig & config )
inline explicit

Construct CPU AdamW optimizer.

Parameters
context: CPU execution context
config: AdamW configuration. The settings documented for it are:
  • learning_rate: Initial learning rate (typical: 1e-3 to 1e-4)
  • beta1: Exponential decay rate for first moment (typical: 0.9)
  • beta2: Exponential decay rate for second moment (typical: 0.999)
  • epsilon: Small constant for numerical stability (typical: 1e-8)
  • weight_decay: Weight decay coefficient (typical: 0.01)
Exceptions
std::invalid_argument if context is null
std::invalid_argument if learning_rate <= 0
std::invalid_argument if beta1 or beta2 is not in (0, 1)
std::invalid_argument if epsilon <= 0

◆ ~CpuAdamWOptimizer()

template<TensorDataType TPrecision>
Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::~CpuAdamWOptimizer ( )
override default

Member Function Documentation

◆ addParameter()

template<TensorDataType TPrecision>
void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::addParameter ( ITensor * param,
ITensor * grad )
inline override virtual

Register a parameter-gradient pair for optimization.

The optimizer does not take ownership of the parameter/gradient tensors. The caller (typically a Module) must ensure the tensors remain valid for the lifetime of the optimizer.

Allocates momentum and variance state tensors on CPU matching the parameter shape. State tensors are zero-initialized.

Parameters
param: Parameter tensor to optimize (non-owning, must be on CPU)
grad: Gradient tensor (non-owning, must match param shape and device)
Exceptions
std::invalid_argument if param or grad is null
std::invalid_argument if param and grad shapes don't match
std::invalid_argument if param or grad is not a CPU tensor
std::invalid_argument if param or grad data type doesn't match optimizer precision
std::runtime_error if state allocation fails

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ getBeta1()

template<TensorDataType TPrecision>
float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getBeta1 ( ) const
inline noexcept

Get beta1 parameter.

◆ getBeta2()

template<TensorDataType TPrecision>
float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getBeta2 ( ) const
inline noexcept

Get beta2 parameter.

◆ getEpsilon()

template<TensorDataType TPrecision>
float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getEpsilon ( ) const
inline noexcept

Get epsilon parameter.

◆ getLearningRate()

template<TensorDataType TPrecision>
float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getLearningRate ( ) const
inline override virtual

Get current learning rate.

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ getParameterCount()

template<TensorDataType TPrecision>
size_t Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getParameterCount ( ) const
inline noexcept

Get number of registered parameter groups.

◆ getStepCount()

template<TensorDataType TPrecision>
size_t Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getStepCount ( ) const
inline noexcept

Get current step count.

◆ getWeightDecay()

template<TensorDataType TPrecision>
float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getWeightDecay ( ) const
inline noexcept

Get weight decay parameter.

◆ setLearningRate()

template<TensorDataType TPrecision>
void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::setLearningRate ( float learning_rate)
inline override virtual

Set learning rate for future steps.

Parameters
learning_rate: New learning rate (must be positive)
Exceptions
std::invalid_argument if learning_rate <= 0

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ shapeToString()

template<TensorDataType TPrecision>
std::string Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::shapeToString ( const shape_t & shape)
inline static private

Convert shape to string for error messages.

◆ step()

template<TensorDataType TPrecision>
void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::step ( )
inline override virtual

Perform one AdamW optimization step.

Updates all registered parameters on CPU using scalar loops. Execution is synchronous.

Exceptions
std::runtime_error if no parameters have been registered
Note
Synchronous - blocks until all updates complete
Increments internal step counter for bias correction

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.
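
The note above ties the step counter to bias correction. As a rough sketch of that relationship, the helper below computes the factor 1 - beta^t from a step number. `biasCorrection` is a hypothetical name, not a Mila function, and the sketch assumes the counter is incremented before the factors are computed, so that the first step uses t = 1 and the factor is never zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of Mila): returns the bias-correction
// denominator 1 - beta^t for a 1-based step number t.
inline float biasCorrection(float beta, std::size_t step) {
    assert(step >= 1);  // step 0 would give 1 - beta^0 = 0 and a division by zero
    return 1.0f - std::pow(beta, static_cast<float>(step));
}
```

With beta1 = 0.9 the factor is about 0.1 on the first step and approaches 1 as the step count grows, so the correction matters most early in training. Since both factors depend only on the step number, they can be computed once per step and reused for every registered parameter tensor.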

◆ updateParameter()

template<TensorDataType TPrecision>
void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::updateParameter ( HostType * param_data,
const HostType * grad_data,
float * m_data,
float * v_data,
size_t num_params,
float beta1_correction,
float beta2_correction )
inline private

Update a single parameter using AdamW algorithm.

Performs the AdamW update for a single parameter tensor using scalar loops. Implements the complete AdamW algorithm including:

  • First moment (momentum) update
  • Second moment (RMSprop) update
  • Bias correction
  • Parameter update with decoupled weight decay
Parameters
param_data: Parameter data pointer
grad_data: Gradient data pointer
m_data: First moment state pointer
v_data: Second moment state pointer
num_params: Number of scalar parameters
beta1_correction: Bias correction for first moment (1 - beta1^t)
beta2_correction: Bias correction for second moment (1 - beta2^t)
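
A scalar loop with this parameter list might look like the sketch below. It is an illustration under stated assumptions, not the library's implementation: `adamwUpdate` is a hypothetical free function, the hyperparameters (`lr`, `beta1`, `beta2`, `eps`, `wd`) are passed as extra arguments instead of being read from member state, and `HostT` stands in for `HostType`. The moments stay in `float` regardless of the parameter type, matching the FP32 state tensors listed for this class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical free-function analogue of updateParameter (not Mila code).
// The correction arguments are the precomputed values 1 - beta^t.
template <typename HostT>
void adamwUpdate(HostT* param, const HostT* grad, float* m, float* v,
                 std::size_t n, float beta1_correction, float beta2_correction,
                 float lr, float beta1, float beta2, float eps, float wd) {
    assert(beta1_correction > 0.0f && beta2_correction > 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        // Do the arithmetic in float, whatever the storage type is.
        const float g = static_cast<float>(grad[i]);
        const float p = static_cast<float>(param[i]);
        m[i] = beta1 * m[i] + (1.0f - beta1) * g;      // first moment
        v[i] = beta2 * v[i] + (1.0f - beta2) * g * g;  // second moment
        const float m_hat = m[i] / beta1_correction;   // bias-corrected moments
        const float v_hat = v[i] / beta2_correction;
        // Decoupled weight decay: wd scales the weight itself.
        param[i] = static_cast<HostT>(
            p - lr * (m_hat / (std::sqrt(v_hat) + eps) + wd * p));
    }
}
```

Each element is independent of the others, so the loop carries no dependency between iterations.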

Member Data Documentation

◆ config_

template<TensorDataType TPrecision>
AdamWConfig Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::config_
private

◆ context_

template<TensorDataType TPrecision>
IExecutionContext* Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::context_
private

◆ grad_data_

template<TensorDataType TPrecision>
std::vector<const HostType*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::grad_data_
private

◆ grads_

template<TensorDataType TPrecision>
std::vector<ITensor*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::grads_
private

◆ learning_rate_

template<TensorDataType TPrecision>
float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::learning_rate_
private

◆ m_data_

template<TensorDataType TPrecision>
std::vector<float*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::m_data_
private

◆ m_states_

template<TensorDataType TPrecision>
std::vector<std::shared_ptr<Tensor<TensorDataType::FP32, MR> > > Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::m_states_
private

◆ param_data_

template<TensorDataType TPrecision>
std::vector<HostType*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::param_data_
private

◆ params_

template<TensorDataType TPrecision>
std::vector<ITensor*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::params_
private

◆ step_count_

template<TensorDataType TPrecision>
size_t Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::step_count_ { 0 }
private

◆ v_data_

template<TensorDataType TPrecision>
std::vector<float*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::v_data_
private

◆ v_states_

template<TensorDataType TPrecision>
std::vector<std::shared_ptr<Tensor<TensorDataType::FP32, MR> > > Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::v_states_
private

The documentation for this class was generated from the following file: