CPU-specific AdamW optimizer implementation.

Public Types
using	ExecutionContextType = ExecutionContext<DeviceType::Cpu>
using	HostType = typename TensorHostTypeMap<TPrecision>::host_type
using	MR = CpuMemoryResource
using	TensorType = Tensor<TPrecision, MR>

Public Member Functions
	CpuAdamWOptimizer (IExecutionContext *context, const AdamWConfig &config)
	Construct CPU AdamW optimizer.
	~CpuAdamWOptimizer () override=default
void	addParameter (ITensor *param, ITensor *grad) override
	Register a parameter-gradient pair for optimization.
float	getBeta1 () const noexcept
	Get beta1 parameter.
float	getBeta2 () const noexcept
	Get beta2 parameter.
float	getEpsilon () const noexcept
	Get epsilon parameter.
float	getLearningRate () const override
	Get current learning rate.
size_t	getParameterCount () const noexcept
	Get number of registered parameter groups.
size_t	getStepCount () const noexcept
	Get current step count.
float	getWeightDecay () const noexcept
	Get weight decay parameter.
void	setLearningRate (float learning_rate) override
	Set learning rate for future steps.
void	step () override
	Perform one AdamW optimization step.
Public Member Functions inherited from Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >
virtual	~Optimizer ()=default

Private Member Functions
void	updateParameter (HostType *param_data, const HostType *grad_data, float *m_data, float *v_data, size_t num_params, float beta1_correction, float beta2_correction)
	Update a single parameter tensor using the AdamW algorithm.

Static Private Member Functions
static std::string	shapeToString (const shape_t &shape)
	Convert shape to string for error messages.

Private Attributes
AdamWConfig	config_
IExecutionContext *	context_
std::vector< const HostType * >	grad_data_
std::vector< ITensor * >	grads_
float	learning_rate_
std::vector< float * >	m_data_
std::vector< std::shared_ptr< Tensor< TensorDataType::FP32, MR > > >	m_states_
std::vector< HostType * >	param_data_
std::vector< ITensor * >	params_
size_t	step_count_ { 0 }
std::vector< float * >	v_data_
std::vector< std::shared_ptr< Tensor< TensorDataType::FP32, MR > > >	v_states_

Detailed Description

template<TensorDataType TPrecision>
requires PrecisionSupportedOnDevice<TPrecision, DeviceType::Cpu>
class Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >

CPU-specific AdamW optimizer implementation.

Implements the AdamW algorithm using scalar CPU loops. Maintains per-parameter state tensors (first moment, second moment) and performs synchronous parameter updates.

AdamW algorithm:

m_t = beta1 * m_{t-1} + (1 - beta1) * g_t (first moment)
v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2 (second moment)
m_hat = m_t / (1 - beta1^t) (bias correction)
v_hat = v_t / (1 - beta2^t) (bias correction)
theta_t = theta_{t-1} - lr * (m_hat / (sqrt(v_hat) + eps) + wd * theta_{t-1})
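
As a worked check of the equations above, consider the first step (t = 1), assuming the moment state starts at zero (m_0 = v_0 = 0), as the state tensors are zero-initialized when a parameter is registered. The symbols follow the listing above; this is a derivation from those formulas, not a statement about the implementation's internals.

```latex
\begin{aligned}
m_1 &= (1-\beta_1)\,g_1, & \hat m_1 &= \frac{m_1}{1-\beta_1^{1}} = g_1,\\
v_1 &= (1-\beta_2)\,g_1^{2}, & \hat v_1 &= \frac{v_1}{1-\beta_2^{1}} = g_1^{2},\\
\theta_1 &= \theta_0 - \mathrm{lr}\left(\frac{g_1}{|g_1| + \epsilon} + \mathrm{wd}\,\theta_0\right).
\end{aligned}
```

Without bias correction the first update would be scaled by (1 - beta1) / sqrt(1 - beta2) instead; with it, each element moves by roughly lr in the direction opposite its gradient (when |g_1| is much larger than eps), plus the decoupled decay term lr * wd * theta_0.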

Features:

Decoupled weight decay (AdamW variant)
Bias correction for moments
FP32 state for numerical stability
Synchronous execution

Template Parameters

TPrecision Abstract tensor precision (TensorDataType)

Member Typedef Documentation

◆ ExecutionContextType

template<TensorDataType TPrecision>

using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::ExecutionContextType = ExecutionContext<DeviceType::Cpu>

◆ HostType

template<TensorDataType TPrecision>

using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::HostType = typename TensorHostTypeMap<TPrecision>::host_type

◆ MR

template<TensorDataType TPrecision>

using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::MR = CpuMemoryResource

◆ TensorType

template<TensorDataType TPrecision>

using Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::TensorType = Tensor<TPrecision, MR>

Constructor & Destructor Documentation

◆ CpuAdamWOptimizer()

template<TensorDataType TPrecision>

Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::CpuAdamWOptimizer	(	IExecutionContext *	context,
		const AdamWConfig &	config )

inline explicit

Construct CPU AdamW optimizer.

Parameters

context	CPU execution context
config	AdamW configuration supplying the hyperparameters below
learning_rate	Initial learning rate (typical: 1e-4 to 1e-3)
beta1	Exponential decay rate for first moment (typical: 0.9)
beta2	Exponential decay rate for second moment (typical: 0.999)
epsilon	Small constant for numerical stability (typical: 1e-8)
weight_decay	Weight decay coefficient (typical: 0.01)

Exceptions

std::invalid_argument	if context is null
std::invalid_argument	if learning_rate <= 0
std::invalid_argument	if beta1 or beta2 is not in (0, 1)
std::invalid_argument	if epsilon <= 0
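
The documented argument checks can be sketched in isolation as follows. This is a minimal illustration only: `SketchAdamWConfig`, its field names, and `validateSketch` are hypothetical stand-ins and are not Mila's `AdamWConfig` or the constructor's actual code; the context is modeled as an opaque pointer.

```cpp
#include <stdexcept>

// Hypothetical mirror of the documented hyperparameters (field names are
// assumptions, defaults are the "typical" values from the documentation).
struct SketchAdamWConfig {
    float learning_rate = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float epsilon = 1e-8f;
    float weight_decay = 0.01f;
};

// Throws std::invalid_argument for each documented failure condition.
// The negated comparisons also reject NaN values.
inline void validateSketch(const void* context, const SketchAdamWConfig& c) {
    if (context == nullptr) {
        throw std::invalid_argument("execution context is null");
    }
    if (!(c.learning_rate > 0.0f)) {
        throw std::invalid_argument("learning_rate must be > 0");
    }
    if (!(c.beta1 > 0.0f && c.beta1 < 1.0f)) {
        throw std::invalid_argument("beta1 must be in (0, 1)");
    }
    if (!(c.beta2 > 0.0f && c.beta2 < 1.0f)) {
        throw std::invalid_argument("beta2 must be in (0, 1)");
    }
    if (!(c.epsilon > 0.0f)) {
        throw std::invalid_argument("epsilon must be > 0");
    }
}
```

Checking everything up front means a misconfigured optimizer fails at construction rather than producing NaN parameters several steps into training.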

◆ ~CpuAdamWOptimizer()

template<TensorDataType TPrecision>

Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::~CpuAdamWOptimizer ( )

override default

Member Function Documentation

◆ addParameter()

template<TensorDataType TPrecision>

void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::addParameter	(	ITensor *	param,
		ITensor *	grad )

inline override virtual

Register a parameter-gradient pair for optimization.

The optimizer does not take ownership of the parameter/gradient tensors. The caller (typically a Module) must ensure the tensors remain valid for the lifetime of the optimizer.

Allocates momentum and variance state tensors on CPU matching the parameter shape. State tensors are zero-initialized.

Parameters

param	Parameter tensor to optimize (non-owning, must be on CPU)
grad	Gradient tensor (non-owning, must match param shape and device)

Exceptions

std::invalid_argument	if param or grad is null
std::invalid_argument	if param and grad shapes don't match
std::invalid_argument	if param or grad is not a CPU tensor
std::invalid_argument	if param or grad data type doesn't match optimizer precision
std::runtime_error	if state allocation fails

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ getBeta1()

template<TensorDataType TPrecision>

float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getBeta1 ( ) const

inline noexcept

Get beta1 parameter.

◆ getBeta2()

template<TensorDataType TPrecision>

float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getBeta2 ( ) const

inline noexcept

Get beta2 parameter.

◆ getEpsilon()

template<TensorDataType TPrecision>

float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getEpsilon ( ) const

inline noexcept

Get epsilon parameter.

◆ getLearningRate()

template<TensorDataType TPrecision>

float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getLearningRate ( ) const

inline override virtual

Get current learning rate.

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ getParameterCount()

template<TensorDataType TPrecision>

size_t Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getParameterCount ( ) const

inline noexcept

Get number of registered parameter groups.

◆ getStepCount()

template<TensorDataType TPrecision>

size_t Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getStepCount ( ) const

inline noexcept

Get current step count.

◆ getWeightDecay()

template<TensorDataType TPrecision>

float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::getWeightDecay ( ) const

inline noexcept

Get weight decay parameter.

◆ setLearningRate()

template<TensorDataType TPrecision>

void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::setLearningRate ( float learning_rate )

inline override virtual

Set learning rate for future steps.

Parameters

learning_rate New learning rate (must be positive)

Exceptions

std::invalid_argument if learning_rate <= 0

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ shapeToString()

template<TensorDataType TPrecision>

std::string Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::shapeToString ( const shape_t & shape )

inline static private

Convert shape to string for error messages.

◆ step()

template<TensorDataType TPrecision>

void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::step ( )

inline override virtual

Perform one AdamW optimization step.

Updates all registered parameters on CPU using scalar loops. Execution is synchronous.

Exceptions

std::runtime_error if no parameters have been registered

Note

Synchronous: blocks until all updates complete.
Increments the internal step counter used for bias correction.

Implements Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >.

◆ updateParameter()

template<TensorDataType TPrecision>

void Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::updateParameter	(	HostType *	param_data,
		const HostType *	grad_data,
		float *	m_data,
		float *	v_data,
		size_t	num_params,
		float	beta1_correction,
		float	beta2_correction )

inline private

Update a single parameter tensor using the AdamW algorithm.

Performs the AdamW update for a single parameter tensor using scalar loops. Implements the complete AdamW algorithm including:

First moment (momentum) update
Second moment (RMSprop) update
Bias correction
Parameter update with decoupled weight decay

Parameters

param_data	Parameter data pointer
grad_data	Gradient data pointer
m_data	First moment state pointer
v_data	Second moment state pointer
num_params	Number of scalar parameters
beta1_correction	Bias correction for first moment (1 - beta1^t)
beta2_correction	Bias correction for second moment (1 - beta2^t)
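
A scalar-loop update with this parameter layout might look like the sketch below. It is an illustration under stated assumptions, not the library's code: `SketchHyperParams`, `adamwUpdateSketch`, and `biasCorrection` are hypothetical names, the hyperparameters are passed explicitly rather than read from members, and the template parameter `T` stands in for `HostType`. The moment buffers stay in `float`, matching the FP32 state described for this class.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical hyperparameter bundle (not Mila's AdamWConfig).
struct SketchHyperParams {
    float lr;
    float beta1;
    float beta2;
    float eps;
    float weight_decay;
};

// Bias-correction denominator 1 - beta^t for step t (t >= 1).
inline float biasCorrection(float beta, std::size_t t) {
    return 1.0f - std::pow(beta, static_cast<float>(t));
}

// One AdamW pass over a flat buffer of n elements. Arithmetic is done in
// float and the result is cast back to the parameter's host type T.
template <typename T>
void adamwUpdateSketch(T* param, const T* grad, float* m, float* v,
                       std::size_t n, const SketchHyperParams& hp,
                       float beta1_correction, float beta2_correction) {
    for (std::size_t i = 0; i < n; ++i) {
        const float g = static_cast<float>(grad[i]);
        const float theta = static_cast<float>(param[i]);

        // Exponential moving averages of the gradient and its square.
        m[i] = hp.beta1 * m[i] + (1.0f - hp.beta1) * g;
        v[i] = hp.beta2 * v[i] + (1.0f - hp.beta2) * g * g;

        // Bias-corrected moment estimates.
        const float m_hat = m[i] / beta1_correction;
        const float v_hat = v[i] / beta2_correction;

        // Adaptive step plus decoupled weight decay on the old parameter value.
        const float update =
            m_hat / (std::sqrt(v_hat) + hp.eps) + hp.weight_decay * theta;
        param[i] = static_cast<T>(theta - hp.lr * update);
    }
}
```

Because the two corrections depend only on the step count, a caller can compute them once per step with `biasCorrection` and reuse them for every registered tensor, which is presumably why they are parameters here rather than recomputed inside the loop.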

Member Data Documentation

◆ config_

template<TensorDataType TPrecision>

AdamWConfig Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::config_

private

◆ context_

template<TensorDataType TPrecision>

IExecutionContext* Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::context_

private

◆ grad_data_

template<TensorDataType TPrecision>

std::vector<const HostType*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::grad_data_

private

◆ grads_

template<TensorDataType TPrecision>

std::vector<ITensor*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::grads_

private

◆ learning_rate_

template<TensorDataType TPrecision>

float Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::learning_rate_

private

◆ m_data_

template<TensorDataType TPrecision>

std::vector<float*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::m_data_

private

◆ m_states_

template<TensorDataType TPrecision>

std::vector<std::shared_ptr<Tensor<TensorDataType::FP32, MR> > > Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::m_states_

private

◆ param_data_

template<TensorDataType TPrecision>

std::vector<HostType*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::param_data_

private

◆ params_

template<TensorDataType TPrecision>

std::vector<ITensor*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::params_

private

◆ step_count_

template<TensorDataType TPrecision>

size_t Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::step_count_ { 0 }

private

◆ v_data_

template<TensorDataType TPrecision>

std::vector<float*> Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::v_data_

private

◆ v_states_

template<TensorDataType TPrecision>

std::vector<std::shared_ptr<Tensor<TensorDataType::FP32, MR> > > Mila::Dnn::Compute::CpuAdamWOptimizer< TPrecision >::v_states_

private

The documentation for this class was generated from the following file:

/__w/Mila/Mila/Mila/Src/Dnn/Compute/Devices/Cpu/Optimizers/CpuAdamWOptimizer.ixx
