Mila 0.13.48
Deep Neural Network Library
Loading...
Searching...
No Matches
Mila::Dnn::Compute::CpuLinearOp Class Referenceexport

CPU implementation of Linear operation using abstract TensorDataType API. More...

Inheritance diagram for Mila::Dnn::Compute::CpuLinearOp:
Collaboration diagram for Mila::Dnn::Compute::CpuLinearOp:

Public Types

using CpuExecutionContext = ExecutionContext<DeviceType::Cpu>
using MR = CpuMemoryResource
using TensorType = Tensor<TensorDataType::FP32, MR>
using UnaryOperationBase = UnaryOperation<DeviceType::Cpu, TensorDataType::FP32>
Public Types inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >
using MR
using TensorInputType
using TensorOutputType
Public Types inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
using DataTypeTraits

Public Member Functions

 CpuLinearOp (IExecutionContext *context, const LinearConfig &config)
void backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const override
 Backward pass - HOT PATH, pure dispatch to CPU kernel.
void build (const BuildContext &config) override
 Build the operation for a concrete input shape.
void forward (const ITensor &input, ITensor &output) const override
 Forward pass - HOT PATH, pure dispatch to CPU kernel.
const LinearConfiggetConfig () const
std::string getName () const override
 Human-readable operation name.
OperationType getOperationType () const override
 Operation type identifier.
void setGradients (ITensor *weight_grad, ITensor *bias_grad) override
 Bind module-owned gradient tensors to the operation.
void setParameters (ITensor *weight, ITensor *bias) override
 Set parameter tensor references (module remains owner).
Public Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >
virtual ~UnaryOperation ()=default
Public Member Functions inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
virtual ~Operation ()=default
virtual void clearGradients () noexcept
 Clear any cached gradient pointers held by the operation.
virtual TensorDataType getDataType () const
 Tensor data type for this operation.
virtual DeviceType getDeviceType () const
 Device type for this operation.
virtual std::size_t getStateMemorySize () const
 Returns the number of bytes of state memory allocated by this operation.
virtual bool isBuilt () const
 Whether build() completed successfully for a concrete input shape.
virtual bool isEvalMode () const
 Query whether operation is configured for training.
virtual void setTrainingMode (TrainingMode training_mode)
 Configure operation training-mode behavior.

Private Member Functions

void forwardNaive (const float *X, float *Y, const float *W, const float *B, int64_t batch_size, int64_t in_features, int64_t out_features) const
 Naive forward implementation without loop unrolling.
void forwardUnrolled (const float *X, float *Y, const float *W, const float *B, int64_t batch_size, int64_t in_features, int64_t out_features) const
 Optimized forward implementation with loop unrolling.

Private Attributes

int64_t batch_size_ { 0 }
const float * bias_ { nullptr }
float * bias_grad_ { nullptr }
LinearConfig config_
IExecutionContextcontext_
bool enable_omp_ { false }
int64_t in_features_ { 0 }
int64_t out_features_ { 0 }
bool use_loop_unroll_ { false }
const float * weight_ { nullptr }
float * weight_grad_ { nullptr }
int64_t weight_in_features_ { 0 }
int64_t weight_out_features_ { 0 }

Static Private Attributes

static constexpr int LOOP_UNROLL = 8

Additional Inherited Members

Static Public Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
static constexpr TensorDataType data_type
static constexpr DeviceType device_type
Static Protected Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >
static const TensorInputTypeasInputTensor (const ITensor &t)
static TensorOutputTypeasOutputTensor (ITensor &t)
Protected Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
bool is_built_
TrainingMode training_mode_

Detailed Description

CPU implementation of Linear operation using abstract TensorDataType API.

Template parameter TPrecision selects the abstract tensor precision (e.g. FP32). float is the corresponding CPU host representation for that precision.

Design philosophy:

  • Two-phase initialization: build() does all setup, forward()/backward() are pure dispatch
  • Module owns weight/bias parameters and binds them via setParameters()
  • All dimension computation happens once in build()
  • Forward/backward are hot-path methods with minimal overhead
  • Implements: y = x * W^T + b where W is (out_features, in_features)

Forward: Y = X * W^T + b Backward:

  • dX += dY * W
  • dW += dY^T * X
  • db += sum(dY)

Member Typedef Documentation

◆ CpuExecutionContext

◆ MR

◆ TensorType

◆ UnaryOperationBase

Constructor & Destructor Documentation

◆ CpuLinearOp()

Mila::Dnn::Compute::CpuLinearOp::CpuLinearOp ( IExecutionContext * context,
const LinearConfig & config )
inline

Member Function Documentation

◆ backward()

void Mila::Dnn::Compute::CpuLinearOp::backward ( const ITensor & input,
const ITensor & output_grad,
ITensor & input_grad ) const
inlineoverridevirtual

Backward pass - HOT PATH, pure dispatch to CPU kernel.

Similar to forward(), this method does minimal work and dispatches directly to the backward kernel using cached dimensions from build().

Gradients are accumulated into the tensors provided via setParameterGradients().

Algorithm:

  • dX += dY * W
  • dW += dY^T * X
  • db += sum(dY)

Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >.

Here is the call graph for this function:

◆ build()

void Mila::Dnn::Compute::CpuLinearOp::build ( const BuildContext & config)
inlineoverridevirtual

Build the operation for a concrete input shape.

This is the COLD PATH where all setup, validation, and computation happens ONCE. After build() completes, forward() and backward() become pure dispatch methods.

Responsibilities:

  1. Validate parameters were bound via setParameters()
  2. Validate input shape compatibility with weight dimensions
  3. Compute and cache batch size and feature dimensions
  4. Determine optimal loop unrolling strategy
  5. Cache OMP parallelization threshold

After build(), the operation is ready for zero-overhead forward/backward dispatch.

Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

Here is the call graph for this function:

◆ forward()

void Mila::Dnn::Compute::CpuLinearOp::forward ( const ITensor & input,
ITensor & output ) const
inlineoverridevirtual

Forward pass - HOT PATH, pure dispatch to CPU kernel.

All setup, validation, and dimension computation was done in build(). This method extracts raw pointers and dispatches directly to the optimized matrix multiplication kernel using pre-computed cached dimensions.

Algorithm: Y = X * W^T + b Zero redundant work - maximum performance.

Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >.

Here is the call graph for this function:

◆ forwardNaive()

void Mila::Dnn::Compute::CpuLinearOp::forwardNaive ( const float * X,
float * Y,
const float * W,
const float * B,
int64_t batch_size,
int64_t in_features,
int64_t out_features ) const
inlineprivate

Naive forward implementation without loop unrolling.

Used when batch size doesn't align with unroll factor.

Here is the caller graph for this function:

◆ forwardUnrolled()

void Mila::Dnn::Compute::CpuLinearOp::forwardUnrolled ( const float * X,
float * Y,
const float * W,
const float * B,
int64_t batch_size,
int64_t in_features,
int64_t out_features ) const
inlineprivate

Optimized forward implementation with loop unrolling.

Processes LOOP_UNROLL batch elements simultaneously for better cache utilization.

Here is the caller graph for this function:

◆ getConfig()

const LinearConfig & Mila::Dnn::Compute::CpuLinearOp::getConfig ( ) const
inline

◆ getName()

std::string Mila::Dnn::Compute::CpuLinearOp::getName ( ) const
inlineoverridevirtual

Human-readable operation name.

Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

◆ getOperationType()

OperationType Mila::Dnn::Compute::CpuLinearOp::getOperationType ( ) const
inlineoverridevirtual

◆ setGradients()

void Mila::Dnn::Compute::CpuLinearOp::setGradients ( ITensor * weight_grad,
ITensor * bias_grad )
inlineoverridevirtual

Bind module-owned gradient tensors to the operation.

New canonical API for binding gradient buffers. Mirrors semantics of setParameters() but for gradients used during backward().

The operation MUST NOT take ownership of the provided pointers. Implementations may cache rawData() pointers for hot-path writes.

Default: no-op for stateless operations.

Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

Here is the call graph for this function:

◆ setParameters()

void Mila::Dnn::Compute::CpuLinearOp::setParameters ( ITensor * weight,
ITensor * bias )
inlineoverridevirtual

Set parameter tensor references (module remains owner).

The operation caches native host pointers for hot-path access. The weight tensor is required; bias is bound only when the Linear config indicates a bias is present.

Note: build() requires parameters to be bound before it is called.

Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

Here is the call graph for this function:

Member Data Documentation

◆ batch_size_

int64_t Mila::Dnn::Compute::CpuLinearOp::batch_size_ { 0 }
private

◆ bias_

const float* Mila::Dnn::Compute::CpuLinearOp::bias_ { nullptr }
private

◆ bias_grad_

float* Mila::Dnn::Compute::CpuLinearOp::bias_grad_ { nullptr }
private

◆ config_

LinearConfig Mila::Dnn::Compute::CpuLinearOp::config_
private

◆ context_

IExecutionContext* Mila::Dnn::Compute::CpuLinearOp::context_
private

◆ enable_omp_

bool Mila::Dnn::Compute::CpuLinearOp::enable_omp_ { false }
private

◆ in_features_

int64_t Mila::Dnn::Compute::CpuLinearOp::in_features_ { 0 }
private

◆ LOOP_UNROLL

int Mila::Dnn::Compute::CpuLinearOp::LOOP_UNROLL = 8
staticconstexprprivate

◆ out_features_

int64_t Mila::Dnn::Compute::CpuLinearOp::out_features_ { 0 }
private

◆ use_loop_unroll_

bool Mila::Dnn::Compute::CpuLinearOp::use_loop_unroll_ { false }
private

◆ weight_

const float* Mila::Dnn::Compute::CpuLinearOp::weight_ { nullptr }
private

◆ weight_grad_

float* Mila::Dnn::Compute::CpuLinearOp::weight_grad_ { nullptr }
private

◆ weight_in_features_

int64_t Mila::Dnn::Compute::CpuLinearOp::weight_in_features_ { 0 }
private

◆ weight_out_features_

int64_t Mila::Dnn::Compute::CpuLinearOp::weight_out_features_ { 0 }
private

The documentation for this class was generated from the following file:
  • /__w/Mila/Mila/Mila/Src/Dnn/Compute/Devices/Cpu/Operations/CpuLinearOp.ixx