|
Mila 0.13.48
Deep Neural Network Library
|
CPU implementation of Linear operation using abstract TensorDataType API. More...


Public Types | |
| using | CpuExecutionContext = ExecutionContext<DeviceType::Cpu> |
| using | MR = CpuMemoryResource |
| using | TensorType = Tensor<TensorDataType::FP32, MR> |
| using | UnaryOperationBase = UnaryOperation<DeviceType::Cpu, TensorDataType::FP32> |
| Public Types inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| using | MR |
| using | TensorInputType |
| using | TensorOutputType |
| Public Types inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| using | DataTypeTraits |
Public Member Functions | |
| CpuLinearOp (IExecutionContext *context, const LinearConfig &config) | |
| void | backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const override |
| Backward pass - HOT PATH, pure dispatch to CPU kernel. | |
| void | build (const BuildContext &config) override |
| Build the operation for a concrete input shape. | |
| void | forward (const ITensor &input, ITensor &output) const override |
| Forward pass - HOT PATH, pure dispatch to CPU kernel. | |
| const LinearConfig & | getConfig () const |
| std::string | getName () const override |
| Human-readable operation name. | |
| OperationType | getOperationType () const override |
| Operation type identifier. | |
| void | setGradients (ITensor *weight_grad, ITensor *bias_grad) override |
| Bind module-owned gradient tensors to the operation. | |
| void | setParameters (ITensor *weight, ITensor *bias) override |
| Set parameter tensor references (module remains owner). | |
| Public Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| virtual | ~UnaryOperation ()=default |
| Public Member Functions inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| virtual | ~Operation ()=default |
| virtual void | clearGradients () noexcept |
| Clear any cached gradient pointers held by the operation. | |
| virtual TensorDataType | getDataType () const |
| Tensor data type for this operation. | |
| virtual DeviceType | getDeviceType () const |
| Device type for this operation. | |
| virtual std::size_t | getStateMemorySize () const |
| Returns the number of bytes of state memory allocated by this operation. | |
| virtual bool | isBuilt () const |
| Whether build() completed successfully for a concrete input shape. | |
| virtual bool | isEvalMode () const |
| Query whether operation is configured for training. | |
| virtual void | setTrainingMode (TrainingMode training_mode) |
| Configure operation training-mode behavior. | |
Private Member Functions | |
| void | forwardNaive (const float *X, float *Y, const float *W, const float *B, int64_t batch_size, int64_t in_features, int64_t out_features) const |
| Naive forward implementation without loop unrolling. | |
| void | forwardUnrolled (const float *X, float *Y, const float *W, const float *B, int64_t batch_size, int64_t in_features, int64_t out_features) const |
| Optimized forward implementation with loop unrolling. | |
Private Attributes | |
| int64_t | batch_size_ { 0 } |
| const float * | bias_ { nullptr } |
| float * | bias_grad_ { nullptr } |
| LinearConfig | config_ |
| IExecutionContext * | context_ |
| bool | enable_omp_ { false } |
| int64_t | in_features_ { 0 } |
| int64_t | out_features_ { 0 } |
| bool | use_loop_unroll_ { false } |
| const float * | weight_ { nullptr } |
| float * | weight_grad_ { nullptr } |
| int64_t | weight_in_features_ { 0 } |
| int64_t | weight_out_features_ { 0 } |
Static Private Attributes | |
| static constexpr int | LOOP_UNROLL = 8 |
Additional Inherited Members | |
| Static Public Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| static constexpr TensorDataType | data_type |
| static constexpr DeviceType | device_type |
| Static Protected Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| static const TensorInputType & | asInputTensor (const ITensor &t) |
| static TensorOutputType & | asOutputTensor (ITensor &t) |
| Protected Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| bool | is_built_ |
| TrainingMode | training_mode_ |
CPU implementation of Linear operation using abstract TensorDataType API.
Template parameter TPrecision selects the abstract tensor precision (e.g. FP32). float is the corresponding CPU host representation for that precision.
Design philosophy:
Forward: Y = X * W^T + b Backward:
| using Mila::Dnn::Compute::CpuLinearOp::UnaryOperationBase = UnaryOperation<DeviceType::Cpu, TensorDataType::FP32> |
|
inline |
|
inlineoverridevirtual |
Backward pass - HOT PATH, pure dispatch to CPU kernel.
Similar to forward(), this method does minimal work and dispatches directly to the backward kernel using cached dimensions from build().
Gradients are accumulated into the tensors provided via setParameterGradients().
Algorithm:
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >.

|
inlineoverridevirtual |
Build the operation for a concrete input shape.
This is the COLD PATH where all setup, validation, and computation happens ONCE. After build() completes, forward() and backward() become pure dispatch methods.
Responsibilities:
After build(), the operation is ready for zero-overhead forward/backward dispatch.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

|
inlineoverridevirtual |
Forward pass - HOT PATH, pure dispatch to CPU kernel.
All setup, validation, and dimension computation was done in build(). This method extracts raw pointers and dispatches directly to the optimized matrix multiplication kernel using pre-computed cached dimensions.
Algorithm: Y = X * W^T + b Zero redundant work - maximum performance.
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >.

|
inlineprivate |
Naive forward implementation without loop unrolling.
Used when batch size doesn't align with unroll factor.

|
inlineprivate |
Optimized forward implementation with loop unrolling.
Processes LOOP_UNROLL batch elements simultaneously for better cache utilization.

|
inline |
|
inlineoverridevirtual |
Human-readable operation name.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
|
inlineoverridevirtual |
Operation type identifier.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
|
inlineoverridevirtual |
Bind module-owned gradient tensors to the operation.
New canonical API for binding gradient buffers. Mirrors semantics of setParameters() but for gradients used during backward().
The operation MUST NOT take ownership of the provided pointers. Implementations may cache rawData() pointers for hot-path writes.
Default: no-op for stateless operations.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

|
inlineoverridevirtual |
Set parameter tensor references (module remains owner).
The operation caches native host pointers for hot-path access. The weight tensor is required; bias is bound only when the Linear config indicates a bias is present.
Note: build() requires parameters to be bound before it is called.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |
|
staticconstexprprivate |
|
private |
|
private |
|
private |
|
private |
|
private |
|
private |