Mila 0.13.48
Deep Neural Network Library
CPU implementation of Multi-Head Attention operation.


Public Types | |
| using | CpuExecutionContext = ExecutionContext<DeviceType::Cpu> |
| using | MR = CpuMemoryResource |
| using | TensorType = Tensor<TensorDataType::FP32, MR> |
| Public Types inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| using | MR |
| using | TensorInputType |
| using | TensorOutputType |
| Public Types inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| using | DataTypeTraits |
Public Member Functions | |
| CpuAttentionOp (IExecutionContext *context, const MultiHeadAttentionConfig &config) | |
| ~CpuAttentionOp () override=default | |
| void | backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const override |
| Backward pass: compute the gradient with respect to the input, given the output gradient. | |
| void | build (const BuildContext &config) override |
| Prepare the operation for a concrete input shape. | |
| void | forward (const ITensor &input, ITensor &output) const override |
| Forward pass: compute output = f(input). | |
| std::string | getName () const override |
| Human-readable operation name. | |
| OperationType | getOperationType () const override |
| Operation type identifier. | |
| void | setGradients (ITensor *, ITensor *) override |
| Bind module-owned gradient tensors to the operation. | |
| void | setParameters (ITensor *, ITensor *) override |
| Bind module-owned parameter tensors to the operation. | |
| Public Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| virtual | ~UnaryOperation ()=default |
| Public Member Functions inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| virtual | ~Operation ()=default |
| virtual void | clearGradients () noexcept |
| Clear any cached gradient pointers held by the operation. | |
| virtual TensorDataType | getDataType () const |
| Tensor data type for this operation. | |
| virtual DeviceType | getDeviceType () const |
| Device type for this operation. | |
| virtual std::size_t | getStateMemorySize () const |
| Returns the number of bytes of state memory allocated by this operation. | |
| virtual bool | isBuilt () const |
| Whether build() completed successfully for a concrete input shape. | |
| virtual bool | isEvalMode () const |
| Query whether the operation is configured for evaluation (non-training) mode. | |
| virtual void | setTrainingMode (TrainingMode training_mode) |
| Configure operation training-mode behavior. | |
Private Member Functions | |
| void | allocateStateTensors () |
| void | applySoftmax () const |
| void | computeAttentionScores (float scale) const |
| void | computeGradientAtt () const |
| void | computeGradientK () const |
| void | computeGradientPreatt (float scale) const |
| void | computeGradientQ () const |
| void | computeGradientV () const |
| void | computeOutputValues () const |
| void | permute_backward (float *dX) const |
| void | permuteQKV (const float *X) const |
| void | unpermute (float *Y) const |
| void | unpermute_backward (const float *dY) const |
| void | validateInputShape (const shape_t &input_shape) const |
Private Attributes | |
| float * | att_ { nullptr } |
| std::shared_ptr< TensorType > | att_tensor_ |
| int | B_ { 0 } |
| MultiHeadAttentionConfig | config_ |
| IExecutionContext * | context_ |
| float * | datt_ { nullptr } |
| std::shared_ptr< TensorType > | datt_tensor_ |
| float * | dk_ { nullptr } |
| std::shared_ptr< TensorType > | dk_tensor_ |
| float * | dpreatt_ { nullptr } |
| std::shared_ptr< TensorType > | dpreatt_tensor_ |
| float * | dq_ { nullptr } |
| std::shared_ptr< TensorType > | dq_tensor_ |
| float * | dv_ { nullptr } |
| std::shared_ptr< TensorType > | dv_tensor_ |
| float * | dvout_ { nullptr } |
| std::shared_ptr< TensorType > | dvout_tensor_ |
| int | embedding_dim_ { 0 } |
| int | HS_ { 0 } |
| bool | is_built_ { false } |
| float * | k_ { nullptr } |
| std::shared_ptr< TensorType > | k_tensor_ |
| int | NH_ { 0 } |
| float * | preatt_ { nullptr } |
| std::shared_ptr< TensorType > | preatt_tensor_ |
| float * | q_ { nullptr } |
| std::shared_ptr< TensorType > | q_tensor_ |
| int | qkv_dim_ { 0 } |
| int | T_ { 0 } |
| float * | v_ { nullptr } |
| float * | v_out_ { nullptr } |
| std::shared_ptr< TensorType > | v_out_tensor_ |
| std::shared_ptr< TensorType > | v_tensor_ |
Additional Inherited Members | |
| Static Public Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| static constexpr TensorDataType | data_type |
| static constexpr DeviceType | device_type |
| Static Protected Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| static const TensorInputType & | asInputTensor (const ITensor &t) |
| static TensorOutputType & | asOutputTensor (ITensor &t) |
| Protected Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| bool | is_built_ |
| TrainingMode | training_mode_ |
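getStateMemorySize() reports the bytes of state memory the operation allocates, and the private attributes list twelve FP32 state buffers. As a rough sketch, assuming q_, k_, v_, v_out_ and their gradients (dq_, dk_, dv_, dvout_) each have shape [B, NH, T, HS], and preatt_, att_, dpreatt_, datt_ each have shape [B, NH, T, T] (shapes inferred from the member names, not documented on this page), the footprint could be estimated as follows. estimateAttentionStateBytes is a hypothetical helper, not part of Mila.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical estimate of the attention op's state memory, ASSUMING:
//  - q, k, v, v_out and dq, dk, dv, dvout are FP32 buffers of shape [B, NH, T, HS];
//  - preatt, att, dpreatt, datt are FP32 buffers of shape [B, NH, T, T];
//  - all twelve buffers are allocated (the real op may allocate differently).
std::size_t estimateAttentionStateBytes(std::size_t B, std::size_t T,
                                        std::size_t NH, std::size_t HS) {
    const std::size_t per_head_buffer = B * NH * T * HS;  // elements in one [B,NH,T,HS] buffer
    const std::size_t per_score_buffer = B * NH * T * T;  // elements in one [B,NH,T,T] buffer
    const std::size_t elements = 8 * per_head_buffer + 4 * per_score_buffer;
    return elements * sizeof(float);
}
```

Under these assumptions, B = 8, T = 1024, NH = 12, HS = 64 gives 1,811,939,328 bytes (1.6875 GiB), about 89% of which sits in the four T×T buffers, so the score buffers dominate as the sequence length grows.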
CPU implementation of Multi-Head Attention operation.
Design philosophy:
Forward pass:
Backward pass:
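Assuming the operation follows standard scaled dot-product attention (this page does not spell out the formulas), each head with query, key, and value matrices Q, K, V of shape T × HS computes the quantities below. S, A, and O loosely correspond to preatt_, att_, and v_out_, a mapping inferred from the member names only. If future positions are masked (not stated here), S_ij is set to -∞ for j > i before the softmax.

```latex
\begin{aligned}
S &= \frac{Q K^\top}{\sqrt{\mathrm{HS}}}, \qquad A = \operatorname{softmax}_{\text{row}}(S), \qquad O = A V \\[4pt]
dV &= A^\top dO, \qquad dA = dO\, V^\top \\
dS_{ij} &= A_{ij}\Big(dA_{ij} - \sum_{k} A_{ik}\, dA_{ik}\Big) \\
dQ &= \frac{dS\, K}{\sqrt{\mathrm{HS}}}, \qquad dK = \frac{dS^\top Q}{\sqrt{\mathrm{HS}}}
\end{aligned}
```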
CpuAttentionOp (IExecutionContext *context, const MultiHeadAttentionConfig &config) [inline, explicit]
~CpuAttentionOp () [override, default]
void allocateStateTensors () [inline, private]

void applySoftmax () const [inline, private]

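void backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const [inline, override, virtual]
Backward pass: compute the gradient with respect to the input, given the output gradient.
The signature is ordered as (input, output_grad, input_grad) to match the module and operation implementations across the codebase.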
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >.
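The central step of the attention backward pass is differentiating through the row-wise softmax. The sketch below assumes preatt = scale · (q · k) with scale = 1/sqrt(HS), and folds the scale into the returned gradient; that is one plausible reading of computeGradientPreatt(float scale), not a confirmed description of it. softmaxBackwardRow is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical softmax backward for ONE attention row (not Mila's code).
//   att[j]  : forward probabilities, att = softmax(scale * s)
//   datt[j] : incoming gradient dL/d att[j]
// Returns dL/d s[j], the gradient w.r.t. the raw dot products s[j] = q . k_j,
// with the scale folded in (an assumption about where the scale is applied).
std::vector<float> softmaxBackwardRow(const std::vector<float>& att,
                                      const std::vector<float>& datt,
                                      float scale) {
    // Softmax Jacobian-vector product: att_j * (datt_j - sum_k att_k * datt_k)
    float dot = 0.0f;
    for (std::size_t j = 0; j < att.size(); ++j) dot += att[j] * datt[j];

    std::vector<float> ds(att.size());
    for (std::size_t j = 0; j < att.size(); ++j) {
        ds[j] = scale * att[j] * (datt[j] - dot);  // chain rule through preatt = scale * s
    }
    return ds;
}
```

Because each probability row sums to one, the entries of a gradient row sum to zero, which makes a cheap sanity check in tests.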

void build (const BuildContext &config) [inline, override, virtual]
Prepare the operation for a concrete input shape.
Default implementation is a no-op. Operations requiring shape-dependent setup should override this method.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
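A shape-dependent build() typically validates the input and derives the sizes it needs for allocation. The sketch below assumes the input is [B, T, 3·C] with Q, K, and V concatenated along the last axis, which fits the qkv_dim_ and embedding_dim_ members but is not confirmed on this page. AttentionDims and deriveAttentionDims are hypothetical, and std::vector<int64_t> stands in for shape_t.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical dimensions a build() step might derive; names mirror B_, T_, NH_, HS_.
struct AttentionDims {
    int B = 0, T = 0, C = 0, NH = 0, HS = 0;
};

// ASSUMPTION: input shape is [B, T, 3*C] (concatenated Q, K, V).
AttentionDims deriveAttentionDims(const std::vector<int64_t>& shape, int num_heads) {
    if (shape.size() != 3)
        throw std::invalid_argument("expected rank-3 input [B, T, 3*C]");
    if (shape[2] % 3 != 0)
        throw std::invalid_argument("last dimension must be 3 * embedding_dim");

    AttentionDims d;
    d.B = static_cast<int>(shape[0]);
    d.T = static_cast<int>(shape[1]);
    d.C = static_cast<int>(shape[2] / 3);
    if (num_heads <= 0 || d.C % num_heads != 0)
        throw std::invalid_argument("embedding_dim must be divisible by num_heads");
    d.NH = num_heads;
    d.HS = d.C / num_heads;  // per-head size
    return d;
}
```

Rejecting bad shapes here, before any state buffers are sized, keeps forward() and backward() free of per-call shape checks.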

void computeAttentionScores (float scale) const [inline, private]


void computeGradientAtt () const [inline, private]


void computeGradientK () const [inline, private]


void computeGradientPreatt (float scale) const [inline, private]

void computeGradientQ () const [inline, private]


void computeGradientV () const [inline, private]


void computeOutputValues () const [inline, private]


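void forward (const ITensor &input, ITensor &output) const [inline, override, virtual]
Forward pass: compute output = f(input).
Implementations should accept polymorphic ITensor references and may use the typed aliases (TensorInputType, TensorOutputType) and helpers (asInputTensor(), asOutputTensor()) to obtain typed tensor references.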
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 >.
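For orientation, here is a minimal reference forward pass for a single (batch, head) slice, assuming standard scaled dot-product attention. Its three steps loosely mirror the private helpers computeAttentionScores(), applySoftmax(), and computeOutputValues(), but that correspondence is inferred from the names only, and the permute/unpermute layout handling is omitted. attentionHeadForward and its causal flag are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference forward pass for ONE (batch, head) slice; not Mila's code.
// q, k, v are row-major [T x HS]; the result is row-major [T x HS].
// `causal` is an assumption: this page does not say whether future positions are masked.
std::vector<float> attentionHeadForward(const std::vector<float>& q,
                                        const std::vector<float>& k,
                                        const std::vector<float>& v,
                                        int T, int HS, bool causal) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(HS));
    std::vector<float> out(static_cast<std::size_t>(T) * HS, 0.0f);
    std::vector<float> att(static_cast<std::size_t>(T));  // one row of scores/probabilities

    for (int t = 0; t < T; ++t) {
        const int last = causal ? t : T - 1;  // last key index query t may attend to

        // 1) Scaled dot-product scores, tracking the row maximum.
        float maxval = -std::numeric_limits<float>::infinity();
        for (int t2 = 0; t2 <= last; ++t2) {
            float dot = 0.0f;
            for (int i = 0; i < HS; ++i) dot += q[t * HS + i] * k[t2 * HS + i];
            att[t2] = dot * scale;
            if (att[t2] > maxval) maxval = att[t2];
        }

        // 2) Numerically stable softmax over the permitted keys.
        float sum = 0.0f;
        for (int t2 = 0; t2 <= last; ++t2) {
            att[t2] = std::exp(att[t2] - maxval);
            sum += att[t2];
        }

        // 3) Probability-weighted sum of the value rows.
        for (int t2 = 0; t2 <= last; ++t2) {
            const float w = att[t2] / sum;
            for (int i = 0; i < HS; ++i) out[t * HS + i] += w * v[t2 * HS + i];
        }
    }
    return out;
}
```

With identical keys every score in a row is equal, so each output row is the plain average of the value rows it may attend to; that property is convenient for checking the sketch.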

std::string getName () const [inline, override, virtual]
Human-readable operation name.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
OperationType getOperationType () const [inline, override, virtual]
Operation type identifier.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
void permute_backward (float *dX) const [inline, private]

void permuteQKV (const float *X) const [inline, private]

void setGradients (ITensor *, ITensor *) [inline, override, virtual]
Bind module-owned gradient tensors to the operation.
This is the new canonical API for binding gradient buffers. It mirrors the semantics of setParameters(), but for the gradients used during backward().
The operation MUST NOT take ownership of the provided pointers. Implementations may cache rawData() pointers for hot-path writes.
Default: no-op for stateless operations.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
void setParameters (ITensor *, ITensor *) [inline, override, virtual]
Bind module-owned parameter tensors to the operation.
The module retains ownership of the provided ITensor objects. Implementations may cache rawData() pointers for hot-path access but MUST NOT free the provided pointers.
Default: no-op for stateless operations.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
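The binding contract for setParameters() and setGradients() (cache raw pointers, never take ownership) can be illustrated with a small stand-alone sketch. FakeTensor and ParamBindingSketch are hypothetical stand-ins, not Mila's ITensor or any real operation; only the rawData() name is borrowed from the text above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a module-owned tensor; NOT Mila's ITensor.
struct FakeTensor {
    std::vector<float> storage;
    float* rawData() { return storage.data(); }
};

// Sketch of the non-owning binding contract described for setParameters()/setGradients().
class ParamBindingSketch {
public:
    void setParameters(FakeTensor* weight, FakeTensor* bias) {
        // Cache raw pointers for hot-path access. The module keeps ownership,
        // so nothing here is ever deleted or freed.
        weight_ = weight ? weight->rawData() : nullptr;
        bias_ = bias ? bias->rawData() : nullptr;
    }
    void clearParameters() noexcept { weight_ = nullptr; bias_ = nullptr; }
    const float* weight() const { return weight_; }
    const float* bias() const { return bias_; }

private:
    float* weight_ = nullptr;  // non-owning
    float* bias_ = nullptr;    // non-owning
};
```

The trade-off of caching raw pointers is lifetime: if the module destroys or reallocates a bound tensor, the cached pointer dangles, so the module must rebind (or clear) whenever its buffers change.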
void unpermute (float *Y) const [inline, private]

void unpermute_backward (const float *dY) const [inline, private]

void validateInputShape (const shape_t &input_shape) const [inline, private]

