|
Mila 0.13.48
Deep Neural Network Library
|
Fused CPU implementation of Softmax + CrossEntropy using abstract TensorDataType API.


Public Types | |
| using | BinaryOperationBase = BinaryOperation<DeviceType::Cpu, TPrecision, TLogits, TTargets> |
| using | CpuExecutionContext = ExecutionContext<DeviceType::Cpu> |
| using | LogitsHostType = typename TensorHostTypeMap<TLogits>::host_type |
| using | LogitsTensorType = Tensor<TLogits, MR> |
| using | MR = CpuMemoryResource |
| using | TargetsHostType = typename TensorHostTypeMap<TTargets>::host_type |
| using | TargetsTensorType = Tensor<TTargets, MR> |
| Public Types inherited from Mila::Dnn::Compute::BinaryOperation< DeviceType::Cpu, TensorDataType::FP32, TensorDataType::FP32, TensorDataType::INT32 > | |
| using | MR |
| using | ParameterGradTensor |
| using | ParameterTensor |
| using | TensorLeftType |
| using | TensorOutputType |
| using | TensorRightType |
| Public Types inherited from Mila::Dnn::Compute::Operation< TDeviceType, TPrecision > | |
| using | DataTypeTraits |
Public Member Functions | |
| CpuSoftmaxCrossEntropyOp (IExecutionContext *context, const CrossEntropyConfig &config) | |
| Construct fused Softmax+CrossEntropy operation with execution context. | |
| void | backward (const ITensor &inputA, const ITensor &inputB, const ITensor &output_gradient, ITensor &inputA_gradient, ITensor &inputB_gradient) const override |
| Backward pass - HOT PATH, computes fused gradient. | |
| void | build (const shape_t &input_shape) override |
| Build the operation for a concrete input shape. | |
| void | forward (const ITensor &inputA, const ITensor &inputB, ITensor &output) const override |
| Forward pass - HOT PATH, computes fused softmax+cross-entropy loss. | |
| const CrossEntropyConfig & | getConfig () const |
| std::string | getName () const override |
| Human-readable operation name. | |
| OperationType | getOperationType () const override |
| Operation type identifier. | |
| void | setParameters (ITensor *class_weights, ITensor *bias) override |
| Bind optional class weights parameter. | |
| Public Member Functions inherited from Mila::Dnn::Compute::BinaryOperation< DeviceType::Cpu, TensorDataType::FP32, TensorDataType::FP32, TensorDataType::INT32 > | |
| virtual | ~BinaryOperation ()=default |
| Public Member Functions inherited from Mila::Dnn::Compute::Operation< TDeviceType, TPrecision > | |
| virtual | ~Operation ()=default |
| virtual void | build (const BuildContext &build_context) |
| Prepare the operation for a concrete input shape. | |
| virtual void | clearGradients () noexcept |
| Clear any cached gradient pointers held by the operation. | |
| virtual TensorDataType | getDataType () const |
| Tensor data type for this operation. | |
| virtual DeviceType | getDeviceType () const |
| Device type for this operation. | |
| virtual std::size_t | getStateMemorySize () const |
| Returns the number of bytes of state memory allocated by this operation. | |
| virtual bool | isBuilt () const |
| Whether build() completed successfully for a concrete input shape. | |
| virtual bool | isEvalMode () const |
| Query whether the operation is in evaluation mode (that is, not configured for training). | |
| virtual void | setGradients (ITensor *weight_grad, ITensor *bias_grad) |
| Bind module-owned gradient tensors to the operation. | |
| virtual void | setTrainingMode (TrainingMode training_mode) |
| Configure operation training-mode behavior. | |
Private Attributes | |
| int64_t | cached_outer_size_ { 0 } |
| int64_t | cached_vocab_size_ { 0 } |
| ITensor * | class_weights_ { nullptr } |
| CrossEntropyConfig | config_ |
| std::shared_ptr< CpuExecutionContext > | context_ |
| bool | enable_omp_ { false } |
Additional Inherited Members | |
| Static Public Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TPrecision > | |
| static constexpr TensorDataType | data_type |
| static constexpr DeviceType | device_type |
| Static Protected Member Functions inherited from Mila::Dnn::Compute::BinaryOperation< DeviceType::Cpu, TensorDataType::FP32, TensorDataType::FP32, TensorDataType::INT32 > | |
| static const TensorLeftType & | asLeftTensor (const ITensor &t) |
| static TensorOutputType & | asOutputTensor (ITensor &t) |
| static const TensorRightType & | asRightTensor (const ITensor &t) |
| Protected Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TPrecision > | |
| bool | is_built_ |
| TrainingMode | training_mode_ |
Fused CPU implementation of Softmax + CrossEntropy using abstract TensorDataType API.
This operation combines softmax normalization and cross-entropy loss computation into a single numerically stable binary operation (logits + targets → loss).
Key properties:
Design philosophy:
| TLogitsPrecision | Precision for logits/gradients (FP32) |
| TTargets | Target indices data type (typically INT32) |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::BinaryOperationBase = BinaryOperation<DeviceType::Cpu, TPrecision, TLogits, TTargets> |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::CpuExecutionContext = ExecutionContext<DeviceType::Cpu> |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::LogitsHostType = typename TensorHostTypeMap<TLogits>::host_type |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::LogitsTensorType = Tensor<TLogits, MR> |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::MR = CpuMemoryResource |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::TargetsHostType = typename TensorHostTypeMap<TTargets>::host_type |
| using Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets >::TargetsTensorType = Tensor<TTargets, MR> |
|
inline |
Construct fused Softmax+CrossEntropy operation with execution context.
| context | CPU execution context |
| config | CrossEntropy configuration (vocab_size required) |
|
inline override virtual |
Backward pass - HOT PATH, computes fused gradient.
The gradient of the fused softmax + cross-entropy has a simple closed form: dL/dlogits = softmax(logits) - one_hot(targets)
Algorithm: For each sample:
| inputA | Logits tensor from forward pass |
| inputB | Targets tensor from forward pass |
| output_gradient | Gradient w.r.t. loss (typically scalar 1.0) |
| inputA_gradient | Output: gradient w.r.t. logits |
| inputB_gradient | Unused (targets are integers, not differentiable) |
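The closed form above can be illustrated with a minimal standalone sketch. It is not Mila's implementation: `softmax_cross_entropy_backward` is a hypothetical free function over plain `std::vector` buffers, logits are assumed to be row-major `[outer_size, vocab_size]`, and the upstream gradient is assumed to be a single scalar applied to every sample with no averaging or class weighting.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Mila): gradient w.r.t. logits for
// row-major logits of shape [outer_size, vocab_size].
// Assumption: output_gradient is one scalar shared by all samples.
std::vector<float> softmax_cross_entropy_backward(
    const std::vector<float>& logits,
    const std::vector<std::int32_t>& targets,
    float output_gradient,
    std::int64_t outer_size,
    std::int64_t vocab_size)
{
    assert(static_cast<std::int64_t>(logits.size()) == outer_size * vocab_size);
    assert(static_cast<std::int64_t>(targets.size()) == outer_size);

    std::vector<float> grad(logits.size());
    for (std::int64_t i = 0; i < outer_size; ++i) {
        const float* row = logits.data() + i * vocab_size;
        float* grad_row = grad.data() + i * vocab_size;
        const std::int32_t target = targets[static_cast<std::size_t>(i)];
        assert(target >= 0 && target < vocab_size);

        // Subtract the row maximum before exponentiating to avoid overflow.
        const float max_logit = *std::max_element(row, row + vocab_size);

        // Store the unnormalized exponentials in the gradient row for now.
        float sum_exp = 0.0f;
        for (std::int64_t v = 0; v < vocab_size; ++v) {
            grad_row[v] = std::exp(row[v] - max_logit);
            sum_exp += grad_row[v];
        }

        // grad = (softmax - one_hot(target)) * upstream gradient.
        const float inv_sum = 1.0f / sum_exp;
        for (std::int64_t v = 0; v < vocab_size; ++v) {
            const float one_hot = (v == target) ? 1.0f : 0.0f;
            grad_row[v] = (grad_row[v] * inv_sum - one_hot) * output_gradient;
        }
    }
    return grad;
}
```

Because each softmax row sums to 1 and each one-hot row sums to 1, every gradient row in this sketch sums to approximately zero, which makes a convenient sanity check.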
|
inline override |
Build the operation for a concrete input shape.
This is the COLD PATH, where all setup, validation, and computation happen ONCE.
Expected input shape: [batch_size, seq_length, vocab_size] or [batch_size, vocab_size].
Target shape: [batch_size, seq_length] or [batch_size].
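As an illustration of how the shapes above might be reduced to the two cached sizes (`cached_outer_size_`, `cached_vocab_size_`), here is a minimal sketch. It assumes `shape_t` behaves like `std::vector<std::int64_t>`, that the last dimension is the vocabulary, and that all leading dimensions are flattened into one outer dimension. `FlattenedDims` and `flatten_logits_shape` are hypothetical names, not part of Mila.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical result type: the two sizes a build step might cache.
struct FlattenedDims {
    std::int64_t outer_size;  // product of all dimensions except the last
    std::int64_t vocab_size;  // last dimension
};

// Hypothetical helper (not part of Mila): validate a logits shape and
// collapse it to [outer_size, vocab_size].
FlattenedDims flatten_logits_shape(const std::vector<std::int64_t>& shape)
{
    if (shape.size() < 2) {
        throw std::invalid_argument("logits shape needs at least 2 dimensions");
    }
    for (const std::int64_t dim : shape) {
        if (dim <= 0) {
            throw std::invalid_argument("logits dimensions must be positive");
        }
    }

    const std::int64_t vocab_size = shape.back();
    const std::int64_t outer_size = std::accumulate(
        shape.begin(), shape.end() - 1, std::int64_t{1},
        std::multiplies<std::int64_t>());
    return FlattenedDims{outer_size, vocab_size};
}
```

Under these assumptions, a `[batch_size, seq_length, vocab_size]` input and a `[batch_size, vocab_size]` input both reduce to the same two-dimensional view, so the hot paths only need the two cached integers.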
Responsibilities:
|
inline override virtual |
Forward pass - HOT PATH, computes fused softmax+cross-entropy loss.
Computes: loss = -log(softmax(logits)[target])
Algorithm (numerically stable): For each sample:
| inputA | Logits tensor [outer_size, vocab_size] |
| inputB | Targets tensor [outer_size] (integer class indices) |
| output | Loss tensor (scalar or per-sample) |
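The formula loss = -log(softmax(logits)[target]) can be evaluated stably with the log-sum-exp form, loss = log(sum(exp(logits - max))) + max - logits[target]. The sketch below shows that form on plain buffers. It is not Mila's implementation: `softmax_cross_entropy_forward` is a hypothetical free function, logits are assumed to be row-major `[outer_size, vocab_size]`, and the result is assumed to be one unreduced loss per sample with no class weights.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Mila): per-sample loss for row-major
// logits of shape [outer_size, vocab_size] and integer class targets.
std::vector<float> softmax_cross_entropy_forward(
    const std::vector<float>& logits,
    const std::vector<std::int32_t>& targets,
    std::int64_t outer_size,
    std::int64_t vocab_size)
{
    assert(static_cast<std::int64_t>(logits.size()) == outer_size * vocab_size);
    assert(static_cast<std::int64_t>(targets.size()) == outer_size);

    std::vector<float> losses(static_cast<std::size_t>(outer_size));
    for (std::int64_t i = 0; i < outer_size; ++i) {
        const float* row = logits.data() + i * vocab_size;
        const std::int32_t target = targets[static_cast<std::size_t>(i)];
        assert(target >= 0 && target < vocab_size);

        // Step 1: row maximum, so every exponent below is <= 0.
        const float max_logit = *std::max_element(row, row + vocab_size);

        // Step 2: sum of shifted exponentials (always >= 1, so log is safe).
        float sum_exp = 0.0f;
        for (std::int64_t v = 0; v < vocab_size; ++v) {
            sum_exp += std::exp(row[v] - max_logit);
        }

        // Step 3: -log(softmax[target]) in log-sum-exp form.
        losses[static_cast<std::size_t>(i)] =
            std::log(sum_exp) + max_logit - row[target];
    }
    return losses;
}
```

Subtracting the row maximum keeps the result finite even for very large logits, where a direct exp(logit) would overflow a float.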
|
inline override virtual |
Human-readable operation name.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TPrecision >.
|
inline override virtual |
Operation type identifier.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TPrecision >.
|
inline override virtual |
Bind optional class weights parameter.
| class_weights | Optional class weights tensor (may be null) |
| bias | Unused (must be null) |
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TPrecision >.