Mila 0.13.48
Deep Neural Network Library
CUDA implementation of Softmax operation using abstract TensorDataType API.


Public Types
  using CudaExecutionContext = ExecutionContext<DeviceType::Cuda>
  using MR = CudaDeviceMemoryResource
  using NativeType = typename Cuda::TensorDataTypeMap<TPrecision>::device_type
  using TensorType = Tensor<TPrecision, MR>
  using UnaryOperationBase = UnaryOperation<DeviceType::Cuda, TPrecision>
Public Types inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >
  using MR
  using TensorInputType
  using TensorOutputType
Public Types inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
  using DataTypeTraits
Public Member Functions
  CudaSoftmaxOp (IExecutionContext *context, const SoftmaxConfig &config)
  void backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const override
      Backward pass - HOT PATH, pure dispatch to CUDA kernel.
  void build (const BuildContext &config) override
      Build the operation for a concrete input shape.
  void forward (const ITensor &input, ITensor &output) const override
      Forward pass - HOT PATH, pure dispatch to CUDA kernel.
  const SoftmaxConfig & getConfig () const
  std::string getName () const override
      Human-readable operation name.
  OperationType getOperationType () const override
      Operation type identifier.
  void setParameters (ITensor *weight, ITensor *bias) override
      Set parameter tensor references (no-op for Softmax - stateless operation).
Public Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >
  virtual ~UnaryOperation ()=default
Public Member Functions inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
  virtual ~Operation ()=default
  virtual void clearGradients () noexcept
      Clear any cached gradient pointers held by the operation.
  virtual TensorDataType getDataType () const
      Tensor data type for this operation.
  virtual DeviceType getDeviceType () const
      Device type for this operation.
  virtual std::size_t getStateMemorySize () const
      Returns the number of bytes of state memory allocated by this operation.
  virtual bool isBuilt () const
      Whether build() completed successfully for a concrete input shape.
  virtual bool isEvalMode () const
      Query whether the operation is configured for evaluation (inference) mode.
  virtual void setGradients (ITensor *weight_grad, ITensor *bias_grad)
      Bind module-owned gradient tensors to the operation.
  virtual void setTrainingMode (TrainingMode training_mode)
      Configure operation training-mode behavior.
Private Attributes
  int64_t cached_axis_ { -1 }
  int cached_dim_size_ { 0 }
  int cached_inner_size_ { 0 }
  int cached_outer_size_ { 0 }
  SoftmaxConfig config_
  CudaExecutionContext * context_
  Detail::cuda_softmax_impl< NativeType > impl_
  bool use_optimized_kernel_ { false }
Additional Inherited Members
  Static Public Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
    static constexpr TensorDataType data_type
    static constexpr DeviceType device_type
  Static Protected Member Functions inherited from Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >
    static const TensorInputType & asInputTensor (const ITensor &t)
    static TensorOutputType & asOutputTensor (ITensor &t)
  Protected Attributes inherited from Mila::Dnn::Compute::Operation< TDeviceType, TInput >
    bool is_built_
    TrainingMode training_mode_
Template parameter TPrecision selects the abstract tensor precision (e.g. FP32, FP16). NativeType is the corresponding CUDA device representation for that precision.
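The mapping from TPrecision to NativeType can be pictured as a compile-time trait. The sketch below is a minimal stand-alone illustration, not Mila's definition: the enum values, the set of specializations, and the HalfBits placeholder (used instead of CUDA's __half so it compiles without the CUDA toolkit) are all assumptions.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins: Mila's real TensorDataType enum and
// Cuda::TensorDataTypeMap may use different names and members.
enum class TensorDataType { FP32, FP16 };

// Placeholder for CUDA's __half (from <cuda_fp16.h>) so this compiles on a
// host-only toolchain; it only stores the raw 16-bit pattern.
struct HalfBits { std::uint16_t bits; };

// Primary template is declared but not defined, so an unsupported
// precision fails at compile time instead of silently picking a type.
template <TensorDataType P>
struct TensorDataTypeMap;

template <>
struct TensorDataTypeMap<TensorDataType::FP32> { using device_type = float; };

template <>
struct TensorDataTypeMap<TensorDataType::FP16> { using device_type = HalfBits; };

// Mirrors the NativeType alias declared by CudaSoftmaxOp<TPrecision>.
template <TensorDataType TPrecision>
using NativeType = typename TensorDataTypeMap<TPrecision>::device_type;
```

Leaving the primary template undefined turns an unsupported precision into a compile error at the point where the operation is instantiated, rather than a runtime failure.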
Design philosophy:
using Mila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TPrecision >::CudaExecutionContext = ExecutionContext<DeviceType::Cuda>
using Mila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TPrecision >::MR = CudaDeviceMemoryResource
using Mila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TPrecision >::NativeType = typename Cuda::TensorDataTypeMap<TPrecision>::device_type
using Mila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TPrecision >::TensorType = Tensor<TPrecision, MR>
using Mila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TPrecision >::UnaryOperationBase = UnaryOperation<DeviceType::Cuda, TPrecision>
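
CudaSoftmaxOp (IExecutionContext *context, const SoftmaxConfig &config) [inline]

void backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const [inline, override, virtual]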
Backward pass - HOT PATH, pure dispatch to CUDA kernel.
Similar to forward(), this method does minimal work and dispatches directly to the backward kernel using cached dimensions from build().
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >.
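For reference, softmax backward follows from the Jacobian of y = softmax(x): along the softmax axis, dx_d = y_d * (dy_d - sum_k dy_k * y_k). The CPU sketch below is a hypothetical correctness oracle for the CUDA kernel, not the kernel itself. Because backward() receives the input rather than the output, the sketch recomputes y from x; whether Mila's kernel recomputes or reuses a cached output, and whether it overwrites or accumulates into input_grad, are assumptions here (the sketch overwrites).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for softmax backward on a row-major tensor viewed as
// [outer, dim, inner], softmax taken along dim. Like backward(input,
// output_grad, input_grad), it receives x rather than y, so y = softmax(x)
// is recomputed per slice. Assumption: dx is overwritten, not accumulated.
void softmax_backward_ref(const std::vector<float>& x,
                          const std::vector<float>& dy,
                          std::vector<float>& dx,
                          int outer, int dim, int inner)
{
    assert(dim > 0);
    const std::size_t n = static_cast<std::size_t>(outer) * dim * inner;
    assert(x.size() == n && dy.size() == n);
    dx.assign(n, 0.0f);
    std::vector<float> y(static_cast<std::size_t>(dim));

    for (int o = 0; o < outer; ++o) {
        for (int i = 0; i < inner; ++i) {
            // Element d of this slice lives at (o * dim + d) * inner + i.
            const std::size_t base = static_cast<std::size_t>(o) * dim * inner + i;
            const auto at = [&](int d) { return base + static_cast<std::size_t>(d) * inner; };

            // Recompute the softmax slice (max-subtracted for stability).
            float max_val = x[at(0)];
            for (int d = 1; d < dim; ++d) max_val = std::max(max_val, x[at(d)]);
            float sum = 0.0f;
            for (int d = 0; d < dim; ++d) {
                y[d] = std::exp(x[at(d)] - max_val);
                sum += y[d];
            }
            for (int d = 0; d < dim; ++d) y[d] /= sum;

            // dot = sum_k dy_k * y_k; then dx_d = y_d * (dy_d - dot).
            float dot = 0.0f;
            for (int d = 0; d < dim; ++d) dot += dy[at(d)] * y[d];
            for (int d = 0; d < dim; ++d) dx[at(d)] = y[d] * (dy[at(d)] - dot);
        }
    }
}
```

Since sum_d y_d = 1, the resulting gradient sums to zero along the softmax axis for every slice, which gives a cheap sanity check on kernel output.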

void build (const BuildContext &config) [inline, override, virtual]
Build the operation for a concrete input shape.
This is the COLD PATH, where all setup, validation, and dimension computation happen ONCE. After build() completes, forward() and backward() become pure dispatch methods.
Responsibilities:
After build(), the operation is ready for forward/backward dispatch with no per-call setup or validation.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.
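The cached private members (cached_axis_, cached_outer_size_, cached_dim_size_, cached_inner_size_) suggest that build() collapses the input shape into an [outer, dim, inner] view around the softmax axis. The following is a minimal sketch of that computation, assuming a row-major layout and Python-style negative axes; the SoftmaxDims struct and computeSoftmaxDims function are hypothetical, and the exception types are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical result type; fields echo CudaSoftmaxOp's cached_* members.
struct SoftmaxDims {
    std::int64_t axis;  // normalized, non-negative axis
    int outer_size;     // product of extents before the axis
    int dim_size;       // extent of the softmax axis
    int inner_size;     // product of extents after the axis
};

// Normalizes a possibly negative axis (-1 = last) and collapses a row-major
// shape into an [outer, dim, inner] view. Intended to run once, at build time.
SoftmaxDims computeSoftmaxDims(const std::vector<std::int64_t>& shape, std::int64_t axis)
{
    const auto rank = static_cast<std::int64_t>(shape.size());
    if (rank == 0) {
        throw std::invalid_argument("softmax requires an input of rank >= 1");
    }
    if (axis < 0) {
        axis += rank;
    }
    if (axis < 0 || axis >= rank) {
        throw std::out_of_range("softmax axis out of range for input rank");
    }

    std::int64_t outer = 1;
    std::int64_t inner = 1;
    for (std::int64_t k = 0; k < axis; ++k) outer *= shape[static_cast<std::size_t>(k)];
    for (std::int64_t k = axis + 1; k < rank; ++k) inner *= shape[static_cast<std::size_t>(k)];
    const std::int64_t dim = shape[static_cast<std::size_t>(axis)];

    // The cached sizes are stored as int, so the collapsed extents must fit.
    constexpr std::int64_t int_max = std::numeric_limits<int>::max();
    assert(outer <= int_max && dim <= int_max && inner <= int_max);

    return SoftmaxDims{ axis, static_cast<int>(outer), static_cast<int>(dim), static_cast<int>(inner) };
}
```

For a [2, 3, 4] input with axis -1 this yields outer = 6, dim = 4, inner = 1, so each of the 6 rows is normalized independently; with axis 1 it yields outer = 2, dim = 3, inner = 4.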

void forward (const ITensor &input, ITensor &output) const [inline, override, virtual]
Forward pass - HOT PATH, pure dispatch to CUDA kernel.
All setup, validation, and dimension computation was done in build(). This method extracts raw pointers and dispatches directly to the appropriate kernel variant, using the dimensions cached and the kernel variant selected during build().
No validation or dimension computation is repeated per call.
Implements Mila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision >.
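A CPU reference for the forward computation is useful for validating kernel output. This sketch assumes a row-major [outer, dim, inner] view of the input with softmax taken along dim; it is not Mila's CUDA kernel and does not model the use_optimized_kernel_ variant selection.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for the softmax forward pass on a row-major tensor viewed
// as [outer, dim, inner]. Subtracting the per-slice maximum before exp()
// keeps large logits from overflowing.
void softmax_forward_ref(const std::vector<float>& x, std::vector<float>& y,
                         int outer, int dim, int inner)
{
    assert(dim > 0);
    assert(x.size() == static_cast<std::size_t>(outer) * dim * inner);
    y.resize(x.size());

    for (int o = 0; o < outer; ++o) {
        for (int i = 0; i < inner; ++i) {
            // Element d of this slice lives at (o * dim + d) * inner + i.
            const std::size_t base = static_cast<std::size_t>(o) * dim * inner + i;
            const auto at = [&](int d) { return base + static_cast<std::size_t>(d) * inner; };

            float max_val = x[at(0)];
            for (int d = 1; d < dim; ++d) max_val = std::max(max_val, x[at(d)]);

            float sum = 0.0f;
            for (int d = 0; d < dim; ++d) {
                y[at(d)] = std::exp(x[at(d)] - max_val);
                sum += y[at(d)];
            }
            for (int d = 0; d < dim; ++d) y[at(d)] /= sum;
        }
    }
}
```

Subtracting the slice maximum does not change the result mathematically (the common factor exp(-max) cancels in the ratio), but with x = {1000, 1000} a naive exp(1000) would overflow float, while this version returns {0.5, 0.5}.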

const SoftmaxConfig & getConfig () const [inline]

std::string getName () const [inline, override, virtual]
Human-readable operation name.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

OperationType getOperationType () const [inline, override, virtual]
Operation type identifier.
Implements Mila::Dnn::Compute::Operation< TDeviceType, TInput >.

void setParameters (ITensor *weight, ITensor *bias) [inline, override, virtual]
Set parameter tensor references (no-op for Softmax - stateless operation).
Softmax has no trainable parameters, so this method only validates that both the weight and bias pointers are null.
Reimplemented from Mila::Dnn::Compute::Operation< TDeviceType, TInput >.