CUDA implementation of the fused softmax and cross entropy operation for neural networks.
| Member | Description |
|---|---|
| `FusedSoftmaxCrossEntropyOp()` | Constructs a new CUDA Fused Softmax Cross Entropy operation with the default device context. |
| `FusedSoftmaxCrossEntropyOp(std::shared_ptr<DeviceContext> context)` | Constructs a new CUDA Fused Softmax Cross Entropy operation with a specific device context. |
| `void backward(const Tensor<TPrecision, MR>& input1, const Tensor<int, MR>& input2, const Tensor<TPrecision, MR>& output, const Tensor<TPrecision, MR>& output_gradient, const std::vector<std::shared_ptr<Tensor<TPrecision, MR>>>& parameters, std::vector<std::shared_ptr<Tensor<TPrecision, MR>>>& parameter_gradients, Tensor<TPrecision, MR>& input1_gradient, Tensor<int, MR>& input2_gradient, const OperationAttributes& properties, const std::vector<std::shared_ptr<Tensor<TPrecision, MR>>>& output_state) const override` | Performs the backward pass of the fused softmax cross entropy operation. |
| `void forward(const Tensor<TPrecision, MR>& logits, const Tensor<int, MR>& targets, const std::vector<std::shared_ptr<Tensor<TPrecision, MR>>>& parameters, const OperationAttributes& properties, Tensor<TPrecision, MR>& losses, std::vector<std::shared_ptr<Tensor<TPrecision, MR>>>& output_state) const override` | Performs the forward pass of the fused softmax cross entropy operation on CUDA (see the usage sketch after this table). |
| `std::string getName() const override` | Gets the name of this operation. |
| `BinaryOperation(OperationType operation_type)` | Constructs a BinaryOperation with the specified operation type and precision policy. |
| `BinaryOperation(OperationType operation_type, std::shared_ptr<DeviceContext> context)` | Constructs a BinaryOperation with the specified operation type, device context, and precision policy. |
| `virtual ~BinaryOperation() = default` | Virtual destructor for proper cleanup of derived classes. |
| `virtual void backward(const Tensor<int, MR>& input1, const Tensor<TPrecision, MR>& input2, const Tensor<DeviceType::Cuda, MR>& output, const Tensor<DeviceType::Cuda, MR>& output_gradient, const std::vector<std::shared_ptr<Tensor<int, MR>>>& parameters, std::vector<std::shared_ptr<Tensor<DeviceType::Cuda, MR>>>& parameter_gradients, Tensor<int, MR>& input1_gradient, Tensor<TPrecision, MR>& input2_gradient, const OperationAttributes& attributes, const std::vector<std::shared_ptr<Tensor<DeviceType::Cuda, MR>>>& output_state) const` | Executes the backward pass of a binary operation. |
| `virtual void forward(const Tensor<int, MR>& input1, const Tensor<TPrecision, MR>& input2, const std::vector<std::shared_ptr<Tensor<int, MR>>>& parameters, const OperationAttributes& attributes, Tensor<DeviceType::Cuda, MR>& output, std::vector<std::shared_ptr<Tensor<DeviceType::Cuda, MR>>>& output_state) const = 0` | Executes the forward pass of a binary operation. |
| `OperationBase(OperationType operation_type, std::shared_ptr<DeviceContext> context)` | Constructs an OperationBase object with a specific device context and compute precision. |
| `virtual ~OperationBase() = default` | Virtual destructor for the OperationBase class. |
| `std::shared_ptr<DeviceContext> getDeviceContext() const` | Gets the device context associated with this operation. |
| `DeviceType getDeviceType() const` | Gets the device type for this operation. |
| `OperationType getOperationType() const` | Gets the operation type enumeration value. |
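To make the `forward` signature above concrete, here is a minimal usage sketch. It assumes a CUDA memory-resource alias (here called `CudaMemoryResource`) for the `MR` parameter, a shape-taking `Tensor` constructor, a `Mila.h` umbrella header, and that this operation needs no learnable parameters; none of these details are documented on this page, so treat the snippet as illustrative rather than authoritative.

```cpp
// Minimal usage sketch (illustrative only). The CudaMemoryResource alias,
// the shape-taking Tensor constructor, and the header name are assumptions.
#include <memory>
#include <vector>
// #include "Mila.h"   // assumed umbrella header exposing Mila::Dnn

using namespace Mila::Dnn;
using MR = Compute::CudaMemoryResource;             // assumed memory-resource type

void example()
{
    constexpr std::size_t batch_size = 8;
    constexpr std::size_t vocab_size = 50257;

    Compute::FusedSoftmaxCrossEntropyOp<float> op;  // default device context

    Tensor<float, MR> logits( { batch_size, vocab_size } );  // assumed shape ctor
    Tensor<int, MR>   targets( { batch_size } );
    Tensor<float, MR> losses( { batch_size } );

    std::vector<std::shared_ptr<Tensor<float, MR>>> parameters;    // none expected
    std::vector<std::shared_ptr<Tensor<float, MR>>> output_state;  // filled by forward
    Compute::OperationAttributes properties;

    // Forward pass: writes the per-sample negative log-likelihood into `losses`.
    op.forward( logits, targets, parameters, properties, losses, output_state );

    // The backward overload mirrors this call, additionally taking the upstream
    // output_gradient and producing the input gradients.
}
```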
template<typename TPrecision>
requires (std::is_same_v<TPrecision, float> || std::is_same_v<TPrecision, half>)
class Mila::Dnn::Compute::FusedSoftmaxCrossEntropyOp< TPrecision >
CUDA implementation of the fused softmax and cross entropy operation for neural networks.
This class provides a CUDA-based implementation of the fused softmax and cross entropy operation, which combines two commonly used operations in neural networks to improve computational efficiency. First, the softmax function converts a vector of real numbers (logits) into a probability distribution. Then, the cross entropy computes the negative log likelihood of the correct class given the predicted probabilities.
The implementation is optimized for NVIDIA GPUs using CUDA for high-performance computation, especially for large vocabulary sizes typical in language models.
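Concretely, for one row of logits z over V classes with target class index t, the operation evaluates the standard quantities:

```latex
% Standard softmax / cross-entropy definitions (for reference)
p_i = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}},
\qquad
\mathcal{L} = -\log p_t = \log\!\left(\sum_{j=1}^{V} e^{z_j}\right) - z_t,
\qquad
\frac{\partial \mathcal{L}}{\partial z_i} = p_i - \mathbf{1}[i = t].
```

The last identity is what makes the fused form attractive: the backward pass can scale p_i - 1[i = t] by the incoming loss gradient directly, without materializing and re-reading a separate softmax output tensor.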
Template Parameters

| Parameter | Description |
|---|---|
| TPrecision | The data type used for computation (float or half). |
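For intuition about why the fusion helps at large vocabulary sizes, a simplified CUDA forward kernel is sketched below: one thread block handles one row, performs block-wide max and sum-of-exponentials reductions in shared memory, and emits the loss without ever writing the softmax probabilities to global memory. This is an illustrative sketch only, not the kernel used by `FusedSoftmaxCrossEntropyOp`; it assumes a power-of-two block size and `float` inputs.

```cuda
// Illustrative fused softmax + cross-entropy forward kernel (not the library's
// actual implementation). One thread block processes one row of logits.
// Assumes blockDim.x is a power of two and blockDim.x * sizeof(float) dynamic
// shared memory, e.g.: fused_softmax_xent_forward<<<rows, 256, 256 * sizeof(float)>>>(...);
#include <cuda_runtime.h>
#include <cfloat>
#include <cmath>

__global__ void fused_softmax_xent_forward( const float* logits,   // [rows, vocab]
                                            const int*   targets,  // [rows]
                                            float*       losses,   // [rows]
                                            int          vocab )
{
    extern __shared__ float shm[];
    const float* row = logits + static_cast<size_t>( blockIdx.x ) * vocab;

    // 1) Row maximum for numerical stability.
    float local_max = -FLT_MAX;
    for ( int i = threadIdx.x; i < vocab; i += blockDim.x )
        local_max = fmaxf( local_max, row[i] );
    shm[threadIdx.x] = local_max;
    __syncthreads();
    for ( int s = blockDim.x / 2; s > 0; s >>= 1 ) {
        if ( threadIdx.x < s )
            shm[threadIdx.x] = fmaxf( shm[threadIdx.x], shm[threadIdx.x + s] );
        __syncthreads();
    }
    const float row_max = shm[0];
    __syncthreads();

    // 2) Sum of exponentials, shifted by the row maximum.
    float local_sum = 0.0f;
    for ( int i = threadIdx.x; i < vocab; i += blockDim.x )
        local_sum += expf( row[i] - row_max );
    shm[threadIdx.x] = local_sum;
    __syncthreads();
    for ( int s = blockDim.x / 2; s > 0; s >>= 1 ) {
        if ( threadIdx.x < s )
            shm[threadIdx.x] += shm[threadIdx.x + s];
        __syncthreads();
    }

    // 3) loss = log(sum exp) - (logit_target - max); softmax is never materialized.
    if ( threadIdx.x == 0 )
        losses[blockIdx.x] = logf( shm[0] ) - ( row[targets[blockIdx.x]] - row_max );
}
```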