CUDA implementation of the fused softmax and cross entropy operation for neural networks.
|
| FusedSoftmaxCrossEntropyOp () |
| Constructs a new CUDA Fused Softmax Cross Entropy operation with the default device context.
|
|
| FusedSoftmaxCrossEntropyOp (std::shared_ptr< DeviceContext > context) |
| Constructs a new CUDA Fused Softmax Cross Entropy operation with a specific device context.
|
|
void | backward (const Tensor< TPrecision, MR > &input1, const Tensor< int, MR > &input2, const Tensor< TPrecision, MR > &output, const Tensor< TPrecision, MR > &output_gradient, const std::vector< std::shared_ptr< Tensor< TPrecision, MR > > > &parameters, std::vector< std::shared_ptr< Tensor< TPrecision, MR > > > &parameter_gradients, Tensor< TPrecision, MR > &input1_gradient, Tensor< int, MR > &input2_gradient, const OperationAttributes &properties, const std::vector< std::shared_ptr< Tensor< TPrecision, MR > > > &output_state) const override |
| Performs the backward pass of the fused softmax cross entropy operation.
|
|
void | forward (const Tensor< TPrecision, MR > &logits, const Tensor< int, MR > &targets, const std::vector< std::shared_ptr< Tensor< TPrecision, MR > > > &parameters, const OperationAttributes &properties, Tensor< TPrecision, MR > &losses, std::vector< std::shared_ptr< Tensor< TPrecision, MR > > > &output_state) const override |
| Performs the forward pass of the fused softmax cross entropy operation on CUDA.
|
|
std::string | getName () const override |
| Gets the name of this operation.
|
|
| BinaryOperation (OperationType operation_type) |
| Constructs a BinaryOperation with the specified operation type and precision policy.
|
|
| BinaryOperation (OperationType operation_type, std::shared_ptr< DeviceContext > context) |
| Constructs a BinaryOperation with the specified operation type, device context, and precision policy.
|
|
virtual | ~BinaryOperation ()=default |
| Virtual destructor for proper cleanup of derived classes.
|
|
virtual void | backward (const Tensor< int, MR > &input1, const Tensor< TPrecision, MR > &input2, const Tensor< DeviceType::Cuda, MR > &output, const Tensor< DeviceType::Cuda, MR > &output_gradient, const std::vector< std::shared_ptr< Tensor< int, MR > > > &parameters, std::vector< std::shared_ptr< Tensor< DeviceType::Cuda, MR > > > &parameter_gradients, Tensor< int, MR > &input1_gradient, Tensor< TPrecision, MR > &input2_gradient, const OperationAttributes &attributes, const std::vector< std::shared_ptr< Tensor< DeviceType::Cuda, MR > > > &output_state) const |
| Executes the backward pass of a binary operation.
|
|
virtual void | forward (const Tensor< int, MR > &input1, const Tensor< TPrecision, MR > &input2, const std::vector< std::shared_ptr< Tensor< int, MR > > > &parameters, const OperationAttributes &attributes, Tensor< DeviceType::Cuda, MR > &output, std::vector< std::shared_ptr< Tensor< DeviceType::Cuda, MR > > > &output_state) const=0 |
| Executes the forward pass of a binary operation.
|
|
| OperationBase (OperationType operation_type, std::shared_ptr< DeviceContext > context) |
| Constructs an OperationBase object with a specific device context and compute precision.
|
|
virtual | ~OperationBase ()=default |
| Virtual destructor for the OperationBase class.
|
|
std::shared_ptr< DeviceContext > | getDeviceContext () const |
| Gets the device context associated with this operation.
|
|
DeviceType | getDeviceType () const |
| Gets the device type for this operation.
|
|
OperationType | getOperationType () const |
| Gets the operation type enumeration value.
|
|
template<typename TPrecision>
requires (std::is_same_v<TPrecision, float> || std::is_same_v<TPrecision, half>)
class Mila::Dnn::Compute::FusedSoftmaxCrossEntropyOp< TPrecision >
This class provides a CUDA-based implementation of the fused softmax and cross entropy operation, which combines two operations commonly applied back to back in neural networks so that they run as a single step. First, the softmax function converts a vector of real numbers (logits) into a probability distribution. Cross entropy then computes the negative log-likelihood of the correct class under those predicted probabilities.
The implementation targets NVIDIA GPUs through CUDA and is intended for the large vocabulary sizes typical of language models.
Template Parameters
TPrecision | The data type used for computation (float or half). |
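
The softmax-then-cross-entropy computation described above can be illustrated with a small CPU reference. This is only a sketch under stated assumptions, not the Mila CUDA kernel: the function names `softmax_cross_entropy_forward` and `softmax_cross_entropy_backward`, the flat row-major `std::vector<float>` layout, and the `vocab` parameter are all hypothetical and chosen for illustration. It uses the common log-sum-exp formulation, in which the loss for a row is `log(sum(exp(logits))) - logits[target]`, and the standard gradient `softmax(logits) - one_hot(target)`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of Mila).
// logits:  row-major [rows x vocab] scores.
// targets: one class index per row, each in [0, vocab).
// Returns one loss per row: -log(softmax(row)[target]).
std::vector<float> softmax_cross_entropy_forward(const std::vector<float>& logits,
                                                 const std::vector<int>& targets,
                                                 std::size_t vocab) {
    const std::size_t rows = targets.size();
    std::vector<float> losses(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = logits.data() + r * vocab;
        // Subtract the row maximum so exp() cannot overflow.
        const float max_logit = *std::max_element(row, row + vocab);
        float sum_exp = 0.0f;
        for (std::size_t c = 0; c < vocab; ++c) {
            sum_exp += std::exp(row[c] - max_logit);
        }
        // loss = logsumexp(row) - row[target], written in shifted form.
        losses[r] = std::log(sum_exp) - (row[targets[r]] - max_logit);
    }
    return losses;
}

// Gradient of each row's loss with respect to its logits:
// (softmax(row) - one_hot(target)) scaled by that row's upstream gradient.
std::vector<float> softmax_cross_entropy_backward(const std::vector<float>& logits,
                                                  const std::vector<int>& targets,
                                                  const std::vector<float>& loss_gradient,
                                                  std::size_t vocab) {
    const std::size_t rows = targets.size();
    std::vector<float> logits_gradient(rows * vocab);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = logits.data() + r * vocab;
        const float max_logit = *std::max_element(row, row + vocab);
        float sum_exp = 0.0f;
        for (std::size_t c = 0; c < vocab; ++c) {
            sum_exp += std::exp(row[c] - max_logit);
        }
        for (std::size_t c = 0; c < vocab; ++c) {
            const float prob = std::exp(row[c] - max_logit) / sum_exp;
            const float one_hot = (static_cast<int>(c) == targets[r]) ? 1.0f : 0.0f;
            logits_gradient[r * vocab + c] = (prob - one_hot) * loss_gradient[r];
        }
    }
    return logits_gradient;
}
```

A reference like this is mainly useful for checking a GPU result on small inputs. Working from the logits directly, rather than materializing the probabilities and taking their logarithm afterwards, avoids `log(0)` when a probability underflows, which is one common motivation for fusing the two operations.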