|
| | CudaMatMulBiasGeluOp () |
| | Constructs a new CUDA MatMul-Bias-GELU fused operation with the default device context.
|
| void | backward (const Tensor< TInput, MR > &input, const Tensor< TOutput, MR > &output, const Tensor< TOutput, MR > &output_gradient, const std::vector< std::shared_ptr< ITensor > > &parameters, std::vector< std::shared_ptr< Tensor< TOutput, MR > > > &parameter_gradients, Tensor< TInput, MR > &input_gradient, const std::vector< std::shared_ptr< Tensor< TOutput, MR > > > &output_state) const |
| | Performs the backward pass of the fused MatMul-Bias-GELU operation.
|
| void | forward (const Tensor< TInput, MR > &input, const std::vector< std::shared_ptr< ITensor > > &parameters, Tensor< TOutput, MR > &output, std::vector< std::shared_ptr< Tensor< TOutput, MR > > > &output_state) const override |
| | Performs the forward pass of the fused MatMul-Bias-GELU operation on CUDA.
|
| std::string | getName () const override |
| | Gets the name of this operation.
|
| virtual | ~UnaryOperation ()=default |
| virtual void | backward (const ITensor &input, const ITensor &output_grad, ITensor &input_grad) const=0 |
| | Backward pass: compute gradient wrt input given output gradient.
|
| virtual void | forward (const ITensor &input, ITensor &output) const=0 |
| | Forward pass: compute output = f(input).
|
| virtual | ~Operation ()=default |
| virtual void | build (const BuildContext &build_context) |
| | Prepare the operation for a concrete input shape.
|
| virtual void | clearGradients () noexcept |
| | Clear any cached gradient pointers held by the operation.
|
| virtual TensorDataType | getDataType () const |
| | Tensor data type for this operation.
|
| virtual DeviceType | getDeviceType () const |
| | Device type for this operation.
|
| virtual OperationType | getOperationType () const=0 |
| | Operation type identifier.
|
| virtual std::size_t | getStateMemorySize () const |
| | Returns the number of bytes of state memory allocated by this operation.
|
| virtual bool | isBuilt () const |
| | Whether build() completed successfully for a concrete input shape.
|
| virtual bool | isEvalMode () const |
| | Query whether the operation is configured for evaluation (inference) rather than training.
|
| virtual void | setGradients (ITensor *weight_grad, ITensor *bias_grad) |
| | Bind module-owned gradient tensors to the operation.
|
| virtual void | setParameters (ITensor *weight, ITensor *bias) |
| | Bind module-owned parameter tensors to the operation.
|
| virtual void | setTrainingMode (TrainingMode training_mode) |
| | Configure operation training-mode behavior.
|
template<typename TInput = float, typename TOutput = TInput>
requires ValidFloatTensorTypes<TInput, TOutput>
class Mila::Dnn::Compute::Cuda::MatMulBiasGelu::CudaMatMulBiasGeluOp< TInput, TOutput >
CUDA implementation of the fused MatMul-Bias-GELU operation.
This class provides a CUDA implementation that fuses matrix multiplication, bias addition, and GELU activation into a single operation. Fusing these steps improves performance by reducing memory traffic and kernel launch overhead.
The implementation targets NVIDIA GPUs and uses cuBLASLt to compute the fused operation.
- Template Parameters
-
| TInput | The data type of the input tensor elements (defaults to float). |
| TOutput | The data type for computation and output (defaults to the input type). |
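
As an illustration of what the fused operation computes, the following is a minimal CPU reference sketch, not the library's implementation. The function names `gelu_tanh` and `matmul_bias_gelu_reference` are hypothetical, and the row-major layout, the K x N weight shape, and the tanh approximation of GELU are all assumptions; the actual class may use a different layout or GELU variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tanh approximation of GELU. Assumed variant: the library may use the
// exact erf-based form instead.
inline float gelu_tanh(float x) {
    const float k = 0.7978845608028654f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Hypothetical reference for the fused computation:
//   output[m, n] = GELU(sum_k input[m, k] * weight[k, n] + bias[n])
// Assumed layout: row-major, input is M x K, weight is K x N, bias has N entries.
inline std::vector<float> matmul_bias_gelu_reference(
    const std::vector<float>& input,
    const std::vector<float>& weight,
    const std::vector<float>& bias,
    std::size_t M, std::size_t K, std::size_t N) {
    assert(input.size() == M * K);
    assert(weight.size() == K * N);
    assert(bias.size() == N);

    std::vector<float> output(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            // Start from the bias so the addition needs no separate pass.
            float acc = bias[n];
            for (std::size_t k = 0; k < K; ++k) {
                acc += input[m * K + k] * weight[k * N + n];
            }
            // Apply the activation before the single store to output.
            output[m * N + n] = gelu_tanh(acc);
        }
    }
    return output;
}
```

Each output element is written exactly once and no intermediate matrix is materialized. An unfused pipeline would instead write the MatMul result, re-read it to add the bias, and re-read it again to apply GELU; avoiding those extra passes is the memory-traffic saving the description refers to.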