Mila Deep Neural Network Library
Implementation of the CUDA-based Multi-Head Attention operation for transformer models.
#include <vector>
#include <memory>
#include <string>
#include <stdexcept>
#include "Kernels/CudaOps.h"

import Compute.CudaDevice;
import Dnn.Modules.Attention;
import Dnn.Tensor;
import Compute.DeviceContext;
import Dnn.TensorTraits;
import Compute.OperationRegistry;
import Compute.UnaryOperation;
import Compute.OperationBase;
import Compute.CudaMemoryResource;
import Compute.DeviceType;
import Compute.OperationType;
import Compute.OperationAttributes;
import Dnn.ComponentConfig;
import Compute.MemoryResource;

Classes
struct  Mila::Dnn::Compute::Detail::cuda_mha_impl< float >
struct  Mila::Dnn::Compute::Detail::cuda_mha_impl< half >
class   Mila::Dnn::Compute::CudaMultiHeadAttentionOp< TInput, TOutput >
        CUDA implementation of the Multi-Head Attention operation for transformer models.
class   Mila::Dnn::Compute::CudaMultiHeadAttentionOpRegistrar
        Class responsible for registering the CudaMultiHeadAttentionOp operation.
Namespaces

namespace  Mila
namespace  Mila::Dnn
namespace  Mila::Dnn::Compute
namespace  Mila::Dnn::Compute::Detail
           Namespace for CUDA Multi-Head Attention implementation details.
Detailed Description

Implementation of the CUDA-based Multi-Head Attention operation for transformer models.