Mila 0.13.48
Deep Neural Network Library
OperationTraits.Cuda.ixx File Reference

OperationTraits specializations for all CUDA operation backends.

Classes

struct  Mila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::BF16, NoKvCompression >
 Unquantized BF16 path. No KV cache compression. Standard inference precision.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::FP32, NoKvCompression >
 Unquantized FP32 path. No KV cache compression.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, NoWeightQuant >
 Unquantized BF16 path. Standard inference precision.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerChannelFp8<> >
 FP8 per-channel quantized BF16 path. Requires SM >= 8.0 (Ampere+).
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 128 > >
 FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=128. Requires SM >= 8.0.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 64 > >
 FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=64. Requires SM >= 8.0.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 128 > >
 INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=128. Requires SM >= 8.0.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 64 > >
 INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=64. Requires SM >= 8.0.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::FP32, NoWeightQuant >
 Unquantized FP32 path. Retained for validation and reference.
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::FP32, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::BF16, void >
struct  Mila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::FP32, void >

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn
namespace  Mila::Dnn::Compute

Detailed Description

OperationTraits specializations for all CUDA operation backends.

This partition module is the single registration point for every (OperationType, Cuda, TPrecision, TPolicy) -> concrete op mapping. Add a new specialization block here when migrating a component from its legacy *OpTypeMap to the unified OperationTraits dispatch.
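
The registration pattern described above can be illustrated with a minimal sketch. It is not Mila's actual code: everything lives in a hypothetical `sketch` namespace, the enums are cut down to a few values, the concrete op types (`CudaLinearBf16`, `CudaLinearBf16Int4`, `CudaGeluFp32`), the `op_type` member alias, the `OpFor` helper, and the `void` default for the policy parameter are all assumptions made for illustration. The sketch also uses ordinary headers rather than a module partition so that it compiles standalone with C++17.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

namespace sketch {

// Cut-down stand-ins for the real enums (assumption: the real ones have more values).
enum class OperationType { LinearOp, GeluOp };
enum class DeviceType { Cpu, Cuda };
enum class TensorDataType { FP32, BF16 };

// Hypothetical policy tags, modeled on the names in the class list.
struct NoWeightQuant {};
template <std::size_t GroupSize>
struct PerGroupInt4 {
    static constexpr std::size_t group_size = GroupSize;
};

// Hypothetical concrete op types that the traits map to.
struct CudaLinearBf16 {
    static constexpr std::string_view name = "CudaLinear<BF16>";
};
template <std::size_t GroupSize>
struct CudaLinearBf16Int4 {
    static constexpr std::string_view name = "CudaLinear<BF16, INT4>";
    static constexpr std::size_t group_size = GroupSize;
};
struct CudaGeluFp32 {
    static constexpr std::string_view name = "CudaGelu<FP32>";
};

// Primary template: declared but never defined, so asking for an
// unregistered (op, device, precision, policy) tuple is a compile error.
template <OperationType Op, DeviceType Device, TensorDataType Precision,
          typename Policy = void>
struct OperationTraits;

// One explicit specialization per registered tuple.
template <>
struct OperationTraits<OperationType::LinearOp, DeviceType::Cuda,
                       TensorDataType::BF16, NoWeightQuant> {
    using op_type = CudaLinearBf16;
};

template <>
struct OperationTraits<OperationType::LinearOp, DeviceType::Cuda,
                       TensorDataType::BF16, PerGroupInt4<128>> {
    using op_type = CudaLinearBf16Int4<128>;
};

// Policy-free op: the policy slot is void.
template <>
struct OperationTraits<OperationType::GeluOp, DeviceType::Cuda,
                       TensorDataType::FP32, void> {
    using op_type = CudaGeluFp32;
};

// Convenience alias (hypothetical) that resolves a tuple to its concrete op.
template <OperationType Op, DeviceType Device, TensorDataType Precision,
          typename Policy = void>
using OpFor = typename OperationTraits<Op, Device, Precision, Policy>::op_type;

// Compile-time checks that the mapping resolves as intended.
static_assert(std::is_same_v<
    OpFor<OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16,
          NoWeightQuant>,
    CudaLinearBf16>);
static_assert(
    OpFor<OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16,
          PerGroupInt4<128>>::group_size == 128);

}  // namespace sketch
```

Leaving the primary template undefined is one possible design choice: a tuple with no specialization, such as `OpFor<OperationType::GeluOp, DeviceType::Cuda, TensorDataType::BF16>` in this sketch, fails at compile time instead of falling back silently. Whether Mila's primary template behaves this way is not stated on this page.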

Migration status:
  LinearOp: complete
  GroupedQueryAttentionOp: complete (NoKvCompression; PerChannelKvFp8 pending CudaGqaOp support)
  SamplingOp: pending
  Policy-free ops: complete