|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::BF16, NoKvCompression > |
| | Unquantized BF16 path. No KV cache compression. Standard inference precision.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::FP32, NoKvCompression > |
| | Unquantized FP32 path. No KV cache compression.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, NoWeightQuant > |
| | Unquantized BF16 path. Standard inference precision.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerChannelFp8<> > |
| | FP8 per-channel quantized BF16 path. Requires SM >= 8.0 (Ampere+).
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 128 > > |
| | FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=128. Requires SM >= 8.0.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 64 > > |
| | FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=64. Requires SM >= 8.0.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 128 > > |
| | INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=128. Requires SM >= 8.0.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 64 > > |
| | INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=64. Requires SM >= 8.0.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::FP32, NoWeightQuant > |
| | Unquantized FP32 path. Retained for validation and reference.
|
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::FP32, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::BF16, void > |
| struct | Mila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::FP32, void > |
OperationTraits specializations for all CUDA operation backends.
This partition module is the single registration point for every (OperationType, Cuda, TPrecision, TPolicy) -> concrete op mapping. Add a new specialization block here when migrating a component from its legacy *OpTypeMap to the unified OperationTraits dispatch.
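The registration pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Mila source: the enums are cut-down stand-ins, the concrete op types (`CudaLinearBf16`, `CudaLinearInt4`, `CudaGeluFp32`), the nested `type` alias, the `OperationFor` helper, and the defaulted `void` policy argument are all hypothetical names chosen for the example. Only the four-parameter shape of `OperationTraits` and the policy tags `NoWeightQuant` and `PerGroupInt4< 128 >` come from the listing.

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>

// Cut-down stand-ins for the Mila enums (assumption: the real enums have more members).
enum class OperationType { LinearOp, GeluOp };
enum class DeviceType { Cpu, Cuda };
enum class TensorDataType { FP32, BF16 };

// Policy tags named in the listing; their bodies here are hypothetical.
struct NoWeightQuant {};
template <int GroupSize>
struct PerGroupInt4 { static constexpr int group_size = GroupSize; };

// Hypothetical concrete op types standing in for the real CUDA op classes.
struct CudaLinearBf16 { static constexpr std::string_view name = "linear/bf16"; };
template <int GroupSize>
struct CudaLinearInt4 {
    static constexpr std::string_view name = "linear/bf16+int4";
    static constexpr int group_size = GroupSize;
};
struct CudaGeluFp32 { static constexpr std::string_view name = "gelu/fp32"; };

// Primary template: declared but never defined, so requesting an
// unregistered combination is a compile-time error rather than a runtime one.
template <OperationType Op, DeviceType Dev, TensorDataType Prec, typename Policy = void>
struct OperationTraits;

// One specialization block per registered (op, device, precision, policy) tuple.
template <>
struct OperationTraits<OperationType::LinearOp, DeviceType::Cuda,
                       TensorDataType::BF16, NoWeightQuant> {
    using type = CudaLinearBf16;
};

template <>
struct OperationTraits<OperationType::LinearOp, DeviceType::Cuda,
                       TensorDataType::BF16, PerGroupInt4<128>> {
    using type = CudaLinearInt4<128>;
};

// Policy-free ops use void as the policy argument, as in the listing.
template <>
struct OperationTraits<OperationType::GeluOp, DeviceType::Cuda,
                       TensorDataType::FP32, void> {
    using type = CudaGeluFp32;
};

// Hypothetical convenience alias for looking up the concrete op type.
template <OperationType Op, DeviceType Dev, TensorDataType Prec, typename Policy = void>
using OperationFor = typename OperationTraits<Op, Dev, Prec, Policy>::type;

// The mapping is resolved entirely at compile time.
static_assert(std::is_same_v<
    OperationFor<OperationType::LinearOp, DeviceType::Cuda,
                 TensorDataType::BF16, NoWeightQuant>,
    CudaLinearBf16>);
static_assert(std::is_same_v<
    OperationFor<OperationType::GeluOp, DeviceType::Cuda, TensorDataType::FP32>,
    CudaGeluFp32>);
```

Leaving the primary template undefined is one possible design choice: a combination that has not been migrated yet (for example, a policy whose backend support is still pending) fails to compile at the point of use instead of silently falling back to another path.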
Migration status:
- LinearOp: complete
- GroupedQueryAttentionOp: complete (NoKvCompression; PerChannelKvFp8 pending CudaGqaOp support)
- SamplingOp: pending
- Policy-free ops: complete