Mila 0.13.48
Deep Neural Network Library
RAII wrapper owning cuBLASLt descriptors for a Linear matmul.
Public Types | |
| using | ActivationType = typename TensorDataTypeMap<TComputePrecision>::device_type |
| using | ParameterType = typename TensorDataTypeMap<TParameterPrecision>::device_type |
| using | TAccumPrecision = float |
Public Member Functions | |
| CublasLtLinearPlan ()=default | |
| CublasLtLinearPlan (const CublasLtLinearPlan &)=delete | |
| CublasLtLinearPlan (CublasLtLinearPlan &&other) noexcept | |
| ~CublasLtLinearPlan () | |
| bool | isValid () const |
| CublasLtLinearPlan & | operator= (const CublasLtLinearPlan &)=delete |
| CublasLtLinearPlan & | operator= (CublasLtLinearPlan &&other) noexcept |
Public Attributes | |
| cublasLtMatmulAlgo_t | algorithm {} |
| bool | has_algorithm { false } |
| bool | has_bias_epilogue { false } |
| bool | has_weight_scale { kIsQuantized } |
| true when a weight scale pointer is needed (FP8 path) | |
| cublasLtMatrixLayout_t | layoutA { nullptr } |
| cublasLtMatrixLayout_t | layoutB { nullptr } |
| cublasLtMatrixLayout_t | layoutC { nullptr } |
| cublasLtMatmulDesc_t | matmul_desc { nullptr } |
| cublasLtMatmulPreference_t | preference { nullptr } |
Static Public Attributes | |
| static constexpr bool | kIsQuantized = (TParameterPrecision != TComputePrecision) |
RAII wrapper owning cuBLASLt descriptors for a Linear matmul.
Owns:
  matmul_desc - operation descriptor (transpose flags, epilogue, scale pointers)
  layoutA, layoutB, layoutC - matrix memory layouts
  preference - algorithm preference used during heuristic search
  algorithm - selected heuristic algorithm
  has_algorithm - true when heuristic returned a valid algorithm
  has_bias_epilogue - true when CUBLASLT_EPILOGUE_BIAS is active
  has_weight_scale - true when TParameterPrecision != TComputePrecision (FP8 path)
Layout convention (compile-time, driven by kIsQuantized):
Non-quantized (NT row-major):
  A = activations [outer_size × in_features], opA = N
  B = weights [out_features × in_features], opB = T
  C = output [outer_size × out_features]
Quantized (TN column-major, Ada SM 8.9+):
  A = weights (FP8) [in_features × out_features], opA = T → op(A) = W[out_features, in_features]
  B = activations [in_features × outer_size], opB = N → op(B) = X^T[in_features, outer_size]
  C = output [out_features × outer_size] (col-major ≡ row-major Y[outer_size, out_features])
  A_SCALE_POINTER = per-tensor weight scale
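The layout convention above can be sketched in plain C++ without any CUDA dependency. The names `Op`, `MatShape`, `LinearShapes`, `makeShapes`, `colMajorOffset` and `rowMajorOffset` are hypothetical helpers introduced only for illustration; they are not part of Mila or cuBLASLt. The sketch assumes each shape is written as [rows × cols] exactly as listed above, and that the leading dimension equals the row count (column-major) or column count (row-major).

```cpp
#include <cstddef>

// Hypothetical illustration types; not Mila or cuBLASLt API.
enum class Op { N, T };

struct MatShape {
    std::size_t rows;
    std::size_t cols;
    Op op;  // transpose flag applied to the operand (always N for C)
};

struct LinearShapes {
    MatShape a;
    MatShape b;
    MatShape c;
};

// Select operand shapes at compile time, mirroring the documented convention.
template <bool kIsQuantized>
constexpr LinearShapes makeShapes(std::size_t outer_size,
                                  std::size_t in_features,
                                  std::size_t out_features) {
    if constexpr (kIsQuantized) {
        // TN column-major: A = weights, B = activations, C = [out_features x outer_size]
        return { { in_features, out_features, Op::T },
                 { in_features, outer_size, Op::N },
                 { out_features, outer_size, Op::N } };
    } else {
        // NT row-major: A = activations, B = weights, C = [outer_size x out_features]
        return { { outer_size, in_features, Op::N },
                 { out_features, in_features, Op::T },
                 { outer_size, out_features, Op::N } };
    }
}

// Linear offset of element (row, col) in a column-major matrix with leading dimension ld.
constexpr std::size_t colMajorOffset(std::size_t row, std::size_t col, std::size_t ld) {
    return row + col * ld;
}

// Linear offset of element (row, col) in a row-major matrix with leading dimension ld.
constexpr std::size_t rowMajorOffset(std::size_t row, std::size_t col, std::size_t ld) {
    return row * ld + col;
}
```

For any output feature `o` and sample `s`, `colMajorOffset(o, s, out_features)` equals `rowMajorOffset(s, o, out_features)`, which is why the column-major C of the quantized path can be read back as the row-major Y[outer_size, out_features] without a copy.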
Non-copyable; move-only.
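A minimal sketch of the move-only ownership pattern follows. It is not the Mila implementation: `FakeDesc`, `fakeCreate`, `fakeDestroy`, `g_live_descs` and `PlanSketch` are hypothetical stand-ins so the example compiles without CUDA. In real code the handle would be a cuBLASLt type such as `cublasLtMatmulDesc_t`, created and released with `cublasLtMatmulDescCreate` and `cublasLtMatmulDescDestroy`. The `isValid()` condition shown here is an assumption, since the reference does not state what the real method checks.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an opaque cuBLASLt descriptor handle.
struct FakeDesc {};

inline int g_live_descs = 0;  // counts live handles so leaks are observable

inline FakeDesc* fakeCreate() {
    ++g_live_descs;
    return new FakeDesc{};
}

inline void fakeDestroy(FakeDesc* d) {
    if (d != nullptr) {
        --g_live_descs;
        delete d;
    }
}

struct PlanSketch {
    FakeDesc* matmul_desc { nullptr };
    bool has_algorithm { false };

    PlanSketch() = default;

    // Copying would lead to a double destroy, so it is disabled.
    PlanSketch(const PlanSketch&) = delete;
    PlanSketch& operator=(const PlanSketch&) = delete;

    // Moving transfers ownership and leaves the source empty.
    PlanSketch(PlanSketch&& other) noexcept
        : matmul_desc(std::exchange(other.matmul_desc, nullptr)),
          has_algorithm(std::exchange(other.has_algorithm, false)) {}

    PlanSketch& operator=(PlanSketch&& other) noexcept {
        if (this != &other) {
            release();  // drop whatever this plan currently owns
            matmul_desc = std::exchange(other.matmul_desc, nullptr);
            has_algorithm = std::exchange(other.has_algorithm, false);
        }
        return *this;
    }

    ~PlanSketch() { release(); }

    // Assumed validity rule: a descriptor exists and an algorithm was selected.
    bool isValid() const { return matmul_desc != nullptr && has_algorithm; }

private:
    void release() noexcept {
        fakeDestroy(matmul_desc);
        matmul_desc = nullptr;
    }
};

static_assert(!std::is_copy_constructible_v<PlanSketch>);
static_assert(!std::is_copy_assignable_v<PlanSketch>);
static_assert(std::is_nothrow_move_constructible_v<PlanSketch>);
static_assert(std::is_nothrow_move_assignable_v<PlanSketch>);
```

Marking the move operations `noexcept` lets standard containers such as `std::vector` move plans during reallocation instead of requiring a copy, which matters here because copying is deleted.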
| using Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::ActivationType = typename TensorDataTypeMap<TComputePrecision>::device_type |
| using Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::ParameterType = typename TensorDataTypeMap<TParameterPrecision>::device_type |
| using Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::TAccumPrecision = float |
| cublasLtMatmulAlgo_t Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::algorithm {} |
| bool Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::has_algorithm { false } |
| bool Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::has_bias_epilogue { false } |
| bool Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::has_weight_scale { kIsQuantized } |
true when a weight scale pointer is needed (FP8 path)
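| static constexpr bool Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::kIsQuantized = (TParameterPrecision != TComputePrecision) |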
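As a rough illustration of how `kIsQuantized` follows from the two template parameters and seeds `has_weight_scale`, consider the sketch below. The `Precision` enumerators and the `PlanFlags` struct are hypothetical; Mila's actual precision enumeration and its enumerator names are not shown in this reference.

```cpp
// Hypothetical precision tags for illustration only.
enum class Precision { FP32, FP16, BF16, FP8 };

template <Precision TComputePrecision, Precision TParameterPrecision>
struct PlanFlags {
    // Same expression as the documented kIsQuantized member.
    static constexpr bool kIsQuantized = (TParameterPrecision != TComputePrecision);

    // Defaults to true only when parameter and compute precisions differ.
    bool has_weight_scale { kIsQuantized };
};

static_assert(!PlanFlags<Precision::BF16, Precision::BF16>::kIsQuantized);
static_assert(PlanFlags<Precision::BF16, Precision::FP8>::kIsQuantized);
```

Note that the expression itself only compares the two precisions; the reference associates the differing case with the FP8 weight path.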
| cublasLtMatrixLayout_t Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::layoutA { nullptr } |
| cublasLtMatrixLayout_t Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::layoutB { nullptr } |
| cublasLtMatrixLayout_t Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::layoutC { nullptr } |
| cublasLtMatmulDesc_t Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::matmul_desc { nullptr } |
| cublasLtMatmulPreference_t Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision >::preference { nullptr } |