Mila 0.13.48
Deep Neural Network Library
CUDA-accelerated matrix multiplication with bias addition using cuBLASLt.
#include <cublasLt.h>
#include <cuda_fp16.h>
#include <cuda_bf16.h>
#include <cuda_fp8.h>
#include <type_traits>
import Logging.Logger;
import CublasLt.Error;
import Dnn.TensorDataTypeTraits;
import Dnn.ITensor;
This module provides high-performance matrix multiplication operations optimized for neural network linear layers. It leverages NVIDIA's cuBLASLt library to efficiently execute matrix operations on GPU tensor cores with configurable precision modes.
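As a point of reference for what a linear layer's fused matmul-plus-bias computes, here is a minimal CPU sketch. It is not the module's cuBLASLt code path. The function name `matmul_bias_reference`, the row-major layout, and the `Y = X * W^T + b` weight convention are assumptions made for illustration; the module's actual layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: Y = X * W^T + b.
// Assumed shapes (all row-major):
//   X: [rows x in_features]
//   W: [out_features x in_features]
//   b: [out_features]
//   Y: [rows x out_features]
std::vector<float> matmul_bias_reference(const std::vector<float>& X,
                                         const std::vector<float>& W,
                                         const std::vector<float>& b,
                                         std::size_t rows,
                                         std::size_t in_features,
                                         std::size_t out_features) {
    std::vector<float> Y(rows * out_features, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t o = 0; o < out_features; ++o) {
            // Start the accumulator at the bias so the addition is "fused".
            float acc = b[o];
            for (std::size_t k = 0; k < in_features; ++k) {
                acc += X[r * in_features + k] * W[o * in_features + k];
            }
            Y[r * out_features + o] = acc;
        }
    }
    return Y;
}
```

A reference like this is mainly useful for checking GPU results on small inputs within a tolerance that matches the chosen precision.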
Key features:
The implementation supports several data types and selects the compute precision for each one according to the supplied ComputePrecision policy, which governs the tradeoff between accuracy and performance.
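One way such a policy-driven selection could look is sketched below. The enums `DataType` and `Policy` and the function `select_compute_type` are hypothetical stand-ins, not the module's ComputePrecision API, and the mapping is only one plausible choice. The returned strings are names of real `cublasComputeType_t` values; they are returned as text so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical element types, mirroring the headers the module includes
// (cuda_fp16.h, cuda_bf16.h, cuda_fp8.h).
enum class DataType { FP32, FP16, BF16, FP8 };

// Hypothetical two-level policy; the real ComputePrecision policy may
// expose different or additional levels.
enum class Policy { Accuracy, Performance };

// Returns the name of a cublasComputeType_t value for the given inputs.
// Assumed mapping, for illustration only.
constexpr std::string_view select_compute_type(DataType type, Policy policy) {
    switch (type) {
        case DataType::FP32:
            // TF32 tensor-core math trades mantissa bits for throughput.
            return policy == Policy::Performance ? "CUBLAS_COMPUTE_32F_FAST_TF32"
                                                 : "CUBLAS_COMPUTE_32F";
        case DataType::FP16:
            // Half-precision accumulation is faster but less accurate.
            return policy == Policy::Performance ? "CUBLAS_COMPUTE_16F"
                                                 : "CUBLAS_COMPUTE_32F";
        case DataType::BF16:
        case DataType::FP8:
            // Narrow formats accumulate in FP32 in this sketch.
            return "CUBLAS_COMPUTE_32F";
    }
    return "CUBLAS_COMPUTE_32F";
}
```

Keeping the choice in a single function means the accuracy versus performance decision is made in one place and the rest of the matmul setup does not branch on data type.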