|
Mila 0.13.48
Deep Neural Network Library
|
Classes | |
| struct | mlp_activation_impl |
| struct | mlp_activation_impl< ActivationType::Gelu, TDeviceType, TPrecision > |
| struct | mlp_activation_impl< ActivationType::Swiglu, TDeviceType, TPrecision > |
Functions | |
| std::string | formatBytes (size_t bytes) |
| template<TensorDataType TDataType, typename MR> | |
| constexpr size_t | get_alignment () |
| Determines optimal memory alignment based on memory resource and data type. | |
| template<TensorDataType TDataType> | |
| constexpr size_t | getStorageSize (size_t logical_size) |
| Calculates storage size in bytes for given logical element count. | |
Variables | |
| constexpr size_t | CPU_SIMD_ALIGN = 64 |
| AVX-512 alignment boundary for optimal CPU SIMD operations. | |
| constexpr size_t | CUDA_WARP_SIZE = 32 |
| CUDA warp size alignment for optimal GPU memory access patterns. | |
|
inline |

|
constexpr |
Determines optimal memory alignment based on memory resource and data type.
Calculates the appropriate memory alignment boundary considering both the memory resource characteristics (CPU vs GPU) and data type requirements. This ensures optimal performance across different hardware architectures.
| TDataType | Abstract tensor data type from TensorDataType enumeration |
| MR | Memory resource type determining allocation strategy |
|
constexpr |
Calculates storage size in bytes for given logical element count.
Computes the required storage bytes for a given number of logical elements of the specified tensor data type, handling potential overflow conditions.
| TDataType | Abstract tensor data type from TensorDataType enumeration |
| logical_size | Number of logical elements |
| std::overflow_error | If calculation would overflow |

|
constexpr |
AVX-512 alignment boundary for optimal CPU SIMD operations.
Defines the alignment requirement for maximum efficiency with AVX-512 vector instructions, enabling optimal vectorized operations on modern CPU architectures.
|
constexpr |
CUDA warp size alignment for optimal GPU memory access patterns.
Defines the alignment boundary required for optimal memory coalescing in CUDA warp-level operations, ensuring maximum memory throughput in GPU kernels and device functions.