Mila 0.13.48
Deep Neural Network Library
Host-callable launcher declarations for CUDA structural tensor operations.
#include <cuda_runtime.h>
Namespaces
| Namespace | Description |
|---|---|
| Mila | Mila main API namespace. |
| Mila::Dnn | |
| Mila::Dnn::Compute | |
| Mila::Dnn::Compute::Cuda | |
Functions
| Return type | Function | Description |
|---|---|---|
| void | `Mila::Dnn::Compute::Cuda::cuda_split3_bf16 (const __nv_bfloat16 *__restrict__ src, __nv_bfloat16 *__restrict__ out0, __nv_bfloat16 *__restrict__ out1, __nv_bfloat16 *__restrict__ out2, int rows, int D0, int D1, int D2, cudaStream_t stream)` | |
| void | `Mila::Dnn::Compute::Cuda::cuda_split3_fp32 (const float *__restrict__ src, float *__restrict__ out_a, float *__restrict__ out_b, float *__restrict__ out_c, int src_rows, int dim_a, int dim_b, int dim_c, cudaStream_t stream)` | Vectorized 3-way last-dimension split, float32. |
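The declarations above only state that `cuda_split3_fp32` performs a "3-way last-dimension split". The following is a minimal host-side C++ sketch of what such a split plausibly computes, assuming `src` is a contiguous row-major buffer of shape `[src_rows, dim_a + dim_b + dim_c]` and each output is row-major `[src_rows, dim_x]`. The function `split3_reference_fp32` is hypothetical and not part of Mila: it is a scalar CPU reference (no vectorization, no CUDA stream), useful mainly as a ground truth to compare a launcher's output against.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar CPU reference (not part of Mila) for the split that
// cuda_split3_fp32 appears to perform. Assumed layout: src is row-major
// [src_rows, dim_a + dim_b + dim_c]; out_a, out_b, out_c are row-major
// [src_rows, dim_a], [src_rows, dim_b], [src_rows, dim_c].
void split3_reference_fp32(const std::vector<float>& src,
                           std::vector<float>& out_a,
                           std::vector<float>& out_b,
                           std::vector<float>& out_c,
                           int src_rows, int dim_a, int dim_b, int dim_c)
{
    // Debug-only sanity checks on the extents and the source buffer size.
    assert(src_rows >= 0 && dim_a >= 0 && dim_b >= 0 && dim_c >= 0);
    const std::size_t rows = static_cast<std::size_t>(src_rows);
    const std::size_t a = static_cast<std::size_t>(dim_a);
    const std::size_t b = static_cast<std::size_t>(dim_b);
    const std::size_t c = static_cast<std::size_t>(dim_c);
    const std::size_t width = a + b + c;  // length of one source row
    assert(src.size() == rows * width);

    out_a.resize(rows * a);
    out_b.resize(rows * b);
    out_c.resize(rows * c);

    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = src.data() + r * width;
        // Columns [0, a) -> out_a, [a, a+b) -> out_b, [a+b, width) -> out_c.
        std::copy(row, row + a, out_a.data() + r * a);
        std::copy(row + a, row + a + b, out_b.data() + r * b);
        std::copy(row + a + b, row + width, out_c.data() + r * c);
    }
}
```

With `src_rows = 2` and `(dim_a, dim_b, dim_c) = (1, 2, 3)`, each 6-wide source row is cut into contiguous pieces of width 1, 2, and 3. Reading `cuda_split3_bf16` the same way (`rows`, `D0`, `D1`, `D2` standing in for `src_rows`, `dim_a`, `dim_b`, `dim_c`) is likewise an assumption drawn from the parameter names; the declaration alone does not fix the layout.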