Mila 0.13.48
Deep Neural Network Library
CUDA specialization of TensorOps for initialization operations.
Public Types
| template<TensorDataType TDataType> using | host_value_t |
Static Public Member Functions
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> static void | fill (Tensor<TDataType, TMemoryResource> &tensor, host_value_t<TDataType> host_value, IExecutionContext *exec_context = nullptr) |
| | Fill tensor with scalar host value using CUDA kernels. |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> static void | fill (Tensor<TDataType, TMemoryResource> &tensor, std::span<const host_value_t<TDataType>> host_values, IExecutionContext *exec_context = nullptr) |
| | Fill tensor with array of host values using CUDA kernels. |
Provides CUDA-specific implementations of tensor fill operations using optimized device kernels for parallel execution on NVIDIA GPUs. Supports all CUDA-compatible tensor data types with automatic type conversion and quantization from host representations.
Key features:
using Mila::Dnn::Compute::Cuda::FillOps::host_value_t
inline static
Fill tensor with scalar host value using CUDA kernels.
Broadcasts a single host scalar value to all elements of a CUDA device tensor using optimized constant fill kernels. No temporary device memory is required because the conversion happens directly in the kernel. The execution context is borrowed for stream control, not owned, so passing one adds no overhead.
Implementation:
Template Parameters
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type |
Parameters
| tensor | Destination CUDA device tensor to fill |
| host_value | Scalar value in canonical host representation |
| exec_context | Optional execution context for stream control (borrowed, not owned) |
Example:
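The broadcast semantics described above can be sketched on the host. This is a CPU-only analogue, not Mila's implementation: `ElemType`, `ElemTraits`, `sketch_host_value_t`, `fill_scalar`, and `check_fill_scalar_sketch` are hypothetical names. Mapping an 8-bit integer element to a 32-bit host value with saturating conversion is an assumption chosen for illustration. Each loop iteration stands in for one kernel thread and converts the host scalar at the point of the write, so no staging buffer is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from Mila.
enum class ElemType { F32, I8 };

// Maps an abstract element type to its storage type and its host-side type.
template <ElemType E> struct ElemTraits;

template <> struct ElemTraits<ElemType::F32> {
    using storage_t = float;
    using host_t = float;
    static storage_t to_storage(host_t v) { return v; }
};

template <> struct ElemTraits<ElemType::I8> {
    using storage_t = std::int8_t;
    using host_t = std::int32_t;  // assumed: a wider host-side representation
    // Saturate rather than wrap when the host value is out of range.
    static storage_t to_storage(host_t v) {
        if (v > 127) return 127;
        if (v < -128) return -128;
        return static_cast<storage_t>(v);
    }
};

template <ElemType E>
using sketch_host_value_t = typename ElemTraits<E>::host_t;

// CPU analogue of a constant-fill kernel: the conversion is applied per
// element at write time, as each GPU thread would do for its own index.
template <ElemType E>
void fill_scalar(std::vector<typename ElemTraits<E>::storage_t>& buffer,
                 sketch_host_value_t<E> host_value) {
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        buffer[i] = ElemTraits<E>::to_storage(host_value);
    }
}

// Usage: the caller always passes the host-side type for the element type.
inline void check_fill_scalar_sketch() {
    std::vector<float> weights(4, 0.0f);
    fill_scalar<ElemType::F32>(weights, 0.5f);
    assert(weights[3] == 0.5f);

    std::vector<std::int8_t> quantized(4, 0);
    fill_scalar<ElemType::I8>(quantized, 300);  // out of range, saturates
    assert(quantized[0] == 127);
}
```

The point of the alias is that the value type in the call signature is derived from the element type, so the caller never handles the storage format directly.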

inline static
Fill tensor with array of host values using CUDA kernels.
Copies contiguous host values into a CUDA device tensor, performing automatic type conversion and quantization as needed. The execution context is borrowed for stream control, not owned, so passing one adds no overhead.
Implementation:
Template Parameters
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type |
Parameters
| tensor | Destination CUDA device tensor to fill |
| host_values | Span of host values in canonical host representation |
| exec_context | Optional execution context for stream control (borrowed, not owned) |
Example:
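The element-wise copy with conversion described above can be sketched on the host. This is a CPU-only analogue, not Mila's implementation: `narrow_to_i8`, `fill_from_host`, and `check_fill_from_host_sketch` are hypothetical names, and a pointer plus length stands in for `std::span` so the sketch compiles as C++11. Two behaviours here are assumptions, since the source does not specify them: saturating out-of-range values, and throwing when the element count does not match the destination size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: saturating conversion from a 32-bit host value
// to an 8-bit storage value (assumed policy, for illustration only).
inline std::int8_t narrow_to_i8(std::int32_t v) {
    if (v > 127) return 127;
    if (v < -128) return -128;
    return static_cast<std::int8_t>(v);
}

// CPU analogue of an array fill: element i of the destination receives the
// converted element i of the contiguous host range.
inline void fill_from_host(std::vector<std::int8_t>& dst,
                           const std::int32_t* host_values,
                           std::size_t count) {
    assert(host_values != nullptr || count == 0);
    // Assumed policy: reject a size mismatch instead of filling partially.
    if (count != dst.size()) {
        throw std::invalid_argument("host value count must match tensor size");
    }
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = narrow_to_i8(host_values[i]);
    }
}

// Usage: in-range values are copied, out-of-range values saturate.
inline void check_fill_from_host_sketch() {
    const std::int32_t host[4] = {-500, -1, 42, 500};
    std::vector<std::int8_t> dst(4, 0);
    fill_from_host(dst, host, 4);
    assert(dst[0] == -128);
    assert(dst[1] == -1);
    assert(dst[2] == 42);
    assert(dst[3] == 127);
}
```

Unlike the scalar overload, the source values here differ per element, so a device implementation has to make the host range visible to the GPU before or during conversion. How Mila stages that transfer is not stated in this page.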
