Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Compute::Cuda::FillOps Struct Reference [export]

CUDA specialization of TensorOps for initialization operations.

Public Types

template<TensorDataType TDataType>
using host_value_t

Static Public Member Functions

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static void fill (Tensor< TDataType, TMemoryResource > &tensor, host_value_t< TDataType > host_value, IExecutionContext *exec_context=nullptr)
 Fill tensor with scalar host value using CUDA kernels.
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static void fill (Tensor< TDataType, TMemoryResource > &tensor, std::span< const host_value_t< TDataType > > host_values, IExecutionContext *exec_context=nullptr)
 Fill tensor with array of host values using CUDA kernels.

Detailed Description

CUDA specialization of TensorOps for initialization operations.

Provides CUDA-specific implementations of tensor fill operations using optimized device kernels for parallel execution on NVIDIA GPUs. Supports all CUDA-compatible tensor data types with automatic type conversion and quantization from host representations.

Key features:

  • Asynchronous kernel execution using CUDA streams
  • Non-owning borrow of the IExecutionContext (raw pointer; no ownership transfer)
  • Automatic fallback to default stream when no context provided
  • Memory-efficient chunked processing for large arrays
  • Automatic host-to-device type conversion in kernels
  • Compile-time type dispatch, so no per-call runtime type switching
  • Support for FP32, FP16, BF16, FP8, and integer types
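The "automatic host-to-device type conversion" above means that a float host value must be narrowed to the tensor's storage format inside the kernel. As an illustration of what that narrowing can involve, the sketch below converts an FP32 value to BF16 bits with round-to-nearest-even. This is a generic, well-known technique shown under the assumption that BF16 is stored as the upper 16 bits of an IEEE-754 float; the function name float_to_bf16_rne is hypothetical and this is not Mila's kernel code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative FP32 -> BF16 conversion with round-to-nearest-even.
// Hypothetical helper; a generic technique, not Mila's kernel code.
inline std::uint16_t float_to_bf16_rne(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // reinterpret the float's bits

    // NaN: keep sign and upper payload, set a mantissa bit so truncation
    // cannot turn the NaN into an infinity.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits:
    // values above the halfway point round up, exact ties round to even.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}
```

For example, 1.0f maps to 0x3F80, and 3.14f (0x4048F5C3) rounds up to 0x4049 because its discarded low half exceeds the halfway point.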

Member Typedef Documentation

◆ host_value_t

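Initial value:
std::conditional_t< …

Canonical host representation for a given TensorDataType: int32_t for integer types and float for floating-point types. The initializer uses the compile-time traits for TensorDataType enumeration values (defined in TensorDataTypeTraits.ixx:46).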
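The alias selects the host type at compile time from the tensor's data type. A minimal sketch of how such an alias can be built, assuming a simplified enum and an integer-type trait; the enumerator names and is_integer_type_v are hypothetical stand-ins for Mila's real traits, and the sketch is written at namespace scope rather than as a member of FillOps.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for Mila's TensorDataType enum (names are assumptions).
enum class TensorDataType { FP32, FP16, BF16, FP8_E4M3, INT8, INT16, INT32, UINT8 };

// Hypothetical trait: true for the integer members of the enum above.
template <TensorDataType T>
constexpr bool is_integer_type_v =
    T == TensorDataType::INT8 || T == TensorDataType::INT16 ||
    T == TensorDataType::INT32 || T == TensorDataType::UINT8;

// Canonical host representation: int32_t for integer types, float otherwise,
// matching the conversion rules listed for fill().
template <TensorDataType T>
using host_value_t = std::conditional_t<is_integer_type_v<T>, std::int32_t, float>;
```

With this shape, host_value_t<TensorDataType::INT8> is std::int32_t and host_value_t<TensorDataType::BF16> is float, so a single pair of fill() overloads can accept host values for every device type.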

Member Function Documentation

◆ fill() [1/2]

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::FillOps::fill ( Tensor< TDataType, TMemoryResource > & tensor,
host_value_t< TDataType > host_value,
IExecutionContext * exec_context = nullptr )
inline static

Fill tensor with scalar host value using CUDA kernels.

Broadcasts a single host scalar value to all elements of a CUDA device tensor using constant-fill kernels. No temporary device memory is required; the conversion to the device type happens directly in the kernel. The execution context is borrowed (non-owning) for stream control.

Implementation:

  • Integer types: Use int32_t host representation with kernel conversion
  • Float types: Use float host representation with kernel conversion
  • Grid-stride loop kernels for scalability across tensor sizes
  • Asynchronous execution via provided or default CUDA stream
  • Compile-time type dispatch based on tensor data type
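A grid-stride loop lets a fixed-size grid cover a tensor of any length: each thread starts at its global index and advances by the total number of threads in the grid. The sketch below emulates that indexing serially on the CPU to show that every element is written exactly once. grid_stride_fill is a hypothetical name, and grid_dim/block_dim stand in for the launch configuration; in a real kernel all threads run concurrently and each converts the host value itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of a grid-stride constant-fill kernel (hypothetical sketch).
// Each (block, thread) pair plays the role of one CUDA thread.
template <typename DeviceT, typename HostT>
void grid_stride_fill(std::vector<DeviceT>& data, HostT host_value,
                      std::size_t grid_dim, std::size_t block_dim) {
    const std::size_t n = data.size();
    const std::size_t stride = grid_dim * block_dim;            // total "threads" in the grid
    const DeviceT converted = static_cast<DeviceT>(host_value);  // host -> device type

    for (std::size_t block = 0; block < grid_dim; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            // Start at the global index (blockIdx.x * blockDim.x + threadIdx.x)
            // and step by the grid size until the end of the tensor.
            for (std::size_t i = block * block_dim + thread; i < n; i += stride) {
                data[i] = converted;
            }
        }
    }
}
```

Because the loop bound is the element count rather than the grid size, the same launch configuration works for tensors smaller or much larger than the grid.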
Template Parameters
TDataType: Abstract tensor data type
TMemoryResource: Memory resource type
Parameters
tensor: Destination CUDA device tensor to fill
host_value: Scalar value in canonical host representation
exec_context: Optional execution context for stream control (borrowed, not owned)
Note
  • The host value is converted to the device-native type automatically.
  • Uses the CUDA stream from exec_context if one is provided, otherwise the default stream.
  • With the default stream, the call synchronizes before returning.
  • With an exec_context, the caller is responsible for synchronization.
  • Optimized for constant broadcasts: no temporary memory is allocated.
  • exec_context must outlive this function call.
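The synchronization contract above can be modeled with a simple work queue. In this hypothetical sketch, FakeStreamContext stands in for an execution context whose queued work only runs on synchronize(), emulating asynchronous execution; fill_sketch mirrors the documented behavior, where a borrowed context leaves synchronization to the caller and a missing context falls back to a default stream that is synchronized before returning. None of these names are Mila APIs.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an execution context: work is queued on a
// "stream" and only runs when synchronize() is called.
class FakeStreamContext {
public:
    void enqueue(std::function<void()> work) { pending_.push_back(std::move(work)); }
    void synchronize() {
        for (auto& work : pending_) work();
        pending_.clear();
    }
private:
    std::vector<std::function<void()>> pending_;
};

// Borrowed context: caller syncs. No context: default stream, synced here.
void fill_sketch(std::vector<float>& data, float value, FakeStreamContext* ctx = nullptr) {
    FakeStreamContext default_stream;  // local stand-in for the default stream
    FakeStreamContext* stream = ctx ? ctx : &default_stream;
    stream->enqueue([&data, value] { for (auto& x : data) x = value; });
    if (!ctx) {
        stream->synchronize();  // default-stream path: block until done
    }
}
```

Passing a context lets several fills be queued back to back with a single synchronization at the end; the borrowed pointer is never stored, which is why the context only needs to outlive the call and the caller's later synchronize().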

Example:

// With an explicit context (caller manages synchronization)
auto ctx = std::make_unique<CudaExecutionContext>(0);
FillOps::fill(tensor1, 0.0f, ctx.get());
FillOps::fill(tensor2, 1.0f, ctx.get());
ctx->synchronize();

// Without a context (synchronized before returning)
FillOps::fill(tensor, 3.14f); // Uses the default stream; returns after sync

◆ fill() [2/2]

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::FillOps::fill ( Tensor< TDataType, TMemoryResource > & tensor,
std::span< const host_value_t< TDataType > > host_values,
IExecutionContext * exec_context = nullptr )
inline static

Fill tensor with array of host values using CUDA kernels.

Copies contiguous host values into a CUDA device tensor, performing type conversion and quantization as needed. The execution context is borrowed (non-owning) for stream control.

Implementation:

  • Integer types: Use int32_t host representation with kernel conversion
  • Float types: Use float host representation with kernel conversion
  • Chunked processing limits temporary device memory usage
  • Asynchronous execution via provided or default CUDA stream
  • Compile-time type dispatch based on tensor data type
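Chunked processing bounds temporary device memory: host values are uploaded one fixed-size chunk at a time and converted by the kernel before the next chunk is staged. The sketch below models that loop on the CPU, with one std::vector standing in for the destination tensor and another for the temporary device buffer. chunked_fill and chunk_elems are hypothetical, and the behavior on a size mismatch (copying only the overlapping prefix) is an assumption, since this reference does not describe it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: copy host values into "device" storage through a
// staging buffer that never holds more than `chunk_elems` host values.
template <typename DeviceT, typename HostT>
std::size_t chunked_fill(std::vector<DeviceT>& device, const std::vector<HostT>& host,
                         std::size_t chunk_elems) {
    if (chunk_elems == 0) return 0;  // avoid an infinite loop
    // Assumption: only the overlapping prefix is copied.
    const std::size_t count = std::min(device.size(), host.size());
    std::vector<HostT> staging;
    staging.reserve(chunk_elems);
    std::size_t peak = 0;

    for (std::size_t offset = 0; offset < count; offset += chunk_elems) {
        const std::size_t n = std::min(chunk_elems, count - offset);
        // 1) Upload one chunk of host values into the staging buffer.
        staging.assign(host.begin() + static_cast<std::ptrdiff_t>(offset),
                       host.begin() + static_cast<std::ptrdiff_t>(offset + n));
        // 2) "Kernel": convert each staged value and write it to the destination.
        for (std::size_t i = 0; i < n; ++i) {
            device[offset + i] = static_cast<DeviceT>(staging[i]);
        }
        peak = std::max(peak, staging.size());
    }
    return peak;  // largest number of values staged at once
}
```

The returned peak never exceeds chunk_elems, which is the property that keeps temporary memory bounded regardless of how many host values are supplied.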
Template Parameters
TDataType: Abstract tensor data type
TMemoryResource: Memory resource type
Parameters
tensor: Destination CUDA device tensor to fill
host_values: Span of host values in canonical host representation
exec_context: Optional execution context for stream control (borrowed, not owned)
Note
  • Host values are converted to the device-native type automatically.
  • Uses the CUDA stream from exec_context if one is provided, otherwise the default stream.
  • With the default stream, the call synchronizes before returning.
  • With an exec_context, the caller is responsible for synchronization.
  • exec_context must outlive this function call.

Example:

// With an explicit context (caller manages synchronization)
// values: contiguous host data in canonical host representation (e.g. std::vector<float>)
auto ctx = std::make_unique<CudaExecutionContext>(0);
FillOps::fill(tensor, values, ctx.get());
// ... queue more operations on the same stream
ctx->synchronize();

// Without a context (synchronized before returning)
FillOps::fill(tensor, values); // Uses the default stream; returns after sync

The documentation for this struct was generated from the following file: