|
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource> && TensorDataTypeTraits<TDataType>::is_float_type |
| static void | fill_normal (Tensor< TDataType, TMemoryResource > &tensor, float mean, float stddev, IExecutionContext *exec_context=nullptr) |
| | Fill a float tensor with values drawn from N(mean, stddev^2) using cuRAND.
|
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource> && TensorDataTypeTraits<TDataType>::is_float_type |
| static void | fill_uniform (Tensor< TDataType, TMemoryResource > &tensor, float min_val, float max_val, IExecutionContext *exec_context=nullptr) |
| | Fill a float tensor with uniform values in [min_val, max_val) using cuRAND.
|
◆ fill_normal()
| void Mila::Dnn::Compute::Cuda::RandomOps::fill_normal ( Tensor< TDataType, TMemoryResource > & tensor, float mean, float stddev, IExecutionContext * exec_context = nullptr ) | inline static |
Fill a float tensor with values drawn from N(mean, stddev^2) using cuRAND.
When exec_context is provided, the cached generator from CudaExecutionContext is reused. Without a context, a temporary generator is created and destroyed per call, seeded from Core::RandomGenerator.
curandGenerateNormal requires an even element count, because pseudorandom generators produce normal values in Box-Muller pairs. If the tensor has an odd element count n, a temporary device buffer of n+1 elements is allocated, n+1 values are generated into it, and n elements are copied to the tensor. Even-sized tensors are generated in place, with no extra allocation or copy.
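The odd-count handling described above can be illustrated with a host-only sketch. This is not the Mila implementation: `padded_even_count`, `generate_normal_even`, and `fill_normal_host` are hypothetical names, `std::vector` stands in for device memory, and `std::normal_distribution` stands in for cuRAND so the example compiles without the CUDA toolkit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: round n up to the next even count.
inline std::size_t padded_even_count(std::size_t n) { return n + (n % 2); }

// Stand-in for a generator call that only accepts even counts,
// mirroring the constraint described for curandGenerateNormal.
inline void generate_normal_even(float* out, std::size_t count,
                                 float mean, float stddev, std::mt19937& rng) {
    assert(count % 2 == 0 && "count must be even");
    std::normal_distribution<float> dist(mean, stddev);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = dist(rng);
    }
}

// Fill dst with N(mean, stddev^2) samples, padding through a scratch
// buffer only when the element count is odd.
inline void fill_normal_host(std::vector<float>& dst, float mean, float stddev,
                             std::mt19937& rng) {
    const std::size_t n = dst.size();
    if (n == 0) {
        return;
    }
    if (n % 2 == 0) {
        // Even count: generate directly into the destination.
        generate_normal_even(dst.data(), n, mean, stddev, rng);
        return;
    }
    // Odd count: generate n + 1 values into scratch, then copy n of them.
    std::vector<float> scratch(padded_even_count(n));
    generate_normal_even(scratch.data(), scratch.size(), mean, stddev, rng);
    std::copy_n(scratch.begin(), n, dst.begin());
}
```

The even branch never touches the scratch buffer, which is the property the description refers to when it says even-sized tensors incur no overhead.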
- Template Parameters
-
| TDataType | Floating-point tensor data type. |
| TMemoryResource | CUDA memory resource type. |
- Parameters
-
| tensor | Destination tensor (pre-allocated). |
| mean | Mean of the normal distribution. |
| stddev | Standard deviation of the normal distribution. |
| exec_context | Optional execution context for stream and generator reuse (borrowed, not owned). |
- Note
- Only FP32 native tensors are currently supported. FP16/BF16 require a temporary float buffer with a subsequent conversion pass (not yet implemented).
- Exceptions
-
| std::runtime_error | On cuRAND or CUDA failure. |
◆ fill_uniform()
| void Mila::Dnn::Compute::Cuda::RandomOps::fill_uniform ( Tensor< TDataType, TMemoryResource > & tensor, float min_val, float max_val, IExecutionContext * exec_context = nullptr ) | inline static |
Fill a float tensor with uniform values in [min_val, max_val) using cuRAND.
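cuRAND generates values in the unit interval, which a device kernel scales and shifts to [min_val, max_val). The cuRAND host API documents curandGenerateUniform as returning values in (0, 1] rather than [0, 1), so the exact endpoint behavior depends on how the kernel maps the unit interval. When exec_context is provided, the cached generator is reused. Without a context, a temporary generator is created and destroyed per call.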
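The scale-and-shift step is an affine map applied to each element. The sketch below shows that arithmetic on the host, under the assumption that the kernel computes `min_val + u * (max_val - min_val)`; the actual device kernel is not shown in this documentation. `scale_shift` and `scale_shift_all` are hypothetical names, and the loop stands in for a one-thread-per-element kernel.

```cpp
#include <cassert>
#include <vector>

// Map a unit-interval sample u onto the range spanned by min_val and max_val.
// u == 0 maps to min_val and u == 1 maps to max_val.
inline float scale_shift(float u, float min_val, float max_val) {
    return min_val + u * (max_val - min_val);
}

// Host stand-in for the device kernel: transform every element in place.
inline void scale_shift_all(std::vector<float>& values, float min_val, float max_val) {
    assert(min_val <= max_val && "range must not be inverted");
    for (float& v : values) {
        v = scale_shift(v, min_val, max_val);
    }
}
```

Because the map sends 0 to min_val and 1 to max_val, which bound is reachable in the output follows directly from which end of the unit interval the generator can produce.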
- Template Parameters
-
| TDataType | Floating-point tensor data type. |
| TMemoryResource | CUDA memory resource type. |
- Parameters
-
| tensor | Destination tensor (pre-allocated). |
| min_val | Lower bound of the uniform range (inclusive). |
| max_val | Upper bound of the uniform range (exclusive). |
| exec_context | Optional execution context for stream and generator reuse (borrowed, not owned). |
- Note
- Only FP32 native tensors are currently supported.
- Exceptions
-
| std::runtime_error | On cuRAND or CUDA failure. |
◆ make_temp_generator_()
| curandGenerator_t Mila::Dnn::Compute::Cuda::RandomOps::make_temp_generator_ ( cudaStream_t stream ) | inline static private |
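make_temp_generator_ carries no description in this listing. Going only by the notes for fill_normal and fill_uniform (a cached generator is reused when a context is given, otherwise a temporary one is created and destroyed per call), the ownership pattern might be sketched as follows. Everything in the block is hypothetical: `FakeGenerator`, `FakeContext`, `fake_create`, `fake_destroy`, and `fill_with` are stand-ins so the example compiles without the CUDA toolkit, and none of them are Mila or cuRAND APIs. In real code the create and destroy steps would presumably correspond to curandCreateGenerator and curandDestroyGenerator.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for curandGenerator_t.
struct FakeGenerator {
    std::uint64_t seed;
};

// Number of generators created but not yet destroyed (for illustration only).
inline int& live_generators() {
    static int count = 0;
    return count;
}

inline FakeGenerator* fake_create(std::uint64_t seed) {
    ++live_generators();
    return new FakeGenerator{seed};
}

inline void fake_destroy(FakeGenerator* gen) {
    --live_generators();
    delete gen;
}

// RAII handle so a temporary generator is destroyed on every exit path.
struct GeneratorDeleter {
    void operator()(FakeGenerator* gen) const { fake_destroy(gen); }
};
using GeneratorHandle = std::unique_ptr<FakeGenerator, GeneratorDeleter>;

// Hypothetical execution context that owns a cached generator.
struct FakeContext {
    GeneratorHandle cached{fake_create(1234)};
    FakeGenerator* generator() const { return cached.get(); }
};

// Borrow the context's generator when one is supplied; otherwise create a
// temporary generator that lives only for this call. Returns the seed of
// the generator that was used, in place of real sample generation.
inline std::uint64_t fill_with(FakeContext* ctx, std::uint64_t fallback_seed) {
    GeneratorHandle temp;  // stays empty when a context is provided
    FakeGenerator* gen = nullptr;
    if (ctx != nullptr) {
        gen = ctx->generator();
    } else {
        temp.reset(fake_create(fallback_seed));
        gen = temp.get();
    }
    assert(gen != nullptr);
    return gen->seed;
}
```

The context pointer is borrowed and never freed by `fill_with`, matching the "borrowed, not owned" wording used for exec_context.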
The documentation for this struct was generated from the following file: