| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> |
| static void | zero (Dnn::Tensor< TDataType, TMemoryResource > &tensor, IExecutionContext *exec_context=nullptr) |
| | Zero the contents of a CUDA tensor buffer. |
◆ zero()
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::ZeroOps::zero (Dnn::Tensor< TDataType, TMemoryResource > & tensor, IExecutionContext * exec_context = nullptr)
inline static
Zero the contents of a CUDA tensor buffer.
- No-op for empty tensors.
- Uses cudaMemsetAsync for contiguous device buffers.
- If exec_context is provided and is a CUDA context, the fill is enqueued on that context's stream and the call does not block. If exec_context is null, the tensor's own device and the default stream are used, and the call is synchronous.
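The dispatch rules above can be sketched in plain C++. This is a host-only illustration, not the Mila implementation: `Stream`, `ExecContext`, `memset_async`, `synchronize`, `default_stream`, and `zero_buffer` are hypothetical stand-ins introduced here, and a non-null `stream` pointer is assumed to model "recognized as a CUDA context". In a real CUDA build the two helpers would presumably map to `cudaMemsetAsync(ptr, 0, bytes, stream)` and `cudaStreamSynchronize(stream)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are Mila or CUDA types.
struct Stream {           // plays the role of cudaStream_t
    int sync_count = 0;   // number of blocking waits issued on this stream
};

struct ExecContext {      // plays the role of a CUDA execution context
    Stream* stream;       // non-null models "recognized as a CUDA context"
};

// Stand-in for cudaMemsetAsync(ptr, 0, bytes, stream); here it fills host memory.
inline void memset_async(void* ptr, std::size_t bytes, Stream&) {
    std::memset(ptr, 0, bytes);
}

// Stand-in for cudaStreamSynchronize(stream); here it only records the wait.
inline void synchronize(Stream& s) { ++s.sync_count; }

static Stream default_stream;  // stand-in for the device's default stream

// Returns true when the call blocked until the fill completed.
inline bool zero_buffer(void* data, std::size_t bytes, ExecContext* ctx = nullptr) {
    if (bytes == 0) return false;   // empty tensor: no-op
    assert(data != nullptr);        // a non-empty buffer must be allocated
    if (ctx != nullptr && ctx->stream != nullptr) {
        memset_async(data, bytes, *ctx->stream);  // enqueue only; caller waits later
        return false;
    }
    memset_async(data, bytes, default_stream);    // no context: default stream
    synchronize(default_stream);                  // block until the fill is done
    return true;
}

// Usage: zero a buffer through a caller-owned stream, then wait explicitly.
inline bool example() {
    std::vector<float> values(8, 1.5f);
    Stream stream;
    ExecContext ctx{&stream};
    zero_buffer(values.data(), values.size() * sizeof(float), &ctx);
    synchronize(stream);  // the caller decides when to wait
    for (float v : values) {
        if (v != 0.0f) return false;
    }
    return true;
}
```

A single byte-wise fill is enough here because an all-zero byte pattern represents zero for IEEE 754 floating-point and integer element types, which is also why one contiguous allocation is required.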
Preconditions:
- The tensor buffer must be a single contiguous allocation (which TensorBuffer provides by design).
Template Parameters
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type backing the tensor |
Parameters
| tensor | Destination CUDA tensor to zero |
| exec_context | Optional execution context (borrowed). When provided and recognized as a CUDA context, the zero operation is scheduled on that context's stream. |
The documentation for this struct was generated from the following file: