CUDA fast zeroing partition for tensor buffers.

#include <cuda_runtime.h>
#include <cstring>
#include <type_traits>
#include <stdexcept>
import Cuda.Helpers;
import Compute.DeviceId;
import Compute.DeviceType;
import Compute.ExecutionContext;
import Dnn.TensorDataTypeMap;
import Compute.IExecutionContext;
import Dnn.TensorDataType;
import Cuda.Error;
import Dnn.TensorDataTypeTraits;
import Dnn.Tensor;

Classes
struct	Mila::Dnn::Compute::Cuda::ZeroOps

Namespaces
namespace	Mila
	Mila main API namespace.
namespace	Mila::Dnn
namespace	Mila::Dnn::Compute
namespace	Mila::Dnn::Compute::Cuda

Detailed Description

CUDA fast zeroing partition for tensor buffers.

Provides a device-dispatched fast zero() operation that uses cudaMemsetAsync for contiguous CUDA buffers. The operation performs no allocation and accepts an optional execution context; when one is supplied, the zeroing is enqueued on the caller's stream and does not block the host.
