Mila 0.13.48
Deep Neural Network Library
CudaTensorOps.Fill.ixx File Reference

CUDA tensor fill operations partition.

#include <cuda_runtime.h>
#include <cstring>
#include <algorithm>
#include <memory>
#include <span>
#include <type_traits>
#include <stdexcept>
#include "Kernels/TensorOps.Fill.h"
import Compute.DeviceType;
import Compute.CudaDevice;
import Compute.CudaDeviceMemoryResource;
import Compute.CudaTensorDataType;
import Compute.ExecutionContext;
import Dnn.TensorDataTypeTraits;
import Cuda.Helpers;
import Dnn.TensorDataTypeMap;
import Dnn.TensorDataType;
import Dnn.ITensor;
import Dnn.Tensor;

Classes

struct  Mila::Dnn::Compute::Cuda::FillOps
 CUDA specialization of TensorOps for initialization operations.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn
namespace  Mila::Dnn::Compute
namespace  Mila::Dnn::Compute::Cuda

Detailed Description

CUDA tensor fill operations partition.

Implements CUDA-specific tensor fill operations using device kernels for efficient parallel initialization of tensor data. Supports both scalar broadcast fills and element-wise array copies with automatic type conversion and quantization.

Implementation strategy:

  • Scalar fills use optimized constant kernels (no temporary device memory)
  • Array fills use chunked staging for memory efficiency on large tensors
  • Stream-based asynchronous execution for pipeline optimization
  • Automatic host-to-device type conversion via CUDA kernels
  • Pure compile-time type dispatch avoids runtime type-dispatch overhead