Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Compute::Cuda::MathOps Struct Reference [export]

CUDA specialization of TensorOps for mathematical operations.

Static Public Member Functions

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static void add (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr)
 Element-wise addition of two tensors.
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static void divide (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr)
 Element-wise division of two tensors.
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static void multiply (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr)
 Element-wise multiplication of two tensors.
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static void subtract (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr)
 Element-wise subtraction of two tensors.
template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
static float sum (const Tensor< TDataType, TMemoryResource > &tensor, IExecutionContext *exec_context=nullptr)
 Computes sum of all tensor elements.

Static Private Member Functions

template<TensorDataType TDataType>
static void addImpl (const void *a_data, const void *b_data, void *result_data, size_t count, cudaStream_t stream, int device_id)
template<TensorDataType TDataType>
static void divideImpl (const void *a_data, const void *b_data, void *result_data, size_t count, cudaStream_t stream, int device_id)
template<TensorDataType TDataType>
static void multiplyImpl (const void *a_data, const void *b_data, void *result_data, size_t count, cudaStream_t stream, int device_id)
template<TensorDataType TDataType>
static void subtractImpl (const void *a_data, const void *b_data, void *result_data, size_t count, cudaStream_t stream, int device_id)
template<TensorDataType TDataType>
static float sumImpl (const void *tensor_data, size_t count, cudaStream_t stream, int device_id)

Detailed Description

CUDA specialization of TensorOps for mathematical operations.

Provides CUDA-specific implementations of tensor mathematical operations using optimized device kernels for parallel execution on NVIDIA GPUs. Supports all CUDA-compatible tensor data types with automatic type handling.

Key features:

  • Element-wise binary operations (add, subtract, multiply, divide)
  • Element-wise unary operations (negate, abs, sqrt)
  • Scalar operations (add scalar, multiply scalar)
  • Activation functions (ReLU, Sigmoid, Tanh)
  • Reduction operations (sum, mean, max, min)
  • Stream-based asynchronous execution
  • Zero-overhead ExecutionContext borrowing (raw pointer)
  • Automatic fallback to default stream
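
The last three features describe a calling convention rather than a kernel: the caller lends a context by raw pointer, and a null pointer selects a default stream. The following is a minimal host-only sketch of that convention, not Mila's implementation. `FakeContext` and `launch` are hypothetical names introduced for illustration, and no CUDA calls are made.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an execution context; not Mila's IExecutionContext.
struct FakeContext {
    explicit FakeContext(int id) : stream_id(id) {}

    // Queue work without waiting for it (models an asynchronous launch).
    void enqueue(const std::string& op) { pending.push_back(op); }
    // Wait for all queued work (models a stream synchronization).
    void synchronize() { pending.clear(); }
    bool idle() const { return pending.empty(); }

    int stream_id;                     // stands in for a cudaStream_t
    std::vector<std::string> pending;  // work queued but not yet waited on
};

// Returns true when the work is known to be complete on return.
// The pointer is borrowed: nothing is copied, owned, or freed here.
inline bool launch(const std::string& op, FakeContext* exec_context = nullptr) {
    assert(!op.empty());
    if (exec_context != nullptr) {
        exec_context->enqueue(op);  // caller decides when to synchronize
        return false;
    }
    FakeContext default_stream(0);  // assumed fallback: a default stream
    default_stream.enqueue(op);
    default_stream.synchronize();   // no caller handle exists, so wait here
    return default_stream.idle();
}
```

Passing a raw pointer keeps the call free of reference counting, at the cost of requiring the caller to keep the context alive for the duration of the call.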

Member Function Documentation

◆ add()

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::MathOps::add ( const Tensor< TDataType, TMemoryResource > & a,
const Tensor< TDataType, TMemoryResource > & b,
Tensor< TDataType, TMemoryResource > & result,
IExecutionContext * exec_context = nullptr )
[inline, static]

Element-wise addition of two tensors.

Computes result[i] = a[i] + b[i] for all elements using CUDA kernels. Tensors must have identical shapes. Borrows execution context for stream control with zero overhead.

Template Parameters
  TDataType        Abstract tensor data type
  TMemoryResource  Memory resource type
Parameters
  a             First input tensor
  b             Second input tensor
  result        Output tensor (must be pre-allocated with matching shape)
  exec_context  Optional execution context for stream control (borrowed, not owned)
Exceptions
  std::invalid_argument  If tensor shapes don't match
  std::runtime_error     If CUDA operations fail
Note
  • exec_context must outlive this function call
  • When exec_context is provided, the caller controls synchronization
  • When exec_context is null, the default stream is used and the call synchronizes before returning

Example:

auto ctx = std::make_unique<CudaExecutionContext>(0);
add(tensor_a, tensor_b, result, ctx.get());
ctx->synchronize();
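
To make the shape contract concrete, here is a host-side reference sketch of the documented semantics (result[i] = a[i] + b[i], with std::invalid_argument on mismatched shapes). `HostTensor` and `add_reference` are hypothetical names, not part of Mila. Treating a mismatched result shape as an error is an assumption based on the "must be pre-allocated with matching shape" requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side tensor: a shape plus flat storage.
struct HostTensor {
    std::vector<std::size_t> shape;
    std::vector<float> data;
};

// Reference semantics only: a sequential loop instead of a CUDA kernel.
inline void add_reference(const HostTensor& a, const HostTensor& b, HostTensor& result) {
    // Assumption: the result shape is validated together with the inputs.
    if (a.shape != b.shape || a.shape != result.shape) {
        throw std::invalid_argument("add: tensor shapes don't match");
    }
    assert(a.data.size() == b.data.size() && a.data.size() == result.data.size());
    for (std::size_t i = 0; i < a.data.size(); ++i) {
        result.data[i] = a.data[i] + b.data[i];
    }
}
```

A reference like this can serve as a baseline when checking GPU output element by element.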

◆ addImpl()

template<TensorDataType TDataType>
void Mila::Dnn::Compute::Cuda::MathOps::addImpl ( const void * a_data,
const void * b_data,
void * result_data,
size_t count,
cudaStream_t stream,
int device_id )
[inline, static, private]
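
The private Impl functions take untyped pointers and a compile-time data type tag. One way such a signature can work is sketched below. This is an assumption about the pattern, not the library's code: `DataType`, `NativeType`, and `add_impl_sketch` are hypothetical, and the stream and device_id parameters are omitted because the sketch runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical data type tag; Mila's TensorDataType is not reproduced here.
enum class DataType { FP32, INT32 };

// Maps each tag to the native element type used for the pointer casts.
template <DataType T> struct NativeType;
template <> struct NativeType<DataType::FP32>  { using type = float; };
template <> struct NativeType<DataType::INT32> { using type = std::int32_t; };

// The tag chooses the element type at compile time, so the void pointers
// are cast once and the loop body is fully typed.
template <DataType T>
void add_impl_sketch(const void* a_data, const void* b_data,
                     void* result_data, std::size_t count) {
    using Native = typename NativeType<T>::type;
    assert(count == 0 || (a_data && b_data && result_data));
    const Native* a = static_cast<const Native*>(a_data);
    const Native* b = static_cast<const Native*>(b_data);
    Native* result = static_cast<Native*>(result_data);
    for (std::size_t i = 0; i < count; ++i) {
        result[i] = a[i] + b[i];
    }
}
```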

◆ divide()

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::MathOps::divide ( const Tensor< TDataType, TMemoryResource > & a,
const Tensor< TDataType, TMemoryResource > & b,
Tensor< TDataType, TMemoryResource > & result,
IExecutionContext * exec_context = nullptr )
[inline, static]

Element-wise division of two tensors.

Computes result[i] = a[i] / b[i] for all elements using CUDA kernels. Follows IEEE 754 standards for floating-point division by zero.

Template Parameters
  TDataType        Abstract tensor data type
  TMemoryResource  Memory resource type
Parameters
  a             First input tensor (dividend)
  b             Second input tensor (divisor)
  result        Output tensor (must be pre-allocated with matching shape)
  exec_context  Optional execution context for stream control (borrowed, not owned)
Exceptions
  std::invalid_argument  If tensor shapes don't match
  std::runtime_error     If CUDA operations fail
Note
  • For floating-point types, division by zero produces infinity or NaN per IEEE 754
  • For integer types, division by zero behavior depends on kernel implementation
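
The floating-point note can be illustrated on the host, since IEEE 754 defines the same results for any conforming float. The sketch below assumes single-precision inputs. `divide_reference` is a hypothetical helper, not a Mila function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// The results below rely on IEEE 754 (IEC 60559) float arithmetic.
static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float required");

// Element-wise a[i] / b[i] with no special handling of zero divisors.
inline std::vector<float> divide_reference(const std::vector<float>& a,
                                           const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // x / 0 gives +inf or -inf by the sign of x; 0 / 0 gives NaN.
        result[i] = a[i] / b[i];
    }
    return result;
}
```

Because NaN compares unequal to everything, including itself, results should be checked with std::isnan and std::isinf rather than with ==.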

◆ divideImpl()

template<TensorDataType TDataType>
void Mila::Dnn::Compute::Cuda::MathOps::divideImpl ( const void * a_data,
const void * b_data,
void * result_data,
size_t count,
cudaStream_t stream,
int device_id )
[inline, static, private]

◆ multiply()

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::MathOps::multiply ( const Tensor< TDataType, TMemoryResource > & a,
const Tensor< TDataType, TMemoryResource > & b,
Tensor< TDataType, TMemoryResource > & result,
IExecutionContext * exec_context = nullptr )
[inline, static]

Element-wise multiplication of two tensors.

Computes result[i] = a[i] * b[i] for all elements using CUDA kernels.

Template Parameters
  TDataType        Abstract tensor data type
  TMemoryResource  Memory resource type
Parameters
  a             First input tensor
  b             Second input tensor
  result        Output tensor (must be pre-allocated with matching shape)
  exec_context  Optional execution context for stream control (borrowed, not owned)

◆ multiplyImpl()

template<TensorDataType TDataType>
void Mila::Dnn::Compute::Cuda::MathOps::multiplyImpl ( const void * a_data,
const void * b_data,
void * result_data,
size_t count,
cudaStream_t stream,
int device_id )
[inline, static, private]

◆ subtract()

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
void Mila::Dnn::Compute::Cuda::MathOps::subtract ( const Tensor< TDataType, TMemoryResource > & a,
const Tensor< TDataType, TMemoryResource > & b,
Tensor< TDataType, TMemoryResource > & result,
IExecutionContext * exec_context = nullptr )
[inline, static]

Element-wise subtraction of two tensors.

Computes result[i] = a[i] - b[i] for all elements using CUDA kernels.

Template Parameters
  TDataType        Abstract tensor data type
  TMemoryResource  Memory resource type
Parameters
  a             First input tensor (minuend)
  b             Second input tensor (subtrahend)
  result        Output tensor (must be pre-allocated with matching shape)
  exec_context  Optional execution context for stream control (borrowed, not owned)

◆ subtractImpl()

template<TensorDataType TDataType>
void Mila::Dnn::Compute::Cuda::MathOps::subtractImpl ( const void * a_data,
const void * b_data,
void * result_data,
size_t count,
cudaStream_t stream,
int device_id )
[inline, static, private]

◆ sum()

template<TensorDataType TDataType, typename TMemoryResource>
requires isValidTensor<TDataType, TMemoryResource>
float Mila::Dnn::Compute::Cuda::MathOps::sum ( const Tensor< TDataType, TMemoryResource > & tensor,
IExecutionContext * exec_context = nullptr )
[inline, static]

Computes sum of all tensor elements.

Reduces tensor to a single scalar value representing the sum of all elements. Uses optimized CUDA reduction with shared memory and warp primitives.

Template Parameters
  TDataType        Abstract tensor data type
  TMemoryResource  Memory resource type
Parameters
  tensor        Input tensor
  exec_context  Optional execution context for stream control (borrowed, not owned)
Returns
  Sum of all elements as float
Note
  • Always synchronizes before returning, even when exec_context is provided, because the scalar result must be available on the host
  • Result is returned as float for consistency across data types
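
Shared-memory and warp-level reductions typically combine elements in a tree rather than left to right. The sketch below shows that order sequentially on the host, converting every element to float first so the return type matches the documented one. `sum_reference` is hypothetical and is not the kernel used by Mila. Returning 0 for an empty input is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise (tree) reduction: each pass folds the upper half of the active
// range onto the lower half, as a block-level GPU reduction would per step.
template <typename T>
float sum_reference(const std::vector<T>& values) {
    std::vector<float> partial(values.begin(), values.end());  // convert to float
    if (partial.empty()) {
        return 0.0f;  // assumption: an empty tensor sums to zero
    }
    for (std::size_t active = partial.size(); active > 1;) {
        const std::size_t half = (active + 1) / 2;  // round up for odd counts
        for (std::size_t i = 0; i + half < active; ++i) {
            partial[i] += partial[i + half];
        }
        active = half;
    }
    return partial[0];
}
```

Because float addition is not associative, a tree-ordered sum can differ in the last bits from a sequential loop over the same data, so comparisons against a CPU result generally need a tolerance.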

◆ sumImpl()

template<TensorDataType TDataType>
float Mila::Dnn::Compute::Cuda::MathOps::sumImpl ( const void * tensor_data,
size_t count,
cudaStream_t stream,
int device_id )
[inline, static, private]

The documentation for this struct was generated from the following file: