Mila
Deep Neural Network Library
Mila::Dnn::Compute Namespace Reference

Namespaces

namespace  Detail
 Namespace for CUDA layer normalization implementation details.
 

Classes

class  AMPConfig
 
class  BinaryOperation
 Abstract class for binary operations in the neural network framework. More...
 
class  ComputeDevice
 Abstract interface for compute devices (CPU, CUDA, etc.). More...
 
class  ComputePrecision
 Controls automatic mixed precision behavior for neural network operations. More...
 
class  ComputeResource
 Abstract base class for compute resources. More...
 
class  CpuCrossEntropyOp
 CPU implementation of the cross entropy loss operation for neural networks. More...
 
class  CpuCrossEntropyOpRegistrar
 Class responsible for registering the CpuCrossEntropyOp operation. More...
 
class  CpuDevice
 Class representing a CPU compute device. More...
 
class  CpuEncoderOp
 CPU implementation of the encoder operation for neural networks. More...
 
class  CpuEncoderOpRegistrar
 Class responsible for registering the CpuEncoderOp operation. More...
 
class  CpuGeluOp
 
class  CpuGeluOpRegistrar
 Class responsible for registering the CpuGeluOp operation. More...
 
class  CpuLayerNormOp
 CPU implementation of the Layer Normalization operation for neural networks. More...
 
class  CpuLayerNormOpRegistrar
 Class responsible for registering the CpuLayerNormOp operation. More...
 
class  CpuLinearOp
 CPU implementation of the Fully Connected operation for neural networks. More...
 
class  CpuLinearOpRegistrar
 Class responsible for registering the CpuLinearOp operation. More...
 
class  CpuMemoryResource
 A memory resource for CPU memory allocation. More...
 
class  CpuMultiHeadAttentionOp
 CPU implementation of the Multi-Head Attention operation for neural networks. More...
 
class  CpuMultiHeadAttentionOpRegistrar
 Class responsible for registering the CpuMultiHeadAttention operation. More...
 
class  CpuResidualOp
 CPU implementation of the residual operation for neural networks. More...
 
class  CpuResidualOpRegistrar
 Class responsible for registering the CpuResidualOp operation. More...
 
class  CpuSoftmaxOp
 CPU implementation of the softmax operation for neural networks. More...
 
class  CpuSoftmaxOpRegistrar
 Class responsible for registering the CpuSoftmaxOp operation. More...
 
class  CublasLtError
 
class  CudaBadAlloc
 
class  CudaComputeResource
 
struct  CudaDataTypeMap
 Helper struct to map C++ types to CUDA data types for cuBLASLt. More...
 
struct  CudaDataTypeMap< __nv_bfloat16 >
 
struct  CudaDataTypeMap< float >
 
struct  CudaDataTypeMap< half >
 
class  CudaDevice
 Class representing a CUDA compute device. More...
 
class  CudaEncoderOp
 CUDA implementation of the Encoder operation for transformer models. More...
 
class  CudaEncoderOpRegistrar
 Class responsible for registering the CudaEncoderOp operation. More...
 
class  CudaError
 Exception class for CUDA runtime errors. More...
 
class  CudaGeluOp
 CUDA implementation of the GELU activation function for neural networks. More...
 
class  CudaGeluOpRegistrar
 Class responsible for registering the CudaGeluOp operation. More...
 
class  CudaLayerNormOp
 CUDA implementation of the Layer Normalization operation for neural networks. More...
 
class  CudaLayerNormOpRegistrar
 Class responsible for registering the CudaLayerNormOp operation. More...
 
class  CudaLinearOp
 CUDA implementation of the Fully Connected operation for neural networks. More...
 
class  CudaLinearOpRegistrar
 Class responsible for registering the CudaLinearOp operation. More...
 
class  CudaManagedMemoryResource
 A memory resource that uses CUDA managed memory. More...
 
class  CudaMatMulBiasGeluOp
 CUDA implementation of the fused MatMul-Bias-GELU operation. More...
 
class  CudaMatMulBiasGeluOpRegistrar
 Class responsible for registering the CudaMatMulBiasGeluOp operation. More...
 
class  CudaMemoryResource
 A memory resource that allocates memory on a CUDA device. More...
 
class  CudaMultiHeadAttentionOp
 CUDA implementation of the Multi-Head Attention operation for transformer models. More...
 
class  CudaMultiHeadAttentionOpRegistrar
 Class responsible for registering the CudaMultiHeadAttentionOp operation. More...
 
class  CudaPinnedMemoryResource
 A memory resource that allocates pinned (page-locked) memory using CUDA. More...
 
class  CudaResidualOp
 CUDA implementation of the residual operation for neural networks. More...
 
class  CudaResidualOpRegistrar
 Class responsible for registering the CudaResidualOp operation. More...
 
class  CudaSoftmaxOp
 CUDA implementation of the softmax operation for neural networks. More...
 
class  CudaSoftmaxOpRegistrar
 Class responsible for registering the CudaSoftmaxOp operation. More...
 
struct  DeviceAccessible
 
class  DeviceContext
 The DeviceContext class manages device contexts for module and tensor computations. More...
 
class  DeviceProps
 
class  DeviceRegistrar
 Class to manage compute device initialization. More...
 
class  DeviceRegistry
 Registry for compute device creation and management. More...
 
class  DynamicMemoryResource
 A class that represents a dynamically-determined memory resource. More...
 
struct  FusedOpMeta
 Metadata for fused operations in the neural network. More...
 
class  FusedSoftmaxCrossEntropyOp
 CUDA implementation of the fused softmax and cross entropy operation for neural networks. More...
 
class  FusedSoftmaxCrossEntropyOpRegistrar
 Class responsible for registering the FusedSoftmaxCrossEntropyOp operation. More...
 
struct  HostAccessible
 
class  HostComputeResource
 
struct  MemoryStats
 Global memory statistics for all TrackedMemoryResource instances. More...
 
struct  OperationAttributes
 Common attributes for neural network operations. More...
 
class  OperationBase
 Base class for all compute operations in the Mila neural network framework. More...
 
class  OperationRegistry
 A registry for operations that can be created based on operation names, type information, and device type. More...
 
class  OperationsRegistrar
 Class to manage compute operations initialization. More...
 
class  TrackedMemoryResource
 A memory resource wrapper that tracks allocation and deallocation statistics. More...
 
class  UnaryOperation
 Abstract base class for unary operations in the compute framework. More...
 

Concepts

concept  IsCpuComputeResource
 
concept  IsCudaComputeResource
 

Typedefs

using Mila::Dnn::Compute::DeviceMemoryResource = CudaMemoryResource
 Alias for CudaMemoryResource that represents device-accessible memory.
 
using Mila::Dnn::Compute::HostMemoryResource = CpuMemoryResource
 Alias for CpuMemoryResource that represents host-accessible memory.
 
using Mila::Dnn::Compute::MemoryResource = std::pmr::memory_resource
 An alias for the standard polymorphic memory resource.
 

Enumerations

enum class  Mila::Dnn::Compute::DeviceType { Cpu , Cuda }
 Enumeration of supported compute device types. More...
 
enum class  Mila::Dnn::Compute::OperationType {
  CrossEntropyOp , EncoderOp , FusedOp , LinearOp ,
  GeluOp , LayerNormOp , MultiHeadAttentionOp , ResidualOp ,
  SoftmaxOp
}
 Enumeration of all supported neural network operation types. More...
 

Functions

constexpr int Mila::Dnn::Compute::ceil_div (int M, int N)
 Calculates ceiling division for kernel grid/block dimensions.
 
int Mila::Dnn::Compute::checkDevice (int deviceId)
 Validates that a device ID is valid and available.
 
template<DeviceType TDeviceType>
std::shared_ptr< DeviceContext > Mila::Dnn::Compute::CreateCompatibleContext ()
 Creates a device context compatible with the specified device type.
 
template<typename TDataType , typename TCompute = float>
requires std::is_same_v<TDataType, float> || std::is_same_v<TDataType, half> || std::is_same_v<TDataType, __nv_bfloat16> || std::is_same_v<TDataType, __nv_fp8_e4m3>
void Mila::Dnn::Compute::cublaslt_matmul_forward (TDataType *Y, const TDataType *X, const TDataType *weight, const TDataType *bias, int outer_size, int C, int OC, cudaStream_t stream, cublasLtHandle_t cublasLtHandle)
 cuBLASLt implementation of matrix multiplication with bias addition
 
void Mila::Dnn::Compute::cublasLtCheckStatus (cublasStatus_t status, const std::source_location &location=std::source_location::current())
 Checks the status of a cuBLASLt operation and throws if an error occurred.
 
void cuda_encoder_forward_fp16 (half *Y, const int *X, const half *wte, const half *wpe, int B, int T, int C, cudaStream_t stream)
 
void cuda_encoder_forward_fp32 (float *Y, const int *X, const float *wte, const float *wpe, int B, int T, int C, cudaStream_t stream)
 
void cuda_gelu_backward_fp16 (half *dX, const half *X, const half *dY, const int N, cudaStream_t stream)
 
void cuda_gelu_backward_fp32 (float *dX, const float *X, const float *dY, const int N, cudaStream_t stream)
 
void cuda_gelu_forward_fp16 (half *Y, const half *X, int N, cudaStream_t stream)
 
void cuda_gelu_forward_fp32 (float *Y, const float *X, int N, cudaStream_t stream)
 
void cuda_layernorm_forward_fp16 (half *Y, half *mean, half *rstd, const half *X, const half *weight, const half *bias, int B, int T, int C, float epsilon, cudaStream_t stream)
 
void cuda_layernorm_forward_fp32 (float *Y, float *mean, float *rstd, const float *X, const float *weight, const float *bias, int B, int T, int C, float epsilon, cudaStream_t stream)
 
void cuda_matmul_forward_fp16 (half *Y, const half *X, const half *weight, const half *bias, int outer_size, int C, int OC, cudaStream_t stream)
 
void cuda_matmul_forward_fp32 (float *Y, const float *X, const float *weight, const float *bias, int outer_size, int C, int OC, cudaStream_t stream)
 
void cuda_mha_forward_fp16 (half *Y, half *qkvr, half *att, const half *X, int B, int T, int C, int NH, cudaStream_t stream)
 
void cuda_mha_forward_fp32 (float *Y, float *qkvr, float *att, const float *X, int B, int T, int C, int NH, cudaStream_t stream)
 
void cuda_residual_forward_fp16 (half *Y, const half *X1, const half *X2, int N, cudaStream_t stream)
 
void cuda_residual_forward_fp32 (float *Y, const float *X1, const float *X2, int N, cudaStream_t stream)
 
template<typename TPrecision >
void cuda_softmax_crossentropy_backward (TPrecision *dlogits, const TPrecision *dlosses, const TPrecision *probs, const int *targets, int batch_size, int seq_len, int vocab_size, cudaStream_t stream)
 
template<typename TPrecision >
void cuda_softmax_crossentropy_forward (TPrecision *losses, TPrecision *probs, const TPrecision *logits, const int *targets, int batch_size, int seq_len, int vocab_size, cudaStream_t stream)
 
template<typename TPrecision >
void cuda_softmax_forward (TPrecision *Y, const TPrecision *X, int N, int C, cudaStream_t stream)
 
template<typename TPrecision >
void cuda_softmax_forward_general (TPrecision *Y, const TPrecision *X, int outer_size, int dim_size, int inner_size, cudaStream_t stream)
 
void Mila::Dnn::Compute::cudaCheckLastError (const std::source_location &location=std::source_location::current())
 Checks the last CUDA error and throws if an error occurred.
 
void Mila::Dnn::Compute::cudaCheckStatus (cudaError_t status, const std::source_location &location=std::source_location::current())
 Checks the status of a CUDA operation and throws if an error occurred.
 
std::string Mila::Dnn::Compute::deviceToString (DeviceType device_type)
 Converts a DeviceType to its string representation.
 
int Mila::Dnn::Compute::findCudaDevice (int deviceId=-1, bool preferMemory=false)
 Finds the most appropriate CUDA device for computation.
 
std::string Mila::Dnn::Compute::getBestDevice (DeviceType type, bool preferMemory=false)
 Gets the best device of a specific type based on performance characteristics.
 
int Mila::Dnn::Compute::getBestDeviceId (bool preferMemory=false)
 Identifies the best CUDA device based on performance characteristics.
 
int Mila::Dnn::Compute::getDeviceCount ()
 Gets the number of available CUDA devices.
 
int Mila::Dnn::Compute::getDriverVersion ()
 Gets the installed CUDA driver version.
 
int Mila::Dnn::Compute::getRuntimeVersion ()
 Gets the installed CUDA runtime version.
 
bool Mila::Dnn::Compute::isDeviceAvailable (const std::string &device_name)
 Checks if a specific device is available.
 
std::vector< std::string > Mila::Dnn::Compute::listDevices ()
 Lists all available compute devices.
 
std::vector< std::string > Mila::Dnn::Compute::listDevicesByType (DeviceType type)
 Lists compute devices of a specific type.
 
std::string Mila::Dnn::Compute::operationTypeToString (OperationType op)
 Converts an operation type to its string representation.
 
DeviceType Mila::Dnn::Compute::toDeviceType (std::string device_type)
 Converts a string to the corresponding DeviceType.
 
template<DeviceType TDeviceType>
std::shared_ptr< DeviceContext > Mila::Dnn::Compute::ValidateContext (std::shared_ptr< DeviceContext > context)
 Validates that the provided context is compatible with the specified device type.
 

Variables

template<typename T >
constexpr bool always_false = false
 
const float GELU_SCALING_FACTOR = sqrtf( 2.0f / M_PI )
 

Typedef Documentation

◆ DeviceMemoryResource

Alias for CudaMemoryResource that represents device-accessible memory.

This alias provides a semantic name that describes the memory's accessibility characteristics rather than its implementation details. Use DeviceMemoryResource when you need memory that can be accessed by CUDA device code and operations.

This naming follows CUDA conventions where "device" refers to GPU memory, while maintaining consistency with the architecture's naming pattern.

See also
CudaMemoryResource

◆ HostMemoryResource

Alias for CpuMemoryResource that represents host-accessible memory.

This alias provides a semantic name that describes the memory's accessibility characteristics rather than its implementation details. Use HostMemoryResource when you need memory that can be directly accessed from host (CPU) code.

See also
CpuMemoryResource

◆ MemoryResource

using Mila::Dnn::Compute::MemoryResource = std::pmr::memory_resource
export

An alias for the standard polymorphic memory resource.

This provides a common abstraction for memory allocation and management across different compute devices and memory types. The memory_resource is the foundation for all memory allocations within the compute framework and can be extended for specific devices (CPU, CUDA, etc.).

See also
std::pmr::memory_resource
HostMemoryResource
CudaMemoryResource
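
As an illustration, a minimal sketch (assuming the Mila headers or module are imported) confirming how the three aliases documented above relate:

    #include <memory_resource>
    #include <type_traits>
    using namespace Mila::Dnn::Compute;

    // These mirror the typedefs documented on this page.
    static_assert( std::is_same_v<MemoryResource, std::pmr::memory_resource> );
    static_assert( std::is_same_v<HostMemoryResource, CpuMemoryResource> );
    static_assert( std::is_same_v<DeviceMemoryResource, CudaMemoryResource> );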

Enumeration Type Documentation

◆ DeviceType

enum class Mila::Dnn::Compute::DeviceType
export strong

Enumeration of supported compute device types.

Defines the types of compute devices that can be used for tensor operations and neural network computations.

Enumerator
Cpu 

CPU device type.

Cuda 

CUDA GPU device type.
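
As a usage sketch, the conversion helpers documented below round-trip this enum through its string form (toDeviceType matches case-insensitively):

    using namespace Mila::Dnn::Compute;

    DeviceType t = toDeviceType( "cuda" );  // case-insensitive parse
    std::string s = deviceToString( t );    // "CUDA"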

◆ OperationType

enum class Mila::Dnn::Compute::OperationType
export strong

Enumeration of all supported neural network operation types.

This enumeration defines the different types of operations that can be executed by the compute framework. Each operation type corresponds to a specific neural network function or layer.

Enumerator
CrossEntropyOp 

Cross entropy loss operation.

EncoderOp 

Encoder operation for transformer architecture.

FusedOp 

Fused operation combining multiple operations for performance optimization.

LinearOp 

Linear (fully connected/dense) layer operation.

GeluOp 

Gaussian Error Linear Unit activation function.

LayerNormOp 

Layer normalization operation.

MultiHeadAttentionOp 

Multi-head attention operation for transformers.

ResidualOp 

Residual connection operation.

SoftmaxOp 

Softmax activation function.

Function Documentation

◆ ceil_div()

constexpr int Mila::Dnn::Compute::ceil_div ( int  M,
int  N 
)
constexpr export

Calculates ceiling division for kernel grid/block dimensions.

Parameters
M: Dividend value
N: Divisor value
Returns
Ceiling of M/N as an integer
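
The conventional definition of this helper is round-up integer division; a sketch (the library's exact body may differ) with a typical grid-sizing use:

    constexpr int ceil_div( int M, int N ) { return ( M + N - 1 ) / N; }

    // e.g. one 256-thread block per 256 elements:
    int blocks = ceil_div( 1000, 256 );  // == 4, covering all 1000 elements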

◆ checkDevice()

int Mila::Dnn::Compute::checkDevice ( int  deviceId)
export

Validates that a device ID is valid and available.

Parameters
deviceId: CUDA device ID to check
Returns
The same device ID if valid
Exceptions
std::invalid_argument: If device ID is negative
std::runtime_error: If no CUDA devices are available
std::out_of_range: If device ID exceeds available device count
std::runtime_error: If device is in prohibited compute mode
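
A minimal usage sketch, catching the failure modes listed above:

    #include <exception>
    #include <iostream>

    try {
        int id = Mila::Dnn::Compute::checkDevice( 0 );  // returns 0 if valid
        // ... proceed with the validated device id ...
    } catch ( const std::exception& e ) {
        std::cerr << "device check failed: " << e.what() << '\n';
    }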

◆ CreateCompatibleContext()

template<DeviceType TDeviceType>
std::shared_ptr< DeviceContext > Mila::Dnn::Compute::CreateCompatibleContext ( )
export

Creates a device context compatible with the specified device type.

Template Parameters
TDeviceType: The device type to create a context for.
Returns
std::shared_ptr<DeviceContext> A new context of the appropriate type.
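
For example (a sketch):

    using namespace Mila::Dnn::Compute;

    auto cpu_ctx  = CreateCompatibleContext<DeviceType::Cpu>();
    auto cuda_ctx = CreateCompatibleContext<DeviceType::Cuda>();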

◆ cublaslt_matmul_forward()

template<typename TDataType , typename TCompute = float>
requires std::is_same_v<TDataType, float> || std::is_same_v<TDataType, half> || std::is_same_v<TDataType, __nv_bfloat16> || std::is_same_v<TDataType, __nv_fp8_e4m3>
void Mila::Dnn::Compute::cublaslt_matmul_forward ( TDataType *  Y,
const TDataType *  X,
const TDataType *  weight,
const TDataType *  bias,
int  outer_size,
int  C,
int  OC,
cudaStream_t  stream,
cublasLtHandle_t  cublasLtHandle 
)
export

cuBLASLt implementation of matrix multiplication with bias addition

Template Parameters
TDataType: Data type for computation (float, half, __nv_bfloat16, or __nv_fp8_e4m3)
TCompute: Compute/accumulation type (defaults to float)
Parameters
Y: Output tensor data pointer
X: Input tensor data pointer
weight: Weight tensor data pointer
bias: Bias tensor data pointer (can be nullptr)
outer_size: Number of output rows (batch size × sequence length)
C: Input channels
OC: Output channels
stream: CUDA stream
cublasLtHandle: cuBLASLt library handle
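
A minimal call sketch, assuming Y, X, W, and bias are already-allocated, already-filled device buffers (outer_size rows of C inputs producing OC outputs):

    cublasLtHandle_t lt = nullptr;
    Mila::Dnn::Compute::cublasLtCheckStatus( cublasLtCreate( &lt ) );

    cudaStream_t stream = nullptr;
    Mila::Dnn::Compute::cudaCheckStatus( cudaStreamCreate( &stream ) );

    // Computes the affine map Y = X * weight (+ bias) over outer_size rows.
    Mila::Dnn::Compute::cublaslt_matmul_forward<float>(
        Y, X, W, bias, outer_size, C, OC, stream, lt );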

◆ cublasLtCheckStatus()

void Mila::Dnn::Compute::cublasLtCheckStatus ( cublasStatus_t  status,
const std::source_location &  location = std::source_location::current() 
)
inline export

Checks the status of a cuBLASLt operation and throws if an error occurred.

Parameters
status: The cuBLASLt error status code to check.
location: Source location information (automatically populated by default).
Exceptions
CublasLtError: if the status is not CUBLAS_STATUS_SUCCESS.

◆ cuda_encoder_forward_fp16()

void Mila::Dnn::Compute::cuda_encoder_forward_fp16 ( half *  Y,
const int *  X,
const half *  wte,
const half *  wpe,
int  B,
int  T,
int  C,
cudaStream_t  stream 
)

◆ cuda_encoder_forward_fp32()

void Mila::Dnn::Compute::cuda_encoder_forward_fp32 ( float *  Y,
const int *  X,
const float *  wte,
const float *  wpe,
int  B,
int  T,
int  C,
cudaStream_t  stream 
)

◆ cuda_gelu_backward_fp16()

void Mila::Dnn::Compute::cuda_gelu_backward_fp16 ( half *  dX,
const half *  X,
const half *  dY,
const int  N,
cudaStream_t  stream 
)

◆ cuda_gelu_backward_fp32()

void Mila::Dnn::Compute::cuda_gelu_backward_fp32 ( float *  dX,
const float *  X,
const float *  dY,
const int  N,
cudaStream_t  stream 
)

◆ cuda_gelu_forward_fp16()

void Mila::Dnn::Compute::cuda_gelu_forward_fp16 ( half *  Y,
const half *  X,
int  N,
cudaStream_t  stream 
)

◆ cuda_gelu_forward_fp32()

void Mila::Dnn::Compute::cuda_gelu_forward_fp32 ( float *  Y,
const float *  X,
int  N,
cudaStream_t  stream 
)
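
Usage sketch (N elementwise GELU values, asynchronous on the given stream):

    using namespace Mila::Dnn::Compute;

    float *d_in = nullptr, *d_out = nullptr;
    cudaCheckStatus( cudaMalloc( &d_in,  N * sizeof( float ) ) );
    cudaCheckStatus( cudaMalloc( &d_out, N * sizeof( float ) ) );
    // ... fill d_in ...
    cuda_gelu_forward_fp32( d_out, d_in, N, stream );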

◆ cuda_layernorm_forward_fp16()

void Mila::Dnn::Compute::cuda_layernorm_forward_fp16 ( half *  Y,
half *  mean,
half *  rstd,
const half *  X,
const half *  weight,
const half *  bias,
int  B,
int  T,
int  C,
float  epsilon,
cudaStream_t  stream 
)

◆ cuda_layernorm_forward_fp32()

void Mila::Dnn::Compute::cuda_layernorm_forward_fp32 ( float *  Y,
float *  mean,
float *  rstd,
const float *  X,
const float *  weight,
const float *  bias,
int  B,
int  T,
int  C,
float  epsilon,
cudaStream_t  stream 
)
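
A call sketch: X is laid out as [B, T, C] and each of the B*T rows of length C is normalized; mean and rstd receive one value per row. The epsilon of 1e-5f below is a common choice, not a library default:

    Mila::Dnn::Compute::cuda_layernorm_forward_fp32(
        d_Y, d_mean, d_rstd, d_X, d_weight, d_bias,
        B, T, C, 1e-5f, stream );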

◆ cuda_matmul_forward_fp16()

void Mila::Dnn::Compute::cuda_matmul_forward_fp16 ( half *  Y,
const half *  X,
const half *  weight,
const half *  bias,
int  outer_size,
int  C,
int  OC,
cudaStream_t  stream 
)

◆ cuda_matmul_forward_fp32()

void Mila::Dnn::Compute::cuda_matmul_forward_fp32 ( float *  Y,
const float *  X,
const float *  weight,
const float *  bias,
int  outer_size,
int  C,
int  OC,
cudaStream_t  stream 
)

◆ cuda_mha_forward_fp16()

void Mila::Dnn::Compute::cuda_mha_forward_fp16 ( half *  Y,
half *  qkvr,
half *  att,
const half *  X,
int  B,
int  T,
int  C,
int  NH,
cudaStream_t  stream 
)

◆ cuda_mha_forward_fp32()

void Mila::Dnn::Compute::cuda_mha_forward_fp32 ( float *  Y,
float *  qkvr,
float *  att,
const float *  X,
int  B,
int  T,
int  C,
int  NH,
cudaStream_t  stream 
)

◆ cuda_residual_forward_fp16()

void Mila::Dnn::Compute::cuda_residual_forward_fp16 ( half *  Y,
const half *  X1,
const half *  X2,
int  N,
cudaStream_t  stream 
)

◆ cuda_residual_forward_fp32()

void Mila::Dnn::Compute::cuda_residual_forward_fp32 ( float *  Y,
const float *  X1,
const float *  X2,
int  N,
cudaStream_t  stream 
)

◆ cuda_softmax_crossentropy_backward()

template<typename TPrecision >
void Mila::Dnn::Compute::cuda_softmax_crossentropy_backward ( TPrecision *  dlogits,
const TPrecision *  dlosses,
const TPrecision *  probs,
const int *  targets,
int  batch_size,
int  seq_len,
int  vocab_size,
cudaStream_t  stream 
)

◆ cuda_softmax_crossentropy_forward()

template<typename TPrecision >
void Mila::Dnn::Compute::cuda_softmax_crossentropy_forward ( TPrecision *  losses,
TPrecision *  probs,
const TPrecision *  logits,
const int *  targets,
int  batch_size,
int  seq_len,
int  vocab_size,
cudaStream_t  stream 
)

◆ cuda_softmax_forward()

template<typename TPrecision >
void Mila::Dnn::Compute::cuda_softmax_forward ( TPrecision *  Y,
const TPrecision *  X,
int  N,
int  C,
cudaStream_t  stream 
)

◆ cuda_softmax_forward_general()

template<typename TPrecision >
void Mila::Dnn::Compute::cuda_softmax_forward_general ( TPrecision *  Y,
const TPrecision *  X,
int  outer_size,
int  dim_size,
int  inner_size,
cudaStream_t  stream 
)
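
Assuming the usual outer/dim/inner convention, the input is viewed as [outer_size, dim_size, inner_size] and softmax is applied along the middle axis. For logits shaped [B, T, V] with softmax over the vocabulary (a sketch):

    // Softmax axis is last, so inner_size = 1.
    Mila::Dnn::Compute::cuda_softmax_forward_general<float>(
        d_Y, d_X, B * T, V, 1, stream );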

◆ cudaCheckLastError()

void Mila::Dnn::Compute::cudaCheckLastError ( const std::source_location &  location = std::source_location::current())
inline export

Checks the last CUDA error and throws if an error occurred.

Parameters
location: Source location information (automatically populated by default).
Exceptions
CudaError: if the last error is not cudaSuccess.

◆ cudaCheckStatus()

void Mila::Dnn::Compute::cudaCheckStatus ( cudaError_t  status,
const std::source_location &  location = std::source_location::current() 
)
inline export

Checks the status of a CUDA operation and throws if an error occurred.

Parameters
status: The CUDA error status code to check.
location: Source location information (automatically populated by default).
Exceptions
CudaError: if the status is not cudaSuccess.
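
Typical usage wraps each CUDA runtime call so failures surface as CudaError with source location attached (a sketch):

    float* d_buf = nullptr;
    Mila::Dnn::Compute::cudaCheckStatus( cudaMalloc( &d_buf, 1024 * sizeof( float ) ) );
    // ... launch kernels ...
    Mila::Dnn::Compute::cudaCheckLastError();  // surfaces asynchronous launch errors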

◆ deviceToString()

std::string Mila::Dnn::Compute::deviceToString ( DeviceType  device_type)
export

Converts a DeviceType to its string representation.

Parameters
device_type: The device type to convert.
Returns
std::string The string representation of the device type ("CPU" or "CUDA").
Exceptions
std::runtime_error: If the device type is invalid.

◆ findCudaDevice()

int Mila::Dnn::Compute::findCudaDevice ( int  deviceId = -1,
bool  preferMemory = false 
)
inline export

Finds the most appropriate CUDA device for computation.

Either validates a specific device ID if provided or finds the best available device when no preference is specified.

Parameters
deviceId: Preferred device ID, or -1 to select the best device
preferMemory: When true, prioritizes memory bandwidth over compute capability
Returns
Valid CUDA device ID
Exceptions
std::runtime_error: If no CUDA devices are found

◆ getBestDevice()

std::string Mila::Dnn::Compute::getBestDevice ( DeviceType  type,
bool  preferMemory = false 
)

Gets the best device of a specific type based on performance characteristics.

Parameters
type: The device type to filter by (e.g., Cuda)
preferMemory: When true, prioritizes memory bandwidth over compute capability
Returns
std::string Identifier of the best available device

◆ getBestDeviceId()

int Mila::Dnn::Compute::getBestDeviceId ( bool  preferMemory = false)
inline export

Identifies the best CUDA device based on performance characteristics.

Evaluates available CUDA devices and selects the one with highest performance potential. Selection criteria vary based on the intended workload type.

Parameters
preferMemory: When true, prioritizes memory bandwidth over compute
Returns
Device ID of the best available CUDA device
Exceptions
CudaError: If device properties cannot be accessed

◆ getDeviceCount()

int Mila::Dnn::Compute::getDeviceCount ( )
inline export

Gets the number of available CUDA devices.

Returns
Number of CUDA devices available to the application
Exceptions
CudaError: If device enumeration fails

◆ getDriverVersion()

int Mila::Dnn::Compute::getDriverVersion ( )
export

Gets the installed CUDA driver version.

Returns
Integer representation of the CUDA driver version
Exceptions
CudaError: If driver version cannot be determined

◆ getRuntimeVersion()

int Mila::Dnn::Compute::getRuntimeVersion ( )
export

Gets the installed CUDA runtime version.

Returns
Integer representation of the CUDA runtime version
Exceptions
CudaError: If runtime version cannot be determined
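
Together with getDeviceCount() and getDriverVersion(), this supports a quick environment report (a sketch):

    #include <iostream>
    using namespace Mila::Dnn::Compute;

    std::cout << "CUDA devices: " << getDeviceCount()
              << ", driver: "     << getDriverVersion()
              << ", runtime: "    << getRuntimeVersion() << '\n';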

◆ isDeviceAvailable()

bool Mila::Dnn::Compute::isDeviceAvailable ( const std::string &  device_name)
export

Checks if a specific device is available.

Parameters
device_name: The name of the device to check (e.g., "CPU", "CUDA:0").
Returns
bool True if the device is available, false otherwise.

◆ listDevices()

std::vector< std::string > Mila::Dnn::Compute::listDevices ( )
export

Lists all available compute devices.

This function returns a list of all available compute devices that can be used with DeviceContext.

Returns
std::vector<std::string> A list of device identifiers (e.g., "CPU", "CUDA:0", "CUDA:1").
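
A discovery sketch combining listDevices() with isDeviceAvailable():

    #include <iostream>
    using namespace Mila::Dnn::Compute;

    for ( const auto& name : listDevices() )  // e.g. "CPU", "CUDA:0"
        std::cout << name
                  << ( isDeviceAvailable( name ) ? "" : " (unavailable)" )
                  << '\n';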

◆ listDevicesByType()

std::vector< std::string > Mila::Dnn::Compute::listDevicesByType ( DeviceType  type)
export

Lists compute devices of a specific type.

Filters the available devices by their type, returning only devices that match the specified type. This allows clients to efficiently discover devices with specific capabilities.

Parameters
type: The device type to filter by
Returns
std::vector<std::string> List of matching device identifiers

◆ operationTypeToString()

std::string Mila::Dnn::Compute::operationTypeToString ( OperationType  op)
export

Converts an operation type to its string representation.

This utility function converts an OperationType enum value to a human-readable string representation, which can be used for logging, debugging, or serialization.

Parameters
op: The operation type to convert to string
Returns
std::string The string representation of the operation type
Exceptions
std::runtime_error: If the operation type is invalid or not recognized
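
For example (a sketch; the exact spelling of the returned string is up to the implementation):

    using namespace Mila::Dnn::Compute;

    std::string name = operationTypeToString( OperationType::GeluOp );  // e.g. "GeluOp"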

◆ toDeviceType()

DeviceType Mila::Dnn::Compute::toDeviceType ( std::string  device_type)
export

Converts a string to the corresponding DeviceType.

Performs case-insensitive matching to convert device type strings to the corresponding enum value.

Parameters
device_type: The string representation of the device type.
Returns
DeviceType The corresponding device type enum value.
Exceptions
std::runtime_error: If the string does not represent a valid device type. Valid options are: "CPU", "CUDA", "AUTO".
Note
The "AUTO" option is currently commented out in the implementation.

◆ ValidateContext()

template<DeviceType TDeviceType>
std::shared_ptr< DeviceContext > Mila::Dnn::Compute::ValidateContext ( std::shared_ptr< DeviceContext > context )
export

Validates that the provided context is compatible with the specified device type.

Template Parameters
TDeviceType: The device type to validate against.
Parameters
context: The context to validate.
Returns
std::shared_ptr<DeviceContext> The validated context.
Exceptions
std::invalid_argument: If the context is null.
std::runtime_error: If the context is incompatible with TDeviceType.
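
A sketch of the typical pattern, where an operation requiring a CUDA context validates whatever context it is handed:

    using namespace Mila::Dnn::Compute;

    std::shared_ptr<DeviceContext> ctx = CreateCompatibleContext<DeviceType::Cuda>();
    // Throws std::runtime_error if ctx were incompatible (e.g. a CPU context).
    auto validated = ValidateContext<DeviceType::Cuda>( ctx );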

Variable Documentation

◆ always_false

template<typename T >
constexpr bool Mila::Dnn::Compute::always_false = false
constexpr

◆ GELU_SCALING_FACTOR

const float Mila::Dnn::Compute::GELU_SCALING_FACTOR = sqrtf( 2.0f / M_PI )
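
This constant is sqrt(2/pi), the coefficient in the widely used tanh approximation of GELU, which is presumably what the GELU kernels implement (a reference formula, not necessarily the kernels' exact code):

    #include <cmath>

    float gelu_tanh_approx( float x ) {
        float cube = 0.044715f * x * x * x;
        return 0.5f * x * ( 1.0f + tanhf( GELU_SCALING_FACTOR * ( x + cube ) ) );
    }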