Mila 0.13.48
Deep Neural Network Library
Class representing a CUDA compute device instance.


Public Member Functions | |
| CudaDevice (DeviceConstructionKey key, DeviceId device_id) | |
| Construct CUDA device with validation. | |
| std::pair< int, int > | getComputeCapability () const |
| Gets the compute capability version. | |
| int | getComputeCapabilityVersion () const |
| Gets the compute capability as a single number. | |
| DeviceId | getDeviceId () const override |
| Gets the device identifier. | |
| std::string | getDeviceName () const override |
| Gets the device name. | |
| constexpr DeviceType | getDeviceType () const override |
| Gets the device type. | |
| int | getMaxThreadsPerBlock () const |
| Gets the maximum number of threads per block. | |
| int | getMultiprocessorCount () const |
| Gets the number of multiprocessors. | |
| const CudaDeviceProps & | getProperties () const |
| Gets the properties of this CUDA device. | |
| size_t | getSharedMemoryPerBlock () const |
| Gets the shared memory per block in bytes. | |
| size_t | getTotalGlobalMemory () const |
| Gets the total global memory size in bytes. | |
| int | getWarpSize () const |
| Gets the warp size. | |
| bool | hasTensorCores () const |
| Checks if the device has Tensor Cores. | |
| bool | isBf16Supported () const |
| Checks if the device supports BF16 (bfloat16 precision). | |
| bool | isFp16Supported () const |
| Checks if the device supports FP16 (half precision). | |
| bool | isFp8Supported () const |
| Checks if the device supports FP8 (8-bit float precision). | |
| bool | isInt8Supported () const |
| Checks if the device supports INT8 tensor cores. | |
| Public Member Functions inherited from Mila::Dnn::Compute::Device | |
| virtual | ~Device ()=default |
Static Private Member Functions | |
| static DeviceId | validateDeviceId (DeviceId device_id) |
| Validates CUDA device ID. | |
Private Attributes | |
| DeviceId | device_id_ |
| CudaDeviceProps | props_ |
Additional Inherited Members | |
| Static Public Member Functions inherited from Mila::Dnn::Compute::Device | |
| static constexpr DeviceId | Cpu () noexcept |
| Create CPU device identifier. | |
| static constexpr DeviceId | Cuda (int index) noexcept |
| Create CUDA device identifier. | |
| template<DeviceType TDeviceType> | |
| static constexpr DeviceId | getDeviceId (int index) noexcept |
| static constexpr DeviceId | Metal (int index) noexcept |
| Create Metal device identifier. | |
| static constexpr DeviceId | Rocm (int index) noexcept |
| Create ROCm device identifier. | |
Detailed Description
Class representing a CUDA compute device instance.
Provides an interface to interact with a specific NVIDIA CUDA-capable GPU. Handles device properties and capabilities for a single device instance.
Device instances are created exclusively by DeviceFactory (via DeviceRegistry). Users should obtain devices through DeviceRegistry::getDevice().
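Precision Support:
- FP16: Pascal and newer architectures (SM 6.0+)
- BF16: Ampere and newer architectures (SM 8.0+)
- FP8: Hopper and newer architectures (SM 9.0+)
- INT8 tensor cores: Turing and newer architectures (SM 7.5+)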
| CudaDevice (DeviceConstructionKey key, DeviceId device_id) | inline explicit |
Construct CUDA device with validation.
Validates that the device ID is registered with DeviceRegistry, then queries and caches the device properties from the CUDA runtime.
Parameters
| key | Construction key ensuring only DeviceRegistry can create instances |
| device_id | Device identifier to initialize |
Exceptions
| std::invalid_argument | If device_id validation fails |
| std::runtime_error | If device is not registered or CUDA operations fail |
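The construction key described above resembles the passkey idiom: the constructor is public, but only a befriended class can create the key it requires. The following is a minimal sketch of that idiom, not Mila's implementation. `ConstructionKey`, `GpuDevice`, and `Registry` are hypothetical stand-ins, and a plain `int` index replaces `DeviceId`.

```cpp
#include <memory>
#include <stdexcept>

class Registry;  // hypothetical stand-in for the class allowed to create devices

// Hypothetical key type: only Registry can construct one.
class ConstructionKey {
    friend class Registry;
    // User-provided (not "= default") so the type is never an aggregate
    // and cannot be brace-initialized from outside Registry.
    ConstructionKey() {}
};

// Hypothetical device type with a public constructor guarded by the key.
class GpuDevice {
public:
    explicit GpuDevice(ConstructionKey, int index) : index_(index) {
        if (index < 0) {
            throw std::invalid_argument("device index must be non-negative");
        }
    }
    int index() const { return index_; }

private:
    int index_;
};

class Registry {
public:
    // The only place where a ConstructionKey can be minted.
    static std::shared_ptr<GpuDevice> create(int index) {
        return std::make_shared<GpuDevice>(ConstructionKey{}, index);
    }
};
```

Because the constructor itself stays public, `std::make_shared` can call it, which is not possible with a private constructor plus a friend declaration.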

| std::pair< int, int > getComputeCapability () const | inline |
Gets the compute capability version.
| int getComputeCapabilityVersion () const | inline |
Gets the compute capability as a single number.

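| DeviceId getDeviceId () const override | inline override virtual |
Gets the device identifier.
Implements Mila::Dnn::Compute::Device.
| std::string getDeviceName () const override | inline override virtual |
Gets the device name.
Implements Mila::Dnn::Compute::Device.
| constexpr DeviceType getDeviceType () const override | inline constexpr override virtual |
Gets the device type.
Implements Mila::Dnn::Compute::Device.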
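| int getMaxThreadsPerBlock () const | inline |
Gets the maximum number of threads per block.
| int getMultiprocessorCount () const | inline |
Gets the number of multiprocessors.
| const CudaDeviceProps & getProperties () const | inline |
Gets the properties of this CUDA device.
| size_t getSharedMemoryPerBlock () const | inline |
Gets the shared memory per block in bytes.
| size_t getTotalGlobalMemory () const | inline |
Gets the total global memory size in bytes.
| int getWarpSize () const | inline |
Gets the warp size.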
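| bool hasTensorCores () const | inline |
Checks if the device has Tensor Cores.
| bool isBf16Supported () const | inline |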
Checks if the device supports BF16 (bfloat16 precision).
BF16 is supported on Ampere and newer architectures (SM 8.0+).
| bool isFp16Supported () const | inline |
Checks if the device supports FP16 (half precision).
FP16 is supported on Pascal and newer architectures (SM 6.0+).
| bool isFp8Supported () const | inline |
Checks if the device supports FP8 (8-bit float precision).
FP8 is supported on Hopper and newer architectures (SM 9.0+).
| bool isInt8Supported () const | inline |
Checks if the device supports INT8 tensor cores.
INT8 tensor cores are supported on Turing and newer (SM 7.5+).
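The four thresholds documented above can be collected into one check. This is a sketch under stated assumptions, not the library's code: `smVersion`, `PrecisionSupport`, and `precisionSupportFor` are hypothetical names, and the `major * 10 + minor` encoding is an assumption chosen for illustration (this page does not state how `getComputeCapabilityVersion()` encodes its result).

```cpp
// Hypothetical helper: encodes (major, minor) so that SM 7.5 becomes 75
// and SM 9.0 becomes 90. The encoding is an assumption for this sketch.
constexpr int smVersion(int major, int minor) { return major * 10 + minor; }

// Hypothetical result type grouping the four documented precision checks.
struct PrecisionSupport {
    bool fp16;  // Pascal and newer (SM 6.0+)
    bool bf16;  // Ampere and newer (SM 8.0+)
    bool fp8;   // Hopper and newer (SM 9.0+)
    bool int8;  // INT8 tensor cores, Turing and newer (SM 7.5+)
};

// Applies the thresholds stated in the member documentation above.
constexpr PrecisionSupport precisionSupportFor(int major, int minor) {
    return PrecisionSupport{
        smVersion(major, minor) >= 60,
        smVersion(major, minor) >= 80,
        smVersion(major, minor) >= 90,
        smVersion(major, minor) >= 75,
    };
}

// Compile-time spot checks against the documented thresholds.
static_assert(precisionSupportFor(7, 5).int8, "Turing should report INT8");
static_assert(!precisionSupportFor(7, 5).bf16, "Turing should not report BF16");
static_assert(precisionSupportFor(9, 0).fp8, "Hopper should report FP8");
```

With the real class, the `(major, minor)` pair would presumably come from `getComputeCapability()`, although the order of the pair's elements is not documented here.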

| static DeviceId validateDeviceId (DeviceId device_id) | static private |
Validates CUDA device ID.
Ensures device_id has the correct type (Cuda) and a non-negative index that is within the range of available CUDA devices.
Parameters
| device_id | Device identifier to validate. |
Exceptions
| std::invalid_argument | If device_id type is not Cuda or index is negative. |
| std::runtime_error | If CUDA device count query fails or index is out of range. |
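The validation order and exception mapping described above might look roughly like the sketch below. It is illustrative only: `Kind`, `Id`, and `validateId` are hypothetical stand-ins for `DeviceType`, `DeviceId`, and `validateDeviceId`. The device count is passed in as a parameter so the example runs without the CUDA runtime; the real function presumably queries the runtime itself (for example with `cudaGetDeviceCount`).

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the library's DeviceType and DeviceId.
enum class Kind { Cpu, Cuda };
struct Id {
    Kind kind;
    int index;
};

// device_count stands in for the count reported by the CUDA runtime.
// A negative value simulates a failed device count query.
inline Id validateId(Id id, int device_count) {
    if (id.kind != Kind::Cuda) {
        throw std::invalid_argument("device type is not Cuda");
    }
    if (id.index < 0) {
        throw std::invalid_argument("device index is negative");
    }
    if (device_count < 0) {
        throw std::runtime_error("device count query failed");
    }
    if (id.index >= device_count) {
        throw std::runtime_error("device index is out of range");
    }
    return id;  // returned unchanged so it can initialize a member directly
}
```

Returning the validated identifier lets a constructor use the function in its member initializer list, which would fit a static function that returns `DeviceId`.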


| DeviceId device_id_ | private |
| CudaDeviceProps props_ | private |