Mila 0.13.48
Deep Neural Network Library
Class Hierarchy

This inheritance list is sorted roughly, but not completely, alphabetically:
Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::AcquireResult
Mila::Dnn::AxisPartition - Information about axis partitioning of a tensor
std::bad_alloc
Mila::Dnn::Visualization::BlockVisualizer
Mila::Data::BpeTrainer - Corpus accumulator and trainer for BPE vocabularies
Mila::Data::BpeVocabularyConfig - Configuration for the BPE vocabulary
Mila::Dnn::BufferedTokenStreamer< Sink, BufSize > - Buffers BufSize tokens before forwarding a contiguous span to Sink
Mila::Dnn::BuildContext - Build-time context for Component::build()
Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::CacheEntry
Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::CacheKey
Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::CacheKeyHash
Mila::Data::CharTrainer - Character-level tokenizer trainer
Mila::Data::CharVocabularyConfig - Configuration for Character-level tokenizer training
Mila::Dnn::Visualization::ColorLUT
Mila::Dnn::Component< TDeviceType, TPrecision > - Abstract base class for neural network components
Mila::Dnn::Component< DeviceType::Cuda, float, float >
Mila::Dnn::Component< TDeviceType, dtype_t::FP32 >
Mila::Dnn::Component< TDeviceType, TComputePrecision >
Mila::Dnn::Component< TDeviceType, TInput, TOutput >
Mila::Dnn::Component< TDeviceType, TLogits >
Mila::Dnn::ComponentConfig - Abstract base for component configuration objects
Mila::Dnn::ComponentFactory - Factory for reconstructing components from serialized archives
ComputeDevice
Mila::Dnn::Compute::CpuAttentionOpRegistrar
Mila::Dnn::Compute::CpuCrossEntropyOpRegistrar - Class responsible for registering the CpuCrossEntropyOp operation
Mila::Dnn::Compute::CpuDeviceRegistrar - CPU device plugin for device-agnostic registration
Mila::Dnn::Compute::CpuEncoderOpRegistrar - Registrar for CpuEncoderOp operation
Mila::Dnn::Compute::CpuGeluOpRegistrar - Class responsible for registering CPU GELU operations
Mila::Dnn::Compute::CpuLayerNormOpRegistrar
Mila::Dnn::Compute::CpuLinearOpRegistrar
Mila::Dnn::Compute::CpuResidualOpRegistrar - Registrar for CPU Residual operation (FP32)
Mila::Dnn::Compute::CpuSoftmaxCrossEntropyOpRegistrar - Registrar for fused Softmax+CrossEntropy operation
Mila::Dnn::Compute::CpuSoftmaxOpRegistrar
Mila::Dnn::CpuTensorDataTypeTraits - CPU-specific traits for abstract tensor data types
Mila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision > - RAII wrapper owning cuBLASLt descriptors for a Linear matmul
Mila::Dnn::Compute::Cuda::CublasLtMatMulPlan< TComputePrecision > - RAII wrapper owning cuBLASLt descriptors and the selected heuristic algorithm
Mila::Dnn::Compute::Cuda::CublasLtPlanCache< TPlan > - Generic plan cache keyed on batch size bucket
Mila::Dnn::Compute::Cuda::Gelu::Detail::cuda_gelu_impl< TNative >
Mila::Dnn::Compute::Cuda::Gelu::Detail::cuda_gelu_impl< float >
Mila::Dnn::Compute::Cuda::Gelu::Detail::cuda_gelu_impl< half >
Mila::Dnn::Compute::Cuda::Gqa::Detail::cuda_gqa_kernels< T >
Mila::Dnn::Compute::Cuda::Gqa::Detail::cuda_gqa_kernels< float >
Mila::Dnn::Compute::Cuda::Gqa::Detail::cuda_gqa_kernels< nv_bfloat16 >
Mila::Dnn::Compute::Cuda::LayerNorm::Detail::cuda_layernorm_impl< TNative > - CUDA kernel dispatcher for LayerNorm operations
Mila::Dnn::Compute::Cuda::LayerNorm::Detail::cuda_layernorm_impl< float >
Mila::Dnn::Compute::Cuda::LayerNorm::Detail::cuda_layernorm_impl< half >
Mila::Dnn::Compute::Cuda::Lpe::Detail::cuda_lpe_impl< TNative > - CUDA kernel dispatcher for Lpe forward, backward, and positional decode
Mila::Dnn::Compute::Cuda::Lpe::Detail::cuda_lpe_impl< float > - FP32 specialization of the Lpe CUDA kernel dispatcher
Mila::Dnn::Compute::Cuda::Lpe::Detail::cuda_lpe_impl< half > - FP16 specialization of the Lpe CUDA kernel dispatcher
Mila::Dnn::Compute::Cuda::Linear::Detail::cuda_matmul_impl< TNative > - CUDA kernel dispatcher for Linear operations
Mila::Dnn::Compute::Cuda::Linear::Detail::cuda_matmul_impl< float >
Mila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< TComputeType, TWeightType > - CUDA kernel dispatcher for matrix-vector multiply (M=1 decode path)
Mila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< float, float >
Mila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< nv_bfloat16, __nv_fp8_e4m3 >
Mila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< nv_bfloat16, nv_bfloat16 >
Mila::Dnn::Compute::Cuda::MultiHeadAttention::Detail::cuda_mha_kernels< TNative > - CUDA kernel dispatcher for attention non-matmul operations
Mila::Dnn::Compute::Cuda::MultiHeadAttention::Detail::cuda_mha_kernels< float >
Mila::Dnn::Compute::Cuda::MultiHeadAttention::Detail::cuda_mha_kernels< half >
Mila::Dnn::Compute::Cuda::Residual::Detail::cuda_residual_impl< TElementType >
Mila::Dnn::Compute::Cuda::Residual::Detail::cuda_residual_impl< float >
Mila::Dnn::Compute::Cuda::Residual::Detail::cuda_residual_impl< nv_bfloat16 >
Mila::Dnn::Compute::Cuda::RmsNorm::Detail::cuda_rmsnorm_impl< TNative > - CUDA kernel dispatcher for RMSNorm operations
Mila::Dnn::Compute::Cuda::RmsNorm::Detail::cuda_rmsnorm_impl< float >
Mila::Dnn::Compute::Cuda::RmsNorm::Detail::cuda_rmsnorm_impl< nv_bfloat16 >
Mila::Dnn::Compute::Cuda::Rope::Detail::cuda_rope_impl< TNative > - CUDA kernel dispatcher for RoPE forward, backward, cache build, and positional decode
Mila::Dnn::Compute::Cuda::Rope::Detail::cuda_rope_impl< __nv_bfloat16 >
Mila::Dnn::Compute::Cuda::Rope::Detail::cuda_rope_impl< float >
Mila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::Detail::cuda_softmax_crossentropy_impl< TNative > - CUDA kernel dispatcher for SoftmaxCrossEntropy operations
Mila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::Detail::cuda_softmax_crossentropy_impl< float >
Mila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::Detail::cuda_softmax_crossentropy_impl< half >
Mila::Dnn::Compute::Cuda::Softmax::Detail::cuda_softmax_impl< TNative >
Mila::Dnn::Compute::Cuda::Softmax::Detail::cuda_softmax_impl< float >
Mila::Dnn::Compute::Cuda::Softmax::Detail::cuda_softmax_impl< half >
Mila::Dnn::Compute::Cuda::Detail::cuda_structural_kernels< T >
Mila::Dnn::Compute::Cuda::Detail::cuda_structural_kernels< float >
Mila::Dnn::Compute::Cuda::Detail::cuda_structural_kernels< nv_bfloat16 >
Mila::Dnn::Compute::Cuda::Swiglu::Detail::cuda_swiglu_impl< TNative >
Mila::Dnn::Compute::Cuda::Swiglu::Detail::cuda_swiglu_impl< __nv_bfloat16 >
Mila::Dnn::Compute::Cuda::Swiglu::Detail::cuda_swiglu_impl< float >
Mila::Dnn::Compute::Cuda::TokenEmbedding::Detail::cuda_token_embedding_impl< TNative >
Mila::Dnn::Compute::Cuda::TokenEmbedding::Detail::cuda_token_embedding_impl< __nv_bfloat16 >
Mila::Dnn::Compute::Cuda::TokenEmbedding::Detail::cuda_token_embedding_impl< float >
Mila::Dnn::Compute::CudaDataTypeMap< T > - Helper struct to map C++ types to CUDA data types for cuBLASLt
Mila::Dnn::Compute::CudaDataTypeMap< __nv_bfloat16 >
Mila::Dnn::Compute::CudaDataTypeMap< float >
Mila::Dnn::Compute::CudaDataTypeMap< half >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TDataType > - Compile-time mapping from TensorDataType -> cudaDataType_t
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::BF16 >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP16 >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP32 >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP8_E4M3 >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP8_E5M2 >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::INT32 >
Mila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::INT8 >
Mila::Dnn::Compute::CudaDeviceProps - Wrapper for CUDA device properties with cached values
Mila::Dnn::Compute::CudaDeviceRegistrar - CUDA device registrar for device-agnostic registration
Mila::Dnn::Compute::Cuda::Gelu::CudaGeluOpRegistrar - Class responsible for registering the CudaGeluOp operation
Mila::Dnn::Compute::Cuda::Gqa::CudaGroupedQueryAttentionOpRegistrar
Mila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOpRegistrar
Mila::Dnn::Compute::Cuda::Linear::CudaLinearOpRegistrar
Mila::Dnn::Compute::Cuda::Lpe::CudaLpeOpRegistrar
Mila::Dnn::Compute::Cuda::MatMulBiasGelu::CudaMatMulBiasGeluOpRegistrar - Class responsible for registering the CudaMatMulBiasGeluOp operation
Mila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOpRegistrar
Mila::Dnn::Compute::Cuda::Residual::CudaResidualOpRegistrar
Mila::Dnn::Compute::Cuda::RmsNorm::CudaRmsNormOpRegistrar
Mila::Dnn::Compute::Cuda::Rope::CudaRopeOpRegistrar
Mila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::CudaSoftmaxCrossEntropyOpRegistrar - Registrar for fused Softmax+CrossEntropy CUDA operation
Mila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOpRegistrar - Class responsible for registering the CudaSoftmaxOp operation
Mila::Dnn::Compute::Cuda::Swiglu::CudaSwigluOpRegistrar
Mila::Dnn::Compute::CudaTimer - GPU-accurate interval timer using a CUDA event pair
Mila::Dnn::Compute::Cuda::TokenEmbedding::CudaTokenEmbeddingOpRegistrar
Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource > - Device-agnostic data loader interface using abstract tensor data types
Mila::Data::DataLoader< TensorDataType::INT32, TensorDataType::INT32, TMemoryResource >
Mila::Dnn::Compute::Device - Abstract interface for compute device implementations
Mila::Dnn::Compute::DeviceAccessible
Mila::Dnn::Compute::DeviceConstructionKey - Construction key for device factories
Mila::Dnn::Compute::DeviceId - Lightweight identifier for a compute device
Mila::Dnn::Compute::DeviceRegistrar - Device-agnostic registrar for automatic device discovery and registration
Mila::Dnn::Compute::DeviceRegistry - Registry of discovered compute devices with lazy instantiation
Mila::Dnn::Compute::DeviceTypeTraits< TDevice >
Mila::Dnn::Compute::DeviceTypeTraits< DeviceType::Cpu > - DeviceTypeTraits specialization for the CPU device
Mila::Dnn::Compute::DeviceTypeTraits< DeviceType::Cuda > - DeviceTypeTraits specialization for the CUDA device
Mila::Dnn::Compute::ExecutionContext< TDeviceType > - Templated execution context for device-specific operations
std::false_type
Mila::Dnn::Compute::Cpu::FillOps - CPU specialization of TensorOps for initialization operations
Mila::Dnn::Compute::Cuda::FillOps - CUDA specialization of TensorOps for initialization operations
Mila::Dnn::Visualization::Framebuffer
Mila::Dnn::GenerateParams
Mila::Dnn::GenerationStatistics - Statistics captured during a single generateStreaming() call
Mila::Dnn::Compute::GqaState - Non-owning pointers to shared transient GQA scratch buffers
std::hash< Mila::Dnn::Compute::DeviceId > - Hash specialization for DeviceId
Mila::Dnn::Compute::HostAccessible
Mila::Dnn::Compute::IExecutionContext - Type-erased execution context interface
Mila::Dnn::Compute::IKvCacheLifecycle - Capability interface for KV-cache state management
IModulePlugin
Mila::Dnn::Extensibility::IModulePlugin
Mila::Dnn::Compute::IPositionalDecode - Capability interface for position-dependent unary operations
Mila::Dnn::Compute::IPositionalPairedOp - Capability interface for position-dependent paired operations
Mila::Dnn::ITensor - Abstract interface providing essential tensor information and data access
Mila::Dnn::Serialization::ITensorBlob - Type-erased interface for a serialized tensor blob
Mila::Dnn::LanguageModelConfig< TDerived > - CRTP base configuration for all deployable Mila language models
Mila::Dnn::LanguageModelConfig< LlamaModelConfig >
Mila::Dnn::LearningRateScheduler - Abstract base for learning-rate schedulers
Mila::Dnn::Compute::LinearOpTypeMap< DeviceType::Cpu, TensorDataType::FP32 >
Mila::Logging::Logger - Abstract logging interface and static facade
Mila::Dnn::Compute::Cpu::MathOps - CPU specialization of TensorOps for mathematical operations
Mila::Dnn::Compute::Cuda::MathOps - CUDA specialization of TensorOps for mathematical operations
std::pmr::memory_resource
Mila::Dnn::Compute::MemoryResourceTraits< TMemoryResource > - Memory resource traits for compile-time dispatch optimization
Mila::Dnn::Compute::MemoryResourceTraits< CpuMemoryResource > - CPU-specific memory resource traits providing detailed CPU backend characteristics
Mila::Dnn::Compute::MemoryResourceTraits< CudaDeviceMemoryResource > - CUDA device memory resource traits providing detailed GPU backend characteristics
Mila::Dnn::Compute::MemoryResourceTraits< CudaManagedMemoryResource > - CUDA managed memory resource traits providing unified memory characteristics
Mila::Dnn::Compute::MemoryResourceTraits< CudaPinnedMemoryResource > - CUDA pinned memory resource traits providing fast transfer characteristics
Mila::Dnn::Compute::MemoryStats - Global memory statistics for all TrackedMemoryResource instances
Mila::Dnn::MemoryStats - Memory allocation breakdown for a single component
Mila::Dnn::Compute::MetalDevicePlugin - Metal device plugin for device-agnostic registration
Mila::Data::MilaFileHeader - Common file header for Mila data files
Mila::Dnn::Detail::mlp_activation_impl< TActivation, TDeviceType, TPrecision >
Mila::Dnn::Detail::mlp_activation_impl< ActivationType::Gelu, TDeviceType, TPrecision >
Mila::Dnn::Detail::mlp_activation_impl< ActivationType::Swiglu, TDeviceType, TPrecision >
Mila::Dnn::Model< TDeviceType, TPrecision >
Mila::Dnn::Serialization::ModelArchive - ModelArchive provides high-level helpers for component serialization
Mila::Dnn::ModelConfig - Abstract base configuration for all deployable Mila models
Mila::Dnn::Visualization::ModuleVisualizer
Mila::Dnn::MultiAxisPartition - Multi-axis partition for normalization over trailing dimensions
Mila::Dnn::NetworkFactory - Factory registry for Network deserialization
Mila::Dnn::Quant::KvCache::NoKvCompression
Mila::Dnn::Quant::Weight::NoWeightQuant
Mila::Profiling::NvtxRange
Mila::Dnn::Compute::Operation< TDeviceType, TComputePrecision >
Mila::Dnn::Compute::Operation< DeviceType::Cuda, TComputePrecision >
Mila::Dnn::Compute::Operation< DeviceType::Cuda, TPrecision >
Mila::Dnn::Compute::Operation< TDeviceType, TInput >
Mila::Dnn::Compute::Operation< TDeviceType, TPrecision >
Mila::Dnn::Compute::OperationRegistry - Central registry for typed, device-aware compute operations
Mila::Dnn::Compute::OperationsRegistrar - Class to manage compute operations initialization
Mila::Dnn::Compute::OperationTraits< TOp, TDeviceType, TPrecision, TPolicy > - Primary traits template for unified compile-time operation dispatch
Mila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cpu, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::BF16, NoKvCompression > - Unquantized BF16 path. No KV cache compression. Standard inference precision
Mila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::FP32, NoKvCompression > - Unquantized FP32 path. No KV cache compression
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cpu, TensorDataType::FP32, NoWeightQuant >
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, NoWeightQuant > - Unquantized BF16 path. Standard inference precision
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerChannelFp8<> > - FP8 per-channel quantized BF16 path. Requires SM >= 8.0 (Ampere+)
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 128 > > - FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=128. Requires SM >= 8.0
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 64 > > - FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=64. Requires SM >= 8.0
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 128 > > - INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=128. Requires SM >= 8.0
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 64 > > - INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=64. Requires SM >= 8.0
Mila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::FP32, NoWeightQuant > - Unquantized FP32 path. Retained for validation and reference
Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cpu, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cpu, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cpu, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cpu, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::BF16, void >
Mila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::FP32, void >
Mila::Dnn::Compute::Optimizer< TDeviceType, TPrecision > - Abstract base class for parameter optimizers
Mila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision >
Mila::Dnn::Compute::Optimizer< DeviceType::Cuda, TPrecision >
Mila::Data::BpeVocabulary::PairHash
Mila::Data::BpeVocabulary::PairViewHash
Mila::Dnn::Quant::Weight::PerChannelFp8< TStorage >
Mila::Dnn::Quant::KvCache::PerChannelKvFp8< TStorage > - Symmetric per-head per-token FP8 KV cache compression policy
Mila::Dnn::Quant::Weight::PerGroupFp4< kGroupSize >
Mila::Dnn::Quant::Weight::PerGroupInt4< kGroupSize >
Mila::Dnn::Extensibility::PluginManager::PluginEntry
Mila::Dnn::Extensibility::PluginInfo - Get plugin metadata
Mila::Dnn::Extensibility::PluginManager - Manages loading and querying of module plugins
Mila::Dnn::Serialization::PretrainedMetadata - Metadata for pretrained model
Mila::Dnn::Serialization::PretrainedModelReader - Reader for Mila pretrained binary format
Mila::Core::RandomGenerator - Singleton class providing centralized random number generation
Mila::Dnn::Compute::Cuda::RandomOps
Mila::Dnn::Visualization::Rect
Mila::Dnn::Visualization::RGB
Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry - Process-wide shared cache for RoPE cos/sin frequency tables
std::runtime_error
Mila::Dnn::Serialization::ModelArchive::ScopedScope
Mila::Data::SerializationMetadata - Type-safe metadata container for component serialization
Mila::Dnn::Optimizers::SerializationMetadata - Type-safe metadata container for component serialization
Mila::Dnn::Serialization::SerializationMetadata - Type-safe metadata container for component serialization
Mila::Dnn::SerializationMetadata - Type-safe metadata container for component serialization
Mila::Dnn::Serialization::Serializer - Minimal base interface for model serialization backends
Mila::Data::SpecialTokens - Configuration for special tokens across all tokenizer types
Mila::Utils::StepLogger
Mila::Dnn::Compute::Cuda::StructuralOps
Mila::Dnn::Serialization::TensorBlobMetadata - Metadata for a tensor blob in pretrained model format
Mila::Dnn::TensorBuffer< TDataType, TMemoryResource, TrackMemory > - Device-agnostic buffer for storing tensor data with abstract type system
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TDataType > - Compile-time mapping from abstract TensorDataType -> CUDA native device type
Mila::Dnn::TensorDataTypeMap< TElementType > - Primary template for mapping concrete C++ types to TensorDataType
Mila::Dnn::TensorDataTypeMap< __nv_fp8_e4m3 >
Mila::Dnn::TensorDataTypeMap< __nv_fp8_e5m2 >
Mila::Dnn::TensorDataTypeMap< float > - Concrete type mapping for float (FP32)
Mila::Dnn::TensorDataTypeMap< half >
Mila::Dnn::TensorDataTypeMap< nv_bfloat16 >
Mila::Dnn::TensorDataTypeMap< std::int16_t > - Concrete type mapping for 16-bit signed integer
Mila::Dnn::TensorDataTypeMap< std::int32_t > - Concrete type mapping for 32-bit signed integer
Mila::Dnn::TensorDataTypeMap< std::int8_t > - Concrete type mapping for 8-bit signed integer
Mila::Dnn::TensorDataTypeMap< std::uint16_t > - Concrete type mapping for 16-bit unsigned integer
Mila::Dnn::TensorDataTypeMap< std::uint32_t > - Concrete type mapping for 32-bit unsigned integer
Mila::Dnn::TensorDataTypeMap< std::uint8_t > - Concrete type mapping for 8-bit unsigned integer
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::BF16 > - Maps TensorDataType::BF16 to CUDA __nv_bfloat16
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP16 > - Maps TensorDataType::FP16 to CUDA __half
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP32 > - Maps TensorDataType::FP32 to CUDA float
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP4_E2M1 > - Maps TensorDataType::FP4_E2M1 to std::uint8_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP4_E3M0 > - Maps TensorDataType::FP4_E3M0 to std::uint8_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP8_E4M3 > - Maps TensorDataType::FP8_E4M3 to CUDA __nv_fp8_e4m3
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP8_E5M2 > - Maps TensorDataType::FP8_E5M2 to CUDA __nv_fp8_e5m2
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::INT16 > - Maps TensorDataType::INT16 to std::int16_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::INT32 > - Maps TensorDataType::INT32 to std::int32_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::INT8 > - Maps TensorDataType::INT8 to std::int8_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::UINT16 > - Maps TensorDataType::UINT16 to std::uint16_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::UINT32 > - Maps TensorDataType::UINT32 to std::uint32_t
Mila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::UINT8 > - Maps TensorDataType::UINT8 to std::uint8_t
Mila::Dnn::TensorDataTypeTraits< TDataType > - Compile-time traits for TensorDataType enumeration values
Mila::Dnn::TensorDataTypeTraits< TensorDataType::BF16 > - Traits specialization for 16-bit brain floating point
Mila::Dnn::TensorDataTypeTraits< TensorDataType::FP16 > - Traits specialization for 16-bit half precision floating point
Mila::Dnn::TensorDataTypeTraits< TensorDataType::FP32 > - Traits specialization for 32-bit IEEE 754 floating point
Mila::Dnn::TensorDataTypeTraits< TensorDataType::FP4_E2M1 > - Traits specialization for 4-bit floating point with E2M1 format
Mila::Dnn::TensorDataTypeTraits< TensorDataType::FP4_E3M0 > - Traits specialization for 4-bit floating point with E3M0 format
Mila::Dnn::TensorDataTypeTraits< TensorDataType::FP8_E4M3 > - Traits specialization for 8-bit floating point with E4M3 format
Mila::Dnn::TensorDataTypeTraits< TensorDataType::FP8_E5M2 > - Traits specialization for 8-bit floating point with E5M2 format
Mila::Dnn::TensorDataTypeTraits< TensorDataType::INT16 > - Traits specialization for 16-bit signed integer
Mila::Dnn::TensorDataTypeTraits< TensorDataType::INT32 > - Traits specialization for 32-bit signed integer
Mila::Dnn::TensorDataTypeTraits< TensorDataType::INT8 > - Traits specialization for 8-bit signed integer
Mila::Dnn::TensorDataTypeTraits< TensorDataType::UINT16 > - Traits specialization for 16-bit unsigned integer
Mila::Dnn::TensorDataTypeTraits< TensorDataType::UINT32 > - Traits specialization for 32-bit unsigned integer
Mila::Dnn::TensorDataTypeTraits< TensorDataType::UINT8 > - Traits specialization for 8-bit unsigned integer
Mila::Dnn::TensorHostTypeMap< TDataType > - Maps abstract TensorDataType to host-compatible C++ type and TensorDataType
Mila::Dnn::TensorHostTypeMap< TensorDataType::BF16 > - Host type for 16-bit brain floating point
Mila::Dnn::TensorHostTypeMap< TensorDataType::FP16 > - Host type for 16-bit half precision floating point
Mila::Dnn::TensorHostTypeMap< TensorDataType::FP32 > - Host type for 32-bit IEEE 754 floating point
Mila::Dnn::TensorHostTypeMap< TensorDataType::FP8_E4M3 > - Host type for 8-bit floating point with E4M3 format
Mila::Dnn::TensorHostTypeMap< TensorDataType::FP8_E5M2 > - Host type for 8-bit floating point with E5M2 format
Mila::Dnn::TensorHostTypeMap< TensorDataType::INT16 > - Host type for 16-bit signed integer
Mila::Dnn::TensorHostTypeMap< TensorDataType::INT32 > - Host type for 32-bit signed integer
Mila::Dnn::TensorHostTypeMap< TensorDataType::INT8 > - Host type for 8-bit signed integer
Mila::Dnn::TensorHostTypeMap< TensorDataType::UINT16 > - Host type for 16-bit unsigned integer
Mila::Dnn::TensorHostTypeMap< TensorDataType::UINT32 > - Host type for 32-bit unsigned integer
Mila::Dnn::TensorHostTypeMap< TensorDataType::UINT8 > - Host type for 8-bit unsigned integer
Mila::Dnn::Serialization::TensorMetadata - Metadata describing a tensor in serialized form
Mila::Dnn::TensorOps< TDevice > - Device-dispatched TensorOps interface template
Mila::Dnn::TensorShape - Fixed-capacity inline shape descriptor for N-dimensional tensors
Mila::Data::Tokenizer
Mila::Data::TokenizerTrainer - Abstract interface for training tokenizer vocabularies from text corpora
Mila::Data::TokenizerVocabulary - Generic tokenizer vocabulary interface
Mila::Data::TokenSequenceLoaderConfig - Configuration for StreamingSequenceLoader behavior
Mila::Data::TrainerFactory - Factory for creating tokenizer trainers and loading vocabularies
Mila::Dnn::Compute::Cpu::TransferOps - CPU specialization of TensorOps for transfer operations
Mila::Dnn::Compute::Cuda::TransferOps - CUDA specialization of TensorOps for tensor transfer operations
Mila::Dnn::Compute::OperationRegistry::TypeID - Composite key for registry lookup
Mila::Dnn::Compute::OperationRegistry::TypeIDHash
Mila::Dnn::UniqueIdGenerator - Thread-safe generator for unique tensor identifiers
Mila::Version - Semantic Version data
Mila::Dnn::Visualization::VisualizerContext
Mila::Dnn::VulkanTensorTraits - Vulkan-specific traits for abstract tensor data types
Mila::Dnn::Compute::Cpu::ZeroOps
Mila::Dnn::Compute::Cuda::ZeroOps