| CMila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::AcquireResult | |
| CMila::Dnn::AxisPartition | Information about axis partitioning of a tensor |
| Cstd::bad_alloc | |
| CMila::Dnn::Compute::CudaBadAlloc | |
| CMila::Dnn::Visualization::BlockVisualizer | |
| CMila::Data::BpeTrainer | Corpus accumulator and trainer for BPE vocabularies |
| CMila::Data::BpeVocabularyConfig | Configuration for the BPE vocabulary |
| CMila::Dnn::BufferedTokenStreamer< Sink, BufSize > | Buffers BufSize tokens before forwarding a contiguous span to Sink |
| CMila::Dnn::BuildContext | Build-time context for Component::build() |
| CMila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::CacheEntry | |
| CMila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::CacheKey | |
| CMila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::CacheKeyHash | |
| CMila::Data::CharTrainer | Character-level tokenizer trainer |
| CMila::Data::CharVocabularyConfig | Configuration for Character-level tokenizer training |
| CMila::Dnn::Visualization::ColorLUT | |
| CMila::Dnn::Component< TDeviceType, TPrecision > | Abstract base class for neural network components |
| CMila::Dnn::Lpe< TDeviceType, dtype_t::INT32, TPrecision > | |
| CMila::Dnn::TokenEmbedding< TDeviceType, dtype_t::INT32, TPrecision > | |
| CMila::Dnn::CompositeComponent< TDeviceType, TPrecision > | A component that contains and manages child components |
| CMila::Dnn::LlamaBlock< TDeviceType, TPrecision, TWeightQuantization, TKvCachePolicy > | |
| CMila::Dnn::GptBlock< TDeviceType, TPrecision > | Transformer encoder block as a composite component |
| CMila::Dnn::LlamaBlock< TDeviceType, TPrecision, TWeightQuant, TKvPolicy > | |
| CMila::Dnn::MLP< TDeviceType, TPrecision > | Multi-Layer Perceptron (MLP) composite component |
| CMila::Dnn::Network< TDeviceType, TPrecision > | Root composite network container |
| CMila::Dnn::LanguageNetwork< TDeviceType, TPrecision > | |
| CMila::Dnn::GptTransformer< TDeviceType, TPrecision > | GPT-2 style transformer (decoder-only) for autoregressive token prediction |
| CMila::Dnn::LlamaTransformer< TDeviceType, TPrecision, TWeightQuantization, TKvCachePolicy > | LLaMA-style transformer (decoder-only) for autoregressive token prediction |
| CMila::Dnn::FusedComponent< TDeviceType, TPrecision > | DEPRECATED |
| CMila::Dnn::Gelu< TDeviceType, TPrecision > | Gaussian Error Linear Unit (GELU) activation component |
| CMila::Dnn::LayerNorm< TDeviceType, TPrecision > | Device-templated Layer Normalization component |
| CMila::Dnn::Loss< TDeviceType, TPrecision > | Abstract base class for neural network loss functions |
| CMila::Dnn::MultiHeadAttention< TDeviceType, TPrecision > | Multi-Head Attention module that accepts concatenated QKV input |
| CMila::Dnn::Residual< TDeviceType, TPrecision > | Device-templated Residual connection component |
| CMila::Dnn::RmsNorm< TDeviceType, TPrecision > | Device-templated RMS Normalization component |
| CMila::Dnn::Rope< TDeviceType, TPrecision > | Device-templated RoPE component |
| CMila::Dnn::Softmax< TDeviceType, TPrecision > | Softmax activation module (device-templated) |
| CMila::Dnn::Swiglu< TDeviceType, TPrecision > | SwiGLU activation component |
| CMila::Dnn::Component< DeviceType::Cuda, float, float > | |
| CMila::Dnn::Dropout< TDeviceType, TInput, TOutput > | Dropout regularization module for neural networks |
| CMila::Dnn::Component< TDeviceType, dtype_t::FP32 > | |
| CMila::Dnn::Lpe< TDeviceType, TIndex, TPrecision > | Encoder module for token and positional embeddings (device-templated) |
| CMila::Dnn::TokenEmbedding< TDeviceType, TIndex, TPrecision > | Pure token embedding component (device-templated) |
| CMila::Dnn::Component< TDeviceType, TComputePrecision > | |
| CMila::Dnn::GroupedQueryAttention< TDeviceType, TPrecision, TKvPolicy > | |
| CMila::Dnn::Linear< TDeviceType, TPrecision > | |
| CMila::Dnn::Linear< TDeviceType, TPrecision, TWeightQuant > | |
| CMila::Dnn::Linear< TDeviceType, TPrecision, TWeightQuantization > | |
| CMila::Dnn::GroupedQueryAttention< TDeviceType, TComputePrecision, TKvPolicy > | Grouped-Query Attention module that accepts concatenated QKV input |
| CMila::Dnn::Linear< TDeviceType, TComputePrecision, TWeightQuant > | Device-templated fully connected (linear) component |
| CMila::Dnn::Component< TDeviceType, TInput, TOutput > | |
| CMila::Dnn::Dropout< DeviceType::Cpu, TInput, TOutput > | |
| CMila::Dnn::Dropout< DeviceType::Cuda, TInput, TOutput > | |
| CMila::Dnn::Component< TDeviceType, TLogits > | |
| CMila::Dnn::SoftmaxCrossEntropy< TDeviceType, TLogits, TTargets, TPrecision > | Fused SoftmaxCrossEntropy loss module (device-templated) |
| CMila::Dnn::ComponentConfig | Abstract base for component configuration objects |
| CMila::Dnn::CrossEntropyConfig | Configuration for fused SoftmaxCrossEntropy loss |
| CMila::Dnn::DropoutConfig | Configuration class for Dropout module |
| CMila::Dnn::GeluConfig | Configuration class for GELU module |
| CMila::Dnn::GptBlockConfig | Configuration class for GPT transformer blocks |
| CMila::Dnn::GptConfig | Network-level configuration for GPT-style transformer networks |
| CMila::Dnn::GqaConfig | Configuration class for the Grouped-Query Attention module |
| CMila::Dnn::LayerNormConfig | |
| CMila::Dnn::LinearConfig | Configuration object for a Linear (fully connected) layer |
| CMila::Dnn::LlamaConfig | Network-level configuration for LLaMA-style transformer networks |
| CMila::Dnn::LpeConfig | Configuration class for the Learned Positional Encoder |
| CMila::Dnn::MLPConfig | Configuration class for the Multi-Layer Perceptron (MLP) block |
| CMila::Dnn::MultiHeadAttentionConfig | Configuration class for Attention module |
| CMila::Dnn::Optimizers::AdamWConfig | Configuration for AdamW optimizer |
| CMila::Dnn::ResidualConfig | Configuration class for Residual connection component |
| CMila::Dnn::RmsNormConfig | |
| CMila::Dnn::RopeConfig | |
| CMila::Dnn::SoftmaxConfig | Configuration class for Softmax module |
| CMila::Dnn::SwigluConfig | |
| CMila::Dnn::TokenEmbeddingConfig | Configuration for the TokenEmbedding component |
| CMila::Dnn::ComponentFactory | Factory for reconstructing components from serialized archives |
| CComputeDevice | |
| CMila::Dnn::Compute::MetalDevice | Class representing a Metal compute device instance |
| CMila::Dnn::Compute::VulkanDevice | Class representing a Vulkan compute device instance |
| CMila::Dnn::Compute::CpuAttentionOpRegistrar | |
| CMila::Dnn::Compute::CpuCrossEntropyOpRegistrar | Class responsible for registering the CpuCrossEntropyOp operation |
| CMila::Dnn::Compute::CpuDeviceRegistrar | CPU device plugin for device-agnostic registration |
| CMila::Dnn::Compute::CpuEncoderOpRegistrar | Registrar for CpuEncoderOp operation |
| CMila::Dnn::Compute::CpuGeluOpRegistrar | Class responsible for registering CPU GELU operations |
| CMila::Dnn::Compute::CpuLayerNormOpRegistrar | |
| CMila::Dnn::Compute::CpuLinearOpRegistrar | |
| CMila::Dnn::Compute::CpuResidualOpRegistrar | Registrar for CPU Residual operation (FP32) |
| CMila::Dnn::Compute::CpuSoftmaxCrossEntropyOpRegistrar | Registrar for fused Softmax+CrossEntropy operation |
| CMila::Dnn::Compute::CpuSoftmaxOpRegistrar | |
| CMila::Dnn::CpuTensorDataTypeTraits | CPU-specific traits for abstract tensor data types |
| CMila::Dnn::Compute::Cuda::CublasLtLinearPlan< TComputePrecision, TParameterPrecision > | RAII wrapper owning cuBLASLt descriptors for a Linear matmul |
| CMila::Dnn::Compute::Cuda::CublasLtMatMulPlan< TComputePrecision > | RAII wrapper owning cuBLASLt descriptors and the selected heuristic algorithm |
| CMila::Dnn::Compute::Cuda::CublasLtPlanCache< TPlan > | Generic plan cache keyed on batch size bucket |
| CMila::Dnn::Compute::Cuda::Gelu::Detail::cuda_gelu_impl< TNative > | |
| CMila::Dnn::Compute::Cuda::Gelu::Detail::cuda_gelu_impl< float > | |
| CMila::Dnn::Compute::Cuda::Gelu::Detail::cuda_gelu_impl< half > | |
| CMila::Dnn::Compute::Cuda::Gqa::Detail::cuda_gqa_kernels< T > | |
| CMila::Dnn::Compute::Cuda::Gqa::Detail::cuda_gqa_kernels< float > | |
| CMila::Dnn::Compute::Cuda::Gqa::Detail::cuda_gqa_kernels< nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::LayerNorm::Detail::cuda_layernorm_impl< TNative > | CUDA kernel dispatcher for LayerNorm operations |
| CMila::Dnn::Compute::Cuda::LayerNorm::Detail::cuda_layernorm_impl< float > | |
| CMila::Dnn::Compute::Cuda::LayerNorm::Detail::cuda_layernorm_impl< half > | |
| CMila::Dnn::Compute::Cuda::Lpe::Detail::cuda_lpe_impl< TNative > | CUDA kernel dispatcher for Lpe forward, backward, and positional decode |
| CMila::Dnn::Compute::Cuda::Lpe::Detail::cuda_lpe_impl< float > | FP32 specialization of the Lpe CUDA kernel dispatcher |
| CMila::Dnn::Compute::Cuda::Lpe::Detail::cuda_lpe_impl< half > | FP16 specialization of the Lpe CUDA kernel dispatcher |
| CMila::Dnn::Compute::Cuda::Linear::Detail::cuda_matmul_impl< TNative > | CUDA kernel dispatcher for Linear operations |
| CMila::Dnn::Compute::Cuda::Linear::Detail::cuda_matmul_impl< float > | |
| CMila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< TComputeType, TWeightType > | CUDA kernel dispatcher for matrix-vector multiply (M=1 decode path) |
| CMila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< float, float > | |
| CMila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< nv_bfloat16, __nv_fp8_e4m3 > | |
| CMila::Dnn::Compute::Cuda::Linear::Detail::cuda_matvec_impl< nv_bfloat16, nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::Detail::cuda_mha_kernels< TNative > | CUDA kernel dispatcher for attention non-matmul operations |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::Detail::cuda_mha_kernels< float > | |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::Detail::cuda_mha_kernels< half > | |
| CMila::Dnn::Compute::Cuda::Residual::Detail::cuda_residual_impl< TElementType > | |
| CMila::Dnn::Compute::Cuda::Residual::Detail::cuda_residual_impl< float > | |
| CMila::Dnn::Compute::Cuda::Residual::Detail::cuda_residual_impl< nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::RmsNorm::Detail::cuda_rmsnorm_impl< TNative > | CUDA kernel dispatcher for RMSNorm operations |
| CMila::Dnn::Compute::Cuda::RmsNorm::Detail::cuda_rmsnorm_impl< float > | |
| CMila::Dnn::Compute::Cuda::RmsNorm::Detail::cuda_rmsnorm_impl< nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::Rope::Detail::cuda_rope_impl< TNative > | CUDA kernel dispatcher for RoPE forward, backward, cache build, and positional decode |
| CMila::Dnn::Compute::Cuda::Rope::Detail::cuda_rope_impl< __nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::Rope::Detail::cuda_rope_impl< float > | |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::Detail::cuda_softmax_crossentropy_impl< TNative > | CUDA kernel dispatcher for SoftmaxCrossEntropy operations |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::Detail::cuda_softmax_crossentropy_impl< float > | |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::Detail::cuda_softmax_crossentropy_impl< half > | |
| CMila::Dnn::Compute::Cuda::Softmax::Detail::cuda_softmax_impl< TNative > | |
| CMila::Dnn::Compute::Cuda::Softmax::Detail::cuda_softmax_impl< float > | |
| CMila::Dnn::Compute::Cuda::Softmax::Detail::cuda_softmax_impl< half > | |
| CMila::Dnn::Compute::Cuda::Detail::cuda_structural_kernels< T > | |
| CMila::Dnn::Compute::Cuda::Detail::cuda_structural_kernels< float > | |
| CMila::Dnn::Compute::Cuda::Detail::cuda_structural_kernels< nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::Swiglu::Detail::cuda_swiglu_impl< TNative > | |
| CMila::Dnn::Compute::Cuda::Swiglu::Detail::cuda_swiglu_impl< __nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::Swiglu::Detail::cuda_swiglu_impl< float > | |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::Detail::cuda_token_embedding_impl< TNative > | |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::Detail::cuda_token_embedding_impl< __nv_bfloat16 > | |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::Detail::cuda_token_embedding_impl< float > | |
| CMila::Dnn::Compute::CudaDataTypeMap< T > | Helper struct to map C++ types to CUDA data types for cuBLASLt |
| CMila::Dnn::Compute::CudaDataTypeMap< __nv_bfloat16 > | |
| CMila::Dnn::Compute::CudaDataTypeMap< float > | |
| CMila::Dnn::Compute::CudaDataTypeMap< half > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TDataType > | Compile-time mapping from TensorDataType -> cudaDataType_t |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP16 > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP8_E4M3 > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::FP8_E5M2 > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::INT32 > | |
| CMila::Dnn::Compute::Cuda::CudaDataTypeTraits< TensorDataType::INT8 > | |
| CMila::Dnn::Compute::CudaDeviceProps | Wrapper for CUDA device properties with cached values |
| CMila::Dnn::Compute::CudaDeviceRegistrar | CUDA device registrar for device-agnostic registration |
| CMila::Dnn::Compute::Cuda::Gelu::CudaGeluOpRegistrar | Class responsible for registering the CudaGeluOp operation |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGroupedQueryAttentionOpRegistrar | |
| CMila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOpRegistrar | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOpRegistrar | |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOpRegistrar | |
| CMila::Dnn::Compute::Cuda::MatMulBiasGelu::CudaMatMulBiasGeluOpRegistrar | Class responsible for registering the CudaMatMulBiasGeluOp operation |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOpRegistrar | |
| CMila::Dnn::Compute::Cuda::Residual::CudaResidualOpRegistrar | |
| CMila::Dnn::Compute::Cuda::RmsNorm::CudaRmsNormOpRegistrar | |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOpRegistrar | |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::CudaSoftmaxCrossEntropyOpRegistrar | Registrar for fused Softmax+CrossEntropy CUDA operation |
| CMila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOpRegistrar | Class responsible for registering the CudaSoftmaxOp operation |
| CMila::Dnn::Compute::Cuda::Swiglu::CudaSwigluOpRegistrar | |
| CMila::Dnn::Compute::CudaTimer | GPU-accurate interval timer using a CUDA event pair |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::CudaTokenEmbeddingOpRegistrar | |
| CMila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource > | Device-agnostic data loader interface using abstract tensor data types |
| CMila::Data::DataLoader< TensorDataType::INT32, TensorDataType::INT32, TMemoryResource > | |
| CMila::Data::TokenSequenceLoader< TMemoryResource > | Token sequence loader for autoregressive language models |
| CMila::Dnn::Compute::Device | Abstract interface for compute device implementations |
| CMila::Dnn::Compute::CpuDevice | Class representing a CPU compute device |
| CMila::Dnn::Compute::CudaDevice | Class representing a CUDA compute device instance |
| CMila::Dnn::Compute::DeviceAccessible | |
| CMila::Dnn::Compute::DeviceConstructionKey | Construction key for device factories |
| CMila::Dnn::Compute::DeviceId | Lightweight identifier for a compute device |
| CMila::Dnn::Compute::DeviceRegistrar | Device-agnostic registrar for automatic device discovery and registration |
| CMila::Dnn::Compute::DeviceRegistry | Registry of discovered compute devices with lazy instantiation |
| CMila::Dnn::Compute::DeviceTypeTraits< TDevice > | |
| CMila::Dnn::Compute::DeviceTypeTraits< DeviceType::Cpu > | DeviceTypeTraits specialization for the CPU device |
| CMila::Dnn::Compute::DeviceTypeTraits< DeviceType::Cuda > | DeviceTypeTraits specialization for the CUDA device |
| CMila::Dnn::Compute::ExecutionContext< TDeviceType > | Templated execution context for device-specific operations |
| Cstd::false_type | |
| CMila::Dnn::Compute::always_false< T > | |
| CMila::Dnn::dependent_false< T > | |
| CMila::Dnn::Compute::Cpu::FillOps | CPU specialization of TensorOps for initialization operations |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cpu > | |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cuda > | |
| CMila::Dnn::Compute::Cuda::FillOps | CUDA specialization of TensorOps for initialization operations |
| CMila::Dnn::Visualization::Framebuffer | |
| CMila::Dnn::GenerateParams | |
| CMila::Dnn::GenerationStatistics | Statistics captured during a single generateStreaming() call |
| CMila::Dnn::Compute::GqaState | Non-owning pointers to shared transient GQA scratch buffers |
| Cstd::hash< Mila::Dnn::Compute::DeviceId > | Hash specialization for DeviceId |
| CMila::Dnn::Compute::HostAccessible | |
| CMila::Dnn::Compute::IExecutionContext | Type-erased execution context interface |
| CMila::Dnn::Compute::ExecutionContext< DeviceType::Cpu > | CPU execution context specialization |
| CMila::Dnn::Compute::ExecutionContext< DeviceType::Cuda > | CUDA execution context specialization |
| CMila::Dnn::Compute::ExecutionContext< DeviceType::Metal > | Metal execution context specialization |
| CMila::Dnn::Compute::ExecutionContext< DeviceType::Vulkan > | Vulkan execution context specialization |
| CMila::Dnn::Compute::IKvCacheLifecycle | Capability interface for KV-cache state management |
| CMila::Dnn::Compute::IKvInference | Compute interface for attention operations that maintain a KV cache |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TPrecision > | CUDA Grouped-Query Attention operation |
| CMila::Dnn::Compute::IPackedKvInference | KV-cache inference interface for packed-QKV MHA backends |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TPrecision > | CUDA implementation of Multi-Head Attention using column-major cuBLASLt optimization |
| CIModulePlugin | |
| CMyCustomLayerPlugin | |
| CMila::Dnn::Extensibility::IModulePlugin | |
| CMila::Dnn::Compute::IPositionalDecode | Capability interface for position-dependent unary operations |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOp< TensorDataType::INT32, TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOp< TensorDataType::INT32, TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOp< TInput, TPrecision > | CUDA implementation of the Lpe (token + positional embedding) operation |
| CMila::Dnn::Compute::IPositionalPairedOp | Capability interface for position-dependent paired operations |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOp< TComputePrecision > | CUDA implementation of the Rope (rotary positional embedding) operation |
| CMila::Dnn::ITensor | Abstract interface providing essential tensor information and data access |
| CMila::Dnn::Tensor< TInputDataType, TMemoryResource > | |
| CMila::Dnn::Tensor< TTargetDataType, TMemoryResource > | |
| CMila::Dnn::Tensor< TensorDataType::INT32, TMemoryResource > | |
| CMila::Dnn::Tensor< TPrecision, MR > | |
| CMila::Dnn::Tensor< TComputePrecision, MR > | |
| CMila::Dnn::Tensor< kCacheDtype, MR > | |
| CMila::Dnn::Tensor< TIndex, MR > | |
| CMila::Dnn::Tensor< kWeightDtype, MR > | |
| CMila::Dnn::Tensor< TWeightQuant::kScaleDtype, MR > | |
| CMila::Dnn::Tensor< TTargets, MR > | |
| CMila::Dnn::Tensor< dtype_t::INT32, MR > | |
| CMila::Dnn::Tensor< TensorDataType::FP32, MR > | |
| CMila::Dnn::Tensor< TLogits, MR > | |
| CMila::Dnn::Tensor< TInputA, MR > | |
| CMila::Dnn::Tensor< TInputB, MR > | |
| CMila::Dnn::Tensor< TInput, MR > | |
| CMila::Dnn::Tensor< TensorDataType::INT32, MR > | |
| CMila::Dnn::Tensor< TDataType, Compute::CpuMemoryResource > | |
| CMila::Dnn::Tensor< dtype_t::INT32, Mila::Dnn::Compute::CpuMemoryResource > | |
| CMila::Dnn::Tensor< TensorDataType::FP32, Mila::Dnn::Compute::CpuMemoryResource > | |
| CMila::Dnn::Tensor< TDataType, TMemoryResource > | Device-aware N-dimensional tensor |
| CMila::Dnn::Serialization::ITensorBlob | Type-erased interface for a serialized tensor blob |
| CMila::Dnn::Serialization::TensorBlob< MR > | Concrete tensor blob owning a TensorBuffer-backed raw byte buffer |
| CMila::Dnn::LanguageModelConfig< TDerived > | CRTP base configuration for all deployable Mila language models |
| CMila::Dnn::LanguageModelConfig< LlamaModelConfig > | |
| CMila::Dnn::LlamaModelConfig | Deployment configuration for Llama language models |
| CMila::Dnn::LearningRateScheduler | Abstract base for learning-rate schedulers |
| CMila::Dnn::ConstantLRScheduler | Constant learning-rate scheduler |
| CMila::Dnn::CosineLRScheduler | Cosine annealing scheduler |
| CMila::Dnn::LinearLRScheduler | Linear decay scheduler |
| CMila::Dnn::Compute::LinearOpTypeMap< DeviceType::Cpu, TensorDataType::FP32 > | |
| CMila::Logging::Logger | Abstract logging interface and static facade |
| CMila::Logging::ConsoleSink | Thread-safe logging sink that writes formatted records to the console |
| CMila::Logging::FileSink | Thread-safe logging sink that writes formatted records to a file |
| CMila::Logging::NullSink | A logging sink that silently discards all records |
| CMila::Dnn::Compute::Cpu::MathOps | CPU specialization of TensorOps for mathematical operations |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cuda > | |
| CMila::Dnn::Compute::Cuda::MathOps | CUDA specialization of TensorOps for mathematical operations |
| Cstd::pmr::memory_resource | |
| CMila::Dnn::Compute::MemoryResource | Clean memory resource abstraction for device-specific memory allocation |
| CMila::Dnn::Compute::CpuMemoryResource | CPU memory resource for host-accessible memory allocation |
| CMila::Dnn::Compute::CudaDeviceMemoryResource | CUDA device memory resource for GPU-accessible memory allocation |
| CMila::Dnn::Compute::CudaManagedMemoryResource | CUDA managed memory resource for unified host/device accessible memory |
| CMila::Dnn::Compute::CudaPinnedMemoryResource | CUDA pinned memory resource for fast host/device transfer memory |
| CMila::Dnn::Compute::MetalMemoryResource | Stub implementation for non-Apple platforms |
| CMila::Dnn::Compute::TrackedMemoryResource | A memory resource wrapper that tracks allocation and deallocation statistics |
| CMila::Dnn::Compute::VulkanMemoryResource | Stub implementation for platforms without Vulkan support |
| CMila::Dnn::Compute::MemoryResourceTraits< TMemoryResource > | Memory resource traits for compile-time dispatch optimization |
| CMila::Dnn::Compute::MemoryResourceTraits< CpuMemoryResource > | CPU-specific memory resource traits providing detailed CPU backend characteristics |
| CMila::Dnn::Compute::MemoryResourceTraits< CudaDeviceMemoryResource > | CUDA device memory resource traits providing detailed GPU backend characteristics |
| CMila::Dnn::Compute::MemoryResourceTraits< CudaManagedMemoryResource > | CUDA managed memory resource traits providing unified memory characteristics |
| CMila::Dnn::Compute::MemoryResourceTraits< CudaPinnedMemoryResource > | CUDA pinned memory resource traits providing fast transfer characteristics |
| CMila::Dnn::Compute::MemoryStats | Global memory statistics for all TrackedMemoryResource instances |
| CMila::Dnn::MemoryStats | Memory allocation breakdown for a single component |
| CMila::Dnn::Compute::MetalDevicePlugin | Metal device plugin for device-agnostic registration |
| CMila::Data::MilaFileHeader | Common file header for Mila data files |
| CMila::Dnn::Detail::mlp_activation_impl< TActivation, TDeviceType, TPrecision > | |
| CMila::Dnn::Detail::mlp_activation_impl< ActivationType::Gelu, TDeviceType, TPrecision > | |
| CMila::Dnn::Detail::mlp_activation_impl< ActivationType::Swiglu, TDeviceType, TPrecision > | |
| CMila::Dnn::Model< TDeviceType, TPrecision > | |
| CMila::Dnn::LanguageModel< TDeviceType, TPrecision > | |
| CMila::Dnn::GptModel< TDeviceType, TPrecision > | GPT inference model |
| CMila::Dnn::LlamaModel< TDeviceType, TPrecision > | LLaMA 3 compatible inference model |
| CMila::Dnn::Serialization::ModelArchive | ModelArchive provides high-level helpers for component serialization |
| CMila::Dnn::ModelConfig | Abstract base configuration for all deployable Mila models |
| CMila::Dnn::Visualization::ModuleVisualizer | |
| CMila::Dnn::Visualization::LayerNormVisualizer | |
| CMila::Dnn::Visualization::MLPVisualizer | |
| CMila::Dnn::MultiAxisPartition | Multi-axis partition for normalization over trailing dimensions |
| CMila::Dnn::NetworkFactory | Factory registry for Network deserialization |
| CMila::Dnn::Quant::KvCache::NoKvCompression | |
| CMila::Dnn::Quant::Weight::NoWeightQuant | |
| CMila::Profiling::NvtxRange | |
| CMila::Dnn::Compute::Operation< TDeviceType, TComputePrecision > | |
| CMila::Dnn::Compute::Operation< DeviceType::Cuda, TComputePrecision > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::FP32, NoWeightQuant > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::BF16, NoWeightQuant > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::BF16, PerChannelFp8<> > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::BF16, PerGroupInt4< 128 > > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::BF16, PerGroupInt4< 64 > > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::BF16, PerGroupFp4< 128 > > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TensorDataType::BF16, PerGroupFp4< 64 > > | |
| CMila::Dnn::Compute::Cuda::Linear::CudaLinearOp< TComputePrecision, TWeightQuant > | CUDA Linear operation with compile-time weight quantization policy dispatch |
| CMila::Dnn::Compute::Operation< DeviceType::Cuda, TPrecision > | |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Gqa::CudaGqaOp< TPrecision > | CUDA Grouped-Query Attention operation |
| CMila::Dnn::Compute::Operation< TDeviceType, TInput > | |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| CMila::Dnn::Compute::CpuAttentionOp | CPU implementation of Multi-Head Attention operation |
| CMila::Dnn::Compute::CpuGeluOp | CPU implementation of GELU activation operation using abstract TensorDataType |
| CMila::Dnn::Compute::CpuLayerNormOp | CPU implementation of Layer Normalization using abstract TensorDataType API |
| CMila::Dnn::Compute::CpuLinearOp | CPU implementation of Linear operation using abstract TensorDataType API |
| CMila::Dnn::Compute::CpuSoftmaxOp | CPU implementation of Softmax using abstract TensorDataType API |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TPrecision > | |
| CMila::Dnn::Compute::Cuda::Gelu::CudaGeluOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Gelu::CudaGeluOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::RmsNorm::CudaRmsNormOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::RmsNorm::CudaRmsNormOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Swiglu::CudaSwigluOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Swiglu::CudaSwigluOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Gelu::CudaGeluOp< TPrecision > | CUDA implementation of the GELU activation function for neural networks |
| CMila::Dnn::Compute::Cuda::LayerNorm::CudaLayerNormOp< TPrecision > | CUDA implementation of Layer Normalization |
| CMila::Dnn::Compute::Cuda::MultiHeadAttention::CudaMultiHeadAttentionOp< TPrecision > | CUDA implementation of Multi-Head Attention using column-major cuBLASLt optimization |
| CMila::Dnn::Compute::Cuda::RmsNorm::CudaRmsNormOp< TPrecision > | CUDA implementation of RMS Normalization |
| CMila::Dnn::Compute::Cuda::Softmax::CudaSoftmaxOp< TPrecision > | CUDA implementation of Softmax operation using abstract TensorDataType API |
| CMila::Dnn::Compute::Cuda::Swiglu::CudaSwigluOp< TPrecision > | |
| CMila::Dnn::Compute::UnaryOperation< TDeviceType, TPrecision > | |
| CMila::Dnn::Compute::UnaryOperation< TDeviceType, TInput, TPrecision > | |
| CMila::Dnn::Compute::Operation< TDeviceType, TPrecision > | |
| CMila::Dnn::Compute::BinaryOperation< DeviceType::Cpu, TPrecision, TLogits, TTargets > | |
| CMila::Dnn::Compute::BinaryOperation< DeviceType::Cuda, TPrecision, TLogits, TTargets > | |
| CMila::Dnn::Compute::BinaryOperation< DeviceType::Cpu, TensorDataType::FP32 > | |
| CMila::Dnn::Compute::CpuResidualOp | CPU Residual operation (FP32) implementing BinaryOperation interface |
| CMila::Dnn::Compute::BinaryOperation< DeviceType::Cpu, TensorDataType::FP32, TensorDataType::FP32, TensorDataType::INT32 > | |
| CMila::Dnn::Compute::CpuSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets > | Fused CPU implementation of Softmax + CrossEntropy using abstract TensorDataType API |
| CMila::Dnn::Compute::BinaryOperation< DeviceType::Cuda, TInputA, TInputA, TInputA > | |
| CMila::Dnn::Compute::Cuda::Residual::CudaResidualOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Residual::CudaResidualOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Residual::CudaResidualOp< TInputA, TInputB, TPrecision > | CUDA Residual operation implementing the BinaryOperation interface |
| CMila::Dnn::Compute::BinaryOperation< DeviceType::Cuda, TPrecision, TPrecision, TensorDataType::INT32 > | |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::CudaSoftmaxCrossEntropyOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::CudaSoftmaxCrossEntropyOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::SoftmaxCrossEntropy::CudaSoftmaxCrossEntropyOp< TPrecision, TLogits, TTargets > | Fused CUDA implementation of Softmax + CrossEntropy using abstract TensorDataType API |
| CMila::Dnn::Compute::PairedOperation< DeviceType::Cuda, TComputePrecision > | |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOp< TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOp< TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::Rope::CudaRopeOp< TComputePrecision > | CUDA implementation of the Rope (rotary positional embedding) operation |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, int, float > | |
| CMila::Dnn::Compute::CpuCrossEntropyOp | CPU implementation of the cross entropy loss operation for neural networks |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cpu, TensorDataType::INT32, TensorDataType::FP32 > | |
| CMila::Dnn::Compute::CpuEncoderOp | CPU implementation of the Encoder operation |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TInput, TPrecision > | |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOp< TensorDataType::INT32, TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOp< TensorDataType::INT32, TensorDataType::BF16 > | |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::CudaTokenEmbeddingOp< TensorDataType::INT32, TensorDataType::FP32 > | |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::CudaTokenEmbeddingOp< TensorDataType::INT32, TensorDataType::BF16 > | |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TInput, TOutput > | |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, TInput, TInput > | |
| CMila::Dnn::Compute::Cuda::Lpe::CudaLpeOp< TInput, TPrecision > | CUDA implementation of the Lpe (token + positional embedding) operation |
| CMila::Dnn::Compute::Cuda::TokenEmbedding::CudaTokenEmbeddingOp< TInput, TPrecision > | |
| CMila::Dnn::Compute::UnaryOperation< DeviceType::Cuda, float, float > | |
| CMila::Dnn::Compute::Cuda::MatMulBiasGelu::CudaMatMulBiasGeluOp< TInput, TOutput > | CUDA implementation of the fused MatMul-Bias-GELU operation |
| CMila::Dnn::Compute::BinaryOperation< TDeviceType, TPrecision, TInputA, TInputB > | |
| CMila::Dnn::Compute::PairedOperation< TDeviceType, TPrecision, TInputA, TInputB > | Abstract base for paired operations: two inputs -> two outputs |
| CMila::Dnn::Compute::OperationRegistry | Central registry for typed, device-aware compute operations |
| CMila::Dnn::Compute::OperationsRegistrar | Class to manage compute operations initialization |
| CMila::Dnn::Compute::OperationTraits< TOp, TDeviceType, TPrecision, TPolicy > | Primary traits template for unified compile-time operation dispatch |
| CMila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::CrossEntropyOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cpu, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::GeluOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::BF16, NoKvCompression > | Unquantized BF16 path. No KV cache compression. Standard inference precision |
| CMila::Dnn::Compute::OperationTraits< OperationType::GroupedQueryAttentionOp, DeviceType::Cuda, TensorDataType::FP32, NoKvCompression > | Unquantized FP32 path. No KV cache compression |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cpu, TensorDataType::FP32, NoWeightQuant > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, NoWeightQuant > | Unquantized BF16 path. Standard inference precision |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerChannelFp8<> > | FP8 per-channel quantized BF16 path. Requires SM >= 8.0 (Ampere+) |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 128 > > | FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=128. Requires SM >= 8.0 |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupFp4< 64 > > | FP4 E2M1 per-group quantized BF16 path. W4A16 fused GEMM with E2M1 decode, group_size=64. Requires SM >= 8.0 |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 128 > > | INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=128. Requires SM >= 8.0 |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::BF16, PerGroupInt4< 64 > > | INT4 per-group quantized BF16 path. W4A16 fused GEMM, group_size=64. Requires SM >= 8.0 |
| CMila::Dnn::Compute::OperationTraits< OperationType::LinearOp, DeviceType::Cuda, TensorDataType::FP32, NoWeightQuant > | Unquantized FP32 path. Retained for validation and reference |
| CMila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cpu, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::LpeOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cpu, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::MultiHeadAttentionOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cpu, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::ResidualOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::RmsNormOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::RopeOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cpu, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::SoftmaxOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::SwigluOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::BF16, void > | |
| CMila::Dnn::Compute::OperationTraits< OperationType::TokenEmbeddingOp, DeviceType::Cuda, TensorDataType::FP32, void > | |
| CMila::Dnn::Compute::Optimizer< TDeviceType, TPrecision > | Abstract base class for parameter optimizers |
| CMila::Dnn::Optimizers::AdamWOptimizer< TDeviceType, TPrecision > | Device-agnostic AdamW optimizer |
| CMila::Dnn::Compute::Optimizer< DeviceType::Cpu, TPrecision > | |
| CMila::Dnn::Compute::CpuAdamWOptimizer< TPrecision > | CPU-specific AdamW optimizer implementation |
| CMila::Dnn::Compute::Optimizer< DeviceType::Cuda, TPrecision > | |
| CMila::Dnn::Compute::CudaAdamWOptimizer< TPrecision > | CUDA-specific AdamW optimizer implementation |
| CMila::Data::BpeVocabulary::PairHash | |
| CMila::Data::BpeVocabulary::PairViewHash | |
| CMila::Dnn::Quant::Weight::PerChannelFp8< TStorage > | |
| CMila::Dnn::Quant::KvCache::PerChannelKvFp8< TStorage > | Symmetric per-head per-token FP8 KV cache compression policy |
| CMila::Dnn::Quant::Weight::PerGroupFp4< kGroupSize > | |
| CMila::Dnn::Quant::Weight::PerGroupInt4< kGroupSize > | |
| CMila::Dnn::Extensibility::PluginManager::PluginEntry | |
| CMila::Dnn::Extensibility::PluginInfo | Get plugin metadata |
| CMila::Dnn::Extensibility::PluginManager | Manages loading and querying of module plugins |
| CMila::Dnn::Serialization::PretrainedMetadata | Metadata for pretrained model |
| CMila::Dnn::Serialization::PretrainedModelReader | Reader for Mila pretrained binary format |
| CMila::Core::RandomGenerator | Singleton class providing centralized random number generation |
| CMila::Dnn::Compute::Cuda::RandomOps | |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cuda > | |
| CMila::Dnn::Visualization::Rect | |
| CMila::Dnn::Visualization::RGB | |
| CMila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry | Process-wide shared cache for RoPE cos/sin frequency tables |
| Cstd::runtime_error | |
| CCudaException | |
| CMila::Dnn::Compute::CublasLtError | |
| CMila::Dnn::Compute::CudaError | Exception class for CUDA runtime errors |
| CMila::Dnn::Serialization::ModelArchive::ScopedScope | |
| CMila::Data::SerializationMetadata | Type-safe metadata container for component serialization |
| CMila::Dnn::Optimizers::SerializationMetadata | Type-safe metadata container for component serialization |
| CMila::Dnn::Serialization::SerializationMetadata | Type-safe metadata container for component serialization |
| CMila::Dnn::SerializationMetadata | Type-safe metadata container for component serialization |
| CMila::Dnn::Serialization::Serializer | Minimal base interface for model serialization backends |
| CMila::Dnn::Serialization::ArchiveSerializer | Interface for hierarchical archive serializers |
| CMila::Dnn::Serialization::ZipSerializer | ZIP archive serializer built on miniz |
| CMila::Data::SpecialTokens | Configuration for special tokens across all tokenizer types |
| CMila::Utils::StepLogger | |
| CMila::Dnn::Compute::Cuda::StructuralOps | |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cuda > | |
| CMila::Dnn::Serialization::TensorBlobMetadata | Metadata for a tensor blob in pretrained model format |
| CMila::Dnn::TensorBuffer< TDataType, TMemoryResource, TrackMemory > | Device-agnostic buffer for storing tensor data with abstract type system |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TDataType > | Compile-time mapping from abstract TensorDataType -> CUDA native device type |
| CMila::Dnn::TensorDataTypeMap< TElementType > | Primary template for mapping concrete C++ types to TensorDataType |
| CMila::Dnn::TensorDataTypeMap< __nv_fp8_e4m3 > | |
| CMila::Dnn::TensorDataTypeMap< __nv_fp8_e5m2 > | |
| CMila::Dnn::TensorDataTypeMap< float > | Concrete type mapping for float (FP32) |
| CMila::Dnn::TensorDataTypeMap< half > | |
| CMila::Dnn::TensorDataTypeMap< nv_bfloat16 > | |
| CMila::Dnn::TensorDataTypeMap< std::int16_t > | Concrete type mapping for 16-bit signed integer |
| CMila::Dnn::TensorDataTypeMap< std::int32_t > | Concrete type mapping for 32-bit signed integer |
| CMila::Dnn::TensorDataTypeMap< std::int8_t > | Concrete type mapping for 8-bit signed integer |
| CMila::Dnn::TensorDataTypeMap< std::uint16_t > | Concrete type mapping for 16-bit unsigned integer |
| CMila::Dnn::TensorDataTypeMap< std::uint32_t > | Concrete type mapping for 32-bit unsigned integer |
| CMila::Dnn::TensorDataTypeMap< std::uint8_t > | Concrete type mapping for 8-bit unsigned integer |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::BF16 > | Maps TensorDataType::BF16 to CUDA __nv_bfloat16 |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP16 > | Maps TensorDataType::FP16 to CUDA __half |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP32 > | Maps TensorDataType::FP32 to CUDA float |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP4_E2M1 > | Maps TensorDataType::FP4_E2M1 to std::uint8_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP4_E3M0 > | Maps TensorDataType::FP4_E3M0 to std::uint8_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP8_E4M3 > | Maps TensorDataType::FP8_E4M3 to CUDA __nv_fp8_e4m3 |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::FP8_E5M2 > | Maps TensorDataType::FP8_E5M2 to CUDA __nv_fp8_e5m2 |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::INT16 > | Maps TensorDataType::INT16 to std::int16_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::INT32 > | Maps TensorDataType::INT32 to std::int32_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::INT8 > | Maps TensorDataType::INT8 to std::int8_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::UINT16 > | Maps TensorDataType::UINT16 to std::uint16_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::UINT32 > | Maps TensorDataType::UINT32 to std::uint32_t |
| CMila::Dnn::Compute::Cuda::TensorDataTypeMap< TensorDataType::UINT8 > | Maps TensorDataType::UINT8 to std::uint8_t |
| CMila::Dnn::TensorDataTypeTraits< TDataType > | Compile-time traits for TensorDataType enumeration values |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::BF16 > | Traits specialization for 16-bit brain floating point |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::FP16 > | Traits specialization for 16-bit half precision floating point |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::FP32 > | Traits specialization for 32-bit IEEE 754 floating point |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::FP4_E2M1 > | Traits specialization for 4-bit floating point with E2M1 format |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::FP4_E3M0 > | Traits specialization for 4-bit floating point with E3M0 format |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::FP8_E4M3 > | Traits specialization for 8-bit floating point with E4M3 format |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::FP8_E5M2 > | Traits specialization for 8-bit floating point with E5M2 format |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::INT16 > | Traits specialization for 16-bit signed integer |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::INT32 > | Traits specialization for 32-bit signed integer |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::INT8 > | Traits specialization for 8-bit signed integer |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::UINT16 > | Traits specialization for 16-bit unsigned integer |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::UINT32 > | Traits specialization for 32-bit unsigned integer |
| CMila::Dnn::TensorDataTypeTraits< TensorDataType::UINT8 > | Traits specialization for 8-bit unsigned integer |
| CMila::Dnn::TensorHostTypeMap< TDataType > | Maps abstract TensorDataType to host-compatible C++ type and TensorDataType |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::BF16 > | Host type for 16-bit brain floating point |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::FP16 > | Host type for 16-bit half precision floating point |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::FP32 > | Host type for 32-bit IEEE 754 floating point |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::FP8_E4M3 > | Host type for 8-bit floating point with E4M3 format |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::FP8_E5M2 > | Host type for 8-bit floating point with E5M2 format |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::INT16 > | Host type for 16-bit signed integer |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::INT32 > | Host type for 32-bit signed integer |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::INT8 > | Host type for 8-bit signed integer |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::UINT16 > | Host type for 16-bit unsigned integer |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::UINT32 > | Host type for 32-bit unsigned integer |
| CMila::Dnn::TensorHostTypeMap< TensorDataType::UINT8 > | Host type for 8-bit unsigned integer |
| CMila::Dnn::Serialization::TensorMetadata | Metadata describing a tensor in serialized form |
| CMila::Dnn::TensorOps< TDevice > | Device-dispatched TensorOps interface template |
| CMila::Dnn::TensorShape | Fixed-capacity inline shape descriptor for N-dimensional tensors |
| CMila::Data::Tokenizer | |
| CMila::Data::BpeTokenizer | Unified BPE tokenizer targeting GPT-2, Llama 3.x, and Mistral model families |
| CMila::Data::CharTokenizer | Character-level tokenizer |
| CMila::Data::TokenizerTrainer | Abstract interface for training tokenizer vocabularies from text corpora |
| CMila::Data::TokenizerVocabulary | Generic tokenizer vocabulary interface |
| CMila::Data::BpeVocabulary | Unified Byte Pair Encoding (BPE) vocabulary |
| CMila::Data::CharVocabulary | Character vocabulary for tokenization |
| CMila::Data::TokenSequenceLoaderConfig | Configuration for StreamingSequenceLoader behavior |
| CMila::Data::TrainerFactory | Factory for creating tokenizer trainers and loading vocabularies |
| CMila::Dnn::Compute::Cpu::TransferOps | CPU specialization of TensorOps for transfer operations |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cpu > | |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cuda > | |
| CMila::Dnn::Compute::Cuda::TransferOps | CUDA specialization of TensorOps for tensor transfer operations |
| CMila::Dnn::Compute::OperationRegistry::TypeID | Composite key for registry lookup |
| CMila::Dnn::Compute::OperationRegistry::TypeIDHash | |
| CMila::Dnn::UniqueIdGenerator | Thread-safe generator for unique tensor identifiers |
| CMila::Version | Semantic Version data |
| CMila::Dnn::Visualization::VisualizerContext | |
| CMila::Dnn::VulkanTensorTraits | Vulkan-specific traits for abstract tensor data types |
| CMila::Dnn::Compute::Cpu::ZeroOps | |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cpu > | |
| CMila::Dnn::TensorOps< Compute::DeviceType::Cuda > | |
| CMila::Dnn::Compute::Cuda::ZeroOps | |