Mila 0.13.48
Deep Neural Network Library
Namespaces | |
| namespace | Compute |
| namespace | Detail |
| namespace | detail |
| namespace | Extensibility |
| namespace | Optimizers |
| namespace | Quant |
| namespace | Serialization |
| namespace | Visualization |
Classes | |
| struct | AxisPartition |
| Information about axis partitioning of a tensor. More... | |
| class | BufferedTokenStreamer |
| Buffers BufSize tokens before forwarding a contiguous span to Sink. More... | |
| class | BuildContext |
| Build-time context for Component::build(). More... | |
| class | Component |
| Abstract base class for neural network components. More... | |
| class | ComponentConfig |
| Abstract base for component configuration objects. More... | |
| class | ComponentFactory |
| Factory for reconstructing components from serialized archives. More... | |
| class | CompositeComponent |
| A component that contains and manages child components. More... | |
| class | ConstantLRScheduler |
| Constant learning-rate scheduler. More... | |
| class | CosineLRScheduler |
| Cosine annealing scheduler. More... | |
| class | CpuTensorDataTypeTraits |
| CPU-specific traits for abstract tensor data types. More... | |
| class | CrossEntropyConfig |
| Configuration for fused SoftmaxCrossEntropy loss. More... | |
| struct | dependent_false |
| class | Dropout |
| Dropout regularization module for neural networks. More... | |
| class | DropoutConfig |
| Configuration class for Dropout module. More... | |
| class | FusedComponent |
| DEPRECATED. More... | |
| class | Gelu |
| Gaussian Error Linear Unit (GELU) activation component. More... | |
| class | GeluConfig |
| Configuration class for GELU module. More... | |
| struct | GenerateParams |
| struct | GenerationStatistics |
| Statistics captured during a single generateStreaming() call. More... | |
| class | GptBlock |
| Transformer encoder block as a composite component. More... | |
| class | GptBlockConfig |
| Configuration class for GPT transformer blocks. More... | |
| class | GptConfig |
| Network-level configuration for GPT-style transformer networks. More... | |
| class | GptModel |
| GPT inference model. More... | |
| class | GptTransformer |
| GPT-2 style transformer (decoder-only) for autoregressive token prediction. More... | |
| class | GqaConfig |
| Configuration class for the Grouped-Query Attention module. More... | |
| class | GroupedQueryAttention |
| Grouped-Query Attention module that accepts concatenated QKV input. More... | |
| class | ITensor |
| Abstract interface providing essential tensor information and data access. More... | |
| class | LanguageModel |
| struct | LanguageModelConfig |
| CRTP base configuration for all deployable Mila language models. More... | |
| class | LanguageNetwork |
| class | LayerNorm |
| Device-templated Layer Normalization component. More... | |
| class | LayerNormConfig |
| class | LearningRateScheduler |
| Abstract base for learning-rate schedulers. More... | |
| class | Linear |
| Device-templated fully connected (linear) component. More... | |
| class | LinearConfig |
| Configuration object for a Linear (fully connected) layer. More... | |
| class | LinearLRScheduler |
| Linear decay scheduler. More... | |
| class | LlamaBlock |
| class | LlamaConfig |
| Network-level configuration for LLaMA-style transformer networks. More... | |
| class | LlamaModel |
| LLaMA 3 compatible inference model. More... | |
| struct | LlamaModelConfig |
| Deployment configuration for Llama language models. More... | |
| class | LlamaTransformer |
| LLaMA-style transformer (decoder-only) for autoregressive token prediction. More... | |
| class | Loss |
| Abstract base class for neural network loss functions. More... | |
| class | Lpe |
| Encoder module for token and positional embeddings (device-templated). More... | |
| class | LpeConfig |
| Configuration class for the Learned Positional Encoder. More... | |
| struct | MemoryStats |
| Memory allocation breakdown for a single component. More... | |
| class | MLP |
| Multi-Layer Perceptron (MLP) composite component. More... | |
| class | MLPConfig |
| Configuration class for the Multi-Layer Perceptron (MLP) block. More... | |
| class | Model |
| class | ModelConfig |
| Abstract base configuration for all deployable Mila models. More... | |
| struct | MultiAxisPartition |
| Multi-axis partition for normalization over trailing dimensions. More... | |
| class | MultiHeadAttention |
| Multi-Head Attention module that accepts concatenated QKV input. More... | |
| class | MultiHeadAttentionConfig |
| Configuration class for Attention module. More... | |
| class | Network |
| Root composite network container. More... | |
| class | NetworkFactory |
| Factory registry for Network deserialization. More... | |
| class | Residual |
| Device-templated Residual connection component. More... | |
| class | ResidualConfig |
| Configuration class for Residual connection component. More... | |
| class | RmsNorm |
| Device-templated RMS Normalization component. More... | |
| class | RmsNormConfig |
| class | Rope |
| Device-templated RoPE component. More... | |
| class | RopeConfig |
| class | SerializationMetadata |
| Type-safe metadata container for component serialization. More... | |
| class | Softmax |
| Softmax activation module (device-templated). More... | |
| class | SoftmaxConfig |
| Configuration class for Softmax module. More... | |
| class | SoftmaxCrossEntropy |
| Fused SoftmaxCrossEntropy loss module (device-templated). More... | |
| class | Swiglu |
| SwiGLU activation component. More... | |
| class | SwigluConfig |
| class | Tensor |
| Device-aware N-dimensional tensor. More... | |
| class | TensorBuffer |
| Device-agnostic buffer for storing tensor data with abstract type system. More... | |
| struct | TensorDataTypeMap |
| Primary template for mapping concrete C++ types to TensorDataType. More... | |
| struct | TensorDataTypeMap< __nv_fp8_e4m3 > |
| struct | TensorDataTypeMap< __nv_fp8_e5m2 > |
| struct | TensorDataTypeMap< float > |
| Concrete type mapping for float (FP32). More... | |
| struct | TensorDataTypeMap< half > |
| struct | TensorDataTypeMap< nv_bfloat16 > |
| struct | TensorDataTypeMap< std::int16_t > |
| Concrete type mapping for 16-bit signed integer. More... | |
| struct | TensorDataTypeMap< std::int32_t > |
| Concrete type mapping for 32-bit signed integer. More... | |
| struct | TensorDataTypeMap< std::int8_t > |
| Concrete type mapping for 8-bit signed integer. More... | |
| struct | TensorDataTypeMap< std::uint16_t > |
| Concrete type mapping for 16-bit unsigned integer. More... | |
| struct | TensorDataTypeMap< std::uint32_t > |
| Concrete type mapping for 32-bit unsigned integer. More... | |
| struct | TensorDataTypeMap< std::uint8_t > |
| Concrete type mapping for 8-bit unsigned integer. More... | |
| struct | TensorDataTypeTraits |
| Compile-time traits for TensorDataType enumeration values. More... | |
| struct | TensorDataTypeTraits< TensorDataType::BF16 > |
| Traits specialization for 16-bit brain floating point. More... | |
| struct | TensorDataTypeTraits< TensorDataType::FP16 > |
| Traits specialization for 16-bit half precision floating point. More... | |
| struct | TensorDataTypeTraits< TensorDataType::FP32 > |
| Traits specialization for 32-bit IEEE 754 floating point. More... | |
| struct | TensorDataTypeTraits< TensorDataType::FP4_E2M1 > |
| Traits specialization for 4-bit floating point with E2M1 format. More... | |
| struct | TensorDataTypeTraits< TensorDataType::FP4_E3M0 > |
| Traits specialization for 4-bit floating point with E3M0 format. More... | |
| struct | TensorDataTypeTraits< TensorDataType::FP8_E4M3 > |
| Traits specialization for 8-bit floating point with E4M3 format. More... | |
| struct | TensorDataTypeTraits< TensorDataType::FP8_E5M2 > |
| Traits specialization for 8-bit floating point with E5M2 format. More... | |
| struct | TensorDataTypeTraits< TensorDataType::INT16 > |
| Traits specialization for 16-bit signed integer. More... | |
| struct | TensorDataTypeTraits< TensorDataType::INT32 > |
| Traits specialization for 32-bit signed integer. More... | |
| struct | TensorDataTypeTraits< TensorDataType::INT8 > |
| Traits specialization for 8-bit signed integer. More... | |
| struct | TensorDataTypeTraits< TensorDataType::UINT16 > |
| Traits specialization for 16-bit unsigned integer. More... | |
| struct | TensorDataTypeTraits< TensorDataType::UINT32 > |
| Traits specialization for 32-bit unsigned integer. More... | |
| struct | TensorDataTypeTraits< TensorDataType::UINT8 > |
| Traits specialization for 8-bit unsigned integer. More... | |
| struct | TensorHostTypeMap |
| Maps abstract TensorDataType to host-compatible C++ type and TensorDataType. More... | |
| struct | TensorHostTypeMap< TensorDataType::BF16 > |
| Host type for 16-bit brain floating point. More... | |
| struct | TensorHostTypeMap< TensorDataType::FP16 > |
| Host type for 16-bit half precision floating point. More... | |
| struct | TensorHostTypeMap< TensorDataType::FP32 > |
| Host type for 32-bit IEEE 754 floating point. More... | |
| struct | TensorHostTypeMap< TensorDataType::FP8_E4M3 > |
| Host type for 8-bit floating point with E4M3 format. More... | |
| struct | TensorHostTypeMap< TensorDataType::FP8_E5M2 > |
| Host type for 8-bit floating point with E5M2 format. More... | |
| struct | TensorHostTypeMap< TensorDataType::INT16 > |
| Host type for 16-bit signed integer. More... | |
| struct | TensorHostTypeMap< TensorDataType::INT32 > |
| Host type for 32-bit signed integer. More... | |
| struct | TensorHostTypeMap< TensorDataType::INT8 > |
| Host type for 8-bit signed integer. More... | |
| struct | TensorHostTypeMap< TensorDataType::UINT16 > |
| Host type for 16-bit unsigned integer. More... | |
| struct | TensorHostTypeMap< TensorDataType::UINT32 > |
| Host type for 32-bit unsigned integer. More... | |
| struct | TensorHostTypeMap< TensorDataType::UINT8 > |
| Host type for 8-bit unsigned integer. More... | |
| struct | TensorOps |
| Device-dispatched TensorOps interface template. More... | |
| struct | TensorOps< Compute::DeviceType::Cpu > |
| struct | TensorOps< Compute::DeviceType::Cuda > |
| struct | TensorShape |
| Fixed-capacity inline shape descriptor for N-dimensional tensors. More... | |
| class | TokenEmbedding |
| Pure token embedding component (device-templated). More... | |
| class | TokenEmbeddingConfig |
| Configuration for the TokenEmbedding component. More... | |
| class | UniqueIdGenerator |
| Thread-safe generator for unique tensor identifiers. More... | |
| class | VulkanTensorTraits |
| Vulkan-specific traits for abstract tensor data types. More... | |
Concepts | |
| concept | DeviceOnlyTensorDataType |
| Concept identifying device-only abstract data types. | |
| concept | HostCompatibleTensorDataType |
| Concept identifying host-compatible abstract data types. | |
| concept | isValidTensor |
| Primary tensor configuration validation concept. | |
| concept | PrecisionSupportedOnDevice |
| Concept to validate precision is supported on a device at compile-time. | |
| concept | TokenSink |
| Satisfied by any callable accepting a span of decoded tokens. | |
| concept | TokenStreamer |
| Satisfied by any callable accepting a single decoded token. | |
| concept | ValidFloatTensorDataType |
| Concept constraining abstract data types to floating-point formats. | |
| concept | ValidIntegerTensorDataType |
| Concept constraining abstract data types to integer formats. | |
Typedefs | |
| template<typename TInput = float, typename TOutput = TInput> | |
| using | Mila::Dnn::CpuDropout = Dropout<DeviceType::Cpu, TInput, TOutput> |
| Type alias for CPU-based dropout module with customizable tensor types. | |
| template<typename TInput = float, typename TOutput = TInput> | |
| using | Mila::Dnn::CudaDropout = Dropout<DeviceType::Cuda, TInput, TOutput> |
| Type alias for CUDA-based dropout module with customizable tensor types. | |
| using | Mila::Dnn::dim_t = int64_t |
| Integer type used for tensor dimensions and indices. | |
| using | Mila::Dnn::dtype_t = TensorDataType |
| Alias for TensorDataType enumeration. | |
| template<TensorDataType TDataType> | |
| using | Mila::Dnn::host_type_t = typename TensorHostTypeMap<TDataType>::host_type |
| Convenience alias for accessing host type mapping. | |
| template<TensorDataType TDataType> | |
| using | Mila::Dnn::host_value_t = std::conditional_t<TensorDataTypeTraits<TDataType>::is_integer_type, int32_t, float> |
| Host value type for given abstract tensor data type. | |
| template<TensorDataType TDataType> | |
| using | Mila::Dnn::HostTensor = Tensor<TDataType, Compute::CpuMemoryResource> |
| Host tensor alias. | |
| using | Mila::Dnn::index_t = TensorShape |
| Index descriptor for multi-dimensional element access. | |
| using | Mila::Dnn::json = nlohmann::json |
| using | Mila::Dnn::shape_t = TensorShape |
| Row-major shape descriptor for tensor dimensional sizes. | |
| using | Mila::Dnn::stride_t = TensorShape |
| Stride descriptor (in elements) for each tensor dimension, row-major layout. | |
| using | TokenId = Data::TokenId |
Enumerations | |
| enum class | Mila::Dnn::ActivationType { None , Relu , Gelu , Silu , Swiglu , Tanh , Sigmoid , LeakyRelu , Mish } |
| Enumeration of supported activation function types. More... | |
| enum class | Mila::Dnn::ApproximationMethod { Exact , Tanh , Sigmoid } |
| Approximation methods usable by activation functions. More... | |
| enum class | Mila::Dnn::AttentionType { MultiHead , GroupedQuery , MultiQuery } |
| Enumeration of supported attention mechanism types. More... | |
| enum class | Mila::Dnn::ComponentType : int { Unknown = 0 , Linear , Gelu , Swiglu , LayerNorm , RmsNorm , Softmax , Dropout , MultiHeadAttention , GroupedQueryAttention , Residual , TokenEmbedding , Lpe , Rope , SoftmaxCrossEntropy , Mlp , Transformer , Network , Gpt2 , Llama , Mistral , Bert , CustomComponentStart = 1000 , MockComponent = CustomComponentStart } |
| Canonical list of framework-known component types. More... | |
| enum class | Mila::Dnn::ConnectionType { Addition } |
| Connection types supported by residual and skip-connection components. More... | |
| enum class | Mila::Dnn::EncodingType { Learned , RoPE , ALiBi } |
| Positional encoding strategies. More... | |
| enum class | Mila::Dnn::KvCacheCompression { None , FP8 } |
| KV cache storage and compression strategy for GroupedQueryAttention. More... | |
| enum class | Mila::Dnn::NormType { LayerNorm , RMSNorm } |
| Normalization type selection. More... | |
| enum class | Mila::Dnn::RuntimeMode : uint8_t { Inference , Training } |
| Runtime mode governing Model API and Network build policy. More... | |
| enum class | Mila::Dnn::TensorDataType { FP32 , FP16 , BF16 , FP8_E4M3 , FP8_E5M2 , FP4_E2M1 , FP4_E3M0 , INT8 , INT16 , INT32 , UINT8 , UINT16 , UINT32 } |
| Enumeration of supported abstract tensor data types. More... | |
| enum class | Mila::Dnn::TrainingMode : uint8_t { Normal , Eval } |
| Runtime behavioral state for Components built with RuntimeMode::Training. More... | |
| enum class | Mila::Dnn::WeightQuantization { None , FP8 , FP4 } |
| Weight storage and matmul strategy for Linear components. More... | |
Functions | |
| std::string | Mila::Dnn::activationTypeToString (ActivationType type) |
| Converts an ActivationType enum value to its string representation. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::add (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr) |
| Element-wise addition with optional ExecutionContext (device-dispatched). | |
| constexpr std::string_view | Mila::Dnn::ApproximationMethodToString (ApproximationMethod m) noexcept |
| Convert ApproximationMethod to a short string. | |
| std::string | Mila::Dnn::attentionTypeToString (AttentionType t) |
| Convert AttentionType to string. | |
| AxisPartition | Mila::Dnn::computeAxisPartition (const shape_t &shape, dim_t axis, const char *op_name="Operation") |
| Normalize and validate an axis, then compute partition sizes. | |
| MultiAxisPartition | Mila::Dnn::computeNormalizedShapePartition (const shape_t &shape, const shape_t &normalized_shape, const char *op_name="Operation") |
| Compute partition for normalization over trailing dimensions. | |
| int64_t | Mila::Dnn::computeNumElements (const shape_t &shape) |
| Compute total number of elements in a tensor shape. | |
| int64_t | Mila::Dnn::computePrefillChunkSize (int64_t batch, int64_t num_heads, int64_t head_dim, int64_t context_length, int64_t precision_bytes) |
| std::string | Mila::Dnn::connectionTypeToString (ConnectionType type) |
| Converts a ConnectionType enum value to its string representation. | |
| template<TensorDataType TSrcDataType, typename TSrcMemoryResource, TensorDataType TDstDataType, typename TDstMemoryResource> requires isValidTensor<TSrcDataType, TSrcMemoryResource> && isValidTensor<TDstDataType, TDstMemoryResource> | |
| void | Mila::Dnn::copy (const Tensor< TSrcDataType, TSrcMemoryResource > &src, Tensor< TDstDataType, TDstMemoryResource > &dst, IExecutionContext *exec_context=nullptr) |
| Copies tensor data from source to destination tensor with optional ExecutionContext. | |
| template<TensorDataType TDstDataType, typename TDstMemoryResource> requires isValidTensor<TDstDataType, TDstMemoryResource> | |
| void | Mila::Dnn::copyFromBlob (const Serialization::ITensorBlob &blob, Tensor< TDstDataType, TDstMemoryResource > &dst, IExecutionContext *exec_context=nullptr) |
| template<TensorDataType TSrcDataType, TensorDataType TDstDataType, typename TDstMemoryResource> requires isValidTensor<TDstDataType, TDstMemoryResource> | |
| void | Mila::Dnn::copyFromBlobWithConversion (const Serialization::ITensorBlob &blob, Tensor< TDstDataType, TDstMemoryResource > &dst, IExecutionContext *exec_context=nullptr) |
| Copy a serialized blob into a destination tensor, converting element types. | |
| template<TensorDataType TPrecision, typename MemoryResource> | |
| void | Mila::Dnn::debugDumpTensor (const ITensor &t, const std::string &label, size_t maxElements=8) |
| Debug dump a concrete tensor to the log. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::divide (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr) |
| Element-wise division with optional ExecutionContext (device-dispatched). | |
| std::string | Mila::Dnn::encodingTypeToString (EncodingType p) |
| Convert EncodingType to string. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::fill (Tensor< TDataType, TMemoryResource > &tensor, host_value_t< TDataType > host_value, IExecutionContext *exec_context=nullptr) |
| Fill a tensor with a scalar host value (device-dispatched) with optional ExecutionContext. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::fill (Tensor< TDataType, TMemoryResource > &tensor, std::span< const host_value_t< TDataType > > host_values, IExecutionContext *exec_context=nullptr) |
| Copy host values into a tensor with device dispatch and optional ExecutionContext. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::fill_normal (Tensor< TDataType, TMemoryResource > &tensor, float mean, float stddev, IExecutionContext *exec_context=nullptr) |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::fill_uniform (Tensor< TDataType, TMemoryResource > &tensor, host_value_t< TDataType > min_val, host_value_t< TDataType > max_val, IExecutionContext *exec_context=nullptr) |
| ComponentType | Mila::Dnn::fromString (std::string_view s) noexcept |
| Parse a case-insensitive component name into a ComponentType. | |
| ComponentType | Mila::Dnn::fromTypeId (std::string_view s) noexcept |
| Map a short type identifier back to a ComponentType enum. | |
| GptConfig | Mila::Dnn::GPT2_Large () |
| GPT-2 Large (774M parameters). | |
| GptConfig | Mila::Dnn::GPT2_Medium () |
| GPT-2 Medium (345M parameters). | |
| GptConfig | Mila::Dnn::GPT2_Small () |
| GPT-2 Small. | |
| GptConfig | Mila::Dnn::GPT2_XL () |
| GPT-2 XL (1.5B parameters). | |
| std::string | Mila::Dnn::indexToString (const index_t &index) |
| LlamaConfig | Mila::Dnn::Llama2_13B () |
| Llama 2 13B. | |
| LlamaConfig | Mila::Dnn::Llama2_70B () |
| Llama 2 70B. | |
| LlamaConfig | Mila::Dnn::Llama2_7B () |
| Llama 2 7B. | |
| LlamaConfig | Mila::Dnn::Llama3_1_405B () |
| Llama 3.1 405B. | |
| LlamaConfig | Mila::Dnn::Llama3_1_70B () |
| Llama 3.1 70B. | |
| LlamaConfig | Mila::Dnn::Llama3_1_8B () |
| Llama 3.1 8B. | |
| LlamaConfig | Mila::Dnn::Llama3_2_1B () |
| Llama 3.2 1B. | |
| LlamaConfig | Mila::Dnn::Llama3_2_3B () |
| Llama 3.2 3B. | |
| LlamaConfig | Mila::Dnn::Llama3_70B () |
| Llama 3 70B (Original release). | |
| LlamaConfig | Mila::Dnn::Llama3_8B () |
| Llama 3 8B (Original release). | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::multiply (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr) |
| Element-wise multiplication with optional ExecutionContext (device-dispatched). | |
| std::string | Mila::Dnn::normTypeToString (NormType n) |
| Convert NormType to string. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| Tensor< TDataType, TMemoryResource > | Mila::Dnn::operator* (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b) |
| Element-wise multiplication operator (always synchronous). | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| Tensor< TDataType, TMemoryResource > | Mila::Dnn::operator+ (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b) |
| Element-wise addition operator (always synchronous). | |
| MemoryStats | Mila::Dnn::operator+ (MemoryStats lhs, const MemoryStats &rhs) noexcept |
| Aggregate two MemoryStats instances. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| Tensor< TDataType, TMemoryResource > | Mila::Dnn::operator- (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b) |
| Element-wise subtraction operator (always synchronous). | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| Tensor< TDataType, TMemoryResource > | Mila::Dnn::operator/ (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b) |
| Element-wise division operator (always synchronous). | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| std::ostream & | Mila::Dnn::operator<< (std::ostream &os, const Tensor< TDataType, TMemoryResource > &tensor) |
| Stream insertion operator for tensor output. | |
| TensorDataType | Mila::Dnn::parseTensorDataType (const std::string &type_str) |
| std::string | Mila::Dnn::shapeToString (const shape_t &shape) |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::split (const Tensor< TDataType, TMemoryResource > &input, Tensor< TDataType, TMemoryResource > &output_a, Tensor< TDataType, TMemoryResource > &output_b, IExecutionContext *exec_context=nullptr) |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::split (const Tensor< TDataType, TMemoryResource > &input, Tensor< TDataType, TMemoryResource > &output_a, Tensor< TDataType, TMemoryResource > &output_b, Tensor< TDataType, TMemoryResource > &output_c, IExecutionContext *exec_context=nullptr) |
| std::string | Mila::Dnn::strideToString (const stride_t &stride) |
| ActivationType | Mila::Dnn::stringToActivationType (const std::string &name) |
| Converts a string to its corresponding ActivationType enum value. | |
| AttentionType | Mila::Dnn::stringToAttentionType (const std::string &v) |
| Parse string to AttentionType. | |
| ConnectionType | Mila::Dnn::stringToConnectionType (const std::string &name) |
| Converts a string to its corresponding ConnectionType enum value. | |
| EncodingType | Mila::Dnn::stringToEncodingType (const std::string &v) |
| Parse string to EncodingType. | |
| NormType | Mila::Dnn::stringToNormType (const std::string &v) |
| Parse string to NormType. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::subtract (const Tensor< TDataType, TMemoryResource > &a, const Tensor< TDataType, TMemoryResource > &b, Tensor< TDataType, TMemoryResource > &result, IExecutionContext *exec_context=nullptr) |
| Element-wise subtraction with optional ExecutionContext (device-dispatched). | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| float | Mila::Dnn::sum (const Tensor< TDataType, TMemoryResource > &tensor, IExecutionContext *exec_context=nullptr) |
| Sum reduction with optional ExecutionContext (device-dispatched). | |
| std::string | Mila::Dnn::tensorDataTypeToString (TensorDataType type) |
| Converts TensorDataType enumeration to human-readable string. | |
| template<TensorDataType TDstDataType, TensorDataType TSrcDataType, typename TSrcMemoryResource> requires isValidTensor<TSrcDataType, TSrcMemoryResource> && isValidTensor<TDstDataType, CpuMemoryResource> | |
| Tensor< TDstDataType, CpuMemoryResource > | Mila::Dnn::toHost (const Tensor< TSrcDataType, TSrcMemoryResource > &src, IExecutionContext *exec_context=nullptr) |
| Create a host (CPU) tensor from src and copy data into it. | |
| std::string | Mila::Dnn::toString (ComponentType t) noexcept |
| Convert a ComponentType enum value to its canonical name. | |
| std::string | Mila::Dnn::toTypeId (ComponentType t) noexcept |
| Get the short type identifier (2 to 4 characters) for a ComponentType. | |
| void | Mila::Dnn::validateTensorSize (const shape_t &shape, int64_t expected_size, const char *tensor_name="tensor", const char *op_name="Operation") |
| Validate that a tensor has the expected number of elements. | |
| template<TensorDataType TDataType, typename TMemoryResource> requires isValidTensor<TDataType, TMemoryResource> | |
| void | Mila::Dnn::zero (Tensor< TDataType, TMemoryResource > &tensor, IExecutionContext *exec_context=nullptr) |
| Zero a tensor using the fastest backend implementation. | |
Variables | |
| template<TensorDataType TDataType> | |
| constexpr bool | Mila::Dnn::is_host_float_type = std::is_floating_point_v<host_type_t<TDataType>> |
| Checks if a TensorDataType maps to a floating-point host type. | |
| template<TensorDataType TDataType> | |
| constexpr bool | Mila::Dnn::is_host_integer_type = std::is_integral_v<host_type_t<TDataType>> |
| Checks if a TensorDataType maps to an integer host type. | |
| constexpr int64_t | Mila::Dnn::kPrefillScratchByteCap = int64_t{ 1536 } * 1024 * 1024 |
|
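| template<typename TInput = float, typename TOutput = TInput> using Mila::Dnn::CpuDropout = Dropout<DeviceType::Cpu, TInput, TOutput> | export |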
|
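| template<typename TInput = float, typename TOutput = TInput> using Mila::Dnn::CudaDropout = Dropout<DeviceType::Cuda, TInput, TOutput> | export |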
|
| using Mila::Dnn::dim_t = int64_t | export |
Integer type used for tensor dimensions and indices.
|
| using Mila::Dnn::dtype_t = TensorDataType | export |
Alias for TensorDataType enumeration.
Provides a concise alias for the TensorDataType enumeration to improve code readability in tensor-related contexts.
|
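| template<TensorDataType TDataType> using Mila::Dnn::host_type_t = typename TensorHostTypeMap<TDataType>::host_type | export |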
Convenience alias for accessing host type mapping.
Provides a more concise way to access the host type for a given abstract tensor data type, following modern C++ alias template patterns.
| TDataType | Abstract tensor data type |
Example usage:
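A minimal, standalone sketch of the alias-template pattern. It does not include the Mila headers: the two-value enum, the FP32 and INT32 mappings, the undefined primary template, and the hostElementSize() helper are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reduced stand-in for the real enumeration (assumption: only two values modelled).
enum class TensorDataType { FP32, INT32 };

// In this sketch the primary template is left undefined, so an unmapped
// data type fails at compile time. The real library may do this differently.
template <TensorDataType TDataType>
struct TensorHostTypeMap;

template <>
struct TensorHostTypeMap<TensorDataType::FP32> {
    using host_type = float;  // assumed mapping
};

template <>
struct TensorHostTypeMap<TensorDataType::INT32> {
    using host_type = std::int32_t;  // assumed mapping
};

// Same shape as the alias declared in the Typedefs table.
template <TensorDataType TDataType>
using host_type_t = typename TensorHostTypeMap<TDataType>::host_type;

// Hypothetical helper: generic code names the host type through the alias.
template <TensorDataType TDataType>
constexpr std::size_t hostElementSize() {
    return sizeof(host_type_t<TDataType>);
}
```

With the alias, generic code writes host_type_t<TDataType> rather than typename TensorHostTypeMap<TDataType>::host_type.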
|
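| template<TensorDataType TDataType> using Mila::Dnn::host_value_t = std::conditional_t<TensorDataTypeTraits<TDataType>::is_integer_type, int32_t, float> | export |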
Host value type for given abstract tensor data type.
Maps floating-point tensor types to float and integer tensor types to int32_t. Use this alias when declaring host-side buffers, spans, or scalar arguments intended for conversion or transfer into tensors of TDataType.
| TDataType | Abstract tensor data type from TensorDataType enum. |
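The selection rule can be sketched in isolation as below. The alias body matches the declaration in the Typedefs table; the reduced enum, the minimal TensorDataTypeTraits, and defaultFillValue() are hypothetical stand-ins, not the library's definitions.

```cpp
#include <cstdint>
#include <type_traits>

// Reduced stand-in for the real enumeration.
enum class TensorDataType { FP32, BF16, INT8, UINT16 };

// Hypothetical minimal traits: only the is_integer_type flag is modelled.
template <TensorDataType TDataType>
struct TensorDataTypeTraits {
    static constexpr bool is_integer_type =
        TDataType == TensorDataType::INT8 || TDataType == TensorDataType::UINT16;
};

// Integer data types use int32_t on the host, everything else uses float.
template <TensorDataType TDataType>
using host_value_t =
    std::conditional_t<TensorDataTypeTraits<TDataType>::is_integer_type,
                       std::int32_t, float>;

// Hypothetical helper: a host-side scalar typed by the abstract data type.
template <TensorDataType TDataType>
constexpr host_value_t<TDataType> defaultFillValue() {
    return static_cast<host_value_t<TDataType>>(1);
}
```

Under these assumptions a BF16 tensor takes float host values and a UINT16 tensor takes int32_t host values, so a single host-side type covers each family.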
|
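| template<TensorDataType TDataType> using Mila::Dnn::HostTensor = Tensor<TDataType, Compute::CpuMemoryResource> | export |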
Host tensor alias.
|
| using Mila::Dnn::index_t = TensorShape | export |
Index descriptor for multi-dimensional element access.
One index per tensor dimension. Valid indices satisfy: 0 <= index[i] < shape[i].
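The validity condition can be checked as in the sketch below. std::array stands in for TensorShape, and isValidIndex() is a hypothetical helper, not a documented Mila function.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using dim_t = std::int64_t;

// Returns true when 0 <= index[i] < shape[i] holds for every dimension.
// Using the same Rank for both arguments enforces one index per dimension.
template <std::size_t Rank>
constexpr bool isValidIndex(const std::array<dim_t, Rank>& index,
                            const std::array<dim_t, Rank>& shape) {
    for (std::size_t i = 0; i < Rank; ++i) {
        if (index[i] < 0 || index[i] >= shape[i]) {
            return false;
        }
    }
    return true;
}
```

For a shape of {2, 3}, the index {1, 2} is valid while {2, 0} and {0, -1} are not.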
|
| using Mila::Dnn::json = nlohmann::json | export |
|
| using Mila::Dnn::shape_t = TensorShape | export |
Row-major shape descriptor for tensor dimensional sizes.
A zero in any position indicates an empty tensor.
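A short sketch of why a zero dimension means an empty tensor: the element count is the product of the dimension sizes. std::array stands in for TensorShape and numElements() is a hypothetical name; treating a rank-0 shape as one element is an assumption of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using dim_t = std::int64_t;

// Product of all dimension sizes. Any zero dimension makes the product zero.
// An empty (rank-0) shape yields 1 here, which is an assumption.
template <std::size_t Rank>
constexpr std::int64_t numElements(const std::array<dim_t, Rank>& shape) {
    std::int64_t count = 1;
    for (std::size_t i = 0; i < Rank; ++i) {
        count *= shape[i];
    }
    return count;
}
```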
|
| using Mila::Dnn::stride_t = TensorShape | export |
Stride descriptor (in elements) for each tensor dimension, row-major layout.
stride[i] is the number of elements to advance for one step along dimension i. The length equals shape.size() and is empty for scalars.
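Row-major strides in elements can be derived from a shape as sketched below. std::array stands in for TensorShape; rowMajorStrides() and flatOffset() are hypothetical helpers, not documented Mila functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using dim_t = std::int64_t;

// The last dimension is contiguous (stride 1); each earlier stride is the
// product of all later dimension sizes.
template <std::size_t Rank>
constexpr std::array<dim_t, Rank> rowMajorStrides(const std::array<dim_t, Rank>& shape) {
    std::array<dim_t, Rank> strides{};
    dim_t running = 1;
    for (std::size_t i = Rank; i-- > 0;) {
        strides[i] = running;
        running *= shape[i];
    }
    return strides;
}

// Flat element offset of a multi-dimensional index.
template <std::size_t Rank>
constexpr dim_t flatOffset(const std::array<dim_t, Rank>& index,
                           const std::array<dim_t, Rank>& strides) {
    dim_t offset = 0;
    for (std::size_t i = 0; i < Rank; ++i) {
        offset += index[i] * strides[i];
    }
    return offset;
}
```

For a shape of {2, 3, 4} this gives strides {12, 4, 1}, so index {1, 2, 3} lands at element offset 23.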
| using Mila::Dnn::TokenId = Data::TokenId |
|
| enum class Mila::Dnn::ActivationType | export, strong |
Enumeration of supported activation function types.
This enum class defines the different activation functions that can be used throughout the Mila library, particularly in neural network layers.
| Enumerator | |
|---|---|
| None | No activation (identity function). |
| Relu | Rectified Linear Unit: max(0, x). |
| Gelu | Gaussian Error Linear Unit: x * phi(x) where phi() is the standard Gaussian CDF. |
| Silu | Sigmoid Linear Unit (Swish): x * sigmoid(x). |
| Swiglu | SwiGLU: gated activation x1 * SiLU(x2). |
| Tanh | Hyperbolic Tangent: tanh(x). |
| Sigmoid | Sigmoid function: 1 / (1 + exp(-x)). |
| LeakyRelu | Leaky ReLU: max(alpha * x, x) where alpha is typically 0.01. |
| Mish | Mish: x * tanh(softplus(x)). |
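The scalar formulas in the table can be written directly with the standard math library. This is a reference sketch of the formulas only, not Mila's kernels: the function names are hypothetical, the gated Swiglu entry is omitted, and the forms are numerically naive.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

inline float relu(float x) { return std::max(0.0f, x); }

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

inline float silu(float x) { return x * sigmoid(x); }

// Standard Gaussian CDF expressed through erfc: 0.5 * erfc(-x / sqrt(2)).
inline float gaussianCdf(float x) { return 0.5f * std::erfc(-x / std::sqrt(2.0f)); }

inline float gelu(float x) { return x * gaussianCdf(x); }

// max(alpha * x, x) matches leaky ReLU only for 0 <= alpha <= 1.
inline float leakyRelu(float x, float alpha = 0.01f) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    return std::max(alpha * x, x);
}

// softplus(x) = log(1 + exp(x)), written with log1p.
inline float mish(float x) { return x * std::tanh(std::log1p(std::exp(x))); }
```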
|
| enum class Mila::Dnn::ApproximationMethod | export, strong |
|
| enum class Mila::Dnn::AttentionType | export, strong |
|
| enum class Mila::Dnn::ComponentType : int | export, strong |
Canonical list of framework-known component types.
These values are used by the deserializer and factory code to identify component implementations. Values 1..999 are reserved for built-in components; values >= CustomComponentStart are available for user-defined components or extensions.
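The value ranges can be illustrated with a reduced copy of the enum. Only a few enumerators are reproduced; isBuiltIn(), isCustom(), and kMyComponent are hypothetical, and how an extension actually registers its identifier is not shown here.

```cpp
// Reduced copy of the enumeration: a few built-ins plus the custom range marker.
enum class ComponentType : int {
    Unknown = 0,
    Linear,
    Gelu,
    CustomComponentStart = 1000,
};

// Built-in components occupy 1..999.
constexpr bool isBuiltIn(ComponentType t) {
    const int value = static_cast<int>(t);
    return value >= 1 && value < static_cast<int>(ComponentType::CustomComponentStart);
}

// User-defined components start at CustomComponentStart.
constexpr bool isCustom(ComponentType t) {
    return static_cast<int>(t) >= static_cast<int>(ComponentType::CustomComponentStart);
}

// Hypothetical extension identifier chosen inside the custom range.
constexpr ComponentType kMyComponent =
    static_cast<ComponentType>(static_cast<int>(ComponentType::CustomComponentStart) + 1);
```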
|
| enum class Mila::Dnn::ConnectionType | export, strong |
Connection types supported by residual and skip-connection components.
Defines how the input and transformed output are combined in residual and skip-connection architectures.
Currently only Addition is implemented. Other types (multiplication, concatenation) may be added in the future.
| Enumerator | |
|---|---|
| Addition | Element-wise addition (y = x + F(x)). |
|
| enum class Mila::Dnn::EncodingType | export, strong |
Positional encoding strategies.
| Enumerator | |
|---|---|
| Learned | Learned absolute position embeddings (GPT-2 style). |
| RoPE | Rotary Position Embeddings (LLaMA style). |
| ALiBi | Attention with Linear Biases (MPT / BLOOM style). |
|
| enum class Mila::Dnn::KvCacheCompression | export, strong |
KV cache storage and compression strategy for GroupedQueryAttention.
Maps to the TKvPolicy template parameter on GroupedQueryAttention and CudaGqaOp via the fromPretrained() runtime→compile-time bridge. The mapping is:
| Value | Policy | Cache format |
|---|---|---|
| None | NoKvCompression | BF16 cache, no compression overhead |
| FP8 | PerChannelKvFp8<> | FP8_E4M3 cache, per-head per-token float32 scales |
To add a new compression algorithm (for example SlidingWindow, LowRank, or TurboQuant), add a value here and a corresponding policy struct in KvCache.QuantPolicy; no other changes are required at this level.
| Enumerator | |
|---|---|
| None | No compression — default; BF16 KV cache. |
| FP8 | FP8_E4M3 per-head per-token KV cache compression — Alpha.6 target. |
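A runtime-to-compile-time bridge of this kind is commonly written as a switch that instantiates a template per enum value. The sketch below shows the general technique only: the policy struct contents, the TScale parameter, attentionPolicyName(), and dispatchKvPolicy() are hypothetical, and fromPretrained() itself is not reproduced.

```cpp
#include <stdexcept>
#include <string_view>

enum class KvCacheCompression { None, FP8 };

// Placeholder policies. The names follow the mapping above; the contents are invented.
struct NoKvCompression {
    static constexpr std::string_view name = "NoKvCompression";
};

template <typename TScale = float>  // TScale is a hypothetical parameter
struct PerChannelKvFp8 {
    static constexpr std::string_view name = "PerChannelKvFp8";
};

// Stand-in for code templated on the policy, such as an attention component.
template <typename TKvPolicy>
constexpr std::string_view attentionPolicyName() {
    return TKvPolicy::name;
}

// The bridge: one case per enum value, each selecting a compile-time policy.
constexpr std::string_view dispatchKvPolicy(KvCacheCompression mode) {
    switch (mode) {
        case KvCacheCompression::None:
            return attentionPolicyName<NoKvCompression>();
        case KvCacheCompression::FP8:
            return attentionPolicyName<PerChannelKvFp8<>>();
    }
    throw std::invalid_argument("unknown KvCacheCompression value");
}
```

In this arrangement a new algorithm needs one enumerator, one policy struct, and one case in the switch.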
|
| enum class Mila::Dnn::NormType | export, strong |
Normalization type selection.
| Enumerator | |
|---|---|
| LayerNorm | Standard LayerNorm (mean + variance). |
| RMSNorm | Root Mean Square Norm (scales by the root mean square, no mean subtraction). |
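The difference between the two options can be seen in a reference sketch over a single vector, assuming the usual textbook definitions. Learned scale and shift parameters are left out, the eps default is an assumption, and these functions are not Mila's LayerNorm or RmsNorm components.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// LayerNorm core: subtract the mean, then divide by the standard deviation.
inline std::vector<float> layerNorm(const std::vector<float>& x, float eps = 1e-5f) {
    assert(!x.empty());
    const float n = static_cast<float>(x.size());
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= n;
    float variance = 0.0f;
    for (float v : x) variance += (v - mean) * (v - mean);
    variance /= n;
    const float inv_std = 1.0f / std::sqrt(variance + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = (x[i] - mean) * inv_std;
    return y;
}

// RMSNorm core: divide by the root mean square; the mean is not subtracted.
inline std::vector<float> rmsNorm(const std::vector<float>& x, float eps = 1e-5f) {
    assert(!x.empty());
    float mean_square = 0.0f;
    for (float v : x) mean_square += v * v;
    mean_square /= static_cast<float>(x.size());
    const float inv_rms = 1.0f / std::sqrt(mean_square + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * inv_rms;
    return y;
}
```

LayerNorm output is centered on zero, while RMSNorm keeps the sign and relative offset of its input.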
|
| enum class Mila::Dnn::RuntimeMode : uint8_t | export, strong |
Runtime mode governing Model API and Network build policy.
Immutable after Model construction. Determines which public API methods are valid and how the Network allocates its buffers.
| Mode | Network build shape | Valid Model API |
|---|---|---|
| Inference | { 1, context_len } | generate() |
| Training | { batch, seq_len } | eval(), sample() |
| Enumerator | |
|---|---|
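| Inference | Network is built with shape { 1, context_len }; generate() is valid. |
| Training | Network is built with shape { batch, seq_len }; eval() and sample() are valid. |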
|
exportstrong |
Enumeration of supported abstract tensor data types.
Defines device-agnostic tensor data types that can be mapped to concrete implementations on different compute devices. This abstraction prevents host compilation issues with device-specific types while enabling compile-time dispatch and optimization.
Supported categories:
|
exportstrong |
Runtime behavioral state for Components built with RuntimeMode::Training.
TrainingMode is orthogonal to RuntimeMode: RuntimeMode is a build-time allocation policy, while TrainingMode is a runtime behavioral toggle.
| TrainingMode | Gradients | Dropout | Batch Norm |
|---|---|---|---|
| Normal | active | on | uses batch stats |
| Eval | inactive | off | uses running stats |
TrainingMode is only meaningful on Components built with RuntimeMode::Training. Calling setTrainingMode() on a Component built with RuntimeMode::Inference throws std::runtime_error.
Model drives transitions via Network::setTrainingMode() — the toggle is never exposed directly to the user.
| Enumerator | |
|---|---|
| Normal | Gradients active, dropout on, batch norm uses batch stats. |
| Eval | No gradients, dropout off, batch norm uses running stats. |
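The interaction between the two enums can be sketched with a small stand-in class. `SketchComponent` and its query methods are hypothetical and exist only to show the rule stated above: the mode can be toggled only on a Training build, and the attempt throws std::runtime_error otherwise.

```cpp
#include <stdexcept>

enum class RuntimeMode { Inference, Training };
enum class TrainingMode { Normal, Eval };

// Hypothetical stand-in for a Component; not the library's class.
class SketchComponent {
public:
    explicit SketchComponent(RuntimeMode mode) : runtime_mode_(mode) {}

    // Build-time policy gates the runtime toggle.
    void setTrainingMode(TrainingMode mode) {
        if (runtime_mode_ != RuntimeMode::Training) {
            throw std::runtime_error(
                "setTrainingMode() requires a Component built with RuntimeMode::Training");
        }
        training_mode_ = mode;
    }

    // Dropout and gradients are active only in Training builds under Normal mode.
    bool dropoutActive() const {
        return runtime_mode_ == RuntimeMode::Training &&
               training_mode_ == TrainingMode::Normal;
    }
    bool gradientsActive() const { return dropoutActive(); }

private:
    RuntimeMode runtime_mode_;
    TrainingMode training_mode_ = TrainingMode::Normal;
};
```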
|
exportstrong |
Weight storage and matmul strategy for Linear components.
Maps to the TWeightQuant template parameter on Linear and CudaLinearOp via the fromPretrained() runtime→compile-time bridge. The mapping is:
| Value | Policy struct | Storage |
|---|---|---|
| None | NoWeightQuant | BF16 weights, standard cuBLASLt plan |
| FP8 | PerChannelFp8<> | FP8_E4M3 weights, per-channel float32 scales |
| FP4 | PerGroupFp4<> | future |
This enum is Mila API vocabulary. Callers set it via fluent methods on the concrete model config — they do not interact with the policy structs directly.
| Enumerator | |
|---|---|
| None | BF16 weights — default; no quantization overhead. |
| FP8 | FP8_E4M3 per-channel weight quantization — Alpha.5 target. |
| FP4 | Per-group FP4 weight quantization — future target. |
|
inlineexport |
Converts an ActivationType enum value to its string representation.
| type | The ActivationType to convert |

|
export |
Element-wise addition with optional ExecutionContext (device-dispatched).
Computes result[i] = a[i] + b[i] for all elements. Automatically dispatches to the appropriate device implementation based on memory resource type.
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type determining device |
| a | First input tensor |
| b | Second input tensor |
| result | Output tensor (must be pre-allocated with matching shape) |
| exec_context | Optional execution context for stream control (borrowed, not owned) |
Example:

|
constexprexportnoexcept |
Convert ApproximationMethod to a short string.
Returns a constexpr std::string_view suitable for logging/serialization.

|
inlineexport |
Convert AttentionType to string.
|
export |
Normalize and validate an axis, then compute partition sizes.
| shape | Tensor shape. |
| axis | Axis to normalize (supports negative indexing). |
| op_name | Operation name for error messages. |
| std::runtime_error | If axis is out of range. |
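One common way to implement this kind of helper is to wrap negative axes, then split the shape into the dimensions before, at, and after the axis. The struct layout and the name `partitionAxis` below are assumptions for illustration; the real AxisPartition fields are not shown in this reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout; the library's AxisPartition may differ.
struct AxisPartitionSketch {
    std::int64_t axis;        // normalized axis in [0, rank)
    std::int64_t outer_size;  // product of dimensions before the axis
    std::int64_t axis_size;   // extent of the axis itself
    std::int64_t inner_size;  // product of dimensions after the axis
};

inline AxisPartitionSketch partitionAxis(const std::vector<std::int64_t>& shape,
                                         std::int64_t axis,
                                         const std::string& op_name) {
    const auto rank = static_cast<std::int64_t>(shape.size());
    if (axis < -rank || axis >= rank) {
        throw std::runtime_error(op_name + ": axis " + std::to_string(axis) +
                                 " is out of range for rank " + std::to_string(rank));
    }
    if (axis < 0) axis += rank;  // negative indexing counts from the end

    AxisPartitionSketch p{axis, 1, shape[static_cast<std::size_t>(axis)], 1};
    for (std::int64_t i = 0; i < axis; ++i) {
        p.outer_size *= shape[static_cast<std::size_t>(i)];
    }
    for (std::int64_t i = axis + 1; i < rank; ++i) {
        p.inner_size *= shape[static_cast<std::size_t>(i)];
    }
    return p;
}
```

For a shape of {2, 3, 4} and axis -1, the sketch yields a normalized axis of 2, an outer size of 6, an axis size of 4, and an inner size of 1.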


|
export |
Compute partition for normalization over trailing dimensions.
Verifies that the trailing dimensions of shape match normalized_shape exactly, then computes outer and normalized sizes and shapes.
| shape | Input tensor shape. |
| normalized_shape | Expected trailing dimensions to normalize over. |
| op_name | Operation name for error messages. |
| std::runtime_error | If normalized_shape doesn't match trailing dims. |
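The trailing-dimension check can be sketched by comparing the two shapes from the back and then multiplying out each half. The result struct and the name `partitionNormalized` are hypothetical, and the sketch returns only the two sizes, not the outer and normalized shapes the description also mentions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type for illustration.
struct NormalizedPartitionSketch {
    std::int64_t outer_size;       // product of the leading (batch-like) dims
    std::int64_t normalized_size;  // product of the trailing normalized dims
};

inline NormalizedPartitionSketch partitionNormalized(
    const std::vector<std::int64_t>& shape,
    const std::vector<std::int64_t>& normalized_shape,
    const std::string& op_name) {
    // Compare from the last dimension backwards; the size check guarantees
    // std::equal never reads past the front of `shape`.
    const bool fits = normalized_shape.size() <= shape.size();
    if (!fits || !std::equal(normalized_shape.rbegin(), normalized_shape.rend(),
                             shape.rbegin())) {
        throw std::runtime_error(op_name +
                                 ": normalized_shape does not match trailing dimensions");
    }
    const auto split = shape.end() - static_cast<std::ptrdiff_t>(normalized_shape.size());
    const auto product = [](auto first, auto last) {
        return std::accumulate(first, last, std::int64_t{1},
                               std::multiplies<std::int64_t>{});
    };
    return {product(shape.begin(), split), product(split, shape.end())};
}
```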


|
export |
|
inlineexport |

|
inlineexport |
Converts a ConnectionType enum value to its string representation.
| type | The ConnectionType to convert |
Example:
|
export |
Copies tensor data from source to destination tensor with optional ExecutionContext.
Transfers data from source tensor to pre-allocated destination tensor. Both tensors must have compatible shapes (same dimensions). Supports type conversion and cross-device transfers with explicit stream control.
Device compatibility rules:
ExecutionContext handling:
| TSrcDataType | Source tensor data type |
| TSrcMemoryResource | Source memory resource type |
| TDstDataType | Destination tensor data type |
| TDstMemoryResource | Destination memory resource type |
| src | Source tensor to copy from |
| dst | Destination tensor to copy to (must be pre-allocated) |
| exec_context | Optional execution context for stream control (borrowed, not owned) |
| std::runtime_error | If device-only tensors are on incompatible device types |
Example:


|
export |


|
export |
Copy a serialized blob into a destination tensor, converting element types.
Intended for quantize-on-load paths where the checkpoint dtype (TSrcDataType) differs from the weight storage dtype (TDstDataType). Shape is validated against the destination tensor. Dispatches to the device-specific backend.
| TSrcDataType | Blob element dtype (e.g. BF16). |
| TDstDataType | Destination tensor dtype (e.g. FP8_E4M3). |
| TDstMemoryResource | Destination memory resource. |
| blob | Source tensor blob. |
| dst | Pre-allocated destination tensor. |
| exec_context | Optional execution context for stream control (borrowed). |
| std::invalid_argument | If the blob shape does not match the dst shape. |


|
export |
Debug dump a concrete tensor to the log.
Template parameters:
The function attempts a dynamic_cast to the concrete Tensor<TPrecision, MemoryResource>. If the concrete tensor is host-accessible the contents are printed directly (first maxElements values). Otherwise a host copy (Tensor<TPrecision, CpuMemoryResource>) is created and the first maxElements values are printed. This avoids flooding logs while giving a quick numeric snapshot.
Notes:

|
export |
Element-wise division with optional ExecutionContext (device-dispatched).
Computes result[i] = a[i] / b[i] for all elements.
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type determining device |
| a | First input tensor (dividend) |
| b | Second input tensor (divisor) |
| result | Output tensor (must be pre-allocated with matching shape) |
| exec_context | Optional execution context for stream control (borrowed, not owned) |

|
inlineexport |
Convert EncodingType to string.
|
export |
Fill a tensor with a scalar host value (device-dispatched) with optional ExecutionContext.
Forwards scalar fills to the device-specific TensorOps<Tag>::fill. Borrows execution context for stream control with zero overhead. The function signature enforces the expected host scalar representation for each abstract tensor data type via host_value_t<TDataType>.
| TDataType | Abstract tensor data type. |
| TMemoryResource | Memory resource type backing the tensor. |
| tensor | Destination tensor to be filled. Must satisfy isValidTensor. |
| host_value | Scalar value in host representation to broadcast to the tensor. |
| exec_context | Optional execution context for stream control (borrowed, not owned) |
Example:
|
export |
Copy host values into a tensor with device dispatch and optional ExecutionContext.
Forwards the host->tensor copy operation (span form) to the device-specific implementation TensorOps<Tag>::fill. Borrows the execution context for stream control with zero overhead. Falls back to the default stream when no context is provided.
The host element type is selected by host_value_t<TDataType> so callers must provide values in the expected host representation (float for floating-point tensor types, int32_t for integer tensor types). The device implementation performs any necessary conversion/quantization.
| TDataType | Abstract tensor data type. |
| TMemoryResource | Memory resource type backing the tensor. |
| tensor | Destination tensor to be filled. Must satisfy isValidTensor. |
| host_values | Span of host values in host representation (see host_value_t). |
| exec_context | Optional execution context for stream control (borrowed, not owned) |
Example:
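A minimal sketch of how a trait in the style of host_value_t can pin the host element type, assuming the rule stated above (float for floating-point tensor types, int32_t for integer tensor types). The enum `DTypeSketch`, the trait `host_value_sketch_t`, and `fillHost` are hypothetical names; the real TensorDataType values and fill signature may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical subset of abstract tensor data types.
enum class DTypeSketch { FP32, FP16, BF16, FP8_E4M3, INT8, INT32 };

template <DTypeSketch T>
inline constexpr bool isFloatSketch =
    T == DTypeSketch::FP32 || T == DTypeSketch::FP16 ||
    T == DTypeSketch::BF16 || T == DTypeSketch::FP8_E4M3;

// Host representation: float for floating-point types, int32_t otherwise.
template <DTypeSketch T>
using host_value_sketch_t = std::conditional_t<isFloatSketch<T>, float, std::int32_t>;

// Because the parameter types are spelled through the trait, passing an
// int32_t buffer for a BF16 tensor type fails to compile.
template <DTypeSketch T, std::size_t N>
void fillHost(std::array<host_value_sketch_t<T>, N>& dst, host_value_sketch_t<T> value) {
    dst.fill(value);
}
```

A call such as `fillHost<DTypeSketch::BF16>(buffer, 1.5f)` then accepts only a float buffer, which is the compile-time enforcement the description refers to.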
|
export |
|
export |
|
inlineexportnoexcept |
Parse a case-insensitive component name into a ComponentType.
Accepts canonical names (case-insensitive) produced by toString and returns the corresponding enum value. Returns ComponentType::Unknown if the input does not match any known type.
| s | Input string (case-insensitive) |
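A case-insensitive, non-throwing lookup of this kind can be sketched with a small name table. The enum below is a hypothetical three-value subset (the names "Linear", "MLP", and "Transformer" appear elsewhere in this reference), and `parseComponentType` is an illustrative name, not the library's function.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string_view>
#include <utility>

// Hypothetical subset of ComponentType.
enum class ComponentTypeSketch { Unknown, Linear, MLP, Transformer };

inline bool equalsIgnoreCase(std::string_view a, std::string_view b) noexcept {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Returns Unknown for unrecognized input; it never throws.
inline ComponentTypeSketch parseComponentType(std::string_view s) noexcept {
    static constexpr std::array<std::pair<std::string_view, ComponentTypeSketch>, 3> names{{
        {"Linear", ComponentTypeSketch::Linear},
        {"MLP", ComponentTypeSketch::MLP},
        {"Transformer", ComponentTypeSketch::Transformer},
    }};
    for (const auto& [name, type] : names) {
        if (equalsIgnoreCase(s, name)) return type;
    }
    return ComponentTypeSketch::Unknown;
}
```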

|
inlineexportnoexcept |
Map a short type identifier back to a ComponentType enum.
Accepts the compact lowercase identifiers produced by toTypeId and returns the corresponding enum value. Returns ComponentType::Unknown for unrecognized identifiers.
| s | Short lowercase type id (for example "fc", "mlp", "tf") |
|
export |
GPT-2 Large (774M parameters).
Architecture:

|
export |
GPT-2 Medium (345M parameters).
Architecture:

|
export |
Usage Examples:
// Use a preset directly
auto config = Mila::Dnn::Networks::GPT2_Small();
auto network = GptNetwork(config, 50257, 1024, "gpt2_small");

// Customize a preset
auto config = Mila::Dnn::Networks::GPT2_Small()
    .withDropout(0.2f); // Custom dropout

// Mix and match for research
auto custom = Mila::Dnn::Networks::GPT2_Medium()
    .withBias(false) // Remove bias
    .withResidualScale(0.5f); // Add residual scaling

GPT-2 Small (117M parameters).
Architecture:

|
export |
GPT-2 XL (1.5B parameters).
Architecture:

|
export |

|
export |
Llama 2 13B.
Architecture:

|
export |
Llama 2 70B.
Architecture:

|
export |
Llama 2 7B.
Architecture:

|
export |
Llama 3.1 405B.
Architecture:

|
export |
Llama 3.1 70B.
Architecture:

|
export |
Llama 3.1 8B.
Architecture:

|
export |
Usage Examples:
// Use a preset directly
auto config = Mila::Dnn::Networks::Llama3_2_1B();
auto network = LlamaNetwork(config, 128256, 131072, "llama3_2_1b");

// Customize a preset
auto config = Mila::Dnn::Networks::Llama3_8B()
    .withRoPETheta(1000000.0f); // Extend context with higher theta

// Mix and match for research
auto custom = Mila::Dnn::Networks::Llama3_8B()
    .withNumKVHeads(32) // Convert GQA to MHA
    .withResidualScale(0.5f); // Add residual scaling

Llama 3.2 1B.
Architecture:

|
export |
Llama 3.2 3B.
Architecture:

|
export |
Llama 3 70B (Original release).
Architecture:

|
export |
Llama 3 8B (Original release).
Architecture:

|
export |
Element-wise multiplication with optional ExecutionContext (device-dispatched).
Computes result[i] = a[i] * b[i] for all elements.
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type determining device |
| a | First input tensor |
| b | Second input tensor |
| result | Output tensor (must be pre-allocated with matching shape) |
| exec_context | Optional execution context for stream control (borrowed, not owned) |

|
export |
Element-wise multiplication operator (always synchronous).

|
export |
Element-wise addition operator (always synchronous).

|
nodiscardexportnoexcept |
Aggregate two MemoryStats instances.
|
export |
Element-wise subtraction operator (always synchronous).

|
export |
Element-wise division operator (always synchronous).

|
export |
Stream insertion operator for tensor output.

|
export |

|
export |


|
export |

|
export |
|
export |

|
inlineexport |
Converts a string to its corresponding ActivationType enum value.
| name | The string representation of an activation function |
| std::invalid_argument | if the string doesn't match any known activation function |
|
inlineexport |
Parse string to AttentionType.
| std::invalid_argument | on unknown value. |
|
inlineexport |
Converts a string to its corresponding ConnectionType enum value.
| name | The string representation of a connection type |
| std::invalid_argument | if the string doesn't match any known connection type |
Example:
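A minimal round-trip sketch, assuming the string form of the single enumerator is "Addition". The function names `connectionTypeToString` and `connectionTypeFromString` are hypothetical, since the actual names are not shown in this reference.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

enum class ConnectionType { Addition };

// Enum -> string. The string spelling is an assumption.
inline std::string_view connectionTypeToString(ConnectionType type) {
    switch (type) {
        case ConnectionType::Addition: return "Addition";
    }
    throw std::invalid_argument("unknown ConnectionType value");
}

// String -> enum. Throws std::invalid_argument on an unknown name,
// matching the documented contract.
inline ConnectionType connectionTypeFromString(std::string_view name) {
    if (name == "Addition") return ConnectionType::Addition;
    throw std::invalid_argument("unknown connection type: " + std::string(name));
}
```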
|
inlineexport |
Parse string to PositionalEncodingType.
| std::invalid_argument | on unknown value. |
|
inlineexport |
Parse string to NormType.
| std::invalid_argument | on unknown value. |
|
export |
Element-wise subtraction with optional ExecutionContext (device-dispatched).
Computes result[i] = a[i] - b[i] for all elements.
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type determining device |
| a | First input tensor (minuend) |
| b | Second input tensor (subtrahend) |
| result | Output tensor (must be pre-allocated with matching shape) |
| exec_context | Optional execution context for stream control (borrowed, not owned) |

|
export |
Sum reduction with optional ExecutionContext (device-dispatched).
Computes the sum of all elements in the tensor. Always synchronizes before returning the result (even when exec_context is provided).
| TDataType | Abstract tensor data type |
| TMemoryResource | Memory resource type determining device |
| tensor | Input tensor |
| exec_context | Optional execution context for stream control (borrowed, not owned) |

|
inlineexport |
Converts TensorDataType enumeration to human-readable string.

|
export |
Create a host (CPU) tensor from src and copy data into it.
By default the destination data type matches the source data type. The destination tensor preserves the source shape. An optional execution context may be supplied for device-side stream control when the source is device-resident.
| TSrcDataType | Source tensor data type |
| TSrcMemoryResource | Source memory resource type |
| TDstDataType | Destination tensor data type (defaults to source type) |
| src | Source tensor to copy from |
| exec_context | Optional execution context for stream control (borrowed) |

|
inlineexportnoexcept |
Convert a ComponentType enum value to its canonical name.
Returns a human-readable name suitable for logs and metadata fields (for example "Linear", "Transformer"). Always returns "Unknown" for unrecognized enum values.
| t | ComponentType enum value |

|
inlineexportnoexcept |
Get the short 2..4 character type identifier for a ComponentType.
The short type id is intended for compact labels in serialized metadata and concise diagnostics (examples: "fc" for Linear, "mlp" for MLP, "tf" for Transformer). Returns "Unknown" for unrecognized types.
| t | ComponentType enum value |
|
export |
Validate that a tensor has the expected number of elements.
| shape | Tensor shape to validate. |
| expected_size | Expected number of elements. |
| tensor_name | Name of tensor for error message. |
| op_name | Operation name for error message. |
| std::runtime_error | If size doesn't match. |
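The check amounts to multiplying out the shape and comparing it with the expected count. The sketch below assumes a hypothetical name `validateSize` and an illustrative error-message format; only the parameter list and the exception type are taken from the table above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

inline void validateSize(const std::vector<std::int64_t>& shape,
                         std::int64_t expected_size,
                         const std::string& tensor_name,
                         const std::string& op_name) {
    // Element count is the product of all dimensions (1 for a rank-0 shape).
    std::int64_t actual = 1;
    for (std::int64_t dim : shape) actual *= dim;

    if (actual != expected_size) {
        throw std::runtime_error(op_name + ": " + tensor_name + " has " +
                                 std::to_string(actual) + " elements, expected " +
                                 std::to_string(expected_size));
    }
}
```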

|
export |
Zero a tensor using the fastest backend implementation.
Forwards to the device-specific TensorOps<device>::zero implementation.
| TDataType | Abstract tensor data type. |
| TMemoryResource | Memory resource type backing the tensor. |
| tensor | Destination tensor to be zeroed. |
| exec_context | Optional execution context for stream control (borrowed). |

|
constexprexport |
Checks if a TensorDataType maps to a floating-point host type.
Compile-time utility to determine if the host representation of an abstract tensor data type is a floating-point type.
| TDataType | Abstract tensor data type to check |
|
constexprexport |
Checks if a TensorDataType maps to an integer host type.
Compile-time utility to determine if the host representation of an abstract tensor data type is an integer type.
| TDataType | Abstract tensor data type to check |
|
inlineconstexprexport |