Mila 0.13.48
Deep Neural Network Library

| Src | |
| Data | |
| Core | |
| FileHeader.ixx | Common file header structure for Mila data files |
| TokenizerTrainer.ixx | Abstract trainer interface for building tokenizers' vocabularies |
| TrainerConfig.ixx | |
| TrainerFactory.ixx | Factory helpers to construct tokenizer trainers and load vocabularies |
| Loaders | |
| DataLoader.ixx | Device-agnostic data loader interface using abstract tensor data types |
| TokenSequenceLoader.Config.ixx | |
| TokenSequenceLoader.ixx | |
| Tokenizers | |
| Bpe | |
| BpePreTokenizationMode.ixx | |
| BpeTokenizer.ixx | Unified BPE tokenizer for GPT-2, Llama 3.x, and Mistral model families |
| BpeTrainer.ixx | BPE vocabulary trainer with incremental corpus accumulation |
| BpeVocabulary.ixx | BPE vocabulary for GPT-2, Llama 3.x, and Mistral model families |
| BpeVocabularyConfig.ixx | Unified configuration for BPE vocabulary construction and runtime properties |
| Char | |
| CharTokenizer.ixx | Character-level tokenizer implementing the Tokenizer API |
| CharTrainer.ixx | Character-level tokenizer trainer for corpus accumulation and vocabulary building |
| CharVocabulary.ixx | Character vocabulary with factory-based construction |
| CharVocabularyConfig.ixx | Configuration for character-level tokenizer training |
| SpecialTokens.ixx | Configuration for special tokens used across all tokenizer types |
| Tokenizer.ixx | |
| TokenizerType.ixx | |
| TokenizerVocabulary.ixx | Abstract interface for tokenizer vocabularies used by data pipelines |
| Dnn | |
| Components | |
| Activations | |
| Gelu | |
| Gelu.Config.ixx | |
| Gelu.ixx | GELU activation component implementation |
| Swiglu | |
| Swiglu.Config.ixx | Configuration for the SwiGLU activation component |
| Swiglu.ixx | SwiGLU activation component implementation |
| ActivationType.ixx | Definition of activation function types used throughout the Mila library |
| ApproximationMethod.ixx | Shared approximation method enum for activation functions |
| Attention | |
| GQA | |
| GroupedQueryAttention.Config.ixx | Configuration interface for the Grouped-Query Attention component |
| GroupedQueryAttention.ixx | Grouped-Query Attention module (concatenated QKV input) |
| MHA | |
| MultiHeadAttention.Config.ixx | |
| MultiHeadAttention.ixx | Multi-Head Attention module (concatenated QKV input) |
| AttentionType.ixx | Defines attention mechanism types used by transformer components |
| Connections | |
| ConnectionType.ixx | Definition of connection function types used by the Mila DNN library |
| Residual.ixx | Device-templated Residual connection component |
| ResidualConfig.ixx | Configuration for the Residual component |
| Embeddings | |
| TokenEmbedding.Config.ixx | |
| TokenEmbedding.ixx | Device-templated TokenEmbedding component |
| Encodings | |
| Lpe | |
| Lpe.Config.ixx | |
| Lpe.ixx | |
| Rope | |
| Rope.Config.ixx | Configuration for Rotary Position Embedding (RoPE) component |
| Rope.ixx | Rotary positional embedding (RoPE) component |
| EncodingType.ixx | Positional encoding strategy selection used by Transformer components |
| FFN | |
| MLP.Config.ixx | |
| MLP.Dispatch.ixx | Activation dispatch helpers for MLP |
| MLP.ixx | Multi-Layer Perceptron (MLP) block for neural networks |
| Linear | |
| Linear.ixx | Device-templated Linear (fully connected) component |
| LinearConfig.ixx | Configuration for the Linear (fully connected) layer |
| Losses | |
| CrossEntropyConfig.ixx | Configuration for the fused SoftmaxCrossEntropy loss module |
| SoftmaxCrossEntropy.ixx | Device-templated fused SoftmaxCrossEntropy loss module |
| Normalization | |
| LayerNorm | |
| LayerNorm.Config.ixx | Configuration for Layer Normalization component |
| LayerNorm.ixx | Layer Normalization component |
| RmsNorm | |
| RmsNorm.Config.ixx | Configuration for RMS Normalization component |
| RmsNorm.ixx | RMS Normalization component |
| NormType.ixx | Normalization layer type enumeration used by Transformer components |
| Softmax.ixx | Device-templated Softmax activation module |
| SoftmaxConfig.ixx | Configuration interface for the Softmax module in the Mila DNN framework |
| Regularization | |
| Dropout.ixx | Implementation of Dropout regularization module for neural networks |
| DropoutConfig.ixx | Configuration interface for the Dropout regularization module in the Mila DNN framework |
| Transformers | |
| Gpt | |
| Gpt.Config.ixx | Network-level configuration for GPT-style transformer networks |
| Gpt.Presets.ixx | |
| GptBlock.Config.ixx | Configuration for GPT-style transformer block (block-level) |
| GptBlock.ixx | Transformer encoder block implementation |
| GptTransformer.ixx | |
| LlaMa | |
| Llama.Block.ixx | LLaMA transformer block — module partition of LlamaTransformer |
| Llama.Config.ixx | LLaMA network-level configuration |
| Llama.ixx | LLaMA-style decoder-only transformer network |
| Llama.Presets.ixx | |
| GenerateParams.ixx | |
| MilaComponents.ixx | Aggregate module that re-exports Mila built-in DNN components |
| Compute | |
| Devices | |
| Cpu | |
| Operations | |
| CpuAttentionOp.ixx | CPU implementation of Multi-Head Attention operation |
| CpuCrossEntropyOp.ixx | Implementation of the CPU-based cross entropy operation for neural networks |
| CpuEncoderOp.ixx | CPU backend for the Encoder operation |
| CpuGeluOp.ixx | |
| CpuLayerNormOp.ixx | CPU implementation of Layer Normalization operation (TensorDataType-based) |
| CpuLinearOp.ixx | CPU implementation of Linear (fully connected) operation |
| CpuLinearOpTypeMap.ixx | LinearOpTraits specialization for CPU / FP32 |
| CpuOperations.ixx | Aggregated CPU operation module exports |
| CpuResidualOp.ixx | CPU implementation of the residual (y = x + F(x)) binary operation |
| CpuSoftmaxCrossEntropyOp.ixx | Fused CPU implementation of Softmax + CrossEntropy loss operation |
| CpuSoftmaxOp.ixx | CPU implementation of Softmax operation (TensorDataType-based) |
| OperationTraits.Cpu.ixx | OperationTraits specializations for all CPU operation backends |
| Optimizers | |
| CpuAdamWOptimizer.ixx | CPU implementation of AdamW optimizer |
| Tensors | |
| Operations | |
| CpuTensorOps.Fill.ixx | CPU tensor fill operations partition |
| CpuTensorOps.ixx | |
| CpuTensorOps.Math.ixx | CPU tensor mathematical operations partition |
| CpuTensorOps.Transfer.ixx | CPU tensor transfer operations partition |
| CpuTensorOps.Zero.ixx | CPU fast zeroing partition for tensor buffers |
| CpuTensorDataTypeTraits.ixx | CPU-specific tensor trait specializations |
| CpuDevice.ixx | Implementation of CPU-based compute device for the Mila framework |
| CpuDeviceTraits.ixx | |
| CpuDeviceTypeTraits.ixx | DeviceTypeTraits specialization for the CPU device |
| CpuExecutionContext.ixx | CPU-specific execution context specialization |
| CpuMemoryResource.ixx | |
| CpuMemoryResourceTraits.ixx | CPU-specific memory resource traits and specializations |
| Cuda | |
| Helpers | |
| CublasLt.Utils.ixx | |
| CublasLtError.ixx | |
| CudaBadAlloc.ixx | |
| CudaDebug.ixx | |
| CudaError.ixx | CUDA error handling utilities and exception class |
| CudaHelpers.ixx | CUDA utility functions for device management and kernel execution |
| CudaUtils.h | |
| Operations | |
| Activations | |
| Gelu | |
| CudaGeluOp.Dispatch.ixx | Implementation of the CUDA GELU kernel dispatch mechanism |
| CudaGeluOp.ixx | Implementation of the CUDA-based GELU activation function for neural networks |
| Swiglu | |
| CudaSwigluOp.Dispatch.ixx | |
| CudaSwigluOp.ixx | CUDA SwiGLU activation implementation |
| Attention | |
| GQA | |
| CudaGqa.Dispatch.ixx | |
| CudaGqa.Plans.ixx | |
| CudaGqaOp.ixx | CUDA Grouped-Query Attention (GQA) operation using cuBLASLt |
| CudaGqaOpTypeMap.ixx | |
| MHA | |
| CudaMhaOp.Dispatch.ixx | |
| CudaMhaOp.ixx | |
| CudaMhaOp.Plans.ixx | |
| Common | |
| CublasLtLinearPlan.ixx | cuBLASLt matmul plan builder for CudaLinearOp |
| CublasLtPlan.ixx | Shared cuBLASLt plans for building and executing matmul plans (RAII + builders) |
| CublasLtPlanCache.ixx | |
| Embeddings | |
| CudaTokenEmbeddingOp.Dispatch.ixx | |
| CudaTokenEmbeddingOp.ixx | CUDA implementation of the TokenEmbedding operation |
| Encodings | |
| Lpe | |
| CudaLpeOp.Dispatch.ixx | |
| CudaLpeOp.ixx | CUDA implementation of the Lpe (token + positional embedding) operation |
| Rope | |
| CudaRopeOp.Cache.ixx | Process-wide shared cos/sin cache registry for CudaRopeOp |
| CudaRopeOp.Dispatch.ixx | |
| CudaRopeOp.ixx | CUDA implementation of the Rope (rotary positional embedding) operation |
| Linear | |
| CublasLtMatMulBias.ixx | CUDA-accelerated matrix multiplication with bias addition using cuBLASLt |
| CudaLinearGeluOp.ixx | |
| CudaLinearOp.Dispatch.ixx | |
| CudaLinearOp.ixx | CUDA implementation of Linear operation with two-phase cuBLASLt optimization |
| CudaLinearOp.Plans.ixx | cuBLASLt plan builders for CudaLinearOp forward and backward passes |
| CudaLinearOp.Quantize.ixx | Quantize partition of CudaLinearOp |
| CudaLinearOpTypeMap.ixx | LinearOpTypeMap specializations for CUDA device targets |
| Loss | |
| CudaSoftmaxCrossEntropyOp.ixx | Fused CUDA implementation of Softmax + CrossEntropy loss operation |
| Normalizations | |
| LayerNorm | |
| LayerNormOp.Dispatch.ixx | |
| LayerNormOp.ixx | |
| RmsNorm | |
| RmsNormOp.Dispatch.ixx | |
| RmsNormOp.ixx | |
| Softmax | |
| CudaSoftmaxOp.ixx | CUDA implementation of Softmax operation (TensorDataType-based) |
| Residual | |
| CudaResidualOp.Dispatch.ixx | |
| CudaResidualOp.ixx | CUDA implementation of the residual (y = x + F(x)) binary operation |
| CudaDataTypeTraits.ixx | |
| CudaOperations.ixx | Aggregated CUDA operation module exports |
| CudaOps.h | CUDA kernel function declarations for neural network operations |
| OperationTraits.Cuda.ixx | OperationTraits specializations for all CUDA operation backends |
| Optimizers | |
| Kernels | |
| CudaOptimizers.h | |
| CudaAdamWOptimizer.ixx | CUDA implementation of AdamW optimizer |
| Profiling | |
| CudaTimer.ixx | GPU-accurate interval timer using a CUDA event pair |
| NvtxRange.ixx | |
| Tensors | |
| Operations | |
| Kernels | |
| CudaTensorOps.h | CUDA tensor operation kernel function declarations |
| Math.Elementwise.h | CUDA kernel declarations for element-wise tensor mathematical operations |
| Math.Reduction.h | CUDA kernel declarations for tensor reduction operations (sum, mean, max, min) |
| Random.h | |
| Structural.h | Host-callable launcher declarations for CUDA structural tensor operations |
| TensorOps.Fill.h | |
| Transfer.Copy.h | |
| CudaTensorOps.Fill.ixx | CUDA tensor fill operations partition |
| CudaTensorOps.ixx | |
| CudaTensorOps.Math.ixx | CUDA tensor mathematical operations partition |
| CudaTensorOps.Random.ixx | CUDA random initialization partition for tensor buffers |
| CudaTensorOps.Structural.ixx | |
| CudaTensorOps.Transfer.ixx | CUDA tensor transfer operations partition |
| CudaTensorOps.Zero.ixx | CUDA fast zeroing partition for tensor buffers |
| CudaTensorDataType-Maps.ixx | CUDA-specific mappings between abstract TensorDataType and concrete CUDA native types |
| CudaTensorDataType-Specializations.ixx | |
| CudaTensorDataType.ixx | CUDA-specific tensor data type trait system - Primary module interface |
| CudaTensorDataTypes-CublasLtTypes.ixx | |
| CudaDevice.ixx | Implementation of CUDA-based compute device for the Mila framework |
| CudaDeviceMemoryResource.ixx | |
| CudaDeviceProps.ixx | CUDA device properties wrapper with caching and convenience methods |
| CudaDeviceResources.ixx | |
| CudaDeviceTraits.ixx | |
| CudaDeviceTypeTraits.ixx | DeviceTypeTraits specialization for CUDA devices |
| CudaExecutionContext.ixx | CUDA-specific execution context specialization |
| CudaManagedMemoryResource.ixx | |
| CudaMemoryResourceTraits.ixx | |
| CudaPinnedMemoryResource.ixx | |
| Metal | |
| Tensors | |
| MetalTensorTraits.ixx | Metal-specific tensor trait specializations |
| MetalDevice.ixx | Implementation of Metal-based compute device for the Mila framework |
| MetalDevicePlugin.ixx | Metal device plugin for device-agnostic registration and discovery |
| MetalExecutionContext.ixx | Metal-specific execution context specialization |
| MetalMemoryResource.ixx | Metal-specific memory resource implementation for Apple GPU compute |
| Rocm | |
| Tensors | |
| RocmTensorTraits.ixx | |
| RocmDevice.ixx | |
| RocmExecutionContext.ixx | |
| RocmMemoryResource.ixx | |
| Operations | |
| BinaryOperation.ixx | Abstract device-agnostic binary operation interface |
| GqaOpTypeMap.ixx | |
| GqaOpTypeMap.Template.ixx | |
| GqaState.ixx | Non-owning transient scratch state for CudaGqaOp inference paths |
| IKVCacheLifecycle.ixx | Interface for operations that own and manage a KV cache |
| IKvInference.ixx | KV-cache compute interface for modern attention backends (GQA and beyond) |
| IPackedKvInference.ixx | |
| IPositionalDecode.ixx | |
| IPositionalPairedOp.ixx | Interface for paired operations whose output depends on absolute token position |
| LinearOpTypeMap.ixx | |
| LinearOpTypeMap.Template.ixx | Primary compile-time dispatch template mapping (DeviceType, TPrecision, TWeightQuant) to a concrete LinearOp type |
| OperationBase.ixx | Core abstraction for neural network operations in the Mila framework |
| OperationRegistrarHelpers.ixx | Helpers to standardize registration of unary/binary/paired ops |
| OperationRegistry.ixx | Central registry for creating and discovering compute operations |
| OperationRegistryHelpers.ixx | Compile-time templated helpers for querying the OperationRegistry |
| OperationsRegistrar.ixx | |
| OperationTraits.ixx | Aggregator for the unified operation traits dispatch table |
| OperationTraits.Template.ixx | Unified compile-time dispatch template mapping (OperationType, DeviceType, TPrecision, TPolicy) to a concrete operation type |
| OperationType.ixx | Defines the operation types supported by the compute framework |
| PairedOperation.ixx | Abstract device-agnostic paired operation interface |
| UnaryOperation.ixx | Device-agnostic unary operation interface using abstract tensor data types |
| Optimizers | |
| OptimizerBase.ixx | Base interface for neural network parameter optimizers |
| Registry | |
| DeviceRegistrar.ixx | Device-agnostic registrar for automatic device discovery and registration |
| DeviceRegistry.ixx | Central registry for discovered compute devices |
| DeviceRegistryHelpers.ixx | Utility functions for compute device discovery and management |
| Device.ixx | Abstract compute device interface and device identifier factory |
| DeviceId.ixx | Lightweight device identifier value type |
| DeviceType.ixx | Device type definitions and conversion utilities for compute devices |
| DeviceTypeTraits.ixx | |
| ExecutionContext.ixx | Templated execution context framework for compute operations and stream management |
| ExecutionContext.Template.ixx | |
| ExecutionContextFactory.ixx | |
| IExecutionContext.ixx | Minimal type-erased execution context interface |
| MemoryResource.ixx | Defines a clean memory resource abstraction focused on allocation responsibilities |
| MemoryResourceProperties.ixx | |
| MemoryResourceTracker.ixx | |
| MemoryResourceTraits.ixx | Compute backend memory resource traits for dispatch optimization |
| Core | |
| Comonent.TrainingMode.ixx | |
| Component.BuildContext.ixx | |
| Component.ixx | Base component interface for Mila DNN components |
| Component.MemoryStats.ixx | |
| ComponentConfig.ixx | Base configuration interface for DNN components |
| ComponentFactory.ixx | Factory helpers for reconstructing built-in components from archives |
| ComponentType.ixx | Enumeration of built-in component types supported by the deserializer |
| CompositeComponent.ixx | Abstract container for managing child components |
| FusedComponent.ixx | |
| LanguageModel.ixx | Abstract base for Mila autoregressive language models |
| LanguageModelConfig.ixx | CRTP base configuration for all deployable Mila language models |
| LanguageNetwork.ixx | Abstract base for language model networks |
| LearningRateScheduler.ixx | Learning-rate scheduler base and common concrete schedules |
| Loss.ixx | |
| Model.ixx | Abstract base class for all Mila models |
| Model.RuntimeMode.ixx | |
| ModelConfig.ixx | Base configuration for all deployable Mila models |
| ModelQuantizationConfig.ixx | |
| ModelReader.ixx | |
| Network.ixx | Root composite network container |
| NetworkFactory.ixx | |
| TokenStreamer.ixx | Token streaming abstractions for autoregressive generation |
| Extensibility | |
| IModulePlugin.ixx | |
| MyCustomPlugin.cpp | |
| PluginInfo.ixx | |
| PluginManager.ixx | |
| Models | |
| GptModel.ixx | GPT inference model |
| LlamaModel.ixx | LLaMA inference model |
| LlamaModelConfig.ixx | Deployment configuration for Llama language models |
| Optimizers | |
| AdamW.ixx | AdamW optimizer wrapper using fluent AdamWConfig |
| AdamWConfig.ixx | AdamW optimizer configuration |
| Quantization | |
| KvCache | |
| Policy.ixx | |
| QuantPolicy.ixx | Quantization-specific KV cache compression policies |
| Weight | |
| Policies.ixx | |
| Quantization.ixx | Umbrella module for the Mila quantization subsystem |
| Serialization | |
| ArchiveSerializer.ixx | Interface for hierarchical archive formats (ZIP, tar, etc.) |
| ModelArchive.ixx | Structured archive helper used by component save/load implementations |
| OpenMode.ixx | |
| PretrainedReader.ixx | Reader for Mila pretrained binary format |
| SerializationMetadata.ixx | Type-safe metadata container for component serialization |
| SerializationMode.ixx | |
| Serializer.ixx | Minimal base interface for all serialization backends |
| ZipSerializer.ixx | ZIP-based ModelSerializer implementation using miniz |
| Tensors | |
| Operations | |
| TensorOps-Base.ixx | Base declaration for device-specific TensorOps specializations |
| TensorOps.Fill.ixx | High-level initializer helpers (device-dispatching) for tensors |
| TensorOps.ixx | |
| TensorOps.Math.ixx | Device-dispatching math helpers for tensor arithmetic operations |
| TensorOps.Random.ixx | Device-dispatching random initialization for tensors |
| TensorOps.Structural.ixx | Device-dispatched structural operations for tensors |
| TensorOps.Transfer.ixx | Tensor transfer utilities: copy/dispatch helpers for tensor data movement |
| TensorOps.Zero.ixx | Device-dispatched fast zero operation for tensor buffers |
| ITensor.ixx | Interface providing minimal representation for tensor data across different implementations |
| Tensor.Helpers.ixx | |
| Tensor.Initializers.ixx | Tensor initialization algorithms with host distribution generation and backend dispatch |
| Tensor.ixx | Device-aware tensor type with scalar support |
| Tensor.Partitioning.ixx | |
| Tensor.Serialization.ixx | Tensor-specific serialization helpers and metadata |
| Tensor.Types.ixx | Core shape, stride, and index types for the Mila tensor API |
| TensorBuffer.ixx | Device-agnostic memory management layer for tensor data using abstract data types |
| TensorDataType.ixx | Abstract tensor data type enumeration and traits system for device-agnostic tensor operations |
| TensorDataTypeMap.ixx | Concrete C++ type to abstract TensorDataType mapping utilities |
| TensorDataTypeTraits.ixx | Compile-time traits for the abstract TensorDataType enumeration |
| TensorHostTypeMap.ixx | Device-agnostic host type mapping for abstract TensorDataType enumeration |
| Visualization | |
| Components | |
| Block.Visualizer.ixx | |
| LayerNorm.Visualizer.ixx | |
| MLP.Visualizer.ixx | |
| Core | |
| ComponentVisualizer.ixx | |
| VisualizerContext.ixx | Context container holding snapshot tensor references for the visualization pipeline |
| Rendering | |
| ColorLUT.ixx | |
| FrameBuffer.ixx | |
| HeatMapRenderer.ixx | |
| Logging | |
| ConsoleSink.ixx | Console-based logging sink for the Mila logging infrastructure |
| FileSink.ixx | File-based logging sink for the Mila logging infrastructure |
| Logger.ixx | Abstract logging interface and static facade for the Mila logging infrastructure |
| NullSink.ixx | No-op logging sink for the Mila logging infrastructure |
| Utils | |
| json.ixx | |
| RandomGenerator.ixx | Provides a centralized random number generator for the Mila library |
| TrainingLogger.ixx | |
| Mila.ixx | Mila public API umbrella module - the single supported entry point (import Mila;) |
| Version.ixx | Semantic version type and Mila library version constants |