Here is a list of all files with brief descriptions:


Src
Data
Core
FileHeader.ixx	Common file header structure for Mila data files
TokenizerTrainer.ixx	Abstract trainer interface for building tokenizers' vocabularies
TrainerConfig.ixx
TrainerFactory.ixx	Factory helpers to construct tokenizer trainers and load vocabularies
Loaders
DataLoader.ixx	Device-agnostic data loader interface using abstract tensor data types
TokenSequenceLoader.Config.ixx
TokenSequenceLoader.ixx
Tokenizers
Bpe
BpePreTokenizationMode.ixx
BpeTokenizer.ixx	Unified BPE tokenizer for GPT-2, Llama 3.x, and Mistral model families
BpeTrainer.ixx	BPE vocabulary trainer with incremental corpus accumulation
BpeVocabulary.ixx	BPE vocabulary for GPT-2, Llama 3.x, and Mistral model families
BpeVocabularyConfig.ixx	Unified configuration for BPE vocabulary construction and runtime properties
Char
CharTokenizer.ixx	Character-level tokenizer implementing the Tokenizer API
CharTrainer.ixx	Character-level tokenizer trainer for corpus accumulation and vocabulary building
CharVocabulary.ixx	Character vocabulary with factory-based construction
CharVocabularyConfig.ixx	Configuration for character-level tokenizer training
SpecialTokens.ixx	Configuration for special tokens used across all tokenizer types
Tokenizer.ixx
TokenizerType.ixx
TokenizerVocabulary.ixx	Abstract interface for tokenizer vocabularies used by data pipelines
Dnn
Components
Activations
Gelu
Gelu.Config.ixx
Gelu.ixx	GELU activation component implementation
Swiglu
Swiglu.Config.ixx	Configuration for the SwiGLU activation component
Swiglu.ixx	SwiGLU activation component implementation
ActivationType.ixx	Definition of activation function types used throughout the Mila library
ApproximationMethod.ixx	Shared approximation method enum for activation functions
Attention
GQA
GroupedQueryAttention.Config.ixx	Configuration interface for the Grouped-Query Attention component
GroupedQueryAttention.ixx	Grouped-Query Attention module (concatenated QKV input)
MHA
MultiHeadAttention.Config.ixx
MultiHeadAttention.ixx	Multi-Head Attention module (concatenated QKV input)
AttentionType.ixx	Defines attention mechanism types used by transformer components
Connections
ConnectionType.ixx	Definition of connection function types used by the Mila DNN library
Residual.ixx	Device-templated Residual connection component
ResidualConfig.ixx	Configuration for the Residual component
Embeddings
TokenEmbedding.Config.ixx
TokenEmbedding.ixx	Device-templated TokenEmbedding component
Encodings
Lpe
Lpe.Config.ixx
Lpe.ixx
Rope
Rope.Config.ixx	Configuration for Rotary Position Embedding (RoPE) component
Rope.ixx	Rotary positional embedding (RoPE) component
EncodingType.ixx	Positional encoding strategy selection used by Transformer components
FFN
MLP.Config.ixx
MLP.Dispatch.ixx	Activation dispatch helpers for MLP
MLP.ixx	Multi-Layer Perceptron (MLP) block for neural networks
Linear
Linear.ixx	Device-templated Linear (fully connected) component
LinearConfig.ixx	Configuration for the Linear (fully connected) layer
Losses
CrossEntropyConfig.ixx	Configuration for the fused SoftmaxCrossEntropy loss module
SoftmaxCrossEntropy.ixx	Device-templated fused SoftmaxCrossEntropy loss module
Normalization
LayerNorm
LayerNorm.Config.ixx	Configuration for Layer Normalization component
LayerNorm.ixx	Layer Normalization component
RmsNorm
RmsNorm.Config.ixx	Configuration for RMS Normalization component
RmsNorm.ixx	RMS Normalization component
NormType.ixx	Normalization layer type enumeration used by Transformer components
Softmax.ixx	Device-templated Softmax activation module
SoftmaxConfig.ixx	Configuration interface for the Softmax module in the Mila DNN framework
Regularization
Dropout.ixx	Implementation of Dropout regularization module for neural networks
DropoutConfig.ixx	Configuration interface for the Dropout regularization module in the Mila DNN framework
Transformers
Gpt
Gpt.Config.ixx	Network-level configuration for GPT-style transformer networks
Gpt.Presets.ixx
GptBlock.Config.ixx	Configuration for GPT-style transformer block (block-level)
GptBlock.ixx	Transformer encoder block implementation
GptTransformer.ixx
LlaMa
Llama.Block.ixx	LLaMA transformer block — module partition of LlamaTransformer
Llama.Config.ixx	LLaMA network-level configuration
Llama.ixx	LLaMA-style decoder-only transformer network
Llama.Presets.ixx
GenerateParams.ixx
MilaComponents.ixx	Aggregate module that re-exports Mila built-in DNN components
Compute
Devices
Cpu
Operations
CpuAttentionOp.ixx	CPU implementation of Multi-Head Attention operation
CpuCrossEntropyOp.ixx	Implementation of the CPU-based cross entropy operation for neural networks
CpuEncoderOp.ixx	CPU backend for the Encoder operation
CpuGeluOp.ixx
CpuLayerNormOp.ixx	CPU implementation of Layer Normalization operation (TensorDataType-based)
CpuLinearOp.ixx	CPU implementation of Linear (fully connected) operation
CpuLinearOpTypeMap.ixx	LinearOpTraits specialization for CPU / FP32
CpuOperations.ixx	Aggregated CPU operation module exports
CpuResidualOp.ixx	CPU implementation of the residual (y = x + F(x)) binary operation
CpuSoftmaxCrossEntropyOp.ixx	Fused CPU implementation of Softmax + CrossEntropy loss operation
CpuSoftmaxOp.ixx	CPU implementation of Softmax operation (TensorDataType-based)
OperationTraits.Cpu.ixx	OperationTraits specializations for all CPU operation backends
Optimizers
CpuAdamWOptimizer.ixx	CPU implementation of AdamW optimizer
Tensors
Operations
CpuTensorOps.Fill.ixx	CPU tensor fill operations partition
CpuTensorOps.ixx
CpuTensorOps.Math.ixx	CPU tensor mathematical operations partition
CpuTensorOps.Transfer.ixx	CPU tensor transfer operations partition
CpuTensorOps.Zero.ixx	CPU fast zeroing partition for tensor buffers
CpuTensorDataTypeTraits.ixx	CPU-specific tensor trait specializations
CpuDevice.ixx	Implementation of CPU-based compute device for the Mila framework
CpuDeviceTraits.ixx
CpuDeviceTypeTraits.ixx	DeviceTypeTraits specialization for the CPU device
CpuExecutionContext.ixx	CPU-specific execution context specialization
CpuMemoryResource.ixx
CpuMemoryResourceTraits.ixx	CPU-specific memory resource traits and specializations
Cuda
Helpers
CublasLt.Utils.ixx
CublasLtError.ixx
CudaBadAlloc.ixx
CudaDebug.ixx
CudaError.ixx	CUDA error handling utilities and exception class
CudaHelpers.ixx	CUDA utility functions for device management and kernel execution
CudaUtils.h
Operations
Activations
Gelu
CudaGeluOp.Dispatch.ixx	Implementation of the CUDA GELU kernel dispatch mechanism
CudaGeluOp.ixx	Implementation of the CUDA-based GELU activation function for neural networks
Swiglu
CudaSwigluOp.Dispatch.ixx
CudaSwigluOp.ixx	CUDA SwiGLU activation implementation
Attention
GQA
CudaGqa.Dispatch.ixx
CudaGqa.Plans.ixx
CudaGqaOp.ixx	CUDA Grouped-Query Attention (GQA) operation using cuBLASLt
CudaGqaOpTypeMap.ixx
MHA
CudaMhaOp.Dispatch.ixx
CudaMhaOp.ixx
CudaMhaOp.Plans.ixx
Common
CublasLtLinearPlan.ixx	cuBLASLt matmul plan builder for CudaLinearOp
CublasLtPlan.ixx	Shared cuBLASLt plans for building and executing matmul plans (RAII + builders)
CublasLtPlanCache.ixx
Embeddings
CudaTokenEmbeddingOp.Dispatch.ixx
CudaTokenEmbeddingOp.ixx	CUDA implementation of the TokenEmbedding operation
Encodings
Lpe
CudaLpeOp.Dispatch.ixx
CudaLpeOp.ixx	CUDA implementation of the Lpe (token + positional embedding) operation
Rope
CudaRopeOp.Cache.ixx	Process-wide shared cos/sin cache registry for CudaRopeOp
CudaRopeOp.Dispatch.ixx
CudaRopeOp.ixx	CUDA implementation of the Rope (rotary positional embedding) operation
Linear
CublasLtMatMulBias.ixx	CUDA-accelerated matrix multiplication with bias addition using cuBLASLt
CudaLinearGeluOp.ixx
CudaLinearOp.Dispatch.ixx
CudaLinearOp.ixx	CUDA implementation of Linear operation with two-phase cuBLASLt optimization
CudaLinearOp.Plans.ixx	cuBLASLt plan builders for CudaLinearOp forward and backward passes
CudaLinearOp.Quantize.ixx	Quantize partition of CudaLinearOp
CudaLinearOpTypeMap.ixx	LinearOpTypeMap specializations for CUDA device targets
Loss
CudaSoftmaxCrossEntropyOp.ixx	Fused CUDA implementation of Softmax + CrossEntropy loss operation
Normalizations
LayerNorm
LayerNormOp.Dispatch.ixx
LayerNormOp.ixx
RmsNorm
RmsNormOp.Dispatch.ixx
RmsNormOp.ixx
Softmax
CudaSoftmaxOp.ixx	CUDA implementation of Softmax operation (TensorDataType-based)
Residual
CudaResidualOp.Dispatch.ixx
CudaResidualOp.ixx	CUDA implementation of the residual (y = x + F(x)) binary operation
CudaDataTypeTraits.ixx
CudaOperations.ixx	Aggregated CUDA operation module exports
CudaOps.h	CUDA kernel function declarations for neural network operations
OperationTraits.Cuda.ixx	OperationTraits specializations for all CUDA operation backends
Optimizers
Kernels
CudaOptimizers.h
CudaAdamWOptimizer.ixx	CUDA implementation of AdamW optimizer
Profiling
CudaTimer.ixx	GPU-accurate interval timer using a CUDA event pair
NvtxRange.ixx
Tensors
Operations
Kernels
CudaTensorOps.h	CUDA tensor operation kernel function declarations
Math.Elementwise.h	CUDA kernel declarations for element-wise tensor mathematical operations
Math.Reduction.h	CUDA kernel declarations for tensor reduction operations (sum, mean, max, min)
Random.h
Structural.h	Host-callable launcher declarations for CUDA structural tensor operations
TensorOps.Fill.h
Transfer.Copy.h
CudaTensorOps.Fill.ixx	CUDA tensor fill operations partition
CudaTensorOps.ixx
CudaTensorOps.Math.ixx	CUDA tensor mathematical operations partition
CudaTensorOps.Random.ixx	CUDA random initialization partition for tensor buffers
CudaTensorOps.Structural.ixx
CudaTensorOps.Transfer.ixx	CUDA tensor transfer operations partition
CudaTensorOps.Zero.ixx	CUDA fast zeroing partition for tensor buffers
CudaTensorDataType-Maps.ixx	CUDA-specific mappings between abstract TensorDataType and concrete CUDA native types
CudaTensorDataType-Specializations.ixx
CudaTensorDataType.ixx	CUDA-specific tensor data type trait system - Primary module interface
CudaTensorDataTypes-CublasLtTypes.ixx
CudaDevice.ixx	Implementation of CUDA-based compute device for the Mila framework
CudaDeviceMemoryResource.ixx
CudaDeviceProps.ixx	CUDA device properties wrapper with caching and convenience methods
CudaDeviceResources.ixx
CudaDeviceTraits.ixx
CudaDeviceTypeTraits.ixx	DeviceTypeTraits specialization for CUDA devices
CudaExecutionContext.ixx	CUDA-specific execution context specialization
CudaManagedMemoryResource.ixx
CudaMemoryResourceTraits.ixx
CudaPinnedMemoryResource.ixx
Metal
Tensors
MetalTensorTraits.ixx	Metal-specific tensor trait specializations
MetalDevice.ixx	Implementation of Metal-based compute device for the Mila framework
MetalDevicePlugin.ixx	Metal device plugin for device-agnostic registration and discovery
MetalExecutionContext.ixx	Metal-specific execution context specialization
MetalMemoryResource.ixx	Metal-specific memory resource implementation for Apple GPU compute
Rocm
Tensors
RocmTensorTraits.ixx
RocmDevice.ixx
RocmExecutionContext.ixx
RocmMemoryResource.ixx
Operations
BinaryOperation.ixx	Abstract device-agnostic binary operation interface
GqaOpTypeMap.ixx
GqaOpTypeMap.Template.ixx
GqaState.ixx	Non-owning transient scratch state for CudaGqaOp inference paths
IKVCacheLifecycle.ixx	Interface for operations that own and manage a KV cache
IKvInference.ixx	KV-cache compute interface for modern attention backends (GQA and beyond)
IPackedKvInference.ixx
IPositionalDecode.ixx
IPositionalPairedOp.ixx	Interface for paired operations whose output depends on absolute token position
LinearOpTypeMap.ixx
LinearOpTypeMap.Template.ixx	Primary compile-time dispatch template mapping (DeviceType, TPrecision, TWeightQuant) to a concrete LinearOp type
OperationBase.ixx	Core abstraction for neural network operations in the Mila framework
OperationRegistrarHelpers.ixx	Helpers to standardize registration of unary/binary/paired ops
OperationRegistry.ixx	Central registry for creating and discovering compute operations
OperationRegistryHelpers.ixx	Compile-time templated helpers for querying the OperationRegistry
OperationsRegistrar.ixx
OperationTraits.ixx	Aggregator for the unified operation traits dispatch table
OperationTraits.Template.ixx	Unified compile-time dispatch template mapping (OperationType, DeviceType, TPrecision, TPolicy) to a concrete operation type
OperationType.ixx	Defines the operation types supported by the compute framework
PairedOperation.ixx	Abstract device-agnostic paired operation interface
UnaryOperation.ixx	Device-agnostic unary operation interface using abstract tensor data types
Optimizers
OptimizerBase.ixx	Base interface for neural network parameter optimizers
Registry
DeviceRegistrar.ixx	Device-agnostic registrar for automatic device discovery and registration
DeviceRegistry.ixx	Central registry for discovered compute devices
DeviceRegistryHelpers.ixx	Utility functions for compute device discovery and management
Device.ixx	Abstract compute device interface and device identifier factory
DeviceId.ixx	Lightweight device identifier value type
DeviceType.ixx	Device type definitions and conversion utilities for compute devices
DeviceTypeTraits.ixx
ExecutionContext.ixx	Templated execution context framework for compute operations and stream management
ExecutionContext.Template.ixx
ExecutionContextFactory.ixx
IExecutionContext.ixx	Minimal type-erased execution context interface
MemoryResource.ixx	Defines a clean memory resource abstraction focused on allocation responsibilities
MemoryResourceProperties.ixx
MemoryResourceTracker.ixx
MemoryResourceTraits.ixx	Compute backend memory resource traits for dispatch optimization
Core
Comonent.TrainingMode.ixx
Component.BuildContext.ixx
Component.ixx	Base component interface for Mila DNN components
Component.MemoryStats.ixx
ComponentConfig.ixx	Base configuration interface for DNN components
ComponentFactory.ixx	Factory helpers for reconstructing built-in components from archives
ComponentType.ixx	Enumeration of built-in component types supported by the deserializer
CompositeComponent.ixx	Abstract container for managing child components
FusedComponent.ixx
LanguageModel.ixx	Abstract base for Mila autoregressive language models
LanguageModelConfig.ixx	CRTP base configuration for all deployable Mila language models
LanguageNetwork.ixx	Abstract base for language model networks
LearningRateScheduler.ixx	Learning-rate scheduler base and common concrete schedules
Loss.ixx
Model.ixx	Abstract base class for all Mila models
Model.RuntimeMode.ixx
ModelConfig.ixx	Base configuration for all deployable Mila models
ModelQuantizationConfig.ixx
ModelReader.ixx
Network.ixx	Root composite network container
NetworkFactory.ixx
TokenStreamer.ixx	Token streaming abstractions for autoregressive generation
Extensibility
IModulePlugin.ixx
MyCustomPlugin.cpp
PluginInfo.ixx
PluginManager.ixx
Models
GptModel.ixx	GPT inference model
LlamaModel.ixx	LLaMA inference model
LlamaModelConfig.ixx	Deployment configuration for Llama language models
Optimizers
AdamW.ixx	AdamW optimizer wrapper using fluent AdamWConfig
AdamWConfig.ixx	AdamW optimizer configuration
Quantization
KvCache
Policy.ixx
QuantPolicy.ixx	Quantization-specific KV cache compression policies
Weight
Policies.ixx
Quantization.ixx	Umbrella module for the Mila quantization subsystem
Serialization
ArchiveSerializer.ixx	Interface for hierarchical archive formats (ZIP, tar, etc.)
ModelArchive.ixx	Structured archive helper used by component save/load implementations
OpenMode.ixx
PretrainedReader.ixx	Reader for Mila pretrained binary format
SerializationMetadata.ixx	Type-safe metadata container for component serialization
SerializationMode.ixx
Serializer.ixx	Minimal base interface for all serialization backends
ZipSerializer.ixx	ZIP-based ModelSerializer implementation using miniz
Tensors
Operations
TensorOps-Base.ixx	Base declaration for device-specific TensorOps specializations
TensorOps.Fill.ixx	High-level initializer helpers (device-dispatching) for tensors
TensorOps.ixx
TensorOps.Math.ixx	Device-dispatching math helpers for tensor arithmetic operations
TensorOps.Random.ixx	Device-dispatching random initialization for tensors
TensorOps.Structural.ixx	Device-dispatched structural operations for tensors
TensorOps.Transfer.ixx	Tensor transfer utilities: copy/dispatch helpers for tensor data movement
TensorOps.Zero.ixx	Device-dispatched fast zero operation for tensor buffers
ITensor.ixx	Interface providing minimal representation for tensor data across different implementations
Tensor.Helpers.ixx
Tensor.Initializers.ixx	Tensor initialization algorithms with host distribution generation and backend dispatch
Tensor.ixx	Device-aware tensor type with scalar support
Tensor.Partitioning.ixx
Tensor.Serialization.ixx	Tensor-specific serialization helpers and metadata
Tensor.Types.ixx	Core shape, stride, and index types for the Mila tensor API
TensorBuffer.ixx	Device-agnostic memory management layer for tensor data using abstract data types
TensorDataType.ixx	Abstract tensor data type enumeration and traits system for device-agnostic tensor operations
TensorDataTypeMap.ixx	Concrete C++ type to abstract TensorDataType mapping utilities
TensorDataTypeTraits.ixx	Compile-time traits for the abstract TensorDataType enumeration
TensorHostTypeMap.ixx	Device-agnostic host type mapping for abstract TensorDataType enumeration
Visualization
Components
Block.Visualizer.ixx
LayerNorm.Visualizer.ixx
MLP.Visualizer.ixx
Core
ComponentVisualizer.ixx
VisualizerContext.ixx	Context container holding snapshot tensor references for the visualization pipeline
Rendering
ColorLUT.ixx
FrameBuffer.ixx
HeatMapRenderer.ixx
Logging
ConsoleSink.ixx	Console-based logging sink for the Mila logging infrastructure
FileSink.ixx	File-based logging sink for the Mila logging infrastructure
Logger.ixx	Abstract logging interface and static facade for the Mila logging infrastructure
NullSink.ixx	No-op logging sink for the Mila logging infrastructure
Utils
json.ixx
RandomGenerator.ixx	Provides a centralized random number generator for the Mila library
TrainingLogger.ixx
Mila.ixx	Mila public API umbrella module - the single supported entry point (import Mila;)
Version.ixx	Semantic version type and Mila library version constants