Mila 0.13.48
Deep Neural Network Library
Device-agnostic data loader interface using abstract tensor data types.
Public Types | |
| using | InputDataType = TensorDataType |
| Input tensor abstract data type. | |
| using | InputTensor = Tensor<TInputDataType, TMemoryResource> |
| Input tensor type alias. | |
| using | MemoryResource = TMemoryResource |
| Memory resource type for tensor allocation. | |
| using | TargetDataType = TensorDataType |
| Target tensor abstract data type. | |
| using | TargetTensor = Tensor<TTargetDataType, TMemoryResource> |
| Target tensor type alias. | |
Public Member Functions | |
| DataLoader (const DataLoader &)=delete | |
| Copy operations explicitly deleted for performance safety. | |
| DataLoader (DataLoader &&)=default | |
| Move operations for efficient ownership transfer. | |
| DataLoader (int64_t batch_size) | |
| Constructs data loader with specified batch configuration. | |
| virtual | ~DataLoader ()=default |
| Virtual destructor ensuring proper cleanup in derived classes. | |
| int64_t | batchSize () const noexcept |
| Returns the configured batch size. | |
| int64_t | currentBatch () const noexcept |
| Returns the current batch index. | |
| virtual std::string | getDatasetInfo () const |
| Returns dataset statistics for optimization and analysis. | |
| virtual bool | hasNext () const |
| Checks if more batches are available. | |
| virtual const InputTensor & | inputs () const =0 |
| Provides immutable access to input tensor for current batch. | |
| virtual InputTensor & | inputs ()=0 |
| Provides mutable access to input tensor for current batch. | |
| virtual void | nextBatch ()=0 |
| Loads the next batch of data from the dataset. | |
| virtual int64_t | numBatches () const =0 |
| Returns the total number of batches in the dataset. | |
| DataLoader & | operator= (const DataLoader &)=delete |
| DataLoader & | operator= (DataLoader &&)=default |
| virtual void | reset () |
| Resets the loader to the beginning of the dataset. | |
| virtual const TargetTensor & | targets () const =0 |
| Provides immutable access to target tensor for current batch. | |
| virtual TargetTensor & | targets ()=0 |
| Provides mutable access to target tensor for current batch. | |
| virtual bool | validateCurrentBatch () const |
| Validates current batch data integrity. | |
Static Public Member Functions | |
| static constexpr bool | supportsMixedPrecision () noexcept |
| Checks if data loader supports mixed-precision workflows. | |
| static constexpr bool | usesPinnedMemory () noexcept |
| Checks if data loader uses pinned memory for GPU optimization. | |
Static Public Attributes | |
| static constexpr TensorDataType | input_data_type = TInputDataType |
| Compile-time input data type constant. | |
| static constexpr bool | is_mixed_precision = (TInputDataType != TTargetDataType) |
| Mixed-precision workflow detection. | |
| static constexpr TensorDataType | target_data_type = TTargetDataType |
| Compile-time target data type constant. | |
| static constexpr bool | uses_pinned_memory = false |
| Pinned memory optimization (CUDA-only; false on CPU-only builds). | |
Protected Member Functions | |
| void | incrementBatch () noexcept |
| Increments current batch counter. | |
| void | setCurrentBatch (int64_t batch_index) noexcept |
| Updates current batch counter. | |
Private Attributes | |
| int64_t | batch_size_ |
| Number of samples in each batch (immutable after construction). | |
| int64_t | current_batch_ |
| Zero-based index of currently loaded batch. | |
Device-agnostic data loader interface using abstract tensor data types.
Abstract base class for batch data loading during neural network training and evaluation across heterogeneous compute environments. Input and target element types are selected through the abstract TensorDataType enumeration, so the same interface works on different devices without exposing device-specific concrete types to host compilation.
Core architectural principles:
The loader supports both CPU and pinned memory resources for optimal performance in GPU training scenarios, enabling efficient overlapped data transfers while maintaining device independence through the abstract type system.
Template Parameters
| TInputDataType | Abstract data type for input tensors from TensorDataType enumeration |
| TTargetDataType | Abstract data type for target tensors from TensorDataType enumeration |
| TMemoryResource | Memory resource type determining allocation strategy and device targeting |
Example usage:
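A minimal sketch of a typical iteration loop, not taken from the Mila sources. It assumes a simplified stand-in for the interface (`LoaderSketch`, with `std::vector<float>` in place of `Tensor`) and a hypothetical `RangeLoader` subclass; neither name exists in Mila. It also assumes that `hasNext()` compares the batch counter with `numBatches()`, which this page does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Stand-in for the documented interface (NOT the Mila header). Only the
// members used below are mirrored; tensors are plain std::vector<float>.
class LoaderSketch {
public:
    explicit LoaderSketch(std::int64_t batch_size) : batch_size_(batch_size) {}
    virtual ~LoaderSketch() = default;

    std::int64_t batchSize() const noexcept { return batch_size_; }
    std::int64_t currentBatch() const noexcept { return current_batch_; }
    // Assumption: the counter holds the number of batches loaded so far.
    virtual bool hasNext() const { return current_batch_ < numBatches(); }
    virtual void reset() { current_batch_ = 0; }

    virtual std::int64_t numBatches() const = 0;
    virtual void nextBatch() = 0;
    virtual const std::vector<float>& inputs() const = 0;
    virtual const std::vector<float>& targets() const = 0;

protected:
    void incrementBatch() noexcept { ++current_batch_; }

private:
    std::int64_t batch_size_;
    std::int64_t current_batch_ = 0;
};

// Hypothetical dataset: inputs are consecutive numbers, targets are input + 1.
class RangeLoader final : public LoaderSketch {
public:
    RangeLoader(std::int64_t batch_size, std::int64_t num_batches)
        : LoaderSketch(batch_size),
          num_batches_(num_batches),
          inputs_(static_cast<std::size_t>(batch_size)),
          targets_(static_cast<std::size_t>(batch_size)) {}

    std::int64_t numBatches() const override { return num_batches_; }

    void nextBatch() override {
        if (!hasNext()) throw std::runtime_error("no more batches");
        const float start = static_cast<float>(currentBatch() * batchSize());
        std::iota(inputs_.begin(), inputs_.end(), start);
        std::iota(targets_.begin(), targets_.end(), start + 1.0f);
        incrementBatch();  // advance only after the batch is filled
    }

    const std::vector<float>& inputs() const override { return inputs_; }
    const std::vector<float>& targets() const override { return targets_; }

private:
    std::int64_t num_batches_;
    std::vector<float> inputs_;
    std::vector<float> targets_;
};

// One pass over the dataset; returns the number of samples visited.
std::int64_t consumeEpoch(LoaderSketch& loader) {
    std::int64_t samples = 0;
    loader.reset();
    while (loader.hasNext()) {
        loader.nextBatch();
        assert(loader.inputs().size() == loader.targets().size());
        samples += static_cast<std::int64_t>(loader.inputs().size());
    }
    return samples;
}
```

Calling `consumeEpoch` once per epoch keeps the training loop independent of the concrete loader, which is the purpose of the virtual interface.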
| using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::InputDataType = TensorDataType |
Input tensor abstract data type.
| using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::InputTensor = Tensor<TInputDataType, TMemoryResource> |
Input tensor type alias.
| using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::MemoryResource = TMemoryResource |
Memory resource type for tensor allocation.
| using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::TargetDataType = TensorDataType |
Target tensor abstract data type.
| using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::TargetTensor = Tensor<TTargetDataType, TMemoryResource> |
Target tensor type alias.
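| Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader (int64_t batch_size) | inline explicit |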
Constructs data loader with specified batch configuration.
Initializes the data loader with the specified batch size and prepares the internal state for efficient batch processing. The loader is ready to begin data iteration after construction.
Parameters
| batch_size | Number of samples to include in each batch |
Exceptions
| std::invalid_argument | If batch_size is zero |
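The documented check can be sketched as follows. `BatchConfigSketch` and its `validated` helper are hypothetical names used only for illustration; the actual constructor body is not shown on this page.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in that mirrors only the documented constructor contract.
class BatchConfigSketch {
public:
    explicit BatchConfigSketch(std::int64_t batch_size)
        : batch_size_(validated(batch_size)) {}

    std::int64_t batchSize() const noexcept { return batch_size_; }

private:
    // Only the zero case is documented; how negative values are treated
    // is not stated, so this sketch leaves them alone.
    static std::int64_t validated(std::int64_t batch_size) {
        if (batch_size == 0) {
            throw std::invalid_argument("batch_size must be non-zero");
        }
        return batch_size;
    }

    std::int64_t batch_size_;
};
```

Validating in the member initializer means a loader with an invalid batch size never exists, so later methods do not need to re-check it.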
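| virtual Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::~DataLoader () | virtual default |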
Virtual destructor ensuring proper cleanup in derived classes.
Provides proper resource cleanup for polymorphic destruction, enabling safe use of base class pointers to derived instances.
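| Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader (const DataLoader &) | delete |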
Copy operations explicitly deleted for performance safety.
Prevents accidental expensive copy operations involving large datasets and complex internal state management.
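| Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader (DataLoader &&) | default |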
Move operations for efficient ownership transfer.
Enables efficient transfer of data loader instances without copying internal state or dataset references.
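| int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::batchSize () const | inline noexcept |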
Returns the configured batch size.
Provides the number of samples included in each batch as specified during data loader construction. This value remains constant throughout the loader's lifetime.
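| int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::currentBatch () const | inline noexcept |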
Returns the current batch index.
Provides the zero-based index of the batch that was most recently loaded through nextBatch(). Useful for progress tracking and debugging data loading workflows.
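| virtual std::string Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::getDatasetInfo () const | inline virtual |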
Returns dataset statistics for optimization and analysis.
Derived classes may override this method to provide dataset-specific statistics such as sample count, class distribution, or data characteristics that can inform training optimization and analysis.
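| virtual bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::hasNext () const | inline virtual |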
Checks if more batches are available.
Determines whether additional batches can be loaded from the dataset, enabling efficient iteration control in training and evaluation loops.
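| void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::incrementBatch () | inline protected noexcept |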
Increments current batch counter.
Protected helper method for derived classes to increment the batch counter after successfully loading the next batch. Simplifies sequential batch loading implementations.
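| virtual const InputTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::inputs () const | pure virtual |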
Provides immutable access to input tensor for current batch.
Derived classes must implement this method to provide read-only access to the tensor containing input data for the currently loaded batch.
Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
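| virtual InputTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::inputs () | pure virtual |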
Provides mutable access to input tensor for current batch.
Derived classes must implement this method to provide access to the tensor containing input data for the currently loaded batch. The tensor should be properly shaped and contain valid data after nextBatch() call.
Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

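| virtual void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::nextBatch () | pure virtual |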
Loads the next batch of data from the dataset.
Derived classes must implement this method to load the next batch of data into the input and target tensors. Implementation should handle data preprocessing, memory allocation, and batch composition according to the specific dataset requirements.
Exceptions
| std::runtime_error | If no more batches are available |
| std::runtime_error | If data loading fails |
Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
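As an illustration of the contract described above, here is a sketch of a `nextBatch()` implementation over an in-memory array. `VectorLoaderSketch` is a hypothetical, non-polymorphic stand-in, not a Mila class. Dropping a trailing partial batch and rejecting non-positive batch sizes are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical loader over an in-memory sample array. It mirrors only the
// nextBatch()/numBatches()/hasNext() behaviour described in the text.
class VectorLoaderSketch {
public:
    VectorLoaderSketch(std::vector<float> samples, std::int64_t batch_size)
        : samples_(std::move(samples)), batch_size_(batch_size) {
        // Assumption of this sketch: non-positive sizes are rejected.
        if (batch_size_ <= 0) throw std::invalid_argument("batch_size must be positive");
        inputs_.resize(static_cast<std::size_t>(batch_size_));
    }

    // Assumption: a trailing partial batch is dropped.
    std::int64_t numBatches() const {
        return static_cast<std::int64_t>(samples_.size()) / batch_size_;
    }
    bool hasNext() const { return current_batch_ < numBatches(); }
    std::int64_t currentBatch() const noexcept { return current_batch_; }
    const std::vector<float>& inputs() const { return inputs_; }

    void nextBatch() {
        if (!hasNext()) {
            throw std::runtime_error("nextBatch: no more batches available");
        }
        const auto offset = static_cast<std::ptrdiff_t>(current_batch_ * batch_size_);
        const auto first = samples_.begin() + offset;
        std::copy(first, first + static_cast<std::ptrdiff_t>(batch_size_), inputs_.begin());
        ++current_batch_;  // advance only after the copy succeeded
    }

private:
    std::vector<float> samples_;
    std::int64_t batch_size_;
    std::vector<float> inputs_;
    std::int64_t current_batch_ = 0;
};
```

Advancing the counter last means a failed load leaves the loader pointing at the same batch, so a caller that catches the exception can retry or reset.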
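| virtual int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::numBatches () const | pure virtual |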
Returns the total number of batches in the dataset.
Derived classes must implement this method to report the total number of batches available in their specific dataset. This information is essential for training loop progress tracking and epoch management.
Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

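| DataLoader & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::operator= (const DataLoader &) | delete |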
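| DataLoader & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::operator= (DataLoader &&) | default |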
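| virtual void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::reset () | inline virtual |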
Resets the loader to the beginning of the dataset.
Resets the internal state to start iteration from the first batch. Derived classes may override this method to implement additional reset functionality such as dataset reshuffling or preprocessing pipeline reinitialization.
Reimplemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
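One way an override might add reshuffling on top of the base reset is sketched below. `ResettableSketch` and `ShufflingLoaderSketch` are hypothetical names, and the base behaviour (setting the counter back to zero) is an assumption about what the default `reset()` does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Stand-in base: assumes the default reset() rewinds the batch counter.
class ResettableSketch {
public:
    virtual ~ResettableSketch() = default;
    std::int64_t currentBatch() const noexcept { return current_batch_; }
    virtual void reset() { current_batch_ = 0; }

protected:
    void incrementBatch() noexcept { ++current_batch_; }

private:
    std::int64_t current_batch_ = 0;
};

// Hypothetical derived loader that draws a new sample order on every reset.
class ShufflingLoaderSketch : public ResettableSketch {
public:
    ShufflingLoaderSketch(std::size_t num_samples, unsigned seed)
        : order_(num_samples), rng_(seed) {
        std::iota(order_.begin(), order_.end(), std::size_t{0});
    }

    void reset() override {
        ResettableSketch::reset();                          // rewind the counter first
        std::shuffle(order_.begin(), order_.end(), rng_);   // then pick a new epoch order
    }

    void advance() noexcept { incrementBatch(); }  // stands in for nextBatch()
    const std::vector<std::size_t>& order() const noexcept { return order_; }

private:
    std::vector<std::size_t> order_;
    std::mt19937 rng_;
};
```

Calling the base `reset()` from the override keeps the counter logic in one place, so the derived class only adds its own state changes.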
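| void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::setCurrentBatch (int64_t batch_index) | inline protected noexcept |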
Updates current batch counter.
Protected helper method for derived classes to update the batch counter after successfully loading a new batch. Ensures consistent state management across all data loader implementations.
Parameters
| batch_index | New batch index to set |
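| static constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::supportsMixedPrecision () | inline static constexpr noexcept |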
Checks if data loader supports mixed-precision workflows.
Compile-time detection of whether the loader uses different data types for inputs and targets, enabling mixed-precision training optimizations.
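The compile-time comparison can be illustrated with a small traits sketch. `DataTypeSketch` is a hypothetical stand-in for `TensorDataType` (its enumerator names are illustrative only), and returning `is_mixed_precision` from `supportsMixedPrecision()` is an assumption based on the description above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the library's TensorDataType enumeration.
enum class DataTypeSketch : std::uint8_t { FP32, FP16, INT32 };

template <DataTypeSketch TInput, DataTypeSketch TTarget>
struct LoaderTraitsSketch {
    static constexpr DataTypeSketch input_data_type = TInput;
    static constexpr DataTypeSketch target_data_type = TTarget;

    // Same expression as the documented is_mixed_precision attribute.
    static constexpr bool is_mixed_precision = (TInput != TTarget);

    // Assumption: the accessor simply forwards the constant.
    static constexpr bool supportsMixedPrecision() noexcept {
        return is_mixed_precision;
    }
};

// Both checks are resolved by the compiler; nothing runs at execution time.
static_assert(
    !LoaderTraitsSketch<DataTypeSketch::FP32, DataTypeSketch::FP32>::supportsMixedPrecision(),
    "identical input and target types are not mixed precision");
static_assert(
    LoaderTraitsSketch<DataTypeSketch::FP16, DataTypeSketch::INT32>::supportsMixedPrecision(),
    "different input and target types count as mixed precision");
```

Because the value is a constant expression, callers can branch on it with `if constexpr` or `static_assert` instead of a runtime check.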
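| virtual const TargetTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::targets () const | pure virtual |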
Provides immutable access to target tensor for current batch.
Derived classes must implement this method to provide read-only access to the tensor containing target/label data for the currently loaded batch.
Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
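| virtual TargetTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::targets () | pure virtual |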
Provides mutable access to target tensor for current batch.
Derived classes must implement this method to provide access to the tensor containing target/label data for the currently loaded batch. The tensor should contain ground truth data corresponding to the inputs.
Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

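| static constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::usesPinnedMemory () | inline static constexpr noexcept |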
Checks if data loader uses pinned memory for GPU optimization.
Compile-time detection of pinned memory usage, indicating optimization for efficient host-to-device memory transfers in GPU training scenarios.
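| virtual bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::validateCurrentBatch () const | inline virtual |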
Validates current batch data integrity.
Derived classes may override this method to implement data validation checks ensuring batch integrity, proper tensor shapes, and valid data ranges. Useful for debugging data loading pipelines and preprocessing issues.
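A sketch of what such an override might check, using plain vectors in place of tensors. `ValidatingSketch` and `CheckedLoaderSketch` are hypothetical names, and the assumption that the base implementation accepts every batch is not confirmed by this page.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in base. Assumption: the default implementation accepts every batch.
class ValidatingSketch {
public:
    virtual ~ValidatingSketch() = default;
    virtual bool validateCurrentBatch() const { return true; }
};

// Hypothetical classification batch: one float input and one label per sample.
class CheckedLoaderSketch : public ValidatingSketch {
public:
    CheckedLoaderSketch(std::vector<float> inputs, std::vector<int> targets, int num_classes)
        : inputs_(std::move(inputs)), targets_(std::move(targets)), num_classes_(num_classes) {}

    bool validateCurrentBatch() const override {
        // Shape check: a non-empty batch with one target per input.
        if (inputs_.empty() || inputs_.size() != targets_.size()) return false;
        // Value check: reject NaN and infinities in the inputs.
        for (float x : inputs_) {
            if (!std::isfinite(x)) return false;
        }
        // Range check: labels must be valid class indices.
        for (int t : targets_) {
            if (t < 0 || t >= num_classes_) return false;
        }
        return true;
    }

private:
    std::vector<float> inputs_;
    std::vector<int> targets_;
    int num_classes_;
};
```

Returning `bool` rather than throwing lets a training loop decide whether to skip, log, or abort on a bad batch.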
| int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::batch_size_ | private |
Number of samples in each batch (immutable after construction).
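| int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::current_batch_ | private |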
Zero-based index of currently loaded batch.
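| static constexpr TensorDataType Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::input_data_type = TInputDataType | static constexpr |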
Compile-time input data type constant.
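| static constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::is_mixed_precision = (TInputDataType != TTargetDataType) | static constexpr |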
Mixed-precision workflow detection.
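| static constexpr TensorDataType Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::target_data_type = TTargetDataType | static constexpr |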
Compile-time target data type constant.
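| static constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::uses_pinned_memory = false | static constexpr |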
Pinned memory optimization (CUDA-only; false on CPU-only builds).