Mila 0.13.48
Deep Neural Network Library
Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource > Class Template Reference (abstract, export)

Device-agnostic data loader interface using abstract tensor data types.

Public Types

using InputDataType = TensorDataType
 Input tensor abstract data type.
using InputTensor = Tensor<TInputDataType, TMemoryResource>
 Input tensor type alias.
using MemoryResource = TMemoryResource
 Memory resource type for tensor allocation.
using TargetDataType = TensorDataType
 Target tensor abstract data type.
using TargetTensor = Tensor<TTargetDataType, TMemoryResource>
 Target tensor type alias.

Public Member Functions

 DataLoader (const DataLoader &)=delete
 Copy operations explicitly deleted for performance safety.
 DataLoader (DataLoader &&)=default
 Move operations for efficient ownership transfer.
 DataLoader (int64_t batch_size)
 Constructs data loader with specified batch configuration.
virtual ~DataLoader ()=default
 Virtual destructor ensuring proper cleanup in derived classes.
int64_t batchSize () const noexcept
 Returns the configured batch size.
int64_t currentBatch () const noexcept
 Returns the current batch index.
virtual std::string getDatasetInfo () const
 Returns dataset statistics for optimization and analysis.
virtual bool hasNext () const
 Checks if more batches are available.
virtual const InputTensor & inputs () const =0
 Provides immutable access to input tensor for current batch.
virtual InputTensor & inputs ()=0
 Provides mutable access to input tensor for current batch.
virtual void nextBatch ()=0
 Loads the next batch of data from the dataset.
virtual int64_t numBatches () const =0
 Returns the total number of batches in the dataset.
DataLoader & operator= (const DataLoader &)=delete
DataLoader & operator= (DataLoader &&)=default
virtual void reset ()
 Resets the loader to the beginning of the dataset.
virtual const TargetTensor & targets () const =0
 Provides immutable access to target tensor for current batch.
virtual TargetTensor & targets ()=0
 Provides mutable access to target tensor for current batch.
virtual bool validateCurrentBatch () const
 Validates current batch data integrity.

Static Public Member Functions

static constexpr bool supportsMixedPrecision () noexcept
 Checks if data loader supports mixed-precision workflows.
static constexpr bool usesPinnedMemory () noexcept
 Checks if data loader uses pinned memory for GPU optimization.

Static Public Attributes

static constexpr TensorDataType input_data_type = TInputDataType
 Compile-time input data type constant.
static constexpr bool is_mixed_precision = (TInputDataType != TTargetDataType)
 Mixed-precision workflow detection.
static constexpr TensorDataType target_data_type = TTargetDataType
 Compile-time target data type constant.
static constexpr bool uses_pinned_memory = false
 Pinned memory optimization (CUDA-only; false on CPU-only builds).

Protected Member Functions

void incrementBatch () noexcept
 Increments current batch counter.
void setCurrentBatch (int64_t batch_index) noexcept
 Updates current batch counter.

Private Attributes

int64_t batch_size_
 Number of samples in each batch (immutable after construction).
int64_t current_batch_
 Zero-based index of currently loaded batch.

Detailed Description

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
class Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >

Device-agnostic data loader interface using abstract tensor data types.

Provides batched data loading for neural network training and evaluation across heterogeneous compute environments. The interface uses the abstract TensorDataType enumeration so that loaders work across devices without exposing device-specific concrete types to host compilation.

Core architectural principles:

  • Abstract data types prevent device-specific compilation issues
  • Support for mixed-precision workflows with different input/target types
  • Optimized memory resource selection for efficient data pipeline performance
  • Type-safe operations with compile-time compatibility validation
  • Extensible design supporting various data sources and preprocessing pipelines

The loader supports both CPU and pinned memory resources for optimal performance in GPU training scenarios, enabling efficient overlapped data transfers while maintaining device independence through the abstract type system.

Template Parameters
TInputDataType   Abstract data type for input tensors, from the TensorDataType enumeration
TTargetDataType  Abstract data type for target tensors, from the TensorDataType enumeration
TMemoryResource  Memory resource type determining allocation strategy and device targeting
Note
Memory resource must be either CudaPinnedMemoryResource or CpuMemoryResource for host accessibility
Input and target data types must be compatible with the specified memory resource
Derived classes must implement pure virtual methods for specific data source integration
See also
TensorDataType for supported abstract data type enumeration
TensorDataTypeTraits for compile-time data type characteristics
MemoryResource for device memory abstraction layer

Example usage:

// Mixed-precision data loader for CPU preprocessing
class ImageDataLoader : public DataLoader<TensorDataType::FP32, TensorDataType::INT32, CpuMemoryResource> {
    // Implementation for image data loading
};

// High-performance data loader with pinned memory for GPU training
class PinnedDataLoader : public DataLoader<TensorDataType::FP16, TensorDataType::FP16, CudaPinnedMemoryResource> {
    // Implementation optimized for GPU transfer
};
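
The two declarations above leave the pure virtual methods unimplemented. As a rough sketch of what a complete derived loader might look like, the example below is self-contained and uses simplified stand-ins for TensorDataType, Tensor, CpuMemoryResource, and DataLoader itself; the real Mila types are richer and are not reproduced here. RangeLoader, its synthetic data, and the hasNext() rule are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified stand-ins for the Mila types (assumptions, not the real definitions).
enum class TensorDataType { FP32, INT32 };
struct CpuMemoryResource {};

template <TensorDataType TDataType, typename TMemoryResource>
struct Tensor {
    std::vector<double> values;  // flat storage, enough for this sketch
};

// Reduced version of the interface documented on this page.
template <TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
class DataLoader {
public:
    using InputTensor = Tensor<TInputDataType, TMemoryResource>;
    using TargetTensor = Tensor<TTargetDataType, TMemoryResource>;

    explicit DataLoader(int64_t batch_size) : batch_size_(batch_size), current_batch_(0) {
        if (batch_size == 0) throw std::invalid_argument("batch_size must not be zero");
    }
    virtual ~DataLoader() = default;

    int64_t batchSize() const noexcept { return batch_size_; }
    int64_t currentBatch() const noexcept { return current_batch_; }
    // Assumption: the counter holds the number of batches loaded so far.
    virtual bool hasNext() const { return current_batch_ < numBatches(); }
    virtual void reset() { current_batch_ = 0; }

    virtual int64_t numBatches() const = 0;
    virtual void nextBatch() = 0;
    virtual InputTensor& inputs() = 0;
    virtual const InputTensor& inputs() const = 0;
    virtual TargetTensor& targets() = 0;
    virtual const TargetTensor& targets() const = 0;

protected:
    void incrementBatch() noexcept { ++current_batch_; }

private:
    int64_t batch_size_;
    int64_t current_batch_;
};

using RangeLoaderBase = DataLoader<TensorDataType::FP32, TensorDataType::INT32, CpuMemoryResource>;

// Hypothetical loader that serves the sample indices 0..num_samples-1 as inputs
// and their parity (0 or 1) as targets.
class RangeLoader : public RangeLoaderBase {
public:
    RangeLoader(int64_t num_samples, int64_t batch_size)
        : RangeLoaderBase(batch_size), num_samples_(num_samples) {}

    int64_t numBatches() const override {
        return (num_samples_ + batchSize() - 1) / batchSize();  // counts a partial last batch
    }

    void nextBatch() override {
        if (!hasNext()) throw std::runtime_error("no more batches are available");
        const int64_t first = currentBatch() * batchSize();
        const int64_t last = std::min(first + batchSize(), num_samples_);
        inputs_.values.clear();
        targets_.values.clear();
        for (int64_t i = first; i < last; ++i) {
            inputs_.values.push_back(static_cast<double>(i));
            targets_.values.push_back(static_cast<double>(i % 2));
        }
        incrementBatch();  // advance only after the batch was filled
    }

    InputTensor& inputs() override { return inputs_; }
    const InputTensor& inputs() const override { return inputs_; }
    TargetTensor& targets() override { return targets_; }
    const TargetTensor& targets() const override { return targets_; }

private:
    int64_t num_samples_;
    InputTensor inputs_;
    TargetTensor targets_;
};
```

With five samples and a batch size of two, this sketch reports three batches, and the last one holds a single sample.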

Member Typedef Documentation

◆ InputDataType

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::InputDataType = TensorDataType

Input tensor abstract data type.

◆ InputTensor

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::InputTensor = Tensor<TInputDataType, TMemoryResource>

Input tensor type alias.

◆ MemoryResource

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::MemoryResource = TMemoryResource

Memory resource type for tensor allocation.

◆ TargetDataType

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::TargetDataType = TensorDataType

Target tensor abstract data type.

◆ TargetTensor

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::TargetTensor = Tensor<TTargetDataType, TMemoryResource>

Target tensor type alias.

Constructor & Destructor Documentation

◆ DataLoader() [1/3]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader ( int64_t batch_size)
inline explicit

Constructs data loader with specified batch configuration.

Initializes the data loader with the specified batch size and prepares the internal state for efficient batch processing. The loader is ready to begin data iteration after construction.

Parameters
batch_size  Number of samples to include in each batch
Exceptions
std::invalid_argument  If batch_size is zero
Note
Batch size affects memory allocation and processing efficiency
Larger batches generally improve throughput but require more memory
Consider GPU memory constraints when selecting batch size for device training
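
To make the memory trade-off in the notes above concrete, the following sketch pairs the documented zero check with a rough per-batch size estimate. Both checkedBatchSize and batchBytes are hypothetical helpers, not part of Mila, and the estimate ignores padding, alignment, and any extra buffers a real loader may allocate.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper reproducing the documented constructor check.
inline int64_t checkedBatchSize(int64_t batch_size) {
    if (batch_size == 0) {
        throw std::invalid_argument("batch_size must not be zero");
    }
    return batch_size;
}

// Rough size in bytes of one batch tensor: samples * elements per sample * element width.
inline int64_t batchBytes(int64_t batch_size, int64_t elements_per_sample, int64_t bytes_per_element) {
    return checkedBatchSize(batch_size) * elements_per_sample * bytes_per_element;
}
```

For instance, batchBytes(32, 3 * 224 * 224, 4) gives 19267584 bytes, roughly 18.4 MiB for a batch of 32 four-byte 3 x 224 x 224 images, while batchBytes(0, 10, 4) throws std::invalid_argument.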

◆ ~DataLoader()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::~DataLoader ( )
virtual default

Virtual destructor ensuring proper cleanup in derived classes.

Provides proper resource cleanup for polymorphic destruction, enabling safe use of base class pointers to derived instances.

◆ DataLoader() [2/3]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader ( const DataLoader< TInputDataType, TTargetDataType, TMemoryResource > & )
delete

Copy operations explicitly deleted for performance safety.

Prevents accidental expensive copy operations involving large datasets and complex internal state management.

◆ DataLoader() [3/3]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader ( DataLoader< TInputDataType, TTargetDataType, TMemoryResource > && )
default

Move operations for efficient ownership transfer.

Enables efficient transfer of data loader instances without copying internal state or dataset references.

Member Function Documentation

◆ batchSize()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::batchSize ( ) const
inline noexcept

Returns the configured batch size.

Provides the number of samples included in each batch as specified during data loader construction. This value remains constant throughout the loader's lifetime.

Returns
Number of samples in each batch
Note
Final batch may contain fewer samples if dataset size is not divisible by batch size
Batch size affects memory requirements and processing efficiency

◆ currentBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::currentBatch ( ) const
inline noexcept

Returns the current batch index.

Provides the zero-based index of the batch that was most recently loaded through nextBatch(). Useful for progress tracking and debugging data loading workflows.

Returns
Zero-based index of current batch
Note
Returns 0 before first call to nextBatch()
Index increments with each successful nextBatch() call
Reset to 0 when reset() method is called

◆ getDatasetInfo()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual std::string Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::getDatasetInfo ( ) const
inline virtual

Returns dataset statistics for optimization and analysis.

Derived classes may override this method to provide dataset-specific statistics such as sample count, class distribution, or data characteristics that can inform training optimization and analysis.

Returns
String containing human-readable dataset statistics
Note
Default implementation provides basic batch configuration information
Override to include dataset-specific metrics and characteristics

◆ hasNext()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::hasNext ( ) const
inline virtual

Checks if more batches are available.

Determines whether additional batches can be loaded from the dataset, enabling efficient iteration control in training and evaluation loops.

Returns
true if more batches are available, false if dataset is exhausted
Note
Implementation should consider current position and total dataset size
Used to determine when to reset or stop iteration
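
The iteration pattern these notes describe can be sketched as a small generic loop. runEpochs is a hypothetical helper and FakeLoader is a stand-in used only to exercise it; the loop relies on nothing beyond the documented hasNext(), nextBatch(), and reset() calls.

```cpp
#include <cassert>
#include <cstdint>

// Runs `epochs` passes over any loader exposing hasNext(), nextBatch() and reset(),
// and returns the total number of batches visited.
template <typename Loader>
int64_t runEpochs(Loader& loader, int epochs) {
    int64_t visited = 0;
    for (int epoch = 0; epoch < epochs; ++epoch) {
        loader.reset();          // start every epoch at the first batch
        while (loader.hasNext()) {
            loader.nextBatch();  // a real loop would now read inputs() and targets()
            ++visited;
        }
    }
    return visited;
}

// Hypothetical stand-in with a fixed number of batches.
struct FakeLoader {
    int64_t total;
    int64_t current;
    bool hasNext() const { return current < total; }
    void nextBatch() { ++current; }
    void reset() { current = 0; }
};
```

Calling reset() at the top of each epoch keeps the loop correct even when the loader was partially consumed beforehand.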

◆ incrementBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::incrementBatch ( )
inline protected noexcept

Increments current batch counter.

Protected helper method for derived classes to increment the batch counter after successfully loading the next batch. Simplifies sequential batch loading implementations.

Note
Should be called by derived classes after successful nextBatch() operation
Automatically handles sequential batch progression

◆ inputs() [1/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual const InputTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::inputs ( ) const
pure virtual

Provides immutable access to input tensor for current batch.

Derived classes must implement this method to provide read-only access to the tensor containing input data for the currently loaded batch.

Returns
Const reference to input tensor containing current batch data
Note
Enables safe access for analysis and debugging without modification risk
Should return same data as mutable version

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ inputs() [2/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual InputTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::inputs ( )
pure virtual

Provides mutable access to input tensor for current batch.

Derived classes must implement this method to provide access to the tensor containing input data for the currently loaded batch. The tensor should be properly shaped and contain valid data after nextBatch() call.

Returns
Mutable reference to input tensor containing current batch data
Note
Tensor shape should match expected input dimensions for the model
Data should be preprocessed and ready for model consumption
Memory layout should be optimized for target compute device

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ nextBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::nextBatch ( )
pure virtual

Loads the next batch of data from the dataset.

Derived classes must implement this method to load the next batch of data into the input and target tensors. Implementation should handle data preprocessing, memory allocation, and batch composition according to the specific dataset requirements.

Exceptions
std::runtime_error  If no more batches are available
std::runtime_error  If data loading fails
Note
Implementation must update the current batch counter after a successful load; current_batch_ is private, so use incrementBatch() or setCurrentBatch()
Should handle end-of-dataset conditions appropriately
May involve complex preprocessing pipelines and data augmentation

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ numBatches()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::numBatches ( ) const
pure virtual

Returns the total number of batches in the dataset.

Derived classes must implement this method to report the total number of batches available in their specific dataset. This information is essential for training loop progress tracking and epoch management.

Returns
Total number of batches available in the dataset
Note
Implementation should account for partial batches at dataset end
Value may change if dataset is modified or resampled
Used for training progress reporting and epoch boundary detection

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
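
One common way to account for a partial final batch is ceiling division. The sketch below is an assumption about how an implementation might compute the count, not the method used by TokenSequenceLoader; batchCount and samplesInBatch are hypothetical names, and both expect a positive batch_size.

```cpp
#include <cassert>
#include <cstdint>

// Number of batches needed to cover num_samples, counting a trailing partial batch.
constexpr int64_t batchCount(int64_t num_samples, int64_t batch_size) {
    return (num_samples + batch_size - 1) / batch_size;
}

// Number of samples in the batch at batch_index; only the last batch can be short.
constexpr int64_t samplesInBatch(int64_t num_samples, int64_t batch_size, int64_t batch_index) {
    return (num_samples - batch_index * batch_size) < batch_size
               ? (num_samples - batch_index * batch_size)
               : batch_size;
}
```

With 10 samples and a batch size of 4, batchCount returns 3, and samplesInBatch returns 4, 4, and 2 for batch indices 0, 1, and 2.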

◆ operator=() [1/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
DataLoader & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::operator= ( const DataLoader< TInputDataType, TTargetDataType, TMemoryResource > & )
delete

◆ operator=() [2/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
DataLoader & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::operator= ( DataLoader< TInputDataType, TTargetDataType, TMemoryResource > && )
default

◆ reset()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::reset ( )
inline virtual

Resets the loader to the beginning of the dataset.

Resets the internal state to start iteration from the first batch. Derived classes may override this method to implement additional reset functionality such as dataset reshuffling or preprocessing pipeline reinitialization.

Note
Base implementation resets batch counter to zero
Called automatically at epoch boundaries in training loops
Override to implement custom reset behavior (shuffling, etc.)

Reimplemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ setCurrentBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::setCurrentBatch ( int64_t batch_index)
inline protected noexcept

Updates current batch counter.

Protected helper method for derived classes to update the batch counter after successfully loading a new batch. Ensures consistent state management across all data loader implementations.

Parameters
batch_index  New batch index to set
Note
Should be called by derived classes after successful batch loading
Enables consistent progress tracking across all loader types

◆ supportsMixedPrecision()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::supportsMixedPrecision ( )
inline static constexpr noexcept

Checks if data loader supports mixed-precision workflows.

Compile-time detection of whether the loader uses different data types for inputs and targets, enabling mixed-precision training optimizations.

Returns
true if input and target use different data types, false otherwise
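
The compile-time comparison described here can be reproduced in isolation. In the sketch below, the TensorDataType enumerators and the PrecisionTraits struct are stand-ins assumed for illustration; only the expression TInputDataType != TTargetDataType comes from this page.

```cpp
#include <cassert>

// Stand-in enumeration (assumption; the real TensorDataType defines its own members).
enum class TensorDataType { FP32, FP16, INT32 };

// Hypothetical traits struct reproducing the two documented members.
template <TensorDataType TInputDataType, TensorDataType TTargetDataType>
struct PrecisionTraits {
    static constexpr bool is_mixed_precision = (TInputDataType != TTargetDataType);
    static constexpr bool supportsMixedPrecision() noexcept { return is_mixed_precision; }
};

using ImageLoaderTraits = PrecisionTraits<TensorDataType::FP32, TensorDataType::INT32>;
using HalfLoaderTraits = PrecisionTraits<TensorDataType::FP16, TensorDataType::FP16>;

// Both checks are resolved by the compiler, with no runtime cost.
static_assert(ImageLoaderTraits::supportsMixedPrecision(), "FP32 inputs with INT32 targets differ");
static_assert(!HalfLoaderTraits::supportsMixedPrecision(), "FP16 inputs and targets match");
```

Because the result is a constant expression, callers can branch on it with static_assert or template specialization rather than a runtime test.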

◆ targets() [1/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual const TargetTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::targets ( ) const
pure virtual

Provides immutable access to target tensor for current batch.

Derived classes must implement this method to provide read-only access to the tensor containing target/label data for the currently loaded batch.

Returns
Const reference to target tensor containing current batch labels
Note
Enables safe access for analysis and debugging without modification risk
Should return same data as mutable version

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ targets() [2/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual TargetTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::targets ( )
pure virtual

Provides mutable access to target tensor for current batch.

Derived classes must implement this method to provide access to the tensor containing target/label data for the currently loaded batch. The tensor should contain ground truth data corresponding to the inputs.

Returns
Mutable reference to target tensor containing current batch labels
Note
Target data should align with input batch ordering
Data format should match model's expected output structure
For mixed-precision workflows, may use different data type than inputs

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ usesPinnedMemory()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::usesPinnedMemory ( )
inline static constexpr noexcept

Checks if data loader uses pinned memory for GPU optimization.

Compile-time detection of pinned memory usage, indicating optimization for efficient host-to-device memory transfers in GPU training scenarios.

Returns
true if using pinned memory, false for standard CPU memory
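
This page lists uses_pinned_memory as false and describes that value as applying to CPU-only builds. One plausible way such a flag could be derived from the memory resource type is sketched below; the empty tag structs and MemoryTraits are assumptions for illustration and do not reproduce how Mila computes the flag.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in tag types named after the memory resources mentioned in the class notes.
struct CpuMemoryResource {};
struct CudaPinnedMemoryResource {};

// Hypothetical traits struct: pinned memory is detected by comparing types.
template <typename TMemoryResource>
struct MemoryTraits {
    static constexpr bool uses_pinned_memory =
        std::is_same<TMemoryResource, CudaPinnedMemoryResource>::value;
    static constexpr bool usesPinnedMemory() noexcept { return uses_pinned_memory; }
};

static_assert(!MemoryTraits<CpuMemoryResource>::usesPinnedMemory(), "CPU memory is not pinned");
static_assert(MemoryTraits<CudaPinnedMemoryResource>::usesPinnedMemory(), "pinned resource detected");
```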

◆ validateCurrentBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
virtual bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::validateCurrentBatch ( ) const
inline virtual

Validates current batch data integrity.

Derived classes may override this method to implement data validation checks ensuring batch integrity, proper tensor shapes, and valid data ranges. Useful for debugging data loading pipelines and preprocessing issues.

Returns
true if current batch data passes validation, false otherwise
Note
Default implementation performs basic existence checks
Override to implement dataset-specific validation logic
Can be used in debug builds for comprehensive data verification

Member Data Documentation

◆ batch_size_

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::batch_size_
private

Number of samples in each batch (immutable after construction).

◆ current_batch_

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::current_batch_
private

Zero-based index of currently loaded batch.

◆ input_data_type

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
TensorDataType Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::input_data_type = TInputDataType
static constexpr

Compile-time input data type constant.

◆ is_mixed_precision

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::is_mixed_precision = (TInputDataType != TTargetDataType)
static constexpr

Mixed-precision workflow detection.

◆ target_data_type

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
TensorDataType Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::target_data_type = TTargetDataType
static constexpr

Compile-time target data type constant.

◆ uses_pinned_memory

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::uses_pinned_memory = false
static constexpr

Pinned memory optimization (CUDA-only; false on CPU-only builds).


The documentation for this class was generated from the following file: