Device-agnostic data loader interface using abstract tensor data types.

Public Types
using	InputDataType = TensorDataType
	Input tensor abstract data type.
using	InputTensor = Tensor<TInputDataType, TMemoryResource>
	Input tensor type alias.
using	MemoryResource = TMemoryResource
	Memory resource type for tensor allocation.
using	TargetDataType = TensorDataType
	Target tensor abstract data type.
using	TargetTensor = Tensor<TTargetDataType, TMemoryResource>
	Target tensor type alias.

Public Member Functions
	DataLoader (const DataLoader &)=delete
	Copy operations explicitly deleted for performance safety.
	DataLoader (DataLoader &&)=default
	Move operations for efficient ownership transfer.
	DataLoader (int64_t batch_size)
	Constructs data loader with specified batch configuration.
virtual	~DataLoader ()=default
	Virtual destructor ensuring proper cleanup in derived classes.
int64_t	batchSize () const noexcept
	Returns the configured batch size.
int64_t	currentBatch () const noexcept
	Returns the current batch index.
virtual std::string	getDatasetInfo () const
	Returns dataset statistics for optimization and analysis.
virtual bool	hasNext () const
	Checks if more batches are available.
virtual const InputTensor &	inputs () const =0
	Provides immutable access to input tensor for current batch.
virtual InputTensor &	inputs ()=0
	Provides mutable access to input tensor for current batch.
virtual void	nextBatch ()=0
	Loads the next batch of data from the dataset.
virtual int64_t	numBatches () const =0
	Returns the total number of batches in the dataset.
DataLoader &	operator= (const DataLoader &)=delete
DataLoader &	operator= (DataLoader &&)=default
virtual void	reset ()
	Resets the loader to the beginning of the dataset.
virtual const TargetTensor &	targets () const =0
	Provides immutable access to target tensor for current batch.
virtual TargetTensor &	targets ()=0
	Provides mutable access to target tensor for current batch.
virtual bool	validateCurrentBatch () const
	Validates current batch data integrity.

Static Public Member Functions
static constexpr bool	supportsMixedPrecision () noexcept
	Checks if data loader supports mixed-precision workflows.
static constexpr bool	usesPinnedMemory () noexcept
	Checks if data loader uses pinned memory for GPU optimization.

Static Public Attributes
static constexpr TensorDataType	input_data_type = TInputDataType
	Compile-time input data type constant.
static constexpr bool	is_mixed_precision = (TInputDataType != TTargetDataType)
	Mixed-precision workflow detection.
static constexpr TensorDataType	target_data_type = TTargetDataType
	Compile-time target data type constant.
static constexpr bool	uses_pinned_memory = false
	Pinned memory optimization (CUDA-only; false on CPU-only builds).

Protected Member Functions
void	incrementBatch () noexcept
	Increments current batch counter.
void	setCurrentBatch (int64_t batch_index) noexcept
	Updates current batch counter.

Private Attributes
int64_t	batch_size_
	Number of samples in each batch (immutable after construction).
int64_t	current_batch_
	Zero-based index of currently loaded batch.

Detailed Description

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>
class Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >

Device-agnostic data loader interface using abstract tensor data types.

Base class for loaders that deliver batches of input and target tensors for neural network training and evaluation across heterogeneous compute environments. Element types are specified with the abstract TensorDataType enumeration rather than concrete device types, so host-side code can use the loader without being compiled against device-specific type definitions.

Core architectural principles:

Abstract data types prevent device-specific compilation issues
Support for mixed-precision workflows with different input/target types
Optimized memory resource selection for efficient data pipeline performance
Type-safe operations with compile-time compatibility validation
Extensible design supporting various data sources and preprocessing pipelines

The loader can allocate its tensors from either regular CPU memory or CUDA pinned (page-locked) host memory. Pinned memory allows asynchronous host-to-device copies that overlap with GPU computation, while the abstract type system keeps the interface device-independent.

Template Parameters

TInputDataType	Abstract data type for input tensors from TensorDataType enumeration
TTargetDataType	Abstract data type for target tensors from TensorDataType enumeration
TMemoryResource	Memory resource type determining allocation strategy and device targeting

Note: The memory resource must be either CudaPinnedMemoryResource or CpuMemoryResource so that the data is host-accessible. The input and target data types must be compatible with the specified memory resource. Derived classes must implement the pure virtual methods to integrate a specific data source.

See also: TensorDataType for supported abstract data type enumeration; TensorDataTypeTraits for compile-time data type characteristics; MemoryResource for device memory abstraction layer

Example usage:

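// FP32 inputs with INT32 labels in CPU memory (is_mixed_precision is true)
class ImageDataLoader : public DataLoader<TensorDataType::FP32, TensorDataType::INT32, CpuMemoryResource> {
public:
    explicit ImageDataLoader( int64_t batch_size ) : DataLoader( batch_size ) {}
    // Must override both inputs() and both targets() overloads, nextBatch(), and numBatches()
};

// FP16 inputs and targets in CUDA pinned host memory for faster GPU transfers
class PinnedDataLoader : public DataLoader<TensorDataType::FP16, TensorDataType::FP16, CudaPinnedMemoryResource> {
public:
    explicit PinnedDataLoader( int64_t batch_size ) : DataLoader( batch_size ) {}
    // Must override the same pure virtual members as ImageDataLoader
};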
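The classes in the example stay abstract until they implement the pure virtual members. The self-contained sketch below shows a complete loader. Because it cannot import the Mila module, it uses a hypothetical stand-in base LoaderBase that mirrors the documented interface, with std::vector in place of Tensor and only the const overloads of inputs()/targets(). VectorDataLoader is also hypothetical, and two behaviors are assumptions rather than documented facts: the base rejects any batch_size <= 0 (the page only mentions zero), and the default hasNext() compares the counter with numBatches().

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataLoader: std::vector replaces Tensor, and only
// the const overloads of inputs()/targets() are declared to keep the sketch short.
template <typename TInput, typename TTarget>
class LoaderBase {
public:
    using InputTensor = std::vector<TInput>;
    using TargetTensor = std::vector<TTarget>;

    explicit LoaderBase(int64_t batch_size) : batch_size_(batch_size) {
        if (batch_size <= 0) throw std::invalid_argument("batch_size must be positive");
    }
    virtual ~LoaderBase() = default;

    int64_t batchSize() const noexcept { return batch_size_; }
    int64_t currentBatch() const noexcept { return current_batch_; }
    // Assumed default: batches remain while the counter is below numBatches().
    virtual bool hasNext() const { return current_batch_ < numBatches(); }
    virtual void reset() { current_batch_ = 0; }

    virtual int64_t numBatches() const = 0;
    virtual void nextBatch() = 0;
    virtual const InputTensor& inputs() const = 0;
    virtual const TargetTensor& targets() const = 0;

protected:
    void incrementBatch() noexcept { ++current_batch_; }

private:
    int64_t batch_size_;
    int64_t current_batch_ = 0;
};

// Hypothetical in-memory loader: one float feature and one int32 label per sample.
class VectorDataLoader : public LoaderBase<float, int32_t> {
public:
    VectorDataLoader(std::vector<float> features, std::vector<int32_t> labels, int64_t batch_size)
        : LoaderBase(batch_size), features_(std::move(features)), labels_(std::move(labels)) {
        if (features_.size() != labels_.size()) throw std::invalid_argument("size mismatch");
    }

    int64_t numBatches() const override {
        const auto n = static_cast<int64_t>(features_.size());
        return (n + batchSize() - 1) / batchSize();  // ceiling: keeps a partial final batch
    }

    void nextBatch() override {
        if (!hasNext()) throw std::runtime_error("no more batches");
        const int64_t first = currentBatch() * batchSize();
        const int64_t last = std::min(first + batchSize(), static_cast<int64_t>(features_.size()));
        inputs_.assign(features_.begin() + first, features_.begin() + last);
        targets_.assign(labels_.begin() + first, labels_.begin() + last);
        incrementBatch();  // current_batch_ is private, so advance it via the helper
    }

    const InputTensor& inputs() const override { return inputs_; }
    const TargetTensor& targets() const override { return targets_; }

private:
    std::vector<float> features_;
    std::vector<int32_t> labels_;
    InputTensor inputs_;
    TargetTensor targets_;
};
```

With five samples and a batch size of 2, the loader yields batches of 2, 2 and 1 samples, then throws std::runtime_error on a fourth nextBatch() call, matching the exception documented for nextBatch().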

Member Typedef Documentation

◆ InputDataType

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::InputDataType = TensorDataType

Input tensor abstract data type.

◆ InputTensor

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::InputTensor = Tensor<TInputDataType, TMemoryResource>

Input tensor type alias.

◆ MemoryResource

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::MemoryResource = TMemoryResource

Memory resource type for tensor allocation.

◆ TargetDataType

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::TargetDataType = TensorDataType

Target tensor abstract data type.

◆ TargetTensor

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

using Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::TargetTensor = Tensor<TTargetDataType, TMemoryResource>

Target tensor type alias.

Constructor & Destructor Documentation

◆ DataLoader() [1/3]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader ( int64_t batch_size )

inline, explicit

Constructs data loader with specified batch configuration.

Stores the batch size and positions the loader before the first batch, so data iteration can begin immediately after construction.

Parameters

batch_size Number of samples to include in each batch

Exceptions

std::invalid_argument If batch_size is zero

Note: Batch size affects memory allocation and processing efficiency. Larger batches generally improve throughput but require more memory, so take GPU memory limits into account when choosing a batch size for device training.

◆ ~DataLoader()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::~DataLoader ( )

virtual, default

Virtual destructor ensuring proper cleanup in derived classes.

Ensures proper cleanup when a derived loader is destroyed through a pointer or reference to the base class, for example when it is owned by a std::unique_ptr to DataLoader.

◆ DataLoader() [2/3]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader ( const DataLoader< TInputDataType, TTargetDataType, TMemoryResource > & )

delete

Copy operations explicitly deleted for performance safety.

Prevents accidental, expensive copies of loaders that may hold large datasets and complex internal state. Copy assignment is deleted for the same reason.

◆ DataLoader() [3/3]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::DataLoader ( DataLoader< TInputDataType, TTargetDataType, TMemoryResource > && )

default

Move operations for efficient ownership transfer.

Enables efficient transfer of data loader instances without copying internal state or dataset references.
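The resulting type properties can be checked at compile time. In this sketch LoaderLike is a hypothetical class that declares the same special members listed for DataLoader (deleted copy, defaulted move, virtual destructor, explicit batch-size constructor); the static_asserts spell out what callers can and cannot do with such a type.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in that declares the same special members as DataLoader.
class LoaderLike {
public:
    explicit LoaderLike(int64_t batch_size) : batch_size_(batch_size) {}
    virtual ~LoaderLike() = default;

    LoaderLike(const LoaderLike&) = delete;             // no accidental deep copies
    LoaderLike& operator=(const LoaderLike&) = delete;
    LoaderLike(LoaderLike&&) = default;                 // cheap ownership transfer
    LoaderLike& operator=(LoaderLike&&) = default;

    int64_t batchSize() const noexcept { return batch_size_; }

private:
    int64_t batch_size_;
};

static_assert(!std::is_copy_constructible_v<LoaderLike>);
static_assert(!std::is_copy_assignable_v<LoaderLike>);
static_assert(std::is_move_constructible_v<LoaderLike>);
static_assert(std::is_move_assignable_v<LoaderLike>);
static_assert(!std::is_default_constructible_v<LoaderLike>);  // a batch size is required
```

In practice this means passing loaders by reference, or holding them in a std::unique_ptr, and using std::move whenever ownership has to change hands.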

Member Function Documentation

◆ batchSize()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::batchSize ( ) const

inline, noexcept

Returns the configured batch size.

Provides the number of samples included in each batch as specified during data loader construction. This value remains constant throughout the loader's lifetime.

Returns: Number of samples in each batch

Note: The final batch may contain fewer samples if the dataset size is not divisible by the batch size. The batch size also determines memory requirements and processing efficiency.

◆ currentBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::currentBatch ( ) const

inline, noexcept

Returns the current batch index.

Provides the zero-based index of the batch that was most recently loaded through nextBatch(). Useful for progress tracking and debugging data loading workflows.

Returns: Zero-based index of current batch

Note: Returns 0 before the first call to nextBatch(). The index increments with each successful nextBatch() call and is set back to 0 when reset() is called.

◆ getDatasetInfo()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual std::string Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::getDatasetInfo ( ) const

inline, virtual

Returns dataset statistics for optimization and analysis.

Derived classes may override this method to provide dataset-specific statistics such as sample count, class distribution, or data characteristics that can inform training optimization and analysis.

Returns: String containing human-readable dataset statistics

Note: The default implementation reports only the basic batch configuration. Override it to include dataset-specific metrics and characteristics.
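A minimal sketch of the override pattern, assuming the override keeps the base text and appends its own statistics. InfoLoaderBase and LabelledLoader are hypothetical, and the "batch_size=..., num_batches=..." format is invented for illustration; the exact text of Mila's default implementation is not shown on this page.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in base exposing only what getDatasetInfo() needs.
class InfoLoaderBase {
public:
    explicit InfoLoaderBase(int64_t batch_size) : batch_size_(batch_size) {}
    virtual ~InfoLoaderBase() = default;

    int64_t batchSize() const noexcept { return batch_size_; }
    virtual int64_t numBatches() const = 0;

    // Default: batch configuration only (format invented for this sketch).
    virtual std::string getDatasetInfo() const {
        std::ostringstream os;
        os << "batch_size=" << batchSize() << ", num_batches=" << numBatches();
        return os.str();
    }

private:
    int64_t batch_size_;
};

// Hypothetical classification loader that appends dataset-specific statistics.
class LabelledLoader : public InfoLoaderBase {
public:
    LabelledLoader(int64_t num_samples, int64_t num_classes, int64_t batch_size)
        : InfoLoaderBase(batch_size), num_samples_(num_samples), num_classes_(num_classes) {}

    int64_t numBatches() const override {
        return (num_samples_ + batchSize() - 1) / batchSize();
    }

    std::string getDatasetInfo() const override {
        std::ostringstream os;
        os << InfoLoaderBase::getDatasetInfo()  // keep the base configuration text
           << ", samples=" << num_samples_ << ", classes=" << num_classes_;
        return os.str();
    }

private:
    int64_t num_samples_;
    int64_t num_classes_;
};
```

Calling the base version first keeps the configuration summary consistent across loader types while each subclass adds only what it knows.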

◆ hasNext()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::hasNext ( ) const

inline, virtual

Checks if more batches are available.

Determines whether additional batches can be loaded from the dataset, so training and evaluation loops can use it as their loop condition.

Returns: true if more batches are available, false if the dataset is exhausted

Note: Implementations should compare the current position against the total dataset size. A false result signals that the loop should either stop or call reset() to begin a new epoch.
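A typical epoch loop combines reset(), hasNext() and nextBatch(). The sketch below is self-contained, so it uses a hypothetical stand-in base IterLoaderBase and a hypothetical CountingLoader that loads no data; runEpochs is likewise an illustrative name. The default hasNext() (counter below numBatches()) is an assumption about the base behavior.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in with the iteration members documented for DataLoader.
class IterLoaderBase {
public:
    explicit IterLoaderBase(int64_t batch_size) : batch_size_(batch_size) {}
    virtual ~IterLoaderBase() = default;

    int64_t batchSize() const noexcept { return batch_size_; }
    int64_t currentBatch() const noexcept { return current_batch_; }
    virtual int64_t numBatches() const = 0;
    virtual void nextBatch() = 0;
    // Assumed default: more batches remain while the counter is below numBatches().
    virtual bool hasNext() const { return current_batch_ < numBatches(); }
    virtual void reset() { current_batch_ = 0; }

protected:
    void incrementBatch() noexcept { ++current_batch_; }

private:
    int64_t batch_size_;
    int64_t current_batch_ = 0;
};

// Hypothetical loader over a fixed number of batches that loads nothing.
class CountingLoader : public IterLoaderBase {
public:
    CountingLoader(int64_t batches, int64_t batch_size)
        : IterLoaderBase(batch_size), batches_(batches) {}
    int64_t numBatches() const override { return batches_; }
    void nextBatch() override {
        if (!hasNext()) throw std::runtime_error("dataset exhausted");
        incrementBatch();
    }

private:
    int64_t batches_;
};

// Typical epoch loop: rewind at each epoch boundary, then drain the loader.
int64_t runEpochs(IterLoaderBase& loader, int epochs) {
    int64_t steps = 0;
    for (int epoch = 0; epoch < epochs; ++epoch) {
        loader.reset();
        while (loader.hasNext()) {
            loader.nextBatch();
            // With a real Mila loader, consume loader.inputs() and loader.targets() here.
            ++steps;
        }
    }
    return steps;
}
```

Using hasNext() as the loop condition avoids relying on the std::runtime_error that nextBatch() throws once the dataset is exhausted.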

◆ incrementBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::incrementBatch ( )

inline, protected, noexcept

Increments current batch counter.

Protected helper method for derived classes to increment the batch counter after successfully loading the next batch. Simplifies sequential batch loading implementations.

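Note: Call it from a derived class's nextBatch() implementation once the batch has been loaded successfully. Because current_batch_ is private, this helper (or setCurrentBatch()) is the only way for a derived class to advance the counter.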

◆ inputs() [1/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual const InputTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::inputs ( ) const

pure virtual

Provides immutable access to input tensor for current batch.

Derived classes must implement this method to provide read-only access to the tensor containing input data for the currently loaded batch.

Returns: Const reference to input tensor containing current batch data

Note: Allows inspection for analysis and debugging without risk of modification. Must return the same data as the mutable overload.

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ inputs() [2/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual InputTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::inputs ( )

pure virtual

Provides mutable access to input tensor for current batch.

Derived classes must implement this method to provide access to the tensor containing input data for the currently loaded batch. After nextBatch() returns, the tensor must be correctly shaped and contain valid data.

Returns: Mutable reference to input tensor containing current batch data

Note: The tensor shape should match the model's expected input dimensions. The data should already be preprocessed and ready for the model to consume, and its memory layout should suit the target compute device.

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
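A simple way to satisfy both inputs() overloads, and the requirement that they return the same data, is to have both return the same member. In this sketch OverloadLoaderBase and BufferLoader are hypothetical, and std::vector<float> stands in for InputTensor.

```cpp
#include <vector>

// Hypothetical stand-in for the two inputs() overloads; std::vector replaces Tensor.
class OverloadLoaderBase {
public:
    using InputTensor = std::vector<float>;
    virtual ~OverloadLoaderBase() = default;
    virtual const InputTensor& inputs() const = 0;
    virtual InputTensor& inputs() = 0;
};

// Hypothetical loader holding the current batch in a single buffer.
class BufferLoader : public OverloadLoaderBase {
public:
    // Both overloads return the same member, so they always expose the same data.
    const InputTensor& inputs() const override { return inputs_; }
    InputTensor& inputs() override { return inputs_; }

private:
    InputTensor inputs_ = {0.5f, 1.5f};
};
```

Overload resolution picks the const version through const references, so read-only consumers see exactly what the mutable overload wrote.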


◆ nextBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::nextBatch ( )

pure virtual

Loads the next batch of data from the dataset.

Derived classes must implement this method to load the next batch of data into the input and target tensors. Implementation should handle data preprocessing, memory allocation, and batch composition according to the specific dataset requirements.

Exceptions

std::runtime_error	If no more batches are available
std::runtime_error	If data loading fails

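Note: After a successful load, the implementation must advance the batch counter by calling incrementBatch() or setCurrentBatch(), because current_batch_ itself is private. It should also handle the end-of-dataset condition appropriately and may involve complex preprocessing pipelines and data augmentation.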

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ numBatches()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::numBatches ( ) const

pure virtual

Returns the total number of batches in the dataset.

Derived classes must implement this method to report the total number of batches available in their specific dataset. This information is essential for training loop progress tracking and epoch management.

Returns: Total number of batches available in the dataset

Note: Implementations should account for a partial batch at the end of the dataset. The value may change if the dataset is modified or resampled. It is used for training progress reporting and epoch boundary detection.

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
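The partial-batch arithmetic can be written out as follows. Keeping the partial final batch uses ceiling division, and dropping it uses floor division; which policy a given loader uses is a design choice, and the function names here are hypothetical.

```cpp
#include <cstdint>

// Number of batches when the partial final batch is kept (ceiling division).
constexpr int64_t numBatchesKeepPartial(int64_t num_samples, int64_t batch_size) {
    return (num_samples + batch_size - 1) / batch_size;
}

// Number of batches when the partial final batch is dropped (floor division).
constexpr int64_t numBatchesDropPartial(int64_t num_samples, int64_t batch_size) {
    return num_samples / batch_size;
}

// Samples in batch `index` when the partial final batch is kept.
// Precondition: 0 <= index < numBatchesKeepPartial(num_samples, batch_size).
constexpr int64_t samplesInBatch(int64_t num_samples, int64_t batch_size, int64_t index) {
    const int64_t remaining = num_samples - index * batch_size;
    return remaining < batch_size ? remaining : batch_size;
}

// 1000 samples with batch_size 64: 15 full batches plus one batch of 40.
static_assert(numBatchesKeepPartial(1000, 64) == 16);
static_assert(numBatchesDropPartial(1000, 64) == 15);
static_assert(samplesInBatch(1000, 64, 15) == 40);
```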


◆ operator=() [1/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

DataLoader & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::operator= ( const DataLoader< TInputDataType, TTargetDataType, TMemoryResource > & )

delete

◆ operator=() [2/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

DataLoader & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::operator= ( DataLoader< TInputDataType, TTargetDataType, TMemoryResource > && )

default

◆ reset()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::reset ( )

inline, virtual

Resets the loader to the beginning of the dataset.

Resets the internal state to start iteration from the first batch. Derived classes may override this method to implement additional reset functionality such as dataset reshuffling or preprocessing pipeline reinitialization.

Note: The base implementation resets the batch counter to zero. It is typically called at epoch boundaries by the training loop. Override it to implement custom reset behavior such as reshuffling.

Reimplemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.
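A minimal sketch of a reshuffling override, assuming the derived loader assembles batches from a permutation of sample indices. ResetLoaderBase and ShufflingLoader are hypothetical, and std::shuffle with a seeded std::mt19937 is a swapped-in choice; the documentation only says that reset() may reshuffle.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical stand-in base: reset() only rewinds the batch counter.
class ResetLoaderBase {
public:
    virtual ~ResetLoaderBase() = default;
    int64_t currentBatch() const noexcept { return current_batch_; }
    virtual void reset() { current_batch_ = 0; }

private:
    int64_t current_batch_ = 0;
};

// Hypothetical loader that visits samples in a new shuffled order every epoch.
class ShufflingLoader : public ResetLoaderBase {
public:
    ShufflingLoader(std::size_t num_samples, unsigned seed)
        : order_(num_samples), rng_(seed) {
        std::iota(order_.begin(), order_.end(), std::size_t{0});  // 0, 1, ..., n-1
    }

    void reset() override {
        ResetLoaderBase::reset();                          // rewind the counter first
        std::shuffle(order_.begin(), order_.end(), rng_);  // fresh permutation per epoch
    }

    // Sample indices in the order batches will be assembled this epoch.
    const std::vector<std::size_t>& order() const noexcept { return order_; }

private:
    std::vector<std::size_t> order_;
    std::mt19937 rng_;
};
```

Seeding the generator explicitly keeps the epoch orderings reproducible across runs, which helps when debugging a training loop.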

◆ setCurrentBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

void Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::setCurrentBatch ( int64_t batch_index )

inline, protected, noexcept

Updates current batch counter.

Protected helper method for derived classes to update the batch counter after successfully loading a new batch. Ensures consistent state management across all data loader implementations.

Parameters

batch_index New batch index to set

Note: Should be called by derived classes after a batch has been loaded successfully. Using it keeps progress tracking consistent across all loader types.
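One use for setCurrentBatch() is resuming mid-epoch, for example after restoring a training checkpoint. Since setCurrentBatch() is noexcept, it cannot report an invalid index by throwing, so this sketch validates the index before calling it. SeekLoaderBase, ResumableLoader and resumeAt() are hypothetical names.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in base with the protected counter helpers.
class SeekLoaderBase {
public:
    virtual ~SeekLoaderBase() = default;
    int64_t currentBatch() const noexcept { return current_batch_; }
    virtual int64_t numBatches() const = 0;

protected:
    void incrementBatch() noexcept { ++current_batch_; }
    void setCurrentBatch(int64_t batch_index) noexcept { current_batch_ = batch_index; }

private:
    int64_t current_batch_ = 0;
};

// Hypothetical loader that can resume at a given batch.
class ResumableLoader : public SeekLoaderBase {
public:
    explicit ResumableLoader(int64_t batches) : batches_(batches) {}
    int64_t numBatches() const override { return batches_; }

    void resumeAt(int64_t batch_index) {
        // numBatches() itself is allowed and means "epoch already finished".
        if (batch_index < 0 || batch_index > numBatches())
            throw std::out_of_range("batch index outside [0, numBatches()]");
        // A real loader would also reposition its file or sample cursor here.
        setCurrentBatch(batch_index);
    }

private:
    int64_t batches_;
};
```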

◆ supportsMixedPrecision()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::supportsMixedPrecision ( )

inline, static, constexpr, noexcept

Checks if data loader supports mixed-precision workflows.

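Compile-time detection of whether the loader uses different data types for inputs and targets, enabling mixed-precision training optimizations. Because the check is simply TInputDataType != TTargetDataType, it is also true for combinations such as FP32 inputs with INT32 labels, not only for reduced-precision and full-precision floating-point pairs.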

Returns: true if input and target use different data types, false otherwise
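The compile-time members can be reproduced with an enum non-type template parameter. In this sketch TensorDataType is a reduced stand-in containing only the three values used in the example above, and LoaderTypeInfo and precisionLabel are hypothetical; they mirror the documented expression is_mixed_precision = (TInputDataType != TTargetDataType).

```cpp
// Hypothetical stand-in for Mila's TensorDataType enumeration (subset only).
enum class TensorDataType { FP32, FP16, INT32 };

// Hypothetical trait class mirroring DataLoader's compile-time members.
template <TensorDataType TInput, TensorDataType TTarget>
struct LoaderTypeInfo {
    static constexpr TensorDataType input_data_type = TInput;
    static constexpr TensorDataType target_data_type = TTarget;
    static constexpr bool is_mixed_precision = (TInput != TTarget);
    static constexpr bool supportsMixedPrecision() noexcept { return is_mixed_precision; }
};

// FP32 inputs with INT32 labels count as "mixed"; FP16 with FP16 does not.
static_assert(LoaderTypeInfo<TensorDataType::FP32, TensorDataType::INT32>::supportsMixedPrecision());
static_assert(!LoaderTypeInfo<TensorDataType::FP16, TensorDataType::FP16>::is_mixed_precision);

// Because the flag is constexpr, code can branch on it without runtime cost.
template <typename Info>
constexpr const char* precisionLabel() {
    if constexpr (Info::is_mixed_precision) return "mixed";
    else return "uniform";
}
```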

◆ targets() [1/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual const TargetTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::targets ( ) const

pure virtual

Provides immutable access to target tensor for current batch.

Derived classes must implement this method to provide read-only access to the tensor containing target/label data for the currently loaded batch.

Returns: Const reference to target tensor containing current batch labels

Note: Allows inspection for analysis and debugging without risk of modification. Must return the same data as the mutable overload.

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.

◆ targets() [2/2]

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual TargetTensor & Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::targets ( )

pure virtual

Provides mutable access to target tensor for current batch.

Derived classes must implement this method to provide access to the tensor containing target/label data for the currently loaded batch. The tensor should contain ground truth data corresponding to the inputs.

Returns: Mutable reference to target tensor containing current batch labels

Note: Target data must follow the same sample ordering as the input batch. Its format should match the structure of the model's expected output. In mixed-precision workflows, its data type may differ from that of the inputs.

Implemented in Mila::Data::TokenSequenceLoader< TMemoryResource >.


◆ usesPinnedMemory()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

constexpr bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::usesPinnedMemory ( )

inline, static, constexpr, noexcept

Checks if data loader uses pinned memory for GPU optimization.

Compile-time detection of pinned memory usage, indicating optimization for efficient host-to-device memory transfers in GPU training scenarios.

Returns: true if using pinned memory, false for standard CPU memory
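The generated page lists uses_pinned_memory = false for the build it was produced from. The sketch below shows one way such a flag could track the memory resource type; it is an assumption, not Mila's code. The resource classes are empty stand-in tags, and is_pinned_resource_v and PinnedInfo are hypothetical names.

```cpp
#include <type_traits>

// Hypothetical empty tags standing in for Mila's memory resource classes.
struct CpuMemoryResource {};
struct CudaPinnedMemoryResource {};

// Assumed derivation: the flag is true exactly for the pinned resource type.
template <typename TMemoryResource>
inline constexpr bool is_pinned_resource_v =
    std::is_same_v<TMemoryResource, CudaPinnedMemoryResource>;

// Hypothetical trait mirroring the documented static member and function.
template <typename TMemoryResource>
struct PinnedInfo {
    static constexpr bool uses_pinned_memory = is_pinned_resource_v<TMemoryResource>;
    static constexpr bool usesPinnedMemory() noexcept { return uses_pinned_memory; }
};

static_assert(PinnedInfo<CudaPinnedMemoryResource>::usesPinnedMemory());
static_assert(!PinnedInfo<CpuMemoryResource>::usesPinnedMemory());
```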

◆ validateCurrentBatch()

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

virtual bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::validateCurrentBatch ( ) const

inline, virtual

Validates current batch data integrity.

Derived classes may override this method to implement data validation checks ensuring batch integrity, proper tensor shapes, and valid data ranges. Useful for debugging data loading pipelines and preprocessing issues.

Returns: true if current batch data passes validation, false otherwise

Note: The default implementation performs only basic existence checks. Override it to implement dataset-specific validation logic. It is well suited to debug builds for comprehensive data verification.
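A minimal sketch of a dataset-specific override that adds shape and value-range checks on top of the base check. It assumes token-ID batches flattened row-major as [batch, seq_len], and it models the base "existence check" as "a batch has been loaded", which is an assumption. ValidatingLoaderBase, TokenBatchLoader and load() are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in base: the default check only asks whether a batch was loaded.
class ValidatingLoaderBase {
public:
    virtual ~ValidatingLoaderBase() = default;
    virtual bool validateCurrentBatch() const { return loaded_; }

protected:
    bool loaded_ = false;
};

// Hypothetical token loader: inputs and targets are [batch, seq_len], flattened row-major.
class TokenBatchLoader : public ValidatingLoaderBase {
public:
    TokenBatchLoader(int64_t batch, int64_t seq_len, int32_t vocab_size)
        : batch_(batch), seq_len_(seq_len), vocab_size_(vocab_size) {}

    // Stand-in for nextBatch(): installs an already-tokenized batch.
    void load(std::vector<int32_t> inputs, std::vector<int32_t> targets) {
        inputs_ = std::move(inputs);
        targets_ = std::move(targets);
        loaded_ = true;
    }

    bool validateCurrentBatch() const override {
        if (!ValidatingLoaderBase::validateCurrentBatch()) return false;
        const auto expected = static_cast<std::size_t>(batch_ * seq_len_);
        if (inputs_.size() != expected || targets_.size() != expected) return false;  // shape
        const auto in_vocab = [this](int32_t id) { return id >= 0 && id < vocab_size_; };
        return std::all_of(inputs_.begin(), inputs_.end(), in_vocab) &&   // value range
               std::all_of(targets_.begin(), targets_.end(), in_vocab);
    }

private:
    int64_t batch_;
    int64_t seq_len_;
    int32_t vocab_size_;
    std::vector<int32_t> inputs_;
    std::vector<int32_t> targets_;
};
```

In a debug build, a training loop could call assert(loader.validateCurrentBatch()) right after each nextBatch() to catch preprocessing bugs early.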

Member Data Documentation

◆ batch_size_

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::batch_size_

private

Number of samples in each batch (immutable after construction).

◆ current_batch_

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

int64_t Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::current_batch_

private

Zero-based index of currently loaded batch.

◆ input_data_type

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

TensorDataType Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::input_data_type = TInputDataType

static, constexpr

Compile-time input data type constant.

◆ is_mixed_precision

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::is_mixed_precision = (TInputDataType != TTargetDataType)

static, constexpr

Mixed-precision workflow detection.

◆ target_data_type

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

TensorDataType Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::target_data_type = TTargetDataType

static, constexpr

Compile-time target data type constant.

◆ uses_pinned_memory

template<TensorDataType TInputDataType, TensorDataType TTargetDataType, typename TMemoryResource>

bool Mila::Data::DataLoader< TInputDataType, TTargetDataType, TMemoryResource >::uses_pinned_memory = false

static, constexpr

Pinned memory optimization (CUDA-only; false on CPU-only builds).

The documentation for this class was generated from the following file:

/__w/Mila/Mila/Mila/Src/Data/Loaders/DataLoader.ixx
