Mila 0.13.48
Deep Neural Network Library
Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry Class Reference

Process-wide shared cache for RoPE cos/sin frequency tables.

Classes

struct  AcquireResult
struct  CacheEntry
struct  CacheKey
struct  CacheKeyHash

Public Member Functions

AcquireResult acquire (const CacheKey &key, std::size_t cache_bytes)
 Acquire a shared reference to the cos/sin cache for the given key.
void release (const CacheKey &key) noexcept
 Release a reference to the shared cache.

Static Public Member Functions

static RopeCacheRegistry & instance () noexcept

Private Member Functions

 RopeCacheRegistry ()=default
 RopeCacheRegistry (const RopeCacheRegistry &)=delete
RopeCacheRegistry & operator= (const RopeCacheRegistry &)=delete

Private Attributes

std::unordered_map< CacheKey, CacheEntry, CacheKeyHash > entries_
std::mutex mutex_

Detailed Description

Process-wide shared cache for RoPE cos/sin frequency tables.

The cos/sin tables are a pure function of (device_id, max_seq_len, head_dim, base, precision). In a typical transformer every attention layer constructs a CudaRopeOp with identical parameters; this registry ensures the tables are allocated and filled exactly once per unique configuration and freed when the last referencing op is destroyed.
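The keying idea above can be illustrated with a small host-only sketch. The members of the real CacheKey and CacheKeyHash are not shown on this page, so `SketchKey`, `SketchKeyHash`, `count_unique_configs`, and all field types below are hypothetical stand-ins that simply carry one field per parameter named in the description.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for CacheKey: one field per parameter listed above.
// The field types are assumptions, not taken from the library.
struct SketchKey {
    int device_id;
    std::size_t max_seq_len;
    std::size_t head_dim;
    float base;
    int precision;  // assumed to be an enum-like tag

    bool operator==(const SketchKey& o) const noexcept {
        return device_id == o.device_id && max_seq_len == o.max_seq_len &&
               head_dim == o.head_dim && base == o.base &&
               precision == o.precision;
    }
};

// Hypothetical stand-in for CacheKeyHash: folds every field into one value.
struct SketchKeyHash {
    std::size_t operator()(const SketchKey& k) const noexcept {
        std::size_t h = 0;
        auto mix = [&h](std::size_t v) {
            h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2);
        };
        mix(std::hash<int>{}(k.device_id));
        mix(std::hash<std::size_t>{}(k.max_seq_len));
        mix(std::hash<std::size_t>{}(k.head_dim));
        mix(std::hash<float>{}(k.base));
        mix(std::hash<int>{}(k.precision));
        return h;
    }
};

// Counts how many distinct table sets a stack of layers would need
// if tables are shared per unique key.
inline std::size_t count_unique_configs(const SketchKey* keys, std::size_t n) {
    std::unordered_map<SketchKey, int, SketchKeyHash> seen;
    for (std::size_t i = 0; i < n; ++i) ++seen[keys[i]];
    return seen.size();
}
```

With this keying, layers that share all five parameters collapse onto a single map entry, which is what lets one allocation serve every attention layer with the same configuration.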

Thread safety: acquire() and release() are individually serialized by an internal mutex. build_cache() is called by the first acquirer outside the lock; subsequent acquirers receive is_new == false and skip the fill.
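The lock scope described above might look roughly like the following sketch, which decides "first or not" under the mutex and runs the fill only after the mutex has been dropped. `LockScopeSketch`, `acquire_and_fill`, and `mutex_is_free` are hypothetical names; the real API returns from acquire() and leaves the build_cache() call to the caller, whereas this sketch folds both steps into one function for brevity. It does not model how later acquirers learn that the fill has finished, since this page does not describe that.

```cpp
#include <functional>
#include <mutex>

// Hypothetical single-entry registry used only to show lock scope.
class LockScopeSketch {
public:
    // Decides under the lock whether this caller is the first acquirer,
    // then runs the fill callback after the lock has been released.
    bool acquire_and_fill(const std::function<void()>& fill) {
        bool is_new = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            is_new = (ref_count_++ == 0);
        }  // mutex released here
        if (is_new) fill();  // stands in for build_cache()
        return is_new;
    }

    // Probes whether the mutex can be taken right now (diagnostic only).
    bool mutex_is_free() {
        if (!mutex_.try_lock()) return false;
        mutex_.unlock();
        return true;
    }

private:
    std::mutex mutex_;
    int ref_count_ = 0;
};
```

Keeping the fill outside the critical section means a slow table build for one key does not block acquire() or release() calls for other keys.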

Constructor & Destructor Documentation

◆ RopeCacheRegistry() [1/2]

Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::RopeCacheRegistry ( )
private default

◆ RopeCacheRegistry() [2/2]

Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::RopeCacheRegistry ( const RopeCacheRegistry & )
private delete

Member Function Documentation

◆ acquire()

AcquireResult Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::acquire ( const CacheKey & key,
std::size_t cache_bytes )
inline

Acquire a shared reference to the cos/sin cache for the given key.

On first acquisition for a key, allocates device memory and returns is_new == true so the caller fills the tables via build_cache(). Subsequent acquisitions increment the reference count and return is_new == false.

Parameters
key          Uniquely identifies the cache configuration.
cache_bytes  Size in bytes of one array (cos or sin).
Returns
AcquireResult with device pointers and is_new flag.
Exceptions
CudaError  if device memory allocation fails.
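The acquire/release contract can be sketched on the host as follows. This is not the library's implementation: `HostRegistry`, `SketchResult`, and `live_entries` are hypothetical names, an `int` replaces CacheKey, `std::malloc`/`std::free` replace the device allocator so the sketch runs without CUDA, and `std::bad_alloc` stands in for the documented CudaError.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_map>

// Hypothetical result type mirroring the documented AcquireResult:
// pointers to the two tables plus the is_new flag.
struct SketchResult {
    void* cos;
    void* sin;
    bool is_new;
};

// Host-memory stand-in for the registry (assumption: int key, malloc/free).
class HostRegistry {
public:
    SketchResult acquire(int key, std::size_t cache_bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            ++it->second.ref_count;  // later acquirer: share existing tables
            return {it->second.cos, it->second.sin, false};
        }
        // First acquirer: allocate cache_bytes for each of the two arrays.
        void* cos = std::malloc(cache_bytes);
        void* sin = std::malloc(cache_bytes);
        if (!cos || !sin) {
            std::free(cos);
            std::free(sin);
            throw std::bad_alloc();  // the documented API throws CudaError
        }
        entries_.emplace(key, Entry{cos, sin, 1});
        return {cos, sin, true};  // caller is expected to fill the tables
    }

    void release(int key) noexcept {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return;
        if (--it->second.ref_count == 0) {
            std::free(it->second.cos);
            std::free(it->second.sin);
            entries_.erase(it);  // last reference gone: drop the entry
        }
    }

    std::size_t live_entries() {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    struct Entry {
        void* cos;
        void* sin;
        int ref_count;
    };
    std::unordered_map<int, Entry> entries_;
    std::mutex mutex_;
};
```

In this sketch a second acquire() with the same key returns the same pointers with is_new == false, and the entry disappears only after a matching number of release() calls.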

◆ instance()

RopeCacheRegistry & Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::instance ( )
inline static noexcept

◆ operator=()

RopeCacheRegistry & Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::operator= ( const RopeCacheRegistry & )
private delete

◆ release()

void Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::release ( const CacheKey & key)
inline noexcept

Release a reference to the shared cache.

Decrements the reference count and frees the device memory when the count reaches zero. Safe to call from destructors: cudaFree errors are silently ignored because they are not actionable during cleanup.
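Because release() is noexcept and intended for destructors, one way a caller could pair it with acquire() is a small RAII handle. The sketch below is an assumption about usage, not part of the documented API: `ScopedCacheRef` and `CountingRegistry` are hypothetical, and the handle assumes the reference was already acquired before it is constructed.

```cpp
#include <utility>

// Hypothetical RAII handle: releases its key when it goes out of scope.
// Registry is any type exposing release(const Key&) that does not throw.
template <class Registry, class Key>
class ScopedCacheRef {
public:
    ScopedCacheRef(Registry& registry, Key key)
        : registry_(&registry), key_(std::move(key)) {}

    // One handle owns exactly one reference, so copying is disabled.
    ScopedCacheRef(const ScopedCacheRef&) = delete;
    ScopedCacheRef& operator=(const ScopedCacheRef&) = delete;

    ScopedCacheRef(ScopedCacheRef&& other) noexcept
        : registry_(other.registry_), key_(std::move(other.key_)) {
        other.registry_ = nullptr;  // moved-from handle owns nothing
    }

    ~ScopedCacheRef() {
        if (registry_) registry_->release(key_);
    }

private:
    Registry* registry_;
    Key key_;
};

// Minimal fake registry for exercising the handle without CUDA.
struct CountingRegistry {
    int releases = 0;
    void release(const int&) noexcept { ++releases; }
};
```

Disabling copies and nulling the moved-from handle keeps the number of release() calls equal to the number of acquisitions, which is what the reference count depends on.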


Member Data Documentation

◆ entries_

std::unordered_map<CacheKey, CacheEntry, CacheKeyHash> Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::entries_
private

◆ mutex_

std::mutex Mila::Dnn::Compute::Cuda::Rope::RopeCacheRegistry::mutex_
private

The documentation for this class was generated from the following file: