|
Mila 0.13.48
Deep Neural Network Library
|
Process-wide shared cache for RoPE cos/sin frequency tables.
Classes | |
| struct | AcquireResult |
| struct | CacheEntry |
| struct | CacheKey |
| struct | CacheKeyHash |
Public Member Functions | |
| AcquireResult | acquire (const CacheKey &key, std::size_t cache_bytes) |
| Acquire a shared reference to the cos/sin cache for the given key. | |
| void | release (const CacheKey &key) noexcept |
| Release a reference to the shared cache. | |
Static Public Member Functions | |
| static RopeCacheRegistry & | instance () noexcept |
Private Member Functions | |
| RopeCacheRegistry ()=default | |
| RopeCacheRegistry (const RopeCacheRegistry &)=delete | |
| RopeCacheRegistry & | operator= (const RopeCacheRegistry &)=delete |
Private Attributes | |
| std::unordered_map< CacheKey, CacheEntry, CacheKeyHash > | entries_ |
| std::mutex | mutex_ |
Process-wide shared cache for RoPE cos/sin frequency tables.
The cos/sin tables are a pure function of (device_id, max_seq_len, head_dim, base, precision). In a typical transformer every attention layer constructs a CudaRopeOp with identical parameters; this registry ensures the tables are allocated and filled exactly once per unique configuration and freed when the last referencing op is destroyed.
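As a rough illustration of how the five parameters above could form a hashable key, here is a minimal sketch. The field names, field types, the `Precision` enum, and the hash-combining scheme are assumptions made for this example; the actual `CacheKey` and `CacheKeyHash` definitions are not shown on this page and may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical precision tag; the real library may use a different type.
enum class Precision : std::uint8_t { FP32, FP16, BF16 };

// Assumed layout: one field per parameter named in the description above.
struct CacheKey {
    int device_id;
    std::size_t max_seq_len;
    std::size_t head_dim;
    float base;
    Precision precision;

    bool operator==(const CacheKey& o) const noexcept {
        return device_id == o.device_id && max_seq_len == o.max_seq_len &&
               head_dim == o.head_dim && base == o.base &&
               precision == o.precision;
    }
};

// Combines the per-field hashes (boost-style hash_combine; an assumption).
struct CacheKeyHash {
    std::size_t operator()(const CacheKey& k) const noexcept {
        std::size_t h = 0;
        auto mix = [&h](std::size_t v) {
            h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2);
        };
        mix(std::hash<int>{}(k.device_id));
        mix(std::hash<std::size_t>{}(k.max_seq_len));
        mix(std::hash<std::size_t>{}(k.head_dim));
        mix(std::hash<float>{}(k.base));
        mix(std::hash<int>{}(static_cast<int>(k.precision)));
        return h;
    }
};

// Counts distinct configurations: layers with identical parameters
// collapse onto a single map entry.
inline std::size_t count_unique(const CacheKey* keys, std::size_t n) {
    std::unordered_map<CacheKey, int, CacheKeyHash> seen;
    for (std::size_t i = 0; i < n; ++i) ++seen[keys[i]];
    return seen.size();
}
```

With a key of this shape, every attention layer built with the same parameters looks up the same map entry, which is what lets the registry share one pair of tables.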
Thread safety: acquire() and release() each run under an internal mutex. The first acquirer for a key calls build_cache() outside the lock; subsequent acquirers receive is_new == false and skip the fill.
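For orientation, the sketch below shows what a fill step could compute, using the standard RoPE frequencies theta_i = base^(-2i / head_dim) evaluated on the host. It is not the library's build_cache(): the function name `build_tables_host`, the `RopeTables` struct, and the `[max_seq_len][head_dim / 2]` row-major layout are all assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side container; the real tables live in device memory.
struct RopeTables {
    std::vector<float> cos;
    std::vector<float> sin;
};

// Fills cos/sin tables with an assumed layout of [max_seq_len][head_dim / 2].
inline RopeTables build_tables_host(std::size_t max_seq_len,
                                    std::size_t head_dim,
                                    float base) {
    const std::size_t half = head_dim / 2;
    RopeTables t{std::vector<float>(max_seq_len * half),
                 std::vector<float>(max_seq_len * half)};
    for (std::size_t pos = 0; pos < max_seq_len; ++pos) {
        for (std::size_t i = 0; i < half; ++i) {
            // theta_i = base^(-2i / head_dim)
            const double exponent =
                -2.0 * static_cast<double>(i) / static_cast<double>(head_dim);
            const double freq = std::pow(static_cast<double>(base), exponent);
            const double angle = static_cast<double>(pos) * freq;
            t.cos[pos * half + i] = static_cast<float>(std::cos(angle));
            t.sin[pos * half + i] = static_cast<float>(std::sin(angle));
        }
    }
    return t;
}
```

Because the result depends only on the arguments, two callers with the same configuration would produce identical tables, which is why a single fill by the first acquirer is sufficient.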
|
RopeCacheRegistry () | private, default |

|
RopeCacheRegistry (const RopeCacheRegistry &) | private, delete |

|
AcquireResult acquire (const CacheKey &key, std::size_t cache_bytes) | inline |
Acquire a shared reference to the cos/sin cache for the given key.
On first acquisition for a key, allocates device memory and returns is_new == true so the caller fills the tables via build_cache(). Subsequent acquisitions increment the reference count and return is_new == false.
Parameters
| key | Uniquely identifies the cache configuration. |
| cache_bytes | Size in bytes of a single array (cos or sin). |
Exceptions
| CudaError | if device memory allocation fails. |
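The acquire/release contract described above can be sketched with a host-memory stand-in. Everything here is illustrative: `DemoRegistry` and `DemoAcquireResult` are hypothetical names, an `int` replaces `CacheKey`, `std::malloc`/`std::free` replace the device allocator, and the `cos`/`sin` members are assumed (only `is_new` is named on this page).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_map>

// Hypothetical result type; only is_new is confirmed by the documentation.
struct DemoAcquireResult {
    float* cos;
    float* sin;
    bool is_new;
};

// Host-memory stand-in for a reference-counted cache registry.
class DemoRegistry {
public:
    DemoAcquireResult acquire(int key, std::size_t cache_bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            // Existing entry: bump the count, caller skips the fill.
            ++it->second.ref_count;
            return {it->second.cos, it->second.sin, false};
        }
        // First acquisition: allocate both arrays at cache_bytes each.
        auto* cos = static_cast<float*>(std::malloc(cache_bytes));
        auto* sin = static_cast<float*>(std::malloc(cache_bytes));
        if (!cos || !sin) {
            std::free(cos);
            std::free(sin);
            throw std::bad_alloc();
        }
        entries_.emplace(key, Entry{cos, sin, 1});
        return {cos, sin, true};
    }

    void release(int key) noexcept {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return;
        assert(it->second.ref_count > 0);
        if (--it->second.ref_count == 0) {
            std::free(it->second.cos);
            std::free(it->second.sin);
            entries_.erase(it);
        }
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    struct Entry {
        float* cos;
        float* sin;
        std::size_t ref_count;
    };
    std::unordered_map<int, Entry> entries_;
    mutable std::mutex mutex_;
};
```

A caller would check `is_new` on the returned value and fill the tables only when it is true; every successful `acquire` is then paired with exactly one `release`.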


|
static RopeCacheRegistry & instance () | inline, static, noexcept |


|
RopeCacheRegistry & operator= (const RopeCacheRegistry &) | private, delete |

|
void release (const CacheKey &key) | inline, noexcept |
Release a reference to the shared cache.
Decrements the reference count and frees device memory when it reaches zero. Safe to call from destructors: cudaFree errors are silently ignored because they are not actionable during cleanup.
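Since release() is meant to be callable from destructors, one way an op could guarantee the paired call is a small RAII handle. The sketch below is an assumption about usage, not part of the library: `CacheRef` and `CountingRegistry` are hypothetical names, and the handle adopts a reference that the caller has already acquired.

```cpp
#include <cassert>
#include <utility>

// Hypothetical RAII handle: adopts one already-acquired reference and
// releases it exactly once when destroyed.
template <class Registry, class Key>
class CacheRef {
public:
    CacheRef(Registry& registry, Key key)
        : registry_(&registry), key_(std::move(key)) {}

    CacheRef(const CacheRef&) = delete;
    CacheRef& operator=(const CacheRef&) = delete;
    CacheRef& operator=(CacheRef&&) = delete;

    // Moving transfers ownership; the moved-from handle releases nothing.
    CacheRef(CacheRef&& other) noexcept
        : registry_(other.registry_), key_(std::move(other.key_)) {
        other.registry_ = nullptr;
    }

    ~CacheRef() {
        if (registry_) registry_->release(key_);
    }

private:
    Registry* registry_;
    Key key_;
};

// Hypothetical stand-in that only counts live references.
struct CountingRegistry {
    int live = 0;
    void acquire(int) { ++live; }
    void release(int) noexcept {
        assert(live > 0);
        --live;
    }
};
```

With the real registry, the template arguments would be `RopeCacheRegistry` and `CacheKey`, and the handle would be constructed right after a successful acquire().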

|
std::unordered_map< CacheKey, CacheEntry, CacheKeyHash > entries_ | private |
|
std::mutex mutex_ | private |