Mila 0.13.48
Deep Neural Network Library
Non-owning pointers to shared transient GQA scratch buffers.

Public Attributes | |
| ITensor * | att { nullptr } |
| ITensor * | att_decode { nullptr } |
| ITensor * | preatt { nullptr } |
| ITensor * | preatt_decode { nullptr } |
| ITensor * | q_permute { nullptr } |
| ITensor * | v_out { nullptr } |
| ITensor * | v_out_decode { nullptr } |
Non-owning pointers to shared transient GQA scratch buffers.
All slots are nullable. CudaGqaOp::setState() accepts a partially populated state: only non-null slots replace the previously wired pointers, and null slots leave the existing pointers unchanged.
Prefill slots:
  q_permute      [B, NH, chunk, HS]
  preatt         [B, NH, chunk, T]
  att            [B, NH, chunk, T]
  v_out          [B, NH, chunk, HS]
Decode slots:
  preatt_decode  [B, NH, 1, T]
  att_decode     [B, NH, 1, T]
  v_out_decode   [B, NH, 1, HS]
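The shape annotations above can be turned into element counts when sizing the scratch buffers. The following is a minimal sketch, not part of Mila: `GqaDims`, `Shape`, and the helper functions are hypothetical names, and the comments on what B, NH, T, and HS stand for are assumptions, since this page does not define them.

```cpp
#include <array>
#include <cstddef>

// Hypothetical dimension bundle; field names follow the shape annotations.
// The meanings in the comments are assumptions, not taken from Mila.
struct GqaDims {
    std::size_t B;      // batch size (assumed)
    std::size_t NH;     // number of heads (assumed)
    std::size_t chunk;  // prefill chunk length
    std::size_t T;      // sequence length (assumed)
    std::size_t HS;     // head size (assumed)
};

using Shape = std::array<std::size_t, 4>;

// Prefill: q_permute and v_out share [B, NH, chunk, HS].
inline Shape prefillValueShape(const GqaDims& d) {
    return Shape{ d.B, d.NH, d.chunk, d.HS };
}

// Prefill: preatt and att share [B, NH, chunk, T].
inline Shape prefillScoreShape(const GqaDims& d) {
    return Shape{ d.B, d.NH, d.chunk, d.T };
}

// Decode: preatt_decode and att_decode share [B, NH, 1, T].
inline Shape decodeScoreShape(const GqaDims& d) {
    return Shape{ d.B, d.NH, 1, d.T };
}

// Decode: v_out_decode is [B, NH, 1, HS].
inline Shape decodeValueShape(const GqaDims& d) {
    return Shape{ d.B, d.NH, 1, d.HS };
}

// Number of elements a buffer of the given shape must hold.
inline std::size_t elementCount(const Shape& s) {
    return s[0] * s[1] * s[2] * s[3];
}
```

Under these assumptions, the decode buffers are smaller than their prefill counterparts by a factor of `chunk`, because only the third dimension differs.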
Ownership: caller retains ownership and must ensure tensors outlive all GQA layers that reference them.
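The partial-update rule described above can be illustrated with a small stand-alone sketch. This is not the Mila implementation: `ITensor` here is an empty stand-in for the real interface, `GqaState` is re-declared only to mirror the documented fields, and `mergeNonNull` is a hypothetical helper showing one way "only non-null slots replace the previously wired pointers" could behave.

```cpp
// Empty stand-in for Mila's ITensor (assumption; the real type is not shown here).
struct ITensor {};

// Mirrors the documented layout: seven nullable, non-owning slots.
struct GqaState {
    ITensor* att{ nullptr };
    ITensor* att_decode{ nullptr };
    ITensor* preatt{ nullptr };
    ITensor* preatt_decode{ nullptr };
    ITensor* q_permute{ nullptr };
    ITensor* v_out{ nullptr };
    ITensor* v_out_decode{ nullptr };
};

// Hypothetical helper: copy only the non-null slots of `update` into `wired`.
// Slots that are null in `update` keep whatever `wired` already held.
inline void mergeNonNull(GqaState& wired, const GqaState& update) {
    auto take = [](ITensor*& dst, ITensor* src) {
        if (src != nullptr) {
            dst = src;
        }
    };
    take(wired.att, update.att);
    take(wired.att_decode, update.att_decode);
    take(wired.preatt, update.preatt);
    take(wired.preatt_decode, update.preatt_decode);
    take(wired.q_permute, update.q_permute);
    take(wired.v_out, update.v_out);
    take(wired.v_out_decode, update.v_out_decode);
}
```

Because the struct stores raw pointers and never frees them, the tensors passed in must be kept alive by the caller (for example as members of the object that owns the layers) for as long as any layer still references them.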
| ITensor* Mila::Dnn::Compute::GqaState::att { nullptr } |
| ITensor* Mila::Dnn::Compute::GqaState::att_decode { nullptr } |
| ITensor* Mila::Dnn::Compute::GqaState::preatt { nullptr } |
| ITensor* Mila::Dnn::Compute::GqaState::preatt_decode { nullptr } |
| ITensor* Mila::Dnn::Compute::GqaState::q_permute { nullptr } |
| ITensor* Mila::Dnn::Compute::GqaState::v_out { nullptr } |
| ITensor* Mila::Dnn::Compute::GqaState::v_out_decode { nullptr } |