Mila 0.13.48
Deep Neural Network Library
LLaMA-style decoder-only transformer network.
#include <string>
#include <vector>
#include <memory>
#include <sstream>
#include <iostream>
#include <stdexcept>
#include <cstdint>
#include <format>
#include <optional>
#include <filesystem>
#include <algorithm>
import Compute.ExecutionContextFactory;
import Compute.GqaState;
import Compute.Device;
import Compute.ExecutionContext;
import Dnn.Components.Rope;
import Compute.CpuMemoryResource;
import Dnn.Quantization.KvCache.Policy;
import Serialization.ModelArchive;
import Dnn.Quantization.Weight.Policies;
import Serialization.PretrainedReader;
import Compute.DeviceTypeTraits;
import Dnn.Components.RmsNorm;
import Dnn.Components.LlamaTransformer:Config;
import Dnn.ITensor;
import Compute.DeviceId;
import Compute.DeviceType;
import Dnn.ActivationType;
import Dnn.Component;
import Dnn.ComponentType;
import Dnn.TensorDataTypeTraits;
import Dnn.Tensor;
import Dnn.TensorTypes;
import Dnn.LanguageNetwork;
import Dnn.Components.TokenEmbedding;
import Serialization.Tensor;
import Dnn.TensorDataType;
import Dnn.Components.Linear;
Classes | |
| class | Mila::Dnn::LlamaTransformer< TDeviceType, TPrecision, TWeightQuantization, TKvCachePolicy > |
| LLaMA-style transformer (decoder-only) for autoregressive token prediction. | |
Namespaces | |
| namespace | Mila |
| Mila main API namespace. | |
| namespace | Mila::Dnn |
Functions | |
| int64_t | Mila::Dnn::computePrefillChunkSize (int64_t batch, int64_t num_heads, int64_t head_dim, int64_t context_length, int64_t precision_bytes) |
Variables | |
| constexpr int64_t | Mila::Dnn::kPrefillScratchByteCap = int64_t{ 1536 } * 1024 * 1024 |
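The listing gives only the signature of computePrefillChunkSize and the value of kPrefillScratchByteCap (1536 MiB, i.e. 1,610,612,736 bytes), not how the two interact. The following is a minimal sketch of one plausible relationship, not Mila's implementation. It assumes that each prefill token needs one attention-score row of context_length elements plus one output row of head_dim elements per head and per batch entry, and that the chunk size is the largest token count whose scratch fits under the cap. The sketch namespace, the prefillChunkSize name, the per-token cost model, and the clamping rules are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>

namespace sketch {

// Same value as the documented kPrefillScratchByteCap: 1536 MiB in bytes.
constexpr int64_t kScratchByteCap = int64_t{ 1536 } * 1024 * 1024;
static_assert( kScratchByteCap == 1'610'612'736 );

// HYPOTHETICAL cost model (an assumption, not taken from Mila):
// bytes per prefill token =
//     batch * num_heads * (context_length + head_dim) * precision_bytes
// The result is clamped to the range [1, context_length].
inline int64_t prefillChunkSize( int64_t batch, int64_t num_heads, int64_t head_dim,
                                 int64_t context_length, int64_t precision_bytes )
{
    if ( batch <= 0 || num_heads <= 0 || head_dim <= 0 ||
         context_length <= 0 || precision_bytes <= 0 )
    {
        throw std::invalid_argument( "all arguments must be positive" );
    }

    const int64_t bytes_per_token =
        batch * num_heads * ( context_length + head_dim ) * precision_bytes;

    // Largest number of tokens whose scratch stays under the cap.
    const int64_t fit = kScratchByteCap / bytes_per_token;

    // Always process at least one token, never more than the context window.
    return std::clamp<int64_t>( fit, 1, context_length );
}

} // namespace sketch
```

Under these assumptions, a single sequence with 32 heads, head_dim 128, a 4096-token context, and 2-byte precision fits in one chunk (4096 tokens), while a batch of 8 with the same shape would be split into chunks of 744 tokens.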
Device-templated network implementing a LLaMA-style autoregressive decoder.