Mila 0.13.48
Deep Neural Network Library
Llama.ixx File Reference

LLaMA-style decoder-only transformer network.

#include <string>
#include <vector>
#include <memory>
#include <sstream>
#include <iostream>
#include <stdexcept>
#include <cstdint>
#include <format>
#include <optional>
#include <filesystem>
#include <algorithm>
import Compute.ExecutionContextFactory;
import Compute.GqaState;
import Compute.Device;
import Compute.ExecutionContext;
import Dnn.Components.Rope;
import Compute.CpuMemoryResource;
import Dnn.Quantization.KvCache.Policy;
import Serialization.ModelArchive;
import Dnn.Quantization.Weight.Policies;
import Serialization.PretrainedReader;
import Compute.DeviceTypeTraits;
import Dnn.Components.RmsNorm;
import Dnn.Components.LlamaTransformer:Config;
import Dnn.ITensor;
import Compute.DeviceId;
import Compute.DeviceType;
import Dnn.ActivationType;
import Dnn.Component;
import Dnn.ComponentType;
import Dnn.TensorDataTypeTraits;
import Dnn.Tensor;
import Dnn.TensorTypes;
import Dnn.LanguageNetwork;
import Dnn.Components.TokenEmbedding;
import Serialization.Tensor;
import Dnn.TensorDataType;
import Dnn.Components.Linear;

Classes

class  Mila::Dnn::LlamaTransformer< TDeviceType, TPrecision, TWeightQuantization, TKvCachePolicy >
 LLaMA-style transformer (decoder-only) for autoregressive token prediction.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Dnn

Functions

int64_t Mila::Dnn::computePrefillChunkSize (int64_t batch, int64_t num_heads, int64_t head_dim, int64_t context_length, int64_t precision_bytes)

Variables

constexpr int64_t Mila::Dnn::kPrefillScratchByteCap = int64_t{ 1536 } * 1024 * 1024
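
This page does not document how computePrefillChunkSize uses kPrefillScratchByteCap, so the following is only a minimal sketch of one plausible relationship, not Mila's implementation. It assumes the cap (1536 MiB, i.e. 1.5 GiB) bounds the attention scratch memory used during prefill, and that each prompt token needs one score row of context_length elements plus one head_dim output vector per head. The names kScratchByteCapSketch and sketchPrefillChunkSize, and the per-token cost model, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Same value as the documented kPrefillScratchByteCap: 1536 MiB (1.5 GiB).
constexpr std::int64_t kScratchByteCapSketch = std::int64_t{ 1536 } * 1024 * 1024;
static_assert( kScratchByteCapSketch == 1'610'612'736 );

// Hypothetical helper mirroring the documented parameter list.
// ASSUMPTION: per-token scratch is one attention-score row (context_length
// elements) plus one output vector (head_dim elements) for every head of
// every sequence in the batch. The real cost model may differ.
constexpr std::int64_t sketchPrefillChunkSize(
    std::int64_t batch, std::int64_t num_heads, std::int64_t head_dim,
    std::int64_t context_length, std::int64_t precision_bytes )
{
    assert( batch > 0 && num_heads > 0 && head_dim > 0 );
    assert( context_length > 0 && precision_bytes > 0 );

    const std::int64_t bytes_per_token =
        batch * num_heads * ( context_length + head_dim ) * precision_bytes;

    // Largest number of tokens whose scratch fits under the cap.
    const std::int64_t tokens_that_fit = kScratchByteCapSketch / bytes_per_token;

    // Always process at least one token; never exceed the context length.
    return std::clamp<std::int64_t>( tokens_that_fit, 1, context_length );
}
```

Under these assumptions, a short context is prefilled in a single pass, while a long context or large batch is split into smaller chunks so the scratch buffer stays below the cap.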

Detailed Description

LLaMA-style decoder-only transformer network.

Device-templated network implementing a LLaMA-style autoregressive decoder.