Mila 0.13.48
Deep Neural Network Library
Unified BPE tokenizer for GPT-2, Llama 3.x, and Mistral model families.
#include <string>
#include <string_view>
#include <vector>
#include <span>
#include <memory>
#include <optional>
#include <filesystem>
#include <chrono>
#include <iostream>
#include <regex>
#include <limits>
#include <stdexcept>
import Data.TokenizerVocabulary;
import Data.Tokenizer;
import Data.BpePreTokenizationMode;
import Data.BpeVocabulary;
Classes | |
| class | Mila::Data::BpeTokenizer |
| Unified BPE tokenizer targeting GPT-2, Llama 3.x, and Mistral model families. | |
Namespaces | |
| namespace | Mila |
| Mila main API namespace. | |
| namespace | Mila::Data |
Typedefs | |
| using | Mila::Data::TokenId |
Unified BPE tokenizer for GPT-2, Llama 3.x, and Mistral model families.
Encode pipeline:
Decode pipeline: Concatenate token strings and reverse the byte encoding back to UTF-8.
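The decode step can be illustrated with a minimal sketch. It assumes the GPT-2 style byte-to-unicode table, in which printable bytes map to themselves and every other byte maps to code point 256 + n in ascending order; a Mistral vocabulary may use a different scheme. The names buildUnicodeToByte and decodeTokenStrings are hypothetical and are not part of the Mila API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: inverse of the GPT-2 style byte-to-unicode table.
// Printable bytes keep their own code point; all other bytes are assigned
// code points 256, 257, ... in ascending byte order.
inline std::unordered_map<char32_t, std::uint8_t> buildUnicodeToByte() {
    std::unordered_map<char32_t, std::uint8_t> inverse;
    int next = 0;
    for (int b = 0; b < 256; ++b) {
        const bool printable = (b >= 0x21 && b <= 0x7E) ||
                               (b >= 0xA1 && b <= 0xAC) ||
                               (b >= 0xAE && b <= 0xFF);
        const char32_t cp = static_cast<char32_t>(printable ? b : 256 + next++);
        inverse[cp] = static_cast<std::uint8_t>(b);
    }
    return inverse;
}

// Hypothetical helper: concatenates token strings, then maps every code
// point back to the raw byte it stands for, yielding the original UTF-8.
inline std::string decodeTokenStrings(const std::vector<std::string>& tokens) {
    static const auto unicodeToByte = buildUnicodeToByte();

    std::string joined;
    for (const auto& token : tokens) joined += token;

    std::string bytes;
    for (std::size_t i = 0; i < joined.size();) {
        const auto lead = static_cast<unsigned char>(joined[i]);
        char32_t cp = 0;
        if (lead < 0x80) {
            // One-byte UTF-8 sequence (ASCII).
            cp = lead;
            i += 1;
        } else if ((lead & 0xE0) == 0xC0 && i + 1 < joined.size()) {
            // Two-byte sequence; the table never exceeds U+0143,
            // so longer sequences cannot occur in valid input.
            const auto trail = static_cast<unsigned char>(joined[i + 1]);
            cp = static_cast<char32_t>(((lead & 0x1F) << 6) | (trail & 0x3F));
            i += 2;
        } else {
            throw std::runtime_error("token string is not byte-level encoded");
        }
        const auto it = unicodeToByte.find(cp);
        if (it == unicodeToByte.end()) {
            throw std::runtime_error("code point outside the byte table");
        }
        bytes.push_back(static_cast<char>(it->second));
    }
    return bytes;
}
```

Under these assumptions, the space byte 0x20 is stored as U+0120 in the vocabulary, so the token strings "Hello" and "\xC4\xA0world" decode to "Hello world". Reversing the mapping only after concatenation matters because a multi-byte UTF-8 character can be split across token boundaries.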