Mila 0.13.48
Deep Neural Network Library
Data.BpePreTokenizationMode Module Reference

Enumerations

enum class  Mila::Data::PreTokenizationMode { None , Whitespace , Gpt2Regex , Llama3Regex }
 Pre-tokenization strategies for GPT-4 style BPE tokenizers.
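
To illustrate how the four enumerators might be consumed, here is a minimal sketch. It does not import the Mila module: the enum below is a local stand-in that mirrors the documented enumerators, and `modeName` and `preTokenize` are hypothetical helpers, not part of Mila. The behaviour shown for `Whitespace` (split on ASCII whitespace and drop it) is an assumption; this page does not specify what the library actually does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Local stand-in mirroring the documented enumerators. The real type is
// Mila::Data::PreTokenizationMode; this copy only keeps the sketch self-contained.
enum class PreTokenizationMode { None, Whitespace, Gpt2Regex, Llama3Regex };

// Hypothetical helper: readable name for logging or configuration dumps.
constexpr std::string_view modeName(PreTokenizationMode mode) {
    switch (mode) {
        case PreTokenizationMode::None:        return "None";
        case PreTokenizationMode::Whitespace:  return "Whitespace";
        case PreTokenizationMode::Gpt2Regex:   return "Gpt2Regex";
        case PreTokenizationMode::Llama3Regex: return "Llama3Regex";
    }
    return "Unknown";
}

// Hypothetical helper covering only the two non-regex modes.
// Assumed semantics: None returns the text as a single piece;
// Whitespace splits on ASCII whitespace and discards it.
std::vector<std::string> preTokenize(std::string_view text, PreTokenizationMode mode) {
    assert(mode == PreTokenizationMode::None || mode == PreTokenizationMode::Whitespace);

    std::vector<std::string> pieces;
    if (mode == PreTokenizationMode::None) {
        if (!text.empty()) pieces.emplace_back(text);
        return pieces;
    }

    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    std::size_t i = 0;
    while (i < text.size()) {
        while (i < text.size() && isSpace(text[i])) ++i;   // skip a run of whitespace
        const std::size_t start = i;
        while (i < text.size() && !isSpace(text[i])) ++i;  // consume one piece
        if (i > start) pieces.emplace_back(text.substr(start, i - start));
    }
    return pieces;
}
```

Under these assumptions, `preTokenize("hello  world", PreTokenizationMode::Whitespace)` yields the two pieces `hello` and `world`, while `PreTokenizationMode::None` hands the whole string to the BPE merge step unchanged.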

Variables

constexpr const char * Mila::Data::GPT2_PRETOKENIZATION_PATTERN
constexpr const char * Mila::Data::GPT2_PRETOKENIZATION_PATTERN_ASCII_FALLBACK
constexpr const char * Mila::Data::LLAMA3_PRETOKENIZATION_PATTERN
constexpr const char * Mila::Data::LLAMA3_PRETOKENIZATION_PATTERN_ASCII_FALLBACK
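
The values of these constants are not shown on this page. The `_ASCII_FALLBACK` variants suggest a pattern for regex engines without Unicode property classes such as `\p{L}`; `std::regex`, for example, does not support them. The sketch below shows what such a fallback could look like: it takes the widely published GPT-2 splitting pattern and replaces the Unicode classes with ASCII ranges. The constant `kAsciiGpt2LikePattern` and the function `splitWithPattern` are hypothetical names chosen for illustration and should not be read as the contents of the Mila constants.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical ASCII-only approximation of the published GPT-2 pattern:
// \p{L} is replaced by [A-Za-z] and \p{N} by [0-9]. Non-ASCII letters and
// digits therefore fall into the "other symbols" alternative.
constexpr const char* kAsciiGpt2LikePattern =
    R"('s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+)";

// Hypothetical helper: collect every successive match of the pattern in order.
std::vector<std::string> splitWithPattern(const std::string& text, const char* pattern) {
    const std::regex re(pattern, std::regex::ECMAScript);
    std::vector<std::string> pieces;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

With this approximation, `splitWithPattern("Hello world's 42!", kAsciiGpt2LikePattern)` produces `Hello`, ` world`, `'s`, ` 42`, and `!`. The leading space stays attached to the following word, as in GPT-2 style pre-tokenization.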

Files

file  /__w/Mila/Mila/Mila/Src/Data/Tokenizers/Bpe/BpePreTokenizationMode.ixx