Mila 0.13.48
Deep Neural Network Library
This is the complete list of members for Mila::Data::BpeTokenizer, including all inherited members.
| BpeTokenizer(BpeVocabulary vocab) | Mila::Data::BpeTokenizer | inline, explicit |
| decode(std::span< const TokenId > tokens) override | Mila::Data::BpeTokenizer | inline, virtual |
| decodeToken(const std::string &token, std::string &out) | Mila::Data::BpeTokenizer | inline, private |
| encode(const std::string &text) override | Mila::Data::BpeTokenizer | inline, virtual |
| encodeSegment(const std::string &text, std::vector< TokenId > &out) | Mila::Data::BpeTokenizer | inline, private |
| encodeSegmentBpe(const std::vector< std::string > &words, std::vector< TokenId > &out) | Mila::Data::BpeTokenizer | inline, private |
| encodeSegmentMaxMunch(const std::vector< std::string > &words, std::vector< TokenId > &out) | Mila::Data::BpeTokenizer | inline, private |
| getBosTokenId() const override | Mila::Data::BpeTokenizer | inline, virtual |
| getEosTokenId() const override | Mila::Data::BpeTokenizer | inline, virtual |
| getPadTokenId() const override | Mila::Data::BpeTokenizer | inline, virtual |
| getVocab() const | Mila::Data::BpeTokenizer | inline |
| getVocabSize() const override | Mila::Data::BpeTokenizer | inline, virtual |
| initializePreTokenization() | Mila::Data::BpeTokenizer | inline, private |
| isValidToken(TokenId tokenId) const override | Mila::Data::BpeTokenizer | inline, virtual |
| load(const std::filesystem::path &path) | Mila::Data::BpeTokenizer | inline, static |
| loadGpt2(const std::filesystem::path &path) | Mila::Data::BpeTokenizer | inline, static |
| loadLlama32(const std::filesystem::path &path) | Mila::Data::BpeTokenizer | inline, static |
| loadMistral(const std::filesystem::path &vocab_path, const std::filesystem::path &merges_path) | Mila::Data::BpeTokenizer | inline, static |
| pre_tokenization_regex_ | Mila::Data::BpeTokenizer | private |
| preTokenize(const std::string &text) | Mila::Data::BpeTokenizer | inline, private |
| tokenToString(TokenId tokenId) const override | Mila::Data::BpeTokenizer | inline, virtual |
| utf8CharLength(unsigned char first_byte) | Mila::Data::BpeTokenizer | inline, private, static |
| vocab_ | Mila::Data::BpeTokenizer | private |
| ~Tokenizer()=default | Mila::Data::Tokenizer | virtual |
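
The member list gives only the signature of the private helper `utf8CharLength(unsigned char first_byte)`, not its body or return type. As a rough illustration of what a helper with that signature usually does, the sketch below derives the byte length of a UTF-8 sequence from its lead byte and uses it to split a string into characters. The names `utf8CharLengthSketch` and `splitUtf8Sketch`, the `std::size_t` return type, and the fallback of treating an invalid lead byte as length 1 are all assumptions made for this example, not Mila's actual implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in, not Mila's code: number of bytes in the UTF-8
// sequence introduced by first_byte. Invalid lead bytes (including stray
// continuation bytes) are reported as length 1 so callers always advance.
inline std::size_t utf8CharLengthSketch(unsigned char first_byte) {
    if (first_byte < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((first_byte & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((first_byte & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((first_byte & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                                   // continuation or invalid byte
}

// Hypothetical helper: split a UTF-8 string into per-character byte sequences.
inline std::vector<std::string> splitUtf8Sketch(const std::string& text) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t len =
            utf8CharLengthSketch(static_cast<unsigned char>(text[i]));
        if (i + len > text.size()) {
            len = text.size() - i;  // truncated sequence at end of input
        }
        chars.push_back(text.substr(i, len));
        i += len;
    }
    return chars;
}
```

A tokenizer might use a split like this to fall back to per-character tokens when no longer vocabulary entry matches; whether `BpeTokenizer` does so is not stated in this member list.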