Mila 0.13.48
Deep Neural Network Library
Character-level tokenizer trainer for corpus accumulation and vocabulary building.
#include <string>
#include <vector>
#include <fstream>
#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <cstdint>
#include <memory>
#include <optional>
#include <algorithm>
import Data.TokenizerVocabulary;
import Data.TokenizerTrainer;
import Data.CharVocabulary;
import Data.CharTokenizer;
import Data.Tokenizer;
import Data.CharVocabularyConfig;
Classes | |
| class | Mila::Data::CharTrainer |
| Character-level tokenizer trainer. | |
Namespaces | |
| namespace | Mila |
| Mila main API namespace. | |
| namespace | Mila::Data |
Typedefs | |
| using | Mila::Data::TokenId |
Character-level tokenizer trainer for corpus accumulation and vocabulary building.
Provides corpus management and delegates to CharVocabulary factory methods. Maintained for API consistency with BpeTrainer, though character tokenization is simple enough that direct use of CharVocabulary::trainFromFile() is often preferred.
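The accumulate-then-build pattern described above can be sketched in a few lines of standalone C++. This is only an illustration of the general idea, not Mila's implementation: SketchCharTrainer, SketchTokenId, addCorpus(), train(), and encode() are hypothetical names invented for this example, and the real CharTrainer and CharVocabulary interfaces may differ. The sketch also assumes one token per byte and assigns ids in sorted order of char values, which is an arbitrary choice made here.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in type; not the Mila::Data::TokenId alias.
using SketchTokenId = std::uint32_t;

// Hypothetical trainer sketch; not the Mila::Data::CharTrainer interface.
class SketchCharTrainer {
public:
    // Accumulate one chunk of text into the in-memory corpus.
    void addCorpus(std::string_view text) { corpus_.append(text); }

    // Build a vocabulary from everything accumulated so far.
    // Each distinct char gets one id, assigned in sorted order of char values.
    std::map<char, SketchTokenId> train() const {
        std::string symbols = corpus_;
        std::sort(symbols.begin(), symbols.end());
        symbols.erase(std::unique(symbols.begin(), symbols.end()), symbols.end());

        std::map<char, SketchTokenId> vocab;
        SketchTokenId next = 0;
        for (char c : symbols) {
            vocab.emplace(c, next++);
        }
        return vocab;
    }

private:
    std::string corpus_;
};

// Map each char of `text` to its id.
// Throws std::out_of_range if a char was never seen during training.
std::vector<SketchTokenId> encode(const std::map<char, SketchTokenId>& vocab,
                                  std::string_view text) {
    std::vector<SketchTokenId> ids;
    ids.reserve(text.size());
    for (char c : text) {
        ids.push_back(vocab.at(c));
    }
    return ids;
}
```

Because the build step is a single pass over the distinct characters, a trainer object adds little beyond corpus bookkeeping, which is consistent with the note above that building the vocabulary directly is often preferred.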