Mila 0.13.48
Deep Neural Network Library
TokenizerTrainer.ixx File Reference

Abstract trainer interface for building tokenizer vocabularies.

#include <string>
#include <memory>
#include <filesystem>
#include <istream>
#include <fstream>
#include <sstream>
import Data.TokenizerVocabulary;

Classes

class  Mila::Data::TokenizerTrainer
 Abstract interface for training tokenizer vocabularies from text corpora.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Data

Detailed Description

Abstract trainer interface for building tokenizer vocabularies.

Defines the minimal lifecycle for constructing a tokenizer vocabulary from a text corpus. A trainer performs the training and returns the resulting vocabulary object; ownership of that object is transferred to the caller.
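
The lifecycle described above (feed a corpus, train, hand the vocabulary to the caller) can be illustrated with a minimal, self-contained sketch. This page does not list the member functions of Mila::Data::TokenizerTrainer, so every name below (the Sketch namespace, Vocabulary, Trainer, addCorpus, train, WhitespaceTrainer, trainFromText) is a hypothetical stand-in and not Mila's API. The sketch also assumes ownership is transferred through std::unique_ptr, which is one common way to express it in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

namespace Sketch {

// Hypothetical stand-in for a vocabulary type; NOT Mila's TokenizerVocabulary.
class Vocabulary {
public:
    // Returns the id of `token`, assigning the next free id on first sight.
    std::size_t add(const std::string& token) {
        return ids_.try_emplace(token, ids_.size()).first->second;
    }
    bool contains(const std::string& token) const { return ids_.count(token) != 0; }
    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
};

// Hypothetical abstract trainer: accumulate corpus text, then train.
class Trainer {
public:
    virtual ~Trainer() = default;

    // Reads corpus text from a stream (assumed input form).
    virtual void addCorpus(std::istream& input) = 0;

    // Builds the vocabulary; the returned unique_ptr makes the caller the owner.
    virtual std::unique_ptr<Vocabulary> train() = 0;
};

// Toy concrete trainer: every whitespace-separated word becomes one token.
class WhitespaceTrainer final : public Trainer {
public:
    void addCorpus(std::istream& input) override {
        std::string word;
        while (input >> word) {
            words_.push_back(word);
        }
    }

    std::unique_ptr<Vocabulary> train() override {
        auto vocab = std::make_unique<Vocabulary>();
        for (const auto& word : words_) {
            vocab->add(word);
        }
        return vocab;
    }

private:
    std::vector<std::string> words_;
};

// Convenience helper: runs the whole lifecycle on an in-memory string.
inline std::unique_ptr<Vocabulary> trainFromText(Trainer& trainer, const std::string& text) {
    std::istringstream input(text);
    trainer.addCorpus(input);
    auto vocab = trainer.train();
    assert(vocab != nullptr);  // this sketch expects train() to return a vocabulary
    return vocab;
}

}  // namespace Sketch
```

Returning std::unique_ptr from train() means the trainer keeps no reference to the result, so the caller can keep, move, or destroy the vocabulary independently of the trainer's lifetime. A real trainer would typically take further configuration (for example a target vocabulary size), which this sketch leaves out.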