Mila 0.13.48
Deep Neural Network Library
CharTrainer.ixx File Reference

Character-level tokenizer trainer for corpus accumulation and vocabulary building.

#include <string>
#include <vector>
#include <fstream>
#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <cstdint>
#include <memory>
#include <optional>
#include <algorithm>
import Data.TokenizerVocabulary;
import Data.TokenizerTrainer;
import Data.CharVocabulary;
import Data.CharTokenizer;
import Data.Tokenizer;
import Data.CharVocabularyConfig;

Classes

class  Mila::Data::CharTrainer
 Character-level tokenizer trainer.

Namespaces

namespace  Mila
 Mila main API namespace.
namespace  Mila::Data

Typedefs

using Mila::Data::TokenId

Detailed Description

Character-level tokenizer trainer for corpus accumulation and vocabulary building.

Manages the training corpus and delegates vocabulary construction to the CharVocabulary factory methods. The class is maintained for API consistency with BpeTrainer; because character tokenization is simple, calling CharVocabulary::trainFromFile() directly is often preferable.
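
The accumulate-then-build pattern described above can be illustrated with a small standalone sketch. This is not the Mila API: SketchCharTrainer, SketchCharVocabulary, addCorpus() and train() are hypothetical names chosen for illustration, the TokenId alias is assumed to be a 32-bit integer, and each byte is treated as one character. It only shows the general idea of collecting text and assigning one id per distinct character.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Mila::Data types.
using TokenId = std::int32_t;  // assumption: the real TokenId may differ

struct SketchCharVocabulary {
    std::vector<char> id_to_char;                  // index == token id
    std::unordered_map<char, TokenId> char_to_id;  // reverse lookup
};

class SketchCharTrainer {
public:
    // Accumulate text; nothing is analysed until train() is called.
    void addCorpus(std::string_view text) { corpus_.append(text); }

    // Build a vocabulary with one id per distinct byte in the corpus.
    // Bytes are sorted by char value so id assignment is deterministic.
    SketchCharVocabulary train() const {
        std::string chars = corpus_;
        std::sort(chars.begin(), chars.end());
        chars.erase(std::unique(chars.begin(), chars.end()), chars.end());

        SketchCharVocabulary vocab;
        for (char c : chars) {
            vocab.char_to_id.emplace(c, static_cast<TokenId>(vocab.id_to_char.size()));
            vocab.id_to_char.push_back(c);
        }
        // Both directions of the mapping must cover the same set of characters.
        assert(vocab.id_to_char.size() == vocab.char_to_id.size());
        return vocab;
    }

private:
    std::string corpus_;  // all text added so far
};
```

In this sketch, a caller would invoke addCorpus() once per text chunk and then call train() a single time. A trainer that delegates, as the description says CharTrainer does, would hand the accumulated corpus to the vocabulary's own factory method rather than building the mapping itself.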