Corpus Cleaner
CorpusCleaner Member List

This is the complete list of members for CorpusCleaner, including all inherited members.

CleanPipeline(void) | CorpusCleaner
CorpusCleaner(string input_path, string output_path, uint32_t min_length, uint32_t max_length, set<string> accept_language, bool store_rejected, bool sentence_segment, float language_threshold, double perplexity_threshold, GenerateDedupLSH *generate_dedup_lsh, LSHDeduplicator *deduplicator) | CorpusCleaner
EmojiRemover(Document &document) | CorpusCleaner
LanguageFilter(Document &document) | CorpusCleaner
LengthFilter(Document &document) | CorpusCleaner
MinhashDeduplication(Document &document) | CorpusCleaner
Normalizer(Document &document) | CorpusCleaner
PerplexityFilter(Document &document) | CorpusCleaner
PipelineStep(Document &document, void(CorpusCleaner::*cleaner)(Document &)) | CorpusCleaner
QuotesRemover(Document &document) | CorpusCleaner
SentenceSegmenter(string input_folder_path, string output_folder_path) | CorpusCleaner
SpecialCharacterRemover(Document &document) | CorpusCleaner
StoreException(string function_name, string reference) | CorpusCleaner
URLRemover(Document &document) | CorpusCleaner
ZeroPunctuationFilter(Document &document) | CorpusCleaner
~CorpusCleaner() | CorpusCleaner
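Most cleaning members share the signature `void(Document &)`, and `PipelineStep` takes a pointer to such a member (`void(CorpusCleaner::*cleaner)(Document &)`). The following is only a minimal sketch of how that pointer-to-member dispatch pattern works in C++; it is not the library's implementation. The `CleanerSketch` class, the `Document` fields `text` and `is_rejected`, the `CleanDocument` helper, and the filter behaviors (byte-length bounds, ASCII-only punctuation check) are all hypothetical assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's Document type (fields assumed).
struct Document {
    std::string text;
    bool is_rejected = false;
};

// Hypothetical class showing the same step signature as PipelineStep.
class CleanerSketch {
public:
    CleanerSketch(uint32_t min_length, uint32_t max_length)
        : min_length_(min_length), max_length_(max_length) {}

    // Assumed behavior: reject documents whose byte length is outside [min, max].
    void LengthFilter(Document &document) {
        const auto n = document.text.size();
        if (n < min_length_ || n > max_length_) document.is_rejected = true;
    }

    // Assumed behavior: reject documents with no ASCII punctuation at all.
    void ZeroPunctuationFilter(Document &document) {
        bool has_punct = false;
        for (unsigned char c : document.text) {
            if (std::ispunct(c)) { has_punct = true; break; }
        }
        if (!has_punct) document.is_rejected = true;
    }

    // Invoke one cleaning step through a pointer to member function.
    void PipelineStep(Document &document, void (CleanerSketch::*cleaner)(Document &)) {
        assert(cleaner != nullptr);
        try {
            (this->*cleaner)(document);
        } catch (const std::exception &e) {
            // Hypothetical error handling; the real class has StoreException for this role.
            std::cerr << "cleaning step failed: " << e.what() << '\n';
        }
    }

    // Hypothetical helper: run the steps in order, stopping once a step rejects.
    void CleanDocument(Document &document) {
        using Step = void (CleanerSketch::*)(Document &);
        const std::vector<Step> steps = {&CleanerSketch::LengthFilter,
                                         &CleanerSketch::ZeroPunctuationFilter};
        for (Step step : steps) {
            PipelineStep(document, step);
            if (document.is_rejected) break;
        }
    }

private:
    uint32_t min_length_;
    uint32_t max_length_;
};
```

Passing member pointers lets a single wrapper apply shared handling (here, catching exceptions) around every step, so each filter only has to implement its own check.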