Corpus Cleaner
This is the complete list of members for CorpusCleaner, including all inherited members.
| Member | Class | Attributes |
| --- | --- | --- |
| CleanPipeline(void) | CorpusCleaner | |
| CorpusCleaner(string input_path, string output_path, uint32_t min_length, uint32_t max_length, set< string > accept_language, bool store_rejected, bool sentence_segment, float language_threshold, double perplexity_threshold, GenerateDedupLSH *generate_dedup_lsh, LSHDeduplicator *deduplicator) | CorpusCleaner | |
| EmojiRemover(Document &document) | CorpusCleaner | |
| LanguageFilter(Document &document) | CorpusCleaner | |
| LengthFilter(Document &document) | CorpusCleaner | |
| MinhashDeduplication(Document &document) | CorpusCleaner | |
| Normalizer(Document &document) | CorpusCleaner | |
| PerplexityFilter(Document &document) | CorpusCleaner | |
| PipelineStep(Document &document, void(CorpusCleaner::*cleaner)(Document &)) | CorpusCleaner | |
| QuotesRemover(Document &document) | CorpusCleaner | |
| SentenceSegmenter(string input_folder_path, string output_folder_path) | CorpusCleaner | |
| SpecialCharacterRemover(Document &document) | CorpusCleaner | |
| StoreException(string function_name, string reference) | CorpusCleaner | |
| URLRemover(Document &document) | CorpusCleaner | |
| ZeroPunctuationFilter(Document &document) | CorpusCleaner | |
| ~CorpusCleaner() | CorpusCleaner | |
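
The `PipelineStep` entry above takes a pointer to member function of type `void (CorpusCleaner::*)(Document &)`. The following is a minimal, self-contained sketch of how such a signature can be used to dispatch cleaning steps. It does not reproduce the library's implementation: `ToyCleaner`, `RunSteps`, the fields of this `Document` struct, the skip-if-rejected behavior, and the bodies of both filters are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's Document type (fields are assumed).
struct Document {
    std::string text;
    bool is_rejected = false;
};

// Hypothetical toy class; only the PipelineStep signature mirrors the list above.
class ToyCleaner {
public:
    ToyCleaner(std::size_t min_length, std::size_t max_length)
        : min_length_(min_length), max_length_(max_length) {}

    // Placeholder filter: rejects text whose byte length is outside
    // [min_length, max_length].
    void LengthFilter(Document &document) {
        const std::size_t n = document.text.size();
        if (n < min_length_ || n > max_length_) {
            document.is_rejected = true;
        }
    }

    // Placeholder filter: removes every ASCII double-quote character.
    void QuotesRemover(Document &document) {
        std::string out;
        for (char c : document.text) {
            if (c != '"') out += c;
        }
        document.text = out;
    }

    // Applies one step through a pointer to member function.
    // Skipping already-rejected documents is an assumption of this sketch.
    void PipelineStep(Document &document,
                      void (ToyCleaner::*cleaner)(Document &)) {
        assert(cleaner != nullptr);
        if (document.is_rejected) return;
        (this->*cleaner)(document);
    }

    // Hypothetical driver: runs a fixed list of steps in order.
    void RunSteps(Document &document) {
        const std::vector<void (ToyCleaner::*)(Document &)> steps = {
            &ToyCleaner::QuotesRemover,
            &ToyCleaner::LengthFilter,
        };
        for (auto step : steps) {
            PipelineStep(document, step);
        }
    }

private:
    std::size_t min_length_;
    std::size_t max_length_;
};
```

With `ToyCleaner cleaner(3, 10);`, calling `cleaner.RunSteps(doc)` on a document whose text is `"hello"` (including the quotes) strips the quotes and keeps the document, while a document whose text is `"a"` is marked rejected by the length check. Passing steps as member-function pointers lets a single wrapper hold shared logic, such as early exit or error handling, for every step.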