Corpus Cleaner
Here is a list of all class members with links to the classes they belong to:
- a -
Apply() : LSHDeduplicator
- c -
CalculateLSH() : GenerateDedupLSH
CleanPipeline() : CorpusCleaner
CorpusCleaner() : CorpusCleaner
- e -
elapsed_time : _STATS
EmojiRemover() : CorpusCleaner
- f -
FastTextEx() : fasttext::FastTextEx
file_name : _STATS
- g -
GenerateDedupLSH() : GenerateDedupLSH
GetMinhash() : GenerateDedupLSH
GetTotalBucketSize() : LSHDeduplicator
- i -
id : _DOCUMENT
InitializeBlacklist() : LSHDeduplicator
InitializeSeen() : LSHDeduplicator
is_rejected : _DOCUMENT
- k -
KenLMFilter() : KenLMFilter
- l -
language : _DOCUMENT
language_score : _DOCUMENT
LanguageFilter() : CorpusCleaner
LengthFilter() : CorpusCleaner
LoadBlacklistToSeen() : LSHDeduplicator
LSHDeduplicator() : LSHDeduplicator
- m -
metadata : _DOCUMENT
MinhashDeduplication() : CorpusCleaner
- n -
NGramTokenize() : GenerateDedupLSH
Normalizer() : CorpusCleaner
- p -
perplexity : _DOCUMENT
Perplexity() : KenLMFilter
PerplexityFilter() : CorpusCleaner
PerplexityWithSentencePiece() : KenLMFilter
PipelineStep() : CorpusCleaner
predictOneLine() : fasttext::FastTextEx
process_name : _STATS
processor : KenLMFilter
- q -
QuotesRemover() : CorpusCleaner
- r -
result_file_size : _STATS
- s -
Score() : KenLMFilter
ScoreWithSentencePiece() : KenLMFilter
SentenceSegmenter() : CorpusCleaner
SizeOfBlacklist() : LSHDeduplicator
SizeOfSeen() : LSHDeduplicator
SpecialCharacterRemover() : CorpusCleaner
StoreBlacklist() : LSHDeduplicator
StoreException() : CorpusCleaner
- t -
text : _DOCUMENT
- u -
URLRemover() : CorpusCleaner
- z -
ZeroPunctuationFilter() : CorpusCleaner
- ~ -
~CorpusCleaner() : CorpusCleaner
~GenerateDedupLSH() : GenerateDedupLSH
~LSHDeduplicator() : LSHDeduplicator
Generated by Doxygen 1.10.0