Corpus Cleaner
Structure for storing statistical information for each process of CorpusCleaner.
#include <corpus_cleaner.hpp>
Public Attributes
| string | text = "" |
| string | id = "" |
| bool | is_rejected = false |
| set<string> | metadata |
| string | language |
| float | language_score = 0 |
| double | perplexity = 999999 |
Each processing step of CorpusCleaner receives the following per-document information.
Definition at line 23 of file corpus_cleaner.hpp.
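The member list on this page can be read as a plain aggregate with default member initializers. The following is a minimal sketch under stated assumptions, not the definition from corpus_cleaner.hpp: `DocumentSketch` is a hypothetical stand-in for `_DOCUMENT`, the types are spelled with `std::` (the header shows them unqualified), and `reject_if_perplexity_above` together with the `"PerplexityFilter"` tag is an invented filter step used only to show how the fields might be updated.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical mirror of the documented struct. The real type is _DOCUMENT in
// corpus_cleaner.hpp; field names, types, and defaults follow this page.
struct DocumentSketch {
    std::string text = "";
    std::string id = "";
    bool is_rejected = false;
    std::set<std::string> metadata;
    std::string language;        // no default documented; default-constructed (empty)
    float language_score = 0;
    double perplexity = 999999;
};

// Hypothetical filter step (not part of CorpusCleaner): marks a document as
// rejected when its perplexity exceeds `threshold` and records a tag in metadata.
inline void reject_if_perplexity_above(DocumentSketch& doc, double threshold) {
    assert(threshold >= 0.0);  // a negative threshold would reject every document
    if (doc.perplexity > threshold) {
        doc.is_rejected = true;
        doc.metadata.insert("PerplexityFilter");  // tag name is an assumption
    }
}
```

With these defaults, a freshly constructed document has `perplexity` equal to 999999, so a threshold filter like the one sketched here would reject it unless an earlier step has assigned a real perplexity value.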
string _DOCUMENT::id = ""
Definition at line 25 of file corpus_cleaner.hpp.
bool _DOCUMENT::is_rejected = false
Definition at line 26 of file corpus_cleaner.hpp.
string _DOCUMENT::language
Definition at line 28 of file corpus_cleaner.hpp.
float _DOCUMENT::language_score = 0
Definition at line 29 of file corpus_cleaner.hpp.
set<string> _DOCUMENT::metadata
Definition at line 27 of file corpus_cleaner.hpp.
double _DOCUMENT::perplexity = 999999
Definition at line 30 of file corpus_cleaner.hpp.
string _DOCUMENT::text = ""
Definition at line 24 of file corpus_cleaner.hpp.