Corpus Cleaner
_DOCUMENT Struct Reference

Structure for storing statistical information for each process of CorpusCleaner.

#include <corpus_cleaner.hpp>

Public Attributes

string text =""
string id =""
bool is_rejected =false
set<string> metadata
string language
float language_score =0
double perplexity =999999

Detailed Description

Structure for storing statistical information for each process of CorpusCleaner.

The members below hold the per-document information obtained by each process of CorpusCleaner.

Definition at line 23 of file corpus_cleaner.hpp.
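
To make the member list and its defaults concrete, the following is a minimal sketch that mirrors the documented members in a standalone struct. The name `DocumentSketch` and the helper `RejectHighPerplexity` are hypothetical and are not part of CorpusCleaner. The sketch also assumes that `string` and `set` refer to `std::string` and `std::set`. The real definition in corpus_cleaner.hpp may differ.

```cpp
#include <set>
#include <string>

// Hypothetical mirror of _DOCUMENT, reconstructed only from the member
// list on this page. The real struct lives in corpus_cleaner.hpp.
struct DocumentSketch {
    std::string text = "";
    std::string id = "";
    bool is_rejected = false;
    std::set<std::string> metadata;  // no initializer documented; starts empty
    std::string language;            // no initializer documented; starts empty
    float language_score = 0;
    double perplexity = 999999;
};

// Hypothetical processing step (an assumption, not a CorpusCleaner function):
// marks a document as rejected when its perplexity exceeds max_perplexity
// and records the name of the step in metadata.
inline void RejectHighPerplexity(DocumentSketch &doc, double max_perplexity) {
    if (doc.perplexity > max_perplexity) {
        doc.is_rejected = true;
        doc.metadata.insert("RejectHighPerplexity");
    }
}
```

Under these assumptions, a default-constructed document keeps `perplexity` at 999999, so calling the helper with any smaller threshold rejects a document whose perplexity was never computed.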

Member Data Documentation

◆ id

string _DOCUMENT::id =""

Definition at line 25 of file corpus_cleaner.hpp.

◆ is_rejected

bool _DOCUMENT::is_rejected =false

Definition at line 26 of file corpus_cleaner.hpp.

◆ language

string _DOCUMENT::language

Definition at line 28 of file corpus_cleaner.hpp.

◆ language_score

float _DOCUMENT::language_score =0

Definition at line 29 of file corpus_cleaner.hpp.

◆ metadata

set<string> _DOCUMENT::metadata

Definition at line 27 of file corpus_cleaner.hpp.

◆ perplexity

double _DOCUMENT::perplexity =999999

Definition at line 30 of file corpus_cleaner.hpp.

◆ text

string _DOCUMENT::text =""

Definition at line 24 of file corpus_cleaner.hpp.


The documentation for this struct was generated from the following file: