Corpus Cleaner
main.cpp File Reference
#include "corpus_cleaner.hpp"


Functions

uint64_t ConutLines (const string &filename)
 Get the number of lines in the file named by filename.
 
void SplitFiles (const vector< string > &output_files, const string &input_file)
 Split one file into multiple equal parts based on the number of lines.
 
void MergeFiles (const vector< string > &input_files, const string &output_file)
 Merge multiple files into one file.
 
void MultiProcessCorpusClean (const string input_folder_path, const string output_folder_path)
 
int main (void)
 

Function Documentation

◆ ConutLines()

uint64_t ConutLines ( const string & filename)

Get the number of lines in the file named by filename.

Example: string input_path = "../data/input/"; ConutLines(input_path);

Parameters
const string& filename: file name
Returns
uint64_t: number of lines in the file

Definition at line 16 of file main.cpp.
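
The body of ConutLines() is not shown on this page. As a minimal sketch of how such a line count could be computed, assuming one std::getline call per line, the hypothetical helper CountLinesSketch below reads the file once and counts the lines it sees. The name and the error handling are assumptions for illustration, not the code in main.cpp.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ConutLines(); not the code from main.cpp.
std::uint64_t CountLinesSketch(const std::string &filename) {
    std::ifstream input(filename);
    if (!input) {
        // Assumption: an unreadable file is reported as an exception.
        throw std::runtime_error("cannot open " + filename);
    }
    std::uint64_t line_count = 0;
    std::string line;
    // std::getline also yields a final line that has no trailing newline.
    while (std::getline(input, line)) {
        ++line_count;
    }
    return line_count;
}
```

Counting with std::getline treats a last line without a newline as a full line, which is usually the desired behavior for corpus files.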

◆ main()

int main ( void )

Definition at line 144 of file main.cpp.

◆ MergeFiles()

void MergeFiles ( const vector< string > & input_files,
const string & output_file )

Merge multiple files into one file.

Parameters
const vector<string>& input_files: list of files to merge
const string& output_file: merged file
Returns
void: None

Definition at line 80 of file main.cpp.
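
The merge can be pictured as appending each input file to the output in list order. The sketch below assumes exactly that; the helper name MergeFilesSketch, the line-by-line copy, and the exception on failure are illustrative assumptions rather than the implementation in main.cpp.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for MergeFiles(); not the code from main.cpp.
void MergeFilesSketch(const std::vector<std::string> &input_files,
                      const std::string &output_file) {
    std::ofstream output(output_file);
    if (!output) {
        throw std::runtime_error("cannot open " + output_file);
    }
    std::string line;
    // Assumption: inputs are appended in the order they appear in the list.
    for (const std::string &path : input_files) {
        std::ifstream input(path);
        if (!input) {
            throw std::runtime_error("cannot open " + path);
        }
        // Copy line by line; every copied line ends with '\n' in the output.
        while (std::getline(input, line)) {
            output << line << '\n';
        }
    }
}
```

Copying line by line keeps line boundaries intact even when an input file lacks a trailing newline, so lines from adjacent files are never joined.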

◆ MultiProcessCorpusClean()

void MultiProcessCorpusClean ( const string input_folder_path,
const string output_folder_path )

Definition at line 104 of file main.cpp.

◆ SplitFiles()

void SplitFiles ( const vector< string > & output_files,
const string & input_file )

Split one file into multiple equal parts based on the number of lines.

Parameters
const vector<string>& output_files: list of files that receive the split parts
const string& input_file: file to split
Returns
void: None

Definition at line 32 of file main.cpp.
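
One way to split by line count is a two-pass approach: count the lines first, then hand each output file its share. The sketch below assumes the number of parts equals the size of output_files and that any remainder lines go to the earliest parts; both choices, and the helper name SplitFilesSketch, are assumptions for illustration and may differ from main.cpp.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for SplitFiles(); not the code from main.cpp.
void SplitFilesSketch(const std::vector<std::string> &output_files,
                      const std::string &input_file) {
    if (output_files.empty()) {
        throw std::invalid_argument("output_files must not be empty");
    }
    std::string line;
    // First pass: count the lines of the input file.
    std::uint64_t total = 0;
    {
        std::ifstream counter(input_file);
        if (!counter) {
            throw std::runtime_error("cannot open " + input_file);
        }
        while (std::getline(counter, line)) {
            ++total;
        }
    }
    const std::uint64_t parts = output_files.size();
    const std::uint64_t base = total / parts;   // lines every part receives
    const std::uint64_t extra = total % parts;  // first `extra` parts get one more
    // Second pass: write consecutive runs of lines to each output file.
    std::ifstream input(input_file);
    for (std::uint64_t i = 0; i < parts; ++i) {
        std::ofstream output(output_files[i]);
        if (!output) {
            throw std::runtime_error("cannot open " + output_files[i]);
        }
        const std::uint64_t quota = base + (i < extra ? 1 : 0);
        for (std::uint64_t written = 0;
             written < quota && std::getline(input, line); ++written) {
            output << line << '\n';
        }
    }
}
```

With 5 input lines and 2 output files, this sketch writes 3 lines to the first file and 2 to the second, so the parts differ by at most one line.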