Corpus Cleaner

This function pre-processes and cleans the documents, optionally removing stop-words and applying stemming or lemmatization. It can also substitute words in the text and calculate corpus statistics: the number of Tokens and Types, the Type-Token Ratio, the number of Hapaxes (words that occur only once in the corpus), and the Hapax-Type Ratio.
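
As a rough illustration of the statistics listed above, the following Python sketch computes them from a list of tokens. It is not the software's own code: the function name `corpus_statistics` is hypothetical, tokenization is simplified to whitespace splitting, and the ratios are assumed to follow the usual definitions (Types divided by Tokens, and Hapaxes divided by Types).

```python
from collections import Counter


def corpus_statistics(tokens):
    """Compute basic corpus statistics from a list of tokens.

    Hypothetical helper for illustration only; the actual software may
    tokenize and count differently.
    """
    counts = Counter(tokens)        # frequency of each distinct word form
    n_tokens = len(tokens)          # Tokens: total number of running words
    n_types = len(counts)           # Types: number of distinct word forms
    # Hapaxes: word forms that occur exactly once
    n_hapaxes = sum(1 for freq in counts.values() if freq == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Assumed definition: Types / Tokens
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "hapaxes": n_hapaxes,
        # Assumed definition: Hapaxes / Types
        "hapax_type_ratio": n_hapaxes / n_types if n_types else 0.0,
    }


# Simplified whitespace tokenization of a toy sentence
stats = corpus_statistics("the cat sat on the mat".split())
print(stats)
```

In the toy sentence, "the" occurs twice, so there are 6 Tokens, 5 Types, and 4 Hapaxes. Cleaning steps such as stop-word removal or lemmatization would normally run before counting and change these figures.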

Parameters