Text Classification

This is an experimental function that extracts the text features that best distinguish different groups of documents. It combines several machine learning algorithms and NLP procedures to make a prediction, and it evaluates feature importance from each feature’s Shapley values.
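To make the idea of Shapley-based feature importance concrete, here is a minimal sketch that computes exact Shapley values for a toy scoring function. It is not the code this function runs: the token names, the weights, and the `toy_value` scorer are hypothetical. It only illustrates the definition, in which a feature's value is its average marginal contribution over all subsets of the other features.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value(frozenset) -> float`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of every subset of size k that excludes feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to subset s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical per-token weights for a toy "complaint" score (assumption).
WEIGHTS = {"refund": 0.5, "invoice": 0.25, "hello": 0.0}


def toy_value(present):
    """Toy score: sum of token weights plus a bonus when two tokens co-occur."""
    score = sum(WEIGHTS[t] for t in present)
    if "refund" in present and "invoice" in present:
        score += 0.25  # interaction term, shared between the two tokens
    return score


phi = shapley_values(list(WEIGHTS), toy_value)
for token, importance in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {importance:.3f}")
```

In this toy case the interaction bonus is split evenly between the two tokens that trigger it, and a token with no effect on the score receives zero importance. Exact enumeration grows exponentially with the number of features, which is one reason Shapley-based analysis is costly on larger inputs.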

This is a beta function that is constantly being improved. It is also very resource intensive, so please use it only with small datasets.

Parameters

Please provide an input CSV file that includes the group labels in its third column. Labels work better as strings than as numbers; please do not try to validate them as source weights. Please note that the function only works for classification problems and NOT for regression.
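As a quick check before running the function, the sketch below reads the third column of a CSV file and reports whether the labels look like strings or numbers. It assumes a header row and uses made-up column names and rows (`id`, `text`, `group`); the `read_labels` and `looks_numeric` helpers are illustrative and not part of the function itself.

```python
import csv
import io


def read_labels(csv_file, label_column=2):
    """Return the header name and the values of the third column (index 2)."""
    reader = csv.reader(csv_file)
    header = next(reader)  # assumption: the first row is a header
    labels = [row[label_column] for row in reader if len(row) > label_column]
    return header[label_column], labels


def looks_numeric(value):
    """True if the value parses as a number, which suggests a regression target."""
    try:
        float(value)
        return True
    except ValueError:
        return False


# Made-up sample data standing in for a real input file.
sample = io.StringIO(
    "id,text,group\n"
    "1,Please refund my order,complaint\n"
    "2,Thanks for the quick delivery,praise\n"
    "3,The invoice amount is wrong,complaint\n"
)

column_name, labels = read_labels(sample)
all_numeric = all(looks_numeric(v) for v in labels)

print("label column:", column_name)
print("distinct groups:", sorted(set(labels)))
if all_numeric:
    print("Labels are numeric; consider recoding them as strings.")
```

For a real file, pass `open(path, newline="")` in place of the `io.StringIO` sample.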

Output

The function will produce the following files: