The app can analyze any textual data. A corpus should be uploaded as a single CSV file with the following characteristics:

• The file is fully encoded in UTF-8.
• Text has to be in the first column, and dates (mm/dd/yyyy) in the second column. No other date formats are accepted.
• Text must have no quoting, and the CSV separator must not be in the text field.
• Source weights (optional) can be in the third column. Please note that higher scores indicate more important sources.
• Each user has a maximum allowed file size. Larger files are not accepted: they will be automatically deleted after upload.
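
The requirements above can be checked locally before uploading. The following is a minimal sketch, not the app's own validator: the function name `check_corpus` and its parameters are hypothetical, and it assumes zero-padded dates (for example `01/31/2020`), numeric weights, and one document per line. It relies on the rule that the separator never appears inside the text field, so a plain split is enough.

```python
import io
import re
from datetime import datetime

# Assumption: dates are zero-padded mm/dd/yyyy (the app may be more lenient).
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def check_corpus(raw_bytes, sep=",", min_length=3, check_weights=False):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Line number 0 marks a file-level problem.
        return [(0, f"file is not valid UTF-8: {exc}")]

    problems = []
    for line_no, line in enumerate(io.StringIO(text), start=1):
        fields = line.rstrip("\r\n").split(sep)
        # Text, date, and an optional weight: anything else suggests a stray separator.
        if len(fields) < 2 or len(fields) > 3:
            problems.append((line_no, f"expected 2 or 3 columns, found {len(fields)}"))
            continue

        doc, date = fields[0].strip(), fields[1].strip()
        if len(doc) < min_length:
            problems.append((line_no, "text is shorter than the minimum length"))

        if not DATE_PATTERN.fullmatch(date):
            problems.append((line_no, "date is not in mm/dd/yyyy format"))
        else:
            try:
                datetime.strptime(date, "%m/%d/%Y")
            except ValueError:
                problems.append((line_no, "date is not a valid calendar date"))

        if check_weights:
            if len(fields) < 3:
                problems.append((line_no, "missing source weight"))
            else:
                try:
                    float(fields[2])
                except ValueError:
                    problems.append((line_no, "source weight is not a number"))
    return problems


if __name__ == "__main__":
    sample = "a first document,01/31/2020,2.5\nhi,01/31/2020\n".encode("utf-8")
    for line_no, problem in check_corpus(sample):
        print(f"line {line_no}: {problem}")
```

A local check like this does not replace the validation step described below; it only helps catch obvious formatting problems before the file is uploaded.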

You can analyze only one file at a time. The uploader will retain only the last uploaded CSV file, deleting the others.

# Validate Text File

This function validates the uploaded CSV file and checks it for errors. Please:

• specify the separator used in the CSV file (a single character, with no quoting) and the minimum length of the documents you want to consider (documents need to have at least three characters);
• optionally choose to validate weights if placed in the third column of the CSV file;
• optionally choose to automatically remove lines with errors. Please be careful, as this will change the uploaded file;
• optionally choose to remove or keep all the text documents (file rows) that match a specific search query. This will significantly slow down the validation process. The following parameters can be used:
  ◦ Use search operators: this option can be used to remove, or keep only, the documents that match the search query specified through the following fields. If the “No” option is selected, the system will only identify and list the documents matching the query, without keeping or removing them. A preliminary run with the “No” option selected is recommended to see which (and how many) documents are matched by the query.
  ◦ Doc includes ALL these words: matched documents will include ALL the words listed here. List words with no quoting or spacing, separated only by commas. Use lowercase and only letters (no punctuation).
  ◦ Doc includes AT LEAST ONE of these words: matched documents will include AT LEAST ONE of the words listed here. List words with no quoting or spacing, separated only by commas. Use lowercase and only letters (no punctuation).
  ◦ Doc DOES NOT INCLUDE ANY of these words: matched documents will NOT include any of the words listed here. List words with no quoting or spacing, separated only by commas. Use lowercase and only letters (no punctuation).
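
To make the three word lists concrete, here is a small sketch of how such a query could be evaluated. It is an illustration under stated assumptions, not the app's actual matching logic: the names `parse_words`, `tokenize`, and `matches` are hypothetical, and the sketch assumes case-insensitive whole-word matching, that an empty field imposes no constraint, and that the three conditions are combined with a logical AND.

```python
import re


def parse_words(field):
    """Split a comma-separated field such as 'bank,market' into a set of words."""
    return {word for word in field.split(",") if word}


def tokenize(doc):
    """Lowercase the document and return its set of letter-only words (assumed tokenization)."""
    return set(re.findall(r"[^\W\d_]+", doc.lower()))


def matches(doc, all_words="", any_words="", none_words=""):
    """Return True if the document satisfies all three (assumed AND-combined) conditions."""
    tokens = tokenize(doc)
    required = parse_words(all_words)
    optional = parse_words(any_words)
    excluded = parse_words(none_words)

    if not required <= tokens:          # every required word must be present
        return False
    if optional and not (optional & tokens):  # at least one optional word must be present
        return False
    if excluded & tokens:               # no excluded word may be present
        return False
    return True


if __name__ == "__main__":
    docs = [
        "The market crashed after the bank failed",
        "Bank profits rise",
        "Weather is sunny",
    ]
    # Listing the matches first mirrors a preliminary run with the "No" option.
    matched = [d for d in docs if matches(d, all_words="bank", none_words="profits")]
    print(matched)
```

Listing the matched documents before keeping or removing anything corresponds to the recommended preliminary run with the “No” option selected.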

## Output

The validated (and corrected) file will be automatically saved on the system, replacing the original CSV. There is no need to re-upload a correctly validated file. The system will also let you download the corrected version of the input file.