Named Entity Recognition and Network Generator

This function uses Named Entity Recognition to extract Named Entities (for example, people, organizations and locations) from the text documents in the input file. For more information about Named Entity Recognition, please read here.

Parameters

General Output

The function produces a file listing the identified Named Entities, ranked by how often they occur (the “Count” column in the output file). If time intervals are specified, Named Entities are listed separately for each interval, and a “Time” column appears in the output file, indicating the last day of each interval. The output file also includes a code for the category assigned to each entity and a column describing that category. The available categories may differ depending on the language selected.
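As a rough illustration of how such an output could be assembled, the sketch below counts entity mentions per interval and ranks them by frequency. It is not the function's actual implementation: the input triples, the category codes ("GPE", "PERSON"), the "Entity" and "Category" column names and the helper names are all assumptions made for the example. Only "Count" and "Time" come from the description above.

```python
from collections import Counter
from datetime import date

# Hypothetical input: (document date, entity text, category code) triples,
# as an NER step might produce them. Values and codes are illustrative only.
mentions = [
    (date(2023, 1, 5), "Rome", "GPE"),
    (date(2023, 1, 9), "Rome", "GPE"),
    (date(2023, 1, 20), "Anna Rossi", "PERSON"),
    (date(2023, 2, 3), "Rome", "GPE"),
]

# Hypothetical intervals, each identified by its last day.
interval_ends = [date(2023, 1, 31), date(2023, 2, 28)]


def interval_for(day, ends):
    """Return the last day of the first interval that contains `day`."""
    for end in sorted(ends):
        if day <= end:
            return end
    raise ValueError(f"{day} falls after the last interval")


def count_entities(mentions, ends):
    """Count occurrences per (Time, Entity, Category) and rank them by Count."""
    counts = Counter(
        (interval_for(day, ends), text, category)
        for day, text, category in mentions
    )
    rows = [
        {"Time": t, "Entity": e, "Category": c, "Count": n}
        for (t, e, c), n in counts.items()
    ]
    # Sort by interval, then by descending frequency within each interval.
    rows.sort(key=lambda r: (r["Time"], -r["Count"], r["Entity"]))
    return rows


rows = count_entities(mentions, interval_ends)
for row in rows:
    print(row["Time"], row["Entity"], row["Category"], row["Count"])
```

Each row carries the last day of its interval in "Time", so the same entity can appear once per interval with a different count.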

Network Generation

If the Create Social Networks option is flagged, the function will produce social networks where nodes are Named Entities and links represent their co-occurrence in the text. One network will be generated for each time interval. Networks are saved in the Pajek “.net” format (see here for more information or look at the Networks page).
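To give an idea of what a co-occurrence network in Pajek format looks like, here is a minimal sketch. It assumes that two entities co-occur when they appear in the same document; the description above does not specify which co-occurrence window the function actually uses. The function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """Count how many documents each unordered pair of entities shares."""
    edges = Counter()
    for entities in docs:
        # De-duplicate within a document so each pair counts once per document.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def to_pajek(docs):
    """Serialize the co-occurrence network as Pajek .net text."""
    edges = cooccurrence_edges(docs)
    nodes = sorted({name for entities in docs for name in entities})
    index = {name: i for i, name in enumerate(nodes, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(edges.items())]
    return "\n".join(lines) + "\n"


# Hypothetical entity lists, one per document in a single time interval.
docs = [
    ["Rome", "Anna Rossi"],
    ["Rome", "Anna Rossi", "Milan"],
    ["Milan"],
]

net_text = to_pajek(docs)
print(net_text)
```

With time intervals, the same steps would be repeated on the documents of each interval, writing one ".net" file per interval.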

The number of Named Entities listed in the general output file may be higher than the number of nodes in a network. This can happen because the same entity may be classified under more than one category (e.g., once as a person’s name and once as a location) and therefore appear on several rows of the output file. The network generator disregards these category differences and represents each such entity as a single node.
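The difference in counts can be illustrated with a small sketch. The rows and category codes below are made up, and collapsing entities by name alone is an assumption about how the category differences are disregarded.

```python
# Hypothetical rows from the general output file: (entity, category code, count).
rows = [
    ("Paris", "PERSON", 3),
    ("Paris", "GPE", 5),
    ("Anna Rossi", "PERSON", 2),
]

# The output file keeps one row per (entity, category) pair...
listed_entities = {(entity, category) for entity, category, _ in rows}

# ...whereas a network that ignores categories keeps one node per entity name.
network_nodes = {entity for entity, _, _ in rows}

print(len(listed_entities), len(network_nodes))
```

Here "Paris" is listed twice in the output file but would appear as a single node in the network.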