Specifying the correct parameters is essential for the app to work and for the main SBS analysis to run correctly. Basic and advanced settings are available on the Set Parameters page.
Name of uploaded file
: name of the file to be analyzed. This is generally auto-completed.
CSV separator
: specifies the separator used in the CSV file. Insert a single character without quoting.
Time unit and Time intervals
: specify the time interval for the calculation of the Semantic Brand Score. For example, you might want to calculate a daily score or aggregate the texts produced in one week.
Start time and End time
: these are the start and end times of the analysis. This setting is particularly important when you want to restrict the analysis to a specific timeframe while the CSV you uploaded contains more data.
Brand list
: the list of brands to be analyzed (a brand can be any word, such as the name of a product or a person). You must list the brands without quotes, separating them with a comma. Use a single word for each brand.
Language
: this is the language of uploaded texts (please be consistent and try to analyze one language at a time). This setting will be used for the calculation of sentiment, for stemming/lemmatization, and for the removal of stopwords. This setting will be ignored if you select different languages in the advanced settings.
Min co-occurrence frequency
: textual co-occurrences with a frequency below this threshold will be filtered out. The value must be an integer greater than or equal to 1 (1 means no filtering). This value can strongly impact the computation time (the higher the value, the faster the analysis). However, setting a value that is too high could remove all the network links and produce an error. The calculation of co-occurrences depends on the time periods specified for the analysis.
Generate topic models
: if set to “yes”, topic modeling will be performed and added to the graphical report. It can be resource intensive. The “yes and only” option carries out topic modeling alone, without calculating the SBS or producing its related reports. In the case of a “yes and only” choice:
Start time and End time will be used;
ignore the Link filter adjustment for topics parameter and consider the Min co-occurrence frequency instead. Its value is usually higher than that of an analysis carried out on time intervals.
Send email at the end
: if checked, the system will send an email when the analysis ends.
Cluster brands
: sometimes, it is useful to merge multiple words that represent a brand, since each brand/concept could be represented by a set of keywords. If this is the case, you can use the cluster brands field to specify the words to merge. For example, we may want to have a single word in lieu of the word “pope” and the word “Francis”. The following syntax has to be used: "cluster1":["word1","word2",..], "cluster2":["word6","word8",..],.. The same word cannot appear in multiple clusters. Hyphens cannot be used in words in the cluster; please replace them with a whitespace (e.g., if you want to replace the word “zero-emission”, please put “zero emission” in the cluster). Additionally, an asterisk can be used at the end of a word to indicate that the word can be completed with any set of characters. For example, "asp*" will match both "aspirin" and "aspire". This does not work with multiple words. All words in a cluster will be replaced with the cluster label.
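As a rough illustration of the cluster rules described above, the following Python sketch replaces cluster words with their label and treats a trailing asterisk as a prefix wildcard. The function name `apply_clusters`, the cluster labels, and the assumption that the text is already lowercased and tokenized are all made up for this example; it is not the app's actual implementation.

```python
def apply_clusters(tokens, clusters):
    """Replace every token that belongs to a cluster with the cluster label."""
    exact = {}     # plain word -> cluster label
    prefixes = []  # (prefix, label) pairs for words ending in '*'
    for label, words in clusters.items():
        for word in words:
            if word.endswith("*"):
                prefixes.append((word[:-1], label))
            else:
                exact[word] = label

    result = []
    for token in tokens:
        if token in exact:
            result.append(exact[token])
            continue
        for prefix, label in prefixes:
            if token.startswith(prefix):
                result.append(label)
                break
        else:
            # No cluster matched: keep the token unchanged.
            result.append(token)
    return result


# Hypothetical clusters written with the syntax shown above.
clusters = {"popefrancis": ["pope", "francis"], "aspwords": ["asp*"]}
print(apply_clusters(["francis", "took", "aspirin"], clusters))
# ['popefrancis', 'took', 'aspwords']
```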
Custom stopwords
: can be used to specify custom stopwords, i.e., words that will be ignored during the analysis. Custom stopwords should be listed separated by a comma, without quotes. Including multiple words (e.g., formula 1) is possible.
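To make the expected format concrete, here is a minimal Python sketch that parses a comma-separated stopword field and removes the entries from a text, including multi-word ones. The helper names `parse_list` and `remove_stopwords` are hypothetical, and the matching strategy (whole-word, case-insensitive, longest entries first) is an assumption, not a description of how the app works internally.

```python
import re


def parse_list(field):
    """Split a comma-separated field into trimmed, lowercased entries."""
    return [item.strip().lower() for item in field.split(",") if item.strip()]


def remove_stopwords(text, stopwords):
    """Remove single- and multi-word stopwords from a text."""
    text = text.lower()
    # Handle longer entries first so "formula 1" is removed before "formula".
    for phrase in sorted(stopwords, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, " ", text)
    # Collapse the extra whitespace left behind by the removals.
    return " ".join(text.split())


stopwords = parse_list("formula 1, season")
print(remove_stopwords("The Formula 1 season starts today", stopwords))
# the starts today
```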
Use different languages for stopwords
: sometimes, it is necessary to remove stopwords from multiple dictionaries (languages). This field is also helpful when the language of the uploaded text is not available in the parameter Language. Use the SKIP option if you do not wish to remove stopwords (you can still remove custom ones).
Stemming or Lemmatization
: choose whether to apply stemming (recommended) or lemmatization.
Use a different language for stemming or lemmatization
: this field can be used to select an alternative language for stemming or lemmatization, with more choices than the parameter Language offers. The choice might affect the calculation of sentiment, which is not supported for all languages. Use the SKIP option if you do not wish to apply stemming or lemmatization.
Connectivity approximation
: when the value is set lower than 1, the calculation is approximated by considering a random subset of nodes in the network. Please note that results might change between different runs if the calculation is approximated (value < 1). The lower the value, the coarser the approximation.
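The effect of a value below 1 can be pictured with the short Python sketch below, which draws the random subset of nodes. The function `sample_nodes`, its `seed` argument, and the rounding rule are assumptions introduced only for illustration; the app does not necessarily expose a seed or select nodes this way.

```python
import math
import random


def sample_nodes(nodes, fraction, seed=None):
    """Return the nodes to consider for a given approximation value."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    nodes = list(nodes)
    if fraction == 1:
        return nodes  # exact calculation: every node is considered
    k = max(1, math.ceil(fraction * len(nodes)))
    # Without a fixed seed, each run may pick a different subset,
    # which is why approximated results can change between runs.
    return random.Random(seed).sample(nodes, k)


print(len(sample_nodes(range(10), 0.5)))  # 5
```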
Percentage of text to analyze
: used when the analysis has to consider only a portion of each text document, for example, just the title and lead of online news. A value of 1 means that the full text will be analyzed; lower values indicate a lower percentage of text to analyze (e.g., 0.5 = the initial 50% of each document).
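A minimal Python sketch of this truncation is shown below. It assumes the percentage is measured in words and rounded up; the app may measure it differently (for instance in characters), and the name `initial_portion` is hypothetical.

```python
import math


def initial_portion(text, fraction):
    """Return the initial `fraction` of a document, measured in words."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    words = text.split()
    keep = math.ceil(len(words) * fraction)
    return " ".join(words[:keep])


print(initial_portion("Title lead body one body two body three", 0.5))
# Title lead body one
```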
Word co-occurrence range
: indicates the range of the maximum distance between words to determine a co-occurrence. Values of 5 or 7 are usually good, and results are often robust with respect to this parameter when the values are within a reasonable range (2 to 20).
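The sketch below shows, in Python, one way co-occurrences within a maximum distance could be counted and then filtered with a threshold such as Min co-occurrence frequency. It assumes adjacent words are at distance 1 and that pairs are unordered; the function names are hypothetical and the app's own procedure may differ.

```python
from collections import Counter


def cooccurrences(tokens, max_distance):
    """Count unordered word pairs that appear within max_distance positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only forward so that each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + max_distance]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


def filter_links(counts, min_frequency):
    """Drop pairs whose frequency is below the threshold (1 keeps everything)."""
    return {pair: n for pair, n in counts.items() if n >= min_frequency}


tokens = "brand quality price brand quality".split()
links = cooccurrences(tokens, max_distance=2)
print(filter_links(links, min_frequency=3))
# {('brand', 'quality'): 3}
```

With `min_frequency=1` the three pairs found in this toy text are all kept, which mirrors the "1 means no filtering" rule.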
Distinctiveness centrality alpha
: the value of the alpha coefficient used to calculate Distinctiveness Centrality and, therefore, Diversity. It can be any number >= 1. Setting a value of 1 is recommended.
Use source weight
: if this is checked, the analysis will be performed considering source weights. Source weights are specified in the third column of the input CSV file. Higher numbers indicate more important sources.
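As an illustration, the Python sketch below reads a CSV with a custom separator and takes the source weight from the third column. The contents of the first two columns (a date and a text), the absence of a header row, and the function name `read_weighted_rows` are assumptions made for this example only.

```python
import csv
import io


def read_weighted_rows(handle, separator=","):
    """Yield (first, second, weight) tuples, reading the weight from column 3."""
    for row in csv.reader(handle, delimiter=separator):
        if len(row) < 3:
            continue  # skip rows that have no weight column
        yield row[0], row[1], float(row[2])


# Hypothetical input using "|" as the single-character separator.
sample = io.StringIO("2023-01-05|Great product|2\n2023-01-06|Not bad|1\n")
for row in read_weighted_rows(sample, separator="|"):
    print(row)
```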
Calculate sentiment
: if checked, the system will additionally calculate the sentiment of the words associated with each brand.
Max number of topics
: indicates the maximum number of topics that should be considered for topic modeling.
Link filter adjustment for topics
: this parameter is similar to Min co-occurrence frequency and is used to remove low-weight links prior to topic modeling. Please consider that topic modeling is carried out on the whole dataset (networks are not split by period of analysis). Leave this parameter at 1, or increase it to filter out more links. We usually recommend values between 0.75 and 1.25.
Generate Graphs
: if checked, the system will produce the graphical report as a result of the analysis.
Save parameters
: choosing a name will save the parameter configuration so it can be imported in the future.
To save a configuration, click on Generate Parameters File. Please note that the configuration will not be saved in case of errors.
This page can be used to import previously saved parameter configurations. Once imported, they will be visible on the Set Parameters page.