# SBS Parameters

Specifying the correct parameters is essential for the app to work and for the main SBS analysis to run correctly. You will find basic and advanced settings on the Set Parameters page.

#### Basic Settings

• Name of uploaded file: name of the file to be analyzed. This is generally auto-completed.
• CSV separator: the separator used in the CSV file. Enter a single character, without quotes.
• Time unit and Time intervals: together, these specify the time interval used to calculate the Semantic Brand Score. For example, you might want to calculate a daily score or aggregate the texts produced in one week.
• Start time and End time: the start and end times of the analysis. This setting is particularly important when you want to restrict the analysis to a specific timeframe even though the uploaded CSV contains more data.
• Brand list: the list of brands to be analyzed (a brand can be any word, such as the name of a product or a person). List the brands without quotes, separated by commas. Use a single word for each brand.
• Language: this is the language of uploaded texts (please be consistent and try to analyze one language at a time). This setting will be used for the calculation of sentiment, for stemming/lemmatization, and for the removal of stopwords. This setting will be ignored if you select different languages in the advanced settings.
• Min co-occurrence frequency: textual co-occurrences with a frequency below this threshold will be filtered out. The value must be an integer greater than or equal to 1 (1 means no filtering). This value can strongly affect the computation time (the higher the value, the faster the analysis). However, a value that is too high can remove all the network links and cause an error. The calculation of co-occurrences depends on the time periods specified for the analysis.
• Generate topic models: if set to “yes”, topic modeling will be performed and added to the graphical report. It can be resource intensive. The “yes and only” option simplifies the process: it carries out the topic modeling only, without calculating the SBS or producing its related reports. If you choose “yes and only”:
  • no time intervals will be considered (all text documents with a date between Start time and End time will be used);
  • ignore the Link filter adjustment for topics parameter and use Min co-occurrence frequency instead. Its value is usually higher than in an analysis carried out on time intervals;
  • indicating brands is still mandatory in the parameters form. If you have no brands and are only interested in the topic modeling, invent a word that does not exist and use it as a brand.
• Send email at the end: if checked, the system will send an email when the analysis ends.
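
The format rules above for the CSV separator, the brand list, and the minimum co-occurrence frequency can be checked before a configuration is submitted. The following Python sketch is only an illustration of those rules: the function names are hypothetical and are not part of the SBS BI app, and the app's own validation may differ.

```python
def check_separator(sep: str) -> str:
    """CSV separator: a single character, entered without quotes."""
    if len(sep) != 1:
        raise ValueError("the CSV separator must be exactly one character")
    return sep


def parse_brand_list(raw: str) -> list[str]:
    """Split a comma-separated brand list; each brand must be a single, unquoted word."""
    brands = [item.strip() for item in raw.split(",") if item.strip()]
    if not brands:
        raise ValueError("at least one brand is required")
    for brand in brands:
        if brand[0] in "\"'" or brand[-1] in "\"'":
            raise ValueError(f"list brands without quotes: {brand}")
        if len(brand.split()) != 1:
            raise ValueError(f"use a single word for each brand: {brand!r}")
    return brands


def check_min_cooccurrence(value: int) -> int:
    """Min co-occurrence frequency: an integer >= 1 (1 means no filtering)."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("min co-occurrence frequency must be an integer >= 1")
    return value


# Hypothetical form values, used only for illustration.
print(check_separator(";"))
print(parse_brand_list("apple, samsung ,nokia"))
print(check_min_cooccurrence(2))
```

A brand such as "coca cola" is rejected by this sketch because it contains two words; the Cluster brands setting is the documented way to merge several words into one brand.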

#### Advanced Settings

• Cluster brands: sometimes it is useful to merge multiple words that represent the same brand, since a brand or concept can be represented by a set of keywords. In that case, use the Cluster brands field to specify the words to merge. For example, we may want a single node for the words “pope” and “Francis”. Use the following syntax: "cluster1":["word1","word2",..], "cluster2":["word6","word8",..],... The same word cannot appear in more than one cluster.

Additionally, an asterisk at the end of a word acts as a wildcard: the word can be completed with any sequence of characters. For example, "asp*" matches both "aspirin" and "aspire". This also works with multi-word entries; for example, "financial sect*" matches "financial sector".
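
To make the cluster syntax and the trailing asterisk concrete, here is a minimal Python sketch of one way such a field could be parsed and applied to a text. It is an assumption-laden illustration, not the app's implementation: the function names are hypothetical, the field is parsed by wrapping it in braces and reading it as JSON, and matching is assumed to be case-insensitive and based on word boundaries.

```python
import json
import re


def parse_clusters(field: str) -> dict[str, list[str]]:
    """Parse '"cluster1":["w1","w2"], "cluster2":[...]' into a dictionary."""
    clusters = json.loads("{" + field + "}")
    owner = {}
    for name, words in clusters.items():
        for word in words:
            if word in owner:
                raise ValueError(f"{word!r} appears in more than one cluster")
            owner[word] = name
    return clusters


def pattern_for(word: str) -> re.Pattern:
    """A trailing '*' matches any completion of the word; otherwise match exactly."""
    if word.endswith("*"):
        body = re.escape(word[:-1]) + r"\w*"
    else:
        body = re.escape(word)
    return re.compile(r"\b" + body + r"\b", flags=re.IGNORECASE)


def merge_clusters(text: str, clusters: dict[str, list[str]]) -> str:
    """Replace every match of a cluster's words with the cluster name."""
    for name, words in clusters.items():
        # Longer entries first, so multi-word entries are handled before short ones.
        for word in sorted(words, key=len, reverse=True):
            text = pattern_for(word).sub(lambda match, name=name: name, text)
    return text


clusters = parse_clusters('"pope":["pope","francis"], "finance":["financial sect*"]')
print(merge_clusters("Pope Francis met the financial sector", clusters))
```

In this sketch both "Pope" and "Francis" collapse into the single node name "pope", and "financial sector" collapses into "finance"; a word listed in two clusters raises an error, mirroring the rule that the same word cannot appear in multiple clusters.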

• Custom stopwords: custom stopwords are words that will be ignored during the analysis. List them separated by commas, without quotes. Multi-word entries (e.g., formula 1) are allowed.

• Use different languages for stopwords: sometimes, it is necessary to remove stopwords from multiple dictionaries (languages). This field is also helpful when the language of the uploaded text is not available in the parameter Language. Use the SKIP option if you do not wish to remove stopwords (you can still remove custom ones).

• Stemming or Lemmatization: choose whether to apply stemming (recommended) or lemmatization.

• Use a different language for stemming or lemmatization: this field can be used to select an alternative language for stemming or lemmatization; it offers more choices than the Language parameter. The choice might affect the calculation of Sentiment, which is not supported for all languages. Use the SKIP option if you do not wish to apply stemming or lemmatization.

• Connectivity approximation: when the value is set lower than 1, the calculation is approximated by considering a random subset of nodes in the network. The lower the value, the coarser the approximation. Please note that results might change between runs when the calculation is approximated (value < 1).

• Percentage of text to analyze: used when the analysis has to consider only a portion of each text document, for example, just the title and lead of online news. A value of 1 means that the full text will be analyzed; lower values indicate a lower percentage of text to analyze (e.g., 0.5 = the initial 50% of each document).

• Word co-occurrence range: the maximum distance between two words for them to count as a co-occurrence. Values of 5 or 7 are usually good, and results are often robust with respect to this parameter as long as the value stays within a reasonable range (2 to 20).

• Distinctiveness centrality alpha: the value of the alpha coefficient used to calculate Distinctiveness Centrality and, therefore, Diversity. It can be any number >= 1. A value of 1 is recommended.

• Use source weight: if this is checked, the analysis will be performed considering source weights. Source weights are specified in the third column of the input CSV file. Higher numbers indicate more important sources.

• Calculate sentiment: if checked, the system will additionally calculate the sentiment of the words associated with each brand.

• Max number of topics: indicates the maximum number of topics that should be considered for topic modeling.

• Link filter adjustment for topics: this parameter is similar to Min co-occurrence frequency and is used to remove low-weight links prior to topic modeling. Please consider that the topic modeling is carried out on the whole dataset (networks are not split by period of analysis). Leave this parameter at 1, or increase it to filter out more links. We usually recommend values between 0.75 and 1.25.

• Generate Graphs: if checked, the system will produce the graphical report as a result of the analysis.

• Save parameters: choosing a name will save the parameter configuration so it can be imported in the future.

To save a configuration, click on Generate Parameters File. Please note that the configuration will not be saved in case of errors.
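
As a rough illustration of how Percentage of text to analyze, Word co-occurrence range, and Min co-occurrence frequency could interact, the Python sketch below truncates a tokenized document, counts word pairs within a maximum distance, and drops pairs below the threshold. All function names are hypothetical, and several details are assumptions rather than documented behavior: the percentage is applied to tokens (not characters), distance is measured in token positions, and links are undirected.

```python
import math
from collections import Counter


def leading_portion(tokens: list[str], fraction: float) -> list[str]:
    """Keep the initial share of a document (1 = full text, 0.5 = first half)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return tokens[: math.ceil(len(tokens) * fraction)]


def count_cooccurrences(tokens: list[str], max_distance: int) -> Counter:
    """Count unordered word pairs that appear at most max_distance positions apart."""
    counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + max_distance]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def filter_links(counts: Counter, min_frequency: int) -> dict:
    """Drop pairs whose frequency is below the threshold (1 keeps everything)."""
    if min_frequency < 1:
        raise ValueError("min_frequency must be >= 1")
    return {pair: n for pair, n in counts.items() if n >= min_frequency}


tokens = "a b c a b".split()
links = count_cooccurrences(leading_portion(tokens, 1), max_distance=2)
print(filter_links(links, min_frequency=1))  # no filtering
print(filter_links(links, min_frequency=3))  # only the most frequent pair survives
print(filter_links(links, min_frequency=5))  # too high: every link is removed
```

The last call shows why a very high Min co-occurrence frequency is risky: the filtered network ends up with no links at all.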

## Saved Parameters

This page can be used to import previously saved parameter configurations. Once imported, they will be visible on the Set Parameters page.