
# Networks

These are functions that can be used to upload and analyze (social) networks.

The app can analyze any network in the Pajek “.net” format (see here for more information). You should:

• Upload a single “.zip” file containing only network files (one or more) in the Pajek “.net” format.
• Provide the list of vertices and their labels using a single space as a delimiter.
• Node labels must be strings made of a single word: no whitespace, enclosed in double quotes (“”), and using only the letters a-z and digits.
• Avoid leading whitespace in the file lines. Separate numbers with single spaces.
• If you upload more than one network, they must all be of the same type: either all directed or all undirected.
• Simplify networks with parallel edges (multigraphs are not allowed).
• Use strings and not numbers to label nodes.

Here is an example of a correct network file (directed, weighted):

*Vertices 6
1 "v1"
2 "v2"
3 "v3"
4 "v4"
5 "v5"
6 "v6"
*Arcs
2 6 1
3 4 10
3 5 1
4 5 5
4 1 1
5 6 1
5 1 3
6 3 1
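
The formatting rules above can be checked locally before uploading. The following is a minimal sketch in Python, not the app's own validator: the function name `check_pajek` and the regular expressions are assumptions made for illustration, and they cover only the rules listed on this page (quoted a-z/digit labels, no leading whitespace, single-space separators, no parallel links).

```python
import re

# Hypothetical patterns derived from the rules listed above.
VERTEX_RE = re.compile(r'^(\d+) "([a-z0-9]+)"$')
LINK_RE = re.compile(r'^(\d+) (\d+)(?: (-?\d+(?:\.\d+)?))?$')

EXAMPLE = '''*Vertices 6
1 "v1"
2 "v2"
3 "v3"
4 "v4"
5 "v5"
6 "v6"
*Arcs
2 6 1
3 4 10
3 5 1
4 5 5
4 1 1
5 6 1
5 1 3
6 3 1
'''

def check_pajek(text):
    """Return a list of problems found in a Pajek .net string (empty if none)."""
    problems = []
    section = None      # None, "vertices" or "links"
    directed = True
    vertex_ids = set()
    seen_links = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace():
            problems.append(f"line {lineno}: leading whitespace")
            continue
        line = raw.rstrip()
        lower = line.lower()
        if lower.startswith("*vertices"):
            section = "vertices"
        elif lower.startswith("*arcs") or lower.startswith("*edges"):
            section = "links"
            directed = lower.startswith("*arcs")
        elif section == "vertices":
            match = VERTEX_RE.match(line)
            if not match:
                problems.append(f"line {lineno}: malformed vertex line")
            elif match.group(2).isdigit():
                problems.append(f"line {lineno}: label is a number")
            else:
                vertex_ids.add(int(match.group(1)))
        elif section == "links":
            match = LINK_RE.match(line)
            if not match:
                problems.append(f"line {lineno}: malformed link line")
                continue
            u, v = int(match.group(1)), int(match.group(2))
            if u not in vertex_ids or v not in vertex_ids:
                problems.append(f"line {lineno}: unknown vertex number")
            key = (u, v) if directed else tuple(sorted((u, v)))
            if key in seen_links:
                problems.append(f"line {lineno}: parallel link")
            seen_links.add(key)
        else:
            problems.append(f"line {lineno}: content before *Vertices")
    return problems

print(check_pajek(EXAMPLE))
```

An empty list means no problem was detected by this sketch; the Validate Networks step in the app remains the authoritative check.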


### Validation

Validating networks after the upload is mandatory. You can proceed with the analysis only if the validation succeeds. Please click on Validate Networks after the upload.

## Network Analysis

These functions allow the calculation of different social network analysis metrics or the transformation of the uploaded networks.

### Network Measures

The Analysis field can be used to calculate the following metrics:

• betweenness centrality: calculates the betweenness centrality of nodes.
• closeness centrality: calculates the closeness centrality of nodes.
• community detection: finds network communities (partitions) by using the Louvain Clustering Algorithm.
• degree centrality: calculates the degree centrality of nodes (defined as the number of links incident upon a node). Includes weighted degree and in-degree and out-degree for directed networks. In the case of directed networks, the contribution index is also calculated: CI = (weighted out-degree - weighted in-degree) / (weighted all-degree).
• distinctiveness centrality: calculates the distinctiveness centrality of nodes. Includes in-distinctiveness and out-distinctiveness for directed networks.
• network distance (graph edit): calculates the graph edit distance among all networks. Graph edit distance is a graph similarity measure analogous to Levenshtein distance for strings. It is the minimum number of edit operations necessary to transform a graph and make it isomorphic to another graph. Here we consider equivalence instead of isomorphism (we match node labels, and weights if specified). This is memory and resource intensive. Please only use it with very small networks.
• network similarity (jaccard): calculates the degree of similarity among all networks, using the Jaccard index (either weighted or unweighted). This is memory and resource intensive. Please use it with small networks.
• network similarity (tf-idf cosine): calculates the degree of similarity among all networks. Specifically, a document-term matrix is created considering each network as a document (i.e., a matrix row) and calculating term-frequency as the weighted all-degree of nodes. Subsequently, the matrix is transformed following a TF-IDF logic and using L2 normalization. Cosine similarity is later used to calculate similarities. This operation usually makes sense when networks originate from text and represent links among words. The Threshold field will be considered if its value is set lower than 1. It will be used for removing terms (nodes) that appear too frequently; for example, a value of 0.5 means “ignore nodes with a degree higher than zero in at least 50% of the networks”. The default value is 0.9. Similarly, nodes with a degree higher than zero in less than 0.1% of the networks will be ignored.
• node similarity (simrank): calculates the SimRank similarity of the nodes provided in the node list with all the other nodes in each network. SimRank is a similarity metric that says “two objects are considered to be similar if they are referenced by similar objects”. Unless a different value is specified in the Threshold field, the similarity scores will be reported for the top 100 similar nodes. Results are affected by the choice of considering arc weights. This is memory and resource intensive. Please only use it with small networks.
• rotating leadership: calculates the rotating leadership of network nodes (i.e., their oscillations in betweenness centrality). Note that betweenness variations are calculated sequentially (from one network to the next), taking files in alphabetical order, so label your network files carefully and use letters instead of numbers. This is usually applied to time series of networks.
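
As an illustration of the contribution index formula given for degree centrality, here is a minimal Python sketch applied to the arcs of the example file shown earlier. It assumes that the weighted all-degree is the sum of weighted in-degree and weighted out-degree, and it returns 0.0 for isolated nodes; both choices, like the function name `contribution_index`, are assumptions and may differ from what the app does.

```python
from collections import defaultdict

# Arcs of the example network: (source, target, weight).
ARCS = [
    (2, 6, 1), (3, 4, 10), (3, 5, 1), (4, 5, 5),
    (4, 1, 1), (5, 6, 1), (5, 1, 3), (6, 3, 1),
]

def contribution_index(arcs):
    """CI = (weighted out-degree - weighted in-degree) / weighted all-degree."""
    out_w = defaultdict(float)
    in_w = defaultdict(float)
    for source, target, weight in arcs:
        out_w[source] += weight
        in_w[target] += weight
    ci = {}
    for node in sorted(set(out_w) | set(in_w)):
        total = out_w[node] + in_w[node]  # assumed definition of all-degree
        ci[node] = (out_w[node] - in_w[node]) / total if total else 0.0
    return ci

for node, value in contribution_index(ARCS).items():
    print(node, round(value, 3))
```

With this definition, a node that only sends links (node 2 here) gets a value of 1, and a node that only receives links (node 1) gets -1.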

### Network Transformation

• keep only nodes in the list: considering the node list parameter, this function will transform the network by only retaining the listed nodes.
• remove listed nodes: considering the node list parameter, this function will transform the network by removing the listed nodes.
• remove links with weight below threshold: this function will delete all the links that have a weight below the value indicated in the Threshold field.
• remove links with weight above threshold: this function will delete all the links that have a weight above the value indicated in the Threshold field.
• remove nodes with degree below threshold: this function will delete all the nodes with a degree centrality lower than the value indicated in the Threshold field. If Consider arc weights is selected, the function will calculate weighted degree centrality and refer to this measure.
• remove nodes with degree above threshold: this function will delete all the nodes with a degree centrality higher than the value indicated in the Threshold field. If Consider arc weights is selected, the function will calculate weighted degree centrality and refer to this measure.
• merge nodes by list: considering the list of nodes to merge, this function will merge nodes as specified and retain the label of the first node in each group. In the case of multiple networks, node labels of the first network (alphabetical order) will be considered. Suggestion: process one network at a time.
• merge networks: create one single network by combining all those that have been uploaded. Arc weights are summed. Please remember to indicate if the networks are directed or not.
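
Two of the transformations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each network is represented as a dictionary mapping a pair of node labels to an arc weight, nodes are matched across networks by label, and the function names are hypothetical rather than part of the app.

```python
def remove_links_below(arcs, threshold):
    """Keep only the links whose weight is not below the threshold."""
    return {pair: w for pair, w in arcs.items() if w >= threshold}

def merge_networks(networks, directed=True):
    """Combine several networks into one, summing the weights of shared arcs."""
    merged = {}
    for arcs in networks:
        for (u, v), w in arcs.items():
            # In an undirected network (u, v) and (v, u) are the same link.
            key = (u, v) if directed else tuple(sorted((u, v)))
            merged[key] = merged.get(key, 0) + w
    return merged

net_a = {("v3", "v4"): 10, ("v4", "v5"): 5, ("v2", "v6"): 1}
net_b = {("v4", "v3"): 2, ("v4", "v5"): 1}

print(remove_links_below(net_a, 5))
print(merge_networks([net_a, net_b], directed=True))
print(merge_networks([net_a, net_b], directed=False))
```

The last two calls show why the directed setting matters when merging: treated as directed, the arcs v3→v4 and v4→v3 stay separate, while treated as undirected they collapse into one link whose weight is the sum of the two.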

### Parameters

• Threshold: this is a generic threshold value used by some network transformation functions and by some analysis functions (e.g., tf-idf cosine similarity and SimRank). The other functions will ignore it.
• Consider arc weights: if flagged, arc weights will be considered for the analysis. Otherwise, weighted networks will be dichotomized in most cases. Arc weights are regarded as link strength (not as a distance), so their reciprocal value is used in some analyses (e.g., when calculating weighted betweenness centrality).
• Directed network: if flagged, it will tell the system to treat the network as a directed graph. If a directed graph has been uploaded and you leave this box unchecked, your networks will be transformed to undirected graphs before some analyses.
• Remove Loops: if flagged, loops will be removed.
• Resolution parameter for community detection: changes the size of the communities. The default is 1. Represents the time described here.
• Variation threshold for rotating leadership: is the threshold used to define a significant betweenness oscillation for calculating rotating leadership. It indicates the minimum percentage change in betweenness to produce an oscillation, considering one network and the one that follows (in alphabetical order).
• Alpha parameter for distinctiveness centrality: is the alpha coefficient value used to calculate Distinctiveness Centrality. The default is 1.
• List of nodes (to keep or remove): this field can be used to provide a list of nodes to be considered for the selected network task. Please use only integers (corresponding to the node number) separated by a comma. Number intervals can be separated using the hyphen (e.g., 1,2,4,7,9-11).
• List of nodes to merge: this field can be used to provide a list of nodes that will be merged if the corresponding function is selected. Use comma-separated numbers and the pipe to separate groups (e.g., 1,2,3,5|8,6|9-12). In the case of multiple networks, node labels of the first network (alphabetical order) will be considered. Suggestion: process one network at a time.
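
The syntax of the two node list fields can be made concrete with a short Python sketch. The function names `parse_node_list` and `parse_merge_list` are hypothetical, and the sketch only illustrates how the documented notation expands; it is not the app's own parser.

```python
def parse_node_list(text):
    """Expand a string such as '1,2,4,7,9-11' into a list of node numbers."""
    nodes = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A hyphen denotes an inclusive interval of node numbers.
            start, end = (int(x) for x in part.split("-", 1))
            nodes.extend(range(start, end + 1))
        else:
            nodes.append(int(part))
    return nodes

def parse_merge_list(text):
    """Split a string such as '1,2,3,5|8,6|9-12' into groups of nodes to merge."""
    return [parse_node_list(group) for group in text.split("|")]

print(parse_node_list("1,2,4,7,9-11"))
# → [1, 2, 4, 7, 9, 10, 11]
print(parse_merge_list("1,2,3,5|8,6|9-12"))
# → [[1, 2, 3, 5], [8, 6], [9, 10, 11, 12]]
```

Following the description of merge nodes by list, the first number in each group (1, 8, and 9 in the example) would be the node whose label is retained.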

### Output

The similarity functions produce a similarity matrix, or a similarity list, as a result. The higher the value in a matrix cell, the more similar the corresponding networks.
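
To give an idea of what a similarity matrix looks like, here is a minimal Python sketch of a Jaccard similarity matrix among networks. It rests on assumptions that this page does not confirm: networks are compared through their sets of links (keyed by node labels), the unweighted index is the size of the intersection divided by the size of the union, and the weighted index is the sum of the minimum weights divided by the sum of the maximum weights. The function names are hypothetical.

```python
def jaccard(a, b, weighted=False):
    """Jaccard similarity between two networks given as {(u, v): weight} dicts."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0  # two empty networks are treated as identical
    if weighted:
        num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
        den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    else:
        num = len(set(a) & set(b))
        den = len(keys)
    return num / den if den else 1.0

def similarity_matrix(networks, weighted=False):
    """Return the network names (alphabetical) and the square similarity matrix."""
    names = sorted(networks)
    matrix = [[jaccard(networks[r], networks[c], weighted) for c in names]
              for r in names]
    return names, matrix

networks = {
    "a": {("v1", "v2"): 2, ("v2", "v3"): 1},
    "b": {("v1", "v2"): 1, ("v3", "v4"): 1},
}

names, matrix = similarity_matrix(networks)
print(names)
for row in matrix:
    print([round(x, 3) for x in row])
```

The matrix is symmetric, with 1.0 on the diagonal because each network is identical to itself.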

All the other functions produce a table with nodes in the first column and the different metrics in the subsequent columns, calculated for each network provided as input (the network name appears at the end of each column label). Some metrics may be indicated with different labels to specify, for example, if arc weights or directionality were considered during the calculation (e.g., “betw_UNdir” indicates scores of betweenness centrality, calculated disregarding the direction of the network arcs).

When network transformation is applied, the output will be a new “.zip” file with the transformed networks.