This function takes as input a tibble graph (from tidygraph) or a list of tibble graphs, and runs the cluster detection algorithm chosen by the user (see Details for information on the different methods). The function associates each node with its cluster identifier. It also creates a cluster attribute for edges: each edge receives the cluster identifier of the two nodes it connects when both nodes belong to the same cluster. If the nodes belong to different clusters, the edge takes "00" as its cluster attribute.
Arguments
- graphs
A tibble graph from tidygraph, a list of tibble graphs or a data frame.
- weights
The weights of the edges. It must be a positive numeric vector, NULL or NA. If it is NULL and the input graph has a ‘weight’ edge attribute, then that attribute will be used. If NULL and no such attribute is present, then the edges will have equal weights. Set this to NA if the graph has a ‘weight’ edge attribute, but you don't want to use it for community detection. Edge weights are used to calculate weighted edge betweenness. This means that edges are interpreted as distances, not as connection strengths.
- clustering_method
The clustering algorithm to use (see Details). Which of the other parameters apply depends on the clustering method chosen.
- objective_function
The objective function to maximize for the Leiden algorithm: either the Constant Potts Model ("CPM") or "modularity" (see igraph::cluster_leiden()). CPM is used by default.
- resolution
The resolution parameter to use for the Leiden algorithm (see igraph::cluster_leiden()). Higher resolutions lead to more, smaller communities, while lower resolutions lead to fewer, larger communities.
- n_iterations
The number of iterations of the Leiden algorithm. Each iteration may improve the partition further (see igraph::cluster_leiden()).
- n_groups
May be used by the fast greedy or the walktrap algorithm. Integer scalar, the desired number of communities. If too low or too high, then an error message is given.
- node_weights
May be used by the Leiden and infomap algorithms. For Leiden, if this is not provided, it will be automatically determined on the basis of the objective_function (see igraph::cluster_leiden()). For infomap, if it is not provided, then all vertices are considered to have the same weight. A larger vertex weight means a larger probability that the random surfer jumps to that vertex (see igraph::cluster_infomap()).
- trials
The number of attempts to partition the network for the infomap algorithm; can be any integer greater than or equal to 1 (see igraph::cluster_infomap()).
- steps
The length of the random walks to perform for the walktrap algorithm (see igraph::cluster_walktrap()).
- verbose
Set to FALSE if you don't want the function to display information messages (such as the number of clusters detected) while it runs.
- seed
Enter a number to set the seed within the function. Some algorithms use heuristics and random processes that might result in different clusters each time the function is run. Setting the seed is particularly useful for reproducibility: it ensures that the same clusters are found each time the function is run with the same graphs.
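The effect of fixing a seed can be illustrated outside of this package with igraph alone. The sketch below is not the internal code of the function: it assumes that igraph is installed, uses igraph's built-in Zachary karate club graph as toy data, and wraps the call in a hypothetical helper named run_leiden().

```r
library(igraph)

# Small built-in example graph, used here only for illustration.
g <- make_graph("Zachary")

# Hypothetical helper (not part of networkflow): fix the seed, then run the
# Leiden algorithm with the modularity objective and return the membership.
run_leiden <- function(graph, seed) {
  set.seed(seed)
  communities <- cluster_leiden(graph,
                                objective_function = "modularity",
                                n_iterations = 2)
  as.integer(membership(communities))
}

first_run  <- run_leiden(g, seed = 123)
second_run <- run_leiden(g, seed = 123)

# With the same seed and the same graph, both runs should give the same partition.
identical(first_run, second_run)
```

Without the call to set.seed(), two successive runs may assign nodes to different clusters or number the clusters differently.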
Value
The same tidygraph graph or list of tidygraph graphs as the input, but with a new cluster column for nodes, a column with the size of these clusters, and three cluster columns for edges (see Details).
Details
The function can be run either on a single tidygraph object or on a list of tidygraph objects, as created by build_dynamic_networks().
The function implements five different algorithms. Four exist in igraph and are used in this package through their implementation in tidygraph (see group_graph()). The function also implements the Leiden algorithm (Traag et al. 2019), which is in igraph but not yet in tidygraph (see cluster_leiden()).
The newly created columns with the cluster identifier for nodes and edges are named according to the method used. If you use the Leiden algorithm, the function will create a column called cluster_leiden for nodes, and three columns for the edges, called cluster_leiden_from, cluster_leiden_to and cluster_leiden.
The function also automatically calculates the percentage of total nodes that are gathered in each cluster, in the column size_com (in the example output below, this size column appears as size_cluster_leiden).
To make plotting easier later, a zero is put before one-digit cluster identifiers (cluster 5 becomes "05"; cluster 10 remains "10"). Attributing a cluster identifier to edges makes it possible to give an edge the same color as the nodes it connects when both nodes belong to the same cluster, or a color different from both nodes when they belong to different clusters.
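The labelling rules described above can be sketched in base R. This is an illustration of the logic only, not the package's actual implementation; the toy vectors and the column names cluster_from, cluster_to and cluster are made up for the example.

```r
# Toy data: integer cluster ids for six nodes (values invented for illustration).
node_cluster <- c(1L, 1L, 2L, 2L, 2L, 10L)

# Zero-pad identifiers to two characters: 1 becomes "01", 10 stays "10".
node_label <- sprintf("%02d", node_cluster)

# Share of all nodes that falls in each cluster.
cluster_share <- table(node_label) / length(node_label)

# Toy edge list, with nodes referred to by their position in node_label.
edges <- data.frame(from = c(1, 1, 3, 5), to = c(2, 3, 4, 6))

# Cluster of each endpoint, then the edge cluster: shared id or "00".
edges$cluster_from <- node_label[edges$from]
edges$cluster_to   <- node_label[edges$to]
edges$cluster      <- ifelse(edges$cluster_from == edges$cluster_to,
                             edges$cluster_from,
                             "00")

edges
```

Because the identifiers are zero-padded strings, they sort in the expected order ("02" before "10"), which keeps legends and color scales consistent across plots.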
References
Traag VA, Waltman L, van Eck NJ (2019). “From Louvain to Leiden: Guaranteeing Well-Connected Communities.” Scientific Reports, 9(1), 1-12.
Examples
library(networkflow)
nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")
references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)
temporal_networks <- build_dynamic_networks(nodes = nodes,
                                            directed_edges = references,
                                            source_id = "ID_Art",
                                            target_id = "ItemID_Ref",
                                            time_variable = "Year",
                                            cooccurrence_method = "coupling_similarity",
                                            time_window = 20,
                                            edges_threshold = 1,
                                            overlapping_window = TRUE,
                                            filter_components = TRUE)
#> ℹ The method use for co-occurence is the coupling_similarity method.
#> ℹ The edge threshold is: 1.
#> ℹ We remove the nodes that are alone with no edge.
#>
#> ── Creation of the network for the 1975-1994 window. ───────────────────────────
#>
#> ── Creation of the network for the 1976-1995 window. ───────────────────────────
#>
#> ── Creation of the network for the 1977-1996 window. ───────────────────────────
#>
#> ── Creation of the network for the 1978-1997 window. ───────────────────────────
#>
#> ── Creation of the network for the 1979-1998 window. ───────────────────────────
#>
#> ── Creation of the network for the 1980-1999 window. ───────────────────────────
#>
#> ── Creation of the network for the 1981-2000 window. ───────────────────────────
#>
#> ── Creation of the network for the 1982-2001 window. ───────────────────────────
#>
#> ── Creation of the network for the 1983-2002 window. ───────────────────────────
#>
#> ── Creation of the network for the 1984-2003 window. ───────────────────────────
#>
#> ── Creation of the network for the 1985-2004 window. ───────────────────────────
#>
#> ── Creation of the network for the 1986-2005 window. ───────────────────────────
#>
#> ── Creation of the network for the 1987-2006 window. ───────────────────────────
#>
#> ── Creation of the network for the 1988-2007 window. ───────────────────────────
#>
#> ── Creation of the network for the 1989-2008 window. ───────────────────────────
#>
#> ── Creation of the network for the 1990-2009 window. ───────────────────────────
#>
#> ── Creation of the network for the 1991-2010 window. ───────────────────────────
#>
#> ── Creation of the network for the 1992-2011 window. ───────────────────────────
#>
#> ── Creation of the network for the 1993-2012 window. ───────────────────────────
#>
#> ── Creation of the network for the 1994-2013 window. ───────────────────────────
temporal_networks <- add_clusters(temporal_networks,
                                  objective_function = "modularity",
                                  clustering_method = "leiden")
#>
#> ── Cluster detection for the "1975-1994" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "25.7%" of the network.
#>
#> ── Cluster detection for the "1976-1995" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "26.8%" of the network.
#>
#> ── Cluster detection for the "1977-1996" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "33.8%" of the network.
#>
#> ── Cluster detection for the "1978-1997" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "29%" of the network.
#>
#> ── Cluster detection for the "1979-1998" period ────────────────────────────────
#> ℹ The leiden method detected 5 clusters. The biggest cluster represents "32.8%" of the network.
#>
#> ── Cluster detection for the "1980-1999" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "29.1%" of the network.
#>
#> ── Cluster detection for the "1981-2000" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "39.1%" of the network.
#>
#> ── Cluster detection for the "1982-2001" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "40.5%" of the network.
#>
#> ── Cluster detection for the "1983-2002" period ────────────────────────────────
#> ℹ The leiden method detected 5 clusters. The biggest cluster represents "26.2%" of the network.
#>
#> ── Cluster detection for the "1984-2003" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "31%" of the network.
#>
#> ── Cluster detection for the "1985-2004" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "26.7%" of the network.
#>
#> ── Cluster detection for the "1986-2005" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "37.3%" of the network.
#>
#> ── Cluster detection for the "1987-2006" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "35.2%" of the network.
#>
#> ── Cluster detection for the "1988-2007" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "33.3%" of the network.
#>
#> ── Cluster detection for the "1989-2008" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "34.5%" of the network.
#>
#> ── Cluster detection for the "1990-2009" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "33.3%" of the network.
#>
#> ── Cluster detection for the "1991-2010" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "33.8%" of the network.
#>
#> ── Cluster detection for the "1992-2011" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "33.8%" of the network.
#>
#> ── Cluster detection for the "1993-2012" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "36.4%" of the network.
#>
#> ── Cluster detection for the "1994-2013" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "41.1%" of the network.
temporal_networks[[1]]
#> # A tbl_graph: 74 nodes and 446 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Edge Data: 446 × 8 (active)
#> from to weight Source Target cluster_leiden_from cluster_leiden_to
#> <int> <int> <dbl> <chr> <chr> <chr> <chr>
#> 1 6 11 0.00158 1021902 1111111122 02 02
#> 2 6 45 0.000173 1021902 1111111128 02 03
#> 3 6 66 0.000430 1021902 1111111134 02 03
#> 4 6 35 0.000644 1021902 1111111146 02 02
#> 5 6 20 0.000126 1021902 1111111180 02 04
#> 6 6 42 0.000614 1021902 1111111182 02 02
#> 7 6 21 0.000343 1021902 1111111183 02 02
#> 8 6 53 0.000259 1021902 1184127 02 03
#> 9 6 31 0.00121 1021902 14490177 02 02
#> 10 6 65 0.000274 1021902 16167977 02 03
#> # ℹ 436 more rows
#> # ℹ 1 more variable: cluster_leiden <chr>
#> #
#> # Node Data: 74 × 10
#> ID_Art Author Year Author_date Title Journal Type time_window cluster_leiden
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 16182… GORDO… 1975 GORDON-R-1… ALTE… BROOKI… Stag… 1975-1994 01
#> 2 26283… GORDO… 1975 GORDON-R-1… THE … BROOKI… Stag… 1975-1994 01
#> 3 16182… OKUN-A 1975 OKUN-A-197… INFL… BROOKI… Stag… 1975-1994 02
#> # ℹ 71 more rows
#> # ℹ 1 more variable: size_cluster_leiden <dbl>