Find Similar Clusters across Multiple Temporal Networks: merge_dynamic_clusters

This function adds a new column, dynamic_{cluster_id}, to each network in a list of temporal networks so that similar clusters can be identified across time. Two clusters from two successive temporal networks receive the same name if they meet the criteria set by the user with threshold_similarity and similarity_type, applied to the clusters stored in cluster_id.

Usage

merge_dynamic_clusters(
  list_graph,
  cluster_id,
  node_id,
  threshold_similarity = 0.5001,
  similarity_type = c("complete", "partial")
)

Arguments

list_graph

A list of tbl_graph objects (from tidygraph). The list is expected to be ordered chronologically, from the oldest to the most recent network.

cluster_id

The name of the column holding the cluster identifier. If you have used add_clusters(), it has the form cluster_{clustering_method} (e.g. cluster_leiden).

node_id

The column with the unique identifier of each node.

threshold_similarity

threshold_similarity sets how strict the function is when giving the same name to two clusters. A higher threshold makes a match harder to obtain and therefore leads to more distinct dynamic clusters.

For example, suppose you have two temporal networks with two communities each: communities A and B in the older network, and communities A' and B' in the more recent one. With a threshold of 0.51 and a "complete" similarity_type, community A' is given the name A if 51% of the nodes of A' in the more recent network originate from A in the older network, and 51% of the nodes of A in the older network end up in A' in the more recent network.
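The two conditions of this example can be checked with a few lines of base R. This is a minimal sketch under stated assumptions: the node ids and the same_name() helper are hypothetical, and whether networkflow compares the shares with >= or > is an assumption. Since every node appears in both networks here, complete and partial similarity give the same result.

```r
threshold <- 0.51

# Hypothetical toy memberships: A and B in the older network,
# A' and B' in the more recent one (same six nodes in both networks)
A       <- c("n1", "n2", "n3", "n4")
B       <- c("n5", "n6")
A_prime <- c("n1", "n2", "n3", "n5")
B_prime <- c("n4", "n6")

# Hypothetical helper: TRUE when both shares reach the threshold
same_name <- function(old, new, threshold) {
  shared <- length(intersect(old, new))
  shared / length(new) >= threshold &&  # share of `new` originating from `old`
    shared / length(old) >= threshold   # share of `old` ending up in `new`
}

same_name(A, A_prime, threshold)  # 3/4 and 3/4: TRUE, A' is named A
same_name(B, B_prime, threshold)  # 1/2 and 1/2: FALSE, B' gets a new name
```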

similarity_type

The type of similarity compared to threshold_similarity:

"complete" similarity computes the share of nodes going from an older community to a more recent community over all the nodes of both networks;
"partial" similarity computes the same share only over the nodes that exist in both networks.

Complete similarity is particularly suited to networks whose number of nodes is relatively stable over time, as the threshold captures the share of all nodes moving between clusters. Partial similarity can be particularly useful when the number of nodes in your networks increases rapidly: the threshold is then interpreted as the share of nodes existing in both networks that move between clusters.
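The two similarity types can be sketched as follows. This illustrates the definitions above under one reading of them and is not networkflow's code: cluster_shares() is a hypothetical helper, which assumes each share is the number of shared nodes divided by the size of the cluster, after dropping nodes absent from either network in the "partial" case.

```r
# Hypothetical helper, not part of networkflow: returns the two shares compared
# to threshold_similarity for an older cluster and a more recent cluster.
cluster_shares <- function(old, new, old_cluster, new_cluster,
                           similarity_type = c("complete", "partial")) {
  similarity_type <- match.arg(similarity_type)
  # old, new: data frames with columns `node` and `cluster`
  if (similarity_type == "partial") {
    # Keep only the nodes that exist in both networks
    common <- intersect(old$node, new$node)
    old <- old[old$node %in% common, ]
    new <- new[new$node %in% common, ]
  }
  old_members <- old$node[old$cluster == old_cluster]
  new_members <- new$node[new$cluster == new_cluster]
  shared <- length(intersect(old_members, new_members))
  # NaN is returned if a cluster has no member left after filtering
  c(share_of_new = shared / length(new_members),  # part of new cluster from old
    share_of_old = shared / length(old_members))  # part of old cluster in new
}

# n3 and n4 disappear, n6 is a new node joining A'
old <- data.frame(node = c("n1", "n2", "n3", "n4", "n5"),
                  cluster = c("A", "A", "A", "A", "B"))
new <- data.frame(node = c("n1", "n2", "n5", "n6"),
                  cluster = c("A'", "A'", "B'", "A'"))

cluster_shares(old, new, "A", "A'", "complete")  # 2/3 and 2/4
cluster_shares(old, new, "A", "A'", "partial")   # 2/2 and 2/2
```

With a 0.51 threshold, A' would not be named A under complete similarity (only half of A ends up in A'), but would be under partial similarity, because the departed nodes n3 and n4 and the newcomer n6 are ignored.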

For example, take a complete similarity threshold of 0.51 and suppose that (1) all nodes from community A in network t go into community A' in network t+1, (2) all nodes of A' already present in network t originate from community A, but (3) A' has more than twice as many nodes as A because of new nodes that did not exist in network t. Then A' will never meet the threshold to be named A, since fewer than half of its nodes come from A, despite the strong similarity between the two clusters. Conceptually, this may be the desired behavior: one might consider A' too different from A to be the same cluster, as its composition has been reshaped by the new nodes. In that case, complete similarity is the right choice. However, if one considers A and A' to be very similar because all the nodes existing in both networks are assigned to the same community, then partial similarity is more appropriate.
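The scenario above can be reproduced numerically. This is a minimal sketch with hypothetical node ids, using the same share definitions as in the previous paragraphs (an assumption about how networkflow computes them).

```r
# Hypothetical node ids: A lives in network t, A' in network t+1
old_nodes <- c("n1", "n2", "n3", "n4", "n5")       # all nodes of network t
new_nodes <- c(old_nodes, "n6", "n7", "n8", "n9")  # n6 to n9 are new in t+1
A       <- c("n1", "n2", "n3")
A_prime <- c("n1", "n2", "n3", "n6", "n7", "n8", "n9")  # more than twice A

# Nodes shared by A and A' necessarily exist in both networks
shared <- length(intersect(A, A_prime))

# Complete similarity: shares over all members of each cluster
complete <- c(shared / length(A_prime), shared / length(A))

# Partial similarity: shares over the members present in both networks only
common  <- intersect(old_nodes, new_nodes)
partial <- c(shared / length(intersect(A_prime, common)),
             shared / length(intersect(A, common)))

complete >= 0.51  # FALSE TRUE: only 3/7 of A' comes from A, so no match
partial  >= 0.51  # TRUE TRUE: A' is named A
```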

Value

The function returns the same list of networks as the one passed to list_graph, with a new node column dynamic_{cluster_id} (i.e., the name of the new column depends on the column used as cluster_id). This column results from grouping the original clusters of cluster_id across the successive graphs. The dynamic clusters are also added to the edge data, next to the existing cluster_id columns, as dynamic_{cluster_id}_from, dynamic_{cluster_id}_to and dynamic_{cluster_id}.
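How names can persist across more than two windows is illustrated below. This sketch is hypothetical and is not the networkflow implementation: propagate_names() and the new_1, new_2, ... names given to unmatched clusters are illustrative assumptions, and only the complete shares are used. Each cluster inherits the dynamic name of the cluster it matches in the previous network and receives a fresh name otherwise. With a threshold above 0.5, such as the default 0.5001, a cluster can match at most one cluster in the adjacent network, since disjoint clusters cannot each hold more than half of the same set of nodes.

```r
# Hypothetical sketch, not the networkflow implementation: carry cluster names
# forward through a chronologically ordered list of membership tables, using
# the "complete" shares and a fresh name ("new_1", "new_2", ...) when no match.
propagate_names <- function(memberships, threshold = 0.5001) {
  # memberships: list of data frames with columns `node` and `cluster`,
  # ordered from the oldest to the most recent network
  memberships[[1]]$dynamic <- as.character(memberships[[1]]$cluster)
  next_id <- 1L
  for (i in seq_along(memberships)[-1]) {
    old <- memberships[[i - 1]]
    new <- memberships[[i]]
    new$dynamic <- NA_character_
    for (cl in unique(new$cluster)) {
      new_members <- new$node[new$cluster == cl]
      for (dyn in unique(old$dynamic)) {
        old_members <- old$node[old$dynamic == dyn]
        shared <- length(intersect(old_members, new_members))
        if (shared / length(new_members) >= threshold &&
            shared / length(old_members) >= threshold) {
          new$dynamic[new$cluster == cl] <- dyn
        }
      }
      # No cluster of the previous network matched: give a fresh name
      if (anyNA(new$dynamic[new$cluster == cl])) {
        new$dynamic[new$cluster == cl] <- paste0("new_", next_id)
        next_id <- next_id + 1L
      }
    }
    memberships[[i]] <- new
  }
  memberships
}

w1 <- data.frame(node = c("a", "b", "c", "d"),
                 cluster = c("01", "01", "02", "02"))
w2 <- data.frame(node = c("a", "b", "c", "d", "e"),
                 cluster = c("01", "01", "02", "02", "02"))
w3 <- data.frame(node = c("a", "b", "c", "d", "e", "f"),
                 cluster = c("02", "02", "01", "01", "01", "03"))

res <- propagate_names(list(w1, w2, w3))
res[[3]]$dynamic  # c("01", "01", "02", "02", "02", "new_1")
```

In the third window the clustering labels 01 and 02 are swapped, yet the dynamic names still follow the nodes, while the isolated cluster 03 receives a fresh name.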

Examples

library(networkflow)

nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")

references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)

temporal_networks <- build_dynamic_networks(nodes = nodes,
                                            directed_edges = references,
                                            source_id = "ID_Art",
                                            target_id = "ItemID_Ref",
                                            time_variable = "Year",
                                            cooccurrence_method = "coupling_similarity",
                                            time_window = 10,
                                            edges_threshold = 1,
                                            overlapping_window = TRUE,
                                            filter_components = TRUE)
#> ℹ The method use for co-occurence is the coupling_similarity method.
#> ℹ The edge threshold is: 1.
#> ℹ We remove the nodes that are alone with no edge. 
#> 
#> ── Creation of the network for the 1975-1984 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1976-1985 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1977-1986 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1978-1987 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1979-1988 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1980-1989 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1981-1990 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1982-1991 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1983-1992 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1984-1993 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1985-1994 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1986-1995 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1987-1996 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1988-1997 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1989-1998 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1990-1999 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1991-2000 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1992-2001 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1993-2002 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1994-2003 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1995-2004 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1996-2005 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1997-2006 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1998-2007 window. ───────────────────────────
#> 
#> ── Creation of the network for the 1999-2008 window. ───────────────────────────
#> 
#> ── Creation of the network for the 2000-2009 window. ───────────────────────────
#> 
#> ── Creation of the network for the 2001-2010 window. ───────────────────────────
#> 
#> ── Creation of the network for the 2002-2011 window. ───────────────────────────
#> 
#> ── Creation of the network for the 2003-2012 window. ───────────────────────────
#> 
#> ── Creation of the network for the 2004-2013 window. ───────────────────────────

temporal_networks <- add_clusters(temporal_networks,
                                  objective_function = "modularity",
                                  clustering_method = "leiden")
#> 
#> ── Cluster detection for the "1975-1984" period ────────────────────────────────
#> ℹ The leiden method detected 5 clusters. The biggest cluster represents "39.1%" of the network.
#> 
#> ── Cluster detection for the "1976-1985" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "24.6%" of the network.
#> 
#> ── Cluster detection for the "1977-1986" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "26.2%" of the network.
#> 
#> ── Cluster detection for the "1978-1987" period ────────────────────────────────
#> ℹ The leiden method detected 6 clusters. The biggest cluster represents "28.6%" of the network.
#> 
#> ── Cluster detection for the "1979-1988" period ────────────────────────────────
#> ℹ The leiden method detected 5 clusters. The biggest cluster represents "35.4%" of the network.
#> 
#> ── Cluster detection for the "1980-1989" period ────────────────────────────────
#> ℹ The leiden method detected 5 clusters. The biggest cluster represents "40%" of the network.
#> 
#> ── Cluster detection for the "1981-1990" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "37%" of the network.
#> 
#> ── Cluster detection for the "1982-1991" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "38.1%" of the network.
#> 
#> ── Cluster detection for the "1983-1992" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "42.9%" of the network.
#> 
#> ── Cluster detection for the "1984-1993" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "40%" of the network.
#> 
#> ── Cluster detection for the "1985-1994" period ────────────────────────────────
#> ℹ The leiden method detected 2 clusters. The biggest cluster represents "57.1%" of the network.
#> 
#> ── Cluster detection for the "1986-1995" period ────────────────────────────────
#> ℹ The leiden method detected 2 clusters. The biggest cluster represents "57.1%" of the network.
#> 
#> ── Cluster detection for the "1987-1996" period ────────────────────────────────
#> ℹ The leiden method detected 2 clusters. The biggest cluster represents "62.5%" of the network.
#> 
#> ── Cluster detection for the "1988-1997" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "36.4%" of the network.
#> 
#> ── Cluster detection for the "1989-1998" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "45.5%" of the network.
#> 
#> ── Cluster detection for the "1990-1999" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "46.2%" of the network.
#> 
#> ── Cluster detection for the "1991-2000" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "38.9%" of the network.
#> 
#> ── Cluster detection for the "1992-2001" period ────────────────────────────────
#> ℹ The leiden method detected 4 clusters. The biggest cluster represents "35%" of the network.
#> 
#> ── Cluster detection for the "1993-2002" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "40.7%" of the network.
#> 
#> ── Cluster detection for the "1994-2003" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "38.7%" of the network.
#> 
#> ── Cluster detection for the "1995-2004" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "38.9%" of the network.
#> 
#> ── Cluster detection for the "1996-2005" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "40.5%" of the network.
#> 
#> ── Cluster detection for the "1997-2006" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "40%" of the network.
#> 
#> ── Cluster detection for the "1998-2007" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "39.6%" of the network.
#> 
#> ── Cluster detection for the "1999-2008" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "39.1%" of the network.
#> 
#> ── Cluster detection for the "2000-2009" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "49%" of the network.
#> 
#> ── Cluster detection for the "2001-2010" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "45.7%" of the network.
#> 
#> ── Cluster detection for the "2002-2011" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "50%" of the network.
#> 
#> ── Cluster detection for the "2003-2012" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "55.3%" of the network.
#> 
#> ── Cluster detection for the "2004-2013" period ────────────────────────────────
#> ℹ The leiden method detected 3 clusters. The biggest cluster represents "54.8%" of the network.

temporal_networks <- merge_dynamic_clusters(temporal_networks,
                                            cluster_id = "cluster_leiden",
                                            node_id = "ID_Art",
                                            threshold_similarity = 0.51,
                                            similarity_type = "partial")

temporal_networks[[1]]
#> # A tbl_graph: 64 nodes and 375 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Edge Data: 375 × 11 (active)
#>     from    to   weight Source  Target     cluster_leiden_from cluster_leiden_to
#>    <int> <int>    <dbl> <chr>   <chr>      <chr>               <chr>            
#>  1     6    11 0.00160  1021902 1111111122 02                  02               
#>  2     6    45 0.000183 1021902 1111111128 02                  03               
#>  3     6    35 0.000751 1021902 1111111146 02                  02               
#>  4     6    20 0.000128 1021902 1111111180 02                  02               
#>  5     6    42 0.000624 1021902 1111111182 02                  02               
#>  6     6    21 0.000365 1021902 1111111183 02                  02               
#>  7     6    52 0.000274 1021902 1184127    02                  03               
#>  8     6    31 0.00124  1021902 14490177   02                  02               
#>  9     6    64 0.000278 1021902 16167977   02                  03               
#> 10     3     6 0.000173 1021902 16182201   02                  02               
#> # ℹ 365 more rows
#> # ℹ 4 more variables: cluster_leiden <chr>, dynamic_cluster_leiden_from <chr>,
#> #   dynamic_cluster_leiden_to <chr>, dynamic_cluster_leiden <chr>
#> #
#> # Node Data: 64 × 11
#>   ID_Art Author  Year Author_date Title Journal Type  time_window cluster_leiden
#>   <chr>  <chr>  <int> <chr>       <chr> <chr>   <chr> <chr>       <chr>         
#> 1 16182… GORDO…  1975 GORDON-R-1… ALTE… BROOKI… Stag… 1975-1984   01            
#> 2 26283… GORDO…  1975 GORDON-R-1… THE … BROOKI… Stag… 1975-1984   01            
#> 3 16182… OKUN-A  1975 OKUN-A-197… INFL… BROOKI… Stag… 1975-1984   02            
#> # ℹ 61 more rows
#> # ℹ 2 more variables: size_cluster_leiden <dbl>, dynamic_cluster_leiden <chr>