
This function builds, from a list of networks, a data.frame that can be easily plotted with ggalluvial.

Usage

networks_to_alluv(
  list_graph = NA,
  intertemporal_cluster_column = "intertemporal_name",
  node_id = NA,
  summary_cl_stats = TRUE,
  keep_color = TRUE,
  color_column = "color",
  keep_cluster_label = TRUE,
  cluster_label_column = "cluster_label"
)

Arguments

list_graph

The list containing all your networks (a list of tibble graphs, as in the example below).

intertemporal_cluster_column

The column with the identifier of the inter-temporal cluster. If you have used add_clusters() and merge_dynamic_clusters(), it is of the form dynamic_cluster_{clustering_method}.

node_id

The column with the unique identifier of each node.

summary_cl_stats

If set to TRUE, the data.frame will contain a set of variables that summarize cluster statistics of the alluvial. These variables are particularly useful for filtering out smaller communities before plotting, according to different criteria:

  • share_cluster_alluv is the percentage share of a given cluster across all time windows;

  • share_cluster_window is the percentage share of a given cluster in a given time window;

  • share_cluster_max is the highest value of share_cluster_window for a given cluster across all individual time windows;

  • length_cluster is the number of time windows in which a cluster exists.
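To make these four definitions concrete, here is a minimal base R sketch that computes them by hand on a toy table. The `toy` data frame and its values are hypothetical, and reading "share across all time windows" as the share of all node-window rows is an assumption; this illustrates the definitions and is not the package's internal code.

```r
# Hypothetical toy data: one row per node per time window.
toy <- data.frame(
  window  = c("w1", "w1", "w1", "w1", "w2", "w2", "w2", "w2"),
  cluster = c("cl_1", "cl_1", "cl_1", "cl_2", "cl_1", "cl_1", "cl_3", "cl_3"),
  stringsAsFactors = FALSE
)

idx <- seq_len(nrow(toy))
key <- paste(toy$window, toy$cluster)  # one key per (window, cluster) pair

# Counts of rows per (window, cluster), per window, and per cluster
n_window_cluster <- ave(idx, key, FUN = length)
n_window         <- ave(idx, toy$window, FUN = length)
n_cluster        <- ave(idx, toy$cluster, FUN = length)

# Share of the cluster within its time window (in percent)
toy$share_cluster_window <- round(100 * n_window_cluster / n_window, 2)

# Share of the cluster over all rows of the alluvial (assumed reading)
toy$share_cluster_alluv <- round(100 * n_cluster / nrow(toy), 2)

# Highest window share reached by each cluster
toy$share_cluster_max <- ave(toy$share_cluster_window, toy$cluster, FUN = max)

# Number of distinct time windows in which each cluster appears
toy$length_cluster <- ave(
  as.integer(factor(toy$window)), toy$cluster,
  FUN = function(w) length(unique(w))
)

toy
```

In this toy case, cl_1 holds 3 of 4 nodes in the first window and 2 of 4 in the second, so its share_cluster_max is 75 and its length_cluster is 2.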

keep_color

Set to TRUE (the default) if you want to keep the column with the color associated with the different categories of intertemporal_cluster_column. Such a column exists in your list of tibble graphs if you have used color_networks().

color_column

The name of the column with the colors of the categories in intertemporal_cluster_column. By default, "color", as it is the column name resulting from the use of color_networks().

keep_cluster_label

Set to TRUE if you want to keep the column with a name/label associated with the different categories of intertemporal_cluster_column. Such a column exists in your list of tibble graphs if you have used name_clusters().

cluster_label_column

The name of the column with the name/label associated with the categories in intertemporal_cluster_column. By default, "cluster_label", as it is the column name resulting from the use of name_clusters().

Examples

library(networkflow)

nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")

references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)

temporal_networks <- build_dynamic_networks(nodes = nodes,
                                            directed_edges = references,
                                            source_id = "ID_Art",
                                            target_id = "ItemID_Ref",
                                            time_variable = "Year",
                                            cooccurrence_method = "coupling_similarity",
                                            time_window = 20,
                                            edges_threshold = 1,
                                            overlapping_window = TRUE,
                                            filter_components = TRUE,
                                            verbose = FALSE)

temporal_networks <- add_clusters(temporal_networks,
                                  objective_function = "modularity",
                                  clustering_method = "leiden",
                                  verbose = FALSE)

temporal_networks <- merge_dynamic_clusters(temporal_networks,
                                            cluster_id = "cluster_leiden",
                                            node_id = "ID_Art",
                                            threshold_similarity = 0.51,
                                            similarity_type = "partial")

temporal_networks <- name_clusters(graphs = temporal_networks,
                                   method = "tf-idf",
                                   name_merged_clusters = TRUE,
                                   cluster_id = "dynamic_cluster_leiden",
                                   text_columns = "Title",
                                   nb_terms_label = 5,
                                   clean_word_method = "lemmatise")
#> Warning: Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
#> Error in mutate(d_tmp, ...):  In argument: `cluster_label = ifelse(is.na(eval(ensym(label_name))),
#>   "no_name", eval(ensym(label_name)))`.
#> Caused by error in `ensym()`:
#> ! could not find function "ensym"

temporal_networks <- color_networks(graphs = temporal_networks,
                                    column_to_color = "dynamic_cluster_leiden",
                                    color = NULL)
#>  unique_color_across_list has been set to FALSE. There are 16 different categories to color.
#>  color is neither a vector of color characters, nor a data.frame. We will proceed with base R colors.
#>  We draw 7 colors from the ggplot2 palette and 7 from the Okabe-Ito palette. As more than 14 colors are needed, the colors will be recycled.

alluv_dt <- networks_to_alluv(temporal_networks,
                              intertemporal_cluster_column = "dynamic_cluster_leiden",
                              node_id = "ID_Art")
#>  The column "cluster_label" does not exist in the list of graphs provided. No name kept for clusters.

alluv_dt[1:5]
#>    dynamic_cluster_leiden    window     ID_Art   color share_cluster_alluv
#>                    <char>    <char>     <char>  <char>               <num>
#> 1:                   cl_1 1975-1994   16182155 #56B4E9                6.14
#> 2:                   cl_1 1975-1994   26283591 #56B4E9                6.14
#> 3:                   cl_1 1975-1994   31895842 #56B4E9                6.14
#> 4:                   cl_1 1975-1994 1111111131 #56B4E9                6.14
#> 5:                   cl_1 1975-1994 1111111150 #56B4E9                6.14
#>    share_cluster_window share_cluster_max length_cluster    y_alluv
#>                   <num>             <num>          <int>      <num>
#> 1:                17.57             18.31              7 0.01351351
#> 2:                17.57             18.31              7 0.01351351
#> 3:                17.57             18.31              7 0.01351351
#> 4:                17.57             18.31              7 0.01351351
#> 5:                17.57             18.31              7 0.01351351
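
The resulting table is meant to be passed to ggalluvial. The following is a minimal sketch of one possible way to do so: it filters out small clusters with share_cluster_max, then draws flows and strata. To stay self-contained it uses a hypothetical stand-in table, `toy_alluv`, with the same column names as `alluv_dt` above; the 50 percent threshold and all values are illustrative assumptions, and the plot is only built if ggplot2 and ggalluvial are installed.

```r
# Hypothetical stand-in for alluv_dt: one row per node per time window.
toy_alluv <- data.frame(
  dynamic_cluster_leiden = c("cl_1", "cl_1", "cl_1", "cl_2",
                             "cl_1", "cl_1", "cl_3", "cl_3"),
  window = rep(c("1975-1994", "1976-1995"), each = 4),
  ID_Art = c("a", "b", "c", "d", "a", "b", "c", "e"),
  color = c("#56B4E9", "#56B4E9", "#56B4E9", "#E69F00",
            "#56B4E9", "#56B4E9", "#009E73", "#009E73"),
  share_cluster_max = c(75, 75, 75, 25, 75, 75, 50, 50),
  y_alluv = 0.25,  # illustrative height of each node in its window
  stringsAsFactors = FALSE
)

# Keep only clusters that reach at least 50% of a window at some point
# (arbitrary threshold chosen for this illustration).
plot_dt <- toy_alluv[toy_alluv$share_cluster_max >= 50, ]

# Build the alluvial plot only when the plotting packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)

  p <- ggplot(plot_dt,
              aes(x = window,
                  y = y_alluv,
                  stratum = dynamic_cluster_leiden,  # blocks in each window
                  alluvium = ID_Art,                 # one flow per node
                  fill = color)) +
    geom_flow() +
    geom_stratum() +
    scale_fill_identity() +  # use the hex codes stored in `color` as-is
    labs(x = "Time window", y = NULL)
}
```

With the real data, `toy_alluv` would be replaced by `alluv_dt`, and the threshold adjusted to the number of clusters you want to display.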