[Experimental]

The function names the clusters of one or several networks. It also assigns to each edge the name of its cluster. Clusters are named according to the columns chosen by the user (for instance, if nodes are articles, the name may be the author and date of an article).

Usage

name_clusters(
  graphs,
  method = c("tidygraph_functions", "given_column", "tf-idf"),
  name_merged_clusters = FALSE,
  cluster_id,
  label_columns,
  label_name = "cluster_label",
  tidygraph_function = NULL,
  order_by = NULL,
  text_columns = NULL,
  nb_terms_label = 3,
  ...
)

Arguments

graphs

A tibble graph (from tidygraph) or a list of tibble graphs.

method

The method for finding the names, among tidygraph_functions, given_column, and tf-idf (see the details). The tf-idf method is chosen by default.

name_merged_clusters

Set to TRUE if your clusters have been established for all your tibble graphs and thus are unique. Typically, you have such clusters after running merge_dynamic_clusters().

cluster_id

The column you want to name. Generally, the column with the identifier of the clusters, whether the simple cluster detected with add_clusters() or the merged clusters detected with merge_dynamic_clusters().

label_columns

The column(s) you want to use to name the clusters. If the nodes are articles, you can choose, for instance, the columns with the author of the article and the date of publication.

label_name

The name of the column that the function creates to store the cluster names. "cluster_label" by default.

tidygraph_function

For the tidygraph_functions method (see the details), the centrality measure to use, chosen among the measures implemented in tidygraph (see tidygraph::centrality()).

order_by

For the given_column method, the column within the nodes list of your tibble graph(s) you want to be used to classify nodes and choose names. This must be a numeric column. For instance, you can use the node_size column of your network if you have set compute_size to TRUE in build_network() or build_dynamic_networks().

text_columns

For the tf-idf method, the columns with the text you want to analyze. If you give multiple columns, they will be united to extract the terms. This is a parameter of extract_tfidf().

nb_terms_label

For the tf-idf method, the number of terms used as the name of a cluster. Terms will be separated by a comma. This is a parameter of extract_tfidf().

...

Additional arguments passed to extract_tfidf(), other than those listed above and grouping_across_list, which is not relevant here.

Value

The same tibble graph or list of tibble graphs with a new column containing the names of the clusters, for both nodes and edges. If you choose the tidygraph_functions method, the function also adds to the nodes a column with the computed centrality measure.

Details

The name of each cluster is determined by one of three methods:

  • the tidygraph_functions method: the name of a cluster comes from the node, within the cluster, which has the highest centrality measure. The user can choose any of the centrality measures implemented in tidygraph (see tidygraph::centrality() for details).

  • the given_column method: the user gives a column of the tibble graph(s), with numeric values, that will be used to rank the nodes and choose the name of each cluster. The label_columns of the node with the highest numerical value in the cluster will be used to name the cluster.

  • the tf-idf method: clusters are named according to the terms with the highest tf-idf value for each cluster. The user provides one or several columns with text, and the function extracts the terms and calculates the tf-idf value of each term, relative to all the clusters. This method uses extract_tfidf().
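The selection step shared by the first two methods can be sketched on a toy node table. This is a minimal illustration under stated assumptions, not the package's implementation: the toy_nodes data are invented, the "-" separator in the label is an arbitrary choice, and only dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical node table (invented values): one row per article, with a
# cluster identifier and a numeric column used for ranking (a centrality
# score, or a column such as node_size).
toy_nodes <- data.frame(
  ID_Art = c("a1", "a2", "a3", "a4", "a5"),
  Author = c("SMITH-A", "JONES-B", "BROWN-C", "MILLER-D", "DAVIS-E"),
  Year = c(1968, 1967, 1972, 1975, 1973),
  cluster_leiden = c("01", "01", "02", "02", "02"),
  node_size = c(12, 7, 20, 5, 9),
  stringsAsFactors = FALSE
)

# For each cluster, keep the node with the highest ranking value and
# build the label from the chosen label columns (separator is an assumption).
cluster_labels <- toy_nodes |>
  group_by(cluster_leiden) |>
  slice_max(node_size, n = 1, with_ties = FALSE) |>
  ungroup() |>
  mutate(cluster_label = paste(Author, Year, sep = "-")) |>
  select(cluster_leiden, cluster_label)

# Attach the label to every node of the cluster.
toy_nodes_named <- left_join(toy_nodes, cluster_labels, by = "cluster_leiden")
```

With the tidygraph_functions method, the ranking column would hold a centrality score rather than node_size; the selection itself is the same.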

Please note that, when name_merged_clusters is set to FALSE, the tf-idf is computed tibble graph by tibble graph. As a result, clusters in different tibble graphs are more likely to share the same name.
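To make the tf-idf naming concrete, here is a simplified sketch in base R that treats each cluster as one document. It is not how extract_tfidf() is implemented (that function has its own tokenisation and cleaning options); the titles data, the whitespace tokenisation, and the nb_terms value are all assumptions chosen for illustration.

```r
# Hypothetical titles grouped by cluster (invented data).
titles <- data.frame(
  cluster = c("01", "01", "02", "02"),
  Title = c("inflation and unemployment",
            "unemployment and wage inflation",
            "oil shock and inflation",
            "oil shock and output"),
  stringsAsFactors = FALSE
)

# Treat each cluster as one document: split its titles into lowercase words.
words_by_cluster <- lapply(split(titles$Title, titles$cluster), function(x) {
  unlist(strsplit(tolower(x), "\\s+"))
})

# Term frequency: share of each term among the words of its cluster.
tf <- lapply(words_by_cluster, function(w) table(w) / length(w))

# Inverse document frequency:
# log(number of clusters / number of clusters containing the term).
vocab <- unique(unlist(words_by_cluster))
doc_freq <- sapply(vocab, function(term) {
  sum(sapply(words_by_cluster, function(w) term %in% w))
})
idf <- log(length(words_by_cluster) / doc_freq)

# Name each cluster with its top terms by tf-idf, separated by commas.
nb_terms <- 2
cluster_names <- sapply(tf, function(freq) {
  scores <- as.numeric(freq) * idf[names(freq)]
  top <- names(sort(scores, decreasing = TRUE))[seq_len(nb_terms)]
  paste(top, collapse = ", ")
})
```

Terms present in every cluster (here "and" and "inflation") get an idf of zero and never appear in a name. Because the idf depends on the set of clusters compared, computing it within a single tibble graph rather than across all graphs changes which terms stand out.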

Examples

library(networkflow)

nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")

references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)

temporal_networks <- build_dynamic_networks(nodes = nodes,
                                            directed_edges = references,
                                            source_id = "ID_Art",
                                            target_id = "ItemID_Ref",
                                            time_variable = "Year",
                                            cooccurrence_method = "coupling_similarity",
                                            time_window = 20,
                                            edges_threshold = 1,
                                            overlapping_window = TRUE,
                                            filter_components = TRUE,
                                            verbose = FALSE)

temporal_networks <- add_clusters(temporal_networks,
                                  objective_function = "modularity",
                                  clustering_method = "leiden",
                                  verbose = FALSE)

# You can name the clusters in each tibble graph:

temporal_networks_with_names <- name_clusters(graphs = temporal_networks,
                                              method = "tidygraph_functions",
                                              name_merged_clusters = FALSE,
                                              cluster_id = "cluster_leiden",
                                              label_columns = c("Author", "Year"),
                                              tidygraph_function = tidygraph::centrality_pagerank())
#> Error in mutate(d_tmp, ...):  In argument: `cluster_label = ifelse(is.na(eval(ensym(label_name))),
#>   "no_name", eval(ensym(label_name)))`.
#> Caused by error in `ensym()`:
#> ! could not find function "ensym"

temporal_networks_with_names[[1]]
#> Error in eval(expr, envir, enclos): object 'temporal_networks_with_names' not found

# Or you can name the dynamic clusters:

temporal_networks <- merge_dynamic_clusters(temporal_networks,
                                            cluster_id = "cluster_leiden",
                                            node_id = "ID_Art",
                                            threshold_similarity = 0.51,
                                            similarity_type = "partial")

temporal_networks_with_names <- name_clusters(graphs = temporal_networks,
                                              method = "tf-idf",
                                              name_merged_clusters = TRUE,
                                              cluster_id = "dynamic_cluster_leiden",
                                              text_columns = "Title",
                                              nb_terms_label = 5,
                                              clean_word_method = "lemmatise")
#> Warning: Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
#> Error in mutate(d_tmp, ...):  In argument: `cluster_label = ifelse(is.na(eval(ensym(label_name))),
#>   "no_name", eval(ensym(label_name)))`.
#> Caused by error in `ensym()`:
#> ! could not find function "ensym"

temporal_networks_with_names[[1]]
#> Error in eval(expr, envir, enclos): object 'temporal_networks_with_names' not found