The function names the clusters of one or several networks. It also gives each edge the name of its cluster. The clusters are named according to the column(s) chosen by the user (for instance, if the nodes are articles, the name may be the author and date of an article).
Usage
name_clusters(
graphs,
method = c("tidygraph_functions", "given_column", "tf-idf"),
name_merged_clusters = FALSE,
cluster_id,
label_columns,
label_name = "cluster_label",
tidygraph_function = NULL,
order_by = NULL,
text_columns = NULL,
nb_terms_label = 3,
...
)
Arguments
- graphs
A tibble graph (from tidygraph) or a list of tibble graphs.
- method
The method for finding the names, among tidygraph_functions, given_column, and tf-idf (see the details). The tf-idf method is chosen by default.
- name_merged_clusters
Set to TRUE if your clusters have been established across all your tibble graphs and are therefore unique. Typically, you have such clusters after running merge_dynamic_clusters().
- cluster_id
The column you want to name: generally, the column with the cluster identifiers, whether the simple clusters detected with add_clusters() or the merged clusters detected with merge_dynamic_clusters().
- label_columns
The column(s) you want to use to name the clusters. If the nodes are articles, you can choose, for instance, the columns with the author of the article and the date of publication.
- label_name
The name of the column, created by the function, that holds the cluster names. "cluster_label" by default.
- tidygraph_function
For the tidygraph_functions method (see the details), the centrality measure to use, chosen among the measures implemented in tidygraph (see tidygraph::centrality()).
- order_by
For the given_column method, the column of the nodes list of your tibble graph(s) used to rank the nodes and choose the names. This must be a numeric column. For instance, you can use the node_size column of your network if you have set compute_size to TRUE in build_network() or build_dynamic_networks().
- text_columns
For the tf-idf method, the columns with the text you want to analyse. If you give multiple columns, they will be united to extract the terms. This is a parameter of extract_tfidf().
- nb_terms_label
For the tf-idf method, the number of terms used to name a cluster. Terms are separated by a comma. This is a parameter of extract_tfidf().
- ...
Additional arguments passed to extract_tfidf(), other than those listed above and grouping_across_list, which is not relevant here.
Value
The same tibble graph or list of tibble graphs, with a new column containing the names of the clusters, for both nodes and edges. If you choose the tidygraph_functions method, the function also adds to the nodes a column with the computed centrality measure.
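Because the names are added to edges as well as nodes, it may help to see how an edge can inherit the name of its cluster. The base R sketch below is a hypothetical illustration, not the package's code: it assumes each edge already carries the identifier of its cluster (a column here called cluster_leiden, set to NA for edges linking two different clusters) and looks up the matching label. Giving unlabelled edges the placeholder "no_name" is also an assumption of this sketch.

```r
# Hypothetical cluster names, e.g. as produced by one of the naming methods.
cluster_names <- data.frame(
  cluster_leiden = c("01", "02"),
  cluster_label = c("Jones-1976", "Garcia-1983")
)

# Hypothetical edges: intra-cluster edges carry their cluster identifier;
# edges linking two different clusters are assumed to carry NA.
edges <- data.frame(
  from = c(1, 2, 3),
  to = c(2, 4, 5),
  cluster_leiden = c("01", NA, "02")
)

# Look up each edge's cluster identifier in the table of names
# (match() returns NA for edges without a cluster).
edges$cluster_label <- cluster_names$cluster_label[
  match(edges$cluster_leiden, cluster_names$cluster_leiden)
]

# Assumed behaviour: edges without a cluster get a placeholder label.
edges$cluster_label[is.na(edges$cluster_label)] <- "no_name"
edges
```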
Details
The name of each cluster is chosen according to one of three methods:
- the tidygraph_functions method: the name of a cluster comes from the node, within the cluster, with the highest centrality measure. The user can choose among the centrality measures implemented in tidygraph (see tidygraph::centrality() for details).
- the given_column method: the user gives a numeric column of the tibble graph(s) that is used to rank the nodes and choose the name of each cluster. The label_columns of the node with the highest value in the cluster are used to name the cluster.
- the tf-idf method: clusters are named after the terms with the highest tf-idf value in each cluster. The user supplies one or several text columns, and the function extracts the terms and calculates the tf-idf value of each term across all the clusters. This method uses extract_tfidf().
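The selection rule shared by the given_column and tidygraph_functions methods (take the top-ranked node of each cluster and use its label columns) can be sketched in base R on a plain data frame of nodes. This is an illustration under stated assumptions, not the package's implementation: the column names and the helper name_by_top_node() are hypothetical, and joining the label columns with a hyphen is a choice made for this sketch. For the tidygraph_functions method, the ranking column would simply hold a centrality score instead of node_size.

```r
# Hypothetical nodes table: one row per node, with its cluster and a numeric
# column used to rank nodes within each cluster (e.g. size or centrality).
nodes <- data.frame(
  ID_Art = c("a", "b", "c", "d", "e"),
  cluster_leiden = c("01", "01", "02", "02", "02"),
  Author = c("Smith", "Jones", "Brown", "Lee", "Garcia"),
  Year = c(1974, 1976, 1980, 1979, 1983),
  node_size = c(3, 8, 5, 2, 9)
)

# Hypothetical helper: for each cluster, keep the node with the highest value
# in `order_by` and paste its `label_columns` together as the cluster name.
name_by_top_node <- function(nodes, cluster_id, label_columns, order_by) {
  clusters <- split(nodes, nodes[[cluster_id]])
  labels <- vapply(clusters, function(cl) {
    top <- cl[which.max(cl[[order_by]]), , drop = FALSE]
    paste(unlist(top[label_columns]), collapse = "-")
  }, character(1))
  data.frame(cluster = names(labels), cluster_label = unname(labels))
}

res <- name_by_top_node(nodes, "cluster_leiden", c("Author", "Year"), "node_size")
res
```

Here cluster "01" is named after its largest node (Jones, 1976) and cluster "02" after Garcia, 1983.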
Please note that, when name_merged_clusters is set to FALSE, the tf-idf is computed tibble graph by tibble graph. This makes it more likely that clusters in different tibble graphs share the same name.
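The tf-idf method treats each cluster as one document. The base R sketch below illustrates the idea; it is not what extract_tfidf() actually does. It assumes a hypothetical text column called Title, tokenises by lower-casing and splitting on non-letters, uses tf = count of the term in the cluster / number of terms in the cluster and idf = log(number of clusters / number of clusters containing the term), and keeps the top nb_terms_label terms separated by commas. The helper tfidf_labels() is hypothetical, and it does no stop-word removal or lemmatisation.

```r
# Hypothetical input: one row per node, with its cluster and a text column.
docs <- data.frame(
  cluster = c("01", "01", "02", "02"),
  Title = c("Inflation and unemployment",
            "Stagflation and inflation expectations",
            "Oil shocks and stagflation",
            "Oil prices and monetary policy")
)

tfidf_labels <- function(docs, cluster_id, text_column, nb_terms_label = 3) {
  # Tokenise: lower-case, split on anything that is not a letter.
  tokens <- lapply(tolower(docs[[text_column]]), function(x) {
    words <- strsplit(x, "[^a-z]+")[[1]]
    words[words != ""]
  })
  # One "document" per cluster: the terms of all its nodes pooled together.
  by_cluster <- lapply(split(tokens, docs[[cluster_id]]), unlist)
  n_clusters <- length(by_cluster)
  # Document frequency: in how many clusters does each term appear?
  df <- table(unlist(lapply(by_cluster, unique)))
  labels <- vapply(by_cluster, function(terms) {
    counts <- table(terms)
    tf <- as.numeric(counts) / length(terms)
    idf <- log(n_clusters / as.numeric(df[names(counts)]))
    score <- tf * idf
    # Highest scores first; ties broken alphabetically for reproducibility.
    ord <- order(-score, names(counts))
    paste(head(names(counts)[ord], nb_terms_label), collapse = ", ")
  }, character(1))
  data.frame(cluster = names(labels), cluster_label = unname(labels))
}

res <- tfidf_labels(docs, "cluster", "Title", nb_terms_label = 3)
res
```

Terms found in every cluster, such as "and" or "stagflation" here, get an idf of zero and cannot name a cluster. Applying such a function separately to the clusters of each tibble graph corresponds to the graph-by-graph computation used when name_merged_clusters is FALSE.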
Examples
library(networkflow)
nodes <- Nodes_stagflation |>
dplyr::rename(ID_Art = ItemID_Ref) |>
dplyr::filter(Type == "Stagflation")
references <- Ref_stagflation |>
dplyr::rename(ID_Art = Citing_ItemID_Ref)
temporal_networks <- build_dynamic_networks(nodes = nodes,
directed_edges = references,
source_id = "ID_Art",
target_id = "ItemID_Ref",
time_variable = "Year",
cooccurrence_method = "coupling_similarity",
time_window = 20,
edges_threshold = 1,
overlapping_window = TRUE,
filter_components = TRUE,
verbose = FALSE)
temporal_networks <- add_clusters(temporal_networks,
objective_function = "modularity",
clustering_method = "leiden",
verbose = FALSE)
# You can name the clusters in each tibble graph:
temporal_networks_with_names <- name_clusters(graphs = temporal_networks,
method = "tidygraph_functions",
name_merged_clusters = FALSE,
cluster_id = "cluster_leiden",
label_columns = c("Author", "Year"),
tidygraph_function = tidygraph::centrality_pagerank())
temporal_networks_with_names[[1]]
# Or you can name the dynamic clusters:
temporal_networks <- merge_dynamic_clusters(temporal_networks,
cluster_id = "cluster_leiden",
node_id = "ID_Art",
threshold_similarity = 0.51,
similarity_type = "partial")
temporal_networks_with_names <- name_clusters(graphs = temporal_networks,
method = "tf-idf",
name_merged_clusters = TRUE,
cluster_id = "dynamic_cluster_leiden",
text_columns = "Title",
nb_terms_label = 5,
clean_word_method = "lemmatise")
temporal_networks_with_names[[1]]