This function calculates a refined similarity measure of coupling links from a direct citation data frame. It is inspired by Shen et al. (2019). To a certain extent, it combines the coupling_strength() function with the cosine measure of the biblio_coupling() function.

coupling_similarity(
  dt,
  source,
  ref,
  weight_threshold = 1,
  output_in_character = TRUE
)

Arguments

dt

The table with citing and cited documents.

source

The column name of the source identifiers, that is the documents that are citing. In bibliographic coupling, these documents are the nodes of the network.

ref

The column name of the references that are cited.

weight_threshold

Threshold applied to the non-normalized weight of edges. The function keeps only the edges whose non-normalized weight is greater than or equal to weight_threshold. In other words, if you set the parameter to 2, the function keeps only the edges between nodes that share at least two references. In a large bibliographic coupling network, you may consider, for instance, that sharing only one reference is not sufficient for two articles to be linked together. Raising this parameter also helps avoid creating intractable networks with too many edges.
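The effect of the threshold can be illustrated with a minimal base R sketch on a made-up citation table. It is not the package's implementation: the column name nb_shared_references and the toy identifiers are hypothetical, and the sketch assumes the non-normalized weight is the number of references two documents share.

```r
# Toy direct-citation table (made-up identifiers):
# one row per (citing document, cited reference).
citations <- data.frame(
  source = c("A", "A", "A", "B", "B", "C", "C"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r3", "r4"),
  stringsAsFactors = FALSE
)

# Self-join on the reference to list every pair of documents
# that cite the same reference.
pairs <- merge(citations, citations, by = "ref")

# Keep each unordered pair once and drop self-pairs.
pairs <- pairs[pairs$source.x < pairs$source.y, ]

# Assumption: the non-normalized weight is the number of shared references.
shared <- aggregate(ref ~ source.x + source.y, data = pairs, FUN = length)
names(shared) <- c("from", "to", "nb_shared_references")  # hypothetical names

# Keep only the pairs sharing at least `weight_threshold` references.
weight_threshold <- 2
kept <- shared[shared$nb_shared_references >= weight_threshold, ]
print(kept)
```

With this toy table, A and B share two references while A and C share only one, so only the A-B edge survives a threshold of 2.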

output_in_character

If TRUE, the function ends by converting the from and to columns to character, which makes the creation of a tidygraph network easier.

Value

A data.table with the article identifiers in the from and to columns and the similarity measure in another column. It also keeps a copy of from and to in the Source and Target columns. This is useful if you then use the tidygraph package, where the from and to values are modified when creating a graph.
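The following sketch illustrates why the Source and Target copies can be handy. It builds a small edge list by hand (the identifiers, the weight column name and its values are made up, not output of coupling_similarity()) and passes it to tidygraph::tbl_graph(). The code only runs the graph step if tidygraph is installed.

```r
# Hand-made edge list mimicking the shape described above.
# The `weight` column name and its values are hypothetical.
edge_list <- data.frame(
  from   = c("A", "A"),
  to     = c("B", "C"),
  weight = c(0.82, 0.30),
  stringsAsFactors = FALSE
)
edge_list$Source <- edge_list$from
edge_list$Target <- edge_list$to

# Node table: tbl_graph() matches character from/to against the `name` column.
node_list <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)

if (requireNamespace("tidygraph", quietly = TRUE)) {
  graph <- tidygraph::tbl_graph(
    nodes = node_list,
    edges = edge_list,
    directed = FALSE
  )

  # Extract the edge table back from the graph.
  edges_out <- tidygraph::as_tibble(tidygraph::activate(graph, edges))

  # from/to now hold node positions, while Source/Target
  # still hold the original identifiers.
  print(edges_out)
}
```

After the graph is created, from and to refer to row positions in the node table, so the Source and Target columns are what lets you recover the original article identifiers on the edge side.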

Details

The function uses the following formalisation:

$$\frac{R_{S}(A) \bullet R_{S}(B)}{\sqrt{R_{S}(A) \cdot R_{S}(B)}}$$

  1. with $$R_{S}(A) \bullet R_{S}(B) = \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j})}\right)}$$ that is, a measure similar to the coupling strength measure;

  2. and $$R_{S}(A) \cdot R_{S}(B) = \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j}(A))}\right)} \cdot \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j}(B))}\right)}$$ which is the product of the sums, computed separately for each article, of the normalized value of a citation. It is the cosine measure of documents A and B, adapted to the spirit of the coupling strength.
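The formalisation above can be worked through by hand with a small base R sketch. This is an illustration under stated assumptions, not the package's code: it assumes N is the number of citing documents in the corpus, that freq(R_j) is the number of documents citing reference j, that the numerator sums over the references shared by A and B, and that each sum in the denominator runs over one article's own references. The helper similarity_sketch() and the toy identifiers are hypothetical.

```r
# Toy direct-citation table (made-up identifiers):
# one row per (citing document, cited reference), no duplicated rows.
citations <- data.frame(
  source = c("A", "A", "A", "B", "B", "C", "C"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r3", "r4"),
  stringsAsFactors = FALSE
)

# Assumption: N is the number of citing documents in the corpus.
N <- length(unique(citations$source))

# Assumption: freq(R_j) is the number of documents citing reference j.
freq <- table(citations$ref)

# Per-reference weight sqrt(log(N / freq(R_j))).
ref_weight <- sqrt(log(N / as.numeric(freq)))
names(ref_weight) <- names(freq)

# Hypothetical helper: similarity between two documents a and b.
similarity_sketch <- function(a, b) {
  refs_a <- citations$ref[citations$source == a]
  refs_b <- citations$ref[citations$source == b]
  shared <- intersect(refs_a, refs_b)

  # Numerator: sum of the weights of the shared references.
  numerator <- sum(ref_weight[shared])

  # Denominator: square root of the product of each article's own sum.
  denominator <- sqrt(sum(ref_weight[refs_a]) * sum(ref_weight[refs_b]))

  numerator / denominator
}

print(similarity_sketch("A", "B"))
print(similarity_sketch("B", "C"))
```

In this toy corpus, r1, r2 and r3 are each cited twice and therefore carry the same weight, while r4 is cited once and weighs more: rarer references contribute more to the similarity. B and C share no reference, so their numerator is an empty sum and the similarity is zero. Note that a reference cited by every document would get a weight of zero under these assumptions.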

References

Shen S, Zhu D, Rousseau R, Su X, Wang D (2019). “A Refined Method for Computing Bibliographic Coupling Strengths.” Journal of Informetrics, 13(2), 605-615. https://linkinghub.elsevier.com/retrieve/pii/S1751157716300244.

Examples

library(biblionetwork)
coupling_similarity(Ref_stagflation, source = "Citing_ItemID_Ref", ref = "ItemID_Ref")