This function calculates the number of references that pairs of articles share, as well as the coupling angle value of each edge in a bibliographic coupling network (Sen and Gan 1983), from a direct citation data frame. This is a standard way to build a bibliographic coupling network using Salton's cosine measure: it divides the number of references that two articles share by the square root of the product of both articles' bibliography lengths. This normalization avoids giving too much weight to articles with a large bibliography.

biblio_coupling(
  dt,
  source,
  ref,
  normalized_weight_only = TRUE,
  weight_threshold = 1,
  output_in_character = TRUE
)

Arguments

dt

For bibliographic coupling (or co-citation), the data frame of citing and cited documents. It can also be used

  1. for a title co-occurrence network, with source being the articles, and ref being the list of words in article titles;

  2. for a co-authorship network, with source being the authors, and ref the list of articles.
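
As a rough illustration of the first alternative use, the sketch below reshapes a small table of titles into the same long format as a citation table, with one row per (article, word) pair. The data and the column names `article`, `title` and `word` are invented for this example, and the tokenization is deliberately naive.

```r
# Toy data: two articles and their titles (hypothetical values)
titles <- data.frame(
  article = c("a1", "a2"),
  title   = c("Inflation and unemployment", "Unemployment and growth"),
  stringsAsFactors = FALSE
)

# Split each lower-cased title on anything that is not a letter
words <- strsplit(tolower(titles$title), "[^a-z]+")

# One row per (article, word) pair, duplicates removed
title_words <- unique(data.frame(
  article = rep(titles$article, lengths(words)),
  word    = unlist(words),
  stringsAsFactors = FALSE
))

title_words
```

Following the mapping described above, such a table would be passed as dt with source = "article" and ref = "word".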

source

The column name of the source identifiers, that is, the citing documents. In a coupling network, these documents are the nodes of the network.

ref

The column name of the cited reference identifiers.

normalized_weight_only

If TRUE (the default), the function returns only the weights normalized by the cosine measure. If set to FALSE, it also returns the number of shared references.

weight_threshold

Threshold applied to the non-normalized weight of edges, that is, the number of shared references. The function keeps only the edges whose non-normalized weight is at least weight_threshold. In other words, if you set the parameter to 2, the function keeps only the edges between nodes that share at least two references in their bibliographies. In a large bibliographic coupling network, you may consider, for instance, that sharing a single reference is not significant enough for two articles to be linked. Raising this parameter also helps avoid intractable networks with too many edges.
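
The effect of the threshold can be pictured with a small base R sketch. It is not the package's code: the edge list and its `shared` column are invented, and the filter follows the "at least" reading from the example above, where a threshold of 2 keeps pairs sharing two or more references.

```r
# Hypothetical edge list before thresholding;
# `shared` is the non-normalized weight (number of shared references)
edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "C", "D"),
  shared = c(1, 2, 3, 1),
  stringsAsFactors = FALSE
)

weight_threshold <- 2

# Keep only the edges whose shared-reference count reaches the threshold
kept <- edges[edges$shared >= weight_threshold, ]

kept
```

Here the A-B and C-D edges, which rest on a single shared reference, are dropped, and the A-C and B-C edges remain.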

output_in_character

If TRUE, the function ends by converting the from and to columns to character, to make the creation of a tidygraph network easier.

Value

A data.table with the article (or author) identifiers in the from and to columns, plus one or two additional columns (the coupling angle measure and, optionally, the number of shared references). It also keeps a copy of from and to in the Source and Target columns. This is useful if you use the tidygraph package afterwards, where from and to values are modified when creating a graph.

Details

This function implements the following weight measure:

$$\frac{R(A) \bullet R(B)}{\sqrt{L(A) \cdot L(B)}}$$

with \(R(A)\) and \(R(B)\) the references of document A and document B, \(R(A) \bullet R(B)\) being the number of references shared by A and B, and \(L(A)\) and \(L(B)\) the lengths of the bibliographies of document A and document B.
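
To make the measure concrete, here is a minimal base R sketch that computes it by hand for two documents. The toy data and the column names `citing` and `cited` are assumptions made for this illustration; this is not how the function itself is implemented.

```r
# Toy direct-citation table: one row per (citing document, cited reference) pair.
# Document A cites r1..r4; document B cites r1, r2 and r5..r11.
citations <- data.frame(
  citing = c(rep("A", 4), rep("B", 9)),
  cited  = paste0("r", c(1:4, 1, 2, 5:11)),
  stringsAsFactors = FALSE
)

refs_a <- unique(citations$cited[citations$citing == "A"])
refs_b <- unique(citations$cited[citations$citing == "B"])

shared <- length(intersect(refs_a, refs_b))  # number of shared references
len_a  <- length(refs_a)                     # L(A), bibliography length of A
len_b  <- length(refs_b)                     # L(B), bibliography length of B

# Coupling angle: shared references over the square root of L(A) * L(B)
coupling_angle <- shared / sqrt(len_a * len_b)

coupling_angle
```

With 2 shared references and bibliographies of 4 and 9 references, the weight is 2 / sqrt(36), that is, one third.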

This function uses the data.table package, which lets it compute the coupling angle quickly, even on a very large data frame.

This function is relatively general and can also be used

  1. for co-citation networks (simply by swapping the source and ref columns). To avoid confusion, prefer the biblio_cocitation() function;

  2. for title co-occurrence networks (the coupling angle measure accounts for the length of the titles);

  3. for co-authorship networks (the coupling angle measure accounts for the number of co-authors an author has collaborated with over a period). For co-authorship, prefer the coauth_network() function.
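
The column-swapping idea in the first item can be sketched in base R. The helper `coupling_sketch()` below is hypothetical and is not part of biblionetwork: it is a small stand-in that computes the same cosine-normalized weight from an incidence matrix, so that the effect of exchanging the two columns can be seen on toy data.

```r
# Hypothetical helper: cosine-normalized coupling between the values of
# `source`, based on the values of `ref` that they share.
coupling_sketch <- function(dt, source, ref) {
  dt <- unique(as.data.frame(dt)[, c(source, ref)])
  # Incidence matrix: rows are `ref` values, columns are `source` values
  inc <- unclass(table(dt[[ref]], dt[[source]]))
  shared <- crossprod(inc)   # shared[i, j] = number of refs common to i and j
  lens   <- diag(shared)     # number of distinct refs per source value
  weight <- shared / sqrt(outer(lens, lens))
  # Keep each pair once (upper triangle) and only pairs sharing something
  idx <- which(upper.tri(shared) & shared > 0, arr.ind = TRUE)
  data.frame(
    from   = colnames(shared)[idx[, "row"]],
    to     = colnames(shared)[idx[, "col"]],
    weight = weight[idx],
    shared = shared[idx],
    stringsAsFactors = FALSE
  )
}

# Toy citation table (invented data)
citations <- data.frame(
  citing = c("A", "A", "B", "B", "B", "C"),
  cited  = c("r1", "r2", "r1", "r2", "r3", "r3"),
  stringsAsFactors = FALSE
)

# Bibliographic coupling: nodes are the citing documents
coup <- coupling_sketch(citations, source = "citing", ref = "cited")

# Co-citation: same computation with the two columns swapped,
# so nodes are the cited references
cocit <- coupling_sketch(citations, source = "cited", ref = "citing")
```

In the coupling result the edges link citing documents (A-B and B-C); after the swap they link cited references (r1-r2, r1-r3 and r2-r3).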

References

Sen SK, Gan SK (1983). “A Mathematical Extension of the Idea of Bibliographic Coupling and Its Applications.” Annals of library science and documentation, 30(2). http://nopr.niscair.res.in/bitstream/123456789/28008/1/ALIS%2030(2)%2078-82.pdf.

Examples

library(biblionetwork)
biblio_coupling(Ref_stagflation, source = "Citing_ItemID_Ref", ref = "ItemID_Ref", weight_threshold = 3)
#>            from         to     weight     Source     Target
#>   1:     214927    2207578 0.14605935     214927    2207578
#>   2:     214927    8456979 0.09733285     214927    8456979
#>   3:     214927   10729971 0.29848100     214927   10729971
#>   4:     214927   19627977 0.11202241     214927   19627977
#>   5:    1021902   12824456 0.06537205    1021902   12824456
#>  ---
#> 958: 1111111147 1111111156 0.17325923 1111111147 1111111156
#> 959: 1111111147 1111111161 0.13333938 1111111147 1111111161
#> 960: 1111111156 1111111161 0.08580846 1111111156 1111111161
#> 961: 1111111159 1111111171 0.24333213 1111111159 1111111171
#> 962: 1111111182 1111111183 0.27060404 1111111182 1111111183