[Experimental]

Usage

find_ref(
  input = NULL,
  path = NULL,
  output_format = c("ref", "xml", "bib", "json"),
  no_layout = FALSE,
  overwrite = FALSE
)

Arguments

input

Vector of file paths to the PDF documents to be analyzed.

path

The path where the parsed file(s) will be saved. If NULL, files are saved in the current working directory. If empty, the parsed data is returned directly and not saved.

output_format

Desired output format, one of "ref", "xml", "bib", or "json". "ref" is a text file with one reference per line. Default is "json".

no_layout

Logical; if TRUE, the anystyle '--no-layout' option is used (e.g., use this if your document has a multi-column layout). Default is FALSE.

overwrite

Logical; if TRUE, existing files at the output location will be overwritten. Default is FALSE.

Value

If path is empty, returns a list (or a single element if there is only one input) containing the extracted references in the specified format. If path is specified, the parsed files are saved at that location and the function returns NULL.

Details

This function uses the anystyle Ruby gem to analyze PDF or text documents and extract all references it finds. The input can be a single PDF document or multiple PDF documents.
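
Since input is documented as a vector of file paths, one way to process a whole folder is to expand it into such a vector with base R first. The following is a minimal sketch, not part of the package: the folder name and the dummy files are placeholders created only for illustration, and the final find_ref() call is left commented out because it assumes the package is loaded and the anystyle gem is installed.

```r
# Sketch: collect PDF paths from a folder to pass as `input`.
# The folder and file names below are hypothetical placeholders.
docs_dir <- file.path(tempdir(), "find_ref_demo")
dir.create(docs_dir, showWarnings = FALSE)
file.create(file.path(docs_dir, c("a.pdf", "b.PDF", "notes.txt")))

# Keep only files ending in .pdf (case-insensitive), returned with full paths
pdf_files <- list.files(
  docs_dir,
  pattern = "\\.pdf$",
  full.names = TRUE,
  ignore.case = TRUE
)
basename(pdf_files)

# Assumed call, not run here: needs the package and the anystyle Ruby gem
# find_ref(input = pdf_files, path = "path/to/output", output_format = "xml")
```

Filtering by extension up front keeps non-PDF files, such as notes.txt here, out of the vector passed as input.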

Examples

if (FALSE) {
# Find references in a single PDF and return as JSON
find_ref(
  input = "path/to/document.pdf",
  output_format = "json"
)

# Find references from a folder of documents and save as XML
find_ref(
  input = "path/to/documents/",
  path = "path/to/output",
  output_format = "xml"
)
}