Usage
find_ref_to_df(input = NULL, no_layout = FALSE, clean_ref = TRUE)
parse_ref_to_df(input = NULL, clean_ref = TRUE)
Arguments
- input
Vector of file paths to the documents to be analyzed (PDF files for find_ref_to_df() and text files for parse_ref_to_df()).
- no_layout
Logical; if TRUE, the '--no-layout' option is used in find_ref_to_df(), which might be necessary for some PDFs (e.g., use this if your document has a multi-column layout). Ignored in parse_ref_to_df(). Default is FALSE.
- clean_ref
Logical; if TRUE, cleans the references with the clean_ref() function after conversion (applies to both functions). Default is TRUE. See clean_ref() for details on what the function does.
Value
A tidy data frame with one row per reference, including metadata (author, title, etc.), unique identifiers for each reference and each document, and the complete original reference.
Details
These functions convert references found in PDF documents or parsed from text files into tidy data frames. find_ref_to_df() uses the find_ref() function for PDFs, and parse_ref_to_df() uses the parse_ref() function for text files.
find_ref_to_df() analyzes PDF documents and extracts all references, converting them into a structured data frame. It requires the 'anystyle' Ruby gem and uses both the 'find' and 'parse' features (find_ref() and parse_ref(), respectively) to gather detailed information about each reference.

parse_ref_to_df() works similarly but is designed for text documents. It parses structured references from text files and converts them into a data frame.
Both functions create unique identifiers for each reference, both within a document and across the entire set of documents:
- id_doc: A unique identifier for each document, based on its position in input.
- id_ref: A unique identifier for each reference within its document. It combines id_doc with the reference's row number within the document, so every reference across all documents has a unique ID.
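To make the identifier scheme concrete, here is a minimal base-R sketch of how id_doc and id_ref could be assigned when per-document results are stacked into a single data frame. It is not the package's implementation: the per_doc list, the example titles, and the "doc_row" format of id_ref (e.g. "2_3") are assumptions chosen for illustration only.

```r
# Hypothetical per-document results: one data frame of parsed references
# per input file (titles are placeholders, not real output)
per_doc <- list(
  data.frame(title = c("Ref A", "Ref B")),
  data.frame(title = c("Ref C", "Ref D", "Ref E"))
)

# Tag each document's rows with id_doc (its position in the input) and
# id_ref (id_doc combined with the row number inside that document).
# The "doc_row" string format used for id_ref is an assumption.
tagged <- lapply(seq_along(per_doc), function(i) {
  df <- per_doc[[i]]
  df$id_doc <- i
  df$id_ref <- paste(i, seq_len(nrow(df)), sep = "_")
  df
})

# Stack all documents into one tidy data frame, one row per reference
refs <- do.call(rbind, tagged)
print(refs)
```

Because the row number restarts at 1 in every document, id_ref is only unique across documents because id_doc is part of it.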
See also
find_ref(), parse_ref(), and clean_ref() for related functionality.
Examples
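if (FALSE) {
# For PDF documents (find_ref_to_df() requires the 'anystyle' Ruby gem)
references_df <- find_ref_to_df(input = c(
  "path/to/document1.pdf",
  "path/to/document2.pdf"
))
# For a PDF with a multi-column layout
references_df <- find_ref_to_df(
  input = "path/to/two_column.pdf",
  no_layout = TRUE
)
# For a text file
references_df <- parse_ref_to_df(input = "path/to/references.txt")
}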