Installation, Initialization, and Data Cleaning • leadeR

Prerequisites

leadeR relies on spaCy, a Python natural language processing (NLP) library, which it calls through the spacyr R package. You will need:

Python (3.8 or later)
spaCy with an English language model

Install spaCy and the English model from a terminal:

pip install spacy
python -m spacy download en_core_web_sm

Installing leadeR

Install leadeR from GitHub:

# install.packages("remotes")
remotes::install_github("mmukaigawara/leadeR")

Initialization

Before calling any leadeR function, initialize spaCy. Optionally, set a random seed so that bootstrap results are reproducible across runs.

library(leadeR)
library(data.table)

# Start the spaCy backend for this R session
spacyr::spacy_initialize()

set.seed(02138)
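Bootstrap procedures draw random resamples, so two runs give slightly different results unless the random number generator starts from the same state. The base-R sketch below illustrates this with a hypothetical `boot_mean()` helper, which is not part of leadeR and does not reproduce leadeR's own bootstrap code. Note that R reads the literal `02138` as the number 2138, so the leading zero has no effect.

```r
# Hypothetical helper (not a leadeR function): bootstrap the mean of x
# by resampling with replacement B times.
boot_mean <- function(x, B = 200) {
  replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
}

x <- c(2, 4, 4, 5, 7, 9, 10, 12)

set.seed(02138)        # R parses 02138 as 2138
run1 <- boot_mean(x)

set.seed(02138)        # same seed, so the same random stream
run2 <- boot_mean(x)

run3 <- boot_mean(x)   # no reset: the stream has moved on

identical(run1, run2)  # TRUE: identical resamples
identical(run1, run3)  # different resamples, so (almost surely) FALSE
```

Resetting the seed immediately before the analysis is what makes a rerun match. Drawing any other random numbers in between changes the results.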

Sample data

The package ships with three speeches by John F. Kennedy:

Dataset	Date	Occasion
`jfk19610120`	January 20, 1961	Inaugural Address
`jfk19610925`	September 25, 1961	Address Before the UN General Assembly
`jfk19630610`	June 10, 1963	Commencement Address at American University

head(jfk19610120)

Text cleaning

Speech transcripts often contain editorial annotations (such as notes marking applause) enclosed in square brackets, parentheses, or curly braces. The clean_text() function removes these annotations and normalizes whitespace.

jfk1 <- clean_text(jfk19610120)
jfk2 <- clean_text(jfk19610925)
jfk3 <- clean_text(jfk19630610)
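To make the cleaning step more concrete, here is a rough base-R approximation of the kind of transformation described above. `strip_annotations()` is a hypothetical helper written for illustration. It assumes annotations are not nested and do not span multiple strings. It is not the implementation of clean_text(), which may handle more cases.

```r
# Hypothetical helper approximating the cleaning described above;
# it is NOT leadeR's clean_text().
strip_annotations <- function(x) {
  x <- gsub("\\[[^]]*\\]", " ", x)  # drop [square-bracket] notes
  x <- gsub("\\([^)]*\\)", " ", x)  # drop (parenthetical) notes
  x <- gsub("\\{[^}]*\\}", " ", x)  # drop {curly-brace} notes
  x <- gsub("\\s+", " ", x)         # collapse runs of whitespace
  trimws(x)                         # trim leading/trailing spaces
}

# Made-up input for illustration
raw <- "Let us begin anew. [Applause]   (Laughter) We {inaudible} go on."
strip_annotations(raw)
# "Let us begin anew. We go on."
```

Each annotation is replaced by a space rather than an empty string, so that words on either side never get glued together. The final whitespace pass then removes any doubled spaces.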

Depending on the source of your text, additional cleaning may be needed before analysis (e.g., removing headers, footers, or speaker labels).
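As one example of such a source-specific step, the sketch below drops speaker labels at the start of each line. It assumes, hypothetically, that labels are written in upper case and followed by a colon (e.g., `THE PRESIDENT:`). The helper name and label pattern are illustrative assumptions, not leadeR functions, so adapt the regular expression to your own transcripts.

```r
# Hypothetical helper: remove upper-case speaker labels such as
# "THE PRESIDENT:" or "Q:" at the start of each element.
drop_speaker_labels <- function(lines) {
  out <- sub("^\\s*[A-Z][A-Z .'-]*:\\s*", "", lines, perl = TRUE)
  out[nzchar(out)]  # discard lines left empty after removal
}

# Made-up transcript lines for illustration
transcript <- c(
  "THE PRESIDENT: Thank you very much.",
  "Q: Mr. President, a question on trade.",
  "THE PRESIDENT: Go ahead."
)
drop_speaker_labels(transcript)
# "Thank you very much." "Mr. President, a question on trade." "Go ahead."
```

Because the pattern is anchored at the start of the line and only accepts upper-case letters before the colon, mixed-case text such as "Mr. President:" inside a sentence is left untouched.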