Prerequisites
leadeR relies on spaCy, a Python NLP library, via the spacyr R package. You will need:
- Python (3.8 or later)
- spaCy with an English language model
Install spaCy and the English model from a terminal:
Installing leadeR
Install leadeR from GitHub:
# install.packages("remotes")
remotes::install_github("mmukaigawara/leadeR")
Initialization
Before using any leadeR function, initialize spaCy and (optionally) set a seed for reproducibility of bootstrap results.
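The effect of the seed can be seen without leadeR at all. The sketch below uses base R's sample.int() as a stand-in for one bootstrap resample. This is an assumption made for illustration, not the resampling code leadeR runs, and resample_idx is a hypothetical helper. It shows that resetting the seed reproduces the same draw.

```r
# Stand-in for one bootstrap resample: draw row indices with replacement.
# (Illustrative only; resample_idx is hypothetical and not part of leadeR.)
resample_idx <- function(n) sample.int(n, size = n, replace = TRUE)

set.seed(02138)            # R reads 02138 as the number 2138
first <- resample_idx(10)

set.seed(02138)            # same seed restarts the same random stream
second <- resample_idx(10)

identical(first, second)   # TRUE: the draw is reproduced exactly
```

Without the second set.seed() call, the next draw continues the random stream and will generally differ, which is why the seed is set once, before any bootstrap-based function is called.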
library(leadeR)
library(data.table)
spacyr::spacy_initialize()
set.seed(02138)
Sample data
The package ships with three speeches by John F. Kennedy:
| Dataset | Date | Occasion |
|---|---|---|
| jfk19610120 | January 20, 1961 | Inaugural Address |
| jfk19610925 | September 25, 1961 | Address Before the UN General Assembly |
| jfk19630610 | June 10, 1963 | Commencement Address at American University |
head(jfk19610120)
Text cleaning
Speech transcripts often contain editorial annotations in brackets,
parentheses, or curly braces. The clean_text() function
removes these and normalizes whitespace.
jfk1 <- clean_text(jfk19610120)
jfk2 <- clean_text(jfk19610925)
jfk3 <- clean_text(jfk19630610)
Users may need additional cleaning steps depending on the source of their text data (e.g., removing headers, footers, or speaker labels).
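As one example of such an extra step, the sketch below strips leading speaker labels from a character vector using base R only. The helper strip_speaker_labels and the sample lines are hypothetical and not part of leadeR. The sketch assumes labels are written in capitals and end with a colon, which will not hold for every transcript.

```r
# Hypothetical helper (not part of leadeR): remove leading speaker labels
# such as "THE PRESIDENT:" or "Q:" and tidy the remaining whitespace.
strip_speaker_labels <- function(x) {
  # An upper-case label at the start of the line, ending in a colon
  x <- gsub("^[[:upper:]][[:upper:] .]*:[[:space:]]*", "", x)
  # Collapse runs of whitespace and trim both ends
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Made-up transcript lines, for illustration only
raw <- c("THE PRESIDENT: Thank you.   We begin anew.",
         "Q: What about   the treaty?",
         "No label on this line.")

cleaned <- strip_speaker_labels(raw)
cleaned
```

Lines without a label pass through unchanged apart from whitespace tidying. A pattern this simple would also strip a sentence that happens to open with a capitalized word followed by a colon, so it is worth checking the output against the source text.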
