
Prerequisites

leadeR relies on spaCy, a Python NLP library, via the spacyr R package. You will need:

  • Python (3.8 or later)
  • spaCy with an English language model

Install spaCy and the English model from a terminal:

pip install spacy
python -m spacy download en_core_web_sm

Installing leadeR

Install leadeR from GitHub:

# install.packages("remotes")
remotes::install_github("mmukaigawara/leadeR")

Initialization

Before using any leadeR function, initialize spaCy and (optionally) set a seed for reproducibility of bootstrap results.
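
A minimal sketch of this step, assuming spaCy is started through spacyr's spacy_initialize() with the en_core_web_sm model installed under Prerequisites. The seed value 123 is arbitrary, and leadeR may offer its own way to do this, so treat the snippet as illustrative rather than as the package's prescribed setup.

```r
# Sketch only: assumes the spacyr package and the en_core_web_sm model
# are already installed.

# Start the spaCy backend once per R session.
spacyr::spacy_initialize(model = "en_core_web_sm")

# Optional: fix the random seed so that bootstrap resampling draws the
# same samples on every run (123 is an arbitrary value).
set.seed(123)
```

Calling set.seed() again with the same value before a bootstrap step should reproduce the same resampled results.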

Sample data

The package ships with three speeches by John F. Kennedy:

Dataset      Date                Occasion
jfk19610120  January 20, 1961    Inaugural Address
jfk19610925  September 25, 1961  Address Before the UN General Assembly
jfk19630610  June 10, 1963       Commencement Address at American University

head(jfk19610120)

Text cleaning

Speech transcripts often contain editorial annotations in brackets, parentheses, or curly braces. The clean_text() function removes these and normalizes whitespace.

jfk1 <- clean_text(jfk19610120)
jfk2 <- clean_text(jfk19610925)
jfk3 <- clean_text(jfk19630610)
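
To make the effect of this kind of cleaning concrete, the sketch below approximates it in base R. The helper strip_annotations() and the sample sentence are hypothetical and written only for illustration; this is not the implementation of clean_text(), and it assumes "brackets" means square brackets and that annotations are not nested.

```r
# Hypothetical helper, not part of leadeR: a base-R approximation of
# removing bracketed annotations and normalizing whitespace.
strip_annotations <- function(x) {
  x <- gsub("\\[[^]]*\\]", "", x, perl = TRUE)  # drop [ ... ]
  x <- gsub("\\([^)]*\\)", "", x, perl = TRUE)  # drop ( ... )
  x <- gsub("\\{[^}]*\\}", "", x, perl = TRUE)  # drop { ... }
  x <- gsub("\\s+", " ", x, perl = TRUE)        # collapse runs of whitespace
  trimws(x)                                     # trim leading/trailing space
}

# Made-up transcript fragment with three kinds of annotation.
raw_text <- "Let us begin. [Applause]  (The President paused.) {inaudible} Let us go forth."
cleaned <- strip_annotations(raw_text)
print(cleaned)
```

Comparing a text before and after cleaning in this way is a quick check that no substantive content was enclosed in the removed delimiters.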

Users may need additional cleaning steps depending on the source of their text data (e.g., removing headers, footers, or speaker labels).
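
As one example of such a step, the sketch below strips leading speaker labels from transcript lines in base R. The helper drop_speaker_labels() and the sample lines are hypothetical, and the pattern assumes labels are written in capitals and end with a colon; real transcripts may need a different rule.

```r
# Hypothetical helper, not part of leadeR: remove a leading speaker label
# such as "THE PRESIDENT:" or "Q:" from each line.
drop_speaker_labels <- function(lines) {
  # Label = optional whitespace, capital letters/spaces/periods, then a colon.
  sub("^\\s*[A-Z][A-Z .]*:\\s*", "", lines, perl = TRUE)
}

# Made-up transcript lines for illustration.
transcript <- c(
  "THE PRESIDENT: Thank you.",
  "Q: Mr. President, a question.",
  "No label here."
)
unlabeled <- drop_speaker_labels(transcript)
print(unlabeled)
```

Lines without a matching label are returned unchanged, so the helper can be applied to a whole transcript before passing the text on for further cleaning.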