In this tutorial, you will learn how to summarize, aggregate, and analyze text in R:
- How to tokenize and filter text
- How to clean and preprocess text
- How to visualize results with ggplot2
- How to perform automated gender assignment from name data (and think about the biases these methods may introduce)
To practice these skills, we will use a dataset that I have already collected from the Edinburgh Fringe Festival website.
You can try this yourself: to collect these data, you first need an API key. Instructions for obtaining one are available on the Edinburgh Fringe API page:
Before proceeding, we'll load the remaining packages needed for this tutorial.