-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy path20-assessment-data.Rmd
152 lines (106 loc) · 7.22 KB
/
20-assessment-data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
# Assessment data
## Introduction
Below you will find a series of datasets. You can choose to use these for the summative assessment. Alternatively, you can contact me with a suggestion of a dataset and a relevant research question. See the [Course Overview](https://cjbarrie.github.io/CTA-ED/course-overview.html) page for full details of the assessment.
## @osnabrugge_playing_2021 data
We can access data from @osnabrugge_playing_2021 [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QDTLYV)
To prepare these data, we can use the same code as used by the original authors:
```{r, eval= FALSE}
library("ggplot2")
library("plyr")
library("gdata")
library("stringr")
library("data.table")
## Prep Osnabrugge et al.
data = fread("/Users/cbarrie6/Dropbox/Teaching/Edinburgh/teaching/CTA_21-22/assessment/data/uk_data.csv", encoding="UTF-8")
data$date = as.Date(data$date)
#Table 2: Examples: Emotive and neutral speeches
example1 = subset(data, id_speech==854597)
example1$emotive_rhetoric
example1$text
example2 = subset(data, id_speech==778143)
example2$emotive_rhetoric
example2$text
#Create time variable
data$time= NA
data$time[data$date>=as.Date("2001-01-01") & data$date<=as.Date("2001-06-30")] = "01/1"
data$time[data$date>=as.Date("2001-07-01") & data$date<=as.Date("2001-12-31")] = "01/2"
data$time[data$date>=as.Date("2002-01-01") & data$date<=as.Date("2002-06-30")] = "02/1"
data$time[data$date>=as.Date("2002-07-01") & data$date<=as.Date("2002-12-31")] = "02/2"
data$time[data$date>=as.Date("2003-01-01") & data$date<=as.Date("2003-06-30")] = "03/1"
data$time[data$date>=as.Date("2003-07-01") & data$date<=as.Date("2003-12-31")] = "03/2"
data$time[data$date>=as.Date("2004-01-01") & data$date<=as.Date("2004-06-30")] = "04/1"
data$time[data$date>=as.Date("2004-07-01") & data$date<=as.Date("2004-12-31")] = "04/2"
data$time[data$date>=as.Date("2005-01-01") & data$date<=as.Date("2005-06-30")] = "05/1"
data$time[data$date>=as.Date("2005-07-01") & data$date<=as.Date("2005-12-31")] = "05/2"
data$time[data$date>=as.Date("2006-01-01") & data$date<=as.Date("2006-06-30")] = "06/1"
data$time[data$date>=as.Date("2006-07-01") & data$date<=as.Date("2006-12-31")] = "06/2"
data$time[data$date>=as.Date("2007-01-01") & data$date<=as.Date("2007-06-30")] = "07/1"
data$time[data$date>=as.Date("2007-07-01") & data$date<=as.Date("2007-12-31")] = "07/2"
data$time[data$date>=as.Date("2008-01-01") & data$date<=as.Date("2008-06-30")] = "08/1"
data$time[data$date>=as.Date("2008-07-01") & data$date<=as.Date("2008-12-31")] = "08/2"
data$time[data$date>=as.Date("2009-01-01") & data$date<=as.Date("2009-06-30")] = "09/1"
data$time[data$date>=as.Date("2009-07-01") & data$date<=as.Date("2009-12-31")] = "09/2"
data$time[data$date>=as.Date("2010-01-01") & data$date<=as.Date("2010-06-30")] = "10/1"
data$time[data$date>=as.Date("2010-07-01") & data$date<=as.Date("2010-12-31")] = "10/2"
data$time[data$date>=as.Date("2011-01-01") & data$date<=as.Date("2011-06-30")] = "11/1"
data$time[data$date>=as.Date("2011-07-01") & data$date<=as.Date("2011-12-31")] = "11/2"
data$time[data$date>=as.Date("2012-01-01") & data$date<=as.Date("2012-06-30")] = "12/1"
data$time[data$date>=as.Date("2012-07-01") & data$date<=as.Date("2012-12-31")] = "12/2"
data$time[data$date>=as.Date("2013-01-01") & data$date<=as.Date("2013-06-30")] = "13/1"
data$time[data$date>=as.Date("2013-07-01") & data$date<=as.Date("2013-12-31")] = "13/2"
data$time[data$date>=as.Date("2014-01-01") & data$date<=as.Date("2014-06-30")] = "14/1"
data$time[data$date>=as.Date("2014-07-01") & data$date<=as.Date("2014-12-31")] = "14/2"
data$time[data$date>=as.Date("2015-01-01") & data$date<=as.Date("2015-06-30")] = "15/1"
data$time[data$date>=as.Date("2015-07-01") & data$date<=as.Date("2015-12-31")] = "15/2"
data$time[data$date>=as.Date("2016-01-01") & data$date<=as.Date("2016-06-30")] = "16/1"
data$time[data$date>=as.Date("2016-07-01") & data$date<=as.Date("2016-12-31")] = "16/2"
data$time[data$date>=as.Date("2017-01-01") & data$date<=as.Date("2017-06-30")] = "17/1"
data$time[data$date>=as.Date("2017-07-01") & data$date<=as.Date("2017-12-31")] = "17/2"
data$time[data$date>=as.Date("2018-01-01") & data$date<=as.Date("2018-06-30")] = "18/1"
data$time[data$date>=as.Date("2018-07-01") & data$date<=as.Date("2018-12-31")] = "18/2"
data$time[data$date>=as.Date("2019-01-01") & data$date<=as.Date("2019-06-30")] = "19/1"
data$time[data$date>=as.Date("2019-07-01") & data$date<=as.Date("2019-12-31")] = "19/2"
data$time2 = data$time
data$time2 = str_replace(data$time2, "/", "_")
data$stage = 0
data$stage[data$m_questions==1]= 1
data$stage[data$u_questions==1]= 2
data$stage[data$queen_debate_others==1]= 3
data$stage[data$queen_debate_day1==1]= 4
data$stage[data$pm_questions==1]= 5
```
Below, I display a sample of these data.
```{r, echo=FALSE}
data <- readRDS("data/assessment/osnabrugge_samp.rds")
```
```{r, echo=F}
data <- data[1:3,]
data %>%
select(id_speech, text, last_name, first_name, date, government, female, age) %>%
kbl() %>%
kable_styling(bootstrap_options = "striped")
```
If the full dataset is too large for your machines, you can easily take a sample of it with:
```{r, eval=F}
data_samp <- data %>%
sample_n(10000)
```
## Twitter Transparency data
Select a dataset/datasets of interest from the Twitter Transparency archive [here](https://transparency.twitter.com/en/reports/information-operations.html). These are datasets that have been flagged for "information operations" activity; that is, activity designed to distort, often through automated messaging, the information landscape to the benefit of a given entity (normally a government).
The datasets are all listed and downloadable in ".csv" format if you scroll down to "03. Download Archive." Here, you will just be asked to enter your email address as agreement to Terms of Use.
## @waller2021
You can download embeddings and online community scores used in this article the Github repo linked [here](https://github.com/CSSLab/social-dimensions).
To get the community embeddings data in usable format we can do:
```{r, eval = F}
embeddings <- read.table("https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/embedding-vectors.tsv")
embeddings_metadata <- data.table:::fread("https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/embedding-metadata.tsv")
embeddings_scores <- read.csv("https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/scores.csv")
```
Then to add in information on what each vector of dimensions 150 (i.e., here: columns), we can add in the community information to the embeddings with:
```{r, eval = F}
communities <- embeddings_metadata$community
rownames(embeddings) <- communities
```
## R Markdown
You can access a template R Markdown response for your code from the Github repo for this book by clicking this [link](https://raw.githubusercontent.com/cjbarrie/CTA-ED/main/data/assessment/CTA_example.Rmd?raw=true) and download the word document it outputs by clicking this [link](https://github.com/cjbarrie/CTA-ED/blob/main/data/assessment/CTA_example.docx?raw=true).
Though you **should submit the R markdown output in word** you can also see what it looks like when generated as html [here](https://raw.githack.com/cjbarrie/CTA-ED/main/data/assessment/CTA_example.html).