First update commit
cjbarrie committed Jan 2, 2024
1 parent 0fefac9 commit 845e682
Showing 71 changed files with 988 additions and 858 deletions.
Binary file modified .DS_Store
Binary file not shown.
3 changes: 2 additions & 1 deletion 03-week3.Rmd
@@ -1,6 +1,6 @@
# Week 3: Dictionary-based techniques

An extension of word frequency analyses, which we covered last week, are so-called "dictionary-based" techniques. In their most basic form, these analyses use an index of target terms and classify the corpus of interest based on their presence or absence. The technical dimensions of this type of analysis are covered in the chapter section by @krippendorff_content_2004, and some of the issues attending them in the article by - @loughran_when_2011.
An extension of word frequency analyses, which we covered last week, are so-called "dictionary-based" techniques. In their most basic form, these analyses use an index of target terms and classify the corpus of interest based on their presence or absence. The technical dimensions of this type of analysis are covered in the chapter section by @krippendorff_content_2004, and some of the issues attending them in the article by - @loughran_when_2011. The article by @brooke2021trouble provides an outstanding illustration of the use of text analysis techniques to make inferences about larger questions of bias.

We will also be reading two examples of the application of these techniques by @martins_rise_2020 and @young_affective_2012. Here, we will be discussing how successful the authors are in measuring the phenomenon of interest ("prosociality" and "tone" respectively). Questions about sampling and representativeness will again be relevant here, and will naturally inform our assessments of this work.

@@ -14,6 +14,7 @@ Questions:

- @martins_rise_2020
- @voigt_language_2017
- @brooke2021trouble

**Further reading**:

19 changes: 19 additions & 0 deletions CTA.bib
@@ -1,3 +1,22 @@
@article{haroon2022,
title = {YouTube, The Great Radicalizer? Auditing and Mitigating Ideological Biases in YouTube Recommendations},
author = {Haroon, Muhammad and Chhabra, Anshuman and Liu, Xin and Mohapatra, Prasant and Shafiq, Zubair and Wojcieszak, Magdalena},
year = {2022},
date = {2022},
doi = {10.48550/ARXIV.2203.10666},
url = {https://arxiv.org/abs/2203.10666}
}

@article{brooke2021trouble,
title={Trouble in programmer’s paradise: gender-biases in sharing and recognising technical knowledge on Stack Overflow},
author={Brooke, SJ},
journal={Information, Communication \& Society},
volume={24},
number={14},
pages={2091--2112},
year={2021},
publisher={Taylor \& Francis}
}

@book{silge_text_2017,
address = {London},
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-134-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-134-2.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-136-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-150-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-151-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-158-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-170-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-171-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-19-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-20-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-24-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-270-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-36-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-37-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-38-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-42-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-51-1.png
Binary file modified _bookdown_files/main_files/figure-html/unnamed-chunk-75-1.png
5 changes: 3 additions & 2 deletions docs/404.html
@@ -5,7 +5,7 @@
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Page not found | Computational Text Analysis</title>
<meta name="author" content="Christopher Barrie">
<meta name="author" content="Marion Lieutaud">
<meta name="description" content="The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are...">
<meta name="generator" content="bookdown 0.33 with bs4_book()">
<meta property="og:title" content="Page not found | Computational Text Analysis">
@@ -31,6 +31,7 @@
div.csl-bib-body { }
div.csl-entry {
clear: both;
margin-bottom: 0em;
}
.hanging div.csl-entry {
margin-left:2em;
@@ -124,7 +125,7 @@ <h1>Page not found<a class="anchor" aria-label="anchor" href="#page-not-found"><
<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

<div class="col-12 col-md-6 mt-3">
<p>"<strong>Computational Text Analysis</strong>" was written by Christopher Barrie. It was last built on 2023-03-27.</p>
<p>"<strong>Computational Text Analysis</strong>" was written by Marion Lieutaud. It was last built on 2024-01-02.</p>
</div>

<div class="col-12 col-md-6 mt-3">
21 changes: 11 additions & 10 deletions docs/assessment-data.html
@@ -5,7 +5,7 @@
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Chapter 25 Assessment data | Computational Text Analysis</title>
<meta name="author" content="Christopher Barrie">
<meta name="author" content="Marion Lieutaud">
<meta name="description" content="25.1 Introduction Below you will find a series of datasets. You can choose to use these for the summative assessment. Alternatively, you can contact me with a suggestion of a dataset and a...">
<meta name="generator" content="bookdown 0.33 with bs4_book()">
<meta property="og:title" content="Chapter 25 Assessment data | Computational Text Analysis">
@@ -31,6 +31,7 @@
div.csl-bib-body { }
div.csl-entry {
clear: both;
margin-bottom: 0em;
}
.hanging div.csl-entry {
margin-left:2em;
@@ -121,16 +122,16 @@ <h2>
</h2>
<p>We can access data from <span class="citation">Osnabrügge, Hobolt, and Rodon (<a href="references-2.html#ref-osnabrugge_playing_2021">2021</a>)</span> <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QDTLYV">here</a></p>
<p>To prepare these data, we can use the same code as used by the original authors:</p>
<div class="sourceCode" id="cb460"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb459"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="st"><a href="https://ggplot2.tidyverse.org">"ggplot2"</a></span><span class="op">)</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="st"><a href="http://had.co.nz/plyr">"plyr"</a></span><span class="op">)</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="st">"gdata"</span><span class="op">)</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="st"><a href="https://github.com/r-gregmisc/gdata">"gdata"</a></span><span class="op">)</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="st"><a href="https://stringr.tidyverse.org">"stringr"</a></span><span class="op">)</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="st"><a href="https://r-datatable.com">"data.table"</a></span><span class="op">)</span></span>
<span></span>
<span><span class="co">## Prep Osnabrugge et al. </span></span>
<span></span>
<span><span class="va">data</span> <span class="op">=</span> <span class="fu"><a href="https://Rdatatable.gitlab.io/data.table/reference/fread.html">fread</a></span><span class="op">(</span><span class="st">"/Users/cbarrie6/Dropbox/Teaching/Edinburgh/teaching/CTA_21-22/assessment/data/uk_data.csv"</span>, encoding<span class="op">=</span><span class="st">"UTF-8"</span><span class="op">)</span></span>
<span><span class="va">data</span> <span class="op">=</span> <span class="fu"><a href="https://rdatatable.gitlab.io/data.table/reference/fread.html">fread</a></span><span class="op">(</span><span class="st">"/Users/cbarrie6/Dropbox/Teaching/Edinburgh/teaching/CTA_21-22/assessment/data/uk_data.csv"</span>, encoding<span class="op">=</span><span class="st">"UTF-8"</span><span class="op">)</span></span>
<span></span>
<span></span>
<span><span class="va">data</span><span class="op">$</span><span class="va">date</span> <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/as.Date.html">as.Date</a></span><span class="op">(</span><span class="va">data</span><span class="op">$</span><span class="va">date</span><span class="op">)</span></span>
@@ -187,7 +188,7 @@ <h2>
<span><span class="va">data</span><span class="op">$</span><span class="va">time</span><span class="op">[</span><span class="va">data</span><span class="op">$</span><span class="va">date</span><span class="op">&gt;=</span><span class="fu"><a href="https://rdrr.io/r/base/as.Date.html">as.Date</a></span><span class="op">(</span><span class="st">"2019-07-01"</span><span class="op">)</span> <span class="op">&amp;</span> <span class="va">data</span><span class="op">$</span><span class="va">date</span><span class="op">&lt;=</span><span class="fu"><a href="https://rdrr.io/r/base/as.Date.html">as.Date</a></span><span class="op">(</span><span class="st">"2019-12-31"</span><span class="op">)</span><span class="op">]</span> <span class="op">=</span> <span class="st">"19/2"</span></span>
<span></span>
<span><span class="va">data</span><span class="op">$</span><span class="va">time2</span> <span class="op">=</span> <span class="va">data</span><span class="op">$</span><span class="va">time</span></span>
<span><span class="va">data</span><span class="op">$</span><span class="va">time2</span> <span class="op">=</span> <span class="fu"><a href="https://stringr.tidyverse.org/reference/str_replace.html">str_replace</a></span><span class="op">(</span><span class="va">data</span><span class="op">$</span><span class="va">time2</span>, <span class="st">"/"</span>, <span class="st">"_"</span><span class="op">)</span></span>
<span><span class="va">data</span><span class="op">$</span><span class="va">time2</span> <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/pkg/stringr/man/str_replace.html">str_replace</a></span><span class="op">(</span><span class="va">data</span><span class="op">$</span><span class="va">time2</span>, <span class="st">"/"</span>, <span class="st">"_"</span><span class="op">)</span></span>
<span></span>
<span><span class="va">data</span><span class="op">$</span><span class="va">stage</span> <span class="op">=</span> <span class="fl">0</span></span>
<span><span class="va">data</span><span class="op">$</span><span class="va">stage</span><span class="op">[</span><span class="va">data</span><span class="op">$</span><span class="va">m_questions</span><span class="op">==</span><span class="fl">1</span><span class="op">]</span><span class="op">=</span> <span class="fl">1</span></span>
@@ -305,7 +306,7 @@ <h2>
</tbody>
</table></div>
<p>If the full dataset is too large for your machines, you can easily take a sample of it with:</p>
<div class="sourceCode" id="cb461"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb460"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">data_samp</span> <span class="op">&lt;-</span> <span class="va">data</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
<span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/sample_n.html">sample_n</a></span><span class="op">(</span><span class="fl">10000</span><span class="op">)</span></span></code></pre></div>
</div>
@@ -322,14 +323,14 @@ <h2>
</h2>
<p>You can download the embeddings and online community scores used in this article from the GitHub repo linked <a href="https://github.com/CSSLab/social-dimensions">here</a>.</p>
<p>To get the community embeddings data in usable format we can do:</p>
<div class="sourceCode" id="cb462"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb461"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">embeddings</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/utils/read.table.html">read.table</a></span><span class="op">(</span><span class="st">"https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/embedding-vectors.tsv"</span><span class="op">)</span></span>
<span></span>
<span><span class="va">embeddings_metadata</span> <span class="op">&lt;-</span> <span class="fu">data.table</span><span class="fu">:::</span><span class="fu"><a href="https://Rdatatable.gitlab.io/data.table/reference/fread.html">fread</a></span><span class="op">(</span><span class="st">"https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/embedding-metadata.tsv"</span><span class="op">)</span></span>
<span><span class="va">embeddings_metadata</span> <span class="op">&lt;-</span> <span class="fu">data.table</span><span class="fu">:::</span><span class="fu"><a href="https://rdatatable.gitlab.io/data.table/reference/fread.html">fread</a></span><span class="op">(</span><span class="st">"https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/embedding-metadata.tsv"</span><span class="op">)</span></span>
<span></span>
<span><span class="va">embeddings_scores</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/utils/read.table.html">read.csv</a></span><span class="op">(</span><span class="st">"https://raw.githubusercontent.com/CSSLab/social-dimensions/main/data/scores.csv"</span><span class="op">)</span></span></code></pre></div>
<p>Each embedding is a vector of 150 dimensions (here, columns). To record which community each vector belongs to, we can add the community information to the embeddings with:</p>
<div class="sourceCode" id="cb463"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb462"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">communities</span> <span class="op">&lt;-</span> <span class="va">embeddings_metadata</span><span class="op">$</span><span class="va">community</span></span>
<span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/colnames.html">rownames</a></span><span class="op">(</span><span class="va">embeddings</span><span class="op">)</span> <span class="op">&lt;-</span> <span class="va">communities</span></span></code></pre></div>
@@ -372,7 +373,7 @@ <h2>
<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

<div class="col-12 col-md-6 mt-3">
<p>"<strong>Computational Text Analysis</strong>" was written by Christopher Barrie. It was last built on 2023-03-27.</p>
<p>"<strong>Computational Text Analysis</strong>" was written by Marion Lieutaud. It was last built on 2024-01-02.</p>
</div>

<div class="col-12 col-md-6 mt-3">
5 changes: 3 additions & 2 deletions docs/course-overview.html
@@ -5,7 +5,7 @@
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Course Overview | Computational Text Analysis</title>
<meta name="author" content="Christopher Barrie">
<meta name="author" content="Marion Lieutaud">
<meta name="description" content="In recent years, the use of computational techniques for the quantitative analysis of text has exploded. The volume and quantity of text data to which we now have access in the digital age is...">
<meta name="generator" content="bookdown 0.33 with bs4_book()">
<meta property="og:title" content="Course Overview | Computational Text Analysis">
@@ -31,6 +31,7 @@
div.csl-bib-body { }
div.csl-entry {
clear: both;
margin-bottom: 0em;
}
.hanging div.csl-entry {
margin-left:2em;
@@ -218,7 +219,7 @@ <h3>Final assessment<a class="anchor" aria-label="anchor" href="#final-assessmen
<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

<div class="col-12 col-md-6 mt-3">
<p>"<strong>Computational Text Analysis</strong>" was written by Christopher Barrie. It was last built on 2023-03-27.</p>
<p>"<strong>Computational Text Analysis</strong>" was written by Marion Lieutaud. It was last built on 2024-01-02.</p>
</div>

<div class="col-12 col-md-6 mt-3">