<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<title>Write your own R functions, part 3</title>
<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="libs/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<script src="libs/navigation-1.1/tabsets.js"></script>
<link href="libs/highlightjs-9.12.0/default.css" rel="stylesheet" />
<script src="libs/highlightjs-9.12.0/highlight.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-68219208-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs) {
hljs.configure({languages: []});
hljs.initHighlightingOnLoad();
if (document.readyState && document.readyState === "complete") {
window.setTimeout(function() { hljs.initHighlighting(); }, 0);
}
}
</script>
<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>
<link rel="stylesheet" href="libs/local/main.css" type="text/css" />
<link rel="stylesheet" href="libs/local/nav.css" type="text/css" />
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" type="text/css" />
</head>
<body>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
</style>
<div class="container-fluid main-container">
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
background: white;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<script>
$(document).ready(function () {
window.buildTabsets("TOC");
});
$(document).ready(function () {
$('.tabset-dropdown > .nav-tabs > li').click(function () {
$(this).parent().toggleClass('nav-tabs-open')
});
});
</script>
<!-- code folding -->
<header>
<div class="nav">
<a class="nav-logo" href="index.html">
<img src="static/img/stat545-logo-s.png" width="70px" height="70px"/>
</a>
<ul>
<li class="home"><a href="index.html">Home</a></li>
<li class="faq"><a href="faq.html">FAQ</a></li>
<li class="syllabus"><a href="syllabus.html">Syllabus</a></li>
<li class="topics"><a href="topics.html">Topics</a></li>
<li class="people"><a href="people.html">People</a></li>
</ul>
</div>
</header>
<div class="fluid-row" id="header">
<h1 class="title toc-ignore">Write your own R functions, part 3</h1>
</div>
<div id="TOC">
<ul>
<li><a href="#where-were-we-where-are-we-going">Where were we? Where are we going?</a></li>
<li><a href="#load-the-gapminder-data">Load the Gapminder data</a></li>
<li><a href="#restore-our-max-minus-min-function">Restore our max minus min function</a></li>
<li><a href="#be-proactive-about-nas">Be proactive about <code>NA</code>s</a></li>
<li><a href="#the-useful-but-mysterious-...-argument">The useful but mysterious <code>...</code> argument</a></li>
<li><a href="#use-testthat-for-formal-unit-tests">Use <code>testthat</code> for formal unit tests</a></li>
<li><a href="#resources">Resources</a></li>
</ul>
</div>
<div id="where-were-we-where-are-we-going" class="section level3">
<h3>Where were we? Where are we going?</h3>
<p>In <a href="block011_write-your-own-function-02.html">part 2</a> we generalized our first R function so it could take the difference between any two quantiles of a numeric vector. We also set default values for the underlying probabilities, so that, by default, we compute the max minus the min.</p>
<p>In this part, we tackle <code>NA</code>s, the special argument <code>...</code> and formal testing.</p>
</div>
<div id="load-the-gapminder-data" class="section level3">
<h3>Load the Gapminder data</h3>
<p>As usual, load the Gapminder data.</p>
<pre class="r"><code>library(gapminder)</code></pre>
</div>
<div id="restore-our-max-minus-min-function" class="section level3">
<h3>Restore our max minus min function</h3>
<p>Let’s keep our previous function around as a baseline.</p>
<pre class="r"><code>qdiff3 <- function(x, probs = c(0, 1)) {
stopifnot(is.numeric(x))
the_quantiles <- quantile(x, probs)
return(max(the_quantiles) - min(the_quantiles))
}</code></pre>
</div>
<div id="be-proactive-about-nas" class="section level3">
<h3>Be proactive about <code>NA</code>s</h3>
<p>I am being gentle by letting you practice with the Gapminder data. In real life, missing data will make your life a living hell. If you are lucky, it will be properly indicated by the special value <code>NA</code>, but don’t hold your breath. Many built-in R functions have an <code>na.rm =</code> argument through which you can specify how you want to handle <code>NA</code>s. Typically the default is <code>na.rm = FALSE</code>, and the default behavior is either to let <code>NA</code>s propagate or to raise an error. Let’s see how <code>quantile()</code> handles <code>NA</code>s:</p>
<pre class="r"><code>z <- gapminder$lifeExp
z[3] <- NA
quantile(gapminder$lifeExp)
## 0% 25% 50% 75% 100%
## 23.5990 48.1980 60.7125 70.8455 82.6030
quantile(z)
## Error in quantile.default(z): missing values and NaN's not allowed if 'na.rm' is FALSE
quantile(z, na.rm = TRUE)
## 0% 25% 50% 75% 100%
## 23.599 48.228 60.765 70.846 82.603</code></pre>
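<p>For comparison, here is a minimal illustration of the other common default, where the <code>NA</code> simply propagates. It uses a made-up three-element vector and two other base functions that have an <code>na.rm =</code> argument, <code>mean()</code> and <code>max()</code>:</p>
<pre class="r"><code># hypothetical toy vector with one missing value
y <- c(1, NA, 3)
mean(y)                # default na.rm = FALSE: the NA propagates
## [1] NA
mean(y, na.rm = TRUE)  # drop the NA first, then average 1 and 3
## [1] 2
max(y)                 # same story for max()
## [1] NA
max(y, na.rm = TRUE)
## [1] 3</code></pre>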
<p>So <code>quantile()</code> simply will not operate in the presence of <code>NA</code>s unless <code>na.rm = TRUE</code>. How shall we modify our function?</p>
<p>If we wanted to hardwire <code>na.rm = TRUE</code>, we could. Focus on our call to <code>quantile()</code> inside our function definition.</p>
<pre class="r"><code>qdiff4 <- function(x, probs = c(0, 1)) {
stopifnot(is.numeric(x))
the_quantiles <- quantile(x, probs, na.rm = TRUE)
return(max(the_quantiles) - min(the_quantiles))
}
qdiff4(gapminder$lifeExp)
## [1] 59.004
qdiff4(z)
## [1] 59.004</code></pre>
<p>This works, but it is dangerous to invert the default behavior of a well-known built-in function and give the user no way to override it.</p>
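<p>To make the danger concrete, here is a small sketch with a made-up vector in which three of the five values are missing. Computing the quantiles the way <code>qdiff4()</code> does, with <code>na.rm = TRUE</code> hardwired, still returns a tidy answer and gives no hint that it rests on only two observations:</p>
<pre class="r"><code># hypothetical toy data: 3 of 5 values are missing
w <- c(1, NA, NA, NA, 10)
# the same computation qdiff4() performs internally
q <- quantile(w, probs = c(0, 1), na.rm = TRUE)
max(q) - min(q)   # no error, no warning
## [1] 9
sum(!is.na(w))    # number of values the answer is actually based on
## [1] 2</code></pre>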
<p>We could add an <code>na.rm =</code> argument to our own function. We might even keep our preferred default of <code>na.rm = TRUE</code>, but at least we’re giving the user a way to control the behavior around <code>NA</code>s.</p>
<pre class="r"><code>qdiff5 <- function(x, probs = c(0, 1), na.rm = TRUE) {
stopifnot(is.numeric(x))
the_quantiles <- quantile(x, probs, na.rm = na.rm)
return(max(the_quantiles) - min(the_quantiles))
}
qdiff5(gapminder$lifeExp)
## [1] 59.004
qdiff5(z)
## [1] 59.004
qdiff5(z, na.rm = FALSE)
## Error in quantile.default(x, probs, na.rm = na.rm): missing values and NaN's not allowed if 'na.rm' is FALSE</code></pre>
</div>
<div id="the-useful-but-mysterious-...-argument" class="section level3">
<h3>The useful but mysterious <code>...</code> argument</h3>
<p>You probably could have lived a long and happy life without knowing there are at least 9 different algorithms for computing quantiles. <a href="http://www.rdocumentation.org/packages/stats/functions/quantile">Go read about the <code>type</code> argument</a> of <code>quantile()</code>. TL;DR: If a quantile does not fall exactly on an observed data point, you must somehow derive it from the neighboring data points, either by picking one of them or by taking a weighted average of them. The algorithms differ in how they make that choice and how they set the weights, and <code>type =</code> selects among them.</p>
<p>Let’s say we want to give the user of our function the ability to specify how the quantiles are computed, but we want to accomplish it with as little fuss as possible. In fact, we don’t even want to clutter our function’s interface with this! This calls for the very special <code>...</code> argument. In English, this set of three dots is frequently called an “ellipsis”.</p>
<pre class="r"><code>qdiff6 <- function(x, probs = c(0, 1), na.rm = TRUE, ...) {
the_quantiles <- quantile(x = x, probs = probs, na.rm = na.rm, ...)
return(max(the_quantiles) - min(the_quantiles))
}</code></pre>
<p>With a dataset the size of Gapminder, the choice of <code>type =</code> makes virtually no practical difference, so we can’t use it for a demo. Thanks to <a href="https://twitter.com/wrathematics">@wrathematics</a>, here’s a small example where we can (barely) detect a difference due to <code>type</code>.</p>
<pre class="r"><code>set.seed(1234)
z <- rnorm(10)
quantile(z, type = 1)
## 0% 25% 50% 75% 100%
## -2.3456977 -0.8900378 -0.5644520 0.4291247 1.0844412
quantile(z, type = 4)
## 0% 25% 50% 75% 100%
## -2.345698 -1.048552 -0.564452 0.353277 1.084441
all.equal(quantile(z, type = 1), quantile(z, type = 4))
## [1] "Mean relative difference: 0.1776594"</code></pre>
<p>Now we can call our function, requesting that quantiles be computed in different ways.</p>
<pre class="r"><code>qdiff6(z, probs = c(0.25, 0.75), type = 1)
## [1] 1.319163
qdiff6(z, probs = c(0.25, 0.75), type = 4)
## [1] 1.401829</code></pre>
<p>While the difference may be subtle, <strong>it’s there</strong>. Marvel at the fact that we have passed <code>type = 1</code> through to <code>quantile()</code> <em>even though it was not a formal argument of our own function</em>.</p>
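<p>If you are curious how much the choice of algorithm matters for this particular <code>z</code>, here is a sketch that computes the interquartile range under every one of the nine <code>type</code> values. It calls <code>quantile()</code> directly rather than <code>qdiff6()</code>, and regenerates <code>z</code> with the same seed, so it runs on its own:</p>
<pre class="r"><code>set.seed(1234)
z <- rnorm(10)
# interquartile range (75% quantile minus 25% quantile) for each algorithm
iqr_by_type <- sapply(1:9, function(ty) {
  q <- quantile(z, probs = c(0.25, 0.75), type = ty)
  unname(q[2] - q[1])
})
names(iqr_by_type) <- paste0("type", 1:9)
round(iqr_by_type, 4)</code></pre>
<p>The <code>type1</code> and <code>type4</code> entries should match the <code>qdiff6()</code> results above, up to rounding.</p>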
<p>The special argument <code>...</code> is very useful when you want the ability to pass arbitrary arguments down to another function, but without constantly expanding the formal arguments to your function. This leaves you with a less cluttered function definition and gives you future flexibility to specify these arguments only when you need to.</p>
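<p>Here is a minimal sketch of the mechanics, using two hypothetical helpers. <code>show_dots()</code> reveals which extra arguments landed in <code>...</code> (via <code>list(...)</code>), and <code>my_median()</code> forwards them, untouched, to <code>quantile()</code>:</p>
<pre class="r"><code># hypothetical helper: report the names of whatever extra arguments arrive in ...
show_dots <- function(x, ...) {
  names(list(...))   # list(...) collects the extra arguments into a named list
}
show_dots(1:3, type = 1, names = FALSE)   # both extras end up in ...

# hypothetical wrapper: pass ... straight through to quantile()
my_median <- function(x, ...) {
  quantile(x, probs = 0.5, names = FALSE, ...)
}
my_median(c(1, NA, 3), na.rm = TRUE)      # na.rm = TRUE reaches quantile()
## [1] 2</code></pre>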
<p>You will also encounter the <code>...</code> argument in many built-in functions – read up <a href="http://www.rdocumentation.org/packages/base/functions/c">on <code>c()</code></a> or <a href="http://www.rdocumentation.org/packages/base/functions/list"><code>list()</code></a> – and now you have a better sense of what it means. It is not a breezy “and so on and so forth.”</p>
<p>There are also downsides to <code>...</code>, so use it with intention. In a package, you will have to work harder to create truly informative documentation for your user. Also, because <code>...</code> quietly absorbs whatever it is given, it can silently swallow a named argument when the user makes a typo in the name. Depending on whether and how this fails, it can be a little tricky to find out what went wrong.</p>
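<p>A sketch of how a typo can vanish, using hypothetical functions that have nothing to do with quantiles. Because <code>scale_by()</code> has its own <code>...</code>, a misspelled <code>factor</code> is absorbed without complaint and the default is used instead. One possible guard, shown in the hypothetical <code>scale_by_strict()</code>, is to raise an error whenever <code>...</code> is non-empty:</p>
<pre class="r"><code># hypothetical functions, for illustration only
scale_by <- function(x, factor = 1, ...) {
  x * factor             # anything else in ... is silently ignored
}
rescale <- function(x, ...) {
  scale_by(x, ...)       # pass everything through
}
rescale(2, factor = 10)
## [1] 20
rescale(2, fatcor = 10)  # typo: no error, no warning, default factor = 1 is used
## [1] 2

# one possible guard: refuse any argument that lands in ...
scale_by_strict <- function(x, factor = 1, ...) {
  extras <- list(...)
  if (length(extras) > 0) {
    stop("unexpected argument(s): ", paste(names(extras), collapse = ", "))
  }
  x * factor
}
try(scale_by_strict(2, fatcor = 10))  # now the typo raises an informative error</code></pre>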
</div>
<div id="use-testthat-for-formal-unit-tests" class="section level3">
<h3>Use <code>testthat</code> for formal unit tests</h3>
<p>Until now, we’ve relied on informal tests of our evolving function. If you are going to use a function a lot, especially if it is part of a package, it is wise to use formal unit tests.</p>
<p>The <a href="https://github.com/hadley/testthat"><code>testthat</code> package</a> provides excellent facilities for this, with a distinct emphasis on automated unit testing of entire packages. However, we can take it out for a test drive even with our one measly function.</p>
<p>We construct a test with <code>test_that()</code> and, within it, put one or more <em>expectations</em> that check actual against expected results. In effect, you harden your informal, interactive tests into formal unit tests. Here are some examples of tests and indicative expectations.</p>
<pre class="r"><code>library(testthat)
test_that('invalid args are detected', {
expect_error(qdiff6("eggplants are purple"))
expect_error(qdiff6(iris))
})
test_that('NA handling works', {
expect_error(qdiff6(c(1:5, NA), na.rm = FALSE))
expect_equal(qdiff6(c(1:5, NA)), 4)
})</code></pre>
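<p>No news is good news! Let’s see what test failure would look like. Let’s revert to a version of our function that does no <code>NA</code> handling, then test for proper <code>NA</code> handling and watch it fail. (This example uses the older <code>expect_that(object, equals(expected))</code> style; current <code>testthat</code> documentation recommends calling expectation functions such as <code>expect_equal(object, expected)</code> directly.)</p>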
<pre class="r"><code>qdiff_no_NA <- function(x, probs = c(0, 1)) {
the_quantiles <- quantile(x = x, probs = probs)
return(max(the_quantiles) - min(the_quantiles))
}
test_that('NA handling works', {
expect_that(qdiff_no_NA(c(1:5, NA)), equals(4))
})
## Error: Test failed: 'NA handling works'
## * missing values and NaN's not allowed if 'na.rm' is FALSE
## 1: expect_that(qdiff_no_NA(c(1:5, NA)), equals(4)) at <text>:7
## 2: condition(object)
## 3: expect_equal(x, expected, ..., expected.label = label)
## 4: quasi_label(enquo(object), label)
## 5: eval_bare(get_expr(quo), get_env(quo))
## 6: qdiff_no_NA(c(1:5, NA))
## 7: quantile(x = x, probs = probs) at <text>:2
## 8: quantile.default(x = x, probs = probs)
## 9: stop("missing values and NaN's not allowed if 'na.rm' is FALSE")</code></pre>
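<p>For contrast, here is a sketch of a few more tests for the <code>NA</code>-aware function, written with <code>expect_equal()</code> and <code>expect_error()</code>. It assumes a reasonably recent version of <code>testthat</code>, redefines <code>qdiff6()</code> (with the <code>is.numeric()</code> check) so it runs on its own, and adds a test confirming that <code>type</code> really does travel through <code>...</code> to <code>quantile()</code>. The small vector <code>x</code> is made up for illustration; for it, the default <code>type = 7</code> gives an interquartile range of 5.75 and <code>type = 1</code> gives 6.</p>
<pre class="r"><code>library(testthat)
# redefined here so this block runs on its own
qdiff6 <- function(x, probs = c(0, 1), na.rm = TRUE, ...) {
  stopifnot(is.numeric(x))
  the_quantiles <- quantile(x = x, probs = probs, na.rm = na.rm, ...)
  return(max(the_quantiles) - min(the_quantiles))
}
test_that('NA handling works', {
  expect_equal(qdiff6(c(1:5, NA)), 4)
  expect_error(qdiff6(c(1:5, NA), na.rm = FALSE))
})
test_that('extra arguments reach quantile()', {
  x <- c(2, 7, 1, 8, 2, 8)  # hypothetical toy data
  # with the default type = 7, this should agree with base R's IQR()
  expect_equal(qdiff6(x, probs = c(0.25, 0.75)), IQR(x))
  expect_equal(qdiff6(x, probs = c(0.25, 0.75)), 5.75)
  # type = 1 is forwarded via ... and changes the answer
  expect_equal(qdiff6(x, probs = c(0.25, 0.75), type = 1), 6)
})</code></pre>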
<p>Similar to the advice to use assertions in data analysis scripts, I recommend you use unit tests to monitor the behavior of functions you (or others) will use often. If your tests cover the function’s important behavior, then you can edit the internals freely. You’ll rest easy in the knowledge that, if you broke anything important, the tests will fail and alert you to the problem. A function that is important enough for unit tests probably also belongs in a package, where there are obvious mechanisms for running the tests as part of overall package checks.</p>
<!--
### other content
match.arg()
defaulting to NULL then checking is.null() and take it from there
-->
</div>
<div id="resources" class="section level3">
<h3>Resources</h3>
<p>Hadley Wickham’s book <a href="http://adv-r.had.co.nz">Advanced R</a></p>
<ul>
<li>Section on <a href="http://adv-r.had.co.nz/Functions.html#function-arguments">function arguments</a></li>
</ul>
<p>Unit testing with <code>testthat</code>:</p>
<ul>
<li>On <a href="https://cran.r-project.org/web/packages/testthat/index.html">CRAN</a>, development on <a href="https://github.com/hadley/testthat">GitHub</a></li>
</ul>
<p>Hadley Wickham’s <a href="http://r-pkgs.had.co.nz">R packages</a> book</p>
<ul>
<li><a href="http://r-pkgs.had.co.nz/tests.html">Testing chapter</a></li>
</ul>
<p>Article <a href="https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf">testthat: Get Started with Testing</a> in The R Journal Vol. 3/1, June 2011. Maybe this is completely superseded by the newer chapter above? Be aware that parts could be out of date, but I recall it was a helpful read.</p>
</div>
<div class="footer">
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>
</div>
<script>
// add bootstrap table styles to pandoc tables
function bootstrapStylePandocTables() {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
}
$(document).ready(function () {
bootstrapStylePandocTables();
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>