-
Notifications
You must be signed in to change notification settings - Fork 50
/
12_ggtree_utilities.Rmd
476 lines (319 loc) · 20 KB
/
12_ggtree_utilities.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
# (PART\*) Part IV: Miscellaneous topics {-}
\newpage
# ggtree Utilities {#chapter12}
## Facet Utilities {#facet-utils}
### facet_widths {#facet_widths}
Adjusting relative widths of facet panels is a common requirement, especially for using `geom_facet()`\index{geom\textunderscore facet} to visualize a tree with associated data. However, this is not supported by the `r CRANpkg("ggplot2")` package. To address this issue, `r Biocpkg("ggtree")` provides the `facet_widths()` function and it works with both `ggtree` and `ggplot` objects.
```{r eval=F}
library(ggplot2)
library(ggtree)
library(reshape2)
set.seed(123)
tree <- rtree(30)
p <- ggtree(tree, branch.length = "none") +
geom_tiplab() + theme(legend.position='none')
a <- runif(30, 0,1)
b <- 1 - a
df <- data.frame(tree$tip.label, a, b)
df <- melt(df, id = "tree.tip.label")
p2 <- p + geom_facet(panel = 'bar', data = df, geom = geom_bar,
mapping = aes(x = value, fill = as.factor(variable)),
orientation = 'y', width = 0.8, stat='identity') +
xlim_tree(9)
facet_widths(p2, widths = c(1, 2))
```
It also supports using a name vector to set the widths of specific panels. The following code will display an identical figure to Figure \@ref(fig:facetWidth)A.
```r
facet_widths(p2, c(Tree = .5))
```
The `facet_widths()` function also works with other `ggplot` objects as demonstrated in Figure \@ref(fig:facetWidth)B.
```{r eval=FALSE}
p <- ggplot(iris, aes(Sepal.Width, Petal.Length)) +
geom_point() + facet_grid(.~Species)
facet_widths(p, c(setosa = .5))
```
(ref:facetWidthscap) Adjust relative widths of ggplot facets.
(ref:facetWidthcap) **Adjust relative widths of ggplot facets.** The `facet_widths()` function works with `ggtree` (A) as well as `ggplot` (B).
```{r facetWidth, echo=F, fig.width=6, fig.height=7, fig.scap="(ref:facetWidthscap)", fig.cap="(ref:facetWidthcap)"}
library(ggplot2)
library(ggtree)
library(reshape2)
set.seed(123)
tree <- rtree(30)
p <- ggtree(tree, branch.length = "none") +
geom_tiplab(size=3) + theme(legend.position='none')
a <- runif(30, 0,1)
b <- 1 - a
df <- data.frame(tree$tip.label, a, b)
df <- melt(df, id = "tree.tip.label")
p2 <- p + geom_facet(panel = 'bar', data = df, geom = geom_bar,
mapping = aes(x = value, fill = as.factor(variable)),
orientation = 'y', width = 0.8, stat='identity') +
xlim_tree(9)
pp = facet_widths(p2, widths = c(1, 2))
g <- ggplot(iris, aes(Sepal.Width, Petal.Length)) +
geom_point() + facet_grid(.~Species)
gg = facet_widths(g, c(setosa = .5))
#plot_grid(plot_grid(ggdraw(), pp, rel_widths=c(.04, 1)),
# gg, ncol=1, labels = LETTERS[1:2], rel_heights=c(1.5, 1))
plot_list(pp, gg, ncol=1, tag_levels='A', heights=c(1.5, 1))
```
### facet_labeller {#facet_labeller}
The `facet_labeller()` function was designed to relabel selected panels (Figure \@ref(fig:facetLab)), and it currently only works with `ggtree` objects (*i.e.*, `geom_facet()` outputs). A more versatile version that works with both `ggtree` and `ggplot` objects is implemented in the `r CRANpkg("ggfun")` package (*i.e.*, the `facet_set()` function).
```{r eval=F}
facet_labeller(p2, c(Tree = "phylogeny", bar = "HELLO"))
```
If you want to combine `facet_widths()` with `facet_labeller()`, you need to call `facet_labeller()` to relabel the panels before using `facet_widths()` to set the relative widths of each panel. Otherwise, it won't work since the output of `facet_widths()` is redrawn from `grid` object.
```{r eval=F}
facet_labeller(p2, c(Tree = "phylogeny")) %>% facet_widths(c(Tree = .4))
```
(ref:facetLabscap) Rename facet labels.
(ref:facetLabcap) **Rename facet labels.** Rename multiple labels simultaneously (A) or only for a specific one (B) are all supported. `facet_labeller()` can combine with `facet_widths()` to rename facet label and then adjust relative widths (B).
```{r facetLab, echo=FALSE,fig.width=6, fig.height=7, fig.scap="(ref:facetLabscap)", fig.cap="(ref:facetLabcap)"}
pg1 <- facet_labeller(p2, c(Tree = "phylogeny", bar = "HELLO"))
pg2 <- facet_labeller(p2, c(Tree = "phylogeny")) %>% facet_widths(c(Tree = .4))
#plot_grid(plot_grid(ggdraw(), pg1, rel_widths=c(.04, 1)),
# plot_grid(ggdraw(), pg2, rel_widths=c(.04, 1)),
# ncol=1, labels = c("A", "B"))
plot_list(pg1, pg2, ncol=1, tag_levels='A')
```
## Geometric Layers {#geom2}
Subsetting is not supported in layers defined in `r CRANpkg("ggplot2")`, while it is quite useful in phylogenetic annotation since it allows us to annotate at specific node(s) (e.g., only label bootstrap values that are larger than 75)\index{grammar of graphics}.
In `r Biocpkg("ggtree")`, we provide several modified versions of layers defined in `r CRANpkg("ggplot2")` to support the `subset` aesthetic mapping, including:
+ `geom_segment2()`
+ `geom_point2()`
+ `geom_text2()`
+ `geom_label2()`
These layers works with both `r Biocpkg("ggtree")` and `r CRANpkg("ggplot2")` (Figure \@ref(fig:layer2)).
(ref:layer2scap) Geometric layers that support subsetting.
(ref:layer2cap) **Geometric layers that support subsetting.** These layers work with `ggplot2` (A) and `ggtree` (B).
```{r layer2, fig.width=11, fig.height=5, fig.cap="(ref:layer2cap)", fig.scap="(ref:layer2scap)", out.width="100%"}
library(ggplot2)
library(ggtree)
data(mpg)
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_text2(aes(label=manufacturer,
subset = hwy > 40 | displ > 6.5),
nudge_y = 1) +
coord_cartesian(clip = "off") +
theme_light() +
theme(legend.position = c(.85, .75))
p2 <- ggtree(rtree(10)) +
geom_label2(aes(subset = node <5, label = label))
plot_list(p, p2, ncol=2, tag_levels='A')
```
## Layout Utilities
In [session 4.2](#tree-layouts), we introduce several layouts supported by `r Biocpkg("ggtree")`. The `r Biocpkg("ggtree")` package also provides several layout functions that can transform from one to another. Note that not all layouts are supported (see Table \@ref(tab:layoutLayerTab) and Figure \@ref(fig:layoutLayer))\index{tree layout}.
```{r layoutLayerTab, echo=FALSE}
layout.df = tibble::tribble(~Layout, ~Description,
"layout_circular", "transform rectangular layout to circular layout",
"layout_dendrogram", "transform rectangular layout to dendrogram layout",
"layout_fan", "transform rectangular/circular layout to fan layout",
"layout_rectangular", "transform circular/fan layout to rectangular layout",
"layout_inward_circular", "transform rectangular/circular layout to inward_circular layout"
)
knitr::kable(layout.df, caption = "Layout transformers.", booktabs = T)
```
```{r eval=FALSE}
set.seed(2019)
x <- rtree(20)
p <- ggtree(x)
p + layout_dendrogram()
ggtree(x, layout = "circular") + layout_rectangular()
p + layout_circular()
p + layout_fan(angle=90)
p + layout_inward_circular(xlim=4) + geom_tiplab(hjust=1)
```
(ref:layoutLayerscap) Layout functions for transforming among different layouts.
(ref:layoutLayercap) **Layout functions for transforming among different layouts**. Default rectangular layout (A); transform rectangular to dendrogram layout (B); transform circular to rectangular layout (C); transform rectangular to circular layout (D); transform rectangular to fan layout (E); transform rectangular to inward circular layout (F).
```{r layoutLayer, echo=FALSE, fig.width=10.8, fig.height=7.5, message=FALSE, fig.cap="(ref:layoutLayercap)", fig.scap="(ref:layoutLayerscap)", out.width='100%'}
set.seed(2019)
x <- rtree(20)
p <- ggtree(x)
if (FALSE) {
pp1 <- cowplot::plot_grid(
p,
p + layout_dendrogram(),
p + layout_circular() + layout_rectangular(),
ncol=3, labels = LETTERS[1:3])
require(ggplotify)
pp2 <- cowplot::plot_grid(
as.ggplot(p + layout_circular(), scale=1.2, hjust=-.1),
as.ggplot(p + layout_fan(angle=90), scale=1.2),
as.ggplot(p + layout_inward_circular(xlim=4) + geom_tiplab(hjust=1), scale=1.2, hjust=.1),
ncol=3, labels = LETTERS[4:6])
cowplot::plot_grid(pp1, pp2, ncol=1, rel_heights=c(2, 3))
}
plot_list(p,
p + layout_dendrogram(),
p + layout_circular() + layout_rectangular(),
as.ggplot(p + layout_circular(), scale=1.2, hjust=-.1),
as.ggplot(p + layout_fan(angle=90), scale=1.2),
as.ggplot(p + layout_inward_circular(xlim=4) + geom_tiplab(hjust=1), scale=1.2, hjust=.1),
ncol=3, heights=c(2, 3), tag_levels='A')
```
## Scale Utilities
The `r Biocpkg("ggtree")` package provides several scale functions to manipulate the *x*-axis, including the `scale_x_range()` documented in [session 5.2.4](#uncertainty-of-evolutionary-inference), `xlim_tree()`, `xlim_expand()`, `ggexpand()`, `hexpand()` and `vexpand()`.
### Expand x limit for a specific facet panel {#xlim_expand}
Sometimes we need to set `xlim` for a specific facet panel (*e.g.*, allocate more space for [long tip labels](#faq-label-truncated) at `Tree` panel). However, the `ggplot2::xlim()` function applies to all the panels. The `r Biocpkg("ggtree")` provides `xlim_expand()` to adjust `xlim` for user-specific facet panel. It accepts two parameters, `xlim`, and `panel`, and can adjust all individual panels as demonstrated in Figure \@ref(fig:xlimExpand)A. If you only want to adjust `xlim` of the `Tree` panel, you can use `xlim_tree()` as a shortcut.
```{r eval=FALSE}
set.seed(2019-05-02)
x <- rtree(30)
p <- ggtree(x) + geom_tiplab()
d <- data.frame(label = x$tip.label,
value = rnorm(30))
p2 <- p + geom_facet(panel = "Dot", data = d,
geom = geom_point, mapping = aes(x = value))
p2 + xlim_tree(6) + xlim_expand(c(-10, 10), 'Dot')
```
The `xlim_expand()` function also works with `ggplot2::facet_grid()`. As demonstrated in Figure \@ref(fig:xlimExpand)B, only the `xlim` of *virginica* panel was adjusted by `xlim_expand()`.
```{r eval=FALSE}
g <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point() + facet_grid(. ~ Species, scales = "free_x")
g + xlim_expand(c(0, 15), 'virginica')
```
(ref:xlimExpandscap) Setting xlim for user-specific facet panels.
(ref:xlimExpandcap) **Setting xlim for user-specific facet panels.** Using `xlim_tree()` to set the Tree panel of the `ggtree` output (A) and `xlim_expand()` to set the Dot panel of the `ggtree` output (A) and the Virginica panel of the `ggplot` output (B).
```{r xlimExpand, echo=FALSE, fig.cap="(ref:xlimExpandcap)", fig.scap="(ref:xlimExpandscap)", fig.width=12, fig.height = 5, out.width='100%'}
set.seed(2019-05-02)
x <- rtree(30)
p <- ggtree(x) + geom_tiplab()
d <- data.frame(label = x$tip.label,
value = rnorm(30))
p2 <- p + geom_facet(panel = "Dot", data = d,
geom = geom_point, mapping = aes(x = value))
p2 <- p2 + xlim_expand(c(0, 6), 'Tree') + xlim_expand(c(-10, 10), 'Dot')
g <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point() + facet_grid(. ~ Species, scales = "free_x")
#plot_grid(plot_grid(ggdraw(), p2, rel_widths=c(.04, 1)),
# g + theme_grey() + xlim_expand(c(0, 15), 'virginica'),
# ncol=2, labels=c("A", "B"))
plot_list(p2, g + theme_grey() + xlim_expand(c(0, 15), 'virginica'),
ncol=2, tag_levels = 'A')
```
### Expand plot limit by the ratio of plot range {#ggexpand}
The `r CRANpkg("ggplot2")` package cannot automatically adjust plot limits and it is very common that long text was truncated. Users need to adjust x (y) limits manually via the `xlim()` (`ylim()`) command (see also [FAQ: Tip label truncated](#faq-label-truncated)).
The `xlim()` (`ylim()`) is a good solution to this issue. However, we can make it more simple, by expanding the plot panel by a ratio of the axis range without knowing what the exact value is.
We provide `hexpand()` function to expand x limit by specifying a fraction of the x range and it works for both directions (`direction=1` for right-hand side and `direction=-1` for left-hand side) (Figure \@ref(fig:hexpand)). Another version of `vexpand()` works with similar behavior for *y*-axis and the `ggexpand()` function works for both *x*- and *y*-axis (Figure \@ref(fig:phylonetworx)).
(ref:hexpandscap) Expanding plot limits by a fraction of x or y range.
(ref:hexpandcap) **Expanding plot limits by a fraction of the x or y range.** Expand x limit at right-hand side by default (A), and expand x limit for left-hand side when direction = -1 and expand y limit at the upper side (B).
```{r hexpand,fig.cap="(ref:hexpandcap)", fig.scap="(ref:hexpandscap)", fig.width=12, fig.height = 5, out.width='100%'}
x$tip.label <- paste0('to make the label longer_', x$tip.label)
p1 <- ggtree(x) + geom_tiplab() + hexpand(.4)
p2 <- ggplot(iris, aes(Sepal.Width, Petal.Width)) +
geom_point() +
hexpand(.2, direction = -1) +
vexpand(.2)
plot_list(p1, p2, tag_levels="A", widths=c(.6, .4))
```
## Tree data utilities
### Filter tree data {#td_filter}
The `r Biocpkg("ggtree")` package defined [several geom layers](#geom2) that support subsetting tree data. However, many other geom layers that didn't provide this feature, are defined in `r CRANpkg("ggplot2")` and its extensions. To allow filtering tree data with these layers, `r Biocpkg("ggtree")` provides an accompanying function, `td_filter()` that returns a function that works similar to `dplyr::filter()` and can be passed to the `data` parameter in geom layers to filter `ggtree` plot data as demonstrated in Figure \@ref(fig:tdFilter).
(ref:tdfilterscap) Filtering ggtree plot data in geom layers.
(ref:tdfiltercap) **Filtering ggtree plot data in geom layers.** Only selected tips (offspring of the node indicated by the blue circle point) were labeled.
```{r tdFilter,fig.cap="(ref:tdfiltercap)", fig.scap="(ref:tdfilterscap)", fig.width=7, fig.height = 5}
library(tidytree)
set.seed(1997)
tree <- rtree(50)
p <- ggtree(tree)
selected_nodes <- offspring(p, 67)$node
p + geom_text(aes(label=label),
data=td_filter(isTip &
node %in% selected_nodes),
hjust=0) +
geom_nodepoint(aes(subset = node ==67),
size=5, color='blue')
```
### Flatten list-column tree data {#td_unnest}
The `ggtree` plot data is a tidy data frame where each row represents a unique node. If multiple values are associated with a node, the data can be stored as nested data (i.e., in a list-column).
```{r}
set.seed(1997)
tr <- rtree(5)
d <- data.frame(id=rep(tr$tip.label,2),
value=abs(rnorm(10, 6, 2)),
group=c(rep("A", 5),rep("B",5)))
require(tidyr)
d2 <- nest(d, value =value, group=group)
## d2 is a nested data
d2
```
Nested data is supported by the operator, `%<+%`, and can be mapped to the tree structure. If a geom layer can't directly support visualizing nested data, we need to flatten the data before applying the geom layer to display it. The `r Biocpkg("ggtree")` package provides a function, `td_unnest()`, which returns a function that works similar to `tidyr::unnest()` and can be used to flatten `ggtree` plot data as demonstrated in Figure \@ref(fig:tdUnnest)A.
All tree data utilities provide a `.f` parameter to pass a function to pre-operate the data. This creates the possibility to combine different tree data utilities as demonstrated in Figure \@ref(fig:tdUnnest)B.
(ref:tdunnestscap) Flattening ggtree plot data.
(ref:tdunnestcap) **Flattening ggtree plot data.** List-columns can be flattened by `td_unnest()` and two circle points were displayed on each tip simultaneously (A). Different tree data utilities can be combined to work together, e.g., filter data by `td_filter()`, and then flatten it by `td_unnest()`) (B).
```{r tdUnnest,fig.cap="(ref:tdunnestcap)", fig.scap="(ref:tdunnestscap)", fig.width=8, fig.height = 5, out.width='100%'}
p <- ggtree(tr) %<+% d2
p2 <- p +
geom_point(aes(x, y, size= value, colour=group),
data = td_unnest(c(value, group)), alpha=.4) +
scale_size(range=c(3,10), limits=c(3, 10))
p3 <- p +
geom_point(aes(x, y, size= value, colour=group),
data = td_unnest(c(value, group),
.f = td_filter(isTip & node==4)),
alpha=.4) +
scale_size(range=c(3,10), limits=c(3, 10))
plot_list(p2, p3, tag_levels = 'A')
```
## Tree Utilities
### Extract tip order {#tiporder}
To create [composite plots](#composite_plot), users need to re-order their data manually before creating tree-associated graphs. The order of their data should be consistent with the tip order presented in the `ggtree()` plot. For this purpose, we provide the `get_taxa_name()` function to extract an ordered vector of tips based on the tree structure plotted by `ggtree()`.
(ref:tiporderscap) An example tree for demonstraing `get_taxa_name()` function.
(ref:tipordercap) **An example tree for demonstrating `get_taxa_name()` function.**
```{r tiporder,fig.cap="(ref:tipordercap)", fig.scap="(ref:tiporderscap)", fig.width=5, fig.height = 5}
set.seed(123)
tree <- rtree(10)
p <- ggtree(tree) + geom_tiplab() +
geom_hilight(node = 12, extendto = 2.5)
x <- paste("Taxa order:",
paste0(get_taxa_name(p), collapse=', '))
p + labs(title=x)
```
The `get_taxa_name()` function will return a vector of ordered tip labels according to the tree structure displayed in Figure \@ref(fig:tiporder).
```{r}
get_taxa_name(p)
```
If users specify a node, the `get_taxa_name()` will extract the tip order of the selected clade (i.e., highlighted region in Figure \@ref(fig:tiporder)).
```{r}
get_taxa_name(p, node = 12)
```
### Padding taxa labels
The `label_pad()` function adds padding characters (default is `·`) to taxa labels.
```{r}
set.seed(2015-12-21)
tree <- rtree(5)
tree$tip.label[2] <- "long string for test"
d <- data.frame(label = tree$tip.label,
newlabel = label_pad(tree$tip.label),
newlabel2 = label_pad(tree$tip.label, pad = " "))
print(d)
```
This feature is useful if we want to align tip labels to the end as demonstrated in Figure \@ref(fig:labelpad). Note that in this case, monospace font should be used to ensure the lengths of the labels displayed in the plot are the same.
(ref:labelpadscap) Align tip label to the end.
(ref:labelpadcap) **Align tip label to the end.** With a dotted line (A) and without a dotted line (B).
```{r labelpad, fig.width=6.9, fig.height=3.4, fig.cap="(ref:labelpadcap)", fig.scap="(ref:labelpadscap)", out.width='100%'}
p <- ggtree(tree) %<+% d + xlim(NA, 5)
p1 <- p + geom_tiplab(aes(label=newlabel),
align=TRUE, family='mono',
linetype = "dotted", linesize = .7)
p2 <- p + geom_tiplab(aes(label=newlabel2),
align=TRUE, family='mono',
linetype = NULL, offset=-.5) + xlim(NA, 5)
plot_list(p1, p2, ncol=2, tag_levels = "A")
```
## Interactive ggtree Annotation {#identify}
```{r echo=FALSE}
link = ifelse (knitr::is_latex_output(), "https://twitter.com/drandersgs/status/965996335882059776", "#plotly")
plotly_ggtree_link <- paste0("an [interactive phylogenetic tree](", link, ")")
```
The `r Biocpkg("ggtree")` package supports interactive tree annotation or manipulation by implementing an `identify()` method. Users can click on a node to highlight a clade, to label or rotate it, *etc*. Users can also use the `r CRANpkg("plotly")` package to convert a `ggtree` object to a `plotly` object to quickly create
```{r results='asis', echo=FALSE}
cat(plotly_ggtree_link, '.', sep='')
```
```{r, eval = !knitr::is_latex_output(), child="ggtree-identity.Rmd"}
```
Video of using `identify()` to interactively manipulate a phylogenetic tree can be found on Youtube `r icons::icon_style(icons::fontawesome("youtube"), fill='red')` and Youku:
+ Highlighting clades: [Youtube `r icons::icon_style(icons::fontawesome("youtube"), fill='red')`](https://youtu.be/KcF8Ec38mzI) and [Youku](http://v.youku.com/v_show/id_XMTYyMzgyODYyOA).
+ Labelling clades: [Youtube `r icons::icon_style(icons::fontawesome("youtube"), fill='red')`](https://youtu.be/SmcceRD_jxg) and [Youku](http://v.youku.com/v_show/id_XMTYyNDIzODA0NA).
+ Rotating clades: [Youtube `r icons::icon_style(icons::fontawesome("youtube"), fill='red')`](https://youtu.be/lKNn4QlPO0E) and [Youku](http://v.youku.com/v_show/id_XMTYyMzgyODg2OA).