-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBOF_2024_output.Rmd
291 lines (212 loc) · 12.8 KB
/
BOF_2024_output.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
---
title: "Crosswind LAM Results"
author: "Matt Tyers"
date: "2024-09-05"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, dpi=300, fig.height = 7, fig.width = 10, message=FALSE, warning=FALSE)
```
## Estimating Mean Weight
Since fish sampling occurred during two events within different seasons and mimicking sport fishing under different regulations, size selectivity was observed to differ substantially between sampling events. Because of this, a temporally stratified estimator was used to estimate mean weight for lake trout in Crosswind lake, given below, in which $b_{March}$ and $b_{June}$ represent the weights given to the March and June sampling events. These were treated as equal in the analysis below. The stratified estimators can be expressed as the following, for observations $i \in 1 ... n_k$ within season k:
$$\bar{W} = \frac{(b_{March} \times \bar{W}_{March}) + (b_{June} \times \bar{W}_{June})}{b_{March} + b_{June}}$$
and
$$\hat{V}(\bar{W}) = \frac{\left(b_{March}^2 \times \hat{V}(\bar{W}_{March})\right) + \left(b_{June}^2 \times \hat{V}(\bar{W}_{June})\right)}{(b_{March} + b_{June})^2}$$
in which
$$\bar{W}_k = \frac{\sum_{i=1}^{n_k}W_{k[i]}}{n_k}$$
and
$$\hat{V}(\bar{W}_k) = \frac{\sum_{i=1}^{n_k}(W_{k[i]}-\bar{W}_k)^2}{n_k(n_k-1)}$$
```{r}
library(tidyverse)
## per Corey email 9/4/2024
legal_nobait <- read_csv("BOF_2024/flat_data/legal_nobait.csv")
# ## the data I was sent before
# march_data <- read_csv("Analysis_2024/flat_data/March_Catch_Data.csv")[,-(14:21)]
# june_data <- read_csv("Analysis_2024/flat_data/June_Catch_Data.csv")
#
# all_data <- data.frame(event = c(rep("March", nrow(march_data)),
# rep("June", nrow(june_data))),
# FL = c(march_data$FL, june_data$FL),
# TL = c(march_data$FL, june_data$TL),
# weight_kg = c(march_data$kg, june_data$Kg),
# bait = c(march_data$Bait, june_data$Bait),
# gear = c(march_data$Gear, rep(NA, nrow(june_data))),
# technique = c(march_data$Technique, rep(NA, nrow(june_data))))
#
# ## reconstructing legal_nobait from previous data - does it match?
# legal_nobait1 <- filter(all_data, (event == "March") | (event == "June" & bait == "N"))
#
# (sort(legal_nobait$kg) == sort(legal_nobait1$weight_kg)) # %>% all
# ## it does match!
########### LAM inputs
area <- 3717
log10_area <- log10(area)
log10_YP <- 0.72*log10_area + 0.6
YP_kg <- 10^log10_YP
########### estimating mean weight of harvest
season_weights <- c(0.5, 0.5) # summer, then winter
# by season
season_Wbar <- tapply(legal_nobait$kg, legal_nobait$Season, mean)
season_n <- tapply(legal_nobait$kg, legal_nobait$Season, length)
season_varWbar <- tapply(legal_nobait$kg, legal_nobait$Season, var)/season_n
season_seWbar <- sqrt(season_varWbar)
# combined
Wbar <- sum(season_weights*season_Wbar)/sum(season_weights)
varWbar <- sum((season_weights^2)*season_varWbar)/(sum(season_weights)^2)
seWbar <- sqrt(varWbar)
```
For this purpose, the weights of `r season_n[2]` fish in the March sample were used, as well as `r season_n[1]` fish from the June sample that were not caught using bait. The purpose of this was to provide representative samples of harvest that could occur during the Winter and Summer sport fishing seasons, under current regulations.
\pagebreak
### Results
Mean weights by sample are summarized below.
```{r, results='asis'}
tab1 <- cbind(season_n, season_Wbar, season_seWbar)
colnames(tab1) <- c("n","Wbar_kg","SE(Wbar_kg)")
knitr::kable(tab1[2:1,], digits=3)
```
The mean weight of lake trout harvested from Crosswind lake was estimated as `r round(Wbar, 3)` kg (SE `r round(seWbar, 3)` kg) using stratification.
## LA Model
The LA model (Evans et al. 1991) was used to estimate yield potential of lake trout in Crosswind Lake as:
$$log_{10}(\hat{YP}) = 0.60 + 0.72log_{10}(A)$$
in which $\hat{YP}$ represents the estimated yield potential in kg biomass per year, and $A$ represents the lake area in hectares.
The unbiased yield potential (delta method; Seber 1982), in terms of number of lake trout, was estimated as:
$$\hat{YP}_n=\frac{\hat{YP}}{\bar{W}} + \frac{\hat{YP}}{\bar{W}^3}\hat{V}(\bar{W})$$
in which $\bar{W}$ represents the estimated mean weight of lake trout, using data from both the March and June sampling events.
The approximate variance was estimated as the following, treating estimated yield potential as constant:
$$\hat{V}_{naive}(\hat{YP}_n) = \frac{\hat{YP}^2}{\bar{W}^4}\hat{V}(\bar{W})$$
```{r}
# LAM assuming YP is known without error
YP_n <- YP_kg/Wbar + YP_kg/(Wbar^3)*varWbar
var_YP_n <- (YP_kg^2)/(Wbar^4)*varWbar
se_YP_n <- sqrt(var_YP_n)
ci_YP_n <- YP_n + c(-1,1)*1.96*se_YP_n
```
\pagebreak
### Results - seasons equally weighted
Using this methodology and weighting the winter and summer samples equally, the yield potential in numbers of fish was estimated as `r round(YP_n, 2)` fish (SE `r round(se_YP_n, 2)`) fish, with an approximate 95% Wald confidence interval of (`r round(ci_YP_n[1], 2)` - `r round(ci_YP_n[2], 2)`).
### Sensitivity to choice of weighting
To illustrate how inferences might differ under different seasonal weighting schemes, results were calculated for a sequence of weightings ranging from 100% Winter (0% Summer) to 0% Winter (100% Summer). Results are displayed below.
```{r}
## investigating the effect of different weightings
possible_wts <- seq(.01, .99, by=.01)
ci_YP_n_all <- matrix(nrow=length(possible_wts), ncol=2)
YP_n_all <- rep(NA, length(possible_wts))
for(i in seq_along(possible_wts)) {
season_weights <- c(possible_wts[i], 1-possible_wts[i]) # summer, then winter
# by season
season_Wbar <- tapply(legal_nobait$kg, legal_nobait$Season, mean)
season_n <- tapply(legal_nobait$kg, legal_nobait$Season, length)
season_varWbar <- tapply(legal_nobait$kg, legal_nobait$Season, var)/season_n
season_seWbar <- sqrt(season_varWbar)
# combined
Wbar <- sum(season_weights*season_Wbar)/sum(season_weights)
varWbar <- sum((season_weights^2)*season_varWbar)/(sum(season_weights)^2)
seWbar <- sqrt(varWbar)
# LAM assuming YP is known without error
YP_n <- YP_kg/Wbar + YP_kg/(Wbar^3)*varWbar
var_YP_n <- (YP_kg^2)/(Wbar^4)*varWbar
se_YP_n <- sqrt(var_YP_n)
ci_YP_n <- YP_n + c(-1,1)*1.96*se_YP_n
YP_n_all[i] <- YP_n
ci_YP_n_all[i,] <- ci_YP_n
}
plot(NA, xlim=0:1, ylim=range(ci_YP_n_all),
ylab="YP_n", xlab="Seasonal Weights",
xaxt="n")
lines(possible_wts, YP_n_all)
lines(possible_wts, ci_YP_n_all[,1], lty=2)
lines(possible_wts, ci_YP_n_all[,2], lty=2)
legend("topleft", lty=1:2, legend=c("Estimate", "95% CI"))
axis(side=1, at=seq(0, 1, by=0.2),
labels = paste0(seq(0, 100, 20), "% / ",
seq(100, 0, -20), "%"))
```
### Incorporating uncertainty in true proportions of harvest by season
An alternate analysis was performed, with the intent of incorporating the uncertainty in the true proportions of harvest by season. For 100,000 iterations of Monte Carlo simulation, seasonal weightings were drawn from a Beta(5,5) distribution (plotted below), in order to approximate our state of knowledge of the true proportions by season. Use of a Beta(5,5) distribution can be interpreted as `r 100*round((pbeta(0.6, 5, 5) - pbeta(0.4, 5, 5)), 2)`% certainty that the proportion of total harvest occurring in Winter is between 40% and 60%, and `r 100*round((pbeta(0.75, 5, 5) - pbeta(0.25, 5, 5)), 2)`% certainty that it is between 25% and 75%.
```{r, fig.height=4, fig.width=7}
curve(dbeta(x, 5, 5))
```
For each iteration, the weight samples for each season were resampled with replacement in order to incorporate sampling variance, then calculations were performed for mean weights for each season and then overall mean weight, using the randomly-drawn seasonal weightings. Finally, the estimated Yield Potential in numbers of fish was calculated for each iteration, thus approximating the sampling distribution of $\hat{YP}_n$, incorporating the uncertainty in true proportions of harvest by season.
```{r, fig.height=3.5, fig.width=7}
## the sampling variances of the weight samples are incorporated using bootstrapping!
nsim <- 100000
YP_sim <- rep(NA, nsim)
for(i in 1:nsim) {
winter_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Winter"], replace=TRUE)
summer_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Summer"], replace=TRUE)
# weight_winter <- 0.5
weight_winter <- rbeta(1, 5, 5)
weight_summer <- 1 - weight_winter
Wbar <- (weight_winter * mean(winter_boot)) + (weight_summer * mean(summer_boot))
YP_sim[i] <- YP_kg/Wbar
}
```
The distribution of simulated $\hat{YP}_n$ is shown below.
```{r, fig.height=3.5, fig.width=7}
hist(YP_sim)
# mean(YP_sim)
# median(YP_sim)
# sd(YP_sim)
#
# quantile(YP_sim, c(0.025, 0.975))
```
Incorporating the uncertainty in true proportions of harvest by season using a Beta(5,5) distribution and resampling the fish weights in each season yields a bootstrap mean of `r round(mean(YP_sim), 2)` fish (SE `r round(sd(YP_sim), 2)` fish) and an approximate 95% confidence interval of (`r round(quantile(YP_sim, 0.025), 2)` - `r round(quantile(YP_sim, 0.975), 2)`).
### Incorporating uncertainty due to estimation in the Lake Area Model
It is important to recognize the variance due to prediction from the LAM. The LA model reported by Evans et al. (1991) is expressed as a linear regression on the log-log scale, with an $R^2$ of 0.69 and n=43. Sampling from the lake area data reported by Evans et al. and simulating YP values such that the estimated regression parameters are equivalent to those reported provides an approximate estimate of the model residual variance and thereby prediction variance for a new lake with the same area as Crosswind Lake.
The previous analysis was repeated, using values of $\hat{YP}$ simulated using the prediction variance estimated for Crosswind Lake, and back-transformed to the natural scale. To isolate the effect of uncertainty due to estimation of the LAM, seasonal weightings were set to constant and equal to one another.
```{r}
# prediction variance of a new lake (Crosswind), on the log scale
predvarsim <- 0.09958763
## vector of simulated YP (kg) of new lake, on the log scale
predlog10YP_kg <- rnorm(nsim, 0.6 + 0.72*log10(area), sqrt(predvarsim))
## back to natural scale
predYP_kg <- 10^predlog10YP_kg
# hist(predYP)
# the sampling variances of the weight samples are incorporated using bootstrapping!
# nsim <- 10000
YP_sim <- rep(NA, nsim)
for(i in 1:nsim) {
winter_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Winter"], replace=TRUE)
summer_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Summer"], replace=TRUE)
weight_winter <- 0.5
# weight_winter <- rbeta(1, 5, 5)
weight_summer <- 1 - weight_winter
Wbar <- (weight_winter * mean(winter_boot)) + (weight_summer * mean(summer_boot))
YP_sim[i] <- predYP_kg[i]/Wbar
}
# hist(YP_sim)
# mean(YP_sim)
# median(YP_sim)
# sd(YP_sim)
#
# quantile(YP_sim, c(0.025, 0.975))
```
Incorporating the uncertainty due to estimation of the LAM and resampling the fish weights in each season yields a bootstrap mean and median of `r round(mean(YP_sim), 2)` and `r round(median(YP_sim), 2)` fish, respectively (SE `r round(sd(YP_sim), 2)` fish) and an approximate 95% confidence interval of (`r round(quantile(YP_sim, 0.025), 2)` - `r round(quantile(YP_sim, 0.975), 2)`).
```{r}
# prediction variance of a new lake (Crosswind), on the log scale
predvarsim <- 0.09958763
## vector of simulated YP (kg) of new lake, on the log scale
predlog10YP_kg <- rnorm(nsim, 0.6 + 0.72*log10(area), sqrt(predvarsim))
## back to natural scale
predYP_kg <- 10^predlog10YP_kg
# hist(predYP)
# the sampling variances of the weight samples are incorporated using bootstrapping!
# nsim <- 10000
YP_sim <- rep(NA, nsim)
for(i in 1:nsim) {
winter_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Winter"], replace=TRUE)
summer_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Summer"], replace=TRUE)
# weight_winter <- 0.5
weight_winter <- rbeta(1, 5, 5)
weight_summer <- 1 - weight_winter
Wbar <- (weight_winter * mean(winter_boot)) + (weight_summer * mean(summer_boot))
YP_sim[i] <- predYP_kg[i]/Wbar
}
# hist(YP_sim)
# mean(YP_sim)
# median(YP_sim)
# sd(YP_sim)
#
# quantile(YP_sim, c(0.025, 0.975))
```
Incorporating the uncertainty in seasonal weighting using the Beta(5,5) distribution presented above adds a negligible amount of variance when considering uncertainty due to estimation of the LAM. Combining both sources of uncertainty and resampling the fish weights in each season yields a bootstrap mean and median of `r round(mean(YP_sim), 2)` and `r round(median(YP_sim), 2)` fish, respectively (SE `r round(sd(YP_sim), 2)` fish) and an approximate 95% confidence interval of (`r round(quantile(YP_sim, 0.025), 2)` - `r round(quantile(YP_sim, 0.975), 2)`).