BOF_2024_output.Rmd

---
title: "Crosswind LAM Results"
author: "Matt Tyers"
date: "2024-09-05"
output: word_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, dpi=300, fig.height = 7, fig.width = 10, message=FALSE, warning=FALSE)
```

## Estimating Mean Weight

Since fish sampling occurred during two events within different seasons and mimicking sport fishing under different regulations, size selectivity was observed to differ substantially between sampling events.  Because of this, a temporally stratified estimator was used to estimate mean weight for lake trout in Crosswind lake, given below, in which $b_{March}$ and $b_{June}$ represent the weights given to the March and June sampling events.  These were treated as equal in the analysis below.  The stratified estimators can be expressed as the following, for observations $i \in 1 ... n_k$ within season k:

$$\bar{W} = \frac{(b_{March} \times \bar{W}_{March}) + (b_{June} \times \bar{W}_{June})}{b_{March} + b_{June}}$$

and

$$\hat{V}(\bar{W}) = \frac{\left(b_{March}^2 \times \hat{V}(\bar{W}_{March})\right) + \left(b_{June}^2 \times \hat{V}(\bar{W}_{June})\right)}{(b_{March} + b_{June})^2}$$

in which

$$\bar{W}_k = \frac{\sum_{i=1}^{n_k}W_{k[i]}}{n_k}$$

and

$$\hat{V}(\bar{W}_k) = \frac{\sum_{i=1}^{n_k}(W_{k[i]}-\bar{W}_k)^2}{n_k(n_k-1)}$$

```{r}
library(tidyverse)

## per Corey email 9/4/2024
legal_nobait <- read_csv("BOF_2024/flat_data/legal_nobait.csv")

# ## the data I was sent before
# march_data <- read_csv("Analysis_2024/flat_data/March_Catch_Data.csv")[,-(14:21)]
# june_data <- read_csv("Analysis_2024/flat_data/June_Catch_Data.csv")
# 
# all_data <- data.frame(event = c(rep("March", nrow(march_data)),
#                                  rep("June", nrow(june_data))),
#                        FL = c(march_data$FL, june_data$FL),
#                        TL = c(march_data$FL, june_data$TL),
#                        weight_kg = c(march_data$kg, june_data$Kg),
#                        bait = c(march_data$Bait, june_data$Bait),
#                        gear = c(march_data$Gear, rep(NA, nrow(june_data))),
#                        technique = c(march_data$Technique, rep(NA, nrow(june_data))))
# 
# ## reconstructing legal_nobait from previous data - does it match?
# legal_nobait1 <- filter(all_data, (event == "March") | (event == "June" & bait == "N"))
# 
# (sort(legal_nobait$kg) == sort(legal_nobait1$weight_kg)) # %>% all
# ## it does match!


########### LAM inputs

area <- 3717
log10_area <- log10(area)
log10_YP <- 0.72*log10_area + 0.6
YP_kg <- 10^log10_YP


########### estimating mean weight of harvest
season_weights <- c(0.5, 0.5)   # summer, then winter

# by season
season_Wbar <- tapply(legal_nobait$kg, legal_nobait$Season, mean)
season_n <- tapply(legal_nobait$kg, legal_nobait$Season, length)
season_varWbar <- tapply(legal_nobait$kg, legal_nobait$Season, var)/season_n
season_seWbar <- sqrt(season_varWbar)

# combined
Wbar <- sum(season_weights*season_Wbar)/sum(season_weights)
varWbar <- sum((season_weights^2)*season_varWbar)/(sum(season_weights)^2)
seWbar <- sqrt(varWbar)
```

For this purpose, the weights of `r season_n[2]` fish in the March sample were used, as well as `r season_n[1]` fish from the June sample that were not caught using bait.  The purpose of this was to provide representative samples of harvest that could occur during the Winter and Summer sport fishing seasons, under current regulations.

\pagebreak

### Results

Mean weights by sample are summarized below.

```{r, results='asis'}
tab1 <- cbind(season_n, season_Wbar, season_seWbar)
colnames(tab1) <- c("n","Wbar_kg","SE(Wbar_kg)")
knitr::kable(tab1[2:1,], digits=3)
```

The mean weight of lake trout harvested from Crosswind lake was estimated as `r round(Wbar, 3)` kg (SE `r round(seWbar, 3)` kg) using stratification.

## LA Model

The LA model (Evans et al. 1991) was used to estimate yield potential of lake trout in Crosswind Lake as:

$$log_{10}(\hat{YP}) = 0.60 + 0.72log_{10}(A)$$

in which $\hat{YP}$ represents the estimated yield potential in kg biomass per year, and $A$ represents the lake area in hectares.

The unbiased yield potential (delta method; Seber 1982), in terms of number of lake trout, was estimated as:

$$\hat{YP}_n=\frac{\hat{YP}}{\bar{W}} + \frac{\hat{YP}}{\bar{W}^3}\hat{V}(\bar{W})$$

in which $\bar{W}$ represents the estimated mean weight of lake trout, using data from both the March and June sampling events.

The approximate variance was estimated as the following, treating estimated yield potential as constant:

$$\hat{V}_{naive}(\hat{YP}_n) = \frac{\hat{YP}^2}{\bar{W}^4}\hat{V}(\bar{W})$$

```{r}
# LAM assuming YP is known without error
YP_n <- YP_kg/Wbar + YP_kg/(Wbar^3)*varWbar
var_YP_n <- (YP_kg^2)/(Wbar^4)*varWbar
se_YP_n <- sqrt(var_YP_n)
ci_YP_n <- YP_n + c(-1,1)*1.96*se_YP_n
```

\pagebreak

### Results - seasons equally weighted

Using this methodology and weighting the winter and summer samples equally, the yield potential in numbers of fish was estimated as `r round(YP_n, 2)` fish (SE `r round(se_YP_n, 2)`) fish, with an approximate 95% Wald confidence interval of (`r round(ci_YP_n[1], 2)` - `r round(ci_YP_n[2], 2)`).

### Sensitivity to choice of weighting

To illustrate how inferences might differ under different seasonal weighting schemes, results were calculated for a sequence of weightings ranging from 100% Winter (0% Summer) to 0% Winter (100% Summer).  Results are displayed below.

```{r}
## investigating the effect of different weightings
possible_wts <- seq(.01, .99, by=.01)
ci_YP_n_all <- matrix(nrow=length(possible_wts), ncol=2)
YP_n_all <- rep(NA, length(possible_wts))
for(i in seq_along(possible_wts)) {
  season_weights <- c(possible_wts[i], 1-possible_wts[i])   # summer, then winter

  # by season
  season_Wbar <- tapply(legal_nobait$kg, legal_nobait$Season, mean)
  season_n <- tapply(legal_nobait$kg, legal_nobait$Season, length)
  season_varWbar <- tapply(legal_nobait$kg, legal_nobait$Season, var)/season_n
  season_seWbar <- sqrt(season_varWbar)

  # combined
  Wbar <- sum(season_weights*season_Wbar)/sum(season_weights)
  varWbar <- sum((season_weights^2)*season_varWbar)/(sum(season_weights)^2)
  seWbar <- sqrt(varWbar)

  # LAM assuming YP is known without error
  YP_n <- YP_kg/Wbar + YP_kg/(Wbar^3)*varWbar
  var_YP_n <- (YP_kg^2)/(Wbar^4)*varWbar
  se_YP_n <- sqrt(var_YP_n)
  ci_YP_n <- YP_n + c(-1,1)*1.96*se_YP_n

  YP_n_all[i] <- YP_n
  ci_YP_n_all[i,] <- ci_YP_n
}
plot(NA, xlim=0:1, ylim=range(ci_YP_n_all),
     ylab="YP_n", xlab="Seasonal Weights",
     xaxt="n")
lines(possible_wts, YP_n_all)
lines(possible_wts, ci_YP_n_all[,1], lty=2)
lines(possible_wts, ci_YP_n_all[,2], lty=2)
legend("topleft", lty=1:2, legend=c("Estimate", "95% CI"))

axis(side=1, at=seq(0, 1, by=0.2),
     labels = paste0(seq(0, 100, 20), "% / ",
                    seq(100, 0, -20), "%"))
```

### Incorporating uncertainty in true proportions of harvest by season

An alternate analysis was performed, with the intent of incorporating the uncertainty in the true proportions of harvest by season.  For 100,000 iterations of Monte Carlo simulation, seasonal weightings were drawn from a Beta(5,5) distribution (plotted below), in order to approximate our state of knowledge of the true proportions by season.  Use of a Beta(5,5) distribution can be interpreted as `r 100*round((pbeta(0.6, 5, 5) - pbeta(0.4, 5, 5)), 2)`% certainty that the proportion of total harvest occurring in Winter is between 40% and 60%, and `r 100*round((pbeta(0.75, 5, 5) - pbeta(0.25, 5, 5)), 2)`% certainty that it is between 25% and 75%.

```{r, fig.height=4, fig.width=7}
curve(dbeta(x, 5, 5))
```

For each iteration, the weight samples for each season were resampled with replacement in order to incorporate sampling variance, then calculations were performed for mean weights for each season and then overall mean weight, using the randomly-drawn seasonal weightings.  Finally, the estimated Yield Potential in numbers of fish was calculated for each iteration, thus approximating the sampling distribution of $\hat{YP}_n$, incorporating the uncertainty in true proportions of harvest by season.

```{r, fig.height=3.5, fig.width=7}
## the sampling variances of the weight samples are incorporated using bootstrapping!
nsim <- 100000
YP_sim <- rep(NA, nsim)
for(i in 1:nsim) {
  winter_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Winter"], replace=TRUE)
  summer_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Summer"], replace=TRUE)

  # weight_winter <- 0.5
  weight_winter <- rbeta(1, 5, 5)
  weight_summer <- 1 - weight_winter

  Wbar <- (weight_winter * mean(winter_boot)) + (weight_summer * mean(summer_boot))
  YP_sim[i] <- YP_kg/Wbar
}
```

The distribution of simulated $\hat{YP}_n$ is shown below.

```{r, fig.height=3.5, fig.width=7}
hist(YP_sim)
# mean(YP_sim)
# median(YP_sim)
# sd(YP_sim)
# 
# quantile(YP_sim, c(0.025, 0.975))
```

Incorporating the uncertainty in true proportions of harvest by season using a Beta(5,5) distribution and resampling the fish weights in each season yields a bootstrap mean of `r round(mean(YP_sim), 2)` fish (SE `r round(sd(YP_sim), 2)` fish) and an approximate 95% confidence interval of (`r round(quantile(YP_sim, 0.025), 2)` - `r round(quantile(YP_sim, 0.975), 2)`).

### Incorporating uncertainty due to estimation in the Lake Area Model

It is important to recognize the variance due to prediction from the LAM.  The LA model reported by Evans et al. (1991) is expressed as a linear regression on the log-log scale, with an $R^2$ of 0.69 and n=43.  Sampling from the lake area data reported by Evans et al. and simulating YP values such that the estimated regression parameters are equivalent to those reported provides an approximate estimate of the model residual variance and thereby prediction variance for a new lake with the same area as Crosswind Lake.  

The previous analysis was repeated, using values of $\hat{YP}$ simulated using the prediction variance estimated for Crosswind Lake, and back-transformed to the natural scale.  To isolate the effect of uncertainty due to estimation of the LAM, seasonal weightings were set to constant and equal to one another.

```{r}
# prediction variance of a new lake (Crosswind), on the log scale
predvarsim <- 0.09958763

## vector of simulated YP (kg) of new lake, on the log scale
predlog10YP_kg <- rnorm(nsim, 0.6 + 0.72*log10(area), sqrt(predvarsim))

## back to natural scale
predYP_kg <- 10^predlog10YP_kg
# hist(predYP)

# the sampling variances of the weight samples are incorporated using bootstrapping!
# nsim <- 10000
YP_sim <- rep(NA, nsim)
for(i in 1:nsim) {
  winter_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Winter"], replace=TRUE)
  summer_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Summer"], replace=TRUE)

  weight_winter <- 0.5
  # weight_winter <- rbeta(1, 5, 5)
  weight_summer <- 1 - weight_winter

  Wbar <- (weight_winter * mean(winter_boot)) + (weight_summer * mean(summer_boot))
  YP_sim[i] <- predYP_kg[i]/Wbar
}
# hist(YP_sim)
# mean(YP_sim)
# median(YP_sim)
# sd(YP_sim)
# 
# quantile(YP_sim, c(0.025, 0.975))
```


Incorporating the uncertainty due to estimation of the LAM and resampling the fish weights in each season yields a bootstrap mean and median of `r round(mean(YP_sim), 2)` and `r round(median(YP_sim), 2)` fish, respectively (SE `r round(sd(YP_sim), 2)` fish) and an approximate 95% confidence interval of (`r round(quantile(YP_sim, 0.025), 2)` - `r round(quantile(YP_sim, 0.975), 2)`).

```{r}
# prediction variance of a new lake (Crosswind), on the log scale
predvarsim <- 0.09958763

## vector of simulated YP (kg) of new lake, on the log scale
predlog10YP_kg <- rnorm(nsim, 0.6 + 0.72*log10(area), sqrt(predvarsim))

## back to natural scale
predYP_kg <- 10^predlog10YP_kg
# hist(predYP)

# the sampling variances of the weight samples are incorporated using bootstrapping!
# nsim <- 10000
YP_sim <- rep(NA, nsim)
for(i in 1:nsim) {
  winter_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Winter"], replace=TRUE)
  summer_boot <- sample(legal_nobait$kg[legal_nobait$Season == "Summer"], replace=TRUE)

  # weight_winter <- 0.5
  weight_winter <- rbeta(1, 5, 5)
  weight_summer <- 1 - weight_winter

  Wbar <- (weight_winter * mean(winter_boot)) + (weight_summer * mean(summer_boot))
  YP_sim[i] <- predYP_kg[i]/Wbar
}
# hist(YP_sim)
# mean(YP_sim)
# median(YP_sim)
# sd(YP_sim)
# 
# quantile(YP_sim, c(0.025, 0.975))
```


Incorporating the uncertainty in seasonal weighting using the Beta(5,5) distribution presented above adds a negligible amount of variance when considering uncertainty due to estimation of the LAM.  Combining both sources of uncertainty and resampling the fish weights in each season yields a bootstrap mean and median of `r round(mean(YP_sim), 2)` and `r round(median(YP_sim), 2)` fish, respectively (SE `r round(sd(YP_sim), 2)` fish) and an approximate 95% confidence interval of (`r round(quantile(YP_sim, 0.025), 2)` - `r round(quantile(YP_sim, 0.975), 2)`).