---
title: "Sampling recordings - Multiple Time Periods"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Sampling recordings - Multiple Time Periods}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This brief vignette shows an example of a basic workflow for selecting recordings
from different times of day, by site and year.

First, we'll load the packages we want to work with:
```{r setup}
#| message: false
library(ARUtools)
library(dplyr)
library(purrr)
library(tidyr)
library(glue)
library(lubridate)
```

Next, we'll prepare the recording metadata: we clean it, add site-level
information, and calculate the time to sunrise/sunset for each file.
We'll also label each recording as either 'early' (recorded before 6am) or 'late'
(recorded at or after 6am), and note the year it was recorded.

```{r}
s <- clean_site_index(example_sites_clean,
  name_date = c("date_time_start", "date_time_end")
)
m <- clean_metadata(project_files = example_files) |>
  add_sites(s) |>
  calc_sun() |>
  mutate(
    time_period = if_else(hour(date_time) < 6, "early", "late"),
    year = year(date)
  )
m
```
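Because the split uses `hour(date_time) < 6`, a recording that starts at exactly
6:00 is classed as 'late'. A quick check on a few made-up timestamps (hypothetical
values, not taken from the example data) shows where the boundary falls: 5:59 is
'early', while 6:00 and 7:30 are both 'late'.

```{r}
library(dplyr)
library(lubridate)

# Made-up start times straddling the 6am boundary (illustration only)
times <- ymd_hms(c("2020-05-01 05:59:00", "2020-05-01 06:00:00", "2020-05-01 07:30:00"))

# Same rule as above: hours 0-5 are "early", everything else is "late"
period <- if_else(hour(times) < 6, "early", "late")
period
```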

Time to do some sampling!

First, we **define the selection parameters** for each time period we're interested
in sampling. These might be "dawn" and "dusk" or, as in this example, "early" and "late"
morning.

The `sim_selection_weights()` function also **simulates** the selection weights so
that we can see what we've defined.

```{r}
#| fig-width: 12
#| fig-asp: 0.7
#| out-width: 80%
p <- list(
  "early" = sim_selection_weights(min_range = c(-70, 240)),
  "late" = sim_selection_weights(min_range = c(100, 300), min_mean = 200)
)
p
```
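For intuition about what these parameters describe, the sketch below is a
hand-rolled approximation, not ARUtools' internal code. It assumes the weights
follow a normal curve over minutes relative to sunrise and drop to zero outside
`min_range`. The `sketch_weights()` helper and the standard deviation of 60 are
hypothetical; the window (100 to 300 minutes) and mean (200) mirror the "late"
parameters above.

```{r}
# Illustrative sketch only (NOT ARUtools' implementation):
# weight by a normal curve over minutes relative to sunrise,
# and set weights outside the [min, max] window to zero.
sketch_weights <- function(minutes, min_range, min_mean, min_sd) {
  w <- dnorm(minutes, mean = min_mean, sd = min_sd)
  w[minutes < min_range[1] | minutes > min_range[2]] <- 0
  w / max(w) # rescale so the peak weight is 1 (assumes some minutes fall in the window)
}

minutes <- seq(0, 400, by = 50)
# Window and mean follow the "late" parameters above; sd = 60 is an assumption
w_late <- sketch_weights(minutes, min_range = c(100, 300), min_mean = 200, min_sd = 60)
round(w_late, 2)
```

Recordings closest to the mean get the highest weight, and anything outside the
window can never be selected.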

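Now we can **calculate the selection weights**.

Here we calculate a separate set of selection weights for the early and late
recordings in each year, using the parameters defined for each time period. We then
label every recording with a `selection_group` that combines its site, year, and time
period, so that sampling can be done within each group.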
```{r}
w <- m |>
  nest(data = c(-time_period, -year)) |>
  mutate(
    params = p[time_period], # match parameters to each row by name, not position
    sel = map2(data, params, calc_selection_weights)
  ) |>
  unnest(sel) |>
  select(-"data", -"params") |>
  mutate(selection_group = glue("{site_id}_{year}_{time_period}"))
w
```
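Each nested row (one per time period and year) needs the parameter set for its own
time period. Indexing a named list by `time_period` pairs them by name rather than
by position, so it works for any number of years and any row order. Here is a toy
version of the pattern, with made-up data and a hypothetical multiplier
(`toy_p`) standing in for the selection parameters:

```{r}
library(dplyr)
library(tidyr)
library(purrr)

# Made-up data: one value per time period and year
toy <- tibble(
  time_period = c("early", "late", "early", "late"),
  year = c(2020, 2020, 2021, 2021),
  x = 1:4
)
# Hypothetical "parameters", named like `p`
toy_p <- list(early = 10, late = 100)

toy_out <- toy |>
  nest(data = c(-time_period, -year)) |>
  mutate(
    params = toy_p[time_period], # one parameter per row, matched by name
    res = map2_dbl(data, params, \(d, prm) sum(d$x) * prm) # apply each row's parameter
  )
toy_out
```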

The `w` data set contains the original recordings, plus new columns with
various measures of each recording's probability of selection.


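Next, we **define the number of samples** we'd like from each selection group:
5 recordings for early groups and 2 for late groups, plus an oversample (`n_os`)
of about a third of that (rounded) where enough extra recordings exist. Both numbers
are capped at the number of recordings available in the group.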
```{r}
n <- w |>
  summarize(n_recordings = n(), .by = c("selection_group", "time_period")) |>
  mutate(
    # Requested base sample size per group
    n = if_else(time_period == "early", 5, 2),
    # Oversample: about a third of n, limited to the recordings left over
    # after the base sample, and never negative
    n_os = pmax(0, pmin(n_recordings - n, round(n / 3))),
    # Don't request more base samples than there are recordings
    n = pmin(n, n_recordings)
  )
n
```
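To see how the capping behaves, here is the same arithmetic on three made-up groups
(hypothetical sizes): a large early group, an early group with fewer recordings than
requested, and a late group with exactly two recordings. Only the first group has
recordings to spare for an oversample; the second has its base sample cut to 3.

```{r}
library(dplyr)

# Made-up group sizes (illustration only)
toy_n <- tibble(
  selection_group = c("A", "B", "C"),
  time_period = c("early", "early", "late"),
  n_recordings = c(10, 3, 2)
) |>
  mutate(
    n = if_else(time_period == "early", 5, 2),
    n_os = pmax(0, pmin(n_recordings - n, round(n / 3))),
    n = pmin(n, n_recordings)
  )
toy_n
```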

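And finally, **sample the recordings**! We pass `selection_group` as the site column
so that the sample sizes in `n` apply to each site/year/time-period group, and use
`psel_normalized` as the selection weights.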
```{r}
g <- sample_recordings(w, n,
  col_site_id = selection_group,
  col_sel_weights = psel_normalized
)
g
```
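`sample_recordings()` performs the actual draw. Purely for intuition about how a
weight column tilts selection, here is a plain weighted draw without replacement
using base R's `sample()`. This is a simpler, swapped-in technique, not the method
`sample_recordings()` uses internally, and the file names and weights are made up.

```{r}
set.seed(2024)

# Made-up recordings and selection weights (illustration only)
rec <- data.frame(
  file = paste0("rec_", 1:6),
  weight = c(0.05, 0.10, 0.40, 0.30, 0.10, 0.05)
)

# Weighted draw of 3 recordings without replacement:
# higher-weight files are more likely, but not guaranteed, to be picked
picked <- sample(rec$file, size = 3, prob = rec$weight)
picked
```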

The recordings selected for sampling (the base sample):
```{r}
g$sites_base
```