--- title: "Sampling recordings - Multple Time Periods" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Sampling recordings - Multple Time Periods} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This brief vignette shows an example of a basic workflow selecting recordings for different times of day by site and year. First we'll load the packages we want to work with ```{r setup} #| message: false library(ARUtools) library(dplyr) library(purrr) library(tidyr) library(glue) library(lubridate) ``` Next we'll prepare our metadata on the recordings, by cleaning, adding site-level information and calculating the time to sunrise/sunset for each file. We'll also define recordings as either 'early' (occurring before 6am) or 'late' (occurring after 6am). ```{r} s <- clean_site_index(example_sites_clean, name_date = c("date_time_start", "date_time_end") ) m <- clean_metadata(project_files = example_files) |> add_sites(s) |> calc_sun() |> mutate( time_period = if_else(hour(date_time) < 6, "early", "late"), year = year(date) ) m ``` Time to do some sampling! First we **define the selection parameters** for each time frame we're interested in sampling. This might be "dawn" and "dusk", or in this example, "early" and "late" morning. This function will also **simulate** the selection weights so we can see what we've defined. ```{r} #| fig-width: 12 #| fig-asp: 0.7 #| out-width: 80% p <- list( "early" = sim_selection_weights(min_range = c(-70, 240)), "late" = sim_selection_weights(min_range = c(100, 300), min_mean = 200) ) p ``` Now we can **calculate selection weights** Here we'll calculate a separate set of selection weights for early and late recordings in each year. Then we'll group recordings by site, year, and time period. ```{r} w <- m |> nest(data = c(-time_period, -year)) |> mutate( params = p, sel = map2(data, params, calc_selection_weights) ) |> unnest(sel) |> select(-"data", -"params") |> mutate(selection_group = glue("{site_id}_{year}_{time_period}")) w ``` This `w` data sets contains the original sampling recordings, but now also new columns containing various measures of the probability of selection. We'll **define the number of samples** we'd like to have. ```{r} n <- w |> summarize(n_recordings = n(), .by = c("selection_group", "time_period")) |> mutate( n = if_else(time_period == "early", 5, 2), n_os = if_else(time_period == "early", floor(n * 1 / 3), floor(n * 1 / 4)), n_os = pmax(0, pmin(n_recordings - n, round(n / 3))), n = pmin(n, n_recordings) ) n ``` And finally **sample the recordings**! ```{r} g <- sample_recordings(w, n, col_site_id = selection_group, col_sel_weights = psel_normalized ) g ``` The recordings selected for sampling... ```{r} g$sites_base ```