We could implement a chunk-wise sample_n / sample_frac with:
library(tidyverse)

# Build a large CSV: 1000 stacked copies of iris
big <- rerun(1000, iris) %>% bind_rows()
path <- tempfile()
write_csv(big, path)

library(chunked)

# Record the sample_n call so that it is applied to each chunk
sample_n.chunkwise <- function(.data, size){
  cmd <- lazyeval::lazy(sample_n(.data, size))
  chunked:::record(.data, cmd)
}

read_csv_chunkwise(path) %>%
  sample_n(1) %>%
  collect()
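That way, the sampling would be done independently within each chunk, so sample_n(1) would keep one row per chunk rather than one row from the whole file.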
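To make the per-chunk behaviour concrete, here is a minimal base R sketch that does not use chunked at all. The helper name sample_n_per_chunk and its chunk_size argument are made up for illustration; chunk_size only stands in for the number of rows read per chunk.

```r
# Hypothetical helper (not part of chunked): emulate chunk-wise sampling
# on an in-memory data frame.
sample_n_per_chunk <- function(df, size, chunk_size) {
  # Assign each row to a chunk: rows 1..chunk_size go to chunk 1, and so on
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  chunks <- split(df, chunk_id)

  # Draw `size` rows from every chunk independently
  sampled <- lapply(chunks, function(chunk) {
    chunk[sample.int(nrow(chunk), size), , drop = FALSE]
  })

  # Stack the per-chunk samples back into one data frame
  do.call(rbind, sampled)
}

set.seed(1)
res <- sample_n_per_chunk(iris, size = 1, chunk_size = 50)
nrow(res)  # 3: one row from each of the three 50-row chunks
```

The number of rows returned is size times the number of chunks, so the result of a chunk-wise sample_n depends on the chunk size. A chunk-wise sample_frac should be less sensitive to it, since each chunk contributes roughly the same fraction, up to rounding within each chunk.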
What do you think about that?
If it sounds like a good idea, let me know and I'll send you a PR.