Closed
Description
Inspired by advice from @eddelbuettel here, I am attempting to leverage fst for arbitrary data. Essentially, I would like to take an arbitrary (and arbitrarily large) data structure in memory, serialize it to a raw vector, save it in a one-column data frame with write_fst(), and retrieve it later with read_fst(). Have you tried this before? What would it take to make it work?
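To make the idea concrete, here is a minimal sketch of the round trip for an object whose serialized form fits well within the integer range. It assumes the installed fst version accepts raw columns; the object `original` and the column name `actual_data` are only illustrative.

```r
library(fst)

# Any R object; a small list stands in for "arbitrary data" here.
original <- list(numbers = 1:10, text = letters)

# serialize() with connection = NULL returns a raw vector.
bytes <- serialize(original, connection = NULL)

# Wrap the raw vector in a one-column data frame and write it with fst.
wrapper <- data.frame(actual_data = bytes)
file <- tempfile(fileext = ".fst")
write_fst(wrapper, file)

# Read the column back and rebuild the object.
restored <- unserialize(read_fst(file)$actual_data)
identical(restored, original)
```

This only covers the case where the raw vector has fewer than 2^31 elements.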
Benchmarks for small-ish data are encouraging
library(fst)
wrapper <- data.frame(actual_data = raw(2^31 - 1))
system.time(write_fst(wrapper, tempfile()))
#> user system elapsed
#> 0.362 0.019 0.103
system.time(writeBin(wrapper$actual_data, tempfile()))
#> user system elapsed
#> 0.314 1.340 1.689
but there are some roadblocks with big data / long vectors.
library(fst)
x <- data.frame(x = raw(2^32))
#> Warning in attributes(.Data) <- c(attributes(.Data), attrib): NAs
#> introduced by coercion to integer range
#> Error in if (mirn && nrows[i] > 0L) {: missing value where TRUE/FALSE needed
x <- list(x = raw(2^32))
as.data.frame(x)
#> Warning in attributes(.Data) <- c(attributes(.Data), attrib): NAs
#> introduced by coercion to integer range
#> Error in if (mirn && nrows[i] > 0L) {: missing value where TRUE/FALSE needed
class(x) <- "data.frame"
file <- tempfile()
write_fst(x, file) # No error here...
# read_fst(file) # but I get a segfault here.
Created on 2019-06-16 by the reprex package (v0.3.0)
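The errors above suggest that a data frame cannot hold a column longer than `.Machine$integer.max` (2^31 - 1) elements, and that forcing the class with `class(x) <- "data.frame"` only postpones the failure until read time. As a stopgap, a guard along these lines could fail early instead of risking a segfault. `wrap_raw()` and its `max_rows` argument are hypothetical names for illustration, not part of fst.

```r
# Hypothetical helper: wrap a raw vector in a one-column data frame,
# but refuse lengths that a data frame cannot represent.
wrap_raw <- function(bytes, max_rows = .Machine$integer.max) {
  stopifnot(is.raw(bytes))
  if (length(bytes) > max_rows) {
    stop(
      "raw vector length ", format(length(bytes), scientific = FALSE),
      " exceeds the limit of ", format(max_rows, scientific = FALSE),
      " rows for a data frame",
      call. = FALSE
    )
  }
  data.frame(actual_data = bytes)
}

# Works for ordinary lengths.
small <- wrap_raw(as.raw(1:5))

# Fails cleanly when the limit is exceeded (limit lowered for demonstration).
msg <- tryCatch(
  wrap_raw(raw(10), max_rows = 5),
  error = function(e) conditionMessage(e)
)
```

The `max_rows` argument exists only so the failure path can be exercised without allocating a 4 GB vector.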