fst for arbitrary data storage? #201

@wlandau

Inspired by advice from @eddelbuettel here, I am attempting to leverage fst for arbitrary data. Essentially, I would like to take an arbitrary (and arbitrarily large) data structure in memory, serialize it to a raw vector, save it in a one-column data frame with write_fst(), and retrieve it later with read_fst(). Have you tried this before? What would it take to make it work?
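For concreteness, the round trip I have in mind looks roughly like the sketch below. `save_object_fst()` and `load_object_fst()` are hypothetical helper names (they are not part of fst), and the sketch assumes the serialized payload is short enough to fit in an ordinary data frame column.

```r
library(fst)

# Hypothetical helper (not part of fst): serialize any R object to a raw
# vector and store it as the single column of a data frame.
save_object_fst <- function(object, path) {
  payload <- serialize(object, connection = NULL) # raw vector
  wrapper <- data.frame(actual_data = payload)
  write_fst(wrapper, path)
  invisible(path)
}

# Hypothetical helper (not part of fst): read the raw column back and
# rebuild the original object.
load_object_fst <- function(path) {
  wrapper <- read_fst(path)
  unserialize(wrapper$actual_data)
}

path <- tempfile(fileext = ".fst")
original <- list(a = 1:10, b = letters, c = mtcars)
save_object_fst(original, path)
restored <- load_object_fst(path)
identical(original, restored)
```

The column name `actual_data` matches the benchmark that follows; any other name would work just as well.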

Benchmarks for small-ish data are encouraging

library(fst)
wrapper <- data.frame(actual_data = raw(2^31 - 1))
system.time(write_fst(wrapper, tempfile()))
#>    user  system elapsed 
#>   0.362   0.019   0.103
system.time(writeBin(wrapper$actual_data, tempfile()))
#>    user  system elapsed 
#>   0.314   1.340   1.689

but there are some roadblocks with big data / long vectors.

library(fst)
x <- data.frame(x = raw(2^32))
#> Warning in attributes(.Data) <- c(attributes(.Data), attrib): NAs
#> introduced by coercion to integer range
#> Error in if (mirn && nrows[i] > 0L) {: missing value where TRUE/FALSE needed
x <- list(x = raw(2^32))
as.data.frame(x)
#> Warning in attributes(.Data) <- c(attributes(.Data), attrib): NAs
#> introduced by coercion to integer range
#> Error in if (mirn && nrows[i] > 0L) {: missing value where TRUE/FALSE needed
class(x) <- "data.frame"
file <- tempfile()
write_fst(x, file) # No error here...
# read_fst(file)   # but I get a segfault here.

Created on 2019-06-16 by the reprex package (v0.3.0)
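Until long vectors are supported, a wrapper could at least fail early instead of reaching the segfault. Here is a minimal sketch, assuming the limit is the integer row count of a data frame, as the coercion warning above suggests. `wrap_payload()` is a hypothetical helper, not part of fst.

```r
# Hypothetical helper (not part of fst): wrap a raw payload in a one-column
# data frame, refusing payloads longer than the largest integer row count.
wrap_payload <- function(payload) {
  stopifnot(is.raw(payload))
  if (length(payload) > .Machine$integer.max) {
    stop(
      "payload has ", length(payload), " bytes, but a data frame can hold ",
      "at most ", .Machine$integer.max, " rows",
      call. = FALSE
    )
  }
  data.frame(actual_data = payload)
}

# Payloads below the limit are wrapped as before.
payload <- serialize(mtcars, connection = NULL)
small <- wrap_payload(payload)
nrow(small) == length(payload)
```

This only guards the wrapping step. It does not make payloads of 2^31 bytes or more storable.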
