Hi all,
I am the author of the RSpectra package, and I found a potential issue in plot_varimax_z_pairs() while preparing a new release of RSpectra.
The source code of plot_varimax_z_pairs() is simple:
> plot_varimax_z_pairs
function (fa, factors = 1:min(5, fa$rank), ...)
{
    stop_if_not_installed("dplyr")
    stop_if_not_installed("GGally")
    stop_if_not_installed("purrr")
    fa %>% get_varimax_z(factors) %>% dplyr::select(-id) %>%
        dplyr::mutate(leverage = purrr::pmap_dbl(., sum)) %>%
        dplyr::sample_n(min(nrow(.), 1000), weight = leverage^2) %>%
        dplyr::select(-leverage) %>% GGally::ggpairs(ggplot2::aes(alpha = 0.001),
        ...) + ggplot2::theme_minimal()
}

Consider the following reproducible code:
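library(vsp)
library(dplyr)  # provides %>%
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 3)
# Same default as in plot_varimax_z_pairs()
factors <- 1:min(5, fa$rank)
res <- fa %>% get_varimax_z(factors) %>% dplyr::select(-id) %>%
    dplyr::mutate(leverage = purrr::pmap_dbl(., sum))
res[160:170, ]

And the output is:
> res[160:170, ]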
# A tibble: 11 × 4
z1 z2 z3 leverage
<dbl> <dbl> <dbl> <dbl>
1 1.34e- 2 6.60e- 2 4.39e- 2 1.23e- 1
2 -1.83e- 3 -2.07e- 3 4.31e- 1 4.27e- 1
3 7.73e- 2 1.79e- 1 1.68e- 2 2.73e- 1
4 -1.60e- 2 -1.47e- 1 3.91e+ 0 3.75e+ 0
5 8.22e- 2 4.80e+ 0 -1.03e- 2 4.87e+ 0
6 -5.00e-15 -2.17e-13 1.21e-12 9.90e-13
7 -8.80e- 3 -8.05e- 2 1.91e+ 0 1.83e+ 0
8 -2.01e- 4 -1.19e- 3 4.94e- 2 4.80e- 2
9 4.39e- 3 5.72e- 1 1.83e- 2 5.95e- 1
10 2.09e- 5 3.84e- 3 1.53e- 2 1.92e- 2
11 -2.63e- 2 -1.43e- 1 8.96e+ 0 8.79e+ 0
Note that every value in row 6, including the leverage, is very close to zero (on the order of 1e-12 or smaller). With the new version of RSpectra, the output is almost identical, up to rounding error:
> r1[160:170, ]
# A tibble: 11 × 4
z1 z2 z3 leverage
<dbl> <dbl> <dbl> <dbl>
1 0.0134 0.0660 0.0439 0.123
2 -0.00183 -0.00207 0.431 0.427
3 0.0773 0.179 0.0168 0.273
4 -0.0160 -0.147 3.91 3.75
5 0.0822 4.80 -0.0103 4.87
6 0 0 0 0
7 -0.00880 -0.0805 1.91 1.83
8 -0.000201 -0.00119 0.0494 0.0480
9 0.00439 0.572 0.0183 0.595
10 0.0000209 0.00384 0.0153 0.0192
11 -0.0263 -0.143 8.96 8.79
Here row 6 consists of exact zeros.
This is where the issue occurs. With replace = FALSE, the sample_n(tbl, size, replace, weight) call inside plot_varimax_z_pairs() needs at least size strictly positive weights, so when size == nrow(tbl) every weight must be positive. If the data set is small (at most 1000 rows here) and leverage contains an exact zero, sample_n() throws an error:
Error in `dplyr::sample_n()`:
! Can't compute indices.
Caused by error in `sample.int()`:
! too few positive probabilities
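
The same failure can be reproduced without vsp or RSpectra. The following is a minimal sketch in base R, assuming (as the error trace above suggests) that sample_n() passes its weights on to sample.int(). The weight vector w is made up for illustration.

```r
# Hypothetical weights: the middle entry is an exact zero, like row 6 above.
w <- c(1, 0, 1)^2

# Drawing all 3 items without replacement needs 3 positive weights,
# but only 2 are available, so sample.int() signals an error.
msg <- tryCatch(
  {
    sample.int(length(w), size = length(w), replace = FALSE, prob = w)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Here msg should carry the same "too few positive probabilities" text as the error reported above.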
The fix is also simple: use weight = leverage^2 + 1e-10, so that every weight is strictly positive while the sampling probabilities barely change.
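
As a rough illustration of the proposed offset, here is a base R sketch using sample.int() directly rather than sample_n(). The weights w and the name w_fixed are hypothetical, and this is not the package's actual code.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical squared leverages, one of them an exact zero.
w <- c(1, 0, 1)^2

# Proposed fix: a tiny offset makes every weight strictly positive.
w_fixed <- w + 1e-10

# Sampling all rows without replacement now succeeds.
idx <- sample.int(length(w_fixed), size = length(w_fixed),
                  replace = FALSE, prob = w_fixed)
print(sort(idx))
```

The zero-weight element is still drawn last with overwhelming probability, so the offset only matters when every row has to be selected anyway.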
Could you consider making this change in a future version? Thanks.