Issue of plot_varimax_z_pairs() when leverage value is zero #78

@yixuan

Hi all,

I am the author of the RSpectra package, and I found a potential issue with plot_varimax_z_pairs() while preparing a new RSpectra release.

The source code of plot_varimax_z_pairs() is simple:

> plot_varimax_z_pairs
function (fa, factors = 1:min(5, fa$rank), ...) 
{
    stop_if_not_installed("dplyr")
    stop_if_not_installed("GGally")
    stop_if_not_installed("purrr")
    fa %>% get_varimax_z(factors) %>% dplyr::select(-id) %>% 
        dplyr::mutate(leverage = purrr::pmap_dbl(., sum)) %>% 
        dplyr::sample_n(min(nrow(.), 1000), weight = leverage^2) %>% 
        dplyr::select(-leverage) %>% GGally::ggpairs(ggplot2::aes(alpha = 0.001), 
        ...) + ggplot2::theme_minimal()
}

Consider the following reproducible code:

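library(vsp)
library(magrittr)  # for %>%
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 3)
# `factors` is an argument of plot_varimax_z_pairs(); use its default here
factors <- 1:min(5, fa$rank)
res <- fa %>% get_varimax_z(factors) %>% dplyr::select(-id) %>% 
        dplyr::mutate(leverage = purrr::pmap_dbl(., sum))
res[160:170, ]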

And the output is:

> res[160:170, ]
# A tibble: 11 × 4
          z1        z2        z3 leverage
       <dbl>     <dbl>     <dbl>    <dbl>
 1  1.34e- 2  6.60e- 2  4.39e- 2 1.23e- 1
 2 -1.83e- 3 -2.07e- 3  4.31e- 1 4.27e- 1
 3  7.73e- 2  1.79e- 1  1.68e- 2 2.73e- 1
 4 -1.60e- 2 -1.47e- 1  3.91e+ 0 3.75e+ 0
 5  8.22e- 2  4.80e+ 0 -1.03e- 2 4.87e+ 0
 6 -5.00e-15 -2.17e-13  1.21e-12 9.90e-13
 7 -8.80e- 3 -8.05e- 2  1.91e+ 0 1.83e+ 0
 8 -2.01e- 4 -1.19e- 3  4.94e- 2 4.80e- 2
 9  4.39e- 3  5.72e- 1  1.83e- 2 5.95e- 1
10  2.09e- 5  3.84e- 3  1.53e- 2 1.92e- 2
11 -2.63e- 2 -1.43e- 1  8.96e+ 0 8.79e+ 0

Note that the leverage value in row 6 is very close to zero (about 1e-12). With the new version of RSpectra, the output is almost identical apart from rounding differences:

> r1[160:170, ]
# A tibble: 11 × 4
           z1       z2      z3 leverage
        <dbl>    <dbl>   <dbl>    <dbl>
 1  0.0134     0.0660   0.0439   0.123 
 2 -0.00183   -0.00207  0.431    0.427 
 3  0.0773     0.179    0.0168   0.273 
 4 -0.0160    -0.147    3.91     3.75  
 5  0.0822     4.80    -0.0103   4.87  
 6  0          0        0        0     
 7 -0.00880   -0.0805   1.91     1.83  
 8 -0.000201  -0.00119  0.0494   0.0480
 9  0.00439    0.572    0.0183   0.595 
10  0.0000209  0.00384  0.0153   0.0192
11 -0.0263    -0.143    8.96     8.79  

Here every value in row 6 is exactly zero.

This is where the problem arises. plot_varimax_z_pairs() calls sample_n(tbl, size, replace, weight) with the default replace = FALSE, and sampling without replacement needs at least size rows with a positive weight. When the data set has at most 1000 rows, size equals nrow(tbl), so every weight must be positive. If leverage contains an exact zero, sample_n() throws an error:

Error in `dplyr::sample_n()`:
! Can't compute indices.
Caused by error in `sample.int()`:
! too few positive probabilities
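
The same failure can be reproduced in base R without vsp or dplyr. The sketch below is only an illustration: the weight vector `w` is made up, standing in for leverage^2 with one exact zero, and it calls sample.int() directly, which is the function named in the error above.

```r
# Toy weights standing in for leverage^2; the third entry is exactly zero.
w <- c(0.5, 0.3, 0, 0.2)

# Drawing all 4 indices without replacement needs 4 positive weights,
# but only 3 are positive, so sample.int() signals an error.
res <- tryCatch(
  sample.int(length(w), size = length(w), replace = FALSE, prob = w),
  error = function(e) conditionMessage(e)
)
print(res)

# Drawing no more indices than there are positive weights succeeds,
# and the zero-weight index is never selected.
ok <- sample.int(length(w), size = 3, replace = FALSE, prob = w)
print(ok)
```

So the error only appears when the requested sample size exceeds the number of strictly positive weights, which is why larger data sets (more than 1000 rows) are less likely to trigger it.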

The fix is also simple: use weight = leverage^2 + 1e-10, so that every weight is strictly positive while deviating only negligibly from the exact value.
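
A minimal base-R sketch of the effect of this offset, under stated assumptions: the data frame `z` is a toy stand-in built from three rows of the output above (including the all-zero row), rowSums() plays the role of purrr::pmap_dbl(., sum), and sample.int() plays the role of dplyr::sample_n(). It is not the package's actual code.

```r
# Toy stand-in for the varimax Z table: three rows copied from the output
# above, the second of which is exactly zero.
z <- data.frame(
  z1 = c(0.0822, 0, -0.00880),
  z2 = c(4.80,   0, -0.0805),
  z3 = c(-0.0103, 0, 1.91)
)

# Row sums, as computed by purrr::pmap_dbl(., sum) in the original function.
leverage <- rowSums(z)

# Proposed weights: a tiny offset keeps every weight strictly positive.
weight <- leverage^2 + 1e-10

# Same sample size rule as plot_varimax_z_pairs(): all rows when nrow <= 1000.
size <- min(nrow(z), 1000)
idx <- sample.int(nrow(z), size = size, replace = FALSE, prob = weight)
sampled <- z[idx, ]
print(sampled)
```

With the offset, all three rows can be drawn without replacement. When fewer rows than nrow(z) are requested, the all-zero row still has a vanishingly small chance of being picked, so the sampling remains essentially leverage-weighted.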

Could you consider making this change in a future version? Thanks.
