Implement reductions with optional skipna argument #101
It seems like some of the nightlies have not been updated in 11 days, so this is going to have to wait.

Now with reductions across dimensions! (although not yet for

What should reductions do if there are no non-NA values and
Should the same rules apply when reducing a non-empty vector that contains NAs? R does the same things with However, generalization to reductions across dimensions suggests that maybe

Okay, this should be ready to merge. With few NAs, performance is quite good, typically within 50% of Base for both
Implement reductions with optional skipna argument
This implements reductions with an optional `skipna` argument, fixing #3, fixing JuliaData/DataFrames.jl#259, and fixing JuliaData/DataFrames.jl#354, and mostly superseding #32 (except for `skewness` and `kurtosis`). The following reductions are implemented, all in terms of `mapreduce`:

- `sum`
- `prod`
- `maximum`
- `minimum`
- `Base.sumabs`
- `Base.sumabs2`
- `var`
- `varm`
- `std`
- `stdm`

With `skipna=false`, for some reductions that are guaranteed to return NA when any input is NA, we first check for NAs and then call `_mapreduce` from Base on the `data` array if there are none. This has basically no overhead. For other reductions, we simply call the implementations in Base on the DataArray, which is slow due to type instability in indexing, but is the most obvious way to guarantee correctness.

With `skipna=true`, we first check if there are NA values. If not, we call `_mapreduce` from Base on the `data` array. If there are, we use either an algorithm that branches on NA or an algorithm that does not branch on NA, depending on the types and functors involved. For summation, we use a pairwise algorithm that divides blocks along BitArray chunk boundaries based on the number of non-NA elements. For blocks that contain no NAs, we call the implementation in Base. This has little overhead when there are few NA elements.

The tests currently fail on Travis because I based the implementation of `var` off of JuliaLang/julia#7502 and so it has slightly different semantics than the current Julia master, but the tests here are pretty comprehensive, and as soon as that is merged this should be good to go.
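
For readers who want a concrete picture of the two `skipna` code paths described above, here is a minimal, self-contained sketch for `sum` only. It is not the code in this PR: `MaskedVector`, `sum_keepna`, `sum_skipna`, and `pairwise_skipna` are hypothetical names, the sketch assumes Julia 1.x and uses `missing` as a stand-in for NA, and the pairwise step simply splits index ranges in half rather than dividing blocks along BitArray chunk boundaries by non-NA count.

```julia
# Hypothetical stand-in for a DataArray: values plus a parallel NA mask.
# None of these names come from DataArrays.jl.
struct MaskedVector{T}
    data::Vector{T}   # underlying values; entries at NA positions are ignored
    na::BitVector     # na[i] == true marks element i as NA
end

# skipna=false: any NA makes the sum NA (modeled here as `missing`), so scan
# the mask first and otherwise reduce the raw data with Base.
function sum_keepna(v::MaskedVector)
    any(v.na) && return missing
    return sum(v.data)
end

# skipna=true: defer to Base when there are no NAs, otherwise fall back to a
# pairwise summation that skips NA entries.
function sum_skipna(v::MaskedVector)
    any(v.na) || return sum(v.data)
    return pairwise_skipna(v, 1, length(v.data))
end

# Simplified pairwise summation (assumption: split at the index midpoint).
# Short ranges are summed with a branch on the mask; longer ranges are split
# in two and the partial sums are added.
function pairwise_skipna(v::MaskedVector{T}, lo::Int, hi::Int;
                         blocksize::Int=64) where {T}
    if hi - lo < blocksize
        s = zero(T)
        for i in lo:hi
            if !v.na[i]
                s += v.data[i]
            end
        end
        return s
    end
    mid = (lo + hi) >>> 1
    return pairwise_skipna(v, lo, mid; blocksize=blocksize) +
           pairwise_skipna(v, mid + 1, hi; blocksize=blocksize)
end

v = MaskedVector([1.0, 2.0, 4.0], BitVector([false, true, false]))
sum_keepna(v)   # missing, because one element is NA
sum_skipna(v)   # 5.0, the NA at index 2 is skipped
```

The early `any(v.na)` check is what keeps the overhead low in the common case: when no element is NA, both functions reduce the plain `data` vector and never touch the mask again.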