I have a function which looks vaguely like this:
```rust
struct Rect { real: f64, imag: f64 }
struct KetRef<'a> { real: &'a [f64], imag: &'a [f64] }

impl<'a> KetRef<'a> {
    pub fn dot(self, other: KetRef) -> Rect {
        assert_eq!(self.real.len(), other.real.len());
        assert_eq!(self.real.len(), other.imag.len());
        assert_eq!(self.real.len(), self.imag.len());
        zip!(self.real, self.imag, other.real, other.imag)
            .map(|(ar, ai, br, bi)| {
                let real = ar * br + ai * bi;
                let imag = ar * bi - ai * br;
                Rect { real, imag }
            })
            .fold(Rect::zero(), |a, b| a + b)
    }
}
```

Converting it to use `faster` requires two passes over the arrays; I am unable to produce both `real` and `imag` in one pass because `simd_map` requires the function output to be a single vector:
```rust
pub fn dot<K: AsKetRef>(self, other: K) -> Rect {
    use ::faster::prelude::*;
    let other = other.as_ket_ref();
    assert_eq!(self.real.len(), other.real.len());
    assert_eq!(self.real.len(), other.imag.len());
    assert_eq!(self.real.len(), self.imag.len());
    let real = (
        self.real.simd_iter(f64s(0.0)),
        self.imag.simd_iter(f64s(0.0)),
        other.real.simd_iter(f64s(0.0)),
        other.imag.simd_iter(f64s(0.0)),
    ).zip().simd_map(|(ar, ai, br, bi)| {
        ar * br + ai * bi
    }).simd_reduce(f64s(0.0), |acc, v| acc + v).sum();
    let imag = (
        self.real.simd_iter(f64s(0.0)),
        self.imag.simd_iter(f64s(0.0)),
        other.real.simd_iter(f64s(0.0)),
        other.imag.simd_iter(f64s(0.0)),
    ).zip().simd_map(|(ar, ai, br, bi)| {
        ar * bi - ai * br
    }).simd_reduce(f64s(0.0), |acc, v| acc + v).sum();
    Rect { real, imag }
}
```

So is it faster? Well, actually, yes! It is plenty faster... up to a point:
Change in run-time for different ket lengths:

```
dot/16    change: -33.973%
dot/64    change: -29.575%
dot/256   change: -26.762%
dot/1024  change: -34.054%
dot/4096  change: -36.297%
dot/16384 change:  -7.3379%
```
Yikes! Once we hit 16384 elements there is almost no speedup!

I suspect that at this point memory has become the bottleneck, so most of what was gained by using SIMD is lost by making two passes over the arrays. It would be nice to have an API that allowed this to be done in one pass by letting the mapping function return a tuple (producing a new `PackedZippedIterator` or similar).
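For reference, here is a plain-Rust sketch (no `faster`; the function name `dot_one_pass` is hypothetical) of the single-pass computation such a tuple-returning API would enable: each iteration loads the four inputs once and accumulates both components, so the arrays are traversed only one time.

```rust
// One-pass complex dot product over split real/imag slices.
// Plain scalar Rust, illustrating the access pattern a tuple-returning
// simd_map would make possible: both outputs come from the same loads.
fn dot_one_pass(ar: &[f64], ai: &[f64], br: &[f64], bi: &[f64]) -> (f64, f64) {
    assert_eq!(ar.len(), ai.len());
    assert_eq!(ar.len(), br.len());
    assert_eq!(ar.len(), bi.len());
    let mut real = 0.0;
    let mut imag = 0.0;
    for i in 0..ar.len() {
        // Each element of the four arrays is read exactly once,
        // which is the memory-traffic win of a single pass.
        real += ar[i] * br[i] + ai[i] * bi[i];
        imag += ar[i] * bi[i] - ai[i] * br[i];
    }
    (real, imag)
}

fn main() {
    let (re, im) = dot_one_pass(&[1.0, 2.0], &[0.0, 1.0], &[3.0, 4.0], &[1.0, 0.0]);
    println!("{} {}", re, im); // 11 -3
}
```

A vectorized version would accumulate two SIMD registers in the same loop body and do a horizontal sum of each at the end.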