2:30:23 Brian Q: It was surprising to see how much latency division has. Square root seems to have similar latency, but reciprocal square root (i.e. _mm_rsqrt_ps) is as quick as a multiply (with just half the throughput). Why is rsqrt so much faster than regular sqrt?
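One detail that bears on the question: _mm_rsqrt_ps returns only an approximation (Intel documents a maximum relative error below 1.5*2^-12), whereas _mm_sqrt_ps and _mm_div_ps return correctly rounded results. The sketch below is an illustration only, not code from the stream; the helper names rsqrt_exact, rsqrt_estimate, rsqrt_refined and rel_err are hypothetical. It compares the estimate against the sqrt-plus-divide path and shows one Newton-Raphson step recovering most of the missing precision.

```c
#include <xmmintrin.h>  /* SSE: _mm_rsqrt_ps, _mm_sqrt_ps, _mm_div_ps */

/* Full-precision path: a square root followed by a divide (both slow). */
static float rsqrt_exact(float x) {
    __m128 v = _mm_set1_ps(x);
    return _mm_cvtss_f32(_mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(v)));
}

/* Fast path: hardware estimate, good to roughly 12 bits. */
static float rsqrt_estimate(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ps(_mm_set1_ps(x)));
}

/* Estimate plus one Newton-Raphson step: y' = y * (1.5 - 0.5 * x * y * y). */
static float rsqrt_refined(float x) {
    float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Relative error of `approx` with respect to `exact`. */
static float rel_err(float approx, float exact) {
    float d = (approx - exact) / exact;
    return d < 0.0f ? -d : d;
}
```

Calling rel_err(rsqrt_estimate(x), rsqrt_exact(x)) for a few positive inputs should show an error around the fourth decimal place, and the refined version should land much closer to the exact result. The refinement costs a few extra multiplies, so whether it still beats sqrt plus divide depends on the CPU.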