Each negative sign in the proposed bottleneck solution
-bottleneck.partsort(-a, 10)[:10]
makes a copy of the data. Both copies can be avoided by partitioning from the other end of the array:
bottleneck.partsort(a, a.size-10)[-10:]
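As a quick sanity check (not a benchmark) that the two forms pick out the same elements, here is a small example; note that newer bottleneck releases rename partsort to bottleneck.partition, so adjust the call if you are on a recent version:

import numpy as np
import bottleneck

a = np.random.rand(100)

# Negated version: -a and the outer negation each copy the data
top_negated = -bottleneck.partsort(-a, 10)[:10]

# Copy-free version: partition so the 10 largest land in the last 10 slots
top_no_copy = bottleneck.partsort(a, a.size - 10)[-10:]

# partsort only guarantees which side of the partition each element lands on,
# not the order within each side, so sort before comparing
assert np.array_equal(np.sort(top_negated), np.sort(top_no_copy))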
Also, the proposed numpy solution
a.argsort()[-10:]
returns indices, not values. The fix is to use the indices to look up the values:
a[a.argsort()[-10:]]
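For instance, with a small made-up array and the 3 largest values instead of 10:

import numpy as np

a = np.array([4.0, 0.5, 9.0, 2.5, 7.0])

idx = a.argsort()[-3:]   # indices of the 3 largest values: [0, 4, 2]
top = a[idx]             # the values themselves: [4., 7., 9.]

Note that the result comes back in ascending order, smallest of the top 3 first.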
The relative speed of the two bottleneck solutions depends on the ordering of the elements in the initial array because the two approaches partition the data at different points.
In other words, timing with any one particular random array can make either method look faster.
Averaging the timing across 100 random arrays, each with 1,000,000 elements, gives
-bn.partsort(-a, 10)[:10]: 1.76 ms per loop
bn.partsort(a, a.size-10)[-10:]: 0.92 ms per loop
a[a.argsort()[-10:]]: 15.34 ms per loop
where the timing code is as follows:
import time

import numpy as np
import bottleneck as bn

def bottleneck_1(a):
    return -bn.partsort(-a, 10)[:10]

def bottleneck_2(a):
    return bn.partsort(a, a.size-10)[-10:]

def numpy(a):
    return a[a.argsort()[-10:]]

def do_nothing(a):
    return a

def benchmark(func, size=1000000, ntimes=100):
    # Time ntimes calls, generating a fresh random array for each call
    t1 = time.time()
    for n in range(ntimes):
        a = np.random.rand(size)
        func(a)
    t2 = time.time()
    ms_per_loop = 1000000 * (t2 - t1) / size
    return ms_per_loop

t1 = benchmark(bottleneck_1)
t2 = benchmark(bottleneck_2)
t3 = benchmark(numpy)
t4 = benchmark(do_nothing)

# Subtract the do_nothing baseline to remove the cost of generating the arrays
print "-bn.partsort(-a, 10)[:10]: %0.2f ms per loop" % (t1 - t4)
print "bn.partsort(a, a.size-10)[-10:]: %0.2f ms per loop" % (t2 - t4)
print "a[a.argsort()[-10:]]: %0.2f ms per loop" % (t3 - t4)