The bottleneck module has a fast partial sort method that works directly with NumPy arrays: bottleneck.partition().
Note that bottleneck.partition() returns the actual values, and they are only partially ordered rather than fully sorted. If you want the indices of those values instead (the counterpart of what numpy.argsort() returns), use bottleneck.argpartition().
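To make the values-versus-indices distinction concrete, here is a minimal sketch. It uses NumPy's own np.partition() and np.argpartition() as stand-ins, on the assumption that the bottleneck functions behave analogously; the array contents and the names k, smallest_values and smallest_idx are made up for illustration.

```python
import numpy as np

a = np.array([7.0, 1.0, 9.0, 3.0, 8.0, 2.0])
k = 2

# Values: after partitioning around position k, the k smallest elements
# occupy the first k slots, in no guaranteed order.
smallest_values = np.partition(a, k)[:k]

# Indices: the positions of those same elements in the original array.
smallest_idx = np.argpartition(a, k)[:k]

# Sort the small results if a deterministic order is needed.
print(np.sort(smallest_values).tolist())  # [1.0, 2.0]
print(np.sort(smallest_idx).tolist())     # [1, 5]
```

Indexing the original array with the indices (a[smallest_idx]) gives back the same values that the partition call returns, possibly in a different order.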
I've benchmarked:
z = -bottleneck.partition(-a, 10)[:10]
z = a.argsort()[-10:]
z = heapq.nlargest(10, a)
where a is a random 1,000,000-element array.
The timings were as follows:
bottleneck.partition(): 25.6 ms per loop
np.argsort(): 198 ms per loop
heapq.nlargest(): 358 ms per loop
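If you want to repeat the comparison on your own machine, something like the following should work. It is only a rough sketch, not the harness used for the numbers above: the benchmark() helper, the seed, and the repeat count are my own choices, the argsort result is indexed back into the array so that all three candidates yield values, and the bottleneck case is skipped when the module (or its partition() function) is not available. Absolute timings will vary with hardware and library versions.

```python
import heapq
import timeit

import numpy as np

try:
    import bottleneck as bn  # optional; skipped below if missing
except ImportError:
    bn = None


def benchmark(n=1_000_000, k=10, repeat=3):
    """Time three ways of getting the k largest values of a random array.

    Hypothetical helper: returns {method name: best time in seconds}.
    """
    rng = np.random.default_rng(0)
    a = rng.random(n)

    candidates = {
        "np.argsort": lambda: a[a.argsort()[-k:]],
        "heapq.nlargest": lambda: heapq.nlargest(k, a),
    }
    if bn is not None and hasattr(bn, "partition"):
        candidates["bottleneck.partition"] = lambda: -bn.partition(-a, k)[:k]

    expected = np.sort(a)[-k:]
    timings = {}
    for name, fn in candidates.items():
        # Check that every method finds the same k values before timing it.
        assert np.array_equal(np.sort(np.asarray(fn())), expected)
        timings[name] = min(timeit.repeat(fn, number=1, repeat=repeat))
    return timings


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e3:.1f} ms")
```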