Answer by Tacio Medeiros for A fast way to find the largest N elements in an numpy array

I had this problem and, since this question is 5 years old, I had to redo all the benchmarks and update the bottleneck syntax (there is no partsort anymore; it's partition now).

I used the same arguments as kwgoodman, except the number of elements retrieved, which I increased to 50 (to better fit my particular situation).

I got these results:

bottleneck 1: 01.12 ms per loop
bottleneck 2: 00.95 ms per loop
pandas      : 01.65 ms per loop
heapq       : 08.61 ms per loop
numpy       : 12.37 ms per loop
numpy 2     : 00.95 ms per loop

So, bottleneck_2 and numpy_2 (adas's solution) were tied. But with np.percentile (numpy_2) the top N elements come back already sorted, which is not the case for the partition-based bottleneck solutions. On the other hand, if you are also interested in the indices of those elements, percentile is not useful.
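If the indices matter, one option that was not part of the benchmark is np.argpartition, NumPy's index-returning counterpart to a partial sort. The sketch below is only an illustration: the helper name top_n_indices is made up for this example, and it assumes a 1-D array with 1 <= n <= a.size.

```python
import numpy as np


def top_n_indices(a, n):
    """Indices of the n largest values of a 1-D array, largest first.

    Hypothetical helper for illustration; assumes 1 <= n <= a.size.
    """
    # argpartition leaves the indices of the n largest values in the
    # last n slots, in no particular order.
    idx = np.argpartition(a, a.size - n)[-n:]
    # Sort only those n candidates by value, descending.
    return idx[np.argsort(a[idx])[::-1]]


a = np.array([0.3, 0.9, 0.1, 0.7, 0.5])
idx = top_n_indices(a, 2)
print(idx)     # positions of the two largest values
print(a[idx])  # the values themselves, largest first
```

Only the n selected elements get fully sorted, so the extra ordering step should stay cheap when n is small compared to the array size.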

I added pandas too, which uses bottleneck underneath if it is available (http://pandas.pydata.org/pandas-docs/stable/install.html#recommended-dependencies). If you already have a pandas Series or DataFrame to start with, you are in good hands: just use nlargest and you're done.
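A small sketch of the nlargest route, using a toy array rather than the benchmark data. The variable names are just for this example. The returned Series is ordered from largest to smallest and keeps the original index labels, so for a default integer index you get the positions as well.

```python
import numpy as np
import pandas as pd

a = np.array([0.3, 0.9, 0.1, 0.7, 0.5])

# nlargest returns a Series ordered from largest to smallest,
# labelled with the index of each element in the original Series.
top = pd.Series(a).nlargest(2)

top_values = top.to_numpy()           # the two largest values
top_positions = top.index.to_numpy()  # where they sat in the original array
print(top)
```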

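The code used for the benchmark is as follows (Python 3, please):

import heapq
import time

import bottleneck as bn
import numpy as np
import pandas as pd


def bottleneck_1(a, n):
    # Partition the negated array: the n smallest of -a are the n largest of a.
    return -bn.partition(-a, n)[:n]


def bottleneck_2(a, n):
    # Partition so that the n largest values land in the last n slots (unsorted).
    return bn.partition(a, a.size - n)[-n:]


def numpy(a, n):
    # Full argsort, then take the last n indices (ascending order).
    return a[a.argsort()[-n:]]


def numpy_2(a, n):
    # adas's solution: ask np.percentile for the top n percentile ranks.
    M = a.shape[0]
    perc = (np.arange(M - n, M) + 1.0) / M * 100
    return np.percentile(a, perc)


def pandas(a, n):
    return pd.Series(a).nlargest(n)


def hpq(a, n):
    return heapq.nlargest(n, a)


def do_nothing(a, n):
    # Baseline: measures array creation and call overhead only.
    return a[:n]


def benchmark(func, size=1000000, ntimes=100, topn=50):
    t1 = time.time()
    for n in range(ntimes):
        a = np.random.rand(size)
        func(a, topn)
    t2 = time.time()
    # Formula kept from kwgoodman's benchmark. With size=1000000 it equals the
    # total elapsed seconds for all ntimes runs, so the figures are comparable
    # with each other; a true per-call value would be 1000 * (t2 - t1) / ntimes.
    ms_per_loop = 1000000 * (t2 - t1) / size
    return ms_per_loop


t1 = benchmark(bottleneck_1)
t2 = benchmark(bottleneck_2)
t3 = benchmark(pandas)
t4 = benchmark(hpq)
t5 = benchmark(numpy)
t6 = benchmark(numpy_2)
t0 = benchmark(do_nothing)

# Subtract the baseline so array creation is not counted.
print("bottleneck 1: {:05.2f} ms per loop".format(t1 - t0))
print("bottleneck 2: {:05.2f} ms per loop".format(t2 - t0))
print("pandas      : {:05.2f} ms per loop".format(t3 - t0))
print("heapq       : {:05.2f} ms per loop".format(t4 - t0))
print("numpy       : {:05.2f} ms per loop".format(t5 - t0))
print("numpy 2     : {:05.2f} ms per loop".format(t6 - t0))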
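If you want to rerun this on your own machine, a variant built on the standard-library timeit module might look like the sketch below. It is not the harness that produced the numbers above: the array is created once outside the timed region instead of subtracting a do_nothing baseline, the default size is smaller so it finishes quickly, and the names ms_per_call and numpy_argsort are made up for this example. Only the two candidates that need nothing beyond NumPy and the standard library are included.

```python
import heapq
import timeit

import numpy as np


def ms_per_call(func, size=100_000, topn=50, repeats=5, calls=10):
    """Best-of-`repeats` average time of one func(a, topn) call, in ms."""
    a = np.random.rand(size)  # created once, outside the timed region
    timer = timeit.Timer(lambda: func(a, topn))
    # Each entry of repeat() is the total time in seconds for `calls` calls.
    best_total = min(timer.repeat(repeat=repeats, number=calls))
    return 1000.0 * best_total / calls


def numpy_argsort(a, n):
    # Same approach as the `numpy` function in the benchmark above.
    return a[a.argsort()[-n:]]


def hpq(a, n):
    return heapq.nlargest(n, a)


for f in (numpy_argsort, hpq):
    print("{:<14}: {:.3f} ms per call".format(f.__name__, ms_per_call(f)))
```

Taking the minimum over several repeats is the usual timeit convention, since slower runs mostly reflect interference from other processes.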
