Top 100 algorithm

For Beginners

Started by mike74 January 20, 2024 09:23 PM

5 comments, last by krumpet291 10 months ago

mike74

Author

100

January 20, 2024 09:23 PM

Let's say you need to find the top 100 best-selling books out of a list of a million books.

What is the appropriate algorithm?

I thought it would be QuickSelect, but that only chooses one item.

Thank you.

Mike C.http://www.coolgroups.com/zoomer/http://www.coolgroups.com/ez/

Aressera

3,171

January 20, 2024 11:14 PM

https://en.cppreference.com/w/cpp/algorithm/partial_sort

frob

46,297

January 21, 2024 02:49 AM

What language?

As mentioned, for C++ in a general purpose collection then std::partial_sort() is probably best.

For sales numbers of a million books you're probably in an SQL database so the term will be whatever matches your DBMS and your system's best performance options. Different systems have different options SELECT TOP n, FETCH FIRST n ROWS ONLY, WHERE ROWNUM ≤ n, and LIMIT n are various ways to do it with varying support and performance.

The general algorithm that the C++ standard library uses is a good general purpose version. First you heapify the collection (a heap is a type of loose partially-sorted data structure) then pull out the top n items from the heap and finish sorting them. If you're doing it indirectly you'll be processing the indices rather than the items so you'll need to implement that as your comparison function.

If you have knowledge about the data structure, particularly if it is already fully or partially sorted, the relative amounts of data, if there is any type of indexing going on or you're using a database, or other factors, you might be able to have a more efficient first step than heapifying.