Let's say you need to find the top 100 best-selling books out of a list of a million books.
What is the appropriate algorithm?
I thought it would be QuickSelect, but that only chooses one item.
Thank you.
Let's say you need to find the top 100 best-selling books out of a list of a million books.
What is the appropriate algorithm?
I thought it would be QuickSelect, but that only chooses one item.
Thank you.
What language?
As mentioned, for C++ in a general purpose collection then std::partial_sort() is probably best.
For sales numbers of a million books you're probably in an SQL database so the term will be whatever matches your DBMS and your system's best performance options. Different systems have different options SELECT TOP n
, FETCH FIRST n ROWS ONLY
, WHERE ROWNUM ≤ n
, and LIMIT n
are various ways to do it with varying support and performance.
The general algorithm that the C++ standard library uses is a good general purpose version. First you heapify the collection (a heap is a type of loose partially-sorted data structure) then pull out the top n items from the heap and finish sorting them. If you're doing it indirectly you'll be processing the indices rather than the items so you'll need to implement that as your comparison function.
If you have knowledge about the data structure, particularly if it is already fully or partially sorted, the relative amounts of data, if there is any type of indexing going on or you're using a database, or other factors, you might be able to have a more efficient first step than heapifying.
He didn't ask to get the top 100 books sorted, so perhaps this:
https://en.cppreference.com/w/cpp/algorithm/nth_element