
Container Holy Wars (from realloc thread)

Started by February 24, 2000 01:49 PM
22 comments, last by Stoffel 24 years, 6 months ago
There's an STL container holy war going on in the realloc thread. I gave a presentation on STL a while ago (from a beginner's point of view), and one of the things I presented was when to use each kind of container. I thought I'd throw it up here and see if there are any comments/criticisms.

First, there are two groups: sequential and sorted. Sequential containers must be used when the order in which elements are added is important, regardless of their values. Sorted containers should be used when order is unimportant but searches by value through the container must be made often. The sequential containers are std::vector, std::deque and std::list; the sorted containers are std::set, std::map, std::multiset and std::multimap.

std::vector
This is the fastest container, but it should only be used when elements are added to and removed from one end of the container. You are allowed to remove or insert elements in the middle, but it is an inefficient operation.

std::deque (double-ended queue)
Slower than vector, this container will let you add or remove elements from either end of a sequence. If insertions and deletions happen only at the beginning or end of the sequence, use this container.

std::list
Slowest of the sequential containers, the list allows constant-time insertion or deletion at any point in the list. It also allows one list to be spliced into a second list at any point in constant time. However, random access is not allowed, so finding the Nth element takes linear time.

std::map
Map is implemented as a red-black tree. It stores elements (map::value_type) that can be indexed by a separate key (map::key_type). Use a map if your container must be searched often and you want to reference elements by some arbitrary type or struct. Only one object can exist per key.

std::multimap
Same as map, but multiple objects with the same key are allowed.

std::set
Elements are stored in a red-black tree. Essentially it is the same as a map, but there is no separate key: finding an element in a set requires supplying the element for which to search. Only one instance of each element can exist. Use a set if it is simple to build the object for which you wish to search.

std::multiset
Same as set, but multiple objects of the same value are allowed.

Well, hope that adds some fuel to the fire. =)
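To put the synopsis in code, here's a minimal sketch (modern C++ syntax; the function names are just illustrative) of the three kinds of use the post describes:

```cpp
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <vector>

// vector: fast growth at one end.
std::vector<int> build_vector() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    return v;
}

// list: constant-time insertion at any point, given an iterator.
std::list<int> insert_mid(std::list<int> l, int value) {
    l.insert(std::next(l.begin()), value);  // no elements are shifted
    return l;
}

// map: elements indexed by an arbitrary key type.
std::map<std::string, int> build_ages() {
    std::map<std::string, int> ages;
    ages["alice"] = 30;
    return ages;
}
```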
Well, as one of the participants in the realloc thread holy war, I'm quite happy with your synopsis. I think you did a great job of summarizing the performance characteristics of each container. (Though I suspect my adversaries will disagree.)

-Brian
You learn something new every day...
oh bah, i had a huge reply, then i (damnit), accidentally deleted it all.

d'oh.

===============================================
I saw a man pursuing the horizon;
Round and round they sped. I was disturbed at this; I accosted the man.
"It is futile," I said, You can never -- "

"You lie," he cried, And ran on.
This is my signature. There are many like it, but this one is mine. My signature is my best friend. It is my life. I must master it as I must master my life. My signature, without me, is useless. Without my signature, I am useless.
anyway.

Batch commands:
So what if the caching is inefficient on a linked list batch? It's not really a big deal, since the time lost is almost always insignificant compared to the actual algorithm being performed.

Binary search trees:
BSTs are linked lists, really. It's a list that has nodes and points to the next data member in the list.
Whoever said a linked list could only point to one data member?

radix sort:
Almost any sort can be accomplished radixally. Dunno what you're talking about. The absolute fastest sort I've ever seen was an 8-bit radix sort, where 8 bits of the data were masked at a time and added to one of 256 linked lists, then each list combined at the end. Can't do that on an array.

Quicksort:
Okay, the data needs to be in an array to start with, but how does the quicksort work internally? HMMMM? This is an algorithm where having access to an easily insertable structure is a must. Hence, linked lists.

Resizing a vector/array is a pain in the ass, and very slow.
Not only do you need to re-allocate the memory, but you need to copy ALL of it over, AND delete the old memory too.
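The three steps being complained about (allocate, copy, free) are exactly what a vector-style grow does under the hood. A minimal sketch with a made-up helper name:

```cpp
#include <algorithm>
#include <cstddef>

// What a vector-style grow does internally: allocate a bigger block,
// copy every existing element across, then free the old block.
int* grow(int* old_data, std::size_t old_size, std::size_t new_cap) {
    int* fresh = new int[new_cap];
    std::copy(old_data, old_data + old_size, fresh);
    delete[] old_data;
    return fresh;
}
```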


I'm not anti-array, I'm just trying to show people that arrays aren't the be-all and end-all of the planet.

And I'll try to ignore that comment you made about me being ignorant.

quote: Original post by Mithrandir
Batch commands:
So what if the caching is inefficient on a linked list batch? It's not really a big deal, since the time lost is almost always insignificant compared to the actual algorithm being performed.

I don't know how many times I've seen a batch command that was simply: "Increment all data by one". We're talking maybe 5 clock cycles for this? In a linked list, throw in the overhead of an extra indirection and a potential cache miss. You might consider this a degenerate case, so here's a more realistic example: translate all 3D points in the list by a given vector. This is on the order of 20 clock cycles. A cache miss is usually closer to 40 clock cycles. Even if you cache miss only 50% of the time in a linked list (and that's being generous), that's doubling the execution time. And remember, paging is a form of caching. And page faults are on the order of milliseconds, not clock cycles.
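The "translate all 3D points" batch command being argued about looks like this on contiguous storage (a sketch; the struct and function names are my own):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Batch command over contiguous storage: every element sits next to
// the previous one in memory, so the traversal is cache-friendly.
// In a linked list the same loop would chase a pointer per element.
void translate(std::vector<Vec3>& pts, Vec3 d) {
    for (Vec3& p : pts) {
        p.x += d.x;
        p.y += d.y;
        p.z += d.z;
    }
}
```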

quote:
Binary search trees:
BSTs are linked lists, really. It's a list that has nodes and points to the next data member in the list.
Whoever said a linked list could only point to one data member?

Linked lists by *definition* have pointers to only one other node. And I quote: "A linked list is a data structure in which objects are arranged in a *linear* order. Unlike an array, though, in which the linear order is determined by the array indices, the order in a linked list is determined by *a* pointer in each object." From _Introduction to Algorithms_, by Cormen, Leiserson and Rivest. Note the "linear" keyword and the "a" in front of pointer, as in singular.
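The definitional point shows up right in the node types. A sketch (hypothetical names):

```cpp
// A list node links to exactly one successor, giving a linear order.
// A BST node needs two links, which is what takes it outside the
// quoted definition of a linked list.
struct ListNode { int value; ListNode* next; };
struct TreeNode { int value; TreeNode* left; TreeNode* right; };

// Walking a list follows the single pointer in each node.
int list_length(const ListNode* head) {
    int n = 0;
    for (; head; head = head->next) ++n;
    return n;
}
```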

quote:
radix sort:
Almost any sort can be accomplished radixally. Dunno what you're talking about. The absolute fastest sort I've ever seen was an 8-bit radix sort, where 8 bits of the data were masked at a time and added to one of 256 linked lists, then each list combined at the end. Can't do that on an array.

Yes, radix sort is fast. No, it can't be applied in almost every situation. For example: when the quantity of data is on the same order of magnitude as the available memory. And I'm sorry, but it was pioneered in digital computers on arrays, when they didn't have the memory to waste on pointers for every data member. (The first reference to radix sort was on punch cards.) Read Knuth on this subject.
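For what it's worth, the byte-at-a-time radix sort works fine on plain arrays: count the 256 buckets first, then scatter stably into a second buffer, no linked lists required. A sketch for 32-bit unsigned keys (function name is mine):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort over 8-bit digits, arrays only: one counting pass
// plus one stable scatter into a second buffer per digit.
void radix_sort(std::vector<std::uint32_t>& a) {
    std::vector<std::uint32_t> buf(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {};
        for (std::uint32_t v : a)                 // histogram this digit
            ++count[((v >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)             // prefix sums -> offsets
            count[i + 1] += count[i];
        for (std::uint32_t v : a)                 // stable scatter
            buf[count[(v >> shift) & 0xFF]++] = v;
        a.swap(buf);
    }
}
```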

quote:
Quicksort:
Okay, the data needs to be in an array to start with, but how does the quicksort work internally? HMMMM? This is an algorithm where having access to an easily insertable structure is a must. Hence, linked lists.

You seem to be misguided on how the internals of Quicksort work. Quicksort is a divide-and-conquer algorithm that works *in place* on the basis of having directly indexable positions into your sets in O(1) time. Quicksort works with *swaps*, not insertions. While Quicksort can be adapted to work on a linked list, the lack of spatial locality of data destroys the O(1) time on access. Instead, access times become a function of previous accesses to other memory locations. Such an adaptation would no longer be properly classified as a quicksort. In addition, the partitioning step of quicksort, especially if done with any of the median techniques, would suffer horribly under linked lists.
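The swap-based, in-place behavior described above can be sketched like this (Lomuto partition scheme, shown on a vector since it just needs O(1) indexing):

```cpp
#include <utility>
#include <vector>

// In-place quicksort: the partition step swaps elements around a
// pivot using O(1) random access. No insertions, no extra structure.
void quicksort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi];                 // last element as pivot
    int i = lo;                        // boundary of the "< pivot" region
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);            // pivot into its final slot
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}
```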

quote:
And I'll try to ignore that comment you made about me being ignorant.

Mithrandir, he called you ignorant because you *are* ignorant. Go read a good book on data structures and algorithms. I recommend Cormen, Leiserson and Rivest for a start, and then Knuth, once you've gotten past the basics.

And I'm not bashing linked lists. It's just that I can't let such irresponsible statements about linked lists stand.
Thanks, Si
Funny thing, I too was just about to quote from Cormen, Leiserson, and Rivest.

Mithrandir:
I don't recall ever calling you ignorant; I apologize if I did. But as SiCrane pointed out, many of the statements you made regarding lists were ignorant. (Ignorance is not something to be ashamed of, as EVERYONE is ignorant, just about different topics. I know next to nothing about Win32 programming...)

I'm also not out to eradicate use of lists, just discover why so many people here seem to like them so much more than arrays, which is exactly the opposite of conventional wisdom.

Finally, to touch on one more point, resizing a vector is not really that slow. Yes, you need to allocate new memory, copy everything over, and delete the old memory. But who cares? If you're using a data structure that hovers around a given size most of the time, it won't matter: it will grow until it's big enough, then stay there. If you continue to add things to it throughout the program, it's amortized O(1) to insert at the back, which means you really won't notice. If you do notice, just do what I do: throw a vector.reserve() at the beginning, giving yourself enough space for later.
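The reserve() point is easy to see by watching capacity(). A sketch (the counting helper is made up for illustration):

```cpp
#include <cstddef>
#include <vector>

// Count how many reallocations n push_backs trigger. With reserve()
// called up front, the answer should be zero; without it, the vector
// reallocates a logarithmic number of times as it grows geometrically.
std::size_t count_reallocs(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);
    std::size_t reallocs = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {    // capacity changed => reallocated
            ++reallocs;
            cap = v.capacity();
        }
    }
    return reallocs;
}
```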

-Brian
There is really no need for a holy war on this, except for the people who think that one or the other is the perfect solution for any job. The fact is that they both have their strengths and weaknesses. I would never use a linked list for a list of vertices, since you are unlikely to ever need to do insertion or deletion from within the list, and speed of access is of utmost importance. But if I were writing a parser that will generate lists of unknown length with large blocks of data, I would never think of using an array.

osmanb, resizing an array requires you to have enough memory to do such a thing. What if your array is really big? Guess you haven't gotten to that chapter in your textbook yet.
Mike Weldon, a.k.a. FalloutBoy (mweldon@san.rr.com)
Ignoring your dig about me and my textbook, I have to wonder... how often do you allocate arrays that are big enough to exhaust virtual memory? And if you've got a data set that big, you've got problems far beyond deciding what container to store it in.

Lastly, food for thought:
Assume that the objects being stored in [insert favorite container here] are in fact pointers to objects. Maybe you're one of those OO people, and you've got base class pointers. OK, two options:

1) Use a vector<BaseClass*>
The class stores three pointers internally, plus a MAXIMUM of twice the amount of space you currently need. So you could potentially be wasting N+3 pointers worth of memory.

2) Use a list<BaseClass*>
The class itself stores begin and end pointers, plus N nodes, each containing the actual data and two more pointers. That gives you a total waste of 2N+2 pointers, no matter how full the list is. Sure, it might be more wasteful, but at least the waste per element is constant! (:

3) OK, so I cheated and used a doubly-linked list. I'm sure someone here can come up with 100 reasons that singly-linked lists are the greatest data structure ever, so we do the analysis for those too. We only need one pointer (begin) plus N nodes with the data and next pointers. We're now down to wasting N+1 pointers worth of memory. Woo woo! We're saving a whole two pointers over the vector, in all of those very common cases where we've JUST doubled the size of the vector and haven't pushed more than two things on! I'm sold.

-Brian
osmanb: I'm not sure what you're saying about base class pointers. When you declare a pointer like so:

CBaseClass* pMyObject = new CDerivedClass;

That's just one pointer. If you add that pointer to the list, it's just adding (typically) 4 bytes to the list size, no matter if it has 20 derived classes. It does add to the class's memory usage, but not the list's memory usage. You're not "wasting" pointers every time you create a pointer to a class object. When you do this (or something like it):

pList->AddItem( pMyObject );

The AddItem list function just creates a temporary pointer (like a temporary int, long, char, etc.) to point to that object. It's not storing the actual pointer, which goes out of scope if it was declared locally. It merely created that temporary pointer to use on its own, independent of anything else. The only similarity between the temporary pointer and the pointer you gave it is their value (the address being pointed to). In other words, pointers are just a special type of variable, with their own operators. Remember, C++ functions pass by value (by default).

Using a pointer to a derived class doesn't automatically bring in all of the pointers for the rest of the objects that make up the class. The vtable found in the object does all of the work when handling inheritance and virtual functions -- a pointer to the class just calls whatever functions are in the class through the vtable.
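The point that the container's per-slot cost is one pointer, whatever the pointee looks like, can be sketched like this (class names here are made up):

```cpp
#include <cstddef>

struct Base { virtual ~Base() {} };
struct Derived : Base { int payload[20]; };  // a much larger derived type

// Each slot in a vector<Base*> costs one pointer; the derived
// object's extra size lives with the object itself, not the container.
std::size_t per_slot_cost() { return sizeof(Base*); }

static_assert(sizeof(Base*) == sizeof(Derived*),
              "pointer size does not depend on the pointee");
```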

So, what are you saying?


- null_pointer


Edited by - null_pointer on 2/27/00 7:17:55 AM

This topic is closed to new replies.
