Container Holy Wars (from realloc thread)

Stoffel · 2000-02-29T15:02:11

There''s an STL container holy war going on in the realloc thread. I gave a presentation on STL a while ago (from a beginner''s point-of-view), and one of the things I presented was when to use each kind of container. I thought I''d throw it up here and see if there are any comments/criticisms: First, there are two groups: sequential and sorted. Sequential must be used when the order in which elements are added are important, regardless of their values. Sorted containers should be used when order is unimportant but searches by value through the container must be made often. The sequential containers are std::vector, std::deque, std::list, and the sorted containers are std::set, std::map, std::multiset and std::multimap. std::vector This is the fastest container, but should only be used when elements are to be added and removed from one end of the container. You are allowed to remove or insert elements in the middle, but it is an inefficient operation. std::deque (double-ended queue) Slower than vector, this container will let you add or remove elements from either end of a sequence. If insertions and deletions happen only at the beginning or end of the sequence, use this container. std::list Slowest of the sequential containers, the list allows constant insertion or deletion from any point in the list. It also allows one list to be inserted at any point in a second list in constant time. However, random access is not allowed, therefore searching for the Nth element takes linear time. std::map Map is implemented as a red-black tree. It allows storing of elements (map::value_type) that can be indexed by a separate key (map::key_type). Use maps if your container must be searched often and you want to reference elements by some arbitrary type or struct. Only one object can exist per key. std::multimap Same as map, only allows multiple objects with the same key. std::set Elements are stored in a red-black tree. Essentially, it is the same as a map, but there is no key. Finding an element in a set requires supplying the element for which to search. Only one instance of each element can exist. Use a set if it is simple to build the object for which you wish to search. std::multiset Same as set, but multiple objects of the same value are allowed. Well, hope that adds some fuel to the fire. =)

osmanb

2,082

February 27, 2000 11:15 AM

I was saying that one of the most common idioms is to store pointers to the base class of a hierarchy. I know that you''re still only storing one copy of each pointer you put into the container. The waste I was talking about was totally external to that, and imposed by the data structure itself. Remember, every node in a linked list looks like:

template <class T>
struct ListNode
{
ListNode<T> *prev, *next;
T *data;
};

For this example, the waste is prev and next for every element you store. In a vector, there is no direct overhead like that, but the vector doubles its allocated memory every time it runs out of space:

[0,1,2,3,4,5,-,-,-,-,-]

If we previously had 5 slots in the vector, and pushed on ''5'', we''d get the above scenario. And the vector would now be ''wasting'' 5 more slots. (The other three are the private data members of vector, which point to the beginning of storage, the end of allocated storage, and the end of used storage.)

The only reasons I chose the base class pointer thing for this example were:
a) Storing pointers for a hierarchy is a common and useful practice.
b) Storiing pointers (of any kind) makes the math of demonstrating waste within a container much easier, since everything is the same size.

All of that said, I wasn''t totally fair, because as the size of your stored data grows, (actual copies of big structures?) the linked list is still only wasting two pointers worth of memory for every item, while the vector is wasting up to double the size of all objects in the vector.

Hope this clears up my example a little.

-Brian

Strauss

122

February 27, 2000 07:43 PM

All data structures have their uses, it's pretty useless to use a linked list ( simple, double, circular ) when you will not insert data in the container, or when the data amount is low.

The advantage of linked list is in it's insert algorithm, the O of the (double) linked list algorithm is a constant, the O of the vector's insert algorithm is linear. The same is valid for deleting a item from the list.

So when, I think, should be used linked lists .:

* When you will have to insert or (and) delete a large amount of data from the list.
* When the amount of data in the container must be dynamic ( the overhead of resizing an array can be huge, if there's a large amount of data involved ... ).

When should be used arrays .:

* When I won't insert or delete a large amount of data from the container.
* When memory saving is a issue.
* When the amount of data in the container will not change.

Double Linked Lists are used in text editing, Quad Linked Lists are perfect for spreadshets, Circular Lists can be used for several things, just use your imagination

.

Edited by - Strauss on 2/28/00 11:10:15 AM

osmanb

2,082

February 28, 2000 11:15 AM

I''m sorry Strauss, but I can''t totally agree. Some of what you said is true, but if all insertion or deletion is going to be at the end of the container, I have no problem using an array. Before one more person makes some ridiculous, unfounded, ignorant babbling statement about the inefficiency of increasing the size of an array, I want you ALL to build and run the following program:

int main()
{
unsigned long i;
{
cerr << "Starting list" << endl;
list<double> l;
for (i = 0; i < 1000000; ++i)
l.push_back(double(i));
cerr << "Done with list. Cleaning up" << endl;
}
{
cerr << "Starting vector" << endl;
vector<double> v;
for (i = 0; i < 1000000; ++i)
v.push_back(double(i));
cerr << "Done with vector. Cleaning up" << endl;
}
cerr << "All done" << endl;
return 0;
}

Try it. It''s fun.
-Brian

Stoffel

Author

250

February 28, 2000 01:26 PM

Some minor points:
- In my original post, when I rated the efficiency of each container, it was done experimentally and ranked according to speed using each to implement a circular buffer or byte-streaming buffer (something we do constantly here). Admittedly, I didn''t look at memory allocation at all because our targets are speed-limited, not memory-limited. Under this experiment, vector was fastest, list the slowest, and deque in the middle.

- About the vector doubling: you can avoid this with the reserve command if you know how much data you expect to put in the vector. Vector is the only STL container that has this pre-allocation feature.

- STL''s std::list is a double-linked list, because it gives you a bi-directional iterator (not a forward iterator).

kressilac

110

February 28, 2000 02:38 PM

Each of these have their strengths and weaknesses. The only real comment I have is to say that Mith was correct in thinking that a linked list is a binary tree or better yet an N-Tree. The algorithms used for insertion, deletion and traversal of an N-Tree work perfectly in a linked list. Regardless of what Knuth says, there is a relationship between linked lists, binary trees and N-trees. The latter is the most complex insertion, deletion and traversing routine. As you eliminate the number of branches to two and then one, you can begin to make assumptions that enable you to decrease the complexity of the N-tree algorithms. This is all I think he was trying to get across there.

If a Linked list is not a binary tree then please explain how an insertion of sorted data into a binary tree is not a linked list. In this case all of the left or right side pointers in each node are null. Anyone who has taken math courses in discrete structures knows that trees, rings, and lists are all fairly related in theory. Why shouldn''t this apply to data structures.

Kressilac

Derek Licciardi (Kressilac)Elysian Productions Inc.

FalloutBoy

122

February 28, 2000 06:53 PM

osmanb,

"How often do you allocate arrays that are big enough to exhaust virtual memory? And if you''ve got a data set that big, you''ve got problems far beyond deciding what container to store it in."

This screams ignorance. That''s the reason for the dig about the textbook. You talk like you''ve never done anything but Windows programming and you are basing all your assumptions on that. Most systems don''t have unlimited memory. The answer to your question here is "quite often".

Mike Weldon, a.k.a. FalloutBoymweldon@san.rr.com

SiCrane

11,840

February 28, 2000 06:54 PM

What Mithrandir stated was that a BST was a linked list, not that a linked list was a BST (or even an N-tree). I''ll agree with a linked list being an N-tree, but not a linked list being a BST or a BST being a linked list.

Yes there is a relationship, but the definition of linked list is very rigid in scope. A linked list is a specialization of N-tree, where N=1. A BST is a specialization of an N-tree where N=2. I can say 1 is an integer and I can say 2 is an integer, but I can''t say 1 is 2.

Even if you argue that a linked list is a degenerate binary tree extending either completely on the left or the right, you still cannot say a binary tree is a linked list, because the set of linked lists is contained within the set of binary trees. Furthermore a BST is a different specialization of binary tree necessarily disjoint with the linked list specialization of binary tree. If they were non-disjoint an out of order insert on a sorted linked list would result in a split in the linked list, with that split, the linked list would no longer be a linked list. Because data structures are determined by algorithmic properties, not particular states, a BST therefore cannot be a linked list.

osmanb

2,082

February 28, 2000 11:01 PM

FalloutBoy,
Actually, I almost NEVER do Windows programming. Most of my programming has been in UNIX-like environments, or doing pseudo-embedded programming for a manufacturer of Fibre Channel RAID arrays. I know very well that many systems don''t have unlimited memory. My point is that even if you''re developing on a system with extremely limited memory, and you''ve got to store enough data that you''re going to exhaust all available resources, you should probably rethink your design from square one. Picking an array versus a linked list isn''t going to solve the deep-seeded design issues you obviously introduced to your project at the start.

And I will now restate my position, again... If I know that I will be doing many insertions and deletions in the MIDDLE of a container, and have absolutely no need for random access, and preventing iterator invalidation is indescribably crucial, I will pick a list. But 99% of the time, without even thinking, I will automatically use a vector when I need a container. And I will do so confident that my decision is backed up not only by conventional wisdom, but by loads of empirical evidence.

-Brian

vince

122

February 28, 2000 11:34 PM

Hey folks, just going to chime in with my two cents:

1) What type of container to use depends completely on the problem at hand. There is no "best" container. Sometimes a list is better, sometimes a vector. The nice thing about STL is you can easily switch out the underlying implementation of the container to test different methods.
2) Don''t optimize code unless you know its a problem area from running under a profiler or your own timing code. Concentrate on the critical areas of performance.
2) Creative solutions beyond what STL provides can provide performance improvement - sometimes breaking out of what you think is true with vectors and lists can be a big help:

an example:

a while ago i was modifying an OpenInventor parser meant for loading large files (on average 35k polygons and up to 100k polygons). Due to the structure of the parser (it was done in YACC), we''d be reading in long strings of floating point numbers (i.e. tens of thousands), where we didn''t initially know how many numbers there would be until we were done. Now, YACC probably wasn''t the best choice here but i didn''t have the luxury of rewriting that code. But I noticed that nearly half of the time in the parser was spent allocating and re-allocating memory in the arrays that stored the numbers.

Rather than just use a linked list with the associated memory overhead, I wrote my own implementation of vector called bigvector that used the underlying virtual memory manager. It would reserve a large block of address space (nearly 16 meg) but would only commit 4K pages as the vector grew. So growing a vector caused no re-allocation or copying.

using this data structure increased the speed of the entire parser by a factor of two.

Remember: there''s more than one correct solution to a problem.

-vince

-vince

osmanb

2,082

February 29, 2000 08:01 AM

Thanks vince,
I agree with you. And that story is another reason I like the STL. If you do have very strict memory management requirements, remember that every container takes an extra template parameter, allowing YOU to supply the allocator.

-Brian

Container Holy Wars (from realloc thread)

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Container Holy Wars (from realloc thread)

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines