Advertisement

Container Holy Wars (from realloc thread)

Started by February 24, 2000 01:49 PM
22 comments, last by Stoffel 24 years, 8 months ago
I was saying that one of the most common idioms is to store pointers to the base class of a hierarchy. I know that you''re still only storing one copy of each pointer you put into the container. The waste I was talking about was totally external to that, and imposed by the data structure itself. Remember, every node in a linked list looks like:

template <class T>
struct ListNode
{
ListNode<T> *prev, *next;
T *data;
};

For this example, the waste is prev and next for every element you store. In a vector, there is no direct overhead like that, but the vector doubles its allocated memory every time it runs out of space:

[0,1,2,3,4,5,-,-,-,-,-]

If we previously had 5 slots in the vector, and pushed on ''5'', we''d get the above scenario. And the vector would now be ''wasting'' 5 more slots. (The other three are the private data members of vector, which point to the beginning of storage, the end of allocated storage, and the end of used storage.)

The only reasons I chose the base class pointer thing for this example were:
a) Storing pointers for a hierarchy is a common and useful practice.
b) Storiing pointers (of any kind) makes the math of demonstrating waste within a container much easier, since everything is the same size.

All of that said, I wasn''t totally fair, because as the size of your stored data grows, (actual copies of big structures?) the linked list is still only wasting two pointers worth of memory for every item, while the vector is wasting up to double the size of all objects in the vector.

Hope this clears up my example a little.

-Brian
All data structures have their uses, it's pretty useless to use a linked list ( simple, double, circular ) when you will not insert data in the container, or when the data amount is low.

The advantage of linked list is in it's insert algorithm, the O of the (double) linked list algorithm is a constant, the O of the vector's insert algorithm is linear. The same is valid for deleting a item from the list.

So when, I think, should be used linked lists .:

* When you will have to insert or (and) delete a large amount of data from the list.
* When the amount of data in the container must be dynamic ( the overhead of resizing an array can be huge, if there's a large amount of data involved ... ).

When should be used arrays .:

* When I won't insert or delete a large amount of data from the container.
* When memory saving is a issue.
* When the amount of data in the container will not change.

Double Linked Lists are used in text editing, Quad Linked Lists are perfect for spreadshets, Circular Lists can be used for several things, just use your imagination .

Edited by - Strauss on 2/28/00 11:10:15 AM
Advertisement
I''m sorry Strauss, but I can''t totally agree. Some of what you said is true, but if all insertion or deletion is going to be at the end of the container, I have no problem using an array. Before one more person makes some ridiculous, unfounded, ignorant babbling statement about the inefficiency of increasing the size of an array, I want you ALL to build and run the following program:

int main()
{
unsigned long i;
{
cerr << "Starting list" << endl;
list<double> l;
for (i = 0; i < 1000000; ++i)
l.push_back(double(i));
cerr << "Done with list. Cleaning up" << endl;
}
{
cerr << "Starting vector" << endl;
vector<double> v;
for (i = 0; i < 1000000; ++i)
v.push_back(double(i));
cerr << "Done with vector. Cleaning up" << endl;
}
cerr << "All done" << endl;
return 0;
}

Try it. It''s fun.
-Brian
Some minor points:
- In my original post, when I rated the efficiency of each container, it was done experimentally and ranked according to speed using each to implement a circular buffer or byte-streaming buffer (something we do constantly here). Admittedly, I didn''t look at memory allocation at all because our targets are speed-limited, not memory-limited. Under this experiment, vector was fastest, list the slowest, and deque in the middle.

- About the vector doubling: you can avoid this with the reserve command if you know how much data you expect to put in the vector. Vector is the only STL container that has this pre-allocation feature.

- STL''s std::list is a double-linked list, because it gives you a bi-directional iterator (not a forward iterator).
Each of these have their strengths and weaknesses. The only real comment I have is to say that Mith was correct in thinking that a linked list is a binary tree or better yet an N-Tree. The algorithms used for insertion, deletion and traversal of an N-Tree work perfectly in a linked list. Regardless of what Knuth says, there is a relationship between linked lists, binary trees and N-trees. The latter is the most complex insertion, deletion and traversing routine. As you eliminate the number of branches to two and then one, you can begin to make assumptions that enable you to decrease the complexity of the N-tree algorithms. This is all I think he was trying to get across there.

If a Linked list is not a binary tree then please explain how an insertion of sorted data into a binary tree is not a linked list. In this case all of the left or right side pointers in each node are null. Anyone who has taken math courses in discrete structures knows that trees, rings, and lists are all fairly related in theory. Why shouldn''t this apply to data structures.

Kressilac
Derek Licciardi (Kressilac)Elysian Productions Inc.
osmanb,

"How often do you allocate arrays that are big enough to exhaust virtual memory? And if you''ve got a data set that big, you''ve got problems far beyond deciding what container to store it in."

This screams ignorance. That''s the reason for the dig about the textbook. You talk like you''ve never done anything but Windows programming and you are basing all your assumptions on that. Most systems don''t have unlimited memory. The answer to your question here is "quite often".
Mike Weldon, a.k.a. FalloutBoymweldon@san.rr.com
Advertisement
What Mithrandir stated was that a BST was a linked list, not that a linked list was a BST (or even an N-tree). I''ll agree with a linked list being an N-tree, but not a linked list being a BST or a BST being a linked list.

Yes there is a relationship, but the definition of linked list is very rigid in scope. A linked list is a specialization of N-tree, where N=1. A BST is a specialization of an N-tree where N=2. I can say 1 is an integer and I can say 2 is an integer, but I can''t say 1 is 2.

Even if you argue that a linked list is a degenerate binary tree extending either completely on the left or the right, you still cannot say a binary tree is a linked list, because the set of linked lists is contained within the set of binary trees. Furthermore a BST is a different specialization of binary tree necessarily disjoint with the linked list specialization of binary tree. If they were non-disjoint an out of order insert on a sorted linked list would result in a split in the linked list, with that split, the linked list would no longer be a linked list. Because data structures are determined by algorithmic properties, not particular states, a BST therefore cannot be a linked list.
FalloutBoy,
Actually, I almost NEVER do Windows programming. Most of my programming has been in UNIX-like environments, or doing pseudo-embedded programming for a manufacturer of Fibre Channel RAID arrays. I know very well that many systems don''t have unlimited memory. My point is that even if you''re developing on a system with extremely limited memory, and you''ve got to store enough data that you''re going to exhaust all available resources, you should probably rethink your design from square one. Picking an array versus a linked list isn''t going to solve the deep-seeded design issues you obviously introduced to your project at the start.

And I will now restate my position, again... If I know that I will be doing many insertions and deletions in the MIDDLE of a container, and have absolutely no need for random access, and preventing iterator invalidation is indescribably crucial, I will pick a list. But 99% of the time, without even thinking, I will automatically use a vector when I need a container. And I will do so confident that my decision is backed up not only by conventional wisdom, but by loads of empirical evidence.

-Brian
Hey folks, just going to chime in with my two cents:

1) What type of container to use depends completely on the problem at hand. There is no "best" container. Sometimes a list is better, sometimes a vector. The nice thing about STL is you can easily switch out the underlying implementation of the container to test different methods.
2) Don''t optimize code unless you know its a problem area from running under a profiler or your own timing code. Concentrate on the critical areas of performance.
2) Creative solutions beyond what STL provides can provide performance improvement - sometimes breaking out of what you think is true with vectors and lists can be a big help:

an example:

a while ago i was modifying an OpenInventor parser meant for loading large files (on average 35k polygons and up to 100k polygons). Due to the structure of the parser (it was done in YACC), we''d be reading in long strings of floating point numbers (i.e. tens of thousands), where we didn''t initially know how many numbers there would be until we were done. Now, YACC probably wasn''t the best choice here but i didn''t have the luxury of rewriting that code. But I noticed that nearly half of the time in the parser was spent allocating and re-allocating memory in the arrays that stored the numbers.

Rather than just use a linked list with the associated memory overhead, I wrote my own implementation of vector called bigvector that used the underlying virtual memory manager. It would reserve a large block of address space (nearly 16 meg) but would only commit 4K pages as the vector grew. So growing a vector caused no re-allocation or copying.

using this data structure increased the speed of the entire parser by a factor of two.

Remember: there''s more than one correct solution to a problem.



-vince


-vince


Thanks vince,
I agree with you. And that story is another reason I like the STL. If you do have very strict memory management requirements, remember that every container takes an extra template parameter, allowing YOU to supply the allocator.

-Brian

This topic is closed to new replies.

Advertisement