
Realloc for C++

Started by February 23, 2000 03:24 PM
14 comments, last by NightShade 24 years, 8 months ago
This is getting off topic, but I have to wonder why people prefer linked lists for dynamic storage. The rule of thumb I follow (and also, for example, Alex Stepanov follows) is: "Unless you have some earth-shattering reason to do otherwise, use a vector<>". Of course, if you don't use STL, that's one thing, but since I started using STL, I haven't written a container class from scratch once. (It's been about 4 years now.)

Fundamentally, linked lists incur much more overhead than dynamic arrays, and for most (not all) applications, vector puts list to shame in terms of performance.

null_pointer? shmaLbus? Any input? I'm just curious.

-Brian

PS: Actually, Stepanov is pretty damn funny in person. I imagined he'd be very "out there" as far as abstractions and bizarre data structures, but he's absolutely focused on low-level performance and optimization. Hence, his undying love of arrays, and hatred of linked lists.
Arrays are only faster than linked lists when you know where the data you want to access is.

If you just have a batch command to execute on an entire list, linked lists are not much slower, and if programmed well, are just as fast as arrays.

If you need to search through a list, the fastest way would be to use a Hash table (which is an array with linked lists) or a binary tree (which is a special linked list)

If you need to sort through a list, the fastest way would be to use a radix sort (best accomplished with linked lists) or a Quick sort (best accomplished with linked lists).

if you need to insert data into the middle of a list, or delete data from the middle of a list, linked lists are much faster than arrays.

If you need to resize a linked list, no problem at all, since that is their nature.

Linked lists have many uses, so do not discount them.


For rapid creation and deletion of objects, however (say, bullets in a game?), linked lists are not ideal. The memory management is slow.

For a list of monsters in a game though, why wouldn't you use a linked list?

Save memory, have no limits, and be flexible.

===============================================
I saw a man pursuing the horizon;
Round and round they sped. I was disturbed at this; I accosted the man.
"It is futile," I said,
"You can never --"

"You lie," he cried,
And ran on.
This is my signature. There are many like it, but this one is mine. My signature is my best friend. It is my life. I must master it as I must master my life. My signature, without me, is useless. Without my signature, I am useless.
The only time I use an array for dynamic storage is when I am being lazy and speed of implementation is more important than efficient memory usage. I deal mostly with really large sets of data, though, so I have grown quite accustomed to using linked lists. I even have some code that I reuse quite often that maintains a sorted doubly-linked list and abstracts it to a convenient set of interface functions. No template crap necessary.

Mike Weldon, a.k.a. FalloutBoymweldon@san.rr.com
This one is dedicated to SteveC's post
Quote:
"I would highly recommend against using memcpy on classes.

This is a shallow copy and may not have the desired results.

In the case of when you are resizing an array most of the time that will be OK, however, if you have any self-referencing member pointers then you will get bad results

e.g.

class PointsToSelf
{
private:

long iVal;
long iOtherVal;
long iAnotherVal;

long *iCurTargetVal;

};

and then you set

iCurTargetVal = &iOtherVal


The problem is that when you memcpy the data to the new location the iCurTargetVal will be pointing to the same pointer value that it was before, which is likely to be off in the middle of nowhere.

Now, granted this example is a bit contrived and most folks don't use self-referencing pointers.


Personally, I would go with std::vector.

It has performance characteristics ( I think ) that are similar to an array ( constant time to access an element regardless of array size ) and has the ability to grow.
Although this process is known to be slow. ( But you can seed it with an initial size so that it doesn't have to allocate more memory until it goes past that boundary )"
(sorry, don't know the tag for quotes)

The problem you pointed out is that the object the pointer is pointing to has changed location (it could also be a pointer outside the array pointing to one of the array's members, for instance). It won't go away with the use of std::vector unless you create a copy constructor (or operator=?) that specifically updates that pointer upon copy. And if that pointer is pointing to another object in the array, you'll have a hard time coding such a function (calculating the difference between the this pointer and the old object and adding it? I'm not sure that would work).
What I'm trying to get at is: if you move an object in memory, you'll always have to update any pointer to that object, or face the consequences. And this is true whether you're using memcpy() or not. (Or at least that's my reckoning.)
OK, Mithrandir, I'm sorry but I have to disagree with much of what you've said:

I'm going to start this post with a lemma which I will (for the sake of argument) call osmanb's lemma:

Your computer's cache is designed to leverage the principle of locality. Arrays (hence vectors) fully leverage that. Linked lists destroy any performance gain your cache might have given you by wrecking locality. Your data are stored all over memory, preventing prefetch from having any useful consequences.

> Arrays are only faster than linked lists when you know
> where the data you want to access is.

I don't think so. But this is just a blanket statement, so I'll wait to pick apart your sub-arguments below.

> If you just have a batch command to execute on an entire
> list, linked lists are not much slower, and if programmed
> well, are just as fast as arrays.

They can be CLOSE to being as fast as an array for iterating over the container -- the same Big-O (more importantly, the same theta). However, any such iterating loop requires that you increment your iterator. For a vector/array, that requires incrementing a pointer == one add instruction (since the compiler knows the type of the pointer and can statically determine the number of bytes to add to the pointer at compile time.) For a list, that requires a member access to your node structure [cur = cur->next, which is really cur = (*cur).next]. (Step ##) For the sake of argument, assume you've read something from *cur, so (*cur).next has been loaded into cache on the read miss. You still have to dereference cur, then do an addition to get the location of cur->next. We've already surpassed the work of an array. But... we're going to get hosed by osmanb's lemma here, as the next element probably won't be in cache when we go to use it.

Things get worse. When I said (in step ##) that we assume that something in *cur has been READ, I really meant it. What if you're assigning to something in *cur? (During initialization of each element, for example.) If your cache doesn't do a read on a write-miss (very possible) then you're going to have to go to main memory just to do the increment statement [cur = cur->next], which will be an order of magnitude slower than [cur++], the array case. Ouch. Moving on...

> If you need to search through a list, the fastest way
> would be to use a Hash table (which is an array with
> linked lists) or a binary tree (which is a special linked
> list)

Huh? First of all, hash tables shouldn't enter into this discussion. A hash table is neither an array nor a linked list. It's a container all its own, with its own properties, which can be achieved using an array of linked lists, but also using an array alone, with probing. Regardless, how does this help your argument? Hash tables are explicitly designed (when using separate chaining) to keep the linked lists SHORT to overcome the fact that they require linear time to search. The true power of a hash table derives from the constant time random access of the array. (See the bottom of my post for a summary, including this point.)

Second, a binary tree is NOT a special linked list. It's a binary tree. That's like saying that a scalene triangle is a special circle. Again, a binary tree (actually, you're talking about a binary search tree, which is a special case of a binary tree) has special properties which do not derive from any distant relationship to a linked list, but from the ordering of elements within the structure. All of which can be obtained with an array using binary search. And algorithmically both may be the same speed, but the array will be FASTER. (For all of the reasons explained up until now.) Furthermore, the space will again be of the same order, but the array will be smaller. Of course, if you need to keep adding and removing elements, then you probably want a balanced binary search tree or hash table, but none of these is a linked list, so I still don't see your point.

> If you need to sort through a list, the fastest way would
> be to use a radix sort (best accomplished with linked
> lists) or a Quick sort (best accomplished with linked
> lists).

OK. Now you've moved from naive ignorance to flat out lies. I don't want to get into a holy war over containers, but claiming that the fastest way to sort things is to have them stored in a linked list (and worse, to then use quick sort) is ludicrous.

I'm not going to touch on radix sort (because it's not generally applicable.) If you happen to be able to use radix sort for your application, that's great, but I get the impression that people are talking about storing complex user defined types, where radix sort usually won't work.

Where do I begin? First of all, using quick sort on linked lists, ummm, how do I put this... it DOESN'T WORK. You need random access to elements to perform the algorithm. That's why STL has a separate member sort() for lists. The correct way to sort a list is with merge sort, which is still O(n lg n), but is nowhere near as fast as quick sort on an array (which DOES work). [An aside: quick sort isn't really the best general purpose sorting algorithm either. Try IntroSort (introspection sort) instead. It's based on quick sort, but detects the worst case (n^2) behavior of quick sort as it's happening, and dynamically switches to heap sort to maintain (n log n) running time. It's part of SGI's (and hopefully everyone else's) STL, and was invented by my thesis advisor, Dave Musser.]

> if you need to insert data into the middle of a list, or
> delete data from the middle of a list, linked lists are
> much faster than arrays.

Yes. I must grant you these victories, as linked lists require constant time for insertion at any known point, while arrays require linear time for insertion anywhere but the end.

> If you need to resize a linked list, no problem at all,
> since that is their nature.

And if you need to resize an array, no problem, since that's what vector<> is for. Sure, writing an array class that expands might suck more than writing a linked list, but that's why we have libraries. Let someone else do it. Then pick the best component from that library.

> Linked lists have many uses, so do not discount them.

I wouldn't dream of it... But I also think people on this board have some really misguided and misinformed views of linked lists vs. vectors.

=== Important Summary ===

Ultimately, you need to look at what you're trying to do, and decide what operations are important to you. If you really NEED certain runtime guarantees on certain operations, find the container that provides them and use it. Often, though, none of those are important (you're just going to iterate over it repeatedly), in which case I strongly suggest you use vector. Why? It's the simplest and fastest container for doing so. It's important to think of containers not in terms of implementation, or some other contorted view. Instead, view every container as a set of bounds on the runtime for each elementary operation:

Vector: O(1) random access, amortized O(1) insertion/deletion at end, O(N) insertion/deletion in middle.

List: O(1) insertion/deletion at any known point, O(N) access for arbitrary element [often a prerequisite for insertion or deletion in middle]

Balanced binary search tree: O(log n) insertion/deletion/location of any element.

Hash Table: amortized O(1) insertion/deletion/location of any element. [Seems ideal, but lacks an ordering of elements, which is the primary drawback.]

---

-Brian Osman

Edited by - osmanb on 2/24/00 12:18:19 PM
to alexmoura

You're absolutely right. The copy constructor has to exist and be correct. ( You can't get away with using the default copy constructor in that case. )

I guess I was assuming such a copy constructor existed, sorry for not putting it in the class declaration to make it clearer

The important thing to remember is that even if the copy constructor existed and was correct, it would not be invoked by using memcpy, but if you use STL then it will invoke the copy constructor.

Anyway, sorry about leaving out that important detail


Edited by - SteveC on 2/27/00 11:29:46 AM

This topic is closed to new replies.
