Introduction
Many of the beginners on the site are pre-college students. Often beginners will learn by reading tutorials on the Internet, copying code from books, and trying out things that they find interesting.
This article is part of a series that focuses on giving pre-college developers the basics they need to understand data structures.
This article is about the first three basic data structures: the array; the dynamic array; and the linked list.
The Array
When you need one object, you create one object. When you need several objects... well, you have several options.
I have seen many beginners with code like the following:
// High Scores
int score1 = 0;
int score2 = 0;
int score3 = 0;
int score4 = 0;
int score5 = 0;
This gives you five high score values. This method works just fine until you need fifty or a hundred or a thousand of something.
Rather than creating your objects individually you can use an array.
// High Scores
const int NUM_HIGH_SCORES = 5;
int highScore[NUM_HIGH_SCORES] = {0};
This will create a buffer of 5 items, like so:
Note that the array index starts at zero. If you have five items, then they range from zero to four.
Drawbacks of a Simple Array
If all you want is a fixed number of objects an array can work just fine.
Let's say you want to add an item to the array. You can't do that with a simple array.
Let's say you want to remove an item from the array. You can't do that with with a simple array.
You are stuck with the same size of items for the duration of the object.
What we need is an array that can change size. This brings us to...
The Dynamic Array
A dynamic array is an array that can change sizes.
Major languages support dynamic arrays in their standard libraries. In C++ it is a
vector. In Java it is an
ArrayList. In C# it is a
List. All of these are dynamic arrays.
Under the hood a dynamic array is a simple array, but it also has two extra pieces of data. It stores how big the simple array actually is, and it stores how much data the simple array can actually store.
A dynamic array might look something like this:
// Internals of a dynamic array class
sometype *internalArray;
unsigned int currentLength;
unsigned int maxCapacity;
The
internalArray member points to a dynamically allocated buffer. That actual size of the buffer is stored in
maxCapacity. The number of items actually used is represented with
currentLength.
Adding to a Dynamic Array
When you add an object to a dynamic array several things happen.
The array class verifies that there is enough space. If
currentLength < maxCapacity then there is room to add to the array. If there is not enough space then a larger internal array is allocated, and everything is copied over to th new internal array. The maxCapacity value is increased to the new larger size.
When there is enough space, the new item is added. Every element after the insertion point must be copied over one space in the internal array, and after the copies are made then the hole is filled with your new object, then the
currentLength value is incremented.
Since every object after the insertion point must be moved, the best case is when you add to the end; zero objects must be moved (unless the internal array must be expanded). A dynamic array works best when you only add to the end rather than adding to the middle.
When you add an object to a dynamic array it is possible for every object to move in memory. For languages like C and C++, adding to a dynamic array means ALL pointers to array objects become invalid.
Removing from a Dynamic Array
Removing objects is less work than adding objects.
First, the object itself is destroyed. Second, every object after that location is moved over one space. Finally, the currentLength is decreased.
Just like adding to the end of the array, removing from the end of the array is the best case because zero objects need to be moved.
Also note that we don't need to resize the internal array to make it smaller. The space can hang around in case we add objects later.
Removing an object to a dynamic array causes everything after the removed item to move in memory. For languages like C and C++, removing from a dynamic array means pointers to everything after the removed object become invalid.
Drawbacks of Dynamic Arrays
Let's say your array is very large, and you need to add and remove objects frequently. This can cause objects to be frequently copied to new locations, and it can cause many pointers to be invalidated.
If you need to make frequent changes in the middle of a dynamic array, there is a better type of linear data structure for you...
Linked Lists
An array is a continuous block of memory, each item after the other. A linked list is a chain of objects.
Again, the major languages have linked lists in their standard libraries. In C++ it is called a
list. In Java and C# it is called a
LinkedList.
A linked list is made of a series of nodes. Each node is basically made like this:
// Linked list node
sometype data;
Node* next;
That generates this type of structure:
Each node chains to the next.
Adding to a Linked List
When an object is added to a linked list it starts by creating a new node. Your data is copied inside the node.
Next, the insertion point is found. The new node's pointer to the next object is updated to point to the node that will follow it.
Finally, the node before the new node is modified to point to your new node.
Removing From a Linked List
When an object is removed from a linked list, the node before the removed node is found. It is altered to point to the removed object's next node. Then the removed object can be safely deleted.
Benefits of a Linked List
The biggest benefit of a linked list comes when adding and removing objects from the list. Making changes to the middle of the list is very quick. Remember that a dynamic array potentially caused every item to move, a linked list keeps every other object in place.
Drawbacks of a Linked List
Recall that a dynamic array is a continuous block of memory. If you need to get the 500th item in the array the code can simply look 500 spaces ahead. In a linked list the memory is chained together. If you need to find the 500th item, you must start at the beginning of the chain and follow its pointer to the next item. Then you need to follow the pointer in the next node. Then you need to follow the pointer in the next node. Repeat 500 times. Random access to a linked list is very slow.
The other major drawback of a linked list isn't quite as obvious. Each node requires a little extra space. How much space does it require? You might think it only requires the size of a pointer, but that isn't quite right.
There is a little bit of overhead every time an object is created dynamically. Some languages, like C++, work with pages of memory. Typically a page is 4 kilobytes. When you use the default new and delete operators a full page of memory will be allocated even if you will only use a single byte. Both Java and C# were designed differently and have special rules for small objects. These languages do not require a full 4kB memory page, but they do have a small amount of overhead.
You don't need to worry about the second drawback if you are using the standard libraries. They have been written in a way that minimizes the wasted space.
Conclusion
These three types, the array, dynamic array, and linked list, form the basis for nearly all the more complex data containers. When you get to college some of the first assignments in a data structures class will be to implement a dynamic array and a linked list class on your own.
These structures are fundamental to programming. It doesn't matter what language you learn, you will be using them to work with your data.
NEXT: Stacks and Queues
Updates
2013-03-29 - typos.
I think it would be helpful if you added a code snippet of how a Vector looks like as a Class. At least a barebones version.
Something like: