Many of the beginners on the site are pre-college students. Often beginners will learn by reading tutorials on the Internet, copying code from books, and trying out things that they find interesting.
This article is part of a series that focuses on giving pre-college developers the basics they need to understand data structures.
The previous article covered Non-Linear Structures.
This article is to help beginners understand how to choose a data structure or container class.
The Data Structures
The previous articles in the series listed the most frequently used data structures. Here is a recap.
There are the linear structures: the array, the dynamic array, and the linked list. They are linear because their items stay in the order you place them. Arrays are very fast for random access and have moderately good performance when adding and removing at the end. A linked list is very good when frequently adding and removing in the middle.
There are linear endpoint data structures: the stack and the queue family. Both work very much the same way as their real-world namesakes. With a stack, such as a stack of plates or a stack of data, you can "push" something onto the top, you can access the top item, and you can "pop" an item off. A queue, just like a queue of people or a queue of data, works by adding to the end of the line and removing from the front of the line.
Then there are the non-linear data structures: the data dictionary, the ordered set, and the unordered set. They are non-linear internally, so the order you put items in is basically unrelated to the order you will get them back out. The data dictionary works much like a real-life dictionary. It has a key (the word to look up) and a value (the definition of the word). An ordered set works like a data dictionary that holds keys but no values, kept in sorted order. An unordered set is just a grab-bag of objects. The name is a little misleading since its contents really are ordered, just not in a way that is useful to people. All of these structures are ideal for fast lookup.
The Impact of a Good Selection
Most of the time programmers just need to iterate over a collection. Generally we don't care what order the collection is in, just that we can start at the beginning and visit each item. In this very common situation, the choice of a data structure really doesn't matter.
When in doubt, the best choice is usually the dynamic array. It grows as needed, and it is fairly neutral, making it easy to swap out for a different data structure later.
But sometimes it matters very much.
One of the more common problems in games is pathfinding. You need to find a path from A to B. One of the most common pathfinding algorithms is A-star. In the A-star algorithm you have a data structure containing partial paths. The structure is sorted so that the most promising partial path is at the front of the container. That path is evaluated, and if it isn't the complete path the algorithm extends it into several longer partial paths and adds them to the container.
Using a dynamic array for this container would be a bad choice for several reasons. First, removing elements from the beginning of a dynamic array is one of the slowest operations we can perform. Second, re-sorting the dynamic array after every addition can also be slow.
If you remember the queue family from above, one of its members is optimized for this type of access. We always remove the best path from the front, and each path we add is placed automatically according to how good it is. The priority queue is ideal for an A-star path container. In many standard libraries it comes pre-built and fully debugged.
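As a rough illustration of that container, here is a minimal C++ sketch built on `std::priority_queue`. The `PartialPath` struct, its fields, the `CostlierThan` comparator, and the `take_best` helper are hypothetical names made up for this example; a real A-star implementation would store more than a cost and a node id.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical partial path: only the estimated cost matters for ordering.
struct PartialPath {
    int estimated_cost;  // lower means more promising
    int last_node;       // illustrative payload
};

// Tells the queue that a costlier path belongs further from the top.
struct CostlierThan {
    bool operator()(const PartialPath& a, const PartialPath& b) const {
        return a.estimated_cost > b.estimated_cost;
    }
};

using Frontier =
    std::priority_queue<PartialPath, std::vector<PartialPath>, CostlierThan>;

// Removes and returns the most promising partial path.
PartialPath take_best(Frontier& frontier) {
    assert(!frontier.empty());
    PartialPath best = frontier.top();
    frontier.pop();
    return best;
}
```

Adding a path with `push` and removing the best one with `take_best` both take logarithmic time, so nothing is ever re-sorted from scratch.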
Choosing Between the Patterns
Choosing your data structure mostly depends on your usage pattern.
The Dynamic Array -- The Default Choice
When in doubt, use a dynamic array. In C++ that is a vector. In Java that is an ArrayList. In C# that is a List.
The dynamic array generally does the right thing. It has good performance for most operations, and not bad performance for the rest. If you ever discover that you need a different data structure, it is the easiest one to move from.
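One way to keep that move easy is to write code that only iterates, so it does not care which container it is given. The sketch below is an illustration under that assumption; the `highest` function is a hypothetical helper, not something from a library.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Returns the largest value in any non-empty container of ints
// that supports begin() and range-based for.
template <typename Container>
int highest(const Container& scores) {
    assert(!scores.empty());
    int best = *scores.begin();
    for (int s : scores) {
        if (s > best) best = s;
    }
    return best;
}
```

The same call works whether `scores` is a `std::vector<int>` or a `std::list<int>`, so switching containers later does not touch this function.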
The Stack -- One End Only
If you are only adding and removing from a single end, use a stack. That is a stack in C++, and a Stack in both Java and C#.
There are many algorithms that rely on the stack data structure. The first one that comes to my mind is the two-stack calculator. Puzzles like Towers of Hanoi can be solved with a stack. You probably won't use either of those in a game.
Game tools will frequently parse data. Parsers rely heavily on stack data structures to ensure that pairs of items are paired correctly.
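To make the pairing idea concrete, here is a small C++ sketch that checks whether brackets in a piece of text are paired correctly. The function names `opener_for` and `pairs_match` are invented for this example, and a real parser would track far more than three bracket types.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Maps a closing bracket to the opening bracket it must pair with.
char opener_for(char closer) {
    assert(closer == ')' || closer == ']' || closer == '}');
    return closer == ')' ? '(' : closer == ']' ? '[' : '{';
}

// Returns true when every bracket in `text` is closed in the right order.
bool pairs_match(const std::string& text) {
    std::stack<char> unclosed;  // openers still waiting for a partner
    for (char c : text) {
        if (c == '(' || c == '[' || c == '{') {
            unclosed.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (unclosed.empty() || unclosed.top() != opener_for(c)) {
                return false;
            }
            unclosed.pop();
        }
    }
    return unclosed.empty();
}
```

The most recently opened bracket is always the next one that must close, which is exactly the "one end only" access a stack provides.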
If you are working with a wide range of AI types, the stack data structure is incredibly useful for a family of automata called pushdown automata.
The Queue Family -- First In, First Out
If you are only adding and removing at the ends, use either a queue or a double-ended queue. In C++ that is a queue or deque. In Java you can use the Queue or Deque interface; both are implemented by LinkedList. In C# there is a Queue class, but no built-in Deque.
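As one example of working at both ends, the C++ sketch below keeps a bounded history of recent messages in a `std::deque`. The `remember` function and its `limit` parameter are hypothetical, chosen only to illustrate the access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Keeps only the newest `limit` messages: new ones join the back,
// and the oldest ones leave from the front.
void remember(std::deque<std::string>& recent,
              const std::string& message,
              std::size_t limit) {
    assert(limit > 0);
    recent.push_back(message);
    while (recent.size() > limit) {
        recent.pop_front();
    }
}
```

Both `push_back` and `pop_front` are constant-time on a deque, whereas removing from the front of a dynamic array would shift every remaining element.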
If you need to make sure the important stuff gets done first but otherwise everything happens in order, then reach for the priority queue. In C++ that is a priority_queue, in Java it is a PriorityQueue. In C#, a PriorityQueue class was added in .NET 6; on older versions you are on your own.
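One caveat worth knowing: `std::priority_queue` does not promise that items of equal priority come out in arrival order. A common workaround is to store an arrival number alongside each item. The sketch below assumes that approach; `Task`, `RunsLater`, and `TaskQueue` are illustrative names, not library types.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct Task {
    int priority;         // higher runs first
    unsigned long order;  // arrival number, used to break ties
    std::string name;
};

// True when `a` should run after `b`.
struct RunsLater {
    bool operator()(const Task& a, const Task& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.order > b.order;  // among equals, earlier arrival wins
    }
};

class TaskQueue {
public:
    void add(int priority, const std::string& name) {
        tasks_.push(Task{priority, next_order_++, name});
    }
    // Removes and returns the name of the next task to run.
    std::string next() {
        assert(!tasks_.empty());
        std::string name = tasks_.top().name;
        tasks_.pop();
        return name;
    }
    bool empty() const { return tasks_.empty(); }

private:
    std::priority_queue<Task, std::vector<Task>, RunsLater> tasks_;
    unsigned long next_order_ = 0;
};
```

With this comparator, high-priority tasks jump the line, while tasks of the same priority still run first in, first out.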
Non-Linear Structures -- Fast Search
If you create a stable group of items and mostly perform random lookups, you will want one of the non-linear structures.
Some of them hold pairs of data, some of them hold individual data. Some are ordered in a useful manner, others are ordered in a computer-friendly manner. Trying to make a list of all the combinations would be an article in itself. In fact, it was the previous article. For a list of which one meets your specific searching needs, have a look back there.
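As a quick reminder of what those lookups look like in C++, here is a small sketch using `std::unordered_map` for fast lookup by key and `std::map` when sorted keys matter. The helper names `price_of` and `first_key`, and the idea of an item price table, are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Looks up an item's price, returning `fallback` when the item is unknown.
int price_of(const std::unordered_map<std::string, int>& prices,
             const std::string& item,
             int fallback) {
    auto it = prices.find(item);
    return it == prices.end() ? fallback : it->second;
}

// Returns the alphabetically first key; std::map keeps its keys sorted.
std::string first_key(const std::map<std::string, int>& sorted) {
    assert(!sorted.empty());
    return sorted.begin()->first;
}
```

Neither call scans the whole collection, which is the main reason to pick these structures over a dynamic array when lookups dominate.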
The Linked List -- Frequent Modifications with Order Preserved
If you are frequently modifying the middle of the container, and if you only need to traverse the list sequentially, use a linked list. In C++ it is called a list. In Java and C# it is called a LinkedList.
The linked list is a great container when data is always coming and going and must be kept in order, or when you need to periodically sort and move items around.
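Here is a short C++ sketch of both situations with `std::list`: inserting into the middle while keeping order, and moving an item around without copying it. The functions `insert_sorted` and `move_to_front` are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Walks the list sequentially and inserts `value` so the list stays sorted.
void insert_sorted(std::list<int>& items, int value) {
    auto pos = items.begin();
    while (pos != items.end() && *pos <= value) {
        ++pos;
    }
    items.insert(pos, value);  // constant-time once the position is known
}

// Moves the element at `pos` to the front by relinking nodes;
// no element is copied and `pos` stays valid.
void move_to_front(std::list<int>& items, std::list<int>::iterator pos) {
    assert(pos != items.end());
    items.splice(items.begin(), items, pos);
}
```

Note that finding the position still requires a sequential walk; the linked list only makes the insertion or move itself cheap.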
Conclusion
Choosing the right data structures can make a big difference in how algorithms will perform.
Understanding the major data structures, including their benefits and their drawbacks, can help guide you to using the most efficient structure for whatever you need.
I recommend that eventually you study them in depth. A full study of these data structures inside a Computer Science degree program will usually last several weeks.
Hopefully you have learned about the major data structures and when to choose one structure over another without the multi-week college level study.
This finishes off the series of articles. Thanks for reading.
Great!
I have been following the series, and while I am not a pre-college student (3 years graduated now), I enjoyed reading each article and was able to learn a few things. Thanks!