1. Introduction
For all of C++'s brilliance of strongly type-safe generic programming design, its standard library selection leaves much to be desired. Unlike Java, which has libraries for just about everything, C++ only has a handful. These C++ standard libraries are for containers, algorithms, streams and the like. Surprisingly, C++'s list of standard containers does not include a tree container [Musser1].
While hopes exist that C++0x may come with tree container support, C++'s current lack of native tree container support is enough to push programmers away from correctly designed systems using trees, in favor of poorly designed systems that use currently available standard containers. I created the core::tree<> container primarily to overcome this hurdle and add a missing piece to C++'s already powerful and elegantly designed generic containers.
After presenting the arguments made within this article to senior software engineers at Quark, Inc., they began adopting the core::tree<> family (tree, multitree, tree_pair, multitree_pair). Quark has since licensed the core::tree family (royalty free) and has been using it since 2002. They are currently using the core::tree family in their world class software, QuarkXPress 6.0 and expanding its use greatly for QuarkXPress 7.0 [Quark1].
This article is only part one of a series of articles written on the core::tree<> family. Part one primarily focuses on explaining the limitations of using C++'s map<> and multimap<> as tree containers and show the advantages of using the core::tree<> in its stead.
Additionally, simplistic sample core::tree<> code is given to increase basic familiarity with the core::tree<> design. Later installments of to the core::tree<> series will explain more complex usage of the core::tree<> and its flexibility in design.
Lastly, the core::tree<> source code is included for anyone to use, royalty free. The only request is that the licensing agreement found in the tree.h header is followed.
The layout of this article is as follows:
- Introduction
- std::map<> versus core::tree<>
- Using the core::tree<>
- Conclusion
- References
- core::tree<> source (tree.h header)
2. std::map<> versus core::tree<>
Many experienced C++ programmers use C++'s standard template library's (STL) map<> and multimap<> to generate trees. Yet, these containers make poor substitutes for true tree designs for a variety of reasons. This section will take a closer look at the pitfalls of using C++'s std::map<> and std::multimap<> for tree containment.
2.1 Basic std::map<> tree implementation
Before beginning the implementation of a tree using std::map<>, a framework must be defined in which the tree will work. For the purposes of this article the following namespace will be used to control the number of leaves generated at the root level and maximum branch depth of the trees constructed:
namespace nTreeData { const int kMaxLeaves = 10; const int kMaxDepth = 5; } Figure 2.1 Additionally, the following class defined (in figure 2.2) will be used as the leaf node, which will be inserted at every branch of the tree.
class Leaf { public: Leaf() : value_(0) {} explicit Leaf(const int &value) : value_(value) {} const int &value() const { return value_; } bool operator==(const Leaf &rhs) const { return this->value() == rhs.value(); } bool operator<(const Leaf &rhs) const { return this->value() < rhs.value(); } private: int value_; }; Figure 2.2 Combining the definitions provided in figure 2.1 and figure 2.2, an example of an std::map<> tree can be constructed:
#include #include typedef std::map LeafMapConcrete; typedef std::map* LeafMapPointer; typedef std::map LeafMap; void fun() { using namespace nTreeData; LeafMap leafTree; //////////////////////////////////////////////////// // create a simple leaf tree //////////////////////////////////////////////////// for (int i = 0; i < kMaxLeaves; ++i) { // insert a starter leaf LeafMapPointer p = new LeafMapConcrete; leafTree.insert(LeafMap::value_type(Leaf(i), p)); LeafMap::iterator iter = leafTree.find(Leaf(i)); // continue inserting children inside of children for (int depth = 0; depth < kMaxDepth; ++depth) { LeafMapPointer inner = new LeafMapConcrete; LeafMap* outer = (LeafMap*)(iter->second); outer->insert(LeafMap::value_type(Leaf(depth), inner)); iter = outer->find(Leaf(depth)); } } //////////////////////////////////////////////////// // deallocate the leaf tree //////////////////////////////////////////////////// for (LeafMap::iterator destroy = leafTree.begin(); destroy != leafTree.end(); ++destroy) { LeafMap::const_iterator inner = destroy; LeafMap* iterMap = (LeafMap*)(destroy->second); LeafMap* lastMap; for (inner = iterMap->begin(); inner != iterMap->end(); inner = iterMap->begin()) { lastMap = iterMap; // move the iterMap forward iterMap = (LeafMap*)inner->second; delete lastMap; } } } Figure 2.3 Figure 2.3 demonstrates how to use a std::map<> to implement a tree in C++. The above implementation is a common method for overcoming STL's lack of native tree container support. Unfortunately, it has many problems:
- Dynamic memory allocation and deallocation must be implemented for the tree by the programmer. Additional code must be added in figure 2.3 to do full depth and breadth tree iteration to allocate and deallocate each branch. Currently, figure 2.3 only provides simple deallocation (since we're assuming knowledge of the tree layout). Correct and complete memory deallocation requires more work. Additionally, it is very common for programmers to make mistakes when implementing complex memory management, which leads to memory leaks. One of the advantages of using STL containers is that they perform memory management internally [Josuttis1]. However, when using STL maps to construct trees in this manner, the memory management advantage of STL is lost.
- Many C-style / reinterpret_cast<> type-casts are required. Type-casting is dangerous (especially, C-style casting and reinterpret_cast<>). Accidental incorrect type-casting can cause unexpected run time behavior. As Stroustrup reminds us, we should avoid explicit casting [Stroustrup1].
- The code is complex. Simply doing the construction and destruction of the tree is rather hard to understand. If the code to generate and populate a simple tree is this difficult to write, how difficult will it be to write more complex trees? What about the difficulty of maintaining this tree code (especially by someone who didn't originally write the code)? Sutter and Alexandrescu point out in their "C++ Coding Standards", simple should always be preferred over complex [Sutter1]. Figure 2.3's tree implementation clearly violates this rule.
- There is a great deal of wasted space. For each branch of the tree generated, an extra integer is generated serving no purpose except as a filler. Unfortunately, there is no way to remove this extra filler. This is another clue that the design is flawed - unnecessary pieces are present in the implementation that do not aid the design.
#include "tree.h" #include void fun() { using namespace nTreeData; using namespace core; core::tree leafTree; //////////////////////////////////////////////////// // create a simple leaf tree //////////////////////////////////////////////////// for (int i = 0; i < kMaxLeaves; ++i) { // insert a starter leaf tree::iterator iter = leafTree.insert(Leaf(i)); // continue inserting children each time for (int depth = 0; depth < kMaxDepth; ++depth) { // insert and step into the newly created branch iter = iter.insert(Leaf(depth)); } } } Figure 2.4 The code within figure 2.4 implements the same tree as the std::map<> solution within figure 2.3. However, the tree constructed using the core::tree<> container (figure 2.4) is less complex than the tree constructed using std::map<>. Furthermore, the core::tree<> implementation requires less code than the std::map<> implementation. Lastly, the core::tree<> implementation has none of the pitfalls the std::map<> implementation has:
- No dynamic memory allocation or deallocation is needed, it is all handled internally.
- Not even a single type-cast is used.
- The code is very straightforward.
- There is no wasted space.
3. Using the core::tree<>
C++'s STL implementation is genius. The advantages and subtleties of its implementation are so numerous, it is impossible to cover them in a single article, or even in a single book. It was with STL in mind, that the core::tree<> was designed. The core::tree<> container follows as many of the STL design practices as possible, while still ensuring its own tree behavior remains correct.
The core::tree<> implements both const_iterators and iterators for tree iteration. By default, it uses operator<() and operator==() for inserts and finds/removes, but allows predicates to be used to overload that functionality. The core::tree<> implements begin() and end() on its containers as well as post-increment and pre-increment on its iterators. For moving in and out within the tree, methods in() and out() can be called - this makes tree depth iteration very simple. Many other powerful pieces of functionality are implemented as well, such as size(), level() and, of course, full tree copying (just like all STL containers) by use of operator=().
Perhaps the biggest downfall of the std::map<> tree implementation is its lack of simple "complete" tree copying. Calling operator=() on the std::map<> would result in pointer copying, not object copying and implementing a full tree copy would require even more dynamical memory allocation introducing more possibilities for erroneous programmer made memory management mistakes (more memory leaks).
Again, the above problems dissolve when using the core::tree<>. Copying a tree into another tree is as simple as:
core::tree tree1; // ... do work on tree1 here // now copy the entire contents of tree1 into tree2 core::tree tree2 = tree1; Thus, the core::tree<> container's learning curve is very low for anyone who is already familiar with C++'s STL containers.
The below sample code demonstrates the ease of use of the core::tree<> container. The code in figure 3.1 performs simple tree construction and tree output.
#include #include "tree.h" void fun() { // the tree containers all sit inside the core namespace using namespace core; using namespace nTreeData; tree leafTree; //////////////////////////////////////////////////// // create a simple leaf tree //////////////////////////////////////////////////// for (int i = 0; i < kMaxLeaves; ++i) { // insert a starter leaf tree::iterator iter = leafTree.insert(Leaf(i)); // continue inserting children each time for (int depth = 0; depth < kMaxDepth; ++depth) { // insert and step into the newly created branch iter = iter.insert(Leaf(depth)); } } //////////////////////////////////////////////////// // output the leaf tree //////////////////////////////////////////////////// for (tree::const_iterator iter = leafTree.begin(); iter != leafTree.end(); ++iter) { std::cout << iter.data().value() << std::endl; tree::const_iterator inner = iter; // tree's iterators are containers themselves - use the // same iterator to traverse inwards through the tree for (inner = inner.in(); inner != inner.end(); inner = inner.in()) { for (int tabs = 1; tabs < inner.level(); ++tabs) std::cout << "\t"; std::cout << (*inner).value() << std::endl; } } } Figure 3.1 For clarity, a step-by-step analysis of the core::tree<> code in figure 3.1 is given below.
- Construct the tree container: tree leafTree;
- Populate the tree with data. The outer for loop inserts branches into the root tree. The inner for loop inserts branches into each additional branch inserted. Thus, the outer for loop is a breadth population and the inner for loop is a depth population. Notice that the inner for loop assigns the inserted iterator to itself: iter = iter.insert(). This forces iter to continue stepping inward, becoming the iterator of the branch inserted by the iter.insert() operation. for (int i = 0; i < kMaxLeaves; ++i) { // insert a starter leaf tree::iterator iter = leafTree.insert(Leaf(i)); // continue inserting children each time for (int depth = 0; depth < kMaxDepth; ++depth) { // insert and step into the newly created branch iter = iter.insert(Leaf(depth)); } }
- Iterate through the tree. The outer for loop iterates through the branches of the root tree. The inner for loop iterates inward. Again, notice how inner = inner.in() is performed within the nested for loop. This forces the inner iterator to continue to step into itself until there are no further branches inward. for (tree::const_iterator iter = leafTree.begin(); iter != leafTree.end(); ++iter) { std::cout << iter.data().value() << std::endl; tree::const_iterator inner = iter; // tree's iterators are containers themselves - use // the same iterator to traverse inwards through the tree for (inner = inner.in(); inner != inner.end(); inner = inner.in()) { for (int tabs = 1; tabs < inner.level(); ++tabs) std::cout << "\t"; std::cout << (*inner).value() << std::endl; } }
- The output of the leafTree that's built above is as follows: 0 0 1 2 3 4 1 0 1 2 3 4 2 0 1 2 3 4 3 0 1 2 3 4 4 0 1 2 3 4 5 0 1 2 3 4 6 0 1 2 3 4 7 0 1 2 3 4 8 0 1 2 3 4 9 0 1 2 3 4
Due to the core::tree<> design following some of the STL container paradigms, certain STL algorithms can be followed. For example, the std::for_each algorithm can be used to iterate across any breadth of tree (keep in mind, tree depth iteration requires a bit more work). Figure 3.2 demonstrates a simple core::tree<> implementation using std::for_each.
#include #include #include "tree.h" //////////////////////////////////////////////////// void outputLeaf(const Leaf &l) { std::cout << l.value() << std::endl; } //////////////////////////////////////////////////// void fun() { core::tree leafTree; for (int i = 0; i < nTreeData::kMaxLeaves; ++i) leafTree.insert(Leaf(i)); std::for_each(leafTree.begin(), leafTree.end(), outputLeaf); } Figure 3.2 To perform complete tree output iteration (both breadth and depth) in a single function, a minor amount of recursion is generally suggested (although it can be performed iteratively). However, the code is still very straightforward and easy to write:
#include #include "tree.h" //////////////////////////////////////////////////// void outputLeaf(core::tree::const_iterator &tree) { // a tree iterator can check itself for its end for (core::tree::const_iterator i = tree.begin(); i != tree.end(); ++i) { for (int tabs = 1; tabs < i.level(); ++tabs) std::cout << "\t"; std::cout << (*i).value() << std::endl; outputLeaf(i); } } //////////////////////////////////////////////////// void fun() { // the tree containers all sit inside the core namespace using namespace core; using namespace nTreeData; core::tree leafTree; //for (int i = 0; i < nTreeData::kMaxLeaves; ++i) // leafTree.insert(Leaf(i)); //////////////////////////////////////////////////// // create a simple leaf tree //////////////////////////////////////////////////// for (int i = 0; i < kMaxLeaves; ++i) { // insert a starter leaf tree::iterator iter = leafTree.insert(Leaf(i)); // continue inserting children each time for (int depth = 0; depth < kMaxDepth; ++depth) { // insert a 100 placeholder leaf, then insert a leaf // and step into it iter.insert(Leaf(100)); iter = iter.insert(Leaf(depth)); } } outputLeaf(static_cast