One of those days

posted in So much for Creativity

Published June 27, 2004

It's been one of those days. I've spent the last hour trying to track down a bug that didn't exist. I wrote the code *perfectly* the first time around, but I screwed up the test case in such a subtle manner that it seemed like the class invariants got borked.

In any case, the current item being folded into the library is a specialized string class, the sortable_string. Basically it's a string class that exists for one reason only: to live happily in a std::set or std::map. And by live happily I mean that the class is optimized for operator<() when used in a std::set/map with a large number of elements.

Basically, when optimizing access in a binary search tree, the most significant factor is memory access. (Which, incidently, is often the most significant factor in many optimization problems.) So, one step in the right direction is to minimize the size of the string class. Under MSVC 7.1 this doesn't take much, as the std::string implementation it ships with weighs in at 28 bytes. The smallest you can make a string class on current systems would be 4 bytes, the size of a pointer. However, this isn't too effective either, because then every comparison operation hits two cache lines for the string, one for the main body of the string, and one for the buffer that the string proper lives in. Actually four lines because there's two strings involved in every comparison.

So the implementation of sortable_string uses eight bytes. One part is a key, another part is a pointer to the string buffer. The key is basically the first four bytes of the string arranged in an integer. When doing comparison between two sortable_strings the keys are compared, and only if the keys are different then the strings in the buffers are compared. This means that in the common case only body of the string needs to be accessed, which reduces total memory access.

This technique can be generalized to other data types, but in general, I find that usually switching to a hash based container is more effective. It's also more effective with strings, but it's unfortunately common that I need to get a range of strings (like iterators that define a range for all strings from "apple" to "bird") hence why I'm putting the sortable_string class in my code library.

Previous Entry Ignore this entry

Next Entry The joy of horribly biased benchmarks

0 likes 2 comments

Comments

Kylotan

I'm surprised it's even 28 bytes large on MSVC 7.1. Does that include a local buffer for short strings?

June 27, 2004 08:20 AM

SiCrane

Yes, it's 16 bytes for the small string optimization local buffer (unioned with the pointer to an external string buffer if it's past the local buffer size), 4 bytes for the string size, 4 bytes for the reserved data size and 4 bytes to store the allocator by value as a member variable of a base class. From the code structure it seems like the last part is used for exception safety reasons.

June 27, 2004 05:03 PM

You must log in to join the conversation.

Don't have a GameDev.net account? Sign up!

SiCrane

Author

One of those days

Comments

SiCrane

Latest Entries

New bug to me

Week/Class 9

Week/Class 8

Week/Class 7

The promised files

Week/Class 6

Week/Session 5

Week/Session 4

On the soapbox

Week/Session 3

One of those days

Comments

SiCrane

Latest Entries

New bug to me

Week/Class 9

Week/Class 8

Week/Class 7

The promised files

Week/Class 6

Week/Session 5

Week/Session 4

On the soapbox

Week/Session 3

Reticulating splines