A Complete Listing of C++ String Types

Published May 07, 2008
Deadlines at work, moving apartments, and GTA IV. Where the heck am I supposed to find time for anything else?

Inspired by a recent IRC conversation, I decided to make a complete list of all of the major string types you're in danger of encountering when working in C++, along with a brief description.
  • char[] -- By far the most primitive string type, but fairly common. A fixed size array of (almost certainly ANSI) characters. Might be null terminated, might not. Kinda depends on what the programmer felt like at the time.

  • char*, unsigned char*, signed char* -- Your good ol' (usually) null terminated array of bytes. The bytes usually correspond one-to-one with characters, unless you're using a multibyte encoding such as UTF-8, where a single character can take up to four bytes.

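  • wchar_t* -- The most basic Unicode string type, and again usually null terminated. The encoding depends on the platform: on Windows wchar_t is 16 bits and holds UCS2 (UTF-16 in practice), while on Linux and Mac OS X it's 32 bits and holds UTF-32.

  • std::string -- Still an array of bytes, but it tracks its length inside the string object instead of relying on a terminator, so it can contain null characters (c_str() still hands you a null terminated pointer). std::string itself knows nothing about encodings; it's pretty much assumed that you're looking at ANSI here, but UTF-8 fits in it just as well.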

  • std::wstring -- Like the string class, but an array of wchar_t instead of char, so it inherits wchar_t's platform-dependent size: UCS2/UTF-16 on Windows, UTF-32 on most Unix-like systems.

  • std::basic_string -- This is the template that's used to form std::string and std::wstring.

  • boost::const_string -- An immutable string that provides a subset of std::basic_string functionality.

  • LPSTR -- This one's a Win32 typedef for char*.

  • LPWSTR -- And this one's a Win32 typedef for wchar_t*.

  • TCHAR* -- A null terminated array of wchar_t if UNICODE is defined, or char otherwise.

  • LPTSTR -- Win32 typedef for TCHAR*.

  • LPCSTR, LPCWSTR, LPCTSTR -- Const versions of the Win32 string types. Take out the C to figure out what they are.

  • BSTR -- This is a COM string. It's an array of OLECHAR, prefixed by a 4 byte length that counts bytes, not characters. The characters are UCS2 on Windows, ANSI on Mac. It can contain null characters, and it's also terminated by a null OLECHAR (two zero bytes on Windows). The BSTR pointer always points to the first character, not to the prefix, so the length sits in the 4 bytes just before it (ptr[-2] and ptr[-1] when ptr is an OLECHAR*), at least on Windows. Oh, and Visual Basic 6 uses these for all its strings.

  • CString -- ATL/MFC string class. C++ code that does anything serious with Windows inevitably ends up using these things. Character size is controlled by the UNICODE define again. Oh, and it comes in CStringA and CStringW variations if you want.

  • CStringT -- This is the template class used to form CString.

  • CSimpleStringT -- This is the base class for CStringT.

  • PXSTR, PYSTR -- PXSTR is the internal character pointer type used by CSimpleStringT. If PXSTR is Unicode, then PYSTR isn't, and vice versa. Both also come in const (C) variations: PCXSTR and PCYSTR.

  • CAtlString -- The flavor of CStringT that ATL defines for code that isn't using MFC; in that case CString is actually just a typedef for CAtlString, not the other way around. Also comes in A and W variations.

  • CComBSTR -- This is a class that wraps BSTR.

  • _bstr_t -- This is also a class that wraps BSTR.

  • VARIANT -- Here's a fun one. It's a COM variant type. When it holds a string, it's a VT_BSTR; VT_LPSTR and VT_LPWSTR exist too, but they're only valid in a PROPVARIANT, not a plain VARIANT. (And it doesn't have to be a string, of course.)

  • wxString -- The string class from wxWidgets. Can be Unicode or not, and can contain null characters.

  • GString -- The string class from GLib, used by GTK+ and GNOME. It stores the length of the string and can contain null characters. Sadly, the people working on and with these libraries still think the name is funny.

  • QString -- The string class from Qt. It's a null terminated array of 16-bit QChar values holding UTF-16 (characters above U+FFFF take two QChars), with the length stored as well.

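  • QCString -- Same as QString, but with ANSI single byte characters instead. This is a Qt 3 class; Qt 4 replaced it with QByteArray (with Q3CString kept in the Qt 3 support library).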

  • FString -- The string class from Unreal.

  • CStr -- The string class from 3D Studio Max.

  • MString -- The string class from Maya.

  • System::String^ -- If you're lucky enough to be working with C++/CLI, you get to use this .NET string, which is an immutable UTF-16 string. You heard me. UTF-16. Not UCS2. A single character can be more than one System::Char long.

  • System::Text::StringBuilder^ -- A mutable, growable buffer of System::Char. This is usually for doing a lot of string manipulation without ending up with tons of extra allocations.
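To make the "bytes aren't characters" point from the char* and System::String^ entries concrete, here's a small sketch in standard C++11 that counts code points in UTF-8 and UTF-16 data. The function names are made up for illustration, and both assume well-formed input. Also note that code points still aren't always what a user would call a character (combining accents, for one).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in UTF-8 by skipping continuation bytes (10xxxxxx).
// Assumes valid UTF-8; nothing is validated.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++count;  // lead byte or plain ASCII byte
    return count;
}

// Counts code points in UTF-16 by not counting low (trailing) surrogates,
// so a surrogate pair counts once. Assumes well-formed UTF-16.
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t c : s)
        if (c < 0xDC00 || c > 0xDFFF) ++count;
    return count;
}
```

So "h\xC3\xA9llo" is 6 bytes but 5 code points, and u"a\U0001D11E" is 3 char16_t units but only 2 code points, which is exactly the System::Char situation described above.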

Current list length: 30
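The std::string entry's point about the length living in the object rather than in a terminator is easy to demonstrate. This is a sketch assuming a C++11 compiler; the helper names are hypothetical, and the static_asserts double as a check on the std::basic_string entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// std::string and std::wstring are both instantiations of std::basic_string.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Builds a five-byte string with a null in the middle. The (pointer, count)
// constructor copies the '\0' instead of treating it as the end.
std::string make_string_with_embedded_null() {
    const char raw[] = {'a', 'b', '\0', 'c', 'd'};
    std::string s(raw, sizeof raw);
    assert(s.size() == sizeof raw);
    return s;
}

// The length the string object tracks itself.
std::size_t tracked_length(const std::string& s) { return s.size(); }

// The length a C API sees: strlen() stops at the first null byte.
std::size_t c_string_length(const std::string& s) { return std::strlen(s.c_str()); }
```

For that string, tracked_length() gives 5 while c_string_length() gives 2, so handing s.c_str() to a C API quietly truncates it at the first null.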

This list is still a work in progress, mind you. I'm sure there are plenty of string types from major libraries that I'm missing, and there's probably lots of detail that could be added to the ones I've listed so far.
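The BSTR layout is easier to picture with a portable model. This is not the Windows API: FakeOleChar, FakeBstr and the fake_* functions are hypothetical stand-ins that only mimic the layout described in the BSTR entry (a 4 byte byte count, then the characters, then a null character, with the pointer aimed at the first character).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins: a 2-byte character like Windows' OLECHAR,
// and a pointer to the first character like BSTR.
using FakeOleChar = char16_t;
using FakeBstr = FakeOleChar*;

// Allocates [4-byte byte count][characters][null character] and returns a
// pointer to the first character, which is where a BSTR points.
FakeBstr fake_alloc_bstr(const FakeOleChar* chars, std::uint32_t char_count) {
    assert(chars != nullptr || char_count == 0);
    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(char_count * sizeof(FakeOleChar));
    void* block = std::malloc(sizeof(std::uint32_t) + byte_len + sizeof(FakeOleChar));
    if (!block) return nullptr;
    std::memcpy(block, &byte_len, sizeof byte_len);  // length prefix, in bytes
    auto* text = reinterpret_cast<FakeOleChar*>(
        static_cast<char*>(block) + sizeof(std::uint32_t));
    if (byte_len) std::memcpy(text, chars, byte_len);  // may contain embedded nulls
    text[char_count] = 0;                              // one 2-byte null terminator
    return text;
}

// Reads the prefix stored in the 4 bytes just before the first character and
// converts bytes to characters. A null pointer counts as an empty string.
std::uint32_t fake_bstr_len(FakeBstr b) {
    if (!b) return 0;
    std::uint32_t byte_len;
    std::memcpy(&byte_len, reinterpret_cast<const char*>(b) - sizeof(std::uint32_t),
                sizeof byte_len);
    return byte_len / sizeof(FakeOleChar);
}

// Frees the whole block, which starts at the prefix, not at the characters.
void fake_free_bstr(FakeBstr b) {
    if (b) std::free(reinterpret_cast<char*>(b) - sizeof(std::uint32_t));
}
```

In real Windows code you'd use SysAllocStringLen, SysStringLen and SysFreeString (or CComBSTR/_bstr_t) rather than poking at the prefix yourself; the model just shows why the length survives embedded nulls.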

Comments

Dragon88
Have I completely missed the point here, or are you just pointing out the fact that C++ is hopelessly bloated?
May 07, 2008 08:22 PM
jpetrie
Well, if you're going to include VARIANT why not boost::any? Or void*? Irrlicht has core::stringw and core::stringc. Ogre has Ogre::String. DevIL has ILstring, but that may be a simple typedef...

This is what happens when you don't include a good first-class string type in your language I guess.
May 07, 2008 08:32 PM
Ravuya
Don't forget the various convolutions of Mac OS strings throughout the years, from the Macintosh Toolbox Pascal strings to the modern-day CFString/NSString (of which both Unicode and non-Unicode versions exist).

Note that the early Win16 API (and probably the Win32 API to this very day) supported Pascal strings as well.

I seem to remember another version of the strings that was shipped around Mac OS 8.5 to deal with that operating system's pre-CarbonLib hackjob of Unicode support. I think it was called String, but I'm not sure, and it may have been part of Core Foundation before it was called Core Foundation or even Carbon.
May 08, 2008 07:55 PM
Anon Mike
Dragon, this doesn't really have anything to do with C++ the language. At least IMHO it's more a commentary on legacy cruft and library developers constantly reinventing the wheel. You'll see these things in the wild but you don't necessarily have to use them.

std::basic_string is the native C++ type (counting the standard library as native). std::string and std::wstring are variants because there is more than one definition of "character". In this case "normal" and "wide". If you think "normal" should be good enough then I've got a billion Chinese people ready to argue with you. On the other hand if you think "wide" should be good enough then I've got a billion whining posts about wasted memory for you to peruse.

char[], char*, wchar_t*, and similar variants are legacy cruft from C that you don't need to use unless you're interfacing with older libraries. There are a lot of such libraries (including OS calls) so they come up a lot. Typically you shouldn't be using them in the guts of your own program.

LPSTR, LPCSTR, etc. are just different names for the old C types. Windows likes to do this for some reason.

BSTR is legacy cruft from Visual Basic that got co-opted into some parts of COM.

boost::const_string is a third-party optimization.

I think CString actually predates std::string. Legacy cruft.

CStringT, PXSTR, etc. These are internal details of CString (assuming Promit's definition is correct). You normally wouldn't have to know about these any more than you have to know what some helper class of std::map is called. Maybe they come up often in some circumstances, but I've never seen them.

CComBSTR, _bstr_t. Having a RAII wrapper for a POD type isn't a bad thing. Having TWO wrappers is.

wxString, MString, etc. These are library writers wanting to reinvent the wheel (unless like above they were created before the standard string type).

VARIANT is not a string type. The fact that it can hold a string doesn't make it a string type.

String^, StringBuilder^. These are from a language that is not C++. Granted, it's very similar to C++ in many ways, but it's not C++. Spurious.
May 09, 2008 05:12 PM
Dragon88
Quote: Original post by Anon Mike
Dragon, this doesn't really have anything to do with C++ the language. At least IMHO it's more a commentary on legacy cruft and library developers constantly reinventing the wheel. You'll see these things in the wild but you don't necessarily have to use them.


Well, yes. It seems like reinvention of the type wheel has been one of those things various people (of dubious merit) have been pushing for the sake of "portability" for a rather long time now (at least in C/C++).
May 12, 2008 04:27 PM
gharen2
This is a perfect example of why I despise working in c/c++ :)

So very bloated and unnecessarily complicated.

And yes I know this is more of an API issue than an issue with the language itself, but the fact remains you have to deal with this stuff any time you touch c++.
May 13, 2008 11:37 AM
MustEatYemen
The problem is you get onto platforms where you don't have std readily available and open up a whole new can of worms.
May 24, 2008 12:32 PM
Rydinare
Great list, but...

I read through the list and had nightmares about all the ridiculous bloat that Microsoft repeatedly added over the years.

While we're at it, my coworkers wrote their own string class too. With all the hard (pointless) work they put into it, it's not up to STL string quality. But reinventing the wheel is fun, right? I bet if we really searched for all the string classes out there, you'd probably find that 50% of projects have their own string classes. One day someone will have to show me why most C++ programmers must reinvent the wheel. Not even new wheels, just the same wheels over and over again.
December 03, 2008 10:07 AM