A Complete Listing of C++ String Types
Inspired by a recent IRC conversation, I decided to make a complete list of all of the major string types you're in danger of encountering when working in C++, along with a brief description.
- char[] -- By far the most primitive string type, but fairly common. A fixed size array of (almost certainly ANSI) characters. Might be null terminated, might not. Kinda depends on what the programmer felt like at the time.
- char*, unsigned char*, signed char* -- Your good ol' (usually) null terminated array of bytes. Bytes usually correspond 1<->1 with characters, unless you're using a multi-byte encoding such as UTF-8.
- wchar_t* -- The most basic Unicode string type. On Windows, wchar_t is 16 bits and very nearly always indicates UCS2 (or UTF-16); on most Unix-like platforms it's 32 bits and holds UTF-32. Again, null terminated.
- std::string -- Still an array of bytes, but it doesn't rely on null termination: the length is stored inside the string class (not in the string data itself), so embedded nulls are fine. c_str() still gives you a null terminated view for C APIs. The class itself knows nothing about encodings; it's pretty much assumed that you're looking at ANSI here.
- std::wstring -- Like the string class, but an array of wchar_t instead of char. The encoding follows wchar_t: UCS2/UTF-16 on Windows, UTF-32 on most other platforms.
- std::basic_string -- This is the template that's used to form std::string and std::wstring.
- boost::const_string -- An immutable string that provides a subset of std::basic_string functionality.
- LPSTR -- This one's a Win32 typedef for char*.
- LPWSTR -- And this one's a Win32 typedef for wchar_t*.
- TCHAR* -- A null terminated array of wchar_t if UNICODE is defined, or char otherwise.
- LPTSTR -- Win32 typedef for TCHAR*.
- LPCSTR, LPCWSTR, LPCTSTR -- Const versions of the Win32 string types. Take out the C to figure out what they are.
- BSTR -- This is a COM string. It's an array of OLECHAR, prefixed by a 4 byte length specifier that holds the length in bytes, not counting the terminator. The characters are UCS2 on Windows, ANSI on Mac. It can contain null characters. It's also terminated by a two byte null character. The BSTR pointer always points to the first character, so the length occupies the 4 bytes just before it, starting at ptr[-2], at least on Windows. Oh, and Visual Basic 6 uses these for all its strings.
- CString -- ATL/MFC string class. C++ code that does anything serious with Windows inevitably ends up using these things. Character size is controlled by the UNICODE define again. Oh, and it comes in CStringA and CStringW variations if you want.
- CStringT -- This is the template class used to form CString.
- CSimpleStringT -- This is the base class for CStringT.
- PXSTR, PYSTR -- PXSTR is the internal type used by CSimpleStringT. If PXSTR is Unicode, then PYSTR isn't, and vice versa. Also comes in const (C) variations.
- CAtlString -- Apparently this appears when you're using ATL but not the CRT or something. Also comes in A and W variations. It's actually just a typedef for CString.
- CComBSTR -- This is a class that wraps BSTR.
- _bstr_t -- This is also a class that wraps BSTR.
- VARIANT -- Here's a fun one. It's a COM variant type. When it holds a string, it's a BSTR (VT_BSTR); the VT_LPSTR and VT_LPWSTR tags belong to its cousin PROPVARIANT. (And it doesn't have to be a string, of course.)
- wxString -- The string class from wxWidgets. Can be Unicode or not, and can contain null characters.
- GString -- The string class from GLib, used by GTK+ and GNOME. It stores the length of the string and can contain null characters. Sadly, the people working on and with these libraries still think the name is funny.
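- QString -- The string class from Qt. It's an array of 16 bit QChar values with the length stored alongside, so it can contain null characters. Old versions treated the data as UCS2; Qt 4 and later treat it as UTF-16.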
- QCString -- Qt 3's 8 bit counterpart to QString: a null terminated array of ANSI single byte characters. Qt 4 replaced it with QByteArray.
- FString -- The string class from Unreal.
- CStr -- The string class from 3D Studio Max.
- MString -- The string class from Maya.
- System::String^ -- If you're lucky enough to be working with C++/CLI, you get to use this .NET string, which is an immutable UTF-16 string. You heard me. UTF-16. Not UCS2. A single character can be more than one System::Char long.
- System::Text::StringBuilder^ -- A mutable buffer of System::Char that grows as needed. This is usually for doing a lot of string manipulation without ending up with tons of extra allocations.
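The difference between null terminated strings (char*) and length-tracking strings (std::string) is easiest to see with an embedded null. Here's a minimal sketch in standard C++; the helper names c_length, from_bytes, and length_demo are made up for illustration and don't come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a C-style API sees it: scan forward until the first null byte.
std::size_t c_length(const char* s) {
    return std::strlen(s);
}

// Build a std::string from a pointer plus an explicit byte count, so any
// embedded nulls are kept as ordinary data.
std::string from_bytes(const char* data, std::size_t count) {
    return std::string(data, count);
}

// Hypothetical demo: the same five bytes measured both ways.
void length_demo() {
    const char raw[] = "ab\0cd";        // 5 data bytes plus the literal's terminator
    std::string s = from_bytes(raw, 5);
    assert(c_length(raw) == 2);         // the scan stops at the embedded null
    assert(s.size() == 5);              // the length comes from the class
    assert(c_length(s.c_str()) == 2);   // c_str() is still scanned like a C string
}
```

The same mismatch is what bites you when a BSTR, GString, or wxString with embedded nulls gets passed to an API that expects a plain null terminated pointer.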
This list is still a work in progress, mind you. I'm sure there are plenty of string types from major libraries that I'm missing, and there's probably lots of detail that can be added to the ones I've listed so far.
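As a footnote to the UTF-16 versus UCS2 point made for System::String^: in UTF-16, one character can take two 16 bit units (a surrogate pair), so the unit count and the character count can disagree. The sketch below uses std::u16string from standard C++11 as a stand-in for a UTF-16 string, which is an assumption for illustration only; count_code_points is a made-up helper, not a library function.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-16 sequence. A high surrogate
// (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF) is one
// code point; anything else, including a stray surrogate, counts as one.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < s.size()) {
            const char16_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                ++i;  // consume the low half of the pair
            }
        }
        ++n;
    }
    return n;
}
```

For example, the string u"a\U0001F600b" has a size() of 4 code units, but count_code_points reports 3. Code that assumes UCS2 would treat the two halves of the pair as separate characters.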