So this question has crossed my mind from time to time: is the small-string optimization (SSO) in std::string in modern C++ really still a benefit, or is it now more of a hindrance? Consider the following example code:
    #include <memory>
    #include <string>
    #include <string_view>
    #include <utility> // std::move

    std::string staticString;
    std::string_view staticStringView;
    std::unique_ptr<char[]> staticStringPtr;

    // Sink taking an owning std::string.
    __attribute__((noinline)) void callString(std::string a)
    {
        staticString = std::move(a);
    }

    // Sink taking a non-owning view.
    __attribute__((noinline)) void callStringView(std::string_view b)
    {
        staticStringView = b;
    }

    // Sink taking an owning raw character buffer.
    __attribute__((noinline)) void callStringPtr(std::unique_ptr<char[]> a)
    {
        staticStringPtr = std::move(a);
    }
Those are three ways to store string data in modern C++: std::string, which owns the string data; std::string_view, which is just a pointer to some data owned elsewhere; and lastly a std::unique_ptr<char[]>, which I added just for comparison.
All three functions do the same thing: each is a sink that stores string data in a global. By virtue of string_view being only a dumb pointer-plus-length pair, it obviously produces much faster and smaller code, but the actual difference really shocked me. From godbolt (using clang -O3):
callString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >): # @callString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
pushq %rbx
movq staticString[abi:cxx11](%rip), %rax
leaq staticString[abi:cxx11]+16(%rip), %rcx
cmpq %rcx, %rax
je .LBB1_1
movq (%rdi), %rsi
leaq 16(%rdi), %rcx
cmpq %rcx, %rsi
je .LBB1_4
movq staticString[abi:cxx11]+16(%rip), %rdx
movq %rsi, staticString[abi:cxx11](%rip)
movups 8(%rdi), %xmm0
movups %xmm0, staticString[abi:cxx11]+8(%rip)
testq %rax, %rax
je .LBB1_14
movq %rax, (%rdi)
movq %rdx, 16(%rdi)
movq $0, 8(%rdi)
movb $0, (%rax)
popq %rbx
retq
.LBB1_1:
movq (%rdi), %rdx
leaq 16(%rdi), %rcx
cmpq %rcx, %rdx
je .LBB1_2
movq %rdx, staticString[abi:cxx11](%rip)
movups 8(%rdi), %xmm0
movups %xmm0, staticString[abi:cxx11]+8(%rip)
.LBB1_14:
movq %rcx, (%rdi)
movq %rcx, %rax
movq $0, 8(%rdi)
movb $0, (%rax)
popq %rbx
retq
.LBB1_2:
movq %rcx, %rsi
.LBB1_4:
leaq staticString[abi:cxx11](%rip), %rcx
cmpq %rcx, %rdi
je .LBB1_5
movq 8(%rdi), %rdx
testq %rdx, %rdx
je .LBB1_10
cmpq $1, %rdx
jne .LBB1_9
movzbl (%rsi), %ecx
movb %cl, (%rax)
jmp .LBB1_10
.LBB1_9:
movq %rdi, %rbx
movq %rax, %rdi
callq memcpy@PLT
movq %rbx, %rdi
.LBB1_10:
movq 8(%rdi), %rax
movq %rax, staticString[abi:cxx11]+8(%rip)
movq staticString[abi:cxx11](%rip), %rcx
movb $0, (%rcx,%rax)
movq (%rdi), %rax
movq $0, 8(%rdi)
movb $0, (%rax)
popq %rbx
retq
.LBB1_5:
movq %rsi, %rax
movq $0, 8(%rdi)
movb $0, (%rax)
popq %rbx
retq
This is the code for the variant using std::string. The actual flying fuck? Obviously, all operators are being inlined, but the amount of code this very simple move produces is ungodly awful. This is true in all compilers btw.
Now compare this to string_view:
callStringView(std::basic_string_view<char, std::char_traits<char> >): # @callStringView(std::basic_string_view<char, std::char_traits<char> >)
movq %rdi, staticStringView(%rip)
movq %rsi, staticStringView+8(%rip)
retq
Obviously, this doesn't own the string data, so it carries the risk of dangling pointers. However, the difference is night and day. And now for my main point:
callStringPtr(std::unique_ptr<char [], std::default_delete<char []> >): # @callStringPtr(std::unique_ptr<char [], std::default_delete<char []> >)
movq (%rdi), %rax
movq $0, (%rdi)
movq staticStringPtr(%rip), %rdi
movq %rax, staticStringPtr(%rip)
testq %rdi, %rdi
jne operator delete[](void*)@PLT # TAILCALL
retq
This is the variant that uses a char[] pointer. Obviously I left out the end pointer/size argument, but that would only be 1-2 instructions more.
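For reference, a minimal sketch of what the pointer variant could look like with the size included. SizedChars, makeSizedChars, callSizedChars and staticSizedChars are hypothetical names I made up for illustration, and I have not put this exact version through godbolt, so treat the instruction count above as an estimate.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <memory>
    #include <string_view>
    #include <utility>

    // Hypothetical pointer-plus-size pair; not part of the original example.
    struct SizedChars
    {
        std::unique_ptr<char[]> data;
        std::size_t size = 0;
    };

    SizedChars staticSizedChars;

    // Hypothetical helper: copies the characters into a fresh heap buffer.
    SizedChars makeSizedChars(std::string_view text)
    {
        SizedChars result;
        // make_unique<char[]> value-initializes, so the buffer is null-terminated.
        result.data = std::make_unique<char[]>(text.size() + 1);
        std::copy(text.begin(), text.end(), result.data.get());
        result.size = text.size();
        return result;
    }

    // Same sink as callStringPtr, plus the size (GCC/clang attribute, as above).
    __attribute__((noinline)) void callSizedChars(SizedChars a)
    {
        // The unique_ptr member takes over the new buffer and frees the old
        // one; the size member is simply copied.
        staticSizedChars = std::move(a);
    }
    ```

Usage would be something like callSizedChars(makeSizedChars("hello")). Note that the defaulted move leaves the old size value in the moved-from object, which is harmless here because the parameter is destroyed right after.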
-----------------------------------------------
So this raises the question: is potentially avoiding an allocation for small strings really still a benefit if it means that every copy/move of the string turns into an ungodly abomination of assembly that has to be executed? The SSO code apparently cannot be optimized in the slightest. Even when a constant string is passed and the function is inlined, the compiler is not able to remove the execution of the constructors. Since C++11, moves are often used to save a lot of expensive copies, and so are rvalue optimizations. std::string seems to mess this up badly. There are some other downsides as well, like not being able to construct a std::string_view from a std::string when the string is moved afterwards, etc…
I've been using std::string_view extensively for some time, whenever I don't need to hold on to the string data (or know that its source's lifetime outlives the consumer). However, I also started a custom string class implementation that stores its data directly as a pointer. I haven't started using it everywhere std::string is currently used, mainly because it's a lot of work, but seeing this mess of a codegen makes me really want to go forward and start porting everything.
So I'm wondering, what is other people's perspective? If you use std::string, do you actually benefit from its SSO?
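To illustrate the string_view point, here is a small sketch that checks whether the character buffer stays at the same address across a move. bufferSurvivesMove is a hypothetical helper name, and the result depends on the standard library: I'm assuming an implementation with SSO (libstdc++, libc++ and MSVC all have it) and that a two-character string fits in the inline buffer while a 100-character one does not.

    ```cpp
    #include <string>
    #include <utility>

    // Hypothetical helper: returns true if moving `source` into a new string
    // leaves the characters at the same address, i.e. a string_view taken
    // before the move would still point at the data owned by the new string.
    bool bufferSurvivesMove(std::string source)
    {
        const char* before = source.data();
        std::string destination = std::move(source);
        // Only the addresses are compared; nothing is read through `before`.
        return destination.data() == before;
    }
    ```

With SSO, bufferSurvivesMove("hi") should come out false, because the characters live inside the string object itself and have to be copied into the destination. bufferSurvivesMove(std::string(100, 'x')) should come out true, because the heap pointer is simply handed over. A pointer-only string would behave like the second case for every length.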