
How many of you use C for game programming?

Started by January 24, 2011 04:33 AM
107 comments, last by Washu 13 years, 6 months ago

But it just did! It took code that was requesting "i++" to be performed (increment and return previous), but the compiler decided that a simple increment (with no return) would work just as well.


Yes, the compiler performed an i++, not a ++i. That was precisely my point, and you seem to have missed it.

In other words "an optimising compiler can fix this mistake in simple cases".[/quote]

I don't think you've been following the conversation. I already said this much.

What you seemed to have missed is that there are cases where even an optimizing compiler cannot turn i++ into the equivalent of ++i. For example, if the definition of i++ isn't known (e.g., not defined in a header file), it won't be able to optimize out the unused return value.

Alternatively, if the definition of i++ *is* complex, the compiler may be unable to inline appropriately.

Alternatively again, if the compiler is set up to optimize for size, and not for speed, it may turn those inline methods into function calls. In which case, the compiler won't inline and optimize the call to i++, and again, you'll be paying for the overhead of using a post-increment instead of a pre-increment.


Of course in reality you'd just use a raw-pointer instead of some abstract iterator data type, in which case p++ or ++p will optimise to the same assembly[/quote]
Yes, in C, p++ and ++p turn into the same assembly because you are working with a primitive type. You are guaranteed the compiler will always have 'the function definition' for ++p or p++ when working with a primitive type. Iterators and overloaded increment operators are hardly primitive types, and you certainly can't guarantee the compiler will have access to the definition in order to optimize it.

Again, a C++ developer is forced to pay attention to small syntax nonsense that a C developer can ignore.
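To make the primitive-versus-iterator distinction concrete, here is a minimal sketch of a hand-written iterator. `Node`, `ListIter`, `sum_post` and `sum_pre` are hypothetical names invented for illustration; they are not taken from the STL or from anyone's code in this thread. The point is only that the post-increment overload has to produce the old position, so it copies before advancing, whereas a raw pointer has no such user-defined step.

```cpp
#include <cassert>

// Hypothetical singly linked list node, for illustration only.
struct Node {
    int value;
    Node* next;
};

// Hypothetical forward iterator over Node.
struct ListIter {
    Node* node;

    // Pre-increment: advance in place and return a reference. No temporary.
    ListIter& operator++() {
        assert(node != nullptr);
        node = node->next;
        return *this;
    }

    // Post-increment: must hand back the OLD position, so it copies first.
    ListIter operator++(int) {
        ListIter old = *this;  // the copy that a discarded it++ still asks for
        ++*this;
        return old;
    }

    int& operator*() const { return node->value; }
    bool operator!=(const ListIter& other) const { return node != other.node; }
};

// Sums a list using post-increment; the returned copy is thrown away.
int sum_post(Node* head) {
    int total = 0;
    for (ListIter it{head}; it != ListIter{nullptr}; it++) total += *it;
    return total;
}

// Same loop with pre-increment; nothing is copied.
int sum_pre(Node* head) {
    int total = 0;
    for (ListIter it{head}; it != ListIter{nullptr}; ++it) total += *it;
    return total;
}
```

Both functions return the same sum. Whether they also compile to the same instructions depends on whether the optimiser can see through `operator++(int)` and prove the copy is unused.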
To provide a totally balanced counter-point, you should also make a video on YouTube that starts with the quote "Trust me, I can manage your memory!", followed by the Java loading icon animating for 10 minutes ;)
Fair enough. The Java VM is a giant POS.

Memory management ends up being imperfect most of the time. But it's much worse when left in the hands of the random programmer who thinks he's a lot more clever than he actually is. Everyone wants to roll their own solution, and usually to disastrous results.
We've had it pretty good in the last few years: .NET, the STL, and improved programming practices have come into their own and are becoming commonplace. Average software stability has come a long way. But I think people forget what it was like before that.

I grew up in the days when every program was plagued with memory errors, when games full of bad allocations were programmed on top of 32-bit extenders and memory managers with similar errors. (All these programmers seem to work at Bioware, Activision and especially Adobe now...)

Basic memory handling is easy. But it gets much harder as program complexity increases, and it turns into a tight-rope walk. One misstep and you're a goner. :)

(forum keeps eating my new lines)
[quote name='Hodgman' timestamp='1300243460' post='4786326']In other words "an optimising compiler can fix this mistake in simple cases".
I don't think you've been following the conversation. I already said this much.[/quote]Thanks to the magic of the scroll wheel, we can both go back up the conversation and see that I said that much and you disagreed:

[quote name='Hodgman' timestamp='1300168980' post='4785935']6. Again, this is something that separates a junior C++ programmer from an experienced one. Not an STL issue. Plus an optimising compiler can fix this mistake in simple cases.
A compiler can absolutely NOT fix that mistake.[/quote]So... having now come full circle and argued for my original line that spawned this thread of debate, you seem to be arguing for the sake of arguing.
I get it. C++ is too complicated. Operator overloading makes things too hard. Ok.

What you seemed to have missed is that there are cases where even an optimizing compiler cannot turn i++ into the equivalent of ++i. [/quote]Um, no, I only mentioned simple cases. The fact that I specifically mentioned "simple cases" should imply that more complex cases will defeat the optimiser...

I also mentioned that using the correct operator is something that separates a junior C++ programmer from an experienced one - once you've learnt what the operators do, then using the right one is the same as choosing "+" over "^" or "*" in cases where you want to do an addition. Yes, these particular operators (i++/++i) are an area where C programmers (or junior C++ programmers) often use the wrong one, though it's a very easy lesson to correct that behaviour.
For example, if the definition of i++ isn't known (e.g., not defined in a header file), it won't be able to optimize out the unused return value.[/quote]Again, it sounds like you need to upgrade your compiler/linker, because mine can do that (LTCG) and has been for 9 years.
Thanks to the magic of the scroll wheel, we can both go back up the conversation and see that I said that much and you disagreed:[quote name='Hodgman' timestamp='1300168980' post='4785935']6. Again, this is something that separates a junior C++ programmer from an experienced one. Not an STL issue. Plus an optimising compiler can fix this mistake in simple cases.
[/quote]I guess we're both having conflicting definitions of 'fix'. The mistake, in my mind, was using 'i++' when '++i' should have been used instead. In order for the compiler to 'fix' that mistake, it'd need to invoke the pre-increment instead of the post-increment operator. I disagree that the compiler can fix the mistake, however, I do agree that in some cases the compiler can hide the mistake.

Also, a verbatim quote from me, prior to your 'Plus an optimizing compiler can fix this mistake in simple cases' statement:

[quote name='agottem']In the case of a vector, the methods for either implementation are simple enough that the compiler can optimize them both to the point of identical assembly. As the iterators become more complicated for the compiler to analyze, or, if the definition is not available to the compiler...you may see less optimal code due to using post/pre increment inappropriately.[/quote]

You keep repeating something I already stated! It doesn't change the fact that the compiler hasn't fixed your mistake, but has merely hidden it.
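One way to picture the difference between "fixing" and "hiding" is to make the calls observable. The sketch below is purely illustrative: `Counted`, `Tally`, `run_post` and `run_pre` are hypothetical names, not anything from the STL or from the code discussed here. Because the tallies are read back afterwards, the optimiser has to preserve them; it can only drop the post-increment's extra work when it can prove nobody could tell the difference.

```cpp
#include <cassert>

// Hypothetical instrumented type: the static tallies make every call observable.
struct Counted {
    int value = 0;
    static int pre_calls;
    static int post_calls;

    Counted& operator++() {
        ++pre_calls;
        ++value;
        return *this;
    }

    Counted operator++(int) {
        ++post_calls;
        Counted old = *this;  // copy of the old state
        ++*this;              // forwards to the pre-increment, so pre_calls moves too
        return old;
    }
};

int Counted::pre_calls = 0;
int Counted::post_calls = 0;

// How many times each overload ran.
struct Tally {
    int pre;
    int post;
};

// Performs n post-increments and reports the tallies.
Tally run_post(int n) {
    assert(n >= 0);
    Counted::pre_calls = 0;
    Counted::post_calls = 0;
    Counted c;
    for (int k = 0; k < n; ++k) c++;
    Tally t = {Counted::pre_calls, Counted::post_calls};
    return t;
}

// Performs n pre-increments and reports the tallies.
Tally run_pre(int n) {
    assert(n >= 0);
    Counted::pre_calls = 0;
    Counted::post_calls = 0;
    Counted c;
    for (int k = 0; k < n; ++k) ++c;
    Tally t = {Counted::pre_calls, Counted::post_calls};
    return t;
}
```

With n = 5, `run_post` reports five calls to each overload, while `run_pre` reports five pre-increment calls and no post-increment calls. The source still selects a different function in each case, whatever the optimiser later does with it.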





Now you're changing the subject. The point isn't that an experienced programmer knows when to use which, the point is that it's stupid to have to.

Again, it sounds like you need to upgrade your compiler/linker, because mine can do that (LTCG) and has been for 9 years.



That's nice. See how the compiler does when the definition is in an external DLL, and all you have is an import library to link against. Also, still doesn't change the fact that when optimizing for size, you now have to pay a performance penalty for using the wrong operator. Clearly the compiler is *not* fixing your mistake.

I guess we're both having conflicting definitions of 'fix'. The mistake, in my mind, was using 'i++' when '++i' should have been used instead. In order for the compiler to 'fix' that mistake, it'd need to invoke the pre-increment instead of the post-increment operator. I disagree that the compiler can fix the mistake, however, I do agree that in some cases the compiler can hide the mistake.[/quote]
I don't see the point in this argument. The point here is the result. If the resulting executable is equally fast, then there is no "mistake". If I know the compiler will manage this for me, why should I waste precious brain cycles worrying about it, when I can put them to better use finding and eliminating bottlenecks. Your whole line of argument appears to be counter-productive and just pedantic really.

Optimising for size doesn't change much. Here is the output assembly:

; 13 : // Print post
; 14 : for(std::vector<int>::iterator it = v.begin(); it != v.end(); it++) {

mov esi, ebx
cmp ebx, edi
je SHORT $LN4@main
$LL82@main:

; 15 : printf("%d\n", *it);

push DWORD PTR [esi]
push OFFSET $SG-31
call _printf
add esi, 4
pop ecx
pop ecx
cmp esi, edi
jne SHORT $LL82@main
$LN4@main:

; 16 : }
; 17 :
; 18 : // Print pre
; 19 : for(std::vector<int>::iterator it= v.begin(); it != v.end(); ++it) {

mov esi, ebx
cmp ebx, edi
je SHORT $LN1@main
$LL112@main:

; 20 : printf("%d\n", *it);

push DWORD PTR [esi]
push OFFSET $SG-32
call _printf
add esi, 4
pop ecx
pop ecx
cmp esi, edi
jne SHORT $LL112@main
$LN1@main:

Code speaks louder than words. Next time you make an assertion that can be trivially proven using code, kindly do so. It spares me the time debunking your statements.

If you're putting your iterator implementation in a DLL and optimising for size then maybe you aren't too worried about performance after all.
Now this is interesting, and not something I expected. I changed the container type to std::map:

#include <map>
#include <vector>
#include <iostream>

int main()
{
    std::map<int, int> v;
    int i;
    while(std::cin >> i) {
        // v.push_back(i);
        v.insert(std::make_pair(i,i));
    }


    // Print post
    for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {
        printf("%d\n", *it); // passes the whole pair, not just the key
    }

    // Print pre
    for(std::map<int, int>::iterator it= v.begin(); it != v.end(); ++it) {
        printf("%d\n", *it); // passes the whole pair, not just the key
    }


}


When compiling with optimise for code size I got the following:

; 15 : // Print post
; 16 : for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {

mov ecx, DWORD PTR _v$[esp+68]
mov eax, DWORD PTR [ecx]
mov DWORD PTR _it$31323[esp+64], eax
jmp SHORT $LN158@main
$LL88@main:

; 17 : printf("%d\n", *it);

push DWORD PTR [eax+16]
push DWORD PTR [eax+12]
push OFFSET $SG-31
call _printf
add esp, 12 ; 0000000cH
lea eax, DWORD PTR _it$31323[esp+64]
call ??E?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++
mov eax, DWORD PTR _it$31323[esp+64]
mov ecx, DWORD PTR _v$[esp+68]
$LN158@main:

; 12 : }
; 13 :
; 14 :
; 15 : // Print post
; 16 : for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {

cmp eax, ecx
jne SHORT $LL88@main

; 18 : }
; 19 :
; 20 : // Print pre
; 21 : for(std::map<int, int>::iterator it= v.begin(); it != v.end(); ++it) {

mov eax, DWORD PTR [ecx]
mov DWORD PTR _it$31360[esp+64], eax
cmp eax, ecx
je SHORT $LN1@main
$LL124@main:

; 22 : printf("%d\n", *it);

push DWORD PTR [eax+16]
push DWORD PTR [eax+12]
push OFFSET $SG-32
call _printf
add esp, 12 ; 0000000cH
lea eax, DWORD PTR _it$31360[esp+64]
call ??E?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++
mov eax, DWORD PTR _it$31360[esp+64]
cmp eax, DWORD PTR _v$[esp+68]
jne SHORT $LL124@main
$LN1@main:

It looks like the compiler is calling the same function implementation for the increment! Clever girl...

I don't see the point in this argument. The point here is the result. If the resulting executable is equally fast, then there is no "mistake". If I know the compiler will manage this for me, why should I waste precious brain cycles worrying about it, when I can put them to better use finding and eliminating bottlenecks. Your whole line of argument appears to be counter-productive and just pedantic really.
....


Code speaks louder than words. Next time you make an assertion that can be trivially proven using code, kindly do so. It spares me the time debunking your statements.

If you're putting your iterator implementation in a DLL and optimising for size then maybe you aren't too worried about performance after all.




You have to "waste precious brain cycles" because they aren't the same thing. How many times do I need to explain it to you? The compiler CANNOT change i++ to ++i; as such, it stands to reason there will be differences in certain scenarios.

Why don't you look at something a little more complicated than the vector iterator? How about the following code:




#include <cstdio>
#include <map>

void foo (std::map<int, int>& m)
{
    for(std::map<int, int>::iterator i = m.begin(); i != m.end(); i++ /*or ++i */)
    {
        printf("foo\n");
    }
}







When compiling with "cl /FA /Os /c foo.cpp", here's the assembly you get for the post-increment case:



push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH


lea eax, DWORD PTR _i$23161[ebp]
push eax
mov ecx, DWORD PTR _m$[ebp]
call ?begin@?$_Tree@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@QAE?AV?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@2@XZ ; std::_Tree<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> >::begin
jmp SHORT $LN3@foo

$LN2@foo:
push 0
lea eax, DWORD PTR $T23885[ebp]
push eax
lea ecx, DWORD PTR _i$23161[ebp]
call ??E?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAE?AV01@H@Z ; std::_Tree_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++


$LN3@foo:
lea eax, DWORD PTR $T23886[ebp]
push eax
mov ecx, DWORD PTR _m$[ebp]
call ?end@?$_Tree@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@QAE?AV?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@2@XZ ; std::_Tree<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> >::end
push eax
lea ecx, DWORD PTR _i$23161[ebp]
call ??9?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@std@@QBE_NABV01@@Z ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator!=
movzx eax, al
test eax, eax
je SHORT $LN4@foo


push OFFSET $SG23168
call _printf
pop ecx

jmp SHORT $LN2@foo
$LN4@foo:
leave
ret 0




Here you can see it call the post-increment operator (in section $LN2@foo). And, for completeness, here's the assembly for the post-increment operator:



push ebp
mov ebp, esp
push ecx
push ecx
mov DWORD PTR _this$[ebp], ecx

mov eax, DWORD PTR _this$[ebp]
mov eax, DWORD PTR [eax]
mov DWORD PTR __Tmp$[ebp], eax

mov ecx, DWORD PTR _this$[ebp]
call ??E?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++

mov eax, DWORD PTR ___$ReturnUdt$[ebp]
mov ecx, DWORD PTR __Tmp$[ebp]
mov DWORD PTR [eax], ecx
mov eax, DWORD PTR ___$ReturnUdt$[ebp]

leave
ret 8







Next, pre-increment:



push ebp
mov ebp, esp
push ecx
push ecx

lea eax, DWORD PTR _i$23161[ebp]
push eax
mov ecx, DWORD PTR _m$[ebp]
call ?begin@?$_Tree@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@QAE?AV?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@2@XZ ; std::_Tree<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> >::begin
jmp SHORT $LN3@foo

$LN2@foo:
lea ecx, DWORD PTR _i$23161[ebp]
call ??E?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++

$LN3@foo:
lea eax, DWORD PTR $T23879[ebp]
push eax
mov ecx, DWORD PTR _m$[ebp]
call ?end@?$_Tree@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@QAE?AV?$_Tree_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@2@XZ ; std::_Tree<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> >::end
push eax
lea ecx, DWORD PTR _i$23161[ebp]
call ??9?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@std@@QBE_NABV01@@Z ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator!=
movzx eax, al
test eax, eax
je SHORT $LN4@foo

push OFFSET $SG23167
call _printf
pop ecx

jmp SHORT $LN2@foo

$LN4@foo:
leave
ret 0







And the assembly for the pre-increment operator:



push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx

mov ecx, DWORD PTR _this$[ebp]
call ??E?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++

mov eax, DWORD PTR _this$[ebp]

leave
ret 0






Clearly, the difference is there. Just as clearly, the compiler (Visual Studio 2010) could not 'fix' the mistake. You lose; have fun updating all the post-increments you thought the compiler would 'fix' into pre-increments.
I actually use pre-increments everywhere, so I have nothing to update, thanks.

I actually tried it earlier for std::map. I am not using the exact same compile options as you, as I am throwing this into a project I used for random internet help. The configuration is near enough the defaults, just some settings such as "disable language extensions" and "iterator debugging" removed, along with increasing the warning level. I generally try to reset any configuration changes I make, but maybe I've changed something important and forgotten about it.

The command line includes:

/I"C:\Program Files (x86)\boost\boost_1_40" /Zi /nologo /W4 /WX- /Ox /Oi /Ot /Oy- /GL /D "_HAS_ITERATOR_DEBUGGING=0" /D "_SECURE_SCL=0" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /MT /GS- /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\Help.pch" /FAs /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze- /errorReport:queue

I couldn't really be bothered to go through each setting in detail to see what is causing the differences between what we see.

Here is the code I used.

#include <map>
#include <vector>
#include <iostream>

int main()
{
    std::map<int, int> v;
    int i;
    while(std::cin >> i) {
        // v.push_back(i);
        v.insert(std::make_pair(i,i));
    }


    // Print post
    for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {
        printf("%d\n", it->first);
    }

    // Print pre
    for(std::map<int, int>::iterator it= v.begin(); it != v.end(); ++it) {
        printf("%d\n", it->first);
    }
}

With favour size, both times the same function was called. It is actually changing a call from i++ to ++i. The instruction sequence was slightly different, though I'm not sure this is detrimental.

mov eax, DWORD PTR _v$[esp+68]

; Loop 1
mov ecx, DWORD PTR [eax]
mov DWORD PTR _it$31323[esp+64], ecx
jmp SHORT $LN162@main
$LL88@main:
push DWORD PTR [ecx+12]
push OFFSET $SG-31
call _printf
pop ecx
pop ecx
lea eax, DWORD PTR _it$31323[esp+64]
call ??E?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++
mov ecx, DWORD PTR _it$31323[esp+64]
mov eax, DWORD PTR _v$[esp+68]
$LN162@main:
cmp ecx, eax
jne SHORT $LL88@main
; Loop 2
mov ecx, DWORD PTR [eax]
mov DWORD PTR _it$31360[esp+64], ecx
cmp ecx, eax
je SHORT $LN1@main
$LL126@main:
push DWORD PTR [ecx+12]
push OFFSET $SG-32
call _printf
pop ecx
pop ecx
lea eax, DWORD PTR _it$31360[esp+64]
call ??E?$_Tree_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$$CBHH@std@@@2@$0A@@std@@@std@@@std@@QAEAAV01@XZ ; std::_Tree_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> > >::operator++
mov ecx, DWORD PTR _it$31360[esp+64]
cmp ecx, DWORD PTR _v$[esp+68]
jne SHORT $LL126@main
$LN1@main:

In the first case, the compiler jumps to the loop test at the end first, and then runs the loop. In the second, the compiler inserts a test before entering the loop. From what I can see, these differences are minor, and whatever speed difference there might be between them is paid once, on the very first iteration.
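For anyone not used to reading the listing, the two loop layouts correspond roughly to the following hand-written C++. This is only a sketch of the control flow, assuming a plain int range; `sum_jump_to_test` and `sum_guarded` are hypothetical names, and the compiler of course did not generate anything from this source.

```cpp
#include <cassert>

// Layout of the first loop: jump straight to the test at the bottom,
// then fall back into the body for as long as the test passes.
int sum_jump_to_test(const int* first, const int* last) {
    assert(first <= last);
    int total = 0;
    goto test;
body:
    total += *first;
    ++first;
test:
    if (first != last) goto body;
    return total;
}

// Layout of the second loop: one guard test up front,
// then a body with the test at the bottom.
int sum_guarded(const int* first, const int* last) {
    assert(first <= last);
    int total = 0;
    if (first != last) {
        do {
            total += *first;
            ++first;
        } while (first != last);
    }
    return total;
}
```

Both forms execute the body and the bottom test the same number of times. They differ only in how the loop is entered, which is why any cost difference is a one-off.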

However, I do not see an unnecessary copy of the iterator in the body, and in fact in this case the compiler calls the same function in both loop bodies. The loop body is essentially the same. If anything, the second loop actually contains fewer instructions in the body, which I find bizarre.

With favour speed, it's a bit harder, as the compiler inlines the loops entirely. From examining the assembly back to back, I can only see the label values changing, and the "npad" values. They otherwise appear identical, but I could be wrong as there is quite a bit of code there:

; 15 : // Print post
; 16 : for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {

mov eax, DWORD PTR _v$[esp+84]
mov esi, DWORD PTR [eax]
cmp esi, eax
je SHORT $LN4@main
$LL109@main:

; 17 : printf("%d\n", it->first);

mov eax, DWORD PTR [esi+12]
push eax
push OFFSET $SG-31
call _printf
add esp, 8
cmp BYTE PTR [esi+21], 0
jne SHORT $LN281@main

; 12 : }
; 13 :
; 14 :
; 15 : // Print post
; 16 : for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {

mov eax, DWORD PTR [esi+8]
cmp BYTE PTR [eax+21], 0
jne SHORT $LN274@main
mov esi, eax
mov eax, DWORD PTR [esi]
cmp BYTE PTR [eax+21], 0
jne SHORT $LN281@main
npad 4
$LL124@main:
mov esi, eax
mov eax, DWORD PTR [esi]
cmp BYTE PTR [eax+21], 0
je SHORT $LL124@main
jmp SHORT $LN281@main
$LN274@main:
mov eax, DWORD PTR [esi+4]
cmp BYTE PTR [eax+21], 0
jne SHORT $LN107@main
$LL108@main:
cmp esi, DWORD PTR [eax+8]
jne SHORT $LN107@main
mov esi, eax
mov eax, DWORD PTR [eax+4]
cmp BYTE PTR [eax+21], 0
je SHORT $LL108@main
$LN107@main:
mov esi, eax
$LN281@main:
mov eax, DWORD PTR _v$[esp+84]
cmp esi, eax
jne SHORT $LL109@main
$LN4@main:

; 18 : }
; 19 :
; 20 : // Print pre
; 21 : for(std::map<int, int>::iterator it= v.begin(); it != v.end(); ++it) {

mov esi, DWORD PTR [eax]
cmp esi, eax
je SHORT $LN1@main
$LL181@main:

; 22 : printf("%d\n", it->first);

mov ecx, DWORD PTR [esi+12]
push ecx
push OFFSET $SG-32
call _printf
add esp, 8
cmp BYTE PTR [esi+21], 0
jne SHORT $LN284@main

; 18 : }
; 19 :
; 20 : // Print pre
; 21 : for(std::map<int, int>::iterator it= v.begin(); it != v.end(); ++it) {

mov eax, DWORD PTR [esi+8]
cmp BYTE PTR [eax+21], 0
jne SHORT $LN277@main
mov esi, eax
mov eax, DWORD PTR [esi]
cmp BYTE PTR [eax+21], 0
jne SHORT $LN284@main
npad 1
$LL196@main:
mov esi, eax
mov eax, DWORD PTR [esi]
cmp BYTE PTR [eax+21], 0
je SHORT $LL196@main
jmp SHORT $LN284@main
$LN277@main:
mov eax, DWORD PTR [esi+4]
cmp BYTE PTR [eax+21], 0
jne SHORT $LN179@main
$LL180@main:
cmp esi, DWORD PTR [eax+8]
jne SHORT $LN179@main
mov esi, eax
mov eax, DWORD PTR [eax+4]
cmp BYTE PTR [eax+21], 0
je SHORT $LL180@main
$LN179@main:
mov esi, eax
$LN284@main:
mov eax, DWORD PTR _v$[esp+84]
cmp esi, eax
jne SHORT $LL181@main
$LN1@main:

; 23 : }


/I"C:\Program Files (x86)\boost\boost_1_40" /Zi /nologo /W4 /WX- /Ox /Oi /Ot /Oy- /GL /D "_HAS_ITERATOR_DEBUGGING=0" /D "_SECURE_SCL=0" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /MT /GS- /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\Help.pch" /FAs /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze- /errorReport:queue






Your command line is not set for 'Optimize for space'. From the "cl.exe /?" output:







/O1 minimize space
/O2 maximize speed
/Ob<n> inline expansion (default n=0)
/Od disable optimizations (default)
/Og enable global optimization
/Oi[-] enable intrinsic functions
/Os favor code space
/Ot favor code speed
/Ox maximum optimizations
/Oy[-] enable frame pointer omission




Set your compiler options correctly, and I'll bother to take a look. There's no reason we shouldn't see the same assembly output, as we're using the same compiler.
Well, I had already written both tests when I posted my command line, so it might have had /Ot rather than /Os at the time. I think it's worth looking at what the compiler is doing.

I didn't realise there were two settings for size/speed. I was just changing the obvious looking value in the IDE configuration. I wonder what the expected result is with "/O1 /Ot" or "/O2 /Os".

You are using the "favour" option, rather than "minimise space" (O1), which generates some decent assembly for me (this time a command line build, cl /FAs /O1 /c help.cpp):

; 15 : // Print post
; 16 : for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {

mov ecx, DWORD PTR _v$[ebp+4]
mov eax, DWORD PTR [ecx]
mov DWORD PTR _it$31314[ebp], eax
mov esi, OFFSET ??_C@_03PMGGPEJJ@?$CFd?6?$AA@
jmp SHORT $LN139@main
$LL65@main:

; 17 : printf("%d\n", it->first);

push DWORD PTR [eax+12]
push esi
call _printf
pop ecx
pop ecx
lea ecx, DWORD PTR _it$31314[ebp]
call ??E?$_Tree_unchecked_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$$CBHH@std@@@2@$0A@@std@@@std@@U_Iterator_base0@2@@std@@QAEAAV01@XZ ; std::_Tree_unchecked_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> >,std::_Iterator_base0>::operator++
mov eax, DWORD PTR _it$31314[ebp]
mov ecx, DWORD PTR _v$[ebp+4]
$LN139@main:

; 12 : }
; 13 :
; 14 :
; 15 : // Print post
; 16 : for(std::map<int, int>::iterator it = v.begin(); it != v.end(); it++) {

cmp eax, ecx
jne SHORT $LL65@main

; 18 : }
; 19 :
; 20 : // Print pre
; 21 : for(std::map<int, int>::iterator it= v.begin(); it != v.end(); ++it) {

mov eax, DWORD PTR [ecx]
mov DWORD PTR _it$31351[ebp], eax
cmp eax, ecx
je SHORT $LN1@main
$LL105@main:

; 22 : printf("%d\n", it->first);

push DWORD PTR [eax+12]
push esi
call _printf
pop ecx
pop ecx
lea ecx, DWORD PTR _it$31351[ebp]
call ??E?$_Tree_unchecked_const_iterator@V?$_Tree_val@V?$_Tmap_traits@HHU?$less@H@std@@V?$allocator@U?$pair@$$CBHH@std@@@2@$0A@@std@@@std@@U_Iterator_base0@2@@std@@QAEAAV01@XZ ; std::_Tree_unchecked_const_iterator<std::_Tree_val<std::_Tmap_traits<int,int,std::less<int>,std::allocator<std::pair<int const ,int> >,0> >,std::_Iterator_base0>::operator++
mov eax, DWORD PTR _it$31351[ebp]
cmp eax, DWORD PTR _v$[ebp+4]
jne SHORT $LL105@main
$LN1@main:

; 23 : }

Again the same pattern as before: the compiler chooses to jump to the "end" of the loop for the first loop, and chooses to do an extra test for the second, but the loop bodies are mostly identical, with the same caveats as above.

In any case I don't particularly care what the compiler does when minimising size. We are talking about efficiency, which clearly means the maximise-speed setting.

And as for "Set your compiler options correctly", the assembly you posted clearly isn't optimised at all. The code is calling end() and a non-inlined operator!= every iteration.
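For anyone who would rather measure than read listings, a rough timing harness along these lines could be used. It is a sketch under stated assumptions, not the test anyone in this thread actually ran: `make_map`, `sum_keys_post`, `sum_keys_pre` and `time_us` are hypothetical helpers, and it relies on `std::chrono` and lambdas, so it needs a C++11 compiler.

```cpp
#include <cassert>
#include <chrono>
#include <map>

// Hypothetical helper: builds a map with keys 0..n-1.
std::map<int, int> make_map(int n) {
    assert(n >= 0);
    std::map<int, int> m;
    for (int k = 0; k < n; ++k) m.insert(std::make_pair(k, k));
    return m;
}

// Walks the map with post-increment and returns the sum of the keys.
long long sum_keys_post(const std::map<int, int>& m) {
    long long total = 0;
    for (std::map<int, int>::const_iterator it = m.begin(); it != m.end(); it++) {
        total += it->first;
    }
    return total;
}

// Same walk with pre-increment.
long long sum_keys_pre(const std::map<int, int>& m) {
    long long total = 0;
    for (std::map<int, int>::const_iterator it = m.begin(); it != m.end(); ++it) {
        total += it->first;
    }
    return total;
}

// Times a single call of f, in microseconds, using a monotonic clock.
template <typename F>
long long time_us(F f) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    f();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A typical use would be `long long sink = 0; long long us = time_us([&] { sink = sum_keys_post(m); });`, repeated for `sum_keys_pre` under each optimisation setting. Keeping the sum in `sink` stops the optimiser from discarding the loop. Any numbers will depend on the compiler version, the flags and the machine, so single runs should be treated with suspicion.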

This topic is closed to new replies.
