
TGA Swap - XOR or Temporary ?

Started by July 11, 2001 07:18 PM
16 comments, last by Pho 23 years, 6 months ago
At optimum performance (a.k.a. if things worked the way they really should), the XOR operation should go faster than the swapping method. Most likely, the reason it's going slower is that Visual C++ isn't optimizing the code for you. When you compile C++ code, as some of us know, it's reduced to assembly code first. However, VC++ does a pretty pathetic job of generating assembly, and the result is horribly inefficient. Though it's a shot in the dark, here's roughly how VC++ would implement the two methods from an assembly perspective. (The C++ code is shown at right.)

(Let BLUE be the memory address of the blue byte, RED be the memory address of the red byte, and TEMP be the temp byte. Registers are denoted R, followed by a number.)

Swapping
  LOAD  BLUE, R1      temp = blue;
  STORE R1, TEMP
  LOAD  RED, R1       blue = red;
  STORE R1, BLUE
  LOAD  TEMP, R1      red = temp;
  STORE R1, RED


XOR
  LOAD  RED, R1       red ^= blue;
  LOAD  BLUE, R2
  XOR   R1, R2
  STORE R1, RED
  LOAD  BLUE, R1      blue ^= red;
  LOAD  RED, R2
  XOR   R1, R2
  STORE R1, BLUE
  LOAD  RED, R1       red ^= blue;
  LOAD  BLUE, R2
  XOR   R1, R2
  STORE R1, RED


Note how few registers (extremely fast memory) we use, while we're hitting memory (comparatively much slower) quite a bit. The swapping routine hits memory 6 times (3 loads, 3 stores), while the XOR routine hits it 9 times (6 loads, 3 stores). Counted that way, the swapping routine needs about 33% fewer memory hits than the XOR routine.

In actuality, it's even worse. The red and blue bytes are elements of an array. So not only do we have to hit the red and blue bytes themselves, but we ALSO have to retrieve the memory address of the array holding the data AND the offset into the array where the data is (the number in the brackets). That adds 8 more memory hits to the swapping method, but EIGHTEEN more to the XOR method. We're now up to 14 memory hits versus 27, so the swapping routine does roughly half the memory work. Hence the results you're getting.
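For reference, here is a rough C++ sketch of the array case described above: both loops swap the red and blue byte of every pixel in a buffer, once with a temporary and once with XOR. The function names and the bytesPerPixel parameter are made up for illustration, and the layout (blue at offset 0, red at offset 2 of each pixel, as in uncompressed 24-bit or 32-bit TGA data) is an assumption about the loader, not something taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: swap blue (offset 0) and red (offset 2) using a temporary.
// Assumes bytesPerPixel is 3 or 4 and the buffer holds whole pixels.
void swapRedBlueTemp(std::vector<std::uint8_t>& pixels, std::size_t bytesPerPixel)
{
    assert(bytesPerPixel >= 3);
    for (std::size_t i = 0; i + 2 < pixels.size(); i += bytesPerPixel) {
        std::uint8_t temp = pixels[i];
        pixels[i] = pixels[i + 2];
        pixels[i + 2] = temp;
    }
}

// Hypothetical helper: same swap with three XORs and no temporary.
// Each statement reads two array elements and writes one back.
void swapRedBlueXor(std::vector<std::uint8_t>& pixels, std::size_t bytesPerPixel)
{
    assert(bytesPerPixel >= 3);
    for (std::size_t i = 0; i + 2 < pixels.size(); i += bytesPerPixel) {
        pixels[i]     ^= pixels[i + 2];
        pixels[i + 2] ^= pixels[i];
        pixels[i]     ^= pixels[i + 2];
    }
}
```

Both functions should leave the buffer in the same state, so whichever one your compiler turns into tighter code is the one to keep.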

However, there's a way around that, to make the XOR more efficient. In fact, the only reason it is more efficient this way is that we don't have to allocate space for the temporary variable. That's done by using a wonderful keyword called "register". By putting "register" in front of a variable declaration, you ask the compiler to try to keep that variable in one of the processor's registers throughout its entire scope, thus speeding it up since we're not pinging memory all the time. Here's an example of how you'd do just a simple swap of two bytes.

                      register byte a = 4, b = 5, temp;
  LOAD  a, R1         temp = a;  // R1 is now a fancy term for "temp"
  LOAD  b, R2         a = b;     // R2 is just "b" now
  STORE R2, a
  STORE R1, b         b = temp;


As you can see, just 4 memory hits instead of 6 this time. The XOR will be similar.

                      register byte a = 4, b = 5;
  LOAD  a, R1         // R1 is just "a"
  LOAD  b, R2         // R2 is just "b"
  XOR   R1, R2        a ^= b;
  XOR   R2, R1        b ^= a;
  XOR   R1, R2        a ^= b;
  STORE R1, a
  STORE R2, b


4 memory hits here as well. Though you have three more assembly operations here, the XOR is then quicker because the XOR operations are extremely fast, but allocating space for the "temp" variable takes more time.
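In C++ terms, the two single-byte swaps above might be sketched like this. The function names are made up for illustration, and whether the values actually stay in registers is up to the compiler. One caveat with the XOR version: if both references name the same byte, the first XOR zeroes it (x ^ x is 0), so this sketch asserts that the two operands are distinct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: classic swap through a temporary.
void swapWithTemp(std::uint8_t& a, std::uint8_t& b)
{
    std::uint8_t temp = a;
    a = b;
    b = temp;
}

// Hypothetical helper: XOR swap, no temporary.
// Precondition (checked in debug builds): a and b are different objects,
// because XOR-swapping a byte with itself would set it to 0.
void swapWithXor(std::uint8_t& a, std::uint8_t& b)
{
    assert(&a != &b);
    a ^= b;  // a now holds a ^ b
    b ^= a;  // b now holds the original a
    a ^= b;  // a now holds the original b
}
```

For red and blue bytes at different offsets of a pixel the two operands are always distinct, so the precondition holds for the TGA case.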

Hope that explains everything!

~ Dragonus
Dragonus, what can I say ?...

I love you man.

You're not getting my Bud Light... (j/k)

You're quite welcome.

~ Dragonus
Pointers are not very fast compared to normal ops (check the asm code): 4 pointer calls in the temp var, 9 in the XOR.

Ideally, this is what you'd want:

mov ax, WORD PTR &blue
mov BYTE PTR &red, ah
mov BYTE PTR &blue, al

3 pointer ops, 3 movs

PS. My byte ordering is a little rusty. You might have to swap the al & ah.

(Didn't read the entire post, someone may have already said this.)
Sorry for the confusion about my code (XOR)... But the problem is probably in compiler optimisation. I haven't used MSVC yet. For now I'm using Borland Visual C++ & Delphi. In both cases XOR is faster than using a temp var.

There are more worlds than the one that you hold in your hand...
You should never let your fears become the boundaries of your dreams.
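Since the results in this thread differ from compiler to compiler, the safest answer is to time both versions yourself. Below is a minimal timing sketch using std::chrono; the helper name timeMilliseconds and its parameters are made up for illustration, and it only measures wall-clock time for whatever callable you hand it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `work` `repeats` times and return the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double timeMilliseconds(Work work, std::size_t repeats)
{
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Pass it a lambda that runs one of the swap loops over a real image buffer, and read the buffer afterwards so an optimizing build cannot discard the work. Compare the numbers with optimizations turned on, since an unoptimized build mostly measures the compiler's bookkeeping.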
Correct me if I'm wrong here, but isn't optimising something like swapping the R and B color bytes of a .tga file a waste of valuable development time? So you are loading a .tga file from the HD, and you are probably doing it just once: when loading the level. And occasionally, when loading a new model or so...

Why would you optimise something as trivial as that? It's not like it happens every single frame or so.

My 2 Cents
Flous: One reason why you might want to optimize loading images is to reduce load times when switching levels. Another reason is if you were trying to stream some of the data during the game to eliminate load times, like they're doing in the Legacy of Kain games (Soul Reaver and Blood Omen). It probably isn't quite as necessary on the PC as it would be on a console, since you could assume that most PCs have a lot more RAM than most consoles. It'd be nice to see more games that have a continuous flow to them, with no load times, and with the transition between the game and the cinematics virtually seamless.

J.W.
Owkay, I hear what you are saying...
I was (again) ignoring the other types of games, the ones which we are not making.

This topic is closed to new replies.
