TGA Swap - XOR or Temporary ?

Pho · 2001-07-31T06:33:23

Hello all! I've been experimenting with the TGA loader from Tutorial 34, specifically with the red/blue byte swapping. I have implemented Mirko Ravnikar's XOR byte swapping in my own TGA loader, which is based on Win32 API file I/O (instead of ANSI C like the one in Evan Pipho's tutorial). The byte-swapping code itself is the same, though, so the choice of file I/O shouldn't make any speed difference there. However, on my machine (D800, 384 MB, VC6 student version), the XOR method is actually slower than the one that uses a temporary variable. With a 24-bit, 512x512 TGA, byte swapping the entire image takes 25 ms on average with the XOR method, versus only 14 ms on average with the temporary variable. I'm not quite sure, but perhaps this is because of the "student" version of VC6? I would like it if someone with the full version of VC6 could verify this. Thanks.

Pho
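For reference, here is a minimal C sketch of the two swaps being compared, assuming a tightly packed 24-bit buffer in the B, G, R byte order that TGA files use. The function names `swap_rb_temp` and `swap_rb_xor` are hypothetical and are not taken from Tutorial 34's loader.

```c
#include <stddef.h>

/* Swap the red and blue bytes of every pixel in a tightly packed
   24-bit BGR buffer, using a temporary variable. */
void swap_rb_temp(unsigned char *data, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count * 3; i += 3) {
        unsigned char temp = data[i];   /* temp = blue */
        data[i] = data[i + 2];          /* blue = red  */
        data[i + 2] = temp;             /* red = temp  */
    }
}

/* Same swap using the XOR trick; no temporary is needed. This is only
   safe because data[i] and data[i + 2] are always different bytes
   (XOR-swapping a byte with itself would zero it). */
void swap_rb_xor(unsigned char *data, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count * 3; i += 3) {
        data[i]     ^= data[i + 2];     /* blue ^= red */
        data[i + 2] ^= data[i];         /* red ^= blue: red now holds old blue */
        data[i]     ^= data[i + 2];     /* blue ^= red: blue now holds old red */
    }
}
```

Both loops produce identical output; any timing difference between them comes down to how the compiler translates them, which is what the replies below discuss.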

Started by Pho July 11, 2001 07:18 PM

16 comments, last by Pho

Dragonus

July 25, 2001 02:03 PM

At optimum performance (that is, if things worked the way they really should), the XOR operation should go faster than the swapping method. Most likely, the reason it's going slower is that Visual C++ isn't optimizing the code for you. When you compile C++ code, as some of us know, it's reduced down to assembly code first, and without optimization VC++ does a pretty pathetic job of generating that assembly, making it horribly inefficient. Though it's a shot in the dark, here's roughly how VC++ would implement the two methods from an assembly perspective (the C++ code is shown at right).

(Let BLUE be the memory address of the blue byte, RED be the memory address of the red byte, and TEMP be the temp byte. Registers are denoted R, followed by a number.)

Swapping

  LOAD  BLUE, R1      temp = blue;
  STORE R1, TEMP
  LOAD  RED, R1       blue = red;
  STORE R1, BLUE
  LOAD  TEMP, R1      red = temp;
  STORE R1, RED

XOR

  LOAD  RED, R1       red ^= blue;
  LOAD  BLUE, R2
  XOR   R1, R2
  STORE R1, RED
  LOAD  BLUE, R1      blue ^= red;
  LOAD  RED, R2
  XOR   R1, R2
  STORE R1, BLUE
  LOAD  RED, R1       red ^= blue;
  LOAD  BLUE, R2
  XOR   R1, R2
  STORE R1, RED

Note how few registers (extremely fast memory) we use, while we're hitting memory (comparatively much slower) quite a bit. In the swapping routine we hit memory 6 times, while in the XOR routine it's 9 times. So the swapping routine makes a third fewer memory accesses than the XOR routine, which, if memory access dominates, means it takes about a third less time.

In actuality, it's even worse. The red and blue bytes are elements of an array, so not only do we have to hit the red and blue bytes themselves, but we ALSO have to retrieve the address of the array holding the data AND the offset into the array (the number in the brackets). That adds 8 more memory hits to the swapping method (two for each of its 4 accesses to red or blue), but EIGHTEEN more to the XOR method (two for each of its 9 accesses). We're now up to 14 memory hits versus 27, so the swapping routine makes roughly half as many memory accesses. That explains the results you're getting.
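One way to cut down the per-access address work described above is to walk a local pointer through the buffer, so each access is just that pointer plus a constant offset rather than a base address plus an index. This is a sketch assuming a tightly packed 24-bit BGR buffer; `swap_rb_pointer` is a hypothetical name, and whether `p` actually stays in a register is up to the compiler.

```c
#include <stddef.h>

/* Temp-variable swap of red and blue in a tightly packed 24-bit BGR
   buffer, walking a local pointer instead of indexing data[i]. Each
   access then needs only the pointer plus a constant offset (0 or 2). */
void swap_rb_pointer(unsigned char *data, size_t pixel_count)
{
    unsigned char *p = data;
    unsigned char *const end = data + pixel_count * 3;
    for (; p != end; p += 3) {
        unsigned char temp = p[0];   /* temp = blue */
        p[0] = p[2];                 /* blue = red  */
        p[2] = temp;                 /* red = temp  */
    }
}
```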

However, there's a way to get around that and make the XOR more efficient. (In fact, the only reason it comes out ahead this way is that we don't have to allocate space for the temporary variable.) The trick is a wonderful keyword called "register". Putting "register" in front of a variable declaration asks the compiler to keep that variable in one of the processor's registers throughout its entire scope, which speeds things up since we're not pinging memory all the time. It's only a hint, though: the compiler is free to ignore it.

Here's an example of how you'd do a simple swap of two bytes.

  register byte a = 4, b = 5, temp;

  LOAD  a, R1         temp = a;   // R1 is now a fancy term for "temp"
  LOAD  b, R2         a = b;      // R2 is just "b" now
  STORE R2, a
  STORE R1, b         b = temp;

As you can see, just 4 memory hits instead of 6 this time. The XOR will be similar.

  register byte a = 4, b = 5;

  LOAD  a, R1                     // R1 is just "a"
  LOAD  b, R2                     // R2 is just "b"
  XOR   R1, R2        a ^= b;
  XOR   R2, R1        b ^= a;
  XOR   R1, R2        a ^= b;
  STORE R1, a
  STORE R2, b

4 memory hits here as well. Although there are three more assembly instructions here, the XOR version comes out quicker, because the XOR operations themselves are extremely fast, whereas allocating space for the "temp" variable takes more time.
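Applied to a whole image, the register approach might look like the following C sketch, which reads red and blue into `register` locals, XOR-swaps them there, and writes them back: two reads and two writes per pixel. It assumes a tightly packed 24-bit BGR buffer, `swap_rb_register_xor` is a hypothetical name, and since `register` is only a hint (and is no longer a storage-class keyword in C++17), the actual code generated depends on the compiler.

```c
#include <stddef.h>

/* XOR swap of red and blue in a tightly packed 24-bit BGR buffer, with
   both bytes copied into register-qualified locals first. `register` is
   only a hint; the compiler is free to ignore it. */
void swap_rb_register_xor(unsigned char *data, size_t pixel_count)
{
    unsigned char *p = data;
    unsigned char *const end = data + pixel_count * 3;
    for (; p != end; p += 3) {
        register unsigned char blue = p[0];   /* memory read 1 */
        register unsigned char red  = p[2];   /* memory read 2 */
        blue ^= red;                          /* blue = B ^ R */
        red  ^= blue;                         /* red  = B     */
        blue ^= red;                          /* blue = R     */
        p[0] = blue;                          /* write 1: new blue = old red */
        p[2] = red;                           /* write 2: new red = old blue */
    }
}
```

For comparison, once both bytes sit in locals, the temporary-variable version of this loop needs no extra variable at all: it can write them back crosswise (`p[0] = red; p[2] = blue;`) with the same two reads and two writes.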

Hope that explains everything!

~ Dragonus

Pho

Author

July 25, 2001 05:31 PM

Dragonus, what can I say?...

I love you man.

Dragonus

July 26, 2001 10:06 AM

You're not getting my Bud Light...

(j/k)

You're quite welcome.

~ Dragonus

Wulf_

July 27, 2001 01:43 AM

Pointer accesses are not very fast compared to normal ops (check the asm code): there are 4 pointer accesses in the temp-variable version and 9 in the XOR version.

Ideally, this is what you'd want:

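  mov ax, WORD PTR &blue     ; al = byte at &blue, ah = the byte after it
  mov BYTE PTR &red, al
  mov BYTE PTR &blue, ah

3 pointer ops, 3 movs

PS. This assumes red is the byte right after blue. x86 is little-endian, so al gets blue and ah gets red; that's why they are stored crosswise.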

(I didn't read the entire thread, so someone may have already said this.)
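Wulf_'s trick relies on red being the byte right after blue, but in a 24-bit TGA pixel (stored as B, G, R) green sits between them. An analogous approach for a 32-bit BGRA buffer (an assumption; the thread is about 24-bit images) is to load each whole pixel as one 32-bit word, shuffle bytes 0 and 2 with shifts and masks, and store it back in one write. `swap_rb_bgra32` is a hypothetical name, and the `memcpy` calls are the portable C way to express an alignment-safe word load and store; with optimization enabled they usually compile to single moves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine (such as x86), 0 otherwise. */
static int is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

/* Swap red and blue in a tightly packed 32-bit BGRA buffer: load each
   pixel as one 32-bit word, exchange bytes 0 and 2 in a local, and
   store the word back with a single write. */
void swap_rb_bgra32(unsigned char *data, size_t pixel_count)
{
    const int little = is_little_endian();
    for (size_t i = 0; i < pixel_count; ++i) {
        uint32_t px;
        memcpy(&px, data + 4 * i, sizeof px);   /* one 32-bit load */
        if (little) {
            /* byte 0 (blue) is bits 0-7, byte 2 (red) is bits 16-23 */
            px = (px & 0xFF00FF00u)
               | ((px >> 16) & 0x000000FFu)
               | ((px & 0x000000FFu) << 16);
        } else {
            /* byte 0 (blue) is bits 24-31, byte 2 (red) is bits 8-15 */
            px = (px & 0x00FF00FFu)
               | ((px >> 16) & 0x0000FF00u)
               | ((px & 0x0000FF00u) << 16);
        }
        memcpy(data + 4 * i, &px, sizeof px);   /* one 32-bit store */
    }
}
```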

_DarkWIng_

July 28, 2001 06:30 AM

Sorry about the confusion over my code (XOR)... But the problem is probably compiler optimisation. I haven't used MSVC yet; for now I'm using Borland Visual C++ & Delphi. In both cases XOR is faster than using a temp var.

There are more worlds than the one that you hold in your hand...

You should never let your fears become the boundaries of your dreams.

Flous

July 30, 2001 11:55 AM

Correct me if I'm wrong here, but isn't optimising something like swapping the R and B color bytes of a .tga file a waste of valuable development time? You're loading a .tga file from the HD, and you're probably doing it just once, when loading the level, and occasionally when loading a new model or so...

Why would you optimise something as trivial as that? It''s not like it happens every single frame or so.

My 2 Cents

jwace81

July 30, 2001 03:50 PM

Flous: One reason you might want to optimize image loading is to reduce load times when switching levels. Another reason you might need to optimize texture loading is if you're trying to stream some of the data during the game to eliminate load times, like they're doing in the Legacy of Kain games (Soul Reaver and Blood Omen). It probably isn't quite as necessary on the PC as on a console, since you could assume that most PCs have a lot more RAM than most consoles. It'd be nice to see more games with that continuous flow to them: no load times, and virtually seamless transitions between the game and the cinematics.

J.W.

Flous

July 31, 2001 06:33 AM

Owkay, I hear what you are saying...
I was (again) ignoring the other types of games, the ones we are not making.

This topic is closed to new replies.
